preallocation of numpy array of numpy arrays

I read about how important it is to preallocate a numpy array. In my case I am, however, not sure how to do this. I want to preallocate an nxm matrix. That sounds simple enough
M = np.zeros((n,m))
However, what if my matrix is a matrix of matrices? So what if each of these nxm elements is actually of the form
np.array([[t], [x0,x1,x2], [y0,y1,y2]])
I know that in that case, M would have the shape (n,m,3).
As an example, later I want to have something like this
[[[[0], [0,1,2], [3,4,5]],
  [[1], [10,11,12], [13,14,15]]],
 [[[0], [100,101,102], [103,104,105]],
  [[1], [110,111,112], [113,114,115]]]]
I tried simply doing
M = np.zeros((2,2,3))
but then
M[0,0,:] = np.array([[0], [0,1,2], [3,4,5]])
will give me an error
ValueError: setting an array element with a sequence.
Can I not preallocate this monster? Or should I approach this in a completely different way?
Thanks for your help

You have to make sure you preallocate the correct number of dimensions and elements along each dimension to use simple assignments to fill it.
For example, if you want to store three 2x3 matrices:
number_of_matrices = 3
matrix_dim_1 = 2
matrix_dim_2 = 3
M = np.empty((number_of_matrices, matrix_dim_1, matrix_dim_2))
M[0] = np.array([[ 0, 1, 2], [ 3, 4, 5]])
M[1] = np.array([[100, 101, 102], [103, 104, 105]])
M[2] = np.array([[ 10, 11, 12], [ 13, 14, 15]])
M
#array([[[  0.,   1.,   2.],    # matrix 1
#        [  3.,   4.,   5.]],
#
#       [[100., 101., 102.],    # matrix 2
#        [103., 104., 105.]],
#
#       [[ 10.,  11.,  12.],    # matrix 3
#        [ 13.,  14.,  15.]]])
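For the data in the question, where each grid point holds one value t plus two length-3 vectors, the same idea means picking shapes that are rectangular. A minimal sketch, assuming t is stored separately from the x/y rows (the names T and XY are illustrative, not from the answer), preallocates an (n, m) array and an (n, m, 2, 3) array and fills them with the example values:

```python
import numpy as np

n, m = 2, 2
T = np.empty((n, m))          # one scalar t per grid point
XY = np.empty((n, m, 2, 3))   # two rows (x and y) of three values per grid point

# Fill with the example values from the question
T[0, 0] = 0
XY[0, 0] = [[0, 1, 2], [3, 4, 5]]
T[0, 1] = 1
XY[0, 1] = [[10, 11, 12], [13, 14, 15]]
T[1, 0] = 0
XY[1, 0] = [[100, 101, 102], [103, 104, 105]]
T[1, 1] = 1
XY[1, 1] = [[110, 111, 112], [113, 114, 115]]

print(T)              # t for every grid point, shape (2, 2)
print(XY[..., 0, :])  # all x vectors, shape (2, 2, 3)
```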
Your approach has some problems. The array you want to store is not a valid n-dimensional numpy array:
np.array([[0], [0,1,2], [3,4,5]], dtype=object)
# array([list([0]), list([0, 1, 2]), list([3, 4, 5])], dtype=object)
# 3 items in the first dimension: [0], [0,1,2] and [3,4,5]
# the first item has 1 element, the second and third have 3 each,
# so there is no consistent size for a second dimension
It just creates a 3-item array containing Python list objects. You probably want an array of numbers, so you need to take care of the dimensions. Your np.array([[0], [0,1,2], [3,4,5]]) could be meant as a 3x1 array or a 3x3 array; numpy can't tell, so it stores the lists as objects (the array now has only 1 dimension!). Recent numpy versions (1.24 and later) no longer guess at all and raise a ValueError for such ragged input unless you pass dtype=object explicitly.
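A small sketch that reproduces both problems with a current NumPy: the ragged input only becomes a 1-D object array (with dtype=object passed explicitly), and assigning it into a float array raises the ValueError from the question, while a rectangular block of the right shape assigns without trouble.

```python
import numpy as np

# Ragged input: one list of length 1 and two of length 3.
# Recent NumPy versions require dtype=object here; older ones inferred it.
ragged = np.array([[0], [0, 1, 2], [3, 4, 5]], dtype=object)
print(ragged.shape, ragged.dtype)   # only one dimension, items are lists

M = np.zeros((2, 2, 3))
failed = False
try:
    # Each target slot holds one float, but each source item is a list
    M[0, 0, :] = ragged
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# A rectangular block whose shape matches the target works fine
M[0] = [[0, 1, 2], [3, 4, 5]]       # M[0] has shape (2, 3)
print(M[0])
```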
The other problem is that you want to set one element of the preallocated array to another array that contains more than one element. This is not possible (unless you already have an object array). You have two options here:
Fill as many elements in the preallocated array as are required by the array:
M[0, :, :] = np.array([[0,1,2], [3,4,5]])
# M[0, :, :] has shape (2, 3): 2 items in the first dimension,
# 3 items in the second, exactly the shape of the right-hand side
# if it's the first dimension you could also use M[0]
Create an object array and set the element (not recommended, you lose most of the advantages of numpy arrays):
M = np.empty(3, dtype=object)
M[0] = np.array([[0,1,2], [3,4,5]])
M[1] = np.array([[0,1,2], [3,4,5]])
M[2] = np.array([[0,1,2], [3,4,5]])
M
#array([array([[0, 1, 2],
#              [3, 4, 5]]),
#       array([[0, 1, 2],
#              [3, 4, 5]]),
#       array([[0, 1, 2],
#              [3, 4, 5]])], dtype=object)
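If you already have such an object array and all its items share one shape, a sketch of getting back to a regular numeric array is to stack the items with np.stack. This restores a (3, 2, 3) layout and the usual vectorized operations; the +100*k offsets are only there to make the three matrices distinguishable.

```python
import numpy as np

M = np.empty(3, dtype=object)
for k in range(3):
    M[k] = np.array([[0, 1, 2], [3, 4, 5]]) + 100 * k

# np.stack joins the 2x3 items along a new first axis
stacked = np.stack(list(M))
print(stacked.shape)          # a real 3-D numeric array
print(stacked.sum(axis=0))    # element-wise sum over the three matrices
```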

If you know you will only store the values t, x and y for each point of the n x m grid, it may be easier, and computationally faster, to use three separate numpy arrays.
So:
M_T = np.zeros((n,m))
M_Y = np.zeros((n,m))
M_X = np.zeros((n,m))
You can then use ordinary Python operators to do element-wise array arithmetic, such as:
M_X = np.ones((n,m))
M_Y = np.ones((n,m))
M_T = M_X + M_Y         # element-wise sum
M_T ** M_T              # element-wise power
(M_T ** M_T) * 7.5      # element-wise scaling
By writing your calculations as whole-array (vectorized) operations, similar to MATLAB, you will get a big speed increase over looping through the elements in Python.
Of course if you need more variables at each point then this may become unwieldy.
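When the per-point values do not form one rectangular block (one t plus two length-3 vectors, as in the question), another option, not taken from the answers above, is a NumPy structured array: a single preallocated (n, m) array whose dtype has one field per variable. The field names t, x and y below are illustrative.

```python
import numpy as np

n, m = 2, 2
# One record per grid point: a scalar t and two length-3 vectors x and y
point = np.dtype([("t", np.float64), ("x", np.float64, (3,)), ("y", np.float64, (3,))])
M = np.zeros((n, m), dtype=point)

# Assign a whole record with a tuple in field order
M[0, 0] = (0, [0, 1, 2], [3, 4, 5])
M[1, 1] = (1, [110, 111, 112], [113, 114, 115])

# Each field is a regular numeric view over the whole grid
print(M["t"].shape)   # (2, 2)
print(M["x"].shape)   # (2, 2, 3)
print(M["x"] * 2)     # vectorized arithmetic on one field
```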

Related

looping numpy error: all the input arrays must have same number of dimensions

I want to translate the following MATLAB code to Python:
for i = 1:N
    for j = 1:N
        Ab(i,j) = (Ap(i)*Ap(j))^(0.5)*(1 - kij(i,j)) ;
    end
end
However, my Python version below raises an error: "all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)"
ab=np.matrix((2, 2))
for i in range(0,nc):
    for j in range(0, nc):
        np.append(ab,((Ap[i]*Ap[j])**(0.5)*(1 - kij[i][j])))

Some context is missing, but if I read the Matlab part correctly, you can write something like this:
import numpy as np

ab = np.zeros((2, 2))
for i in range(ab.shape[0]):  # the start value 0 is optional; the array's shape bounds the loop
    for j in range(ab.shape[1]):
        ab[i, j] = (Ap[i]*Ap[j])**0.5 * (1 - kij[i][j])
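The double loop can also be written without Python loops. This sketch assumes Ap is a 1-D array of length nc and kij an (nc, nc) array, as the MATLAB indexing suggests; the sample values are made up for illustration. np.outer(Ap, Ap) builds every product Ap[i]*Ap[j] at once.

```python
import numpy as np

# Hypothetical sample inputs: Ap has nc entries, kij is nc x nc
Ap = np.array([1.0, 4.0, 9.0])
kij = np.array([[0.0, 0.1, 0.2],
                [0.1, 0.0, 0.3],
                [0.2, 0.3, 0.0]])
nc = len(Ap)

# Loop version, mirroring the MATLAB code
ab_loop = np.zeros((nc, nc))
for i in range(nc):
    for j in range(nc):
        ab_loop[i, j] = (Ap[i] * Ap[j]) ** 0.5 * (1 - kij[i, j])

# Vectorized version: same formula applied to whole arrays
ab = np.sqrt(np.outer(Ap, Ap)) * (1 - kij)

print(np.allclose(ab, ab_loop))
```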
My assumptions
ab is meant to be a 2x2 matrix, not a 1x2 matrix with values [2, 2], which is what np.matrix((2, 2)) confusingly creates (at least that is not what I expected coming from Matlab). np.zeros((2, 2)) creates a 2x2 array filled with zeros. Arrays and matrices are a bit different in numpy, and np.matrix is no longer recommended and may be removed in the future (more here https://numpy.org/doc/stable/reference/generated/numpy.matrix.html?highlight=matrix#numpy.matrix)
nc is the size of the ab matrix
Why did you get an error?
np.matrix((2, 2)) creates a 1x2 matrix with values 2 and 2: [[2, 2]]
(Ap[i]*Ap[j])**(0.5)*(1 - kij[i][j]) is a scalar value
np.append(ab, scalar_value) flattens both inputs and concatenates them, but a np.matrix stays 2-d even when flattened, while the scalar becomes a 1-d array. That dimension mismatch between ab and the scalar value is exactly what the error states. Note also that np.append returns a new array rather than modifying ab, so your loop discards every result.
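A short sketch of the failure: np.matrix stays 2-D even after flattening, so np.append ends up joining a 2-D matrix with a 1-D array. With a plain ndarray both inputs are flattened, and the original array is left unchanged, which is why the result has to be assigned back.

```python
import numpy as np

ab = np.matrix((2, 2))          # a 1x2 matrix [[2, 2]], not a 2x2 one
print(ab.ravel().shape)         # still 2-D after flattening

failed = False
try:
    np.append(ab, 3.0)          # 2-D matrix vs 1-D values -> dimension mismatch
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

arr = np.zeros((2, 2))
out = np.append(arr, 3.0)       # plain ndarray: both inputs are flattened first
print(arr.shape, out.shape)     # arr itself is unchanged
```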
Examples
>>> np.zeros((2, 2))
array([[0., 0.],
       [0., 0.]])
>>> np.matrix((2, 2))
matrix([[2, 2]])
>>> np.array((2, 2))
array([2, 2])
>>> np.append(np.matrix((2, 2)), [[3, 3]], axis=0)
matrix([[2, 2],
        [3, 3]])
>>> np.append(np.zeros((2, 2)), [[3, 3]], axis=0)
array([[0., 0.],
       [0., 0.],
       [3., 3.]])

How to calculate x*x.T in python

I want to calculate x * x.T, i.e. the vector x times its own transpose, which should give a matrix, but I have no idea how to do this in python. I do not want to implement it manually but use a predefined function, for example something from numpy.
But numpy seems to ignore that x.T should be transposed.
Code:
import numpy as np
x = np.array([1, 5])
print(np.dot(x, x.T)) # = 26, This is not the matrix it should be!

As long as your vectors are defined as 1-d arrays, you can use np.outer (for a 1-d array x.T is the same as x, so np.outer(x, x) gives the same result):
np.outer(x, x.T)
> array([[ 1,  5],
>        [ 5, 25]])
Alternatively, you could define your vectors as 2-d column arrays and use the @ matrix multiplication operator:
x = np.array([[1], [5]])
x @ x.T
> array([[ 1,  5],
>        [ 5, 25]])

You can do:
x = np.array([[1], [5]])
print(np.dot(x, x.T))
Your original x has shape (2,), while you need shape (2,1). Another way is to reshape your x:
x = np.array([1, 5]).reshape(-1,1)
print(np.dot(x, x.T))
.reshape(-1,1) reshapes your array to have 1 column and implicitly infers the number of rows.
output:
[[ 1  5]
 [ 5 25]]

np.matmul(x[:, np.newaxis], [x])  # (2, 1) column times (1, 2) row -> (2, 2) matrix
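For comparison, a sketch that builds x times its transpose for the question's x = [1, 5] in the three ways shown above and checks that they agree. Each one multiplies a (2, 1) column by a (1, 2) row; np.outer does that reshaping implicitly.

```python
import numpy as np

x = np.array([1, 5])

a = np.outer(x, x)                    # 1-D inputs, outer product
col = x.reshape(-1, 1)                # shape (2, 1)
b = col @ col.T                       # (2, 1) @ (1, 2) -> (2, 2)
c = np.matmul(x[:, np.newaxis], [x])  # same shapes via np.newaxis

print(a)
print(np.array_equal(a, b) and np.array_equal(a, c))
```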

How to add element to empty 2d numpy array

I'm trying to insert elements into an empty 2d numpy array. However, I am not getting what I want.
I tried np.hstack, but it only gives me a flat 1-d array. Then I tried using append, but it gives me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray =np.concatenate((randomReleaseAngle1,randomVelocity1),axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
    data = [[170, 68.2, i[0], i[1]]]
    df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
    test_y_predictions = model.predict(df)
    print(test_y_predictions)
    if (np.any(test_y_predictions == 1)):
        arr = np.hstack((arr, np.array([i[0], i[1]])))
        arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
 [3.3,4.3],
 [7.1,7.3],
 [3.3,4.3],
 [3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]

The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
    ...:     alist.append([i, i+10])
    ...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
       [ 1, 11],
       [ 2, 12]])
If we start with a empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
       [139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
       [139934871674928, 139934871674952],
       [              1,              10],
       [              2,              11]])
Note that np.empty does not mean the same thing as the list []. It creates a real 2x2 array with 'unspecified' (uninitialized) values, and those values remain when we add other arrays to it.
I could instead start with an array that has a 0-length dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
       [ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate call makes a new array, so doing it repeatedly in a loop is more expensive than appending to a list.
np.append just takes 2 arrays and calls concatenate, so it doesn't add much. hstack adjusts shapes and joins on the 2nd (horizontal) dimension, but for 1-d inputs it joins on the only axis there is, which is why your arr stayed flat. vstack is another variant. They all end up using concatenate.
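Applied to the question's loop, a sketch of two alternatives follows. The filter `keep` is a hypothetical stand-in for the model.predict(...) == 1 test. The first version uses the list-append approach; the second preallocates the maximum possible number of rows (here, the number of candidates), fills them with a counter and trims the unused rows at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
# Candidate (release angle, velocity) pairs, generated as in the question
candidates = np.column_stack([
    rng.uniform(20.0, 77.0, size=5),
    rng.uniform(40.0, 60.0, size=5),
])

def keep(row):
    # Hypothetical stand-in for the model.predict(...) == 1 test
    return row[1] > 50.0

# 1) Collect rows in a Python list and convert once at the end
rows = []
for row in candidates:
    if keep(row):
        rows.append([row[0], row[1]])
arr_list = np.array(rows).reshape(-1, 2)   # still shape (0, 2) if nothing matched

# 2) Preallocate the maximum possible size, fill with a counter, then trim
arr_pre = np.empty((len(candidates), 2))
count = 0
for row in candidates:
    if keep(row):
        arr_pre[count] = row
        count += 1
arr_pre = arr_pre[:count]   # drop the unused, uninitialized rows

print(arr_list.shape, np.array_equal(arr_list, arr_pre))
```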

With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
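The other method can be done in a similar way, provided arr1 starts out empty (for example arr1 = np.array([])); starting from np.empty((2,2), float) as in the question would leave 2 extra rows of uninitialized values at the top:
arr1 = np.append(arr1, np.array([i[0], i[1]]))  # in the loop, without axis=0
arr1 = arr1.reshape(-1, 2)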
print(arr1)

Efficient Numpy computation of pairwise squared differences

The following code does exactly what I want, which is to compute the pairwise sum of squares of differences between elements of a vector (length three in the example), of which I have a long series (limited to five here). The desired result is shown at the bottom.
But the implementation feels kludgy for two reasons:
1) the need to add a phantom dimension, changing the shape from (5, 3) to (5,1,3) to avoid broadcast problems, and
2) the apparent necessity of an explicit 'for' loop, which I'm sure is why it's taking hours to execute on my much larger data set (a million vectors of length 2904).
Is there a more efficient and/or pythonic way to achieve the same result?
import numpy as np

a = np.array([[ 4, 2, 3], [-1, -5, 4], [ 2, 1, 4], [-5, -1, 4], [6, -3, 3]])
a = a.reshape((5,1,3))
m = a.shape[0]
n = a.shape[2]
d = np.zeros((n,n))
for i in range(m):
    c = a[i,:] - np.transpose(a[i,:])
    c = c**2
    d += c
print(d)
[[  0. 118. 120.]
 [118.   0. 152.]
 [120. 152.   0.]]

If you don't mind the dependency on scipy, you can use functions from the scipy.spatial.distance module:
In [17]: from scipy.spatial.distance import pdist, squareform
In [18]: a = np.array([[ 4, 2, 3], [-1, -5, 4], [ 2, 1, 4], [-5, -1, 4], [6, -3, 3]])
In [19]: d = pdist(a.T, metric='sqeuclidean')
In [20]: d
Out[20]: array([118., 120., 152.])
In [21]: squareform(d)
Out[21]:
array([[  0., 118., 120.],
       [118.,   0., 152.],
       [120., 152.,   0.]])

You could eliminate the for-loop by using:
In [48]: ((a - a.swapaxes(1,2))**2).sum(axis=0)
Out[48]:
array([[  0, 118, 120],
       [118,   0, 152],
       [120, 152,   0]])
Note that if a has shape (N, 1, M) then (a - a.swapaxes(1,2)) has shape (N, M, M). Make sure you have enough RAM to accommodate an array of this size. Page swapping can also slow the calculation to a crawl.
If you do have too little memory, you will have to break up the calculation in chunks:
m, _, n = a.shape
chunksize = 10**4
d = np.zeros((n,n))
for i in range(0, m, chunksize):
    b = a[i:i+chunksize]
    d += ((b - b.swapaxes(1,2))**2).sum(axis=0)
This is a compromise between performing the calculation on the entire array and calculating row by row. If there are a million rows and the chunksize is 10**4, there will be only 100 iterations of the loop instead of a million.
Thus, it should be significantly faster than calculating row by row. Choose the largest chunksize that still allows the calculation to be performed in RAM.
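Another loop-free option, not from the answers above, avoids the (N, M, M) intermediate entirely by expanding the square: d[j, k] = sum_i (a[i, j] - a[i, k])**2 = G[j, j] + G[k, k] - 2 G[j, k], with the Gram matrix G = a.T @ a. Only (M, M) arrays are created, whatever the number of rows; for M = 2904 that is about 8.4 million entries. This sketch uses the original (5, 3) array before the reshape. With floating-point data the subtraction can leave tiny negative values from rounding, hence the clip at zero; converting a to float64 also lets NumPy hand the product to an optimized BLAS routine.

```python
import numpy as np

a = np.array([[4, 2, 3], [-1, -5, 4], [2, 1, 4], [-5, -1, 4], [6, -3, 3]])

# Gram matrix of the columns: G[j, k] = sum_i a[i, j] * a[i, k]
G = a.T @ a
sq = np.diag(G)                         # sum_i a[i, j]**2 for each column j
# Expand (x - y)**2 = x**2 + y**2 - 2*x*y, summed over the rows
d = sq[:, None] + sq[None, :] - 2 * G
d = np.maximum(d, 0)                    # clip tiny negatives from float rounding
print(d)
```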

Sum values according to an index array

I have two arrays of the same dimension:
a = np.array([ 1, 1, 2, 0, 0, 1])
b = np.array([50, 51, 6, 10, 3, 2])
I want to sum the elements of b according to the indices in a.
The ith element of the result should be the sum of all values b[j] such that a[j]==i.
So the result should be a 3-element array: [10 + 3, 50 + 51 + 2, 6]
Is there a numpy way to do this? I have some very large arrays that I need to sum like this over multiple dimensions, so it would NOT be convenient to have to perform explicit loops.

numpy.bincount has a weights parameter which does just what you need:
In [36]: np.bincount(a, weights=b)
Out[36]: array([ 13., 103., 6.])
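np.bincount only accepts 1-D weights. For b with extra dimensions (the "multiple dimensions" case in the question), one sketch uses np.add.at, an unbuffered in-place add that accumulates correctly when the same index appears several times (a plain res[a] += b would count each repeated index only once). Here a indexes the first axis of a 2-D b; the second column of b is made up for illustration.

```python
import numpy as np

a = np.array([1, 1, 2, 0, 0, 1])
# First column is b from the question; the second column is hypothetical
b = np.array([[50, 1], [51, 1], [6, 1], [10, 1], [3, 1], [2, 1]])

res = np.zeros((a.max() + 1, b.shape[1]), dtype=b.dtype)
# Adds b[j] to res[a[j]] for every j, including repeated indices
np.add.at(res, a, b)
print(res)   # first column matches np.bincount(a, weights=b[:, 0])
```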

If you are not using numpy, something as simple as this works:
res = [0]*len(set(a))
for i, v in enumerate(b):
    res[a[i]] += v
This assumes the indices in a are always 0-based and form a contiguous sequence.
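If the indices are not a contiguous 0-based range, a plain-Python sketch can accumulate into a dictionary keyed by index instead of a preallocated list; collections.defaultdict supplies the starting value of 0 for each new key.

```python
from collections import defaultdict

a = [1, 1, 2, 0, 0, 1]
b = [50, 51, 6, 10, 3, 2]

sums = defaultdict(int)
for idx, value in zip(a, b):
    sums[idx] += value     # missing keys start at 0

print(dict(sorted(sums.items())))
```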
