Concatenate nested list of array with partial empty sublist

The objective is to concatenate a nested list of arrays (i.e., list_arr). However, some of the sublists within list_arr have length zero.
Simply calling np.array or np.asarray on list_arr does not produce the intended result.
import numpy as np
ncondition=2
nnodes=30
nsbj=6
np.random.seed(0)
# Example of nested list list_arr
list_arr = [[[np.concatenate([[idx_sbj], [ncondi], [nepoch], np.random.rand(nnodes)])
              for nepoch in range(np.random.randint(5))]
             for ncondi in range(ncondition)]
            for idx_sbj in range(nsbj)]
None of the following produces the expected concatenated output:
test1=np.asarray(list_arr)
test2=np.array(list_arr)
test3= np.vstack(list_arr)
The expected output is an array of shape (15, 33).

OK, my curiosity got the better of me.
Make an object dtype array from the list:
In [280]: arr=np.array(list_arr,object)
In [281]: arr.shape
Out[281]: (6, 2)
All elements of this array are lists, with lengths:
In [282]: np.frompyfunc(len,1,1)(arr)
Out[282]:
array([[4, 1],
[0, 2],
[0, 2],
[0, 0],
[2, 3],
[1, 0]], dtype=object)
Looking at specific sublists, one has two empty lists:
In [283]: list_arr[3]
Out[283]: [[], []]
Others have one empty list, either first or second:
In [284]: list_arr[-1]
Out[284]:
[[array([5. , 0. , 0. , 0.3681024 , 0.3127533 ,
0.80183615, 0.07044719, 0.68357296, 0.38072924, 0.63393096,
...])],
[]]
And some have lists of differing numbers of arrays.
If I add up the numbers in Out[282] I get 15, so that must be where you get the (15,33). And all the arrays have the same length, 3 + nnodes = 33.
The outer layer of nesting isn't relevant, so we can ravel and remove it.
In [295]: alist = arr.ravel().tolist()
Then filter out the empty lists and apply vstack to each remaining one:
In [296]: alist = [np.vstack(x) for x in alist if x]
In [297]: len(alist)
Out[297]: 7
And one more vstack joins those:
In [298]: arr2 = np.vstack(alist)
In [299]: arr2.shape
Out[299]: (15, 33)
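For comparison, here is a minimal sketch that skips the object array and flattens the three levels (subject, condition, epoch) with a nested comprehension; empty sublists simply contribute no rows. It rebuilds list_arr as in the question, and the names rows and result are only illustrative.

```python
import numpy as np

ncondition = 2
nnodes = 30
nsbj = 6
np.random.seed(0)

# Same nested list as in the question: subject -> condition -> epoch arrays
list_arr = [[[np.concatenate([[idx_sbj], [ncondi], [nepoch], np.random.rand(nnodes)])
              for nepoch in range(np.random.randint(5))]
             for ncondi in range(ncondition)]
            for idx_sbj in range(nsbj)]

# Flatten all three levels; an empty condition list yields no rows at all
rows = [row for sbj in list_arr for cond in sbj for row in cond]

# Guard against every sublist being empty (np.vstack of an empty list raises)
result = np.vstack(rows) if rows else np.empty((0, nnodes + 3))
print(result.shape)
```

The number of rows equals the sum of the sublist lengths, as in the object-array approach, and each row keeps its subject, condition and epoch labels in the first three columns.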

Related

numpy array containing multi-dimension numpy arrays with variable shape

I have a list of numpy arrays, whose shape is one of the following: (10,4,4,20), (10,4,6,20). I want to convert the list to a numpy array. Since they are of different shapes, I can't just stack them. So I thought of creating a numpy array treating each array as an object, as in here. I tried the below:
b = numpy.array(a)
b = numpy.array(a, dtype=object)
where a is the list of numpy arrays. Both give me the following error:
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
How can I convert that list to numpy array?
Example:
import numpy
a = [numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,6,20))
]
b = numpy.array(a)
Use Case:
I know numpy arrays of objects are not efficient, but I'm not doing any operations on them. Usually, I have a list of same-shape numpy arrays, so I can easily stack them. This array is passed to another function, which selects certain elements only. If my data is a numpy array, I can just do b[[1,3,8]]. But I can't do the same with a list; I get the following error if I try:
c = a[[1,3,8]]
TypeError: list indices must be integers or slices, not list

np.array(alist) will make an object dtype array if the arrays in the list differ in the first dimension (recent NumPy versions require dtype=object to be passed explicitly for this). But in your case they differ in the 3rd, producing this error even with dtype=object. In effect, it can't unambiguously determine where the containing dimension ends and where the objects begin.
In [270]: alist = [np.ones((10,4,4,20),int), np.zeros((10,4,6,20),int)]
In [271]: arr = np.array(alist)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-271-3fd8e9bd05a9> in <module>
----> 1 arr = np.array(alist)
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
Instead we need to make an object array of the right size, and copy the list to it. Sometimes this copy still produces broadcasting errors, but here it seems to be ok:
In [272]: arr = np.empty(2, object)
In [273]: arr
Out[273]: array([None, None], dtype=object)
In [274]: arr[:] = alist
In [275]: arr
Out[275]:
array([array([[[[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]]])], dtype=object)
In [276]: arr[0].shape
Out[276]: (10, 4, 4, 20)
In [277]: arr[1].shape
Out[277]: (10, 4, 6, 20)
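A minimal sketch of the same idea wrapped in a helper, applied to the question's shapes. Filling the object array one slot at a time is a variation on the slice copy above: each array is stored as a single object, so NumPy never tries to broadcast the arrays against each other. The helper name to_object_array is hypothetical.

```python
import numpy as np

def to_object_array(arrays):
    """Pack a sequence of arrays into a 1-d object array, one array per slot."""
    out = np.empty(len(arrays), dtype=object)
    # Scalar-index assignment stores each array as one object (no broadcasting)
    for i, arr in enumerate(arrays):
        out[i] = arr
    return out

shapes = [(10, 4, 4, 20), (10, 4, 6, 20), (10, 4, 6, 20),
          (10, 4, 4, 20), (10, 4, 6, 20), (10, 4, 6, 20),
          (10, 4, 4, 20), (10, 4, 4, 20), (10, 4, 6, 20)]
a = [np.random.random(s) for s in shapes]

b = to_object_array(a)
c = b[[1, 3, 8]]          # fancy indexing works on the object array
print(b.shape, c.shape)   # (9,) (3,)
print([x.shape for x in c])
```

This covers the use case from the question: the object array is only a container that supports b[[1,3,8]]-style selection, while the arrays inside keep their own shapes.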

How to add element to empty 2d numpy array

I'm trying to insert elements into an empty 2d numpy array. However, I am not getting what I want.
I tried np.hstack, but it gives me a flat 1-d array. Then I tried np.append, but it gives me an error:
ValueError: all the input arrays must have same number of dimensions
import numpy as np
import pandas as pd

# `model` is the asker's own fitted classifier (not shown)
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray = np.concatenate((randomReleaseAngle1, randomVelocity1), axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
    data = [[170, 68.2, i[0], i[1]]]
    df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
    test_y_predictions = model.predict(df)
    print(test_y_predictions)
    if (np.any(test_y_predictions == 1)):
        arr = np.hstack((arr, np.array([i[0], i[1]])))
        arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]

The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
...: alist.append([i, i+10])
...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with an np.empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array with a size-0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate makes a new array, copying all the data so far. So if done iteratively it is more expensive than the list append.
np.append just takes 2 arrays and does a concatenate, so it doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension; for 1d inputs that is the only dimension, which is why your hstack result is flat. vstack is another variant. But they all end up using concatenate.
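Applied to the question's loop, the list-append approach might look like the sketch below. The question's model.predict(df) check is replaced by a hypothetical is_accepted function (an arbitrary angle threshold) so the example runs on its own; the real model would go back in its place.

```python
import numpy as np

rng = np.random.default_rng(0)
random_array = np.column_stack([
    rng.uniform(20.0, 77.0, size=5),   # release angle
    rng.uniform(40.0, 60.0, size=5),   # velocity
])

def is_accepted(angle, velocity):
    # Hypothetical stand-in for `np.any(model.predict(df) == 1)`
    return angle > 45.0

rows = []
for angle, velocity in random_array:
    if is_accepted(angle, velocity):
        rows.append([angle, velocity])   # cheap list append inside the loop

# One conversion at the end; reshape keeps shape (k, 2) even if no row matched
result = np.array(rows, dtype=float).reshape(-1, 2)
print(result)
```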

With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
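The np.append method can be done the same way if you drop the axis argument (np.append then flattens its inputs) and start arr1 empty, since np.empty((2,2)) holds 4 uninitialized values that would end up as the first two rows:
arr1 = np.array([])  # before the loop, instead of np.empty((2,2), float)
arr1 = np.append(arr1, np.array([i[0], i[1]]))  # in the loop
arr1 = arr1.reshape(-1, 2)
print(arr1)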

Difference between single and double bracket Numpy array?

What is the difference between these two numpy objects?
import numpy as np
np.array([[0,0,0,0]])
np.array([0,0,0,0])

In [71]: np.array([[0,0,0,0]]).shape
Out[71]: (1, 4)
In [72]: np.array([0,0,0,0]).shape
Out[72]: (4,)
The former is a 1 x 4 two-dimensional array, the latter a 4 element one-dimensional array.

The difference between single and double brackets starts with lists:
In [91]: ll=[0,1,2]
In [92]: ll1=[[0,1,2]]
In [93]: len(ll)
Out[93]: 3
In [94]: len(ll1)
Out[94]: 1
In [95]: len(ll1[0])
Out[95]: 3
ll is a list of 3 items. ll1 is a list of 1 item; that item is another list. Remember, a list can contain a variety of different objects, numbers, strings, other lists, etc.
Your 2 expressions effectively make arrays from two such lists:
In [96]: np.array(ll)
Out[96]: array([0, 1, 2])
In [97]: _.shape
Out[97]: (3,)
In [98]: np.array(ll1)
Out[98]: array([[0, 1, 2]])
In [99]: _.shape
Out[99]: (1, 3)
Here the list of lists has been turned into a 2d array. In a subtle way numpy blurs the distinction between the list and the nested list, since the difference between the two arrays lies in their shape, not a fundamental structure. array(ll)[None,:] produces the (1,3) version, while array(ll1).ravel() produces a (3,) version.
In the end result the difference between single and double brackets is a difference in the number of array dimensions, but we shouldn't lose sight of the fact that Python first creates different lists.
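A short runnable version of the comparison above, showing the two shapes, the conversions between them, and how indexing differs:

```python
import numpy as np

ll = [0, 1, 2]      # a list of 3 numbers
ll1 = [[0, 1, 2]]   # a list holding 1 item, which is itself a list

a1 = np.array(ll)   # shape (3,)
a2 = np.array(ll1)  # shape (1, 3)

# Converting between the two is only a change of shape
print(a1[None, :].shape)  # (1, 3)
print(a2.ravel().shape)   # (3,)

# Indexing reflects the extra dimension
print(a1[0])      # 0, the first element
print(a2[0])      # [0 1 2], the first (and only) row
print(a2[0, 0])   # 0, the first element
```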

When you defined an array with two brackets, what you were really doing was declaring an array containing one array with four 0's inside. Therefore, to access the first zero you would use
your_array[0][0], while in the second array you would just use your_array[0]. Perhaps a better way to visualize it is
array: [
[0,0,0,0],
]
vs
array: [0,0,0,0]

preallocation of numpy array of numpy arrays

I read about how important it is to preallocate a numpy array. In my case I am, however, not sure how to do this. I want to preallocate an nxm matrix. That sounds simple enough
M = np.zeros((n,m))
However, what if my matrix is a matrix of matrices? So what if each of these nxm elements is actually of the form
np.array([[t], [x0,x1,x2], [y0,y1,y2]])
I know that in that case, M would have the shape (n,m,3).
As an example, later I want to have something like this
[[[[0], [0,1,2], [3,4,5]],
[[1], [10,11,12], [13,14,15]]],
[[[0], [100,101,102], [103,104,105]],
[[1], [110,111,112], [113,114,115]]]]
I tried simply doing
M = np.zeros((2,2,3))
but then
M[0,0,:] = np.array([[0], [0,1,2], [3,4,5]])
will give me an error
ValueError: setting an array element with a sequence.
Can I not preallocate this monster? Or should I approach this in a completely different way?
Thanks for your help

You have to make sure you preallocate the correct number of dimensions and elements along each dimension to use simple assignments to fill it.
For example, suppose you want to store three 2x3 matrices:
import numpy as np
number_of_matrices = 3
matrix_dim_1 = 2
matrix_dim_2 = 3
M = np.empty((number_of_matrices, matrix_dim_1, matrix_dim_2))
M[0] = np.array([[ 0, 1, 2], [ 3, 4, 5]])
M[1] = np.array([[100, 101, 102], [103, 104, 105]])
M[2] = np.array([[ 10, 11, 12], [ 13, 14, 15]])
M
#array([[[ 0., 1., 2.], # matrix 1
# [ 3., 4., 5.]],
#
# [[ 100., 101., 102.], # matrix 2
# [ 103., 104., 105.]],
#
# [[ 10., 11., 12.], # matrix 3
# [ 13., 14., 15.]]])
Your approach has some problems. The array you want to save is not a valid n-dimensional numpy array:
np.array([[0], [0,1,2], [3,4,5]])
# array([[0], [0, 1, 2], [3, 4, 5]], dtype=object)
# first dimension: 3 items ([0], [0,1,2], [3,4,5])
# second dimension: 1 item in the first, 3 in the second, 3 in the third, so it is ragged
It just creates a 3-item array containing Python list objects (recent NumPy versions raise a ValueError for such a ragged list unless you pass dtype=object explicitly). You probably want an array containing numbers, so you need to care about dimensions. Your np.array([[0], [0,1,2], [3,4,5]]) could be a 3x1 array or a 3x3 array; numpy doesn't know what to do in this case and stores it as objects (the array now has only 1 dimension!).
The other problem is that you want to set one element of the preallocated array to another array that contains more than one element. This is not possible unless you already have an object array. You have two options here:
Fill as many elements in the preallocated array as are required by the array:
M[0, :, :] = np.array([[0,1,2], [3,4,5]])
# the right-hand side has shape (2, 3): 2 items in the first dimension, 3 in the second,
# which exactly matches the shape of the slot M[0, :, :]
# if it's the first dimension you could also use M[0]
Create an object array and set the element (not recommended, you lose most of the advantages of numpy arrays):
M = np.empty(3, dtype=object)
M[0] = np.array([[0,1,2], [3,4,5]])
M[1] = np.array([[0,1,2], [3,4,5]])
M[2] = np.array([[0,1,2], [3,4,5]])
M
#array([array([[0, 1, 2],
# [3, 4, 5]]),
# array([[0, 1, 2],
# [3, 4, 5]]),
# array([[0, 1, 2],
# [3, 4, 5]])], dtype=object)

If you know you will only store values t, y, x for each point in n,m then it may be easier, and faster computationally, to have three numpy arrays.
So:
M_T = np.zeros((n,m))
M_Y = np.zeros((n,m))
M_X = np.zeros((n,m))
You can then use the normal Python operators to do element-wise array arithmetic, such as:
MX = np.ones((n,m))
MY = np.ones((n,m))
MT = MX + MY
MT ** MT
_ * 7.5
By writing array-friendly (vectorized) functions, similar to MATLAB style, you will get a big speed increase compared with looping over individual elements in Python.
Of course if you need more variables at each point then this may become unwieldy.
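Another option, not proposed in either answer, is a structured dtype: each cell of a preallocated (n, m) array holds a scalar field t plus two length-3 subarray fields x and y, matching the [[t], [x0,x1,x2], [y0,y1,y2]] layout from the question. A minimal sketch, with the field names t, x and y chosen only for illustration:

```python
import numpy as np

n, m = 2, 2
# One record per (i, j): a scalar t and two length-3 vectors x and y
point_dtype = np.dtype([('t', float), ('x', float, (3,)), ('y', float, (3,))])
M = np.zeros((n, m), dtype=point_dtype)   # preallocated, all fields zero

# Fill the cells with the example values from the question
M[0, 0] = (0, [0, 1, 2], [3, 4, 5])
M[0, 1] = (1, [10, 11, 12], [13, 14, 15])
M[1, 0] = (0, [100, 101, 102], [103, 104, 105])
M[1, 1] = (1, [110, 111, 112], [113, 114, 115])

# Each field is an ordinary numeric array
print(M['t'].shape)   # (2, 2)
print(M['x'].shape)   # (2, 2, 3)
```

Because M['t'], M['x'] and M['y'] are plain numeric views, vectorized arithmetic still works on them, which is the main advantage over an object array.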

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way? The solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])

For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
     ...:     b[index]=a[index]
     ...:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
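To get the even numbers back as a numeric array, the cells of b have to be stacked again, which is itself an iteration over the objects. A sketch using np.stack on the flattened object array (the names restored and evens are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(2, 3, 2)
b = np.empty((2, 3), dtype=object)
for index in np.ndindex(b.shape):
    b[index] = a[index]

# b[..., 0] is not possible: the (2,) arrays sit inside opaque object cells.
# Stacking the cells (a Python-level loop over the objects) restores a numeric array.
restored = np.stack(list(b.ravel())).reshape(b.shape + b.flat[0].shape)
evens = restored[:, :, 0]
print(evens)
```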
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist, dtype=object) # array of objects (recent NumPy requires dtype=object here)
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!

Based on hpaulj's answer, here is a slightly more generic solution. a is an array of dimension N which is converted to an array b of dimension N1 with dtype object, holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
# create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
# Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) # init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
# Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
    b[index]=a[index]
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
