numpy array containing multi-dimension numpy arrays with variable shape

numpy array containing multi-dimension numpy arrays with variable shape - python

I have a list of numpy arrays, whose shape is one of the following: (10,4,4,20), (10,4,6,20). I want to convert the list to a numpy array. Since, they are of different shapes, I can't just stack them. So, I thought of creating numpy array considering each array as an object, as in here. I tried the below:
b = numpy.array(a)
b = numpy.array(a, dtype=object)
where a is the list of numpy arrays. Both are giving me the following error:
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
How can I convert that list to numpy array?
Example:
import numpy
a = [numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,6,20)),
numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,4,20)),
numpy.random.random((10,4,6,20))
]
b = numpy.array(a)
Use Case:
I know numpy array of objects are not efficient, but I'm not doing any operations on them. Usually, I have a list of same shape numpy arrays and so I can easily stack them. This array is passed to another function, which selects certain elements only. If my data is numpy array, I can just do b[[1,3,8]]. But I can't do the same with list. I get the following error if I try the same with list
c = a[[1,3,8]]
TypeError: list indices must be integers or slices, not list

np.array(alist) will make an object dtype array if the list arrays differ in the first dimension. But in your case they differ in the 3rd, producing this error. In effect, it can't unambiguously determine where the containing dimension ends, and where the objects begin.
In [270]: alist = [np.ones((10,4,4,20),int), np.zeros((10,4,6,20),int)]
In [271]: arr = np.array(alist)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-271-3fd8e9bd05a9> in <module>
----> 1 arr = np.array(alist)
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
Instead we need to make an object array of the right size, and copy the list to it. Sometimes this copy still produces broadcasting errors, but here it seems to be ok:
In [272]: arr = np.empty(2, object)
In [273]: arr
Out[273]: array([None, None], dtype=object)
In [274]: arr[:] = alist
In [275]: arr
Out[275]:
array([array([[[[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]]])], dtype=object)
In [276]: arr[0].shape
Out[276]: (10, 4, 4, 20)
In [277]: arr[1].shape
Out[277]: (10, 4, 6, 20)

Related

Concatenate nested list of array with partial empty sublist

The objective is to concatenate nested list of arrays (i.e., list_arr). However, some of the sublists within the list_arr is of len zero.
Simply using np.array or np.asarray on the list_arr does not produce the intended result.
import numpy as np
ncondition=2
nnodes=30
nsbj=6
np.random.seed(0)
# Example of nested list list_arr
list_arr=[[[np.concatenate([[idx_sbj],[ncondi],[nepoch] ,np.random.rand(nnodes)]) for nepoch in range(np.random.randint(5))] \
for ncondi in range(ncondition)] for idx_sbj in range(nsbj)]
The following does not produce the expected concatenate output
test1=np.asarray(list_arr)
test2=np.array(list_arr)
test3= np.vstack(list_arr)
The expected output is an array of shapes (15,33)

OK, my curiosity got the better of me.
Make an object dtype array from the list:
In [280]: arr=np.array(list_arr,object)
In [281]: arr.shape
Out[281]: (6, 2)
All elements of this array are lists, with len:
In [282]: np.frompyfunc(len,1,1)(arr)
Out[282]:
array([[4, 1],
[0, 2],
[0, 2],
[0, 0],
[2, 3],
[1, 0]], dtype=object)
Looking at specific sublists. One has two empty lists
In [283]: list_arr[3]
Out[283]: [[], []]
others have one empty list, either first or second:
In [284]: list_arr[-1]
Out[284]:
[[array([5. , 0. , 0. , 0.3681024 , 0.3127533 ,
0.80183615, 0.07044719, 0.68357296, 0.38072924, 0.63393096,
...])],
[]]
and some have lists of differing numbers of arrays:
If I add up the numbers in [282] I get 15, so that must be where you get the (15,33). And presumably all the arrays have the same length.
The outer layer of nesting isn't relevant, so we can ravel and remove it.
In [295]: alist = arr.ravel().tolist()
then filter out the empty lists, and apply vstack to the remaining:
In [296]: alist = [np.vstack(x) for x in alist if x]
In [297]: len(alist)
Out[297]: 7
and one more vstack to join those:
In [298]: arr2 = np.vstack(alist)
In [299]: arr2.shape
Out[299]: (15, 33)

Masking out some rows of numpy array and recover back

I have a mask with a mask_re:(8781288, 1) including ones and zeros, label file (y_lbl:(8781288, 1)) and a feature vector with feat_re: (8781288, 64). I need to take only those rows from feature vector and label files that are 1 in the mask file. how can I do this, and how can apply the opposite action of putting (recovering back) prediction values (ypred) in the masked_label file based on the mask file in the elements that are one?
For example in Matlab can be done easily X=feat_re(mask_re==1) and can be recovered back new_lbl(mask_re==1)=ypred, where new_lbl=zeros(8781288, 1). I tried to do a similar thing in python:
X=feat_re[np.where(mask_re==1),:]
X.shape
(2, 437561, 64)
EDITED (SOLVED) According to what #hpaulj suggested
The problem was with the shape of my mask file, once I changed it to mask_new=mask_re.reshape((8781288)), it solved my issue, and then
X=feat_re[mask_new==1,:]
(437561, 64)

In [182]: arr = np.arange(12).reshape(3,4)
In [183]: mask = np.array([1,0,1], bool)
In [184]: arr[mask,:]
Out[184]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [185]: new = np.zeros_like(arr)
In [186]: new[mask,:] = np.array([10,12,14,16])
In [187]: new
Out[187]:
array([[10, 12, 14, 16],
[ 0, 0, 0, 0],
[10, 12, 14, 16]])
I suspect your error comes from the shape of mask:
In [188]: mask1 = mask[:,None]
In [189]: mask1.shape
Out[189]: (3, 1)
In [190]: arr[mask1,:]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]
IndexError: too many indices for array
Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
With where (aka nonzero):
In [191]: np.nonzero(mask)
Out[191]: (array([0, 2]),) # 1 element tuple
In [192]: np.nonzero(mask1)
Out[192]: (array([0, 2]), array([0, 0])) # 2 element tuple
In [193]: arr[_191] # using the mask index
Out[193]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])

you can use boolean indexing for masking like below
X = feat_re[mask_re==1, :]
X = X.reshape(2, -1, 64)
this selects rows of feat_re where (mask_re==1) is True. Then you can reshape x using reshape function. you can again use reshape to get back to same array shape. "-1" in reshape indicate the size need to be calculated by numpy

How to add element to empty 2d numpy array

I'm trying to insert elements to an empty 2d numpy array. However, I am not getting what I want.
I tried np.hstack but it is giving me a normal array only. Then I tried using append but it is giving me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray =np.concatenate((randomReleaseAngle1,randomVelocity1),axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
data = [[170, 68.2, i[0], i[1]]]
df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
test_y_predictions = model.predict(df)
print(test_y_predictions)
if (np.any(test_y_predictions == 1)):
arr = np.hstack((arr, np.array([i[0], i[1]])))
arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]

The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
...: alist.append([i, i+10])
...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with a empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array with a 0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate makes a new array. So if done iteratively it is more expensive than the list append.
np.append just takes 2 arrays and does a concatenate. So doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension. vstack is another variant. But they all end up using concatenate.

With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
The other method can be more easily done in a similar way:
arr1 = np.append(arr1, np.array([i[0], i[1]]) # in the loop
arr1 = arr1.reshape(-1, 2)
print(arr1)

Concatenating two NumPy arrays gives "ValueError: all the input arrays must have same number of dimensions"

header
output:
array(['Subject_ID', 'tube_label', 'sample_#', 'Relabel',
'sample_ID','cortisol_value', 'Group'], dtype='<U14')
body
output:
array([['STM002', '170714_STM002_1', 1, 1, 1, 1.98, 'HC'],
['STM002', '170714_STM002_2', 2, 2, 2, 2.44, 'HC'],], dtype=object)
testing = np.concatenate((header, body), axis=0)
ValueError Traceback (most recent call last) <ipython-input-302-efb002602b4b> in <module>()
1 # Merge names and the rest of the data in np array
2
----> 3 testing = np.concatenate((header, body), axis=0)
ValueError: all the input arrays must have same number of dimensions
Might someone be able to troubleshoot this?
I have tried different commands to merge the two (including stack) and am getting the same error. The dimensions (columns) do seem to be the same though.

You're right in trying to use numpy.concatenate() but you've to promote the first array to 2D before concatenating. Here's a simple example:
In [1]: import numpy as np
In [2]: arr1 = np.array(['Subject_ID', 'tube_label', 'sample_#', 'Relabel',
...: 'sample_ID','cortisol_value', 'Group'], dtype='<U14')
...:
In [3]: arr2 = np.array([['STM002', '170714_STM002_1', 1, 1, 1, 1.98, 'HC'],
...: ['STM002', '170714_STM002_2', 2, 2, 2, 2.44, 'HC'],], dtype=object)
...:
In [4]: arr1.shape
Out[4]: (7,)
In [5]: arr2.shape
Out[5]: (2, 7)
In [8]: concatenated = np.concatenate((arr1[None, :], arr2), axis=0)
In [9]: concatenated.shape
Out[9]: (3, 7)
And the resultant concatenated array would look like:
In [10]: concatenated
Out[10]:
array([['Subject_ID', 'tube_label', 'sample_#', 'Relabel', 'sample_ID',
'cortisol_value', 'Group'],
['STM002', '170714_STM002_1', 1, 1, 1, 1.98, 'HC'],
['STM002', '170714_STM002_2', 2, 2, 2, 2.44, 'HC']], dtype=object)
Explanation:
The reason you were getting the ValueError is because one of the arrays is 1D while the other is 2D. But, numpy.concatenate expects the arrays to be of same dimension in this case. That's why we promoted the array dimension of arr1 using None. But, you can also use numpy.newaxis in place of None

You need to align array dimensions first. You are currently trying to combine 1-dimensional and 2-dimensional arrays. After alignment, you can use numpy.vstack.
Note np.array([A]).shape returns (1, 7), while B.shape returns (2, 7). A more efficient alternative would be to use A[None, :].
Also note your array will become of dtype object, as this will accept arbitrary / mixed types.
A = np.array(['Subject_ID', 'tube_label', 'sample_#', 'Relabel',
'sample_ID','cortisol_value', 'Group'], dtype='<U14')
B = np.array([['STM002', '170714_STM002_1', 1, 1, 1, 1.98, 'HC'],
['STM002', '170714_STM002_2', 2, 2, 2, 2.44, 'HC'],], dtype=object)
res = np.vstack((np.array([A]), B))
print(res)
array([['Subject_ID', 'tube_label', 'sample_#', 'Relabel', 'sample_ID',
'cortisol_value', 'Group'],
['STM002', '170714_STM002_1', 1, 1, 1, 1.98, 'HC'],
['STM002', '170714_STM002_2', 2, 2, 2, 2.44, 'HC']], dtype=object)

Look at numpy.vstack and hstack, as well as the axis argument in np.append. Here it looks like you want vstack (i.e. the output array will have 3 columns, each with the same number of rows). You can also look into numpy.reshape, to change the shape of the input arrays so you can concatenate them.

Numpy concatenate 2D arrays with 1D array

I am trying to concatenate 4 arrays, one 1D array of shape (78427,) and 3 2D array of shape (78427, 375/81/103). Basically this are 4 arrays with features for 78427 images, in which the 1D array only has 1 value for each image.
I tried concatenating the arrays as follows:
>>> print X_Cscores.shape
(78427, 375)
>>> print X_Mscores.shape
(78427, 81)
>>> print X_Tscores.shape
(78427, 103)
>>> print X_Yscores.shape
(78427,)
>>> np.concatenate((X_Cscores, X_Mscores, X_Tscores, X_Yscores), axis=1)
This results in the following error:
Traceback (most recent call last):
File "", line 1, in
ValueError: all the input arrays must have same number of dimensions
The problem seems to be the 1D array, but I can't really see why (it also has 78427 values). I tried to transpose the 1D array before concatenating it, but that also didn't work.
Any help on what's the right method to concatenate these arrays would be appreciated!

Try concatenating X_Yscores[:, None] (or X_Yscores[:, np.newaxis] as imaluengo suggests). This creates a 2D array out of a 1D array.
Example:
A = np.array([1, 2, 3])
print A.shape
print A[:, None].shape
Output:
(3,)
(3,1)

I am not sure if you want something like:
a = np.array( [ [1,2],[3,4] ] )
b = np.array( [ 5,6 ] )
c = a.ravel()
con = np.concatenate( (c,b ) )
array([1, 2, 3, 4, 5, 6])
OR
np.column_stack( (a,b) )
array([[1, 2, 5],
[3, 4, 6]])
np.row_stack( (a,b) )
array([[1, 2],
[3, 4],
[5, 6]])

You can try this one-liner:
concat = numpy.hstack([a.reshape(dim,-1) for a in [Cscores, Mscores, Tscores, Yscores]])
The "secret" here is to reshape using the known, common dimension in one axis, and -1 for the other, and it automatically matches the size (creating a new axis if needed).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

numpy array containing multi-dimension numpy arrays with variable shape - python

Related

Concatenate nested list of array with partial empty sublist

Masking out some rows of numpy array and recover back

How to add element to empty 2d numpy array

Concatenating two NumPy arrays gives "ValueError: all the input arrays must have same number of dimensions"

Numpy concatenate 2D arrays with 1D array

Categories

Resources