Masking out some rows of a numpy array and recovering them back - Python

I have a mask array mask_re of shape (8781288, 1) containing ones and zeros, a label file y_lbl of shape (8781288, 1), and a feature matrix feat_re of shape (8781288, 64). I need to take only those rows of the feature matrix and the label file whose mask value is 1. How can I do this, and how can I apply the opposite action of putting (recovering back) the prediction values (ypred) into the masked label file, at the elements where the mask is one?
For example, in Matlab this can be done easily with X=feat_re(mask_re==1), and recovered back with new_lbl(mask_re==1)=ypred, where new_lbl=zeros(8781288, 1). I tried to do a similar thing in Python:
X=feat_re[np.where(mask_re==1),:]
X.shape
(2, 437561, 64)
EDITED (SOLVED): As @hpaulj suggested, the problem was the shape of my mask. Once I changed it to mask_new=mask_re.reshape(8781288), my issue was solved, and then
X=feat_re[mask_new==1,:]
(437561, 64)
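For reference, here is a minimal sketch of both directions (selecting with the mask and writing predictions back), using small random stand-ins for the arrays above; the shapes and the fake ypred are placeholders only:

import numpy as np

# Small stand-ins for mask_re (N, 1), feat_re (N, 64) and new_lbl (N, 1).
N = 10
mask_re = np.random.randint(0, 2, size=(N, 1))
feat_re = np.random.random((N, 64))

mask_new = mask_re.reshape(-1)        # flatten the (N, 1) mask to shape (N,)
X = feat_re[mask_new == 1, :]         # keep only the rows where the mask is 1

# Fake predictions, one value per selected row.
ypred = np.random.random(X.shape[0])

# Recover: write the predictions back at the masked positions of a full-length vector.
new_lbl = np.zeros((N, 1))
new_lbl[mask_new == 1, 0] = ypred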

In [182]: arr = np.arange(12).reshape(3,4)
In [183]: mask = np.array([1,0,1], bool)
In [184]: arr[mask,:]
Out[184]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [185]: new = np.zeros_like(arr)
In [186]: new[mask,:] = np.array([10,12,14,16])
In [187]: new
Out[187]:
array([[10, 12, 14, 16],
[ 0, 0, 0, 0],
[10, 12, 14, 16]])
I suspect your error comes from the shape of mask:
In [188]: mask1 = mask[:,None]
In [189]: mask1.shape
Out[189]: (3, 1)
In [190]: arr[mask1,:]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]
IndexError: too many indices for array
Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
With where (aka nonzero):
In [191]: np.nonzero(mask)
Out[191]: (array([0, 2]),) # 1 element tuple
In [192]: np.nonzero(mask1)
Out[192]: (array([0, 2]), array([0, 0])) # 2 element tuple
In [193]: arr[_191] # using the mask index
Out[193]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])

You can use boolean indexing for masking, like below:
X = feat_re[mask_re==1, :]
X = X.reshape(2, -1, 64)
This selects the rows of feat_re where (mask_re==1) is True. You can then reshape X using the reshape function, and use reshape again to get back to the original array shape. The "-1" in reshape tells numpy to calculate that dimension's size itself.
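As a small aside on the "-1" placeholder (illustrative only, not part of the answer above):

import numpy as np

a = np.arange(12)        # shape (12,)
b = a.reshape(-1, 4)     # numpy computes the missing dimension: shape (3, 4)
c = b.reshape(-1)        # flatten back to shape (12,)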

Related

numpy array containing multi-dimension numpy arrays with variable shape

I have a list of numpy arrays whose shapes are one of the following: (10,4,4,20) or (10,4,6,20). I want to convert the list to a numpy array. Since they are of different shapes, I can't just stack them. So I thought of creating a numpy array that treats each array as an object, as in here. I tried the below:
b = numpy.array(a)
b = numpy.array(a, dtype=object)
where a is the list of numpy arrays. Both are giving me the following error:
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
How can I convert that list to numpy array?
Example:
import numpy
a = [numpy.random.random((10,4,4,20)),
     numpy.random.random((10,4,6,20)),
     numpy.random.random((10,4,6,20)),
     numpy.random.random((10,4,4,20)),
     numpy.random.random((10,4,6,20)),
     numpy.random.random((10,4,6,20)),
     numpy.random.random((10,4,4,20)),
     numpy.random.random((10,4,4,20)),
     numpy.random.random((10,4,6,20))
    ]
b = numpy.array(a)
Use Case:
I know numpy arrays of objects are not efficient, but I'm not doing any operations on them. Usually I have a list of numpy arrays of the same shape, so I can easily stack them. This array is passed to another function, which selects only certain elements. If my data is a numpy array, I can just do b[[1,3,8]], but I can't do the same with a list. I get the following error if I try the same with a list:
c = a[[1,3,8]]
TypeError: list indices must be integers or slices, not list
np.array(alist) will make an object dtype array if the list arrays differ in the first dimension. But in your case they differ in the 3rd, producing this error. In effect, it can't unambiguously determine where the containing dimension ends, and where the objects begin.
In [270]: alist = [np.ones((10,4,4,20),int), np.zeros((10,4,6,20),int)]
In [271]: arr = np.array(alist)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-271-3fd8e9bd05a9> in <module>
----> 1 arr = np.array(alist)
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
Instead we need to make an object array of the right size, and copy the list to it. Sometimes this copy still produces broadcasting errors, but here it seems to be ok:
In [272]: arr = np.empty(2, object)
In [273]: arr
Out[273]: array([None, None], dtype=object)
In [274]: arr[:] = alist
In [275]: arr
Out[275]:
array([array([[[[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]]])], dtype=object)
In [276]: arr[0].shape
Out[276]: (10, 4, 4, 20)
In [277]: arr[1].shape
Out[277]: (10, 4, 6, 20)
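Applied to the list a from the question, the same pattern would look roughly like this (the mix of shapes below is only illustrative); the b[[1, 3, 8]] selection from the "Use Case" section then works as desired:

import numpy as np

# A list of arrays that differ in their 3rd dimension, as in the question.
a = [np.random.random((10, 4, 4, 20)) if i % 2 == 0 else np.random.random((10, 4, 6, 20))
     for i in range(9)]

b = np.empty(len(a), dtype=object)   # pre-sized object array
b[:] = a                             # copy the list elements in as objects

c = b[[1, 3, 8]]                     # fancy indexing on the object array now works
print(c[0].shape, c[1].shape, c[2].shape)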

How to add element to empty 2d numpy array

I'm trying to insert elements into an empty 2d numpy array; however, I am not getting what I want.
I tried np.hstack, but it gives me a flat 1d array only. Then I tried using append, but it gives me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray =np.concatenate((randomReleaseAngle1,randomVelocity1),axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
    data = [[170, 68.2, i[0], i[1]]]
    df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
    test_y_predictions = model.predict(df)
    print(test_y_predictions)
    if (np.any(test_y_predictions == 1)):
        arr = np.hstack((arr, np.array([i[0], i[1]])))
        arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)

print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]
The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
...: alist.append([i, i+10])
...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with an empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array with a 0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate makes a new array. So if done iteratively it is more expensive than the list append.
np.append just takes 2 arrays and does a concatenate. So doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension. vstack is another variant. But they all end up using concatenate.
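A quick illustration (not from the answer) that these helpers produce the same result as concatenate:

import numpy as np

a = np.array([[1, 10], [2, 11]])
b = np.array([[3, 12]])

r1 = np.concatenate((a, b), axis=0)   # the underlying operation
r2 = np.append(a, b, axis=0)          # np.append with axis=0 does the same join
r3 = np.vstack((a, b))                # so does vstack

assert (r1 == r2).all() and (r1 == r3).all()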
With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
The other method can be more easily done in a similar way:
arr1 = np.append(arr1, np.array([i[0], i[1]]))  # in the loop
arr1 = arr1.reshape(-1, 2)
print(arr1)
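Putting the recommended list-append idea back into the question's loop might look roughly like this; model.predict is replaced by a placeholder predicate because the question's trained model isn't available here:

import numpy as np
import pandas as pd

def model_predict(df):
    # Placeholder for the question's model.predict(df): returns 1 where velocity > 45.
    return np.where(df['velocity'] > 45, 1, 0)

randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray = np.concatenate((randomReleaseAngle1, randomVelocity1), axis=1)

rows = []                                   # plain Python list; cheap to append to
for i in randomArray:
    df = pd.DataFrame([[170, 68.2, i[0], i[1]]],
                      columns=['height', 'release_angle', 'velocity', 'holding_angle'])
    if np.any(model_predict(df) == 1):
        rows.append([i[0], i[1]])

arr = np.array(rows)                        # build the (k, 2) array once, at the end
print(arr)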

Modify multiple columns in an array numpy

I have a numpy array (an n x n matrix), and I would like to modify only the columns whose sum is 0, assigning the same value to all of these columns.
To do that, I first took the indices of the columns that sum to 0:
sum_lines = np.sum(mat_trans, axis = 0)
indices = np.where(sum_lines == 0)[0]
then I did a loop on those indices:
for i in indices:
    mat_trans[:, i] = rank_vect
so that each of these columns now has the value of the rank_vect column vector.
I was wondering if there was a way to do this without a loop, something that would look like:
mat_trans[:, np.where(sum_lines == 0)[0]] = rank_vect
Thanks!
In [114]: arr = np.array([[0,1,2,3],[1,0,2,-3],[-1,2,0,0]])
In [115]: sumlines = np.sum(arr, axis=0)
In [116]: sumlines
Out[116]: array([0, 3, 4, 0])
In [117]: idx = np.where(sumlines==0)[0]
In [118]: idx
Out[118]: array([0, 3])
So the columns that we want to modify are:
In [119]: arr[:,idx]
Out[119]:
array([[ 0, 3],
[ 1, -3],
[-1, 0]])
In [120]: rv = np.array([10,11,12])
If rv is 1d, we get a shape error:
In [121]: arr[:,idx] = rv
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (2,3)
But if it is a column vector (shape (3,1)) it can be broadcast to the (3,2) target:
In [122]: arr[:,idx] = rv[:,None]
In [123]: arr
Out[123]:
array([[10, 1, 2, 10],
[11, 0, 2, 11],
[12, 2, 0, 12]])
This should do the trick
mat_trans[:,indices] = np.stack((rank_vect,)*indices.size,-1)
Please test and let me know if it does what you want. It just stacks rank_vect repeatedly so that the RHS matches the shape of the LHS.
I believe this is equivalent to
for i in indices:
    mat_trans[:, i] = rank_vect
I'd be interested to know the speed difference.
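For comparison, the loop can also be avoided entirely by indexing with the boolean condition and broadcasting a column vector, as in the other answer (a sketch; rank_vect is assumed to be 1-d with one entry per row):

import numpy as np

mat_trans = np.array([[0, 1, 2, 3],
                      [1, 0, 2, -3],
                      [-1, 2, 0, 0]])
rank_vect = np.array([10, 11, 12])

zero_cols = mat_trans.sum(axis=0) == 0         # boolean mask of columns that sum to 0
mat_trans[:, zero_cols] = rank_vect[:, None]   # column vector broadcasts across those columns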

Euclidean distances between several images and one base image

I have a matrix X of dimensions (30 x 8100) and another one Y of dimensions (1 x 8100). I want to generate an array containing the differences between them (X[1]-Y, X[2]-Y, ..., X[30]-Y).
Can anyone help?
All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
Numpy will automatically expand Y's shape so that it matches that of X. This is called broadcasting, and it usually does a very good job of guessing what should be done. In ambiguous cases you can insert a length-1 axis yourself (e.g. with np.newaxis) to tell it which direction to expand. Here, since Y already has a dimension of length 1, that is the axis that gets expanded to length 30 to match X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a euclidean distance, as your title seems to suggest you want:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
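Equivalently (just a convenience, not in the original answer), np.linalg.norm computes the same per-row distances in one call:

import numpy as np

X = np.random.random((30, 8100))
Y = np.random.random((1, 8100))

dst = np.linalg.norm(X - Y, axis=1)   # one euclidean distance per row of X; shape (30,)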
Use a numpy array and numpy broadcasting in order to subtract Y from it.
Initialize the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract Y from the array:
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])
Just iterate over the rows of your numpy array; you can simply subtract them, and numpy will make a new array with the differences!
import numpy as np
final_array = []
#X is a numpy array that is 30X8100 and Y is a numpy array that is 1X8100
for row in X:
    output = row - Y
    final_array.append(output)
output will be your resulting array of X[0] - Y, X[1] - Y, etc. Now final_array will be a list with 30 arrays inside, each holding the X - Y values you need. Simple as that. Just make sure you convert your matrices to numpy arrays first.
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!
a1 = numpy.array(X)               # make sure you have a numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y)               # make sure you have a 1d numpy array like [1,2,3,...]
a2 = numpy.array([a2] * len(a1))  # repeat a2 so it has the same shape as a1
print(a1 - a2)                    # element-wise difference between a1 and a2 (i.e. X and Y)

using numpy.unravel_index

Hi, I have a 2x4 array called mi_reshaped. I used argmax to find the indices of the largest elements in my array. Now I want to convert these indices to (x, y) coordinates, so I used numpy.unravel_index. I get this error:
Traceback (most recent call last):
File "CAfeb.py", line 273, in <module>
analyzeCA('full', im)
File "CAfeb.py", line 80, in analyzeCA
bg_params = parameterSearch( im, [3, 2], roi, ew, hist_sz, w_data);
File "CAfeb.py", line 185, in parameterSearch
ix = np.unravel_index(max_ix, mi_reshaped.shape)#(mi.size)
File "/usr/lib/pymodules/python2.7/numpy/lib/index_tricks.py", line 64, in unravel_index
if x > _nx.prod(dims)-1 or x < 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
mi_reshaped=mi.reshape(2,4)
max_ix = np.argmax(mi_reshaped, axis=1)
ix = np.unravel_index(max_ix, mi_reshaped.shape)#(mi.size)
Thank you
You should skip the axis=1 for this. If you do numpy.argmax(array), it will look for the max in the flattened array, and you can then use unravel_index with the array shape to find the actual index. When you pass an axis, numpy looks for the maximum along that axis for each entry in the array. For example:
>>> data = numpy.array(range(8)).reshape(2, 4)
>>> data
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
>>> max_ix = numpy.argmax(data, axis=1)
>>> max_ix
array([3, 3])
>>> numpy.unravel_index(max_ix, data.shape)
(array([0, 0]), array([3, 3]))
Now if you skip the axis:
>>> max_ix = numpy.argmax(data)
>>> max_ix
7
>>> numpy.unravel_index(max_ix, data.shape)
(1, 3)
Now what happened is that you told numpy to give you the indices of the maximums along dimension 1, and it found the maximums '3' and '7' at indexes [3, 3]. Still, you shouldn't get an error with your code, just the wrong final result.
np.unravel_index expects an integer as its first argument. max_ix is an array.
Moreover, each value in max_ix is an index with respect to the second axis (axis = 1) of mi.
Try instead:
ix = [(row, ix) for row, ix in enumerate(max_ix)]
For example,
In [89]: mi_reshaped = np.array(range(8)).reshape(2, 4)
In [90]: mi_reshaped
Out[90]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [91]: max_ix = np.argmax(mi_reshaped, axis=1)
In [92]: max_ix
Out[92]: array([3, 3])
In [93]: ix = [(row, ix) for row, ix in enumerate(max_ix)]
In [94]: ix
Out[94]: [(0, 3), (1, 3)]
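If a loop-free variant is preferred, the same per-row coordinates can be built with arange and column_stack (a sketch equivalent to the list comprehension above):

import numpy as np

mi_reshaped = np.arange(8).reshape(2, 4)
max_ix = np.argmax(mi_reshaped, axis=1)      # column index of the max in each row
coords = np.column_stack((np.arange(mi_reshaped.shape[0]), max_ix))
# coords -> array([[0, 3],
#                  [1, 3]])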
