Modify multiple columns in an array numpy - python

I have a numpy array (nxn matrix), and I would like to modify only the columns which sum is 0. And I would like to assign the same value to all of these columns.
To do that, I have first taken the index of the columns that sum to 0:
sum_lines = np.sum(mat_trans, axis = 0)
indices = np.where(sum_lines == 0)[0]
then I did a loop on those indices:
for i in indices:
mat_trans[:, i] = rank_vect
so that each of these columns now has the value of the rank_vect column vector.
I was wondering if there was a way to do this without loop, something that would look like:
mat_trans[:, (np.where(sum_lines == 0)[0]))] = rank_vect
Thanks!

In [114]: arr = np.array([[0,1,2,3],[1,0,2,-3],[-1,2,0,0]])
In [115]: sumlines = np.sum(arr, axis=0)
In [116]: sumlines
Out[116]: array([0, 3, 4, 0])
In [117]: idx = np.where(sumlines==0)[0]
In [118]: idx
Out[118]: array([0, 3])
So the columns that we want to modify are:
In [119]: arr[:,idx]
Out[119]:
array([[ 0, 3],
[ 1, -3],
[-1, 0]])
In [120]: rv = np.array([10,11,12])
If rv is 1d, we get a shape error:
In [121]: arr[:,idx] = rv
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (2,3)
But if it is a column vector (shape (3,1)) it can be broadcast to the (3,2) target:
In [122]: arr[:,idx] = rv[:,None]
In [123]: arr
Out[123]:
array([[10, 1, 2, 10],
[11, 0, 2, 11],
[12, 2, 0, 12]])

This should do the trick
mat_trans[:,indices] = np.stack((rank_vect,)*indices.size,-1)
Please test and let me know if it does what you want. It just stacks the rank_vect repeatedly to match the shape of the LHS on the RHS.
I believe this is equivalent to
for i in indices:
mat_trans[:, i] = rank_vec
I'd be interested to know the speed difference

Related

Masking out some rows of numpy array and recover back

I have a mask with a mask_re:(8781288, 1) including ones and zeros, label file (y_lbl:(8781288, 1)) and a feature vector with feat_re: (8781288, 64). I need to take only those rows from feature vector and label files that are 1 in the mask file. how can I do this, and how can apply the opposite action of putting (recovering back) prediction values (ypred) in the masked_label file based on the mask file in the elements that are one?
For example in Matlab can be done easily X=feat_re(mask_re==1) and can be recovered back new_lbl(mask_re==1)=ypred, where new_lbl=zeros(8781288, 1). I tried to do a similar thing in python:
X=feat_re[np.where(mask_re==1),:]
X.shape
(2, 437561, 64)
EDITED (SOLVED) According to what #hpaulj suggested
The problem was with the shape of my mask file, once I changed it to mask_new=mask_re.reshape((8781288)), it solved my issue, and then
X=feat_re[mask_new==1,:]
(437561, 64)
In [182]: arr = np.arange(12).reshape(3,4)
In [183]: mask = np.array([1,0,1], bool)
In [184]: arr[mask,:]
Out[184]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [185]: new = np.zeros_like(arr)
In [186]: new[mask,:] = np.array([10,12,14,16])
In [187]: new
Out[187]:
array([[10, 12, 14, 16],
[ 0, 0, 0, 0],
[10, 12, 14, 16]])
I suspect your error comes from the shape of mask:
In [188]: mask1 = mask[:,None]
In [189]: mask1.shape
Out[189]: (3, 1)
In [190]: arr[mask1,:]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]
IndexError: too many indices for array
Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
With where (aka nonzero):
In [191]: np.nonzero(mask)
Out[191]: (array([0, 2]),) # 1 element tuple
In [192]: np.nonzero(mask1)
Out[192]: (array([0, 2]), array([0, 0])) # 2 element tuple
In [193]: arr[_191] # using the mask index
Out[193]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
you can use boolean indexing for masking like below
X = feat_re[mask_re==1, :]
X = X.reshape(2, -1, 64)
this selects rows of feat_re where (mask_re==1) is True. Then you can reshape x using reshape function. you can again use reshape to get back to same array shape. "-1" in reshape indicate the size need to be calculated by numpy

How to add element to empty 2d numpy array

I'm trying to insert elements to an empty 2d numpy array. However, I am not getting what I want.
I tried np.hstack but it is giving me a normal array only. Then I tried using append but it is giving me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray =np.concatenate((randomReleaseAngle1,randomVelocity1),axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
data = [[170, 68.2, i[0], i[1]]]
df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
test_y_predictions = model.predict(df)
print(test_y_predictions)
if (np.any(test_y_predictions == 1)):
arr = np.hstack((arr, np.array([i[0], i[1]])))
arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]
The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
...: alist.append([i, i+10])
...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with a empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array with a 0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate makes a new array. So if done iteratively it is more expensive than the list append.
np.append just takes 2 arrays and does a concatenate. So doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension. vstack is another variant. But they all end up using concatenate.
With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
The other method can be more easily done in a similar way:
arr1 = np.append(arr1, np.array([i[0], i[1]]) # in the loop
arr1 = arr1.reshape(-1, 2)
print(arr1)

Removing max and min elements of array from mean calculation

I am hoping to delete the highest number and the lowest number from the array 3*4. Let's say, the data looks like this:
a=np.array([[1,4,5,10],[2,6,5,0],[3,9,9,0]])
so I expected to see the result like this:
deleted_data=[4,5],[2,5],[3]
Could you advise me how to delete the max and min from each array?
to do so, I did like this (UPDATE):
#to find out the max / min values:
b = np.max(a,1) #max
c = np.min(a,1) #min
#creating dataset after deleting max & min
d=(a!=b[:,None]) & (a!=c[:,None])
f=[i[j] for i,j in zip(a, d)]
output: [array([8, 7, 7, 9, 9, 8]), array([8, 7, 8, 6, 8, 8]), array([9, 8, 9, 9, 8]), array([6, 7, 7, 6, 6, 7]), array([7, 7, 7, 7, 6])]
Now I am not sure how to calculate the mean of the list objects?
I would like to calculate the mean of each array, so I have tried this:
mean1=f.mean(axis=0)
but it did not work.
Another method is to use a Masked Array
import numpy.ma as ma
mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
a_masked = ma.masked_array(a, mask = mask)
from there if you want an average of the unmasked elements you can just do
a_masked.mean()
Or you could even do the mean of the rows
a_masked.mean(1).data
or columns (strange, but seems to be what you're asking for)
a_masked.mean(0).data
A python list has a remove method.
With a utility function we could remove the min and max elements from a row:
def foo(i,j,k):
il = i.tolist()
il.remove(j)
il.remove(k)
return il
In [230]: [foo(i,j,k) for i,j,k in zip(a,b,c)]
Out[230]: [[4, 5], [2, 5], [3, 9]]
This could be turned back into an array with np.array(...). Note that this removed just one of the 9 in the last row. If it had removed both, the last list would have just 1 value, and the result could not be turned back into a 2d array.
I'm sure we could come up with a pure-array method, possibly useing argmax and argmin instead of max and min. But I think the list approach is a better starting point for a Python beginner.
An array masking approach
In [232]: bi = np.argmax(a,1)
In [233]: ci = np.argmin(a,1)
In [234]: bi
Out[234]: array([3, 1, 1], dtype=int32)
In [235]: ci
Out[235]: array([0, 3, 3], dtype=int32)
In [243]: mask = np.ones_like(a, bool)
In [244]: mask[np.arange(3),bi]=False
In [245]: mask[np.arange(3),ci]=False
In [246]: mask
Out[246]:
array([[False, True, True, False],
[ True, False, True, False],
[ True, False, True, False]], dtype=bool)
In [247]: a[mask]
Out[247]: array([4, 5, 2, 5, 3, 9])
In [248]: _.reshape(3,-1)
Out[248]:
array([[4, 5],
[2, 5],
[3, 9]])
Again this is better if we just delete one max and one min from each row.
Another masking approach:
In [257]: (a!=b[:,None]) & (a!=c[:,None])
Out[257]:
array([[False, True, True, False],
[ True, False, True, False],
[ True, False, False, False]], dtype=bool)
In [258]: a[(a!=b[:,None]) & (a!=c[:,None])]
Out[258]: array([4, 5, 2, 5, 3])
This does remove all '9's in the last row. But it does not preserve the row split.
This preserves the row structure, and allows variable lengths:
In [259]: mask=(a!=b[:,None]) & (a!=c[:,None])
In [260]: [i[j] for i,j in zip(a, mask)]
Out[260]: [array([4, 5]), array([2, 5]), array([3])]
As #hpaulj predicted, there is an array-only method. And it's a doozy. As a one-liner:
a[np.arange(a.shape[0])[:, None], np.sort(np.argpartition(a, (0,-1), axis = 1)[:, 1:-1], axis = 1)]
Let's break that down:
y_ = np.argpartition(a, (0,-1), axis = 1)[:, 1:-1]
argpartiton takes the index of the 0th (smallest) and -1th (largest) elements of each row and moves them to the first and last position repsectively. [:,1:-1] indexes everything else. Now argpartition can sometimes reorder the rest of the elements, so
y = np.sort(y_ , axis = 1)
We sort the rest of the indices back to their orginal positions. Now we have a y.shape -> (m, n-2) array of indices with the max and min removed, for your original (m, n) = a.shape array.
Now to use this, we need the row indicies as well.
x = np.arange(a.shape[0])[:, None]
arange just gives the m row indices. To broadcast this x.shape -> (a.shape[0],) -> (m,) array to your index array, you need the [:, None] to make x.shape -> (m, 1). Now the m lines up for broadcasting and you have your two sets of indices.
a[x, y]
array([[4, 5],
[2, 5],
[3, 9]])
You could get to the final destination of average of elements that are not the max or min per row in two steps with masking -
In [140]: a # input array
Out[140]:
array([[ 1, 4, 5, 10],
[ 2, 6, 5, 0],
[ 3, 9, 9, 0]])
In [141]: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
In [142]: (a*m).sum(1)/m.sum(1).astype(float)
Out[142]: array([ 4.5, 3.5, 3. ])
This avoids the mess of creating the intermediate ragged arrays, which arent the most convenient data formats to operate with NumPy funcs.
Alternatively, for performance boost, use np.einsum to get the equivalent of (a*m).sum(1) with np.einsum('ij,ij->i',a,m).
Runtime test on bigger array -
In [181]: np.random.seed(0)
In [182]: a = np.random.randint(0,10,(5000,5000))
# #Daniel F' soln from https://stackoverflow.com/a/47325431/
In [183]: %%timeit
...: mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
...: a_masked = ma.masked_array(a, mask = mask)
...: out = a_masked.mean(1).data
1 loop, best of 3: 251 ms per loop
# Posted in here
In [184]: %%timeit
...: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
...: out = (a*m).sum(1)/m.sum(1).astype(float)
10 loops, best of 3: 165 ms per loop
# Posted in here with additional einsum
In [185]: %%timeit
...: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
...: out = np.einsum('ij,ij->i',a,m)/m.sum(1).astype(float)
10 loops, best of 3: 124 ms per loop
If the question is to remove min and/or max elements from a numpy array arr then this is the easiest way in my opinion.
np.delete(arr, np.argmax(arr))
example
tmp = np.random.random(3)
print(tmp)
tmp = np.delete(tmp, np.argmax(tmp))
print(tmp)
returns
[0.7366768 0.65492774 0.93632866]
[0.7366768 0.65492774]

How to group values in matrix with items of unequal length

Lets say I have a simple array:
a = np.arange(3)
And an array of indices with the same length:
I = np.array([0, 0, 1])
I now want to group the values based on the indices.
How would I group the elements of the first array to produce the result below?
np.array([[0, 1], [2], dtype=object)
Here is what I tried:
a = np.arange(3)
I = np.array([0, 0, 1])
out = np.empty(2, dtype=object)
out.fill([])
aslists = np.vectorize(lambda x: [x], otypes=['object'])
out[I] += aslists(a)
However, this approach does not concatenate the lists, but only maintains the last value for each index:
array([[1], [2]], dtype=object)
Or, for a 2-dimensional case:
a = np.random.rand(100)
I = (np.random.random(100) * 5 //1).astype(int)
J = (np.random.random(100) * 5 //1).astype(int)
out = np.empty((5, 5), dtype=object)
out.fill([])
How can I append the items from a to out based on the two index arrays?
1D Case
Assuming I being sorted, for a list of arrays as output -
idx = np.unique(I, return_index=True)[1]
out = np.split(a,idx)[1:]
Another with slicing to get idx for splitting a -
out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
To get an array of lists as output -
np.array([i.tolist() for i in out])
Sample run -
In [84]: a = np.arange(3)
In [85]: I = np.array([0, 0, 1])
In [86]: out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
In [87]: out
Out[87]: [array([0, 1]), array([2])]
In [88]: np.array([i.tolist() for i in out])
Out[88]: array([[0, 1], [2]], dtype=object)
2D Case
For 2D case of filling into a 2D array with groupings made from indices in two arrays I and J that represent the rows and columns where the groups are to be assigned, we could do something like this -
ncols = 5
lidx = I*ncols+J
sidx = lidx.argsort() # Use kind='mergesort' to keep order
lidx_sorted = lidx[sidx]
unq_idx, split_idx = np.unique(lidx_sorted, return_index=True)
out.flat[unq_idx] = np.split(a[sidx], split_idx)[1:]

using numpy.unravel_index

Hi I have a 2x4 array called mi_reshaped. I used the argmax to find out the indeces of the largest elements in my array. Now I want to convert these indeces to x,y coordinates. So I used the numpy.unravel_index. I get this error:
Traceback (most recent call last):
File "CAfeb.py", line 273, in <module>
analyzeCA('full', im)
File "CAfeb.py", line 80, in analyzeCA
bg_params = parameterSearch( im, [3, 2], roi, ew, hist_sz, w_data);
File "CAfeb.py", line 185, in parameterSearch
ix = np.unravel_index(max_ix, mi_reshaped.shape)#(mi.size)
File "/usr/lib/pymodules/python2.7/numpy/lib/index_tricks.py", line 64, in unravel_index
if x > _nx.prod(dims)-1 or x < 0:
ValueError: The truth value of an array with more than one element isambiguous.
a.any() or a.all()
mi_reshaped=mi.reshape(2,4)
max_ix = np.argmax(mi_reshaped, axis=1)
ix = np.unravel_index(max_ix, mi_reshaped.shape)#(mi.size)
Thank you
You should skip the axis=1 for this. If you do a numpy.argmax(array) it will look for max in the flattened array, and then you can do the unravel_index with the array shape to find the actual index. When you pass the axis, numpy will look for the maximum for that axis for each entry in the array. For example:
>>>data = numpy.array(range(8)).reshape(2, 4)
>>>data
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
>>>max_ix = numpy.argmax(data, axis=1)
>>>max_ix
array([3, 3])
>>>numpy.unravel_index(max_ix, data.shape)
(array([0, 0]), array([3, 3]))
Now if you skip the axis:
>>>max_ix = numpy.argmax(data)
>>>max_ix
7
>>>numpy.unravel_index(max_ix, data.shape)
(1, 3)
Now what happened is you told numpy to give you the index for maximums on the dimension 1 and it finds the maximums '3' and '7' with indexes [3, 3]. Still you should't get an error with your code, just the wrong final result.
np.unravel_index expects an integer as its first argument. max_ix is an array.
Moreover, each value in max_ix is an index with respect to the second axis (axis = 1) of mi.
Try instead:
ix = [(row, ix) for row, ix in enumerate(max_ix)]
For example,
In [89]: mi_reshaped = np.array(range(8)).reshape(2, 4)
In [90]: mi_reshaped
Out[90]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [91]: max_ix = np.argmax(mi_reshaped, axis=1)
In [92]: max_ix
Out[92]: array([3, 3])
In [93]: ix = [(row, ix) for row, ix in enumerate(max_ix)]
In [94]: ix
Out[94]: [(0, 3), (1, 3)]

Categories