From argwhere to where? - python

Is there a fast way to convert the output of argwhere into the format of the output of where?
Let me show you what I'm doing with a bit of code:
In [123]: filter = np.where(scores[:,:,:,4,:] > 21000)
In [124]: filter
Out[124]:
(array([ 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 23, 23, 23, 23, 23]),
array([13, 13, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5]),
array([0, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2]),
array([44, 44, 0, 1, 2, 3, 6, 8, 12, 14, 22, 31, 58, 76, 82, 41]))
In [125]: filter2 = np.argwhere(scores[:,:,:,4,:] > 21000)
In [126]: filter2
Out[126]:
array([[ 2, 13, 0, 44],
[ 2, 13, 1, 44],
[ 4, 4, 3, 0],
[ 4, 4, 3, 1],
[ 4, 4, 3, 2],
[ 4, 4, 3, 3],
[ 4, 4, 3, 6],
[ 4, 4, 3, 8],
[ 4, 4, 3, 12],
[ 4, 4, 3, 14],
[ 4, 4, 3, 22],
[23, 4, 2, 31],
[23, 4, 2, 58],
[23, 4, 2, 76],
[23, 4, 2, 82],
[23, 5, 2, 41]])
In [150]: scores[:,:,:,4,:][filter]
Out[150]:
array([ 21344., 21344., 24672., 24672., 24672., 24672., 25232.,
25232., 25232., 25232., 24672., 21152., 21152., 21152.,
21152., 21344.], dtype=float16)
In [129]: filter2[np.argsort(scores[:,:,:,4,:][filter])]
Out[129]:
array([[23, 4, 2, 31],
[23, 4, 2, 58],
[23, 4, 2, 76],
[23, 4, 2, 82],
[ 2, 13, 0, 44],
[ 2, 13, 1, 44],
[23, 5, 2, 41],
[ 4, 4, 3, 0],
[ 4, 4, 3, 1],
[ 4, 4, 3, 2],
[ 4, 4, 3, 3],
[ 4, 4, 3, 22],
[ 4, 4, 3, 6],
[ 4, 4, 3, 8],
[ 4, 4, 3, 12],
[ 4, 4, 3, 14]])
Out[129] is my desired output, so my code works, but I'm trying to make it as fast as possible. Should I get filter2 with np.array(filter).transpose()? Is there something even better?
Edit, trying to put it more clearly: I want a list of indices ordered by the value they return when applied to an array. To do that, I need both the output of np.where and np.argwhere, and I'm wondering what the fastest way is to switch from one output to the other, or whether there's another way of getting my result.

Look at the code for argwhere:
return transpose(asanyarray(a).nonzero())
while the where docs say:
where(condition, [x, y])
If only condition is given, return condition.nonzero().
In effect, both use a.nonzero(). One uses it as is, the other transposes it.
In [933]: x=np.zeros((2,3),int)
In [934]: x[[0,1,0],[0,1,2]]=1
In [935]: x
Out[935]:
array([[1, 0, 1],
[0, 1, 0]])
In [936]: x.nonzero()
Out[936]: (array([0, 0, 1], dtype=int32), array([0, 2, 1], dtype=int32))
In [937]: np.where(x) # same as nonzero()
Out[937]: (array([0, 0, 1], dtype=int32), array([0, 2, 1], dtype=int32))
In [938]: np.argwhere(x)
Out[938]:
array([[0, 0],
[0, 2],
[1, 1]], dtype=int32)
In [939]: np.argwhere(x).T
Out[939]:
array([[0, 0, 1],
[0, 2, 1]], dtype=int32)
argwhere().T is the same as where, except that it is a 2d array rather than a tuple of arrays.
np.transpose(filter) and np.array(filter).T look equally good. For a large array the time spent in nonzero is much larger than the time spent on these transformations.
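To put it together, here is a minimal sketch (using a small random array as a stand-in for scores[:,:,:,4,:], not the original data) showing that the two formats are a transpose apart and how to get the sorted index list directly:
import numpy as np

rng = np.random.default_rng(0)
vals = rng.integers(0, 30000, size=(5, 6, 4, 10))  # stand-in for scores[:,:,:,4,:]

mask = vals > 21000
filt = np.where(mask)       # tuple of 1-d index arrays, i.e. the nonzero() format
filt2 = np.argwhere(mask)   # (N, ndim) array of index rows

# Converting between the two formats is just a transpose either way.
assert np.array_equal(np.transpose(filt), filt2)

# Desired result: index rows ordered by the value they pick out of vals.
order = np.argsort(vals[filt])
sorted_rows = filt2[order]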

Related

The most efficient way to assign several small matrices onto a large matrix in numpy

I have a big matrix A with shape (10, 10)
array([[2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[1, 3, 3, 4, 2, 4, 4, 3, 4, 1],
[1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
[3, 3, 1, 3, 3, 2, 3, 4, 3, 2],
[2, 4, 1, 4, 2, 1, 1, 2, 1, 1],
[2, 3, 2, 3, 1, 4, 3, 1, 2, 3],
[3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
[1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
and an array of positions B with shape (5, 2)
array([[4, 5], # row 4, column 5
[2, 1],
[2, 5],
[4, 1],
[6, 7]])
and several small matrices C with shape (5, 2, 2)
array([[[7, 9],
[6, 7]],
[[6, 6],
[9, 6]],
[[9, 6],
[8, 9]],
[[8, 7],
[8, 7]],
[[8, 6],
[7, 7]]])
Now, I want to assign these 5 small matrices to the large matrix. Each position is the upper-left corner of the corresponding small matrix. If blocks overlap, we can keep the last one, take the maximum, or just sum them up. The effect I want looks like
A[B] += C
A for loop implementation looks like:
for i in range(B.shape[0]):
    A[B[i][0]:B[i][0]+2, B[i][1]:B[i][1]+2] += C[i]
The expected result looks like
array([[ 2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[ 3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[ 1, 9, 9, 4, 2, 13, 10, 3, 4, 1],
[ 1, 12, 7, 3, 3, 9, 13, 2, 1, 2],
[ 3, 11, 8, 3, 3, 9, 12, 4, 3, 2],
[ 2, 12, 8, 4, 2, 7, 8, 2, 1, 1],
[ 2, 3, 2, 3, 1, 4, 3, 9, 8, 3],
[ 3, 1, 3, 2, 2, 4, 2, 10, 10, 3],
[ 1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[ 3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
Is there a solution without for loop?
Your arrays:
In [58]: A = np.array([[2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
...: [3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
...: [1, 3, 3, 4, 2, 4, 4, 3, 4, 1],
...: [1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
...: [3, 3, 1, 3, 3, 2, 3, 4, 3, 2],
...: [2, 4, 1, 4, 2, 1, 1, 2, 1, 1],
...: [2, 3, 2, 3, 1, 4, 3, 1, 2, 3],
...: [3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
...: [1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
...: [3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
In [59]: B=np.array([[4, 5], # row 4, column 5
...: [2, 1],
...: [2, 5],
...: [4, 1],
...: [6, 7]])
In [60]: C=np.array([[[7, 9],
...: [6, 7]],
...:
...: [[6, 6],
...: [9, 6]],
...:
...: [[9, 6],
...: [8, 9]],
...:
...: [[8, 7],
...: [8, 7]],
...:
...: [[8, 6],
...: [7, 7]]])
Your iteration, cleaned up a bit:
In [72]: for cnt,(i,j) in enumerate(B):
    ...:     A[i:i+2, j:j+2] += C[cnt]
...:
In [73]: A
Out[73]:
array([[ 2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[ 3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[ 1, 9, 9, 4, 2, 13, 10, 3, 4, 1],
[ 1, 12, 7, 3, 3, 9, 13, 2, 1, 2],
[ 3, 11, 8, 3, 3, 9, 12, 4, 3, 2],
[ 2, 12, 8, 4, 2, 7, 8, 2, 1, 1],
[ 2, 3, 2, 3, 1, 4, 3, 9, 8, 3],
[ 3, 1, 3, 2, 2, 4, 2, 10, 10, 3],
[ 1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[ 3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
And to make the action clearer, let's start with a zero array:
In [76]: A = np.zeros_like(Acopy)
In [77]: for cnt,(i,j) in enumerate(B):
    ...:     A[i:i+2, j:j+2] += C[cnt]
...:
In [78]: A
Out[78]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 6, 6, 0, 0, 9, 6, 0, 0, 0],
[0, 9, 6, 0, 0, 8, 9, 0, 0, 0],
[0, 8, 7, 0, 0, 7, 9, 0, 0, 0],
[0, 8, 7, 0, 0, 6, 7, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 8, 6, 0],
[0, 0, 0, 0, 0, 0, 0, 7, 7, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
I don't see any overlap, so I think we could construct an index array B1 from B that would allow us to:
A[B1] += C
and if there were an overlap, only one of the overlapping C values would take effect (fancy-index += is buffered).
If we don't like that, there is the np.add.at ufunc that can perform unbuffered addition (or even np.maximum.at).
But it will take some time to work out the required B1 indices.
edit
Here's a way of using +=. I'm using linspace to construct a multidimensional index, which is used in place of the slices. Getting the shapes right took a lot of trial and error and testing (in an interactive session). As long as the blocks don't overlap this is fast and correct. But as documented with np.add.at, it won't match the iterative approach when there are duplicate indices.
In [125]: B1 = B+2
In [126]: I = np.linspace(B,B1,2,endpoint=False).astype(int)
In [127]: A1 =np.zeros_like(Acopy)
In [128]: A1[I[:,:,0][:,None], I[:,:,1]] += C.transpose(1,2,0)
In [129]: np.allclose(A1,A)
Out[129]: True
I is a (2,5,2) shape array, where the first 2 is the number of "steps":
In [130]: I
Out[130]:
array([[[4, 5],
[2, 1],
[2, 5],
[4, 1],
[6, 7]],
[[5, 6],
[3, 2],
[3, 6],
[5, 2],
[7, 8]]])
And since the C subarrays are (2,2), this is the same as: np.stack([B,B+1])
The C transpose is needed since this indexing of A1 produces a (2,2,5) array:
In [134]: A1[I[:,:,0][:,None], I[:,:,1]]
Out[134]:
array([[[7, 6, 9, 8, 8],
[9, 6, 6, 7, 6]],
[[6, 9, 8, 8, 7],
[7, 6, 9, 7, 7]]])
In [135]: _.shape
Out[135]: (2, 2, 5)
If some blocks overlap, np.add.at can be used to sum the overlaps:
In [137]: A1 =np.zeros_like(Acopy)
In [138]: np.add.at(A1, (I[:,:,0][:,None], I[:,:,1]), C.transpose(1,2,0))
In [140]: np.allclose(A1,A)
Out[140]: True
or, to take the largest value instead of the sum:
In [143]: np.maximum.at(A1, (I[:,:,0][:,None], I[:,:,1]), C.transpose(1,2,0))
In [144]: np.allclose(A1,A)
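For 2x2 blocks, the same kind of index can also be built with plain broadcasting instead of linspace. A minimal sketch (assuming A, B and C as defined above, and a fresh zero array; using np.add.at so overlapping blocks would also sum correctly):
import numpy as np

A1 = np.zeros_like(A)                       # fresh target array
dr = np.arange(2)                           # row offsets within a 2x2 block
dc = np.arange(2)                           # column offsets within a 2x2 block

rows = B[:, 0, None, None] + dr[:, None]    # shape (5, 2, 1)
cols = B[:, 1, None, None] + dc[None, :]    # shape (5, 1, 2)

# rows and cols broadcast to (5, 2, 2), matching C, so each C[i] lands on its block.
np.add.at(A1, (rows, cols), C)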
A simple for loop can solve this:
import numpy as np
initial = np.array([
[2, 1, 2, 1, 1, 4, 3, 2, 2, 2], [3, 2, 1, 2, 3, 3, 2, 3, 2, 4], [1, 3, 3, 4, 2, 4, 4, 3, 4, 1], [1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
[3, 3, 1, 3, 3, 2, 3, 4, 3, 2], [2, 4, 1, 4, 2, 1, 1, 2, 1, 1], [2, 3, 2, 3, 1, 4, 3, 1, 2, 3], [3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
[1, 2, 3, 2, 1, 3, 4, 4, 1, 3], [3, 1, 3, 2, 4, 3, 1, 1, 1, 1],
])
offsets = np.array([[4, 5], [2, 1], [2, 5], [4, 1], [6, 7]])
subarrays = np.array([
[[7, 9], [6, 7]], [[6, 6], [9, 6]], [[9, 6], [8, 9]],
[[8, 7], [8, 7]], [[8, 6], [7, 7]],
])
for subarray, offset in zip(subarrays, offsets):
    (a, b), (c, d) = offset, subarray.shape
    initial[a:a+c, b:b+d] += subarray
print(initial)
Here is what I have tried, without using any kind of loop:
import numpy as np
A=np.array([[2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[1, 3, 3, 4, 2, 4, 4, 3, 4, 1],
[1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
[3, 3, 1, 3, 3, 2, 3, 4, 3, 2],
[2, 4, 1, 4, 2, 1, 1, 2, 1, 1],
[2, 3, 2, 3, 1, 4, 3, 1, 2, 3],
[3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
[1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
B= np.array([[4, 5], # row 4, column 5
[2, 1],
[2, 5],
[4, 1],
[6, 7]])
C=np.array([[[7, 9],
[6, 7]],
[[6, 6],
[9, 6]],
[[9, 6],
[8, 9]],
[[8, 7],
[8, 7]],
[[8, 6],
[7, 7]]])
D= np.array([[ 2, 1, 2, 1, 1, 4, 3, 2, 2, 2], # this is required
[ 3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[ 1, 9, 9, 4, 2, 13, 10, 3, 4, 1],
[ 1, 12, 7, 3, 3, 9, 13, 2, 1, 2],
[ 3, 11, 8, 3, 3, 9, 12, 4, 3, 2],
[ 2, 12, 8, 4, 2, 7, 8, 2, 1, 1],
[ 2, 3, 2, 3, 1, 4, 3, 9, 8, 3],
[ 3, 1, 3, 2, 2, 4, 2, 10, 10, 3],
[ 1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[ 3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
We need A==D.
I created row and column indices for every value of C.
b_row=np.repeat(np.c_[B[:,0],B[:,0]+1], repeats=2, axis=1).ravel()
b_col=np.repeat(np.c_[B[:,1],B[:,1]+1], repeats=2, axis=0).ravel()
print(np.c_[b_row, b_col]) # to see the indices
A[b_row,b_col]+=C.ravel()
Now you can check:
print(A == D)
False in (A == D)  # evaluates to False when A fully matches D
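One caveat, as an aside not in the original answer: A[b_row, b_col] += C.ravel() only matches the explicit loop when every (row, col) pair is unique. If blocks overlapped, the buffered += would drop some of the additions, and the unbuffered ufunc form should be used instead (reusing b_row and b_col from above):
import numpy as np

# Unbuffered addition: repeated (row, col) pairs from overlapping blocks
# each contribute, matching the result of the explicit for loop.
np.add.at(A, (b_row, b_col), C.ravel())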

Convert a 2D array into 3D array repeating existing values

I have an array of shape (360,480) containing values from 1 to 11,
Array([[ 1, 1, 1, ..., 1, 1, 1],
[ 1, 1, 1, ..., 1, 1, 1],
[ 1, 1, 1, ..., 1, 1, 1],
...,
[ 4, 4, 4, ..., 11, 11, 11],
[ 4, 4, 4, ..., 11, 11, 11],
[ 4, 4, 4, ..., 11, 11, 11]])
How could I reshape this array into an array of shape (360,480,3) in a way that
np.all(array[:,:,0]==array[:,:,1])
and
np.all(array[:,:,0]==array[:,:,2])
are both True?
The expected outcome should be
array([[[ 1, 1, 1],
[ 1, 1, 1],
[ 1, 1, 1],
...,
[ 1, 1, 1],
[ 1, 1, 1],
[ 1, 1, 1]],
[[ 4, 4, 4],
[ 4, 4, 4],
[ 4, 4, 4],
...,
[11, 11, 11],
[11, 11, 11],
[11, 11, 11]],
[[ 4, 4, 4],
[ 4, 4, 4],
[ 4, 4, 4],
...,
[11, 11, 11],
[11, 11, 11],
[11, 11, 11]]])
You could use the numpy.repeat function:
https://numpy.org/doc/stable/reference/generated/numpy.repeat.html
array3d = np.repeat(array2d[:, :, None], repeats=3, axis=2)
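A minimal sketch (on a random stand-in array, not the original data) verifying both conditions, with np.broadcast_to as a no-copy alternative when a read-only view is enough:
import numpy as np

array2d = np.random.randint(1, 12, size=(360, 480))   # stand-in for the real data

array3d = np.repeat(array2d[:, :, None], repeats=3, axis=2)
assert array3d.shape == (360, 480, 3)
assert np.all(array3d[:, :, 0] == array3d[:, :, 1])
assert np.all(array3d[:, :, 0] == array3d[:, :, 2])

# Alternative: a broadcast view (read-only, no copy made).
view3d = np.broadcast_to(array2d[:, :, None], (360, 480, 3))
assert np.array_equal(view3d, array3d)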

Get ndarray from pandas column when cell elements are list

I have a pandas data frame that looks like this:
>>> df = pd.DataFrame({'a': list(range(10))})
>>> df['a'] = df.a.apply(lambda x: x*np.array([1,2,3]))
>>> df.head()
a
0 [0, 0, 0]
1 [1, 2, 3]
2 [2, 4, 6]
3 [3, 6, 9]
4 [4, 8, 12]
I would like to get column a from the df as an ndarray. But when I do that I get an array of arrays:
>>> df.a.values
array([array([0, 0, 0]), array([1, 2, 3]), array([2, 4, 6]),
array([3, 6, 9]), array([ 4, 8, 12]), array([ 5, 10, 15]),
array([ 6, 12, 18]), array([ 7, 14, 21]), array([ 8, 16, 24]),
array([ 9, 18, 27])], dtype=object)
How can I get the returned output to be
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
# ...
])
Using pandas,
df.a.apply(pd.Series).values
Using numpy,
np.vstack(df.a.values)
You get
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
[ 5, 10, 15],
[ 6, 12, 18],
[ 7, 14, 21],
[ 8, 16, 24],
[ 9, 18, 27]])
Check
np.array(df['a'].tolist())
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
[ 5, 10, 15],
[ 6, 12, 18],
[ 7, 14, 21],
[ 8, 16, 24],
[ 9, 18, 27]], dtype=int64)
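On recent pandas versions, np.stack over the column works as well; a minimal sketch (not from the answers above):
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [i * np.array([1, 2, 3]) for i in range(10)]})

# Stack the per-row arrays into a single (10, 3) ndarray.
out = np.stack(df['a'].to_numpy())
assert out.shape == (10, 3)
assert np.array_equal(out, np.vstack(df['a'].values))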

How do I sum data from certain columns and rows in a dataframe?

I have a bunch of matrices that I stored in a big dataframe. Let's say here is my dataframe.
data = pd.DataFrame([[13, 1, 3, 4, 0, 0], [0, 2, 6, 2, 0, 0], [3, 1, 5, 2, 2, 0], [0, 0, 10, 11, 6, 0], [5, 5, 21, 25, 41, 0],
[11, 1, 3, 2, 0, 1], [3, 1, 7, 3, 1, 1], [1, 1, 6, 5, 3, 1], [1, 1, 6, 7, 6, 1], [6, 6, 21, 24, 42, 1],
[17, 1, 7, 0, 0, 2], [1, 1, 6, 1, 1, 2], [2, 4, 6, 2, 1, 2], [0, 2, 11, 7, 8, 2], [5, 6, 17, 16, 46, 2],
[11, 1, 10, 2, 1, 3], [2, 2, 7, 1, 1, 3], [0, 0, 14, 4, 1, 3], [0, 0, 7, 7, 5, 3], [5, 1, 20, 18, 48, 3],
[16, 3, 7, 1, 2, 4], [1, 2, 4, 1, 0, 4], [2, 4, 7, 5, 3, 4], [3, 0, 4, 4, 7, 4], [7, 2, 13, 12, 58, 4]],
columns=['1', '2', '3', '4', '5', 'iteration'])
print(pd.DataFrame(data))
Each value of data['iteration'] identifies a separate matrix. So, as you can see, there are 5 matrices here (iteration 0 to 4). I want to add them all together, as in basic matrix addition, to get one single matrix.
I have tried the following, but there's something wrong with it. It doesn't work.
matrix = data[['1','2','3','4','5']]
print(np.sum([matrix[matrix_list['iteration']==i] for i in range(0,9)], axis=0))
How do I do this the right way?
You can use:
In [98]: d = data.set_index('iteration')
In [99]: np.sum(d.loc[i].values for i in d.index.drop_duplicates().values)
Out[99]:
array([[ 68, 7, 30, 9, 3],
[ 7, 8, 30, 8, 3],
[ 8, 10, 38, 18, 10],
[ 4, 3, 38, 36, 32],
[ 28, 20, 92, 95, 235]])
Or alternatively, use groupby():
np.sum(e[1].iloc[:, :-1].values for e in data.groupby('iteration'))
array([[ 68, 7, 30, 9, 3],
[ 7, 8, 30, 8, 3],
[ 8, 10, 38, 18, 10],
[ 4, 3, 38, 36, 32],
[ 28, 20, 92, 95, 235]])
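If every iteration block has the same number of rows stored in the same order, as in the example data above, you can also reshape and sum along the iteration axis without any Python-level loop; a minimal sketch under that assumption:
import numpy as np

cols = ['1', '2', '3', '4', '5']
# 25 rows of 5 columns -> 5 matrices of shape (5, 5), then sum them elementwise.
stacked = data[cols].to_numpy().reshape(-1, 5, 5)
total = stacked.sum(axis=0)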

Numpy array: Changing the values of the last column when lines of a 2d-array are equal to lines of another 2d-array

I have a huge 2D numpy array (called DATA). I want to change the last value (column) of every row of DATA that is identical to an external row of the same shape (called ExtLine).
# -*- coding: utf-8 -*-
import numpy
DATA=numpy.array([
[1,2,3,4,5,6,0],
[2,5,6,84,1,6,0],
[9,9,9,9,9,9,0],
[1,2,3,4,5,6,0],
[2,5,6,84,1,6,0],
[0,2,5,4,8,9,0] ])
# Pool of lines that will be compared to DATA
PoolOfExtLines=numpy.array([[1,2,3,4,5,6,0],[2,5,6,84,1,6,0]])
for j in xrange(PoolOfExtLines.shape[0]):  # loop over the pool of lines
    # view the pool line as a single contiguous void item (so it can be compared to whole rows of DATA)
    b = numpy.ascontiguousarray(PoolOfExtLines[j]).view(numpy.dtype((numpy.void, PoolOfExtLines[j].dtype.itemsize * PoolOfExtLines[j].shape[0])))
    for i in xrange(DATA.shape[0]):  # loop over DATA rows
        # view the current DATA row the same way (so it can be compared to b)
        a = numpy.ascontiguousarray(DATA[i]).view(numpy.dtype((numpy.void, DATA[i].dtype.itemsize * DATA[i].shape[0])))
        if a == b:
            DATA[i, -1] = -1
This results in DATA being modified as I want (a -1 tag at the end of rows that were similar to those of PoolOfExtLines):
[[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 0, 2, 5, 4, 8, 9, 0]]
My question: I feel that this code can be improved and is quite complicated for what I want to do. Using some (built-in) methods I missed, or smarter direct comparisons (how?), I could probably make the code clearer and faster. Thanks for your help.
You can use NumPy's broadcasting capability along with boolean indexing to solve it in a vectorized manner -
DATA[((DATA == PoolOfExtLines[:,None,:]).all(2)).any(0),-1] = -1
Sample run -
In [17]: DATA
Out[17]:
array([[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0],
[ 0, 2, 5, 4, 8, 9, 0]])
In [18]: PoolOfExtLines
Out[18]:
array([[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0]])
In [19]: DATA[((DATA == PoolOfExtLines[:,None,:]).all(2)).any(0),-1] = -1
In [20]: DATA
Out[20]:
array([[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 0, 2, 5, 4, 8, 9, 0]])
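For readability, here is the same expression broken into steps (a sketch equivalent to the one-liner above):
import numpy as np

# PoolOfExtLines[:, None, :] has shape (2, 1, 7) and DATA has shape (6, 7),
# so the comparison broadcasts to (2, 6, 7).
eq = DATA == PoolOfExtLines[:, None, :]

row_matches = eq.all(axis=2)          # (2, 6): pool line j matches DATA row i
any_match = row_matches.any(axis=0)   # (6,): row matches at least one pool line

DATA[any_match, -1] = -1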
