How can I execute the following piece of code without a loop, using NumPy's einsum function instead? I want the result to be a 2D matrix, not 3D. Simply doing
D = np.einsum('ijk,ijk->jk',A,B)
D += np.einsum('ijk,ijk->jk',B,C)
gives a different result. Should I introduce an intermediate temporary array or something similar in order to use einsum?
import numpy as np
A = np.array( [[[1, 2, 3, 0],
[ 4, 2, 1, 1]],
[[2, 0, 0, 3],
[1, 0, 2, 4]]] )
B = np.array( [[[0, 2, 3, 1],
[0, 2, 5, 0]],
[[0, 1, 2, 2],
[3, 3, 2, 1]]] )
C = np.array( [[[0, 2, 2, 1],
[0, 2, 1, 0]],
[[0, 0, 2, 0],
[3, 1, 2, 1]]] )
X = np.zeros([2,4])
for i in range(2):
    for j in range(2):
        for k in range(4):
            X[j,k] = A[i,j,k]*B[i,j,k]
            X[j,k] += B[i,j,k]*C[i,j,k]
D = np.einsum('ijk,ijk->jk',A,B)
D += np.einsum('ijk,ijk->jk',B,C)
Following up on my comment, replace the i loop with just 1 step
In [64]: X = np.zeros([2,4])
    ...: i=1
    ...: for j in range(2):
    ...:     for k in range(4):
    ...:
    ...:         X[j,k] = A[i,j,k]*B[i,j,k]
    ...:         X[j,k] += B[i,j,k]*C[i,j,k]
In [65]: X
Out[65]:
array([[ 0., 0., 4., 6.],
[12., 3., 8., 5.]])
This is the value as produced by your loop. You've thrown away the i=0 values.
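If the intent was to sum the contributions from every i, the loop has to accumulate instead of overwrite. A minimal sketch, assuming that was the goal; the small random arrays are made up for illustration and stand in for A, B and C:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the question's A, B, C (same shape)
A = rng.integers(0, 5, size=(2, 2, 4))
B = rng.integers(0, 5, size=(2, 2, 4))
C = rng.integers(0, 5, size=(2, 2, 4))

X = np.zeros((2, 4))
for i in range(2):
    for j in range(2):
        for k in range(4):
            # += on both lines, so the values from earlier i are kept
            X[j, k] += A[i, j, k] * B[i, j, k]
            X[j, k] += B[i, j, k] * C[i, j, k]

D = np.einsum('ijk,ijk->jk', A, B) + np.einsum('ijk,ijk->jk', B, C)
print(np.array_equal(X, D))
```

With the accumulating loop, X and the einsum result D should agree.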
But you don't need the loops to do:
In [68]: A[1,:,:]*B[1,:,:]+B[1,:,:]*C[1,:,:]
Out[68]:
array([[ 0, 0, 4, 6],
[12, 3, 8, 5]])
In [69]: A*B+B*C
Out[69]:
array([[[ 0, 8, 15, 1],
[ 0, 8, 10, 0]],
[[ 0, 0, 4, 6],
[12, 3, 8, 5]]])
The same thing with einsum:
In [71]: np.einsum('ijk,ijk->ijk',A,B)+np.einsum('ijk,ijk->ijk',B,C)
Out[71]:
array([[[ 0, 8, 15, 1],
[ 0, 8, 10, 0]],
[[ 0, 0, 4, 6],
[12, 3, 8, 5]]])
and if you want to sum across i:
In [72]: np.einsum('ijk,ijk->jk',A,B)+np.einsum('ijk,ijk->jk',B,C)
Out[72]:
array([[ 0, 8, 19, 7],
[12, 11, 18, 5]])
In [73]: (A*B+B*C).sum(axis=0)
Out[73]:
array([[ 0, 8, 19, 7],
[12, 11, 18, 5]])
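Since A*B + B*C equals B*(A + C) elementwise, the two einsum calls can probably be folded into one. A small sketch using the arrays from the question (the variable name D_single is mine):

```python
import numpy as np

A = np.array([[[1, 2, 3, 0], [4, 2, 1, 1]],
              [[2, 0, 0, 3], [1, 0, 2, 4]]])
B = np.array([[[0, 2, 3, 1], [0, 2, 5, 0]],
              [[0, 1, 2, 2], [3, 3, 2, 1]]])
C = np.array([[[0, 2, 2, 1], [0, 2, 1, 0]],
              [[0, 0, 2, 0], [3, 1, 2, 1]]])

# Factor out B, then multiply and sum over i in a single call
D_single = np.einsum('ijk,ijk->jk', B, A + C)

# Reference: plain elementwise arithmetic followed by a sum over axis 0
D_ref = (A * B + B * C).sum(axis=0)
print(np.array_equal(D_single, D_ref))
```

This still builds the temporary A + C, so it is mainly a readability choice rather than a guaranteed speedup.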
I have an n-dimensional array of shape 1x17x3 whose output looks as follows
[[[x1 y1 z1]
[x2 y2 z2]
[x3 y3 z3]
[x17 y17 z17]]]
I want to extract the first two elements of each row into one 1D vector, and the third elements into another, as follows
vec1 = [x1,y1,x2,y2, ....... x17,y17]
vec20 = [z1,z2,z3..............z17]
I tried flattening the whole array and extracting the elements from that, but I am not sure that's the right way to do it.
import numpy as np
test = np.random.rand(1,17,3)
print(test)
rev_mat = test.flatten()
print(rev_mat)
Let x denote your array of shape 1x17x3.
vec1 = np.empty(x.shape[1] * 2)
vec1[0::2] = x[:,:,0]
vec1[1::2] = x[:,:,1]
vec20 = x[:,:,2].squeeze()
Hope that helps!
With a 2d array:
In [42]: arr = np.arange(12).reshape(4,3)
In [43]: arr
Out[43]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
we can get the first 2 columns with:
In [44]: arr[:,:2]
Out[44]:
array([[ 0, 1],
[ 3, 4],
[ 6, 7],
[ 9, 10]])
and make that 1d with ravel
In [45]: arr[:,:2].ravel()
Out[45]: array([ 0, 1, 3, 4, 6, 7, 9, 10])
third column:
In [46]: arr[:,2]
Out[46]: array([ 2, 5, 8, 11])
You have a 3d array, but the first dimension is 1, so you can just index it away:
In [47]: arr = np.arange(12).reshape(1,4,3)
In [48]: arr
Out[48]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]])
In [49]: arr[0,:,:2].ravel()
Out[49]: array([ 0, 1, 3, 4, 6, 7, 9, 10])
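Applied to the 1x17x3 case from the question, the same indexing might look like this. It is only a sketch: the arange data is a stand-in for the real array, chosen so the interleaving is easy to see.

```python
import numpy as np

# Stand-in for the real data: row n holds (3n, 3n+1, 3n+2)
x = np.arange(17 * 3).reshape(1, 17, 3)

vec1 = x[0, :, :2].ravel()   # x1, y1, x2, y2, ..., x17, y17
vec20 = x[0, :, 2]           # z1, z2, ..., z17

print(vec1.shape, vec20.shape)
print(vec1[:6], vec20[:3])
```

Note that vec20 is a view into x here, while ravel on the non-contiguous slice has to return a copy.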
I just came across a showstopper for a part of my code and I am not sure what I am doing wrong...
I simply have a large data cube and want to change the maximum values along the z axis to some other number:
import numpy as np
from time import time
x, y, z = 100, 100, 10
a = np.arange(x*y*z).reshape((z, y, x))
t = time()
a[np.argmax(a, axis=0)] = 1
print(time() - t)
This takes about 0.02s which is a bit slow for such a small array, but ok. My problem is that I need to do this with arrays as large as (32, 4096, 4096) and I have not had the patience to let this finish with the above code...it's just too inefficient, but it should actually be very fast! Am I doing something wrong with setting the array elements?
You are basically indexing your numpy array with a numpy array containing numbers. I think that is the reason why it is so slow (and I'm not sure if it really does what you want it to do).
If you create a boolean numpy array and use it as a mask, it's orders of magnitude faster.
For example:
pos_max = np.expand_dims(np.argmax(a, axis=0), axis=0)
pos_max_indices = np.arange(a.shape[0]).reshape(10,1,1) == pos_max
a[pos_max_indices] = 1
is 20 times faster than the original and does the same.
I don't think it is the indexing with numbers that's slowing it down. Usually indexing a single dimension with a boolean vector is slower than indexing with the corresponding np.where.
Something else is going on here. Look at these shapes:
In [14]: a.shape
Out[14]: (10, 100, 100)
In [15]: np.argmax(a,axis=0).shape
Out[15]: (100, 100)
In [16]: a[np.argmax(a,axis=0)].shape
Out[16]: (100, 100, 100, 100)
The indexed a is much larger than the original, 1000x.
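To see where the extra dimensions come from, here is a smaller sketch of the same indexing (the shapes are my own choice so the result stays readable). An integer array used as a[idx] indexes only the first axis, so every entry of idx pulls out a whole plane:

```python
import numpy as np

a = np.arange(3 * 2 * 4).reshape(3, 2, 4)
idx = np.argmax(a, axis=0)   # shape (2, 4), values in range(3)

# Each of the 2*4 entries of idx selects a full (2, 4) plane of a
picked = a[idx]
print(idx.shape, picked.shape)

# picked[m, n] is the plane a[idx[m, n]]
print(np.array_equal(picked[0, 0], a[idx[0, 0]]))
```

So an assignment such as a[idx] = 1 overwrites entire planes, not just the positions of the maxima.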
@MSeifert's solution is faster, but I can't help feeling it is more complex than needed.
In [35]: %%timeit
....: a=np.arange(x*y*z).reshape((z,y,x))
....: pos_max = np.expand_dims(np.argmax(a, axis=0), axis=0)
....: pos_max_indices = np.arange(a.shape[0]).reshape(10,1,1) == pos_max
....: a[pos_max_indices]=1
....:
1000 loops, best of 3: 1.28 ms per loop
I'm still working on an improvement.
The sample array isn't a good one: it's too big to display, and all the max values are on the last z plane:
In [46]: x,y,z=4,2,3
In [47]: a=np.arange(x*y*z).reshape((z,y,x))
In [48]: a
Out[48]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [49]: a[np.argmax(a,axis=0)]=1
In [50]: a
Out[50]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]])
I could access those same argmax values with:
In [51]: a[-1,...]
Out[51]:
array([[1, 1, 1, 1],
[1, 1, 1, 1]])
Let's try a random array, where the argmax can be in any plane:
In [57]: a=np.random.randint(2,10,(z,y,x))
In [58]: a
Out[58]:
array([[[9, 7, 6, 5],
[6, 3, 5, 2]],
[[5, 6, 2, 3],
[7, 9, 6, 9]],
[[7, 7, 8, 9],
[2, 4, 9, 7]]])
In [59]: a[np.argmax(a,axis=0)]=0
In [60]: a
Out[60]:
array([[[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0]]])
Oops - I turned everything to 0. Is that what you want?
Let's try the pos_max method:
In [61]: a=np.random.randint(0,10,(z,y,x))
In [62]: a
Out[62]:
array([[[9, 3, 9, 0],
[6, 6, 2, 4]],
[[9, 9, 4, 9],
[5, 9, 7, 9]],
[[1, 8, 1, 7],
[1, 0, 2, 3]]])
In [63]: pos_max = np.expand_dims(np.argmax(a, axis=0), axis=0)
In [64]: pos_max
Out[64]:
array([[[0, 1, 0, 1],
[0, 1, 1, 1]]], dtype=int32)
In [66]: pos_max_indices = np.arange(a.shape[0]).reshape(z,1,1) == pos_max
In [67]: pos_max_indices
Out[67]:
array([[[ True, False, True, False],
[ True, False, False, False]],
[[False, True, False, True],
[False, True, True, True]],
[[False, False, False, False],
[False, False, False, False]]], dtype=bool)
In [68]: a[pos_max_indices]=0
In [69]: a
Out[69]:
array([[[0, 3, 0, 0],
[0, 6, 2, 4]],
[[9, 0, 4, 0],
[5, 0, 0, 0]],
[[1, 8, 1, 7],
[1, 0, 2, 3]]])
That looks more reasonable. There still is a 9 in the 2nd plane, but that's because there was also a 9 in the 1st.
This still needs to be cleaned up, but here's a non-boolean mask solution:
In [98]: a=np.random.randint(0,10,(z,y,x))
In [99]: a1=a.reshape(z,-1) # it's easier to work with a 2d view
In [100]: ind=np.argmax(a1,axis=0)
In [101]: ind
Out[101]: array([2, 2, 1, 0, 2, 0, 1, 2], dtype=int32)
In [102]: a1[ind,np.arange(a1.shape[1])] # the largest values
Out[102]: array([9, 8, 7, 4, 9, 7, 9, 6])
In [104]: a1
Out[104]:
array([[3, 1, 5, 4, 2, 7, 4, 5],
[4, 4, 7, 1, 3, 7, 9, 4],
[9, 8, 3, 3, 9, 1, 2, 6]])
In [105]: a1[ind,np.arange(a1.shape[1])]=0
In [106]: a
Out[106]:
array([[[3, 1, 5, 0],
[2, 0, 4, 5]],
[[4, 4, 0, 1],
[3, 7, 0, 4]],
[[0, 0, 3, 3],
[0, 1, 2, 0]]])
Working with a1, the 2d view, is easier; the exact shape of the x,y dimensions is not important to this problem. We are changing individual values, not columns or planes. Still, I'd like to get it working without a1.
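One way this might be done directly on the 3d array is to pair the argmax result with broadcast row and column indices. This is a sketch under that assumption, and the names rows, cols and b are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(3, 2, 4))
z, y, x = a.shape

ind = np.argmax(a, axis=0)     # shape (y, x): plane of the max at each position
rows = np.arange(y)[:, None]   # shape (y, 1)
cols = np.arange(x)            # shape (x,)

# The three index arrays broadcast to (y, x), picking one element per position
largest = a[ind, rows, cols]
print(np.array_equal(largest, a.max(axis=0)))

b = a.copy()
b[ind, rows, cols] = -1        # replace only the first maximum at each position
print(b)
```

The assignment touches exactly one element per (row, column) position, which matches what the a1 version does.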
Here are two functions that replace the maximum value along the 1st axis. I use copy since it makes repeated time testing easier.
def setmax0(a, value=-1):
    # @MSeifert's
    a = a.copy()
    z = a.shape[0]
    # a=np.arange(x*y*z).reshape((z,y,x))
    pos_max = np.expand_dims(np.argmax(a, axis=0), axis=0)
    pos_max_indices = np.arange(z).reshape(z,1,1) == pos_max
    a[pos_max_indices]=value
    return a
def setmax1(a, value=-2):
    a = a.copy()
    z = a.shape[0]
    a1 = a.reshape(z, -1)
    ind = np.argmax(a1, axis=0)
    a1[ind, np.arange(a1.shape[1])] = value
    return a
They produce the same result in a test like:
ab = np.random.randint(0,100,(20,1000,1000))
test = np.allclose(setmax1(ab,-1),setmax0(ab,-1))
Timings (using ipython timeit) are basically the same.
They do assign values in different orders, so setmax0(ab,-np.arange(...)) will be different.
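On newer NumPy (1.15 or later, as far as I know) np.put_along_axis can express the same assignment. The function below is a hypothetical third variant, named setmax2 only to follow the pattern above; I have not timed it against the other two.

```python
import numpy as np

def setmax2(a, value=-1):
    # Hypothetical variant: same idea as setmax0/setmax1, via put_along_axis
    a = a.copy()
    # Indices must have the same number of dimensions as a: shape (1, y, x)
    ind = np.expand_dims(np.argmax(a, axis=0), axis=0)
    np.put_along_axis(a, ind, value, axis=0)
    return a

rng = np.random.default_rng(1)
ab = rng.integers(0, 100, size=(5, 6, 7))
out = setmax2(ab, -1)

# Exactly one element per (y, x) position should have been replaced
print((out == -1).sum(), ab.shape[1] * ab.shape[2])
```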
I'm trying to do the following:
I have a (4,2)-shaped array:
a = np.array([[-1, 0],[1, 0],[0, -1], [0, 1]])
I have another (2, 2)-shaped array:
b = np.array([[10, 10], [5, 5]])
I'd like to add them along rows of b and concatenate, so that I end up with:
[[ 9, 10],
[11, 10],
[10, 9],
[10, 11],
[4, 5],
[6, 5],
[5, 4],
[5, 6]]
The first four elements are b[0]+a, and the last four are b[1]+a. How can I generalize that to a b of shape (N, 2) without using a for loop over its elements?
You can use broadcasting to get all the summations in a vectorized manner to have a 3D array, which could then be stacked into a 2D array with np.vstack for the desired output. Thus, the implementation would be something like this -
np.vstack((a + b[:,None,:]))
Sample run -
In [74]: a
Out[74]:
array([[-1, 0],
[ 1, 0],
[ 0, -1],
[ 0, 1]])
In [75]: b
Out[75]:
array([[10, 10],
[ 5, 5]])
In [76]: np.vstack((a + b[:,None,:]))
Out[76]:
array([[ 9, 10],
[11, 10],
[10, 9],
[10, 11],
[ 4, 5],
[ 6, 5],
[ 5, 4],
[ 5, 6]])
You can replace np.vstack with some reshaping, which might be a bit more efficient, like so -
(a + b[:,None,:]).reshape(-1,a.shape[1])
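To make the shapes explicit for the general (N, 2) case, here is a short sketch with N = 3. The third row of b is made up for illustration:

```python
import numpy as np

a = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])   # shape (4, 2)
b = np.array([[10, 10], [5, 5], [0, 0]])           # shape (N, 2), here N = 3

# b[:, None, :] has shape (N, 1, 2); adding a (4, 2) broadcasts to (N, 4, 2)
summed = a + b[:, None, :]

# Collapse the first two axes: block n of four rows is b[n] + a
out = summed.reshape(-1, a.shape[1])               # shape (N * 4, 2)
print(summed.shape, out.shape)
```

The reshape keeps the blocks in the order of b's rows, which is the concatenation order asked for in the question.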
I am a bit confused about the behaviour of fancy indexing, see:
>>> t = np.arange(2*2*3).reshape((2, 2, 3))
>>> t
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
>>> t[1, :, [1, 2]]
array([[ 7, 10],
[ 8, 11]])
I thought that after indexing with t[1, :, [1, 2]] I would have had the array:
array([[ 7, 8],
[10, 11]])
but instead I get the transpose, as can be seen above.
Also, consider the following:
>>> t[:, :, [1, 2]][1]
array([[ 7, 8],
[10, 11]])
This doesn't follow the pattern of behaviour we just noted as being unintuitive...this behaves "as expected". Why?
Why do I get this behaviour, and how can I get the behaviour that I expected?
Per the docs,
In simple cases (i.e. one indexing array
and N - 1 slice objects) it does exactly what you would expect (concatenation of
repeated application of basic slicing).
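In this case, the indexing array is [1, 2], and the other indices are the integer 1 and the slice : (i.e. slice(None)). Note that the integer 1 is not quite the same as slice(1,2): it selects position 1 along axis 0 and drops that axis.
So the result is the concatenation of the slices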
In [43]: t[1,:,1]
Out[43]: array([ 7, 10])
In [44]: t[1,:,2]
Out[44]: array([ 8, 11])
Also note that the shape of t[1, :, [1,2]] will be (2,2) since the scalar 1
removes the 0 axis and the : spans all of axis 1 (which has length 2), and [1,2]
has length 2. So as you run over the first axis of the result, you get the arrays
array([ 7, 10]) and array([ 8, 11]).
In [45]: t[1, :, [1,2]]
Out[45]:
array([[ 7, 10],
[ 8, 11]])
The easiest way to get the result you want is to use basic slicing,
In [45]: t[1, :, 1:3]
Out[45]:
array([[ 7, 8],
[10, 11]])
Another way, which uses "fancy" integer indexing is:
In [121]: t[1, [(0,0),(1,1)], [1,2]]
Out[121]:
array([[ 7, 8],
[10, 11]])
or (using broadcasting)
In [154]: t[1, [[0],[1]], [1,2]]
Out[154]:
array([[ 7, 8],
[10, 11]])
This might actually be closer to what you want, since it is generalizable to the case where your indexing array is some arbitrary list like [1, 5, 9, 10].
In [157]: t = np.arange(2*2*11).reshape(2,2,11)
In [158]: t[1, [[0],[1]], [1,5,9,10]]
Out[158]:
array([[23, 27, 31, 32],
[34, 38, 42, 43]])
The same rule applies to
In [101]: t[:, :, [1, 2]][1]
Out[101]:
array([[ 7, 8],
[10, 11]])
First note that the shape of t[:, :, [1, 2]] will be (2,2,2). The result will be the concatenation of the basic slices
In [102]: t[:, :, 1]
Out[102]:
array([[ 1, 4],
[ 7, 10]])
In [103]: t[:, :, 2]
Out[103]:
array([[ 2, 5],
[ 8, 11]])
So as you run over the last (i.e. third) axis of the result, you get the arrays
array([[ 1, 4], [ 7, 10]]) and array([[ 2, 5], [ 8, 11]]).
In [107]: np.allclose(t[:, :, [1,2]], np.dstack([np.array([[ 1, 4], [ 7, 10]]), np.array([[ 2, 5], [ 8, 11]])]))
Out[107]: True
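For the arbitrary-column case shown earlier with [[0],[1]], np.ix_ can build the broadcastable index arrays for you. A minimal sketch, reusing the 2x2x11 array from that example:

```python
import numpy as np

t = np.arange(2 * 2 * 11).reshape(2, 2, 11)

# np.ix_ returns index arrays of shape (2, 1) and (1, 4) that broadcast
# against each other, i.e. the same trick as [[0],[1]] with [1,5,9,10]
rows, cols = np.ix_([0, 1], [1, 5, 9, 10])
sub = t[1][rows, cols]

print(rows.shape, cols.shape)
print(sub)
```

This should give the same 2x4 block as t[1, [[0],[1]], [1,5,9,10]].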
It seems the behavior changes when an earlier axis is indexed with an integer rather than a slice:
>>> import numpy as np
>>> t = np.arange(2*2*3).reshape((2, 2, 3))
>>> t[1:2, :, [1, 2]] # 1 -> 1:2 (slicing)
array([[[ 7, 8],
[10, 11]]])
>>> t[1][:, [1, 2]]
array([[ 7, 8],
[10, 11]])
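As far as I understand NumPy's rules for mixing basic and advanced indexing, an integer that appears alongside an index list counts as an advanced index too. When the two are separated by a slice, the dimension produced by the advanced indices is placed first in the result. A sketch with three different axis lengths (my own choice) makes the shapes easier to tell apart:

```python
import numpy as np

t = np.arange(2 * 3 * 5).reshape(2, 3, 5)

mixed = t[1, :, [1, 2]]      # integer and list separated by a slice
sliced = t[1:2, :, [1, 2]]   # the list is the only advanced index
chained = t[1][:, [1, 2]]    # index axis 0 first, then pick the columns

print(mixed.shape, sliced.shape, chained.shape)

# The "unexpected" result is the transpose of the chained one
print(np.array_equal(mixed.T, chained))
```

Chaining the two indexing steps, or using a slice such as 1:2 on the first axis, keeps the axes in their original order.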