I have a big matrix A with shape (10, 10)
array([[2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[1, 3, 3, 4, 2, 4, 4, 3, 4, 1],
[1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
[3, 3, 1, 3, 3, 2, 3, 4, 3, 2],
[2, 4, 1, 4, 2, 1, 1, 2, 1, 1],
[2, 3, 2, 3, 1, 4, 3, 1, 2, 3],
[3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
[1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
and an array of positions B with shape (5, 2)
array([[4, 5], # row 4, column 5
[2, 1],
[2, 5],
[4, 1],
[6, 7]])
and several small matrices C with shape (5, 2, 2)
array([[[7, 9],
[6, 7]],
[[6, 6],
[9, 6]],
[[9, 6],
[8, 9]],
[[8, 7],
[8, 7]],
[[8, 6],
[7, 7]]])
Now, I want to assign these 5 small matrices to the large matrix. Each position in B is the upper-left corner of the corresponding small matrix. Where blocks overlap, using the last value, the maximum, or the sum would all be acceptable. The effect I want looks like
A[B] += C
A for loop implementation looks like:
for i in range(B.shape[0]):
    A[B[i][0]:B[i][0]+2, B[i][1]:B[i][1]+2] += C[i]
The expected result looks like
array([[ 2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[ 3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[ 1, 9, 9, 4, 2, 13, 10, 3, 4, 1],
[ 1, 12, 7, 3, 3, 9, 13, 2, 1, 2],
[ 3, 11, 8, 3, 3, 9, 12, 4, 3, 2],
[ 2, 12, 8, 4, 2, 7, 8, 2, 1, 1],
[ 2, 3, 2, 3, 1, 4, 3, 9, 8, 3],
[ 3, 1, 3, 2, 2, 4, 2, 10, 10, 3],
[ 1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[ 3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
Is there a solution without a for loop?
Your arrays:
In [58]: A = np.array([[2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
...: [3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
...: [1, 3, 3, 4, 2, 4, 4, 3, 4, 1],
...: [1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
...: [3, 3, 1, 3, 3, 2, 3, 4, 3, 2],
...: [2, 4, 1, 4, 2, 1, 1, 2, 1, 1],
...: [2, 3, 2, 3, 1, 4, 3, 1, 2, 3],
...: [3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
...: [1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
...: [3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
In [59]: B=np.array([[4, 5], # row 4, column 5
...: [2, 1],
...: [2, 5],
...: [4, 1],
...: [6, 7]])
In [60]: C=np.array([[[7, 9],
...: [6, 7]],
...:
...: [[6, 6],
...: [9, 6]],
...:
...: [[9, 6],
...: [8, 9]],
...:
...: [[8, 7],
...: [8, 7]],
...:
...: [[8, 6],
...: [7, 7]]])
Your iteration, cleaned up a bit:
In [72]: for cnt,(i,j) in enumerate(B):
...:     A[i:i+2, j:j+2] += C[cnt]
...:
In [73]: A
Out[73]:
array([[ 2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[ 3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[ 1, 9, 9, 4, 2, 13, 10, 3, 4, 1],
[ 1, 12, 7, 3, 3, 9, 13, 2, 1, 2],
[ 3, 11, 8, 3, 3, 9, 12, 4, 3, 2],
[ 2, 12, 8, 4, 2, 7, 8, 2, 1, 1],
[ 2, 3, 2, 3, 1, 4, 3, 9, 8, 3],
[ 3, 1, 3, 2, 2, 4, 2, 10, 10, 3],
[ 1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[ 3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
And to make the action clearer, let's start with a 0 array (Acopy here is a copy of the original A, used only for its shape and dtype):
In [76]: A = np.zeros_like(Acopy)
In [77]: for cnt,(i,j) in enumerate(B):
...:     A[i:i+2, j:j+2] += C[cnt]
...:
In [78]: A
Out[78]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 6, 6, 0, 0, 9, 6, 0, 0, 0],
[0, 9, 6, 0, 0, 8, 9, 0, 0, 0],
[0, 8, 7, 0, 0, 7, 9, 0, 0, 0],
[0, 8, 7, 0, 0, 6, 7, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 8, 6, 0],
[0, 0, 0, 0, 0, 0, 0, 7, 7, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
I don't see any overlap, so I think we could construct an index array B1 from B that would allow us to do:
A[B1] += C
If there were an overlap, this would only write the last C value, because += with an index array is buffered.
If we don't like that, there is the np.add.at ufunc method, which performs unbuffered addition (or np.maximum.at to keep the maximum; np.max is not a ufunc and has no .at).
But it will take some time to work out the required B1 indices.
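To make the buffered versus unbuffered distinction concrete, here is a minimal sketch on a toy 1-D array. The names idx, vals, buffered, unbuffered and largest are made up for this illustration and are not part of the question.

```python
import numpy as np

idx = np.array([0, 0, 1])           # index 0 appears twice
vals = np.array([10, 20, 30])

buffered = np.zeros(3, dtype=int)
buffered[idx] += vals               # duplicates are NOT accumulated

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, vals)    # every occurrence is added

largest = np.zeros(3, dtype=int)
np.maximum.at(largest, idx, vals)   # keeps the maximum per index

print(buffered)
print(unbuffered)   # [30 30  0]
print(largest)      # [20 30  0]
```

With +=, position 0 ends up holding only one of the two contributions, while np.add.at accumulates both.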
edit
Here's a way of using +=. I'm using linspace to construct a multidimensional index, which will be used in place of the slices. Getting the shapes right took a lot of trial and error and testing (in an interactive session). As long as the blocks don't overlap, this is fast and correct. But as documented with np.add.at, this won't match the iterative approach when there are duplicate indices.
In [125]: B1 = B+2
In [126]: I = np.linspace(B,B1,2,endpoint=False).astype(int)
In [127]: A1 =np.zeros_like(Acopy)
In [128]: A1[I[:,:,0][:,None], I[:,:,1]] += C.transpose(1,2,0)
In [129]: np.allclose(A1,A)
Out[129]: True
I is a (2,5,2) shape array, where the first 2 is the number of "steps"
In [130]: I
Out[130]:
array([[[4, 5],
[2, 1],
[2, 5],
[4, 1],
[6, 7]],
[[5, 6],
[3, 2],
[3, 6],
[5, 2],
[7, 8]]])
And since the C subarrays are (2,2), this is the same as: np.stack([B,B+1])
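As a quick check of that equivalence, the sketch below builds the index three ways. The np.arange variant and the name h (block size) are my own additions for illustration; they are one possible way to generalise to other block sizes, not something the answer itself uses.

```python
import numpy as np

B = np.array([[4, 5], [2, 1], [2, 5], [4, 1], [6, 7]])
h = 2  # block height and width (the C blocks are 2x2)

# The construction used in the answer
I_linspace = np.linspace(B, B + h, h, endpoint=False).astype(int)
# The shortcut that works for 2x2 blocks
I_stack = np.stack([B, B + 1])
# Assumed generalisation: integer offsets 0..h-1 added to every corner
I_arange = B + np.arange(h)[:, None, None]

print(I_linspace.shape)                        # (2, 5, 2)
print(np.array_equal(I_linspace, I_stack))     # True
print(np.array_equal(I_linspace, I_arange))    # True
```

The arange form stays in integers throughout, so it avoids the float round trip through linspace.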
The C transpose is needed since this indexing of A1 produces a (2,2,5) array:
In [134]: A1[I[:,:,0][:,None], I[:,:,1]]
Out[134]:
array([[[7, 6, 9, 8, 8],
[9, 6, 6, 7, 6]],
[[6, 9, 8, 8, 7],
[7, 6, 9, 7, 7]]])
In [135]: _.shape
Out[135]: (2, 2, 5)
If some blocks overlap, np.add.at can be used to sum the overlaps:
In [137]: A1 =np.zeros_like(Acopy)
In [138]: np.add.at(A1, (I[:,:,0][:,None], I[:,:,1]), C.transpose(1,2,0))
In [140]: np.allclose(A1,A)
Out[140]: True
or, to keep the largest value where blocks overlap:
In [143]: np.maximum.at(A1, (I[:,:,0][:,None], I[:,:,1]), C.transpose(1,2,0))
In [144]: np.allclose(A1,A)
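The data in the question has no overlapping blocks, so the np.add.at claim is not really exercised there. The sketch below tries it on two deliberately overlapping blocks and compares against the loop. add_blocks is a hypothetical helper written for this example; it orders the index arrays as (n, h, w) so that C can be passed without a transpose.

```python
import numpy as np

def add_blocks(A, B, C):
    """Add each C[k] into A with its upper-left corner at B[k], summing overlaps."""
    n, h, w = C.shape
    rows = B[:, 0, None, None] + np.arange(h)[None, :, None]   # shape (n, h, 1)
    cols = B[:, 1, None, None] + np.arange(w)[None, None, :]   # shape (n, 1, w)
    np.add.at(A, (rows, cols), C)    # index arrays broadcast to (n, h, w)
    return A

# Two 2x2 blocks of ones that share the cell (1, 1)
B_overlap = np.array([[0, 0], [1, 1]])
C_ones = np.ones((2, 2, 2), dtype=int)

looped = np.zeros((4, 4), dtype=int)
for k, (i, j) in enumerate(B_overlap):
    looped[i:i + 2, j:j + 2] += C_ones[k]

vectorized = add_blocks(np.zeros((4, 4), dtype=int), B_overlap, C_ones)
print(np.array_equal(vectorized, looped))   # True
print(vectorized[1, 1])                     # 2
```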
A simple for loop can solve this:
import numpy as np
initial = np.array([
[2, 1, 2, 1, 1, 4, 3, 2, 2, 2], [3, 2, 1, 2, 3, 3, 2, 3, 2, 4], [1, 3, 3, 4, 2, 4, 4, 3, 4, 1], [1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
[3, 3, 1, 3, 3, 2, 3, 4, 3, 2], [2, 4, 1, 4, 2, 1, 1, 2, 1, 1], [2, 3, 2, 3, 1, 4, 3, 1, 2, 3], [3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
[1, 2, 3, 2, 1, 3, 4, 4, 1, 3], [3, 1, 3, 2, 4, 3, 1, 1, 1, 1],
])
offsets = np.array([[4, 5], [2, 1], [2, 5], [4, 1], [6, 7]])
subarrays = np.array([
[[7, 9], [6, 7]], [[6, 6], [9, 6]], [[9, 6], [8, 9]],
[[8, 7], [8, 7]], [[8, 6], [7, 7]],
])
for subarray, offset in zip(subarrays, offsets):
    (a, b), (c, d) = offset, subarray.shape
    initial[a:a+c, b:b+d] += subarray
print(initial)
Here is what I tried, without using any kind of loop:
import numpy as np
A=np.array([[2, 1, 2, 1, 1, 4, 3, 2, 2, 2],
[3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[1, 3, 3, 4, 2, 4, 4, 3, 4, 1],
[1, 3, 1, 3, 3, 1, 4, 2, 1, 2],
[3, 3, 1, 3, 3, 2, 3, 4, 3, 2],
[2, 4, 1, 4, 2, 1, 1, 2, 1, 1],
[2, 3, 2, 3, 1, 4, 3, 1, 2, 3],
[3, 1, 3, 2, 2, 4, 2, 3, 3, 3],
[1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
B= np.array([[4, 5], # row 4, column 5
[2, 1],
[2, 5],
[4, 1],
[6, 7]])
C=np.array([[[7, 9],
[6, 7]],
[[6, 6],
[9, 6]],
[[9, 6],
[8, 9]],
[[8, 7],
[8, 7]],
[[8, 6],
[7, 7]]])
D= np.array([[ 2, 1, 2, 1, 1, 4, 3, 2, 2, 2], # this is required
[ 3, 2, 1, 2, 3, 3, 2, 3, 2, 4],
[ 1, 9, 9, 4, 2, 13, 10, 3, 4, 1],
[ 1, 12, 7, 3, 3, 9, 13, 2, 1, 2],
[ 3, 11, 8, 3, 3, 9, 12, 4, 3, 2],
[ 2, 12, 8, 4, 2, 7, 8, 2, 1, 1],
[ 2, 3, 2, 3, 1, 4, 3, 9, 8, 3],
[ 3, 1, 3, 2, 2, 4, 2, 10, 10, 3],
[ 1, 2, 3, 2, 1, 3, 4, 4, 1, 3],
[ 3, 1, 3, 2, 4, 3, 1, 1, 1, 1]])
We need A==D.
I created row and column indexes for every value of C.
b_row=np.repeat(np.c_[B[:,0],B[:,0]+1], repeats=2, axis=1).ravel()
b_col=np.repeat(np.c_[B[:,1],B[:,1]+1], repeats=2, axis=0).ravel()
print(np.c_[b_row,b_col]) # to see indexes
A[b_row,b_col]+=C.ravel() # relies on the blocks not overlapping
Now you can check
print(A==D)
False in (A==D) # evaluates to False when every element matches
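For a self-contained check of these flat indexes, the sketch below applies them to a zero array and compares with the loop from the question. Swapping += for np.add.at is my substitution, not part of this answer; it should give the same result here and would also sum any overlapping cells.

```python
import numpy as np

B = np.array([[4, 5], [2, 1], [2, 5], [4, 1], [6, 7]])
C = np.array([[[7, 9], [6, 7]], [[6, 6], [9, 6]], [[9, 6], [8, 9]],
              [[8, 7], [8, 7]], [[8, 6], [7, 7]]])

# Reference result from the explicit loop
expected = np.zeros((10, 10), dtype=int)
for k, (i, j) in enumerate(B):
    expected[i:i + 2, j:j + 2] += C[k]

# Flat row/column indexes, one pair per element of C.ravel()
b_row = np.repeat(np.c_[B[:, 0], B[:, 0] + 1], repeats=2, axis=1).ravel()
b_col = np.repeat(np.c_[B[:, 1], B[:, 1] + 1], repeats=2, axis=0).ravel()

result = np.zeros((10, 10), dtype=int)
np.add.at(result, (b_row, b_col), C.ravel())   # unbuffered, overlap-safe
print(np.array_equal(result, expected))        # True
```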
I have a bunch of matrices that I stored in a big dataframe. Let's say here is my dataframe.
data = pd.DataFrame([[13, 1, 3, 4, 0, 0], [0, 2, 6, 2, 0, 0], [3, 1, 5, 2, 2, 0], [0, 0, 10, 11, 6, 0], [5, 5, 21, 25, 41, 0],
[11, 1, 3, 2, 0, 1], [3, 1, 7, 3, 1, 1], [1, 1, 6, 5, 3, 1], [1, 1, 6, 7, 6, 1], [6, 6, 21, 24, 42, 1],
[17, 1, 7, 0, 0, 2], [1, 1, 6, 1, 1, 2], [2, 4, 6, 2, 1, 2], [0, 2, 11, 7, 8, 2], [5, 6, 17, 16, 46, 2],
[11, 1, 10, 2, 1, 3], [2, 2, 7, 1, 1, 3], [0, 0, 14, 4, 1, 3], [0, 0, 7, 7, 5, 3], [5, 1, 20, 18, 48, 3],
[16, 3, 7, 1, 2, 4], [1, 2, 4, 1, 0, 4], [2, 4, 7, 5, 3, 4], [3, 0, 4, 4, 7, 4], [7, 2, 13, 12, 58, 4]],
columns=['1', '2', '3', '4', '5', 'iteration'])
print(pd.DataFrame(data))
Each value of data['iteration'] identifies a matrix of its own. So, as you can see, there are 5 matrices here (iteration 0 to 4). I want to add them all, as in basic matrix addition, to get one single matrix.
I have tried the following, but there's something wrong with it. It doesn't work.
matrix = data[['1','2','3','4','5']]
print(np.sum([matrix[matrix_list['iteration']==i] for i in range(0,9)], axis=0))
How do I do this the right way?
You can use:
In [98]: d = data.set_index('iteration')
In [99]: np.sum([d.loc[i].values for i in d.index.drop_duplicates().values], axis=0)
Out[99]:
array([[ 68, 7, 30, 9, 3],
[ 7, 8, 30, 8, 3],
[ 8, 10, 38, 18, 10],
[ 4, 3, 38, 36, 32],
[ 28, 20, 92, 95, 235]])
Or alternatively, use groupby():
np.sum([e[1].iloc[:, :-1].values for e in data.groupby('iteration')], axis=0)
array([[ 68, 7, 30, 9, 3],
[ 7, 8, 30, 8, 3],
[ 8, 10, 38, 18, 10],
[ 4, 3, 38, 36, 32],
[ 28, 20, 92, 95, 235]])
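Another option, sketched below, avoids the Python-level iteration entirely. It assumes the rows are already sorted by iteration and that every iteration has the same number of rows, which holds for the sample data but is not guaranteed in general. The names values, n_iter, stacked, total, pos and total_pd are made up for this example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [[13, 1, 3, 4, 0, 0], [0, 2, 6, 2, 0, 0], [3, 1, 5, 2, 2, 0], [0, 0, 10, 11, 6, 0], [5, 5, 21, 25, 41, 0],
     [11, 1, 3, 2, 0, 1], [3, 1, 7, 3, 1, 1], [1, 1, 6, 5, 3, 1], [1, 1, 6, 7, 6, 1], [6, 6, 21, 24, 42, 1],
     [17, 1, 7, 0, 0, 2], [1, 1, 6, 1, 1, 2], [2, 4, 6, 2, 1, 2], [0, 2, 11, 7, 8, 2], [5, 6, 17, 16, 46, 2],
     [11, 1, 10, 2, 1, 3], [2, 2, 7, 1, 1, 3], [0, 0, 14, 4, 1, 3], [0, 0, 7, 7, 5, 3], [5, 1, 20, 18, 48, 3],
     [16, 3, 7, 1, 2, 4], [1, 2, 4, 1, 0, 4], [2, 4, 7, 5, 3, 4], [3, 0, 4, 4, 7, 4], [7, 2, 13, 12, 58, 4]],
    columns=['1', '2', '3', '4', '5', 'iteration'])

# NumPy route: view the table as (iterations, rows, columns) and sum over iterations
values = data.drop(columns='iteration').to_numpy()
n_iter = data['iteration'].nunique()
stacked = values.reshape(n_iter, -1, values.shape[1])   # shape (5, 5, 5)
total = stacked.sum(axis=0)
print(total)

# pandas route: group by each row's position within its iteration
pos = data.groupby('iteration').cumcount()
total_pd = data.drop(columns='iteration').groupby(pos).sum()
print(total_pd)
```

The pandas route does not need the rows to be sorted by iteration, only that the row order within each iteration is meaningful.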
I have a huge 2D numpy array (called DATA). I want to change the last value (column) of every line of DATA that is identical to an external line of the same shape (called ExtLine).
# -*- coding: utf-8 -*-
import numpy
DATA=numpy.array([
[1,2,3,4,5,6,0],
[2,5,6,84,1,6,0],
[9,9,9,9,9,9,0],
[1,2,3,4,5,6,0],
[2,5,6,84,1,6,0],
[0,2,5,4,8,9,0] ])
# Pool of lines that will be compared to DATA
PoolOfExtLines=numpy.array([[1,2,3,4,5,6,0],[2,5,6,84,1,6,0]])
for j in range(PoolOfExtLines.shape[0]): # loop on pool of lines
    # convert ExtLine into a contiguous code (to be compared to the lines of DATA)
    b=numpy.ascontiguousarray(PoolOfExtLines[j]).view(numpy.dtype((numpy.void, PoolOfExtLines[j].dtype.itemsize * PoolOfExtLines[j].shape[0])))
    for i in range(DATA.shape[0]): # loop on DATA lines
        # convert the current line into a contiguous code (to be compared to b)
        a=numpy.ascontiguousarray(DATA[i]).view(numpy.dtype((numpy.void, DATA[i].dtype.itemsize * DATA[i].shape[0])))
        if a == b:
            DATA[i,-1]=-1
This results in a DATA array modified as I want (tag -1 at the end of the lines that were identical to those of PoolOfExtLines):
[[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 0, 2, 5, 4, 8, 9, 0]]
My question: I feel that this code can be enhanced and is quite complicated for what I want to do. I suspect that with some built-in methods I missed, or with smarter direct comparisons (how?), I could make the code clearer and faster. Thanks in advance for your help.
You can use NumPy's broadcasting capability along with boolean indexing to solve it in a vectorized manner -
DATA[((DATA == PoolOfExtLines[:,None,:]).all(2)).any(0),-1] = -1
Sample run -
In [17]: DATA
Out[17]:
array([[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0],
[ 0, 2, 5, 4, 8, 9, 0]])
In [18]: PoolOfExtLines
Out[18]:
array([[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0]])
In [19]: DATA[((DATA == PoolOfExtLines[:,None,:]).all(2)).any(0),-1] = -1
In [20]: DATA
Out[20]:
array([[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 0, 2, 5, 4, 8, 9, 0]])
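The one-liner can be unpacked into named steps to see the shapes involved. This is only a sketch; the intermediate names eq, row_matches and hit are mine, not from the answer.

```python
import numpy as np

DATA = np.array([
    [1, 2, 3, 4, 5, 6, 0],
    [2, 5, 6, 84, 1, 6, 0],
    [9, 9, 9, 9, 9, 9, 0],
    [1, 2, 3, 4, 5, 6, 0],
    [2, 5, 6, 84, 1, 6, 0],
    [0, 2, 5, 4, 8, 9, 0]])
PoolOfExtLines = np.array([[1, 2, 3, 4, 5, 6, 0], [2, 5, 6, 84, 1, 6, 0]])

# (2, 1, 7) against (6, 7) broadcasts to (2, 6, 7): every pool line vs every DATA line
eq = DATA == PoolOfExtLines[:, None, :]
# A DATA line matches a pool line only if all 7 columns agree -> shape (2, 6)
row_matches = eq.all(axis=2)
# A DATA line is tagged if it matches any pool line -> shape (6,)
hit = row_matches.any(axis=0)

DATA[hit, -1] = -1
print(hit)           # [ True  True False  True  True False]
print(DATA[:, -1])   # [-1 -1  0 -1 -1  0]
```

Note that eq is a boolean array of shape (pool lines, DATA lines, columns), so memory use grows with the product of the two array sizes.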