numpy array - efficiently subtract each row of B from A - python

I have two numpy arrays, a and b, and I want to subtract each row of b from a. I tried to use:
a - b[:, None]
This works for small arrays, but takes far too long at real-world data sizes.
a = np.arange(16).reshape(8,2)
a
Out[35]:
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])
b = np.arange(6).reshape(3,2)
b
Out[37]:
array([[0, 1],
       [2, 3],
       [4, 5]])
a - b[:, None]
Out[38]:
array([[[ 0,  0],
        [ 2,  2],
        [ 4,  4],
        [ 6,  6],
        [ 8,  8],
        [10, 10],
        [12, 12],
        [14, 14]],

       [[-2, -2],
        [ 0,  0],
        [ 2,  2],
        [ 4,  4],
        [ 6,  6],
        [ 8,  8],
        [10, 10],
        [12, 12]],

       [[-4, -4],
        [-2, -2],
        [ 0,  0],
        [ 2,  2],
        [ 4,  4],
        [ 6,  6],
        [ 8,  8],
        [10, 10]]])
%%timeit
a - b[:, None]
The slowest run took 10.36 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.18 µs per loop
This approach is too slow / inefficient for larger arrays.
a1 = np.arange(18900 * 41).reshape(18900, 41)
b1 = np.arange(2674 * 41).reshape(2674, 41)
%%timeit
a1 - b1[:, None]
1 loop, best of 3: 12.1 s per loop
%%timeit
for index in range(len(b1)):
    a1 - b1[index]
1 loop, best of 3: 2.35 s per loop
Is there any numpy trick I can use to speed this up?

You are running up against memory limits. The broadcasted result has shape (2674, 18900, 41), about two billion elements, so the run time is dominated by memory traffic rather than arithmetic. If 8 bits are enough to store your real data, use uint8 to cut that traffic by a factor of 8 (note that the np.arange values below wrap around in uint8, so this timing is only illustrative):
import numpy as np
a1 = np.arange(18900 * 41, dtype=np.uint8).reshape(18900, 41)
b1 = np.arange(2674 * 41, dtype=np.uint8).reshape(2674, 41)
%time c1 = a1 - b1[:, None]
# 1.02 s
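If you actually need a wider dtype, or only some reduction of the full result, processing b1 in chunks keeps peak memory bounded while still using broadcasting. A minimal sketch (the chunk size and the final reduction are assumptions; substitute whatever downstream computation you actually need):

import numpy as np

a1 = np.arange(18900 * 41).reshape(18900, 41)
b1 = np.arange(2674 * 41).reshape(2674, 41)

chunk = 256  # hypothetical chunk size; tune for your RAM
results = []
for start in range(0, len(b1), chunk):
    # at most (chunk, 18900, 41) elements are alive at any one time
    diff = a1 - b1[start:start + chunk, None]
    # consume each chunk right away, e.g. reduce to Manhattan distances
    results.append(np.abs(diff).sum(axis=2))
dist = np.concatenate(results)  # shape (2674, 18900)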

Related

Matlab reshape equivalent in Python

I'm currently porting a MATLAB library over to Python. As of right now, I am trying to keep the code as one-to-one as possible. I'm noticing some differences between reshape in MATLAB and Python that are causing issues.
I've heard people talk about the difference between 'C' and 'Fortran' order: NumPy defaults to 'C' order, while MATLAB uses 'Fortran'. Below are two Python examples using both orders.
>>> a = np.arange(12).reshape((2,3,2))
>>> a
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])
>>> b = np.arange(12).reshape((2,3,2), order='F')
>>> b
array([[[ 0,  6],
        [ 2,  8],
        [ 4, 10]],

       [[ 1,  7],
        [ 3,  9],
        [ 5, 11]]])
Below is the MATLAB/Octave counterpart of the above Python code.
octave:12> a = reshape((0:11), [3,2,2])
a =

ans(:,:,1) =

   0   3
   1   4
   2   5

ans(:,:,2) =

    6    9
    7   10
    8   11
Notice that each example yields a different result.
These examples are meant to illustrate the discrepancy that I'm referring to. The datasets that I'm working on in my project are significantly larger. I need to be able to reshape arrays in Python and be confident that it is performing the same reshape operations as it would in Matlab. Any help would be appreciated.
Why are you using a (2,3,2) shape in one, and (3,2,2) in the other?
In [82]: arr = np.arange(12).reshape((3,2,2), order='F')
In [83]: arr
Out[83]:
array([[[ 0,  6],
        [ 3,  9]],

       [[ 1,  7],
        [ 4, 10]],

       [[ 2,  8],
        [ 5, 11]]])
In [84]: arr[:,:,0]
Out[84]:
array([[0, 3],
       [1, 4],
       [2, 5]])
In [85]: arr[:,:,1]
Out[85]:
array([[ 6,  9],
       [ 7, 10],
       [ 8, 11]])
===
Looking at strides may help identify the differences between C and F orders.
In [86]: arr.shape
Out[86]: (3, 2, 2)
In [87]: arr.strides
Out[87]: (8, 24, 48)
Notice how the smallest step, one element (8 bytes), is taken in the first dimension.
Contrast that with C order:
In [89]: np.arange(12).reshape(2,2,3)
Out[89]:
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
In [90]: np.arange(12).reshape(2,2,3).strides
Out[90]: (48, 24, 8)
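As a sanity check, the strides follow directly from the shape and itemsize: C order multiplies out the trailing dimensions, F order the leading ones. A small sketch of that arithmetic:

import numpy as np

c = np.arange(12).reshape(2, 2, 3)               # C order (default)
f = np.arange(12).reshape((3, 2, 2), order='F')  # F order

# C order: strides shrink left to right (the last axis is contiguous)
assert c.strides == (2 * 3 * c.itemsize, 3 * c.itemsize, c.itemsize)
# F order: strides grow left to right (the first axis is contiguous)
assert f.strides == (f.itemsize, 3 * f.itemsize, 3 * 2 * f.itemsize)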
===
OK, let's try the (2,3,2) shape:
>> a = reshape((0:11), [2,3,2])
a =

ans(:,:,1) =

   0   2   4
   1   3   5

ans(:,:,2) =

    6    8   10
    7    9   11
Same thing with order 'F':
In [94]: arr = np.arange(12).reshape((2,3,2), order='F')
In [95]: arr
Out[95]:
array([[[ 0,  6],
        [ 2,  8],
        [ 4, 10]],

       [[ 1,  7],
        [ 3,  9],
        [ 5, 11]]])
In [96]: arr[:,:,0]
Out[96]:
array([[0, 2, 4],
       [1, 3, 5]])
>> squeeze(a(1,:,:))
ans =

    0    6
    2    8
    4   10

In [98]: arr[0,:,:]
Out[98]:
array([[ 0,  6],
       [ 2,  8],
       [ 4, 10]])
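To sum up, passing order='F' together with the same shape reproduces MATLAB/Octave reshape semantics. A minimal helper sketch (the name matlab_reshape is just for illustration):

import numpy as np

def matlab_reshape(x, shape):
    # Reshape with MATLAB/Octave semantics: column-major element order.
    return np.reshape(x, shape, order='F')

a = matlab_reshape(np.arange(12), (3, 2, 2))
# a[:, :, 0] and a[:, :, 1] now match Octave's a(:,:,1) and a(:,:,2)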

Sort paired array of 3d array (replace for loop)

I have the following 3d array:
import numpy as np
z = np.array([[[10, 2],
               [ 5, 3],
               [ 4, 4]],
              [[ 7, 6],
               [ 4, 2],
               [ 5, 8]]])
I want to sort the rows within each 2D block by their first value.
Currently I am using following code:
from operator import itemgetter
np.array([sorted(x, key=itemgetter(0)) for x in z])
array([[[ 4,  4],
        [ 5,  3],
        [10,  2]],

       [[ 4,  2],
        [ 5,  8],
        [ 7,  6]]])
How can I make the code more efficient/faster by removing the for loop?
For a NumPy one-liner, you can use numpy.argsort:
import numpy as np
a = np.array([[[10, 2],
               [ 5, 3],
               [ 4, 4]],
              [[ 7, 6],
               [ 4, 2],
               [ 5, 8]]])
a[np.arange(0,2)[:,None], a[:,:,0].argsort()]
array([[[ 4,  4],
        [ 5,  3],
        [10,  2]],

       [[ 4,  2],
        [ 5,  8],
        [ 7,  6]]])
For such a small array this takes about the same time, but scaling up the size results in quite an improvement, for instance:
from operator import itemgetter
a = np.random.randint(0,10, (2,100_000,2))
%timeit a[np.arange(0,2)[:,None], a[:,:,0].argsort()]
26.9 ms ± 351 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [sorted(x,key=itemgetter(0)) for x in a]
327 ms ± 6.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
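On NumPy 1.15 and later, np.take_along_axis expresses the same gather without building the row-index array by hand; a sketch:

import numpy as np

a = np.array([[[10, 2], [5, 3], [4, 4]],
              [[ 7, 6], [4, 2], [5, 8]]])

# argsort the first column within each block, then gather whole rows
idx = a[:, :, 0].argsort(axis=1)                      # shape (2, 3)
out = np.take_along_axis(a, idx[:, :, None], axis=1)  # shape (2, 3, 2)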
You can use map() to achieve the same result without a for loop, with the sort function being either user-defined, a lambda, or a partial of sorted.
By first creating a sort function:
>>> def mysort(it):
...     return sorted(it, key=itemgetter(0))
...
>>> list(map(mysort, z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
Same as above, but with a lambda instead:
>>> list(map(lambda it: sorted(it, key=itemgetter(0)), z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
With a partial:
>>> from functools import partial
>>> psort = partial(sorted, key=itemgetter(0))
>>> list(map(psort, z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
Or the partial defined in-place:
>>> list(map(partial(sorted, key=itemgetter(0)), z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
Note that these return a list of lists of lists rather than a 3D numpy array; for numpy-oriented solutions, see the argsort answer above.
FYI, the lambda and the inline partial are roughly equivalent, but have their differences.
Among these options, my preference is the lambda.
Why not simply np.sort(z, axis=1)?
import numpy as np
z = np.array([[[10, 2],
               [ 5, 3],
               [ 4, 4]],
              [[ 7, 6],
               [ 4, 2],
               [ 5, 8]]])
print(np.sort(z, axis=1))
[[[ 4  2]
  [ 5  3]
  [10  4]]

 [[ 4  2]
  [ 5  6]
  [ 7  8]]]
Note, however, that this sorts each column independently, so the pairs are broken: the input rows [10, 2] and [4, 4] become [10, 4] and [4, 2] in the output.

Delete and duplicate rows in numpy array

In Python, let's say I have a 1366x768 numpy array. I want to delete every second row (the 0th row remains, the 1st is removed, the 2nd remains, the 3rd is removed, and so on) and, at the same time, replace each removed row with a duplicate of the undeleted row just before it.
Is it possible in numpy?
One approach -
a[::2].repeat(2,axis=0)
To make the changes in the array, assign it back.
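That is, a one-line sketch of assigning it back:

a = a[::2].repeat(2, axis=0)  # rebind a to the row-duplicated result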
Sample run -
In [105]: a
Out[105]:
array([[2, 5, 1, 1],
       [2, 0, 2, 5],
       [1, 1, 5, 7],
       [0, 7, 1, 8],
       [8, 5, 2, 3],
       [2, 1, 0, 6],
       [5, 6, 1, 6],
       [7, 1, 4, 7],
       [3, 8, 1, 4],
       [5, 8, 8, 8]])
In [106]: a[::2].repeat(2,axis=0)
Out[106]:
array([[2, 5, 1, 1],
       [2, 5, 1, 1],
       [1, 1, 5, 7],
       [1, 1, 5, 7],
       [8, 5, 2, 3],
       [8, 5, 2, 3],
       [5, 6, 1, 6],
       [5, 6, 1, 6],
       [3, 8, 1, 4],
       [3, 8, 1, 4]])
If we care about performance, here's another approach using NumPy strides -
def strided_app(a):
    # View each even row twice (stride 0 along the new middle axis),
    # then reshape back to 2D; the reshape forces a safe copy.
    m0, n0 = a.strides
    m, n = a.shape
    strided = np.lib.stride_tricks.as_strided
    return strided(a, shape=(m//2, 2, n), strides=(2*m0, 0, n0)).reshape(-1, n)
Sample run -
In [154]: a
Out[154]:
array([[4, 8, 7, 7],
       [5, 5, 1, 7],
       [1, 8, 1, 3],
       [6, 6, 5, 6],
       [0, 2, 6, 3],
       [6, 6, 8, 7],
       [7, 6, 8, 1],
       [7, 8, 8, 2],
       [4, 0, 2, 8],
       [5, 8, 1, 4]])
In [155]: strided_app(a)
Out[155]:
array([[4, 8, 7, 7],
       [4, 8, 7, 7],
       [1, 8, 1, 3],
       [1, 8, 1, 3],
       [0, 2, 6, 3],
       [0, 2, 6, 3],
       [7, 6, 8, 1],
       [7, 6, 8, 1],
       [4, 0, 2, 8],
       [4, 0, 2, 8]])
Timings -
In [156]: arr = np.arange(1000000).reshape(1000, 1000)
# Proposed soln-1
In [157]: %timeit arr[::2].repeat(2,axis=0)
1000 loops, best of 3: 1.26 ms per loop
# #Psidom 's soln
In [158]: %timeit arr[1::2] = arr[::2]
1000 loops, best of 3: 928 µs per loop
In [159]: arr = np.arange(1000000).reshape(1000, 1000)
# Proposed soln-2
In [160]: %timeit strided_app(arr)
1000 loops, best of 3: 830 µs per loop
Looks like you have an even number of rows, in which case you can use assignment (assign each even row's values to the odd row that follows it):
arr = np.array([[1,4],[3,1],[2,3],[2,2]])
arr[1::2] = arr[::2]
arr
# array([[1, 4],
#        [1, 4],
#        [2, 3],
#        [2, 3]])
This avoids copying the entire array, but doesn't work if the array has an odd number of rows.
Timing: Here is a comparison of the timing, the assignment does seem faster.
arr = np.arange(1000000).reshape(1000, 1000)
%timeit arr[::2].repeat(2,axis=0)
1000 loops, best of 3: 913 µs per loop
%timeit arr[1::2] = arr[::2]
1000 loops, best of 3: 655 µs per loop
This works for both even and odd numbers of rows:
for i in range(1, len(a), 2):
    a[i] = a[i-1]
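The loop can also be vectorized for both parities with slice arithmetic; a sketch (modifies a in place):

import numpy as np

a = np.arange(15).reshape(5, 3)  # odd number of rows
# a[:-1:2] stops one row short, so both sides have len(a) // 2 rows
# whether len(a) is even or odd
a[1::2] = a[:-1:2]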

Combine array along axis

I'm trying to do the following:
I have a (4,2)-shaped array:
a = np.array([[-1, 0],[1, 0],[0, -1], [0, 1]])
I have another (2, 2)-shaped array:
b = np.array([[10, 10], [5, 5]])
I'd like to add a to each row of b and concatenate the results, so that I end up with:
[[ 9, 10],
 [11, 10],
 [10,  9],
 [10, 11],
 [ 4,  5],
 [ 6,  5],
 [ 5,  4],
 [ 5,  6]]
The first four rows are b[0] + a and the last four are b[1] + a. How can I generalize this when b is (N, 2)-shaped, without using a for loop over its elements?
You can use broadcasting to get all the summations in a vectorized manner to have a 3D array, which could then be stacked into a 2D array with np.vstack for the desired output. Thus, the implementation would be something like this -
np.vstack((a + b[:,None,:]))
Sample run -
In [74]: a
Out[74]:
array([[-1,  0],
       [ 1,  0],
       [ 0, -1],
       [ 0,  1]])
In [75]: b
Out[75]:
array([[10, 10],
       [ 5,  5]])
In [76]: np.vstack((a + b[:,None,:]))
Out[76]:
array([[ 9, 10],
       [11, 10],
       [10,  9],
       [10, 11],
       [ 4,  5],
       [ 6,  5],
       [ 5,  4],
       [ 5,  6]])
You can replace np.vstack with some reshaping, which might be a bit more efficient, like so -
(a + b[:,None,:]).reshape(-1,a.shape[1])
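As a quick sanity check that the two spellings agree (reusing the arrays from the question):

import numpy as np

a = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
b = np.array([[10, 10], [5, 5]])

out1 = np.vstack(a + b[:, None, :])
out2 = (a + b[:, None, :]).reshape(-1, a.shape[1])
assert np.array_equal(out1, out2)  # both give the (8, 2) result shown above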

Row-wise scaling with Numpy

I have an array H of shape MxN and an array A of length M. I want to scale the rows of H by A. I currently do it this way, taking advantage of NumPy's element-wise behaviour:
H = numpy.swapaxes(H, 0, 1)
H /= A
H = numpy.swapaxes(H, 0, 1)
It works, but the two swapaxes operations are not very elegant, and I feel there is a more elegant and concise way to achieve the result without creating temporaries. How can I do that?
I think you can simply use H/A[:,None]:
In [71]: (H.swapaxes(0, 1) / A).swapaxes(0, 1)
Out[71]:
array([[  8.91065496e-01,  -1.30548362e-01,   1.70357901e+00],
       [  5.06027691e-02,   3.59913305e-01,  -4.27484490e-03],
       [  4.72868136e-01,   2.04351398e+00,   2.67527572e+00],
       [  7.87239835e+00,  -2.13484271e+02,  -2.44764975e+02]])
In [72]: H/A[:,None]
Out[72]:
array([[  8.91065496e-01,  -1.30548362e-01,   1.70357901e+00],
       [  5.06027691e-02,   3.59913305e-01,  -4.27484490e-03],
       [  4.72868136e-01,   2.04351398e+00,   2.67527572e+00],
       [  7.87239835e+00,  -2.13484271e+02,  -2.44764975e+02]])
because None (or np.newaxis) adds a new axis to A:
In [73]: A
Out[73]: array([ 1.1845468 ,  1.30376536, -0.44912446,  0.04675434])
In [74]: A[:,None]
Out[74]:
array([[ 1.1845468 ],
       [ 1.30376536],
       [-0.44912446],
       [ 0.04675434]])
You just need to reshape A so that it will broadcast properly:
A = A.reshape((-1, 1))
so:
In [21]: M
Out[21]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20]])
In [22]: A
Out[22]: array([1, 2, 3, 4, 5, 6, 7])
In [23]: M / A.reshape((-1, 1))
Out[23]:
array([[0, 1, 2],
       [1, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])
(With these integer arrays the division is floored, Python 2 style; use float arrays for true division.)
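Since the question asks to avoid temporaries: the broadcasted form also works in place, replacing the whole swapaxes dance. A minimal sketch with made-up data:

import numpy as np

H = np.arange(12, dtype=float).reshape(4, 3)
A = np.array([1.0, 2.0, 4.0, 8.0])

H /= A[:, None]  # in-place row-wise scaling; no swapaxes, no full-size copy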
