How do I vectorize addition between columns in a numpy array? For example, what is the fastest way to implement something like:
import numpy
ary = numpy.array([[1,2,3],[3,4,5],[5,6,7],[7,8,9],[9,10,11]])
for i in range(ary.shape[0]):
    ary[i,0] += ary[i,1]
With numpy.ndarray.sum over axis 1 of the first two columns:
ary[:,0] = ary[:,:2].sum(axis=1)
Or the same with straightforward addition on slices:
ary[:,0] = ary[:, 0] + ary[:, 1]
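For reference, a quick check (a minimal sketch using the question's array) that the slice addition reproduces the original loop:
import numpy as np
ary = np.array([[1,2,3],[3,4,5],[5,6,7],[7,8,9],[9,10,11]])
# looped reference
expected = ary.copy()
for i in range(expected.shape[0]):
    expected[i, 0] += expected[i, 1]
# vectorized slice addition
vec = ary.copy()
vec[:, 0] = vec[:, 0] + vec[:, 1]
assert np.array_equal(vec, expected)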
I am trying to sum specific indices per row in a numpy matrix, based on values in a second numpy vector. For example, given a matrix A and a vector of indices inds, I want to sum:
A[0, inds[0]] + A[1, inds[1]] + A[2, inds[2]] + A[3, inds[3]]
I am currently using a python for loop, making the code quite slow. Is there a way to do this using vectorisation? Thanks!
Yes, numpy's fancy indexing can do this. Just generate a range for the first dimension and use your indices for the second:
import numpy as np
x1 = np.array( [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]] )
print(x1[ [0,1,2,3],[2,0,3,1] ].sum())
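The same idea generalizes to any row count by generating the row range with np.arange. A minimal sketch, assuming inds is the asker's index vector (the values here are hypothetical):
import numpy as np
A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
inds = np.array([2, 0, 3, 1])  # hypothetical index vector: one chosen column per row
# pair each row index with its chosen column index, then sum the selected elements
total = A[np.arange(A.shape[0]), inds].sum()
print(total)  # 3 + 5 + 12 + 14 = 34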
Out of a huge matrix in numpy (currently 1000x1000), only a few elements are relevant for me. Say these elements are >1000 in value and the others are way lower. I need to find the indices of all such elements in the most efficient way, because the search will be repeated often and the matrix can become even bigger.
For now I have two different approaches, which should be of about the same complexity (I omit possible solutions with for loops as inefficient):
import numpy as np
A = np.zeros((1000,1000))
#do something with the matrix
#first solution with np.where
np.transpose(np.where(A > 999))
# array([[0, 0],[1, 20]....[785, 445]], dtype=int64) - made up numbers
#another solution with np.argwhere
np.argwhere(A > 999)
# array([[0, 0],[1, 20]....[785, 445]], dtype=int64) - outputs the same
Is there any possible way to speed up this search or is my solution the most efficient?
Thanks for any advice and suggestions!
You can try this: a boolean filter applied directly to the numpy array!
import numpy as np
arr = np.array([998, 999, 1000, 1001])
filter_arr = arr > 999
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
https://www.w3schools.com/python/numpy_array_filter.asp
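Note that boolean filtering returns the matching values themselves; to get their indices, as the question asks, you can pass the same boolean mask to np.nonzero. A minimal sketch, assuming A is the question's matrix:
import numpy as np
A = np.zeros((1000, 1000))
# ...do something with the matrix...
mask = A > 999                         # boolean mask, computed once
rows, cols = np.nonzero(mask)          # row and column indices of all matches
pairs = np.column_stack((rows, cols))  # (k, 2) index pairs, like np.argwhere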
Suppose I have two arrays A and B with dimensions (n1,m1,m2) and (n2,m1,m2), respectively. I want to compute the matrix C with dimensions (n1,n2) such that C[i,j] = sum((A[i,:,:] - B[j,:,:])^2). Here is what I have so far:
import numpy as np
A = np.array(range(1,13)).reshape(3,2,2)
B = np.array(range(1,9)).reshape(2,2,2)
C = np.zeros(shape=(A.shape[0], B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i,j] = np.sum(np.square(A[i,:,:] - B[j,:,:]))
C
What is the most efficient way to do this? In R I would use a vectorized approach, such as outer. Is there a similar method for Python?
Thanks.
You can use scipy's cdist, which is pretty efficient for such calculations after reshaping the input arrays to 2D, like so -
from scipy.spatial.distance import cdist
C = cdist(A.reshape(A.shape[0],-1),B.reshape(B.shape[0],-1),'sqeuclidean')
Now, the above approach is memory efficient and thus a better one when working with large data sizes. For small input arrays, one can also use np.einsum and leverage NumPy broadcasting, like so -
diffs = A[:,None]-B
C = np.einsum('ijkl,ijkl->ij',diffs,diffs)
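As a quick sanity check (a minimal sketch on the question's toy arrays), both vectorized routes reproduce the looped result:
import numpy as np
from scipy.spatial.distance import cdist
A = np.arange(1, 13).reshape(3, 2, 2)
B = np.arange(1, 9).reshape(2, 2, 2)
# looped reference from the question
C_ref = np.array([[np.sum(np.square(A[i] - B[j])) for j in range(B.shape[0])]
                  for i in range(A.shape[0])])
C_cdist = cdist(A.reshape(A.shape[0], -1), B.reshape(B.shape[0], -1), 'sqeuclidean')
diffs = A[:, None] - B
C_einsum = np.einsum('ijkl,ijkl->ij', diffs, diffs)
assert np.allclose(C_ref, C_cdist) and np.allclose(C_ref, C_einsum)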
I'd like to sample from the indices of a 2D Numpy array, considering that each index is weighted by the number inside the array at that position. The way I know is with numpy.random.choice; however, that does not return the index but the number itself. Is there an efficient way of doing this?
Here is my code:
import numpy as np
A=np.arange(1,10).reshape(3,3)
A_flat=A.flatten()
d=np.random.choice(A_flat,size=10,p=A_flat/float(np.sum(A_flat)))
print(d)
You could do something like:
import numpy as np
def wc(weights):
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)
Notice that the cumsum is the slowest part of this, so if you need to do this repeatedly for the same array I'd suggest computing the cumsum ahead of time and reusing it.
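For example (a minimal usage sketch, using the question's array as the weights):
weights = np.arange(1, 10).reshape(3, 3)
print(wc(weights))  # e.g. (2, 1) -- heavier cells are drawn more often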
To expand on my comment: adapting the weighted choice method presented at https://stackoverflow.com/a/10803136/553404:
def weighted_choice_indices(weights):
    cs = np.cumsum(weights.flatten())/np.sum(weights)
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)
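Alternatively (my own suggestion, not part of either answer above): np.random.choice can sample flat indices directly if you pass it a range of indices instead of the values, and np.unravel_index converts them back to 2D coordinates:
import numpy as np
A = np.arange(1, 10).reshape(3, 3)
p = A.flatten() / A.sum()                           # normalized weights
flat_idx = np.random.choice(A.size, size=10, p=p)   # sample flat indices, not values
rows, cols = np.unravel_index(flat_idx, A.shape)    # convert back to 2D indices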
Given an array and a mask for this array, it is easy, using fancy indexing, to select only the data of the array corresponding to the mask.
import numpy as np
a = np.arange(20).reshape(4, 5)
mask = [0, 2]
data = a[:, mask]
But is there a rapid way to select all the data of the array that does not belong to the mask (i.e. the mask is the data we want to reject)?
I tried to find a general solution going through an intermediate boolean array, but I'm sure there is something much easier.
mask2 = np.ones(a.shape, dtype=bool)
mask2[:, mask] = False
data = a[mask2].reshape(a.shape[0], a.shape[1] - len(mask))
Thank you
Have a look at numpy.invert, numpy.bitwise_not, numpy.logical_not, or more concisely ~mask. (They all do the same thing, in this case.)
As a quick example:
import numpy as np
x = np.arange(10)
mask = x > 5
print(x[mask])
print(x[~mask])
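Applied to the original question, where the "mask" is a list of integer column indices rather than a boolean array, one option (a sketch, assuming the question's a and index list) is to build a boolean column mask first and then invert it with ~:
import numpy as np
a = np.arange(20).reshape(4, 5)
cols = [0, 2]  # the question's integer "mask" of columns to reject
col_mask = np.zeros(a.shape[1], dtype=bool)
col_mask[cols] = True
data = a[:, ~col_mask]  # all columns NOT listed in cols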