I have this for loop that I need to vectorize. The code below works, but takes a lot of time (this is a simplified example, the full version will have about 1e6 rows in col_ids). Can someone give me an idea how to vectorize this code to get rid of the loop? If it matters, the col_ids are fixed (will be the same every time the code is run), while the values will change.
import numpy as np

values = np.array([1.5, 2, 2.3])
col_ids = np.array([[0,0,0,0], [0,0,0,1], [0,0,1,1]])
result = np.zeros((4,3))
for idx, col_idx in enumerate(col_ids):
    result[np.arange(4), col_idx] += values[idx]
Result:
[[5.8 0. 0. ]
[5.8 0. 0. ]
[3.5 2.3 0. ]
[1.5 4.3 0. ]]
Update:
I am adding a second example, as there was some ambiguity in the dimensions of my first one. Only values and col_ids are updated; everything else is as in the first example. (I keep the first one, since it is referred to in the answers.)
values = np.array([1.5, 2, 5, 20, 50])
col_ids = np.array([[0,0,0,0], [0,0,0,1], [0,0,1,1], [0,0,1,2], [0,1,2,2]])
Result:
[[78.5 0. 0. ]
[28.5 50. 0. ]
[ 3.5 25. 50. ]
[ 1.5 7. 70. ]]
So result is m x n, col_ids is k x m, and values has length k. Both m and n are small (m=4, n=3); k is large (about 1e6 in the full example).
You can vectorize the loop, but creating the additional intermediate array makes it much slower than the loop for larger data (starting from a result of shape (50,50)).
import numpy as np
values = np.array([1.5, 2, 2.3])
col_ids = np.array([[0,0,0,0], [0,0,0,1], [0,0,1,1]])
(np.equal.outer(col_ids, np.arange(len(values))) * values[:,None,None]).sum(0)
# for a fixed result shape (4,3)
# (np.equal.outer(col_ids, np.arange(3)) * values[:,None,None]).sum(0)
Output
array([[5.8, 0. , 0. ],
[5.8, 0. , 0. ],
[3.5, 2.3, 0. ],
[1.5, 4.3, 0. ]])
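To put the memory cost in perspective (my own back-of-the-envelope figure, not from the benchmarks above): the float64 product created before .sum(0) has shape (k, m, n), so at the scale mentioned in the question it is already close to 100 MB per evaluation:
k, m, n = 1_000_000, 4, 3          # sizes taken from the question
# float64 intermediate of shape (k, m, n) created before .sum(0)
print(k * m * n * 8 / 1e6, "MB")   # 96.0 MB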
The only reliably faster solution I could find is Numba (using version 0.55.1). I thought this implementation would benefit from parallel execution, but I couldn't get any speed-up on a 2-core Colab instance.
import numba as nb
@nb.njit(parallel=False)  # try parallel=True for multi-threaded execution; no speed-up in my benchmarks
def fill(val, ids):
    res = np.zeros(ids.shape[::-1])
    for i in nb.prange(len(res)):
        for j in range(res.shape[1]):
            res[i, ids[j, i]] += val[j]
    return res
fill(values, col_ids)
Output
array([[5.8, 0. , 0. ],
[5.8, 0. , 0. ],
[3.5, 2.3, 0. ],
[1.5, 4.3, 0. ]])
For a fixed result shape of (4,3), so the shape no longer has to be inferred from the input:
@nb.njit(boundscheck=True)  # ~1.25x slower, but much safer
def fill(val, ids):
    res = np.zeros((4, 3))
    for i in nb.prange(ids.shape[0]):
        for j in range(ids.shape[1]):
            res[j, ids[i, j]] += val[i]
    return res
fill(values, col_ids)
Output for the updated example data
array([[78.5, 0. , 0. ],
[28.5, 50. , 0. ],
[ 3.5, 25. , 50. ],
[ 1.5, 7. , 70. ]])
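For anyone who wants to reproduce the timings at a more realistic size, here is a minimal sketch; the sizes, the variable names values_big/col_ids_big, and the loop_version baseline are my own assumptions, and it uses the fixed-shape fill defined directly above:
import timeit

rng = np.random.default_rng(0)
k, m, n = 100_000, 4, 3                     # the question mentions k around 1e6
values_big = rng.random(k)
col_ids_big = rng.integers(0, n, size=(k, m))

def loop_version(val, ids):
    # the original loop from the question, for comparison
    res = np.zeros((m, n))
    for idx, col_idx in enumerate(ids):
        res[np.arange(m), col_idx] += val[idx]
    return res

fill(values_big, col_ids_big)               # warm-up call triggers JIT compilation
assert np.allclose(fill(values_big, col_ids_big), loop_version(values_big, col_ids_big))
print(timeit.timeit(lambda: fill(values_big, col_ids_big), number=10))
print(timeit.timeit(lambda: loop_version(values_big, col_ids_big), number=10))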
You can solve this using np.add.at. However, AFAIK, this function does not support 2D arrays, so you need to flatten the arrays, compute the 1D flattened indices, and then call the function:
result = np.zeros((4, 3))
n, m = result.shape
indices = np.tile(np.arange(0, n*m, m), col_ids.shape[0]) + col_ids.ravel()
np.add.at(result.ravel(), indices, np.repeat(values, n))  # in-place
print(result)
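Since the question says col_ids is fixed between runs, the indices array (and the repeat pattern) only needs to be built once. With a flat index array in hand, np.bincount with weights is often faster in practice than np.add.at for this kind of accumulation; a sketch reusing the variables from the snippet above (here n, m = 4, 3 as unpacked there):
# indices precomputed once (col_ids is fixed); only values changes per run
result = np.bincount(indices, weights=np.repeat(values, n),
                     minlength=n * m).reshape(n, m)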
Related
I have an array in the following form where the first two columns are supposed to be indices of a 2-dimensional array and the following columns are arbitrary values.
data = np.array([[ 0. , 1. , 48. , 4. ],
[ 1. , 2. , 44. , 4.4],
[ 1. , 1. , 34. , 2.3],
[ 0. , 2. , 55. , 2.2],
[ 0. , 0. , 42. , 2. ],
[ 1. , 0. , 22. , 1. ]])
How do I combine the indices data[:,:2] with their values data[:,2:] such that the resulting array can be indexed by the values in the first two columns?
In my example that would be:
result = np.array([[[42. , 2. ], [48. , 4. ], [55. , 2.2]],
[[22. , 1. ], [34. , 2.3], [44. , 4.4]]])
I know that there is a trivial solution using python loops. But performance is a concern since I'm dealing with a huge amount of data. Specifically it's output of another program that I need to process.
Maybe there is a relatively trivial numpy solution as well. But I'm kind of stuck.
If it helps the following can be safely assumed:
All numbers in the first two columns are whole numbers (although the array consists of floats).
Every possible index (or rather combinations of indices) in the original array is used exactly once. I.e. there is guaranteed to be exactly one entry of the form [i, j, ...].
The indices start at 0 and I know the highest indices beforehand.
Edit:
Hmm. I see now how my example is misleading. The truth is that some of my input arrays are sorted, but that's unreliable. So I shouldn't assume anything about the order. I reordered some rows in my example to make it clearer. In case anyone wants to make sense of the answer and comment below: In my original question the array appeared to be sorted by the first two columns.
Find the row, column, and depth sizes based on your data array, then fill it as below:
import numpy as np
data = np.array([[ 0. , 0. , 42. , 2. ],
[ 0. , 1. , 48. , 4. ],
[ 0. , 2. , 55. , 2.2],
[ 1. , 0. , 22. , 1. ],
[ 1. , 1. , 34. , 2.3],
[ 1. , 2. , 44. , 4.4]])
row = int(max(data[:, 0])) + 1
col = int(max(data[:, 1])) + 1
depth = len(data[0, 2:])
# note: this relies on the rows being sorted by the first two columns
out = data[:, 2:].reshape(row, col, depth)
print(out)
Output:
[[[42. 2. ]
[48. 4. ]
[55. 2.2]]
[[22. 1. ]
[34. 2.3]
[44. 4.4]]]
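Note that the reshape above relies on the rows already being ordered by the first two columns, which the question's edit says cannot be assumed. A sketch that works for any row order (assuming, as the question guarantees, that every index pair appears exactly once), reusing row, col, depth from above:
out = np.zeros((row, col, depth))
out[data[:, 0].astype(int), data[:, 1].astype(int)] = data[:, 2:]
print(out)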
You can use Numba in no-python parallel mode with explicit loops (it is designed for accelerating exactly this kind of Python loop), which will be one of the most efficient methods in terms of performance, as szczesny mentioned in the comments, and it does not need sorting. This code is written for exactly two index columns; if that count can change, the code can be modified to handle it:
import numpy as np
import numba as nb

# without a signature --> @nb.njit(parallel=True)
@nb.njit("float64[:, :, ::1](float64[:, ::1])", parallel=True)
def numba_(data):
    data_ = data[:, :2].astype(np.int8)
    res = np.empty((data_[:, 0].max() + 1, data_[:, 1].max() + 1, 2))
    for i in nb.prange(data_.shape[0]):
        res[data_[i, 0], data_[i, 1], 0] = data[i, 2]
        res[data_[i, 0], data_[i, 1], 1] = data[i, 3]
    return res
(Benchmark plot omitted: timings of this approach, without sorting, compared against the proposed NumPy code; the horizontal axis is data.shape[0].)
More general to consider more than 2 columns:
#nb.njit("float64[:, :, ::1](float64[:, ::1])", parallel=True)
def numba_(data):
data_ = data[:, :2].astype(np.int8)
assert data_.shape[0] == data.shape[0]
depth = data[:, 2:].shape[1]
res = np.empty((data_[:, 0].max() + 1, data_[:, 1].max() + 1, depth))
for i in nb.prange(data_.shape[0]):
for j in range(depth):
res[data_[i, 0], data_[i, 1], j] = data[i, j + 2]
return res
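A quick sanity check against the expected result array from the question (this also covers the unsorted row order, since the writes are indexed):
# data and result as defined in the question above
print(np.allclose(numba_(data), result))   # expected: True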
Consider the following code, which generates the dst matrix shown below.
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

tmp = pd.DataFrame()
tmp['a'] = np.random.randint(1, 10, 5)
tmp['b'] = np.random.randint(1, 10, 5)
dst = pairwise_distances(tmp, tmp, metric='l2')
dst
which looks like the following
array([[0. , 5.38516481, 5. , 4.12310563, 2. ],
[5.38516481, 0. , 1.41421356, 3.16227766, 5. ],
[5. , 1.41421356, 0. , 4. , 4.12310563],
[4.12310563, 3.16227766, 4. , 0. , 5. ],
[2. , 5. , 4.12310563, 5. , 0. ]])
Now I want to get 4 as the output, because for row=0 the minimum distance to another row (apart from itself) lies at col=4.
I'm trying to use the following code to do the job, but np.nonzero() is messing things up.
np.argmin(dst[0, np.nonzero(dst[0,:])])
I'm getting 3 as the output, where I should be getting 4. I understand that np.nonzero() returns the index set [1, 2, 3, 4], of which argmin picks the 3rd position, which is actually the 4th column of the dst matrix. Need help! Thanks in advance!
Instead of argmin, use np.min and compare the result to dst[0,:]. Finally, pass the comparison to np.flatnonzero or np.nonzero:
np.flatnonzero(np.min(dst[0,np.nonzero(dst[0,:])]) == dst[0,:])
Out[150]: array([4], dtype=int64)
Or
np.nonzero(np.min(dst[0,np.nonzero(dst[0,:])]) == dst[0,:])[0]
Out[151]: array([4], dtype=int64)
If you want to return an integer index, you may use np.argmax at the last step
np.argmax(np.min(dst[0,np.nonzero(dst[0,:])]) == dst[0,:])
Out[157]: 4
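An alternative that returns an integer index directly is to mask the zero entries with inf before taking argmin; a sketch (row0 is just the first row, and it assumes the only zero in the row is the self-distance):
row0 = dst[0, :]
np.where(row0 == 0, np.inf, row0).argmin()   # 4 for the example matrix shown above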
How can I vectorize the following loop?
import numpy as np

def my_fnc():
    m = np.arange(27.).reshape((3,3,3))
    ret = np.empty_like(m)
    it = np.nditer(m, flags=['multi_index'])
    for x in it:
        i, j, k = it.multi_index
        ret[i, j, k] = x / m[i, j, i]
    return ret
Basically I'm dividing each value in m by something similar to a diagonal. Not all values in m will be different; the arange is just an example.
Thanks in advance! ~
P.S.: here's the output of the function above, don't mind the nans :)
array([[[ nan, inf, inf],
[ 1. , 1.33333333, 1.66666667],
[ 1. , 1.16666667, 1.33333333]],
[[ 0.9 , 1. , 1.1 ],
[ 0.92307692, 1. , 1.07692308],
[ 0.9375 , 1. , 1.0625 ]],
[[ 0.9 , 0.95 , 1. ],
[ 0.91304348, 0.95652174, 1. ],
[ 0.92307692, 0.96153846, 1. ]]])
Use advanced indexing to get the m[i,j,i] equivalent in one go and then simply divide the input array by it:
r = np.arange(len(m))
ret = m/m[r,:,r,None] # Add new axis with None to allow for broadcasting
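A quick way to confirm the indexed version matches the loop from the question (nan entries compared as equal; this assumes the my_fnc definition above):
m = np.arange(27.).reshape((3, 3, 3))
r = np.arange(len(m))
ret = m / m[r, :, r, None]
print(np.allclose(ret, my_fnc(), equal_nan=True))   # True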
I am using the following to calculate the running gradients between data in the same indexes across multiple matrices:
import numpy as np
array_1 = np.array([[1,2,3], [4,5,6]])
array_2 = np.array([[2,3,4], [5,6,7]])
array_3 = np.array([[1,8,9], [9,6,7]])
flat_1 = array_1.flatten()
flat_2 = array_2.flatten()
flat_3 = array_3.flatten()
print('flat_1: {0}'.format(flat_1))
print('flat_2: {0}'.format(flat_2))
print('flat_3: {0}'.format(flat_3))
data = []
gradient_list = []
for item in zip(flat_1, flat_2, flat_3):
    data.append(list(item))
    print('items: {0}'.format(list(item)))
    grads = np.gradient(list(item))
    print('grads: {0}'.format(grads))
    gradient_list.append(grads)
grad_array = np.array(gradient_list)
print('grad_array: {0}'.format(grad_array))
This doesn't look like an optimal way of doing this - is there a vectorized way of calculating gradients between data in 2d arrays?
numpy.gradient takes axis as a parameter, so you can just stack the arrays and then calculate the gradient along a certain axis; for instance, use np.dstack with axis=2. If you need a different shape as the result, just use the reshape method:
np.gradient(np.dstack((array_1, array_2, array_3)), axis=2)
#array([[[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ]],
# [[ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]]])
Or if flatten the arrays first:
np.gradient(np.column_stack((array_1.ravel(), array_2.ravel(), array_3.ravel())), axis=1)
#array([[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ],
# [ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]])
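If you would rather keep each gradient in the original (2, 3) shape of the inputs, you can stack along a new leading axis instead and differentiate along it (same idea as above, just a different stacking choice; grads is my own name):
grads = np.gradient(np.stack((array_1, array_2, array_3)), axis=0)
# grads has shape (3, 2, 3): grads[i] lines up with the shape of array_1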
What is the best way to fill in the lower triangle of a numpy array with zeros in place so that I don't have to do the following:
a=np.random.random((5,5))
a = np.triu(a)
since np.triu returns a copy, not a view. Preferably this would also require no list indexing, since I am working with large arrays.
Digging into the internals of triu you'll find that it just multiplies the input by the output of tri.
So you can just multiply the array in-place by the output of tri:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape)
>>> a
array([[ 0.46026582, 0. , 0. , 0. , 0. ],
[ 0.76234296, 0.5298908 , 0. , 0. , 0. ],
[ 0.08797149, 0.14881991, 0.9302515 , 0. , 0. ],
[ 0.54794779, 0.36896506, 0.92901552, 0.73747726, 0. ],
[ 0.62917827, 0.61674542, 0.44999905, 0.80970863, 0.41860336]])
Like triu, this still creates a second array (the output of tri), but at least it performs the operation itself in place. The argument splat (*a.shape) is a bit of a shortcut; for something more robust, consider basing your function on the full implementation of triu. Note that you can still specify a diagonal:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape, k=2)
>>> a
array([[ 0.25473126, 0.70156073, 0.0973933 , 0. , 0. ],
[ 0.32859487, 0.58188318, 0.95288351, 0.85735005, 0. ],
[ 0.52591784, 0.75030515, 0.82458369, 0.55184033, 0.01341398],
[ 0.90862183, 0.33983192, 0.46321589, 0.21080121, 0.31641934],
[ 0.32322392, 0.25091433, 0.03980317, 0.29448128, 0.92288577]])
I now see that the question title and body describe opposite behaviors. Just in case, here's how you can fill the lower triangle with zeros. This requires you to specify the -1 diagonal:
>>> a = np.random.random((5, 5))
>>> a *= 1 - np.tri(*a.shape, k=-1)
>>> a
array([[0.6357091 , 0.33589809, 0.744803 , 0.55254798, 0.38021111],
[0. , 0.87316263, 0.98047459, 0.00881754, 0.44115527],
[0. , 0. , 0.51317289, 0.16630385, 0.1470729 ],
[0. , 0. , 0. , 0.9239731 , 0.11928557],
[0. , 0. , 0. , 0. , 0.1840326 ]])
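If the float multiply bothers you, the same in-place zeroing of the strict lower triangle can be done with a boolean mask assignment; note this still allocates a temporary mask array:
>>> a = np.random.random((5, 5))
>>> a[np.tri(*a.shape, k=-1, dtype=bool)] = 0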
If speed and memory use are still a limitation and Cython is available, a short Cython function will do what you want.
Here's a working version designed for a C-contiguous array with double precision values.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef make_lower_triangular(double[:,:] A, int k):
    """ Set all the entries of array A that lie above
    diagonal k to 0. """
    cdef int i, j
    for i in range(min(A.shape[0], A.shape[0] - k)):
        for j in range(max(0, i+k+1), A.shape[1]):
            A[i,j] = 0.
This should be significantly faster than any version that involves multiplying by a large temporary array.
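For reference, one way to try it out (assuming an IPython/Jupyter session with Cython installed; a regular setup.py build works just as well):
# %load_ext Cython, then put the cpdef function above in a %%cython cell
import numpy as np

A = np.random.random((5, 5))     # float64, as the typed memoryview expects
make_lower_triangular(A, 0)      # zeros everything above the main diagonal, in place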
import numpy as np

n = 3
A = np.zeros((n, n))
for p in range(n):
    A[0, p] = p + 1
    if p > 0:
        A[1, p] = p + 3
    if p > 1:
        A[2, p] = p + 4
This creates an upper triangular matrix starting at 1.
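The same matrix can be built without the Python loop by writing consecutive values straight into the upper-triangle positions, e.g. with np.triu_indices (a sketch, not part of the original answer):
n = 3
A = np.zeros((n, n))
A[np.triu_indices(n)] = np.arange(1, n * (n + 1) // 2 + 1)
# [[1. 2. 3.]
#  [0. 4. 5.]
#  [0. 0. 6.]]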