I have a = np.array([np.array([[1,2,3,4],[2,3,4,5]]), np.array([6,7,8,9])], dtype=object), an object array holding two arrays of different shapes. I want to take the dot product of each of them with some vector v.
I tried to vectorize the np.dot function: vfunc = np.vectorize(np.dot), and applied vfunc to my array: vfunc(a, v), where v is the vector I want to take the dot product with. However, I get this error: ValueError: setting an array element with a sequence. Is there any other way to do this?
Since you are passing an object-dtype array as an argument, you need to specify the 'O' result type as well. Without otypes, vectorize tries to deduce the return dtype, and may do so wrongly. That's just one of the pitfalls of using np.vectorize:
In [196]: f = np.vectorize(np.dot, otypes=['O'])
In [197]: x = np.array([[1,2,3],[1,2,3,4]])
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
In [199]: f(x, x)
Out[199]: array([14, 30], dtype=object)
Another problem with np.vectorize is that it is slower than alternatives:
In [200]: f1 = np.frompyfunc(np.dot, 2,1)
In [201]: f1(x,x)
Out[201]: array([14, 30], dtype=object)
In [202]: np.array([np.dot(i,j) for i,j in zip(x,x)])
Out[202]: array([14, 30])
In [203]: timeit f(x, x)
27.1 µs ± 229 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [204]: timeit f1(x,x)
16.9 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [205]: timeit np.array([np.dot(i,j) for i,j in zip(x,x)])
21.3 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
np.vectorize has a clear speed disclaimer in its docs. Read them in full; it isn't as simple a function as you might think, and the name can be misleading.
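For completeness, here's a minimal sketch applied to the original question. excluded is a real np.vectorize parameter that keeps v from being broadcast against a; the test vector v here is a made-up example.
import numpy as np

a = np.array([np.array([[1, 2, 3, 4], [2, 3, 4, 5]]),
              np.array([6, 7, 8, 9])], dtype=object)
v = np.array([1, 1, 1, 1])   # hypothetical vector of matching length

# excluded=[1] stops vectorize from trying to broadcast v against a
f = np.vectorize(np.dot, otypes=['O'], excluded=[1])
print(f(a, v))               # [array([10, 14]) 30]

# a plain comprehension does the same job
print(np.array([np.dot(x, v) for x in a], dtype=object))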
I use the code below to create an empty matrix:
import numpy as np
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(x)
y = np.empty_like(x)
print(y)
# I get the output below:
[[2097184 2097184 2097184]
[2097184 2097184 2097184]
[2097184 2097184 2097184]
[2097184 2097184 2097184]]
Why does 2097184 stand for "empty"?
It doesn't stand for anything. From the documentation:
This function does not initialize the returned array; to do that use zeros_like or ones_like instead. It may be marginally faster than the functions that do set the array values.
So the contents of the array are whatever happened to be in the memory allocated for it. In this case, that was a bunch of 2097184 values. The next time you try it, you'll probably get something different.
You use this when you don't care what's in the array, because you're going to overwrite it.
The empty_like method does not initialize the array (which is why it's much faster than zeros_like and ones_like). The returned array has exactly the same shape as x, but its values are uninitialized: effectively arbitrary leftovers from the memory allocated to the array.
In addition, it's just a more efficient alternative to zeros_like or ones_like:
%%timeit
np.zeros_like(x)
>>> 18.4 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
np.ones_like(x)
>>> 14.1 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
np.empty_like(x)
>>> 2.09 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
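To make the intended usage concrete, here's a minimal sketch (using the same x as above): allocate uninitialized, then overwrite every element before reading any of them.
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
y = np.empty_like(x)        # contents are garbage at this point
for i, row in enumerate(x):
    y[i] = row * 2          # every element is written before use
print(y)                    # now fully defined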
I have a 1D array of integers with D elements (i.e. idx = np.array([i0, i1, ...]), s.t. idx.size = D), where each element corresponds to the index along that dimension of an ND array with D dimensions (i.e. data s.t. data.ndim = D). How can I index the data array using the index array idx?
In Python I would do data[tuple(idx)], but tuple(idx) isn't supported in numba's nopython mode.
My current workaround is to use data.ravel() and convert from ND indices to 1D indices of the flattened array, but it seems like there must be an easier (and computationally faster) solution. Is there a take_along_each_axis(data, idx) method somewhere?
Let's do a bit of time testing:
In [135]: data = np.ones((100,100,100,100)); idx = (50,50,50,50)
That's nearly a GB of memory: not big enough to cause a memory error, but still a reasonable test. (In fact I get the same basic-indexing time for much smaller arrays, and for other idx values.)
In [136]: timeit data[idx]
212 ns ± 9.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
the interpreter translates that into a method call:
In [137]: timeit data.__getitem__(idx)
283 ns ± 4.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Indexing the 'flat' array can be done with:
In [138]: timeit data.flat[np.ravel_multi_index(idx,data.shape)]
6.65 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
or taking the conversion out of the loop:
In [139]: %%timeit x=np.ravel_multi_index(idx,data.shape)
...: data.flat[x]
574 ns ± 23.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [142]: %%timeit x=np.ravel_multi_index(idx,data.shape);df=data.flat
...: df[x]
345 ns ± 6.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I think there are cases where flat indexing is faster, but this isn't one.
As a standalone operation, I don't see the point of writing an njit version. If it's part of some larger operation, it could be worth it.
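If the lookup does have to happen inside nopython code, here is a minimal njit sketch of the ravel workaround the question already describes, assuming C-contiguous data (the running flat = flat * shape[d] + idx[d] accumulation reproduces np.ravel_multi_index for C order); getitem_nd is a made-up name:
import numpy as np
from numba import njit

@njit
def getitem_nd(data, idx):
    # accumulate the flat C-order index from the per-dimension indices,
    # equivalent to np.ravel_multi_index(idx, data.shape)
    flat = 0
    for d in range(data.ndim):
        flat = flat * data.shape[d] + idx[d]
    return data.ravel()[flat]

data = np.ones((100, 100, 100, 100))
idx = np.array([50, 50, 50, 50])
assert getitem_nd(data, idx) == data[50, 50, 50, 50]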
Assume A is a sparse 2D binary tensor (NxN) and B is a vector (1xN). I want to scale columns of A by B: A_{i, j} <- A_{i, j} x B_{j}
I am looking for an efficient way to do this. I have tried two methods so far, but both are too slow:
Convert B to the diagonal matrix BD and use tf.sparse_tensor_dense_matmul(A, BD). This returns a dense matrix and, as a result, the method is not efficient.
Using map_fn for sparse tensors: tf.SparseTensor(A.indices, tf.map_fn(func, (A.values, A.indices[:, 1]), dtype=tf.float32), A.dense_shape). This returns a sparse tensor, but it seems that either map_fn or SparseTensor is the bottleneck.
Thank you!
The faster solution is probably to call gather on your tensor B so you can multiply the values of A directly. Another option is tf.sparse.map_values, creating your vector of updates with tf.gather.
On a 99% empty sparse matrix of size 1000x1000:
>>> %timeit tf.sparse.SparseTensor(A.indices, A.values*tf.gather(B,A.indices[:,1]), A.dense_shape)
855 µs ± 86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The map_values approach:
>>> %timeit tf.sparse.map_values(tf.multiply, A, tf.gather(B,A.indices[:,1]))
978 µs ± 73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compared to the sparse_dense_matmul approach:
>>> %timeit tf.sparse.from_dense(tf.sparse.sparse_dense_matmul(A,tf.linalg.diag(B)))
26.7 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
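For reference, here is a self-contained sketch of the setup behind those timings; the random A and B are stand-ins (any SparseTensor and length-N vector will do):
import tensorflow as tf

# build a ~99% empty 1000x1000 sparse matrix and a length-1000 vector
dense = tf.cast(tf.random.uniform((1000, 1000)) > 0.99, tf.float32)
A = tf.sparse.from_dense(dense)
B = tf.random.uniform((1000,))

# A.indices[:, 1] holds the column of each stored value, so gather pulls
# the right per-column scale for every nonzero, with no dense intermediate
scaled = tf.sparse.SparseTensor(A.indices,
                                A.values * tf.gather(B, A.indices[:, 1]),
                                A.dense_shape)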
I have a numpy array, X, of shape (200, 200, 1500). I also have a function, func, that essentially returns the mean of an array (it does a few other things, but they are all numpy operations; you can think of it as np.mean). If I want to apply this function along the last axis I can just do np.apply_along_axis(func, 2, X). But I also have a truth array of shape (200, 200, 1500), and I want to apply func only where the truth array is True, ignoring any places where it is False. So, going back to the np.mean example, it would take the mean along the last axis for each index but ignore some arbitrary set of positions.
So in practice, my solution would be to convert X into a new array Y of shape (200, 200) whose elements are lists, built using the truth array, and then apply func to each list. The problem is this seems very time consuming, and I feel like there should be a numpy-oriented solution for this. Is there?
If the array-of-lists approach is the best way, how would I go about combining X and the truth array to get Y?
Any suggestions or comments appreciated.
In [268]: X = np.random.randint(0,100,(200,200,1500))
Let's check how apply works with just np.mean:
In [269]: res = np.apply_along_axis(np.mean, 2, X)
In [270]: res.shape
Out[270]: (200, 200)
In [271]: timeit res = np.apply_along_axis(np.mean, 2, X)
1.2 s ± 36.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
An equivalent using iteration on the first two dimensions. I'm using reshape to make it easier to write; speed should be about the same with a double loop.
In [272]: res1 = np.reshape([np.mean(row) for row in X.reshape(-1,1500)],(200,200))
In [273]: np.allclose(res, res1)
Out[273]: True
In [274]: timeit res1 = np.reshape([np.mean(row) for row in X.reshape(-1,1500)],(200,200))
906 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So apply may be convenient, but it is not a speed tool.
For speed in numpy you need to maximize the use of compiled code and avoid unnecessary Python-level loops.
In [275]: res2 = np.mean(X,axis=2)
In [276]: np.allclose(res2,res)
Out[276]: True
In [277]: timeit res2 = np.mean(X,axis=2)
120 ms ± 619 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If using apply in your new case is hard, you don't lose anything by using something you do understand.
masked
In [278]: mask = np.random.randint(0,2, X.shape).astype(bool)
The [272] iteration can be adapted to work with mask:
In [279]: resM1 = np.reshape([np.mean(row[m]) for row,m in zip(X.reshape(-1,1500),mask.reshape(-1,1500))],X.shape[:2])
In [280]: timeit resM1 = np.reshape([np.mean(row[m]) for row,m in zip(X.reshape(-1,1500),mask.reshape(-1,1500))],X.shape[:2])
1.43 s ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This might have problems if row[m] is empty. np.mean([]) produces a warning and nan value.
Applying the mask to X before any further processing loses the dimensional information.
In [282]: X[mask].shape
Out[282]: (30001416,)
apply only works with one array, so it would be awkward (though not impossible) to use it to iterate over both X and mask. A structured array with data and mask fields might do the job, but the previous timings show there's no speed advantage.
masked array
I don't usually expect masked arrays to offer speed, but in this case it helps:
In [285]: xM = np.ma.masked_array(X, ~mask)
In [286]: resMM = np.ma.mean(xM, axis=2)
In [287]: np.allclose(resM1, resMM)
Out[287]: True
In [288]: timeit resMM = np.ma.mean(xM, axis=2)
849 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.nanmean
There's a set of functions that use np.nan masking:
In [289]: Xfloat = X.astype(float)
In [290]: Xfloat[~mask] = np.nan
In [291]: resflt = np.nanmean(Xfloat, axis=2)
In [292]: np.allclose(resM1, resflt)
Out[292]: True
In [293]: %%timeit
...: Xfloat = X.astype(float)
...: Xfloat[~mask] = np.nan
...: resflt = np.nanmean(Xfloat, axis=2)
2.17 s ± 200 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This doesn't help :(
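One more compiled-code option, not timed above: compute the masked mean directly as a masked sum over a mask count. This sketch assumes every (i, j) position has at least one True in mask; where the count is zero the division warns and yields nan, just like np.mean([]). The result agrees with resM1 (up to floating point).
# zero the masked-off entries, sum along the last axis, and divide
# by the number of True entries at each (i, j)
resSC = (X * mask).sum(axis=2) / mask.sum(axis=2)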
I am interested in finding the fastest way of carrying out a simple operation in Python 3.6 using numpy. I wish to apply a function to a given array to obtain the array of function values. Here is a simplified version that does this using map:
import numpy as np
def func(x):
    return x**2
xRange = np.arange(0,1,0.01)
arr_func = np.array(list(map(func, xRange)))
However, as I am running it with a complicated function and using large arrays, runtime speed is very important for me. Is there a known faster way?
EDIT My question is not the same as this one, because I am asking about assigning from a function, as opposed to a generator.
Check the related How do I build a numpy array from a generator?, where the most compelling option seems to be preallocating the numpy array and setting values, instead of creating a throwaway intermediate list.
arr_func = np.empty(len(xRange))
for i in range(len(xRange)):
    arr_func[i] = func(xRange[i])
With a complex function that can't be rewritten with compiled numpy functions, we can't make big improvements in speed.
Define a function with math methods that require scalars, for example:
import math

def func(x):
    return math.sin(x)**2 + math.cos(x)**2
In [868]: x = np.linspace(0,np.pi,10000)
For reference, time a straightforward list comprehension:
In [869]: np.array([func(i) for i in x])
Out[869]: array([ 1., 1., 1., ..., 1., 1., 1.])
In [870]: timeit np.array([func(i) for i in x])
13.4 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Your list(map(...)) version is slightly faster:
In [871]: timeit np.array(list(map(func, x)))
12.6 ms ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For a 1d array like this, np.array can be replaced with np.fromiter. It works with a generator as well, including the Py3 map.
In [875]: timeit np.fromiter(map(func, x),float)
13.1 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So that could get around the possible time penalty of creating a whole list first. But in this case it doesn't help.
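One small variation I haven't timed here: np.fromiter accepts a count argument, which lets it preallocate the output instead of growing it as the iterator runs:
# count=len(x) lets fromiter preallocate the result array up front
arr_func = np.fromiter(map(func, x), dtype=float, count=len(x))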
Another option is np.frompyfunc. It is used by np.vectorize, but is usually faster, with less overhead. It returns an object-dtype array:
In [876]: f = np.frompyfunc(func, 1, 1)
In [877]: f(x)
Out[877]: array([1.0, 1.0, 1.0, ..., 1.0, 1.0, 1.0], dtype=object)
In [878]: timeit f(x)
11.1 ms ± 298 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [879]: timeit f(x).astype(float)
11.2 ms ± 85.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
A slight speed improvement. I noticed more of an improvement with a 1000-item x. frompyfunc is even better if your problem involves several arrays that broadcast against each other.
Assigning to a preallocated out array may save memory, and is often recommended as an alternative to building a list by appending. But here it doesn't give a speed improvement:
In [882]: %%timeit
...: out = np.empty_like(x)
...: for i,j in enumerate(x): out[i]=func(j)
16.1 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(Using enumerate is slightly faster than iterating over range.)