I am trying to devise an efficient method to perform array division in NumPy where the divisor is largely made up of 1's.
import numpy as np
A = np.random.rand(3,3)
B = np.array([[1,1,3],[1,1,1],[1,4,1]])
Result = A/B
Here, only 2 instances of the division operation are really required. I am not sure if Numpy is already optimized for division by 1 but my gut feeling is that it isn't.
Your ideas, please?
You can apply the division to selected items of A and B:
In [249]: A=np.arange(9.).reshape(3,3)
In [250]: B = np.array([[1,1,3],[1,1,1],[1,4,1]])
In [251]: I=np.nonzero(B>1)
In [252]: I
Out[252]: (array([0, 2], dtype=int32), array([2, 1], dtype=int32))
In [253]: A[I] /= B[I]
In [254]: A
Out[254]:
array([[ 0. , 1. , 0.66666667],
[ 3. , 4. , 5. ],
[ 6. , 1.75 , 8. ]])
Also a boolean index: A[B>1] /= B[B>1]
I doubt if it's faster. But for other cases, such as a B that contains 0, it is a way of avoiding errors/warnings. There must be a number of SO questions about NumPy division by zero.
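For instance, here is a minimal sketch (my own illustration, not part of the answer above) of the same idea written with np.divide's where and out parameters, so the entries where B is 1 are simply copied from A rather than divided; the same trick avoids dividing where B == 0:
import numpy as np

A = np.arange(9.).reshape(3,3)
B = np.array([[1,1,3],[1,1,1],[1,4,1]])

# Divide only where B > 1; everywhere else the value is taken from `out`,
# which we pre-fill with a copy of A.
Result = np.divide(A, B, out=A.copy(), where=B > 1)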
Interesting question. I didn't do a very thorough test, but filtering the division by searching for 1's in the denominator seems to slow things down slightly, even when the fraction of 1's is very high (99%) (see code below). This suggests that the search for 1's, denom[np.where(denom != 1.0)], slows things down. Perhaps NumPy already optimizes array divisions in this way?
import numpy as np
def div(filter=False):
    np.random.seed(1234)
    num = np.random.rand(1024)
    denom = np.random.rand(1024)
    # make ~99% of the denominator entries equal to 1
    denom[np.where(denom > .01)] = 1.0
    if not filter:
        return num/denom
    else:
        idx = np.where(denom != 1.0)[0]
        num[idx] /= denom[idx]
        return num
In [17]: timeit div(True)
10000 loops, best of 3: 89.7 µs per loop
In [18]: timeit div(False)
10000 loops, best of 3: 69.2 µs per loop
Related
Assume that I have two arrays A and B, where both A and B are m x n. My goal is now, for each row of A and B, to find where I should insert the elements of row i of A in the corresponding row of B. That is, I wish to apply np.digitize or np.searchsorted to each row of A and B.
My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?
We can add to each row an offset relative to the previous row, using the same offsets for both arrays. The idea is then to use np.searchsorted on the flattened versions of the input arrays, so that each row of b is restricted to finding sorted positions within the corresponding row of a. Additionally, to make it work for negative numbers too, we just need to account for the minimum values when choosing the offset.
So, we would have a vectorized implementation like so -
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted((a+r).ravel(), (b+r).ravel()).reshape(m,-1)
    return p - n*(np.arange(m)[:,None])
Runtime test -
In [173]: def searchsorted2d_loopy(a,b):
...: out = np.zeros(a.shape,dtype=int)
...: for i in range(len(a)):
...: out[i] = np.searchsorted(a[i],b[i])
...: return out
...:
In [174]: # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:
In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True
In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop
In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
The solution provided by @Divakar is ideal for integer data, but beware of precision issues for floating point values, especially if they span multiple orders of magnitude (e.g. [[1.0, 2.0, 3.0, 1.0e+20],...]). In some cases r may be so large that applying a+r and b+r wipes out the original values you're trying to run searchsorted on, and you're just comparing r to r.
To make the approach more robust for floating-point data, you could embed the row information into the arrays as part of the values (as a structured dtype), and run searchsorted on these structured dtypes instead.
def searchsorted_2d(a, v, side='left', sorter=None):
    import numpy as np
    # Make sure a and v are numpy arrays.
    a = np.asarray(a)
    v = np.asarray(v)
    # Augment a with row id
    ai = np.empty(a.shape, dtype=[('row', int), ('value', a.dtype)])
    ai['row'] = np.arange(a.shape[0]).reshape(-1, 1)
    ai['value'] = a
    # Augment v with row id
    vi = np.empty(v.shape, dtype=[('row', int), ('value', v.dtype)])
    vi['row'] = np.arange(v.shape[0]).reshape(-1, 1)
    vi['value'] = v
    # Perform searchsorted on the augmented arrays.
    # The row information is embedded in the values, so only the equivalent rows
    # between a and v are considered.
    result = np.searchsorted(ai.flatten(), vi.flatten(), side=side, sorter=sorter)
    # Restore the original shape and decode the searchsorted indices so they
    # apply to the original data.
    result = result.reshape(vi.shape) - vi['row']*a.shape[1]
    return result
Edit: The timing on this approach is abysmal!
In [21]: %timeit searchsorted_2d(a,b)
10 loops, best of 3: 92.5 ms per loop
You would be better off just using map over the arrays:
In [22]: %timeit np.array(list(map(np.searchsorted,a,b)))
100 loops, best of 3: 13.8 ms per loop
For integer data, @Divakar's approach is still the fastest:
In [23]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 7.26 ms per loop
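As a quick usage sketch (the values here are my own illustration, not from the original answer), this is the kind of wide-range floating-point data for which the structured-dtype version stays exact while the offset trick can lose precision:
import numpy as np

# Rows span many orders of magnitude; each row is already sorted.
a = np.array([[1.0, 2.0, 3.0, 1.0e+20],
              [1.0, 2.0, 3.0, 4.0]])
v = np.array([[1.5, 2.5],
              [0.5, 3.5]])

print(searchsorted_2d(a, v))
# expected row-wise insertion points: [[1 2], [0 3]]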
I have read this question and understand that NumPy arrays cannot be used in a boolean context. Let's say I want to perform an element-wise boolean check on the validity of inputs to a function. Can I realize this behavior while still using NumPy vectorization, and if so, how? (and if not, why?)
In the following example, I compute a value from two inputs while checking that both inputs are valid (both must be greater than 0):
import math, numpy as np
def calculate(input_1, input_2):
    if input_1 < 0 or input_2 < 0:
        return 0
    return math.sqrt(input_1) + math.sqrt(input_2)
calculate_many = (lambda x: calculate(x, 20 - x))(np.arange(-20, 40))
By itself, this would not work with NumPy arrays because it raises a ValueError (the truth value of an array is ambiguous). But it is imperative that math.sqrt is never run on negative inputs because that would result in another error.
One solution using list comprehension is as follows:
calculate_many = [calculate(x, 20 - x) for x in np.arange(-20, 40)]
However, this no longer uses vectorization and would be painfully slow if the size of the arange was increased drastically.
Is there a way to implement this if check while still using vectorization?
I believe the expression below performs vectorized operations and avoids the use of loops/lambda functions:
np.sqrt(((input1>0) & 1)*input1) + np.sqrt(((input2>0) & 1)*input2)
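As a minimal sketch of how this could be applied to the arrays from the question (the names input1 and input2 are assumptions here), note that it clamps each negative input to 0 independently, which differs slightly from the original function that zeroes the whole result when either input is negative:
import numpy as np

x = np.arange(-20, 40, dtype=float)
input1, input2 = x, 20 - x

# (input > 0) & 1 is 1 where the input is positive and 0 elsewhere,
# so np.sqrt never sees a negative value.
result = np.sqrt(((input1 > 0) & 1)*input1) + np.sqrt(((input2 > 0) & 1)*input2)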
In [121]: x = np.array([1, 10, 21, -1.])
In [122]: y = 20-x
In [123]: np.sqrt(x)
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in sqrt
#!/usr/bin/python3
Out[123]: array([1. , 3.16227766, 4.58257569, nan])
There are several ways of dealing with 'out-of-range' values.
@Sam's approach is to tweak the inputs so they are valid:
In [129]: ((x>0) & 1)*x
Out[129]: array([ 1., 10., 21., -0.])
Another is to use masking to limit which values are calculated.
Your function skips the sqrt if either input is negative; conversely, it does the calculation where both are valid. That's different from testing each input separately.
In [124]: mask = (x>=0) & (y>=0)
In [125]: mask
Out[125]: array([ True, True, False, False])
We can use the mask thus:
In [126]: res = np.zeros_like(x)
In [127]: res[mask] = np.sqrt(x[mask]) + np.sqrt(y[mask])
In [128]: res
Out[128]: array([5.35889894, 6.32455532, 0. , 0. ])
In my comments I suggested using the where parameter of np.sqrt. It does, though, need an out parameter as well.
In [130]: np.sqrt(x, where=mask, out=np.zeros_like(x)) + np.sqrt(y, where=mask, out=np.zeros_like(x))
Out[130]: array([5.35889894, 6.32455532, 0. , 0. ])
Alternatively, if we are happy with the nan in Out[123], we can just suppress the RuntimeWarning.
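A minimal sketch of suppressing that warning locally with np.errstate (my own addition, not part of the answer above); the resulting nan entries can then be replaced afterwards if needed:
import numpy as np

x = np.array([1, 10, 21, -1.])
y = 20 - x

# Silence the "invalid value encountered in sqrt" warning for this block only.
with np.errstate(invalid='ignore'):
    res = np.sqrt(x) + np.sqrt(y)

res = np.nan_to_num(res)   # optionally turn the nan entries into 0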
I have a number of 1-dimensional numpy ndarrays containing the path length between a given node and all other nodes in a network for which I would like to calculate the average. The matter is complicated though by the fact that if no path exists between two nodes the algorithm returns a value of 2147483647 for that given connection. If I leave this value untreated it would obviously grossly inflate my average as a typical path length would be somewhere between 1 and 3 in my network.
One option of dealing with this would be to loop through all elements of all arrays and replace 2147483647 with NaN and then use numpy.nanmean to find the average though that is probably not the most efficient method of going about it. Is there a way of calculating the average with numpy just ignoring all values of 2147483647?
I should add that, I could have up to several million arrays with several million values to average over so any performance gain in how the average is found will make a real difference.
Why not use the usual NumPy filtering for this?
m = my_array[my_array != 2147483647].mean()
By the way, if you really want speed, the whole algorithm you describe sounds rather naive and could be improved by a lot.
Oh and I guess that you are calculating the mean because you have rigorously checked that the underlying distribution is normal so that it means something, aren't you?
np.nanmean(np.where(my_array == 2147483647, np.nan, my_array))
Timings
a = np.random.randn(100000)
a[::10] = 2147483647
%timeit np.nanmean(np.where(a == 2147483647, np.nan, a))
1000 loops, best of 3: 639 µs per loop
%timeit a[a != 2147483647].mean()
1000 loops, best of 3: 259 µs per loop
import pandas as pd
%timeit pd.Series(a).ne(2147483647).mean()
1000 loops, best of 3: 493 µs per loop
One way would be to get the sum of all elements in one go and then remove the contribution from the invalid ones. Finally, to get the average value itself, we divide by the number of valid elements. So, we would have an implementation like so -
def mean_ignore_num(arr, num):
    # Get count of invalid ones
    invc = np.count_nonzero(arr == num)
    # Sum everything, remove the contribution from num, and
    # divide by the count of valid elements
    return (arr.sum() - invc*num)/float(arr.size-invc)
Verify results -
In [191]: arr = np.full(10,2147483647).astype(np.int32)
...: arr[1] = 5
...: arr[4] = 4
...:
In [192]: arr.max()
Out[192]: 2147483647
In [193]: arr.sum() # Sum exceeds the int32 max, but NumPy accumulates in a wider dtype, so no overflow
Out[193]: 17179869185
In [194]: arr[arr != 2147483647].mean()
Out[194]: 4.5
In [195]: mean_ignore_num(arr,2147483647)
Out[195]: 4.5
Runtime test -
In [38]: arr = np.random.randint(0,9,(10000))
In [39]: arr[arr != 7].mean()
Out[39]: 3.6704609489462414
In [40]: mean_ignore_num(arr,7)
Out[40]: 3.6704609489462414
In [41]: %timeit arr[arr != 7].mean()
10000 loops, best of 3: 102 µs per loop
In [42]: %timeit mean_ignore_num(arr,7)
10000 loops, best of 3: 36.6 µs per loop
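The same sum-and-count idea extends to many arrays without concatenating them or building NaN copies; a rough sketch (my own extension, not part of the answer above):
def mean_ignore_num_many(arrays, num):
    # Accumulate the valid sum and the valid count across all arrays.
    total = 0.0
    count = 0
    for arr in arrays:
        invc = np.count_nonzero(arr == num)
        total += arr.sum() - invc*num
        count += arr.size - invc
    return total/count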
In python, I have an array X with N rows (the number of examples) and n columns (the number of features).
If I want to calculate the second order moment matrix C
C[i,j] = E(x_i x_j)
then I have two possibility:
First, do the loop:
for i in range(N):
    for j in range(n):
        for k in range(n):
            C[j,k] = C[j,k] + X[i,j]*X[i,k]/N
Second, more simply, use the NumPy matrix product:
import numpy as np
C = np.transpose(X).dot(X)/N
The second version is much faster in practice.
If now I want to calculate the third order moment matrix T
T[i,j,k] = E(x_i x_j x_k)
then the loop alternative is easy:
for i in range(N):
    for j in range(n):
        for k in range(n):
            for m in range(n):
                T[j,k,m] = T[j,k,m] + X[i,j]*X[i,k]*X[i,m]/N
Is there a fast way using numpy libraries to calculate this last tensor, like for the second order moment?
You can use NumPy's einsum notation to solve both your second and third order cases.
Second order :
np.einsum('ij,ik->jk',X,X)/N
Third order :
np.einsum('ij,ik,il->jkl',X,X,X)/N
As can be seen, it is easy and intuitive to extend this to higher order cases.
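As a quick sanity check (a small, self-contained sketch with arbitrary sizes, not from the original answer), the einsum expression matches the explicit triple loop for the third-order moment:
import numpy as np

N, n = 50, 4
X = np.random.rand(N, n)

# Explicit loop version of T[j,k,m] = E(x_j x_k x_m)
T_loop = np.zeros((n, n, n))
for i in range(N):
    for j in range(n):
        for k in range(n):
            for m in range(n):
                T_loop[j,k,m] += X[i,j]*X[i,k]*X[i,m]/N

T_einsum = np.einsum('ij,ik,il->jkl', X, X, X)/N
print(np.allclose(T_loop, T_einsum))   # True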
I know it is not perfect in terms of speed, but why not use np.power(x, 3).sum() / N? It is slower than the dot product, but faster than looping.
In [1]: import numpy as np
In [2]: x = np.random.rand(10000)
In [3]: x.dot(x.T)
Out[3]: 3373.6189738897856
In [4]: %timeit(x.dot(x.T))
The slowest run took 48.74 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.39 µs per loop
In [5]: %timeit(np.power(x, 2).sum())
The slowest run took 4.14 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 140 µs per loop
In [6]: np.power(x, 2).sum()
Out[6]: 3373.6189738897865
Btw, that's how I calculate the moments...
I have a mathematical function of this form $f(x)=\sum_{j=0}^{N} x^j \sin(jx)$ that I would like to compute efficiently in Python. N is of order ~100. This function f is evaluated thousands of times for all entries x of a huge matrix, and therefore I would like to improve the performance (the profiler indicates that calculation of f takes up most of the time). In order to avoid the loop in the definition of the function f I wrote:
def f(x):
    J = np.arange(0, N+1)
    return sum(x**J*np.sin(J*x))
The issue is that if I want to evaluate this function for all entries of a matrix, I would need to use numpy.vectorize first, but as far as I know that is not necessarily faster than a for loop.
Is there an efficient way to perform a calculation of this type?
Welcome to Stack Overflow! ^^
Well, calculating something ** 100 is serious business. But notice how, when you declare your array J, you are forcing your function to calculate x, x^2, x^3, x^4, ... (and so on) independently.
Let us take for example this function (which is what you are using):
def powervector(x, n):
    return x ** np.arange(0, n)
And now this other function, which does not even use NumPy:
def power(x, n):
    result = [1., x]
    aux = x
    for i in range(2, n):
        aux *= x
        result.append(aux)
    return result
Now, let us verify that they both calculate the same thing:
In []: sum(powervector(1.1, 10))
Out[]: 15.937424601000005
In []: sum(power(1.1, 10))
Out[]: 15.937424601000009
Cool, now let us compare the performance of both (in iPython):
In [36]: %timeit sum(powervector(1.1, 10))
The slowest run took 20.42 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 3.52 µs per loop
In [37]: %timeit sum(power(1.1, 10))
The slowest run took 5.28 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.13 µs per loop
It is faster, as you are not calculating each power of x independently, because you know that x^N == x^(N-1) * x and you take advantage of it.
You could use this to see if your performance improves. Of course you can change power() to use NumPy vectors as output. You can also have a look at Numba, which is easy to try and may improve performance a bit as well.
As you see, this is only a hint on how to improve some part of your problem. I bet there are a couple of other ways to further improve your code! :-)
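For example, a minimal sketch (my own variant, not from the original answer) of power() rewritten to return a NumPy array using one cumulative product instead of n independent exponentiations:
import numpy as np

def power_np(x, n):
    # Build [1, x, x**2, ..., x**(n-1)] with a running product.
    out = np.empty(n)
    out[0] = 1.0
    out[1:] = x
    return np.cumprod(out)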
Edit
It seems that Numba might not be a bad idea... Simply adding the @numba.jit decorator:
import numba

@numba.jit
def powernumba(x, n):
    result = [1., x]
    aux = x
    for i in range(2, n):
        aux *= x
        result.append(aux)
    return result
Then:
In [52]: %timeit sum(power(1.1, 100))
100000 loops, best of 3: 7.67 µs per loop
In [51]: %timeit sum(powernumba(1.1, 100))
The slowest run took 5.64 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.64 µs per loop
It seems Numba can do some magic there. ;-)
For a scalar x:
>>> import numpy as np
>>> x = 0.5
>>> jj = np.arange(10)
>>> x**jj
array([ 1. , 0.5 , 0.25 , 0.125 , 0.0625 ,
0.03125 , 0.015625 , 0.0078125 , 0.00390625, 0.00195312])
>>> np.sin(jj*x)
array([ 0. , 0.47942554, 0.84147098, 0.99749499, 0.90929743,
0.59847214, 0.14112001, -0.35078323, -0.7568025 , -0.97753012])
>>> (x**jj * np.sin(jj*x)).sum()
0.64489974041068521
Notice the use of the sum method of numpy arrays (equivalently, use np.sum not built-in sum).
If your x is itself an array (here a 1-D array of length 3), use broadcasting:
>>> a = x[:, None]**jj
>>> a.shape
(3, 10)
>>> x[0]**jj == a[0]
array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
Then sum over a second axis:
>>> res = a * np.sin(jj * x[:, None])
>>> res.shape
(3, 10)
>>> res.sum(axis=1)
array([ 0.01230993, 0.0613201 , 0.17154859])
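Putting the pieces together, here is a minimal sketch (my own wrapper around the broadcasting idea above, with N as an assumed parameter) that evaluates f over every entry of a matrix by adding a trailing axis for j:
import numpy as np

def f_vectorized(X, N=100):
    # f(x) = sum_{j=0}^{N} x**j * sin(j*x), evaluated elementwise on X.
    jj = np.arange(N + 1)
    Xe = np.asarray(X)[..., None]      # trailing axis for j
    return (Xe**jj * np.sin(jj*Xe)).sum(axis=-1)

# Example: a 2x2 matrix of inputs
print(f_vectorized(np.array([[0.1, 0.2], [0.3, 0.4]])))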