I have a function that assigns values depending on a condition. My dataset size is usually in the range of 30-50k. I am not sure if this is the correct way to use numpy, but with more than 5k numbers it gets really slow. Is there a better way to make it faster?
import numpy as np
N = 5000  # dataset size
L = N // 2
d = 0.1
constant = 5
x = constant + d*np.random.random(N)
matrix = np.zeros([L, N])
print "Assigning matrix"
for k in xrange(L):
    for i in xrange(k+1):
        matrix[k,i] = np.random.random()
    for i in xrange(k+1, N-k-1):
        if ( x[i] > x[i-k-1] ) and ( x[i] > x[i+k+1] ):
            matrix[k,i] = 0
        else:
            matrix[k,i] = np.random.random()
    for i in xrange(N-k-1, N):
        matrix[k,i] = np.random.random()
If you use for loops, you lose the speed of numpy. The way to get speed is to use numpy's functions and vectorized operations. Can you create a random matrix first:
matrix = np.random.random((L, N))
and then do something to this matrix to get the 0's positioned where you want them? Can you elaborate on the condition for setting an entry to 0? For example, you can make the matrix and then do:
matrix[matrix > value]
to retain all values above a threshold. If the condition can be expressed as a boolean indexer or an arithmetic operation, you can speed it up. If it has to stay in the for loop (i.e. it depends on the values surrounding it as the loop cycles), it may not be possible to vectorize it.
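For the condition in the question (an entry is zeroed when x[i] is larger than both neighbours at distance k+1), the inner loops over i can be expressed as slice comparisons. The following is a minimal sketch rather than a definitive implementation: the function name build_matrix and the rng parameter are illustrative assumptions, and it is written for Python 3. It still loops over k, but each row is processed with vectorized operations.

```python
import numpy as np

def build_matrix(x, rng=None):
    """Random matrix with zeros where x[i] exceeds both neighbours at distance k+1.

    Hypothetical helper: name and signature are not from the original post.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    rows = n // 2
    matrix = rng.random((rows, n))          # start with random values everywhere
    for k in range(rows):
        s = k + 1                           # neighbour distance for this row
        centre = x[s:n - s]                 # x[i] for i in [s, n - s)
        left = x[:n - 2 * s]                # x[i - s]
        right = x[2 * s:]                   # x[i + s]
        peak = (centre > left) & (centre > right)
        matrix[k, s:n - s][peak] = 0.0      # writes through the view into matrix
    return matrix

x = 5 + 0.1 * np.random.default_rng(0).random(11)
m = build_matrix(x, np.random.default_rng(1))
```

The random values are drawn for the whole matrix at once and the zeros are written afterwards, so the result has the same structure as the loop version, although the individual random numbers differ.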
I have a very long 2-D numpy array and want to count values inside intervals. I can do it using a double loop, but it is very time consuming. Can anyone suggest a faster alternative? I guess it would be better with no loops.
Below is a simple piece of code showing what I want to do in a fast way.
import numpy as np
a = np.random.random([10000, 2])
a[:, 1] += 2 # So we have the first column with values between 0. and 1.,
             # and the 2nd column with values between 2. and 3.
for i in range(10):
    for j in range(5):
        s0 = a[a[:, 0] >= i * 0.1]
        s1 = s0[s0[:, 0] < (i+1) * 0.1]
        s2 = s1[s1[:, 1] >= 2 + j * 0.2]
        s3 = s2[s2[:, 1] < 2 + (j+1) * 0.2]
        print(len(s3))
Additional information: I tried using masked arrays, but it did not work because I need to compare an array with lower and upper limits. As far as I know, masked arrays only allow comparing values inside the numpy arrays with floats, not with another array.
The operation is inefficient because it creates a lot of temporary arrays and reads/writes relatively large arrays over and over: 4 boolean arrays + 4 floating-point arrays per iteration, and there are 50 iterations. This means 400 arrays in total. Additionally, creating an array just to count items is not efficient either; you can use np.count_nonzero instead. Note that printing is slow too, but I guess you will not use it in real-world code.
Additionally, the memory access pattern is not efficient: a[:,0] and a[:,1] are strided views, which prevents Numpy from vectorizing the code. It also causes twice as much data to be read from the memory hierarchy. The transposed version should be preferred (with a copy, so as to avoid strided views). The transposed array can be precomputed once.
Here is an improved version:
a = np.random.random([10000, 2])
a[:, 1] += 2
x, y = a.T.copy()
for i in range(10):
    for j in range(5):
        cond = x >= i * 0.1
        cond &= x < (i+1) * 0.1
        cond &= y >= 2 + j * 0.2
        cond &= y < 2 + (j+1) * 0.2
        len_s3 = np.count_nonzero(cond)
        #print(len_s3)
This is about 6 times faster on my machine. Note that boolean arrays are still created, but they are much faster to create and fill since they are 8 times smaller in memory than double-precision floating-point ones. You can use functions like np.logical_and combined with the out parameter to speed up the computation a bit, but the impact is pretty small (most of the cost comes from the internal copy and Numpy internal overheads).
If this is not enough, you can use Numba to speed this up significantly. An alternative solution is to sort the array and then perform a fast binary search on the sub-parts, though it is a bit more tricky to do.
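As a further alternative that is not part of the answer above, all 50 counts can be obtained in one call with np.histogram2d, which bins both columns at once. This is only a sketch, assuming the intervals are contiguous as in the question; the names x_edges, y_edges and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10000, 2))
a[:, 1] += 2                          # second column now lies between 2 and 3

x_edges = np.arange(11) * 0.1         # 0.0, 0.1, ..., 1.0
y_edges = 2 + np.arange(6) * 0.2      # 2.0, 2.2, ..., 3.0

# counts[i, j] is the number of rows with
# x_edges[i] <= a[:, 0] < x_edges[i+1] and y_edges[j] <= a[:, 1] < y_edges[j+1]
counts, _, _ = np.histogram2d(a[:, 0], a[:, 1], bins=[x_edges, y_edges])
counts = counts.astype(int)
```

One difference from the loop version: np.histogram2d treats the last bin on each axis as closed on the right, so a value exactly equal to the final edge is counted rather than dropped.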
numpy.nanpercentile is extremely slow.
So I wanted to use cupy.nanpercentile, but cupy.nanpercentile is not implemented yet.
Does anyone have a solution for this?
I also had a problem with np.nanpercentile being very slow for my datasets. I found a workaround that lets you use the standard np.percentile, and it can also be applied to many other libraries.
This one should solve your problem. It also works a lot faster than np.nanpercentile:
import numpy as np
arr = np.array([[np.nan,2,3,1,2,3],
                [np.nan,np.nan,1,3,2,1],
                [4,5,6,7,np.nan,9]])
mask = (arr >= np.nanmin(arr)).astype(int)  # 1 where the value is valid, 0 where it is NaN
count = mask.sum(axis=1)                    # number of valid values per row
groups = np.unique(count)
groups = groups[groups > 0]
p90 = np.zeros((arr.shape[0]))
for g in range(len(groups)):
    pos = np.where(count == groups[g])      # rows with this many valid values
    values = arr[pos]
    values = np.nan_to_num(values, nan=(np.nanmin(arr)-1))  # NaNs become smaller than every valid value
    values = np.sort(values, axis=1)
    values = values[:,-groups[g]:]          # keep only the valid values
    p90[pos] = np.percentile(values, 90, axis=1)
So instead of taking the percentile with the NaNs, it groups the rows by their number of valid values and takes the percentile of each group separately. Then it puts everything back together. This also works for 3D arrays; just use y_pos and x_pos instead of pos. And watch out for which axis you are calculating over.
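The same idea can be wrapped in a function and checked against np.nanpercentile. The sketch below is a variant, not the code above: it relies on np.sort placing NaNs at the end of each row instead of replacing them with nan_to_num, and the function name nanpercentile_by_count is an assumption made for illustration.

```python
import numpy as np

def nanpercentile_by_count(arr, q):
    """Row-wise percentile ignoring NaNs, grouping rows by their valid-value count.

    Hypothetical helper; rows that contain only NaNs yield NaN.
    """
    arr = np.asarray(arr, dtype=float)
    count = (~np.isnan(arr)).sum(axis=1)    # valid values per row
    ordered = np.sort(arr, axis=1)          # NaNs are sorted to the end of each row
    out = np.full(arr.shape[0], np.nan)
    for c in np.unique(count[count > 0]):
        rows = np.flatnonzero(count == c)
        # the first c columns of these rows hold exactly the valid values
        out[rows] = np.percentile(ordered[rows, :c], q, axis=1)
    return out

arr = np.array([[np.nan, 2, 3, 1, 2, 3],
                [np.nan, np.nan, 1, 3, 2, 1],
                [4, 5, 6, 7, np.nan, 9]])
p90 = nanpercentile_by_count(arr, 90)
```

The loop runs once per distinct valid-value count rather than once per row, which is where the saving comes from when many rows share the same number of NaNs.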
import random
import numpy as np
def testset_gen(num):
    init = []
    for i in range(num):
        a = random.randint(65, 122)  # Dummy name
        b = random.randint(1, 100)   # Dummy value: 11~100 and 10% of nan
        if b < 11:
            b = np.nan               # 10% = nan
        init.append([a, b])
    return np.array(init)
import cupy as cp
np_testset = testset_gen(30000000)   # 468,751KB
cp_testset = cp.asarray(np_testset)  # copy of the test set on the GPU
def f1_np(arr, num):
    return np.nanpercentile(arr, num)
# 55.0, 0.523902416229248 sec
print(f1_np(np_testset[:,1], 50))
def cupy_nanpercentile(arr, num):
    # percentage of the non-NaN values that are greater than num
    return len(cp.where(arr > num)[0]) / (len(arr) - cp.sum(cp.isnan(arr))) * 100
# 55.548758317136446, 0.3640251159667969 sec
# 43% faster
# If you need the same result, use int(). But you lose the saved time.
print(cupy_nanpercentile(cp_testset[:,1], 50))
I can't imagine how the test could take a few days. On my computer, that would correspond to about 1 trillion rows of data or more, so I can't reproduce the same problem with the resources I have.
Here's an implementation with numba. After it's been compiled it is more than 7x faster than the numpy version.
Right now it is set up to take the percentile along the first axis, however it could be changed easily.
import numba
import numpy as np

@numba.jit(nopython=True, cache=True)
def nan_percentile_axis0(arr, percentiles):
    """Faster implementation of np.nanpercentile

    This implementation always takes the percentile along axis 0.
    Uses numba to speed up the calculation by more than 7x.
    Function is equivalent to np.nanpercentile(arr, <percentiles>, axis=0)

    Params:
        arr (np.array): Array to calculate percentiles for
        percentiles (np.array): 1D array of percentiles to calculate

    Returns:
        (np.array) Array with first dimension corresponding to
            values as passed in percentiles
    """
    shape = arr.shape
    arr = arr.reshape((arr.shape[0], -1))
    out = np.empty((len(percentiles), arr.shape[1]))
    for i in range(arr.shape[1]):
        out[:,i] = np.nanpercentile(arr[:,i], percentiles)
    shape = (out.shape[0], *shape[1:])
    return out.reshape(shape)
I calculated the elements by double for loop as follows.
N,l=20,10
a=np.random.rand(N,l)
b=np.random.rand(N,l)
r=np.zeros((N,N,l))
for i in range(N):
    for j in range(N):
        r[i,j]=a[i]*a[j]*(b[i]-b[j])-a[i]/a[j]
Question:
How can I vectorize this calculation using broadcasting?
I also want to restrict it to i not equal to j, which means leaving the diagonal elements as zero. Can I do that by vectorization as well?
You can broadcast all of the arithmetic and remove the loops.
r2 = (a[:,None]*a) * (b[:,None]-b) - (a[:,None]/a)
# Verify the correctness
np.array_equal(r, r2)
# True
Finally, to set diagonals to zero, either use in-place assignment
r2[(np.arange(N),)*2] = 0
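Or use a boolean mask over the first two axes, which also assigns in-place. (numpy.fill_diagonal cannot be used directly on r2, because for arrays with more than two dimensions it requires all dimensions to have the same length.)
r2[np.eye(N, dtype=bool)] = 0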
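Putting the pieces together, the broadcasting can be written with explicit axes so the shapes are easier to follow. This is a self-contained sketch: the seeded generator and the 0.5 offset on a (to keep the division well behaved) are assumptions added for the example, not part of the question.

```python
import numpy as np

N, l = 20, 10
rng = np.random.default_rng(0)
a = rng.random((N, l)) + 0.5   # offset is an assumption, keeps a away from 0
b = rng.random((N, l))

# Reference: the double loop from the question
r = np.zeros((N, N, l))
for i in range(N):
    for j in range(N):
        r[i, j] = a[i] * a[j] * (b[i] - b[j]) - a[i] / a[j]

# Broadcasting: shape (N, 1, l) against (1, N, l) gives (N, N, l)
ai, aj = a[:, None, :], a[None, :, :]
bi, bj = b[:, None, :], b[None, :, :]
r2 = ai * aj * (bi - bj) - ai / aj

same = np.allclose(r, r2)      # compare before the diagonal is zeroed

# Zero the i == j entries for every position along the last axis
idx = np.arange(N)
r2[idx, idx] = 0
```

Indexing with r2[idx, idx] selects the N slices r2[i, i, :], so the assignment clears the whole last axis on the diagonal.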
I have a given array with a length of over 1'000'000 and values between 0 and 255 (included) as integers. Now I would like to plot on the x-axis the integers from 0 to 255 and on the y-axis the quantity of the corresponding x value in the given array (called Arr in my current code).
I thought about this code:
list = []
for i in range(0, 256):
    icounter = 0
    for x in range(len(Arr)):
        if Arr[x] == i:
            icounter += 1
    list.append(icounter)
But is there any way I can do this a little bit faster (it takes me several minutes at the moment)? I thought about an import ..., but wasn't able to find a good package for this.
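Use numpy.bincount for this task (look for more details here)
import numpy as np
# minlength=256 keeps a slot for every value from 0 to 255, even those that never occur
counts = np.bincount(Arr, minlength=256)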
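A small self-contained check of this approach is sketched below. The random stand-in for Arr is an assumption made for the example; the minlength argument guarantees 256 entries even if some values never appear.

```python
import numpy as np

rng = np.random.default_rng(0)
Arr = rng.integers(0, 256, size=1_000_000)   # stand-in data (assumption)

counts = np.bincount(Arr, minlength=256)     # counts[v] is how often v occurs

# Cross-check against np.unique, which only reports values that are present
values, freq = np.unique(Arr, return_counts=True)
```

The result can be plotted directly, with np.arange(256) on the x-axis and counts on the y-axis.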
While I completely agree with the previous answers that you should use a standard histogram algorithm, it's quite easy to greatly speed up your own implementation. Its problem is that you pass through the entire input for each bin, over and over again. It would be much faster to only process the input once, and then write only to the relevant bin:
def hist(arr):
    nbins = 256
    result = [0] * nbins # or np.zeros(nbins, dtype=int)
    for y in arr:
        if y>=0 and y<nbins:
            result[y] += 1
    return result
I've read a lot about different techniques for iterating over numpy arrays recently and it seems that consensus is not to iterate at all (for instance, see a comment here). There are several similar questions on SO, but my case is a bit different as I have to combine "iterating" (or not iterating) and accessing previous values.
Let's say there are N (N is small, usually 4, might be up to 7) 1-D numpy arrays of float128 in a list X, all arrays are of the same size. To give you a little insight, these are data from PDE integration, each array stands for one function, and I would like to apply a Poincare section. Unfortunately, the algorithm should be both memory- and time-efficient since these arrays are sometimes ~1Gb each, and there are only 4Gb of RAM on board (I've just learnt about memmap'ing of numpy arrays and now consider using them instead of regular ones).
One of these arrays is used for "filtering" the others, so I start with secaxis = X.pop(idx). Now I have to locate pairs of indices where (secaxis[i-1] > 0 and secaxis[i] < 0) or (secaxis[i-1] < 0 and secaxis[i] > 0) and then apply simple algebraic transformations to remaining arrays, X (and save results). Worth mentioning, data shouldn't be wasted during this operation.
There are multiple ways for doing that, but none of them seem efficient (and elegant enough) to me. One is a C-like approach, where you just iterate in a for-loop:
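import array # better than lists
res = [ array.array('d') for _ in X ]
for i in xrange(1,secaxis.size):
    # sign change between consecutive samples (the condition described above)
    if (secaxis[i-1] > 0 and secaxis[i] < 0) or (secaxis[i-1] < 0 and secaxis[i] > 0):
        co = -secaxis[i-1]/secaxis[i]
        for j in xrange(len(X)):
            res[j].append( (X[j][i-1] + co*X[j][i])/(1+co) )
This is clearly very inefficient and, besides, not a Pythonic way of doing it.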
Another way is to use numpy.nditer, but I haven't figured out yet how one accesses the previous value, though it allows iterating over several arrays at once:
# without secaxis = X.pop(idx)
it = numpy.nditer(X)
for vec in it:
    # vec[idx] is current value, how do you get the previous (or next) one?
    pass
A third possibility is to first find the sought indices with efficient numpy slices, and then use them for bulk multiplication/addition. I prefer this one for now:
res = []
inds, = numpy.where((secaxis[:-1] < 0) * (secaxis[1:] > 0) +
                    (secaxis[:-1] > 0) * (secaxis[1:] < 0))
coefs = -secaxis[inds] / secaxis[inds+1] # array of coefficients
for f in X: # loop is done only N-1 times, that is, 3 to 6
    res.append( (f[inds] + coefs*f[inds+1]) / (1+coefs) )
But this is seemingly done in 7 + 2*(N - 1) passes, moreover, I'm not sure about secaxis[inds] type of addressing (it is not slicing and generally it has to find all elements by indices just like in the first method, doesn't it?).
Finally, I've also tried using itertools and it resulted in monstrous and obscure structures, which might stem from the fact that I'm not very familiar with functional programming:
def filt(x):
    return (x[0] < 0 and x[1] > 0) or (x[0] > 0 and x[1] < 0)
import array
from itertools import izip, tee, ifilter
res = [ array.array('d') for _ in X ]
iters = [iter(x) for x in X] # N-1 iterators in a list
prev, curr = tee(izip(*iters)) # 2 similar iterators, each of which
                               # consists of N-1 iterators
next(curr, None) # one of them is now for current value
seciter = tee(iter(secaxis))
next(seciter[1], None)
for x in ifilter(filt, izip(seciter[0], seciter[1], prev, curr)):
    co = - x[0]/x[1]
    for r, p, c in zip(res, x[2], x[3]):
        r.append( (p+co*c) / (1+co) )
Not only does this look very ugly, it also takes an awful lot of time to complete.
So, I have the following questions:
Of all these methods, is the third one indeed the best? If so, what can be done to improve the last one?
Are there any other, better ones yet?
Out of sheer curiosity, is there a way to solve the problem using nditer?
Finally, will I be better off using memmap versions of numpy arrays, or will it probably slow things down a lot? Maybe I should only load secaxis array into RAM, keep others on disk and use third method?
(bonus question) List of equal in length 1-D numpy arrays comes from loading N .npy files whose sizes aren't known beforehand (but N is). Would it be more efficient to read one array, then allocate memory for one 2-D numpy array (slight memory overhead here) and read remaining into that 2-D array?
The numpy.where() version is fast enough; you can speed it up a little with method3(). If the > condition can be changed to >=, you can also use method4().
import numpy as np
a = np.random.randn(100000)
def method1(a):
    idx = []
    for i in range(1, len(a)):
        if (a[i-1] > 0 and a[i] < 0) or (a[i-1] < 0 and a[i] > 0):
            idx.append(i)
    return idx
def method2(a):
    inds, = np.where((a[:-1] < 0) * (a[1:] > 0) +
                     (a[:-1] > 0) * (a[1:] < 0))
    return inds + 1
def method3(a):
    m = a < 0
    p = a > 0
    return np.where((m[:-1] & p[1:]) | (p[:-1] & m[1:]))[0] + 1
def method4(a):
    return np.where(np.diff(a >= 0))[0] + 1
assert np.allclose(method1(a), method2(a))
assert np.allclose(method2(a), method3(a))
assert np.allclose(method3(a), method4(a))
%timeit method1(a)
%timeit method2(a)
%timeit method3(a)
%timeit method4(a)
the %timeit result:
1 loop, best of 3: 294 ms per loop
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.38 ms per loop
1000 loops, best of 3: 1.39 ms per loop
I'll need to read your post in more detail, but will start with some general observations (from previous iteration questions).
There isn't an efficient way of iterating over arrays in Python, though there are things that slow things down. I like to distinguish between the iteration mechanism (nditer, for x in A:) and the action (alist.append(...), x[i+1] += 1). The big time consumer is usually the action, done many times, not the iteration mechanism itself.
Letting numpy do the iteration in compiled code is the fastest.
xdiff = x[1:] - x[:-1]
is much faster than
xdiff = np.zeros(x.shape[0]-1)
for i in range(x.shape[0]-1):
    xdiff[i] = x[i+1] - x[i]
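The comparison can be checked with the standard timeit module. The sketch below is only an illustration: the array size, the function names and the repeat count are arbitrary assumptions, and no timing figures are quoted because they depend on the machine.

```python
import timeit
import numpy as np

x = np.random.default_rng(0).random(100_000)

def diff_sliced(x):
    # one vectorized subtraction, iteration happens in compiled code
    return x[1:] - x[:-1]

def diff_looped(x):
    # the same result computed element by element in Python
    out = np.zeros(x.shape[0] - 1)
    for i in range(x.shape[0] - 1):
        out[i] = x[i + 1] - x[i]
    return out

# Both forms agree with each other and with np.diff
assert np.array_equal(diff_sliced(x), diff_looped(x))
assert np.array_equal(diff_sliced(x), np.diff(x))

t_sliced = timeit.timeit(lambda: diff_sliced(x), number=10)
t_looped = timeit.timeit(lambda: diff_looped(x), number=10)
print(f"sliced: {t_sliced:.4f} s, looped: {t_looped:.4f} s")
```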
The np.nditer isn't any faster.
nditer is recommended as a general iteration tool in compiled code. But its main value lies in handling broadcasting and coordinating the iteration over several arrays (input/output). And you need to use buffering and c like code to get the best speed from nditer (I'll look up a recent SO question).
https://stackoverflow.com/a/39058906/901925
Don't use nditer without studying the relevant iteration tutorial page (the one that ends with a cython example).
=========================
Just judging from experience, this approach will be fastest. Yes it's going to iterate over secaxis a number of times, but those are all done in compiled code, and will be much faster than any iteration in Python. And the for f in X: iteration is just a few times.
res = []
inds, = numpy.where((secaxis[:-1] < 0) * (secaxis[1:] > 0) +
                    (secaxis[:-1] > 0) * (secaxis[1:] < 0))
coefs = -secaxis[inds] / secaxis[inds+1] # array of coefficients
for f in X:
    res.append( (f[inds] + coefs*f[inds+1]) / (1+coefs) )
@HYRY has explored alternatives for making the where step faster. But as you can see, the differences aren't that big. Other possible tweaks:
inds1 = inds+1
coefs = -secaxis[inds] / secaxis[inds1]
coefs1 = coefs+1
for f in X:
    res.append(( f[inds] + coefs*f[inds1]) / coefs1)
If X was an array, res could be an array as well.
res = (X[:,inds] + coefs*X[:,inds1])/coefs1
But for small N I suspect the list res is just as good. Don't need to make the arrays any bigger than necessary. The tweaks are minor, just trying to avoid recalculating things.
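For the case where X is a 2-D array, the whole section can be sketched as one small function. The function name poincare_section and the sine/cosine test data are assumptions made for illustration; the formula is the one used in the question, which linearly interpolates each remaining function at the zero crossing of secaxis.

```python
import numpy as np

def poincare_section(secaxis, X):
    """Interpolate the rows of X where secaxis changes sign.

    Hypothetical helper: X has shape (n_functions, n_samples).
    Returns the index just before each crossing and the interpolated values.
    """
    secaxis = np.asarray(secaxis)
    X = np.asarray(X)
    neg = secaxis < 0
    pos = secaxis > 0
    # sign change between sample i and sample i + 1
    inds = np.flatnonzero((neg[:-1] & pos[1:]) | (pos[:-1] & neg[1:]))
    inds1 = inds + 1
    coefs = -secaxis[inds] / secaxis[inds1]
    section = (X[:, inds] + coefs * X[:, inds1]) / (1 + coefs)
    return inds, section

t = np.linspace(0.0, 10.0, 2001)
secaxis = np.sin(t)                     # crosses zero at pi, 2*pi and 3*pi
X = np.vstack([np.cos(t), t])
inds, section = poincare_section(secaxis, X)
```

Here section[1] holds the interpolated crossing times and section[0] the value of the cosine at those times. Fancy indexing with inds copies only the selected columns, so the temporary arrays scale with the number of crossings rather than with the full array length.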
=================
This use of np.where is just np.nonzero. That actually makes two passes of the array, once with np.count_nonzero to determine how many values it will return, and create the return structure (list of arrays of now known length). And a second loop to fill in those indices. So multiple iterations are fine if it keeps action simple.