Efficiently adding two different sized one dimensional arrays - python

I want to add two numpy arrays of different sizes starting at a specific index. As I need to do this couple of thousand times with large arrays, this needs to be efficient, and I am not sure how to do this efficiently without iterating through each cell.
a = [5,10,15]
b = [0,0,10,10,10,0,0]
res = add_arrays(b,a,2)
print(res) => [0,0,15,20,25,0,0]
naive approach:
# b is the bigger array
def add_arrays(b, a, i):
for j in range(len(a)):
b[i+j] = a[j]

You might assign smaller one into zeros array then add, I would do it following way
import numpy as np
a = np.array([5,10,15])
b = np.array([0,0,10,10,10,0,0])
z = np.zeros(b.shape,dtype=int)
z[2:2+len(a)] = a # 2 is offset
res = z+b
print(res)
output
[ 0 0 15 20 25 0 0]
Disclaimer: I assume that offset + len(a) is always less or equal len(b).

Nothing wrong with your approach. You cannot get better asymptotic time or space complexity. If you want to reduce code lines (which is not an end in itself), you could use slice assignment and some other utils:
def add_arrays(b, a, i):
b[i:i+len(a)] = map(sum, zip(b[i:i+len(a)], a))
But the functional overhead should makes this less performant, if anything.
Some docs:
map
sum
zip

It should be faster than Daweo answer, 1.5-5x times (depending on the size ratio between a and b).
result = b.copy()
result[offset: offset+len(a)] += a

Related

faster way to erode/dilate images

I'm making a script thats does some mathemagical morphology on images (mainly gis rasters). Now, I've implemented erosion and dilation, with opening/closing with reconstruction still on the TODO but thats not the subject here.
My implementation is very simple with nested loops, which I tried on a 10900x10900 raster and it took an absurdly long amount of time to finish, obviously.
Before I continue with other operations, I'd like to know if theres a faster way to do this?
My implementation:
def erode(image, S):
(m, n) = image.shape
buffer = np.full((m, n), 0).astype(np.float64)
for i in range(S, m - S):
for j in range(S, n - S):
buffer[i, j] = np.min(image[i - S: i + S + 1, j - S: j + S + 1]) #dilation is just np.max()
return buffer
I've heard about vectorization but I'm not quite sure I understand it too well. Any advice or pointers are appreciated. Also I am aware that opencv has these morphological operations, but I want to implement my own to learn about them.
The question here is do you want a more efficient implementation because you want to learn about numpy or do you want a more efficient algorithm.
I think there are two obvious things that could be improved with your approach. One is you want to avoid looping on the python level because that is slow. The other is that your taking a maximum of overlapping parts of arrays and you can make it more efficient if you reuse all the effort you put in finding the last maximum.
I will illustrate that with 1d implementations of erosion.
Baseline for comparison
Here is basically your implementation just a 1d version:
def erode(image, S):
n = image.shape[0]
buffer = np.full(n, 0).astype(np.float64)
for i in range(S, n - S):
buffer[i] = np.min(image[i - S: i + S + 1]) #dilation is just np.max()
return buffer
You can make this faster using stride_tricks/sliding_window_view. I.e. by avoiding the loops and doing that at the numpy level.
Faster Implementation
np.lib.stride_tricks.sliding_window_view(arr,2*S+1).min(1)
Notice that it's not quite doing the same since it only starts calculating values once there are 2S+1 values to take the maximum of. But for this illustration I will ignore this problem.
Faster Algorithm
A completely different approach would be to not start calculating the min from scratch but keeping the values ordered and only adding one and removing one when considering the next window one to the right.
Here is a ruff implementation of that:
def smart_erode(arr, m):
n = arr.shape[0]
sd = SortedDict()
for new in arr[:m]:
if new in sd:
sd[new] += 1
else:
sd[new] = 1
for to_remove,new in zip(arr[:-m+1],arr[m:]):
yield sd.keys()[0]
if new in sd:
sd[new] += 1
else:
sd[new] = 1
if sd[to_remove] > 1:
sd[to_remove] -= 1
else:
sd.pop(to_remove)
yield sd.keys()[0]
Notice that an ordered set wouldn't work and an ordered list would have to have a way to remove just one element with a specific value sind you could have repeated values in your array. I am using an ordered dict to store the amount of items present for a value.
A Ruff Benchmark
I want to illustrate how the 3 implementations compare for different window sizes. So I am testing them with an array of 10^5 random integers for different window sizes ranging from 10^3 to 10^4.
arr = np.random.randint(0,10**5,10**5)
sliding_window_times = []
op_times = []
better_alg_times = []
for m in np.linspace(0,10**4,11)[1:].astype('int'):
x = %timeit -o -n 1 -r 1 np.lib.stride_tricks.sliding_window_view(arr,2*m+1).min(1)
sliding_window_times.append(x.best)
x = %timeit -o -n 1 -r 1 erode(arr,m)
op_times.append(x.best)
x = %timeit -o -n 1 -r 1 tuple(smart_erode(arr,2*m+1))
better_alg_times.append(x.best)
print("")
pd.DataFrame({"Baseline Comparison":op_times,
'Faster Implementation':sliding_window_times,
'Faster Algorithm':better_alg_times,
},
index = np.linspace(0,10**4,11)[1:].astype('int')
).plot.bar()
Notice that for very small window sizes the raw power of the numpy implementation wins out but very quickly the amount of work we are saving by not calculating the min from scratch is more important.

Find the location(Indices) of N elements in a huge numpy array

I have a set of say, 5 elements,
[21,103,3,10,243]
and a huge Numpy array
[4,5,1,3,5,100,876,89,78......456,64,3,21,245]
with the 5 elements appearing repetitively in the bigger array.
I want to find all the Indices where the elements of the small list appears in the larger array.
The small list will be less than 100 elements long and the large list will be about 10^7 elements long, and so, speed is a concern here. What is the most elegant and the fastest way to do it in python3.x ?
I have tried using np.where() but it works dead slow. Looking for a faster way.
You can put the 100 elements to be found in a set, a hash table.
Then loop through the elements of the huge array asking if the element is in the set.
S = set([21,103,3,10,243])
A = [4,5,1,3,5,100,876,89,78......456,64,3,21,245]
result = []
for i,x in enumerate(A):
if x in S:
result.append(i)
To speed up things, you can optimize like this:
Sort the larger array
Perform binary search (on the larger array) for each number in the smaller array.
Time Complexity
Sorting using numpy.sort(kind='heapsort') will have time complexity n*log(n).
Binary search will have complexity log(n) for each element in the smaller array. Assuming, there are m elements in the smaller array, the total search complexity will be m*log(n).
Overall, this will provide you good optimization.
smaller_array = [21,103,3,10,243]
bigger_array = [4,5,1,3,5,100,876,89,78,456,64,3,21,243,243]
print(bigger_array)
print(smaller_array)
for val in smaller_array:
if val in bigger_array:
c = bigger_array.index(val)
while True:
print(f'{val} is found in bigger_array at index {bigger_array.index(val,c)}')
c = bigger_array.index(val,c)+1
if val not in bigger_array[c:]:
break
smaller_array = [21,103,3,10,243]
bigger_array = [4,5,1,3,5,100,876,89,78,456,64,3,21,243,243]
print(bigger_array)
print(smaller_array)
for val in smaller_array:
if val in bigger_array:
c=0
try:
while True:
c = bigger_array.index(val,c)
print(f'{val} is found in bigger_array at index {c}')
c+=1
except:
pass

amplitude spectrum in Python

I have a given array with a length of over 1'000'000 and values between 0 and 255 (included) as integers. Now I would like to plot on the x-axis the integers from 0 to 255 and on the y-axis the quantity of the corresponding x value in the given array (called Arr in my current code).
I thought about this code:
list = []
for i in range(0, 256):
icounter = 0
for x in range(len(Arr)):
if Arr[x] == i:
icounter += 1
list.append(icounter)
But is there any way I can do this a little bit faster (it takes me several minutes at the moment)? I thought about an import ..., but wasn't able to find a good package for this.
Use numpy.bincount for this task (look for more details here)
import numpy as np
list = np.bincount(Arr)
While I completely agree with the previous answers that you should use a standard histogram algorithm, it's quite easy to greatly speed up your own implementation. Its problem is that you pass through the entire input for each bin, over and over again. It would be much faster to only process the input once, and then write only to the relevant bin:
def hist(arr):
nbins = 256
result = [0] * nbins # or np.zeroes(nbins)
for y in arr:
if y>=0 and y<nbins:
result[y] += 1
return result

Speed up double for loop in numpy

I currently have the following double loop in my Python code:
for i in range(a):
for j in range(b):
A[:,i]*=B[j][:,C[i,j]]
(A is a float matrix. B is a list of float matrices. C is a matrix of integers. By matrices I mean m x n np.arrays.
To be precise, the sizes are: A: mxa B: b matrices of size mxl (with l different for each matrix) C: axb. Here m is very large, a is very large, b is small, the l's are even smaller than b
)
I tried to speed it up by doing
for j in range(b):
A[:,:]*=B[j][:,C[:,j]]
but surprisingly to me this performed worse.
More precisely, this did improve performance for small values of m and a (the "large" numbers), but from m=7000,a=700 onwards the first appraoch is roughly twice as fast.
Is there anything else I can do?
Maybe I could parallelize? But I don't really know how.
(I am not committed to either Python 2 or 3)
Here's a vectorized approach assuming B as a list of arrays that are of the same shape -
# Convert B to a 3D array
B_arr = np.asarray(B)
# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis.
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]),:,C].prod(1).T
For cases with smaller a, we could run a loop that iterates through the length of a instead. Also, for more performance, it might be a better idea to store those elements into a separate 2D array instead and perform the elementwise multiplication only once after we get out of the loop.
Thus, we would have an alternative implementation like so -
range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
out[:,i] = B_arr[range_arr,:,C[i,:]].prod(0)
A *= out

Efficient ways to iterate over several numpy arrays and process current and previous elements?

I've read a lot about different techniques for iterating over numpy arrays recently and it seems that consensus is not to iterate at all (for instance, see a comment here). There are several similar questions on SO, but my case is a bit different as I have to combine "iterating" (or not iterating) and accessing previous values.
Let's say there are N (N is small, usually 4, might be up to 7) 1-D numpy arrays of float128 in a list X, all arrays are of the same size. To give you a little insight, these are data from PDE integration, each array stands for one function, and I would like to apply a Poincare section. Unfortunately, the algorithm should be both memory- and time-efficient since these arrays are sometimes ~1Gb each, and there are only 4Gb of RAM on board (I've just learnt about memmap'ing of numpy arrays and now consider using them instead of regular ones).
One of these arrays is used for "filtering" the others, so I start with secaxis = X.pop(idx). Now I have to locate pairs of indices where (secaxis[i-1] > 0 and secaxis[i] < 0) or (secaxis[i-1] < 0 and secaxis[i] > 0) and then apply simple algebraic transformations to remaining arrays, X (and save results). Worth mentioning, data shouldn't be wasted during this operation.
There are multiple ways for doing that, but none of them seem efficient (and elegant enough) to me. One is a C-like approach, where you just iterate in a for-loop:
import array # better than lists
res = [ array.array('d') for _ in X ]
for i in xrange(1,secaxis.size):
if condition: # see above
co = -secaxis[i-1]/secaxis[i]
for j in xrange(N):
res[j].append( (X[j][i-1] + co*X[j][i])/(1+co) )
This is clearly very inefficient and besides not a Pythonic way.
Another way is to use numpy.nditer, but I haven't figured out yet how one accesses the previous value, though it allows iterating over several arrays at once:
# without secaxis = X.pop(idx)
it = numpy.nditer(X)
for vec in it:
# vec[idx] is current value, how do you get the previous (or next) one?
Third possibility is to first find sought indices with efficient numpy slices, and then use them for bulk multiplication/addition. I prefer this one for now:
res = []
inds, = numpy.where((secaxis[:-1] < 0) * (secaxis[1:] > 0) +
(secaxis[:-1] > 0) * (secaxis[1:] < 0))
coefs = -secaxis[inds] / secaxis[inds+1] # array of coefficients
for f in X: # loop is done only N-1 times, that is, 3 to 6
res.append( (f[inds] + coefs*f[inds+1]) / (1+coefs) )
But this is seemingly done in 7 + 2*(N - 1) passes, moreover, I'm not sure about secaxis[inds] type of addressing (it is not slicing and generally it has to find all elements by indices just like in the first method, doesn't it?).
Finally, I've also tried using itertools and it resulted in monstrous and obscure structures, which might stem from the fact that I'm not very familiar with functional programming:
def filt(x):
return (x[0] < 0 and x[1] > 0) or (x[0] > 0 and x[1] < 0)
import array
from itertools import izip, tee, ifilter
res = [ array.array('d') for _ in X ]
iters = [iter(x) for x in X] # N-1 iterators in a list
prev, curr = tee(izip(*iters)) # 2 similar iterators, each of which
# consists of N-1 iterators
next(curr, None) # one of them is now for current value
seciter = tee(iter(secaxis))
next(seciter[1], None)
for x in ifilter(filt, izip(seciter[0], seciter[1], prev, curr)):
co = - x[0]/x[1]
for r, p, c in zip(res, x[2], x[3]):
r.append( (p+co*c) / (1+co) )
Not only this looks very ugly, it also takes an awful lot of time to complete.
So, I have following questions:
Of all these methods is the third one indeed the best? If so, what can be done to impove the last one?
Are there any other, better ones yet?
Out of sheer curiosity, is there a way to solve the problem using nditer?
Finally, will I be better off using memmap versions of numpy arrays, or will it probably slow things down a lot? Maybe I should only load secaxis array into RAM, keep others on disk and use third method?
(bonus question) List of equal in length 1-D numpy arrays comes from loading N .npy files whose sizes aren't known beforehand (but N is). Would it be more efficient to read one array, then allocate memory for one 2-D numpy array (slight memory overhead here) and read remaining into that 2-D array?
The numpy.where() version is fast enough, you can speedup it a little by method3(). If the > condition can change to >=, you can also use method4().
import numpy as np
a = np.random.randn(100000)
def method1(a):
idx = []
for i in range(1, len(a)):
if (a[i-1] > 0 and a[i] < 0) or (a[i-1] < 0 and a[i] > 0):
idx.append(i)
return idx
def method2(a):
inds, = np.where((a[:-1] < 0) * (a[1:] > 0) +
(a[:-1] > 0) * (a[1:] < 0))
return inds + 1
def method3(a):
m = a < 0
p = a > 0
return np.where((m[:-1] & p[1:]) | (p[:-1] & m[1:]))[0] + 1
def method4(a):
return np.where(np.diff(a >= 0))[0] + 1
assert np.allclose(method1(a), method2(a))
assert np.allclose(method2(a), method3(a))
assert np.allclose(method3(a), method4(a))
%timeit method1(a)
%timeit method2(a)
%timeit method3(a)
%timeit method4(a)
the %timeit result:
1 loop, best of 3: 294 ms per loop
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.38 ms per loop
1000 loops, best of 3: 1.39 ms per loop
I'll need to read your post in more detail, but will start with some general observations (from previous iteration questions).
There isn't an efficient way of iterating over arrays in Python, though there are things that slow things down. I like to distinguish between the iteration mechanism (nditer, for x in A:) and the action (alist.append(...), x[i+1] += 1). The big time consumer is usually the action, done many times, not the iteration mechanism itself.
Letting numpy do the iteration in compiled code is the fastest.
xdiff = x[1:] - x[:-1]
is much faster than
xdiff = np.zeros(x.shape[0]-1)
for i in range(x.shape[0]:
xdiff[i] = x[i+1] - x[i]
The np.nditer isn't any faster.
nditer is recommended as a general iteration tool in compiled code. But its main value lies in handling broadcasting and coordinating the iteration over several arrays (input/output). And you need to use buffering and c like code to get the best speed from nditer (I'll look up a recent SO question).
https://stackoverflow.com/a/39058906/901925
Don't use nditer without studying the relevant iteration tutorial page (the one that ends with a cython example).
=========================
Just judging from experience, this approach will be fastest. Yes it's going to iterate over secaxis a number of times, but those are all done in compiled code, and will be much faster than any iteration in Python. And the for f in X: iteration is just a few times.
res = []
inds, = numpy.where((secaxis[:-1] < 0) * (secaxis[1:] > 0) +
(secaxis[:-1] > 0) * (secaxis[1:] < 0))
coefs = -secaxis[inds] / secaxis[inds+1] # array of coefficients
for f in X:
res.append( (f[inds] + coefs*f[inds+1]) / (1+coefs) )
#HYRY has explored alternatives for making the where step faster. But as you can see the differences aren't that big. Other possible tweaks
inds1 = inds+1
coefs = -secaxis[inds] / secaxis[inds1]
coefs1 = coefs+1
for f in X:
res.append(( f[inds] + coefs*f[inds1]) / coefs1)
If X was an array, res could be an array as well.
res = (X[:,inds] + coefs*X[:,inds1])/coefs1
But for small N I suspect the list res is just as good. Don't need to make the arrays any bigger than necessary. The tweaks are minor, just trying to avoid recalculating things.
=================
This use of np.where is just np.nonzero. That actually makes two passes of the array, once with np.count_nonzero to determine how many values it will return, and create the return structure (list of arrays of now known length). And a second loop to fill in those indices. So multiple iterations are fine if it keeps action simple.

Categories