I have an image of the sun; I found the center and radius, and now I want to process pixels differently depending on whether they are inside or outside the disk. The ideal solution would be to interpolate the parameters of the processing function, in order to transition smoothly from disk to background.
Here is what I'm doing now:
for index, value in np.ndenumerate(sun_img):
    if distance.euclidean(index, center) > radius:
        sun_img[index] = processing_function(index, value)
It works like this, but it takes forever to compute the image. I'm sure there is a more efficient way to do it. How would you solve this?
Image shape is around (1000, 1000)
Processing_function is basically not doing anything right now: value += 1
The function should be something like a non-linear "step function": 0.0 up to the radius and 1.0 from 5 px after it, something like _______/''''''''''''''''''''' multiplied by the value of the pixel. The slope should start at the radius value. I want to do this in order to enhance the protuberances.
Here's a vectorized way leveraging NumPy broadcasting -
m, n = sun_img.shape
I, J = np.ogrid[:m, :n]                              # open grids of row/column indices
sq_dist = (I - center[0])**2 + (J - center[1])**2    # squared distance to the center
valid_mask = sq_dist > radius**2                     # True outside the disk
Now, for a processing_function that just adds 1 to the valid places, defined by the IF-conditional, do -
sun_img[valid_mask] += 1
If you need to implement a custom operation with processing_function that needs those row, column indices, use np.where to get those indices and then iterate through the valid elements, like so -
r, c = np.where(valid_mask)
for index in zip(r, c):
    sun_img[index] = processing_function(index, sun_img[index])
If you have a lot of such valid places, then computing r,c might make things slow. In that case, directly use the mask, like so -
for index, value in np.ndenumerate(sun_img):
    if valid_mask[index]:
        sun_img[index] = processing_function(index, value)
Compared to the original code, the benefit is that the conditional values are pre-computed before entering the loop. The best way again would be to vectorize processing_function itself so that it works on a bigger chunk of data, but that would depend on its implementation.
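As a sketch of that last point, here is one way to vectorize the smooth 0-to-1 transition described in the question. The linear ramp and the 5 px width are my reading of the question; swap in any smoother profile if you need one:

import numpy as np

m, n = sun_img.shape
I, J = np.ogrid[:m, :n]
dist = np.sqrt((I - center[0])**2 + (J - center[1])**2)

# 0.0 on the disk, rising linearly to 1.0 over the 5 px beyond the radius
ramp = np.clip((dist - radius) / 5.0, 0.0, 1.0)

sun_img = sun_img * ramp   # weight each pixel by the transition value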
Related
I have a large data set, statistic, with statistic.shape = (1E10,) that I want to bin (sum) efficiently into an array of zeros, out = np.zeros(1E10). Each entry in statistic has a corresponding index, idx, which tells me in which out bin it belongs. The indices are not unique, so I cannot use out[idx] += statistic, since that only counts the first occurrence of each repeated index. Therefore I'm using np.add.at(out, idx, statistic). My problem is that for very large arrays, np.add.at() returns the wrong answer.
Below is an example script that shows this behaviour. The function check_add() should return 1.
import numpy as np

def check_add(N):
    N = int(N)
    out = np.zeros(N)
    np.add.at(out, np.arange(N), np.ones(N))
    return np.sum(out)/N

n_arr = [1E3, 1E5, 1E8, 1E10]
for n in n_arr:
    print('N = {} (log(N) = {}); output ratio is {}'.format(n, np.log10(n), check_add(n)))
This example returns for me:
N = 1000.0 (log(N) = 3.0); output ratio is 1.0
N = 100000.0 (log(N) = 5.0); output ratio is 1.0
N = 100000000.0 (log(N) = 8.0); output ratio is 1.0
N = 10000000000.0 (log(N) = 10.0); output ratio is 0.1410065408
Can someone explain to me why the function fails for N=1E10?
This is an old bug, NumPy issue 13286. ufunc.at was using a too-small variable for the loop counter. It got fixed a while ago, so update your NumPy. (The fix is present in 1.16.3 and up.)
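If you want to quickly confirm whether your install predates the fix, a minimal check:

import numpy as np

# ufunc.at silently wrapped its loop counter for very large inputs
# before the fix for issue 13286 landed in NumPy 1.16.3
print(np.__version__)   # 1.16.3 or later includes the fix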
You're overflowing int32:
1E10 % (np.iinfo(np.int32).max - np.iinfo(np.int32).min + 1) # + 1 because the range includes 0
Out[]: 1410065408
There's your weird number (googling that number is actually what led me here, which is how I figured this out).
Now, what's happening in your function is a bit weirder. Per the documentation of ufunc.at, you should just be accumulate-adding the 1 values at the indices below np.iinfo(np.int32).max and at the negative indices above np.iinfo(np.int32).min - but it seems to be 1) working backwards and 2) stopping when it gets to the last overflow. Without digging into the C code I couldn't tell you why, but it's probably a good thing it does: your function would have failed silently, with the "correct" mean, if it had done things that way, while corrupting your results (having 2 or 3 at those indices and 0 in the middle).
It is most likely due to integer precision indeed. If you play around with the NumPy data type (e.g. constrain it to an unsigned value between 0-255 by setting uint8), you will see that the ratios start declining already for the second array. I do not have enough memory to test it, but setting all dtypes to uint64 as below should help:
def check_add(N):
    N = int(N)
    out = np.zeros(N, dtype='uint64')
    np.add.at(out, np.arange(N, dtype='uint64'), 1)
    return np.sum(out)/N
To understand the behavior, I recommend setting dtype='uint8' and checking it for smaller N. What happens is that np.arange creates ascending integers for the vector elements until it reaches the integer limit, then starts again at 0 and counts up again. So at the beginning (smaller Ns) you get the correct sum, although your out vector contains a lot of elements > 1 in the positions 0:limit and a lot of elements = 0 beyond the limit. If, however, you choose N large enough, the elements of your out vector themselves start exceeding the integer limit and wrap back to 0. As soon as that happens, your sum is vastly off. To double-check, note that the uint8 limit is 255 (256 integers) and 256**2 = 65536: set N = 65536 with dtype='uint8' and check_add(65536) will return 0.
import numpy as np

def check_add(N):
    N = int(N)
    out = np.zeros(N, dtype='uint8')
    np.add.at(out, np.arange(N, dtype='uint8'), 1)
    return np.sum(out)/N

n_arr = [1E1, 1E3, 1E5, 65536, 1E7]
for n in n_arr:
    print('N = {} (log(N) = {}); output ratio is {}'.format(n, np.log10(n), check_add(n)))
Also note that you don't need the np.ones vector; you can simply replace it with the scalar 1 if all you care about is uniformly incrementing everything by 1.
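As an aside not taken from this answer: for a pure sum-by-index binning like the question's, np.bincount performs the same accumulation and is usually faster than np.add.at. A small sketch, with dummy data standing in for the huge idx/statistic arrays:

import numpy as np

N = 8
idx = np.array([0, 2, 2, 5, 7, 7, 7])           # non-negative bin indices < N
statistic = np.array([1., 1., 2., 3., 1., 1., 1.])

out = np.bincount(idx, weights=statistic, minlength=N)
# equivalent to: out = np.zeros(N); np.add.at(out, idx, statistic)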
Guessing, as I couldn't run it, but could it be a problem that you are exceeding the maximum integer value in Python for the last option? I.e., it exceeds 2147483647.
Use the long integer type instead, as per below.
Referring to: https://docs.python.org/2.0/ref/integers.html
Hope this helps. Please let me know if it does work.
I'm still an amateur when it comes to thinking about how to optimize. I have this section of code that takes in a list of found peaks and finds where these peaks, +/- some value, are located in a multidimensional array. It then adds +1 at their indices in an array of zeros. The code works well, but it takes a long time to execute. For instance, it takes close to 45 min to run if ind has 270 values and refVals has a shape of (3050, 3130, 80). I understand that it's a lot of data to churn through, but is there a more efficient way of going about this?
maskData = np.zeros_like(refVals).astype(np.int16)
for peak in ind:
    tmpArr = np.ma.masked_outside(refVals, x[peak]-2, x[peak]+2).astype(np.int16)
    maskData[tmpArr.mask == False] += 1
    tmpArr = None

maskData = np.sum(maskData, axis=2)
Approach #1 : Memory permitting, here's a vectorized approach using broadcasting -
# Create the -2/+2 limits using ind
r = x[ind[:,None]] + [-2,2]

# Use the limits to find in-range matches, then sum over the iterative and last dims
mask = (refVals >= r[:,None,None,None,0]) & (refVals <= r[:,None,None,None,1])
out = mask.sum(axis=(0,3))
Approach #2 : If running out of memory with the previous one, we could use a loop with NumPy boolean arrays, which could be more efficient than masked arrays. We would also perform one more level of sum-reduction inside the loop, so that we drag less data with us across iterations. Thus, the alternative implementation would look something like this -
out = np.zeros(refVals.shape[:2]).astype(np.int16)
x_ind = x[ind]
for i in x_ind:
    out += ((refVals >= i-2) & (refVals <= i+2)).sum(-1)
Approach #3 : Alternatively, we could replace that limit based comparison with np.isclose in approach #2. Thus, the only step inside the loop would become -
out += np.isclose(refVals,i,atol=2).sum(-1)
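Assembled, approach #3 would look like this (a sketch reusing the question's arrays; note that np.isclose also applies a small default relative tolerance on top of atol=2, so counts can differ marginally from the exact +/-2 limits):

out = np.zeros(refVals.shape[:2]).astype(np.int16)
for i in x[ind]:
    out += np.isclose(refVals, i, atol=2).sum(-1)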
I've been trying to find a more efficient way to iterate through an image and split its pixels on a threshold. While searching online and discussing with some programming friends, I was introduced to the concept of vectorizing a function (particularly using NumPy). After much searching and trial and error, I can't seem to get the hang of it. Can someone give me a link or a suggestion on how to make the following code more efficient?
import numpy as np
import matplotlib.pyplot as plt

Im = plt.imread(img)
Imarray = np.array(Im)

bright_sum = dim_sum = 0
bright_counter = dim_counter = 0
for line in Imarray:
    for pixel in line:
        if pixel <= 20000:
            dim_sum += pixel
            dim_counter += 1
        else:
            bright_sum += pixel
            bright_counter += 1

bright_mean = bright_sum/bright_counter
dim_mean = dim_sum/dim_counter
Basically, each pixel holds a brightness amount between 0 and 30000 and I'm trying to average all pixels below 20000 and above 20000 respectively. The best way I know how to do this is using for loops (which are slow in python) and search through each pixel with if statements.
NumPy supports and encourages vectorization through its arrays and ufuncs. In your case, the input image is a NumPy array. So those comparisons can be done in one go, in a vectorized manner, giving boolean arrays of the same shape as the input array. Those boolean arrays, when used to index into the input array, select the valid elements from it. This is called boolean indexing and forms a key feature in such vectorized selections.
Finally, we use the NumPy method ndarray.mean, which again operates in a vectorized fashion to give us the mean values of the selected elements.
Thus, to put all those into code, we would have -
bright_mean, dim_mean = Im[Im > 20000].mean(), Im[Im <= 20000].mean()
For this particular problem, from a code-efficiency point of view, it makes more sense to perform the comparison just once. The comparison gives us a boolean array, which can be used twice later on: once as it is and a second time inverted. Thus, alternatively we would have -
mask = Im > 20000
bright_mean, dim_mean = Im[mask].mean(), Im[~mask].mean()
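A quick self-contained check of the boolean-indexing approach, with random data standing in for the image (the 0-30000 range comes from the question):

import numpy as np

rng = np.random.default_rng(0)
Im = rng.integers(0, 30000, size=(100, 100))   # stand-in for the real image

mask = Im > 20000
bright_mean, dim_mean = Im[mask].mean(), Im[~mask].mean()
print(bright_mean, dim_mean)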
I'm trying to avoid using for loops to run my calculations, but I don't know how to do it. I have a matrix w with shape (40,100). Each line holds the positions of a wave at a time t. For example, the first line w[0] is the initial condition (and so is w[1], for reasons I will show).
To calculate the next line's elements I use, for every t and x in the shape ranges:
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
where a and b are constants derived from the solution of the equation (it really doesn't matter): a = 2(1-r), b = r, r = (c*(dt/dx))**2, where c is the wave speed and dt, dx are the increments in the t and x directions.
Is there any way to avoid a for loop like:
for t in range(1,nt-1):
    for x in range(1,nx-1):
        w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
nt and nx are the dimensions of the w matrix.
I assume you're setting w[:,0] and w[:,-1] beforehand (to some constants?), because I don't see them set in the loop.
If so, you can eliminate the for x loop by vectorizing this part of the code:
for t in range(1,nt-1):
    w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]
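A minimal self-contained sketch of that vectorized time stepping. The value of r and the Gaussian initial profile are my own illustrative assumptions, not from the question:

import numpy as np

nt, nx = 40, 100
r = 0.25                      # assumed (c*dt/dx)**2, kept <= 1 for stability
a, b = 2*(1 - r), r

w = np.zeros((nt, nx))
xs = np.linspace(0.0, 1.0, nx)
w[0] = np.exp(-((xs - 0.5)/0.05)**2)   # assumed initial wave shape
w[1] = w[0]                            # w[1] matches w[0], as in the question

for t in range(1, nt-1):
    w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]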
Not really. If you want to do something to every element in your matrix (which you do), you're going to have to operate on each element one way or another (the most obvious way is with a for loop; less obvious methods will generally perform the same or worse).
If you're trying to avoid loops because loops are slow, know that sometimes loops are necessary to solve a certain kind of problem. However, there are lots of ways to make loops more efficient.
Generally with matrix problems like this, where you're looking at neighboring elements, a good solution is some kind of dynamic programming or memoization (saving your work so you don't have to repeat calculations). For example, suppose for each element you wanted to take the average of it and all the elements around it (this is how blurring images works). Each pixel has 8 neighbors, so the average is the sum / 9. Now say you save the sums of the three columns (NW + W + SW, N + me + S, NE + E + SE). When you move one pixel to the right, you reuse the sums of your previous middle column and previous last column, and only add up the values of one new column (the three new values on the right). You've just replaced adding 9 numbers with adding 5. In operations more complicated than addition, reducing 9 to 5 can mean a huge performance increase.
I looked at what you have to do and I couldn't think of a good way to do something like I just described. But see if you can think of something similar.
Also, remember that multiplication is generally more expensive than addition. So if you have a loop where, for instance, you have to multiply some number by the loop variable, instead of computing 1*x, 2*x, 3*x, ..., you can add x to the value from the previous iteration.
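A rough vectorized rendition of that column-sum idea for 3x3 neighborhood sums (a sketch; the scalar sliding-window version described above saves even more per-pixel work):

import numpy as np

def box_sum_3x3(img):
    # vertical sums of each 3-row window: col[i,j] = img[i,j] + img[i+1,j] + img[i+2,j]
    col = img[:-2] + img[1:-1] + img[2:]
    # horizontal sums of three adjacent column sums give the 3x3 totals
    return col[:, :-2] + col[:, 1:-1] + col[:, 2:]

Dividing the result by 9 gives the blurred interior pixels; the same reuse trick extends to larger windows.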
tl;dr: I have a NumPy boundary/initial value problem and want to see if I'm approaching it the right way. I'm fairly new to NumPy. I'm presenting a simplified version of the problem.
I have 2 functions a and b defined for integer values of t and x, which I'm trying to calculate for positive x and t (say up to N). I want to figure out the best way to do this with NumPy.
I have boundary values at t=0 and x=0. a(t,x) depends only on a(t-1,x-1) and b(t-1,x-1), which is what makes it 'simple', while b(t,x) depends on many values of a with smaller t and x. We have:
a=1 for t=0 and for x=0.
b=0.1 for t=0 and b=1 for x=0. At x=t=0, we get b=0.1.
In the interior, a(t,x) = a(t-1,x-1) - b(t-1,x-1).
Now the hard part. b(t,x) = a(t-1,x-1) S(t, t-1) + a(t-2,x-2) S(t,t-2) + ...
where S(t,y) is a sum equal to f(a(t-1,1)) + f(a(t-1,2)) + ... + f(a(t-1,y)) for some function f (if you need something specific, you could assume it's just a + a**2).
So my plan is to do this basically as:
initialize values
loop over t:
    update a
    loop over y:
        define S(t,y)   # each step is vectorizable, I think
    loop over x:
        set b equal to the dot product of the vector of S and a slice of a
My question: Is this a reasonable approach - can I cut out any of those loops, or should I take a different tack entirely?
Bonus question: Any likely errors for a numpy newb to make coding this?
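Not a full answer, but here is a sketch of the vectorizable S step from the plan above, using the question's example f(a) = a + a**2 (the function names are mine):

import numpy as np

def f(a):
    # the example f suggested in the question
    return a + a**2

# S(t,y) = f(a[t-1,1]) + ... + f(a[t-1,y]) depends on t only through row t-1,
# so a single cumulative sum over that row yields S(t,y) for every y at once
def S_row(a, t):
    return np.cumsum(f(a[t-1, 1:]))   # S_row(a, t)[y-1] == S(t, y)

The inner loop over y then disappears; only the loops over t and x remain.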