Optimizing a nested for loop - python

I'm trying to avoid using for loops to run my calculations, but I don't know how to do it. I have a matrix w with shape (40, 100). Each row holds the position of a wave at a time t. For example, the first row w[0] is the initial condition (and so is w[1], for reasons I will show).
To calculate the elements of the next row, for every t and x in the shape range, I use:
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
where a and b are constants derived from the solution of the equation (it really doesn't matter here): a = 2*(1-r), b = r, r = (c*(dt/dx))**2, where c is the wave speed and dt, dx are the increments in the t and x directions.
Is there any way to avoid a for loop like:
for t in range(1, nt-1):
    for x in range(1, nx-1):
        w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
Here nt and nx are the dimensions of the w matrix.

I assume you're setting w[:,0] and w[:,-1] beforehand (to some constants?), because I don't see them in the loop.
If so, you can eliminate the for x loop by vectorizing that part of the code:
for t in range(1, nt-1):
    w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]
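For completeness, here is a minimal self-contained sketch of the whole scheme with the inner loop vectorized. The grid sizes, wave speed, increments, and initial condition below are made-up placeholders, and the boundary columns are simply held at zero:

import numpy as np

# made-up parameters for illustration only
nt, nx = 40, 100
c, dt, dx = 1.0, 0.01, 0.05
r = (c * (dt / dx)) ** 2
a, b = 2 * (1 - r), r

w = np.zeros((nt, nx))
# placeholder initial condition: a small bump; w[1] copies w[0]
w[0, nx//2 - 5 : nx//2 + 5] = 1.0
w[1] = w[0]

# time stepping: only the t loop remains, x is fully vectorized
for t in range(1, nt - 1):
    w[t+1, 1:-1] = a*w[t, 1:-1] + b*(w[t, :-2] + w[t, 2:]) - w[t-1, 1:-1]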

Not really. If you want to do something for every element in your matrix (which you do), you're going to have to operate on each element one way or another. The most obvious way is with a for loop; less obvious methods will generally perform the same or worse.
If you're trying to avoid loops because loops are slow, know that sometimes loops are necessary to solve a certain kind of problem. However, there are lots of ways to make loops more efficient.
Generally with matrix problems like this, where you're looking at the neighboring elements, a good solution is some kind of dynamic programming or memoization (saving your work so you don't have to repeat calculations). For example, suppose for each element you wanted the average of it and everything around it (this is how blurring images works). Each pixel has 8 neighbors, so the average is the sum / 9. Now say you save the sums of the three columns in the window (NW + W + SW, N + me + S, NE + E + SE). When you move one element to the right, you reuse your previous middle and right column sums and only compute one new column sum, so you've replaced adding 9 numbers with adding 5. In operations more complicated than addition, reducing 9 to 5 can mean a huge performance increase. A sketch of this sliding-window idea follows.
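Here is a minimal sketch of that column-sum trick for a 3x3 box blur; the function name is made up, and it skips the image border for brevity:

import numpy as np

def box_blur_interior(img):
    # 3x3 box blur of the interior pixels, reusing column sums
    h, w = img.shape
    out = img.astype(float).copy()
    for i in range(1, h - 1):
        # column sums for the first window in this row
        cols = [img[i-1:i+2, j].sum() for j in range(3)]
        for j in range(1, w - 1):
            out[i, j] = sum(cols) / 9.0
            if j + 2 < w:
                # slide right: keep two sums, add one new column sum
                cols = [cols[1], cols[2], img[i-1:i+2, j+2].sum()]
    return out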
I looked at what you have to do and I couldn't think of a good way to do something like I just described. But see if you can think of something similar.
Also, remember that multiplication is usually more expensive than addition. So if you have a loop where, for instance, you have to multiply some number by the loop variable, instead of computing 1*x, 2*x, 3*x, ..., you can keep a running total and add x each time (a trick known as strength reduction); a tiny sketch follows.
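A toy sketch of that strength-reduction idea (hypothetical values):

x = 7.0
products = []

# instead of computing i * x on every iteration ...
running = 0.0
for i in range(1, 11):
    running += x  # running now equals i * x
    products.append(running)

assert products[-1] == 10 * x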

Related

Speed Up Program Below

I have written the for loop program below, where I go through an array element by element and do some math on those elements. Once the math is calculated, the result gets stored in another array.
for i in range(0, 1024):
    x[i] = a * data[i] + b * x[i-1] + c * x[i-2]
In my program a, b, and c are just scalar numbers, and data and x are arrays. data has size 1024 and is filled with numbers; x is also size 1024 but is initially filled with zeros. To calculate each new element of x I use the previous two elements of x. Initially those are 0 and 0, since indexing backwards picks up the last two elements of the all-zero x array. I multiply the current element of data by a, the last element of x by b, and the second-to-last element of x by c, then add everything up and save it to the current element of x. I do the same thing for every element in data and x.
This loop program works, but I was wondering if there is a faster way to do it? Maybe using a combination of numpy functions like cumsum or a dot product? Can someone help me make the program faster? Thank you!
The best you can do using the recursive method:
x = a * data
coef = np.array([c, b])
for i in range(2, 1024):
    x[i] += np.dot(coef, x[i-2:i])
But even better, you can solve this recurrence to a closed-form solution and apply it directly, with no loop at all. (It is a basic 2nd-order linear recurrence.)
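As an alternative to deriving the closed form: the recurrence is exactly a 2nd-order linear IIR filter, so (if I'm reading the coefficients correctly) scipy.signal.lfilter can evaluate it in compiled code with no Python loop. A sketch with arbitrary example coefficients:

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
data = rng.standard_normal(1024)
a, b, c = 0.5, 0.25, 0.1  # arbitrary example values

# x[i] = a*data[i] + b*x[i-1] + c*x[i-2]
# in lfilter terms: numerator [a], denominator [1, -b, -c]
x_fast = lfilter([a], [1.0, -b, -c], data)

# sanity check against the plain loop (zero initial conditions)
x_loop = np.zeros(1024)
for i in range(1024):
    prev1 = x_loop[i-1] if i >= 1 else 0.0
    prev2 = x_loop[i-2] if i >= 2 else 0.0
    x_loop[i] = a * data[i] + b * prev1 + c * prev2

assert np.allclose(x_fast, x_loop)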
In general, if you want a fast program, Python is not the best option. Python is great for prototyping since it is easy and has a lot of tools, but it is not very computationally efficient in its raw form compared to, for example, C. What I usually do is use Cython, a module for Python that lets you compile your script to machine code (as you do with C), which can greatly increase the speed of the application.
It lets you declare the types of variables, for example:
cdef double a, b, c
When you use a variable in Python, its type has to be checked every single time to figure out what kind of variable it is (int, double, string, etc.). In C that is not an issue, since you decide from the start what type the variable is, which removes that per-operation cost.
I would try to transform the for loop into a list comprehension, which often has faster processing time in Python.

Best way to create a loop for multiplying a matrix by every one of its elements, then summing the results

Very new to Python, so apologies for the lack of vocabulary/knowledge. I would like to know if there is a better way to achieve what the code below does. Using the loop I have made, I can generate and append all of the matrices/arrays formed by multiplying matrix A by each and every element within A. The last line of code then sums all of the elements in this array of arrays and prints the result I want.
The problem is, when I get to about d = 600, I get SIGKILL errors, due to a lack of memory on my computer.
I have considered the mathematics behind it, which included breaking the summation into parts that dealt with different values of indices, but nothing seems to speed it up significantly.
This may be purely a memory-based issue, but I thought I would ask in case there are any Python/code based tips that could help. The code is as follows:
A = numpy.random.randint(0, 4, size=(d, d))
All = []
for n in range(0, d):
    for m in range(0, d):
        All.append(A * A[n,m])
print(numpy.sum(All))
So overall, I achieve the correct result, but due to the large size of the matrices and the number of multiplications, I cannot achieve the required d = 2000 I am looking for without a memory error. Thanks in advance.
You don't need to loop here or build a new list if all you want is the total sum. What you're doing mathematically comes down to:
total = A.sum() ** 2
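The identity holds because each appended matrix sums to A[n,m] * A.sum(), so the grand total is A.sum() * A.sum(). A quick sanity check at a small size:

import numpy as np

d = 5
A = np.random.randint(0, 4, size=(d, d))

slow = sum(np.sum(A * A[n, m]) for n in range(d) for m in range(d))
fast = A.sum() ** 2
assert slow == fast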

A 'simple' boundary value / initial value problem in numpy

tl;dr: I have a numpy boundary/initial value problem and want to see if I'm approaching it the right way. I'm fairly new to numpy. I'm presenting a simplified version of the problem.
I have 2 functions a and b defined for integer values of t and x, which I'm trying to calculate for positive x and t (say up to N). I want to figure out the best way to do this with numpy.
I have boundary values at t=0 and x=0. a(t,x) depends only on a(t-1,x-1) and b(t-1,x-1), while b(t,x) depends on many values of a with smaller t, x. This is what makes it 'simple'. We have:
a=1 for t=0 and for x=0.
b=0.1 for t=0 and b=1 for x=0. At x=t=0, we get b=0.1.
In the interior, a(t,x) = a(t-1,x-1) - b(t-1,x-1).
Now the hard part: b(t,x) = a(t-1,x-1)*S(t,t-1) + a(t-2,x-2)*S(t,t-2) + ...,
where S(t,y) is a sum equal to f(a(t-1,1)) + f(a(t-1,2)) + ... + f(a(t-1,y)) for some function f (if you need something specific, you could assume it's just a + a**2).
So my plan is to do this basically as:
initialize values
loop over t:
    update a
    loop over y:
        define S(t,y)  # each step is vectorizable, I think
    loop over x:
        set b to the dot product of the vector of S and a slice of a
My question: Is this a reasonable approach - can I cut out any of those loops, or should I take a different tack entirely?
Bonus question: Any likely errors for a numpy newb to make coding this?
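For concreteness, here is a minimal sketch of the plan above. It assumes f(a) = a + a**2 as suggested, an arbitrary grid size N, and my reading that the sum defining b runs over k = 1 .. min(t-1, x); adjust the index ranges if the intended recurrence differs. The loop over y collapses into a single cumulative sum, so only the t and x loops remain:

import numpy as np

N = 50  # grid size, arbitrary for this sketch

def f(a):
    return a + a**2  # the specific f suggested above

a = np.zeros((N, N))  # a[t, x]
b = np.zeros((N, N))
a[0, :] = 1.0
a[:, 0] = 1.0
b[0, :] = 0.1
b[:, 0] = 1.0
b[0, 0] = 0.1

for t in range(1, N):
    # interior update of a, vectorized over x
    a[t, 1:] = a[t-1, :-1] - b[t-1, :-1]
    # S[y-1] == S(t,y) = f(a(t-1,1)) + ... + f(a(t-1,y)): one cumsum
    S = np.cumsum(f(a[t-1, 1:]))
    for x in range(1, N):
        # b(t,x) = sum over k of a(t-k, x-k) * S(t, t-k)
        ks = np.arange(1, min(t-1, x) + 1)
        if ks.size:
            b[t, x] = np.dot(a[t-ks, x-ks], S[t-ks-1])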

Need help vectorizing code or optimizing

I am trying to do a double integral by first interpolating the data to make a surface. I am using numba to try to speed this process up, but it's just taking too long.
Here is my code, with the images needed to run it located here and here.
Noting that your code has a quadruple-nested set of for loops, I focused on optimizing the inner pair. Here's the old code:
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        '''create an r vector'''
        r = (i*distX, j*distY, z)
        for x in xrange(img.shape[0]):
            for y in xrange(img.shape[1]):
                '''create a ksi vector, then calculate
                its norm and the dot product of r and ksi'''
                ksi = (x*distX, y*distY, z)
                ksiNorm = np.linalg.norm(ksi)
                ksiDotR = float(np.dot(ksi, r))
                '''calculate the integrand'''
                temp[x,y] = img[x,y]*np.exp(1j*k*ksiDotR/ksiNorm)
        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i,j] = temp2.integral(0, n, 0, m)
Since K and img are each about 2000x2000, the innermost statements need to be executed sixteen trillion times. This is simply not practical using Python, but we can shift the work into C and/or Fortran using NumPy to vectorize. I did this one careful step at a time to try to make sure the results will match; here's what I ended up with:
'''create all r vectors'''
R = np.empty((K.shape[0], K.shape[1], 3))
R[:,:,0] = np.repeat(np.arange(K.shape[0]), K.shape[1]).reshape(K.shape) * distX
R[:,:,1] = np.arange(K.shape[1]) * distY
R[:,:,2] = z
'''create all ksi vectors'''
KSI = np.empty((img.shape[0], img.shape[1], 3))
KSI[:,:,0] = np.repeat(np.arange(img.shape[0]), img.shape[1]).reshape(img.shape) * distX
KSI[:,:,1] = np.arange(img.shape[1]) * distY
KSI[:,:,2] = z
# vectorized 2-norm; see http://stackoverflow.com/a/7741976/4323
KSInorm = np.sum(np.abs(KSI)**2,axis=-1)**(1./2)
# loop over entire K, which is the same shape as img, rows first
# this loop populates K one pixel at a time (so it can be parallelized)
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        KSIdotR = np.dot(KSI, R[i,j])
        temp = img * np.exp(1j * k * KSIdotR / KSInorm)
        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i,j] = temp2.integral(0, n, 0, m)
The inner pair of loops is now completely gone, replaced by vectorized operations done in advance (at a space cost linear in the size of the inputs).
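As an aside, the repeat/reshape construction of those coordinate grids can also be written with broadcasting, which some find easier to read. A sketch (not benchmarked against the original, but it builds the same array):

# equivalent construction of R via broadcasting
R = np.empty((K.shape[0], K.shape[1], 3))
R[:, :, 0] = np.arange(K.shape[0])[:, None] * distX  # varies down rows
R[:, :, 1] = np.arange(K.shape[1])[None, :] * distY  # varies across columns
R[:, :, 2] = z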
This reduces the time per iteration of the outer two loops from 340 seconds to 1.3 seconds on my MacBook Air (1.6 GHz i5), without using Numba. Of the 1.3 seconds per iteration, 0.68 seconds are spent in the rbs function, which is scipy.interpolate.RectBivariateSpline. There is probably room to optimize further; here are some ideas:
Reenable Numba. I don't have it on my system. It may not make much difference at this point, but easy for you to test.
Do more domain-specific optimization, such as trying to simplify the fundamental calculations being done. My optimizations are intended to be lossless, and I don't know your problem domain so I can't optimize as deeply as you may be able to.
Try to vectorize the remaining loops. This may be tough unless you are willing to replace the scipy RBS function with something supporting multiple calculations per call.
Get a faster CPU. Mine is pretty slow; you can probably get a speedup of at least 2x simply by using a better computer than my tiny laptop.
Downsample your data. Your test images are 2000x2000 pixels, but contain fairly little detail. If you cut their linear dimensions by 2-10x, you'd get a huge speedup.
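For that last idea, a crude sketch using plain slicing (a proper approach would low-pass filter before decimating):

img_small = img[::4, ::4]  # 2000x2000 -> 500x500: ~256x fewer inner operations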
So that's it for me for now. Where does this leave you? Assuming a slightly better computer and no further optimization work, even the optimized code would take about a month to process your test images. If you only have to do this once, maybe that's fine. If you need to do it more often, or need to iterate on the code as you try different things, you probably need to keep optimizing--starting with that RBS function which consumes more than half the time now.
Bonus tip: your code would be a lot easier to deal with if it didn't have nearly identical variable names like k and K, and didn't use j both as a variable name and as the complex-number suffix (0j).

Python - Sum of numbers

I am trying to sum all the numbers up to a range, with all the numbers up to the same range.
I am using python:
limit = 10
sums = []
for x in range(1, limit+1):
    for y in range(1, limit+1):
        sums.append(x+y)
This works just fine, however, because of the nested loops, if the limit is too big it will take a lot of time to compute the sums.
Is there any way of doing this without a nested loop?
(This is just a simplification of something I need to do to solve a Project Euler problem. It involves obtaining the sum of all abundant numbers.)
[x + y for x in xrange(limit + 1) for y in xrange(x + 1)]
This still performs just as many calculations but will do it about twice as fast as a for loop.
from itertools import combinations
(a + b for a, b in combinations(xrange(n + 1), 2))
This avoids a lot of duplicate sums. I don't know if you want to keep track of those or not.
If you just want every sum, with no representation of how you got it, then xrange(2*n + 1) gives you what you want with no duplicates or looping at all.
In response to a follow-up question:
[x + y for x in set1 for y in set2]
I am trying to sum all the numbers up to a range, with all the numbers up to the same range.
So you want to compute limit**2 sums.
because of the nested loops, if the limit is too big it will take a lot of time to compute the sums.
Wrong: it's not "because of the nested loops" -- it's because you're computing a quadratic number of sums, and therefore doing a quadratic amount of work.
Is there any way of doing this without a nested loop?
You can mask the nesting, as in @aaron's answer, and you can halve the number of sums you compute thanks to the problem's symmetry (though that doesn't do exactly the same thing as your code), but, to prepare a list with a quadratic number of items, there's absolutely no way to avoid doing a quadratic amount of work.
However, for your stated purpose, 'obtaining the sum of all abundant numbers', you'd need an infinite amount of work, since there are infinitely many abundant numbers ;-).
I think you have in mind problem 23, which is actually very different: it asks for the sum of all numbers that cannot be expressed as the sum of two abundant numbers. How the summation you're asking about would help you move closer to that solution really escapes me.
I'm not sure if there is a good way not using nested loops.
If I were in your shoes, I'd write it as follows:
[x+y for x in range(1,limit+1) for y in range(1,limit+1)]
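A quick sanity check that this comprehension matches the original nested loop:

limit = 10
sums = []
for x in range(1, limit+1):
    for y in range(1, limit+1):
        sums.append(x+y)

assert sums == [x+y for x in range(1, limit+1) for y in range(1, limit+1)]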
