Speeding up for loops, maybe using a generator? - python

I'm fairly new to Python, and I have a problem where I am trying to count how many solutions there are to an equation of the form T*a + N*b + M*c + P*d = e, where e is given as input. I don't care what the solutions are, just how many there are.
Here a, b, c, and d are variable positive integers, while T, N, M, and P are fixed integers.
I know it's a rookie error, but I tried 4 nested for loops and it took far too long, so I abandoned that, though I couldn't think of a more elegant way. Even when I eliminated candidate numbers from the loops up front, I still ended up with a long computing time.
I have read that generators can take vastly less time, but I am unsure how to use them properly. I managed to get the time down to a minute or two, but I want it quicker, using a function with yield in it.
Something like the following (not exactly this, but to this extent) — and yes, I know nesting loops is unfavourable, but I'm a novice and trying to learn:
def function():
    count = 0
    for a in range(0, e):
        for b in range(0, int(e / N)):
            # another for loop over c
            # and another over d
            count += 1
    yield count
Outputting that gave me quicker results, but not quick enough.
Or am I thinking about this in entirely the wrong way?
Thanks

This is a class of problem where a better algorithm will yield far greater performance gains than changing how the code is written.
So the problem you have is, given T, N, M, P, and e find how many solutions there are.
Now, something like yield would work for the case where you need all the solutions: getting every solution means enumerating them all, which really does take 4 nested loops, so a generator could help stream the results out in that case.
Merely counting the solutions, however, allows us to find tricks to reduce how much we need to walk...
So let's start with the outermost loop
for a in range(1, ?):
How high can the range go? Well, we know that for a solution to be valid, all of a, b, c, and d must be positive, i.e. >= 1, so a is at its highest when T*a + N*1 + M*1 + P*1 == e... hence the largest valid a is (e - N - M - P) // T. Since range excludes its stop value, we add 1:
for a in range(1, (e - N - M - P) // T + 1):
    for b in range(1, ?):
How high can the range for b go? Well, we know that T*a is already accounted for...
for a in range(1, (e - N - M - P) // T + 1):
    for b in range(1, (e - T*a - M - P) // N + 1):
        for c in range(1, ?):
How high can the range for c go? Same principle...
for a in range(1, (e - N - M - P) // T + 1):
    for b in range(1, (e - T*a - M - P) // N + 1):
        for c in range(1, (e - T*a - N*b - P) // M + 1):
?
Now at this point you might be tempted to write a fourth loop over d... but here is where we need to be smart and avoid the last loop entirely: once a, b, and c are fixed, d is fully determined. Let r = e - T*a - N*b - M*c; the bound on c already guarantees r >= P, so there is a valid solution exactly when r is divisible by P!
count = 0
for a in range(1, (e - N - M - P) // T + 1):
    for b in range(1, (e - T*a - M - P) // N + 1):
        for c in range(1, (e - T*a - N*b - P) // M + 1):
            r = e - T*a - N*b - M*c
            if r % P == 0:   # d = r // P is then a valid positive integer
                count += 1
That is a superior algorithm: it has one fewer loop and only constant work in the innermost body, so it returns far faster...
Oh, but there is more... if you know the mathematics (counting solutions of a linear Diophantine equation), you can remove another loop, if not all of them... but this is where you need to actually know the mathematics rather than just brute-forcing the solution.
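Putting the pieces together, here is a minimal runnable sketch of the whole counting function; the coefficient values and the brute-force cross-check are purely illustrative, not from the original post:

def count_solutions(T, N, M, P, e):
    count = 0
    for a in range(1, (e - N - M - P) // T + 1):
        for b in range(1, (e - T*a - M - P) // N + 1):
            for c in range(1, (e - T*a - N*b - P) // M + 1):
                r = e - T*a - N*b - M*c   # what is left over for P*d
                if r % P == 0:            # d = r // P is a positive integer
                    count += 1
    return count

# Illustrative cross-check against brute force on a small instance
T, N, M, P, e = 2, 3, 5, 7, 40
brute = sum(1 for a in range(1, e) for b in range(1, e)
              for c in range(1, e) for d in range(1, e)
              if T*a + N*b + M*c + P*d == e)
assert count_solutions(T, N, M, P, e) == brute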

The usual way to speed up nested for-loops is to use itertools.product() to generate all combinations of the parameter values and itertools.starmap() to apply the parameters to a function:
Instead of:
for a in range(5):
    for b in range(8):
        for c in range(10, 17):
            for d in range(5, 11):
                v = f(a, b, c, d)
                ...
Write this instead:
from itertools import product, starmap

for v in starmap(f, product(range(5), range(8), range(10, 17), range(5, 11))):
    ...
The benefits are:
More concise functional style
The integer values are created just once
You don't constantly rebuild the same ranges on every pass through the outer loops
Only one tuple is allocated by product() and it is reused
Both starmap() and product() run at C speed (no pure python steps)
The function "f" is only looked up once.
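As a sketch of how this applies to the counting question above (the coefficient values here are made up for illustration):

from itertools import product, starmap

T, N, M, P, e = 2, 3, 5, 7, 30   # illustrative values

def is_solution(a, b, c, d):
    return T*a + N*b + M*c + P*d == e

count = sum(starmap(is_solution, product(range(1, e), repeat=4)))

Note that this is still a brute-force enumeration, so the algorithmic improvement in the previous answer will beat it for large e; the win here is purely in loop overhead.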

Related

Finding if the lcm of a list is in the set 3^d

I need to find a better way to raise the 'tes' variable in Python until I can get the printout to terminate (it runs rather slowly at tes = 700).
I think I need a better way of defining the 'shifts' list, and a way to reduce redundancy, since the problem is symmetric: lcm(shifts(n, m)) = lcm(shifts(m, n)).
import math as ma

N, M, k = 3, 3, 0
tes = 700
sides = []

def gcd(n, m):
    if m == 0:
        return n
    return gcd(m, n % m)

for N in range(tes + 1):
    for M in range(tes + 1):
        n, m, shifts = N, M, []
        while ma.floor(min(n, m) / 2) >= 1:
            shifts.append(2*n + 2*m - 4)
            n -= 2
            m -= 2
        # lcm of the shifts list
        lcm = 1
        for i in shifts:
            lcm = lcm * i // gcd(lcm, i)
        p = ma.log(lcm) / ma.log(3)
        # checking whether lcm is in {3^d} where d is a natural number
        if (p - int(p) == 0) and lcm > 1:
            print(shifts, N, M, lcm)
So far I have attempted to come up with a formula to make composing the list more efficient (so far the plain-Python manner seems to be working better). I was working with sympy, but that became too cumbersome: I tried sympy.products to find the lcm of the list directly, but I couldn't get the bounds correct.
- If I could find a way to avoid redundancies, since the lcm is symmetric: f(m, n) = f(n, m).
- I need a list comprehension for the shifts, if not a basic formula (a sketch of both of these ideas follows below).
- I've been looking into mathematical ways to prove whether the two sets (3^d and lcm(shifts)) intersect for some natural numbers, but I would need to transform lcm(shifts(n, m)) from the current programmatic/numerical method into an analytic function.
- Any resources are also helpful, because this is just a segment of the total project, and any further reading will surely help with the overall project in the future.
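This question has no posted answer here, but as a sketch of the bullet points above, here is one way to write the shifts as a list comprehension, halve the work using the symmetry, and test membership in {3^d} exactly instead of via floating-point logs (all names are illustrative):

import math
from functools import reduce

def shifts(n, m):
    # Same terms as the while loop: 2n + 2m - 4, shrinking both
    # sides by 2 until min(n, m) drops below 2.
    return [2*(n - 2*k) + 2*(m - 2*k) - 4 for k in range(min(n, m) // 2)]

def lcm_list(xs):
    return reduce(lambda a, b: a * b // math.gcd(a, b), xs, 1)

def is_power_of_3(x):
    # Exact integer test; avoids log-precision false positives.
    if x < 3:
        return False   # excludes 1, matching the lcm > 1 filter
    while x % 3 == 0:
        x //= 3
    return x == 1

tes = 700
for n in range(tes + 1):
    for m in range(n, tes + 1):   # symmetry: lcm(shifts(n, m)) == lcm(shifts(m, n))
        l = lcm_list(shifts(n, m))
        if is_power_of_3(l):
            print(shifts(n, m), n, m, l)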

Maximize the spread of all nurses in Scheduling

model.Maximize(
    sum(shift_requests[n][d] * shifts[(n, d)] for n in all_nurses
        for d in all_days))
I'm curious how I could change the above (which optimizes for shift requests) into something like the below, which would optimize for spread. I'm trying to actually spread out the assignments as much as possible. Thoughts?
model.Maximize(
    np.std(shifts[(n, d)] for n in all_nurses
           for d in all_days))
You can try the following:
You have a list of Booleans: b[i] means the minimal span between two working days is greater than or equal to i.
You need to ensure consistency between the b[i] variables: b[i] implies b[i-1].
If nurse n works on day d, and b[i] is true, then work[n, d + i - 1] must be false. Encode this as model.AddBoolOr([work[n, d].Not(), b[i].Not(), work[n, d + i - 1].Not()]) for all relevant n, d, i.
Then maximize the largest i such that b[i] is true. A crude version is just model.Maximize(sum(b)); maybe this can be improved. A sketch of the encoding follows below.
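Here is a minimal sketch of that encoding with the CP-SAT Python API; the problem sizes and the minimum-days constraint are made up so the model isn't trivially satisfied by assigning no work at all:

from ortools.sat.python import cp_model

num_nurses, num_days, max_span = 4, 14, 7   # illustrative sizes
model = cp_model.CpModel()

work = {(n, d): model.NewBoolVar(f'work_{n}_{d}')
        for n in range(num_nurses) for d in range(num_days)}

# Stand-in for the real scheduling constraints: each nurse works >= 3 days.
for n in range(num_nurses):
    model.Add(sum(work[n, d] for d in range(num_days)) >= 3)

# b[i]: every gap between two worked days is at least i; b[1] always holds.
b = {i: model.NewBoolVar(f'b_{i}') for i in range(1, max_span + 1)}
model.Add(b[1] == 1)
for i in range(2, max_span + 1):
    model.AddImplication(b[i], b[i - 1])   # b(i) => b(i-1)

# If nurse n works day d and b[i] holds, day d + i - 1 must be off.
for n in range(num_nurses):
    for d in range(num_days):
        for i in range(2, max_span + 1):
            if d + i - 1 < num_days:
                model.AddBoolOr([work[n, d].Not(), b[i].Not(),
                                 work[n, d + i - 1].Not()])

model.Maximize(sum(b.values()))   # the crude objective from the answer

solver = cp_model.CpSolver()
solver.Solve(model)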

How do I implement summation and array iteration correctly based on Pseudo code. PYTHON Relaxation Method

I am trying to implement a relaxation iterative solver for a project. The function we create should take two inputs, a matrix A and a vector b, and should return iterative vectors x that approximate the solution of Ax = b.
Pseudo Code from the book is here:
[image: the textbook's pseudocode for the SOR method]
I am new to Python so I am struggling quite a bit with implementing this method. Here is my code:
def SOR_1(A, b):
    k = 1
    n = len(A)
    xo = np.zeros_like(b)
    x = np.zeros_like(b)
    omega = 1.25
    while (k <= N):
        for i in range(n-1):
            x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
                   -np.sum(A[i][j]*xo[j] + b[i])]
            if (np.linalg.norm(x - xo) < 1e-9):
                print(x)
            k = k + 1.0
        for i in range(n-1):
            xo[i] = x[i]
    return x
My question is how do I implement the for loop and generating the arrays correctly based off of the Pseudo Code.
Welcome to Python!
Variables in Python are case-sensitive, so n is defined but N is not. If they are supposed to be different variables, I don't see where N gets its value.
You are off to a good start, but the following line is still pseudocode for the most part:
x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
-np.sum(A[i][j]*xo[j] + b[i])]
In the textbook's pseudocode, square brackets are used as a grouping symbol, but in Python they are reserved for creating and indexing lists (which is the closest thing Python has to arrays). Also, there is no implicit multiplication in Python, so you have to write things like (1 + 2)*(3*(4 + 5)) rather than (1 + 2)[3(4 + 5)].
The other major issue is that you don't define j. You probably need a for loop that would either look like:
for j in range(1, i):
or, if you want to do it inline:
sum(A[i][j]*x[j] for j in range(1, i))
Note that range has two arguments, where to start and what value to stop before so range(1, i) is equivalent to the summation from 1 to i - 1
I think you are struggling with that line because there's far too much going on in that line. See if you can figure out parts of it using separate variables or offload some of the work to separate functions.
Something like x[i] = a + b * c * d() - e(), but give a, b, c, d, and e meaningful names. You'd then have to correctly set each variable and define each function, but at least you'd be solving several separate problems rather than one huge complex one.
Also, make sure you have your tabs correct. k = k + 1.0 should not be inside that for loop, just inside the while loop.
Coding is an iterative process. First get the while loop working; don't try to do anything inside it except print a variable so you can see that it is working. Next get the for loop working inside the while loop (again, just printing the variables). Then get (1.0-omega)*xo[i] working. Along the way you'll discover and solve issues, such as the fact that (1.0-omega)*xo[i] evaluates to 0 because xo is a NumPy array initialized with all zeros.
You'd start with something like:
k = 1
N = 3
n = 3
xo = [1, 2, 3]
while (k <= N):
    for i in range(n):
        print(k, i)
        omega = 1.25
        print((1.0-omega)*xo[i])
    k += 1
And slowly work more and more of the relaxation solver in until you have everything working.
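For reference, here is a sketch of what the finished function might look like once the pseudocode is fully translated. This is one standard form of SOR; the defaults for omega, the tolerance, and the iteration cap are illustrative, not from the book:

import numpy as np

def sor(A, b, omega=1.25, tol=1e-9, max_iter=100):
    n = len(A)
    xo = np.zeros_like(b, dtype=float)   # previous iterate
    x = np.zeros_like(b, dtype=float)    # current iterate
    for k in range(max_iter):
        for i in range(n):
            s1 = sum(A[i][j] * x[j] for j in range(i))           # already-updated terms
            s2 = sum(A[i][j] * xo[j] for j in range(i + 1, n))   # not-yet-updated terms
            x[i] = (1.0 - omega)*xo[i] + (omega / A[i][i]) * (b[i] - s1 - s2)
        if np.linalg.norm(x - xo) < tol:
            return x
        xo = x.copy()
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(sor(A, b))   # approaches the exact solution [1/11, 7/11]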

Fast updating sum of squared residuals

I'd like to find a fast way to update a sum of squared residuals, when I know that only a small fraction of the terms are changing. Let me describe the problem in more detail.
I have N data points from noisy step-function data.
import numpy as np

N = 100000
realStepList = [200, 500, 900]
x = np.zeros(N)
for realStep in realStepList:
    x[realStep:] += 1
x += np.random.randn(len(x))*0.1  # add noise
I'd like to calculate the sum of squared residuals for this data and an arbitrary list of step locations. Here is how I do this.
a = [0, 250, 550, N]

def Q(x, a):
    q = np.sum([np.sum((x[ai:af] - i)**2) for i, (ai, af) in enumerate(zip(a[:-1], a[1:]))])
    return q
a is my list of potential steps. It's easier to use a list that always has 0 as the first element and N as the last element.
This is relatively slow, since it is a sum over N squares. However, I realized that if I change a by a relatively small amount, most of these N terms will remain unchanged, which means I don't have to compute them again.
So let's say I have already computed Q(x,a) as above. I now have another list
b = [aa + dd for aa, dd in zip(a, d)]
where d is the difference between the two lists. Rather than calculating Q(x,b) as above (another sum over N elements), I want to find
deltaQ(x, a, d) such that
Q(x, b) = Q(x,a) + deltaQ(x, a, d)
I have written such a function, but it is slow and sloppy. In fact, it is slower than Q!
def deltaQ(x, a, d):
    z = np.zeros(len(x))
    J = np.zeros(len(x))
    s = 0
    for j, [dd, aa] in enumerate(zip(d, a[1:-1])):
        if dd >= 0:
            z[aa:aa+dd] += 1
            s += sum(x[aa:aa+dd])
        if dd < 0:
            z[aa+dd:aa] += -1
            s += -sum(x[aa+dd:aa])
        J[aa:] += 1
    dq = 2*s - sum((J**2 - (J-z)**2))
    return dq
The idea is to identify all the points in x which will be affected. For example, if the original list was a = [0, 5, 10] and b = [0, 7, 10], then only the terms corresponding to x[5:7] will change in the sum. I keep track of this with the list z. I then calculate the change based on this.
I don't think I'm the first person in the world to have this problem. So my question is:
Is there a fast way to calculate the difference in the sum of squared residuals, since this will often be a sum over many fewer elements than recalculating the new sum from scratch?
First of all, I was able to run Q with the original code, only modifying N, to get the following timings on a fairly standard issue laptop (nothing too fancy):
N = 1e6: 0.00236s per loop
N = 1e7: 0.0260s per loop
N = 1e8: 0.251s per loop
The process went into swap at N = 1e9, but I would find a timing of 2.5 seconds quite acceptable for that size, assuming you had enough RAM available.
That being said, I was able to get a 10% speedup by changing the inner np.sum to np.ndarray.sum on the result of the call to np.power:
def Q1(x, a):
    return sum(((x[ai:af] - i)**2).sum() for i, (ai, af) in enumerate(zip(a[:-1], a[1:])))
Now here is a version that is three times slower:
def offset(x, a):
    d = np.zeros(x.shape, dtype=int)
    d[a[1:-1]] = 1
    # Adding out=d here makes this run 4 times slower
    return np.cumsum(d)

def Q2(x, a):
    return np.sum((x - offset(x, a))**2)
Why does this help? Well, notice what offset does: it readjusts x to the baseline that you chose. In the long run this does two things. First, you get a much more vectorized solution than the one you are currently proposing. Secondly, it allows you to rewrite your delta function in terms of the different b arrays that you chose instead of having to compute d, which may not even be possible if len(a) != len(b).
The delta for each element is (x - j)**2 - (x - i)**2. If you expand out all the mess, you get (j - i) * (j + i - 2*x), j and i being the values of the steps returned by offset. Not only does this simplify the computation greatly, but j - i is nonzero exactly at the positions where you need to compute the deltas:
def deltaQ1(x, a, b):
    i = offset(x, a)
    j = offset(x, b)
    d = j - i
    mask = d.astype(bool)
    return (d[mask] * (j[mask] + i[mask] - 2 * x[mask])).sum()
This function runs more than 10 to 15 times faster than your original implementation (but keep in mind that it takes a and b instead of a and d as inputs). Calling Q1(x, b) - Q1(x, a) is still twice as fast though. The new function also creates a bunch of temporary arrays, but these can be easily reduced in quantity.
Timings
Here are some sample timings on my computer, in addition to the ones shown above (using the data provided, with a = [0, 250, 550, N], b = [0, 180, 565, N], and therefore d = [0, -70, 15, 0] where relevant):
Raw residuals:
Q: 147µs per loop
Q1: 135µs per loop <-- Use this one!
Q2: 453µs per loop
Delta of residuals:
deltaQ: 8363µs per loop
deltaQ1: 656µs per loop
Q(x, b) - Q(x, a): 297µs per loop
Q1(x, b) - Q1(x, a): 275µs per loop <-- Best solution?
Final note: I have the distinct impression that your original implementation of the delta function is not correct. It does not agree with the result of Q(x, b) - Q(x, a), but deltaQ1(x, a, b) does.
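You can check this for yourself; the following sketch re-uses the definitions above with the same a and b as in the timings:

np.random.seed(0)   # illustrative; any data will do
N = 100000
x = np.zeros(N)
for realStep in [200, 500, 900]:
    x[realStep:] += 1
x += np.random.randn(len(x)) * 0.1

a = [0, 250, 550, N]
b = [0, 180, 565, N]
print(np.isclose(deltaQ1(x, a, b), Q1(x, b) - Q1(x, a)))   # True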
TL;DR
Please don't optimize prematurely. If you do it right, it is of course possible to write a specialized C function to hold i - j and i + j in memory for you which will work much faster, but I doubt you will get much mileage out of a vectorized pipeline. Part of the reason is that you will end up spending a lot of time figuring out how a complex set of indices intermeshes instead of just adding numbers together.

Low Autocorrelation Binary Sequence problem? Python troubleshooting

I'm trying to model this problem (for details on it, http://www.mpi-hd.mpg.de/personalhomes/bauke/LABS/index.php)
I've seen that the proven minimum energy for a sequence of length 10 is 13. However, my application gets 12 quite frequently, which implies some kind of error in my program. Is there an obvious error in the way I've modeled the summations in this code?
def evaluate(self):
    self.fitness = 10000000000  # horrible practice, I know..
    h = 0
    for g in range(1, len(self.chromosome) - 1):
        c = self.evaluateHelper(g)
        h += c**2
    self.fitness = h

def evaluateHelper(self, g):
    """
    Helper for evaluate function. The c sub g function.
    """
    totalSum = 0
    for i in range(len(self.chromosome) - g - 1):
        product = self.chromosome[i] * self.chromosome[(i + g) % (len(self.chromosome))]
        totalSum += product
    return totalSum
I can't spot any obvious bug offhand, but you're making things really complicated, so maybe a bug's lurking and hiding somewhere. What about:

def evaluateHelper(self, g):
    return sum(a*b for a, b in zip(self.chromosome, self.chromosome[g:]))

This should return the same values you're computing in that subtle loop (where I think the % len... part is provably redundant). Similarly, the evaluate method seems ripe for a similar one-liner. But, anyway...
There's a potential off-by-one issue: the formulas in the article you point to sum over g from 1 to N-1 inclusive, while you're using range(1, len(...) - 1), which excludes N-1. Could that be the root of the problem you observe?
Your bug was here:
for i in range(len(self.chromosome) - g - 1):
The maximum value for i will be len(self.chromosome) - g - 2, because range is exclusive. Thus, you don't consider the last pair. It's basically the same as your other bug, just in a different place.
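Folding both off-by-one fixes together, a sketch of the corrected methods (using the zip form from the first answer) might look like:

def evaluate(self):
    n = len(self.chromosome)
    # g runs from 1 to N-1 inclusive
    self.fitness = sum(self.evaluateHelper(g)**2 for g in range(1, n))

def evaluateHelper(self, g):
    # zip stops at the shorter sequence, so every (i, i+g) pair is included
    return sum(a * b for a, b in zip(self.chromosome, self.chromosome[g:]))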
