I am finally (!) switching from coding mostly in Fortran to Python. I have heard that Python enables efficient vectorization. I am wondering how this works. Say I want to do the following:
for each i
    skip the first 3 lines
    for each j
        calculate something
    end
    calculate average over all j
end
calculate average over all i
This is possible but laborious in Fortran. How can it be done efficiently in Python?
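calc1 = calc2 = calc3 = 0       # accumulators must exist before the loops
for k in range(i):              # i: number of outer iterations
    if k >= 3:                  # skip the first 3
        for z in range(j):      # j: number of inner iterations
            calc3 += (j/3)      # Replace (j/3) with "Something"
        calc2 += calc3
calc1 += calc2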
sum_i = []
for i in range(<from>, <to>, <step>):
    sum_j = []
    if i > 3:
        for j in range(<from>, <to>, <step>):
            sum_j.append(<something calculated>)
        average_j = sum(sum_j) / len(sum_j)
        sum_i.append(<the i related value you want the average from>)
average_i = sum(sum_i) / len(sum_i)
EDIT: This is not vectorized; it is more of a direct Python translation of the pseudocode given.
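For comparison, here is a minimal sketch of what a vectorized version could look like with NumPy. It assumes the per-(i, j) values fit in a 2D array with one row per i, and it reads "skip the first 3 lines" as dropping the first three rows; if the intent was to skip entries along j instead, the slice would move to the second axis. The function name mean_of_means and the stand-in formula i*j/3 are made up for illustration.

```python
import numpy as np

def mean_of_means(values, skip=3):
    """Average over j for each row i, then average those row means over i.

    values: 2D array-like, one row per i and one column per j.
    skip:   number of leading rows to ignore (assumed meaning of
            "skip the first 3 lines").
    """
    values = np.asarray(values, dtype=float)
    per_i = values[skip:].mean(axis=1)  # average over all j, one value per i
    return per_i.mean()                 # average over all i

# Build all (i, j) combinations at once by broadcasting instead of looping.
i = np.arange(10)[:, None]   # column vector, shape (10, 1)
j = np.arange(5)[None, :]    # row vector, shape (1, 5)
values = i * j / 3.0         # stand-in for "calculate something"
result = mean_of_means(values)
```

The two explicit loops disappear because the arithmetic is applied to whole arrays, and the averages become reductions along an axis.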
I am new to Python and coding, and I have a problem where I need to use nested for loops to solve the equation i^3 + j^3 + k^3 = N, where N ranges from 50 to 53 and i, j, k each range from -800 to 800.
I do not understand how to use a nested for loop for this. In every example of for loops I have seen, they are used for creating matrices or arrays.
The code I have tried outputs a very large number of False values.
for i in range(-800, 801):
    for j in range(-800, 801):
        for k in range(-800, 801):
            for N in range(50, 54):
                print(i**3+j**3+k**3==N, end=" ")
            print()
Am I on the right track here? I got a large number of False outputs, so does that mean it ran and is giving me every possible outcome? I just need to know the numbers that make the statement true.
The nested loops are not your problem; they are already correct.
You need to print the values of i, j, k, and N only if they satisfy the equation. For this kind of conditional execution, Python (like many other programming languages) has if statements.
You can learn about them in the official Python tutorial or in any good beginner's book about programming in Python, like Think Python or Automate the Boring Stuff with Python.
In your case, you can apply an if statement like this:
Instead of unconditionally printing
print(i**3+j**3+k**3==N,end=" ")
use
if i**3+j**3+k**3==N:
    print(i, j, k, N)
I think what you want to do is check if
i^3+j^3+k^3==N
and only print i, j and k if the statement is true.
You can do that by adding:
for i in range(-800, 801):
    for j in range(-800, 801):
        for k in range(-800, 801):
            for N in range(50, 54):
                if i**3+j**3+k**3 == N:
                    print(i, j, k, N)
output:
-796 602 659 51
-796 659 602 51
This will give you the number of times the equation is correct:
count = 0
for i in range(-800, 801):
    for j in range(-800, 801):
        for k in range(-800, 801):
            for N in range(50, 54):
                if (i**3 + j**3 + k**3) == N:
                    count += 1
print(count)
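The four nested loops above test roughly 1601^3 * 4 combinations, which takes a very long time in pure Python. A possible shortcut, sketched below, is to loop only over i and j and look up the k that would be needed in a dictionary of precomputed cubes. The function name cube_sum_solutions and its parameters are made up for this illustration; it should find the same solutions as the brute-force loops, only in a different order.

```python
def cube_sum_solutions(limit, targets):
    """Find all (i, j, k, N) with i^3 + j^3 + k^3 == N.

    i, j and k run over -limit..limit; N runs over targets.
    """
    # Map each possible cube back to its (unique) integer root.
    cube_root = {k**3: k for k in range(-limit, limit + 1)}
    solutions = []
    for i in range(-limit, limit + 1):
        for j in range(-limit, limit + 1):
            partial = i**3 + j**3
            for n in targets:
                # k must satisfy k^3 == n - partial; look it up directly.
                k = cube_root.get(n - partial)
                if k is not None:
                    solutions.append((i, j, k, n))
    return solutions

# Small demonstration range; pass limit=800 for the range in the question.
solutions = cube_sum_solutions(10, range(50, 54))
```

This replaces the innermost loop over k with a dictionary lookup, so the work grows with the square of the range rather than the cube.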
I'm making a script that does some mathematical morphology on images (mainly GIS rasters). I've implemented erosion and dilation, with opening/closing by reconstruction still on the TODO list, but that's not the subject here.
My implementation is very simple, with nested loops. I tried it on a 10900x10900 raster and, unsurprisingly, it took an absurdly long time to finish.
Before I continue with other operations, I'd like to know if there's a faster way to do this.
My implementation:
import numpy as np

def erode(image, S):
    (m, n) = image.shape
    buffer = np.full((m, n), 0).astype(np.float64)
    for i in range(S, m - S):
        for j in range(S, n - S):
            # minimum over the (2S+1) x (2S+1) window; dilation is just np.max()
            buffer[i, j] = np.min(image[i - S: i + S + 1, j - S: j + S + 1])
    return buffer
I've heard about vectorization but I'm not quite sure I understand it too well. Any advice or pointers are appreciated. Also I am aware that opencv has these morphological operations, but I want to implement my own to learn about them.
The question here is whether you want a more efficient implementation because you want to learn about numpy, or whether you want a more efficient algorithm.
I think there are two obvious things that could be improved in your approach. One is that you want to avoid looping at the Python level, because that is slow. The other is that you're taking the minimum (or, for dilation, the maximum) of overlapping parts of the array, and you can make it more efficient by reusing the effort you put into finding the previous one.
I will illustrate that with 1d implementations of erosion.
Baseline for comparison
Here is basically your implementation, just as a 1d version:
def erode(image, S):
    n = image.shape[0]
    buffer = np.full(n, 0).astype(np.float64)
    for i in range(S, n - S):
        buffer[i] = np.min(image[i - S: i + S + 1])  # dilation is just np.max()
    return buffer
You can make this faster using sliding_window_view from numpy.lib.stride_tricks, i.e. by avoiding the Python loop and doing the work at the numpy level.
Faster Implementation
np.lib.stride_tricks.sliding_window_view(arr,2*S+1).min(1)
Notice that it's not quite doing the same thing, since it only produces a value where a full window of 2S+1 elements is available, so the output is shorter than the input and has no zero border. For this illustration I will ignore that difference.
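For the 2D case from the question, the same idea might look like the sketch below. It assumes NumPy 1.20 or newer (where sliding_window_view is available) and keeps the zero border of the original erode; the name erode_2d is made up here. It still does (2S+1)^2 comparisons per pixel, so it only removes the Python-level loops, not the redundant work.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode_2d(image, S):
    """Erosion with a (2S+1) x (2S+1) square window, zero border as in erode."""
    image = np.asarray(image, dtype=np.float64)
    m, n = image.shape
    out = np.zeros_like(image)
    size = 2 * S + 1
    # View of shape (m - 2S, n - 2S, size, size); no pixel data is copied.
    windows = sliding_window_view(image, (size, size))
    # Reduce over the two window axes; use .max for dilation instead.
    out[S:m - S, S:n - S] = windows.min(axis=(2, 3))
    return out
```

Swapping min for max gives the corresponding dilation.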
Faster Algorithm
A completely different approach would be to not calculate the min from scratch for every window, but to keep the window's values ordered and only add one value and remove one value when moving the window one step to the right.
Here is a rough implementation of that:
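from sortedcontainers import SortedDict  # assumed source of SortedDict (third-party package)

def smart_erode(arr, m):
    n = arr.shape[0]
    sd = SortedDict()  # maps value -> how often it occurs in the current window
    for new in arr[:m]:
        if new in sd:
            sd[new] += 1
        else:
            sd[new] = 1
    for to_remove, new in zip(arr[:-m+1], arr[m:]):
        yield sd.keys()[0]  # smallest key is the window minimum
        if new in sd:
            sd[new] += 1
        else:
            sd[new] = 1
        if sd[to_remove] > 1:
            sd[to_remove] -= 1
        else:
            sd.pop(to_remove)
    yield sd.keys()[0]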
Notice that an ordered set wouldn't work, and an ordered list would need a way to remove just one element with a specific value, since you could have repeated values in your array. I am using an ordered dict to store how many items are present for each value.
A Rough Benchmark
I want to illustrate how the 3 implementations compare for different window sizes. So I am testing them with an array of 10^5 random integers for different window sizes ranging from 10^3 to 10^4.
# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
import numpy as np
import pandas as pd

arr = np.random.randint(0,10**5,10**5)
sliding_window_times = []
op_times = []
better_alg_times = []
for m in np.linspace(0,10**4,11)[1:].astype('int'):
    x = %timeit -o -n 1 -r 1 np.lib.stride_tricks.sliding_window_view(arr,2*m+1).min(1)
    sliding_window_times.append(x.best)
    x = %timeit -o -n 1 -r 1 erode(arr,m)
    op_times.append(x.best)
    x = %timeit -o -n 1 -r 1 tuple(smart_erode(arr,2*m+1))
    better_alg_times.append(x.best)
    print("")
pd.DataFrame({"Baseline Comparison":op_times,
              'Faster Implementation':sliding_window_times,
              'Faster Algorithm':better_alg_times,
              },
             index = np.linspace(0,10**4,11)[1:].astype('int')
             ).plot.bar()
Notice that for very small window sizes the raw power of the numpy implementation wins out but very quickly the amount of work we are saving by not calculating the min from scratch is more important.
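If installing an extra package for a sorted container is not an option, a commonly used alternative for sliding minima is a monotonic deque, which needs only the standard library. The sketch below is not part of the benchmark above and the name sliding_min is made up; like the sliding-window version, it yields one value per full window.

```python
from collections import deque

def sliding_min(values, window):
    """Yield the minimum of each full window of the given length."""
    # Indices of candidate minima; their values increase from front to back.
    candidates = deque()
    for i, v in enumerate(values):
        # Older entries that are not smaller than v can never be a minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            yield values[candidates[0]]
```

Each index is appended and removed at most once, so the total work is linear in the array length regardless of the window size.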
I would like to calculate the Teager Energy Kurtosis in a function in Python 3.8. I think this should also work with list comprehension.
I tried it with the following code, but I get an error message that the numpy object is not iterable. The variable data contains a list with measured values from an accelerometer.
def EO(data):
    numerator = pow(len(data),2)*sum((pow(((pow(data[i+1],2) - pow(data[i],2))-(sum(pow(data[i+1],2) - pow(data[i],2))/len(data))),4)) for i in range(len(data)-1))
    denominator = pow(sum(pow(((pow(data[i+1],2) - pow(data[i],2))-(sum(pow(data[i+1],2) - pow(data[i],2))/len(data))),2) for i in range(len(data)-1)),2)
    energy_operator = numerator/denominator
    return energy_operator
What is the general approach for implementing formulas like this, where you have to iterate multiple times, also with regard to efficiency? The dataset from which the values are to be calculated contains 133329 entries.
I guess the main problem is that the sum in the denominator contains another sum, which has to be computed first. How can that be done? Without a list comprehension I would iterate through the whole dataset twice with a for loop: first to get the average value, and then to calculate the rest with it in the second iteration. The readability of this is then of course gone.
Any suggestions are welcome!
Cheers,
Gerrit
EDIT:
This is the working code without using list comprehension:
def EO_5(data):
    summe = 0
    num_sum = 0
    den_sum = 0
    for i in range(1,len(data)-1):
        summe += pow(data[i],2)-((data[i-1])*(data[i+1]))
    ave = summe/len(data)
    for i in range(1,len(data)-1):
        num_sum += pow((pow(data[i],2)-((data[i-1])*(data[i+1])))-ave,4)
        den_sum += pow((pow(data[i],2)-((data[i-1])*(data[i+1])))-ave,2)
    numerator = (len(data)-1)*num_sum
    denominator = pow(den_sum,2)
    return numerator/denominator
sum(pow(data[i+1],2) - pow(data[i],2))
I think that's the (or at least a) problem. The argument to sum here is just a single number, when it should be something list-like (an iterable).
The other problem is that this is badly over-golfed. Lines that run on that long are frowned upon.
The third problem is that the math expressed in the two code blocks you've shared doesn't seem to match. The first, which isn't working, seems to follow the image you linked more closely, but I don't know if that means it's "correct". Do you have a better reference for "Teager Energy Kurtosis"?
I haven't tested this in any way, but it's pretty much how I'd simplify the code you said is working.
def EO_5(data):
    n = len(data) - 1
    # Teager energy term x[i]^2 - x[i-1]*x[i+1] for every interior sample
    deltas = tuple(
        pow(x, 2) - (before * after)
        for (before, x, after)
        in zip(data[:-2], data[1:-1], data[2:])
    )
    ave = sum(deltas) / len(data)
    num_sum = sum(pow(d - ave, 4) for d in deltas)
    den_sum = sum(pow(d - ave, 2) for d in deltas)
    numerator = n * num_sum
    denominator = pow(den_sum, 2)
    return numerator / denominator
If you're having problems with performance, you may be able to get numpy to leverage vector operations to make this even more streamlined, but I have limited experience with that.
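As a rough idea of what the numpy route could look like, here is a sketch that mirrors the working loop version term by term, including its normalization (the average is divided by len(data), the numerator is scaled by len(data) - 1). Whether that normalization matches the reference definition of the Teager Energy Kurtosis is not checked here, and the function name teager_kurtosis_np is made up.

```python
import numpy as np

def teager_kurtosis_np(data):
    """Vectorized counterpart of the loop-based EO_5 (same normalization)."""
    x = np.asarray(data, dtype=float)
    # Teager energy term x[i]^2 - x[i-1]*x[i+1] for all interior samples at once.
    psi = x[1:-1] ** 2 - x[:-2] * x[2:]
    ave = psi.sum() / len(x)          # divided by len(data), as in EO_5
    centered = psi - ave
    numerator = (len(x) - 1) * np.sum(centered ** 4)
    denominator = np.sum(centered ** 2) ** 2
    return numerator / denominator
```

The slices x[:-2], x[1:-1] and x[2:] play the role of data[i-1], data[i] and data[i+1], so no explicit loop over i is needed.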
Here it is.
def EO_5(data):
    ave = (sum([i**2 for i in data[1:-1]])-sum([i*j for i,j in zip(data[:-2],data[2:])]))/len(data)
    num = (sum([(j**2-i*k-ave)**4 for i,j,k in zip(data[:-2],data[1:-1],data[2:])]))*(len(data)-1)
    den = (sum([(j**2-i*k-ave)**2 for i,j,k in zip(data[:-2],data[1:-1],data[2:])]))**2
    return num/den
I am trying to implement a relaxation iterative solver for a project. The function should take two inputs, a matrix A and a vector b, and return the iterates x that approximate the solution of Ax = b.
The pseudocode from the book was attached as an image (not reproduced here).
I am new to Python so I am struggling quite a bit with implementing this method. Here is my code:
def SOR_1(A,b):
    k=1
    n = len(A)
    xo = np.zeros_like(b)
    x = np.zeros_like(b)
    omega = 1.25
    while (k <= N):
        for i in range(n-1):
            x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
                                      -np.sum(A[i][j]*xo[j] + b[i])]
            if ( np.linalg.norm(x - xo) < 1e-9):
                print (x)
            k = k + 1.0
            for i in range(n-1):
                xo[i] = x[i]
    return x
My question is how do I implement the for loop and generating the arrays correctly based off of the Pseudo Code.
Welcome to Python!
Variables in Python are case sensitive, so n is defined but N is not. If they are supposed to be different variables, I don't see where you set the value of N.
You are off to a good start, but the following line is still pseudocode for the most part:
x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
-np.sum(A[i][j]*xo[j] + b[i])]
In the textbook's pseudocode, square brackets are used as a grouping symbol, but in Python they are reserved for creating and indexing lists (which is what Python calls arrays). Also, there is no implicit multiplication in Python, so you have to write things like (1 + 2)*(3*(4+5)) rather than (1 + 2)[3(4+5)].
The other major issue is that you don't define j. You probably need a for loop that would either look like:
for j in range(1, i):
or if you want to do it inline
sum(A[i][j]*x[j] for j in range(1, i))
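Note that range takes a start value and a stop value, and it stops before the stop value, so range(1, i) covers 1 to i - 1. Keep in mind that Python indexes from 0: if i and j are 0-based indices into A and x, the textbook's sum over j = 1, ..., i - 1 corresponds to range(0, i).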
I think you are struggling with that line because there's far too much going on in it. See if you can work out parts of it using separate variables, or offload some of the work to separate functions.
Something like x[i] = a + b * c * d() - e(), but give a, b, c, d and e meaningful names. You'd then have to set each variable and define each function correctly, but at least you'd be solving separate problems rather than one huge, complex one.
Also, make sure your indentation is correct. k = k + 1.0 should not be inside that for loop, just inside the while loop.
Coding is an iterative process. First get the while loop working. Don't try to do anything in it (except print out the variable so you can see that it is working). Next get the for loop working inside the while loop (again, just printing the variables). Next get (1.0-omega)*xo[i] working. Along the way, you'll discover and solve issues such as (1.0-omega)*xo[i] evaluating to 0 because xo is a NumPy array initialized with all zeros.
You'd start with something like:
k = 1
N = 3
n = 3
xo = [1, 2, 3]
while (k <= N):
    for i in range(n):
        print(k, i)
        omega = 1.25
        print((1.0-omega)*xo[i])
    k += 1
And slowly work more and more of the relaxation solver in until you have everything working.
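For reference once the pieces are in place, here is a sketch of where this might end up, using the standard successive over-relaxation update x[i] = (1 - omega)*xo[i] + (omega/A[i][i]) * (b[i] - sum over j < i of A[i][j]*x[j] - sum over j > i of A[i][j]*xo[j]). Since the textbook's pseudocode is not shown here, treat this as an assumption about what it specifies; the name sor, the default max_iter and the float conversion are choices made for this example.

```python
import numpy as np

def sor(A, b, omega=1.25, tol=1e-9, max_iter=200):
    """Approximate the solution of Ax = b by successive over-relaxation."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)   # float, so updates are not truncated to ints
    n = len(b)
    x_old = np.zeros(n)
    for _ in range(max_iter):
        x = x_old.copy()
        for i in range(n):
            lower = A[i, :i] @ x[:i]              # already updated entries (j < i)
            upper = A[i, i + 1:] @ x_old[i + 1:]  # entries from the previous sweep (j > i)
            x[i] = (1.0 - omega) * x_old[i] + omega * (b[i] - lower - upper) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            return x                              # converged
        x_old = x
    return x                                      # best estimate after max_iter sweeps

# Example system whose exact solution is (3, 4, -5).
x = sor([[4, 3, 0], [3, 4, -1], [0, -1, 4]], [24, 30, -24])
```

The slices A[i, :i] and A[i, i + 1:] replace the explicit loops over j, and the convergence check sits inside the outer loop but outside the loop over i.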
I am just getting started with competitive programming, and after writing the solution to a certain problem I got the error "RUNTIME exceeded". The task is to compute
max(|a[i] - a[j]| + |i - j|)
where a is a list of elements and i, j are indices; I need to get the max() of the above expression.
Here is a short but complete code snippet.
t = int(input())  # Number of test cases
for i in range(t):
    n = int(input())  # size of list
    a = list(map(int, str(input()).split()))  # getting space separated input
    res = []
    for s in range(n):  # These two loops are increasing the run-time
        for d in range(n):
            res.append(abs(a[s] - a[d]) + abs(s - d))
    print(max(res))
1<=t<=100
1<=n<=10^5
0<=a[i]<=10^5
The run time on the leaderboard is 5 s for C and 35 s for Python, while this code takes 80 s.
It is an online judge, so the timing is independent of my machine. numpy is not available.
Please keep it simple, I am new to Python.
Thanks for reading.
For a given j<=i, |a[i]-a[j]|+|i-j| = max(a[i]-a[j]+i-j, a[j]-a[i]+i-j).
Thus for a given i, the value of j<=i that maximizes |a[i]-a[j]|+|i-j| is either the j that maximizes a[j]-j or the j that minimizes a[j]+j.
Both these values can be computed as you run along the array, giving a simple O(n) algorithm:
def maxdiff(xs):
    mp = mn = xs[0]
    best = 0
    for i, x in enumerate(xs):
        mp = max(mp, x-i)
        mn = min(mn, x+i)
        best = max(best, x+i-mn, -x+i+mp)
    return best
And here's some simple testing against a naive but obviously correct algorithm:
def maxdiff_naive(xs):
    best = 0
    for i in range(len(xs)):
        for j in range(i+1):
            best = max(best, abs(xs[i]-xs[j]) + abs(i-j))
    return best

import random
for _ in range(500):
    r = [random.randrange(1000) for _ in range(50)]
    md1 = maxdiff(r)
    md2 = maxdiff_naive(r)
    if md1 != md2:
        print("%d != %d\n%s" % (md1, md2, r))
        break  # stop at the first mismatch
It takes a fraction of a second to run maxdiff on an array of size 10^5, which is significantly better than your reported leaderboard scores.
"Competitive programming" is not about saving a few milliseconds by using a different kind of loop; it's about being smart about how you approach a problem, and then implementing the solution efficiently.
Still, one thing that jumps out is that you are wasting time building a list only to scan it to find the max. Your double loop can be transformed to the following (ignoring other possible improvements):
print(max(abs(a[s] - a[d]) + abs(s - d) for s in range(n) for d in range(n)))
But that's small fry. Worry about your algorithm first, and only then turn to even obvious time-wasters like this. You can cut the number of comparisons in half, as @Brett showed you, but I would first study the problem and ask myself: Do I really need to calculate this quantity n^2 times, or even 0.5*n^2 times? That's how you get the times down, not by shaving off milliseconds.
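To make that concrete, here is one way the quantity can be computed without visiting pairs at all. It relies on the identity |x| + |y| = max(|x + y|, |x - y|): with x = a[i] - a[j] and y = i - j, the expression becomes the larger of |(a[i] + i) - (a[j] + j)| and |(a[i] - i) - (a[j] - j)|, so only the extremes of a[i] + i and a[i] - i matter. The function name max_abs_expression is made up for this sketch, and it uses only built-ins since numpy is not available on the judge.

```python
def max_abs_expression(a):
    """Return max over all i, j of |a[i] - a[j]| + |i - j| in O(n)."""
    plus = [x + i for i, x in enumerate(a)]    # a[i] + i
    minus = [x - i for i, x in enumerate(a)]   # a[i] - i
    # The best pair is the spread of one of the two transformed lists.
    return max(max(plus) - min(plus), max(minus) - min(minus))
```

In the original program, the two nested loops and the res list would be replaced by a single call such as print(max_abs_expression(a)) per test case.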