Fortran equivalent of Python numpy.minimum

I am trying to rewrite some Python code in Fortran, specifically the line
separation[a, :] = sum(np.minimum(1 - distances, distances) ** 2)
The important part is the use of np.minimum to take the element-wise minimum of two multi-dimensional arrays. distances is a (3, N) array of N coordinates (x, y, z). I can't find a similar function in Fortran, so I wrote my own:
do b = 1, N
    temp = 0.0
    do c = 1, 3
        if ((1 - distances(c, b)) .LE. distances(c, b)) then
            temp = temp + (1 - distances(c, b)) ** 2
        else
            temp = temp + (distances(c, b)) ** 2
        end if
    end do
    separation(a, b) = temp
end do
Unsurprisingly, this code is very slow. I am not very experienced in Fortran yet, so any recommendations for improving this code or for an alternative method would be greatly appreciated.
I thought perhaps a where statement might help, as the following code in Python works:
separation[a, :] = sum(np.where(1 - distances <= distances, (1 - distances), distances) ** 2)
But while Fortran has where statements, they seem to work rather differently from Python's, and they don't seem to be much use here.

It is nearly the same: most Fortran intrinsics operate element-wise on arrays (assuming you have at least Fortran 95).
real a(2,4), b(4)
a = reshape([.1, .2, .3, .4, .5, .6, .7, .8], shape(a))
b = sum(min(1-a, a)**2, 1)
write(*,'(4f5.2)') b
end
0.05 0.25 0.41 0.13
Note that Fortran's sum would by default sum the whole array; the second argument (dim = 1) restricts the sum to the first dimension.
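For a quick cross-check (my addition, not part of the answer), the same computation in NumPy, reshaping with order='F' to match Fortran's column-major layout, gives the same four values:

import numpy as np

a = np.reshape([.1, .2, .3, .4, .5, .6, .7, .8], (2, 4), order='F')  # column-major, like Fortran
b = np.sum(np.minimum(1 - a, a) ** 2, axis=0)  # sum over the first dimension
print(b)  # -> [0.05 0.25 0.41 0.13] (up to rounding)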

Related

How do I implement summation and array iteration correctly based on pseudocode? (Python relaxation method)

I am trying to implement a relaxation iterative solver for a project. The function we create should take two inputs, a matrix A and a vector b, and should return iterative vectors x that approximate the solution of Ax = b.
The pseudocode from the book was provided as an image (not reproduced here).
I am new to Python so I am struggling quite a bit with implementing this method. Here is my code:
def SOR_1(A,b):
    k=1
    n = len(A)
    xo = np.zeros_like(b)
    x = np.zeros_like(b)
    omega = 1.25
    while (k <= N):
        for i in range(n-1):
            x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
                   -np.sum(A[i][j]*xo[j] + b[i])]
            if ( np.linalg.norm(x - xo) < 1e-9):
                print (x)
            k = k + 1.0
        for i in range(n-1):
            xo[i] = x[i]
    return x
My question is: how do I implement the for loops and generate the arrays correctly based on the pseudocode?
Welcome to Python!
Variable names in Python are case-sensitive, so n is defined but N is not. If they are supposed to be different variables, I don't see where N gets its value.
You are off to a good start, but the following line is still pseudocode for the most part:
x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
-np.sum(A[i][j]*xo[j] + b[i])]
In the textbook's pseudocode, square brackets are used as a grouping symbol, but in Python they are reserved for creating and indexing lists (Python's closest built-in analogue of arrays). Also, there is no implicit multiplication in Python, so you have to write things like (1 + 2)*(3*(4 + 5)) rather than (1 + 2)[3(4 + 5)].
The other major issue is that you don't define j. You probably need a for loop that would either look like:
for j in range(1, i):
or if you want to do it inline
sum(A[i][j]*x[j] for j in range(1, i))
Note that range takes two arguments, where to start and what value to stop before, so range(1, i) is equivalent to the summation from 1 to i - 1.
I think you are struggling with that line because there's far too much going on in that line. See if you can figure out parts of it using separate variables or offload some of the work to separate functions.
Something like x[i] = a + b*c*d() - e(), but give a, b, c, d and e meaningful names. You'd then have to set each variable and define each function correctly, but at least you'd be solving separate small problems rather than one huge complex one.
Also, make sure you have your tabs correct. k = k + 1.0 should not be inside that for loop, just inside the while loop.
Coding is an iterative process. First get the while loop working. Don't try to do anything in it (except print out a variable so you can see that it is working). Next get the for loop working inside the while loop (again, just printing the variables). Next get (1.0-omega)*xo[i] working. Along the way, you'll discover and solve issues, such as the fact that (1.0-omega)*xo[i] will evaluate to 0 because xo is a NumPy array initialized with all zeros.
You'd start with something like:
k = 1
N = 3
n = 3
xo = [1, 2, 3]
while (k <= N):
    for i in range(n):
        print(k, i)
        omega = 1.25
        print((1.0-omega)*xo[i])
    k += 1
And slowly work more and more of the relaxation solver in until you have everything working.
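Once those pieces work, a minimal sketch of the assembled SOR iteration could look like the following. This is only an illustration of the structure, not the book's exact pseudocode; it assumes A is a NumPy 2-D array, and the names tol and max_iter (and the convergence test) are my additions.

import numpy as np

def sor(A, b, omega=1.25, tol=1e-9, max_iter=100):
    # Successive over-relaxation sketch: already-updated entries (j < i) use the
    # new x, not-yet-updated entries (j > i) use the previous iterate x_old.
    n = len(b)
    x = np.zeros_like(b, dtype=float)
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s1 = np.dot(A[i, :i], x[:i])
            s2 = np.dot(A[i, i+1:], x_old[i+1:])
            x[i] = (1.0 - omega) * x_old[i] + (omega / A[i, i]) * (b[i] - s1 - s2)
        if np.linalg.norm(x - x_old) < tol:
            break
    return x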

Speeding up for loops, maybe using a generator?

I'm fairly new to Python, and I have a problem where I am trying to count how many solutions there are to an equation such as T*a + N*b + M*c + P*d = e, where e is input by the user. I don't care what the solutions are, just how many there are.
a, b, c and d are variable positive integers, and T, N, M and P are fixed integers.
I know it's a rookie approach, but I tried 4 nested for loops and it took far too long, so I abandoned that; I couldn't think of a more elegant way. Even when I restricted the numbers the loops were allowed to try, the computation still took too long.
I have read that generators can take vastly less time, but I am unsure how to use them properly. I managed to get the time down to a minute or two, but I want it quicker, using a function with yield in it.
Something like the following (not exactly this, but to this effect); and yes, I know nested loops are unfavourable, but I'm a novice and trying to learn:
def function():
    count = 0
    for a in range(0, e):
        for b in range(0, int(e/N)):
            # another for loop for c
            # another for loop for d
            count += 1
    yield count
And outputting that, it gave me quicker results but not quick enough.
Or am I thinking about this in entirely the wrong way?
Thanks
This is a class of problem where a better algorithm will yield far greater performance gains than any change to how the existing code is written.
So the problem you have is, given T, N, M, P, and e find how many solutions there are.
Now something like yield would work for the case where you need all the solutions... getting all the solutions is going to involve enumerating all the solutions, which is going to be 4 nested loops... yield could help for that case.
Counting solutions allows us to find tricks to reduce how much we need to walk...
So let's start with the outermost loop
for a in range(1, ?)
How high can the range go? Well, we know that for a solution to be valid all of a, b, c, and d must be positive, i.e. >= 1, so a is largest when T*a + N*1 + M*1 + P*1 == e... hence the upper bound for a is int((e - N - M - P) / T).
for a in range(1, int((e - N - M - P) / T))
    for b in range(1, ?)
How high can the range for b go? Well we know that we have T*a already...
for a in range(1, int((e - N - M - P) / T))
    for b in range(1, int((e - T*a - M - P) / N))
        for c in range(1, ?)
How high can the range for c go? Same principle...
for a in range(1, int((e - N - M - P) / T))
    for b in range(1, int((e - T*a - M - P) / N))
        for c in range(1, int((e - T*a - N*b - P) / M))
            ?
Now at this point you might be tempted to do another for loop... but here is where we need to be smart, avoid the last loop if we can... because the upper limit of the range is actually the count of valid solutions!
count = 0
for a in range(1, int((e - N - M - P) / T))
    for b in range(1, int((e - T*a - M - P) / N))
        for c in range(1, int((e - T*a - N*b - P) / M))
            count = count + int((e - T*a - N*b - M*c) / P)
That is a superior algorithm, as it has fewer loops and will consequently return faster...
Oh, but there is more... if you know the mathematics (and if I recall correctly) you can certainly remove another loop, if not all of them... but this is where you need to actually know the mathematics rather than just brute-forcing the solution.
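To make that concrete, here is a minimal runnable sketch of the same counting idea (my adaptation, not the answer's exact code). It assumes T, N, M, P are positive and that the target is the exact equality T*a + N*b + M*c + P*d == e, so the range stops each get + 1 (range excludes its stop value) and the innermost loop becomes a divisibility check on the remainder.

def count_solutions(T, N, M, P, e):
    # Three loops instead of four: for each (a, b, c), d exists exactly when the
    # remainder e - T*a - N*b - M*c is a positive multiple of P.
    count = 0
    for a in range(1, (e - N - M - P) // T + 1):
        for b in range(1, (e - T*a - M - P) // N + 1):
            for c in range(1, (e - T*a - N*b - P) // M + 1):
                rem = e - T*a - N*b - M*c
                if rem >= P and rem % P == 0:
                    count += 1
    return count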
The usual way to speed up nested for-loops is to use itertools.product() to generate all combinations of parameter values and itertools.starmap() to apply the parameters to a function.
Instead of:
for a in range(5):
    for b in range(8):
        for c in range(10, 17):
            for d in range(5, 11):
                v = f(a, b, c, d)
                ...
Write this instead:
for v in starmap(f, product(range(5), range(8), range(10, 17), range(5, 11))):
    ...
The benefits are:
More concise functional style
The integer values are created just once
You don't constantly rebuild the same integer lists
Only one tuple is allocated by product() and it is reused
Both starmap() and product() run at C speed (no pure python steps)
The function "f" is only looked up once.

How to solve a linear system using Three Column Representation output vector as input data for Cholesky?

I am trying to learn some methods for solving linear systems with Python, and I have implemented a few of them. Now I would like to test them with large, sparse matrices. To do that, I began learning about the Three Column Representation, because I noticed that I am expected to compress my sparse matrix before feeding it to my method. Three Column Representation seems simple, but I can't figure out how to use its output as the input of my Cholesky method (for example). How do I use its output (a three-column array with values and row/column references) as input to my method? Do I need to rewrite my Cholesky method?
Here is my Cholesky method: https://raw.githubusercontent.com/angellicacardozo/linear-algebra-methods/master/P03CHOLE.py
Thank you
Maybe this can help you:
For i = 1 To n
    For j = 1 To i
        Sum = a(i, j)
        For k = 1 To j - 1
            Sum = Sum - a(i, k) * a(j, k)
        Next k
        If i > j Then
            a(i, j) = Sum / a(j, j)
        ElseIf Sum > 0 Then
            a(i, i) = Sqrt(Sum)
        Else
            ERROR  ' the matrix is not positive definite
        End If
    Next j
Next i
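For what it's worth, a rough Python translation of that pseudocode might look like this (an illustrative sketch assuming a dense, symmetric positive-definite NumPy array; it overwrites the lower triangle of a with the Cholesky factor L and leaves the upper triangle untouched):

import numpy as np

def cholesky_inplace(a):
    # Row-by-row Cholesky factorization, following the pseudocode above.
    n = a.shape[0]
    for i in range(n):
        for j in range(i + 1):
            s = a[i, j] - np.dot(a[i, :j], a[j, :j])
            if i > j:
                a[i, j] = s / a[j, j]      # off-diagonal entry L[i, j]
            elif s > 0:
                a[i, i] = np.sqrt(s)       # diagonal entry L[i, i]
            else:
                raise ValueError("matrix is not positive definite")
    return a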

NumPy tensordot MemoryError

I have two matrices -- A is 3033x3033, and X is 3033x20. I am running the following lines (as suggested in the answer to another question I asked):
n, d = X.shape
c = X.reshape(n, -1, d) - X.reshape(-1, n, d)
return np.tensordot(A.reshape(n, n, -1) * c, c, axes=[(0,1),(0,1)])
On the final line, Python simply stops and says "MemoryError". How can I get around this, either by changing some setting in Python or performing this operation in a more memory-efficient way?
Here is a function that does the calculation without any for loops and without any large temporary array. See the related question for a longer answer, complete with a test script.
def fbest(A, X):
    """Computes the result without any large temporary arrays."""
    KA_best = np.tensordot(A.sum(1)[:, None] * X, X, axes=[(0,), (0,)])
    KA_best += np.tensordot(A.sum(0)[:, None] * X, X, axes=[(0,), (0,)])
    KA_best -= np.tensordot(np.dot(A, X), X, axes=[(0,), (0,)])
    KA_best -= np.tensordot(X, np.dot(A, X), axes=[(0,), (0,)])
    return KA_best
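As a quick sanity check (my addition, not part of the answer), fbest can be compared against a direct evaluation of the original expression on small random inputs:

import numpy as np

# Small random test: the memory-friendly fbest should match the direct formula.
n, d = 50, 4
A = np.random.rand(n, n)
X = np.random.rand(n, d)

c = X.reshape(n, 1, d) - X.reshape(1, n, d)
reference = np.einsum('ij,ijk,ijl->kl', A, c, c)

assert np.allclose(fbest(A, X), reference)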
I profiled the code with arrays of your sizes.
I love sp.einsum by the way. It is a great place to start when speeding up array operations by removing for loops. You can do SOOOO much with one call to sp.einsum.
The advantage of np.tensordot is that it links to whatever fast numerical library you have installed (e.g. MKL). So, tensordot will run faster and in parallel when you have the right libraries installed.
If you replace the final line with
return np.einsum('ij,ijk,ijl->kl',A,c,c)
you avoid creating the A.reshape(n, n, -1) * c (3033 by 3033 by 20) intermediate that I think is your main problem.
My impression is that the version I give is probably slower (for cases where it doesn't run out of memory), but I haven't rigorously timed it.
It's possible you could go further and avoid creating c, but I can't immediately see how to do it. It'd be a case of writing the whole thing out in terms of sums over matrix indices and seeing what it simplifies to.
You can employ a double loop iterating along the last dimension of X. That last dimension is only 20, so hopefully it will still be efficient enough and, more importantly, leave a minimal memory footprint. Here's the implementation:
n, d = X.shape
c = X.reshape(n, -1, d) - X.reshape(-1, n, d)
out = np.empty((d, d))  # d is a small number: 20
for i in range(d):
    for j in range(d):
        out[i, j] = (A * c[:, :, i] * c[:, :, j]).sum()
return out
You can replace the line inside the loop with np.einsum:
out[i,j] = np.einsum('ij->',A*c[:,:,i]*c[:,:,j])

Define a matrix depending on variable in Mathematica

I am translating my code from Python to Mathematica. I am trying to define a matrix whose values depend on a variable chosen by the user, called kappa.
In Python the code looked like this:
n = 5

def getA(kappa):
    matrix = zeros((n, n), float)
    for i in range(n):
        for j in range(n):
            matrix[i][j] = 2*math.cos((2*math.pi/n)*(abs(j-i))*kappa)
    return matrix
What I have done so far in Mathematica is the following piece of code:
n = 5
getA[kappa_] :=
A = Table[0.0, {n}, {n}];
For[i = 0, i < n, i++,
For[ j = 0, j < n, j++,
A[[i, j]] = 2*Cos[(2*pi/n)*(abs (j - i))*kappa]]];
b = getA[3]
But when I try to evaluate this matrix for a value of kappa equal to 3, I get the following error:
Set::partd: "Part specification A[[i,j]] is longer than depth of object.
How can I fix it?
Try something like this
n = 5;
A = Table[2*Cos[(2 \[Pi]/n) (Abs[ j - i]) \[Kappa]], {i, 1, n}, {j, 1, n}];
b = A /. \[Kappa]->3
I'll leave you to package this into a function if you want to.
You write that you are trying to translate Python into Mathematica; your use of For loops suggests that you are trying to translate to C-in-Mathematica. The first rule of Mathematica club is don't use loops.
Besides that, you've made a number of small syntactic errors, such as using abs() where you should have had Abs[] (Mathematica's built-in functions all have names beginning with a capital letter, and they wrap their arguments in [ and ], not ( and )), and pi is not the name of the ratio of a circle's circumference to its diameter (it's called \[Pi]). Note too that I've omitted the multiplication operator, which is often not required.
In your particular case, this would be the fastest and the most straightforward solution:
getA[κ_, n_] := ToeplitzMatrix[2 Cos[2 π κ Range[0, n - 1] / n]]
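(As an aside, not part of either answer: the same Toeplitz shortcut exists on the Python side, so the original getA can be vectorized with scipy.linalg.toeplitz. The signature below, with n as a parameter, is my own illustrative choice.)

import numpy as np
from scipy.linalg import toeplitz

def getA(kappa, n=5):
    # First column (and, by symmetry, first row): 2*cos(2*pi*kappa*k/n), k = 0..n-1
    col = 2 * np.cos(2 * np.pi * kappa * np.arange(n) / n)
    return toeplitz(col)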
