Python - Multiprocessing with multiple for-loops

I know there are other questions on this topic, so I'm sorry to ask again, but I cannot get it to work since I'm quite new to this.
I have four nested for-loops in which certain algebraic calculations are done (matrix operations, for example). These calculations take too much time to complete, so I was hoping I could speed them up with multiprocessing.
The code is given below. I simulated the ranges and matrix sizes here, but in my real code these ranges are actually used (so it's not strange that it takes so long). You should be able to run it directly when you copy-paste the code.
import numpy as np
from scipy.linalg import fractional_matrix_power
import math

# Lists for the loops (and one value)
x_list = np.arange(0, 32, 1)
y_list = np.arange(0, 32, 1)
a_list = np.arange(0, 501, 1)
b_list = np.arange(0, 501, 1)
c_list = np.arange(0, 64, 1)
d_number = 32

# Matrices
Y = np.arange(2048).reshape(32, 64)
g = np.asmatrix(np.empty([d_number, 1], dtype=np.complex_))
A = np.empty([len(a_list), len(b_list), len(c_list)], dtype=np.complex_)
A_in = np.empty([len(a_list), len(b_list)], dtype=np.complex_)

for ai in range(len(a_list)):
    for bi in range(len(b_list)):
        for ci in range(len(c_list)):
            f_k_i = c_list[ci]
            X_i = np.asmatrix([Y[:, ci]]).T
            for di in range(d_number):
                r = math.sqrt((x_list[di] - a_list[ai])**2 + (y_list[di] - b_list[bi])**2 + 63**2)
                g[di, 0] = np.exp(-2 * np.pi * 1j * f_k_i * (r / 8)) / r  # g is a vector
            A[-bi, -ai, ci] = ((1 / np.linalg.norm(g)**2) * (((g.conj().T * fractional_matrix_power((X_i * X_i.conj().T), (1/5)) * g) / np.linalg.norm(g)**2)**2)).item(0)
        A_in[-bi, -ai] = (1 / len(c_list)) * sum(A[-bi, -ai, :])
What is the best way to approach this? If multiprocessing is the solution, how do I implement it for my case? (I couldn't figure that out.)
Thanks in advance.

One way to approach it would be to move the two inner loops into a function taking ai and bi as parameters and returning the indexes and the result. Then use multiprocessing.Pool.imap_unordered() to run the function on (ai, bi) pairs. Something like this (untested):
import itertools
import multiprocessing

def partial_calc(index):
    """
    This function replaces the inner two loops to calculate the value of
    A_in[-bi, -ai]. index is a tuple (ai, bi).
    """
    ai, bi = index
    for ci in range(len(c_list)):
        f_k_i = c_list[ci]
        X_i = np.asmatrix([Y[:, ci]]).T
        for di in range(d_number):
            r = math.sqrt((x_list[di] - a_list[ai])**2 + (y_list[di] - b_list[bi])**2 + 63**2)
            g[di, 0] = np.exp(-2 * np.pi * 1j * f_k_i * (r / 8)) / r  # g is a vector
        A[-bi, -ai, ci] = ((1 / np.linalg.norm(g)**2) * (((g.conj().T * fractional_matrix_power((X_i * X_i.conj().T), (1/5)) * g) / np.linalg.norm(g)**2)**2)).item(0)
    return ai, bi, (1 / len(c_list)) * sum(A[-bi, -ai, :])

def main():
    with multiprocessing.Pool(None) as p:
        # this replaces the outer two loops
        indices = itertools.product(range(len(a_list)), range(len(b_list)))
        partial_results = p.imap_unordered(partial_calc, indices)
        for ai, bi, value in partial_results:
            A_in[-bi, -ai] = value
    # ... do something with A_in ...

if __name__ == "__main__":
    main()
Or put the inner three loops into the function and generate one "row" of A_in at a time, as sketched below. Profile it both ways and see which is faster.
The trick will be setting up the lists (a_list, b_list, etc) and the Y matrix. And that depends on their characteristics (constant, quickly/slowly calculated, large/small, etc).
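A minimal sketch of that row-at-a-time variant (untested; row_calc and main_rows are hypothetical names, reusing partial_calc from above):

def row_calc(ai):
    # run the inner three loops for every bi at a fixed ai
    values = np.empty(len(b_list), dtype=np.complex_)
    for bi in range(len(b_list)):
        _, _, values[bi] = partial_calc((ai, bi))
    return ai, values

def main_rows():
    with multiprocessing.Pool(None) as p:
        for ai, values in p.imap_unordered(row_calc, range(len(a_list))):
            for bi, value in enumerate(values):
                A_in[-bi, -ai] = value

Each task now does len(b_list) times more work per dispatch, which reduces the inter-process communication overhead per result.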

Related

Vector Normalization in Python

I'm trying to port this MATLAB function to Python:
fs = 128;
x = (0:1:999)/fs;
y_orig = sin(2*pi*15*x);
y_noised = y_orig + 0.5*randn(1,length(x));
[yseg] = mapstd(y_noised);
I wrote this code (which works, so there are no problems with missing variables or anything else):
Norm_Y = 0
Y_Normalized = []
for i in range(0, len(YSeg), 1):
    Norm_Y = Norm_Y + (pow(YSeg[i], 2))
Norm_Y = sqrt(Norm_Y)
for i in range(0, len(YSeg), 1):
    Y_Normalized.append(YSeg[i] / Norm_Y)
    print("%3d %f" % (i, Y_Normalized[i]))
YSeg is Y_Noised (I assign it in another section of the code).
Now I don't expect the values to be the same between the MATLAB code and mine, since YSeg/Y_Noised is generated from random values, so it's OK that they are different, but they are TOO different.
These are the first 10 values in MATLAB:
0.145728655284548
1.41918657039301
1.72322238170491
0.684826842884694
0.125379108969931
-0.188899711186140
-1.03820858801652
-0.402591786430960
-0.844782236884026
0.626897216311757
While these are the first 10 numbers in my Python code:
0.052015
0.051132
0.041209
0.034144
0.034450
0.003812
0.048629
0.016854
0.024484
0.021435
It's like mine are 100 times lower, so I feel like I've missed a step during normalization. Can you help?
You can normalize a vector quite easily in python with numpy:
import numpy as np

def normalize_vector(input_vector):
    return input_vector / np.sqrt(np.sum(input_vector**2))

random_vec = np.random.rand(10)
vec_norm = normalize_vector(random_vec)
print(vec_norm)
You can call the provided function with your input vector (YSeg) and check the output. I would expect an output similar to the MATLAB one.
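For example (assuming YSeg is a plain Python list, as in the question, so it needs converting to a numpy array first):

yseg = np.asarray(YSeg)
yseg_normalized = normalize_vector(yseg)
print(yseg_normalized[:10])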
This is an implementation in numpy:
import numpy as np

fs = 128
x = np.arange(1000) / fs
y_orig = np.sin(2 * np.pi * 15 * x)
y_noised = y_orig + 0.5 * np.random.randn(len(x))
yseg = (y_noised - y_noised.mean()) / y_noised.std()
However, why do you consider the values "too much different"? After all, the values of y_orig are in the range [-1, 1] and you are randomly distorting them by about 0.4 on average. Note also that mapstd standardizes to zero mean and unit standard deviation (as in the last line above); dividing by the full vector norm, as your loop does, shrinks every value by roughly sqrt(len(YSeg)), which explains the much smaller numbers.

Convergence tests of Leapfrog method for vectorial wave equation in Python

Consider the following leapfrog scheme used to discretize a vectorial wave equation with given initial conditions and periodic boundary conditions. I have implemented the scheme and now I want to run numerical convergence tests to show that the scheme is of second order in space and time.
I'm mainly struggling with two points here:
I'm not 100% sure if I implemented the scheme correctly. I really wanted to use slicing because it is so much faster than using loops.
I don't really know how to get the right error plot, because I'm not sure which norm to use. In the examples I have found (they were in 1D) we've always used the L2-Norm.
import numpy as np
import matplotlib.pyplot as plt

# Initial conditions
def p0(x):
    return np.cos(2 * np.pi * x)

def u0(x):
    return -np.cos(2 * np.pi * x)

# exact solution
def p_exact(x, t):
    # return np.cos(2 * np.pi * (x + t))
    return p0(x + t)

def u_exact(x, t):
    # return -np.cos(2 * np.pi * (x + t))
    return u0(x + t)

# function for doing one time step, considering the periodic boundary conditions
def leapfrog_step(p, u):
    p[1:] += CFL * (u[:-1] - u[1:])
    p[0] = p[-1]
    u[:-1] += CFL * (p[:-1] - p[1:])
    u[-1] = u[0]
    return p, u

# Parameters
CFL = 0.3
LX = 1  # space length
NX = 100  # number of space steps
T = 2  # end time
NN = np.array(range(50, 1000, 50))  # list of discretizations
Ep = []
Eu = []
for NX in NN:
    print(NX)
    errorsp = []
    errorsu = []
    x = np.linspace(0, LX, NX)  # space grid
    dx = x[1] - x[0]  # spatial step
    dt = CFL * dx  # time step
    t = np.arange(0, T, dt)  # time grid
    # time loop
    for time in t:
        if time == 0:
            p = p0(x)
            u = u0(x)
        else:
            p, u = leapfrog_step(p, u)
        errorsp.append(np.linalg.norm((p - p_exact(x, time)), 2))
        errorsu.append(np.linalg.norm((u - u_exact(x, time)), 2))
    errorsp = np.array(errorsp) * dx ** (1 / 2)
    errorsu = np.array(errorsu) * dx ** (1 / 2)
    Ep.append(errorsp[-1])
    Eu.append(errorsu[-1])

# plot the error
plt.figure(figsize=(8, 5))
plt.xlabel("$Nx$")
plt.ylabel(r'$\Vert p-\bar{p}\Vert_{L_2}$')
plt.loglog(NN, 15 / NN ** 2, "green", label=r'$O(\Delta x^{2})$')
plt.loglog(NN, Ep, "o", label=r'$E_p$')
plt.loglog(NN, Eu, "o", label=r'$E_u$')
plt.legend()
plt.show()
I would really appreciate it if someone could quickly check the implementation of the scheme and give an indication of how to get the error plot.
Apart from the initialization, I see no errors in your code.
As to the initialization, consider the first step. There you should compute, per the method description, approximations for p(dt, j*dx) from the values of p(0, j*dx) and u(0.5*dt, (j+0.5)*dx). This means that at time==0 you need to initialize
u = u_exact(x + 0.5*dx, 0.5*dt)
and also need to compare the solution obtained afterwards against u_exact(x + 0.5*dx, time + 0.5*dt).
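A sketch of the corresponding change to the question's time loop (assuming the variables defined there):

for time in t:
    if time == 0:
        p = p0(x)
        u = u_exact(x + 0.5 * dx, 0.5 * dt)  # staggered initial data
    else:
        p, u = leapfrog_step(p, u)
    errorsp.append(np.linalg.norm(p - p_exact(x, time), 2))
    # u lives on the staggered grid, half a step ahead in time
    errorsu.append(np.linalg.norm(u - u_exact(x + 0.5 * dx, time + 0.5 * dt), 2))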
That you nevertheless obtained the correct order is, in my opinion, more an artefact of the test problem than evidence of a correct algorithm.
If no exact solution is known, or if you want to use a more realistic algorithm in the test, you would need to compute the initial u values from p(0,x) and u(0,x) via Taylor expansions
u(t,x)      = u(0,x) + t*u_t(0,x) + 0.5*t^2*u_tt(0,x) + ...
u(0.5*dt,x) = u(0,x) - 0.5*dt*p_x(0,x) + 0.125*dt^2*u_xx(0,x) + ...
            = u(0,x) - 0.5*CFL*(p(0,x+0.5*dx) - p(0,x-0.5*dx))
                     + 0.125*CFL^2*(u(0,x+dx) - 2*u(0,x) + u(0,x-dx)) + ...
It might be sufficient to take just the linear expansion,
u[j] = u0(x[j]+0.5*dx) - 0.5*CFL*(p0(x[j]+dx) - p0(x[j]))
or with array operations
p = p0(x)
u = u0(x + 0.5*dx)
u[:-1] -= 0.5*CFL*(p[1:] - p[:-1])
u[-1] = u[0]
as then the second order error in the initial data just adds to the general second order error.
You might want to change the space grid to x = np.linspace(0, LX, NX+1) to have dx = LX/NX.
I would define the exact solution and the initial condition the other way around, as that allows more flexibility in the test problems.
# components of the solution
def f(x): return np.cos(2 * np.pi * x)
def g(x): return 2*np.sin(6 * np.pi * x)
# exact solution
def u_exact(x,t): return f(x+t)+g(x-t)
def p_exact(x,t): return -f(x+t)+g(x-t)
# Initial conditions
def u0(x): return u_exact(x,0)
def p0(x): return p_exact(x,0)

Evaluating a function stored in column-major order

I am trying to evaluate a function at discretized points and store the results in column-major order, like this:
import numpy as np

N = 3
n = N * N
h = 1 / (N + 1)  # step size
h2 = h**2
deltaX = np.zeros(N)
deltaY = np.zeros(N)

def Function(x, y):
    output = -20. * np.pi * np.sin(2 * np.pi * x) * np.sin(4 * np.pi * y)
    return output

## Equally spaced delta:
for i in range(1, N + 1):
    deltaX[i - 1] = i * h
    deltaY[i - 1] = i * h

### Lexicographic Row order ###
### Evaluation of function at deltaX and deltaY
feval = np.zeros((n, 1))
How could I approach evaluating the discretization of this function?
Good news: your function properly uses numpy operations, so it is completely vectorized. That means you can evaluate it at every element of the input arrays at once.
The shapes of the inputs don't have to match exactly. They just have to broadcast together, which means that only non-singleton dimensions need to match.
So start by creating the appropriate input arrays. Numpy provides the tools to do this elegantly without looping:
N = 3
h = 1 / (N + 1)
delta_x = np.arange(1., N + 1.) * h
delta_y = np.linspace(h, N * h, N)[:, None]
I deliberately used two different ways to create the coordinate arrays, to serve as an example. In practice, you'd want to use one of the two methods.
The index [:, None] turns delta_y into a column vector: None introduces a new singleton axis. There are any number of other ways to do the same thing, like delta_y = ... .reshape(-1, 1).
Read the docs for all the functions used here.
Now that you have a column in the y direction and a row in the x direction, you can call Function as just
val = Function(delta_x, delta_y)
The operation of arranging the 2D matrix val into a 1D array is called raveling. By default, it uses the row-major order that numpy uses in memory, also called "C" order. The alternative is to interpret the array in column-major order, like MATLAB does; this is called Fortran order. It requires a copy of the data, since that's not how the elements are laid out in memory.
One way to ravel in Fortran order:
feval = val.ravel(order='F')
An alternative is to transpose and use C order:
feval = val.T.ravel()
The last two lines can be combined, so you end up with 3 lines:
delta_x = h * np.arange(1., N + 1.)
delta_y = h * np.arange(1., N + 1.)[:, None]
feval = Function(delta_x, delta_y).ravel(order='F')
You could make it into a one-liner, but that's pushing it.
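If you prefer explicit coordinate grids over broadcasting, here is a sketch of the same computation using np.meshgrid (assuming the Function defined in the question):

pts = np.arange(1., N + 1.) * h
xx, yy = np.meshgrid(pts, pts)  # both (N, N); xx varies along rows, yy along columns
feval = Function(xx, yy).ravel(order='F')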

Harmonic oscillator

I'm trying to solve the simple pendulum in Python. My goal is to save my results to a file in order to make a plot afterwards. Should I put the code that saves the data inside the loop, or define a new function?
NB: I'm a beginner.
Thank you.
import numpy as np

g = 9.8
L = 3
THETA_0 = np.pi / 4
THETA_DOT_0 = 0

def get_theta_double_dot(theta):
    return -(g / L) * np.sin(theta)

def theta(t):
    theta = THETA_0
    theta_dot = THETA_DOT_0
    delta_t = 0.01
    for tps in np.arange(0, t, delta_t):
        theta_double_dot = get_theta_double_dot(theta)
        theta = theta + (theta_dot * delta_t)
        theta_dot = theta_dot + (theta_double_dot * delta_t)
    return theta
Definitely store your results in a variable, and then save the whole thing to a file afterwards. Saving during the loop is messy and less efficient because of the constant I/O calls.
Note that this is not true if you are dealing with millions of entries, in which case you'd have to spare memory and write to the file in batches. But it doesn't look like that should be a problem in your case.
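A minimal sketch of the "collect first, save once" approach, assuming the constants and get_theta_double_dot() from your code (simulate and the file name are made up for illustration):

def simulate(t_end, delta_t=0.01):
    theta = THETA_0
    theta_dot = THETA_DOT_0
    times = np.arange(0, t_end, delta_t)
    thetas = np.empty_like(times)
    for i in range(len(times)):
        thetas[i] = theta  # record the current state
        theta_double_dot = get_theta_double_dot(theta)
        theta = theta + theta_dot * delta_t
        theta_dot = theta_dot + theta_double_dot * delta_t
    return times, thetas

times, thetas = simulate(10.0)
# one single write at the end, instead of one per loop iteration
np.savetxt("pendulum.csv", np.column_stack([times, thetas]), delimiter=",", header="t,theta")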

Efficiently coding gradient of function

I'm currently trying to code this beastie in Python (using the numpy libraries). The lambda * w term is supposed to be outside the summation.
Currently, I've coded the problem using a for-loop with a running total outside it; however, this approach takes a long time.
My vectors y, w, and x are very large - think 100,000s of elements. I was wondering whether there is a way to vectorize this with matrix operations, instead of looping through the vectors one element at a time.
This is my vectorized code:
xty = xtrain.T.dot(ytrain)
e = math.exp(-w_0.T.dot(xty))
gradient = -xty * (e / (1 + e)) - lambda_var * w_0
If I understand your problem correctly, you might just have to bite the bullet and go with the loop:
import numpy as np

wave = 1e3
xs, ys, w = np.arange(1, 4), np.arange(4, 7), np.arange(7, 10)
eps = np.zeros(w.T.shape)
for x, y in zip(xs, ys):
    eps += -y * np.exp(-y * w.T * x) * x / (1 + np.exp(-y * w.T * x))
print(eps + wave * w)
[ 7000.  8000.  9000.]
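That said, since each term involves only the scalar pair (x, y) and the whole vector w, the sum can also be written with broadcasting - a sketch using the same toy data as above:

z = np.exp(-ys[:, None] * w[None, :] * xs[:, None])  # shape (len(xs), len(w))
eps_vec = np.sum(-(ys * xs)[:, None] * z / (1 + z), axis=0)
print(eps_vec + wave * w)  # matches the loop result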
