Optimize computation time in nested for loops?


I have this code:
import math
import numpy as np
import matplotlib.pyplot as plt
from skimage.util import img_as_ubyte
from skimage.feature import canny
image = img_as_ubyte(sf_img)
edges = np.flipud(canny(image, sigma=3, low_threshold=10, high_threshold=25))
non_zeros = np.nonzero(edges)
true_rows = non_zeros[0]
true_col = non_zeros[1]
plt.imshow(edges)
plt.show()
N_im = 256
x0 = 0
y0 = -0.25
Npx = 129
Npy = 60
delta_py = 0.025
delta_px = 0.031
Nr = 9
delta_r = 0.5
rho = 0.063
epsilon = 0.75
r_k = np.zeros((1, Nr))
r_min = 0.5
for k in range(0, Nr):
    r_k[0, k] = k * delta_r + r_min
a = np.zeros((Npy, Npx, Nr))
#FOR LOOP TO BE TIME OPTIMIZED
for i in range(0, np.size(true_col, 0)): #true_col and true_rows has the same size so it doesn't matter
    for m in range(0, Npy):
        for l in range(0, Npx):
            d = math.sqrt(math.pow(
                (((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    l * delta_px - (Npx * delta_px / 2) + x0)),
                2) + math.pow(
                (((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    m * delta_py - (Npy * delta_py / 2) + y0)),
                2))
            min_idx = np.argmin(np.abs(d - r_k))
            rk_hat = r_k[0, min_idx]
            if np.abs(d - rk_hat) < rho:
                a[m, l, min_idx] = a[m, l, min_idx] + 1
#ANOTHER LOOP TO BE OPTIMIZED
# for m in range(0, Npy):
# for l in range(0, Npx): #ORIGINAL
# for k in range(0, Nr):
# if a[m, l, k] < epsilon * np.max(a):
# a[m, l, k] = 0
a[np.where(a[:, :, :] < epsilon * np.max(a))] = 0 #SUBSTITUTED
a_prime = np.sum(a, axis=2)
acc_x = np.zeros((Npx, 1))
acc_y = np.zeros((Npy, 1))
for l in range(0, Npx):
    acc_x[l, 0] = l * delta_px - (Npx * delta_px / 2) + x0
for m in range(0, Npy):
    acc_y[m, 0] = m * delta_py - (Npy * delta_py / 2) + y0
prod = 0
for m in range(0, Npy):
    for l in range(0, Npx):
        prod = prod + (np.array([acc_x[l, 0], acc_y[m, 0]]) * a_prime[m, l])
points = prod / np.sum(a_prime)
Based on a comment on an answer, these random inputs can stand in for the edge coordinates:
true_rows = np.random.randint(0,256,10)
true_col = np.random.randint(0,256,10)
In short, the script scans a 256x256 image that has previously been processed with Canny edge detection.
The outer loop must visit every pixel of the resulting image and, for each one, run two nested loops whose operations depend on the l and m indices of the 'a' matrix.
Since the edge detection returns an image of zeros and ones (ones marking the edges), and the inner operations are needed only for the one-valued points, I used
non_zeros = np.nonzero(edges)
to obtain only the indices I'm interested in. Previously the code looked like this:
for i in range(0, N_im):
    for j in range(0, N_im):
        if edges[i, j] == 1:
            for m in range(0, Npy):
                for l in range(0, Npx):
                    d = math.sqrt(math.pow(
                        (((i - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                            l * delta_px - (Npx * delta_px / 2) + x0)),
                        2) + math.pow(
                        (((j - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                            m * delta_py - (Npy * delta_py / 2) + y0)),
                        2))
                    min_idx = np.argmin(np.abs(d - r_k))
                    rk_hat = r_k[0, min_idx]
                    if np.abs(d - rk_hat) < rho:
                        a[m, l, min_idx] = a[m, l, min_idx] + 1
It seems I managed to optimize the first two loops, but my script needs to be faster than that.
It takes roughly 6 to 7 minutes to run and I need to execute it about 1000 times. Can you help me optimize these for loops even further? Thank you!

You can use Numba's JIT to speed up the computation (the default CPython interpreter is very slow for this kind of computation). Moreover, you can rework the loops so that the code can run in parallel.
Here is the resulting code:
import math
import numpy as np
import numba as nb
# Assume you work with 64-bit integers;
# feel free to change this to 32-bit integers if that is not the case.
# If you encounter type issues, let Numba choose with: @nb.njit(parallel=True)
# However, note that the first run will be slower if you let Numba choose.
@nb.njit('int64[:,:,::1](bool_[:,:], float64[:,:], int64, int64, int64, int64, float64, float64, float64, float64, float64)', parallel=True)
def fasterImpl(edges, r_k, Npy, Npx, Nr, N_im, delta_px, delta_py, rho, x0, y0):
    a = np.zeros((Npy, Npx, Nr), dtype=nb.int64)
    # Find all the positions where edges[i,j]==1
    validEdgePos = np.where(edges == 1)
    for m in nb.prange(0, Npy):
        for l in range(0, Npx):
            # Iterate over the i,j values where edges[i,j]==1
            for i, j in zip(validEdgePos[0], validEdgePos[1]):
                d = math.sqrt(math.pow(
                    (((i - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                        l * delta_px - (Npx * delta_px / 2) + x0)),
                    2) + math.pow(
                    (((j - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                        m * delta_py - (Npy * delta_py / 2) + y0)),
                    2))
                min_idx = np.argmin(np.abs(d - r_k))
                rk_hat = r_k[0, min_idx]
                if np.abs(d - rk_hat) < rho:
                    a[m, l, min_idx] += 1
    return a
On my machine, with inputs described in your question (including the provided sf_img), this code is 616 times faster.
Reference time: 109.680 s
Optimized time: 0.178 s
Note that the results are exactly the same as those of the reference implementation.

Based on your script, you seem to have little experience with NumPy in general. NumPy is optimized with SIMD instructions, and looping over elements one at a time defeats that. I would advise you to review the basics of writing NumPy code.
This cheat sheet is a good place to start: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf
For instance, this code can be changed from
r_k = np.zeros((1, Nr))
for k in range(0, Nr):
    r_k[0, k] = k * delta_r + r_min
### to a simple np.arange assignment
r_k = np.zeros((1, Nr))
r_k[0,:] = np.arange(Nr) * delta_r + r_min
### or you can do everything in one line
r_k = np.expand_dims(np.arange(Nr) * delta_r + r_min,axis=0)
This code is a little awkward because you are creating a np.array while looping through each element. You can probably simplify this code too. Are you changing the data type from int to a np.array of two here?
prod = 0
for m in range(0, Npy):
    for l in range(0, Npx):
        prod = prod + (np.array([acc_x[l, 0], acc_y[m, 0]]) * a_prime[m, l])
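The double loop above accumulates a weighted sum of grid coordinates, so it can probably be replaced by array sums. The following is a minimal sketch, assuming `a_prime` has shape `(Npy, Npx)` and the axes are kept as 1-D arrays; the random `a_prime` is only stand-in data, not output of the original script.

```python
import numpy as np

Npx, Npy = 129, 60
delta_px, delta_py = 0.031, 0.025
x0, y0 = 0.0, -0.25

# Coordinate axes, built without loops (same formulas as in the question).
acc_x = np.arange(Npx) * delta_px - (Npx * delta_px / 2) + x0
acc_y = np.arange(Npy) * delta_py - (Npy * delta_py / 2) + y0

# Stand-in accumulator; in the real script this would be np.sum(a, axis=2).
rng = np.random.default_rng(0)
a_prime = rng.random((Npy, Npx))

# Column totals weight the x axis, row totals weight the y axis.
points = np.array([
    (a_prime.sum(axis=0) * acc_x).sum(),
    (a_prime.sum(axis=1) * acc_y).sum(),
]) / a_prime.sum()
```

Here `points` holds the weighted mean `[x, y]`, matching what the loop produces up to floating-point rounding.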
For this loop, you can slowly separate out dependent and independent elements.
#FOR LOOP TO BE TIME OPTIMIZED
for i in range(0, np.size(true_col, 0)): #true_col and true_rows has the same size so it doesn't matter
    for m in range(0, Npy):
        for l in range(0, Npx):
            d = math.sqrt(math.pow(
                (((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    l * delta_px - (Npx * delta_px / 2) + x0)),
                2) + math.pow(
                (((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    m * delta_py - (Npy * delta_py / 2) + y0)),
                2))
            min_idx = np.argmin(np.abs(d - r_k))
            rk_hat = r_k[0, min_idx]
            if np.abs(d - rk_hat) < rho:
                a[m, l, min_idx] = a[m, l, min_idx] + 1
The outer loop for i in range(0, np.size(true_col, 0)) is fine.
You do not need the two inner loops to compute this. For the index arithmetic, you can allocate extra index arrays so that every (m, l) pair gets its own element.
for m in range(0, Npy):
    for l in range(0, Npx):
        d = math.sqrt(math.pow(
            (((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                l * delta_px - (Npx * delta_px / 2) + x0)),
            2) + math.pow(
            (((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                m * delta_py - (Npy * delta_py / 2) + y0)),
            2))
To emulate the behavior of m and l, you can create two Npy by Npx index matrices. Although this pattern may seem odd, NumPy inherited such tricks from the MATLAB ecosystem, because the goal of MATLAB/NumPy is to simplify code and let you spend more time fixing your logic.
## l matrix
[[0,1,2,3,4,5,6,7,8....Npx-1],
[0,1,2,3,4,5,6,7,8....Npx-1],
.....
[0,1,2,3,4,5,6,7,8....Npx-1]]
## m matrix
[[0,0,0,0,0,0,0,0,0,0,0,0],
[1,1,1,1,1,1,1,1,1,1,1,1],
.....
[Npy-1,Npy-1,Npy-1.....,Npy-1]]
## You can create both with one command
l_mat, m_mat = np.meshgrid(np.arange(Npx), np.arange(Npy))
>>> l_mat
array([[ 0, 1, 2, ..., 147, 148, 149],
[ 0, 1, 2, ..., 147, 148, 149],
[ 0, 1, 2, ..., 147, 148, 149],
...,
[ 0, 1, 2, ..., 147, 148, 149],
[ 0, 1, 2, ..., 147, 148, 149],
[ 0, 1, 2, ..., 147, 148, 149]])
>>> m_mat
array([[ 0, 0, 0, ..., 0, 0, 0],
[ 1, 1, 1, ..., 1, 1, 1],
[ 2, 2, 2, ..., 2, 2, 2],
...,
[97, 97, 97, ..., 97, 97, 97],
[98, 98, 98, ..., 98, 98, 98],
[99, 99, 99, ..., 99, 99, 99]])
With those two matrices, you can compute the result for every cell at once.
d = np.sqrt(np.pow( true_col[i] - np.floor((N_im + 1)/2)) / (N_im + l_mat).....
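Spelled out in full, the distance for one edge point against every accumulator cell might look like the sketch below. It assumes the constants from the question and keeps the original operator order, `(value - half) / (N_im + 1) / 2`; the helper name `distance_grid` and the sample pixel are made up for illustration.

```python
import math
import numpy as np

N_im, Npx, Npy = 256, 129, 60
delta_px, delta_py = 0.031, 0.025
x0, y0 = 0.0, -0.25

def distance_grid(col, row):
    """Distance from one edge pixel to every (m, l) accumulator cell."""
    l_mat, m_mat = np.meshgrid(np.arange(Npx), np.arange(Npy))
    half = math.floor((N_im + 1) / 2)
    # Pixel position, scaled exactly as in the original expression.
    px = (col - half) / (N_im + 1) / 2
    py = (row - half) / (N_im + 1) / 2
    # Grid coordinates for all cells at once.
    gx = l_mat * delta_px - (Npx * delta_px / 2) + x0
    gy = m_mat * delta_py - (Npy * delta_py / 2) + y0
    return np.sqrt((px - gx) ** 2 + (py - gy) ** 2)  # shape (Npy, Npx)

# Hypothetical edge pixel, only to exercise the function.
d = distance_grid(col=200, row=37)
```

Each entry `d[m, l]` should equal the scalar `d` that the nested loops compute for the same `m` and `l`.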
For these two lines of code, you seem to be setting up an argmin matrix.
min_idx = np.argmin(np.abs(d - r_k))
rk_hat = r_k[0, min_idx]
https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html
vfunc = np.vectorize(lambda x: np.argmin(np.abs(x - r_k)))
min_idx = vfunc(d)
vfunc2 = np.vectorize(lambda x: r_k[0, x])
rk_hat = vfunc2(min_idx)
For the last two lines, d and rk_hat should be Npy by Npx matrices. You can use boolean masking or np.where to create a mask.
if np.abs(d - rk_hat) < rho:
points = np.where( np.abs(d-rk_hat) < rho )
https://numpy.org/doc/stable/reference/generated/numpy.where.html
I gave up on vectorizing the last line; it probably doesn't matter if you leave it in a loop.
a[m, l, min_idx] = a[m, l, min_idx] + 1
# np.where returns a tuple (rows, cols), so pair them up before indexing
for row, col in zip(*points):
    a[row, col, min_idx[row, col]] += 1

New answer which optimizes the nested loop,
....
for i in range(0, np.size(true_col, 0)): #true_col and true_rows has the same size so it doesn't matter
    for m in range(0, Npy):
        for l in range(0, Npx):
There is a substantial improvement in processing time. For true_col and true_rows lengths of 2500 it takes about 3 seconds on my machine. It is in a function for testing purposes.
def new():
    a = np.zeros((Npy, Npx, Nr),dtype=int)
    # tease out and separate some of the terms
    # used in the calculation of the distance - d
    bb = N_im + 1
    cc = (Npx * delta_px / 2)
    dd = (Npy * delta_py / 2)
    l, m = np.meshgrid(np.arange(Npx), np.arange(Npy))
    q = (true_col - math.floor(bb / 2)) / bb / 2 # shape (true_col length,)
    r = l * delta_px - cc + x0 # shape(Npy,Npx)
    s = np.square(q - r[...,None]) # shape(Npy,Npx,true_col length)
    # - last dimension is the outer loop of the original
    t = (true_rows - math.floor(bb / 2)) / bb / 2 # shape (len(true_rows),)
    u = m * delta_py - dd + y0 # shape(60,129) ... (Npy,Npx)
    v = np.square(t - u[...,None]) # shape(Npy,Npx,true_col length)
    d = np.sqrt(s + v) # shape(Npy,Npx,true_col length)
    e1 = np.abs(d[...,None] - r_k.squeeze()) # shape(Npy,Npx,true_col length,len(r_k[0,:]))
    min_idx = np.argmin(e1,-1) # shape(Npy,Npx,true_col length)
    rk_hat = r_k[0,min_idx] # shape(Npy,Npx,true_col length)
    zz = np.abs(d-rk_hat) # shape(Npy,Npx,true_col length)
    condition = zz < rho # shape(Npy,Npx,true_col length)
    # seemingly unavoidable for loop needed to perform
    # a bincount along the last dimension (filtered)
    # while retaining the 2d position info
    # this will be pretty fast though,
    # nothing really going on other than indexing and assignment
    for iii in range(Npy*Npx):
        row,col = divmod(iii,Npx)
        filter = condition[row,col]
        one_d = min_idx[row,col]
        counts = np.bincount(one_d[filter])
        a[row,col,:counts.size] = counts
    return a
I could not figure out how to use NumPy methods to get rid of the final loop, which filters for less than rho AND does a bincount. If I figure this out, I will update.
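One possible way to drop that final loop is to fold the 2-D position and the radius index into a single linear index and call `np.bincount` once. This is an untimed sketch rather than part of the answer above; the helper name `count_hits` and the random stand-in arrays are assumptions, while `min_idx` and `condition` are meant to have the shapes produced inside `new()`.

```python
import numpy as np

def count_hits(min_idx, condition, Nr):
    """Count, per (row, col) cell, how often each radius index passed the filter."""
    Npy, Npx, _ = min_idx.shape
    rows, cols, pts = np.nonzero(condition)   # positions where zz < rho
    k = min_idx[rows, cols, pts]              # winning radius index at those positions
    flat = (rows * Npx + cols) * Nr + k       # linear index into an (Npy, Npx, Nr) array
    counts = np.bincount(flat, minlength=Npy * Npx * Nr)
    return counts.reshape(Npy, Npx, Nr)

# Small random stand-in data, only to exercise the function.
rng = np.random.default_rng(1)
Npy, Npx, Nr, n_pts = 4, 8, 9, 50
min_idx = rng.integers(0, Nr, size=(Npy, Npx, n_pts))
condition = rng.random((Npy, Npx, n_pts)) < 0.3
a = count_hits(min_idx, condition, Nr)
```

The `minlength` argument guarantees the flat result always has `Npy * Npx * Nr` entries, so the reshape works even when the highest radius index never occurs.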
Data from your question and comments
import math
import numpy as np
np.random.seed(5)
n_ = 2500
true_col = np.random.randint(0,256,n_)
true_rows = np.random.randint(0,256,n_)
N_im = 256
x0 = 0
y0 = -0.25
Npx = 129
Npy = 60
# Npx = 8
# Npy = 4
delta_py = 0.025
delta_px = 0.031
Nr = 9
delta_r = 0.5
rho = 0.063
epsilon = 0.75
r_min = 0.5
r_k = np.arange(Nr) * delta_r + r_min
r_k = r_k.reshape(1,Nr)
Your original nested loops in a function - with some diagnostic additions.
def original(writer=None):
    '''writer should be a csv.writer object.'''
    a = np.zeros((Npy, Npx, Nr),dtype=int)
    for i in range(0, np.size(true_col, 0)): #true_col and true_rows has the same size so it doesn't matter
        for m in range(0, Npy):
            for l in range(0, Npx):
                d = math.sqrt(math.pow((((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (l * delta_px - (Npx * delta_px / 2) + x0)),2) +
                              math.pow((((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (m * delta_py - (Npy * delta_py / 2) + y0)),2))
                min_idx = np.argmin(np.abs(d - r_k)) # scalar
                rk_hat = r_k[0, min_idx] # scalar
                if np.abs(d - rk_hat) < rho:
                    # if (m,l) == (0,0):
                    if writer:
                        writer.writerow([i,m,l,d,min_idx,rk_hat,a[m, l, min_idx] + 1])
                        # print(f'condition satisfied: i:{i} a[{m},{l},{min_idx}] = {a[m, l, min_idx]} + 1')
                    a[m, l, min_idx] = a[m, l, min_idx] + 1
    return a

Related

How to speed up the computation that is slow even with Numba

I'm having trouble with the slow computation of my Python code. Based on the pycallgraph below, the bottleneck seems to be the module named miepython.miepython.mie_S1_S2 (highlighted by pink), which takes 0.47 seconds per call.
The source code for this module is as follows:
import numpy as np
from numba import njit, int32, float64, complex128
__all__ = ('ez_mie',
'ez_intensities',
'generate_mie_costheta',
'i_par',
'i_per',
'i_unpolarized',
'mie',
'mie_S1_S2',
'mie_cdf',
'mie_mu_with_uniform_cdf',
)
@njit((complex128, float64, float64[:]), cache=True)
def _mie_S1_S2(m, x, mu):
    """
    Calculate the scattering amplitude functions for spheres.
    The amplitude functions have been normalized so that when integrated
    over all 4*pi solid angles, the integral will be qext*pi*x**2.
    The units are weird, sr**(-0.5)
    Args:
        m: the complex index of refraction of the sphere
        x: the size parameter of the sphere
        mu: array of angles, cos(theta), to calculate scattering amplitudes
    Returns:
        S1, S2: the scattering amplitudes at each angle mu [sr**(-0.5)]
    """
    nstop = int(x + 4.05 * x**0.33333 + 2.0) + 1
    a = np.zeros(nstop - 1, dtype=np.complex128)
    b = np.zeros(nstop - 1, dtype=np.complex128)
    _mie_An_Bn(m, x, a, b)
    nangles = len(mu)
    S1 = np.zeros(nangles, dtype=np.complex128)
    S2 = np.zeros(nangles, dtype=np.complex128)
    nstop = len(a)
    for k in range(nangles):
        pi_nm2 = 0
        pi_nm1 = 1
        for n in range(1, nstop):
            tau_nm1 = n * mu[k] * pi_nm1 - (n + 1) * pi_nm2
            S1[k] += (2 * n + 1) * (pi_nm1 * a[n - 1]
                                    + tau_nm1 * b[n - 1]) / (n + 1) / n
            S2[k] += (2 * n + 1) * (tau_nm1 * a[n - 1]
                                    + pi_nm1 * b[n - 1]) / (n + 1) / n
            temp = pi_nm1
            pi_nm1 = ((2 * n + 1) * mu[k] * pi_nm1 - (n + 1) * pi_nm2) / n
            pi_nm2 = temp
    # calculate norm = sqrt(pi * Qext * x**2)
    n = np.arange(1, nstop + 1)
    norm = np.sqrt(2 * np.pi * np.sum((2 * n + 1) * (a.real + b.real)))
    S1 /= norm
    S2 /= norm
    return [S1, S2]
Apparently, the source code is jitted by Numba so it should be faster than it actually is. The number of iterations in for loop in this function is around 25,000 (len(mu)=50, len(a)-1=500).
Any ideas on how to speed up this computation? Is something hindering the fast computation of Numba? Or, do you think the computation is already fast enough?
[More details]
In the above, another function _mie_An_Bn is being used. This function is also jitted, and the source code is as follows:
@njit((complex128, float64, complex128[:], complex128[:]), cache=True)
def _mie_An_Bn(m, x, a, b):
    """
    Compute arrays of Mie coefficients A and B for a sphere.
    This estimates the size of the arrays based on Wiscombe's formula. The length
    of the arrays is chosen so that the error when the series are summed is
    around 1e-6.
    Args:
        m: the complex index of refraction of the sphere
        x: the size parameter of the sphere
    Returns:
        An, Bn: arrays of Mie coefficients
    """
    psi_nm1 = np.sin(x) # nm1 = n-1 = 0
    psi_n = psi_nm1 / x - np.cos(x) # n = 1
    xi_nm1 = complex(psi_nm1, np.cos(x))
    xi_n = complex(psi_n, np.cos(x) / x + np.sin(x))
    nstop = len(a)
    if m.real > 0.0:
        D = _D_calc(m, x, nstop + 1)
        for n in range(1, nstop):
            temp = D[n] / m + n / x
            a[n - 1] = (temp * psi_n - psi_nm1) / (temp * xi_n - xi_nm1)
            temp = D[n] * m + n / x
            b[n - 1] = (temp * psi_n - psi_nm1) / (temp * xi_n - xi_nm1)
            xi = (2 * n + 1) * xi_n / x - xi_nm1
            xi_nm1 = xi_n
            xi_n = xi
            psi_nm1 = psi_n
            psi_n = xi_n.real
    else:
        for n in range(1, nstop):
            a[n - 1] = (n * psi_n / x - psi_nm1) / (n * xi_n / x - xi_nm1)
            b[n - 1] = psi_n / xi_n
            xi = (2 * n + 1) * xi_n / x - xi_nm1
            xi_nm1 = xi_n
            xi_n = xi
            psi_nm1 = psi_n
            psi_n = xi_n.real
Example inputs look like this:
m = 1.336-2.462e-09j
x = 8526.95
mu = np.array([-1., -0.7500396, 0.46037385, 0.5988121, 0.67384093, 0.72468684, 0.76421644, 0.79175856, 0.81723714, 0.83962897, 0.85924182, 0.87641596, 0.89383665, 0.90708978, 0.91931481, 0.93067567, 0.94073113, 0.94961222, 0.95689496, 0.96467123, 0.97138347, 0.97791831, 0.98339434, 0.98870543, 0.99414948, 0.9975728, 0.9989995, 0.9989995, 0.9989995, 0.9989995, 0.9989995, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899952, 0.99899952,
0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 1. ])

I am focusing on _mie_S1_S2 since it appears to be the most expensive function on the provided example dataset.
First of all, you can pass fastmath=True to the JIT to accelerate the computation, provided no values such as +Inf, -Inf, -0 or NaN are computed.
Then you can precompute some expensive expressions containing divisions or implicit integer-to-float conversions. Note that (2 * n + 1) / n = 2 + 1/n and (n + 1) / n = 1 + 1/n. This can be used to reduce the number of precomputed arrays, but it did not change the performance on my machine (this may vary with the target architecture). Note also that such precomputation has a slight impact on the accuracy of the result (most of the time negligible, and sometimes better than the reference implementation).
On my machine, this strategy makes the code 4.5 times faster with fastmath=True and 2.8 times faster without.
The k-based loop can be parallelized using parallel=True and Numba's prange. However, this may not be faster on every machine (especially those with many cores), since the loop is already quite fast.
Here is the final code:
import numba as nb  # needed for nb.prange below
@njit((complex128, float64, float64[:]), cache=True, parallel=True)
def _mie_S1_S2_opt(m, x, mu):
    nstop = int(x + 4.05 * x**0.33333 + 2.0) + 1
    a = np.zeros(nstop - 1, dtype=np.complex128)
    b = np.zeros(nstop - 1, dtype=np.complex128)
    _mie_An_Bn(m, x, a, b)
    nangles = len(mu)
    S1 = np.zeros(nangles, dtype=np.complex128)
    S2 = np.zeros(nangles, dtype=np.complex128)
    factor1 = np.empty(nstop, dtype=np.float64)
    factor2 = np.empty(nstop, dtype=np.float64)
    factor3 = np.empty(nstop, dtype=np.float64)
    for n in range(1, nstop):
        factor1[n - 1] = (2 * n + 1) / (n + 1) / n
        factor2[n - 1] = (2 * n + 1) / n
        factor3[n - 1] = (n + 1) / n
    nstop = len(a)
    for k in nb.prange(nangles):
        pi_nm2 = 0
        pi_nm1 = 1
        for n in range(1, nstop):
            i = n - 1
            tau_nm1 = n * mu[k] * pi_nm1 - (n + 1.0) * pi_nm2
            S1[k] += factor1[i] * (pi_nm1 * a[i] + tau_nm1 * b[i])
            S2[k] += factor1[i] * (tau_nm1 * a[i] + pi_nm1 * b[i])
            temp = pi_nm1
            pi_nm1 = factor2[i] * mu[k] * pi_nm1 - factor3[i] * pi_nm2
            pi_nm2 = temp
    # calculate norm = sqrt(pi * Qext * x**2)
    n = np.arange(1, nstop + 1)
    norm = np.sqrt(2 * np.pi * np.sum((2 * n + 1) * (a.real + b.real)))
    S1 /= norm
    S2 /= norm
    return [S1, S2]
%timeit -n 1000 _mie_S1_S2_opt(m, x, mu)
On my machine with 6 cores, the final optimized implementation is 12 times faster with fastmath=True and 8.8 times faster without. Note that applying similar strategies to the other functions may also help speed them up.

How to create a Single Vector having 2 Dimensions?

I have used the equation of motion (Newton's law) for a simple spring and mass scenario, written as the given second-order ODE y" + (k/m)x = 0; y(0) = 3; y'(0) = 0.
I was then able to run code that calculates the exact solution and compares it with the Runge-Kutta solution.
It works fine. However, I have recently been asked not to keep my values of 'x' and 'v' separate, but to use a single vector 'x' that has two dimensions (i.e. 'x' and 'v' can be handled by x(1) and x(2)).
MY CODE:
import numpy as np
# Given is y" + (k/m)x = 0; y(0) = 3; y'(0) = 0
# Parameters
h = 0.01; #Step Size
t = 100.0; #Time(sec)
k = 1;
m = 1;
x0 = 3;
v0 = 0;
# Exact Analytical Solution
te = np.arange(0, t ,h);
N = len(te);
w = (k / m) ** 0.5;
x_exact = x0 * np.cos(w * te);
v_exact = -x0 * w * np.sin(w * te);
# Runge-kutta Method
x = np.empty(N);
v = np.empty(N);
x[0] = x0;
v[0] = v0;
def f1 (t, x, v):
    x = v
    return x
def f2 (t, x, v):
    v = -(k / m) * x
    return v
for i in range(N - 1): #MAIN LOOP
    K1x = f1(te[i], x[i], v[i])
    K1v = f2(te[i], x[i], v[i])
    K2x = f1(te[i] + h / 2, x[i] + h * K1x / 2, v[i] + h * K1v / 2)
    K2v = f2(te[i] + h / 2, x[i] + h * K1x / 2, v[i] + h * K1v / 2)
    K3x = f1(te[i] + h / 2, x[i] + h * K2x / 2, v[i] + h * K2v / 2)
    K3v = f2(te[i] + h / 2, x[i] + h * K2x / 2, v[i] + h * K2v / 2)
    K4x = f1(te[i] + h, x[i] + h * K3x, v[i] + h * K3v)
    K4v = f2(te[i] + h, x[i] + h * K3x, v[i] + h * K3v)
    x[i + 1] = x[i] + h / 6 * (K1x + 2 * K2x + 2 * K3x + K4x)
    v[i + 1] = v[i] + h / 6 * (K1v + 2 * K2v + 2 * K3v + K4v)
Can anyone help me understand how I can create this single vector having 2 dimensions, and how to fix my code up please?

You can use the np.array() function. Here is an example of the kind of array you are trying to build:
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
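Applied to the spring-mass problem from the question, one way to read "a single vector with two dimensions" is a state array whose first component is the position and whose second is the velocity. The sketch below is only one possible interpretation; the names `deriv` and `states` are made up for illustration, and the parameters are copied from the question.

```python
import numpy as np

k, m = 1.0, 1.0
h, t_end = 0.01, 100.0
te = np.arange(0, t_end, h)

def deriv(t, state):
    """Right-hand side for state = [x, v]: x' = v, v' = -(k/m) x."""
    x, v = state
    return np.array([v, -(k / m) * x])

# One row per time step; column 0 is x, column 1 is v.
states = np.empty((len(te), 2))
states[0] = [3.0, 0.0]

for i in range(len(te) - 1):
    s = states[i]
    K1 = deriv(te[i], s)
    K2 = deriv(te[i] + h / 2, s + h * K1 / 2)
    K3 = deriv(te[i] + h / 2, s + h * K2 / 2)
    K4 = deriv(te[i] + h, s + h * K3)
    states[i + 1] = s + h / 6 * (K1 + 2 * K2 + 2 * K3 + K4)

# Exact solution for comparison.
x_exact = 3.0 * np.cos(np.sqrt(k / m) * te)
```

Because each `K` is itself a two-component array, the eight separate `K1x` ... `K4v` expressions collapse into four, and `states[:, 0]` and `states[:, 1]` recover the position and velocity histories.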

I am unsure exactly what you expect beyond having two lists inside a single list, but I hope this link helps answer your question.
https://www.tutorialspoint.com/python_data_structure/python_2darray.htm?

Plot circular gradients using numpy

I have a code for plotting radial gradients with numpy. So far it looks like this:
import numpy as np
import matplotlib.pyplot as plt
arr = np.zeros((256,256,3), dtype=np.uint8)
imgsize = arr.shape[:2]
innerColor = (0, 0, 0)
outerColor = (255, 255, 255)
for y in range(imgsize[1]):
    for x in range(imgsize[0]):
        #Find the distance to the center
        distanceToCenter = np.sqrt((x - imgsize[0]//2) ** 2 + (y - imgsize[1]//2) ** 2)
        #Make it on a scale from 0 to 1
        distanceToCenter = distanceToCenter / (np.sqrt(2) * imgsize[0]/2)
        #Calculate r, g, and b values
        r = outerColor[0] * distanceToCenter + innerColor[0] * (1 - distanceToCenter)
        g = outerColor[1] * distanceToCenter + innerColor[1] * (1 - distanceToCenter)
        b = outerColor[2] * distanceToCenter + innerColor[2] * (1 - distanceToCenter)
        # print r, g, b
        arr[y, x] = (int(r), int(g), int(b))
plt.imshow(arr, cmap='gray')
plt.show()
Is there any way to optimize this code with numpy functions and improve speed?
It should look like this afterwards:

You can use vectorization to very efficiently calculate the distance without the need for a for-loop:
x_axis = np.linspace(-1, 1, 256)[:, None]
y_axis = np.linspace(-1, 1, 256)[None, :]
arr = np.sqrt(x_axis ** 2 + y_axis ** 2)
or you can use a meshgrid:
x_axis = np.linspace(-1, 1, 256)
y_axis = np.linspace(-1, 1, 256)
xx, yy = np.meshgrid(x_axis, y_axis)
arr = np.sqrt(xx ** 2 + yy ** 2)
and interpolate between inner and outer colors using broadcasting again
inner = np.array([0, 0, 0])[None, None, :]
outer = np.array([1, 1, 1])[None, None, :]
arr /= arr.max()
arr = arr[:, :, None]
arr = arr * outer + (1 - arr) * inner
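Putting the pieces together, a loop-free version that reproduces the question's pixel-centered distance and its `sqrt(2) * size / 2` normalization might look like this. It is a sketch under the question's settings (256x256, black to white); the variable names are illustrative only.

```python
import numpy as np

size = 256
inner_color = np.array([0, 0, 0], dtype=float)
outer_color = np.array([255, 255, 255], dtype=float)

# Pixel offsets from the center, as a column and a row that broadcast together.
y = np.arange(size)[:, None] - size // 2
x = np.arange(size)[None, :] - size // 2

# Same normalization as the loop version: divide by sqrt(2) * size / 2.
dist = np.sqrt(x ** 2 + y ** 2) / (np.sqrt(2) * size / 2)

# Blend the two colors per pixel; the trailing axis holds r, g, b.
t = dist[:, :, None]
img = (outer_color * t + inner_color * (1 - t)).astype(np.uint8)
```

The result can be passed to `plt.imshow(img)` exactly like `arr` in the question.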

Because of symmetry, you only need to calculate one quadrant of the 256*256 image, which is 128*128, then mirror it piece by piece and combine the pieces. This way, the loop does roughly a quarter of the work of calculating all 256*256 pixels.
The following is an example.
import numpy as np
import matplotlib.pyplot as plt
##Just calculate one 128*128 quadrant
arr = np.zeros((128,128,3), dtype=np.uint8)
imgsize = arr.shape[:2]
innerColor = (0, 0, 0)
outerColor = (255, 255, 255)
for y in range(imgsize[1]):
    for x in range(imgsize[0]):
        #Find the distance to the corner
        distanceToCenter = np.sqrt((x) ** 2 + (y - imgsize[1]) ** 2)
        #Make it on a scale from 0 to 1
        distanceToCenter = distanceToCenter / (np.sqrt(2) * imgsize[0])
        #Calculate r, g, and b values
        r = outerColor[0] * distanceToCenter + innerColor[0] * (1 - distanceToCenter)
        g = outerColor[1] * distanceToCenter + innerColor[1] * (1 - distanceToCenter)
        b = outerColor[2] * distanceToCenter + innerColor[2] * (1 - distanceToCenter)
        # print r, g, b
        arr[y, x] = (int(r), int(g), int(b))
#mirror and combine
arr1=arr
arr2=arr[::-1,:,:]
arr3=arr[::-1,::-1,:]
arr4=arr[::,::-1,:]
arr5=np.vstack([arr1,arr2])
arr6=np.vstack([arr4,arr3])
arr7=np.hstack([arr6,arr5])
plt.imshow(arr7, cmap='gray')
plt.show()

Recursive Inverse FFT

I have implemented two functions FFT and InverseFFT in recursive mode.
These are the functions:
import math
from math import e, pi
import numpy as np
def rfft(a):
    n = a.size
    if n == 1:
        return a
    i = 1j
    w_n = e ** (-2 * i * pi / float(n))
    w = 1
    a_0 = np.zeros(int(math.ceil(n / 2.0)), dtype=np.complex128)
    a_1 = np.zeros(n // 2, dtype=np.complex128)
    for index in range(0, n):
        if index % 2 == 0:
            a_0[index // 2] = a[index]
        else:
            a_1[index // 2] = a[index]
    y_0 = rfft(a_0)
    y_1 = rfft(a_1)
    y = np.zeros(n, dtype=np.complex128)
    for k in range(0, n // 2):
        y[k] = y_0[k] + w * y_1[k]
        y[k + n // 2] = y_0[k] - w * y_1[k]
        w = w * w_n
    return y
def rifft(y):
    n = y.size
    if n == 1:
        return y
    i = 1j
    w_n = e ** (2 * i * pi / float(n))
    w = 1
    y_0 = np.zeros(int(math.ceil(n / 2.0)), dtype=np.complex128)
    y_1 = np.zeros(n // 2, dtype=np.complex128)
    for index in range(0, n):
        if index % 2 == 0:
            y_0[index // 2] = y[index]
        else:
            y_1[index // 2] = y[index]
    a_0 = rifft(y_0)
    a_1 = rifft(y_1)
    a = np.zeros(n, dtype=np.complex128)
    for k in range(0, n // 2):
        a[k] = (a_0[k] + w * a_1[k]) / n
        a[k + n // 2] = (a_0[k] - w * a_1[k]) / n
        w = w * w_n
    return a
Based on the definition of IFFT, converting FFT function to IFFT function can be done by changing 2*i*pi to -2*i*pi and dividing the result by N. The rfft() function works fine but the rifft() function, after these modifications, does not work.
I compare the output of my functions with scipy.fftpack.fft and scipy.fftpack.ifft functions.
I feed the following NumPy array:
a = np.array([1, 0, -1, 3, 0, 0, 0, 0])
The following box shows the results of rfft() function and scipy.fftpack.fft function.
//rfft(a)
[ 3.00000000+0.j -1.12132034-1.12132034j 2.00000000+3.j 3.12132034-3.12132034j -3.00000000+0.j 3.12132034+3.12132034j 2.00000000-3.j -1.12132034+1.12132034j]
//scipy.fftpack.fft(a)
[ 3.00000000+0.j -1.12132034-1.12132034j 2.00000000+3.j 3.12132034-3.12132034j -3.00000000+0.j 3.12132034+3.12132034j 2.00000000-3.j -1.12132034+1.12132034j]
And this box shows the results of rifft() function and scipy.fftpack.ifft function.
//rifft(a)
[ 0.04687500+0.j -0.01752063+0.01752063j 0.03125000-0.046875j 0.04877063+0.04877063j -0.04687500+0.j 0.04877063-0.04877063j 0.03125000+0.046875j -0.01752063-0.01752063j]
//scipy.fftpack.ifft(a)
[ 0.37500000+0.j -0.14016504+0.14016504j 0.25000000-0.375j 0.39016504+0.39016504j -0.37500000+0.j 0.39016504-0.39016504j 0.25000000+0.375j -0.14016504-0.14016504j]

The division by the size N is a global scaling factor and should be applied once to the result of the recursion, rather than at each stage of the recursion as you have done. Your version divides by n at every level (8, then 4, then 2 for your 8-point input), so overall it scales the result down too much. You could address this by removing the / n factor from the final loop of your original implementation and calling that function from another function that performs the scaling:
def unscaledrifft(y):
    ...
    for k in range(0, n // 2):
        a[k] = (a_0[k] + w * a_1[k])
        a[k + n // 2] = (a_0[k] - w * a_1[k])
        w = w * w_n
    return a
def rifft(y):
    return unscaledrifft(y)/y.size
Alternatively, since you are performing a radix-2 FFT, the global factor N would be a power of 2 such that N=2**n, where n is the number of steps in the recursion. You could thus divide by 2 at each stage of the recursion to achieve the same result:
def rifft(y):
    ...
    for k in range(0, n // 2):
        a[k] = (a_0[k] + w * a_1[k]) / 2
        a[k + n // 2] = (a_0[k] - w * a_1[k]) / 2
        w = w * w_n
    return a
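For reference, a complete runnable version of the first variant (scale once, outside the recursion) could look like the sketch below. It assumes the input length is a power of two and swaps the explicit even/odd copy loops for slicing, so it is not a line-by-line copy of the code in the question; `unscaled_rifft` is a name chosen for illustration.

```python
import cmath
import numpy as np

def unscaled_rifft(y):
    """Recursive radix-2 inverse transform without the 1/N factor."""
    n = y.size
    if n == 1:
        return y.astype(complex)
    a_0 = unscaled_rifft(y[0::2])   # even-indexed samples
    a_1 = unscaled_rifft(y[1::2])   # odd-indexed samples
    w_n = cmath.exp(2j * cmath.pi / n)
    w = 1
    a = np.zeros(n, dtype=complex)
    for k in range(n // 2):
        a[k] = a_0[k] + w * a_1[k]
        a[k + n // 2] = a_0[k] - w * a_1[k]
        w = w * w_n
    return a

def rifft(y):
    """Apply the global 1/N scaling exactly once, outside the recursion."""
    return unscaled_rifft(y) / y.size

a = np.array([1, 0, -1, 3, 0, 0, 0, 0])
result = rifft(a)
```

With the scaling applied only in the wrapper, `result` should agree with `np.fft.ifft(a)` up to rounding.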

Implementation of nested summing in python

I want to implement summation from this publication which is denoted as formula 29. As you may see there are 5 nested summations. Now I struggle to implement that. According to what I understand from my teacher, I should nest it in following way:
B=0.
for beta in range():
coeff1=...
sum1=0.
for beta_prim in range():
coeff2=...
sum2=0.
for alfa in range():
coeff3=...
sum3=0.
for alfa_prim in range():
coeff4=...
sum4=0.
for lambda in range():
coeff5=...
sum4+=coeff5
sum3+=sum4*coeff3
sum2+=sum3*coeff2
sum1+=sum2*coeff1
B+=sum1
return B
By coeff{1,2,3,4} I mean the expressions after each sigma sign.
However, I am doing something wrong and cannot tell where.
Could you give me a tip on that?
Best wishes!
The nested summation formula:

Your approach should work. Here is what I came up with:
from math import factorial
import scipy.special
def comb(n,k):
    # scipy.misc.comb was removed from SciPy; scipy.special.comb replaces it
    return scipy.special.comb(n, k, exact=True)
def monepow(n):
    # (-1)**n
    if n % 2 == 0:
        return 1
    else:
        return -1
def B(mu, nu, n, np, l, lp, m):
    B = 0
    beta1 = max( 0, mu - np - l, nu - np - l + m )
    beta2 = min( n - l, nu )
    for beta in range(beta1, beta2+1):
        c1 = comb(n-l, beta)
        beta1p = max( 0, mu - beta - l - lp, nu - beta - l - lp + m )
        beta2p = min( np - lp, nu - beta )
        for betap in range(beta1p, beta2p+1):
            delta = min( mu + nu + l + lp, mu + n + np ) + 1
            # // keeps the integer division that / performed in the original Python 2 code
            c2 = comb(np - lp, betap) * factorial(delta) // factorial (mu + beta + betap + l + lp + 1)
            alpha1 = max( m, nu - beta - betap - lp + m )
            for alpha in range(alpha1, l+1):
                c3 = monepow(alpha) * comb(l, alpha) * comb(l, alpha - m)
                alpha1p = max( m, nu - beta - betap - alpha + m )
                for alphap in range(alpha1p, lp):
                    c4 = monepow(alphap) * comb(lp, alphap) * comb(lp, alphap-m) * comb(alpha - alphap - m, nu - beta - betap) * factorial(alpha - alphap + beta + lp) * factorial( alpha - alphap + beta + l)
                    for lam in range(0, mu+1):
                        c5 = monepow(lam) * comb( alpha - alphap + beta + lp + lam, lam ) * comb (alpha - alphap + betap + l + mu - lam, mu - lam) * comb(mu, lam)
                        x = c1*c2*c3*c4*c5
                        B += x
    return B
def testAll():
    for mu in range (0,10):
        for nu in range (0,10):
            for n in range(1,5):
                for np in range(1,5):
                    for l in range(0,n):
                        for lp in range(0,np):
                            for m in range(0,l+1):
                                b = None
                                try:
                                    b = B(mu, nu, n, np, l, lp, m)
                                except ValueError:
                                    # factorial of a negative number
                                    pass
                                if b is not None:
                                    print([mu, nu, n, np, l, lp, m], " -> ", b)
# print(B(1, 1, 6, 6, 0, 4, 3)) # should be == 0
# print(B(1, 2, 6, 6, 4, 0, 0)) # should be == 0
# print(B(2, 2, 6, 6, 4, 0, 0)) # might be == 0
# print(B(3, 2, 6, 6, 4, 0, 0)) # might be == 0
testAll()
The try...except block is present in testAll() because I don't know the valid ranges for mu and nu (and the range for m may also be incorrect), so for some inputs B will attempt to compute the factorial of a negative number and throw an exception.
Let me know if you spot any errors...
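The answer's code multiplies all coefficients in the innermost loop, while the question accumulates partial sums level by level. Both forms give the same total as long as each inner sum is reset at the start of its enclosing iteration and is multiplied once by the coefficient of the level that encloses it. The toy sketch below illustrates this with three levels; the coefficients are made up and have nothing to do with formula 29.

```python
def nested_sum(c1, c2, c3):
    """Sum of c1[i] * c2[i][j] * c3[i][j][k] over all i, j, k, factored level by level."""
    total = 0.0
    for i in range(len(c1)):
        sum_j = 0.0                      # reset for every i
        for j in range(len(c2[i])):
            sum_k = 0.0                  # reset for every (i, j)
            for k in range(len(c3[i][j])):
                sum_k += c3[i][j][k]
            sum_j += c2[i][j] * sum_k    # scale the inner sum by its own level's coefficient
        total += c1[i] * sum_j
    return total

# Toy coefficients (hypothetical, not taken from the publication).
c1 = [1.0, 2.0]
c2 = [[1.0, -1.0], [0.5, 3.0]]
c3 = [[[1.0, 2.0], [3.0]], [[4.0], [5.0, 6.0]]]

# Flat form: multiply everything in the innermost position, as in the answer above.
flat = sum(c1[i] * c2[i][j] * c3[i][j][k]
           for i in range(len(c1))
           for j in range(len(c2[i]))
           for k in range(len(c3[i][j])))
```

Comparing `nested_sum(c1, c2, c3)` with `flat` on small inputs like these is a quick way to check that the accumulators sit at the right nesting level.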
