Speed up double loop in Python - python

Is there a way to speed up a double loop that updates its values from the previous iteration?
In code:
def calc(N, m):
x = 1.0
y = 2.0
container = np.zeros((N, 2))
for i in range(N):
for j in range(m):
x=np.random.gamma(3,1.0/(y*y+4))
y=np.random.normal(1.0/(x+1),1.0/sqrt(x+1))
container[i, 0] = x
container[i, 1] = y
return container
calc(10, 5)
As you can see, the inner loop is updating variables x and y while the outer loop starts with a different value of x each time. I don't think this is vectorizable but maybe there are other possible improvements.
Thanks!

I don't think it's going to add up to any important speed up, but you can save some function calls if you generate all your gamma and normally distributed random values at once.
Gamma functions have a scaling property, so that if you draw a value x from a gamma(k, 1) distribution, then c*x will be a value drawn from a gamma(k, c) distribution. Similarly, with the normal distribution, you can take a y value drawn from a normal(0, 1) distribution and convert it into a value drawn from a normal(m, s) distribution doing x*s + m. So you can rewrite your function as follows:
def calc(N, m):
x = 1.0
y = 2.0
container = np.zeros((N, 2))
nm = N*m
gamma_vals = np.random.gamma(3, 1, size=(nm,))
norm_vals = np.random.normal(0, 1, size=(nm,))
for i in xrange(N):
for j in xrange(m):
ij = i*j
x = gamma_vals[ij] / (y*y+4)
y = norm_vals[ij]/np.sqrt(x+1) + 1/(x+1)
container[i, 0] = x
container[i, 1] = y
return container
If the actual parameters of your distributions had a simpler expression, you may actually be able to use some elaborate form of np.cumprod or the like, and spare yourself the loops. I am not able to figure out a way of doing so...

Does this work?
for i in xrange(N):
# xrange is an iterator, range makes a new list.
# You save linear space and `malloc`ing time by doing this
x += m*y # a simple algebra hack. Compute this line of the loop just once instead of `m` times
y -= m*x
y *= -1 # another simple algebra hack. Compute this line of the loop just once instead of `m` times
container[i,0] = x
container[i,1] = y
return container

Related

Is there a way to speed up these nested loops (Laplacian case) Python?

I am trying to speed up the nested loop in my function Gram.
My function that is causing a big delay is the Laplacian (Abel) because it requires to calculate for each cell of the matrix the norm of a column by a row.
abel = lambda x,y,t,p: np.exp(-np.abs(p) * np.linalg.norm(x-y))
def Gram(X,Y,function,t,p):
n = X.shape[0]
s = Y.shape[0]
K = np.zeros((n,s))
if function==abel:
for i in range(n):
for j in range(s):
K[i,j] = abel(X[i,:],Y[j,:],t,p)
else:
K = polynomial(X,Y,t,p)
return K
I was able to speed up the function a bit by keeping the exponential part out of the abel equation and then I apply it for the whole matrix.
abel_2 = lambda x,y,t,p: np.linalg.norm(x-y) (don't mind the t and p).
def Gram_2(X,Y,function,t,p):
n = X.shape[0]
s = Y.shape[0]
K = np.zeros((n,s))
if function==abel_2:
for i in range(n):
for j in range(s):
K[i,j] = abel_2(X[i,:],Y[j,:],0,0)
K = np.exp(-abs(p)*K)
else:
K = polynomial(X,Y,t,p)
return K
The time is reduced by 50%, however, the double loops (nested) are still a major problem, I believe.
Can someone help with this?
Thank you!
Basically, instead of going through the loops one by one to subtract X[i,:] from Y[j,:], it would save tons of time of just selecting X[i,:] and subtracting it from all Y, then applying the norm on a certain axis!
In my case it was axis=1.
def Gram_10(X,Y,function,t,p):
n = X.shape[0]
s = Y.shape[0]
K = np.zeros((n,s))
if function==abel_2:
for i in range(n):
# it is important to put the correct slice (:s) , so the matrix provided by the norm goes
# to the right place in the function
K[i,:s] = np.linalg.norm(X[i,:]-Y,axis=1)
K = np.exp(-abs(p)*K)
else:
K = polynomial(X,Y,t,p)
return K

2d sum using an array - Python

I'm trying to sum a two dimensional function using the array method, somehow, using a for loop is not outputting the correct answer. I want to find (in latex) $$\sum_{i=1}^{M}\sum_{j=1}^{M_2}\cos(i)\cos(j)$$ where according to Mathematica the answer when M=5 is 1.52725. According to the for loop:
def f(N):
s1=0;
for p1 in range(N):
for p2 in range(N):
s1+=np.cos(p1+1)*np.cos(p2+1)
return s1
print(f(4))
is 0.291927.
I have thus been trying to use some code of the form:
def f1(N):
mat3=np.zeros((N,N),np.complex)
for i in range(0,len(mat3)):
for j in range(0,len(mat3)):
mat3[i][j]=np.cos(i+1)*np.cos(j+1)
return sum(mat3)
which again
print(f1(4))
outputs 0.291927. Looking at the array we should find for each value of i and j a matrix of the form
mat3=[[np.cos(1)*np.cos(1),np.cos(2)*np.cos(1),...],[np.cos(2)*np.cos(1),...]...[np.cos(N+1)*np.cos(N+1)]]
so for N=4 we should have
mat3=[[np.cos(1)*np.cos(1) np.cos(2)*np.cos(1) ...] [np.cos(2)*np.cos(1) ...]...[... np.cos(5)*np.cos(5)]]
but what I actually get is the following
mat3=[[0.29192658+0.j 0.+0.j 0.+0.j ... 0.+0.j] ... [... 0.+0.j]]
or a matrix of all zeros apart from the mat3[0][0] element.
Does anybody know a correct way to do this and get the correct answer? I chose this as an example because the problem I'm trying to solve involves plotting a function which has been summed over two indices and the function that python outputs is not the same as Mathematica (i.e., a function of the form $$f(E)=\sum_{i=1}^{M}\sum_{j=1}^{M_2}F(i,j,E)$$).
The return statement is not indented correctly in your sample code. It returns immediately in the first loop iteration. Indent it on the function body instead, so that both for loops finish:
def f(N):
s1=0;
for p1 in range(N):
for p2 in range(N):
s1+=np.cos(p1+1)*np.cos(p2+1)
return s1
>>> print(f(5))
1.527247272700347
I have moved your code to a more numpy-ish version:
import numpy as np
N = 5
x = np.arange(N) + 1
y = np.arange(N) + 1
x = x.reshape((-1, 1))
y = y.reshape((1, -1))
mat = np.cos(x) * np.cos(y)
print(mat.sum()) # 1.5272472727003474
The trick here is to reshape x to a column and y to a row vector. If you multiply them, they are matched up like in your loop.
This should be more performant, since cos() is only called 2*N times. And it avoids loops (bad in python).
UPDATE (regarding your comment):
This pattern can be extended in any dimension. Basically, you get something like a crossproduct. Where every instance of x is matched up with every instance of y, z, u, k, ... Along the corresponding dimensions.
It's a bit confusing to describe, so here is some more code:
import numpy as np
N = 5
x = np.arange(N) + 1
y = np.arange(N) + 1
z = np.arange(N) + 1
x = x.reshape((-1, 1, 1))
y = y.reshape((1, -1, 1))
z = z.reshape((1, 1, -1))
mat = z**2 * np.cos(x) * np.cos(y)
# x along first axis
# y along second, z along third
# mat[0, 0, 0] == 1**2 * np.cos(1) * np.cos(1)
# mat[0, 4, 2] == 3**2 * np.cos(1) * np.cos(5)
If you use this for many dimensions, and big values for N, you will run into memory problems, though.

How to create an array that can be accessed according to its indices in Numpy?

I am trying to solve the following problem via a Finite Difference Approximation in Python using NumPy:
$u_t = k \, u_{xx}$, on $0 < x < L$ and $t > 0$;
$u(0,t) = u(L,t) = 0$;
$u(x,0) = f(x)$.
I take $u(x,0) = f(x) = x^2$ for my problem.
Programming is not my forte so I need help with the implementation of my code. Here is my code (I'm sorry it is a bit messy, but not too bad I hope):
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt
# definition of initial condition function
def f(x):
return x^2
# parameters
L = 1
T = 10
N = 10
M = 100
s = 0.25
# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
#x = np.zeros(N+1)
x = np.arange(x_init, x_end, dx)
x[0] = x_init
# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
#t = np.zeros(M+1)
t = np.arange(t_init, t_end, dt)
t[0] = t_init
# Boundary Conditions
for m in xrange(0, M):
t[m] = m * dt
# Initial Conditions
for j in xrange(0, N):
x[j] = j * dx
# definition of solution to u_t = k * u_xx
u = np.zeros((N+1, M+1)) # NxM array to store values of the solution
# finite difference scheme
for j in xrange(0, N-1):
u[j][0] = x**2 #initial condition
for m in xrange(0, M):
for j in xrange(1, N-1):
if j == 1:
u[j-1][m] = 0 # Boundary condition
else:
u[j][m+1] = u[j][m] + s * ( u[j+1][m] - #FDM scheme
2 * u[j][m] + u[j-1][m] )
else:
if j == N-1:
u[j+1][m] = 0 # Boundary Condition
print u, t, x
#plt.plot(t, u)
#plt.show()
So the first issue I am having is I am trying to create an array/matrix to store values for the solution. I wanted it to be an NxM matrix, but in my code I made the matrix (N+1)x(M+1) because I kept getting an error that the index was going out of bounds. Anyways how can I make such a matrix using numpy.array so as not to needlessly take up memory by creating a (N+1)x(M+1) matrix filled with zeros?
Second, how can I "access" such an array? The real solution u(x,t) is approximated by u(x[j], t[m]) were j is the jth spatial value, and m is the mth time value. The finite difference scheme is given by:
u(x[j],t[m+1]) = u(x[j],t[m]) + s * ( u(x[j+1],t[m]) - 2 * u(x[j],t[m]) + u(x[j-1],t[m]) )
(See here for the formulation)
I want to be able to implement the Initial Condition u(x[j],t[0]) = x**2 for all values of j = 0,...,N-1. I also need to implement Boundary Conditions u(x[0],t[m]) = 0 = u(x[N],t[m]) for all values of t = 0,...,M. Is the nested loop I created the best way to do this? Originally I tried implementing the I.C. and B.C. under two different for loops which I used to calculate values of the matrices x and t (in my code I still have comments placed where I tried to do this)
I think I am just not using the right notation but I cannot find anywhere in the documentation for NumPy how to "call" such an array so at to iterate through each value in the proposed scheme. Can anyone shed some light on what I am doing wrong?
Any help is very greatly appreciated. This is not homework but rather to understand how to program FDM for Heat Equation because later I will use similar methods to solve the Black-Scholes PDE.
EDIT: So when I run my code on line 60 (the last "else" that I use) I get an error that says invalid syntax, and on line 51 (u[j][0] = x**2 #initial condition) I get an error that reads "setting an array element with a sequence." What does that mean?

Correctly annotate a numba function using jit

I started with this code to calculate a simple matrix multiplication. It runs with %timeit in around 7.85s on my machine.
To try to speed this up I tried cython which reduced the time to 0.4s. I want to also try to use numba jit compiler to see if I can get similar speed ups (with less effort). But adding the #jit annotation appears to give exactly the same timings (~7.8s). I know it can't figure out the types of the calculate_z_numpy() call but I'm not sure what I can do to coerce it. Any ideas?
from numba import jit
import numpy as np
#jit('f8(c8[:],c8[:],uint)')
def calculate_z_numpy(q, z, maxiter):
"""use vector operations to update all zs and qs to create new output array"""
output = np.resize(np.array(0, dtype=np.int32), q.shape)
for iteration in range(maxiter):
z = z*z + q
done = np.greater(abs(z), 2.0)
q = np.where(done, 0+0j, q)
z = np.where(done, 0+0j, z)
output = np.where(done, iteration, output)
return output
def calc_test():
w = h = 1000
maxiter = 1000
# make a list of x and y values which will represent q
# xx and yy are the co-ordinates, for the default configuration they'll look like:
# if we have a 1000x1000 plot
# xx = [-2.13, -2.1242,-2.1184000000000003, ..., 0.7526000000000064, 0.7584000000000064, 0.7642000000000064]
# yy = [1.3, 1.2948, 1.2895999999999999, ..., -1.2844000000000058, -1.2896000000000059, -1.294800000000006]
x1, x2, y1, y2 = -2.13, 0.77, -1.3, 1.3
x_step = (float(x2 - x1) / float(w)) * 2
y_step = (float(y1 - y2) / float(h)) * 2
y = np.arange(y2,y1-y_step,y_step,dtype=np.complex)
x = np.arange(x1,x2,x_step)
q1 = np.empty(y.shape[0],dtype=np.complex)
q1.real = x
q1.imag = y
# Transpose y
x_y_square_matrix = x+y[:, np.newaxis] # it is np.complex128
# convert square matrix to a flatted vector using ravel
q2 = np.ravel(x_y_square_matrix)
# create z as a 0+0j array of the same length as q
# note that it defaults to reals (float64) unless told otherwise
z = np.zeros(q2.shape, np.complex128)
output = calculate_z_numpy(q2, z, maxiter)
print(output)
calc_test()
I figured out how to do this with some help from someone else.
#jit('i4[:](c16[:],c16[:],i4,i4[:])',nopython=True)
def calculate_z_numpy(q, z, maxiter,output):
"""use vector operations to update all zs and qs to create new output array"""
for iteration in range(maxiter):
for i in range(len(z)):
z[i] = z[i] + q[i]
if z[i] > 2:
output[i] = iteration
z[i] = 0+0j
q[i] = 0+0j
return output
What I learnt is that use numpy datastructures as inputs (for typing), but within use c like paradigms for looping.
This runs in 402ms which is a touch faster than cython code 0.45s so for fairly minimal work in rewriting the loop explicitly we have a python version faster than C(just).

Can I vectorise this python code?

I have written this python code to get neighbours of a label (a set of pixels sharing some common properties). The neighbours for a label are defined as the other labels that lie on the other side of the boundary (the neighbouring labels share a boundary). So, the code I wrote works but is extremely slow:
# segments: It is a 2-dimensional numpy array (an image really)
# where segments[x, y] = label_index. So each entry defines the
# label associated with a pixel.
# i: The label whose neighbours we want.
def get_boundaries(segments, i):
neighbors = []
for y in range(1, segments.shape[1]):
for x in range(1, segments.shape[0]):
# Check if current index has the label we want
if segments[x-1, y] == i:
# Check if neighbour in the x direction has
# a different label
if segments[x-1, y] != segments[x, y]:
neighbors.append(segments[x,y])
# Check if neighbour in the y direction has
# a different label
if segments[x, y-1] == i:
if segments[x, y-1] != segments[x, y]:
neighbors.append(segments[x, y])
return np.unique(np.asarray(neighbors))
As you can imagine, I have probably completely misused python here. I was wondering if there is a way to optimize this code to make it more pythonic.
Here you go:
def get_boundaries2(segments, i):
x, y = np.where(segments == i) # where i is
right = x + 1
rightMask = right < segments.shape[0] # keep in bounds
down = y + 1
downMask = down < segments.shape[1]
rightNeighbors = segments[right[rightMask], y[rightMask]]
downNeighbors = segments[x[downMask], down[downMask]]
neighbors = np.union1d(rightNeighbors, downNeighbors)
return neighbors
As you can see, there are no Python loops at all; I also tried to minimize copies (the first attempt made a copy of segments with a NAN border, but then I devised the "keep in bounds" check).
Note that I did not filter out i itself from the "neighbors" here; you can add that easily at the end if you want. Some timings:
Input 2000x3000: original takes 13 seconds, mine takes 370 milliseconds (35x speedup).
Input 1000x300: original takes 643 ms, mine takes 17.5 ms (36x speedup).
You need to replace your for loops with numpy's implicit looping.
I don't know enough about your code to convert it directly, but I can give an example.
Suppose you have an array of 100000 random integers, and you need to get an array of each element divided by its neighbor.
import random, numpy as np
a = np.fromiter((random.randint(1, 100) for i in range(100000)), int)
One way to do this would be:
[a[i] / a[i+1] for i in range(len(a)-1)]
Or this, which is much faster:
a / np.roll(a, -1)
Timeit:
initcode = 'import random, numpy as np; a = np.fromiter((random.randint(1, 100) for i in range(100000)), int)'
timeit.timeit('[a[i] / a[i+1] for i in range(len(a)-1)]', initcode, number=100)
5.822079309000401
timeit.timeit('(a / np.roll(a, -1))', initcode, number=100)
0.1392055350006558

Categories