I have an array of size (254, 80) which I need to use scipy's fsolve on. I have found that using fsolve on a vector is quicker than in a for loop, but only for vectors up to about 100 values long. Beyond that, the speed drops off quickly and becomes very slow, sometimes stopping completely.
I'm currently looping through one dimension of the array and using a vectorised fsolve on the smaller dimension but it's still taking longer than I would expect/like.
Does anyone have a good workaround for this, or know of a similar function which will happily handle a vector of a larger size? Or perhaps I am doing something wrong...
Here's the current code:
import numpy as np
from scipy.optimize import fsolve

for i in range(array.shape[0]):
    f = lambda y: a[i] - m[i]*y - md[i]*(( y**4 + 2*(y**2)*np.cos(Thetas[i,:]) )**0.25)
    ystar[i,:] = fsolve(f, y0[i])
(The rest of the variables are all a similar size)
Digging in to this further, I have found that a function such as
f = lambda y: y*np.tanh(y) - a0/(m**2)
is faster to solve than
f = lambda y: (m**2)*y*np.tanh(y) - a0
where m and a0 are large 2D np arrays.
Can anyone explain why this is?
Thanks,
Rachael
Although no one answered, I found a workaround which avoids fsolve and uses interpolation instead. Luckily the initial guess is good enough that only a few y values are needed; if your knowledge of the initial guess is poor, this method is probably not appropriate. Note that this still has some issues, but for my purposes it performs well...
ystar = np.empty((A,B))  # empty array for the solutions
num_ys = 20  # number of points at which to look for the solution
y0_u = y0    # just so the calculated initial guess isn't overwritten

for i in range(Thetas.shape[1]):
    ys = np.linspace(-.05,.2,num_ys)[:,None]*np.ones((num_ys,Thetas.shape[0])) + y0_u
    vals = (np.squeeze(eta) - np.squeeze(m)*ys*np.sqrt(g*np.tanh(ys**2*depth)) - np.squeeze(md)*np.sqrt(g*np.tanh(depth*np.sqrt(ys**4+2*(ys**2)*kB*np.cos(Thetas[:,i]+phi_bi)+kB**2)))*(( ys**4+2*(ys**2)*kB*np.cos(Thetas[:,i]+phi_bi)+kB**2 )**0.25))
    # flag adjacent ys between which vals changes sign (the root lies in between)
    idxs_important = -1*(np.clip(np.vstack(((np.sign(vals[:-1]*vals[1:])-1),np.zeros((1,Thetas[:,i].size)))),-1,0) + np.clip(np.vstack((np.zeros((1,Thetas[:,i].size)),(np.sign(vals[:-1]*vals[1:]))-1)),-1,0))
    ys_chosen = idxs_important*ys
    ys_chosen[ys_chosen==0] = 10000
    sorted_ys_idx = np.argsort(ys_chosen.T, axis = 1)
    sorted_ys = ((ys_chosen.T)[np.arange(np.shape(ys_chosen.T)[0])[:,np.newaxis],sorted_ys_idx]).T
    sorted_vals = (((vals*idxs_important).T)[np.arange(np.shape(vals.T)[0])[:,np.newaxis],sorted_ys_idx]).T
    # interpolation bit: linear interpolation between the two bracketing points
    x_id = 0
    yposs = sorted_ys[:2,:]
    valposs = sorted_vals[:2,:]
    y = yposs[0,:] + (yposs[1,:] - yposs[0,:])*(x_id - valposs[0,:])/(valposs[1,:] - valposs[0,:])
    ystar[:,i] = np.squeeze(y)
    y0_u = ystar[:,i]
I want to use the method of lines to solve the thin-film equation. I have implemented it (with gamma=mu=0) in Matlab using ode15s and it seems to work fine:
N = 64;
x = linspace(-1,1,N+1);
x = x(1:end-1);
dx = x(2)-x(1);
T = 1e-2;
h0 = 1+0.1*cos(pi*x);
[t,h] = ode15s(@(t,y) thinFilmEq(t,y,dx), [0,T], h0);
function dhdt = thinFilmEq(t,h,dx)
    phi = 0;
    hxx = (circshift(h,1) - 2*h + circshift(h,-1))/dx^2;
    p = phi - hxx;
    px = (circshift(p,-1)-circshift(p,1))/dx;
    flux = (h.^3).*px/3;
    dhdt = (circshift(flux,-1) - circshift(flux,1))/dx;
end
The film just flattens after some time, and for large times the film should tend to h(t->inf)=1. I haven't done any rigorous checks or convergence analysis, but at least the result looks promising after spending less than 5 minutes coding it.
I want to do the same thing in Python, and I tried the following:
import numpy as np
import scipy.integrate as spi
def thin_film_eq(t, h, dx):
    print(t)  # to check the current evaluation time for debugging
    phi = 0
    hxx = (np.roll(h,1) - 2*h + np.roll(h,-1))/dx**2
    p = phi - hxx
    px = (np.roll(p,-1) - np.roll(p,1))/dx
    flux = h**3*px/3
    dhdt = (np.roll(flux,-1) - np.roll(flux,1))/dx
    return dhdt
N = 64
x = np.linspace(-1,1,N+1)[:-1]
dx = x[1]-x[0]
T = 1e-2
h0 = 1 + 0.1*np.cos(np.pi*x)
sol = spi.solve_ivp(lambda t,h: thin_film_eq(t,h,dx), (0,T), h0, method='BDF', vectorized=True)
I added a print statement inside the function so I can check the current progress of the program. For some reason, it is taking very tiny time steps: after waiting a few minutes it is still stuck at t=3.465e-5, with dt smaller than 1e-10 (it hadn't finished by the time I finished typing this question, and probably won't within any reasonable time). The Matlab program, by contrast, is done within a second with only 14 time steps taken (I only specify the time span and keep everything else at the defaults). I want to ask the following:
Have I done anything wrong which dramatically slows down the computation time for my Python code? What settings should I choose for the solve_ivp call? One thing I'm not sure about is whether I do the vectorization properly, and whether I wrote the function in the correct way. I know this is a stiff ODE, but the ultra-small time step taken by the solver still seems abnormal.
Is the difference really just down to the ODE solver? scipy.integrate.solve_ivp(f, method='BDF') is the recommended substitute for ode15s according to the official numpy website, but for this particular example the performance difference is one second versus taking ages to solve. The difference is a lot bigger than I thought.
Are there other alternative methods I can try in Python for solving similar PDEs (something along the lines of finite differences / method of lines)? I mean utilizing existing libraries, preferably those in scipy.
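For reference, here is my understanding of the vectorized=True contract (a sketch of a shape-safe right-hand side; I am not sure this is the cause of the slowdown): the solver may call the function with h of shape (n, k), i.e. several state columns stacked, when it approximates the Jacobian. np.roll with no axis argument flattens a 2D array before rolling, which would silently mix those columns, so the shifts would need an explicit axis:
import numpy as np

def thin_film_eq_vec(t, h, dx):
    # Same equation as above, but safe when h has shape (n, k):
    # np.roll(..., axis=0) shifts each column independently, whereas
    # np.roll with no axis flattens a 2D array and mixes the columns.
    phi = 0
    hxx = (np.roll(h, 1, axis=0) - 2*h + np.roll(h, -1, axis=0))/dx**2
    p = phi - hxx
    px = (np.roll(p, -1, axis=0) - np.roll(p, 1, axis=0))/dx
    flux = h**3*px/3
    return (np.roll(flux, -1, axis=0) - np.roll(flux, 1, axis=0))/dx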
I'm converting a Matlab script to Python and I am getting differences on the order of 10**-4 in the results.
In matlab:
f_mean=f_mean+nanmean(f);
f = f - nanmean(f);
f_t = gradient(f);
f_tt = gradient(f_t);
if n_loop==1
    theta = atan2( sum(f.*f_tt), sum(f.^2) );
end
theta = -2.2011167e+03
In Python:
f_mean = f_mean + np.nanmean(vel)
vel = vel - np.nanmean(vel)
firstDerivative = np.gradient(vel)
secondDerivative = np.gradient(firstDerivative)
if numberLoop == 1:
    theta = np.arctan2(np.sum(vel * secondDerivative),
                       np.sum(vel**2))
Although firstDerivative and secondDerivative give the same results in Python and Matlab, f_mean is slightly different: -0.0066412 (Matlab) and -0.0066414 (Python); and so is theta: -0.4126186 (M) and -0.4124718 (P). It is a small difference, but in the end it leads to different results in my scripts.
I know some people have asked about this kind of difference before, but always regarding std (which I understand), not regarding mean values. I wonder why it is.
One possible source of the initial difference you describe (between means) could be numpy's use of pairwise summation which on large arrays will typically be appreciably more accurate than the naive method:
a = np.random.uniform(-1, 1, (10**6,))
a = np.r_[-a, a]
# so the sum should be zero
a.sum()
# 7.815970093361102e-14
# use cumsum to get naive summation:
a.cumsum()[-1]
# -1.3716805469243809e-11
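The same effect carries over to means, since a mean is just the sum divided by the number of elements:
a.mean()                   # pairwise summation internally
a.cumsum()[-1] / a.size    # naive-order equivalent, typically less accurate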
Edit (thanks @sascha): for the last word, and as a "provably exact" reference, you could use math.fsum:
import math
math.fsum(a)
# 0.0
I don't have Matlab, so I can't check what it does.
I'm trying to solve a differential equation numerically, and am writing code that will give me an array of the solution at each time point.
import numpy as np
import matplotlib.pylab as plt
pi=np.pi
sin=np.sin
cos=np.cos
sqrt=np.sqrt
alpha=pi/4
g=9.80665
y0=0.0
theta0=0.0
sina = sin(alpha)**2
second_term = g*sin(alpha)*cos(alpha)
x0 = float(raw_input('What is the initial x in meters?'))
x_vel0 = float(raw_input('What is the initial velocity in the x direction in m/s?'))
y_vel0 = float(raw_input('what is the initial velocity in the y direction in m/s?'))
t_f = int(raw_input('What is the maximum time in seconds?'))
r0 = x0
vtan = sqrt(x_vel0**2+y_vel0**2)
dt = 1000
n = range(0,t_f)
r_n = r0*(n*dt)
r_nm1 = r0((n-1)*dt)
F_r = ((vtan**2)/r_n)*sina-second_term
r_np1 = 2*r_n - r_nm1 + dt**2 * F_r
data = [r0]
for time in n:
    data.append(float(r_np1))
print data
I'm not sure how to make the equation solve for r_np1 at each time in the range n. I'm still new to Python and would like some help understanding how to do something like this.
First issue is:
n = range(0,t_f)
r_n = r0*(n*dt)
Here you define n as a list and try to multiply the list n with the integer dt. This will not work. Pure Python is NOT a vectorized language like NumPy or Matlab where you can do vector multiplication like this. You could make this line work with
n = np.arange(0,t_f)
r_n = r0*(n*dt)
but you don't have to. Instead, you should move everything inside the for loop so the calculation is done at each timestep. At the moment, you do the calculation once, then append that same single result to the data list t_f times.
Of course, you have to leave your initial conditions (which are a key part of ODE solving) OUTSIDE of the loop, because they only affect the first step of the solution, not all of them.
So:
# Initial conditions
r0 = x0
data = [r0]

# Loop along timesteps
for n in range(t_f):
    # calculations performed at each timestep
    vtan = sqrt(x_vel0**2+y_vel0**2)
    dt = 1000
    r_n = r0*(n*dt)
    r_nm1 = r0*((n-1)*dt)
    F_r = ((vtan**2)/r_n)*sina-second_term
    r_np1 = 2*r_n - r_nm1 + dt**2 * F_r
    # append result to output list
    data.append(float(r_np1))

# do something with output list
print data
plt.plot(data)
plt.show()
I did not add any piece of code, only rearranged your lines. Notice that the part:
n = range(0,t_f)
for time in n:
Can be simplified to:
for time in range(0,t_f):
However, you use n as a time variable in the calculation (previously - and wrongly - defined as a list instead of a single number). Thus you can write:
for n in range(0,t_f):
Note 1: I do not know if this code is right mathematically, as I don't even know the equation you're solving. The code runs now and produces a result - you have to check whether that result is correct.
Note 2: Pure Python is not the best tool for this purpose. You should try some of the highly optimized built-ins of SciPy for ODE solving, as already hinted in the comments.
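For example, a minimal sketch with scipy.integrate.odeint (assuming the intended equation really is r'' = (vtan**2/r)*sina - second_term, which I cannot verify, see Note 1; taking x_vel0 as the initial radial velocity is also just my guess):
import numpy as np
from scipy.integrate import odeint

# Hypothetical sketch: write r'' = (vtan**2/r)*sina - second_term as a
# first-order system y = [r, r'] and let odeint choose its internal steps.
def rhs(y, t):
    r, rdot = y
    return [rdot, (vtan**2/r)*sina - second_term]

t = np.linspace(0, t_f, 100)
sol = odeint(rhs, [r0, x_vel0], t)  # x_vel0 as initial radial velocity is an assumption
plt.plot(t, sol[:, 0])              # sol[:, 0] is r(t)
plt.show()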
I'm studying dynamical systems, particularly the logistic family g(x) = cx(1-x), and I need to iterate this function an arbitrary number of times to understand its behavior. I have no problem iterating the function given a specific point x_0, but again, I'd like to graph the entire function and its iterations, not just a single point. For plotting a single function, I have this code:
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
def logplot(c, n = 10):
    dt = .001
    x = np.arange(0, 1.001, dt)
    y = c*x*(1-x)
    plt.plot(x, y)
    plt.axis([0, 1, 0, c*.25 + 0.1*c*.25])  # 0.1 rather than (1/10), which is 0 in Python 2
    plt.show()
I suppose I could tackle this by the lengthy/daunting method of explicitly creating a list of the range of each iteration using something like the following:
def log(c, x0):
    return c*x0*(1-x0)

def logiter(c, x0, n):
    i = 0
    y = []
    while i <= n:
        val = log(c, x0)
        y.append(val)
        x0 = val
        i += 1
    return y
But this seems really cumbersome, and I was wondering if there is a better way. Thanks
Some different options
This is really a matter of style. Your solution works and is not very difficult to understand. If you want to go on on those lines, then I would just tweak it a bit:
def logiter(c, x0, n):
    y = []
    x = x0
    for i in range(n):
        x = c*x*(1-x)
        y.append(x)
    return np.array(y)
The changes:
for loop is easier to read than a while loop
x0 is not used in the iteration (this adds one more variable, but it is mathematically easier to understand; x0 is a constant)
the function is written out, as it is a very simple one-liner (if it weren't, its name should be changed to be something else than log, which is very easy to confuse with logarithm)
the result is converted into a numpy array. (Just what I usually do, if I need to plot something)
In my opinion the function is now legible enough.
You might also take an object-oriented approach and create a logistic function object:
class Logistics():
    def __init__(self, c, x0):
        self.x = x0
        self.c = c

    def next_iter(self):
        self.x = self.c * self.x * (1 - self.x)
        return self.x
Then you may use this:
def logiter(c, x0, n):
    l = Logistics(c, x0)
    return np.array([l.next_iter() for i in range(n)])
Or you can make it a generator:
def log_generator(c, x0):
    x = x0
    while True:
        x = c * x * (1-x)
        yield x

def logiter(c, x0, n):
    l = log_generator(c, x0)
    return np.array([next(l) for i in range(n)])
If you need performance and have large tables, then I suggest:
def logiter(c, x0, n):
    res = np.empty((n, len(x0)))
    res[0] = c * x0 * (1 - x0)
    for i in range(1, n):
        res[i] = c * res[i-1] * (1 - res[i-1])
    return res
This avoids copying data around: the memory is allocated only once, and the expensive conversion from a list into an array is avoided.
(BTW, if you returned an array with the initial x0 as the first row, the last version would look cleaner; see the sketch below. Now the first row has to be calculated separately if copying the vector around is to be avoided.)
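For illustration, that variant could look like this (one extra row, same recurrence):
def logiter(c, x0, n):
    res = np.empty((n + 1, len(x0)))
    res[0] = x0  # initial values as the first row
    for i in range(1, n + 1):
        res[i] = c * res[i-1] * (1 - res[i-1])
    return res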
Which one is best? I do not know. IMO, all are readable and justified, it is a matter of style. However, I speak only very broken and poor Pythonic, so there may be good reasons why still something else is better or why something of the above is not good!
Performance
About performance: With my machine I tried the following:
logiter(3.2, np.linspace(0,1,1000), 10000)
For the first three approaches the time is essentially the same, approximately 1.5 s. For the last approach (preallocated array) the run time is 0.2 s. However, if the conversion from a list into an array is removed, the first one runs in 0.16 s, so the time is really spent in the conversion procedure.
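If you want to reproduce the comparison, a simple timing harness along these lines is enough (a sketch; absolute numbers will of course vary by machine):
import time
import numpy as np

def timed(f, *args):
    # crude wall-clock timing of a single call
    t0 = time.time()
    f(*args)
    return time.time() - t0

print(timed(logiter, 3.2, np.linspace(0, 1, 1000), 10000))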
Visualization
I can think of two useful but quite different ways to visualize the function. You mention that you will have, say, 100 or 1000 different x0's to start with. You do not mention how many iterations you want to have, but maybe we will start with just 100. So, let us create an array with 100 different x0's and 100 iterations at c = 3.6.
data = logiter(3.6, np.linspace(0,1,100), 100)
In a way, a standard method to visualize the function is to draw 100 lines, each of which represents one starting value. That is easy:
import matplotlib.pyplot as plt
plt.plot(data)
plt.show()
This gives a plot in which, it seems, all values end up oscillating somewhere, but other than that we have only a mess of color. This approach may be more useful if you use a narrower range of values for x0:
data = logiter(3.6, np.linspace(0.8,0.81,100), 100)
you may color-code the starting values by e.g.:
color1 = np.array([1,0,0])
color2 = np.array([0,0,1])
for i,k in enumerate(np.linspace(0, 1, data.shape[1])):
    plt.plot(data[:,i], '.', color=(1-k)*color1 + k*color2)
This plots the first columns (corresponding to x0 = 0.80) in red and the last columns in blue and uses a gradual color change in between. (Please note that the more blue a dot is, the later it is drawn, and thus blues overlap reds.)
However, it is possible to take a quite different approach.
data = logiter(3.6, np.linspace(0,1,1000), 50)
plt.imshow(data.T, cmap=plt.cm.bwr, interpolation='nearest', origin='lower',extent=[1,21,0,1], vmin=0, vmax=1)
plt.axis('tight')
plt.colorbar()
gives a color-coded image of the whole iteration history at once.
This is my personal favourite. I won't spoil anyone's joy by explaining it too much, but IMO this shows many peculiarities of the behaviour very easily.
Here's what I was aiming for: an indirect approach to understanding (by visualization) the behavior of initial conditions of the function g(c, x) = cx(1-x):
def jam(c, n):
    x = np.linspace(0,1,100)
    y = c*x*(1-x)
    for i in range(n):
        plt.plot(x, y)
        y = c*y*(1-y)
    plt.show()
Alright, I had this homework recently (don't worry, I've already done it, but in C++), and I got curious how I could do it in Python. The problem is about 2 light sources that emit light. I won't go into details, though.
Here's the code (which I've managed to optimize a bit in the latter part):
import math, array
import numpy as np
from PIL import Image
size = (800,800)
width, height = size
s1x = width * 1./8
s1y = height * 1./8
s2x = width * 7./8
s2y = height * 7./8
r,g,b = (255,255,255)
arr = np.zeros((width,height,3))
hy = math.hypot
print 'computing distances (%s by %s)'%size,
for i in xrange(width):
    if i%(width/10)==0:
        print i,
    if i%20==0:
        print '.',
    for j in xrange(height):
        d1 = hy(i-s1x,j-s1y)
        d2 = hy(i-s2x,j-s2y)
        arr[i][j] = abs(d1-d2)
print ''
arr2 = np.zeros((width,height,3),dtype="uint8")
for ld in [200,116,100,84,68,52,36,20,8,4,2]:
    print 'now computing image for ld = '+str(ld)
    arr2 *= 0
    arr2 += abs(arr%ld-ld/2)*(r,g,b)/(ld/2)
    print 'saving image...'
    ar2img = Image.fromarray(arr2)
    ar2img.save('ld'+str(ld).rjust(4,'0')+'.png')
    print 'saved as ld'+str(ld).rjust(4,'0')+'.png'
I have managed to optimize most of it, but there's still a huge performance gap in the part with the two nested for loops, and I can't seem to think of a way to bypass that using common array operations... I'm open to suggestions :D
Edit:
In response to Vlad's suggestion, I'll post the problem's details:
There are 2 light sources, each emitting light as a sinusoidal wave:
E1 = E0*sin(omega1*time+phi01)
E2 = E0*sin(omega2*time+phi02)
we consider omega1=omega2=omega=2*PI/T and phi01=phi02=phi0 for simplicity
by considering x1 to be the distance of a point on the plane from the first source, the intensity of the light at that point is
Ep1 = E0*sin(omega*time - 2*PI*x1/lambda + phi0)
where
lambda = speed of light * T (period of oscillation)
Considering both light sources on the plane, the formula becomes
Ep = 2*E0*cos(PI*(x2-x1)/lambda) * sin(omega*time - PI*(x2-x1)/lambda + phi0)
and from that we can see that the intensity of the light is at a maximum when
(x2-x1)/lambda = (2*k) * PI/2
and minimum when
(x2-x1)/lambda = (2*k+1) * PI/2
and varies in between, where k is an integer
For a given moment in time, given the coordinates of the light sources, and for a known lambda and E0, we had to write a program that draws how the light looks.
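To make the link between the formula and the code concrete (my own sketch, with hypothetical names: d1, d2 are the per-pixel distances to the two sources, lam plays the role of lambda, and E0 is taken to be 1), the amplitude envelope that drives the pattern could be computed directly as:
import numpy as np

w, h, lam = 800, 800, 100.0
i, j = np.mgrid[0:w, 0:h]
d1 = np.hypot(i - w/8., j - h/8.)      # distance to source 1
d2 = np.hypot(i - 7*w/8., j - 7*h/8.)  # distance to source 2
# amplitude envelope 2*E0*cos(PI*(x2-x1)/lambda), assuming E0 = 1;
# the program instead maps |x2-x1| % ld through a triangle wave,
# which shares the same maxima and minima
envelope = np.abs(2 * np.cos(np.pi * (d2 - d1) / lam))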
IMHO I think I've optimized the problem about as much as it can be...
Interference patterns are fun, aren't they?
So, first off this is going to be minor because running this program as-is on my laptop takes a mere twelve and a half seconds.
But let's see what can be done about doing the first bit through numpy array operations, shall we? Basically, you want:
arr[i][j] = abs(hypot(i-s1x,j-s1y) - hypot(i-s2x,j-s2y))
For all i and j.
So, since numpy has a hypot function that works on numpy arrays, let's use that. Our first challenge is to get an array of the right size with every element equal to i, and another with every element equal to j. But this isn't too hard; in fact, an answer below points me at the wonderful numpy.mgrid, which I didn't know about before, and which does just this:
array_i,array_j = np.mgrid[0:width,0:height]
There is the slight matter of making your (width, height)-sized array into (width,height,3) to be compatible with your image-generation statements, but that's pretty easy to do:
arr = (arr * np.ones((3,1,1))).transpose(1,2,0)
Then we plug this into your program, and let things be done by array operations:
import math, array
import numpy as np
from PIL import Image
size = (800,800)
width, height = size
s1x = width * 1./8
s1y = height * 1./8
s2x = width * 7./8
s2y = height * 7./8
r,g,b = (255,255,255)
array_i,array_j = np.mgrid[0:width,0:height]
arr = np.abs(np.hypot(array_i-s1x, array_j-s1y) -
             np.hypot(array_i-s2x, array_j-s2y))
arr = (arr * np.ones((3,1,1))).transpose(1,2,0)
arr2 = np.zeros((width,height,3),dtype="uint8")
for ld in [200,116,100,84,68,52,36,20,8,4,2]:
    print 'now computing image for ld = '+str(ld)
    # Rest as before
And the new time is... 8.2 seconds. So you save maybe four whole seconds. On the other hand, that's almost exclusively in the image generation stages now, so maybe you can tighten them up by only generating the images you want.
If you use array operations instead of loops, it is much, much faster. For me, the image generation is now what takes the longest. Instead of your two i,j loops, I have this:
I,J = np.mgrid[0:width,0:height]
D1 = np.hypot(I - s1x, J - s1y)
D2 = np.hypot(I - s2x, J - s2y)
arr = np.abs(D1-D2)
# triplicate into 3 layers
arr = np.array((arr, arr, arr)).transpose(1,2,0)
# .. continue program
The basic thing to remember for the future is this: it is not really about optimization; using array forms in numpy is just using it the way it is supposed to be used. With experience, your future projects should not take the detour through Python loops; the array forms should be the natural form.
What we did here was really simple. Instead of math.hypot we found numpy.hypot and used it. Like all such numpy functions, it accepts ndarrays as arguments, and does exactly what we want.
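As a quick check of that (arrays in, array out, with broadcasting):
import numpy as np

i = np.arange(3)[:, None]         # column of row indices, shape (3, 1)
j = np.arange(4)[None, :]         # row of column indices, shape (1, 4)
print np.hypot(i - 1.0, j - 2.0)  # elementwise distances, shape (3, 4)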
List comprehensions are generally faster than explicit loops. For example, instead of
for j in xrange(height):
    d1 = hy(i-s1x,j-s1y)
    d2 = hy(i-s2x,j-s2y)
    arr[i][j] = abs(d1-d2)
You'd write
arr[i] = [abs(hy(i-s1x,j-s1y) - hy(i-s2x,j-s2y)) for j in xrange(height)]
On the other hand, if you're really trying to "optimize", then you might want to reimplement this algorithm in C and use SWIG or the like to call it from Python.
The only changes that come to my mind is to move some operations out of the loop:
for i in xrange(width):
    if i%(width/10)==0:
        print i,
    if i%20==0:
        print '.',
    arri = arr[i]
    is1x = i - s1x
    is2x = i - s2x
    for j in xrange(height):
        d1 = hy(is1x,j-s1y)
        d2 = hy(is2x,j-s2y)
        arri[j] = abs(d1-d2)
The improvement, if any, will probably be minor though.