I am playing with SciPy today and wanted to test least-squares fitting. The function malo(time) works perfectly in returning the calculated concentrations if I put it in a loop which iterates over an array of timesteps (in the code, "time").
Now I want to compare my calculated concentrations with my measured ones. I created a residuals function which calculates the difference between the measured concentrations (in the script, an array called conc) and the modelled concentrations from malo(time).
With optimize.leastsq I want to fit the parameter PD so that both curves match as well as possible. I don't see a mistake in my code and malo(time) performs well on its own, but whenever I run the optimize.leastsq call, Python says "only length-1 arrays can be converted to Python scalars". If I set the timedt array to a single value, the code runs without any error.
Is there any way to convince Python to use my array of timesteps in the loop?
import pylab as p
import math as m
import numpy as np
from scipy import optimize

Q = 0.02114
M = 7500.0
dt = 30.0
PD = 0.020242215
tom = 26.0        # minutes
tos = tom * 60.0  # seconds

timedt = np.array([30., 60., 90.])
conc = np.array([2.7096, 2.258, 1.3548, 0.9032, 0.9032])

def malo(time):
    M1 = M/Q
    M2 = 1/(tos*m.sqrt(4*m.pi*PD*((time/tos)**3)))
    M3a = (1 - time/tos)**2
    M3b = 4*PD*(time/tos)
    M3 = m.exp(-1*(M3a/M3b))
    out = M1 * M2 * M3
    return out

def residuals(p, y, time):
    PD = p
    err = y - malo(timedt)
    return err

p0 = 0.05
p1 = optimize.leastsq(residuals, p0, args=(conc, timedt))
Notice that you're working here with arrays defined in the NumPy module, e.g.
timedt = np.array([30.,60.,90])
conc= np.array([ 2.7096, 2.258 , 1.3548, 0.9032, 0.9032])
Now, those arrays are not part of standard Python (which is a general-purpose language). The problem is that you're mixing arrays with functions from the math module, which is part of the standard library and only meant to work on scalars.
So, for example:
M2 = 1/(tos*m.sqrt(4*m.pi*PD*((time/tos)**3)))
will work if you use np.sqrt instead, which is designed to work on arrays:
M2 = 1/(tos*np.sqrt(4*m.pi*PD*((time/tos)**3)))
And so on.
NB: SciPy and other modules meant for numeric/scientific programming know about NumPy and are built on top of it, so their functions should all work on arrays. Just don't use math when working with them; NumPy comes with equivalents of all those functions (sqrt, cos, exp, ...) that work on arrays.
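For illustration, here is a minimal sketch of malo rewritten with the NumPy equivalents (np.sqrt, np.pi, np.exp); the constants are copied from the question, and the function then accepts the whole timedt array at once:

import numpy as np

# constants copied from the question
Q = 0.02114
M = 7500.0
PD = 0.020242215
tos = 26.0 * 60.0   # tom (minutes) converted to seconds

def malo(time):
    # same formula as in the question, but with NumPy ufuncs,
    # so `time` may be a scalar or a whole array of timesteps
    M1 = M / Q
    M2 = 1 / (tos * np.sqrt(4 * np.pi * PD * (time / tos)**3))
    M3a = (1 - time / tos)**2
    M3b = 4 * PD * (time / tos)
    M3 = np.exp(-(M3a / M3b))
    return M1 * M2 * M3

timedt = np.array([30., 60., 90.])
print(malo(timedt))   # evaluates all three timesteps in one call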
I have two NumPy (complex) arrays A[t], B[t] defined over a grid of points "t". These two arrays are to be convolved such that I get a third array C[y] = (A*B)(y), where "y" needs to be exactly the same points as the "t" grid. The point is that both A and B need to be integrated from -\infty to \infty according to the standard convolution operation.
I'm using scipy.signal.convolve for this, and I would also like to use fftconvolve since my real arrays are quite big. However, when I try the module on a minimal working example, I seem to be doing something very wrong. Here is a piece of the code, where I choose A(t) = exp(-t**2) and B(t) = exp(-t). The convolution of these two functions in Mathematica gives:
C(y) = \int_{-\infty}^{\infty} dt\, A(t)\, B(y - t) = \sqrt{\pi}\, \exp(0.25 - y)
But then I try this in Python and get very wrong results:
import scipy.signal as scp
import numpy as np
import matplotlib.pyplot as plt

# grid spacing and grid
delta = 0.001
t = np.arange(1000)*delta

# the two functions to convolve
a = np.exp(-t**2)
b = np.exp(-t)

# discrete convolution, scaled by the grid spacing to approximate the integral
c = scp.convolve(a, b, mode='same')*delta

# analytic result from Mathematica, for comparison
d = np.sqrt(np.pi)*np.exp(0.25 - t)

plt.plot(np.arange(len(c)) * delta, c)
plt.plot(t[::50], d[::50], 'o')
As far as I understood, the "same" mode allows for evaluation over the same points of the original grids, but this doesn't seem to be the case... Any help is greatly appreciated!
I am trying to solve a second-order ODE with solve_bvp. I have split the second-order ODE into a system of two first-order ODEs. I have a set of constants that changes with the x (mesh) value, so I am passing these as an array of shape (N,) into my function numdens. When I try to run solve_bvp I get the error that the returns have different shapes, namely (N,) and (N-1,), and thus cannot be broadcast into one array. But when I check each return manually outside of the function, it has shape (N,).
If I run the solver without my changing constants I get a solution close to the right one.
import numpy as np
from scipy.integrate import solve_bvp, odeint
import matplotlib.pyplot as plt

E_0 = 1 * 0.0000016021773   # erg: g cm^2/s^2
m_H = 1.6*10**(-24)         # g
c = 3e11                    # cm
sigma_c = 2*10**(-23)
n_0 = 1*10**(20)            # 1/cm^3
v_0 = (2*E_0/m_H)**(0.5)    # cm/s
T = 10**7
b = 20.3
n_eq = b*T**3
n_s = 2.03*10**(19)
Q = 1

def velocity(v, x):
    dvdx = -sigma_c*n_0*v_0*((8*v_0*v - 7*v**2 - v_0**2)/(2*v*c))
    return dvdx

n_num = 100
x_num = np.linspace(-1*10**(6), 3*10**(6), n_num)
sol_velo = odeint(velocity, 0.999999999999*v_0, x_num)
sol_new = np.reshape(sol_velo, n_num)

def constants(v):
    D1 = (c*v/(3*n_0*v_0*sigma_c))
    D2 = ((v**2 - 8*v_0*v + v_0**2)/(6*v))
    D3 = sigma_c*n_0*v_0*((8*v_0*v - 7*v**2 - v_0**2)/(2*v*c))
    return D1, D2, D3

def numdens(x, y):
    v = sol_new
    D1, D2, D3 = constants(v)
    return np.vstack((y[1], (-D2*y[1] - D3*y[0] + Q*((1 - y[0])/n_eq))/(D1)))

def bc_num(ya, yb):
    return np.array([ya[0] - n_s, yb[0] - n_eq])

y_num = np.array([np.linspace(n_s, n_eq, n_num), np.linspace(n_s, n_eq, n_num)])
sol_num = solve_bvp(numdens, bc_num, x_num, y_num)

plt.plot(sol_num.x, sol_num.y[0], label='$n(x)$')
plt.plot(x_num, sol_velo - v_0/7, label='$v(x)$')
plt.yscale('log')
plt.grid(alpha=0.5)
plt.legend(framealpha=1)
plt.show()
You need to take into account that the BVP solver uses an adaptive mesh. That is, after refining the initial guess on the initial grid, the solver identifies regions with overly large errors and creates new mesh nodes there. As far as I have seen, the opposite is not implemented, even if it might be sensible in some applications to reduce the number of mesh nodes on especially "nice" segments.
Thus, what you are doing in the numdens function cannot work: it has to behave exactly like any other function that you would pass to an ODE solver and accept whatever mesh x the solver hands it. If I had to propose a quick fix, without knowing the underlying problem you want to solve, I would change the assignment of v to
v = np.interp(x,x_num,sol_velo)
as that should at least produce an array of the correct format.
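As a minimal sketch, assuming the definitions from the question's script (x_num, constants, Q, n_eq, and the 1-D sol_new so that np.interp gets a one-dimensional array), numdens would then look like this:

def numdens(x, y):
    # interpolate the precomputed velocity profile, defined on the fixed
    # grid x_num, onto whatever mesh nodes x the solver currently uses
    v = np.interp(x, x_num, sol_new)
    D1, D2, D3 = constants(v)
    return np.vstack((y[1],
                      (-D2*y[1] - D3*y[0] + Q*((1 - y[0])/n_eq))/D1))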
For my dynamics course I am tasked with writing a Python code that will plot the trajectory of a position vector from when it starts on the ground to when it lands on the ground. I currently have my code create a linear space from the two zero values that I calculated by hand, but I want to compute those in the code instead. Because I also need to create velocity vectors on the trajectory, I have the position vector broken into its x and y components. I have looked into xlim and this thread, but couldn't figure out how to implement them. I'm fairly new to Python and coding in general, so I'm still trying to learn how things work.
import numpy as np
import matplotlib.pyplot as plt

# creates a function that returns the x component
def re10(x):
    r1 = 0.05*x
    return r1

# creates a function that returns the y component
def re20(x):
    r2 = -4.91*(x**2) + 30*x + 100
    return r2

# Calculates the two zeroes of the trajectory
tmin = (-30 + np.sqrt(30**2 - 4*-4.91*100))/(2*-4.91)
tmax = (-30 - np.sqrt(30**2 - 4*-4.91*100))/(2*-4.91)

# Initializing time space
t = np.linspace(tmin, tmax, 100)

# Plot
plt.plot(re10(t), re20(t))  # (x, y)
You can easily find the zeroes of a polynomial using the NumPy library.
First, install it. Open a cmd console and write pip install numpy.
Then, write this code in your script:
import numpy

coeff = [-4.91, 30, 100]   # coefficients of re20, highest power first
zeroes = numpy.roots(coeff)

print(zeroes[0])
print(zeroes[1])
As you will see when running the script from the console (or your IDE), numpy.roots(coeff) returns the zeroes of your polynomial as an array.
That is why you use the [] operator to access each one of them (note that in programming, an array's first element is at index 0).
To use it directly into your code, you can do:
tmin = zeroes[0]
tmax = zeroes[1]
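Put together with the original script, a minimal sketch might look like the following (np.roots does not guarantee the order of the roots, so sorting them is a reasonable precaution):

import numpy as np
import matplotlib.pyplot as plt

def re10(x):
    return 0.05*x

def re20(x):
    return -4.91*(x**2) + 30*x + 100

# zeroes of the y-component polynomial -4.91*t**2 + 30*t + 100
zeroes = np.roots([-4.91, 30, 100])
tmin, tmax = sorted(zeroes)   # ensure tmin < tmax

t = np.linspace(tmin, tmax, 100)
plt.plot(re10(t), re20(t))    # (x, y)
plt.show()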
SymPy is for symbolic operations; it is pretty powerful, but you don't need it for this, my mistake.
Hope you have fun with Python, it's a cool language!
My Python code takes about 6.2 seconds to run. The Matlab code runs in under 0.05 seconds. Why is this, and what can I do to speed up the Python code? Is Cython the solution?
Matlab:
function X=Test
nIter=1000000;
Step=.001;
X0=1;
X=zeros(1,nIter+1); X(1)=X0;
tic
for i=1:nIter
    X(i+1)=X(i)+Step*(X(i)^2*cos(i*Step+X(i)));
end
toc
figure(1)
plot(0:nIter,X)
Python:
import numpy as np
import time

nIter = 1000000
Step = .001

x = np.zeros(1+nIter)
x[0] = 1

start = time.time()
for i in range(1,1+nIter):
    x[i] = x[i-1] + Step*x[i-1]**2*np.cos(Step*(i-1)+x[i-1])
end = time.time()
print(end - start)
How to speed up your Python code
Your largest time sink is np.cos which performs several checks on the format of the input.
These checks are relevant and usually negligible for large array inputs, but for your scalar input, they become the bottleneck.
The solution to this is to use math.cos, which only accepts plain scalars as input and thus is faster (though less flexible).
Another time sink is indexing x multiple times.
You can speed this up by having one state variable which you update and only writing to x once per iteration.
With all of this, you can speed up things by a factor of roughly ten:
import numpy as np
from math import cos

nIter = 1000000
Step = .001

x = np.zeros(1+nIter)
state = x[0] = 1

for i in range(nIter):
    state += Step*state**2*cos(Step*i+state)
    x[i+1] = state
Now, your main problem is that your truly innermost loop happens completely in Python, i.e., you have a lot of wrapping operations that eat up time.
You can avoid this by using uFuncs (e.g., created with SymPy’s ufuncify) and using NumPy’s accumulate:
import numpy as np
from sympy.utilities.autowrap import ufuncify
from sympy.abc import t,y
from sympy import cos

nIter = 1000000
Step = 0.001

# binary ufunc performing one Euler step: (state, time) -> new state
f = ufuncify([y,t], y+Step*y**2*cos(t+y))

times = np.arange(0, nIter*Step, Step)
times[0] = 1   # seed the accumulation with the initial condition x[0] = 1
x = f.accumulate(times)
This runs practically within an instant.
… and why that’s not what you should worry about
If your exact code (and only that) is what you care about, then you shouldn’t worry about runtime anyway, because it’s very short either way.
If, on the other hand, you use this to gauge efficiency for problems with a considerable runtime, your example will fail because it considers only one initial condition and a very simple dynamics.
Moreover, you are using the Euler method, which is either not very efficient or not very robust, depending on your step size.
The latter (Step) is absurdly small in your case, yielding much more data than you probably need:
With a step size of 1, you can see what's going on just fine.
If you want a robust integration in such cases, it's almost always best to use a modern adaptive integrator that can adjust its step size itself; e.g., here is a solution to your problem using a native Python integrator:
from math import cos
import numpy as np
from scipy.integrate import solve_ivp

T = 1000
dt = 0.001

x = solve_ivp(
        lambda t,state: state**2*cos(t+state),
        t_span = (0,T),
        t_eval = np.arange(0,T,dt),
        y0 = [1],
        rtol = 1e-5
    ).y
This automatically adjusts the step size to something higher, depending on the error tolerance rtol.
It still returns the same amount of output data, but that’s via interpolation of the solution.
It runs in 0.3 s for me.
How to speed up things in a scalable manner
If you still need to speed up something like this, chances are that your derivative (f) is considerably more complex than in your example and thus it is the bottleneck.
Depending on your problem, you may be able to vectorise its calculation (using NumPy or similar).
If you can't vectorise, I wrote a module that specifically focuses on this by hard-coding your derivative under the hood.
Here is your example with a sampling step of 1:
import numpy as np
from jitcode import jitcode,y,t
from symengine import cos
T = 1000
dt = 1
ODE = jitcode([y(0)**2*cos(t+y(0))])
ODE.set_initial_value([1])
ODE.set_integrator("dop853")
x = np.hstack([ODE.integrate(t) for t in np.arange(0,T,dt)])
This runs again within an instant. While this may not be a relevant speed boost here, this is scalable to huge systems.
The difference is JIT compilation, which Matlab uses by default. Let's try your example with Numba (a Python JIT compiler).
Code
import numba as nb
import numpy as np
import time

nIter = 1000000
Step = .001

@nb.njit()
def integrate(nIter, Step):
    x = np.zeros(1+nIter)
    x[0] = 1
    for i in range(1,1+nIter):
        x[i] = x[i-1] + Step*x[i-1]**2*np.cos(Step*(i-1)+x[i-1])
    return x

# Avoid measuring the compilation time;
# this would also be recommendable for Matlab to have a fair comparison.
res = integrate(nIter, Step)

start = time.time()
for i in range(100):
    res = integrate(nIter, Step)
end = time.time()
print((end - start)/100)
This results in 0.022s runtime per call.
When doing some estimations, calculations, and other fun stuff in Python I came across something really weird and upsetting.
I have this thing where I estimate some parameters using ML estimation, and I have previously assumed that everything was peachy and fine. I read CSV data with pandas and use that data for the estimation, so the data has originally been passed down to the ML-estimation function as a pandas Series. Today I wanted to try some matrix operations on a part of the calculation for kicks and giggles, and converted the input data to NumPy arrays. However, when I ran the code, the estimation results were different. After restoring some of the multiplications, they were still different. Then I changed back to using a pandas Series, and it returned to the previously expected result.
This is where I got curious and now turn to you: is the rounding of float64 NumPy arrays and float64 pandas Series so different that my calculations end up this drastically different?
Consider the following code example containing a sample from my ML estimator:
import pandas as pd
import numpy as np
import math

values = [3.41527085753, 3.606855606852, 3.5550625070226231, 3.680327020956565,
          3.30270511221, 3.704752803295, 3.6307205395804001, 3.200863997609199,
          2.90599272353, 3.555062501231, 2.8047528032711295, 3.415270760685753,
          3.50599277872, 3.445622506242, 3.3047528084632258, 3.219431344191376,
          3.68032756565, 3.451542245654, 3.2244456543387564, 2.999848273256456]

Ps = pd.Series(values, dtype=np.float64)
Narr = np.array(values, dtype=np.float64)

def getLambda(S, delta=1/255):
    n = len(S) - 1
    Sx = sum(S[0:-1])
    Sy = sum(S[1:])
    Sxx = sum(S[0:-1]**2)
    Sxy = sum(S[0:-1]*S[1:])
    mu = (Sy*Sxx - Sx*Sxy) / (n*(Sxx - Sxy) - (Sx**2 - Sx*Sy))
    lambd = np.log((Sxy - mu*Sx - mu*Sy + n*mu**2) / (Sxx - 2*mu*Sx + n*mu**2)) / delta
    a = math.exp(-lambd*delta)
    return mu, a, lambd

print("Numpy Array calculation gives me mu = {}, alpha = {} and Lambda = {}".format(getLambda(Narr)[0], getLambda(Narr)[1], getLambda(Narr)[2]))
print("Pandas Series calculation gives me mu = {}, alpha = {} and Lambda = {}".format(getLambda(Ps)[0], getLambda(Ps)[1], getLambda(Ps)[2]))
The values are just some random values picked from my original data in my larger project.
This will, at least for me, print:
>> Numpy Array calculation gives me mu = 3.378432651661709, alpha = 102.09644146650535 and Lambda = -1179.6090571432392
>> Pandas Series calculation gives me mu = 3.3981019891871247, alpha = nan and Lambda = nan
The procedure, method, and original data are identical, yet there is already a difference of about 0.019669 in the calculation of mu, which is for me really weird and upsetting.
If this is due to a difference in rounding between the two ways of handling the data (keep in mind that I explicitly stated that it should be float64 in both cases), that is weird, as it makes me question which of them I should use, and why. Otherwise, is there a bug in one of them? Or is there a third alternative which explains everything and that I did not know of to begin with?