Sympy: integrate() strange output - python

I'm just learning how to use sympy and I have tried a simple integration of a sine function. When the argument of sin() contains a constant phase, integrate() returns the same value whatever the phase is: 0
from sympy import *
x = symbols('x')
w = 0.01
phi = 0.3
k1 = integrate(sin(w*x), (x, 0.0, 10.0))
k2 = integrate(sin(w*x + 0.13), (x, 0.0, 10.0))
k3 = integrate(sin(w*x + phi), (x, 0.0, 10.0))
k1, k2, k3
(0.499583472197429, 0, 0)
Can somebody explain why?

That seems to be a bug. A workaround is to get a symbolic expression for your integral first (which seems to work fine), then evaluate it for each set of parameters at the upper and lower bounds and take the difference:
import sympy as sp
x, w, phi = sp.symbols('x w phi')
# integrate function symbolically
func = sp.integrate(sp.sin(w * x + phi), x)
# define your parameters
para = [{'w': 0.01, 'phi': 0.,   'lb': 0., 'ub': 10., 'res': 0.},
        {'w': 0.01, 'phi': 0.13, 'lb': 0., 'ub': 10., 'res': 0.},
        {'w': 0.01, 'phi': 0.3,  'lb': 0., 'ub': 10., 'res': 0.}]
# evaluate your function for all parameters using the function subs
for parai in para:
    parai['res'] = (func.subs({w: parai['w'], phi: parai['phi'], x: parai['ub']})
                    - func.subs({w: parai['w'], phi: parai['phi'], x: parai['lb']}))
After this, para looks as follows:
[{'lb': 0.0, 'phi': 0.0, 'res': 0.499583472197429, 'ub': 10.0, 'w': 0.01},
{'lb': 0.0, 'phi': 0.13, 'res': 1.78954987094131, 'ub': 10.0, 'w': 0.01},
{'lb': 0.0, 'phi': 0.3, 'res': 3.42754951227208, 'ub': 10.0, 'w': 0.01}]
which seems to give reasonable results for the integration; they are stored in res.

I just ran your code in the development version of SymPy and I got (0.499583472197429, 1.78954987094131, 3.42754951227208). So it seems the bug will be fixed in the next version.
It also looks like this bug is in Python 2 only. When I use Python 3, even with the latest stable version (0.7.6.1), I get the same (correct) answer.

Can I recommend using numpy for numerical integration?
>>> import numpy as np
>>> w = 0.01
>>> phi = 0.3
>>> dt = 0.01
>>> t = 2*np.pi*np.arange(0,1,dt)
>>> np.sum( np.sin(t)*dt)
-1.0733601507606494e-17
>>> np.sum( np.sin(t+ phi)*dt)
2.5153490401663703e-17
These numbers are effectively zero. The exact value is an artifact of our choice of mesh dt and shift phi (as well as the accuracy of np.sin): analytically, the integral of sin(t + phi) over a full period is exactly 0 for any phase phi.
To be more consistent with your example:
>>> t = np.arange(0,10,dt)
>>> w = 0.01
>>> phi = 0.3
>>> np.sum( np.sin(w*t)*dt)
0.4990843046978698
>>> np.sum( np.sin(w*t + phi)*dt)
3.4270800187375658
>>> np.sum( np.sin(w*t + 0.13)*dt)
1.7890581525454512
As noted in Integrating in Python using Sympy, it's a bad idea to use a symbolic library for numerical work.

Related

Fit data to integral using quad - magnetic hysteresis loop

I'm having trouble getting a fit to converge: depending on my start parameters it either fails to converge or gives a NaN error. I'm using quad to integrate and lmfit to fit. Any help is appreciated.
I'm fitting my data to a Langevin function, weighted by a log-normal distribution. Stackoverflow won't let me post an image of the function because of my reputation score, but it's in the code below.
I'm plugging in H (field) and fitting for Ms, Dm, and sigma, while mu_0, Msb, kb, and T are all constants.
Here's what I'm working with, using some example data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
from numpy import vectorize, sqrt, log, inf, exp, pi, tanh
from scipy.constants import k, mu_0
from lmfit import Parameters
from scipy.integrate import quad
x_data = [-7.0, -6.5, -6.0, -5.5, -5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0, -1.5, -1.0,
-0.95, -0.9, -0.85, -0.8, -0.75, -0.7, -0.65, -0.6, -0.55, -0.5, -0.45, -0.4,
-0.35, -0.3, -0.25, -0.2, -0.1,-0.05, 3e-6, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3,
0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0,
1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
y_data = [-61.6, -61.6, -61.6, -61.5, -61.5, -61.4, -61.3, -61.2, -61.1, -61.0, -60.8,
-60.4, -59.8, -59.8, -59.7, -59.5, -59.4, -59.3, -59.1, -58.9, -58.7, -58.4,
-58.1, -57.7, -57.2, -56.5, -55.6, -54.3, -52.2, -48.7, -41.8, -27.3, 2.6,
30.1, 43.1, 49.3, 52.6, 54.5, 55.8, 56.6, 57.3, 57.8, 58.2, 58.5, 58.7, 59.0,
59.1, 59.3, 59.5, 59.6, 59.7, 59.8, 59.9, 60.5, 60.8, 61.0, 61.2, 61.3, 61.4,
61.4, 61.5, 61.6, 61.6, 61.7, 61.7]
params = Parameters()
params.add('Dm' , value = 8e-9 , vary = True, min = 0, max = 1) # magnetic diameter (m)
params.add('s' , value = 0.4 , vary = True, min = 0.0, max = 10.0) # sigma, unitless
params.add('Ms' , value = 61.0 , vary = True) #, min = 30.0 , max = 100.0) # saturation magnetization (emu/g)
params.add('Msb', value = 446000 * 1e-16, vary = False) # Bulk magnetite saturation magnetization (A/m)
params.add('T' , value = 300 , vary = False) # Temperature (K)
def Mag(x_data, params):
    v = params.valuesdict()  # put parameters into a dictionary
    def numerator(D, x_data, params):
        # Langevin
        a_numerator = pi * v['Msb'] * x_data * D**3
        a_denominator = 6*k*v['T']
        a = a_numerator / a_denominator
        langevin = (1/tanh(a)) - (1/a)
        # log-normal PDF
        exp_num = (log(D/v['Dm']))**2
        exp_denom = 2 * v['s']
        exponential = exp(-exp_num/exp_denom)
        pdf = exponential/(sqrt(2*pi) * v['s'] * D)
        return D**3 * langevin * pdf
    def denominator(D, params):
        # log-normal PDF
        exp_num = (log(D/v['Dm']))**2
        exp_denom = 2 * v['s']
        exponential = exp(-exp_num/exp_denom)
        pdf = exponential/(sqrt(2*pi) * v['s'] * D)
        return D**3 * pdf
    # return the ratio of the two integrals, scaled by Ms
    return v['Ms'] * quad(numerator, 0, inf, args=(x_data, params))[0] / quad(denominator, 0, inf, args=(params,))[0]
# vectorize
vcurve = np.vectorize(Mag, excluded=set([1]))
plt.plot(x_data, vcurve(x_data, params))
plt.scatter(x_data, y_data)
This plots the data and the fit equation with start parameters. I have an issue somewhere with units in the Langevin and have to multiply the numerator by 1e-16 to get the curve looking correct...
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit
def fit_function(params, x_data, y_data):
    model1 = vcurve(x_data, params)
    resid1 = y_data - model1
    return resid1
minner = Minimizer(fit_function, params, fcn_args=(x_data, y_data))
result = minner.minimize()
report_fit(result)
result.params.pretty_print()
Depending on the sigma (s) value I choose, which should be able to range from 0 to infinity, the integral won't converge, giving the following error:
/var/folders/pz/tbd_dths0_512bm6l43vpg680000gp/T/ipykernel_68003/1413445460.py:39: IntegrationWarning: The algorithm does not converge. Roundoff error is detected
in the extrapolation table. It is assumed that the requested tolerance
cannot be achieved, and that the returned result (if full_output = 1) is
the best which can be obtained.
return v['Ms'] * quad(numerator, 0, inf, args=(x_data, params))[0] / quad(denominator, 0, inf,args=(params))[0]
I'm stuck on why the fit isn't converging. Is this an issue because I'm using very small numbers or is this an issue with quad/lmfit? Thank you!
Having parameters that are closer to order 1 (say, between 1e-7 and 1e7) is a good idea. If you expect a parameter to be in the 1e-9 (or 1e-16!) range, you could definitely scale it (in the fitting function) so that the value passed back and forth by the fitting algorithm is closer to order 1. But I somewhat doubt that is the main problem you are having.
It looks to me like your Mag function is not very sensitive to the values of your variable parameters Dm and s. I am not 100% sure why that is. Have you verified that calculations using your "Mag" or "vcurve" do what you expect them to do?

Fitting a curve to some datapoints

The fitted curve doesn't fit the data points (xH_data, nH_data) as expected. Does someone know what the issue might be here?
from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6
plt.plot(xH_data, nH_data)
plt.yscale("log")
plt.xscale("log")
def eTemp(x, A, a, B):
    n = B*(A+x)**a
    return n
parameters, covariance = curve_fit(eTemp, xH_data, nH_data, maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]
print(fit_A)
print(fit_a)
print(fit_B)
r = np.logspace(0, 0.7, 1000)
ne = fit_B *(fit_A + r)**(fit_a)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
Thanks in advance for the help.
Ok, here is a different approach. As usual, the main problem is the initial guesses for the non-linear fit (for details, check this). Here, those are obtained by using an integral relation of the fit function y(x) = a (x - c)^p, namely
int y dx = (x - c) y / (p + 1) + d = x y / (p + 1) - c y / (p + 1) + d
This means we can get c and p via a linear fit of int y against x y and y. Once those are known, a is a simple linear fit. It will turn out that these guesses are already quite good. Nevertheless, they go as initial values into a non-linear fit that provides the final result. In detail this goes as follows:
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumtrapz  # cumulative_trapezoid on newer SciPy
from scipy.optimize import curve_fit

xHdata = np.array(
    [
        1.0, 1.03, 1.06, 1.1, 1.2, 1.3, 1.5,
        1.7, 2.0, 2.6, 3.0, 4.0, 5.0, 6.0
    ]
)
nHdata = np.array(
    [
        403.0, 316.0, 235.0, 160.0, 70.8, 37.6,
        14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029
    ]
) * 1.0e6
def fit_func( x, a, c, p ):
    out = a * ( x - c )**p
    return out
### fitting the non-linear parameters as part of an integro-equation
### this is the standard matrix formulation of a linear fit
Sy = cumtrapz( nHdata, x=xHdata, initial=0 ) ## int( y )
VMXT = np.array( [ xHdata * nHdata , nHdata, np.ones( len( nHdata ) ) ] ) ## ( x y, y, d )
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, Sy )
sol = np.linalg.solve( A , SV )
print ( sol )
pF = 1 / sol[0] - 1
print( pF )
cF = -sol[1] * ( pF + 1 )
print( cF )
### making a linear fit on the scale
### the short version of the matrix form if only one factor is calculated
fk = fit_func( xHdata, 1, cF, pF )
aF = np.dot( nHdata, fk ) / np.dot( fk, fk )
print( aF )
#### using these guesses as input for a final non-linear fit
sol, cov = curve_fit(fit_func, xHdata, nHdata, p0=( aF, cF, pF ) )
print( sol )
print( cov )
### plotting
xth = np.linspace( 1, 6, 125 )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( xHdata, nHdata )
ax.plot( xth, fit_func( xth, aF, cF, pF ), ls=':' )
ax.plot( xth, fit_func( xth, *sol ) )
plt.show()
Providing:
[-3.82334284e-01 2.51613126e-01 5.41522867e+07]
-3.6155122388787175
0.6580972107001803
8504146.59883185
[ 5.32486242e+07 2.44780953e-01 -7.24897172e+00]
[[ 1.03198712e+16 -2.71798924e+07 -2.37545914e+08]
[-2.71798924e+07 7.16072922e-02 6.26461373e-01]
[-2.37545914e+08 6.26461373e-01 5.49910325e+00]]
(note the high correlation of a with c and p)
I know of two things that might help you:
1. Provide the p0 input parameter to curve_fit with a set of appropriate starting parameters for the function. That can keep the algorithm from running wild.
2. Change the function you are fitting so that it returns np.log(n), and then fit to np.log(nH_data). As it is now, there is a far larger penalty for not fitting the first data points than the last ones, because their values are about 10^2 larger. The first data points therefore become "more important" to the algorithm. Taking the logarithm puts all points on the same scale, so they are weighted equally (a sketch of an alternative weighting follows below).
Go ahead and play around with it. I managed a pretty fine fit with these parameters:
[-7.21450545e-01 -3.36131028e+00 5.97293632e+06]
I think you're nearly there, just need to fit on a log scale and throw in a decent guess. To make the guess you just need to throw in a plot like
plt.figure()
plt.plot(np.log(xH_data), np.log(nH_data))
and you'll see it's nearly linear. So your B will be the exponentiated intercept (i.e. exp(20-ish)) and a is the approximate slope (-5-ish). A is a weird one; does it have some physical meaning, or did you just throw it in there? If there's no physical meaning, I'd say get rid of it.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6
def eTemp(x, A, a, B):
    logn = np.log(B*(x + A)**a)
    return logn
parameters, covariance = curve_fit(eTemp, xH_data, np.log(nH_data), p0=[np.exp(0.1), -5, np.exp(20)], maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]
print(fit_A)
print(fit_a)
print(fit_B)
r = np.logspace(0, 0.7, 1000)
ne = np.exp(eTemp(r, fit_A, fit_a, fit_B))
plt.plot(xH_data, nH_data)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
There is a problem with your fit equation. If A is less than -1 and your a parameter is negative, you get an imaginary value for your function within your fit range. For this reason you need to add constraints, and an initial set of parameters, to your curve_fit call, for example:
parameters, covariance = curve_fit(eTemp, xH_data, nH_data, method='dogbox', p0 = [100, -3.3, 10E8], bounds=((-0.9, -10, 0), (200, -1, 10e9)), maxfev=200000)
You need to change the method to 'dogbox' in order to perform this fit with the constraints.

Minimization of an equation using Python

I have four vectors.
x = [0.4, -0.3, 0.9]
y1 = [0.3, 1, 0]
y2 = [1, -0.9, 0.5]
y3 =[0.6, 0.01, 0.8]
I need to minimize the following expression:
L = || x - (a*y1 + b*y2 + g*y3) ||
where 0 <= a, b, g <= 1. I have tried to use scipy.minimize, but I could not understand how it can be used for this expression. Is there any optimization library I can use, or an easier way to do this in Python?
My ultimate goal is to find the values of a, b, g between 0 and 1 that give the minimum value, given these four vectors as input.
Edit 0: I fixed the problem by using a Bounds instance. The array x should be what you are looking for. Here is the answer.
fun: 0.34189582276366093
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
jac: array([ 6.91014296e-01, 3.49720253e-07, -2.88657986e-07])
message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 40
nit: 8
status: 0
success: True
x: array([0. , 0.15928136, 0.79907217])
I worked on this a little bit. I got stuck with an error, but I feel I am on the right track. Here is the code.
import numpy as np
from scipy.optimize import Bounds,minimize
def cost_function(ini):
    x = np.array([0.4, -0.3, 0.9])
    y1 = np.array([0.3, 1, 0])
    y2 = np.array([1, -0.9, 0.5])
    y3 = np.array([0.6, 0.01, 0.8])
    L = np.linalg.norm(np.transpose(x) - np.dot(ini[0], y1)
                       - np.dot(ini[1], y2) - np.dot(ini[2], y3))
    return L
ini = np.random.rand(3)
min_b= np.zeros(3)
max_b= np.ones(3)
bnds=Bounds(min_b,max_b)
print(minimize(cost_function,x0 =ini,bounds=bnds))
However, I am getting the error ValueError: length of x0 != length of bounds, although the lengths are equal. I could not find a solution; maybe you will. Good luck! Let me know if you find a solution and whether it works!

fitting data with numpy

I have the following data:
>>> x
array([ 3.08, 3.1 , 3.12, 3.14, 3.16, 3.18, 3.2 , 3.22, 3.24,
3.26, 3.28, 3.3 , 3.32, 3.34, 3.36, 3.38, 3.4 , 3.42,
3.44, 3.46, 3.48, 3.5 , 3.52, 3.54, 3.56, 3.58, 3.6 ,
3.62, 3.64, 3.66, 3.68])
>>> y
array([ 0.000857, 0.001182, 0.001619, 0.002113, 0.002702, 0.003351,
0.004062, 0.004754, 0.00546 , 0.006183, 0.006816, 0.007362,
0.007844, 0.008207, 0.008474, 0.008541, 0.008539, 0.008445,
0.008251, 0.007974, 0.007608, 0.007193, 0.006752, 0.006269,
0.005799, 0.005302, 0.004822, 0.004339, 0.00391 , 0.003481,
0.003095])
Now, I want to fit these data with, say, a degree-4 polynomial. So I do:
>>> coefs = np.polynomial.polynomial.polyfit(x, y, 4)
>>> ffit = np.poly1d(coefs)
Now I create a new grid for x values to evaluate the fitting function ffit:
>>> x_new = np.linspace(x[0], x[-1], num=len(x)*10)
When I do all the plotting (data set and fitting curve) with the command:
>>> fig1 = plt.figure()
>>> ax1 = fig1.add_subplot(111)
>>> ax1.scatter(x, y, facecolors='None')
>>> ax1.plot(x_new, ffit(x_new))
>>> plt.show()
I get the following:
fitting_data.png
What I expect is the fitting function to fit correctly (at least near the maximum value of the data). What am I doing wrong?
Unfortunately, np.polynomial.polynomial.polyfit returns the coefficients in the opposite order from np.polyfit and np.polyval (or np.poly1d, as you used). To illustrate:
In [40]: np.polynomial.polynomial.polyfit(x, y, 4)
Out[40]:
array([ 84.29340848, -100.53595376, 44.83281408, -8.85931101,
0.65459882])
In [41]: np.polyfit(x, y, 4)
Out[41]:
array([ 0.65459882, -8.859311 , 44.83281407, -100.53595375,
84.29340846])
In general: np.polynomial.polynomial.polyfit returns coefficients [A, B, C] to A + Bx + Cx^2 + ..., while np.polyfit returns: ... + Ax^2 + Bx + C.
So if you want to use this combination of functions, you must reverse the order of coefficients, as in:
ffit = np.polyval(coefs[::-1], x_new)
However, the documentation states clearly to avoid np.polyfit, np.polyval, and np.poly1d, and instead to use only the new(er) package.
You're safest to use only the polynomial package:
import numpy.polynomial.polynomial as poly
coefs = poly.polyfit(x, y, 4)
ffit = poly.polyval(x_new, coefs)
plt.plot(x_new, ffit)
Or, to create the polynomial function:
ffit = poly.Polynomial(coefs) # instead of np.poly1d
plt.plot(x_new, ffit(x_new))
Note that you can use the Polynomial class directly to do the fitting and return a Polynomial instance.
from numpy.polynomial import Polynomial
p = Polynomial.fit(x, y, 4)
plt.plot(*p.linspace())
p uses scaled and shifted x values for numerical stability. If you need the usual form of the coefficients, you will need to follow with
pnormal = p.convert(domain=(-1, 1))

python - curve_fit - A non-int/float error

import numpy as np
from scipy.optimize import curve_fit
x1 = [0.25, 0.33, 0.40, 0.50, 0.60, 0.75, 1.00]
y1 = [1.02, 1.39, 1.67, 1.89, 2.08, 2.44, 2.50]
def mmfunc(x1, d, e):
    return d*x1/(e + x1)
y2 = mmfunc(x1,6.0,1.0)
popt, pcov = curve_fit(mmfunc, x1, y1)
I get this error:
TypeError: can't multiply sequence by non-int of type 'float'
(x1 is supposed to be an array of floats, and d, e are floats. I tried reading the values from a file and printed them; they are floats. I also tried a simpler function; nothing seems to work!)
The problem is that you're not converting your lists to numpy arrays; plain Python lists don't support elementwise arithmetic with floats. This seems to work for me:
import numpy as np
x1 = np.array([0.25, 0.33, 0.40, 0.50, 0.60, 0.75, 1.00], dtype="float")
y1 = np.array([1.02, 1.39, 1.67, 1.89, 2.08, 2.44, 2.50], dtype="float")
def mmfunc(x1, d, e):
    return d*x1/(e + x1)
y2 = mmfunc(x1,6.0,1.0)
(Note: I didn't have scipy installed so I wasn't able to check that the curve_fit function works, but the conversion to np.array fixed the exception related to arithmetic on lists.)
