Overflow in exp when using curve_fit on datetime data - Python

I tried to fit datetime vs. float data using curve_fit. As far as I understand, curve_fit does not work with datetime, so I first have to convert the data to numerical values. This gives me very large values for x that cause an overflow in the exp function. My code is below. The same code does work if I fit with a polynomial instead of the exponential.
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def func(x, a):
    return np.exp(a * x)

def fit_exponential(gdtemp):
    gdtemp['Date'] = pd.to_datetime(gdtemp.Date)
    mask = (gdtemp['Date'] > '2020-01-30') & (gdtemp['Date'] <= '2020-03-20')
    gdtemp = gdtemp.loc[mask].copy()
    x = pd.to_numeric(gdtemp.Date)  # nanoseconds since the epoch -> very large values
    y = gdtemp['Confirmed']
    popt, pcov = curve_fit(func, x, y)
How can I modify the code to work with the exponential?
I have two ideas for fixing this but am not sure how to implement them:
1st idea: Don't convert with to_numeric, but in some other way that produces smaller numbers. My input data is fairly simple and consists of exactly one row per day, so I don't need the time part or anything else. Is there another function similar to to_numeric() that ignores the time part and produces smaller numbers? (A sketch of this is below.)
2nd idea: Divide the numeric date values by some large number and later multiply back. What number should I use for dividing?
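For the first idea, a possible sketch (assuming the gdtemp DataFrame from the question, after the date filtering) is to convert each date to a day count relative to the first date, which keeps x small:
day_x = (gdtemp['Date'] - gdtemp['Date'].min()).dt.days  # 0, 1, 2, ... one value per day
popt, pcov = curve_fit(func, day_x.to_numpy(dtype=float), y)
With one row per day this gives x values of at most a few dozen, so np.exp(a*x) stays well within float range during the fit.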

I solved this by mapping the large numerical x values to the interval [0, 1] and fitting on that interval.
The essential modifications are:
small_x = (x - x.min()) / (x.max() - x.min())
popt, pcov = curve_fit(func, small_x, y)
The values in the exponent are now reasonable (on the order of 1 in my case) and there is no problem with overflows.
Without this mapping I would end up with very large x values (on the order of 10^15) and very tiny values for a (on the order of 10^-15), which the fitting function obviously did not like.
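Since the fit is done on the rescaled axis, the fitted exponent applies per unit of small_x. If you want the rate per unit of the original numeric axis (the "multiply back" step from the second idea), divide by the scaling span; a minimal sketch, reusing x, y, and func from the question:
span = x.max() - x.min()              # the "large number" used for dividing
small_x = (x - x.min()) / span
popt, pcov = curve_fit(func, small_x, y)
a_original = popt[0] / span           # exponent per unit of the original x values
This works because exp(a*small_x) = exp((a/span)*(x - x.min())).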

Program to fit a hyperbola to linear data using least squares (Levenberg-Marquardt algorithm) not working as expected

I have a 1D array of data which I am trying to model as a hyperbola using three parameters. I am trying to implement the Levenberg-Marquardt algorithm using the leastsq function from the scipy.optimize library. However, my program gets stuck at an iteration where a number is divided by zero, and I don't understand why.
Some background: the 1D array data are basically lacunarity values for different box sizes. I've generated the lacunarity data from some sound files, the context for which can be found here.
In the algorithm, the least squares function takes three inputs:
(a) initial guess for the three parameters
(b) the x coordinate for the least squares problem - that's basically a 1D array of integers from 1 to 100 in my problem
(c) the y coordinate for the least squares problem - this is the 1D array that stores the lacunarity values. So Lacunarity values are a function of x, where x varies from 1 to 100.
The hyperbola is modeled using three parameters a, b, and c as y = a / x^b + c.
The code gives the following error:
"OverflowError: cannot convert float infinity to integer"
The code:
#imports
from scipy import *
from scipy.optimize import leastsq
import matplotlib.pylab as plt
import numpy as np
import codecs, json
from math import *

# Define your function to calculate the residuals.
# The fitting function holds your parameter values.
def residuals(p, y, x):
    err = y - pval(x, p)
    return err

def pval(x, p):
    z = x
    for i in range(100):
        print(x)
        print(x[i]**p[1])
        z[i] = p[0]/(x[i]**p[1]) + p[2]
    return z

# read in your data
obj_text = codecs.open('textfiles\CC1.json', 'r', encoding='utf-8').read()
b_new = json.loads(obj_text)
data = np.array(b_new)
x = np.arange(1, 101)
y = data[1:101]

# guess at initial parameters
A1_0 = 1.0
A2_0 = 1.0
A3_0 = 0.5

# the leastsq function calls the Levenberg-Marquardt algorithm
pname = (['A1', 'A2', 'A3'])
p0 = array([A1_0, A2_0, A3_0])
plsq = leastsq(residuals, p0, args=(y, x), maxfev=2000)

# Now, plot your data
plt.plot(x, y, 'xo', x, pval(x, plsq[0]), 'x')
title('Least-squares fit to data')
xlabel('x')
ylabel('y')
legend(['Data', 'Fit'], loc=4)

# Your best-fit parameters are kept within plsq[0].
print(plsq[0])
According to the error, the value of x changes to 0 at some point in the iteration, and the first parameter a ends up being divided by zero, which raises the error.
To troubleshoot, I printed the values x[i]^b and the array x while executing the code, and you can see the values here. I see that the array x is being modified, which shouldn't happen: x should remain a 1D array of the natural numbers 1 to 100 and not be modified during the iteration. I couldn't identify where exactly the code modifies the array x.
I expect the array x to remain unchanged and the code to print the final three values of the parameters a, b, and c.
EDIT: I made some changes to my code, after which it worked successfully. Following are those edits, in case anyone is interested:
Did not define z as z = x, but instead defined it as z = np.arange(1,101). Since z = x makes z an alias of the very same array (so writing to z[i] also writes to x[i]), this change stopped the array x from being modified, which is what was expected.
Changed the datatype of arrays x and y to float using
x = np.array(x, dtype=np.float64)
I got stuck once more, at the piece of code which plots the data: I got the error "'title' is not defined", and similar errors for xlabel and ylabel. So I just removed those lines and stuck with
plt.plot(x,y,'red',x,pval(x,plsq[0]),'blue')
plt.show()
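For reference, a sketch of pval with those edits folded in (the loop could equally be replaced by the vectorized one-liner p[0] / x**p[1] + p[2]):
def pval(x, p):
    # z is a fresh float array rather than an alias of x, so
    # assigning to z[i] no longer modifies the caller's x
    z = np.zeros_like(x, dtype=np.float64)
    for i in range(len(x)):
        z[i] = p[0] / (x[i] ** p[1]) + p[2]
    return z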
Not a direct answer to your question, but since you're using exponentiation (**), I strongly recommend that you convert all your numbers to Decimal beforehand, in order to avoid the precision-loss inherent in floating-point arithmetic on large values.
For example:
import decimal
from decimal import Decimal

decimal.getcontext().prec = 100

A1_0 = Decimal("1.0")
A2_0 = Decimal("1.0")
A3_0 = Decimal("0.5")
x = [Decimal(f) for f in x]
y = [Decimal(f) for f in y]
Perhaps your zero will "turn up" to be a small value close to zero...

Using extremely small floats in NumPy

I'm using Python 3 and trying to plot the half-life time of a process. The formula for this half-life time is -ln(2)/ln(1-f). In this formula, f is an extremely small number, on the order of 10^-17 most of the time, and often even smaller.
Because I have to plot a range of values of f, I have to repeat the calculation -ln(2)/(ln(1-f)) multiple times. I do this via the expression
np.log(2)/(-1*np.log(1-f))
When I plot the half-life time for many values of f, I find that for really small values of f, Python starts rounding 1-f to the same number, even though I input different values of f.
Is there any way I could increase the float precision so that Python can distinguish between the outputs of 1-f for small changes in f?
The result you want can be achieved using numpy.log1p. It computes log(1 + x) with a better numerical precision than numpy.log(1 + x), or, as the docs say:
For real-valued input, log1p is accurate also for x so small that
1 + x == 1 in floating-point accuracy.
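A quick check of the difference (the printed values are what standard double precision produces, because 1 - 1e-17 rounds to exactly 1.0):
import numpy as np
f = 1e-17
print(np.log(1 - f))   # 0.0    -- f was already lost in the subtraction
print(np.log1p(-f))    # -1e-17 -- accurate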
With this your code becomes:
import numpy as np
min_f, max_f = -32, -15
f = np.logspace(min_f, max_f, max_f - min_f + 1)
y = np.log(2)/(-1*np.log1p(-f))
This can now be evaluated consistently over the whole range and plotted:
import matplotlib.pyplot as plt
plt.loglog(f, y)
plt.show()
This approach will only stop working if your values of f leave the range of floats, i.e. drop below roughly 1e-308 (the smallest normal double-precision value). That should be sufficient for any physical measurement, especially considering that there is such a thing as a smallest physical time scale, the Planck time t_P = 5.39116(13)e-44 s.

Exponential fit with least squares in Python

I have a very specific task, where I need to find the slope of my exponential function.
I have two arrays, one denoting the wavelength range between 400 and 750 nm, the other the absorption spectrum. x = wavelengths, y = absorption.
My fit function should look something like this:
y_mod = float(a_440) * np.exp(-S*(x - 440.))
where S is the slope, which in the image equals 0.016 and should be within the range of S values I expect (+/- 0.003). a_440 is the reference absorption at 440 nm and x is the wavelength.
Modelled vs. original plot:
I would like to know how to define my function in order to get an exponential fit of it (not on log-transformed quantities) without guessing the S value beforehand.
What I've tried so far is to define the function this way:
def func(x, a, b):
    return a * np.exp(-b * (x - 440))
And it gives pretty nice matches (see the fitted vs. original plot).
What I'm not sure about is whether this approach is correct, or whether I should do it differently.
How would one also use the least-squares or absolute-differences-in-y approaches for the minimization, in order to remove the effect of outliers?
Is it possible to also add random noise to the data and recompute the fit?
Your situation is the same as the one described in the documentation for scipy's curve_fit.
The problem you're running into is that your definition of the function accepts only one argument, when it should receive three: x (the independent variable where the function is evaluated), plus a_440 and S.
Cleaning it up a bit, the function should look more like this:
def func(x, A, S):
    return A * np.exp(-S * (x - 440.))
You might run into a warning about the covariance matrix. You can solve that by providing a decent starting point to curve_fit through the argument p0, passing a list. In this case, for example, p0=[1, 0.01]; the fitting call would then look like the following:
curve_fit(func, x, y, p0=[1,0.01])
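Putting it together, a minimal self-contained sketch; the amplitude, noise level, and wavelength grid below are made up for illustration, while S = 0.016 is the value quoted in the question. It also demonstrates the noise idea from the question: generate data from known parameters, add random noise, and check that the fit recovers the parameters.
import numpy as np
from scipy.optimize import curve_fit

def func(x, A, S):
    return A * np.exp(-S * (x - 440.))

# synthetic absorption spectrum with known parameters plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(400, 750, 200)                        # wavelengths in nm
y = func(x, 1.0, 0.016) + rng.normal(0, 0.01, x.size)

popt, pcov = curve_fit(func, x, y, p0=[1, 0.01])
print(popt)   # recovered [A, S], close to [1.0, 0.016]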

Having trouble plotting a log-log plot in Python

Hey, so I'm trying to plot variables like age against frequency for a rotating body. I am given the period and the period derivative, as well as their associated errors. Since frequency is related to period by:
f = 1/T
where frequency is f and period is T
then,
df = - (1/(T^2)) * dT
where dT and df are the derivatives of the period and frequency,
but when it comes to plotting the log of this, I can't do it in Python, as it doesn't accept negative values for a log-log plot.
I've tried a workaround of using only absolute values, but then I only get half of each error bar when plotting error bars. Is there a way to make Python plot both the negative and positive error bars? The frequency derivative itself is a negative quantity.
Unfortunately, log(x) is undefined for negative x, because log(x) = y <=> 10^y = x.
Is 10^y ever going to be -5?
Unfortunately not: it is impossible to make 10^y <= 0, because as y goes to -infinity, x approaches, but never reaches, 0.
Is it possible to plot log(x), where x is negative?
One simple solution to your problem, however, is to take the absolute value of df. By doing this, negative numbers become positive. The only downside is that after you've transformed the data this way, you will need to undo the transformation: if a number was negative (and turned positive due to abs(df)), then you must multiply it by -1 afterwards.
You may need to define your own absolute value function that records any values it needs to make positive:
changeList = []
def absRecordChanges(value):
    if value < 0:
        value = value * -1
        changeList.append(value)
    return value
There are other ways to solve the problem, but they are all centred around transforming your data to meet the conditions of a log transformation (x > 0), and recording which data you changed so you can change it back afterward (before you plot it).
EDIT:
While fiddling around in Desmos, I was able to plot log(x) where x is any nonzero number. I used a piecewise function to do this: {x<0: -log(abs(x)), log(x)}.
from math import log

def piecewiseLog(x):
    # mirrors the Desmos piecewise {x<0: -log(abs(x)), log(x)}; undefined at x = 0
    if x < 0:
        return -log(abs(x))
    else:
        return log(x)
As I'm not familiar with MATLAB syntax, this link has an alternative solution: http://www.mathworks.com/matlabcentral/answers/31566-display-negative-values-on-logarithmic-graph
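As a quick usage sketch of the transform (the frequency-derivative values below are made up):
import matplotlib.pyplot as plt

dfs = [-3e-15, -5e-16, 2e-16, 4e-15]           # hypothetical frequency derivatives
transformed = [piecewiseLog(v) for v in dfs]   # sign-aware log transform
plt.plot(range(len(dfs)), transformed, 'o')
plt.show()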

Finding the highest R^2 value

I'm new to Python, and my problem is that I have a given set of data:
import numpy as np
x=np.arange(1,5)
y=np.arange(5,9)
My problem is to find a number n (not necessarily an integer) that gives me the highest R^2 value when I plot y^n vs x. I'm thinking of generating n, for example, as:
n=np.linspace(1,9,100)
I don't know how to execute my idea. My other option is to resort to brute force: generate each n and raise y to it. After getting that value (let's say y1), I would plot y1 vs x, which means I would have to generate 100 plots. But I have no clue how to get the R^2 value (for a linear fit) of a given plot.
What I want to do is to have a list (or array) of R^2 values:
R2 = np.array([])  # an array containing the R^2 values calculated from the plots
and find the max value in that array and, from there, find the plot that gave that R^2 value, and thus a particular n. I don't know how to do this.
If you are able to use the pandas library, this problem is very easy to express:
import pandas
import numpy as np
x = pandas.Series(np.arange(1,5))
y = pandas.Series(np.arange(5,9))
exponents = np.linspace(1, 9, 100)
r2s = {n:pandas.ols(x=x, y=y**n).r2 for n in exponents}
max(r2s.items(), key=lambda x: x[1])
#>>> (1.0, 1.0)
Breaking this down:
the pandas.Series object is an indexed column of data. It's like a numpy array, but with extra features. In this case, we only care about it because it is something we can pass to pandas.ols.
pandas.ols is a basic implementation of least-squares regression (note that it has been removed in newer pandas releases; the pure-numpy approach below works regardless). You can do this directly in numpy with numpy.linalg.lstsq, but it won't directly report the R-squared values for you. To do it with pure numpy, you'll need to get the sum of squared residuals from numpy's lstsq and then perform the formulaic calculation of R-squared manually. You could write this as a function for yourself (probably a good exercise).
The stuff inside the {..} is a dict comprehension. It will iterate over the desired exponents, perform the ols function for each, and report the .r2 attribute (where the R-squared statistic is stored) indexed by whatever exponent number was used to get it.
The final step calls max on a sequence of the key-value pairs in r2s; the key argument tells max to compare elements by their second item (the R-squared).
An example function to do it with just np.linalg.lstsq is below (it follows the standard way of calculating R^2 in numpy):
def r2(x, y):
    x_with_intercept = np.vstack([x, np.ones(len(x))]).T
    coeffs, resid = np.linalg.lstsq(x_with_intercept, y, rcond=None)[:2]
    # R^2 = 1 - SS_res / SS_tot, where SS_tot = n * var(y); resid has shape (1,)
    return (1 - resid / (y.size * y.var()))[0]
Then in pure numpy the above approach:
import numpy as np
x = np.arange(1,5)
y = np.arange(5,9)
exponents = np.linspace(1, 9, 100)
r2s = {n:r2(x=x, y=y**n) for n in exponents}
max(r2s.items(), key=lambda x: x[1])
#>>> (1.0, 1.0)
As a final note, there is a fancier way to specify getting the item at position 1 from something: use the built-in operator module and its callable itemgetter:
max(..., key=operator.itemgetter(1))
The expression itemgetter(1) results in a callable object: when it is called on an argument r, it invokes the __getitem__ protocol to produce r[1].
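For instance (a tiny illustration with made-up exponent/R^2 pairs):
from operator import itemgetter

r2s = {1.08: 0.97, 1.0: 1.0}                 # hypothetical exponent -> R^2 mapping
print(max(r2s.items(), key=itemgetter(1)))   # (1.0, 1.0)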
