I'm trying to write an inverse lognormal function in python:
import numpy as np
import scipy.stats as sp
from scipy.optimize import curve_fit
def lognorm1(x, s, scale):
    ANS = sp.lognorm(s, scale=scale).ppf(x)
    return ANS

curve_fit(lognorm1, x, y)
I have no trouble fitting the curve; however, the scale parameter is the exponential of the corresponding parameter in Excel's LOGNORM.INV function. I know I can just take the log of the scale parameter at the end, but is there any way to rewrite the function so that I don't have to do this every time?
Indeed, the SciPy documentation says:
A common parametrization for a lognormal random variable Y is in terms of the mean, mu, and standard deviation, sigma, of the unique normally distributed random variable X such that exp(X) = Y. This parametrization corresponds to setting s = sigma and scale = exp(mu).
So let's set it as such:
def lognorm1(x, mu, sigma):
    ANS = sp.lognorm(s=sigma, scale=np.exp(mu)).ppf(x)
    return ANS

curve_fit(lognorm1, x, y)
Now the parameters returned by curve_fit have the meaning of the mean and standard deviation of the underlying normal distribution.
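As a quick check, here is a minimal, self-contained sketch (my addition; the synthetic data and the parameter values 1.5 and 0.4 are made up for illustration) showing that curve_fit now recovers mu and sigma of the underlying normal distribution directly:
import numpy as np
import scipy.stats as sp
from scipy.optimize import curve_fit

def lognorm1(x, mu, sigma):
    return sp.lognorm(s=sigma, scale=np.exp(mu)).ppf(x)

# synthetic data: quantiles of a lognormal with mu=1.5, sigma=0.4
x = np.linspace(0.01, 0.99, 50)                   # probabilities
y = sp.lognorm(s=0.4, scale=np.exp(1.5)).ppf(x)   # observed quantiles

popt, pcov = curve_fit(lognorm1, x, y, p0=[1.0, 1.0])
print(popt)  # should be close to [1.5, 0.4]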
I have just been trying to match the scipy outputs of the lognormal distribution to the formulas on wikipedia.
And I am stuck on the partial expectation with a lower bound.
If I use this simple lognormal distribution:
import numpy as np
import scipy.stats as scist

k = .25
sigma = .5
mu = .1  # from the logged variable
lnorm = scist.lognorm(s=sigma, scale=np.exp(mu))
where k is the lower bound,
the partial expectation, as I understand it, is given by:
g(k) = exp(mu + sigma^2/2) * Phi( (mu + sigma^2 - ln(k)) / sigma )
where Phi is the standard normal CDF. Fine. So we are simply talking about the mean of the lognormal distribution and a CDF with a z-score. scipy provides the partial:
lnorm.expect(lambda x:x, lb=k)
>>> 1.25199...
Indeed, we can confirm this is the partial expectation by checking it against the conditional expectation: dividing the partial by the tail probability, or computing the conditional expectation directly, yields the same result:
lnorm.expect(lambda x:x, lb=k) / (1 - lnorm.cdf(k))
>>> 1.25385...
lnorm.expect(lambda x:x, lb=k, conditional=True)
>>> 1.25385...
However, scipy's cdf function takes the x variable, not the z-score, and I am uncertain how to transform the argument of Phi above into an x value. I have tried many different flavors. I would have thought that accounting for the subtraction of mu that must occur when scipy's cdf (presumably) computes the z-score internally would do the trick.
Any formulation I use ends up with a very small or 0 value.
Any help would be greatly appreciated.
IIUC, you can simply evaluate the CDF of a standard normal distribution N(0,1) at (mu + sigma^2 - ln(k)) / sigma and multiply it by the mean of the lognormal distribution, exp(mu + sigma^2/2), i.e.
import numpy as np
import scipy.stats as sps

def partial_expectation(mu, sigma, k):
    """
    Returns the partial expectation given the mean mu and
    standard deviation sigma of the underlying normal
    distribution, and the lower bound k.
    https://en.wikipedia.org/wiki/Log-normal_distribution
    """
    # cumulative distribution function of the standard
    # normal N(0,1) evaluated at x = x_phi
    x_phi = (mu + sigma**2 - np.log(k)) / sigma
    phi = sps.norm.cdf(x_phi, loc=0, scale=1)
    # mean of the lognormal distribution
    lognorm_mu = np.exp(mu + .5*(sigma**2))
    # result
    return lognorm_mu * phi
k = .25
sigma = .5
mu = .1 # from the logged variable
lnorm = sps.lognorm(s=sigma, scale=np.exp(mu))
print('from def:', partial_expectation(mu, sigma, k))
print('from sps:', lnorm.expect(lb=k))
from def: 1.251999952174895
from sps: 1.2519999521748952
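As a quick follow-up (my addition, not part of the original answer), dividing the partial expectation by the tail probability reproduces the conditional expectation from the question:
print('conditional:', partial_expectation(mu, sigma, k) / (1 - lnorm.cdf(k)))
# ~1.25385, matching lnorm.expect(lb=k, conditional=True) from the question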
Here is a probability density function of a lognormal distribution:
import numpy as np
from scipy.stats import lognorm

def f(x):
    return lognorm.pdf(x, s=0.2, loc=0, scale=np.exp(10))
This function has very small y values (the maximum is about 1e-5) and is spread over x values on the order of 1e5. We know that the integral of a PDF should be 1, but when I use the following code to compute the integral directly, the answer is around 1e-66, since the computational accuracy is not enough.
from scipy.integrate import quad

ans, err = quad(f, -np.inf, np.inf)
Could you kindly help me to correctly calculate an integral like this? Thank you.
The values that you are using correspond to the underlying normal distribution having mean mu = 10 and standard deviation sigma = 0.2. With those values, the mode of the distribution (i.e. the location of the maximum of the PDF) is at exp(mu - sigma**2) = 21162.795717500194. The function quad works pretty well, but it can be fooled. In this case, apparently quad only samples the function where the values are extremely small--it never "sees" the higher values way out around 20000.
You can fix this by computing the integral over two intervals, say [0, mode] and [mode, np.inf]. (There is no need to compute the integral over the negative axis, since the PDF is 0 there.)
For example, this script prints 1.0000000000000004
import numpy as np
from scipy.stats import lognorm
from scipy.integrate import quad
def f(x, mu=0, sigma=1):
    return lognorm.pdf(x, s=sigma, loc=0, scale=np.exp(mu))
mu = 10
sigma = 0.2
mode = np.exp(mu - sigma**2)
ans1, err1 = quad(f, 0, mode, args=(mu, sigma))
ans2, err2 = quad(f, mode, np.inf, args=(mu, sigma))
integral = ans1 + ans2
print(integral)
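As a side note (my addition, not part of the original answer), quad also accepts a points argument that marks locations where the integrand is difficult; it only works with finite limits, so one could instead integrate over a generously large finite interval:
upper = 20 * mode  # arbitrary but generous upper limit; the tail beyond it is negligible
ans, err = quad(f, 0, upper, points=[mode], args=(mu, sigma))
print(ans)  # also very close to 1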
I am trying to fit a Gaussian function to my Python plot. I have attached the code here. Any corrections would be appreciated!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import math
import random
from numpy import genfromtxt
data = genfromtxt('PVC_Cs137.txt')
plt.xlim(0,2500)
plt.ylim(0,30000)
plt.xlabel("Channel number")
plt.ylabel("Counts")
x = data[:,0]
y = data[:,1]
n = len(x)
mean = sum(x*y)/n
sigma = sum(y*(x-mean)**2)/n
def gaus(x,a,x0,sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.plot(x,gaus(x,*popt))
plt.show()
And here is the link to my file:
https://www.dropbox.com/s/hrqjr2jgfsjs55x/PVC_Cs137.txt?dl=0
There are two problems with your approach. One is related to programming: the Gaussian fit function has to work with a NumPy array, and math functions can't provide that functionality; they work with scalars. Therefore your fit function should look like this:
def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
With the right mean/sigma combination, this produces a Gaussian curve. Looking at the distribution of the values from your file, however, it doesn't even vaguely resemble a Gaussian curve, so it is no wonder that the fit doesn't converge.
Actually there is a third problem: your calculation of mean/sigma is wrong. But since you can't fit your data to a Gaussian distribution anyway, we can neglect this problem for now.
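For reference, here is a minimal sketch (my addition, not part of the original answer) of the counts-weighted mean/sigma estimates that are usually used with (channel, counts) data; the amplitude guess y.max() is just a reasonable starting point:
# treat y (counts) as weights for the channel values x
mean = np.sum(x * y) / np.sum(y)                           # weighted mean channel
sigma = np.sqrt(np.sum(y * (x - mean) ** 2) / np.sum(y))   # weighted standard deviation

popt, pcov = curve_fit(gauss, x, y, p0=[y.max(), mean, sigma])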
I have a problem performing a double definite integral of a function which depends on two variables (q, r) and has an extra integral inside it.
The function I want to weight with a Gaussian is:
F(q,r) = f(q,r) + int_{0,r}( h(q,r') dr' )
and it must be integrated again to be weighted with the Gaussian:
I(q) = int_{0,inf}( F(q,r)^2 * g(r) dr )
The Gaussian g(r) is centered at the coordinate R.
The main problem, as you can see, is that I am mixing arrays with scalars. Using the same method that is used for the Gaussian (np.ogrid and a sum over the axis) could be a solution, but I don't know how to implement it.
import numpy as np
from scipy.integrate import quad
import math as m

R=53.
R0=40.
delta=50.
c=2.

qm, rm = np.ogrid[0.0005:2.0:0.0005, 20:100:500j]

#normalized gauss function
#g(r)
def gauss_grid(r,Rmin,pd):
    def gauss(r,Rmin,pd):
        sigma=1.5
        return (1/sigma)*np.exp(-((r-Rmin)**2)/(2*sigma**2))
    gauss_grid = gauss(r,Rmin,pd)
    #normalization of gaussian
    gauss_grid /= np.sum(gauss_grid)
    return gauss_grid

#spherical function
#f(q,r)
def form(q,R):
    return (4/3)*m.pi*3*(np.sin(q*R)-q*R*np.cos(q*R))/(q**3)

#FINAL function
#I(q)
def helfand():
    def F(q,R):
        #integral (0,R) of h(q,r)
        def integral(q,Rmax):
            #h(q,r)
            def integrand(r,q):
                return np.sin(q*r)*(r**2)/(q*r*(1+np.exp(c*(R0-r))))
            return quad(integrand, 0, Rmax, args=(q))[0]
        return (form(q,R)+delta*integral(q,R))**2
    FF_hel=F(qm,rm)
    FF_hel *= gauss_grid(rm,R,pd)
    I=FF_hel.sum(axis=1)
    return I,qm.ravel()

helfand()
UPDATE
I tried with the scipy.integrate library (using quad) and I cannot get it to work. It is as if the right argument (q) is not passed to the next function. Here is a very simplified version of what I'm trying:
import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt

R=53.
R0=41.
pd=15.
sigma=1.5

def I(q):
    #(function with integral inside) squared
    def FF(q,r):
        def integral_f(q,r):
            def f(r1,q):
                return np.sin(q*r1)
            return quad(f,0,r,args=(q))[0]
        def h(q,r):
            return (r*np.cos(q*r))
        return (h(q,r)+integral_f(q,r))**2
    #gaussian function normalized
    def g(r,R0):
        def gauss(r,R0):
            return (1/sigma)*np.exp(-((r-R0)**2)/(2*sigma**2))
        return gauss(r,R0)/(quad(gauss,0,np.inf,args=(R0))[0])
    #main function to be integrated with gaussian
    def function(r,q):
        return FF(q,r)*g(r,R)
    return quad(function,0,np.inf,args=(q))[0]

q=np.arange(0.001,1.,0.001)
plt.plot(q,I(q))
The error says:
Supplied function does not return a valid float.
I'd create a simple 2D rectangular mesh of points that spans the limits of integration. Then I'd use Gaussian quadrature over each element to evaluate the integral. That means calling the function, weighted or not, at each integration point, multiplying by the quadrature weight, and summing.
It's similar to 2D quadrilateral finite elements and evaluating the stiffness matrix by numerical integration.
There are 2D quadrature methods in SciPy. I'd use those before writing my own.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quadrature.html
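As a rough illustration of that idea (my sketch, not part of the original answer; the integrand and the rectangular limits are placeholders), one can build a tensor-product Gauss-Legendre rule with numpy.polynomial.legendre.leggauss and sum function values times weights:
import numpy as np

def gauss_legendre_2d(func, qa, qb, ra, rb, n=20):
    # nodes and weights for Gauss-Legendre on [-1, 1]
    nodes, weights = np.polynomial.legendre.leggauss(n)
    # map nodes and weights to [qa, qb] and [ra, rb]
    q = 0.5 * (qb - qa) * nodes + 0.5 * (qb + qa)
    r = 0.5 * (rb - ra) * nodes + 0.5 * (rb + ra)
    wq = 0.5 * (qb - qa) * weights
    wr = 0.5 * (rb - ra) * weights
    # evaluate func on the full tensor grid and sum with the weights
    Q, R_ = np.meshgrid(q, r, indexing='ij')
    return np.sum(wq[:, None] * wr[None, :] * func(Q, R_))

# placeholder integrand just to show the mechanics
print(gauss_legendre_2d(lambda q, r: np.exp(-q*q - r*r), 0, 3, 0, 3, n=40))
# should be close to pi/4 ~ 0.7854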
I think you can compute this with two single integrals.
If you write out the double integral you get two parts:
int_{0,inf}( f(q,r) * g(r) dr ) + int_{0,inf}( int_{0,r}( h(q,r') dr' ) * g(r) dr )
We can exchange the order of integration in the second to get
int_{0,inf}( int_{r',inf}( g(r) dr ) * h(q,r') dr' )
The inner integral can be expressed in terms of the complementary error function.
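For instance (my sketch, not from the original answer; it assumes g is the unit-area Gaussian with standard deviation sigma centered at R0, as in the question's update, and h is a placeholder), the Gaussian tail integral is 0.5*erfc((r' - R0)/(sigma*sqrt(2))), so the second term becomes a single quad call:
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

R0 = 41.
sigma = 1.5

def g_tail(rp):
    # int_{r',inf} g(r) dr for a unit-area Gaussian centered at R0
    return 0.5 * erfc((rp - R0) / (sigma * np.sqrt(2.0)))

def h(q, r):
    # placeholder h(q, r') just to show the mechanics
    return np.sin(q * r)

upper = R0 + 10 * sigma  # the Gaussian tail is negligible beyond this

def second_term(q):
    return quad(lambda rp: g_tail(rp) * h(q, rp), 0, upper)[0]

print(second_term(0.1))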
I am having trouble plotting a Cumulative Distribution Function.
So far I have found this:
scipy.stats.beta.cdf(0.2,6,7)
But that only gives me a point.
This will be what I use to plot:
pylab.plot()
pylab.show()
What I want it to look like is the CDF of a binomial distribution (as in Wikipedia's "Binomial distribution cdf.svg"), with p = .2 and the bounds stopping once y = 1 or close to 1.
The first argument to cdf can be an array of values, rather than a single value. It will then return an array of values.
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,20,100)
cdf = stats.binom.cdf
plt.plot(x,cdf(x, 50, 0.2))
plt.show()
I don't think the user above, ubuntu, has suggested the right function to use; his answer is quite misleading and largely incorrect.
Note that binom.cdf() calculates the CDF of a binomial distribution specified by n and p, Binomial(n, p). That is to say, it returns values of that distribution's CDF for each value in x, rather than the actual CDF of the discrete distribution defined by the data vector x.
To calculate the CDF of a distribution defined by a data vector x, just use the histogram() function:
import numpy as np

hist, bin_edges = np.histogram(np.random.randint(0, 10, 100), density=True)
# multiply by the bin widths so the cumulative sum ends at 1
cdf = np.cumsum(hist * np.diff(bin_edges))
or, just use the hist() plotting function from matplotlib.
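For completeness, a minimal sketch of that matplotlib route (my addition), using hist()'s cumulative and density options to draw an empirical CDF directly:
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randint(0, 10, 100)
# density=True normalizes the histogram, cumulative=True accumulates it into a CDF
plt.hist(data, bins=10, density=True, cumulative=True, histtype='step')
plt.show()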