I am trying to find the probability that a random variable exceeds a specific value, i.e. Pr(x > a), where a is some constant, typically much higher than the average of x, and x does not follow any standard Gaussian distribution. So I wanted to fit some other probability density function and take the integral of the pdf of x from a to inf. As this is a problem of modelling the spikes, I treated it as an Extreme Value analysis problem, and found that the Weibull distribution might be appropriate.
Regarding extreme value distributions, the Weibull distribution has a very "not-easy-to-implement" integral, so I figured I could just get the pdf from SciPy and do a Riemann sum. I also thought I could simply evaluate the kernel density estimate, get its pdf, and approximate the integral with the same Riemann sum.
I found a question here on Stack Overflow which provided a neat method for doing Riemann sums in Python, and I adapted that code to fit my problem. But when I evaluate the integral I get weird numbers, indicating that something is wrong with either the KDE or the Riemann-sum function.
Two scenarios, the first with the Weibull, in accordance with the Scipy documentation:
x = theData
x_grid = np.linspace(0, np.max(x), len(x))
p = ss.weibull_min.fit(x[x != 0], floc=0)           # (shape, loc, scale), with loc fixed at 0
pd = ss.weibull_min.pdf(x_grid, p[0], p[1], p[2])   # pdf evaluated on the grid
which looks like this (plot omitted),
and then also tried the KDE method as follows
pd = ss.gaussian_kde(x).pdf(x_grid)
which I subsequently run through the following function:
def riemannSum(a, b, n):
    dx = (b - a) / n
    s = 0.0
    x = a
    for i in range(n):
        s += pd[x]
        x += dx
    return s * dx
print(riemannSum(950.0, 1612.0, 10000))
print(riemannSum(0.0, 1612.0, 100000))
In the case of the Weibull, it gives me
>> 0.272502150549
>> 18.2860384829
and in the case of the KDE, I get
>> 0.448450460469
>> 18.2796021034
This is obviously wrong. Taking the integral of the entire thing should give me 1, and 18.2+ is quite far off.
Am I wrong in my assumptions about what I can do with these density functions, or have I made some mistake in the Riemann-sum function?
the Weibull distribution has a very "not-easy-to-implement" integral
Huh?!
The Weibull distribution has a perfectly well-defined CDF, so implementing the integral is pretty much a one-liner (ok, make it two for clarity):
from math import exp

def WeibullCDF(x, lmbd, k):
    q = pow(x / lmbd, k)
    return 1.0 - exp(-q)
And, of course, there is ss.weibull_min.cdf(x_grid, p[0], p[1], p[2]) if you want to pick it straight from the library.
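As a side note (my addition, assuming the fit p computed in the question): the tail probability the question is ultimately after, Pr(x > a), can be read directly off the fitted distribution's survival function, without any explicit integration:

import scipy.stats as ss

a = 950.0
tail = ss.weibull_min.sf(a, p[0], p[1], p[2])   # P(x > a) = 1 - CDF(a)
print(tail)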
I know there is an accepted answer that worked for you but I stumbled across this while looking to see how to do a Riemann sum of a probability density and others may too so I will give this a go.
Basically, I think you had (what is now) an older version of numpy that allowed floating-point indexing, and your pd variable pointed to an array of pdf values evaluated at the points in x_grid. Nowadays numpy raises an error when you try to use a floating-point index, but since you didn't get one, you were silently reading the pdf value at whatever grid position that index happened to land on, not at the x value you intended. What you needed to do was evaluate the pdf at the points you actually use in your Riemann sum.
I edited the code from the question to create a method that works for calculating the integral of the pdf.
def riemannSum(a, b, n):
    dx = (b - a) / n
    s = 0.0
    x = 0
    # evaluate the pdf on the integration grid itself, not on the original x_grid
    pd = weibull_min.pdf(np.linspace(a, b, n), p[0], p[1], p[2])
    for i in range(n):
        s += pd[x]
        x += 1
    return s * dx
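As a quick sanity check (my addition, again assuming the fit p from the question), the corrected sum should agree with the difference of the closed-form CDF values:

from scipy.stats import weibull_min

print(riemannSum(950.0, 1612.0, 10000))
print(weibull_min.cdf(1612.0, p[0], p[1], p[2]) - weibull_min.cdf(950.0, p[0], p[1], p[2]))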
The Riemann implementation below can also be used (it uses Java instead of Python, sorry).
import static java.lang.Math.exp;
import static java.lang.Math.pow;

import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.IntStream;

public class WeibullPDF
{
    public interface Riemann extends BiFunction<Function<Double, Double>, Integer,
                                                BinaryOperator<Double>> { }

    public static void main(String args[])
    {
        int N = 100000;

        Riemann s = (f, n) -> (a, b) ->
            IntStream.range(0, n)
                     .mapToDouble(i -> f.apply(a + i * ((b - a) / n)) * ((b - a) / n))
                     .sum();

        double k = 1.5;
        Optional<Double> weibull =
            Optional.of(s.apply(x -> k * pow(x, k - 1) * exp(-pow(x, k)), N).apply(0.0, 1612.0));

        weibull.ifPresent(System.out::println); // prints 0.9993617886716168
    }
}
Related
I want to find the integral of output power Po in the following code:
Vo = 54.6
# defining a function for duty cycle, output current and output power
def duty_cycle(output_voltage, array_voltage):
    duty_cycle = np.divide(output_voltage, array_voltage)
    return duty_cycle

def output_current(array_current, duty_cycle):
    output_current = np.divide(array_current, duty_cycle)
    return output_current

def output_power(output_voltage, output_current):
    output_power = np.multiply(output_voltage, output_current)
    return output_power
#calculating duty cycle, output current and output power
D = duty_cycle(Vo, array_params['arr_v_mp'])
Io = output_current(array_params['arr_i_mp'], D)
Po = output_power(Vo, Io)
#plot ouput power
plt.ylabel('Output Power [W]')
Po.plot(style='r-')
The code above is just part of a script; array_params is a pandas time-series data frame. When the pandas Series Po is plotted, it looks like this (plot omitted):
This is my first time calculating an integral using Python. After reading through the internet, I think Python's scipy module could be of help, but I don't really know how or which method to use. I would appreciate your help in any manner with the above-explained problem.
To compute an integral of the form int y(x) dx from x0 to x1, given an array x_array with values from x0 to x1 and a corresponding y_array of the same length, one can use numpy's trapezoidal integration:
integral = np.trapz(y_array, x_array)
which will work also for non-constant spacing x_array[i+1]-x_array[i].
If an indefinite integral (i.e. an integral F(t) = integral f(t) dt) is needed, use scipy.integrate.cumtrapz (instead of numpy.trapz for definite integrals).
integrated = scipy.integrate.cumtrapz(power, dx=timestep)
or
integrated = scipy.integrate.cumtrapz(power, x=timevalues)
To make integrated the same length as power, specify the initial value of the integral via the optional parameter initial (e.g. initial=0) of scipy.integrate.cumtrapz.
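For the power example above, a minimal sketch (my addition, assuming Po is a pandas Series with a DatetimeIndex, as in the question) is to convert the index to elapsed seconds and integrate to get the energy in joules:

import numpy as np

t_seconds = (Po.index - Po.index[0]).total_seconds()   # elapsed time in seconds
energy_joules = np.trapz(Po.values, t_seconds)          # definite integral of P dt
print(energy_joules)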
I need to create a normalised fitness function for positive values 0→∞. I want to experiment, starting with (input→output) something like 0→0, 1→1, ∞→0. My maths is a bit weak, and I expect this is really not hard if you know how.
So the output of the function should be heavily skewed towards 0 and I need to be able to change the input value which produces the maximum output, 1.
I could make a linear function, something like a triangular distribution, but then I need to set a maximum value at which input would be distinguished (above that value everything looks the same.) I could also merge two simple expressions together with something like this:
from matplotlib import pyplot as plt
import numpy as np
from math import exp
def frankenfunc(x, mu):
    longtail = lambda x, mu: 1 / exp((x - mu))
    shortail = lambda x, mu: pow(x / mu, 2)
    if x < mu:
        return shortail(x, mu)
    else:
        return longtail(x, mu)
x = np.linspace(0, 10, 300)
y = [frankenfunc(i, 1) for i in x]
plt.plot(x, y)
plt.show()
This is ok and should work, especially as the actual values it returns don't matter too much as they will be used in a binary tournament. Still it's ugly and I'd like the flexibility to use the statistical distributions from scipy or something similar if possible.
So you want a probability distribution with a pdf of this form? Then you need to:
normalize it (the integral of pdf over the domain is unity)
subclass the rv_continuous class as shown in the docs, http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html, supplying your function as the _pdf method.
Alternatively, browse the list of distributions implemented in scipy.stats; there are several with pdf shapes of the general form you're sketching.
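To illustrate that second suggestion (my sketch, not the answer's code; the gamma distribution is just one example of a suitable shape), you can take a ready-made skewed pdf from scipy.stats and rescale it so the peak sits at a chosen input mu with output 1:

import numpy as np
from scipy import stats
from matplotlib import pyplot as plt

def fitness(x, mu, shape=2.0):
    # gamma pdf with its mode placed at mu; shape (> 1) controls how heavy the tail is
    scale = mu / (shape - 1.0)          # mode of gamma(shape, scale) is (shape - 1) * scale
    dist = stats.gamma(shape, scale=scale)
    return dist.pdf(x) / dist.pdf(mu)   # normalise so the maximum output is 1

x = np.linspace(0, 10, 300)
plt.plot(x, fitness(x, 1))
plt.show()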
I am trying to fit a polynomial to my data, e.g.
import scipy as sp
x = [1,6,9,17,23,28]
y = [6.1, 7.52324, 5.71, 5.86105, 6.3, 5.2]
and say I know the degree of the polynomial (e.g. 3), then I just use the scipy.polyfit method to get a polynomial of that degree:
fittedModelFunction = sp.polyfit(x, y, 3)
func = sp.poly1d(fittedModelFunction)
QUESTIONS:
1) How can I additionally require that the resulting function func is always positive (i.e. f(x) >= 0 for any x)?
2) How can I further define a constraint (e.g. number of (local) min and max points, etc.) in order to get a better fitting?
Is there something like this:
http://mail.scipy.org/pipermail/scipy-user/2007-July/013138.html
but more accurate?
Always Positive
I haven't been able to find a scipy routine that determines whether a function is everywhere positive, but an indirect way would be to find all the roots - Scipy Roots - of the function and inspect the limits near those roots. There are a few cases to consider:
No roots at all
Pick any x and evaluate the function. Since the function does not cross the x-axis because of a lack of roots, any positive result will indicate the function is positive!
Finite number of roots
This is probably the most likely case. You would have to inspect the limits before and after each root - SymPy Limits. You would have to specify your own minimum acceptable delta for the limit, however. I haven't seen a two-sided limit method provided, but it looks simple enough to make your own.
from sympy import limit

# f: function, v: variable to limit, p: point, d: delta
# returns two limit values
def twoSidedLimit(f, v, p, d):
    return limit(f, v, p - d), limit(f, v, p + d)
Infinite roots
I don't think that polyfit would generate an oscillating function, and in fact a non-zero polynomial of finite degree has at most as many real roots as its degree, so this case cannot occur for polyfit output; it is only worth keeping in mind for other kinds of fitted functions.
Constraints
The only built-in form of constraints seems to be limited to the optimize library of SciPy. A crude way to enforce constraints for polyfit would be to get the function from polyfit, generate a vector of values for various x, and try to select values from the vector that violate the constraint. If you try to use filter, map, or lambda it may be slow with large vectors since python's filter makes a copy of the list/vector being filtered. I can't really help in this regard.
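One concrete way to use the optimize library here (my sketch, not the answer's code): fit the cubic by least squares with scipy.optimize.minimize and SLSQP, enforcing f(x) >= 0 on a grid of check points over the data range (a relaxation of "for any x"):

import numpy as np
from scipy.optimize import minimize

x = np.array([1, 6, 9, 17, 23, 28], dtype=float)
y = np.array([6.1, 7.52324, 5.71, 5.86105, 6.3, 5.2])
x_check = np.linspace(x.min(), x.max(), 200)            # points where positivity is enforced

def sse(c):
    # sum of squared errors of the cubic with coefficients c (highest power first)
    return np.sum((np.polyval(c, x) - y) ** 2)

constraints = [{"type": "ineq", "fun": lambda c: np.polyval(c, x_check)}]   # f(x) >= 0
c0 = np.polyfit(x, y, 3)                                 # start from the unconstrained fit
res = minimize(sse, c0, method="SLSQP", constraints=constraints)
print(np.poly1d(res.x))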
I want to solve this kind of problem:
dy/dt = 0.01*y*(1-y), find t when y = 0.8 (0<t<3000)
I've tried the ode function in Python, but it can only calculate y when t is given.
So are there any simple ways to solve this problem in Python?
PS: This function is just a simple example. My real problem is so complex that it can't be solved analytically. So I want to know how to solve it numerically. And I think this problem is more like an optimization problem:
Objective function y(t) = 0.8, Subject to dy/dt = 0.01*y*(1-y), and 0<t<3000
PPS: My real problem is:
objective function: F(t) = 0.85,
subject to: F(t) = sqrt(x(t)^2+y(t)^2+z(t)^2),
x''(t) = (1/F(t)-1)*250*x(t),
y''(t) = (1/F(t)-1)*250*y(t),
z''(t) = (1/F(t)-1)*250*z(t)-10,
x(0) = 0, y(0) = 0, z(0) = 0.7,
x'(0) = 0.1, y'(0) = 1.5, z'(0) = 0,
0<t<5
This differential equation can be solved analytically quite easily:
dy/dt = 0.01 * y * (1-y)
rearrange to gather y and t terms on opposite sides
0.01 dt = 1/(y * (1-y)) dy
The lhs integrates trivially to 0.01 * t; the rhs is slightly more complicated. Using partial fractions, we can always write a quotient whose denominator is a product of two factors as a sum of two simpler quotients times some constants:
1/(y * (1-y)) = A/y + B/(1-y)
The values for A and B can be worked out by putting the rhs on the same denominator and comparing constant and first order y terms on both sides. In this case it is simple, A=B=1. Thus we have to integrate
1/y + 1/(1-y) dy
The first term integrates to ln(y); the second term can be integrated with a change of variables u = 1-y to give -ln(1-y). Our integrated equation therefore looks like:
0.01 * t + C = ln(y) - ln(1-y)
not forgetting the constant of integration (it is convenient to write it on the lhs here). We can combine the two logarithm terms:
0.01 * t + C = ln( y / (1-y) )
In order to solve for t at an exact value of y, we first need to work out the value of C. We do this using the initial conditions. (Note that if y starts at exactly 0 or 1, dy/dt = 0 and y never changes, and the logarithms below are undefined.) Plug in the values for y and t at the beginning:
0.01 * 0 + C = ln( y(0) / (1 - y(0)) )
This gives a value for C (assuming y(0) is neither 0 nor 1); then use y = 0.8 to get a value for t. Note that because of the logarithm, y reaches 0.8 well within the interval 0 < t < 3000 unless the initial value of y is incredibly small. It is of course also straightforward to rearrange the equation above to express y in terms of t, so you can plot the function as well.
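A worked instance of this result (my addition; the question does not state an initial value, so y(0) = 0.5 is assumed here):

from math import log

y0, y_target = 0.5, 0.8
C = log(y0 / (1 - y0))                            # constant of integration, 0 for y0 = 0.5
t = (log(y_target / (1 - y_target)) - C) / 0.01
print(t)                                          # ~138.6, well inside (0, 3000)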
Edit: Numerical integration
For a more complex ODE which cannot be solved analytically, you will have to integrate numerically. Initially we only know the value of the function at zero time, y(0) (we have to know at least that in order to uniquely define the trajectory of the function), and how to evaluate the gradient. The idea of numerical integration is that we can use our knowledge of the gradient (which tells us how the function is changing) to work out what the value of the function will be in the vicinity of our starting point. The simplest way to do this is Euler integration:
y(dt) = y(0) + dy/dt * dt
Euler integration assumes that the gradient is constant between t=0 and t=dt. Once y(dt) is known, the gradient can be calculated there also and in turn used to calculate y(2 * dt) and so on, gradually building up the complete trajectory of the function. If you are looking for a particular target value, just wait until the trajectory goes past that value, then interpolate between the last two positions to get the precise t.
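A minimal Euler-integration sketch of this idea (my addition; it assumes y(0) = 0.5, which the question does not specify), stopping once y passes the target of 0.8:

def dydt(t, y):
    return 0.01 * y * (1 - y)

t, y, dt = 0.0, 0.5, 0.01
while y <= 0.8 and t < 3000:
    y += dydt(t, y) * dt     # assume the gradient is constant over the step
    t += dt
print(t, y)                  # t comes out close to the analytical ~138.6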
The problem with Euler integration (and with all other numerical integration methods) is that its results are only accurate when its assumptions are valid. Because the gradient is not constant between pairs of time points, a certain amount of error will arise for each integration step, which over time will build up until the answer is completely inaccurate. In order to improve the quality of the integration, it is necessary to use more sophisticated approximations to the gradient. Check out, for example, the Runge-Kutta methods, which are a family of integrators that remove successive orders of the error term at the cost of increased computation time. If your function is differentiable, knowing the second or even third derivatives can also be used to reduce the integration error.
Fortunately, of course, somebody else has done the hard work here, and you don't have to worry too much about problems like numerical stability or have an in-depth understanding of all the details (although understanding roughly what is going on helps a lot). Check out http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html#scipy.integrate.ode for an integrator class which you should be able to use straight away. For instance
from scipy.integrate import ode

def deriv(t, y):
    return 0.01 * y * (1 - y)

my_integrator = ode(deriv)
my_integrator.set_initial_value(0.5)

t = 0.1  # start with a small value of time
while t < 3000:
    y = my_integrator.integrate(t)
    if y > 0.8:
        print("y(%f) = %f" % (t, y))
        break
    t += 0.1
This code will print out the first t value when y passes 0.8 (or nothing if it never reaches 0.8). If you want a more accurate value of t, keep the y of the previous t as well and interpolate between them.
As an addition to Krastanov's answer:
Aside from PyDSTool, there are other packages, like Pysundials and Assimulo, which provide bindings to the solver IDA from SUNDIALS. This solver has root-finding capabilities.
Use scipy.integrate.odeint to handle your integration, and analyse the results afterward.
import numpy as np
from scipy.integrate import odeint

ts = np.arange(0, 3000, 1)  # time series - start, stop, step

def rhs(y, t):
    return 0.01 * y * (1 - y)

# initial value; note that 0 and 1 are fixed points of this ODE, so starting at
# exactly 1 would keep y constant forever -- the question does not specify y(0),
# so 0.5 is used here as an example
y0 = np.array([0.5])
ys = odeint(rhs, y0, ts)
Then analyse the numpy array ys to find your answer (the array ts matches ys in length). (This may not work first time because I am constructing it from memory.)
This might involve using the scipy interpolate function for the ys array, such that you get a result at time t.
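A hedged sketch of that last step (my addition, assuming the arrays ts and ys from the snippet above, with an initial value below 0.8 so the trajectory actually crosses and is increasing near the target): find the first crossing and interpolate linearly between the bracketing samples.

import numpy as np

idx = np.argmax(ys[:, 0] >= 0.8)           # first sample at or above the target
t_hit = np.interp(0.8, ys[idx - 1:idx + 1, 0], ts[idx - 1:idx + 1])
print(t_hit)                               # first time y reaches 0.8 (~138.6 for y(0) = 0.5)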
EDIT: I see that you wish to solve a spring in 3D. This should be fine with the above method; Odeint on the scipy website has examples for systems such as coupled springs that can be solved for, and these could be extended.
What you are asking for is an ODE integrator with root-finding capabilities. They exist, and the low-level code for such integrators is supplied with scipy, but it has not yet been wrapped in Python bindings.
For more information see this mailing list post that provides a few alternatives: http://mail.scipy.org/pipermail/scipy-user/2010-March/024890.html
You can use the following example implementation which uses backtracking (hence it is not optimal as it is a bolt-on addition to an integrator that does not have root finding on its own): https://github.com/scipy/scipy/pull/4904/files
My question is: What is the best approach to iterative polynomial multiplication in Python?
I thought an interesting project would be to write a function in Python to generate the coefficients and exponents of each term for a Chebyshev polynomial of a given degree. The recursive function to generate such a polynomial (represented by T_n(x)) is:
With T_0(x) = 1 and T_1(x) = x:
T_n(x) = 2*x*T_{n-1}(x) - T_{n-2}(x)
What I have so far isn't very useful, but I am having trouble kind of wrapping my brain around how to get this going. What I want to happen is the following:
>> chebyshev(4)
[[8,4], [-8,2], [1,0]]
This list represents the Chebyshev polynomial of the 4th degree:
T_4(x) = 8x^4 - 8x^2 + 1
import sys

def chebyshev(n, a=[1, 0], b=[1, 1]):
    z = [2, 1]
    result = []
    if n == 0:
        return a
    if n == 1:
        return b
    # This displays the proper result for n = 2
    print(([z[0] * b[0], z[1] + b[1]], a), file=sys.stderr)
    return result
The one solution I found on the web didn't work, so I am hoping someone can shed some light.
p.s. More information on Chebyshev polynomials: CSU Fullerton, Wikipedia - Chebyshev polynomials. They are very cool/useful, and tie together some really interesting trig functions/properties; worth a read.
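As an editorial aside (not taken from any of the answers below), one way to do the iterative construction the question asks for is to carry the polynomials as dense coefficient lists indexed by exponent, apply the recurrence, and convert to [coefficient, exponent] pairs at the end; a minimal sketch:

def chebyshev(n):
    t_prev, t_curr = [1], [0, 1]           # T_0 = 1 and T_1 = x as dense coefficient lists
    if n == 0:
        t_curr = t_prev
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in t_curr]                # 2*x*T_{n-1}
        padded_prev = t_prev + [0] * (len(shifted) - len(t_prev))
        t_prev, t_curr = t_curr, [a - b for a, b in zip(shifted, padded_prev)]
    return [[c, e] for e, c in reversed(list(enumerate(t_curr))) if c != 0]

print(chebyshev(4))   # [[8, 4], [-8, 2], [1, 0]]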
SciPy has an implementation for Chebyshev
http://www.scipy.org/doc/api_docs/SciPy.special.orthogonal.html
I would suggest looking at their code.
The best implementation for Chebyshev is:
// Computes T_n(x), with -1 <= x <= 1
real T( int n, real x )
{
    return cos( n*acos(x) ) ;
}
If you test this against other implementations, including explicit polynomial evaluation and iteratively computing the recurrence relation, this is actually just as fast. Try it yourself.
Generally:
Explicit polynomial evaluation is the worst (for large n)
Recursive evaluation is a little better
Cosine evaluation is the best
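A direct Python translation of that snippet (my addition, valid for -1 <= x <= 1):

from math import acos, cos

def cheb_T(n, x):
    return cos(n * acos(x))

print(cheb_T(4, 0.5))   # -0.5, since T_4(0.5) = 8*0.5**4 - 8*0.5**2 + 1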
orthopy (a project of mine) also supports computation of Chebyshev polynomials. With
import orthopy

# from sympy.abc import x
x = 0.5

normalization = "normal"  # or "classical", "monic"
evaluator = orthopy.c1.chebyshev1.Eval(x, normalization)
for _ in range(10):
    print(next(evaluator))
0.5641895835477564
0.39894228040143276
-0.39894228040143265
...
you get the values of the polynomials with increasing degree at x = 0.5. You can use a list/vector of multiple values, or even sympy symbolics.
Computation happens with recurrence relations of course. If you're interested in the coefficients, check out
rc = orthopy.c1.chebyshev1.RecurrenceCoefficients("monic", symbolic=True)