I'm trying to create a function to find the rolling derivatives (first and second) in Pandas.
I find that df.diff() is quite convenient.
I want to find the derivatives with the rolling window value = 40.
For the first derivative,
import numpy as np
import pandas as pd

noise = np.random.normal(size=int(1e4))
noise = pd.DataFrame(noise)
first_derivative = noise.diff(periods=40)
Is it correct if I use this for the second derivative?
second_derivative=noise.diff(periods=40).diff()
I'm confused: if I put periods=40 in the second .diff() as well, wouldn't that give a 40*40 rolling window (for the second derivative)?
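To illustrate what I mean, here is a minimal check of what the chained call actually computes (assuming I read the docs right that diff(periods=40) is x[t] - x[t-40]):

import numpy as np
import pandas as pd

s = pd.Series(np.arange(200, dtype=float) ** 2)  # toy series where the diffs are easy to verify

d2 = s.diff(periods=40).diff(periods=40)
# By hand: (x[t] - x[t-40]) - (x[t-40] - x[t-80]) = x[t] - 2*x[t-40] + x[t-80]
check = s - 2 * s.shift(40) + s.shift(80)
print(d2.equals(check))  # True: the chained diff spans 80 rows, not 40*40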
Thank you!
Pandas is not a mathematical library, and its diff() operation just takes discrete differences between elements, not derivatives.
In order to take derivatives, I would recommend SymPy, a nice Python library for symbolic mathematics. Check its documentation for further details.
Example:
>>> from sympy import symbols, diff, cos
>>> x = symbols('x')
>>> diff(cos(x), x)
-sin(x)
I have a polynomial function for which I would like to find all local extrema. I can evaluate the polynomial via P(x) and to its derivative via d_P(x).
My first thought was to use minimize_scalar, however this does not seem to be able to take advantage of the fact that I can evaluate the derivative. Alternatively, I can use the more general minimize function and provide the gradient.
Is there a rule of thumb about which method will work better, or is this something where I should just test out both methods and see what works better? Since the function I am optimizing is a polynomial (well behaved), I wonder if it really matters much which I use, but if someone has more background on this, that would be great.
In particular, P(x) is the (unique) polynomial of degree n which alternately attains a value of 1 or -1 on a set of n-1 points.
Here is a sample plot of P(x), scaled so that P(0) = 1. Note that the y axis is plotted on a symlog scale.
Since you have a continuous scalar function, minimize_scalar (whose default methods are bracketing-based rather than gradient-based) is a natural fit. Because it doesn't use gradient information, you won't have trouble with noise/discontinuities/discreteness in your objective. However, if you use minimize in conjunction with a gradient-based method, then you will have trouble converging in the presence of noise/discontinuities/discreteness.
If the objective function is first-order continuous, then both minimize and minimize_scalar should yield the same solution for given bounds.
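For what it's worth, a minimal sketch comparing the two on a stand-in polynomial (P and d_P below are illustrative, not the asker's actual polynomial):

import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Stand-in polynomial with a single minimum at x = 1 (illustrative only)
P = np.polynomial.Polynomial([3.0, -2.0, 1.0])   # 3 - 2x + x**2
d_P = P.deriv()

# Derivative-free bracketed search on an interval
res_scalar = minimize_scalar(P, bounds=(-3.0, 3.0), method='bounded')

# Gradient-based search from an initial guess, supplying the exact derivative
res_grad = minimize(lambda v: P(v[0]), x0=[0.0], jac=lambda v: [d_P(v[0])])

print(res_scalar.x, res_grad.x)  # both ~ 1.0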
I'm stuck trying to find existing functions in scipy (or sympy) for the following task:
Suppose we are given the following function:
f(A,B,C) = k1-A*sin(B*k2-C)
for each of the axes A, B, C of the space we have a specific interval, like [a_lb, a_ub], [b_lb, b_ub], [c_lb, c_ub], plus an interval [d_lb, d_ub] for the function value.
Which functions of scipy can be used to compute whether the space encompassed by the boundaries is intersected by the given function? I thought of, e.g., computing the Hessian matrix.
Thank you for any hints.
Best regards
If I understand correctly, what you are looking for is an answer to whether f(A,B,C) bounded in the domain [a_l,a_u]x[b_l,b_u]x[c_l,c_u] has a value within [d_l,d_u]. You can try using scipy.optimize.minimize for this.
If you run scipy.optimize.minimize on f with the bounds [a_l,a_u]x[b_l,b_u]x[c_l,c_u], you should get the minimal value of f in the domain. Similarly, minimizing -f will give you the maximal value of f in the domain. f intersects the given boundary if and only if the interval [fmin, fmax] intersects the interval [d_l,d_u].
Note that scipy.optimize.minimize is a non-linear optimization and therefore requires an initial guess. The middle point of the domain box is a natural choice, but since the non-linear optimization may encounter a local minimum (or not converge), you may want to try several other initial guesses as well. scipy.optimize.minimize has many (optional) parameters so I recommend you read its documentation and play with them to fine-tune your usage to your needs.
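Putting that together, a minimal sketch (k1, k2, the bounds, and the target interval [d_l, d_u] are all placeholder values):

import numpy as np
from scipy.optimize import minimize

k1, k2 = 1.0, 2.0  # placeholder constants

def f(v):
    A, B, C = v
    return k1 - A * np.sin(B * k2 - C)

bounds = [(0.0, 2.0), (-1.0, 1.0), (0.0, 3.0)]  # [a_l,a_u] x [b_l,b_u] x [c_l,c_u]
x0 = [np.mean(b) for b in bounds]               # middle of the box as initial guess

fmin = minimize(f, x0, bounds=bounds).fun
fmax = -minimize(lambda v: -f(v), x0, bounds=bounds).fun

d_l, d_u = 0.5, 1.5  # placeholder target interval
print(max(fmin, d_l) <= min(fmax, d_u))  # True iff [fmin, fmax] intersects [d_l, d_u]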
I have a custom step function that, when plotted, is like this:
I have converted the datetimes to float values using .timestamp(), and have used scipy.integrate.quad to integrate this. However, the absolute error estimate I get as an output is incredibly high. Over the full range of my data (approximately 30x the plot, in a similar pattern) I get an integral value of 776197710.7495924 but an error estimate of 525307969.5046983, which is the same order of magnitude. When I choose not to use the step function and instead smoothly interpolate over the points using scipy.interpolate.interp1d and then integrate that function, I get similar figures.
How can I integrate this with less error?
EDIT: I am fully aware of how to just sum the rectangles created by the step function, but I want to understand how I might do this with scipy. Thanks!
I normally get good enough results with Simpson's rule (see the Wikipedia article):
import numpy as np
from scipy import integrate

x = np.linspace(-3.145, 3.145, 1000)
y = np.sin(x) / x  # this grid happens to skip x = 0, so no division by zero

integrate.simps(y, x)  # named integrate.simpson in newer SciPy versions
# 3.7038704152755586
Which is consistent with the expected result.
If you are having bad results, maybe your signal is too long and you are accumulating too many approximation errors?
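Applied to the question's setting, a sketch of the same idea: sample the step function on a grid fine enough to resolve each jump, then hand the samples to simps (step_f and the interval below are made up for illustration):

import numpy as np
from scipy import integrate

# Made-up step function standing in for the asker's custom one
def step_f(t):
    return np.where(t < 5.0, 2.0, 7.0)

t = np.linspace(0.0, 10.0, 100001)  # dense grid so the jump at t = 5 is well resolved
print(integrate.simps(step_f(t), t))  # ~ 45.0 (= 2*5 + 7*5)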
I've been looking for a way to write a Python snippet that calculates a Fourier series curve fit of any n-th order. Fitting a specific order, say 3, is quite simple; doing it where the order n is a variable is what I haven't managed yet. Perhaps somebody has done it, but my searching can't find it. I wonder if anybody could help. Thanks.
Well, the formulas are
n-th cos coeff: a_n = (2/T) * integral(-T/2, T/2, f(t)*cos(2*pi*n*t/T) dt)
n-th sin coeff: b_n = (2/T) * integral(-T/2, T/2, f(t)*sin(2*pi*n*t/T) dt)
Check SciPy and scipy.integrate for details on integration.
Here it should be
cos_coeff(f, T, N) = (2/T) * quad(lambda t: f(t)*cos(2*math.pi*N*t/T), -T/2, T/2)[0]
(quad returns a (value, error) pair, hence the [0].)
(Not tested, though)
I am not familiar with the Discrete Fourier Transform, but you can perhaps compute said coefficients from it, too. Check
http://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html
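For what it's worth, here is a runnable sketch of the quadrature formulas above (assuming f is an ordinary Python function with period T; the example function is made up to check the result):

import math
from scipy.integrate import quad

def fourier_coeffs(f, T, n):
    """Return the n-th cosine and sine Fourier coefficients of f over [-T/2, T/2]."""
    a_n = (2 / T) * quad(lambda t: f(t) * math.cos(2 * math.pi * n * t / T), -T / 2, T / 2)[0]
    b_n = (2 / T) * quad(lambda t: f(t) * math.sin(2 * math.pi * n * t / T), -T / 2, T / 2)[0]
    return a_n, b_n

# Check against a function with known coefficients: f(t) = cos(t) + 0.5*sin(2t), period 2*pi
f = lambda t: math.cos(t) + 0.5 * math.sin(2 * t)
print(fourier_coeffs(f, 2 * math.pi, 1))  # ~ (1.0, 0.0)
print(fourier_coeffs(f, 2 * math.pi, 2))  # ~ (0.0, 0.5)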
I have a graph between 2 functions f and g.
I know it follows a power law function with exponential cutoff.
f(x) = x**(-alpha)*e**(-lambda*x)
How do I find the value of exponent alpha?
If you have sufficiently close x points (for example one every 0.1), you can try the following:
ln(f(x)) = -alpha ln(x) - lambda x
ln(f(x))' = - alpha / x - lambda
So depending on where you have your points:
If you have a lot of points near 0, you can try:
h(x) = x ln(f(x))' = -alpha - lambda x
So the limit of the function h when x goes to 0 is -alpha
If you have large values of x, the function x -> ln(f(x))' tends toward -lambda as x goes to infinity, so you can guess lambda and use pwdyson's expression.
If you don't have close x points, the numerical derivative will be very noisy, so I would try to guess lambda as the limit of -ln(f(x))/x for large x's...
If you don't have large values, but a large number of x's, you can try a minimization of
sum_x_i (ln(y_i) + alpha ln(x_i) + lambda x_i) ^2
on both alpha and lambda (I guess it would be more precise than the initial expression)...
It is a simple least-squares regression (numpy.linalg.lstsq will do the job).
So you have plenty of methods; the one to choose really depends on your inputs.
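A minimal sketch of that linear least-squares fit, on synthetic data with made-up alpha and lambda just to show the mechanics:

import numpy as np

def fit_alpha_lambda(x, y):
    # ln(y) = -alpha*ln(x) - lambda*x, which is linear in (alpha, lambda)
    A = np.column_stack([-np.log(x), -x])
    (alpha, lam), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return alpha, lam

# Synthetic data with alpha = 1.5, lambda = 0.3 (made-up values)
x = np.linspace(0.5, 10, 200)
y = x ** -1.5 * np.exp(-0.3 * x)
print(fit_alpha_lambda(x, y))  # ~ (1.5, 0.3)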
The usual and general way of doing what you want is to perform a non-linear regression (even though, as noted in another response, it is possible to linearize the problem). Python can do this quite easily with the help of the SciPy package, which is used by many scientists.
The routine you are looking for is its least-squares optimization routine (scipy.optimize.leastsq). Once you wrap your head around the way this general optimization procedure works (see the example), you will probably find many other opportunities to use it. Basically, you calculate the list of differences between your measurements and their ideal values f(x), and you ask SciPy to find the parameters that make these differences as small as possible, so that your data fits the model as well as possible. This then gives you the parameters you are looking for.
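A sketch of that approach for this model; the data and the "true" parameter values here are fabricated purely for illustration:

import numpy as np
from scipy.optimize import leastsq

# Residuals between measured y values and the model x**(-alpha) * exp(-lambda*x)
def residuals(params, x, y):
    alpha, lam = params
    return y - x ** (-alpha) * np.exp(-lam * x)

# Fabricated noisy data with alpha = 1.5, lambda = 0.3
x = np.linspace(0.5, 10, 200)
y = x ** -1.5 * np.exp(-0.3 * x) + np.random.normal(scale=1e-3, size=x.size)

fit, _ = leastsq(residuals, x0=[1.0, 0.1], args=(x, y))
print(fit)  # ~ [1.5, 0.3]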
It sounds like you might be trying to fit a power-law to a distribution with an exponential cutoff at the low end due to incompleteness - but I may be reading too far into your problem.
If that is the problem you're dealing with, this website (and accompanying publication) addresses the issue: http://tuvalu.santafe.edu/~aaronc/powerlaws/. I wrote the python implementation of the power-law fitter on that page; it is linked from there.
If you know that the points follow this law exactly, then invert the equation and put in an x and its corresponding f(x) value:
import math
# 'lambda' is a reserved word in Python, so use a different name for it here
alpha = -(lam * x + math.log(f(x))) / math.log(x)
But if the points do not exactly fit the equation, you will need to do some sort of regression to determine alpha.
EDIT: Ok, so they don't fit exactly. This is getting beyond a Python question, but there may be something in numpy that can handle it. Here is a numpy linear regression recipe but your equation can't be rearranged into a linear form, so you'll have to look into non-linear regression.