Wrong inflection points

Wrong inflection points - python

I want to come up with a plot that shows the inflection points of a curve as follows:
I have a somewhat similar curve and I want to compute somehow the inflection points by using python. My curve looks as follows:
I am using the following code to compute the inflection points:
def find_inflection_points(df, n=1):
raw = df['consumption'].to_numpy()
infls = []
dx = 0
for i, x in enumerate(np.diff(raw, n)):
if x >= dx and i > 0:
infls.append(i*n)
dx = x
# plot results
plt.plot(raw, label='Input Data')
for i, infl in enumerate(infls, 1):
plt.axvline(x=infl, color='k', label=f'Inflection Point {i}')
plt.legend(bbox_to_anchor=(1.55, 1.0))
return infls
However, I am getting the following plot:
I would expect less inflection points. Any idea of what should I change or any other proposal for implementation?
EDIT:
The data is the following:
raw = np.array([52.33,
50.154444444444444,
48.69222222222223,
46.49111111111111,
44.01444444444444,
43.30555555555556,
43.034444444444446,
40.62888888888889,
40.38111111111111,
39.07666666666667,
38.339999999999996,
36.41444444444445,
36.37888888888889,
36.17111111111111,
35.666666666666664,
33.827777777777776,
29.35222222222222,
28.60888888888889,
24.43,
22.078888888888887,
21.756666666666664,
20.345555555555556,
19.874444444444446,
19.763333333333335])

So, strictly speaking, inflection point is indeed a change of sign of curvature. Which, for a 3 times differenciable function, is a point at which there is a change of sign of the 2nd derivative (the second derivative is 0, and the third derivative is not).
In your case, since the data are very discrete (only 24 data, one per hour in the day, I surmise), it is quite tricky to talk about second and third derivative. But if we give a try, we can see that, not only the point you are interested in are not inflection points (when the second derivative is 0, that means that the first derivative is locally constant. Which means that the slope is constant. And the points you seem to be interested in are, on the contrary, points where there is a change of slope. The the opposite of an inflection point!.
They are tho, inflection points of the derivative, since it seems that they are local extremum of the second derivative, so, at least if we had a continuous enough curve to dare speak of 3rd derivative, one could that that the extrema of second derivative are the 0 of the 3rd derivative. And 3rd derivative is the 2nd derivative of the 1st derivative. So, you could say that you are interested in the inflection points of the 1st derivative, that is of the slope.
See code below:
import numpy as np
from matplotlib import pyplot as plt
raw = np.array([52.33, 50.154444444444444, 48.69222222222223, 46.49111111111111, 44.01444444444444, 43.30555555555556, 43.034444444444446, 40.62888888888889, 40.38111111111111, 39.07666666666667, 38.339999999999996, 36.41444444444445, 36.37888888888889, 36.17111111111111, 35.666666666666664, 33.827777777777776, 29.35222222222222, 28.60888888888889, 24.43, 22.078888888888887, 21.756666666666664, 20.345555555555556, 19.874444444444446, 19.763333333333335])
x=np.arange(24)
rawb=np.roll(raw, 1)
rawa=np.roll(raw,-1)
der2=rawb+rawa-2*raw
der2[0]=der2[-1]=np.nan
der2abs=np.abs(der2)
offs = der2abs/(der2abs+np.roll(der2abs, -1))
yoffs = raw*(1-offs) + rawa*offs
chsign=(der2*np.roll(der2,-1))<0
s=1.5
mxder2 = (((der2>np.roll(der2,-1)) & (der2>np.roll(der2,1))) | ((der2<np.roll(der2,-1)) & (der2<np.roll(der2,1)))) & (der2abs>s)
fig, ax=plt.subplots()
ax2=ax.twinx()
ax.plot(raw)
ax2.plot([0]*24)
ax2.plot(der2)
ax.scatter(x[mxder2], raw[mxder2], 80, c='r', marker='*')
ax.scatter((x+offs)[chsign], yoffs[chsign], 30, c='g', marker='o')
plt.show()
It shows
The blue line are your data.
The orange line is the second derivative.
The green dots are the points where second derivative is 0.
While the red stars are the points where second derivative is an extrema (with a minimum absolute value: we don't count local extrema of area where second derivative is almost flatlining 0).
From what you have shown, you seem more interested in red stars!
The green dots are not just too numerous. Even filtering them (from an unknown criteria) would not do: they are all quite boring!
What makes the situation, and the vocabulary, ambiguous is the fact that we are talking about discrete points in reality. Inflection point are points where second derivative is 0. That is where 1st derivative is an extrema. And you need on the contrary points where second derivative is extreme. On so discrete set of data, you can be both tho. And maybe that was the case in your paper: points with sharp change of slopes are points where second derivative are extremely positive, but is surronded by 2 extremely negative second derivative (or the opposite).
But, my point is, you seem more interested in red stars.
As for how I compute that:
der2 is the second derivative, using discrete scheme y[-dt]-2y[0]+y[dt]
der2abs is its absolute value.
offs is a barycenter weighted by successive values of der2abs. Where there is a change of sign of the 2nd derivative, between index i and i+1, this account for an estimation of the exact position of the 0: offs is 0 if the 0 is at index i, 1 if it is at index 1, 0.5 if it is in the middle between i and i+1, etc. offs makes no sense where there is no change of sign (and we won't use those values).
yoffs is the raw value using the same barycenter. So, yoffs is yoffs[i], yoffs[i+1], yoffs[i+0.5] in the 3 previous cases (what would be yoffs[i+0.5] were a legal thing). Like offs, makes sense only where there is a change of sign of der2.
chsign is precisely what says where those change of sign occur.
So, we just have to plot yoffs[chsign] vs (x+offs)[chsign] to filter the cases where the second derivative are 0.
The red stars are easier to compute:
We just find all the points whose second derivative is either bigger or smaller than its 2 neighbor. And filter those to add a minimum value condition (|secondDerivative| must be at least 1.5)

Related

How to loop through a slope of linear regression and retrieve parameters and p value?

I have a dataset consisting of x and y, both are relative numbers. I want to iterate over the slope within a certain interval (let's say n = 3) and retrieve the model parameters, r- and p-value in a list. So I basically want to find the slope within a given set of points, move further down the x-axis and repeat this, over and over.
Eventually, I can find out at what point the slope is no longer significant and then the corresponding x-value. Underneath is an illustration of my idea. Red indicates significant slopes, blue indicates insignificant slopes as it moves further on the x-axis.
Illustration of iterative slope (I cannot yet upload photos here).

Matplotlib: How to make a histogram with bins of equal area?

Given some list of numbers following some arbitrary distribution, how can I define bin positions for matplotlib.pyplot.hist() so that the area in each bin is equal to (or close to) some constant area, A? The area should be calculated by multiplying the number of items in the bin by the width of the bin and its value should be no greater than A.
Here is a MWE to display a histogram with normally distributed sample data:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randn(100)
plt.hist(x, bin_pos)
plt.show()
Here bin_pos is a list representing the positions of the boundaries of the bins (see related question here.

I found this question intriguing. The solution depends on whether you want to plot a density function, or a true histogram. The latter case turns out to be quite a bit more challenging. Here is more info on the difference between a histogram and a density function.
Density Functions
This will do what you want for a density function:
def histedges_equalN(x, nbin):
npt = len(x)
return np.interp(np.linspace(0, npt, nbin + 1),
np.arange(npt),
np.sort(x))
x = np.random.randn(1000)
n, bins, patches = plt.hist(x, histedges_equalN(x, 10), normed=True)
Note the use of normed=True, which specifies that we're calculating and plotting a density function. In this case the areas are identically equal (you can check by looking at n * np.diff(bins)). Also note that this solution involves finding bins that have the same number of points.
Histograms
Here is a solution that gives approximately equal area boxes for a histogram:
def histedges_equalA(x, nbin):
pow = 0.5
dx = np.diff(np.sort(x))
tmp = np.cumsum(dx ** pow)
tmp = np.pad(tmp, (1, 0), 'constant')
return np.interp(np.linspace(0, tmp.max(), nbin + 1),
tmp,
np.sort(x))
n, bins, patches = plt.hist(x, histedges_equalA(x, nbin), normed=False)
These boxes, however, are not all equal area. The first and last, in particular, tend to be about 30% larger than the others. This is an artifact of the sparse distribution of the data at the tails of the normal distribution and I believe it will persist anytime their is a sparsely populated region in a data set.
Side note: I played with the value pow a bit, and found that a value of about 0.56 had a lower RMS error for the normal distribution. I stuck with the square-root because it performs best when the data is tightly-spaced (relative to the bin-width), and I'm pretty sure there is a theoretical basis for it that I haven't bothered to dig into (anyone?).
The issue with equal-area histograms
As far as I can tell it is not possible to obtain an exact solution to this problem. This is because it is sensitive to the discretization of the data. For example, suppose the first point in your dataset is an outlier at -13 and the next value is at -3, as depicted by the red dots in this image:
Now suppose the total "area" of your histogram is 150 and you want 10 bins. In that case the area of each histogram bar should be about 15, but you can't get there because as soon as your bar includes the second point, its area jumps from 10 to 20. That is, the data does not allow this bar to have an area between 10 and 20. One solution for this might be to adjust the lower-bound of the box to increase its area, but this starts to become arbitrary and does not work if this 'gap' is in the middle of the data set.

Find intersection of A(x) and B(y) in complex plane plus corr. x and y

suppose I have the following Problem:
I have a complex function A(x) and a complex function B(y). I know these functions cross in the complex plane. I would like to find out the corresponding x and y of this intersection point, numerically ( and/or graphically). What is the most clever way of doing that?
This is my starting point:
import matplotlib.pyplot as plt
import numpy as np
from numpy import sqrt, pi
x = np.linspace(1, 10, 10000)
y = np.linspace(1, 60, 10000)
def A_(x):
return -1/( 8/(pi*x)*sqrt(1-(1/x)**2) - 1j*(8/(pi*x**2)) )
A = np.vectorize(A_)
def B_(y):
return 3/(1j*y*(1+1j*y))
B = np.vectorize(B_)
real_A = np.real(A(x))
imag_A = np.imag(A(x))
real_B = np.real(B(y))
imag_B = np.imag(B(y))
plt.plot(real_A, imag_A, color='blue')
plt.plot(real_B, imag_B, color='red')
plt.show()
I don't have to plot it necessarily. I just need x_intersection and y_intersection (with some error that depends on x and y).
Thanks a lot in advance!
EDIT:
I should have used different variable names. To clarify what i need:
x and y are numpy arrays and i need the index of the intersection point of each array plus the corresponding x and y value (which again is not the intersection point itself, but some value of the arrays x and y ).

Here I find the minimum of the distance between the two curves. Also, I cleaned up your code a bit (eg, vectorize wasn't doing anything useful).
import matplotlib.pyplot as plt
import numpy as np
from numpy import sqrt, pi
from scipy import optimize
def A(x):
return -1/( 8/(pi*x)*sqrt(1-(1/x)**2) - 1j*(8/(pi*x**2)) )
def B(y):
return 3/(1j*y*(1+1j*y))
# The next three lines find the intersection
def dist(x):
return abs(A(x[0])-B(x[1]))
sln = optimize.minimize(dist, [1, 1])
# plotting everything....
a0, b0 = A(sln.x[0]), B(sln.x[1])
x = np.linspace(1, 10, 10000)
y = np.linspace(1, 60, 10000)
a, b = A(x), B(y)
plt.plot(a.real, a.imag, color='blue')
plt.plot(b.real, b.imag, color='red')
plt.plot(a0.real, a0.imag, "ob")
plt.plot(b0.real, b0.imag, "xr")
plt.show()
The specific x and y values at the intersection point are sln.x[0] and sln.x[1], since A(sln.x[0])=B(sln.x[1]). If you need the index, as you also mention in your edit, I'd use, for example, numpy.searchsorted(x, sln.x[0]), to find where the values from the fit would insert into your x and y arrays.
I think what's a bit tricky with this problem is that the space for graphing where the intersection is (ie, the complex plane) does not show the input space, but one has to optimize over the input space. It's useful for visualizing the solution, then, to plot the distance between the curves over the input space. That can be done like this:
data = dist((X, Y))
fig, ax = plt.subplots()
im = ax.imshow(data, cmap=plt.cm.afmhot, interpolation='none',
extent=[min(x), max(x), min(y), max(y)], origin="lower")
cbar = fig.colorbar(im)
plt.plot(sln.x[0], sln.x[1], "xw")
plt.title("abs(A(x)-B(y))")
From this it seems much more clear how optimize.minimum is working -- it just rolls down the slope to find the minimum distance, which is zero in this case. But still, there's no obvious single visualization that one can use to see the whole problem.
For other intersections one has to dig a bit more. That is, #emma asked about other roots in the comments, and there I mentioned that there's no generally reliable way to find all roots to arbitrary equations, but here's how I'd go about looking for other roots. Here I won't lay out the complete program, but just list the changes and plots as I go along.
First, it's obvious that for the domain shown in my first plot that there's only one intersection, and that there are no intersection in the region to the left. The only place there could be another intersection is to the right, but for that I'll need to allow the sqrt in the def of B to get a negative argument without throwing an exception. An easy way to do this is to add 0j to the argument of the sqrt, like this, sqrt(1+0j-(1/x)**2). Then the plot with the intersection becomes
I plotted this over a broader range (x=np.linspace(-10, 10, 10000) and y=np.linspace(-400, 400, 10000)) and the above is the zoom of the only place where anything interesting is going on. This shows the intersection found above, plus the point where it looks like the two curves might touch (where the red curve, B, comes to a point nearly meeting the blue curve A going upward), so that's the new interesting thing, and the thing I'll look for.
A bit of playing around with limits, etc, show that B is coming to a point asymptotically, and the equation of B is obvious that it will go to 0 + 0j for large +/- y, so that's about all there is to say for B.
It's difficult to understand A from the above plot, so I'll look at the real and imaginary parts independently:
So it's not a crazy looking function, and the jumping between Re=const and Im=const is just the nature of sqrt(1-x-2), which is pure complex for abs(x)<1 and pure real for abs(x)>1.
It's pretty clear now that the other time the curves are equal is at y= +/-inf and x=0. And, quick look at the equations show that A(0)=0+0j and B(+/- inf)=0+0j, so this is another intersection point (though since it occurs at B(+/- inf), it's sort-of ambiguous on whether or not it would be called an intersection).
So that's about it. One other point to mention is that if these didn't have such an easy analytic solution, like it wasn't clear what B was at inf, etc, one could also graph/minimize, etc, by looking at B(1/y), and then go from there, using the same tools as above to deal with the infinity. So using:
def dist2(x):
return abs(A(x[0])-B(1./x[1]))
Where the min on the right is the one initially found, and the zero, now at x=-0 and 1./y=0 is the other one (which, again, isn't interesting enough to apply an optimizer here, but it could be interesting in other equations).
Of course, it's also possible to estimate this by just finding the minimum of the data that goes into the above graph, like this:
X, Y = np.meshgrid(x, y)
data = dist((X, Y))
r = np.unravel_index(data.argmin(), data.shape)
print x[r[1]], y[r[0]]
# 2.06306306306 1.8008008008 # min approach gave 2.05973231 1.80069353
But this is only approximate (to the resolution of data) and involved many more calculations (1M compared to a few hundred). I only post this because I think it might be what the OP originally had in mind.

Briefly, two analytic solutions are derived for the roots of the problem. The first solution removes the parametric representation of x and solves for the roots directly in the (u, v) plane, where for example A(x): u(x) + i v(y) gives v(u) = f(u). The second solution uses a polar representation, e.g. A(x) is given by r(x) exp(i theta(x)), and offers a better understanding of the behavior of the square root as x passes through unity towards zero. Possible solutions occurring at the singular points are explored. Finally, a bisection root finding algorithm is constructed as a Python iterator to invert certain solutions. Summarizing, the one real root can be found as a solution to either of the following equations:
and gives:
x0 = -2.059732
y0 = +1.800694
A(x0) = B(y0) = (-0.707131, -i 0.392670)
As in most problems there are a number of ways to proceed. One can use a "black box" and hopefully find the root they are looking for. Sometimes an answer is all that is desired, and with a little understanding of the functions this may be an adequate way forward. Unfortunately, it is often true that such an approach will provide less insight about the problem then others.
For example, algorithms find it difficult locating roots in the global space. Local roots may be found with other roots lying close by and yet undiscovered. Consequently, the question arises: "Are all the roots accounted for?" A more complete understanding of the functions, e.g. asymptotic behaviors, branch cuts, singular points, can provide the global perspective to better answer this, as well as other important questions.
So another possible solution would be building one's own "black box." A simple bisection routine might be a starting point. Robust if the root lies in the initial interval and fairly efficient. This encourages us to look at the global behavior of the functions. As the code is structured and debugged the various functions are explored, new insights are gained, and the algorithm has become a tool towards a more complete solution to the problem. Perhaps, with some patience, a closed-form solution can be found. A Python iterator is constructed and listed below implementing a bisection root finding algorithm.
Begin by putting the functions A(x) and B(x) in a more standard form:
C(x) = u(x) + i v(x)
and here the complex number i is brought out of the denominator and into the numerator, casting the problem into the form of functions of a complex variable. The new representation simplifies the original functions considerably. The real and imaginary parts are now clearly separated. An interesting graph is to plot A(x) and B(x) in the 3-dimensional space (u, v, x) and then visualize the projection into the u-v plane.
import numpy as np
from numpy import real, imag
import matplotlib.pyplot as plt
ax = fig.gca(projection='3d')
s = np.linspace(a, b, 1000)
ax.plot(f(s).real, f(s).imag, z, color='blue')
ax.plot(g(s).real, g(s).imag, z, color='red')
ax.plot(f(s).real, f(s).imag, 0, color='black')
ax.plot(g(s).real, g(s).imag, 0, color='black')
The question arises: "Can the parametric representation be replaced so that a relationship such as,
A(x): u(x) + i v(x) gives v(u) = f(u)
is obtained?" This will provide A(x) as a function v(u) = f(u) in the u-v plane. Then, if for
B(x): u(x) + i v(x) gives v(u) = g(u)
a similar relationship can be found, the solutions can be set equal to one another,
f(u) = g(u)
and the root(s) computed. In fact, it is convenient to look for a solution in the square of the above equation. The worst case is that an algorithm will have to be built to find the root, but at this point the behavior of our functions are better understood. For example, if f(u) and g(u) are polynomials of degree n then it is known that there are n roots. The best case is that a closed-form solution might be a reward for our determination.
Here is more detail to the solution. For A(x) the following is derived:
and v(u) = f(u) is just v(u) = constant. Similarly for B(x) a slightly more complex form is required:
Look at the function g(u) for B(x). It is imaginary if u > 0, but the root must be real since f(u) is real. This means that u must be less then 0, and there is both a positive and negative real branch to the square root. The sign of f(u) then allows one to pick the negative branch as the solution for the root. So the fact that the solution must be real is determined by the sign of u, and the fact that the real root is negative specifies what branch of the square root to choose.
In the following plot both the real (u < 0) and complex (u > 0) solutions are shown.
The camera looks toward the origin in the back corner, where the red and blue curves meet. The z-axis is the magnitude of f(u) and g(u). The x and y axes are the real/complex values of u respectively. The blue curves are the real solution with (3 - |u|). The red curves are the complex solution with (3 + |u|). The two sets meet at u = 0. The black curve is f(u) equal to (-pi/8).
There is a divergence in g(u) at |u| = 3 and this is associated with x = 0. It is far removed from the solution and will not be considered further.
To obtain the roots to f = g it is easier to square f(u) and equate the two functions. When the function g(u) is squared the branches of the square root are lost, much like squaring the solutions for x**2 = 4. In the end the appropriate root will be chosen by the sign of f(u) and so this is not an issue.
So by looking at the dependence of A and B, with respect to the parametric variable x, a representation for these functions was obtained where v is a function of u and the roots found. A simpler representation can be obtained if the term involving c in the square root is ignored.
The answer gives all the roots to be found. A cubic equation has at most three roots and one is guaranteed to be real. The other two may be imaginary or real. In this case the real root has been found and the other two roots are complex. Interestingly, as c changes these two complex roots may move into the real plane.
In the above figure the x-axis is u and the y axis is the evaluated cubic equation with constant c. The blue curve has c as (pi/8) squared. The red curve uses a larger and negative value for c, and has been translated upwards for purposes of demonstration. For the blue curve there is an inflection point near (0, 0.5), while the red curve has a maximum at (-0.9, 2.5) and a minimum at (0.9, -0.3).
The intersection of the cubic with the black line represents the roots given by: u**3 + c u + 3c = 0. For the blue curve there is one intersection and one real root with two roots in the complex plane. For the red curve there are three intersections, and hence 3 roots. As we change the value of the constant c (blue to red) the one real root undergoes a "pitchfork" bifurcation, and the two roots in the complex plane move into the real space.
Although the root (u_t, v_t) has been located, obtaining the value for x requires that (u, v) be inverted. In the present example this is a trivial matter, but if not, a bisection routine can be used to avoid the algebraic difficulties.
The parametric representation also leads to a solution for the real root, and it rounds out the analysis with an independent verification of the first result. Second, it answers the question about what happens at the singularity in the square root. Third, it gives a greater understanding of the multiplicity of roots.
The steps are: (1) convert A(x) and B(x) into polar form, (2) equate the modulus and argument giving two equations in two unknowns, (3) make a substitution for z = x**2. Converting A(x) to polar form:
Absolute value bars are not indicated, and it should be understood that the moduli r(x) and s(x) are positive definite as their names imply. For B(x):
The two equations in two unknowns:
Finally, the cubic solution is sketched out here where the substitution z = x**2 has been made:
The solution for z = x**2 gives x directly, which allows one to substitute into both A(x) and B(x). This is an exact solution if all terms are maintained in the cubic solution, and there is no error in x0, y0, A(x0), or B(x0). A simpler representation can be found by considering terms proportional to 1/d as small.
Before leaving the polar representation consider the two singular points where: (1) abs(x) = 1, and (2) x = 0. A complicating factor is that the arguments behave something like 1/x instead of x. It is worthwhile to look at a plot of the arctan(a) and then ask yourself how that changes if its argument is replaced by 1/a. The following graphs will then look less foreign.
Consider the polar representation of B(x). As x approaches 0 the modulus and argument tend toward infinity, i.e. the point is infinitely far from the origin and lies along the y-axis. Approaching 0 from the negative direction the point lies along the negative y-axis with varphi = (-pi/2), while approaching from the other direction the point lies along the positive y-axis with varphi = (+pi/2).
A somewhat more complicated behavior is exhibited by A(x). A(x) is even in x since the modulus is positive definite and the argument involves only x**2. There is a symmetry across the y-axis that allows us to only consider the x > 0 plane.
At x = 1 the modulus is just (pi/8), and as x continues to approach 0 so does r(x). The behavior of the argument is more complex. As x approaches unity from large positive values the argument is diverging towards +inf and so theta is approaching (+pi/2). As x passes through 1 the argument becomes complex. At x equals 0 the argument has reached its minimum value of -i. For complex arguments the arctan is given by:
The following are plots of the arguments for A(x) and B(x). The x-axis is the value of x, and the y-axis is the value of the angle in units of pi. In the first plot theta is shown in blue curves, and as x approaches 1 the angle approaches (+pi/2). Theta is real because abs(x) >= 1, and notice it is symmetric across the y-axis. The black curve is varphi and as x approaches 0 it approaches plus or minus (pi/2). Notice it is an odd function in x.
In the second plot A(x) is shown where abs(x) < 1 and the argument becomes complex. Near x = 1 theta is equal to (+pi/2), the blue curve, minus a small imaginary part, the red curve. As x approaches zero theta is equal to (+pi/2) minus a large imaginary part. At x equals 0 the argument is equal to -i and theta = (+pi/2) minus an infinite imaginary part, i.e ln(0) = -inf:
The values for x0 and y0 are determined by the set of equations that equate modulus and argument of A(x) and B(x), and there are no other roots. If x0 = 0 was a root, then it would fall out of these equations. The same holds for x0 = 1. In fact, if one uses approximations in the argument of A(x) about these points, and then substitutes into the equation for the modulus, the equality cannot be maintained there.
Here is another perspective: consider the set of equations where x is assumed large and call it x_inf. The equation for the argument then gives x_inf = y_inf, where 1 is neglected with respect to x_inf squared. Upon substitution into the second equation a cubic is obtained in x_inf. Will this give the correct answer? Yes, if x0 is actually large, and in this case you might get away with it since x0 is approximately 2. The difference between the sqrt(4) and the sqrt(5) is around 10%. But does this mean that x_inf = 100 is a solution? No it does not: x_inf is only a solution if it equals x0.
The initial reason for examining the problem in the first place was to find a context for building a root-finding bisection routine as a Python iterator. This can be used to find any of the roots discussed here, and looks something like this:
class Bisection:
def __init__(self, a, b, func, max_iter):
self.max_iter = max_iter
self.count_iter = 0
self.a = a
self.b = b
self.func = func
fa = func(self.a)
fb = func(self.b)
if fa*fb >= 0.0:
raise ValueError
def __iter__(self):
self.x1 = self.a
self.x2 = self.b
self.xmid = self.x1 + ((self.x2 - self.x1)/2.0)
return self
def __next__(self):
f1 = self.func(self.x1)
f2 = self.func(self.x2)
error = abs(f1 - f2)
fmid = self.func(self.xmid)
if fmid == 0.0:
return self.xmid
if f1*fmid < 0:
self.x2 = self.xmid
else:
self.x1 = self.xmid
self.xmid = self.x1 + ((self.x2 - self.x1)/2.0)
f1 = self.func(self.x1)
fmid = self.func(self.xmid)
self.count_iter += 1
if self.count_iter >= self.max_iter:
raise StopIteration
return self.xmid
The routine does only a minimal amount in the way of catching exceptions and was used to find x for the given solution in the u-v plane. The arguments a and b give the lower and upper brackets for the root to be found. The argument func is the function for the root to be found. This might look like: u0 - B(x).real. The constant max_iterations tells the iterator to stop after a given number of bisections has been attempted.

How can I find the critical points of a 2D array in python?

I have a (960,960) array an I am trying to find the critical points so I can find the local extrema.
I have tried using the np.diff and np.gradient, but I have run into some trouble and I am not sure which function to use.
np.diff offers the option of calculating the second order diff, but the gradient doesn't.
How should I go about getting the critical points?
I've tried
diff = np.diff(storm, n=2)
dxx = diff[0]
dyy = diff[1]
derivative = dyy/dxx
I run into problems here because some of the values along the dxx are equal to zero.
Then there's the option of
gradient = np.gradient(storm)
g2 = np.gradient(gradient)
but will this give me what I'm looking for?

Critical point is the point where the first derivative (or gradient in multi-dimensional case) of a function is 0. Thus, you should check the x- and y- difference of your function. numpy's diff function is good for this case.
So, if the differences between two neighboring elements in x- y- directions are close to 0, then you can say that that point is a critical point. That's when the difference changes its sign (from negative to positive, or vice versa), assuming the your function is smooth.
# get difference in x- and y- direction
sec_grad_x = np.diff(storm,n=1,axis=0)
sec_grad_y = np.diff(storm,n=1,axis=1)
cp = []
# starts from 1 because diff function gives a forward difference
for i in range(1,n-1):
for j in range(1,n-1):
# check when the difference changes its sign
if ((sec_grad_x[i-1,j]<0) != (sec_grad_x[i-1+1,j]<0)) and \
((sec_grad_y[i,j-1]<0) != (sec_grad_y[i,j-1+1]<0)):
cp.append([i,j, storm[i,j]])
cp = np.array(cp)

These spectrum bands used to be judged by eye, how to do it programmatically?

Operators used to examine the spectrum, knowing the location and width of each peak and judge the piece the spectrum belongs to. In the new way, the image is captured by a camera to a screen. And the width of each band must be computed programatically.
Old system: spectroscope -> human eye
New system: spectroscope -> camera -> program
What is a good method to compute the width of each band, given their approximate X-axis positions; given that this task used to be performed perfectly by eye, and must now be performed by program?
Sorry if I am short of details, but they are scarce.
Program listing that generated the previous graph; I hope it is relevant:
import Image
from scipy import *
from scipy.optimize import leastsq
# Load the picture with PIL, process if needed
pic = asarray(Image.open("spectrum.jpg"))
# Average the pixel values along vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)
# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()
#print projection
# Fit function, two gaussians, adjust as needed
def fitfunc(p,x):
return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))
errfunc = lambda p, x, y: fitfunc(p,x)-y
# Use scipy to fit, p0 is inital guess
p0 = array([0,20,1,0,75,10])
X = xrange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X,projection))
Y = fitfunc(p1,X)
# Output the result
print "Mean values at: ", p1[1], p1[4]
# Plot the result
from pylab import *
#subplot(211)
#imshow(pic)
#subplot(223)
#plot(projection)
#subplot(224)
#plot(X,Y,'r',lw=5)
#show()
subplot(311)
imshow(pic)
subplot(312)
plot(projection)
subplot(313)
plot(X,Y,'r',lw=5)
show()

Given an approximate starting point, you could use a simple algorithm that finds a local maxima closest to this point. Your fitting code may be doing that already (I wasn't sure whether you were using it successfully or not).
Here's some code that demonstrates simple peak finding from a user-given starting point:
#!/usr/bin/env python
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt
# Sample data with two peaks: small one at t=0.4, large one at t=0.8
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts-0.4)/0.1)**2) + 2*np.exp(-((ts-0.8)/0.1)**2)
# Say we have an approximate starting point of 0.35
start_point = 0.35
# Nearest index in "ts" to this starting point is...
start_index = np.argmin(np.abs(ts - start_point))
# Find the local maxima in our data by looking for a sign change in
# the first difference
# From http://stackoverflow.com/a/9667121/188535
maxes = (np.diff(np.sign(np.diff(xs))) < 0).nonzero()[0] + 1
# Find which of these peaks is closest to our starting point
index_of_peak = maxes[np.argmin(np.abs(maxes - start_index))]
print "Peak centre at: %.3f" % ts[index_of_peak]
# Quick plot showing the results: blue line is data, green dot is
# starting point, red dot is peak location
plt.plot(ts, xs, '-b')
plt.plot(ts[start_index], xs[start_index], 'og')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.show()
This method will only work if the ascent up the peak is perfectly smooth from your starting point. If this needs to be more resilient to noise, I have not used it, but PyDSTool seems like it might help. This SciPy post details how to use it for detecting 1D peaks in a noisy data set.
So assume at this point you've found the centre of the peak. Now for the width: there are several methods you could use, but the easiest is probably the "full width at half maximum" (FWHM). Again, this is simple and therefore fragile. It will break for close double-peaks, or for noisy data.
The FWHM is exactly what its name suggests: you find the width of the peak were it's halfway to the maximum. Here's some code that does that (it just continues on from above):
# FWHM...
half_max = xs[index_of_peak]/2
# This finds where in the data we cross over the halfway point to our peak. Note
# that this is global, so we need an extra step to refine these results to find
# the closest crossovers to our peak.
# Same sign-change-in-first-diff technique as above
hm_left_indices = (np.diff(np.sign(np.diff(np.abs(xs[:index_of_peak] - half_max)))) > 0).nonzero()[0] + 1
# Add "index_of_peak" to result because we cut off the left side of the data!
hm_right_indices = (np.diff(np.sign(np.diff(np.abs(xs[index_of_peak:] - half_max)))) > 0).nonzero()[0] + 1 + index_of_peak
# Find closest half-max index to peak
hm_left_index = hm_left_indices[np.argmin(np.abs(hm_left_indices - index_of_peak))]
hm_right_index = hm_right_indices[np.argmin(np.abs(hm_right_indices - index_of_peak))]
# And the width is...
fwhm = ts[hm_right_index] - ts[hm_left_index]
print "Width: %.3f" % fwhm
# Plot to illustrate FWHM: blue line is data, red circle is peak, red line
# shows FWHM
plt.plot(ts, xs, '-b')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.plot(
[ts[hm_left_index], ts[hm_right_index]],
[xs[hm_left_index], xs[hm_right_index]], '-r')
plt.show()
It doesn't have to be the full width at half maximum — as one commenter points out, you can try to figure out where your operators' normal threshold for peak detection is, and turn that into an algorithm for this step of the process.
A more robust way might be to fit a Gaussian curve (or your own model) to a subset of the data centred around the peak — say, from a local minima on one side to a local minima on the other — and use one of the parameters of that curve (eg. sigma) to calculate the width.
I realise this is a lot of code, but I've deliberately avoided factoring out the index-finding functions to "show my working" a bit more, and of course the plotting functions are there just to demonstrate.
Hopefully this gives you at least a good starting point to come up with something more suitable to your particular set.

Late to the party, but for anyone coming across this question in the future...
Eye movement data looks very similar to this; I'd base an approach off that used by Nystrom + Holmqvist, 2010. Smooth the data using a Savitsky-Golay filter (scipy.signal.savgol_filter in scipy v0.14+) to get rid of some of the low-level noise while keeping the large peaks intact - the authors recommend using an order of 2 and a window size of about twice the width of the smallest peak you want to be able to detect. You can find where the bands are by arbitrarily removing all values above a certain y value (set them to numpy.nan). Then take the (nan)mean and (nan)standard deviation of the remainder, and remove all values greater than the mean + [parameter]*std (I think they use 6 in the paper). Iterate until you're not removing any data points - but depending on your data, certain values of [parameter] may not stabilise. Then use numpy.isnan() to find events vs non-events, and numpy.diff() to find the start and end of each event (values of -1 and 1 respectively). To get even more accurate start and end points, you can scan along the data backward from each start and forward from each end to find the nearest local minimum which has value smaller than mean + [another parameter]*std (I think they use 3 in the paper). Then you just need to count the data points between each start and end.
This won't work for that double peak; you'd have to do some extrapolation for that.

The best method might be to statistically compare a bunch of methods with human results.
You would take a large variety data and a large variety of measurement estimates (widths at various thresholds, area above various thresholds, different threshold selection methods, 2nd moments, polynomial curve fits of various degrees, pattern matching, and etc.) and compare these estimates to human measurements of the same data set. Pick the estimate method that correlates best with expert human results. Or maybe pick several methods, the best one for each of various heights, for various separations from other peaks, and etc.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Wrong inflection points - python

Related

How to loop through a slope of linear regression and retrieve parameters and p value?

Matplotlib: How to make a histogram with bins of equal area?

Find intersection of A(x) and B(y) in complex plane plus corr. x and y

How can I find the critical points of a 2D array in python?

These spectrum bands used to be judged by eye, how to do it programmatically?

Categories

Resources