Calculating area within a curve

Calculating area within a curve - python

I am measuring the current that passes through a sample as a vary the voltage over it. The result is a current-voltage plot like this
https://content.sciendo.com/view/journals/joeb/9/1/graphic/j_joeb-2018-0023_fig_004.jpg
I want to calculate the area within the curve of the first half period (part of the curve in the first quadrant). Not sure what the best way to do this in python. I have tried to write some code that finds pairs with same x-coordinate and subtracts the bottom y-value from the top y-value, and iterates over all the points in the first quadrant.
def LobeAreaByPeriod(all_periods_I):
print('Starting Lobe area by period')
LobeArea = []
for period in all_periods_I:
halfperiod = round(len(period)/2)
duration = round(halfperiod/2)
area = 0
for i in range(duration):
area += (period[i] - period[halfperiod - i])
LobeArea.append(area)
return LobeArea
I am not certain the pairs will be located directly above each other this way and I find it difficult to check it the answer is really correct. Any tips on how to do this?

To calculate area within a curve you're going to have to do some sort of approximation. Depending on your necessity of precision the method you choose will vary. I like the trapezoidal rule. Numpy has an implementation that is quite nice.
np.trapz

Related

Find Start and End point of a Curve in Python

I have a signal with 5 curves and my goal is to find the start, peak, and end point of each curve.
To find the peaks, I am using find_peaks function from scipy.signal. The following screenshot shows signal and the 5 peaks:
In order to find the start point of each of these curves, I have written a function that checks for the percentage difference between every pair of signal values. If the difference is less than or equal to 1%, then I mark it as the start point.
def find_start(peak, leftLimit, signal):
# Go in left direction from the peak
# Start at the current curve's peak, and go until the previous curve's peak (or until zero in case of very first curve)
for i in range(peak, leftLimit, -1):
diff = signal[i] - signal[i-1]
perc_change = float(diff)/signal[i] * 100
perc_change = round(abs(perc_change), 3)
if perc_change <= 1.0:
return i
return -1
I have a similar function to find the end point of each curve.
Now, although these two functions work in general, they fail if the signal is quite distorted and is not constant (straight horizontal line) outside the 5 curves. The following screenshot shows one such case:
Is there a better way (or any Python Library function) that can help locate the start and end points for each curve?

I would try folding your signal with a defined pulse (rectangular or triangle) and shift this triangle over your signal.
It is basically described here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate.html
In the resulting correlation result you could apply a simple thresholding to get the indices where it rises/dropes over/below the threshold.

Find plateau in Numpy array

I am looking for an efficient way to detect plateaus in otherwise very noisy data. The plateaus are always relatively broad A simple example of what this data could look like:
test=np.random.uniform(0.9,1,100)
test[10:20]=0
plt.plot(test)
Note that there can be multiple plateaus (which should all be detected) which can have different values.
I've tried using scipy.signal.argrelextrema, but it doesn't seem to be doing what I want it to:
peaks=argrelextrema(test,np.less,order=25)
plt.vlines(peaks,ymin=0, ymax=1)
I don't need the exact interval of the plateau- a rough range estimate would be enough, as long as that estimate is bigger or equal than the actual plateau range. It should be relatively efficient however.

There is a method scipy.signal.find_peaks that you can try, here is an exmple
import numpy
from scipy.signal import find_peaks
test = numpy.random.uniform(0.9, 1.0, 100)
test[10 : 20] = 0
peaks, peak_plateaus = find_peaks(- test, plateau_size = 1)
although find_peaks only finds peaks, it can be used to find valleys if the array is negated, then you do the following
for i in range(len(peak_plateaus['plateau_sizes'])):
if peak_plateaus['plateau_sizes'][i] > 1:
print('a plateau of size %d is found' % peak_plateaus['plateau_sizes'][i])
print('its left index is %d and right index is %d' % (peak_plateaus['left_edges'][i], peak_plateaus['right_edges'][i]))
it will print
a plateau of size 10 is found
its left index is 10 and right index is 19

This is really just a "dumb" machine learning task. You'll want to code a custom function to screen for them. You have two key characteristics to a plateau:
They're consecutive occurrences of the same value (or very nearly so).
The first and last points deviate strongly from a forward and backward moving average, respectively. (Try quantifying this based on the standard deviation if you expect additive noise, for geometric noise you'll have to take the magnitude of your signal into account too.)
A simple loop should then be sufficient to calculate a forward moving average, stdev of points in that forward moving average, reverse moving average, and stdev of points in that reverse moving average.
Read until you find a point well outside the regular noise (compare to variance). Start buffering those indices into a list.
Keep reading and buffering indices into that list while they have the same value (or nearly the same, if your plateaus can be a little rough; you'll want to use some tolerance plus the standard deviation of your plateaus, or just some tolerance if you expect them all to behave similarly).
If the variance of the points in your buffer gets too high, it's not a plateau, too rough; throw it out and start scanning again from your current position.
If the last value was very different from the previous (on the order of the change that triggered your code to start buffering indices) and in the opposite direction of the original impulse, cap your buffer here; you've got a plateau there.
Now do whatever you want with the points at those indices. Delete them, replace them with a linear interpolation between the two boundary points, whatever.
I could generate some noise and give you some sample code, but this is really something you're going to have to adapt to your application. (For example, there's a shortcoming in this method that a plateau which captures a point on the middle of the "cliff edge" may leave that point when it removes the rest of the plateau. If that's something you're worried about, you'll have to do a little more exploring after you ID the plateau.) You should be able to do this in a single pass over the data, but it might be wise to get some statistics on the whole set first to intelligently tweak your thresholds.
If you have an exact definition of what constitutes a plateau, you can make this a lot less hand-wavey and ML-looking, but so long as you're trying to identify fuzzy pattern, you're gonna have to take a statistics-based approach.

I had a similar problem, and found a simple heuristic solution shared below. I find plateaus as ranges of constant gradient of the signal. You could change the code to also check that the gradient is (close to) 0.
I apply a moving average (uniform_filter_1d) to filter out noise. Also, I calculate the first and second derivative of the signal numerically, so I'm not sure it matches the requirement of efficiency. But it worked perfectly for my signal and might be a good starting point for others.
def find_plateaus(F, min_length=200, tolerance = 0.75, smoothing=25):
'''
Finds plateaus of signal using second derivative of F.
Parameters
----------
F : Signal.
min_length: Minimum length of plateau.
tolerance: Number between 0 and 1 indicating how tolerant
the requirement of constant slope of the plateau is.
smoothing: Size of uniform filter 1D applied to F and its derivatives.
Returns
-------
plateaus: array of plateau left and right edges pairs
dF: (smoothed) derivative of F
d2F: (smoothed) Second Derivative of F
'''
import numpy as np
from scipy.ndimage.filters import uniform_filter1d
# calculate smooth gradients
smoothF = uniform_filter1d(F, size = smoothing)
dF = uniform_filter1d(np.gradient(smoothF),size = smoothing)
d2F = uniform_filter1d(np.gradient(dF),size = smoothing)
def zero_runs(x):
'''
Helper function for finding sequences of 0s in a signal
https://stackoverflow.com/questions/24885092/finding-the-consecutive-zeros-in-a-numpy-array/24892274#24892274
'''
iszero = np.concatenate(([0], np.equal(x, 0).view(np.int8), [0]))
absdiff = np.abs(np.diff(iszero))
ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
return ranges
# Find ranges where second derivative is zero
# Values under eps are assumed to be zero.
eps = np.quantile(abs(d2F),tolerance)
smalld2F = (abs(d2F) <= eps)
# Find repititions in the mask "smalld2F" (i.e. ranges where d2F is constantly zero)
p = zero_runs(np.diff(smalld2F))
# np.diff(p) gives the length of each range found.
# only accept plateaus of min_length
plateaus = p[(np.diff(p) > min_length).flatten()]
return (plateaus, dF, d2F)

Random Walk of a Photon Through the Sun

For a project, I am trying to determine the time it would take for a photon to leave the Sun. However, I am having trouble with my code (found below).
More specifically, I set up a for loop with an if statement, and if some randomly generated probability is less than the probability of collision, that means the photon collides and it changes direction.
What I am having trouble with is setting up a condition where the for loop stops if the photon escapes (when distance > Sun radius). The one I have set up already doesn't appear to work.
I use a very scaled down measurement of the Sun's radius because if I didn't it would take a long time for the photon to escape in my simulation.
from numpy.random import random as rng # we want them random numbers
import numpy as np # for the math functions
import matplotlib.pyplot as plt # to make pretty pretty class
mass_proton = 1.67e-27
mass_electron = 9.11e-31
Thompson_cross = 6.65e-29
Sun_density = 150000
Sun_radius = .005
Mean_Free = (mass_proton + mass_electron)/(Thompson_cross*Sun_density*np.sqrt(2))
time_step= 10**-13 # Used this specifically in order for the path length to be < Mean free Path
path_length = (3e8)*time_step
Probability = 1-np.exp(-path_length/Mean_Free) # Probability of the photon colliding
def Random_walk():
x = 0 # Start at origin (0,0)
y = 0
N = 1000
m=0 # This is a counter I have set up for the number of collisions
for i in range(1,N+1):
prand = rng(N+1) # Randomly generated probability
if prand[i] < Probability: # If my prand is less than the probability
# of collision, the photon collides and changes
# direction
x += Mean_Free*np.cos(2*np.pi*prand)
y += Mean_Free*np.sin(2*np.pi*prand)
m += 1 # Everytime a collision occurs 1 is added to my collision counter
distance = np.sqrt(x**2 + y**2) # Final distance the photon covers
if np.all(distance) > Sun_radius: # if the distance the photon travels
break # is greater than the Radius of the Sun,
# the for loop stops, meaning the
#photon escaped
print(m)
return x,y,distance
x,y,d = Random_walk()
plt.plot(x,y, '--')
plt.plot(x[-1], y[-1], 'ro')
Any criticisms of my code are welcome, this is for a grade and I do want to learn how to do this correctly, so please tell me if you notice any other errors.

I don't understand the motivation for the formulas you've implemented. I'll explain my own motivation here, but if your instructor told you to do something else, I guess you should listen to them instead.
If I were going to do this, I would generate a sequence of movements of a photon, stopping when distance of the photon to the center of the sun is greater than the solar radius. Each movement is a sample from a distribution which has two components: one for the distance, and one for the direction. I will assume that these are independent (this may be questioned in a more careful simulation).
It seems plausible that the distribution of distance is an exponential distribution with parameter 1/(mean free path). Then the density is p(d) = (1/MFP) exp(-d/MFP). Its cdf is 1 - exp(-d/MFP) and the inverse of the cdf is -MFP log(1 - p) where p = cdf(d). Now you can sample from the distribution of distances: let p = rand(0, 1) where rand = uniform random and plug it into the inverse cdf to get d. This is called the inverse cdf method of sampling; a web search will find more info about it.
As for the direction, you can let angle = rand(0, 2*pi) and then (x, y) = (cos(angle), sin(angle)).
Now you can construct the series of positions. From an initial location, let the new location = previous + d*(x, y). Stop when distance of location to center is greater than radius.
Looks like a great problem! Good luck and have fun. Let me know if you have any questions.

Here is a way of thinking about the problem that you may find helpful. At each moment, the photon has a position (x, y) and a direction (dx, dy). The (dx, dy) variables are coefficients of the unit vector, so sqrt(dx**2 + dy**2) = 1.0. The distance traveled by the photon during one step is path_length * direction.
At each step you do 4 things:
calculate the photon's new position
figure out if the photon has left the sun by computing its distance from the center point
determine, with a single random number, whether or not the photon collides. If it does you randomly generate a new direction.
Append the photon's current position to a list. You might want to do this as a function of distance from the center rather than x,y.
At the end, you plot the list you have built up.
You should also choose a random direction at the very start.
I don't know how you will terminate the loop, for the photon isn't ever guaranteed to leave the sun - just like in the real world. In principle the program might run forever (or until the sun burns out).
There is a slight inaccuracy in that the photon can collide at any instant, not just at the end of one step. But since the steps are small, so is the error introduced by this simplification.
I will point out that you do not need numpy for any of this except perhaps the final plot. The standard Python library has all the math functions you need. Numpy is of course great for manipulating arrays of data, but the only array you will have here is the one you build, a step at a time, of photon position versus time.
As I pointed out in one of my comments, you are modeling the sun as a 2-dimensional object. If you want to do this calculation in three dimensions, you don't need to change this basic approach.

Selecting best range of values from histogram curve

Scenario :
I am trying to track two different colored objects. At the beginning, user is prompted to hold the first colored object (say, may be a RED) at a particular position in front of camera (marked on screen by a rectangle) and press any key, then my program takes that portion of frame (ROI) and analyze the color in it, to find what color to track. Similarly for second object also. Then as usual, use cv.inRange function in HSV color plane and track the object.
What is done :
I took the ROI of object to be tracked, converted it to HSV and checked the Hue histogram. I got two cases as below :
( here there is only one major central peak. But in some cases, I get two such peaks, One a bigger peak with some pixel cluster around it, and second peak, smaller than first one, but significant size with small cluster around it also. I don't have an sample image of it now. But it almost look like below (created in paint))
Question :
How can I get best range of hue values from these histograms?
By best range I mean, may be around 80-90% of the pixels in ROI lie in that range.
Or is there any better method than this to track different colored objects ?

If I understand right, the only thing you need here is to find a maximum in a graph, where the maximum is not necessarily the highest peak, but the area with largest density.
Here's a very simple not too scientific but fast O(n) approach. Run the histogram trough a low pass filter. E.g. a moving average. The length of your average can be let's say 20. In that case the 10th value of your new modified histogram would be:
mh10 = (h1 + h2 + ... + h20) / 20
where h1, h2... are values from your histogram. The next value:
mh11 = (h2 + h3 + ... + h21) / 20
which can be calculated much easier using the previously calculated mh10, by dropping it's first component and adding a new one to the end:
mh11 = mh10 - h1/20 + h21/20
Your only problem is how you handle numbers at the edge of your histogram. You could shrink your moving average's length to the length available, or you could add values before and after what you already have. But either way, you couldn't handle peaks right at the edge.
And finally, when you have this modified histogram, just get the maximum. This works, because now every value in your histogram contains not only himself but it's neighbors as well.
A more sophisticated approach is to weight your average for example with a Gaussian curve. But that's not linear any more. It would be O(k*n), where k is the length of your average which is also the length of the Gaussian.

These spectrum bands used to be judged by eye, how to do it programmatically?

Operators used to examine the spectrum, knowing the location and width of each peak and judge the piece the spectrum belongs to. In the new way, the image is captured by a camera to a screen. And the width of each band must be computed programatically.
Old system: spectroscope -> human eye
New system: spectroscope -> camera -> program
What is a good method to compute the width of each band, given their approximate X-axis positions; given that this task used to be performed perfectly by eye, and must now be performed by program?
Sorry if I am short of details, but they are scarce.
Program listing that generated the previous graph; I hope it is relevant:
import Image
from scipy import *
from scipy.optimize import leastsq
# Load the picture with PIL, process if needed
pic = asarray(Image.open("spectrum.jpg"))
# Average the pixel values along vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)
# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()
#print projection
# Fit function, two gaussians, adjust as needed
def fitfunc(p,x):
return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))
errfunc = lambda p, x, y: fitfunc(p,x)-y
# Use scipy to fit, p0 is inital guess
p0 = array([0,20,1,0,75,10])
X = xrange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X,projection))
Y = fitfunc(p1,X)
# Output the result
print "Mean values at: ", p1[1], p1[4]
# Plot the result
from pylab import *
#subplot(211)
#imshow(pic)
#subplot(223)
#plot(projection)
#subplot(224)
#plot(X,Y,'r',lw=5)
#show()
subplot(311)
imshow(pic)
subplot(312)
plot(projection)
subplot(313)
plot(X,Y,'r',lw=5)
show()

Given an approximate starting point, you could use a simple algorithm that finds a local maxima closest to this point. Your fitting code may be doing that already (I wasn't sure whether you were using it successfully or not).
Here's some code that demonstrates simple peak finding from a user-given starting point:
#!/usr/bin/env python
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt
# Sample data with two peaks: small one at t=0.4, large one at t=0.8
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts-0.4)/0.1)**2) + 2*np.exp(-((ts-0.8)/0.1)**2)
# Say we have an approximate starting point of 0.35
start_point = 0.35
# Nearest index in "ts" to this starting point is...
start_index = np.argmin(np.abs(ts - start_point))
# Find the local maxima in our data by looking for a sign change in
# the first difference
# From http://stackoverflow.com/a/9667121/188535
maxes = (np.diff(np.sign(np.diff(xs))) < 0).nonzero()[0] + 1
# Find which of these peaks is closest to our starting point
index_of_peak = maxes[np.argmin(np.abs(maxes - start_index))]
print "Peak centre at: %.3f" % ts[index_of_peak]
# Quick plot showing the results: blue line is data, green dot is
# starting point, red dot is peak location
plt.plot(ts, xs, '-b')
plt.plot(ts[start_index], xs[start_index], 'og')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.show()
This method will only work if the ascent up the peak is perfectly smooth from your starting point. If this needs to be more resilient to noise, I have not used it, but PyDSTool seems like it might help. This SciPy post details how to use it for detecting 1D peaks in a noisy data set.
So assume at this point you've found the centre of the peak. Now for the width: there are several methods you could use, but the easiest is probably the "full width at half maximum" (FWHM). Again, this is simple and therefore fragile. It will break for close double-peaks, or for noisy data.
The FWHM is exactly what its name suggests: you find the width of the peak were it's halfway to the maximum. Here's some code that does that (it just continues on from above):
# FWHM...
half_max = xs[index_of_peak]/2
# This finds where in the data we cross over the halfway point to our peak. Note
# that this is global, so we need an extra step to refine these results to find
# the closest crossovers to our peak.
# Same sign-change-in-first-diff technique as above
hm_left_indices = (np.diff(np.sign(np.diff(np.abs(xs[:index_of_peak] - half_max)))) > 0).nonzero()[0] + 1
# Add "index_of_peak" to result because we cut off the left side of the data!
hm_right_indices = (np.diff(np.sign(np.diff(np.abs(xs[index_of_peak:] - half_max)))) > 0).nonzero()[0] + 1 + index_of_peak
# Find closest half-max index to peak
hm_left_index = hm_left_indices[np.argmin(np.abs(hm_left_indices - index_of_peak))]
hm_right_index = hm_right_indices[np.argmin(np.abs(hm_right_indices - index_of_peak))]
# And the width is...
fwhm = ts[hm_right_index] - ts[hm_left_index]
print "Width: %.3f" % fwhm
# Plot to illustrate FWHM: blue line is data, red circle is peak, red line
# shows FWHM
plt.plot(ts, xs, '-b')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.plot(
[ts[hm_left_index], ts[hm_right_index]],
[xs[hm_left_index], xs[hm_right_index]], '-r')
plt.show()
It doesn't have to be the full width at half maximum — as one commenter points out, you can try to figure out where your operators' normal threshold for peak detection is, and turn that into an algorithm for this step of the process.
A more robust way might be to fit a Gaussian curve (or your own model) to a subset of the data centred around the peak — say, from a local minima on one side to a local minima on the other — and use one of the parameters of that curve (eg. sigma) to calculate the width.
I realise this is a lot of code, but I've deliberately avoided factoring out the index-finding functions to "show my working" a bit more, and of course the plotting functions are there just to demonstrate.
Hopefully this gives you at least a good starting point to come up with something more suitable to your particular set.

Late to the party, but for anyone coming across this question in the future...
Eye movement data looks very similar to this; I'd base an approach off that used by Nystrom + Holmqvist, 2010. Smooth the data using a Savitsky-Golay filter (scipy.signal.savgol_filter in scipy v0.14+) to get rid of some of the low-level noise while keeping the large peaks intact - the authors recommend using an order of 2 and a window size of about twice the width of the smallest peak you want to be able to detect. You can find where the bands are by arbitrarily removing all values above a certain y value (set them to numpy.nan). Then take the (nan)mean and (nan)standard deviation of the remainder, and remove all values greater than the mean + [parameter]*std (I think they use 6 in the paper). Iterate until you're not removing any data points - but depending on your data, certain values of [parameter] may not stabilise. Then use numpy.isnan() to find events vs non-events, and numpy.diff() to find the start and end of each event (values of -1 and 1 respectively). To get even more accurate start and end points, you can scan along the data backward from each start and forward from each end to find the nearest local minimum which has value smaller than mean + [another parameter]*std (I think they use 3 in the paper). Then you just need to count the data points between each start and end.
This won't work for that double peak; you'd have to do some extrapolation for that.

The best method might be to statistically compare a bunch of methods with human results.
You would take a large variety data and a large variety of measurement estimates (widths at various thresholds, area above various thresholds, different threshold selection methods, 2nd moments, polynomial curve fits of various degrees, pattern matching, and etc.) and compare these estimates to human measurements of the same data set. Pick the estimate method that correlates best with expert human results. Or maybe pick several methods, the best one for each of various heights, for various separations from other peaks, and etc.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.