Convert a sawtooth into a continuous linear function - python

Data from angular encoders is in a sawtooth shape ranging from 0° to 360°. I would now like to create a continuous linear function that describes the total angle.
I would like to go from a sawtooth function that can be created like this (in python with numpy):
x = np.arange(0,1000,2)
y = np.arange(0,1000,2)%360
[Plot of the sawtooth function]
Back to the linear (in this case identity) function:
x = np.arange(0,1000,2)
y = np.arange(0,1000,2)
[Plot of the linear (identity) function]
The data I'm trying to use this on is not generated, it's measurement data from an angular encoder. I do not know the frequency. I know that the function value is in the interval [0,360]. I'm looking for a solution that can also handle a 'negative' sawtooth.

Hi, I faced the same issue and solved it.
This is what my signal looks like:
[Plot: sawtooth of the angle]
It's an array containing the Angle of a rotating complex number in the range [-pi, pi].
What I wanted is the continuous linear function as you described.
My idea was to compare consecutive elements of the array of angle values: whenever the difference between them reaches the next multiple of 2*pi (i.e. the phase has wrapped), every following value is shifted up by that amount:
from numpy import pi

# Angle: array of wrapped phase values in [-pi, pi]
n = 0
for i in range(len(Angle) - 1):
    # at a wrap, the difference (previous sample already shifted by pi*n) jumps to the next multiple of 2*pi
    if round((Angle[i] - Angle[i + 1]) / pi) == n + 2:
        n = n + 2
    Angle[i + 1] = Angle[i + 1] + pi * n
This is what I got:
[Plot: the unwrapped (linear) angle]
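Note that NumPy also ships a built-in that performs this kind of phase unwrapping, np.unwrap. Assuming the angles are in radians, a minimal sketch:
import numpy as np

# wrapped phase in [-pi, pi], e.g. the angle of a rotating complex number
wrapped = np.angle(np.exp(1j * np.linspace(0, 20, 500)))

# np.unwrap adds the appropriate multiple of 2*pi wherever the jump between
# consecutive samples exceeds pi, giving the continuous angle
continuous = np.unwrap(wrapped)
Newer NumPy versions also accept a period keyword (e.g. period=360) for data in degrees.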

It looks like you just need to split the effective value into two parts; let's call them base and remainder. The total value is then base + remainder.
Then you watch how the input changes: if it was high (359 or close) and suddenly becomes low (0 or close), you add 360 to base; if the jump happens in the other direction, you subtract 360 from base. After recalculating base, you store the current input as remainder for the next comparison. And that's all. A minimal sketch of that idea follows.
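This sketch assumes readings in degrees in [0, 360); the function name and the 180° jump threshold are my own choices, not part of the answer above:
import numpy as np

def accumulate(angles_deg, threshold=180.0):
    # angles_deg: wrapped encoder readings in [0, 360)
    base = 0.0
    total = np.empty_like(angles_deg, dtype=float)
    total[0] = angles_deg[0]
    for i in range(1, len(angles_deg)):
        step = angles_deg[i] - angles_deg[i - 1]
        if step < -threshold:      # wrapped forward: 359 -> 0
            base += 360.0
        elif step > threshold:     # wrapped backward: 0 -> 359
            base -= 360.0
        total[i] = base + angles_deg[i]
    return total
This also handles a "negative" sawtooth, since a backward wrap simply subtracts 360 from the base.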

Related

Find plateau in Numpy array

I am looking for an efficient way to detect plateaus in otherwise very noisy data. The plateaus are always relatively broad. A simple example of what this data could look like:
import numpy as np
import matplotlib.pyplot as plt

test = np.random.uniform(0.9, 1, 100)
test[10:20] = 0
plt.plot(test)
Note that there can be multiple plateaus (which should all be detected) which can have different values.
I've tried using scipy.signal.argrelextrema, but it doesn't seem to be doing what I want it to:
from scipy.signal import argrelextrema

peaks = argrelextrema(test, np.less, order=25)
plt.vlines(peaks, ymin=0, ymax=1)
I don't need the exact interval of the plateau; a rough range estimate would be enough, as long as that estimate is greater than or equal to the actual plateau range. It should be relatively efficient, however.
There is a method scipy.signal.find_peaks that you can try; here is an example:
import numpy
from scipy.signal import find_peaks

test = numpy.random.uniform(0.9, 1.0, 100)
test[10:20] = 0
peaks, peak_plateaus = find_peaks(-test, plateau_size=1)
find_peaks only finds peaks, but it can be used to find valleys if the array is negated. Then you do the following:
for i in range(len(peak_plateaus['plateau_sizes'])):
    if peak_plateaus['plateau_sizes'][i] > 1:
        print('a plateau of size %d is found' % peak_plateaus['plateau_sizes'][i])
        print('its left index is %d and right index is %d' % (peak_plateaus['left_edges'][i], peak_plateaus['right_edges'][i]))
it will print
a plateau of size 10 is found
its left index is 10 and right index is 19
This is really just a "dumb" machine learning task. You'll want to code a custom function to screen for them. You have two key characteristics to a plateau:
They're consecutive occurrences of the same value (or very nearly so).
The first and last points deviate strongly from a forward and backward moving average, respectively. (Try quantifying this based on the standard deviation if you expect additive noise, for geometric noise you'll have to take the magnitude of your signal into account too.)
A simple loop should then be sufficient to calculate a forward moving average, stdev of points in that forward moving average, reverse moving average, and stdev of points in that reverse moving average.
Read until you find a point well outside the regular noise (compare to variance). Start buffering those indices into a list.
Keep reading and buffering indices into that list while they have the same value (or nearly the same, if your plateaus can be a little rough; you'll want to use some tolerance plus the standard deviation of your plateaus, or just some tolerance if you expect them all to behave similarly).
If the variance of the points in your buffer gets too high, it's not a plateau, too rough; throw it out and start scanning again from your current position.
If the last value was very different from the previous (on the order of the change that triggered your code to start buffering indices) and in the opposite direction of the original impulse, cap your buffer here; you've got a plateau there.
Now do whatever you want with the points at those indices. Delete them, replace them with a linear interpolation between the two boundary points, whatever.
I could generate some noise and give you some sample code, but this is really something you're going to have to adapt to your application. (For example, there's a shortcoming in this method that a plateau which captures a point on the middle of the "cliff edge" may leave that point when it removes the rest of the plateau. If that's something you're worried about, you'll have to do a little more exploring after you ID the plateau.) You should be able to do this in a single pass over the data, but it might be wise to get some statistics on the whole set first to intelligently tweak your thresholds.
If you have an exact definition of what constitutes a plateau, you can make this a lot less hand-wavy and ML-looking, but as long as you're trying to identify a fuzzy pattern, you're going to have to take a statistics-based approach. A much-simplified sketch of the screening idea follows.
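This is a much-simplified version of the screening described above, assuming additive noise; the window size, thresholds, and function name are arbitrary choices and will need tuning to your data:
import numpy as np
from collections import deque

def screen_plateaus(x, window=10, jump_sigma=5.0, tol=0.05, min_len=5):
    # Rough screening pass: flag runs that start with a large jump away from
    # a trailing moving average of "normal" points and then stay nearly constant.
    normal = deque(x[:window], maxlen=window)   # trailing buffer of non-plateau points
    plateaus = []
    i = window
    while i < len(x):
        mean, std = np.mean(normal), np.std(normal) + 1e-12
        if abs(x[i] - mean) > jump_sigma * std:          # sudden departure from the noise band
            start = i
            while i + 1 < len(x) and abs(x[i + 1] - x[start]) < tol:
                i += 1                                   # extend while roughly constant
            if i - start + 1 >= min_len:
                plateaus.append((start, i))
        else:
            normal.append(x[i])                          # point looks like ordinary noise
        i += 1
    return plateaus
It flags a run as a plateau when it starts with a large jump away from the recent noise band and then stays within tol of its first value for at least min_len samples.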
I had a similar problem and found a simple heuristic solution, shared below. I find plateaus as ranges of constant gradient of the signal. You could change the code to also check that the gradient is (close to) 0.
I apply a moving average (uniform_filter1d) to filter out noise. Also, I calculate the first and second derivatives of the signal numerically, so I'm not sure it meets the efficiency requirement. But it worked perfectly for my signal and might be a good starting point for others.
def find_plateaus(F, min_length=200, tolerance=0.75, smoothing=25):
    '''
    Finds plateaus of signal using second derivative of F.

    Parameters
    ----------
    F : Signal.
    min_length: Minimum length of plateau.
    tolerance: Number between 0 and 1 indicating how tolerant
        the requirement of constant slope of the plateau is.
    smoothing: Size of uniform filter 1D applied to F and its derivatives.

    Returns
    -------
    plateaus: array of plateau left and right edges pairs
    dF: (smoothed) derivative of F
    d2F: (smoothed) second derivative of F
    '''
    import numpy as np
    from scipy.ndimage import uniform_filter1d

    # calculate smoothed gradients
    smoothF = uniform_filter1d(F, size=smoothing)
    dF = uniform_filter1d(np.gradient(smoothF), size=smoothing)
    d2F = uniform_filter1d(np.gradient(dF), size=smoothing)

    def zero_runs(x):
        '''
        Helper function for finding sequences of 0s in a signal
        https://stackoverflow.com/questions/24885092/finding-the-consecutive-zeros-in-a-numpy-array/24892274#24892274
        '''
        iszero = np.concatenate(([0], np.equal(x, 0).view(np.int8), [0]))
        absdiff = np.abs(np.diff(iszero))
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges

    # Find ranges where the second derivative is zero
    # Values under eps are assumed to be zero.
    eps = np.quantile(abs(d2F), tolerance)
    smalld2F = (abs(d2F) <= eps)

    # Find repetitions in the mask "smalld2F" (i.e. ranges where d2F is constantly zero)
    p = zero_runs(np.diff(smalld2F))

    # np.diff(p) gives the length of each range found.
    # Only accept plateaus of min_length.
    plateaus = p[(np.diff(p) > min_length).flatten()]

    return (plateaus, dF, d2F)
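For example, on a synthetic signal like the one in the question (noise with one broad run of zeros), you might call it like this; the parameter values are just guesses for this toy signal, and the returned array may contain other candidate ranges besides the obvious one:
import numpy as np

test = np.random.uniform(0.9, 1.0, 1000)
test[100:300] = 0                      # a broad plateau

plateaus, dF, d2F = find_plateaus(test, min_length=100, tolerance=0.75, smoothing=25)
print(plateaus)                        # pairs of (left edge, right edge) indices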

Having trouble plotting a log-log plot in python

Hey, so I'm trying to plot variables like age against frequency for a rotating body. I am given the period and period derivative, as well as their associated errors. Since frequency is related to period by:
f = 1/T
where frequency is f and period is T
then,
df = - (1/(T^2)) * dT
where dT and df are the derivatives of the period and frequency,
but when it comes to plotting the log of this I can't do it in python as it doesn't accept negative values for a loglog plot.
I've tried a work around of using only absolute values but then I only get half the errors when plotting error bars. Is there a way to make python plot both the negative and positive error bars? The frequency derivative itself is a negative quantity.
Unfortunately, log(x) is undefined for x <= 0, because log(x) = y <=> 10^y = x.
Is 10^y ever going to be -5?
No: as y goes to -infinity, 10^y approaches (but never reaches) 0, so x = 10^y can never be zero or negative.
So it is not possible to plot log(x) where x is negative.
One simple workaround is to take the absolute value of df; negative numbers become positive. The only downside is that you then need to undo the transformation afterwards: if a value was negative (and was made positive by abs(df)), you must multiply it by -1 again before interpreting it.
You may need to define your own absolute value function that records any values it needs to make positive:
changeList = []

def absRecordChanges(value):
    # record every value whose sign we flip, so the change can be undone later
    if value < 0:
        value = value * -1
        changeList.append(value)
    return value
There are other ways to solve the problem, but they are all centred around transforming your data to meet the conditions of a log tranformation (x > 0), and having the data you changed recorded so you can change it back afterward (before you plot it).
EDIT:
While fiddling around in Desmos, I was able to plot log(x) where x is any nonzero value. I used a piecewise function to do this: {x<0: -log(abs(x)), log(x)}.
from math import log

def piecewiseLog(x):
    # signed log: mirror log(|x|) for negative inputs
    if x < 0:
        return -log(abs(x))
    else:
        return log(x)
As I'm not familiar with matlab syntax, this link has an alternative solution: http://www.mathworks.com/matlabcentral/answers/31566-display-negative-values-on-logarithmic-graph
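If you are plotting with matplotlib, note that its 'symlog' axis scale implements essentially this signed, piecewise logarithm (with a small linear region around zero), so you may not need a manual transformation at all. A rough sketch with made-up data:
import numpy as np
import matplotlib.pyplot as plt

T = np.logspace(0, 3, 50)             # made-up periods
dT = 1e-3 * T                         # made-up period derivatives
f = 1.0 / T
df = -dT / T**2                       # frequency derivative, negative everywhere

plt.plot(f, df, 'o')
plt.xscale('log')
plt.yscale('symlog', linthresh=1e-9)  # log-like scale that also handles negative values
plt.xlabel('f')
plt.ylabel('df')
plt.show()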

Creating a fool proof graphing calculator using python - Python 2.7

I am trying to create a fool proof graphing calculator using python and pygame.
I created a graphing calculator that works for most functions. It takes a user string infix expression and converts it to postfix for easier calculations. I then loop through and pass in x values into the postfix expression to get a Y value for graphing using pygame.
The first problem I ran into was when taking calculations of impossible things. (like dividing by zero, square root of -1, 0 ^ non-positive number). If something like this would happen I would output None and that pixel wouldn't be added to the list of points to be graphed.
* I have shown all the different attempts I have made at this, to help you understand where I am coming from. If you would rather only see my most current code and method, jump down to where it says "Current".
Method 1
My first method was after I acquired all my pixel values, I would paint them using the pygame aalines function. This worked, except it wouldn't work when there were missing points in between actual points because it would just draw the line across the points. (1/x would not work but something like 0^x would)
This is what 1/x looks like using the aalines method
Method 1.1
My next idea was to split the line into two lines every time a None was returned. This worked for 1/x, but I quickly realized it would only work if one of the passed-in x values landed exactly where the expression evaluates to None. 1/x might work, but 1/(x+0.0001) wouldn't.
Method 2
My next method was to convert each pixel x value into the corresponding x point value in the graphing window (for example, (0,0) on the graphing window would actually be pixel (249,249) on a 500x500 program window). I would then calculate a y value for each of those x values. This works for any line that doesn't have a slope > 1 or < -1.
This is what 1/x would look like using this method.
Current
My most current method is supposed to be an advanced, working version of method 2.
It's kind of hard to explain. Basically, I take the x value just to the left and just to the right of each pixel column on the display window. I plug those two values into the expression to get two y values. I then loop through each y value in that column and check whether the current value lies between the two y values calculated earlier.
size is a list of size two that is the dimensions of the program window.
xWin is a list of size two that holds the x Min and x Max of the graphing window.
yWin is a list of size two that holds the y Min and y Max of the graphing window.
pixelToPoint is a function that takes scalar pixel value (just x or just y) and converts it to its corresponding value on the graphing window
pixels = []
for x in range(size[0]):
    # x values just left and right of this pixel column, in graph coordinates
    leftX = pixelToPoint(x, size[0] + 1, xWin, False)
    rightX = pixelToPoint(x + 1, size[0] + 1, xWin, False)
    leftY = calcPostfix(postfix, leftX)
    rightY = calcPostfix(postfix, rightX)
    for y in range(size[1]):
        if leftY is not None and rightY is not None:
            yPoint = pixelToPoint(y, size[1], yWin, True)
            # keep the pixel if its y value lies between the two computed y values
            if (rightY <= yPoint <= leftY) or (rightY >= yPoint >= leftY):
                pixels.append((x, y))

for p in pixels:
    screen.fill(BLACK, (p, (1, 1)))
This fixed the problem in method 2 of having the pixels not connected into a continuous line. However, it wouldn't fix the problem of method 1 and when graphing 1/x, it looked exactly the same as the aalines method.
-------------------------------------------------------------------------------------------------------------------------------
I am stuck and can't think of a solution. The only fix I can think of is using a whole lot more x values, but that seems really inefficient. Also, I am trying to make my program as resizable and customizable as possible, so everything must be variable-driven, and I am not sure how to calculate how many x values are needed for a given program window size and graph window size.
I'm not sure if I am on the right track or if there is a completely different method of doing this, but I want my graphing calculator to be able to graph any function (just like my actual graphing calculator).
Edit 1
I just tried using as many x values as there are pixels (a 500x500 display window calculates 250,000 y values).
It worked for every function I've tried, but it is really slow. It takes about 4 seconds to calculate (fluctuating depending on the equation). I've looked around online and found graphing calculators that are almost instantaneous in their graphing, but I can't figure out how they do it.
This online graphing calculator is extremely fast and effective. There must be some algorithm other than using a huge number of x values that can achieve what I want, because that site is doing it.
The problem is that, to know whether you can reasonably draw a line between two points, you have to know whether the function is continuous on that interval.
That is a hard problem in general, but you can use the following heuristic: if the slope of the segment has changed too much compared with the previous one, guess that there is a discontinuity in the interval and don't draw a line there. A sketch of this idea is below.
Another solution builds on your method 2.
After drawing the points that correspond to every value of the x axis, for every pair of adjacent x values (x1, x2), try to draw the y values within (y1 = f(x1), y2 = f(x2)) that can be reached by some x within (x1, x2).
This can be done by bisection (dichotomy) or a Newton-style search for an x that fits.
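A rough sketch of the slope-change heuristic; the break threshold is arbitrary and will need tuning, and the function name is my own:
def build_segments(xs, ys, max_slope_ratio=10.0):
    # Split the sampled curve into separate polylines, breaking wherever the
    # slope between neighbouring samples changes too violently (a crude
    # "probably discontinuous here" guess) or where y is undefined (None).
    segments, current, prev_slope = [], [], None
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if y0 is None or y1 is None:
            slope, break_here = None, True
        else:
            slope = (y1 - y0) / (x1 - x0)
            break_here = (prev_slope is not None and
                          abs(slope - prev_slope) > max_slope_ratio * (abs(prev_slope) + 1.0))
        if break_here:
            if len(current) > 1:
                segments.append(current)
            current = []
        else:
            if not current:
                current = [(x0, y0)]
            current.append((x1, y1))
        prev_slope = slope
    if len(current) > 1:
        segments.append(current)
    return segments
Each returned polyline can then be drawn separately (for example with pygame.draw.aalines), so no line is drawn across a suspected discontinuity or through missing (None) values.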

autocorrelation function of time series data with numpy

I have been trying to calculate an autocorrelation function, as defined in statistical mechanics, using NumPy. Most of the documentation I found relates to functions like correlate and convolve. However, for a given random variable x these functions just seem to calculate the sum
ACF(dt) = sum_{t=0}^{T} [ x(t) * x(t+dt) ]
instead of the average
ACF(dt) = mean[x(t)*x(t+dt)]
so in fact for calculating an autocorrelation function one would need to do something like:
acf = np.correlate(x, x, mode='full')
acf_half = acf[acf.size // 2:]                 # keep the non-negative lags
ldata = len(acf_half)
acf = np.array([v / (ldata - i) for i, v in enumerate(acf_half)])
Of course we would need to subtract mean(x)**2 from the resulting acf to be correct.
Can anyone confirm that this is correct?
Generally speaking, the autocorrelation, correlation, etc. is the sum (integral). Sometimes it is normalized, but not averaged in the sense as you've written above. This is because they are defined in terms of the mathematical convolution operation, which is simply the integral that you've written as a sum above.
The brackets at the stat mech page indicate a thermal average, which is an ensemble or time average over the 'experiment' taking place many times at many different states at some temperature. This (the finite temperature) causes the fluctuations that give rise to the 'statistical' nature of the problem, and cause the decay of the correlation (loss of long range order). This simply means that you should find the autocorrelation of several datasets, and average those together, but do not take the mean of the function.
As far as I can tell, your code is attempting to weight the correlation at dt by the overlap length (the number of terms that contribute at that lag), but I do not believe that this is correct.
With respect to the subtraction of <s>^2, that's in the case of the spin model, where <s> would be the mean spin (magnetization), so I believe you are correct that you should use mean(x)**2.
As a side-note, I would suggest using mode='same' instead of 'full' so that the domain of your correlation matches the domain of your input without having to look at just one-half of the output (here the output is symmetric, so it doesn't really make a difference).
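For concreteness, one common way to write the statistical estimator under discussion (subtract the mean first and, if you want a per-lag average rather than the sum, divide each lag by its overlap length) is sketched below; whether to divide by N or by N - dt is a convention you have to pick deliberately:
import numpy as np

def autocorrelation(x, average_per_lag=True):
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()                                 # remove <x> so the ACF decays towards 0
    acf = np.correlate(dx, dx, mode='full')[n - 1:]   # lags 0 .. n-1
    if average_per_lag:
        acf = acf / (n - np.arange(n))                # divide each lag by its overlap length
    return acf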

Performing many means in numpy

Good Morning,
I am implementing a Cressman filter for doing distance-weighted averages in NumPy. I use a ball tree implementation (thanks to Jake VanderPlas) to return a list of locations for each point in a query array. The query array (q) has shape [n, 3]; each entry holds the x, y, z of a point at which I want to do a weighted average of the points stored in the tree. The code wrapped around the tree returns points within a certain distance, so I get an array of variable-length arrays.
I use np.where to find the non-empty entries (i.e. positions where there was at least one point within the radius of influence), creating the isgood array.
I then loop over all query points to return the weighted average of the values self.z (note that self.z can be 1-D or 2-D to allow multiple co-gridding).
The thing that complicates using map or other quicker methods is the non-uniform lengths of the arrays within self.distances and self.locations. I am still fairly green at numpy/python, but I cannot think of a way to do this array-wise (i.e. without reverting to loops):
self.locations, self.distances = self.tree.query_radius(q, r, return_distance=True)
t2 = time()
if debug:
    print("Removing voids")

# keep only query points that had at least one neighbour within the radius
isgood = np.where(np.array([len(x) for x in self.locations]) != 0)[0]

interpol = np.zeros((len(self.locations),) + np.shape(self.z[0]))
interpol.fill(np.nan)
for dist, ix, posn, roi in zip(self.distances[isgood], self.locations[isgood], isgood, r[isgood]):
    # Cressman-style distance weighting of the neighbours' values
    interpol[posn] = np.average(self.z[ix],
                                weights=(roi**2 - dist**2) / (roi**2 + dist**2),
                                axis=0)
So, any hints on how to speed up the loop?
For a typical mapping, applied to mapping weather radar data from a range, azimuth, elevation grid to a Cartesian grid where I have 240x240x34 points and 4 variables, it takes 99 seconds to query the tree (written by Jake in C and Cython; this is the hard step, as you need to search the data!) and 100 seconds to do the calculation, which in my opinion is slow. Where is my overhead? Is np.average efficient, or, since it is called millions of times, is there a speedup to be gained here? Would I gain by using float32 rather than the default float64, or even by scaling to ints (which would make it very hard to avoid wrap-around in the weighting)? Any hints gratefully received!
You can find a discussion about the relative merits of the Cressman scheme vs using a Gaussian weight function at:
http://www.flame.org/~cdoswell/publications/radar_oa_00.pdf
The key is to match the smoothing parameter to the data (I recommend using a value close to the average spacing between data points). Once you know the smoothing parameter, you can set an "influence radius" equal to the radius where the weight function falls to 0.01 (or whatever).
How important is speed? If you wish, rather than calling an exponential function to determine the weight, you can make up a discrete table of weights for some fixed number of radius increments, which speeds up the calculation considerably. Ideally, you should have data outside the grid boundaries that can be used in the mapping of the values surrounding the gridpoints (even on the boundary points of the grid). Note this is NOT a true interpolation scheme - it won't return the observed values at the data points exactly. Like the Cressman scheme, it's a low-pass filter. A sketch of the weight-table idea follows.
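A minimal sketch of that lookup-table idea, using a Gaussian weight as recommended above; the smoothing parameter, the table size, and the exact form of the exponent are made-up choices:
import numpy as np

# Precompute a table of Gaussian weights at fixed radius increments so the
# inner loop only does a lookup instead of calling exp() for every point.
smooth = 1500.0                                # smoothing parameter (made-up value)
roi = smooth * np.sqrt(np.log(100.0))          # radius where the weight falls to 0.01
table_r = np.linspace(0.0, roi, 512)
table_w = np.exp(-(table_r / smooth) ** 2)

def fast_weight(dist):
    # dist may be a scalar or an array of distances <= roi
    return np.interp(dist, table_r, table_w)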
