matplotlib argrelmax doesn't find all maxes - python

I have a project where I'm sampling analog data and attempting to analyze it with matplotlib. Currently, my analog data source is a potentiometer hooked up to a microcontroller, but that's not really relevant to the issue. Here's my code:
arrayFront = RunningMean(array(dataFront), 15)
arrayRear = RunningMean(array(dataRear), 15)
x = linspace(0, len(arrayFront), len(arrayFront)) # Generate x axis
y = linspace(0, len(arrayRear), len(arrayRear)) # Generate x axis for rear data
min_vals_front = scipy.signal.argrelmin(arrayFront, order=2)[0] # Min
min_vals_rear = scipy.signal.argrelmin(arrayRear, order=2)[0] # Min
max_vals_front = scipy.signal.argrelmax(arrayFront, order=2)[0] # Max
max_vals_rear = scipy.signal.argrelmax(arrayRear, order=2)[0] # Max
maxvalfront = max(arrayFront[max_vals_front])
maxvalrear = max(arrayRear[max_vals_rear])
minvalfront = min(arrayFront[min_vals_front])
minvalrear = min(arrayRear[min_vals_rear])
plot(x, arrayFront, label="Front Pressures")
plot(y, arrayRear, label="Rear Pressures")
plot(x[min_vals_front], arrayFront[min_vals_front], "x")
plot(x[max_vals_front], arrayFront[max_vals_front], "o")
plot(y[min_vals_rear], arrayRear[min_vals_rear], "x")
plot(y[max_vals_rear], arrayRear[max_vals_rear], "o")
xlim(-25, len(arrayFront) + 25)
ylim(-1000, 7000)
legend(loc='upper left')
show()
dataFront and dataRear are python lists that hold the sampled data from 2 potentiometers. RunningMean is a function that calls:
convolve(x, ones((N,)) / N, mode='valid')
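(For reference, a minimal sketch of such a helper, assuming NumPy is imported as np; this is my reconstruction, not necessarily the asker's exact function:)
import numpy as np

def RunningMean(x, N):
    # N-point moving average; mode='valid' shortens the output by N - 1 samples
    return np.convolve(x, np.ones(N) / N, mode='valid')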
The problem is that the argrelmax (and argrelmin) functions don't always find all the maxes and mins. Sometimes they don't find ANY maxes or mins, and that causes me problems in this block of code
maxvalfront = max(arrayFront[max_vals_front])
maxvalrear = max(arrayRear[max_vals_rear])
minvalfront = min(arrayFront[min_vals_front])
minvalrear = min(arrayRear[min_vals_rear])
because the [min_vals_(blank)] variables are empty. Does anyone have any idea what is happening here, and what I can do to fix the problem? Thanks in advance.
Here's one of the graphs of data where not all the maxes and mins are found:

signal.argrelmin is a thin wrapper around signal.argrelextrema with comparator=np.less. np.less(a, b) returns the truth value of a < b element-wise. Notice that np.less requires a to be strictly less than b for it to be True.
Your data has the same minimum value at many neighboring locations. At those local minima, the comparison between the minimum and its neighbors is not strictly less than; it only satisfies less than or equal to.
Therefore, to find these extrema use signal.argrelmin with comparator=np.less_equal. For example, using a snippet from your data:
import numpy as np
from scipy import signal
arrayRear = np.array([-624.59309896, -624.59309896, -624.59309896,
                      -625., -625., -625.])
print(signal.argrelmin(arrayRear, order=2)[0])
# []
print(signal.argrelextrema(arrayRear, np.less_equal)[0])
# [0 1 3 4 5]
print(signal.argrelextrema(arrayRear, np.less_equal, order=2)[0])
# [0 3 4 5]
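As a side note, if you also want to guard against the case where no relative extrema are found at all (so the min()/max() block in the question doesn't fail on an empty index array), a minimal sketch could look like this, assuming arrayFront is defined as in the question:
max_vals_front = signal.argrelextrema(arrayFront, np.greater_equal, order=2)[0]
min_vals_front = signal.argrelextrema(arrayFront, np.less_equal, order=2)[0]

# fall back to the global extrema if no relative extrema were found
maxvalfront = arrayFront[max_vals_front].max() if len(max_vals_front) else arrayFront.max()
minvalfront = arrayFront[min_vals_front].min() if len(min_vals_front) else arrayFront.min()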

Related

Python merge datasets X1(t), X2(t) -> X1(X2)

I have some datasets (let's stay with 2 here) which depend on a common variable t, like X1(t) and X2(t). However, X1(t) and X2(t) don't have to share the same t values or even have the same number of data points.
For example they could look like:
t1 = [2,6,7,8,10,13,14,16,17]
X1 = [10,10,10,20,20,20,30,30,30]
t2 = [3,4,5,6,8,10,11,14,15,16]
X2 = [95,100,100,105,158,150,142,196,200,204]
I am trying to create a new dataset YNew(XNew) (=X2(X1)) such that both datasets are linked without the shared variable t.
In this case it should look like:
XNew = [10,20,30]
YNew = [100,150,200]
where every occurring X1-value is assigned a corresponding X2-value (a mean value).
Is there an easy, already-known way to achieve this (maybe with pandas)?
My first guess would be to find all t-values for a certain X1-value (in the example case the X1-value 10 would lie in the range 2,...,7) and then look for all X2-values in that range and get their mean value. Then you should be able to assign YNew(XNew).
Thanks for any advice!
Update:
I added a graph, so maybe my intentions are a bit more clear. I want to assign the mean X2-value to the corresponding X1-value in the marked regions (where the same X1-values occur).
graph corresponding to example lists
Alright, I just tried to implement what I mentioned and it works the way I wanted.
Although I think some things are still a little clumsy...
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# datasets to treat
t1 = [2,6,7,8,10,13,14,16,17]
X1 = [10,10,10,20,20,20,30,30,30]
t2 = [3,4,5,6,8,10,11,14,15,16]
X2 = [95,100,100,105,158,150,142,196,200,204]
X1Series = pd.Series(X1, index = t1)
X2Series = pd.Series(X2, index = t2)
X1Values = X1Series.drop_duplicates().values  # returns all occurring values of X1 without duplicates as an array
# lists for results
XNew = []
YNew = []
#find for every occuring value X1 the mean value of X2 in the range of X1
for value in X1Values:
    indexpos = X1Series[X1Series == value].index.values
    max_t = indexpos[indexpos.argmax()]  # get max and min index of the range of X1
    min_t = indexpos[indexpos.argmin()]
    print("X1 = " + str(value) + " occurs in range from " + str(min_t) + " to " + str(max_t))
    slicedX2 = X2Series[(X2Series.index >= min_t) & (X2Series.index <= max_t)]  # select range of X2
    print("in this range there are following values of X2:")
    print(slicedX2)
    mean = slicedX2.mean()  # calculate mean value of selection and append extracted values
    print("with the mean value of: " + str(mean))
    XNew.append(value)
    YNew.append(mean)
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.plot(t1, X1,'ro-',label='X1(t)')
ax1.plot(t2, X2,'bo',label='X2(t)')
ax1.legend(loc=2)
ax1.set_xlabel('t')
ax1.set_ylabel('X1/X2')
ax2.plot(XNew,YNew,'ro-',label='YNew(XNew)')
ax2.legend(loc=2)
ax2.set_xlabel('XNew')
ax2.set_ylabel('YNew')
plt.show()
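For what it's worth, the range-lookup loop can also be expressed more compactly with pandas alone, e.g. by forward-filling the X1 labels onto the t values of X2 and grouping. This is a sketch, not exactly the same method: t values falling between two X1 ranges get assigned to the previous X1 value here, which gives the same result for the example data.
import pandas as pd

s1 = pd.Series(X1, index=t1)
s2 = pd.Series(X2, index=t2)

# label every X2 sample with the most recent X1 value, then average per label
labels = s1.reindex(s1.index.union(s2.index)).ffill().loc[s2.index]
result = s2.groupby(labels).mean()
print(result)  # 10 -> 100, 20 -> 150, 30 -> 200 for the example data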

Estimate formants using LPC in Python

I'm new to signal processing (and numpy, scipy, and matlab for that matter). I'm trying to estimate vowel formants with LPC in Python by adapting this matlab code:
http://www.mathworks.com/help/signal/ug/formant-estimation-with-lpc-coefficients.html
Here is my code so far:
#!/usr/bin/env python
import sys
import numpy
import wave
import math
from scipy.signal import lfilter, hamming
from scikits.talkbox import lpc
"""
Estimate formants using LPC.
"""
def get_formants(file_path):
    # Read from file.
    spf = wave.open(file_path, 'r') # http://www.linguistics.ucla.edu/people/hayes/103/Charts/VChart/ae.wav
    # Get file as numpy array.
    x = spf.readframes(-1)
    x = numpy.fromstring(x, 'Int16')
    # Get Hamming window.
    N = len(x)
    w = numpy.hamming(N)
    # Apply window and high pass filter.
    x1 = x * w
    x1 = lfilter([1., -0.63], 1, x1)
    # Get LPC.
    A, e, k = lpc(x1, 8)
    # Get roots.
    rts = numpy.roots(A)
    rts = [r for r in rts if numpy.imag(r) >= 0]
    # Get angles.
    angz = numpy.arctan2(numpy.imag(rts), numpy.real(rts))
    # Get frequencies.
    Fs = spf.getframerate()
    frqs = sorted(angz * (Fs / (2 * math.pi)))
    return frqs

print get_formants(sys.argv[1])
Using this file as input, my script returns this list:
[682.18960189917243, 1886.3054773107765, 3518.8326108511073, 6524.8112723782951]
I didn't even get to the last steps where they filter the frequencies by bandwidth because the frequencies in the list aren't right. According to Praat, I should get something like this (this is the formant listing for the middle of the vowel):
Time_s F1_Hz F2_Hz F3_Hz F4_Hz
0.164969 731.914588 1737.980346 2115.510104 3191.775838
What am I doing wrong?
Thanks very much
UPDATE:
I changed this
x1 = lfilter([1., -0.63], 1, x1)
to
x1 = lfilter([1], [1., 0.63], x1)
as per Warren Weckesser's suggestion and am now getting
[631.44354635609318, 1815.8629524985781, 3421.8288991389031, 6667.5030877036006]
I feel like I'm missing something, since F3 is still quite far off.
UPDATE 2:
I realized that the order being passed to scikits.talkbox.lpc was off due to a difference in sampling frequency. Changed it to:
Fs = spf.getframerate()
ncoeff = 2 + Fs / 1000
A, e, k = lpc(x1, ncoeff)
Now I'm getting:
[257.86573127888488, 774.59006835496086, 1769.4624576002402, 2386.7093679399809, 3282.387975973973, 4413.0428174593926, 6060.8150432549655, 6503.3090645887842, 7266.5069407315023]
Much closer to Praat's estimation!
The problem had to do with the order being passed to the lpc function: 2 + fs / 1000, where fs is the sampling frequency, is the rule of thumb according to:
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
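For completeness, the bandwidth-filtering step from the MathWorks page can be ported along the same lines. A rough, untested sketch that would go inside get_formants after angz and Fs are computed; the 90 Hz / 400 Hz thresholds are the ones used in the MATLAB example:
# bandwidths follow from the magnitudes of the LPC roots
frqs = angz * (Fs / (2 * math.pi))
bw = -0.5 * (Fs / (2 * math.pi)) * numpy.log(numpy.abs(rts))

# keep only plausible formants: frequency above 90 Hz and bandwidth below 400 Hz
formants = sorted(f for f, b in zip(frqs, bw) if f > 90 and b < 400)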
I have not been able to get the results you expect, but I do notice two things which might cause some differences:
Your code uses [1, -0.63] where the MATLAB code from the link you provided has [1 0.63].
Your processing is being applied to the entire x vector at once instead of smaller segments of it (see where the MATLAB code does this: x = mtlb(I0:Iend); ).
Hope that helps.
There are at least two problems:
According to the link, the "pre-emphasis filter is a highpass all-pole (AR(1)) filter". The signs of the coefficients given there are correct: [1, 0.63]. If you use [1, -0.63], you get a lowpass filter.
You have the first two arguments to scipy.signal.lfilter reversed.
So, try changing this:
x1 = lfilter([1., -0.63], 1, x1)
to this:
x1 = lfilter([1.], [1., 0.63], x1)
I haven't tried running your code yet, so I don't know if those are the only problems.

How to generate a fractal graph of a market in python

I wish to generate this in python:
http://classes.yale.edu/fractals/RandFrac/Market/TradingTime/Example1/Example1.html
but I'm incredibly stuck and new to this concept. Does anybody know of a library or gist for this?
Edit:
From what I can understand, you need to split the fractal in two every time. So you have to calculate the y-axis point from the line between the two middle points. Then the two sections need to be formed according to the fractal?
Not 100% sure what you are asking, but as I understood from your comments, you want to generate a realistic-looking stock market curve using the recursion described in the link.
As far as I understood the description in the linked page and some of the parent pages, it works like this:
You are given a start and an end point and a number of turning points in the form (t1, v1), (t2, v2), etc., for example start=(0,0), end=(1,1), turns = [(1/4, 1/2), (3/4, 1/4)], where ti and vi are fractions between 0 and 1.
You determine the actual turning points scaled to the interval between start and end and calculate the differences between those points, i.e. how far to go from one turning point p_i to reach the next one p_(i+1).
You shuffle those segments to introduce some randomness; when put together, they still cover exactly the same distance, i.e. they connect the original start and end point.
Repeat by recursively calling the function for the different segments between the new points.
Here's some Python code I just put together:
from __future__ import division
from random import shuffle
def make_graph(depth, graph, start, end, turns):
    # add points to graph
    graph.add(start)
    graph.add(end)

    if depth > 0:
        # unpack input values
        fromtime, fromvalue = start
        totime, tovalue = end

        # calculate differences between points
        diffs = []
        last_time, last_val = fromtime, fromvalue
        for t, v in turns:
            new_time = fromtime + (totime - fromtime) * t
            new_val = fromvalue + (tovalue - fromvalue) * v
            diffs.append((new_time - last_time, new_val - last_val))
            last_time, last_val = new_time, new_val

        # add 'brownian motion' by reordering the segments
        shuffle(diffs)

        # calculate actual intermediate points and recurse
        last = start
        for segment in diffs:
            p = last[0] + segment[0], last[1] + segment[1]
            make_graph(depth - 1, graph, last, p, turns)
            last = p
        make_graph(depth - 1, graph, last, end, turns)

from matplotlib import pyplot
depth = 8
graph = set()
make_graph(depth, graph, (0, 0), (1, 1), [(1/9, 2/3), (5/9, 1/3)])
pyplot.plot(*zip(*sorted(graph)))
pyplot.show()
And here is some example output:
I had a similar interest and developed a python3 library to do just what you want.
pip install fractalmarkets
See https://github.com/hyperstripe50/fractal-market-analysis/blob/master/README.md
Using @tobias_k's solution and pandas, we can translate and scale the normalized fractal to a time-based one.
import arrow
import pandas as pd
import time
depth = 5
# the "geometry" of fractal
turns = [
    (1 / 9, 0.60),
    (5 / 9, 0.30),
    (8 / 9, 0.70),
]
# select start / end time
t0 = arrow.now().floor("hours")
t1 = t0.shift(days=5)
start = (pd.to_datetime(t0._datetime), 1000)
end = (pd.to_datetime(t1._datetime), 2000)
# create a non-dimensionalized [0,0]x[1,1] Fractal
_start, _end = (0, 0), (1, 1)
graph = set()
make_graph(depth, graph, _start, _end, turns)
# just check graph length
assert len(graph) == (len(turns) + 1) ** depth + 1
# create a pandas dataframe from the normalized Fractal
df = pd.DataFrame(graph)
df.sort_values(0, inplace=True)
df.reset_index(drop=True, inplace=True)
# translate to real coordinates
X = pd.DataFrame(
    data=[(start[0].timestamp(), start[1]), (end[0].timestamp(), end[1])]
).T
delta = X[1] - X[0]
Y = df.mul(delta) + X[0]
Y[0] = [*map(lambda x: pd.to_datetime(x, unit="s"), Y[0])]
# now resample and interpolate data according to *grid* size
grid ="min"
Z = Y.set_index(0)
A = Z.resample(grid).mean().interpolate()
# plot both graph to check errors
import matplotlib.pyplot as plt
ax = Z.plot()
A.plot(ax=ax)
plt.show()
showing both graphs:
and zooming to see interpolation and snap-to-grid differences:

Python Joint Distribution of N Variables

So I need to calculate the joint probability distribution for N variables. I have code for two variables, but I am having trouble generalizing it to higher dimensions. I imagine there is some sort of pythonic vectorization that could be helpful, but right now my code is very C-like (and yes, I know that is not the right way to write Python). My 2D code is below:
import numpy
import math
feature1 = numpy.array([1.1,2.2,3.0,1.2,5.4,3.4,2.2,6.8,4.5,5.6,1.9,2.8,3.7,4.4,7.3,8.3,8.1,7.0,8.0,6.8,6.2,4.9,5.7,6.3,3.7,2.4,4.5,8.5,9.5,9.9]);
feature2 = numpy.array([11.1,12.8,13.0,11.6,15.2,13.8,11.1,17.8,12.5,15.2,11.6,20.8,14.7,14.4,15.3,18.3,11.4,17.0,16.0,16.8,12.2,14.9,15.7,16.3,13.7,12.4,14.2,18.5,19.8,19.0]);
#===Concatenate All Features===#
numFrames = len(feature1);
allFeatures = numpy.zeros((2,numFrames));
allFeatures[0,:] = feature1;
allFeatures[1,:] = feature2;
#===Create the Array to hold all the Bins===#
numBins = int(0.25*numFrames);
allBins = numpy.zeros((allFeatures.shape[0],numBins+1));
#===Find the maximum and minimum of each feature===#
allRanges = numpy.zeros((allFeatures.shape[0],2));
for f in range(allFeatures.shape[0]):
    allRanges[f,0] = numpy.amin(allFeatures[f,:]);
    allRanges[f,1] = numpy.amax(allFeatures[f,:]);
#===Create the Array to hold all the individual feature probabilities===#
allIndividualProbs = numpy.zeros((allFeatures.shape[0],numBins));
#===Grab all the Individual Probs and the Bins===#
for f in range(allFeatures.shape[0]):
    freqhist, binedges = numpy.histogram(allFeatures[f,:],bins=numBins,range=[allRanges[f,0],allRanges[f,1]],density=False);
    allBins[f,:] = binedges;
    allIndividualProbs[f,:] = freqhist;
#===Create the joint probability array===#
jointProbs = numpy.zeros((numBins,numBins));
#===Compute the joint probability distribution===#
numElements = 0;
for b1 in range(numBins):
    for b2 in range(numBins):
        for f1 in range(numFrames):
            for f2 in range(numFrames):
                if ( ( (feature1[f1] >= allBins[0,b1]) and (feature1[f1] <= allBins[0,b1+1]) ) and ((feature2[f2] >= allBins[1,b2]) and (feature2[f2] <= allBins[1,b2+1])) ):
                    jointProbs[b1,b2] += 1;
                    numElements += 1;
jointProbs /= numElements;
#===But what if I add the following===#
feature3 = numpy.array([21.1,21.8,23.5,27.6,25.2,23.8,22.1,22.8,26.5,25.2,28.6,20.8,24.7,24.4,29.3,28.3,27.4,26.0,26.2,26.1,25.9,24.0,22.7,22.3,23.7,26.4,24.2,28.5,29.8,29.0]);
How can I generalize the large loop? For N variables (features) this loop would be enormous. Is there a Pythonic way to do this easily?
Check out the function numpy.histogramdd. This function can compute histograms in arbitrary numbers of dimensions. If you set the parameter density=True (called normed=True in older NumPy versions), it returns the bin count divided by the bin hypervolume. If you'd prefer something more like a probability mass function (where everything sums to 1), just normalize it yourself. All together, you'll have something like:
import numpy as np
numBins = 10 # number of bins in each dimension
data = np.random.randn(100000, 3) # generate 100000 3-d random data points
jointProbs, edges = np.histogramdd(data, bins=numBins)
jointProbs /= jointProbs.sum()
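Applied to the arrays from the question, this might look something like the following sketch (feature1, feature2 and feature3 as defined in the question, stacked into a (numFrames, 3) array, with numBins chosen as above):
data = np.vstack([feature1, feature2, feature3]).T  # shape (numFrames, 3)
jointProbs, edges = np.histogramdd(data, bins=numBins)
jointProbs /= jointProbs.sum()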

Speed up Matplotlib?

I've read here that matplotlib is good at handling large data sets. I'm writing a data processing application and have embedded matplotlib plots into wx and have found matplotlib to be TERRIBLE at handling large amounts of data, both in terms of speed and in terms of memory. Does anyone know a way to speed up (reduce memory footprint of) matplotlib other than downsampling your inputs?
To illustrate how bad matplotlib is with memory consider this code:
import pylab
import numpy
a = numpy.arange(int(1e7)) # only 10,000,000 32-bit integers (~40 Mb in memory)
# watch your system memory now...
pylab.plot(a) # this uses over 230 ADDITIONAL Mb of memory
Downsampling is a good solution here -- plotting 10M points consumes a lot of memory and time in matplotlib. If you know how much memory is acceptable, you can downsample based on that amount. For example, let's say 1M points takes 23 additional MB of memory and you find that acceptable in terms of space and time; then you should downsample so that the input always stays below 1M points:
M = 1000000  # target maximum number of points to plot
if len(a) > M:
    a = scipy.signal.decimate(a, int(len(a) / M) + 1)  # requires scipy.signal
pylab.plot(a)
Or something like the above snippet (the above may downsample too aggressively for your taste.)
I'm often interested in the extreme values too, so before plotting large chunks of data I proceed in this way:
import numpy as np
s = np.random.normal(size=(int(1e7),))  # size must be an integer (1e7 is a float)
decimation_factor = 10
s = np.max(s.reshape(-1, decimation_factor), axis=1)
# To check the final size
s.shape
Of course np.max is just an example of extreme calculation function.
P.S.
With numpy "strides tricks" it should be possible to avoid copying data around during reshape.
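For instance, something along these lines might work. This is only a sketch using numpy.lib.stride_tricks.as_strided, assuming s is the original contiguous 1-D sample array from the snippet above (for a C-contiguous array a plain reshape already returns a view, so this mainly matters for less regular layouts, and as_strided views should be treated as read-only):
import numpy as np
from numpy.lib.stride_tricks import as_strided

s = np.random.normal(size=(int(1e7),))   # the original 1-D sample array
d = 10                                   # decimation factor
n = (s.size // d) * d                    # drop the tail so the blocks divide evenly
blocks = as_strided(s[:n], shape=(n // d, d),
                    strides=(s.strides[0] * d, s.strides[0]))
s_max = blocks.max(axis=1)               # one max per block, no intermediate copy of s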
I was interested in preserving one side of a log-sampled plot, so I came up with this (downsample being my first attempt):
import numpy as np

def downsample(x, y, target_length=1000, preserve_ends=0):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    data = np.vstack((x, y))
    if preserve_ends > 0:
        l, data, r = np.split(data, (preserve_ends, -preserve_ends), axis=1)
    interval = int(data.shape[1] / target_length) + 1
    data = data[:, ::interval]
    if preserve_ends > 0:
        data = np.concatenate([l, data, r], axis=1)
    return data[0, :], data[1, :]

def geom_ind(stop, num=50):
    geo_num = num
    ind = np.geomspace(1, stop, dtype=int, num=geo_num)
    while len(set(ind)) < num - 1:
        geo_num += 1
        ind = np.geomspace(1, stop, dtype=int, num=geo_num)
    return np.sort(list(set(ind) | {0}))

def log_downsample(x, y, target_length=1000, flip=False):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    data = np.vstack((x, y))
    if flip:
        data = np.fliplr(data)
    data = data[:, geom_ind(data.shape[1], num=target_length)]
    if flip:
        data = np.fliplr(data)
    return data[0, :], data[1, :]
which allowed me to better preserve one side of plot:
newx, newy = downsample(x, y, target_length=1000, preserve_ends=50)
newlogx, newlogy = log_downsample(x, y, target_length=1000)
f = plt.figure()
plt.gca().set_yscale("log")
plt.step(x, y, label="original")
plt.step(newx, newy, label="downsample")
plt.step(newlogx, newlogy, label="log_downsample")
plt.legend()
