time-series segmentation in python

time-series segmentation in python - python

I am trying to segment the time-series data as shown in the figure. I have lots of data from the sensors, any of these data can have different number of isolated peaks region. In this figure, I have 3 of those. I would like to have a function that takes the time-series as the input and returns the segmented sections of equal length.
My initial thought was to have a sliding window that calculates the relative change in the amplitude. Since the window with the peaks will have relatively higher changes, I could just define certain threshold for the relative change that would help me take the window with isolated peaks. However, this will create problem when choosing the threshold as the relative change is very sensitive to the noises in the data.
Any suggestions?

To do this you need to find signal out of noise.
get mean value of you signal and add some multiplayer that place borders on top and on bottom of noise - green dashed line
find peak values below bottom of noise -> array 2 groups of data
find peak values on top of noise -> array 2 groups of data
get min index of bottom first peak and max index of top of first peak to find first peak range
get min index of top second peak and max index of bottom of second peak to find second peak range
Some description in code. With this method you can find other peaks.
One thing that you need to input by hand is to tell program thex value between peaks for splitting data into parts.
See graphic for summary.
import numpy as np
from matplotlib import pyplot as plt
# create noise data
def function(x, noise):
y = np.sin(7*x+2) + noise
return y
def function2(x, noise):
y = np.sin(6*x+2) + noise
return y
noise = np.random.uniform(low=-0.3, high=0.3, size=(100,))
x_line0 = np.linspace(1.95,2.85,100)
y_line0 = function(x_line0, noise)
x_line = np.linspace(0, 1.95, 100)
x_line2 = np.linspace(2.85, 3.95, 100)
x_pik = np.linspace(3.95, 5, 100)
y_pik = function2(x_pik, noise)
x_line3 = np.linspace(5, 6, 100)
# concatenate noise data
x = np.linspace(0, 6, 500)
y = np.concatenate((noise, y_line0, noise, y_pik, noise), axis=0)
# plot data
noise_band = 1.1
top_noise = y.mean()+noise_band*np.amax(noise)
bottom_noise = y.mean()-noise_band*np.amax(noise)
fig, ax = plt.subplots()
ax.axhline(y=y.mean(), color='red', linestyle='--')
ax.axhline(y=top_noise, linestyle='--', color='green')
ax.axhline(y=bottom_noise, linestyle='--', color='green')
ax.plot(x, y)
# split data into 2 signals
def split(arr, cond):
return [arr[cond], arr[~cond]]
# find bottom noise data indexes
botom_data_indexes = np.argwhere(y < bottom_noise)
# split by visual x value
splitted_bottom_data = split(botom_data_indexes, botom_data_indexes < np.argmax(x > 3))
# find top noise data indexes
top_data_indexes = np.argwhere(y > top_noise)
# split by visual x value
splitted_top_data = split(top_data_indexes, top_data_indexes < np.argmax(x > 3))
# get first signal range
first_signal_start = np.amin(splitted_bottom_data[0])
first_signal_end = np.amax(splitted_top_data[0])
# get x index of first signal
x_first_signal = np.take(x, [first_signal_start, first_signal_end])
ax.axvline(x=x_first_signal[0], color='orange')
ax.axvline(x=x_first_signal[1], color='orange')
# get second signal range
second_signal_start = np.amin(splitted_top_data[1])
second_signal_end = np.amax(splitted_bottom_data[1])
# get x index of first signal
x_second_signal = np.take(x, [second_signal_start, second_signal_end])
ax.axvline(x=x_second_signal[0], color='orange')
ax.axvline(x=x_second_signal[1], color='orange')
plt.show()
Output:
red line = mean value of all data
green line - top and bottom noise borders
orange line - selected peak data

1, It depends on how you want to define a "region", but looks like you just have feeling instead of strict definition. If you have a very clear definition of what kind of piece you want to cut out, you can try some method like "matched filter"
2, You might want to detect the peak of absolute magnitude. If not working, try peak of absolute magnitude of first-order difference, even 2nd-order.
3, it is hard to work on the noisy data like this. My suggestion is to do filtering before you pick up sections (on unfiltered data). Filtering will give you smooth peaks so that the position of peaks can be detected by the change of derivative sign. For filtering, try just "low-pass filter" first. If it doesn't work, I also suggest "Hilbert–Huang transform".
*, Looks like you are using matlab. The methods mentioned are all included in matlab.

Related

How can I cut a piece away from a plot and set the point I need to zero?

In my work I have the task to read in a CSV file and do calculations with it. The CSV file consists of 9 different columns and about 150 lines with different values acquired from sensors. First the horizontal acceleration was determined, from which the distance was derived by double integration. This represents the lower plot of the two plots in the picture. The upper plot represents the so-called force data. The orange graph shows the plot over the 9th column of the CSV file and the blue graph shows the plot over the 7th column of the CSV file.
As you can see I have drawn two vertical lines in the lower plot in the picture. These lines represent the x-value, which in the upper plot is the global minimum of the orange function and the intersection with the blue function. Now I want to do the following, but I need some help: While I want the intersection point between the first vertical line and the graph to be (0,0), i.e. the function has to be moved down. How do I achieve this? Furthermore, the piece of the function before this first intersection point (shown in purple) should be omitted, so that the function really only starts at this point. How can I do this?
In the following picture I try to demonstrate how I would like to do that:
If you need my code, here you can see it:
import numpy as np
import matplotlib.pyplot as plt
import math as m
import loaddataa as ld
import scipy.integrate as inte
from scipy.signal import find_peaks
import pandas as pd
import os
# Loading of the values
print(os.path.realpath(__file__))
a,b = os.path.split(os.path.realpath(__file__))
print(os.chdir(a))
print(os.chdir('..'))
print(os.chdir('..'))
path=os.getcwd()
path=path+"\\Data\\1 Fabienne\\Test1\\left foot\\50cm"
print(path)
dataListStride = ld.loadData(path)
indexStrideData = 0
strideData = dataListStride[indexStrideData]
#%%Calculation of the horizontal acceleration
def horizontal(yAngle, yAcceleration, xAcceleration):
a = ((m.cos(m.radians(yAngle)))*yAcceleration)-((m.sin(m.radians(yAngle)))*xAcceleration)
return a
resultsHorizontal = list()
for i in range (len(strideData)):
strideData_yAngle = strideData.to_numpy()[i, 2]
strideData_xAcceleration = strideData.to_numpy()[i, 4]
strideData_yAcceleration = strideData.to_numpy()[i, 5]
resultsHorizontal.append(horizontal(strideData_yAngle, strideData_yAcceleration, strideData_xAcceleration))
resultsHorizontal.insert(0, 0)
#plt.plot(x_values, resultsHorizontal)
#%%
#x-axis "convert" into time: 100 Hertz makes 0.01 seconds
scale_factor = 0.01
x_values = np.arange(len(resultsHorizontal)) * scale_factor
#Calculation of the global high and low points
heel_one=pd.Series(strideData.iloc[:,7])
plt.scatter(heel_one.idxmax()*scale_factor,heel_one.max(), color='red')
plt.scatter(heel_one.idxmin()*scale_factor,heel_one.min(), color='blue')
heel_two=pd.Series(strideData.iloc[:,9])
plt.scatter(heel_two.idxmax()*scale_factor,heel_two.max(), color='orange')
plt.scatter(heel_two.idxmin()*scale_factor,heel_two.min(), color='green')#!
#Plot of force data
plt.plot(x_values[:-1],strideData.iloc[:,7]) #force heel
plt.plot(x_values[:-1],strideData.iloc[:,9]) #force toe
# while - loop to calculate the point of intersection with the blue function
i = heel_one.idxmax()
while strideData.iloc[i,7] > strideData.iloc[i,9]:
i = i-1
# Length calculation between global minimum orange function and intersection with blue function
laenge=(i-heel_two.idxmin())*scale_factor
print(laenge)
#%% Integration of horizontal acceleration
velocity = inte.cumtrapz(resultsHorizontal,x_values)
plt.plot(x_values[:-1], velocity)
#%% Integration of the velocity
s = inte.cumtrapz(velocity, x_values[:-1])
plt.plot(x_values[:-2],s)
I hope it's clear what I want to do. Thanks for helping me!

I didn't dig all the way through your code, but the following tricks may be useful.
Say you have x and y values:
x = np.linspace(0,3,100)
y = x**2
Now, you only want the values corresponding to, say, .5 < x < 1.5. First, create a boolean mask for the arrays as follows:
mask = np.logical_and(.5 < x, x < 1.5)
(If this seems magical, then run x < 1.5 in your interpreter and observe the results).
Then use this mask to select your desired x and y values:
x_masked = x[mask]
y_masked = y[mask]
Then, you can translate all these values so that the first x,y pair is at the origin:
x_translated = x_masked - x_masked[0]
y_translated = y_masked - y_masked[0]
Is this the type of thing you were looking for?

How to do calculation on zoomed plot area

I have a time series plot along with a scatter plot on top to indicate some points of the series with certain characteristics. On jupyter notebook I am using %matplotlib notebook to get interaction plot and zoom.
Is it possible to calculate all points
EDIT:
The following code is a dummy example of ploting radnom data and marking with red dots those point where their value is above a certain threshold.
%matplotlib notebook
# generate random data [0, 10]
random_data = np.random.randint(10, size = 20)
# implement rule --> i.e. check which data point is > 3
index = np.where([random_data > 3])[1]
value = np.where([random_data > 3])[0]
# plot data and mark data point where rule applies
plt.plot(random_data)
plt.scatter(index, random_data[index], c = 'r')
This generates the plot below.
Is it possible to to get a result that calculates the red dots every time i zoom in the plot

So after a lot of search I came up with the following solution.
%matplotlib notebook
# generate random data [0, 10]
random_data = np.random.randint(10, size = 20)
# implement rule --> i.e. check which data point is > 3
index = np.where([random_data > 3])[1]
value = np.where([random_data > 3])[0]
# plot data and mark data point where rule applies
fig, ax = plt.subplots(1,1)
ax.plot(random_data)
ax.scatter(index, random_data[index], c = 'r')
global scatter_index
scatter_data = index
def on_xlims_change(axes):
d1, d2 = axes.get_xlim()
number_of_points = index[np.where((index > d1 )& (index < d2))].shape[0]
axes.legend([f'{ number_of_points } numbers of points in area' ])
# use a maplotlib callback to do the calculation
ax.callbacks.connect('xlim_changed', on_xlims_change)
The idea is that you can use a callback to get the new axis limits and filter data based on those limits. Hope

How to limit the length of a line on the plot showing the linear regression?

I am trying to create a graph out of Excel data (2 columns, one for axis x, and other for axis y), and to show median, mean, and trend line (linear regression).
The problem is with the last component. Median and mean are shown as lines extending from the highest dot until the lowest dot on the scatter plot, however, trendline's length is absolutely random.
Depending on the 2nd column values, sometimes it is short and almost horizontal, but for one set of data it started somewhere in the middle of a plot, and went down below the lowest dot on the graph extending visually the plot and it looks bad.
My question is: How to limit the length of a line on the plot showing the linear regression?
Here is a screenshot of before and after:
And after adding: np.clip
I have nicely cut the lower part of the line, but instead of totally limit it below a certain point, I just have limited it's y-values, and it turned into horizontal line at this y-value. This was done simply by limiting value of a function showing y-values of the linear regression trend line, but I don't know how to do it for x values as well.
colors = np.where(x<reasonablemin,'k',np.where(x>reasonablemax,'k','y'))
plt.title(plottitle)
ax = plt.axes()
plt.gca().invert_yaxis()
ax.scatter(x, y, c=colors)
finalx = [x for x in x if ((x < reasonablemax) & (x > reasonablemin))]
mask = (x[1:-1] > reasonablemax)
x[1:-1][mask] = np.nan
mask = (x[1:-1] < reasonablemin)
x[1:-1][mask] = np.nan
clearedagain = cleared.dropna()
print(clearedagain)
x = cleared[parameter]
y = cleared['Depth']
xcleared = clearedagain[parameter]
ycleared = clearedagain['Depth']
x = x.values.reshape(len(x), 1)
y = y.values.reshape(len(y), 1)
xcleared = xcleared.values.reshape(len(xcleared), 1)
ycleared = ycleared.values.reshape(len(ycleared), 1)
model = LinearRegression()
model.fit(xcleared, ycleared)
x_linearregression = np.linspace(0, reasonablemax)
y_linearregression = model.predict(x_linearregression[:, np.newaxis])
print(y_linearregression)
minimum = min(ycleared)
maximum = max(ycleared)
np.clip(y_linearregression, minimum, maximum, out=y_linearregression)
print(y_linearregression)
linear_regression_line = ax.plot(x_linearregression, y_linearregression,
label='Trendline', linestyle='dotted')
plt.ylim(max(ycleared)+1,min(ycleared-1))
ax.set_xlabel(xlabel)
ax.set_ylabel(ylabel)
ax.axis('tight')
plt.show()
Just to cut that line at the level of the lowest dot. And also the highest if the data set was the other way around.

Recorded audio of one note produces multiple onset times

I am using the Librosa library for pitch and onset detection. Specifically, I am using onset_detect and piptrack.
This is my code:
def detect_pitch(y, sr, onset_offset=5, fmin=75, fmax=1400):
y = highpass_filter(y, sr)
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=fmin, fmax=fmax)
notes = []
for i in range(0, len(onset_frames)):
onset = onset_frames[i] + onset_offset
index = magnitudes[:, onset].argmax()
pitch = pitches[index, onset]
if (pitch != 0):
notes.append(librosa.hz_to_note(pitch))
return notes
def highpass_filter(y, sr):
filter_stop_freq = 70 # Hz
filter_pass_freq = 100 # Hz
filter_order = 1001
# High-pass filter
nyquist_rate = sr / 2.
desired = (0, 0, 1, 1)
bands = (0, filter_stop_freq, filter_pass_freq, nyquist_rate)
filter_coefs = signal.firls(filter_order, bands, desired, nyq=nyquist_rate)
# Apply high-pass filter
filtered_audio = signal.filtfilt(filter_coefs, [1], y)
return filtered_audio
When running this on guitar audio samples recorded in a studio, therefore samples without noise (like this), I get very good results in both functions. The onset times are correct and the frequencies are almost always correct (with some octave errors sometimes).
However, a big problem arises when I try to record my own guitar sounds with my cheap microphone. I get audio files with noise, such as this. The onset_detect algorithm gets confused and thinks that noise contains onset times. Therefore, I get very bad results. I get many onset times even if my audio file consists of one note.
Here are two waveforms. The first is of a guitar sample of a B3 note recorded in a studio, whereas the second is my recording of an E2 note.
The result of the first is correctly B3 (the one onset time was detected).
The result of the second is an array of 7 elements, which means that 7 onset times were detected, instead of 1! One of those elements is the correct onset time, other elements are just random peaks in the noise part.
Another example is this audio file containing the notes B3, C4, D4, E4:
As you can see, the noise is clear and my high-pass filter has not helped (this is the waveform after applying the filter).
I assume this is a matter of noise, as the difference between those files lies there. If yes, what could I do to reduce it? I have tried using a high-pass filter but there is no change.

I have three observations to share.
First, after a bit of playing around, I've concluded that the onset detection algorithm appears as if it's probably probably been designed to automatically rescale its own operation in order to take into account local background noise at any given instant. This is likely in order so that it can detect onset times in pianissimo sections with equal likelihood as it would in fortissimo sections. This has the unfortunate result that the algorithm tends to trigger on background noise coming from your cheap microphone--the onset detection algorithm honestly thinks it's simply listening to pianissimo music.
A second observation is that roughly the first ~2200 samples in your recorded example (roughly the first 0.1 seconds) are a bit wonky, in the sense that the noise truly is nearly zero during that short initial interval. Try zooming way into the waveform at the starting point and you'll see what I mean. Unfortunately, the start of the guitar playing follows so quickly after the noise onset (roughly around sample 3000) that the algorithm is unable to resolve the two independently--instead it simply merges the two into a single onset event that begins about 0.1 seconds too early. I therefore cut out roughly the first 2240 samples in order to "normalize" the file (I don't think this is cheating though; it's an edge effect that would likely disappear if you had simply recorded a second or so of initial silence prior to plucking the first string, as one would normally do).
My third observation is that frequency-based filtering only works if the noise and the music are actually in somewhat different frequency bands. That may be true in this case, however I don't think you've demonstrated it yet. Therefore, instead of frequency-based filtering, I elected to try a different approach: thresholding. I used the final 3 seconds of your recording, where there is no guitar playing, in order to estimate the typical background noise level throughout the recording, in units of RMS energy, and then I used that median value to set a minimum energy threshold which was calculated to lie safely above the median. Only onset events returned by the detector occurring at times when the RMS energy is above the threshold are accepted as "valid".
An example script is shown below:
import librosa
import numpy as np
import matplotlib.pyplot as plt
# I played around with this but ultimately kept the default value
hoplen=512
y, sr = librosa.core.load("./Vocaroo_s07Dx8dWGAR0.mp3")
# Note that the first ~2240 samples (0.1 seconds) are anomalously low noise,
# so cut out this section from processing
start = 2240
y = y[start:]
idx = np.arange(len(y))
# Calcualte the onset frames in the usual way
onset_frames = librosa.onset.onset_detect(y=y, sr=sr, hop_length=hoplen)
onstm = librosa.frames_to_time(onset_frames, sr=sr, hop_length=hoplen)
# Calculate RMS energy per frame. I shortened the frame length from the
# default value in order to avoid ending up with too much smoothing
rmse = librosa.feature.rmse(y=y, frame_length=512, hop_length=hoplen)[0,]
envtm = librosa.frames_to_time(np.arange(len(rmse)), sr=sr, hop_length=hoplen)
# Use final 3 seconds of recording in order to estimate median noise level
# and typical variation
noiseidx = [envtm > envtm[-1] - 3.0]
noisemedian = np.percentile(rmse[noiseidx], 50)
sigma = np.percentile(rmse[noiseidx], 84.1) - noisemedian
# Set the minimum RMS energy threshold that is needed in order to declare
# an "onset" event to be equal to 5 sigma above the median
threshold = noisemedian + 5*sigma
threshidx = [rmse > threshold]
# Choose the corrected onset times as only those which meet the RMS energy
# minimum threshold requirement
correctedonstm = onstm[[tm in envtm[threshidx] for tm in onstm]]
# Print both in units of actual time (seconds) and sample ID number
print(correctedonstm+start/sr)
print(correctedonstm*sr+start)
fg = plt.figure(figsize=[12, 8])
# Print the waveform together with onset times superimposed in red
ax1 = fg.add_subplot(2,1,1)
ax1.plot(idx+start, y)
for ii in correctedonstm*sr+start:
ax1.axvline(ii, color='r')
ax1.set_ylabel('Amplitude', fontsize=16)
# Print the RMSE together with onset times superimposed in red
ax2 = fg.add_subplot(2,1,2, sharex=ax1)
ax2.plot(envtm*sr+start, rmse)
for ii in correctedonstm*sr+start:
ax2.axvline(ii, color='r')
# Plot threshold value superimposed as a black dotted line
ax2.axhline(threshold, linestyle=':', color='k')
ax2.set_ylabel("RMSE", fontsize=16)
ax2.set_xlabel("Sample Number", fontsize=16)
fg.show()
Printed output looks like:
In [1]: %run rosatest
[ 0.17124717 1.88952381 3.74712018 5.62793651]
[ 3776. 41664. 82624. 124096.]
and the plot that it produces is shown below:

Did you test to normalize the sound sample before treatment ?
When reading onset_detect documentation we can see that there is a lot of optionals arguments, have you already try to use some ?
Maybe one of this optionals arguments may help you to keep only the good one (or at least limit the size of the onset time returned array):
librosa.util.peak_pick (maybe the best)
backtrack
energy
Please see also an update of your code in order to use a pre-computed onset envelope:
def detect_pitch(y, sr, onset_offset=5, fmin=75, fmax=1400):
y = highpass_filter(y, sr)
o_env = librosa.onset.onset_strength(y, sr=sr)
times = librosa.frames_to_time(np.arange(len(o_env)), sr=sr)
onset_frames = librosa.onset.onset_detect(y=o_env, sr=sr)
pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=fmin, fmax=fmax)
notes = []
for i in range(0, len(onset_frames)):
onset = onset_frames[i] + onset_offset
index = magnitudes[:, onset].argmax()
pitch = pitches[index, onset]
if (pitch != 0):
notes.append(librosa.hz_to_note(pitch))
return notes
def highpass_filter(y, sr):
filter_stop_freq = 70 # Hz
filter_pass_freq = 100 # Hz
filter_order = 1001
# High-pass filter
nyquist_rate = sr / 2.
desired = (0, 0, 1, 1)
bands = (0, filter_stop_freq, filter_pass_freq, nyquist_rate)
filter_coefs = signal.firls(filter_order, bands, desired, nyq=nyquist_rate)
# Apply high-pass filter
filtered_audio = signal.filtfilt(filter_coefs, [1], y)
return filtered_audio
does it work better ?

How to remove/omit smaller contour lines using matplotlib

I am trying to plot contour lines of pressure level. I am using a netCDF file which contain the higher resolution data (ranges from 3 km to 27 km). Due to higher resolution data set, I get lot of pressure values which are not required to be plotted (rather I don't mind omitting certain contour line of insignificant values). I have written some plotting script based on the examples given in this link http://matplotlib.org/basemap/users/examples.html.
After plotting the image looks like this
From the image I have encircled the contours which are small and not required to be plotted. Also, I would like to plot all the contour lines smoother as mentioned in the above image. Overall I would like to get the contour image like this:-
Possible solution I think of are
Find out the number of points required for plotting contour and mask/omit those lines if they are small in number.
or
Find the area of the contour (as I want to omit only circled contour) and omit/mask those are smaller.
or
Reduce the resolution (only contour) by increasing the distance to 50 km - 100 km.
I am able to successfully get the points using SO thread Python: find contour lines from matplotlib.pyplot.contour()
But I am not able to implement any of the suggested solution above using those points.
Any solution to implement the above suggested solution is really appreciated.
Edit:-
# Andras Deak
I used print 'diameter is ', diameter line just above del(level.get_paths()[kp]) line to check if the code filters out the required diameter. Here is the filterd messages when I set if diameter < 15000::
diameter is 9099.66295612
diameter is 13264.7838257
diameter is 445.574234531
diameter is 1618.74618114
diameter is 1512.58974168
However the resulting image does not have any effect. All look same as posed image above. I am pretty sure that I have saved the figure (after plotting the wind barbs).
Regarding the solution for reducing the resolution, plt.contour(x[::2,::2],y[::2,::2],mslp[::2,::2]) it works. I have to apply some filter to make the curve smooth.
Full working example code for removing lines:-
Here is the example code for your review
#!/usr/bin/env python
from netCDF4 import Dataset
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage
from mpl_toolkits.basemap import interp
from mpl_toolkits.basemap import Basemap
# Set default map
west_lon = 68
east_lon = 93
south_lat = 7
north_lat = 23
nc = Dataset('ncfile.nc')
# Get this variable for later calucation
temps = nc.variables['T2']
time = 0 # We will take only first interval for this example
# Draw basemap
m = Basemap(projection='merc', llcrnrlat=south_lat, urcrnrlat=north_lat,
llcrnrlon=west_lon, urcrnrlon=east_lon, resolution='l')
m.drawcoastlines()
m.drawcountries(linewidth=1.0)
# This sets the standard grid point structure at full resolution
x, y = m(nc.variables['XLONG'][0], nc.variables['XLAT'][0])
# Set figure margins
width = 10
height = 8
plt.figure(figsize=(width, height))
plt.rc("figure.subplot", left=.001)
plt.rc("figure.subplot", right=.999)
plt.rc("figure.subplot", bottom=.001)
plt.rc("figure.subplot", top=.999)
plt.figure(figsize=(width, height), frameon=False)
# Convert Surface Pressure to Mean Sea Level Pressure
stemps = temps[time] + 6.5 * nc.variables['HGT'][time] / 1000.
mslp = nc.variables['PSFC'][time] * np.exp(9.81 / (287.0 * stemps) * nc.variables['HGT'][time]) * 0.01 + (
6.7 * nc.variables['HGT'][time] / 1000)
# Contour only at 2 hpa interval
level = []
for i in range(mslp.min(), mslp.max(), 1):
if i % 2 == 0:
if i >= 1006 and i <= 1018:
level.append(i)
# Save mslp values to upload to SO thread
# np.savetxt('mslp.txt', mslp, fmt='%.14f', delimiter=',')
P = plt.contour(x, y, mslp, V=2, colors='b', linewidths=2, levels=level)
# Solution suggested by Andras Deak
for level in P.collections:
for kp,path in enumerate(level.get_paths()):
# include test for "smallness" of your choice here:
# I'm using a simple estimation for the diameter based on the
# x and y diameter...
verts = path.vertices # (N,2)-shape array of contour line coordinates
diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
if diameter < 15000: # threshold to be refined for your actual dimensions!
#print 'diameter is ', diameter
del(level.get_paths()[kp]) # no remove() for Path objects:(
#level.remove() # This does not work. produces ValueError: list.remove(x): x not in list
plt.gcf().canvas.draw()
plt.savefig('dummy', bbox_inches='tight')
plt.close()
After the plot is saved I get the same image
You can see that the lines are not removed yet. Here is the link to mslp array which we are trying to play with http://www.mediafire.com/download/7vi0mxqoe0y6pm9/mslp.txt
If you want x and y data which are being used in the above code, I can upload for your review.
Smooth line
You code to remove the smaller circles working perfectly. However the other question I have asked in the original post (smooth line) does not seems to work. I have used your code to slice the array to get minimal values and contoured it. I have used the following code to reduce the array size:-
slice = 15
CS = plt.contour(x[::slice,::slice],y[::slice,::slice],mslp[::slice,::slice], colors='b', linewidths=1, levels=levels)
The result is below.
After searching for few hours I found this SO thread having simmilar issue:-
Regridding regular netcdf data
But none of the solution provided over there works.The questions similar to mine above does not have proper solutions. If this issue is solved then the code is perfect and complete.

General idea
Your question seems to have 2 very different halves: one about omitting small contours, and another one about smoothing the contour lines. The latter is simpler, since I can't really think of anything else other than decreasing the resolution of your contour() call, just like you said.
As for removing a few contour lines, here's a solution which is based on directly removing contour lines individually. You have to loop over the collections of the object returned by contour(), and for each element check each Path, and delete the ones you don't need. Redrawing the figure's canvas will get rid of the unnecessary lines:
# dummy example based on matplotlib.pyplot.clabel example:
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
# difference of Gaussians
Z = 10.0 * (Z2 - Z1)
plt.figure()
CS = plt.contour(X, Y, Z)
for level in CS.collections:
for kp,path in reversed(list(enumerate(level.get_paths()))):
# go in reversed order due to deletions!
# include test for "smallness" of your choice here:
# I'm using a simple estimation for the diameter based on the
# x and y diameter...
verts = path.vertices # (N,2)-shape array of contour line coordinates
diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
if diameter<1: # threshold to be refined for your actual dimensions!
del(level.get_paths()[kp]) # no remove() for Path objects:(
# this might be necessary on interactive sessions: redraw figure
plt.gcf().canvas.draw()
Here's the original(left) and the removed version(right) for a diameter threshold of 1 (note the little piece of the 0 level at the top):
Note that the top little line is removed while the huge cyan one in the middle doesn't, even though both correspond to the same collections element i.e. the same contour level. If we didn't want to allow this, we could've called CS.collections[k].remove(), which would probably be a much safer way of doing the same thing (but it wouldn't allow us to differentiate between multiple lines corresponding to the same contour level).
To show that fiddling around with the cut-off diameter works as expected, here's the result for a threshold of 2:
All in all it seems quite reasonable.
Your actual case
Since you've added your actual data, here's the application to your case. Note that you can directly generate the levels in a single line using np, which will almost give you the same result. The exact same can be achieved in 2 lines (generating an arange, then selecting those that fall between p1 and p2). Also, since you're setting levels in the call to contour, I believe the V=2 part of the function call has no effect.
import numpy as np
import matplotlib.pyplot as plt
# insert actual data here...
Z = np.loadtxt('mslp.txt',delimiter=',')
X,Y = np.meshgrid(np.linspace(0,300000,Z.shape[1]),np.linspace(0,200000,Z.shape[0]))
p1,p2 = 1006,1018
# this is almost the same as the original, although it will produce
# [p1, p1+2, ...] instead of `[Z.min()+n, Z.min()+n+2, ...]`
levels = np.arange(np.maximum(Z.min(),p1),np.minimum(Z.max(),p2),2)
#control
plt.figure()
CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
#modified
plt.figure()
CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
for level in CS.collections:
for kp,path in reversed(list(enumerate(level.get_paths()))):
# go in reversed order due to deletions!
# include test for "smallness" of your choice here:
# I'm using a simple estimation for the diameter based on the
# x and y diameter...
verts = path.vertices # (N,2)-shape array of contour line coordinates
diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
if diameter<15000: # threshold to be refined for your actual dimensions!
del(level.get_paths()[kp]) # no remove() for Path objects:(
# this might be necessary on interactive sessions: redraw figure
plt.gcf().canvas.draw()
plt.show()
Results, original(left) vs new(right):
Smoothing by resampling
I've decided to tackle the smoothing problem as well. All I could come up with is downsampling your original data, then upsampling again using griddata (interpolation). The downsampling part could also be done with interpolation, although the small-scale variation in your input data might make this problem ill-posed. So here's the crude version:
import scipy.interpolate as interp #the new one
# assume you have X,Y,Z,levels defined as before
# start resampling stuff
dN = 10 # use every dN'th element of the gridded input data
my_slice = [slice(None,None,dN),slice(None,None,dN)]
# downsampled data
X2,Y2,Z2 = X[my_slice],Y[my_slice],Z[my_slice]
# same as X2 = X[::dN,::dN] etc.
# upsampling with griddata over original mesh
Zsmooth = interp.griddata(np.array([X2.ravel(),Y2.ravel()]).T,Z2.ravel(),(X,Y),method='cubic')
# plot
plt.figure()
CS = plt.contour(X, Y, Zsmooth, colors='b', linewidths=2, levels=levels)
You can freely play around with the grids used for interpolation, in this case I just used the original mesh, as it was at hand. You can also play around with different kinds of interpolation: the default 'linear' one will be faster, but less smooth.
Result after downsampling(left) and upsampling(right):
Of course you should still apply the small-line-removal algorithm after this resampling business, and keep in mind that this heavily distorts your input data (since if it wasn't distorted, then it wouldn't be smooth). Also, note that due to the crude method used in the downsampling step, we introduce some missing values near the top/right edges of the region under consideraton. If this is a problem, you should consider doing the downsampling based on griddata as I've noted earlier.

This is a pretty bad solution, but it's the only one that I've come up with. Use the get_contour_verts function in this solution you linked to, possibly with the matplotlib._cntr module so that nothing gets plotted initially. That gives you a list of contour lines, sections, vertices, etc. Then you have to go through that list and pop the contours you don't want. You could do this by calculating a minimum diameter, for example; if the max distance between points is less than some cutoff, throw it out.
That leaves you with a list of LineCollection objects. Now if you make a Figure and Axes instance, you can use Axes.add_collection to add all of the LineCollections in the list.
I checked this out really quick, but it seemed to work. I'll come back with a minimum working example if I get a chance. Hope it helps!
Edit: Here's an MWE of the basic idea. I wasn't familiar with plt._cntr.Cntr, so I ended up using plt.contour to get the initial contour object. As a result, you end up making two figures; you just have to close the first one. You can replace checkDiameter with whatever function works. I think you could turn the line segments into a Polygon and calculate areas, but you'd have to figure that out on your own. Let me know if you run into problems with this code, but it at least works for me.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
def checkDiameter(seg, tol=.3):
# Function for screening line segments. NB: Not actually a proper diameter.
diam = (seg[:,0].max() - seg[:,0].min(),
seg[:,1].max() - seg[:,1].min())
return not (diam[0] < tol or diam[1] < tol)
# Create testing data
x = np.linspace(-1,1, 21)
xx, yy = np.meshgrid(x,x)
z = np.exp(-(xx**2 + .5*yy**2))
# Original plot with plt.contour
fig0, ax0 = plt.subplots()
# Make sure this contour object actually has a tiny contour to remove
cntrObj = ax0.contour(xx,yy,z, levels=[.2,.4,.6,.8,.9,.95,.99,.999])
# Primary loop: Copy contours into a new LineCollection
lineNew = list()
for lineOriginal in cntrObj.collections:
# Get properties of the original LineCollection
segments = lineOriginal.get_segments()
propDict = lineOriginal.properties()
propDict = {key: value for (key,value) in propDict.items()
if key in ['linewidth','color','linestyle']} # Whatever parameters you want to carry over
# Filter out the lines with small diameters
segments = [seg for seg in segments if checkDiameter(seg)]
# Create new LineCollection out of the OK segments
if len(segments) > 0:
lineNew.append(mpl.collections.LineCollection(segments, **propDict))
# Make new plot with only these line collections; display results
fig1, ax1 = plt.subplots()
ax1.set_xlim(ax0.get_xlim())
ax1.set_ylim(ax0.get_ylim())
for line in lineNew:
ax1.add_collection(line)
plt.show()
FYI: The bit with propDict is just to automate bringing over some of the line properties from the original plot. You can't use the whole dictionary at once, though. First, it contains the old plot's line segments, but you can just swap those for the new ones. But second, it appears to contain a number of parameters that are in conflict with each other: multiple linewidths, facecolors, etc. The {key for key in propDict if I want key} workaround is my way to bypass that, but I'm sure someone else can do it more cleanly.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

time-series segmentation in python - python

Related

How can I cut a piece away from a plot and set the point I need to zero?

How to do calculation on zoomed plot area

How to limit the length of a line on the plot showing the linear regression?

Recorded audio of one note produces multiple onset times

How to remove/omit smaller contour lines using matplotlib

Categories

Resources