I have to analyse a PPG signal. I found something to find the peaks but I can't use the values of the heights. They are stored in like a dictionary array or something and I don't know how to extract the values out of it. I tried using dict.values() but that didn't work.
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import savgol_filter
data = pd.read_excel('test_heartpy.xlsx')
arr = np.array(data)
time = arr[1:,0] # time in s
ECG = arr[1:,1] # ECG
PPG = arr[1:,2] # PPG
filtered = savgol_filter(PPG, 251, 3)
plt.plot(time, filtered)
plt.xlabel('Time (in s)')
plt.ylabel('PPG')
plt.grid('on')
The PPG signal looks like this. To search for the peaks I used:
# searching peaks
from scipy.signal import find_peaks
peaks, heights_peak_0 = find_peaks(PPG, height=0.2)
heights_peak = heights_peak_0.values()
plt.plot(PPG)
plt.plot(peaks, np.asarray(PPG)[peaks], "x")
plt.plot(np.zeros_like(PPG), "--", color="gray")
plt.title("PPG peaks")
plt.show()
print(heights_peak_0)
print(heights_peak)
print(peaks)
Printing:
{'peak_heights': array([0.4822998 , 0.4710083 , 0.43884277, 0.46728516, 0.47094727,
0.44702148, 0.43029785, 0.44146729, 0.43933105, 0.41400146,
0.45318604, 0.44335938])}
dict_values([array([0.4822998 , 0.4710083 , 0.43884277, 0.46728516, 0.47094727,
0.44702148, 0.43029785, 0.44146729, 0.43933105, 0.41400146,
0.45318604, 0.44335938])])
[787 2513 4181 5773 7402 9057 10601 12194 13948 15768 17518 19335]
Signal with highlighted peaks looks like this.
heights_peak_0 is the properties dict returned by scipy.signal.find_peaks
You can find more information about what is returned here
You can extract the array containing all the heights of the peaks with heights_peak_0["peak_heights"]
# the following will give you an array with the values of peaks
heights_peak_0['peak_heights']
# peaks seem to be the indices where find_peaks function foud peaks in the original signal. So you can get the peak values this way also
PPG[peaks]
According to the docs, the find_peaks() functions returns a tuple consisting of the peaks itself and a properties dict. As you are only interested in the peak values, you can simply ignore the second element of the tuple and only use the first one.
Assuming you want to have the 'coordinates' of your peaks you could then combine the peak heights (y-values) with its positions (x-values) like so (based on the first code snippet given in the docs):
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, distance=150)
peaks_x_values = peaks
peaks_y_values = x[peaks]
peak_coordinates = list(zip(peaks_x_values, peaks_y_values))
print(peak_coordinates)
plt.plot(x)
plt.plot(peaks_x_values, peaks_y_values, "x")
plt.show()
Printing:
[(65, 0.705), (251, 1.155), (431, 1.705), (608, 1.96), (779, 1.925), (956, 2.09), (1125, 1.745), (1292, 1.37), (1456, 1.2), (1614, 0.81), (1776, 0.665), (1948, 0.665)]
Related
I am trying to implement a butterworthfilter with python in jupyter Notebook. I wrote this code by a tutorial.
The data are from a CSV-File, it calls Samples.csv
The data in Samples.csv are like
998,4778415
1009,209592
1006,619094
1001,785406
993,9426543
990,1408991
992,736118
995,8127334
1002,381664
1006,094429
1000,634799
999,3287747
1002,318812
999,3287747
1004,427698
1008,516733
1007,964781
1002,680906
1000,14449
994,257009
The column calls Euclidian Norm. The range of the data are from 0 to 1679.286158 and theyre are 1838 rows.
This is the code in Jupyter:
from scipy.signal import filtfilt
from scipy import stats
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
def plot():
data=pd.read_csv('Samples.csv',sep=";", decimal=",")
sensor_data=data[['Euclidian Norm']]
sensor_data=np.array(sensor_data)
time=np.linspace(0,1679.286158,1838)
plt.plot(time,sensor_data)
plt.show()
filtered_signal=bandPassFilter(sensor_data)
plt.plot(time,sensor_data)
plt.show()
def bandPassFilter(signal):
fs = 4000.0
lowcut=20.0
highcut=50.0
nyq=0.5*fs
low=lowcut/nyq
high=highcut/nyq
order =2
b,a=scipy.signal.butter(order,[low,high],'bandpass',analog=False)
y=scipy.signal.filtfilt(b,a,signal,axis=0)
return(y)
plot()
My problem is that nothing changes in my data. It doesnt filtered my data. The graph of the filtered data is the same like the source data. Does anyone know what could be wrong?
The first graph is the source data and the second graph is the filtered graph. It looks very similar. Its like the same graph
I can't comment yet.
You're never using filtered_signal and plot with the same arguments twice.
Here`s one of my implementations with added interpolation, very similar to yours:
def butterFit(data, freq, order=2):
ar = scipy.signal.butter(order, freq) # Gets params for filttilt
return spfilter.filtfilt(ar[0], ar[1], data)
def plotFilteredSplines(timeframe, data, amount_points):
# Generate evenly spread indices for the data points.
indices = np.arange(0, len(data), amount_points)
cutoff_freq = 2 / (2/10 * len(timeframe))
# Reshape the data with butter :)
data = butterFit(data, cutoff_freq)
# Plot Fitlered data
plt.plot(timeframe, data, '-.')
interpol_x = np.linspace(timeframe[0], timeframe[-1], 100)
# Get the cubic spline approx function
interpolation = sp.interpolate.interp1d(timeframe, data, kind='cubic')
# Plot the interpolation over the extended time frame.
plt.plot(interpol_x, interpolation(interpol_x), '-r')
Data clip I'm using
I'm trying to bandpass the attached EEG signal, then apply a hilbert transform and take the absolute of the hilbert to get the instantaneous power (e.g., here). The bandpassed signal looks fine (first plot), and the hilbert of the raw signal looks fine (second plot), but the hilbert of the bandpassed signal does not show up (last plot). The resulting array is: [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj].
Reproducible error with:
import numpy as np
from neurodsp.filt import filter_signal
from scipy import signal
import matplotlib.pyplot as plt
Fs = 1024
LBP, HBP = 1, 100
Chan1 = np.loadtxt('SampleData')
Chan1_BP = filter_signal(Chan1, Fs, 'bandpass', (LBP,HBP))
analytical_signal = signal.hilbert(Chan1)
amplitude_envelope = np.abs(analytical_signal)
#Show bandpassed signal works:
fig0 = plt.figure(figsize=(10, 8))
plt.plot(Chan1)
plt.plot(Chan1_BP)
fig1 = plt.figure(figsize=(10, 8))
plt.plot(Chan1)
plt.plot(amplitude_envelope)
# Now with bandpassed signal
analytical_signal = signal.hilbert(Chan1_BP)
amplitude_envelope = np.abs(analytical_signal)
fig2 = plt.figure(figsize=(10, 8))
plt.plot(Chan1_BP)
plt.plot(amplitude_envelope)
Take a closer look at the values in Chan1_BP. You'll see that the values at the beginning and end of the array are nan. The nans were generated by neurodsp.filt.filter_signal. The default filter used by filter_signal is a FIR filter, and the default behavior is to pad the output with nans for values that cannot be computed with the full length of the FIR filter.
You can change that behavior by passing remove_edges=False, e.g.
Chan1_BP = filter_signal(Chan1, Fs, 'bandpass', (LBP,HBP), remove_edges=False)
With that change, the plots should look like you expected.
To illustrate my problem I prepared an example:
First, I have two arrays 'a'and 'b'and I'm interested in their distribution:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([1,2,2,2,2,4,8,1,9,5,3,1,2,9])
b = np.array([5,9,9,2,3,9,3,6,8,4,2,7,8,8])
n1,bin1,pat1 = plt.hist(a,np.arange(1,10,2),histtype='step')
n2,bin2,pat2 = plt.hist(b,np.arange(1,10,2), histtype='step')
plt.show()
This code gives me a histogram with two 'curves'. Now I want to subtract one 'curve' from the other, and by this I mean that I do this for each bin separately:
n3 = n2-n1
I don't need negative counts so:
for i in range(0,len(n2)):
if n3[i]<0:
n3[i]=0
else:
continue
The new histogram curve should be plotted in the same range as the previous ones and it should have the same number of bins. So I have the number of bins and their position (which will be the same as the ones for the other curves, please refer to the block above) and the frequency or counts (n3) that every bins should have. Do you have any ideas of how I can do this with the data that I have?
You can use a step function to plot n3 = n2 - n1. The only issue is that you need to provide one more value, otherwise the last value is not shown nicely. Also you need to use the where="post" option of the step function.
import numpy as np
import matplotlib.pyplot as plt
a = np.array([1,2,2,2,2,4,8,1,9,5,3,1,2,9])
b = np.array([5,9,9,2,3,9,3,6,8,4,2,7,8,8])
n1,bin1,pat1 = plt.hist(a,np.arange(1,10,2),histtype='step')
n2,bin2,pat2 = plt.hist(b,np.arange(1,10,2), histtype='step')
n3=n2-n1
n3[n3<0] = 0
plt.step(np.arange(1,10,2),np.append(n3,[n3[-1]]), where='post', lw=3 )
plt.show()
My data file is shared in the following link.
We can plot this data using the following script.
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
def read_datafile(file_name):
data = np.loadtxt(file_name, delimiter=',')
return data
data = read_datafile('mah_data.csv')
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Data")
ax1.set_xlabel('t')
ax1.set_ylabel('s')
ax1.plot(x,y, c='r', label='My data')
leg = ax1.legend()
plt.show()
How can we detect peaks in python? I can't find a suitable peak detection algorithm in Python.
You can use the argrelextrema function in scipy.signal to return the indices of the local maxima or local minima of an array. This works for multi-dimensional arrays as well by specifying the axis.
from scipy.signal import argrelextrema
ind_max = argrelextrema(z, np.greater) # indices of the local maxima
ind_min = argrelextrema(z, np.less) # indices of the local minima
maxvals = z[ind_max]
minvals = z[ind_min]
More specifically, one can use the argrelmax or argrelmin to find the local maximas or local minimas. This also works for multi dimensional arrays using the axis argument.
from scipy.signal import argrelmax, argrelmin
ind_max = argrelmax(z, np.greater) # indices of the local maxima
ind_min = argrelmin(z, np.less) # indices of the local minima
maxvals = z[ind_max]
minvals = z[ind_min]
For more details, one can refer to this link: https://docs.scipy.org/doc/scipy/reference/signal.html#peak-finding
Try using peakutil (http://pythonhosted.org/PeakUtils/). Here is my solution to your question using peakutil.
import pandas as pd
import peakutils
data = pd.read_csv("mah_data.csv", header=None)
ts = data[0:10000][1] # Get the second column in the csv file
print(ts[0:10]) # Print the first 10 rows, for quick testing
# check peakutils for all the parameters.
# indices are the index of the points where peaks appear
indices = peakutils.indexes(ts, thres=0.4, min_dist=1000)
print(indices)
You should also checkout peak finding in scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html)
Try the findpeaks library.
pip install findpeaks
I can not find the data attached but suppose the data is a vector and stored in data:
import pandas as pd
data = pd.read_csv("mah_data.csv", header=None).values
# Import library
from findpeaks import findpeaks
# If the resolution of your data is low, I would recommend the ``lookahead`` parameter, and if your data is "bumpy", also the ``smooth`` parameter.
fp = findpeaks(lookahead=1, interpolate=10)
# Find peaks
results = fp.fit(data)
# Make plot
fp.plot()
# Results with respect to original input data.
results['df']
# Results based on interpolated smoothed data.
results['df_interp']
I am looking to find the peaks in some gaussian smoothed data that I have. I have looked at some of the peak detection methods available but they require an input range over which to search and I want this to be more automated than that. These methods are also designed for non-smoothed data. As my data is already smoothed I require a much more simple way of retrieving the peaks. My raw and smoothed data is in the graph below.
Essentially, is there a pythonic way of retrieving the max values from the array of smoothed data such that an array like
a = [1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1]
would return:
r = [5,3,6]
There exists a bulit-in function argrelextrema that gets this task done:
import numpy as np
from scipy.signal import argrelextrema
a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
# determine the indices of the local maxima
max_ind = argrelextrema(a, np.greater)
# get the actual values using these indices
r = a[max_ind] # array([5, 3, 6])
That gives you the desired output for r.
As of SciPy version 1.1, you can also use find_peaks. Below are two examples taken from the documentation itself.
Using the height argument, one can select all maxima above a certain threshold (in this example, all non-negative maxima; this can be very useful if one has to deal with a noisy baseline; if you want to find minima, just multiply you input by -1):
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks
import numpy as np
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
Another extremely helpful argument is distance, which defines the minimum distance between two peaks:
peaks, _ = find_peaks(x, distance=150)
# difference between peaks is >= 150
print(np.diff(peaks))
# prints [186 180 177 171 177 169 167 164 158 162 172]
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.show()
If your original data is noisy, then using statistical methods is preferable, as not all peaks are going to be significant. For your a array, a possible solution is to use double differentials:
peaks = a[1:-1][np.diff(np.diff(a)) < 0]
# peaks = array([5, 3, 6])
>> import numpy as np
>> from scipy.signal import argrelextrema
>> a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
>> argrelextrema(a, np.greater)
array([ 4, 10, 17]),)
>> a[argrelextrema(a, np.greater)]
array([5, 3, 6])
If your input represents a noisy distribution, you can try smoothing it with NumPy convolve function.
If you can exclude maxima at the edges of the arrays you can always check if one elements is bigger than each of it's neighbors by checking:
import numpy as np
array = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
# Check that it is bigger than either of it's neighbors exluding edges:
max = (array[1:-1] > array[:-2]) & (array[1:-1] > array[2:])
# Print these values
print(array[1:-1][max])
# Locations of the maxima
print(np.arange(1, array.size-1)[max])