I am looking to find the peaks in some Gaussian-smoothed data. I have looked at some of the available peak detection methods, but they require an input range over which to search, and I want something more automated than that. These methods are also designed for non-smoothed data. As my data is already smoothed, I need a much simpler way of retrieving the peaks. My raw and smoothed data are shown in the graph below.
Essentially, is there a pythonic way of retrieving the max values from the array of smoothed data such that an array like
a = [1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1]
would return:
r = [5,3,6]
There is a ready-made SciPy function, argrelextrema, that gets this task done:
import numpy as np
from scipy.signal import argrelextrema
a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
# determine the indices of the local maxima
max_ind = argrelextrema(a, np.greater)
# get the actual values using these indices
r = a[max_ind] # array([5, 3, 6])
That gives you the desired output for r.
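One caveat with np.greater is that the comparison is strict, so a flat-topped peak (two or more equal samples) is not reported. Below is a minimal sketch of that behavior, using a made-up array named flat for illustration, next to find_peaks, which reports the middle sample of a plateau:

```python
import numpy as np
from scipy.signal import argrelextrema, find_peaks

# Illustrative data (not from the question): a peak with a flat top
flat = np.array([1, 2, 2, 1])

# Strict comparison: neither of the two equal samples beats its neighbor
strict_idx = argrelextrema(flat, np.greater)[0]

# find_peaks treats the plateau as one peak and returns its middle index
# (rounded down when the plateau has an even number of samples)
plateau_idx, _ = find_peaks(flat)

print(strict_idx)   # empty array
print(plateau_idx)  # [1]
```

For smoothed floating-point data exact ties are rare, so this mostly matters for integer or quantized signals.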
As of SciPy version 1.1, you can also use find_peaks. Below are two examples taken from the documentation itself.
Using the height argument, one can select all maxima above a certain threshold (in this example, all non-negative maxima; this can be very useful when dealing with a noisy baseline; if you want to find minima, just multiply your input by -1):
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram  # SciPy >= 1.10: from scipy.datasets import electrocardiogram
from scipy.signal import find_peaks
import numpy as np
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
Another extremely helpful argument is distance, which defines the minimum distance between two peaks:
peaks, _ = find_peaks(x, distance=150)
# difference between peaks is >= 150
print(np.diff(peaks))
# prints [186 180 177 171 177 169 167 164 158 162 172]
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.show()
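Since the question asks for something that needs no search range, the prominence argument may also be worth a look. The sketch below applies find_peaks to the a array from the question; the threshold of 3 is an arbitrary value chosen only for this illustration:

```python
import numpy as np
from scipy.signal import find_peaks

a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])

# No conditions: every local maximum is returned
all_peaks, _ = find_peaks(a)

# Keep only peaks that rise at least 3 above their surrounding valleys
# (the threshold 3 is an arbitrary choice for this illustration)
strong_peaks, props = find_peaks(a, prominence=3)

print(a[all_peaks])     # [5 3 6]
print(a[strong_peaks])  # [5 6]
```

The small middle peak (value 3) only rises 2 above the valleys on either side, so it is dropped by the prominence filter.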
If your original data is noisy, statistical methods are preferable, as not all peaks will be significant. For your a array, one possible solution is to take the second difference and keep the points where it is negative:
peaks = a[1:-1][np.diff(np.diff(a)) < 0]
# peaks = array([5, 3, 6])
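Note that a negative second difference marks every concave point, which coincides with the peaks only for zig-zag data like a. A minimal sketch with a made-up array b shows the difference, together with a variant that looks for the slope changing sign instead (plateaus would still need extra care):

```python
import numpy as np

# Illustrative data (not from the question): rises with a kink, then one true peak
b = np.array([0, 3, 4, 5, 4])

# Second difference < 0 marks every concave point, including the kink at value 3
concave = b[1:-1][np.diff(np.diff(b)) < 0]

# Sign of the slope going from +1 to -1 marks only genuine local maxima
slope_sign = np.sign(np.diff(b))
peak_idx = np.where(np.diff(slope_sign) < 0)[0] + 1

print(concave)      # [3 5]
print(b[peak_idx])  # [5]
```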
>>> import numpy as np
>>> from scipy.signal import argrelextrema
>>> a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
>>> argrelextrema(a, np.greater)
(array([ 4, 10, 17]),)
>>> a[argrelextrema(a, np.greater)]
array([5, 3, 6])
If your input represents a noisy distribution, you can try smoothing it first with NumPy's convolve function.
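As a rough sketch of what such smoothing could look like, here is a moving average built on np.convolve. The helper name moving_average and the window length are assumptions made for this example, not something prescribed by NumPy:

```python
import numpy as np

def moving_average(x, window):
    """Smooth x with a box (uniform) kernel of the given odd window length."""
    kernel = np.ones(window) / window
    # mode="same" keeps the output the same length as the input;
    # values near the edges are damped because the signal is zero-padded
    return np.convolve(x, kernel, mode="same")

noisy = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
smoothed = moving_average(noisy, 3)
print(smoothed)
```

A wider window smooths more aggressively but also flattens and broadens real peaks, so it is worth trying a few values.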
If you can exclude maxima at the edges of the array, you can always check whether an element is bigger than each of its neighbors:
import numpy as np
array = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
# Check that each element is bigger than both of its neighbors, excluding the edges:
is_max = (array[1:-1] > array[:-2]) & (array[1:-1] > array[2:])
# Print these values
print(array[1:-1][is_max])
# Locations of the maxima
print(np.arange(1, array.size-1)[is_max])
I want to digitize (= average out over cells) photon count data into pixels given by a grid that tells how they are aligned. The photon count data is stored in a 2D array. I want to split that data into cells, each of which would correspond to a pixel. The idea is basically the same as changing an HD image to a smaller resolution. I'd like to achieve this in Python.
The digitizing function I've written:
import numpy as np
def digitize(function_data, grid_shape):
    """
    function_data = 2D array of function values of some 3D shape,
    e.g.: exp(-(x^2 + y^2)) -> want to digitize this
    grid_shape: an array of length 2 which contains the dimensions of the smaller resolution
    """
    l = len(function_data)
    pixel_len_x = int(l/grid_shape[0])
    pixel_len_y = int(l/grid_shape[1])
    digitized_data = np.empty((grid_shape[0], grid_shape[1]))
    for i in range(grid_shape[0]): #row-index of pixel in smaller-resolution grid
        for j in range(grid_shape[1]): #column-index of pixel in smaller-resolution grid
            hd_pixel = []
            for k in range(pixel_len_y):
                hd_pixel.append(z_data[k][j:j*pixel_len_x])
            hd_pixel = np.ravel(hd_pixel) #turns 2D array into 1D to be able to compute average
            pixel_avg = np.average(hd_pixel)
            digitized_data[i][j] = pixel_avg
    return digitized_data
In theory, this function should do what I want to achieve, but when tested it doesn't yield the expected results. Either a completed version of my function or any other method that achieves my goal would be extremely helpful.
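One possible way to get the cell averages without explicit loops is to reshape the array so that each cell gets its own pair of axes and then average over those. This is only a sketch: the name block_average is made up here, and it assumes each dimension of the data is an exact multiple of the target grid.

```python
import numpy as np

def block_average(data, grid_shape):
    """Average a 2D array over equal rectangular cells.

    Assumes each dimension of `data` is an exact multiple of the
    corresponding entry of `grid_shape`.
    """
    data = np.asarray(data, dtype=float)
    rows, cols = grid_shape
    if data.shape[0] % rows or data.shape[1] % cols:
        raise ValueError("data shape must be divisible by grid_shape")
    cell_h = data.shape[0] // rows
    cell_w = data.shape[1] // cols
    # Split each axis into (cell index, position inside cell), then
    # average over the two "inside cell" axes
    return data.reshape(rows, cell_h, cols, cell_w).mean(axis=(1, 3))

hd = np.arange(16).reshape(4, 4)
low_res = block_average(hd, (2, 2))
print(low_res)
```

For the 4x4 test array this gives a 2x2 result in which each entry is the mean of one 2x2 block.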
You could also use an interpolation function, if you can use SciPy. Here we use one of the gridded-data interpolating functions, RectBivariateSpline, to upsample your function, but you can find numerous examples on this and other sites.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RectBivariateSpline as rbs
# Sampling coordinates
x = np.linspace(-2,2,20)
y = np.linspace(-2,2,30)
# Your function
f = np.exp(-(x[:,None]**2 + y**2))
# Interpolator
interp = rbs(x, y, f)
# Higher resolution coordinates
x_hd = np.linspace(x.min(), x.max(), x.size * 5)
y_hd = np.linspace(y.min(), y.max(), y.size * 5)
# New higher res function
f_hd = interp(x_hd, y_hd, grid = True)
# Some plots
fig, ax = plt.subplots(ncols = 2)
ax[0].imshow(f)
ax[1].imshow(f_hd)
I have to analyse a PPG signal. I found a way to find the peaks, but I can't use the values of the heights. They are stored in something like a dictionary of arrays, and I don't know how to extract the values from it. I tried using dict.values(), but that didn't work.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter
data = pd.read_excel('test_heartpy.xlsx')
arr = np.array(data)
time = arr[1:,0] # time in s
ECG = arr[1:,1] # ECG
PPG = arr[1:,2] # PPG
filtered = savgol_filter(PPG, 251, 3)
plt.plot(time, filtered)
plt.xlabel('Time (in s)')
plt.ylabel('PPG')
plt.grid('on')
The PPG signal looks like this. To search for the peaks I used:
# searching peaks
from scipy.signal import find_peaks
peaks, heights_peak_0 = find_peaks(PPG, height=0.2)
heights_peak = heights_peak_0.values()
plt.plot(PPG)
plt.plot(peaks, np.asarray(PPG)[peaks], "x")
plt.plot(np.zeros_like(PPG), "--", color="gray")
plt.title("PPG peaks")
plt.show()
print(heights_peak_0)
print(heights_peak)
print(peaks)
Printing:
{'peak_heights': array([0.4822998 , 0.4710083 , 0.43884277, 0.46728516, 0.47094727,
0.44702148, 0.43029785, 0.44146729, 0.43933105, 0.41400146,
0.45318604, 0.44335938])}
dict_values([array([0.4822998 , 0.4710083 , 0.43884277, 0.46728516, 0.47094727,
0.44702148, 0.43029785, 0.44146729, 0.43933105, 0.41400146,
0.45318604, 0.44335938])])
[787 2513 4181 5773 7402 9057 10601 12194 13948 15768 17518 19335]
Signal with highlighted peaks looks like this.
heights_peak_0 is the properties dict returned by scipy.signal.find_peaks.
You can find more information about what is returned in the find_peaks documentation.
You can extract the array containing the heights of all the peaks with heights_peak_0["peak_heights"].
# the following gives you an array with the peak heights
heights_peak_0['peak_heights']
# peaks holds the indices at which find_peaks found peaks in the original signal,
# so you can also get the peak values this way
PPG[peaks]
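To make the two routes concrete, here is a small self-contained sketch. The signal array is made up as a stand-in for the PPG data; only find_peaks and the "peak_heights" key come from SciPy.

```python
import numpy as np
from scipy.signal import find_peaks

# Small made-up signal standing in for the PPG data
signal = np.array([0.0, 0.5, 0.1, 0.3, 0.0, 0.9, 0.2])

# "peak_heights" is only present because the height argument was given
peaks, properties = find_peaks(signal, height=0.2)

heights = properties["peak_heights"]  # plain NumPy array of peak heights
print(peaks)          # [1 3 5]
print(heights)        # [0.5 0.3 0.9]
print(signal[peaks])  # same values, obtained by indexing the signal
```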
According to the docs, the find_peaks() function returns a tuple consisting of the peak indices and a properties dict. If you are only interested in the peak values, you can simply ignore the second element of the tuple and index the signal with the first one.
Assuming you want the 'coordinates' of your peaks, you could then combine the peak heights (y-values) with their positions (x-values) like so (based on the first code snippet given in the docs):
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram  # SciPy >= 1.10: from scipy.datasets import electrocardiogram
from scipy.signal import find_peaks
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, distance=150)
peaks_x_values = peaks
peaks_y_values = x[peaks]
peak_coordinates = list(zip(peaks_x_values, peaks_y_values))
print(peak_coordinates)
plt.plot(x)
plt.plot(peaks_x_values, peaks_y_values, "x")
plt.show()
Printing:
[(65, 0.705), (251, 1.155), (431, 1.705), (608, 1.96), (779, 1.925), (956, 2.09), (1125, 1.745), (1292, 1.37), (1456, 1.2), (1614, 0.81), (1776, 0.665), (1948, 0.665)]
How can I get the coordinates of the big rectangles that lie on the diagonal?
For example, for the yellow ones: [0,615], [615,1438], [1438,1526]
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
from sklearn.metrics.pairwise import cosine_similarity
df = pd.DataFrame(array) # array is image numpy
df.shape #(1526, 360)
s = cosine_similarity(df) #(1526, 1526)
plt.matshow(s)
I tried to get the peaks in the first row, but the result contains noise:
speak = 1-s[0]
peaks, _ = find_peaks(speak, distance=160, height=0.1)
print(peaks, len(peaks))
np.diff(peaks)
plt.plot(speak)
plt.plot(peaks, speak[peaks], "x")
plt.show()
Update: added another example.
I also uploaded the full script to Colab: https://colab.research.google.com/drive/1hyDIDs-QjLjD2mVIX4nNOXOcvCZY4O2c?usp=sharing
Use np.diag(df) to get a list of the diagonal elements. If the color in your screenshot stands for values below/above some threshold (probably zero), check where the values cross that threshold.
All the diagonal elements of the cosine_similarity matrix are the same, so you should look for changes in nearby values instead.
You could try this:
factor = 1.01
look_nearby = 1
changes = []
for i in range(look_nearby, s.shape[0]-look_nearby):
    if s[i, i+look_nearby] > factor*s[i, i-look_nearby] or factor*s[i, i+look_nearby] < s[i, i-look_nearby]:
        changes.append(i)
print(changes)
Set the factor value according to your preference (as you do not want (1200, 1200) in the output of 1st image) and according to the values of s.
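For large matrices the same check can be written without a Python loop. The following is a sketch of that idea on a toy block matrix; the function name find_changes and the toy values are assumptions made for this illustration.

```python
import numpy as np

def find_changes(s, factor=1.01, look_nearby=1):
    """Vectorized form of the loop: indices where the similarity to the
    neighbor on one side differs from the other side by more than `factor`."""
    i = np.arange(look_nearby, s.shape[0] - look_nearby)
    right = s[i, i + look_nearby]
    left = s[i, i - look_nearby]
    mask = (right > factor * left) | (factor * right < left)
    return i[mask]

# Toy similarity matrix: two diagonal blocks (rows 0-2 and 3-5)
s = np.full((6, 6), 0.5)
s[:3, :3] = 1.0
s[3:, 3:] = 1.0
changes = find_changes(s)
print(changes)  # [2 3]
```

Both rows next to the block boundary are reported, so consecutive indices can be merged into a single boundary if needed.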
This can be solved with DBSCAN clustering (see the related question "DBSCAN for clustering of geographic location data"):
from sklearn.cluster import DBSCAN
clustering = DBSCAN(eps=.5, min_samples=10).fit_predict(s)
peaks = np.where(clustering[:-1] != clustering[1:])[0]
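To turn those change points into start/end pairs like the ones asked for, something along these lines could work. The label array here is made up for illustration; in practice it would be the output of fit_predict.

```python
import numpy as np

# Made-up cluster labels, one per row of the similarity matrix
clustering = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# Last index of each run of equal labels (same expression as above)
peaks = np.where(clustering[:-1] != clustering[1:])[0]

# Turn the change points into [start, end) intervals along the diagonal
edges = np.concatenate(([0], peaks + 1, [clustering.size]))
intervals = [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]
print(intervals)  # [(0, 3), (3, 5), (5, 9)]
```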
I have a vector defined by a minimum of two points in space, e.g.:
A = np.array([-1452.18133319, 3285.44737438, -7075.49516676])
B = np.array([-1452.20175668, 3285.29632734, -7075.49110863])
I want to find the tangent of the vector at discrete points along the curve, e.g. the beginning and end of the curve. I know how to do it in Matlab but I want to do it in Python. This is the code in Matlab:
A = [-1452.18133319 3285.44737438 -7075.49516676];
B = [-1452.20175668 3285.29632734 -7075.49110863];
points = [A; B];
distance = [0.; 0.1667];
pp = interp1(distance, points,'pchip','pp');
[breaks,coefs,l,k,d] = unmkpp(pp);
dpp = mkpp(breaks,repmat(k-1:-1:1,d*l,1).*coefs(:,1:k-1),d);
ntangent=zeros(length(distance),3);
for j=1:length(distance)
ntangent(j,:) = ppval(dpp, distance(j));
end
%The solution would be at beginning and end:
%ntangent =
% -0.1225 -0.9061 0.0243
% -0.1225 -0.9061 0.0243
Any ideas? I tried to find the solution using numpy and scipy using multiple methods, e.g.
tck, u= scipy.interpolate.splprep(data)
but none of the methods seems to satisfy what I want.
Give der=1 to splev to get the derivative of the spline:
from scipy import interpolate
import numpy as np
t=np.linspace(0,1,200)
x=np.cos(5*t)
y=np.sin(7*t)
tck, u = interpolate.splprep([x,y])
ti = np.linspace(0, 1, 200)
dxdt, dydt = interpolate.splev(ti,tck,der=1)
OK, I found the solution, which is a small modification of the answer by "pv" above (note that splev works only for 1D vectors).
One problem I originally had with "tck, u = scipy.interpolate.splprep(data)" is that, with the default cubic degree (k=3), it requires a minimum of 4 points to work (Matlab works with two points). I was using two points. After increasing the number of data points, it works as I want.
Here is the solution for completeness:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
data = np.array([[-1452.18133319 , 3285.44737438, -7075.49516676],
[-1452.20175668 , 3285.29632734, -7075.49110863],
[-1452.32645025 , 3284.37412457, -7075.46633213],
[-1452.38226151 , 3283.96135828, -7075.45524248]])
distance=np.array([0., 0.15247556, 1.0834, 1.50007])
data = data.T
tck,u = interpolate.splprep(data, u=distance, s=0)
yderv = interpolate.splev(u,tck,der=1)
and the tangents are (which matches the Matlab results if the same data is used):
(-0.13394599723751408, -0.99063114953803189, 0.026614957159932656)
(-0.13394598523149195, -0.99063115868512985, 0.026614950816003666)
(-0.13394595055068903, -0.99063117647357712, 0.026614941718878599)
(-0.13394595652952143, -0.9906311632471152, 0.026614954146007865)
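For the original two-point case, SciPy's PchipInterpolator may be a closer counterpart to Matlab's interp1 with 'pchip', since it accepts as few as two points. The sketch below is an alternative to the splprep approach, not the method used above; it assumes the same A, B and distance values as the Matlab snippet in the question, and for those it should give approximately the tangent Matlab reported.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

A = np.array([-1452.18133319, 3285.44737438, -7075.49516676])
B = np.array([-1452.20175668, 3285.29632734, -7075.49110863])
points = np.vstack([A, B])          # shape (2, 3): one row per point
distance = np.array([0.0, 0.1667])  # parameter value at each point

# Interpolate all three coordinates at once along axis 0
pchip = PchipInterpolator(distance, points, axis=0)

# nu=1 evaluates the first derivative, i.e. the tangent at each parameter value
tangents = pchip(distance, nu=1)
print(tangents)
```

With only two points the interpolant is a straight line, so the tangent is the same at both ends and equals (B - A) divided by the parameter spacing.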
I am considering using OpenCV's k-means implementation since it is said to be faster...
I am now using the cv2 package and its kmeans function, but
I cannot understand the parameter descriptions in the reference:
Python: cv2.kmeans(data, K, criteria, attempts, flags[, bestLabels[, centers]]) → retval, bestLabels, centers
samples – Floating-point matrix of input samples, one row per sample.
clusterCount – Number of clusters to split the set by.
labels – Input/output integer array that stores the cluster indices for every sample.
criteria – The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy. The accuracy is specified as criteria.epsilon. As soon as each of the cluster centers moves by less than criteria.epsilon on some iteration, the algorithm stops.
attempts – Flag to specify the number of times the algorithm is executed using different initial labelings. The algorithm returns the labels that yield the best compactness (see the last function parameter).
flags –
Flag that can take the following values:
KMEANS_RANDOM_CENTERS Select random initial centers in each attempt.
KMEANS_PP_CENTERS Use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007].
KMEANS_USE_INITIAL_LABELS During the first (and possibly the only) attempt, use the user-supplied labels instead of computing them from the initial centers. For the second and further attempts, use the random or semi-random centers. Use one of KMEANS_*_CENTERS flag to specify the exact method.
centers – Output matrix of the cluster centers, one row per each cluster center.
What does the argument flags[, bestLabels[, centers]]) mean? And what about this one: → retval, bestLabels, centers?
Here's my code:
import cv, cv2
import scipy.io
import numpy
# read data from .mat file
mat = scipy.io.loadmat('...')
keys = mat.keys()
values = mat.viewvalues()
data_1 = mat[keys[0]]
nRows = data_1.shape[1]
nCols = data_1.shape[0]
samples = cv.CreateMat(nRows, nCols, cv.CV_32FC1)
labels = cv.CreateMat(nRows, 1, cv.CV_32SC1)
centers = cv.CreateMat(nRows, 100, cv.CV_32FC1)
#centers = numpy.
for i in range(0, nCols):
    for j in range(0, nRows):
        samples[j, i] = data_1[i, j]
cv2.kmeans(data_1.transpose,
           100,
           criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_MAX_ITER, 0.1, 10),
           attempts=cv2.KMEANS_PP_CENTERS,
           flags=cv2.KMEANS_PP_CENTERS,
           )
And I encounter such error:
flags=cv2.KMEANS_PP_CENTERS,
TypeError: <unknown> is not a numpy array
How should I understand the parameter list and the usage of cv2.kmeans? Thanks
The documentation on this function is almost impossible to find. I wrote the following Python code in a bit of a hurry, but it works on my machine. It generates two multivariate Gaussian distributions with different means and then classifies them using cv2.kmeans(). You may refer to this blog post to get some idea of the parameters.
Handle imports:
import cv
import cv2
import numpy as np
import numpy.random as r
Generate some random points and shape them appropriately:
samples = cv.CreateMat(50, 2, cv.CV_32FC1)
random_points = r.multivariate_normal((100,100), np.array([[150,400],[150,150]]), size=(25))
random_points_2 = r.multivariate_normal((300,300), np.array([[150,400],[150,150]]), size=(25))
samples_list = np.append(random_points, random_points_2).reshape(50,2)
random_points_list = np.array(samples_list, np.float32)
samples = cv.fromarray(random_points_list)
Plot the points before and after classification:
blank_image = np.zeros((400,400,3))
blank_image_classified = np.zeros((400,400,3))
for point in random_points_list:
    cv2.circle(blank_image, (int(point[0]),int(point[1])), 1, (0,255,0),-1)
temp, classified_points, means = cv2.kmeans(data=np.asarray(samples), K=2, bestLabels=None,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_MAX_ITER, 1, 10), attempts=1,
    flags=cv2.KMEANS_RANDOM_CENTERS) #Let OpenCV choose random centers for the clusters
for point, allocation in zip(random_points_list, classified_points):
    if allocation == 0:
        color = (255,0,0)
    elif allocation == 1:
        color = (0,0,255)
    cv2.circle(blank_image_classified, (int(point[0]),int(point[1])), 1, color,-1)
cv2.imshow("Points", blank_image)
cv2.imshow("Points Classified", blank_image_classified)
cv2.waitKey()
Here you can see the original points:
Here are the points after they have been classified:
I hope this answer helps. It is not a complete guide to k-means, but it will at least show you how to pass the parameters to OpenCV.
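The problem here is that data_1.transpose is not a NumPy array: without parentheses it is a reference to the method itself. Pass data_1.transpose() (or data_1.T) instead.
The Python bindings of OpenCV 2.3.1 and higher do not accept anything except NumPy arrays as image/array parameters, so the first argument has to be a NumPy array.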
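The difference is easy to check with NumPy alone. In this sketch data_1 is a small made-up array standing in for the data loaded from the .mat file, and the conversion to float32 follows the "floating-point matrix, one row per sample" description quoted in the question:

```python
import numpy as np

# Stand-in for the array loaded from the .mat file (features x samples)
data_1 = np.arange(6, dtype=np.float64).reshape(2, 3)

not_an_array = data_1.transpose       # bound method, NOT an ndarray
print(isinstance(not_an_array, np.ndarray))  # False

# Call the method (or use .T), then convert to contiguous float32,
# one row per sample, which is the layout cv2.kmeans expects
samples = np.ascontiguousarray(data_1.T, dtype=np.float32)
print(samples.shape, samples.dtype)   # (3, 2) float32
```

An array prepared like samples can then be passed as the first argument of cv2.kmeans in place of data_1.transpose.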
Generally, all the points in OpenCV are of type numpy.ndarray,
e.g.
array([[[100., 433.]],
       [[157., 377.]],
       .
       .
       [[147., 247.]]], dtype=float32)
where each element of array is
array([[100., 433.]], dtype=float32)
and the element of that array is
array([100., 433.], dtype=float32)