So far I have found four ways to find peaks in Python, but none of them lets me specify the number of peaks to return the way Matlab does. Can someone provide some insight?
import scipy.signal as sg
import numpy as np
# Method 1
sg.find_peaks_cwt(vector, np.arange(1,4),max_distances=np.arange(1, 4)*2)
# Method 2
sg.argrelextrema(np.array(vector),comparator=np.greater,order=2)
# Method 3
sg.find_peaks(vector, height=7, distance=2.1)
# Method 4
detect_peaks.detect_peaks(vector, mph=7, mpd=2)
Below is the Matlab code that I want to emulate:
[pks,locs] = findpeaks(data,'Npeaks',n)
If you want the exact function Matlab has, why not just use that function? If you have the rest of your data in Python, then you can just use the module provided by Matlab.
import matlab.engine #import matlab engine
eng = matlab.engine.start_matlab() #Start matlab engine
a = [(0.1*i)*(0.1*i-1)*(0.1*i-2) for i in range(50)] #Create some data with peaks
b = eng.findpeaks(matlab.double(a),'Npeaks',1) #Find 1 peak
Try the findpeaks library. Multiple methods are available for the detections of peaks and valleys in 1D-vectors and 2D-arrays (images).
pip install findpeaks
Let's create some peaks:
import numpy as np
i = 10000
xs = np.linspace(0, 3.7*np.pi, i)
X = (0.3*np.sin(xs) + np.sin(1.3*xs) + 0.9*np.sin(4.2*xs)
     + 0.06*np.random.randn(i))
# import library
from findpeaks import findpeaks
# Initialize
fp = findpeaks()
# Find the peaks (high/low)
results = fp.fit(X)
# Make plot
fp.plot()
# Some of the results:
results['df']
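If staying in SciPy is preferable, the 'Npeaks' behaviour can be approximated by truncating the output of scipy.signal.find_peaks. The sketch below is only an illustration: find_n_peaks and its tallest flag are hypothetical names, and it assumes that Matlab's 'Npeaks' keeps the first n peaks in order of position, which is worth checking against the Matlab documentation for your version.

```python
import numpy as np
from scipy.signal import find_peaks

def find_n_peaks(x, n, tallest=False, **kwargs):
    """Return (pks, locs) for at most n peaks of a 1-D sequence.

    Hypothetical helper: extra keyword arguments (height, distance, ...)
    are passed straight through to scipy.signal.find_peaks.
    """
    x = np.asarray(x, dtype=float)
    locs, _ = find_peaks(x, **kwargs)
    if tallest:
        # keep the n highest peaks, then restore positional order
        keep = np.argsort(x[locs])[::-1][:n]
        locs = np.sort(locs[keep])
    else:
        # keep the first n peaks by position (assumed Matlab behaviour)
        locs = locs[:n]
    return x[locs], locs

data = [0, 2, 0, 5, 0, 3, 0, 4, 0]
pks, locs = find_n_peaks(data, 2)
pks_tall, locs_tall = find_n_peaks(data, 2, tallest=True)
```

Note that the returned locations are 0-based indices, whereas Matlab's locs are 1-based.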
A now-closed discussion shows how to use the R dtw package in Python. This is a little clumsy, but the R dtw package is great and better than the currently available Python dtw implementations. Unfortunately, windowing functions such as the Sakoe-Chiba band do not work when trying to specify a "window.size". There appears to be an issue with the mapping to the argument. Note that "." in argument names is supposed to be replaced with "_" when using rpy2, but when I follow this convention the argument is not picked up for some reason.
import numpy as np
import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr
rpy2.robjects.numpy2ri.activate()
# Set up our R namespaces
R = rpy2.robjects.r
DTW = importr('dtw')
# Generate our data
idx = np.linspace(0, 2*np.pi, 100)
template = np.cos(idx)
query = np.sin(idx) + np.array(R.runif(100))/10
# Calculate the alignment vector and corresponding distance
alignment = R.dtw(query, template, keep=True,window_type='sakoechiba',
window_size=5)
>>> RRuntimeError: Error in window.function(row(wm), col(wm), query.size= n, reference.size = m, :
argument "window.size" is missing, with no default
You can see that the error states "window.size" is missing, despite "window_size" clearly being specified in the rpy2 fashion.
Just a note from the future: this question is now superseded by the feature-equivalent dtw-python package (also found on PyPI). The rpy2-R-dtw bridge should no longer be necessary.
Answering my own question in case anyone ever has the same issue. The problem is the argument mapping and the R three dots ellipsis ‘...’. This can be fixed by specifying the mapping manually.
from rpy2.robjects.functions import SignatureTranslatedFunction
R.dtw = SignatureTranslatedFunction(R.dtw,
init_prm_translate={'window_size': 'window.size'})
So with this specification the window_size argument is used correctly.
import numpy as np
import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.functions import SignatureTranslatedFunction
rpy2.robjects.numpy2ri.activate()
# Set up our R namespaces
R = rpy2.robjects.r
DTW = importr('dtw')
R.dtw = SignatureTranslatedFunction(R.dtw,
init_prm_translate={'window_size': 'window.size'})
# Generate our data
idx = np.linspace(0, 2*np.pi, 100)
template = np.cos(idx)
query = np.sin(idx) + np.array(R.runif(100))/10
# Calculate the alignment vector and corresponding distance
alignment = R.dtw(query, template, keep=True,window_type='sakoechiba',
window_size=10)
dist = alignment.rx('distance')[0][0]
print(dist)
>>> 117.348292359
I am using the Python version of the Shogun Toolbox.
I want to use the LinearTimeMMD, which accepts data under the streaming interface CStreamingFeatures. I have the data in the form of two RealFeatures objects: feat_p and feat_q. These work just fine with the QuadraticTimeMMD.
In order to use it with the LinearTimeMMD, I need to create StreamingFeatures objects from these - In this case, these would be StreamingRealFeatures, as far as I know.
My first approach was using this:
gen_p, gen_q = StreamingRealFeatures(feat_p), StreamingRealFeatures(feat_q)
This however does not seem to work: The LinearTimeMMD delivers warnings and an unrealistic result (growing constantly with the number of samples) and calling gen_p.get_dim_feature_space() returns -1. Also, if I try calling gen_p.get_streamed_features(100) this results in a Memory Access Error.
I tried another approach using StreamingFileFromFeatures:
streamFile_p = sg.StreamingFileFromRealFeatures()
streamFile_p.set_features(feat_p)
streamFile_q = sg.StreamingFileFromRealFeatures()
streamFile_q.set_features(feat_q)
gen_p = StreamingRealFeatures(streamFile_p, False, 100)
gen_q = StreamingRealFeatures(streamFile_q, False, 100)
But this results in the same situation with the same described problems.
It seems that in both cases, the contents of the RealFeatures object handed to the StreamingRealFeatures object cannot be accessed.
What am I doing wrong?
EDIT: I was asked for a small working example to show the error:
import os
SHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')
import shogun as sg
from shogun import StreamingRealFeatures
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import laplace, norm
def sample_gaussian_vs_laplace(n=220, mu=0.0, sigma2=1, b=np.sqrt(0.5)):
    # sample from both distributions
    X=norm.rvs(size=n)*np.sqrt(sigma2)+mu
    Y=laplace.rvs(size=n, loc=mu, scale=b)
    return X,Y
# Main Script
mu=0.0
sigma2=1
b=np.sqrt(0.5)
n=220
X,Y=sample_gaussian_vs_laplace(n, mu, sigma2, b)
# turn data into Shogun representation (columns vectors)
feat_p=sg.RealFeatures(X.reshape(1,len(X)))
feat_q=sg.RealFeatures(Y.reshape(1,len(Y)))
gen_p, gen_q = StreamingRealFeatures(feat_p), StreamingRealFeatures(feat_q)
print("Dimensions: ", gen_p.get_dim_feature_space())
print("Number of features: ", gen_p.get_num_features())
print("Number of vectors: ", gen_p.get_num_vectors())
test_features = gen_p.get_streamed_features(1)
print("success")
EDIT 2: The Output of the working example:
Dimensions: -1
Number of features: -1
Number of vectors: 1
Segmentation fault (core dumped)
EDIT 3: Additional Code with LinearTimeMMD using the RealFeatures directly.
mmd = sg.LinearTimeMMD()
kernel = sg.GaussianKernel(10, 1)
mmd.set_kernel(kernel)
mmd.set_p(feat_p)
mmd.set_q(feat_q)
mmd.set_num_samples_p(1000)
mmd.set_num_samples_q(1000)
alpha = 0.05
# Code taken from notebook example on
# http://www.shogun-toolbox.org/notebook/latest/mmd_two_sample_testing.html
# Location on page: In[16]
block_size=100
mmd.set_num_blocks_per_burst(block_size)
# compute an unbiased estimate in linear time
statistic=mmd.compute_statistic()
print("MMD_l[X,Y]^2=%.2f" % statistic)
EDIT 4: Additional code sample showing the growing mmd problem:
import os
SHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')
import shogun as sg
from shogun import StreamingRealFeatures
import numpy as np
from matplotlib import pyplot as plt
def mmd(n):
    X = [(1.0,i) for i in range(n)]
    Y = [(2.0,i) for i in range(n)]
    X = np.array(X)
    Y = np.array(Y)
    # turn data into Shogun representation (columns vectors)
    feat_p=sg.RealFeatures(X.reshape(2, len(X)))
    feat_q=sg.RealFeatures(Y.reshape(2, len(Y)))
    mmd = sg.LinearTimeMMD()
    kernel = sg.GaussianKernel(10, 1)
    mmd.set_kernel(kernel)
    mmd.set_p(feat_p)
    mmd.set_q(feat_q)
    mmd.set_num_samples_p(100)
    mmd.set_num_samples_q(100)
    alpha = 0.05
    block_size=100
    mmd.set_num_blocks_per_burst(block_size)
    # compute an unbiased estimate in linear time
    statistic=mmd.compute_statistic()
    print("N =", n)
    print("MMD_l[X,Y]^2=%.2f" % statistic)
    print()
for n in [1000, 10000, 15000, 20000, 25000, 30000]:
    mmd(n)
Output:
N = 1000
MMD_l[X,Y]^2=-12.69
N = 10000
MMD_l[X,Y]^2=-40.14
N = 15000
MMD_l[X,Y]^2=-49.16
N = 20000
MMD_l[X,Y]^2=-56.77
N = 25000
MMD_l[X,Y]^2=-63.47
N = 30000
MMD_l[X,Y]^2=-69.52
For some reason, the pythonenv in my machine is broken. So, I couldn't give a snippet in Python. But let me point to a working example in C++ which attempts to address the issues (https://gist.github.com/lambday/983830beb0afeb38b9447fd91a143e67).
I think the easiest way is to create a StreamingRealFeatures instance directly from a RealFeatures instance (as you tried the first time). Check the test1() and test2() methods in the gist, which show the equivalence of using RealFeatures and StreamingRealFeatures in the use case in question. The reason you were getting weird results when streaming directly is that, in order to start the streaming process, we need to call the start_parser method of the StreamingRealFeatures class. We handle these technicalities internally inside the MMD classes, but when using the streaming features directly, you need to invoke it yourself (see the test3() method in my attached example).
Please note that the compute_statistic() method doesn't return MMD directly, but rather returns \frac{n_x\times n_y}{n_x+n_y}\times MMD^2 (as mentioned in the doc http://shogun.ml/api/latest/classshogun_1_1CMMD.html). With that in mind, maybe the results you are getting for varying number of samples make sense.
Hope it helps.
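To make the scaling concrete, here is a minimal sketch that undoes the factor quoted above. The function name statistic_to_mmd2 and the numbers are purely illustrative (they are not Shogun output), and which sample counts Shogun actually uses for n_x and n_y in the linear-time case is an assumption you should check against the documentation.

```python
def statistic_to_mmd2(statistic, n_x, n_y):
    """Convert a statistic scaled by n_x*n_y/(n_x+n_y) back to MMD^2.

    Hypothetical helper; it only inverts the factor quoted from the docs.
    """
    return statistic * (n_x + n_y) / (n_x * n_y)

# Illustrative numbers only: with n_x = n_y = 100 the factor is 50
example = statistic_to_mmd2(50.0, 100, 100)
print(example)
```

With equal sample sizes n the factor is n/2, so a statistic whose magnitude grows with the number of samples does not by itself mean that the underlying MMD^2 estimate is growing.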
I would like to know if there is a function envelope in Python to have the same result as this
I have already tried an envelope function in Python, but it gives this result, which does not correspond to what I want.
Though you don't mention exactly what function you use, it seems like you are using two different kinds of envelopes.
For the way you call envelope in Matlab, the relevant description is:
[yupper,ylower] = envelope(x) returns the upper and lower envelopes of
the input sequence, x, as the magnitude of its analytic signal. The
analytic signal of x is found using the discrete Fourier transform as
implemented in hilbert. The function initially removes the mean of x
and adds it back after computing the envelopes. If x is a matrix, then
envelope operates independently over each column of x.
Based on this, I suppose you are looking for a way to compute the Hilbert transform in Python. An example of this can be found here:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import hilbert, chirp
duration = 1.0
fs = 400.0
samples = int(fs*duration)
t = np.arange(samples) / fs
signal = chirp(t, 20.0, t[-1], 100.0)
signal *= (1.0 + 0.5 * np.sin(2.0*np.pi*3.0*t) )
analytic_signal = hilbert(signal)
amplitude_envelope = np.abs(analytic_signal)
instantaneous_phase = np.unwrap(np.angle(analytic_signal))
instantaneous_frequency = np.diff(instantaneous_phase) / (2.0*np.pi) * fs
fig = plt.figure()
ax0 = fig.add_subplot(211)
ax0.plot(t, signal, label='signal')
ax0.plot(t, amplitude_envelope, label='envelope')
ax0.set_xlabel("time in seconds")
ax0.legend()
ax1 = fig.add_subplot(212)
ax1.plot(t[1:], instantaneous_frequency)
ax1.set_xlabel("time in seconds")
ax1.set_ylim(0.0, 120.0)
Resulting in:
Sometimes I use obspy.signal.filter.envelope(data_array), but it only gives you the upper line in your example.
ObsPy is a very useful package for working with seismograms.
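To get both the upper and the lower line with SciPy alone, the Matlab description quoted earlier (remove the mean, take the magnitude of the analytic signal, add the mean back) can be followed fairly literally. This is only a sketch under that reading of the description: the envelope function below is a hypothetical helper for 1-D input, and the assumption that the lower envelope is the mean minus the magnitude should be checked against Matlab's output.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x):
    """Upper and lower envelope of a 1-D signal via the analytic signal."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    # magnitude of the analytic signal of the mean-free input
    magnitude = np.abs(hilbert(x - mean))
    # add the mean back; the lower envelope mirrors the upper one around it
    return mean + magnitude, mean - magnitude

fs = 400.0
t = np.arange(400) / fs
x = 3.0 + np.cos(2.0 * np.pi * 5.0 * t)  # offset 3, amplitude 1
upper, lower = envelope(x)
```

For this test signal the upper envelope sits at about 4 and the lower one at about 2. Signals that are not periodic within the window will show edge effects near the ends.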
I wish to perform a fourier transform of the function 'stress' from 0 to infinity and extract the real and imaginary parts. I have the following code that does it using a numerical integration technique:
import numpy as np
from scipy.integrate import trapz
import fileinput
import sys,string
window = 200000 # length of the array I wish to transform (number of data points)
t = np.linspace(1,window,window) # time axis, used as t in the integral below
freq = np.logspace(-5,2,window)
output = [0]*len(freq)
for index, f in enumerate(freq):
    # numerically integrate stress(t)*exp(-1j*f*t) over t
    visco = trapz(stress*np.exp(-1j*f*t), t)
    soln = visco*(1j*f)
    output[index] = soln
print('f storage loss')
for i in range(len(freq)):
    print(freq[i], output[i].real, output[i].imag)
This gives me a nice transformation of my input data.
Now I have an array of size 2x10^6, and the above technique is no longer feasible (computation time scales as O(N^2)), so I have turned to the built-in fft function in numpy.
There aren't too many arguments that you can specify to change this function, and so I'm finding it difficult to customize it to my needs.
So far I have
import numpy as np
import fileinput
import sys, string
np.set_printoptions(threshold=sys.maxsize)
N = len(stress)
fvi = np.fft.fft(stress,n=N)
gprime = fvi.real
gdoubleprime = fvi.imag
for i in range(len(stress)):
    print(gprime[i], gdoubleprime[i])
And it's not giving me accurate results.
The DFT in numpy has the form A_k = sum(a_m * exp(-2*pi*i*m*k/n)), where the sum runs from m = 0 to m = n-1 (http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.fft.html). How can I change it to the form used in my first code, i.e. exp(-1j*freq*t) (freq is the frequency and t is the time, both of which are already defined)? Or is there some post-processing of the data that I have to do?
Thanks in advance for all your help.
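One way to relate the two forms, assuming uniformly spaced samples t_m = t0 + m*dt: a Riemann sum of the integral at the angular frequencies w_k = 2*pi*k/(n*dt) equals dt * exp(-1j*w_k*t0) * A_k. The sketch below illustrates this; fourier_integral_fft and stress_demo are hypothetical names, and the decaying exponential is stand-in data, not the stress array from the question.

```python
import numpy as np

def fourier_integral_fft(x, dt, t0=0.0):
    """Approximate X(w) = integral of x(t)*exp(-1j*w*t) dt with one FFT.

    Returns the angular frequencies w_k and the Riemann-sum estimate of
    X(w_k) for real input sampled at t0, t0+dt, t0+2*dt, ...
    """
    n = len(x)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)  # w_k = 2*pi*k/(n*dt)
    X = dt * np.exp(-1j * omega * t0) * np.fft.rfft(x)
    return omega, X

dt = 0.01
t = np.arange(0.0, 50.0, dt)
stress_demo = np.exp(-t)  # stand-in data with known transform 1/(1 + 1j*w)
omega, X = fourier_integral_fft(stress_demo, dt, t0=t[0])

# Cross-check one frequency against the direct sum used in the slow version
k = 7
direct = np.sum(stress_demo * np.exp(-1j * omega[k] * t)) * dt

# Same post-multiplication as in the question: soln = visco*(1j*f)
soln = 1j * omega * X
```

Two caveats: the FFT frequencies are linearly spaced, so a logarithmic grid like np.logspace(-5, 2, window) would have to be obtained by interpolating or picking from these values, and the result is a rectangle-rule sum rather than the trapezoidal rule, so it differs slightly from trapz.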
I want to make this code faster, as it takes ~4 milliseconds for a 1000x1000 image with a window size of 10x10.
import numpy
import scipy.misc
from matplotlib import pyplot as plt
import time
def corr(a, b):
    '''finds the correlation of 2 intensities'''
    return (sum2(a,b)/(sum2(a,a)*sum2(b,b))**0.5)
def sum2(a,b):
    s = 0
    for x in range(len(a)):
        s += a[x]*b[x]
    return s
##the commented code displays the images
##plt.ion()
def find_same(img1,img2,startx,width,starty,hight):
    '''inputs 2 filenames, startx, width of search window, and hight.'''
    crop_img = img1[starty:(starty+hight),startx:(startx+width)]
    plt.imshow(crop_img,interpolation='nearest')
    plt.draw()
    a = []
    for x in numpy.nditer(crop_img): #converting image to array of intensities
        a.append(float(x))
    mcfinder = []
    for x in range(img2.shape[1]-width):
        finder = img2[starty:(starty+hight),x:(x+width)]
        b = []
        for y in numpy.nditer(finder):
            b.append(float(y))
        correl = corr(a,b) #find correlation
        mcfinder.append(correl)
    maxim = max(mcfinder)
    place = mcfinder.index(maxim)
    finder = img2[starty:(starty+hight),place:(place+width)]
    ## plt.imshow(finder,interpolation='nearest')
    ## plt.draw()
    ## time.sleep(1)
    ## plt.close()
    return maxim,place
img1 = scipy.misc.imread('me1.bmp')
img2 = scipy.misc.imread('me2.bmp')
starttime = time.perf_counter()
print(find_same(img1,img2,210,40,200,40))
endtime = time.perf_counter()
print(endtime-starttime)
Are there any ways to make this faster? Or am I doing this fundamentally wrong?
Please let me know. To run this you need matplotlib, scipy, and numpy.
[I lack the reputation to post this as a comment]
As mentioned in @cel's comment, you should vectorize your code by using only numpy operations instead of loops over lists.
It seems you are trying to do some template matching. Did you have a look at the example for skimage.feature.match_template() in the scikit-image documentation? scikit-image also provides windowed views of a numpy array (skimage.util.view_as_windows()), which is very handy when you are analyzing an array block by block.
If you don't want to add another dependency, you can use SciPy's built-in functions to compute the correlation for you, e.g. scipy.ndimage.filters.correlate() (also have a look at the other functions in scipy.ndimage.filters).
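As a rough illustration of the vectorization idea using numpy only, the sketch below computes the same uncentered correlation score as corr() for every horizontal offset at once. It is an assumption-laden sketch, not a drop-in replacement: find_same_vectorized is a hypothetical name, it assumes 2-D grayscale arrays and NumPy 1.20 or newer (for sliding_window_view), it skips the plotting, and it also scores the last possible offset, which the original loop leaves out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_same_vectorized(img1, img2, startx, width, starty, height):
    """Best horizontal match of an img1 patch within the same rows of img2."""
    template = img1[starty:starty + height, startx:startx + width].astype(float)
    strip = img2[starty:starty + height, :].astype(float)
    # windows[h, x, w] is strip[h, x + w]: one window per horizontal offset x
    windows = sliding_window_view(strip, width, axis=1)
    num = np.einsum('hxw,hw->x', windows, template)
    win_energy = np.einsum('hxw,hxw->x', windows, windows)
    scores = num / np.sqrt(win_energy * np.sum(template ** 2))
    place = int(np.argmax(scores))
    return scores[place], place

# Synthetic check: img2 is img1 shifted 5 pixels to the right
rng = np.random.default_rng(0)
img1 = rng.random((50, 80))
img2 = np.roll(img1, 5, axis=1)
maxim, place = find_same_vectorized(img1, img2, 20, 10, 15, 10)
```

Here the patch taken at x = 20 in img1 is found at x = 25 in img2 with a score of essentially 1. An all-zero window would cause a division by zero, so real images may need a small guard in the denominator.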