I'm trying to solve a correlation problem: I need to find where a pattern sequence occurs inside a signal sequence. At one point I had a working solution, but then I started optimizing the code and the working version was never saved. Now the cross-correlation just won't come out right and I don't know why. I have restarted the kernel multiple times.
Here is the code and the links to the text files that contain the signal and the pattern.
https://drive.google.com/file/d/1tBzHMUfmcx_gGR0arYPaQ5GB9MybXKRv/view?usp=sharing
https://drive.google.com/file/d/1TeSe9t8TeVHEp2BxKXYz6Ndlpah--yLg/view?usp=sharing
import numpy as np
import matplotlib.pyplot as plt
patron = np.loadtxt('patron.txt', delimiter=',', skiprows=1)
senal = np.loadtxt('señal.txt', delimiter=',', skiprows=1)
Fs=100
ts = np.arange(0,len(senal))
plt.figure(figsize=(20,8))
plt.subplot(3,1,1)
plt.plot(ts,patron)
plt.subplot(3,1,2)
plt.plot(ts,senal)
corr = np.correlate(senal,patron,"same")
print(np.where(corr == np.amax(corr))) #this should be where correlation reaches its maximum value, and where the functions are most "similar"
plt.subplot(3,1,3)
plt.plot(ts,corr, 'r')
How do I know I had it right? I plotted the 'senal' sequence shifted 799 places (the value I had when the code was right) with:
plt.plot(np.roll(senal, 799))
(note that np.roll returns the shifted copy; it does not shift in place). The shifted signal lined up with the pattern, which made a maximum correlation at index 799 look intuitive.
Hello, I flipped the 'patron' and 'senal' arguments in the correlate call and it seems good now:
import numpy as np
import matplotlib.pyplot as plt
patron = np.loadtxt('patron.txt', delimiter=',', skiprows=1)
senal = np.loadtxt('señal.txt', delimiter=',', skiprows=1)
Fs=100
ts = np.arange(0,len(senal))
plt.figure(figsize=(20,8))
plt.subplot(3,1,1)
plt.plot(ts,patron)
plt.subplot(3,1,2)
plt.plot(ts,senal)
corr = np.correlate(patron,senal,'same')
print(np.argmax(corr)) #this should be where correlation reaches its maximum value, and where the functions are most "similar"
plt.subplot(3,1,3)
plt.plot(corr, 'r')
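If you want the shift itself rather than an index into the 'same'-mode output, scipy.signal can report the lags explicitly. This is a sketch under the same assumptions as above (it needs scipy.signal.correlation_lags, available in SciPy 1.6+):
import numpy as np
from scipy import signal

senal = np.loadtxt('señal.txt', delimiter=',', skiprows=1)
patron = np.loadtxt('patron.txt', delimiter=',', skiprows=1)

# Full cross-correlation plus the lag axis that goes with it.
corr = signal.correlate(senal, patron, mode='full')
lags = signal.correlation_lags(len(senal), len(patron), mode='full')

# The lag at the correlation maximum is the sample offset where
# the pattern best aligns with the signal.
print(lags[np.argmax(corr)])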
I am trying to implement a Butterworth filter with Python in a Jupyter notebook, following a tutorial.
The data come from a CSV file called Samples.csv.
The data in Samples.csv look like this:
998,4778415
1009,209592
1006,619094
1001,785406
993,9426543
990,1408991
992,736118
995,8127334
1002,381664
1006,094429
1000,634799
999,3287747
1002,318812
999,3287747
1004,427698
1008,516733
1007,964781
1002,680906
1000,14449
994,257009
The column is called Euclidian Norm. The data range from 0 to 1679.286158 and there are 1838 rows.
This is the code in Jupyter:
from scipy.signal import filtfilt
from scipy import stats
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
def plot():
    data = pd.read_csv('Samples.csv', sep=";", decimal=",")
    sensor_data = data[['Euclidian Norm']]
    sensor_data = np.array(sensor_data)
    time = np.linspace(0, 1679.286158, 1838)
    plt.plot(time, sensor_data)
    plt.show()
    filtered_signal = bandPassFilter(sensor_data)
    plt.plot(time, sensor_data)
    plt.show()

def bandPassFilter(signal):
    fs = 4000.0
    lowcut = 20.0
    highcut = 50.0
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    order = 2
    b, a = scipy.signal.butter(order, [low, high], 'bandpass', analog=False)
    y = scipy.signal.filtfilt(b, a, signal, axis=0)
    return y

plot()
My problem is that nothing changes in my data: the filter doesn't seem to do anything. The graph of the filtered data is the same as the source data. Does anyone know what could be wrong?
The first graph is the source data and the second is the filtered data; they look identical.
You're never using filtered_signal: you call plt.plot with the same arguments both times, so the second figure shows the raw data again.
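A minimal fix for the plot() function above is to pass the filtered array to the second plot call:
filtered_signal = bandPassFilter(sensor_data)
plt.plot(time, filtered_signal)  # plot the filtered signal, not the raw data again
plt.show()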
Here's one of my implementations, with added interpolation, very similar to yours:
import numpy as np
import scipy.signal
import scipy.interpolate
import matplotlib.pyplot as plt

def butterFit(data, freq, order=2):
    # Get the filter coefficients for filtfilt.
    b, a = scipy.signal.butter(order, freq)
    return scipy.signal.filtfilt(b, a, data)

def plotFilteredSplines(timeframe, data, amount_points):
    # Generate evenly spread indices for the data points.
    indices = np.arange(0, len(data), amount_points)
    cutoff_freq = 2 / (2/10 * len(timeframe))
    # Smooth the data with butter :)
    data = butterFit(data, cutoff_freq)
    # Plot the filtered data.
    plt.plot(timeframe, data, '-.')
    interpol_x = np.linspace(timeframe[0], timeframe[-1], 100)
    # Get the cubic spline approximation function.
    interpolation = scipy.interpolate.interp1d(timeframe, data, kind='cubic')
    # Plot the interpolation over the extended time frame.
    plt.plot(interpol_x, interpolation(interpol_x), '-r')
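A hypothetical usage example with synthetic data (the names and values below are illustrative, not from the original post):
t = np.linspace(0, 1, 200)
noisy = np.sin(2 * np.pi * 3 * t) + 0.2 * np.random.randn(200)
plotFilteredSplines(t, noisy, amount_points=20)
plt.show()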
I'm trying to do a Gaussian fit to some experimental data, but I keep running into error after error. I've followed a few different threads online, but either the fit isn't good (it's just a horizontal line) or the code won't run at all. I'm following this code from another thread. Below is my code.
I apologize if my code seems a bit messy; there are some leftover bits from other attempts, hence the "astropy" import.
import math as m
import matplotlib.pyplot as plt
import numpy as np
from scipy import optimize as opt
import pandas as pd
import statistics as stats
from astropy import modeling
def gaus(x,a,x0,sigma, offset):
    return a*m.exp(-(x-x0)**2/(2*sigma**2)) + offset
# Python program to get average of a list
def Average(lst):
    return sum(lst) / len(lst)
wavelengths = [391.719, 391.984, 392.248, 392.512, 392.777, 393.041, 393.306, 393.57, 393.835, 394.099, 391.719, 391.455, 391.19, 390.926, 390.661, 390.396]
intensities = [511.85, 1105.85, 1631.85, 1119.85, 213.85, 36.85, 10.85, 6.85, 13.85, 7.85, 511.85, 200.85, 80.85, 53.85, 14.85, 24.85]
n=sum(intensities)
mean = sum(wavelengths*intensities)/n
sigma = m.sqrt(sum(intensities*(wavelengths-mean)**2)/n)
def gaus(x,a,x0,sigma):
    return a*m.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = opt.curve_fit(gaus,wavelengths,intensities,p0=[1,mean,sigma])
print(popt)
plt.scatter(wavelengths, intensities)
plt.title("Helium Spectral Line Peak 1")
plt.xlabel("Wavelength (nm)")
plt.ylabel("Intensity (a.u.)")
plt.show()
Thanks to the kind user below, my curve now fits reasonably well. However, one of the points seems to connect back to an earlier point in the plotted fit line.
There are two problems with your code. The first is that you are performing vector operations on plain Python lists, which gives you the first error at the line mean = sum(wavelengths*intensities)/n; you should use np.array instead. The second is that you call math.exp on a list, which also throws an error because it only accepts a real number; use np.exp here instead.
The following code solves your problem:
import matplotlib.pyplot as plt
import numpy as np
from scipy import optimize as opt
wavelengths = [391.719, 391.984, 392.248, 392.512, 392.777, 393.041,
393.306, 393.57, 393.835, 394.099, 391.719, 391.455,
391.19, 390.926, 390.661, 390.396]
intensities = [511.85, 1105.85, 1631.85, 1119.85, 213.85, 36.85, 10.85, 6.85,
13.85, 7.85, 511.85, 200.85, 80.85, 53.85, 14.85, 24.85]
wavelengths_new = np.array(wavelengths)
intensities_new = np.array(intensities)
n=sum(intensities)
mean = sum(wavelengths_new*intensities_new)/n
sigma = np.sqrt(sum(intensities_new*(wavelengths_new-mean)**2)/n)
def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = opt.curve_fit(gaus,wavelengths_new,intensities_new,p0=[1,mean,sigma])
print(popt)
plt.scatter(wavelengths_new, intensities_new, label="data")
plt.plot(wavelengths_new, gaus(wavelengths_new, *popt), label="fit")
plt.title("Helium Spectral Line Peak 1")
plt.xlabel("Wavelength (nm)")
plt.ylabel("Intensity (a.u.)")
plt.show()
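Regarding the follow-up about a point connecting back to an earlier one: the wavelengths list is not monotonically increasing (it jumps back to 391.719 halfway through), and plt.plot connects points in data order. Sorting by wavelength before drawing the fit line avoids the zig-zag; a minimal sketch using the arrays defined above:
order = np.argsort(wavelengths_new)  # indices that sort the points by wavelength
plt.scatter(wavelengths_new, intensities_new, label="data")
plt.plot(wavelengths_new[order], gaus(wavelengths_new[order], *popt), label="fit")
plt.legend()
plt.show()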
I wonder how to best solve the following problem in my script: "ValueError: x and y must have same first dimension, but have shapes (1531,) and (1532,)".
What is the problem here? The x and y inputs of the plot don't contain the same number of values, which produces the error message above.
Let us look at the code first:
# Initialize
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from matplotlib.pyplot import cm
# Numpy.loadtxt – Loads data from a textfile.
# Scipy.signal.welch – Creation of the power-spectrum via welch method. f, Welch creates the ideal frequencies (f, Welch = Power Spectrum or Power Spectral Density)
Subjects = ["Subject1" "Subject2"]
for Subject in Subjects:
Txt = np.loadtxt("/datadir.../{0}/filename...{0}.txt".format(Subject), comments="#", delimiter=None,
converters=None, skiprows=0, usecols=0, unpack=False, ndmin=0, encoding=None, max_rows=None, like=None)
f, Welch = signal.welch(Txt, fs=1.0, window="hann", nperseg=None, noverlap=None, nfft=3062, detrend="constant", return_onesided=True, scaling="density", axis=-1, average="mean")
BypassZero1 = f[f > 0.00000000000001] # Avoids "RuntimeWarning: divide by zero encountered in log"
BypassZero2 = Welch[Welch > 0.00000000000001]
Log_f = np.log(BypassZero1, out=BypassZero1, where=BypassZero1 > 0)
Log_Welch = np.log(BypassZero2, out=BypassZero2, where=BypassZero2 > 0)
plt.plot(Log_f, Log_Welch)
The BypassZero1 and BypassZero2 lines keep only the values above 0.00000000000001 in "f" and "Welch". Without this, "RuntimeWarning: divide by zero encountered in log" would occur in the following step, where I take the logarithm for both axes (Log_f and Log_Welch).
The error occurs at the last plt.plot line: after the filtering step, a different number of values is left over in "f" than in "Welch".
I wonder if there is a way to deal with the tiny values in the .txt file. Currently only values above 0.00000000000001 are kept for f and Welch, each filtered by its own values, which leads to different lengths for x and y and makes the data impossible to plot.
What could be a solution for this problem?
As you pointed out, the error message indicates that your two arrays have different lengths. That happens because each array is filtered with its own mask; the mask applied to the second array should be the same one built from the first. Replacing BypassZero2 = Welch[Welch > 0.00000000000001] with BypassZero2 = Welch[f > 0.00000000000001] should fix the issue.
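In context, the fix looks like this (a sketch of just the changed lines):
mask = f > 0.00000000000001   # one mask, built from f ...
BypassZero1 = f[mask]         # ... applied to both arrays,
BypassZero2 = Welch[mask]     # so the lengths always match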
Basically, the x and y coordinates we plot must have the same length so that the points pair up one to one.
So make sure their lengths are equal.
I have a single-column file (containing only one column) and a matrix file (containing 10 columns) of noisy data, and I want to plot the noise spectrum of both files using Python.
Sample data for the single-column file:
-1.064599999999999921e-02
-1.146800000000000076e-02
-1.011899999999999952e-02
-7.400200000000000007e-03
-4.306500000000000432e-03
-1.644800000000000081e-03
1.936600000000000127e-04
1.239199999999999980e-03
1.759200000000000043e-03
2.019799999999999981e-03
2.148699999999999916e-03
2.153099999999999806e-03
2.008799999999999822e-03
1.700899999999999981e-03
1.181500000000000042e-03
3.194000000000000116e-04
-1.072000000000000036e-03
-3.133799999999999954e-03
and sample data for the matrix file:
-2.596100000000000057e-03 -1.856000000000000011e-03 -1.821400000000000102e-02 5.023599999999999594e-03 -1.064599999999999921e-02 -1.906300000000000008e-02 -6.370799999999999380e-05 5.814800000000000177e-03 -5.391800000000000412e-03 -1.311000000000000013e-02
1.636700000000000047e-03 -8.651600000000000176e-04 -2.490799999999999959e-02 1.645399999999999988e-02 -1.146800000000000076e-02 -4.609199999999999929e-03 6.475800000000000306e-03 1.265800000000000085e-02 1.855799999999999898e-03 -5.387499999999999928e-03
4.516499999999999682e-03 1.438899999999999901e-03 -2.911599999999999952e-02 2.590800000000000047e-02 -1.011899999999999952e-02 2.378800000000000012e-02 1.080200000000000084e-02 1.994299999999999892e-02 8.882299999999999224e-03 2.866500000000000124e-03
5.604699999999999786e-03 4.557799999999999872e-03 -2.870800000000000088e-02 2.832300000000000095e-02 -7.400200000000000007e-03 2.882099999999999940e-02 1.145799999999999944e-02 2.488800000000000040e-02 1.367299999999999939e-02 8.998799999999999508e-03
4.797400000000000275e-03 7.657399999999999970e-03 -2.582800000000000026e-02 2.288000000000000103e-02 -4.306500000000000432e-03 8.315499999999999975e-03 7.967600000000000030e-03 2.487999999999999934e-02 1.516600000000000066e-02 1.177899999999999954e-02
2.314300000000000038e-03 9.749700000000000033e-03 -2.252099999999999935e-02 1.762000000000000025e-02 -1.644800000000000081e-03 -1.257800000000000064e-02 1.220600000000000070e-03 1.866299999999999903e-02 1.377199999999999952e-02 1.163999999999999931e-02
-1.290700000000000094e-03 9.894599999999999923e-03 -1.928900000000000059e-02 1.360300000000000051e-02 1.936600000000000127e-04 -2.438999999999999849e-02 -6.739199999999999878e-03 6.961199999999999853e-03 1.086299999999999939e-02 1.015199999999999957e-02
-5.137400000000000300e-03 7.453800000000000009e-03 -1.615099999999999869e-02 1.018799999999999914e-02 1.239199999999999980e-03 -1.585699999999999957e-02 -1.349500000000000005e-02 -7.773600000000000301e-03 7.680499999999999827e-03 9.148399999999999241e-03
-8.159500000000000086e-03 2.403600000000000094e-03 -1.270400000000000001e-02 5.359000000000000048e-03 1.759200000000000043e-03 -9.746799999999999908e-03 -1.730999999999999900e-02 -2.229599999999999985e-02 4.641100000000000433e-03 9.871700000000000613e-03
-9.419600000000000195e-03 -4.305599999999999705e-03 -8.259700000000000028e-03 -3.140800000000000015e-03 2.019799999999999981e-03 -5.883300000000000161e-03 -1.772100000000000064e-02 -2.695099999999999926e-02 1.592399999999999892e-03 1.255299999999999992e-02
-8.469000000000000833e-03 -1.101399999999999949e-02 -2.205400000000000155e-03 -1.641199999999999951e-02 2.148699999999999916e-03 -3.635199999999999890e-03 -1.558000000000000010e-02 -1.839000000000000010e-02 -1.408900000000000039e-03 1.642899999999999916e-02
-5.529599999999999967e-03 -1.553999999999999999e-02 5.413199999999999956e-03 -4.248000000000000040e-03 2.153099999999999806e-03 -2.403199999999999868e-03 -1.255099999999999966e-02 -8.339100000000000332e-03 -3.665700000000000035e-03 2.009499999999999828e-02
I tried following https://www.earthinversion.com/techniques/visualizing-power-spectral-density-demo-obspy/ but I could not make it work for my ASCII data set. I hope the experts here can help. Thanks in advance.
Maybe this can give you a start. Given your matrix of data in the file "x.data", this plots the raw data as 10 curves, then runs an FFT on each column and displays the result. The FFT isn't very interesting with only 12 points, but it should spark ideas.
There's still the question of how you define "noise". The signals you presented do not look very noisy, and unless you know what kind of signal you're expecting, an FFT might not do much good.
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack
data = np.loadtxt("x.data")
for i in range(data.shape[1]):
    plt.plot(data[:, i])
plt.show()

for i in range(data.shape[1]):
    f = scipy.fftpack.fft(data[:, i])
    plt.plot(np.abs(f))
plt.show()
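If you know the sampling rate, you can also put a frequency axis in hertz on the spectrum with np.fft.fftfreq. A sketch; the fs value below is a placeholder for your real rate:
fs = 1.0  # placeholder: replace with your actual sampling rate in Hz
n = data.shape[0]
freqs = np.fft.fftfreq(n, d=1.0 / fs)[:n // 2]  # keep the positive frequencies
for i in range(data.shape[1]):
    spectrum = np.abs(scipy.fftpack.fft(data[:, i]))[:n // 2]
    plt.plot(freqs, spectrum)
plt.xlabel("Frequency (Hz)")
plt.show()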
Use numpy.loadtxt() to convert the data to a numpy array. Then you can apply the method described in the link you provided in order to obtain the spectra. E.g.:
import numpy as np
data = np.loadtxt("file.txt")
Then you plot the spectrum for that data. E.g.:
import matplotlib.pyplot as plt
import scipy.fftpack
yf = scipy.fftpack.fft(data)
fig, ax = plt.subplots()
ax.plot(np.abs(yf))
plt.show()
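For a noise spectrum specifically, a Welch power-spectral-density estimate is often smoother and easier to read than a raw FFT. A sketch under the assumption that the sampling rate is unknown (fs below is a placeholder):
from scipy import signal

fs = 1.0  # placeholder: set this to your real sampling rate in Hz
f, Pxx = signal.welch(data, fs=fs, nperseg=min(256, len(data)))
fig, ax = plt.subplots()
ax.semilogy(f, Pxx)  # PSDs are usually viewed on a log scale
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("PSD")
plt.show()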
My data file is shared in the following link.
We can plot this data using the following script.
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
def read_datafile(file_name):
    data = np.loadtxt(file_name, delimiter=',')
    return data
data = read_datafile('mah_data.csv')
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Data")
ax1.set_xlabel('t')
ax1.set_ylabel('s')
ax1.plot(data[:, 0], data[:, 1], c='r', label='My data')  # assuming column 0 is t and column 1 is s
leg = ax1.legend()
plt.show()
How can we detect peaks in Python? I haven't been able to find a suitable peak detection algorithm.
You can use the argrelextrema function in scipy.signal to return the indices of the local maxima or local minima of an array. This works for multi-dimensional arrays as well by specifying the axis.
import numpy as np
from scipy.signal import argrelextrema

ind_max = argrelextrema(z, np.greater)  # indices of the local maxima
ind_min = argrelextrema(z, np.less)     # indices of the local minima

maxvals = z[ind_max]
minvals = z[ind_min]
More specifically, one can use argrelmax or argrelmin to find the local maxima or minima directly, without passing a comparator. These also work for multi-dimensional arrays via the axis argument.
from scipy.signal import argrelmax, argrelmin

ind_max = argrelmax(z)  # indices of the local maxima
ind_min = argrelmin(z)  # indices of the local minima

maxvals = z[ind_max]
minvals = z[ind_min]
For more details, one can refer to this link: https://docs.scipy.org/doc/scipy/reference/signal.html#peak-finding
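On newer SciPy versions, scipy.signal.find_peaks is another convenient option, with height and distance thresholds built in. A minimal sketch; the column choice and the distance value are assumptions about your data:
import numpy as np
from scipy.signal import find_peaks

z = np.loadtxt("mah_data.csv", delimiter=",")[:, 1]  # assuming the signal is in the second column
peaks, properties = find_peaks(z, distance=5)        # indices of local maxima at least 5 samples apart
print(peaks, z[peaks])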
Try using PeakUtils (http://pythonhosted.org/PeakUtils/). Here is my solution to your question using peakutils.
import pandas as pd
import peakutils
data = pd.read_csv("mah_data.csv", header=None)
ts = data[0:10000][1] # Get the second column in the csv file
print(ts[0:10]) # Print the first 10 rows, for quick testing
# check peakutils for all the parameters.
# indices are the index of the points where peaks appear
indices = peakutils.indexes(ts, thres=0.4, min_dist=1000)
print(indices)
You should also checkout peak finding in scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html)
Try the findpeaks library.
pip install findpeaks
I cannot find the attached data, but suppose the data is a vector stored in data:
import pandas as pd
data = pd.read_csv("mah_data.csv", header=None).values
# Import library
from findpeaks import findpeaks
# If the resolution of your data is low, I would recommend the ``lookahead`` parameter, and if your data is "bumpy", also the ``smooth`` parameter.
fp = findpeaks(lookahead=1, interpolate=10)
# Find peaks
results = fp.fit(data)
# Make plot
fp.plot()
# Results with respect to original input data.
results['df']
# Results based on interpolated smoothed data.
results['df_interp']