How to center the signals to start around zero - python

I have been working with some sensor data and, while going through each sensor, I realised that some of them do not start around zero. Is there a way to shift these signals so that they are centred around zero? See the image below for the combined signal plot and the individual plots.
Example of one of the sensors can be found in this chat (https://chat.stackoverflow.com/rooms/238608/signal)
Code:
(Get the signal)
for fp in DataPathList:
    k += 1
    # print(k)
    # Load spreadsheet:
    print('Opened file number: {}'.format(fp))
    dataset = np.loadtxt(fname=fp)
    y = dataset[:, column_no]
    y_signal[k] = np.array(y)
    y_S1_max_signal[k] = np.max(np.array(dataset[:, 0]))
    S_F = 1000
    N = np.array(y_signal[k]).shape[0]
    S_T = 1 / S_F
    t_n = S_T * N # seconds of sampling
    x_time = np.linspace(0, t_n, N)
Edit 1:
I was able to solve this problem by subtracting the mean from each signal, which moves the plot so that it starts around 0. However, I have a follow-up question: will this cause a large change in my data?
for fp in DataPathList:
    k += 1
    # print(k)
    # Load spreadsheet:
    print('Opened file number: {}'.format(fp))
    dataset = np.loadtxt(fname=fp)
    y = dataset[:, column_no]
    y_signal[k] = np.array(y)
    y_signal[k] = np.array(y) - np.mean(np.array(y))
    y_S1_max_signal[k] = np.max(np.array(dataset[:, 0]))
    S_F = 1000
    N = np.array(y_signal[k]).shape[0]
    S_T = 1 / S_F
    t_n = S_T * N # seconds of sampling
    x_time = np.linspace(0, t_n, N)
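On the question of whether mean subtraction changes the data much: a minimal sketch, using a made-up signal rather than the real sensor files, suggests that only the constant offset is removed. The signal `y` and its offset of 3.2 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor trace: a 5 Hz sine riding on a constant offset of 3.2
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
y = 3.2 + 0.5 * np.sin(2 * np.pi * 5 * t) + 0.01 * rng.standard_normal(t.size)

# Same operation as in the edit above: subtract the mean of the signal
y_centered = y - np.mean(y)

# The spread and the sample-to-sample differences are unchanged;
# only the constant level has moved
same_spread = np.isclose(np.std(y), np.std(y_centered))
same_shape = np.allclose(np.diff(y), np.diff(y_centered))
print(np.mean(y_centered), same_spread, same_shape)
```

Subtracting a constant shifts every sample by the same amount, so amplitudes, peak-to-peak values and frequency content other than the zero-frequency term stay as they were. Note that the first sample ends up near zero only if it was already close to the mean.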

You might want to post code in order for people to be able to help you better. Generally, what you might want to do is centre your peak at 0. If you have a numpy array, use np.argmax(array) to find the index of its maximum. Then, when plotting with matplotlib, make sure that you subtract that index from your x values (the list/array that goes on your x-axis).
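The peak-centring idea could be sketched as follows. The time axis `t` and the signal `y` are hypothetical; when the x-axis holds times rather than sample indices, the value of the axis at the peak index is subtracted instead of the index itself.

```python
import numpy as np

# Hypothetical time axis (seconds) and a signal that peaks at t = 1.3 s
t = np.linspace(0.0, 2.0, 2001)
y = np.exp(-((t - 1.3) ** 2) / 0.01)

# Index of the maximum of the signal
peak_index = np.argmax(y)

# Shift the x-axis so that the peak sits at zero
t_shifted = t - t[peak_index]
```

Plotting `y` against `t_shifted` (for example with `plt.plot(t_shifted, y)`) would then show the maximum at x = 0.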

Related

How to go about data modelling?

I've spent the last two hours or so trying to figure out how to apply data modelling to my two variables. I am supposed to demonstrate/explain how I would handle the relationship between the following two variables in data modelling:
Pressure24h DangerLevel24h
1000.2 45
1014.8 90
990.8 14
998.4 95
1002.1 46
1006 21
There are another 185,000 rows to work with; this is just a very small sample. Pressure24h is measured in hectopascals and DangerLevel24h is a percentage. That's the only information I have to work with.
Is there any method that can be used to approach this?
I created a scatter plot to show the relationship but that was as far as I have gotten so far.
https://i.stack.imgur.com/Ty5Yn.png
Here's my code as discussed in the comments:
def lobf(*cords):
    cords = cords[0]
    print(cords)
    x_mean, y_mean = 0, 0
    print(cords)
    for x, y in cords:
        x_mean += x # get x sum
        y_mean += y # get y sum
    x_mean /= len(cords) # get x mean
    y_mean /= len(cords) # get y mean
    # Step 2 from https://www.varsitytutors.com/hotmath/hotmath_help/topics/line-of-best-fit
    sigma_numerator, sigma_denominator = 0, 0
    for xi, yi in cords:
        sigma_numerator += (xi - x_mean)*(yi - y_mean) # get numerator
        sigma_denominator += (xi - x_mean)**2 # get denominator
    m = sigma_numerator/sigma_denominator # get slope
    c = y_mean - m*x_mean # get y-intercept
    return m ,c
data_values = [(2,2), (4,4)] # Sample data value you can put yours here
# Creating a for loop for every increment of 5 to avoid the blue blob you got.
# You can change the increment as per your choice
predicted_values = []
increment = 5
m ,c = lobf(data_values)
for i in range(data_values[-1][0]+increment, len(data_values)*100, increment): # You can consider dangerlevel24h as your x
    """
    Starts with incrementing the last x value of your data
    """
    predicted_values.append((i,i*m + c)) # appends x, y
print(predicted_values)
You can then plot every value from predicted_values. Stepping through every 5th value (or whatever increment you choose) avoids the blue blob. This method also lets you predict values that aren't in your data. You can also look at Pearson's correlation coefficient, which is closely related to this method.
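The same line of best fit, together with Pearson's correlation coefficient, can also be obtained with NumPy. This is a sketch on the six sample rows from the question only; the variable names are illustrative and the full 185,000-row dataset may behave differently.

```python
import numpy as np

# The six sample rows quoted in the question
pressure = np.array([1000.2, 1014.8, 990.8, 998.4, 1002.1, 1006.0])
danger = np.array([45.0, 90.0, 14.0, 95.0, 46.0, 21.0])

# Least-squares straight line: danger ~ m * pressure + c
m, c = np.polyfit(pressure, danger, 1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(pressure, danger)[0, 1]

# Predicted danger level for a pressure value that is not in the data
predicted = m * 1010.0 + c
print(m, c, r, predicted)
```

The slope returned by `np.polyfit` is the same quantity that `lobf` computes by hand, and `r` indicates how strongly the two variables are linearly related (values near 0 mean a straight line explains little).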

Why does my Python For loop over count data more and more each loop?

I take image data, draw a line across the image, and study the intensity along that line. I then create two new lines that correspond to this intensity profile and take the intersection of these two lines as my result from the for loop. I go through multiple images in a directory and get a new result on each pass through the loop. However, I need to pass the degree as the argument of my function, gather my results from the for loop, then change the degree and rerun to get new data. When I try to print all of these at once, I get duplicate data, and the duplication grows with every call. Something is wrong and I can't quite tell what. I am something of a newbie, so I know my code looks sloppy, but I am stuck on this and I have a lot of image data, so I want to automate this as much as possible.
Thanks for reading my problem. Here is my code below:
path = "./Example Folder/Still_Images/" #IMPORTANT: Make sure path to image directory is accurate.
all_files = glob.glob(path + "*.SPE")
result = []
time = []
def func(degree):
    for filename in all_files:
        spe_file = winspec.SpeFile(os.path.join("",filename))
        image_data1 = ndimage.gaussian_filter(np.flipud(spe_file.data.reshape(1024,1024).T), sigma=10)
        run = 1000
        rad = degree*np.pi/180
        rise = run*np.arctan(rad)
        y_int = 540
        x_test = np.linspace(0, 1000, 1000)
        a0, b0 = 0, 540 - rise # The next few lines of code makes a line with a slope relevant to our study
        a1, b1 = 1000, 540
        pyth1 = 1000**2 + rise**2
        num1 = round(math.sqrt(pyth1))
        x1, y1 = np.linspace(a0, a1, num1), np.linspace(b0, b1, num1)
        zi = scipy.ndimage.map_coordinates(image_data1, np.vstack((y1,x1)))
        abs_max = np.where(zi==np.max(zi))[0][0]
        abs_min = np.where(zi==np.min(zi))[0][0]
        lowi = abs_max -200 #IMPORTANT: Want a better fit? Adjust -200 to -150, or -250. Play with it.
        maxi = abs_max
        difference = maxi - lowi
        f = np.linspace(0, num1, num1)
        a = np.linspace(lowi, maxi, difference)
        short = zi[lowi:maxi]
        fit = np.polyfit(a,short,1)
        ang_coeff = fit[0]
        intercept = fit[1]
        fit_eq = ang_coeff*a + intercept #obtaining the y axis values for the fitting function
        full_line = ang_coeff*f + intercept
        abs_full = abs(full_line)
        abs_min = min(abs_full)
        x_val = np.where(abs_full==abs_min)[0]
        #print(x_val[0])
        x_length = 1000 - x_val[0]
        y_length = 540 - x_length*np.arctan(rad)
        #print(x_length,y_length)
        #print(x_val) #X-positions
        result.append(x_val)
        #print(filename[30:33])
        ttt = (filename[30:33]) #<<<< IMPORTANT! You need to adjust these values to get your time data.
        time.append(ttt)
    timer = sorted (np.hstack(time))
    arr = sorted (np.hstack(result))
    #print(arr, timer, degree)
    #plt.plot(timer,arr)
    #plt.show() << Un comment to see plot (last plot provides all data)
    return degree, arr, timer
#print(arr)
print(func(-25), func(-20), func(-15), func(-10), func(-5), func(0), func(5), func(10), func(15), func(20), func(25))
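One likely contributor to the growing duplication is that result and time are created once, outside func, so every call keeps appending to the same two lists. The toy sketch below isolates that effect; the function names and the stand-in "files" are hypothetical and nothing here touches the image processing itself.

```python
results = []  # created once at module level, shared by every call


def collect_shared(degree, n_files=3):
    # Appends to the module-level list, so earlier calls' entries remain
    for i in range(n_files):
        results.append((degree, i))
    return list(results)


def collect_local(degree, n_files=3):
    # A fresh list per call, so each call only reports its own entries
    local_results = []
    for i in range(n_files):
        local_results.append((degree, i))
    return local_results


shared_lengths = [len(collect_shared(d)) for d in (-25, -20, -15)]
local_lengths = [len(collect_local(d)) for d in (-25, -20, -15)]
print(shared_lengths)  # [3, 6, 9]
print(local_lengths)   # [3, 3, 3]
```

If this is indeed the cause, creating the two lists inside the function body, before the loop over files, would give each degree its own set of results.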

Plotting the mean square displacement of a 2D random walk as a function of δt

I've already written code for a random walk of 10000 steps, repeated it 12 times, and stored each run in a separate text file (as the question required). I then calculated the mean square displacement (I'm not sure whether I did this correctly). I now need to 'plot my Mean Square Displacement as a function of δt, including errorbars σ = std(MSD)/√N, where std(MSD) is the standard deviation among the different runs and N is the number of runs.' and then compute the diffusion constant D from the curve and check that D = 2 (∆/dt) where dt = 1.
Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import random as rd
import math
a = (np.zeros((10000, 2), dtype=float)) # np.float was removed from recent NumPy releases
def randwalk(x,y):
    theta= 2*math.pi*rd.random()
    x+=math.cos(theta) # This uses the equation given, since we are told the spatial unit = 1
    y+=math.sin(theta)
    return (x,y)
x, y = 0.,0.
for i in range(10000): # Using for loop and range function to initialize the array
    x, y = randwalk(x,y)
    a[i,:] = x,y
fn_base = "random_walk_%i.txt" # Saves each run in a numbered text file, fn_base is a variable to hold format
N = 12
for j in range(N):
    rd.seed(j) # seed(j) explicitly sets the seed to random numbers
    x , y = 0., 0.
    for i in range(10000):
        x, y = randwalk(x,y)
        a[i,:] = x, y
    fn = fn_base % j
    np.savetxt(fn, a)
destinations = np.zeros((12, 2), dtype=float)
for j in range(12):
    x, y = 0., 0.
    for i in range(10000):
        x, y = randwalk(x, y)
    destinations[j] = x, y
square_distances = destinations[:,0] ** 2 + destinations[:,1] ** 2
m_s_d = np.mean(square_distances)
I think that to do this I just have to plot the MSD against the number of steps, but I'm not sure how. I saw a similar question on Stack Overflow, but its code is different from mine and I don't understand how to adapt it.
I tried the following next:
plt.figure()
t = 10000
plt.plot(m_s_d, t)
plt.show()
But this gives an error as the dimensions are not equal.
Edit: I think my issue is that I am trying to plot it against the number of steps when I should be plotting it against the change in time. However, I can't work out how to calculate the change in time dt.
Apologies in advance if the question isn't formulated well; I am fairly new to computing. Thank you.
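One possible reading of the task, offered only as a sketch: with dt = 1 per step, δt is simply the step count, and the MSD at each δt is the squared distance from the origin after δt steps, averaged over the runs. The walks are regenerated here with NumPy instead of being loaded from the saved text files, and the names n_runs, sq_disp, msd and sigma are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_steps = 12, 10_000

# Unit-length steps in random directions, one row per run
theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_runs, n_steps))
steps = np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # (runs, steps, 2)

# Position after each step, starting from the origin
positions = np.cumsum(steps, axis=1)

# Squared displacement from the origin for every run and every step
sq_disp = np.sum(positions ** 2, axis=-1)  # (runs, steps)

# With dt = 1 per step, the elapsed time is just the step count
dt = np.arange(1, n_steps + 1)

# Mean over runs, and error bar sigma = std / sqrt(N) as stated in the task
msd = sq_disp.mean(axis=0)
sigma = sq_disp.std(axis=0) / np.sqrt(n_runs)

# Slope of a straight-line fit through the MSD curve
slope = np.polyfit(dt, msd, 1)[0]
print(msd[:3], sigma[:3], slope)
```

Here msd and dt have the same length, which avoids the dimension mismatch from plotting a single number against 10000. Passing them to matplotlib as `plt.errorbar(dt, msd, yerr=sigma)` would draw the curve with error bars, and the fitted slope is the quantity from which the diffusion constant would be derived.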

Speeding up normal distribution probability mass allocation

We have N users with P avg. points per user, where each point is a single value between 0 and 1. We need to distribute the mass of each point using a normal distribution with a known standard deviation of 0.05, as the points have some uncertainty. Additionally, we need to wrap the mass around 0 and 1 so that, e.g., a point at 0.95 will also allocate mass around 0. I've provided a working example below, which bins the normal distribution into D=50 bins. The example uses the Python typing module, but you can ignore that if you'd like.
from typing import List, Any
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
D = 50
BINS: List[float] = np.linspace(0, 1, D + 1).tolist()
def probability_mass(distribution: Any, x0: float, x1: float) -> float:
    """
    Computes the area under the distribution, wrapping at 1.
    The wrapping is done by adding the PDF at +- 1.
    """
    assert x1 > x0
    return (
        (distribution.cdf(x1) - distribution.cdf(x0))
        + (distribution.cdf(x1 + 1) - distribution.cdf(x0 + 1))
        + (distribution.cdf(x1 - 1) - distribution.cdf(x0 - 1))
    )
def point_density(x: float) -> List[float]:
    distribution: Any = scipy.stats.norm(loc=x, scale=0.05)
    density: List[float] = []
    for i in range(D):
        density.append(probability_mass(distribution, BINS[i], BINS[i + 1]))
    return density
def user_density(points: List[float]) -> Any:
    # Find the density of each point
    density: Any = np.array([point_density(p) for p in points])
    # Combine points and normalize
    combined = density.sum(axis=0)
    return combined / combined.sum()
if __name__ == "__main__":
    # Example for one user
    data: List[float] = [.05, .3, .5, .5]
    density = user_density(data)
    # Example for multiple users (N = 2)
    print([user_density(x) for x in [[.3, .5], [.7, .7, .7, .9]]])
    ### NB: THE REMAINING CODE IS FOR ILLUSTRATION ONLY!
    ### NB: THE IMPORTANT THING IS TO COMPUTE THE DENSITY FAST!
    middle: List[float] = []
    for i in range(D):
        middle.append((BINS[i] + BINS[i + 1]) / 2)
    plt.bar(x=middle, height=density, width=1.0 / D + 0.001)
    plt.xlim(0, 1)
    plt.xlabel("x")
    plt.ylabel("Density")
    plt.show()
In this example N=1, D=50, P=4. However, we want to scale this approach to N=10000 and P=100 while being as fast as possible. It's unclear to me how we'd vectorize this approach. How can we best speed this up?
EDIT
The faster solution can have slightly different results. For instance, it could approximate the normal distribution instead of using the precise normal distribution.
EDIT2
We only care about computing density using the user_density() function. The plot is only to help explain the approach. We do not care about the plot itself :)
EDIT3
Note that P is the avg. points per user. Some users may have more and some may have less. If it helps, you can assume that we can throw away points such that all users have a max of 2 * P points. It's fine to ignore this part while benchmarking as long as the solution can handle a flexible # of points per user.
You could get below 50 ms for the largest case (N=10000, AVG[P]=100, D=50) by using FFT and creating the data in a numpy-friendly format. Otherwise it will be closer to 300 ms.
The idea is to convolve a single normal distribution centered at 0 with a series of Dirac deltas.
See image below:
Using circular convolution solves two issues.
naturally deals with wrapping at the edges
can be efficiently computed with FFT and Convolution Theorem
First, one must create a distribution to be copied. The function mk_bell() creates a histogram of a normal distribution with stddev 0.05 centered at 0.
The distribution wraps around 1. One could use an arbitrary distribution here. The spectrum of the distribution is computed and used for fast convolution.
Next, a comb-like function is created. The peaks are placed at indices corresponding to peaks in user density. E.g.
peak_location = [0.1, 0.3, 0.7]
D = 10
maps to
peak_index = (D * peak_location).astype(int) = [1, 3, 7]
dist = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0] # ones at [1, 3, 7]
You can quickly create a composition of Dirac deltas by computing the bin index of each peak location and accumulating with the np.bincount() function.
To speed things up even more, one can compute the comb functions for all users in parallel.
The array dist is a 2D array of shape NxD. It can be linearized to a 1D array of shape (N*D). After this change, the element at position [user_id, peak_index] is accessible at index user_id*D + peak_index.
With a numpy-friendly input format (described below) this operation is easily vectorized.
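The linearized-index trick can be sketched on a tiny case. The values below are made up for illustration (D = 8 and peak locations that are exact multiples of 1/8, to keep the arithmetic exact); the arrays uids, peaks and mass follow the input format this answer describes.

```python
import numpy as np

D = 8  # bins per user (illustrative)
N = 2  # number of users

# Flat input: user 0 has peaks at 0.125 and 0.375, user 1 has one at 0.875
uids = np.array([0, 0, 1])
peaks = np.array([0.125, 0.375, 0.875])
mass = np.array([0.5, 0.5, 1.0])

# Element [user_id, peak_index] lives at user_id * D + peak_index
pidx = uids * D + (D * peaks).astype(int)

# One bincount call builds the comb functions of all users at once
dist = np.bincount(pidx, weights=mass, minlength=N * D).reshape(N, D)
print(dist)
```

Each row of dist is the comb function of one user, with the mass of each peak placed in its bin; peaks that share a bin simply add up.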
The convolution theorem says that the spectrum of the convolution of two signals is equal to the product of the spectra of the individual signals.
The spectrum is computed with numpy.fft.rfft, which is a variant of the Fast Fourier Transform dedicated to real-only signals (no imaginary part).
Numpy can compute the FFT of each row of a larger matrix with one command.
Next, the spectrum of the convolution is computed by simple multiplication and use of broadcasting.
Next, the spectrum is transformed back to the "time" domain by the inverse Fourier transform implemented in numpy.fft.irfft.
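As a small sanity check of the convolution theorem in its circular form, the sketch below compares the FFT route with a direct wrap-around sum. The two random arrays and the length D = 16 are arbitrary test inputs, not part of the answer's data.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16
a = rng.random(D)
b = rng.random(D)

# FFT route: multiply the spectra, then transform back
via_fft = np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

# Direct circular convolution: c[k] = sum_j a[j] * b[(k - j) mod D]
j = np.arange(D)
direct = np.array([np.sum(a * b[(k - j) % D]) for k in range(D)])

print(np.allclose(via_fft, direct))  # True
```

The modulo in the direct sum is what makes the convolution wrap at the edges, which is the behaviour needed for mass near 0 and 1.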
To use the full speed of numpy, one should avoid variable-size data structures and keep to fixed-size arrays. I propose to represent the input data as three arrays:
uids, the identifier of the user, an integer in 0..N-1
peaks, the location of the peak
mass, the mass of the peak, currently 1/number-of-peaks-for-user
This representation of the data allows quick vectorized processing.
E.g.:
user_data = [[0.1, 0.3], [0.5]]
maps to:
uids = [0, 0, 1] # 2 points for user_data[0], one from user_data[1]
peaks = [0.1, 0.3, 0.5] # serialized user_data
mass = [0.5, 0.5, 1] # scaling factors for each peak, 0.5 means 2 peaks for user 0
The code:
import numpy as np
import matplotlib.pyplot as plt
import time
def mk_bell(D, SIGMA):
    # computes normal distribution wrapped and centered at zero
    x = np.linspace(0, 1, D, endpoint=False)
    x = (x + 0.5) % 1 - 0.5
    bell = np.exp(-0.5*np.square(x / SIGMA))
    return bell / bell.sum()
def user_densities_by_fft(uids, peaks, mass, D, N=None):
    bell = mk_bell(D, 0.05).astype('f4')
    sbell = np.fft.rfft(bell)
    if N is None:
        N = uids.max() + 1
    # ensure that peaks are in the [0..1) interval
    peaks = peaks - np.floor(peaks)
    # convert peak location from 0-1 to the indices
    pidx = (D * (peaks + uids)).astype('i4')
    dist = np.bincount(pidx, mass, N * D).reshape(N, D)
    # process all users at once with Convolution Theorem
    sdist = np.fft.rfft(dist)
    sdist *= sbell
    res = np.fft.irfft(sdist)
    return res
def generate_data(N, Pmean):
    # generator for large data
    data = []
    for n in range(N):
        # select P uniformly from 1..2*Pmean
        P = np.random.randint(2 * Pmean) + 1
        # select peak locations
        chunk = np.random.uniform(size=P)
        data.append(chunk.tolist())
    return data
def make_data_numpy_friendly(data):
    uids = []
    chunks = []
    mass = []
    for uid, peaks in enumerate(data):
        uids.append(np.full(len(peaks), uid))
        mass.append(np.full(len(peaks), 1 / len(peaks)))
        chunks.append(peaks)
    return np.hstack(uids), np.hstack(chunks), np.hstack(mass)
D = 50
# demo for simple multi-distribution
data, N = [[0, .5], [.7, .7, .7, .9], [0.05, 0.3, 0.5, 0.5]], None
uids, peaks, mass = make_data_numpy_friendly(data)
dist = user_densities_by_fft(uids, peaks, mass, D, N)
plt.plot(dist.T)
plt.show()
# the actual measurement
N = 10000
P = 100
data = generate_data(N, P)
tic = time.time()
uids, peaks, mass = make_data_numpy_friendly(data)
toc = time.time()
print(f"make_data_numpy_friendly: {toc - tic}")
tic = time.time()
dist = user_densities_by_fft(uids, peaks, mass, D, N)
toc = time.time()
print(f"user_densities_by_fft: {toc - tic}")
The results on my 4-core Haswell machine are:
make_data_numpy_friendly: 0.2733159065246582
user_densities_by_fft: 0.04064297676086426
It took 40 ms to process the data. Notice that converting the data to the numpy-friendly format takes more than 6 times longer than the actual computation of the distributions.
Python is really slow when it comes to looping.
Therefore I strongly recommend generating the input data directly in the numpy-friendly format in the first place.
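If the data does arrive as nested Python lists, part of the conversion can still be pushed into NumPy. The following is only a sketch of one alternative to make_data_numpy_friendly, built on np.repeat; it has not been benchmarked against the timings above, and the variable counts is an illustrative name.

```python
import numpy as np

user_data = [[0.1, 0.3], [0.5]]

# Number of points per user
counts = np.array([len(p) for p in user_data])

# Repeat each user id once per point
uids = np.repeat(np.arange(len(user_data)), counts)

# Serialize all peak locations into one flat array
peaks = np.concatenate(user_data)

# Each point of a user carries 1 / (number of that user's points)
mass = np.repeat(1.0 / counts, counts)

print(uids, peaks, mass)
```

This yields the same three arrays as the example mapping given earlier in this answer: uids = [0, 0, 1], peaks = [0.1, 0.3, 0.5], mass = [0.5, 0.5, 1].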
There are some issues to be fixed:
precision, which can be improved by using a larger D and downsampling
accuracy of peak location, which could be further improved by widening the spikes
performance: scipy.fft offers more variants of the FFT implementation that may be faster
This would be my vectorized approach:
data = np.array([0.05, 0.3, 0.5, 0.5])
np.random.seed(31415)
# random noise
randoms = np.random.normal(0,1,(len(data), int(1e5))) * 0.05
# samples with noise
samples = data[:,None] + randoms
# wrap [0,1]
samples = (samples % 1).ravel()
# histogram
hist, bins, patches = plt.hist(samples, bins=BINS, density=True)
Output:
I was able to reduce the time from about 4 seconds per sample of 100 datapoints to about 1 ms per sample.
It looks to me like you're spending quite a lot of time simulating a very large number of normal distributions. Since you're dealing with a very large sample size anyway, you may as well just use standard normal distribution values, because it'll all just average out anyway.
I recreated your approach (BaseMethod class), then created an optimized class (OptimizedMethod class), and evaluated them using a timeit decorator. The primary difference in my approach is the following line:
# Generate a standardized set of values to add to each sample to simulate normal distribution
self.norm_vals = np.array([norm.ppf(x / norm_val_n) * 0.05 for x in range(1, norm_val_n, 1)])
This creates a generic set of datapoints based on an inverse normal cumulative distribution function that we can add to each datapoint to simulate a normal distribution around that point. Then we just reshape the data into user samples and run np.histogram on the samples.
import numpy as np
import scipy.stats
from scipy.stats import norm
import time
# timeit decorator for evaluating performance
def timeit(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        print('%r %2.2f ms' % (method.__name__, (te - ts) * 1000 ))
        return result
    return timed
# Define Variables
N = 10000
D = 50
P = 100
# Generate sample data
np.random.seed(0)
data = np.random.rand(N, P)
# Run OP's method for comparison
class BaseMethod:
    def __init__(self, d=50):
        self.d = d
        self.bins = np.linspace(0, 1, d + 1).tolist()
    def probability_mass(self, distribution, x0, x1):
        """
        Computes the area under the distribution, wrapping at 1.
        The wrapping is done by adding the PDF at +- 1.
        """
        assert x1 > x0
        return (
            (distribution.cdf(x1) - distribution.cdf(x0))
            + (distribution.cdf(x1 + 1) - distribution.cdf(x0 + 1))
            + (distribution.cdf(x1 - 1) - distribution.cdf(x0 - 1))
        )
    def point_density(self, x):
        distribution = scipy.stats.norm(loc=x, scale=0.05)
        density = []
        for i in range(self.d):
            density.append(self.probability_mass(distribution, self.bins[i], self.bins[i + 1]))
        return density
    @timeit
    def base_user_density(self, data):
        n = data.shape[0]
        density = np.empty((n, self.d))
        for i in range(data.shape[0]):
            # Find the density of each point
            row_density = np.array([self.point_density(p) for p in data[i]])
            # Combine points and normalize
            combined = row_density.sum(axis=0)
            density[i, :] = combined / combined.sum()
        return density
base = BaseMethod(d=D)
# Only running base method on first 2 rows of data because it's slow
density = base.base_user_density(data[:2])
print(density[:2, :5])
class OptimizedMethod:
    def __init__(self, d=50, norm_val_n=50):
        self.d = d
        self.norm_val_n = norm_val_n
        self.bins = np.linspace(0, 1, d + 1).tolist()
        # Generate a standardized set of values to add to each sample to simulate normal distribution
        self.norm_vals = np.array([norm.ppf(x / norm_val_n) * 0.05 for x in range(1, norm_val_n, 1)])
    @timeit
    def optimized_user_density(self, data):
        samples = np.empty((data.shape[0], data.shape[1], self.norm_val_n - 1))
        # transform datapoints to normal distributions around datapoint
        for i in range(self.norm_vals.shape[0]):
            samples[:, :, i] = data + self.norm_vals[i]
        samples = samples.reshape(samples.shape[0], -1)
        #wrap around [0, 1]
        samples = samples % 1
        #loop over samples for density
        density = np.empty((data.shape[0], self.d))
        for i in range(samples.shape[0]):
            hist, bins = np.histogram(samples[i], bins=self.bins)
            density[i, :] = hist / hist.sum()
        return density
om = OptimizedMethod()
#Run optimized method on first 2 rows for apples to apples comparison
density = om.optimized_user_density(data[:2])
#Run optimized method on full data
density = om.optimized_user_density(data)
print(density[:2, :5])
Running on my system, the original method took about 8.4 seconds to run on 2 rows of data, while the optimized method took 1 millisecond to run on 2 rows of data and completed 10,000 rows in 4.7 seconds. I printed the first five values of the first 2 samples for each method.
'base_user_density' 8415.03 ms
[[0.02176227 0.02278653 0.02422535 0.02597123 0.02745976]
[0.0175103 0.01638513 0.01524853 0.01432158 0.01391156]]
'optimized_user_density' 1.09 ms
'optimized_user_density' 4755.49 ms
[[0.02142857 0.02244898 0.02530612 0.02612245 0.0277551 ]
[0.01673469 0.01653061 0.01510204 0.01428571 0.01326531]]

Find time shift of two signals using cross correlation

I have two signals which are related to each other and have been captured by two different measurement devices simultaneously.
Since the two measurements are not time synchronized there is a small time delay between them which I want to calculate. Additionally, I need to know which signal is the leading one.
The following can be assumed:
little or no noise is present
speed of the algorithm is not an issue, only accuracy and robustness
signals are captured with a high sampling rate (>10 kHz) for several seconds
expected time delay is < 0.5s
I thought of using cross-correlation for that purpose.
Any suggestions on how to implement that in Python are much appreciated.
Please let me know if I should provide more information in order to find the most suitable algorithm.
A popular approach: timeshift is the lag corresponding to the maximum cross-correlation coefficient. Here is how it works with an example:
import matplotlib.pyplot as plt
from scipy import signal
import numpy as np
def lag_finder(y1, y2, sr):
    n = len(y1)
    corr = signal.correlate(y2, y1, mode='same') / np.sqrt(signal.correlate(y1, y1, mode='same')[int(n/2)] * signal.correlate(y2, y2, mode='same')[int(n/2)])
    delay_arr = np.linspace(-0.5*n/sr, 0.5*n/sr, n)
    delay = delay_arr[np.argmax(corr)]
    print('y2 is ' + str(delay) + ' behind y1')
    plt.figure()
    plt.plot(delay_arr, corr)
    plt.title('Lag: ' + str(np.round(delay, 3)) + ' s')
    plt.xlabel('Lag')
    plt.ylabel('Correlation coeff')
    plt.show()
# Sine sample with some noise and copy to y1 and y2 with a 1-second lag
sr = 1024
y = np.linspace(0, 2*np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 5, y.shape)
y1 = y[sr:4*sr]
y2 = y[:3*sr]
lag_finder(y1, y2, sr)
In the case of noisy signals, it is common to apply band-pass filters first. Harmonic noise can be removed by identifying and removing the corresponding spikes in the frequency spectrum.
Numpy has a correlate function which suits your needs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html
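A minimal sketch of how numpy.correlate could be used for this, on synthetic data: the sampling rate fs, the 37-sample delay and the white-noise test signal are all assumptions for illustration. In "full" mode the output index maps to lags running from -(len(y1) - 1) to len(y2) - 1.

```python
import numpy as np

fs = 10_000.0    # assumed sampling rate in Hz
true_delay = 37  # delay in samples built into the test data

rng = np.random.default_rng(0)
base = rng.standard_normal(2000 + true_delay)

# y2 is a copy of y1 delayed by true_delay samples
y1 = base[true_delay:]
y2 = base[:2000]

# Full cross-correlation of y2 against y1
corr = np.correlate(y2, y1, mode="full")

# Lag (in samples) belonging to each entry of corr
lags = np.arange(-(len(y1) - 1), len(y2))

lag_samples = lags[np.argmax(corr)]
delay_seconds = lag_samples / fs

# Positive lag: y2 lags behind y1, so y1 is the leading signal
leading = "y1" if lag_samples > 0 else "y2" if lag_samples < 0 else "neither"
print(lag_samples, delay_seconds, leading)
```

The sign of the lag answers the "which signal is leading" part of the question, and dividing by the sampling rate converts samples to seconds. For signals several seconds long at more than 10 kHz, the direct sum in numpy.correlate becomes slow, and an FFT-based correlation such as the scipy.signal.correlate call used in the other answers may be preferable.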
To complement Reveille's answer above (I reproduce his algorithm), I would like to point out some ideas for preprocessing the input signals.
Since there seems to be no one-size-fits-all approach (duration in periods, resolution, offset, noise, signal type, ...), you may want to experiment with it.
In my example, applying a window function improves the detected phase shift (within the resolution of the discretization).
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
r2d = 180.0/np.pi # conversion factor RAD-to-DEG
delta_phi_true = 50.0/r2d
def detect_phase_shift(t, x, y):
    '''detect phase shift between two signals from cross correlation maximum'''
    N = len(t)
    L = t[-1] - t[0]
    cc = signal.correlate(x, y, mode="same")
    i_max = np.argmax(cc)
    phi_shift = np.linspace(-0.5*L, 0.5*L , N)
    delta_phi = phi_shift[i_max]
    print("true delta phi = {} DEG".format(delta_phi_true*r2d))
    print("detected delta phi = {} DEG".format(delta_phi*r2d))
    print("error = {} DEG resolution for comparison dphi = {} DEG".format((delta_phi-delta_phi_true)*r2d, dphi*r2d))
    print("ratio = {}".format(delta_phi/delta_phi_true))
    return delta_phi
L = np.pi*10+2 # interval length [RAD], for generality not multiple period
N = 1001 # interval division, odd number is better (center is integer)
noise_intensity = 0.0
X = 0.5 # amplitude of first signal..
Y = 2.0 # ..and second signal
phi = np.linspace(0, L, N)
dphi = phi[1] - phi[0]
'''generate signals'''
nx = noise_intensity*np.random.randn(N)*np.sqrt(dphi)
ny = noise_intensity*np.random.randn(N)*np.sqrt(dphi)
x_raw = X*np.sin(phi) + nx
y_raw = Y*np.sin(phi+delta_phi_true) + ny
'''preprocessing signals'''
x = x_raw.copy()
y = y_raw.copy()
window = signal.windows.hann(N) # Hanning window
#x -= np.mean(x) # zero mean
#y -= np.mean(y) # zero mean
#x /= np.std(x) # scale
#y /= np.std(y) # scale
x *= window # reduce effect of finite length
y *= window # reduce effect of finite length
print(" -- using raw data -- ")
delta_phi_raw = detect_phase_shift(phi, x_raw, y_raw)
print(" -- using preprocessed data -- ")
delta_phi_preprocessed = detect_phase_shift(phi, x, y)
Without noise (to be deterministic) the output is
-- using raw data --
true delta phi = 50.0 DEG
detected delta phi = 47.864788975654 DEG
...
-- using preprocessed data --
true delta phi = 50.0 DEG
detected delta phi = 49.77938053468019 DEG
...
SciPy has a useful function for this, signal.correlation_lags, which complements the correlate function mentioned in other answers by returning the lag that belongs to each entry of the correlation output. The example at the bottom of its documentation page is useful:
import numpy as np
from scipy import signal
from numpy.random import default_rng
rng = default_rng()
x = rng.standard_normal(1000)
y = np.concatenate([rng.standard_normal(100), x])
correlation = signal.correlate(x, y, mode="full")
lags = signal.correlation_lags(x.size, y.size, mode="full")
lag = lags[np.argmax(correlation)]
Then lag would be -100
