I have a dataset containing around 1000 different time series. Some of them show clear periodicity, and some do not.
I want to automatically determine whether a time series has clear periodicity, so I know whether I need to apply seasonal decomposition before running outlier detection on it.
Here is a signal with daily periodicity; each sample is taken at a 15-minute interval.
To try to automatically detect the daily periodicity, I have tried different methods. The first approach uses the seasonality detectors from the Kats library.
from kats.consts import TimeSeriesData
from kats.detectors.seasonality import FFTDetector, ACFDetector

def detect_seasonality(df, feature, time_col, detector_type):
    df_kpi = df[[feature]].reset_index().rename(columns={feature: 'value'})
    ts = TimeSeriesData(df_kpi, time_col_name=time_col)
    if detector_type == 'fft':
        detector = FFTDetector(ts)
    elif detector_type == 'acf':
        detector = ACFDetector(ts)
    else:
        raise Exception("Detector types are fft or acf")
    detection = detector.detector()
    seasonality_presence = detection['seasonality_presence']
    return seasonality_presence
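For context, I call it like this (the KPI and time column names are just placeholders for my actual data):
seasonality_presence = detect_seasonality(df, feature='kpi_1', time_col='time', detector_type='fft')
print(seasonality_presence)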
This approach returned "False" for seasonality presence, with both the fft and the acf detector.
Another approach is to use the FFT directly:
import numpy as np
import scipy.signal
from matplotlib import pyplot as plt
L = np.array(df[kpi_of_interest].values)
L -= np.mean(L)
# Window signal
L *= scipy.signal.windows.hann(len(L))
fft = np.fft.rfft(L, norm="ortho")
plt.figure()
plt.plot(abs(fft))
But here we don't see any clear way to determine the daily periodicity I expected.
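For reference, here is a minimal sketch of how I can put a frequency axis on that spectrum (assuming the 15-minute sampling above, i.e. a sample spacing of 900 s), so the expected daily component at 1/86400 Hz can be located directly:
# Assumes L (detrended, windowed) and fft from the snippet above.
freqs = np.fft.rfftfreq(len(L), d=900.0)  # frequency axis in Hz
daily_freq = 1.0 / 86400.0                # expected daily component
plt.figure()
plt.plot(freqs, np.abs(fft))
plt.axvline(daily_freq, color="r", linestyle="--", label="1/day")
plt.legend()
plt.show()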
So, in order to automatically detect the daily periodicity, are there any other, better methods to apply here? Are there any necessary preprocessing steps I should do beforehand? Or could it simply be a lack of data? I only have around 10 days of data for each time series.
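For completeness, here is a minimal sketch of a lag-based check I have also been considering, using statsmodels: with 15-minute samples a daily period corresponds to a lag of 96 samples, so a strong autocorrelation at that lag would suggest daily seasonality (the 0.5 threshold is an arbitrary assumption):
import numpy as np
from statsmodels.tsa.stattools import acf

def has_daily_periodicity(series, samples_per_day=96, threshold=0.5):
    values = np.asarray(series, dtype=float)
    # Autocorrelation up to one full day; fft=True keeps it fast for long series.
    autocorr = acf(values, nlags=samples_per_day, fft=True)
    return autocorr[samples_per_day] > threshold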
Related
I have battery voltage data with respect to Datetime, collected over one month. I need to find the number of battery cycles. Here is my data: https://docs.google.com/spreadsheets/d/1K0XspcrpO94mv2wFgW45DzSjAO0Af1uSwmoHHVxDJwI/edit?usp=sharing
I used a peak detection algorithm, but it is not detecting the cycles correctly. Is there a way to find the cycles from this data?
Here is the code I used:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

np.set_printoptions(threshold=np.inf)
x = np.array(df["voltage"])  # voltage column as a numpy array
peak, _ = find_peaks(x, width=800, prominence=1, height=51)
fig = plt.figure(figsize=(19, 8))
plt.plot(x)
plt.xlim(0, 45000)
plt.plot(peak, x[peak], "x", color='r')
Here is my output:
How can I do it in another way? Any suggestions?
Thanks in advance.
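For reference, one variation I am considering is to smooth the voltage before running find_peaks and to constrain the spacing between peaks instead of their width; this is only a sketch, and the window size and distance values below are guesses that would need tuning against the data:
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import uniform_filter1d

x = np.array(df["voltage"], dtype=float)
# Smooth out measurement noise before peak detection (window size is a guess).
x_smooth = uniform_filter1d(x, size=200)
# Require a minimum spacing between peaks rather than a fixed width/height.
peaks, _ = find_peaks(x_smooth, prominence=1, distance=1000)
print(f"detected cycles: {len(peaks)}")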
I want to align two signals that are similar but shifted using cross-correlation. While this question has been answered a few times before (see the references at the bottom), this situation is slightly different and/or I was unable to get those solutions to work in my application.
The main difference is that the signals have different sampling rates and that I am inputting not just two signals, but their corresponding time vectors as well.
I thought I would be able to solve this problem by just interpolating both datasets onto the same time line, but I could not get this to work properly.
Here's what I have tried so far.
Create two signals at different sampling rates, with the second shifted by 7 seconds w.r.t. the first. Apart from the different sampling rates and the time shift, they are the same signal.
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import correlate
from scipy.interpolate import interp1d
dt1 = 2.4
t1 = np.arange(0,20,dt1)
y1 = np.sin(t1) + t1/10
dt2 = 1
t2 = np.arange(0,20,dt2)
y2 = np.sin(t2) + t2/10
offset_t2 = 7 # would want to recover this eventually.
t2 = t2 + offset_t2
In order to not have to deal with the issue of the different sampling rates, I interpolate the two datasets onto timelines with the same sampling rate (the coarser one).
max_dt = max(dt1,dt2)
t1_resampled = np.arange(t1[0],t1[-1],max_dt)
t2_resampled = np.arange(t2[0],t2[-1],max_dt)
y1_resampled = interp1d(t1,y1)(t1_resampled)
y2_resampled = interp1d(t2,y2)(t2_resampled)
I try to use the maximum of the cross-correlation to get the shift that I need to apply but that does not yield the right result as shown in this plot.
fig,axs=plt.subplots(2,1)
ax = axs[0]
ax.plot(t1,y1,"-o",label='y1')
ax.plot(t2,y2,"-o",label='y2')
xcorr = correlate(y1_resampled,y2_resampled)
argmax_index = np.argmax(xcorr)
shift = (argmax_index-(len(y2_resampled)+1))*max_dt
ax.plot(t2+shift,y2,"-o",label='y2 shifted')
ax = axs[1]
ax.plot(xcorr)
ax.scatter(argmax_index,xcorr[argmax_index],color='red')
axs[0].legend()
print(f"computed shift: {shift}\nexpected shift: {offset_t2}")
Clearly the blue and the green curve do not overlap and the computed shift of -4.8 does not match the offset of 7.
So I wonder if someone could help me implement the shift function that I need for my example. It should return a value delta_t such that when plotting (t1,y1) and (t2+delta_t,y2) the signals overlap as well as possible.
It should look something like the following snippet, but I am unable to implement it.
def shift(t1, y1, t2, y2) -> float:
    # If necessary, interpolate to the same sampling rate.
    # But this might not be necessary.
    max_dt = max(dt1, dt2)
    t1_resampled = np.arange(t1[0], t1[-1], max_dt)
    t2_resampled = np.arange(t2[0], t2[-1], max_dt)
    y1_resampled = interp1d(t1, y1)(t1_resampled)
    y2_resampled = interp1d(t2, y2)(t2_resampled)
    # Do something with the cross correlation ...
    # ...
    # delta_t = ...
    return delta_t
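For reference, here is a minimal sketch of how such a function could look, following the same interpolate-then-correlate idea; the function name compute_shift and the demeaning step are my own choices, and the lag-to-time conversion accounts for the different start times of the two resampled grids:
import numpy as np
from scipy.signal import correlate, correlation_lags
from scipy.interpolate import interp1d

def compute_shift(t1, y1, t2, y2) -> float:
    """Return delta_t so that (t2 + delta_t, y2) overlaps (t1, y1)."""
    # Resample both signals onto grids with a common step (the coarser one).
    dt = max(np.median(np.diff(t1)), np.median(np.diff(t2)))
    t1r = np.arange(t1[0], t1[-1], dt)
    t2r = np.arange(t2[0], t2[-1], dt)
    y1r = interp1d(t1, y1)(t1r)
    y2r = interp1d(t2, y2)(t2r)
    # Remove the mean so the correlation peak is driven by the waveform shape.
    y1r = y1r - y1r.mean()
    y2r = y2r - y2r.mean()
    # Lag (in samples) at which the cross-correlation is maximal.
    xcorr = correlate(y1r, y2r, mode="full")
    lags = correlation_lags(len(y1r), len(y2r), mode="full")
    lag = lags[np.argmax(xcorr)]
    # Convert the lag to time and account for the different grid origins.
    return (t1r[0] - t2r[0]) + lag * dt
For the signals defined above, this should return approximately -7, i.e. -offset_t2, which is the shift that moves y2 back onto y1. Note that correlation_lags was added in SciPy 1.6.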
References that did not help
Use of pandas.shift() to align datasets based on scipy.signal.correlate
Python aligning, stretching and synchronizing array data in python (signal processing)
Python cross correlation - why does shifting a timeseries not change the results (lag)?
I have taken an upper-air sounding from the UWYo database and am currently calculating the Brunt-Vaisala frequency (the 'squared' one, at the moment) using MetPy across several stations for some basic synoptic purposes.
A minimal and reproducible example of the code is as follows:
import metpy.calc as mpcalc
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
from metpy.units import units, pandas_dataframe_to_unit_arrays
from siphon.simplewebservice.wyoming import WyomingUpperAir

stations = ['RPLI', 'RPUB', '98433', 'RPMP', 'RPVP', 'RPMD']  # 6 stations
station_data = {}
date = datetime(2016, 8, 14, 0)

for station in stations:
    print(f'Getting {station}')
    df = pandas_dataframe_to_unit_arrays(WyomingUpperAir.request_data(date, station))
    df['theta'] = mpcalc.potential_temperature(df['pressure'], df['temperature'])
    df['bv_squared'] = mpcalc.brunt_vaisala_frequency_squared(df['height'], df['theta'])
    station_data[station] = df

mean_bv = []
for station in stations:
    df = station_data[station]
    keep_idx = (df['height'] >= 1000 * units.m) & (df['height'] <= 5 * units.km)
    mean_bv.append(np.mean(df['bv_squared'][keep_idx]).m)

plt.title("Atmospheric Stability")
plt.plot(mean_bv)
plt.show()
which produces a simple plot like this:
I would like to ask for help on how to smooth out those lines/data, for example by applying interpolation to produce a smooth curve. I'm a bit of a novice, so I look forward to your help and responses.
Essentially what you're looking for is to smooth, or low-pass filter, the data.
One option is to fit the data points to some kind of appropriate curve (polynomial, spline, exponential, etc.) and replace the original data values with those computed from the curve. You can look at some of the tools in scipy.optimize to do the fit.
For filtering, there are a variety of options, from a moving average to more traditional filters; a good, simple choice here is a Savitzky-Golay filter. scipy.signal has a lot of tools to help you with this.
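As a minimal sketch of both ideas, assuming the mean_bv list from the question (one value per station); the window length, polynomial order, and spline degree below are only illustrative:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
from scipy.interpolate import make_interp_spline

x = np.arange(len(mean_bv))

# Option 1: smooth the existing points with a Savitzky-Golay filter.
smoothed = savgol_filter(mean_bv, window_length=5, polyorder=2)

# Option 2: fit a cubic spline and evaluate it on a dense grid for a smooth curve.
x_dense = np.linspace(x[0], x[-1], 200)
spline = make_interp_spline(x, mean_bv, k=3)

plt.plot(x, mean_bv, "o", label="station means")
plt.plot(x, smoothed, label="Savitzky-Golay")
plt.plot(x_dense, spline(x_dense), label="cubic spline")
plt.legend()
plt.show()
With only six stations, the spline passes exactly through the points, while the Savitzky-Golay filter smooths them.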
I'm trying to do the following:
Extract the melody of me asking a question (the word "Hey?" recorded to wav), so I get a melody pattern that I can apply to any other recorded/synthesized speech (basically, how F0 changes in time).
Use polynomial interpolation (Lagrange?) so I get a function that describes the melody (approximately, of course).
Apply the function to another recorded voice sample (e.g. the word "Hey." so it is transformed into the question "Hey?", or transform the end of a sentence to sound like a question [e.g. "Is it ok." => "Is it ok?"]). Voila, that's it.
What have I done? Where am I?
First, I dived into the math behind the FFT and the basics of signal processing. I want to do this programmatically, so I decided to use Python.
I performed the FFT on the entire "Hey?" voice sample and got data in the frequency domain (please don't mind the y-axis units, I haven't normalized them).
So far so good. Then I decided to divide my signal into chunks to get clearer frequency information, peaks and so on. This was a blind shot, me trying to grasp the idea of manipulating the frequency content and analyzing the audio data. It gets me nowhere, however, at least not in the direction I want.
Now, if I took those peaks, got an interpolated function from them, applied the function to another voice sample (a part of a voice sample that is also FFT'd, of course) and performed the inverse FFT, I wouldn't get what I want, right?
I would only change the magnitude, so it wouldn't affect the melody itself (I think).
Then I used the spectrogram and pyin functions from librosa to extract the real F0 over time, the melody of asking the question "Hey?". As we would expect, we can clearly see an increase in frequency:
And a non-question statement looks like this; let's say it's more or less constant.
The same applies to a longer speech sample:
Now, I assume that I have the blocks to build my algorithm/process, but I still don't know how to assemble them because there are some blanks in my understanding of what's going on under the hood.
I think I need to find a way to map the F0-over-time curve from the spectrogram to the "pure" FFT data, get an interpolated function from it, and then apply that function to another voice sample.
Is there any elegant (inelegant would be OK too) way to do this? I need to be pointed in the right direction because I can feel I'm close, but I'm basically stuck.
The code behind the above charts is taken straight from the librosa docs and other Stack Overflow questions; it's just a draft/POC, so please don't comment on style, if you could :)
fft in chunks:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
import os
file = os.path.join("dir", "hej_n_nat.wav")
fs, signal = wavfile.read(file)
CHUNK = 1024
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0, fs, CHUNK)[0:int(fs / 2)]
spectrogram_chunk = freqs / np.amax(freqs * 1.0)
# Plot spectral analysis
plt.plot(freqs[0:250], afft[0:250])
plt.show()
spectrogram:
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import os
file = os.path.join("/path/to/dir", "hej_n_nat.wav")
y, sr = librosa.load(file, sr=44100)
f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'))
times = librosa.times_like(f0)
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
fig, ax = plt.subplots()
img = librosa.display.specshow(D, x_axis='time', y_axis='log', ax=ax)
ax.set(title='pYIN fundamental frequency estimation')
fig.colorbar(img, ax=ax, format="%+2.f dB")
ax.plot(times, f0, label='f0', color='cyan', linewidth=2)
ax.legend(loc='upper right')
plt.show()
Hints, questions and comments much appreciated.
The problem was that I didn't know how to modify the fundamental frequency (F0); by modifying it I mean modifying F0 and its harmonics as well.
The spectrograms in question show, for each point in time, the power (dB) at each frequency bin.
Since I know which time bin holds which frequency from the melody (the green line below)...
...I need to compute a function that represents that green line, so I can apply it to other speech samples.
So I need an interpolation method that takes the sampled F0 points as its parameters.
One thing to remember is that for exact polynomial interpolation the degree should be one less than the number of points. The example unfortunately doesn't do that, but the effect is acceptable for a prototype.
def _get_bin_nr(val, bins):
    the_bin_no = np.nan
    for b in range(0, bins.size - 1):
        if bins[b] <= val < bins[b + 1]:
            the_bin_no = b
        elif val > bins[bins.size - 1]:
            the_bin_no = bins.size - 1
    return the_bin_no

def calculate_pattern_poly_coeff(file_name):
    # ROOT_DIR, sr and n_fft are module-level settings of the project.
    y_source, sr_source = librosa.load(os.path.join(ROOT_DIR, file_name), sr=sr)
    f0_source, voiced_flag, voiced_probs = librosa.pyin(y_source, fmin=librosa.note_to_hz('C2'),
                                                        fmax=librosa.note_to_hz('C7'), pad_mode='constant',
                                                        center=True, frame_length=4096, hop_length=512, sr=sr_source)
    all_freq_bins = librosa.core.fft_frequencies(sr=sr, n_fft=n_fft)
    f0_freq_bins = list(filter(lambda x: np.isfinite(x), map(lambda val: _get_bin_nr(val, all_freq_bins), f0_source)))
    return np.polynomial.polynomial.polyfit(np.arange(0, len(f0_freq_bins), 1), f0_freq_bins, 3)

def calculate_pattern_poly_func(coefficients):
    # polyfit returns coefficients in increasing degree order, while poly1d
    # expects decreasing order, hence the reversal.
    return np.poly1d(coefficients[::-1])
The method calculate_pattern_poly_coeff calculates the polynomial coefficients.
Using numpy's poly1d I can build a function that can modify the speech. How do I do that?
I just need to move all values up or down vertically at a certain point in time.
For instance, I want to move all frequencies at the 0.75-second time bin up (say, three bins up), which means the frequencies are increased and the melody at that point sounds higher.
Code:
def transform(sentence_audio_sample, mode=None, show_spectrograms=False, frames_from_end_to_transform=12):
    # cutting out silence
    y_trimmed, idx = librosa.effects.trim(sentence_audio_sample, top_db=60, frame_length=256, hop_length=64)
    stft_original = librosa.stft(y_trimmed, hop_length=hop_length, pad_mode='constant', center=True)
    stft_original_roll = stft_original.copy()
    rolled = stft_original_roll.copy()
    source_frames_count = np.shape(stft_original_roll)[1]
    sentence_ending_first_frame = source_frames_count - frames_from_end_to_transform
    sentence_len = np.shape(stft_original_roll)[1]
    for i in range(sentence_ending_first_frame + 1, sentence_len):
        # _question_pattern / _exclamation_pattern are the polynomial functions
        # produced by calculate_pattern_poly_func above.
        if mode == 'question':
            by = int(_question_pattern(i) / 500)
        elif mode == 'exclamation':
            by = int(_exclamation_pattern(i) / 500)
        else:
            by = 0
        rolled = _roll_column(rolled, i, by)
    transformed_data = librosa.istft(rolled, hop_length=hop_length, center=True)
    return transformed_data

def _roll_column(two_d_array, column, shift):
    two_d_array[:, column] = np.roll(two_d_array[:, column], shift)
    return two_d_array
In this case I am simply rolling frequencies up or down within a given time bin.
This needs to be polished, as it doesn't take into consideration the actual state of the transformed sample; it just rolls it up or down according to the factor calculated from the polynomial function computed earlier.
You can check the full code of my project on GitHub; the "audio" package contains the pattern calculator and the audio transform algorithm described above.
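For orientation, here is a hypothetical sketch of how the pieces above fit together; the file names, and the globals sr, hop_length, n_fft and ROOT_DIR, are assumptions:
# Build the melody pattern from a recorded question and apply it to a statement.
coeff = calculate_pattern_poly_coeff('hey_question.wav')
_question_pattern = calculate_pattern_poly_func(coeff)

y, _ = librosa.load(os.path.join(ROOT_DIR, 'is_it_ok_statement.wav'), sr=sr)
transformed = transform(y, mode='question')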
Feel free to ask if something's unclear :)
There are several threads asking for a way to simulate time-inhomogeneous Poisson processes in Python. The NeuroTools module offers a simple way to do so via the inh_poisson_generator() function; the help text for this function is included at the bottom of this post. The function was originally designed to simulate spike trains, and it uses the thinning method.
I would like to simulate a spike train over 2000 ms. The spike rate (in hertz) changes every millisecond and lies between 20 and 160 spikes/second. I've tried to simulate this using the following code:
import NeuroTools
import numpy as np
from NeuroTools import stgen
import matplotlib.pyplot as plt
import random

st_gen = stgen.StGen()
time = np.arange(0, 2000)
t_rate = []
for i in range(2000):
    t_rate.append(random.randrange(20, 161, 1))
t_rate = np.array(t_rate)
Psim = st_gen.inh_poisson_generator(rate=t_rate, t=time, t_stop=2000, array=True)
However, the code returns very few timestamps (e.g., array([ 397.55345905, 1208.79804513, 1478.03525045, 1982.63643262])), which doesn't make sense to me. I would appreciate any help on this.
inh_poisson_generator(self, rate, t, t_stop, array=False) method of NeuroTools.stgen.StGen instance
Returns a SpikeTrain whose spikes are a realization of an inhomogeneous
poisson process (dynamic rate). The implementation uses the thinning
method, as presented in the references.
Inputs:
rate - an array of the rates (Hz) where rate[i] is active on interval
[t[i],t[i+1]]
t - an array specifying the time bins (in milliseconds) at which to
specify the rate
t_stop - length of time to simulate process (in ms)
array - if True, a numpy array of sorted spikes is returned,
rather than a SpikeList object.
Note:
t_start=t[0]
References:
Eilif Muller, Lars Buesing, Johannes Schemmel, and Karlheinz Meier
Spike-Frequency Adapting Neural Ensembles: Beyond Mean Adaptation and Renewal Theories
Neural Comput. 2007 19: 2958-3010.
Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.
Examples:
>> time = arange(0,1000)
>> stgen.inh_poisson_generator(time,sin(time), 1000)
I don't really have an answer for you, but because this post helped me get started with NeuroTools, I thought I'd share my small example, which is working fine.
For inh_poisson_generator() the rate input is in Hz and all times are in ms. I use an average rate of 1.6 spikes/ms, so I expect to receive ~4000 events, and the results confirm that just fine.
I guess it might be an issue that you are using a non-continuous rate; however, I barely know anything about the algorithm implemented for this function.
I hope my example can help you somehow!
import NeuroTools
from NeuroTools import stgen
import numpy as np
import math

v0 = 1.6     # baseline rate in spikes/ms
Amp = 1      # amplitude in spikes/ms
w = 4/1000   # modulation frequency in cycles/ms

st_gen = stgen.StGen()
tstop = 2500.0
intervals = np.arange(0, tstop, 0.05)
rate = np.array([])
for tt in intervals:
    v_next = v0 + Amp*math.sin(2*math.pi*w*tt)
    if v_next > 0.0:
        rate = np.append(rate, v_next*1000)  # convert spikes/ms to Hz
    else:
        rate = np.append(rate, 0.0)

# important to have rate in Hz and all other times in ms
PSim = st_gen.inh_poisson_generator(rate=rate, t=intervals, t_stop=2500.0, array=True)
print(len(PSim))                  # number of generated spikes
print(np.mean(rate)/1000*tstop)   # expected number of spikes