Compute time series from PSD in Python
I have a signal spectrum (PSD) that looks like this: [plot not included]
The frequency axis of the PSD is np.linspace(0,2,500). I want to convert this spectrum into a time series of 600 s. The code is shown below:
import numpy as np

def spectrumToSeries(timeSeries, frequency, psdLoad):
    '''
    Function that, given a PSD, converts it into a time series
    '''
    # Obtain the frequency interval
    df = frequency[2] - frequency[1]
    # Obtain the spectrum amplitudes
    amplitude = np.sqrt(2 * np.array(psdLoad) * df)
    # Generate random phases uniformly distributed in [-pi, pi)
    epsilon = -np.pi + 2 * np.pi * np.random.rand(len(amplitude))
    # Inverse Fourier transform (the output has len(amplitude) samples)
    randomSeries = len(timeSeries) * np.real(np.fft.ifft(amplitude * np.exp(1j * epsilon)))
    return randomSeries
However, my end result, obtained with the call below, looks like this: [plot not included]
timeSeries = spectrumToSeries(thrustBladed,param.frequency,analyticalThrustPSD[iwind])
The x axis refers to the number of points in the time series. However, the time series should be 600 s long. Any help? Thanks
The result of your function spectrumToSeries has the same length as the array you pass to np.fft.ifft, because ifft returns an array of the same length as its input.
So, because your initial psdLoad array has 500 elements, the amplitude array is 500 elements long too, and so is randomSeries, which is your function's result.
I don't really understand the inputs of your function. What is the first argument, timeSeries? Is it an empty array of 600 elements waiting for the result of the function?
I am trying to compute a time series from a PSD myself, so I'd love to see your function give a good result!
I think that if you want your time series to have 600 elements, you need "frequency" and "psdLoad" arrays of 600 elements. So what I am trying to do with my data set is to fit my psdLoad with a function (psdLoad = f(frequency)). Then I can set the size of my arrays to the length of the time series I want at the end, and compute the ifft...
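A minimal sketch of a different route, assuming the goal is a series of a given duration in seconds (600 s in the question) rather than a given number of points: the duration fixes the frequency spacing (df = 1/duration), the sampling rate fs fixes the number of samples (n = duration * fs), and the PSD only needs to be interpolated (here with np.interp, instead of a fitted function) onto the n/2 + 1 one-sided frequencies returned by np.fft.rfftfreq. The helper name psd_to_series, the choice fs = 4 Hz (twice the 2 Hz upper limit of the question's frequency axis) and the made-up Gaussian PSD are illustrative assumptions.

```python
import numpy as np


def psd_to_series(freq, psd, duration, fs, rng=None):
    """Return one random realisation of a signal with the given one-sided PSD.

    Hypothetical helper (not from the question). Assumes `psd` is one-sided,
    in units**2/Hz, sampled at `freq` (Hz), and that fs >= 2 * max(freq).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration * fs))                 # number of output samples
    f_k = np.fft.rfftfreq(n, d=1.0 / fs)          # 0 .. fs/2, spacing fs/n = 1/duration
    df = f_k[1] - f_k[0]
    s_k = np.interp(f_k, freq, psd, left=0.0, right=0.0)  # PSD on the FFT grid
    amp = np.sqrt(2.0 * s_k * df)                 # amplitude of each cosine component
    phase = rng.uniform(0.0, 2.0 * np.pi, size=f_k.size)  # independent random phases
    spectrum = 0.5 * n * amp * np.exp(1j * phase)  # irfft scaling: bin k -> amp_k * cos(...)
    spectrum[0] = 0.0                             # no DC offset
    if n % 2 == 0:
        spectrum[-1] = 0.0                        # drop the Nyquist bin
    return np.fft.irfft(spectrum, n=n)


# Usage with a made-up PSD on the question's frequency axis
freq = np.linspace(0, 2, 500)
psd = np.exp(-((freq - 0.5) / 0.1) ** 2)
fs = 4.0
x = psd_to_series(freq, psd, duration=600.0, fs=fs, rng=np.random.default_rng(0))
t = np.arange(x.size) / fs                        # time axis in seconds: 0 .. 599.75
```

Plotting x against t rather than against the sample index gives the 600 s axis. Because the cosines are orthogonal over the record, the mean square of x equals the sum of s_k * df over the retained bins, which is a convenient sanity check on the scaling.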
My own data is a record at 1 Hz over a day, so arrays of 86400 elements. I have to filter it using a PSD-based method. So I compute my PSD, which has 129 elements, and once I have filtered it I want to end up with my filtered time series.
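As a quick check of the 129-element figure: a one-sided PSD computed from segments of NFFT = 256 points has NFFT/2 + 1 = 129 frequencies, from 0 up to the Nyquist frequency fs/2 = 0.5 Hz for a 1 Hz record. The sketch below uses scipy.signal.welch as a substitute for the mlab.psd call in the code, with a random stand-in signal instead of the real record.

```python
import numpy as np
from scipy.signal import welch

fs = 1.0                                   # one sample per second
record = np.random.default_rng(1).standard_normal(86400)  # stand-in for one day of data
# 256-point Hann-windowed segments with 50 % overlap (128 samples), linear detrending
freq, psd = welch(record, fs=fs, nperseg=256, noverlap=128, detrend='linear')
print(len(freq), freq[0], freq[-1])        # 129 0.0 0.5
```

The frequency resolution is fs / 256, about 0.0039 Hz, so a fitted or interpolated PSD is needed to evaluate it on a finer grid.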
Here is my code:

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import scipy.optimize as optim

####################################################
## Computation of spectrum values: PSD & frequency ##
####################################################
# Up13_7april is my 1 Hz velocity record (86400 samples); time7april holds its timestamps
# noverlap is a number of samples, so 128 means 50 % overlap for NFFT=256
psd_ampl0, freq = mlab.psd(Up13_7april, NFFT=256, Fs=1, detrend=mlab.detrend_linear,
                           window=mlab.window_hanning, noverlap=128, sides='onesided')

#################################################
## Computation of the time series from the PSD ##
#################################################
def PSDToSeries(lenTimeSeries, freq, psdLoad):
    '''
    Function that, given a PSD, converts it into a time series
    '''
    # Obtain the frequency interval
    df = freq[2] - freq[1]
    print('df = ', df)
    # Obtain the spectrum amplitudes
    amplitude = (2 * psdLoad * df) ** 0.5
    # Generate random phases uniformly distributed in [-pi, pi);
    # the (1, N) shape means the result is a 2-D array with one row
    epsilon = -np.pi + 2 * np.pi * np.random.rand(1, len(amplitude))
    # Inverse Fourier transform
    randomSeries = lenTimeSeries * np.real(np.fft.ifft(amplitude * np.exp(1j * epsilon)))
    return randomSeries
#-------------------------------------------------------------------------

##########################################################
## Fitting a function on the PSD to give it more points ##
##########################################################
#def fitting_function(freq, a, b, c, d, e, f):
#    return a*(freq**5) + b*(freq**4) + c*(freq**3) + d*(freq**2) + e*freq + f
def fitting_function(freq, a, b):
    # Exponential model; every parameter is used, so curve_fit can estimate all of them
    return a * np.exp(freq * b)

# Look for the best fitting parameters of the chosen fitting function (skip the 0 Hz bin)
param_opt, pcov = optim.curve_fit(fitting_function, freq[1:], psd_ampl0[1:])
print('The best fitting parameters are: ', param_opt)

# Definition of the new PSD and frequency arrays, extended to 86400 elements
freq_extend = np.linspace(min(freq), max(freq), 86400)
psd_extend = fitting_function(freq_extend, *param_opt)

ts_length = Up13_7april.shape[0]  # Length of the time series I want to compute
print('ts_length = ', ts_length)
tsFromPSD = PSDToSeries(ts_length, freq_extend, psd_extend)
print('shape tsFromPSD: ', tsFromPSD.shape)

##################
## Plot section ##
##################
plt.figure(1)
plt.plot(freq[1:], psd_ampl0[1:], marker=',', ls='-', color='SkyBlue', label='original PSD')
plt.plot(freq_extend, psd_extend, marker=',', ls='-', color='DarkGreen', label='extended PSD')
plt.xlabel('Frequency [Hz]')
plt.ylabel('PSD of raw velocity module [(m/s)²/Hz]')
plt.grid(True)
plt.legend()

plt.figure(2)
plt.plot_date(time7april, Up13_7april, xdate=True, ydate=False, marker=',', ls='-', c='Grey', label='Original Signal')
plt.plot_date(time7april, tsFromPSD[0], xdate=True, ydate=False, marker=',', ls='-', label='After inverse PSD')
plt.suptitle('Original and Corrected time series for the 7th of April')
plt.grid(True)
plt.legend()
plt.show()
The array Up13_7april is my initial time series. In this code I am just computing the PSD and then going back to a time series, to compare the original signal with the final one. Here is the result: [plot not included; I can't post pictures yet because I'm new to Stack Overflow]
So my process is to find a function that fits the PSD. I use the SciPy function scipy.optimize.curve_fit, which returns the parameters that best fit your data with a function that you provide.
Once I have my parameters, I create new PSD and frequency arrays of 86400 elements. Finally, I use your PSDToSeries function to compute the time series.
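For reference, a minimal sketch of scipy.optimize.curve_fit on the same exponential model as in the code above, with synthetic noise-free data standing in for the PSD (the values a = 2 and b = -3 and the frequency grid are made up). The model function should take only the parameters it actually uses: an extra parameter that the model ignores leaves curve_fit unable to estimate its covariance. curve_fit starts from all-ones parameters by default, so passing a starting guess p0 with the expected sign of b can help it converge.

```python
import numpy as np
from scipy.optimize import curve_fit


def fitting_function(freq, a, b):
    # Exponential model a * exp(b * f), with two parameters only
    return a * np.exp(b * freq)


freq = np.linspace(0.004, 0.5, 128)        # skip 0 Hz, as the code does with freq[1:]
psd = fitting_function(freq, 2.0, -3.0)    # synthetic "measured" PSD
param_opt, pcov = curve_fit(fitting_function, freq, psd, p0=[1.0, -1.0])

# Evaluate the fitted model on a much denser frequency axis
freq_extend = np.linspace(freq.min(), freq.max(), 86400)
psd_extend = fitting_function(freq_extend, *param_opt)
```

With real PSD estimates, which span several decades, fitting the logarithm of the PSD is a common alternative that keeps the high-frequency tail from being ignored by the least-squares fit.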
I'm quite happy with the result... I think I just need to find a better fit for my PSD: [plot not included; again, I can't post pictures yet]
Any ideas?
Related
Transforming Pyplot x axis
I am trying to plot an audio sample's amplitude across the time domain. I've used scipy.io.wavfile to read the audio sample and determine the sampling rate:

from scipy.io.wavfile import read
import matplotlib.pyplot as plt

# read .wav file
data = read("/Users/user/Desktop/voice.wav")
# determine sample rate of .wav file
# print(data[0])  # 48000 samples per second
# 48000 (samples per second) * 4 (4 second sample) = 192000 samples overall
# store the data read from .wav file
audio = data[1]
# plot the data
plt.plot(audio[0:192000])  # see code above for how this value was determined

This creates a plot displaying amplitude on the y axis and sample number on the x axis. How can I transform the x axis to show seconds instead? I tried using plt.xticks, but based on the error I received I don't think this is the correct use case:

# label title, axis, show the plot
seconds = range(0, 4)
plt.xticks(range(0, 192000), seconds)
plt.ylabel("Amplitude")
plt.xlabel("Time")
plt.show()

ValueError: The number of FixedLocator locations (192000), usually from a call to set_ticks, does not match the number of ticklabels (4).
You need to pass a t vector to the plotting command. You can generate it on the fly with NumPy, so that after the command executes it is garbage collected (sooner or later, that is):

from numpy import linspace
plt.plot(linspace(0, 4, 192000, endpoint=False), audio[0:192000])
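The same idea in a slightly more general form: the time of sample i is i / fs, so the time axis can be built from the sample rate that scipy.io.wavfile.read returns instead of hard-coding 4 s and 192000 samples. A sketch, with a synthetic 440 Hz tone standing in for the .wav data (the 48 kHz rate is taken from the question):

```python
import numpy as np

fs = 48000                                  # samples per second, as reported in the question
n_samples = 4 * fs                          # 4 s of audio = 192000 samples
audio = np.sin(2 * np.pi * 440 * np.arange(n_samples) / fs)  # synthetic stand-in for data[1]

# Time of sample i is i / fs; pass this as the x values: plt.plot(t, audio)
t = np.arange(audio.shape[0]) / fs
```

Unlike plt.xticks, this changes the data coordinates themselves, so Matplotlib's automatic tick locator then labels the axis in seconds.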
Time series dBFS plot output modification - current output plot not as expected (matplotlib)
I'm trying to plot the Amplitude (dBFS) vs. Time (s) of an audio (.wav) file using matplotlib. I managed to do that with the following code:

def convert_to_decibel(sample):
    ref = 32768  # Using a signed 16-bit PCM format wav file. So, 2^15 is the max. magnitude.
    if sample != 0:
        return 20 * np.log10(abs(sample) / ref)
    else:
        return 20 * np.log10(0.000001)

from scipy.io.wavfile import read as readWav
from scipy.fftpack import fft
import matplotlib.pyplot as gplot1
import matplotlib.pyplot as gplot2
import numpy as np
import struct
import gc

wavfile1 = '/home/user01/audio/speech.wav'
wavsamplerate1, wavdata1 = readWav(wavfile1)
wavdlen1 = wavdata1.size
wavdtype1 = wavdata1.dtype

gplot1.rcParams['figure.figsize'] = [15, 5]
pltaxis1 = gplot1.gca()
gplot1.axhline(y=0, c="black")
gplot1.xticks(np.arange(0, 10, 0.5))
gplot1.yticks(np.arange(-200, 200, 5))
gplot1.grid(linestyle='--')

wavdata3 = np.array([convert_to_decibel(i) for i in wavdata1], dtype=np.int16)
yvals3 = wavdata3
t3 = wavdata3.size / wavsamplerate1
xvals3 = np.linspace(0, t3, wavdata3.size)

pltaxis1.set_xlim([0, t3 + 2])
pltaxis1.set_title('Amplitude (dBFS) vs Time(s)')
pltaxis1.plot(xvals3, yvals3, '-')

which gives the following output: [plot not included]

I had also plotted the Power Spectral Density (PSD, in dBm) using the code below:

from scipy.signal import welch as psd  # Computes the PSD using Welch's method
fpsd, wPSD = psd(wavdata1, wavsamplerate1, nperseg=1024)
gplot2.rcParams['figure.figsize'] = [15, 5]
pltpsdm = gplot2.gca()
gplot2.axhline(y=0, c="black")
pltpsdm.plot(fpsd, 20*np.log10(wPSD))
gplot2.xticks(np.arange(0, 4000, 400))
gplot2.yticks(np.arange(-150, 160, 10))
pltpsdm.set_xlim([0, 4000])
pltpsdm.set_ylim([-150, 150])
gplot2.grid(linestyle='--')

which gives the following output: [plot not included]

The second output above, using Welch's method, is more presentable. The dBFS plot, though informative, is not very presentable IMO.
Is this because of the difference in domains (time for the 1st output vs frequency for the 2nd), or because of the way the plot function is implemented in pyplot? Also, is there a way I can plot my dBFS output as a peak-to-peak style plot, like my PSD (dBm) plot, rather than a dense stem plot? I'd appreciate any pointers, answers or suggestions from experts here, as I'm just a beginner with matplotlib and plotting in Python in general.
TL;DR: This has nothing to do with pyplot. The frequency domain is different from the time domain, but that's not why you didn't get what you want. The dBFS calculation in your code is wrong: you should split your data into frames, calculate the RMS or the peak of every frame, and then convert that value to dBFS, instead of applying the transformation to every sample point.

When we talk about amplitude, we are talking about a periodic signal. When we read a series of data from a sound file, we read a series of sample points of a signal (which may or may not be periodic). The value of every sample point represents, say, a voltage or sound pressure value sampled at a specific time. We assume that within a very short time interval, maybe 10 ms for example, the signal is stationary. Every such interval is called a frame. A specific function, called a window function, is usually applied to each frame to reduce the sudden change at the frame's edges. If you do nothing to a frame, you have applied a rectangular window to it. An example: when the sampling frequency of your sound is 44100 Hz, a 10 ms frame contains 44100 * 0.01 = 441 sample points. That's what the nperseg argument of your psd function means, but it has nothing to do with dBFS.

Given the knowledge above, we can now talk about amplitude. There are two ways to get the amplitude of every frame: the most straightforward one is to take the maximum (peak) value in every frame; the other is to calculate the RMS (root mean square) of every frame. After that, the peak or RMS values can be converted to dBFS values.
Let's start coding:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

# Determine full scale (maximum possible amplitude) from the bit depth;
# signed 16-bit samples range from -32768 to 32767, so full scale is 2**15
bit_depth = 16
full_scale = 2 ** (bit_depth - 1)
# dBFS function
to_dbFS = lambda x: 20 * np.log10(x / full_scale)

# Read in the wave file
fname = "01.wav"
fs, data = wavfile.read(fname)

# Determine the frame length (number of sample points in a frame) and the total number
# of frames from the window length (how long a frame is, in seconds)
window_length = 0.01
signal_length = data.shape[0]
frame_length = int(window_length * fs)
nframes = signal_length // frame_length

# Get frames by broadcasting. No overlaps are used.
idx = frame_length * np.arange(nframes)[:, None] + np.arange(frame_length)
frames = data[idx].astype("int64")  # Convert to int64 to avoid integer overflow

# Get RMS and peaks
rms = ((frames**2).sum(axis=1) / frame_length) ** .5
peaks = np.abs(frames).max(axis=1)

# Convert them to dBFS
dbfs_rms = to_dbFS(rms)
dbfs_peak = to_dbFS(peaks)

# Let's start to plot
# Get time arrays of every sample point and every frame
frame_time = np.arange(nframes) * window_length
data_time = np.linspace(0, signal_length / fs, signal_length)

# Plot
f, ax = plt.subplots()
ax.plot(data_time, data, color="k", alpha=.3)
# Plot the dBFS values on a twin x Axes, since the y limits of data values and dBFS are not comparable
tax = ax.twinx()
tax.plot(frame_time, dbfs_rms, label="RMS")
tax.plot(frame_time, dbfs_peak, label="Peak")
tax.legend()
f.tight_layout()

# Save several details
f.savefig("whole.png", dpi=300)
ax.set_xlim(1, 2)
f.savefig("1-2sec.png", dpi=300)
ax.set_xlim(1.295, 1.325)
f.savefig("1.295-1.325sec.png", dpi=300)

The whole time span looks like this (the unit of the right axis is dBFS): [plot not included]
And the voiced part looks like this: [plot not included]
You can see that the dBFS values increase along with the amplitudes at the start of the vowel.
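The frame-based approach can be checked on a signal whose levels are known in advance. In this sketch the helper frame_dbfs and the test tone are illustrative, not part of the answer, and reshape replaces the fancy-index trick for non-overlapping frames. A full-scale sine should give 0 dBFS peaks and an RMS of 20·log10(1/√2) ≈ -3.01 dBFS.

```python
import numpy as np


def frame_dbfs(samples, fs, window_length=0.01, full_scale=2 ** 15):
    """Peak and RMS level of consecutive, non-overlapping frames, in dBFS.

    Hypothetical helper; full_scale=2**15 assumes signed 16-bit samples.
    """
    frame_length = int(window_length * fs)
    nframes = samples.shape[0] // frame_length
    # Drop the incomplete last frame, then split into rows of frame_length samples
    frames = samples[: nframes * frame_length].reshape(nframes, frame_length).astype(np.float64)
    peak = np.abs(frames).max(axis=1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(peak / full_scale), 20 * np.log10(rms / full_scale)


# A full-scale 1 kHz sine sampled at 8 kHz: exactly 10 periods per 10 ms frame
fs = 8000
n = np.arange(fs)                            # one second of samples
sine = 2 ** 15 * np.sin(2 * np.pi * 1000 * n / fs)
peak_db, rms_db = frame_dbfs(sine, fs)
# 100 frames; peak_db is 0 dBFS (up to rounding) and rms_db about -3.01 dBFS in each
```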
Determining Fourier Coefficients from Time Series Data
I asked a since-deleted question about how to determine Fourier coefficients from time series data. I am resubmitting it because I have formulated the problem better and have a solution, which I'll give because I think others may find it very useful.

I have some time series data that I have binned into equally spaced time bins (a fact which will be crucial to my solution), and from that data I want to determine the Fourier series (or any function, really) that best describes the data. Here is an MWE with some test data to show the data I'm trying to fit:

import numpy as np
import matplotlib.pyplot as plt

# Create an independent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, 101) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995, 0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522, 1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355, 1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312, 1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569, 1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612, 1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923, 0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377, 0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845, 0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322, 0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549, 0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282, 0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475, 0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251, 1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735, 1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482, 1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301, 1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678, 1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776, 0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999, 0.9786476100386117]
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)

This outputs the following: [plot not included]

The question is how to determine the Fourier series that best describes this data. The usual formula for determining the Fourier coefficients requires inserting a function into an integral, but if I had a function to describe the data I wouldn't need the Fourier coefficients at all; the whole point of finding this series is to have a functional representation of the data. In the absence of such a function, then, how are the coefficients found?
My solution to this problem is to apply a discrete Fourier transform to the data using NumPy's implementation of the Fast Fourier Transform, numpy.fft.fft(); this is why it's critical that the data is evenly spaced in time, as the FFT requires this. While the FFT is typically used to analyse the frequency spectrum, the desired Fourier coefficients are directly related to the output of this function. Specifically, for N input samples this function outputs N complex-valued coefficients $c_n$. The Fourier series coefficients are found using the relations:

$$a_n = \frac{2}{N}\,\mathrm{Re}(c_n), \qquad b_n = -\frac{2}{N}\,\mathrm{Im}(c_n),$$

so that, for a period of 1, $f(x) \approx \frac{a_0}{2} + \sum_{n \ge 1} \left[a_n \cos(2\pi n x) + b_n \sin(2\pi n x)\right]$.

Therefore the FFT allows the Fourier coefficients to be computed directly. Here is the MWE of my solution to this problem, expanding the example given above:

import numpy as np
import matplotlib.pyplot as plt

# Set the number of equal-time bins to create.
n_bins = 101
# Set the number of Fourier coefficients to use.
n_coeff = 51

# Define a function to generate a Fourier series based on the coefficients determined by the
# Fast Fourier Transform. This also takes a series of phases x to pass through the function.
def create_fourier_series(x, coefficients):
    # Begin the series with the zeroth-order Fourier coefficient.
    fourier_series = coefficients[0][0] / 2
    # Now generate the first through (n_coeff - 1)th terms. The period is defined to be 1
    # since we're operating in phase space.
    for n in range(1, len(coefficients)):
        fourier_series += (coefficients[n][0] * np.cos(2 * np.pi * n * x) +
                           coefficients[n][1] * np.sin(2 * np.pi * n * x))
    return fourier_series

# Create an independent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, n_bins) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995, 0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522, 1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355, 1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312, 1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569, 1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612, 1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923, 0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377, 0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845, 0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322, 0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549, 0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282, 0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475, 0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251, 1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735, 1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482, 1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301, 1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678, 1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776, 0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999, 0.9786476100386117]

# Determine the fast Fourier transform for this test data (second half first, see the note below).
# Integer division is needed because list indices must be integers in Python 3.
fast_fourier_transform = np.fft.fft(test_data[n_bins // 2:] + test_data[:n_bins // 2])
# Create an empty list to hold the values of the Fourier coefficients.
fourier_coeff = []
# Loop through the FFT and pick out the a and b coefficients, which are the real and imaginary
# parts of the coefficients calculated by the FFT.
for n in range(0, n_coeff):
    a = 2 * fast_fourier_transform[n].real / n_bins
    b = -2 * fast_fourier_transform[n].imag / n_bins
    fourier_coeff.append([a, b])

# Create the Fourier series approximating this data.
fourier_series = create_fourier_series(test_array, fourier_coeff)

# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
# Plot the Fourier series approximation.
ax.plot(test_array, fourier_series, color="b", lw=0.5)

This outputs the following: [plot not included]

Note that the way I built the FFT input (the second half of the data followed by the first half) is a consequence of how this data was generated. Specifically, the data runs from -0.5 to 0.5, but the FFT assumes it runs from 0.0 to 1.0, necessitating this shift. I've found that this works quite well for data that doesn't include very sharp and narrow discontinuities. I would be interested to hear if anyone has another suggested solution to this problem, and I hope people find this explanation clear and helpful.
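The relations between the FFT output and a_n, b_n can be checked on a signal whose Fourier coefficients are known in advance. In this sketch the test signal (a_0/2 = 1, a_3 = 0.5, b_5 = -0.2) is made up; it is sampled at N equally spaced points over one period starting at x = 0, which is what the FFT assumes.

```python
import numpy as np

N = 64                                       # number of equally spaced samples over one period
x = np.arange(N) / N                         # 0, 1/N, ..., (N-1)/N  (period 1, starts at 0)
signal = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * x) - 0.2 * np.sin(2 * np.pi * 5 * x)

c = np.fft.fft(signal)
a = 2 * c.real / N                           # a_n = 2 Re(c_n) / N
b = -2 * c.imag / N                          # b_n = -2 Im(c_n) / N

# Rebuild the series from the coefficients (terms 1 .. N/2 - 1; this signal has no Nyquist term)
n = np.arange(1, N // 2)
rebuilt = a[0] / 2 + (a[n, None] * np.cos(2 * np.pi * n[:, None] * x)
                      + b[n, None] * np.sin(2 * np.pi * n[:, None] * x)).sum(axis=0)

print(round(a[0] / 2, 12), round(a[3], 12), round(b[5], 12))  # 1.0 0.5 -0.2
```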
Not sure if it helps you in any way; I wrote a program to interpolate your data. This is done using buildingblocks==0.0.15. Please see below:

import matplotlib.pyplot as plt
from buildingblocks import bb
import numpy as np

Ydata = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995, 0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522, 1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355, 1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312, 1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569, 1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612, 1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923, 0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377, 0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845, 0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322, 0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549, 0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282, 0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475, 0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251, 1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735, 1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482, 1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301, 1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678, 1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776, 0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999, 0.9786476100386117]
Xdata = list(range(0, len(Ydata)))
Xnew = list(np.linspace(0, len(Ydata), 200))
Ynew = bb.interpolate(Xdata, Ydata, Xnew, 40)

plt.figure()
plt.plot(Xdata, Ydata)
plt.plot(Xnew, Ynew, '*')
plt.legend(['Given Data', 'Interpolated Data'])
plt.show()

Should you want to write further code, you can also view the source code of a module and learn from it:

import module
import inspect
src = inspect.getsource(module)
print(src)
Making two time-series with different sampling rates comparable
I have 2 sets of data, both time series of the same variable vs. time, and I have imported and plotted them using pandas and matplotlib.

from os import chdir
chdir('C:\\Users\\me\\Documents\\Folder')
# import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# read in csv file
file_df = pd.read_csv('C://Users//me//Documents//Folder//file.csv')
# define csv columns and assign values
VarA = file_df.loc[:, 'VarA'].values
TimeA = file_df.loc[:, 'TimeA'].values
VarB = file_df.loc[:, 'VarB'].values
TimeB = file_df.loc[:, 'TimeB'].values
# plot data selection and aesthetics
plt.plot(TimeA, VarA, label='VarA')
plt.plot(TimeB, VarB, label='VarB')
# plot labels
plt.xlabel('Time')
plt.ylabel('Variable')
# plot and add legend based on plot labels
plt.legend()

In both cases, the variable is sampled between 0 and 320 minutes. However, one data set has 775 samples (taken at random intervals across the 320 minutes) and the other has 1732 samples (again, taken at random intervals across the 320 minutes).

Essentially, what I want to do is make two new data sets, based on the old ones, where I have the variable vs. time between 0 and 320 minutes, both with the same number of data points taken at the same time steps (e.g. the variable at every minute, for 320 samples). I'm guessing some interpolation is required? I genuinely don't know where to start.

I have both data sets in the same .csv and I need them to be the same sample size so that I can run the following calculation. At the moment it doesn't run because VarA and VarB have different numbers of data points.

x_values = VarB
y_values = VarA
correlation_matrix = np.corrcoef(x_values, y_values)
correlation_xy = correlation_matrix[0, 1]
r_squared = correlation_xy**2
I think resample might be useful here.
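One way to do the interpolation the question guesses at, sketched with np.interp (linear interpolation) rather than a resampling method: put both series on the same one-minute grid from 0 to 320 min, then correlate. The synthetic times and values below are stand-ins for TimeA/VarA and TimeB/VarB. np.interp needs increasing times, and when two columns of different lengths share one .csv, pandas pads the shorter one with NaN, so those rows have to be dropped first.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_series(n_samples):
    # Hypothetical stand-in: random sample times in [0, 320] min, values of one smooth signal
    t = np.sort(rng.uniform(0, 320, n_samples))
    return t, np.sin(t / 40.0) + 0.01 * rng.standard_normal(n_samples)


TimeA, VarA = make_series(775)
TimeB, VarB = make_series(1732)

grid = np.arange(0, 321)                          # every minute from 0 to 320 min (321 points)


def on_grid(time, values):
    keep = ~(np.isnan(time) | np.isnan(values))   # drop NaN padding from uneven csv columns
    order = np.argsort(time[keep])                # np.interp needs increasing x values
    return np.interp(grid, time[keep][order], values[keep][order])


VarA_grid = on_grid(TimeA, VarA)
VarB_grid = on_grid(TimeB, VarB)

correlation_xy = np.corrcoef(VarB_grid, VarA_grid)[0, 1]
r_squared = correlation_xy ** 2
```

Outside the sampled time range np.interp repeats the first or last value, so grid points before the first or after the last measurement are not real data.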
There are a lot of ways to solve this. What problem are you trying to solve by calculating the correlation between two variables over time? One option is to smooth each series over time, for example with a weighted moving average, and then compute the correlation on the smoothed values. A simple choice is an exponentially weighted moving average; local regression (loess) is another common smoother, and there are more sophisticated methods too. Here's some example code in R, where I took a cosine function and the same function with random noise added. To do a loess fit, use the loess() function; the fitted values are in the "fitted" component of the object it returns.

x = seq(from = 1, to = 100)
y1 = cos(x / 10)
y2 = cos(x / 10) + rnorm(100, mean = 0, sd = 0.25)
smooth_y2 = loess(y2 ~ x)
plot(x, y1, type = 'l')
lines(x, smooth_y2$fitted, type = 'l', col = 'red')
print(cor(y1, smooth_y2$fitted))
np.fft on a Python list
Could you please advise me on the following? I gather data from an Arduino ADC and store it in a list on a Raspberry Pi 4 with Python 3. The list is called dataList and contains 1024 10-bit samples. This all works fine: I can reproduce the sampled signal on the Raspberry Pi. I would like to display the power spectrum of the acquired signal using NumPy's FFT. I tried the following:

def show_FFT(window):
    fft = np.fft.fft(dataList, 1024, -1, None)
    for X_value in range(0, 512, 1):
        Y_value = fft[X_value]
        gfxdraw.pixel(window, X_value, int(abs(Y_value)), black)

This should illustrate what I'm trying to do; however, it produces incoherent output. The sampled signal has a frequency of about 300 Hz. I would be very grateful for any hints in the right direction!
As you mentioned in your question, you have a data set with X starting from 0 to... but for numpy.fft.fft you must keep in mind that it computes a discrete Fourier transform (DFT), which assumes equally spaced samples. If you want the zero frequency in the middle of the plotted spectrum, reorder the output with np.fft.fftshift. You can simply try it with a Gaussian function, change the parameters as you wish and look at the results. Since you didn't give any data set here, I would refer you to a general case with the code below:

import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt

# create data from dataframes
x = np.random.rand(50)  # unequally spaced measurement
x.sort()
y = np.exp(-x*x)  # measured signal

Based on the answer here, you can resample your data onto equally spaced points with:

f = interpolate.interp1d(x, y)
num = 500
xx = np.linspace(x[0], x[-1], num)
yy = f(xx)
plt.close('all')
plt.plot(x, y, 'bo')
plt.plot(xx, yy, 'g.-')
plt.show()

Then compute the FFT and plot it against the matching frequency axis, which np.fft.fftfreq builds from the sample spacing:

ysample = yy
ysample_fft = np.fft.fftshift(np.abs(np.fft.fft(ysample/ysample.max()))) / np.sqrt(len(ysample))
freqs = np.fft.fftshift(np.fft.fftfreq(len(ysample), d=xx[1] - xx[0]))
plt.plot(freqs, ysample_fft/ysample_fft.max(), 'b--')
plt.show()
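For the Arduino case in the question, a sketch of a one-sided power spectrum with a frequency axis in Hz. The sampling rate fs = 5000 Hz and the synthetic 300 Hz, 10-bit test signal are assumptions, since the question does not state the ADC rate. Subtracting the mean removes the large DC bin that the mid-scale offset of a 10-bit ADC (around 512) would otherwise produce and that would dominate the plot.

```python
import numpy as np

fs = 5000.0                                   # assumed ADC sampling rate in Hz
n = 1024                                      # number of samples in dataList
t = np.arange(n) / fs
# Synthetic stand-in for dataList: 300 Hz tone around mid-scale of a 10-bit ADC
dataList = list(np.round(512 + 300 * np.sin(2 * np.pi * 300 * t)).astype(int))

samples = np.asarray(dataList, dtype=float)
samples -= samples.mean()                     # remove the DC offset
spectrum = np.fft.rfft(samples * np.hanning(n))  # Hann window reduces spectral leakage
power = np.abs(spectrum) ** 2                 # one-sided power spectrum (arbitrary units)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)        # frequency of each bin in Hz

peak_freq = freqs[np.argmax(power)]           # within one bin (fs / n Hz) of 300 Hz
```

To draw it on a screen of limited height, scaling power (or 10 * np.log10(power)) to the number of available pixels before converting to int avoids values far outside the window.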