I am new to the fourier theory and I've seen very good tutorials on how to apply fft to a signal and plot it in order to see the frequencies it contains. Somehow, all of them create a mix of sines as their data and i am having trouble adapting it to my real problem.
I have 242 hourly observations with a daily periodicity, meaning that my period is 24. So I expect to have a peak around 24 on my fft plot.
A sample of my data.csv is here:
https://pastebin.com/1srKFpJQ
Data plotted:
My code:
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.bar(f[:N // 2], np.abs(fft)[:N // 2] * 1 / N, width=1.5) # 1 / N is a normalization factor
plt.show()
This outputs a very weird result where it seems I am getting the same value for every frequency.
I suppose that the problems comes with the definition of N, t and T but I cannot find anything online that has helped me understand this clearly. Please help :)
EDIT1:
With the code provided by charles answer I have a spike around 0 that seems very weird. I have used rfft and rfftfreq instead to avoid having too much frequencies.
I have read that this might be because of the DC component of the series, so after substracting the mean i get:
I am having trouble interpreting this, the spikes seem to happen periodically but the values in Hz don't let me obtain my 24 value (the overall frequency). Anybody knows how to interpret this ? What am I missing ?
The problem you're seeing is because the bars are too wide, and you're only seeing one bar. You will have to change the width of the bars to 0.00001 or smaller to see them show up.
Instead of using a bar chart, make your x axis using fftfreq = np.fft.fftfreq(len(s)) and then use the plot function, plt.plot(fftfreq, fft):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
fftfreq = np.fft.fftfreq(len(s))
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.plot(fftfreq,fft)
plt.show()
Related
I am trying to visualize a frequency spectrum using the BURG algroithm. The data that I am trying to visualize is the distance between heartbeats in milliseconds (e.g: [700, 650, 689, ..., 702]). Time distance is measured from R peak to R peak of next heartbeat.
Now I would like to visualize the frequency band with python's spectrum library (I'm a total noob). The minimum frequency that I am trying to display is 0.0033Hz, so all time differences in my dataset summarized are 5 Minutes long.
My approach was to first take the reciprocal of each value, then multiply by 1000, and then multiply by 60. This should get me the Bpm for each heartbeat.
This is what it looks like: [67.11409396 64.72491909 ... 64.58557589]
Afterwards I use spectrum's burg algorithm to create the PSD. The "data" list contains my BpM for each heartbeat.
AR, rho, ref = arburg(data.tolist(), 7)
PSD = arma2psd(AR, rho=rho, NFFT=1024)
PSD = PSD[len(PSD):len(PSD)//2:-1]
plot(linspace(0, 0.5, len(PSD)), 10*log10(abs(PSD)*2./(1.*pi)))
pylab.legend(['PSD estimate of x using Burg AR(7)'])
The graph that I get looks like this:
5 Minutes Spectrogram
This specific data already exists as a 3D-Spectrogram (Graph above is the equivalent to the last 5 Minutes of 3D-Spectrogram):
Long Time 3D-Spectrogram
My Graph does not seem to match the 3D-Spectrogram. My frequencies are way off.... What causes this and how can I fix it?
Also I would like the y-Axis in my Graph not in [dB] but in absolute Values. I tried with:
plot(linspace(0, 0.5, len(PSD)), abs(PSD))
but that did not really seem to work. It just drew a hyperbole.
Thank you for your help!
The spectrum package comes with a pburg class than can generate a frequencies array, this is shown below. If you want direct comparison between a spectrogram and AR PSDs, I would take the time definition used to compute the spectrogram to also compute the AR PSD per window.
Also, your spectrogram example image looks focused on very low frequencies, so you may want to increase nfft to increase frequency resolution.
import matplotlib.pyplot as plt
from scipy.signal import spectrogram
import numpy as np
from spectrum import pburg
# Parameter settings
n_seconds = 10
fs = 1000 # sampling rate, in hz
freq = 10
nfft = 4096
nperseg = fs
order = 8
# Simulate 10 hz sine wave with white noise
x = np.sin(np.arange(0, n_seconds, 1/fs) * freq * 2 * np.pi)
x += np.random.rand(len(x)) / 10
# Compute spectrogram
freqs, times, powers = spectrogram(x, fs=fs, nfft=nfft)
# Get spectrogram time definition
times = (times * fs).astype(int)
window_times = np.array((times-times[0], times+times[0])).T
# Compute Burg's spectrum per window
powers_burg = np.array([pburg(x[t[0]:t[1]], order=order,
NFFT=nfft, sampling=fs).psd for t in window_times]).T
freqs_burg = np.array(pburg(x, order=order, NFFT=nfft, sampling=fs).frequencies())
# Plot
inds = np.where(freqs < 20)
inds_burg = np.where(freqs_burg < 20)
fig, axes = plt.subplots(ncols=2, figsize=(10, 5))
axes[0].pcolormesh(times/fs, freqs[inds], powers[inds], shading='gouraud')
axes[1].pcolormesh(times/fs, freqs_burg[inds_burg], powers_burg[inds_burg], shading='gouraud')
axes[0].set_title('Spectrogram')
axes[1].set_title('Burg\'s Spectrogram')
I have some data I gathered analyzing the change of acceleration regarding time. But when I wrote the code below to have a good fit for the sinusoidal wave, this was the result. Is this because I don't have enough data or am I doing something wrong here?
Here you can see my graph:
Measurements plotted directly(no fit)
Fit with horizontal and vertical shift (curve_fit)
Increased data by linspace
Manually manipulated amplitude
Edit: I increased the data size by using the linspace function and plotting it but I am not sure why the amplitude doesn't match, is it because there are very few data to analyze? (I was able to manipulate the amplitude manually but I don't understand why it can't do it)
The code I am using for the fit
def model(x, a, b):
return a * np.sin(b * x)
param, parav_cov = cf(model, time, z_values)
array_x = np.linspace(800, 1400, 1000)
fig = plt.figure(figsize = (9, 4))
plt.scatter(time, z_values, color = "#3333cc", label = "Data")
plt.plot(array_x, model(array_x, param[0], param[1], param[2], param[3]), label = "Sin Fit")
I'd use an FFT to get a first guess at parameters, as this sort of thing is highly non-linear and curve_fit is unlikely to get very far otherwise. the reason for using a FFT is to get an initial idea of the frequency involved, not much more. 3Blue1Brown has a great video on FFTs if you've not seem it
I used web plot digitizer to get your data out of your plots, then pulled into Python and made sure it looked OK with:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('sinfit2.csv')
print(df.head())
giving me:
x y
0 809.3 0.3
1 820.0 0.3
2 830.3 19.6
3 839.9 19.6
4 849.6 0.4
I started by doing a basic FFT with NumPy (SciPy has the full fftpack which is more complete, but not needed here):
import numpy as np
from numpy.fft import fft
d = fft(df.y)
plt.plot(np.abs(d)[:len(d)//2], '.')
the np.abs(d) is because you get a complex number back containing both phase and amplitude, and [:len(d)//2] is because (for real valued input) the output is symmetric about the midpoint, i.e. d[5] == d[-5].
this says the largest component was 18, I tried plotting this by hand and it looked OK:
x = np.linspace(0, np.pi * 2, len(df))
plt.plot(df.x, df.y, '.-', lw=1)
plt.plot(df.x, np.sin(x * 18) * 10 + 10)
I'm multiplying by 10 and adding 10 is because the range of a sine is (-1, +1) and we need to take it to (0, 20).
next I passed these to curve_fit with a simplified model to help it along:
from scipy.optimize import curve_fit
def model(x, a, b):
return np.sin(x * a + b) * 10 + 10
(a, b), cov = curve_fit(model, x, df.y, [18, 0])
again I'm hardcoding the * 10 + 10 to get the range to match your data, which gives me a=17.8 and b=2.97
finally I plot the function sampled at a higher frequency to make sure all is OK:
plt.plot(df.x, df.y)
plt.plot(
np.linspace(810, 1400, 501),
model(np.linspace(0, np.pi*2, 501), a, b)
)
giving me:
which seems to look OK. note you might want to change these parameters so they fit your original X, and note my df.x starts at 810, so I might have missed the first point.
I want to plot the frequency version of planck's law. I first tried to do this independently:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
# Planck's Law
# Constants
h = 6.62607015*(10**-34) # J*s
c = 299792458 # m * s
k = 1.38064852*(10**-23) # J/K
T = 20 # K
frequency_range = np.linspace(10**-19,10**19,1000000)
def plancks_law(nu):
a = (2*h*nu**3) / (c**2)
e_term = np.exp(h*nu/(k*T))
brightness = a /(e_term - 1)
return brightness
plt.plot(frequency_range,plancks_law(frequency_range))
plt.gca().set_xlim([1*10**-16 ,1*10**16 ])
plt.gca().invert_xaxis()
This did not work, I have an issue with scaling somehow. My next idea was to attempt to use this person's code from this question: Plancks Formula for Blackbody spectrum
import matplotlib.pyplot as plt
import numpy as np
h = 6.626e-34
c = 3.0e+8
k = 1.38e-23
def planck_f(freq, T):
a = 2.0*h*(freq**3)
b = h*freq/(k*T)
intensity = a/( (c**2 * (np.exp(b) - 1.0) ))
return intensity
# generate x-axis in increments from 1nm to 3 micrometer in 1 nm increments
# starting at 1 nm to avoid wav = 0, which would result in division by zero.
wavelengths = np.arange(1e-9, 3e-6, 1e-9)
frequencies = np.arange(3e14, 3e17, 1e14, dtype=np.float64)
intensity4000 = planck_f(frequencies, 4000.)
plt.gca().invert_xaxis()
This didn't work, because I got a divide by zero error. Except that I don't see where there is a division by zero, the denominator shouldn't ever be zero since the exponential term shouldn't ever be equal to one. I chose the frequencies to be the conversions of the wavelength values from the example code.
Can anyone help fix the problem or explain how I can get planck's law for frequency instead of wavelength?
You can not safely handle such large numbers; even for comparably "small" values of b = h*freq/(k*T) your float64 will overflow, e.g np.exp(709.)=8.218407461554972e+307 is ok, but np.exp(710.)=inf. You'll have to adjust your units (exponents) accordingly to avoid this!
Note that this is also the case in the other question you linked to, if you insert print( np.exp(b)[:10] ) within the definition of planck(), you can examine the first ten evaluated b's and you'll see the overflow in the first few occurrences. In any case, simply use the answer posted within the other question, but convert the x-axis in plt.plot(wavelengths, intensity) to frequency (i hope you know how to get from one to the other) :-)
I found this code :
import numpy as np
import matplotlib.pyplot as plt
# We create 1000 realizations with 200 steps each
n_stories = 1000
t_max = 500
t = np.arange(t_max)
# Steps can be -1 or 1 (note that randint excludes the upper limit)
steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1
# The time evolution of the position is obtained by successively
# summing up individual steps. This is done for each of the
# realizations, i.e. along axis 1.
positions = np.cumsum(steps, axis=1)
# Determine the time evolution of the mean square distance.
sq_distance = positions**2
mean_sq_distance = np.mean(sq_distance, axis=0)
# Plot the distance d from the origin as a function of time and
# compare with the theoretically expected result where d(t)
# grows as a square root of time t.
plt.figure(figsize=(10, 7))
plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
plt.xlabel(r"$t$")
plt.tight_layout()
plt.show()
Instead of doing just steps -1 or 1 , I would like to do steps following a standard normal distribution ... when I am inserting np.random.normal(0,1,1000) instead of np.random.randint(...) it is not working.
I am really new to Python btw.
Many thanks in advance and Kind regards
You are entering a single number as third parameter of np.random.normal, therefore you get a 1d array, instead of 2d, see the documentation. Try this:
steps = np.random.normal(0, 1, (n_stories, t_max))
this is quite a specific problem I was hoping the community could help me out with. Thanks in advance.
So I have 2 sets of data, one is experimental and the other is based off of an equation. I am trying to fit my data points to this curve and hence obtain the missing variables I am interested in. Namely, a and b in the Ebfit function.
Here is the code:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as spys
from scipy.optimize import curve_fit
time = [60,220,520,1840]
Moment = [0.64227262,0.468318916,0.197100772,0.104512508]
Temperature = 25 # Bake temperature in degrees C
Nb = len(Moment) # Number of bake measurements
Baketime_a = time #[s]
N_Device = 10000 # No. of devices considered in the array
T_ambient = 273 + Temperature
kt = 0.0256*(T_ambient/298) # In units of eV
f0 = 1e9 # Attempt frequency
def Ebfit(x,a,b):
Eb_mean = a*(0.0256/kt) # Eb at bake temperature
Eb_sigma = b*Eb_mean
Foursigma = 4*Eb_sigma
Eb_a = np.linspace(Eb_mean-Foursigma,Eb_mean+Foursigma,N_Device)
dEb = Eb_a[1] - Eb_a[0]
pdfEb_a = spys.norm.pdf(Eb_a,Eb_mean,Eb_sigma)
## Retention Time
DMom = np.zeros(len(x),float)
tau = (1/f0)*np.exp(Eb_a)
for bb in range(len(x)):
DMom[bb]= (1 - 2*(sum(pdfEb_a*(1 - np.exp(np.divide(-x[bb],tau))))*dEb))
return DMom
a = 30
b = 0.10
params,extras = curve_fit(Ebfit,time,Moment)
x_new = list(range(0,2000,1))
y_new = Ebfit(x_new,params[0],params[1])
plt.plot(time,Moment, 'o', label = 'data points')
plt.plot(x_new,y_new, label = 'fitted curve')
plt.legend()
The main problem I am having is that the fitting of the data to the function does not work when I use large number of points. In the above code When I use the 4 points (time & moment), this code works fine.
I get the following values for a and b.
array([ 29.11832766, 0.13918353])
The expected values for a is (23-50) and b is (0.06 - 0.15). So these values are within the acceptable range. This is the corresponding plot:
However, when I use my actual experimental normalized data with about 500 points.
EDIT: This data:
Normalized Data
https://www.dropbox.com/s/64zke4wckxc1r75/Normalized%20Data.csv?dl=0
Raw Data
https://www.dropbox.com/s/ojgse5ibp59r8nw/Data1.csv?dl=0
I get the following values and plot for a and b which are out of the acceptable range,
array([-13.76687781, -12.90494196])
I know these values are wrong and if I were to do it manually (slowly adjusting values to obtain the proper fit) it would be around a=30.1 and b=0.09. And when plotted looks as such:
I have tried changing the initial guess values for a & b, other sets of experimental data as well and other suggestions in similar threads. None seem to work for me. Any help you can provide is appreciated. Thanks.
.
.
.
.
ADDITIONAL INFORMATION
The model I am trying to fit the data to comes from the following equation:
where Dmom = 1 - 2*Psw
a is the Eb value while b is the Sigma value where, Eb has a range of values determined by the probability density function and 4 times of the sigma values (i.e. Foursigma). This distribution is then summed up to use for the final equation.
It looks like you do need to play around with the initial guesses for a and b after all. Perhaps the function you're fitting is not very well behaved, which is why it's so prone to fail for intitial guesses away from the global minumum. That being said, here's a working example of how to fit your data:
import pandas as pd
data_df = pd.read_csv('data.csv')
time = data_df['Time since start, Time [s]'].values
moment = data_df['Signal X direction, Moment [emu]'].values
params, extras = curve_fit(Ebfit, time, moment, p0=[40, 0.3])
Yields the values of a and b of:
In [6]: params
Out[6]: array([ 30.47553689, 0.08839412])
Which results in a nicely aligned fit of a function.
x_big = np.linspace(1, 1800, 2000)
y_big = Ebfit(x_big, params[0], params[1])
plt.plot(time, moment, 'o', alpha=0.5, label='all points')
plt.plot(x_big, y_big, label = 'fitted curve')
plt.legend()
plt.show()