I am trying to implement the FFT using the conv1d function provided in PyTorch.
Generating an artificial signal
import numpy as np
import torch
from torch.autograd import Variable
from torch.nn.functional import conv1d
from scipy import fft, fftpack
import matplotlib.pyplot as plt
%matplotlib inline
# Creating filters
d = 4096 # size of windows
def create_filters(d):
    x = np.arange(0, d, 1)
    wsin = np.empty((d,1,d), dtype=np.float32)
    wcos = np.empty((d,1,d), dtype=np.float32)
    window_mask = 1.0-1.0*np.cos(x)
    for ind in range(d):
        wsin[ind,0,:] = np.sin(2*np.pi*((ind+1)/d)*x)
        wcos[ind,0,:] = np.cos(2*np.pi*((ind+1)/d)*x)
    return wsin,wcos
wsin, wcos = create_filters(d)
wsin_var = Variable(torch.from_numpy(wsin), requires_grad=False)
wcos_var = Variable(torch.from_numpy(wcos),requires_grad=False)
# Creating signal
t = np.linspace(0,1,4096)
x = np.sin(2*np.pi*100*t)+np.sin(2*np.pi*200*t)+np.random.normal(scale=5,size=(4096))
plt.plot(x)
FFT with PyTorch
signal_input = torch.from_numpy(x.reshape(1,-1),)[:,None,:4096]
signal_input = signal_input.float()
zx = conv1d(signal_input, wsin_var, stride=1).pow(2)+conv1d(signal_input, wcos_var, stride=1).pow(2)
FFT with Scipy
fig = plt.figure(figsize=(20,5))
plt.plot(np.abs(fft(x).reshape(-1))[:500])
My Question
As you can see, the two outputs are quite similar in terms of their peak characteristics. That means my implementation is not totally wrong.
However, there are also some differences, such as the scale of the spectrum and the signal-to-noise ratio. I am unable to figure out what is missing to get exactly the same result.
You calculated the power rather than the amplitude.
You simply need to add the line zx = zx.pow(0.5) to take the square root to get the amplitude.
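For illustration, here is a minimal sketch of that fix, reusing zx from the code above (the (1, d, 1) output shape and the .detach() call are my assumptions about the setup, not part of the original):
amplitude = zx.pow(0.5)  # zx held the power |X[k]|^2; the square root gives the amplitude |X[k]|
# conv1d returns shape (1, d, 1): one output channel per frequency bin, so squeeze before plotting
plt.plot(amplitude.detach().numpy().squeeze()[:500])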
As of version 1.8, PyTorch has a native FFT implementation, torch.fft:
torch.fft.fft(x)
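For example, the amplitude spectrum of the test signal x defined earlier can be obtained directly (a small sketch; .abs() takes the magnitude of the complex output):
spectrum = torch.fft.fft(torch.from_numpy(x))  # complex-valued FFT, available since PyTorch 1.8
amplitude = spectrum.abs()                     # comparable to np.abs(fft(x)) from scipy
plt.plot(amplitude.numpy()[:500])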
Related
I created some data from two superposed normal distributions and then applied sklearn.neighbors.KernelDensity and scipy.stats.gaussian_kde to estimate the density function. However, using the same bandwidth (1.0) and the same kernel, the two methods produce different outcomes. Can someone explain the reason for this? Thanks for the help.
Below you can find the code to reproduce the issue:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
import seaborn as sns
from sklearn.neighbors import KernelDensity
n = 10000
dist_frac = 0.1
x1 = np.random.normal(-5,2,int(n*dist_frac))
x2 = np.random.normal(5,3,int(n*(1-dist_frac)))
x = np.concatenate((x1,x2))
np.random.shuffle(x)
eval_points = np.linspace(np.min(x), np.max(x))
kde_sk = KernelDensity(bandwidth=1.0, kernel='gaussian')
kde_sk.fit(x.reshape([-1,1]))
y_sk = np.exp(kde_sk.score_samples(eval_points.reshape(-1,1)))
kde_sp = gaussian_kde(x, bw_method=1.0)
y_sp = kde_sp.pdf(eval_points)
sns.kdeplot(x)
plt.plot(eval_points, y_sk)
plt.plot(eval_points, y_sp)
plt.legend(['seaborn','scikit','scipy'])
If I change the scipy bandwidth to 0.25, the results of both methods look approximately the same.
What is meant by bandwidth in scipy.stats.gaussian_kde and sklearn.neighbors.KernelDensity is not the same. scipy.stats.gaussian_kde uses a bandwidth factor (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html). For a 1-D kernel density estimate the following relation applies:
bandwidth of sklearn.neighbors.KernelDensity = bandwidth factor of scipy.stats.gaussian_kde * standard deviation of the sample
For your estimate this probably means that the standard deviation of your sample is about 4.
I would like to refer to Getting bandwidth used by SciPy's gaussian_kde function for more information.
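As a rough check of that relation, here is a minimal sketch reusing x and eval_points from the question: dividing the sklearn-style bandwidth by the sample standard deviation should give a scipy factor that produces roughly the same curve as KernelDensity(bandwidth=1.0).
bw_factor = 1.0 / np.std(x, ddof=1)               # scipy's kernel width is bw_method * std(x)
kde_sp_matched = gaussian_kde(x, bw_method=bw_factor)
y_sp_matched = kde_sp_matched.pdf(eval_points)    # should roughly overlay y_sk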
To be honest, I don't know why, but using scipy hyperparameter bw_method='scott' makes it work exactly the same as seaborn.
So, it seems to be all about the hyperparameters. We could find out why by understanding them in depth, but in the meantime just use ‘scott’ or ‘silverman’ instead of using a random scalar.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
import seaborn as sns
from sklearn.neighbors import KernelDensity
n = 10000
dist_frac = 0.1
x1 = np.random.normal(-5,2,int(n*dist_frac))
x2 = np.random.normal(5,3,int(n*(1-dist_frac)))
x = np.concatenate((x1,x2))
np.random.shuffle(x)
eval_points = np.linspace(np.min(x), np.max(x))
kde_sk = KernelDensity(bandwidth=1, kernel='gaussian')
kde_sk.fit(x.reshape([-1,1]))
y_sk = np.exp(kde_sk.score_samples(eval_points.reshape(-1,1)))
kde_sp = gaussian_kde(x, bw_method='scott') ### I MEAN HERE! ###
y_sp = kde_sp.pdf(eval_points)
sns.kdeplot(x)
plt.plot(eval_points, y_sk)
plt.plot(eval_points, y_sp)
plt.legend(['seaborn','scikit','scipy'])
Increase the size of the 'random normal' samples; your data points are too few.
Try with n=500000 and check the results.
I have a 1-D data set with only one independent column. I would like to fit a model to it in order to sample from that model. The raw data:
Data set
I tried various theoretical distributions from the Fitter package (https://pypi.org/project/fitter/), but none of them fit well. Then I tried kernel density estimation using sklearn. It is good, but I could not prevent negative values due to the way it works. Finally, I tried a log-normal fit, but it is not really perfect.
Code for the log-normal fit:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
import math
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
NN = 3915 # sample same number as original data set
df = pd.read_excel (r'Data_sets2.xlsx',sheet_name="Set1")
eps = 0.1 # Additional term for c
"""
Estimate parameters of log(c) as normal distribution
"""
df["c"] = df["c"] + eps
mu = np.mean(np.log(df["c"]))
s = np.std(np.log(df["c"]))
print("Mean:",mu,"std:",s)
def simulate(N):
    c = []
    for i in range(N):
        c_s = np.exp(np.random.normal(loc = mu, scale = s, size=1)[0])
        c.append(round(c_s))
    return (c)
predicted_c = simulate(NN)
XX = np.arange(NN)  # scipy.arange is deprecated/removed; use NumPy (NN = 3915)
### plot C relation ###
plt.scatter(XX,df["c"],color='g',label="Original data")
plt.scatter(XX,predicted_c,color='r',label="Sample data")
plt.xlabel('Index')
plt.ylabel('c')
plt.legend()
plt.show()
original vs samples
What I am looking for is how to improve the fit; any suggestions or pointers to models that may fit my data more accurately are appreciated. Thanks.
Here is a graphical Python fitter for the scipy statistical distribution Double Gamma (dgamma) using your spreadsheet data. I hope this might be of some use, as a normal distribution seems to be a poor fit to this data set. The scipy documentation for dgamma is at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.dgamma.html - incidentally, the double Weibull distribution fit almost as well.
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_excel (r'Data_sets2.xlsx',sheet_name="Set1")
eps = 0.1 # Additional term for c
data = df["c"] + eps
P = ss.dgamma.fit(data)
rX = np.linspace(min(data), max(data), 50)
rP = ss.dgamma.pdf(rX, *P)
plt.hist(data, bins=25, density=True, color='slategrey')  # density=True replaces the removed 'normed' argument
plt.plot(rX, rP, color='darkturquoise')
plt.show()
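Since the goal in the question is to sample from a fitted model, here is a short follow-up sketch using the fitted parameter tuple P from above (assuming the double-gamma fit is acceptable for your data):
samples = ss.dgamma.rvs(*P, size=len(data))  # draw new values from the fitted model
plt.hist(samples, bins=25, density=True, color='darkturquoise', alpha=0.5)
plt.show()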
Both Librosa and SciPy have an fft function; however, they give me different spectrogram outputs even with the same signal input.
Scipy
I am trying to get the spectrogram with the following code
import numpy as np # fast vectors and matrices
import matplotlib.pyplot as plt # plotting
from scipy import fft
X = np.sin(np.linspace(0,1e10,5*44100))
fs = 44100 # assumed sample frequency in Hz
window_size = 2048 # 2048-sample fourier windows
stride = 512 # 512 samples between windows
wps = fs/float(512) # ~86 windows/second
Xs = np.empty([int(2*wps),2048])
for i in range(Xs.shape[0]):
    Xs[i] = np.abs(fft(X[i*stride:i*stride+window_size]))
fig = plt.figure(figsize=(20,7))
plt.imshow(Xs.T[0:150],aspect='auto')
plt.gca().invert_yaxis()
fig.axes[0].set_xlabel('windows (~86Hz)')
fig.axes[0].set_ylabel('frequency')
plt.show()
Then I get the following spectrogram
Librosa
Now I try to get the same spectrogram with Librosa
from librosa import stft
X_libs = stft(X, n_fft=window_size, hop_length=stride)
X_libs = np.abs(X_libs)[:,:int(2*wps)]
fig = plt.figure(figsize=(20,7))
plt.imshow(X_libs[0:150],aspect='auto')
plt.gca().invert_yaxis()
fig.axes[0].set_xlabel('windows (~86Hz)')
fig.axes[0].set_ylabel('frequency')
plt.show()
Question
The two spectrograms are obviously different; specifically, the Librosa version has an attack at the very beginning.
What causes the difference? I don't see many parameters that I can tune in the documentation for Scipy and Librosa.
The reason for this is the argument center for librosa's stft. By default it's True (along with pad_mode = 'reflect').
From the docs:
librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=<class 'numpy.complex64'>, pad_mode='reflect')
center:boolean
If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
If False, then D[:, t] begins at y[t * hop_length]
pad_mode:string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.
Calling the STFT like this
X_libs = stft(X, n_fft=window_size, hop_length=stride,
              center=False)
does lead to a straight line:
Note that librosa's stft also uses the Hann window function by default. If you want to avoid this and make it more like your SciPy implementation, call stft with a window consisting only of ones:
X_libs = stft(X, n_fft=window_size, hop_length=stride,
              window=np.ones(window_size),
              center=False)
You'll notice that the line is thinner.
I was trying to fit a beta prime distribution to my data using Python. As there is scipy.stats.betaprime.fit, I tried this:
import numpy as np
import math
import scipy.stats as sts
import matplotlib.pyplot as plt
N = 5000
nb_bin = 100
a = 12; b = 106; scale = 36; loc = -a/(b-1)*scale
y = sts.betaprime.rvs(a,b,loc,scale,N)
a_hat,b_hat,loc_hat,scale_hat = sts.betaprime.fit(y)
print('Estimated parameters: \n a=%.2f, b=%.2f, loc=%.2f, scale=%.2f'%(a_hat,b_hat,loc_hat,scale_hat))
plt.figure()
count, bins, ignored = plt.hist(y, nb_bin, density=True)  # density=True replaces the removed 'normed' argument
pdf_ini = sts.betaprime.pdf(bins,a,b,loc,scale)
pdf_est = sts.betaprime.pdf(bins,a_hat,b_hat,loc_hat,scale_hat)
plt.plot(bins,pdf_ini,'g',linewidth=2.0,label='ini');plt.grid()
plt.plot(bins,pdf_est,'y',linewidth=2.0,label='est');plt.legend();plt.show()
It shows me the following result:
Estimated parameters:
a=9935.34, b=10846.64, loc=-90.63, scale=98.93
which is quite different from the original parameters, as the PDF plot shows:
If I give the real values of loc and scale as inputs to the fit function, the estimation result is better. Has anyone worked on this already or found a better solution?
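If the true loc and scale are known, one option is to hold them fixed during the fit; scipy's fit method accepts floc and fscale for that. A minimal sketch, reusing y, loc, and scale from above:
# Fix loc and scale so that only the shape parameters a and b are estimated
a_hat, b_hat, loc_hat, scale_hat = sts.betaprime.fit(y, floc=loc, fscale=scale)
print(a_hat, b_hat, loc_hat, scale_hat)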
Here are my first steps in the NumPy world.
As a matter of fact, the goal is to plot the 2-D function below as a 3-D mesh:
N = \frac{n}{2\sigma\sqrt{\pi}}\, e^{-\frac{n^{2}x^{2}}{4\sigma^{2}}}
That could be done as a piece of cake in MATLAB with the snippet below:
[x,n] = meshgrid(0:0.1:20, 1:1:100);
mu = 0;
sigma = sqrt(2)./n;
f = normcdf(x,mu,sigma);
mesh(x,n,f);
But the result is ugly enough to drive me to try Python's capabilities for generating scientific plots.
I searched around and found that the first steps towards this goal in Python might be achieved with the snippet below:
from matplotlib.patches import Polygon
import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
sigma = 1
def integrand(x,n):
    return (n/(2*sigma*np.sqrt(np.pi)))*np.exp(-(n**2*x**2)/(4*sigma**2))
t = np.linspace(0, 20, 0.01)
n = np.linspace(1, 100, 1)
lower_bound = -100000000000000000000 #-inf
upper_bound = t
tt, nn = np.meshgrid(t,n)
real_integral = quad(integrand(tt,nn), lower_bound, upper_bound)
Axes3D.plot_trisurf(real_integral, tt,nn)
Edit: After further investigation based on Greg's advice, the code above is the most up-to-date snippet.
Here is the generated exception:
RuntimeError: infinity comparisons don't work for you
It is seemingly referring to the quad call...
Would you please help me handle this integration-and-plotting problem?
Best
Just a few hints to get you in the right direction.
numpy.meshgrid can do the same as MATLAB's function:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html
When you have x and n you can do math just like in MATLAB:
sigma = numpy.sqrt(2)/n
(in Python, multiplication and division are element-wise by default, so no dot operator is needed)
scipy has many more advanced functions; see for example How to calculate cumulative normal distribution in Python for the 1-D case.
For plotting you can use matplotlib's pcolormesh:
import matplotlib.pyplot as plt
plt.pcolormesh(x,n,real_integral)
Hope this helps until someone can give you a more detailed answer.
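Putting those hints together, here is a rough sketch of the normal-CDF analogue of the MATLAB snippet (this only reproduces the CDF surface, not the quad-based integral; scipy.stats.norm is assumed, as in the linked question):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# grid analogous to MATLAB's meshgrid(0:0.1:20, 1:1:100)
x = np.arange(0, 20.1, 0.1)
n = np.arange(1, 101, 1)
xx, nn = np.meshgrid(x, n)
mu = 0
sigma = np.sqrt(2) / nn                  # element-wise division, no dot operator needed
f = norm.cdf(xx, loc=mu, scale=sigma)    # cumulative normal, like MATLAB's normcdf
plt.pcolormesh(xx, nn, f, shading='auto')
plt.xlabel('x')
plt.ylabel('n')
plt.show()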