I am trying to interpolate a spectrogram obtained from matplotlib using scipy's interp2d function, but the interpolated result does not match the original spectrogram. The data is available here
The actual spectrogram is:
And interpolated spectrogram is:
The code looks okay, but even then something is wrong. The code used is:
from __future__ import division
from matplotlib import ticker as mtick
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np
from bisect import bisect
from scipy import interpolate
from matplotlib.ticker import MaxNLocator
data = np.genfromtxt('spectrogram.dat', skip_header = 2, delimiter = ',')  # genfromtxt takes skip_header, not skiprows
pressure = data[:, 1] * 0.065
time = data[:, 0]
cax = plt.specgram(pressure * 100000, NFFT = 256, Fs = 50000, noverlap=4, cmap=plt.cm.gist_heat, zorder = 1)
f = interpolate.interp2d(cax[2], cax[1], cax[0], kind='cubic')
xnew = np.linspace(cax[2][0], cax[2][-1], 100)
ynew = np.linspace(cax[1][0], cax[1][-1], 100)
znew = 10 * np.log10(f(xnew, ynew))
fig = plt.figure(figsize=(6, 3.2))
ax = fig.add_subplot(111)
ax.set_title('colorMap')
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat)
# plt.colorbar()
plt.title('Interpolated spectrogram')
plt.colorbar(orientation='vertical')
plt.savefig('interp_spectrogram.pdf')
How to interpolate a spectrogram correctly with Python?
The key to your solution is in this warning, which you may or may not have seen:
RuntimeWarning: invalid value encountered in log10
znew = 10 * np.log10(f(xnew, ynew))
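A likely cause of that warning is that a cubic spline fitted to strictly positive power values can overshoot below zero between samples, and the log of a negative number is NaN. The following is a minimal sketch on synthetic data (the grid, the single sharp peak and all variable names are assumptions, not the asker's data) that compares interpolating the raw power with interpolating its decibel values:

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Synthetic "spectrogram": tiny positive power everywhere, one sharp peak.
t = np.arange(10.0)
freqs = np.arange(10.0)
power = np.full((t.size, freqs.size), 1e-8)
power[5, 5] = 1.0

# Ten times finer grid in both directions.
t_new = np.linspace(t[0], t[-1], 91)
f_new = np.linspace(freqs[0], freqs[-1], 91)

# Interpolating the raw power: the cubic spline rings around the peak
# and dips below zero, so a later log10 would produce NaN there.
linear = RectBivariateSpline(t, freqs, power)(t_new, f_new)
n_negative = int((linear < 0).sum())

# Interpolating the decibel values: every result is a finite number.
db = RectBivariateSpline(t, freqs, 10 * np.log10(power))(t_new, f_new)

print("negative samples after linear-domain interpolation:", n_negative)
print("all dB-domain samples finite:", bool(np.isfinite(db).all()))
```

The array here is laid out as (time, frequency), which is what RectBivariateSpline expects for its first two arguments; an array returned by specgram is (frequency, time) and would need transposing.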
If your data is actually a power whose log you'd like to view explicitly as decibel power, take the log first, before fitting to the spline:
spectrum, freqs, t, im = cax
dB = 10*np.log10(spectrum)
#f = interpolate.interp2d(t, freqs, dB, kind='cubic') # docs for this recommend next line
f = interpolate.RectBivariateSpline(t, freqs, dB.T) # but this uses xy not ij, hence the .T
xnew = np.linspace(t[0], t[-1], 10*len(t))
ynew = np.linspace(freqs[0], freqs[-1], 10*len(freqs)) # was it wider spaced than freqs on purpose?
znew = f(xnew, ynew).T
Then plotting as you have:
Previous answer:
If you just want to plot on logscale, use matplotlib.colors.LogNorm
from matplotlib import colors  # needed for colors.LogNorm
znew = f(xnew, ynew) # Don't take the log here
plt.figure(figsize=(6, 3.2))
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat, norm=colors.LogNorm())
And that looks like this:
Of course that still has gaps where its value is negative when plotted on a log scale. What your data means to you when the value is negative should dictate how you fill this in. One simple solution is to just set those values to the smallest positive value and they'd fill in as black:
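As a minimal sketch of that idea, assuming the interpolated values are in a NumPy array (the helper name clip_to_smallest_positive and the small demo array are made up for illustration), every non-positive entry is replaced by the smallest positive entry before plotting with LogNorm:

```python
import numpy as np

def clip_to_smallest_positive(z):
    """Replace zero, negative and NaN entries with the smallest positive value."""
    z = np.asarray(z, dtype=float)
    positive = z[z > 0]
    if positive.size == 0:
        raise ValueError("array has no positive values to clip to")
    floor = positive.min()
    # z > 0 is False for NaN as well, so NaN entries are also filled.
    return np.where(z > 0, z, floor)

# Tiny stand-in for the interpolated spectrogram.
znew_demo = np.array([[4.0, -0.5],
                      [0.0, 0.25]])
filled = clip_to_smallest_positive(znew_demo)
print(filled)
```

The filled array can then be passed to pcolormesh with norm=colors.LogNorm() in place of znew.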
I have used seaborn's kdeplot on some data.
import seaborn as sns
import numpy as np
sns.kdeplot(np.random.rand(100))
Is it possible to return the fwhm from the curve created?
And if not, is there another way to calculate it?
You can extract the generated kde curve from the ax. Then get the maximum y value and search the x positions nearest to the half max:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
ax = sns.kdeplot(np.random.rand(100))
kde_curve = ax.lines[0]
x = kde_curve.get_xdata()
y = kde_curve.get_ydata()
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
ax.hlines(halfmax, x[leftpos], x[rightpos], color='crimson', ls=':')
ax.text(x[maxpos], halfmax, f'{fullwidthathalfmax:.3f}\n', color='crimson', ha='center', va='center')
ax.set_ylim(ymin=0)
plt.show()
Note that you can also calculate a kde curve from scipy.stats.gaussian_kde if you don't need the plotted version. In that case, the code could look like:
import numpy as np
from scipy.stats import gaussian_kde
data = np.random.rand(100)
kde = gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 1000)
y = kde(x)
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
print(fullwidthathalfmax)
I don't believe there's a way to get the FWHM back from the plot itself without writing the code to calculate it.
Take into account some example data:
import numpy as np
from scipy.stats import norm
arr_x = np.linspace(norm.ppf(0.00001), norm.ppf(0.99999), 10000)
arr_y = norm.pdf(arr_x)
Find the minimum and maximum points and calculate difference.
difference = max(arr_y) - min(arr_y)
Find the half max (in this case it is half min)
HM = difference / 2
Find the nearest data point to HM:
nearest = (np.abs(arr_y - HM)).argmin()
Calculate the distance along x between the nearest point and the peak (the maximum) to get the HWHM, then multiply by 2 to get the FWHM.
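Put together, the steps could look like the sketch below. It is only an illustration: the standard normal density is written out with NumPy instead of scipy.stats.norm, the x range is an approximation of the quantile range used above, and the half-maximum level is measured from the curve's minimum, which is a small assumption on top of the steps described.

```python
import numpy as np

def normal_pdf(x):
    """Standard normal density, written out so the sketch needs only NumPy."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Example data: roughly the same range as norm.ppf(0.00001) .. norm.ppf(0.99999).
arr_x = np.linspace(-4.265, 4.265, 10000)
arr_y = normal_pdf(arr_x)

# Half-maximum level, measured from the baseline (the minimum of the curve).
HM = arr_y.min() + (arr_y.max() - arr_y.min()) / 2

# Index of the peak and of the sample closest to the half-maximum level.
peak = arr_y.argmax()
nearest = np.abs(arr_y - HM).argmin()

# Half width at half maximum is the x distance from the peak; double it for FWHM.
HWHM = abs(arr_x[nearest] - arr_x[peak])
FWHM = 2 * HWHM
print(FWHM)
```

For a unit-variance Gaussian the exact value is 2*sqrt(2*ln 2), about 2.355, so the printed number should be close to that. Doubling the half width only works for a symmetric peak; for a skewed curve the left and right crossings have to be located separately.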
I have a Unix time series (x) with an associated signal value (y) which is generated every minute, dropping the first value and appending a new one. I am trying to smooth the resulting curve without losing time accuracy, with a specific emphasis on the final value of the smoothed curve, which will be written to a database. I would like to be able to adjust the smoothing to a considerable degree.
I have studied (as a mathematical layman, more or less) all the options I could find and master. I came across Savitzky-Golay, which looked perfect until I realized it works well on past data but fails to produce a reliable final value if no future data is available for smoothing. I have tried many other methods which produced results but could not be adjusted like Savgol.
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import column
from math import pi
from scipy.signal import savgol_filter
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.interpolate import splrep, splev
from scipy.ndimage import gaussian_filter1d
from scipy.signal import lfilter
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
df_sim = pd.read_csv("/home/20190905_Signal_Smooth_Test.csv")
#sklearn Polynomial*****************************************
poly = PolynomialFeatures(degree=4)
X = df_sim.iloc[:, 0:1].values
print(X)
y = df_sim.iloc[:, 1].values
print(y)
X_poly = poly.fit_transform(X)
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.fit_transform(X)), color='red')
plt.title('Polynomial Regression')
plt.xlabel('Time')
plt.ylabel('Signal')
plt.show()
#scipy interpolate********************************************
bspl = splrep(df_sim['timestamp'], df_sim['signal'], s=5)
bspl_y = splev(df_sim['timestamp'], bspl)
df_sim['signal_spline'] = bspl_y
#scipy gaussian filter****************************************
smooth = gaussian_filter1d(df_sim['signal'], 3)
df_sim['signal_gauss'] = smooth
#scipy lfilter************************************************
n = 5 # the larger n is, the smoother curve will be
b = [1.0 / n] * n
a = 1
histo_filter = lfilter(b, a, df_sim['signal'])
df_sim['signal_lfilter'] = histo_filter
print(df_sim)
#scipy UnivariateSpline**************************************
s = UnivariateSpline(df_sim['timestamp'], df_sim['signal'], s=5)
xs = df_sim['timestamp']
ys = s(xs)
df_sim['signal_univariante'] = ys
#scipy savgol filter****************************************
sg = savgol_filter(df_sim['signal'], 11, 3)
df_sim['signal_savgol'] = sg
df_sim['date'] = pd.to_datetime(df_sim['timestamp'], unit='s')
#plotting it all********************************************
print(df_sim)
w = 60000
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
title=f"Various Signals y vs Timestamp x")
p.xaxis.major_label_orientation = pi / 4
p.grid.grid_line_alpha = 0.9
p.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p.line(x=df_sim['date'], y=df_sim['signal_spline'], color='blue')
p.line(x=df_sim['date'], y=df_sim['signal_gauss'], color='red')
p.line(x=df_sim['date'], y=df_sim['signal_lfilter'], color='magenta')
p.line(x=df_sim['date'], y=df_sim['signal_univariante'], color='yellow')
p1 = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
title=f"Savgol vs Signal")
p1.xaxis.major_label_orientation = pi / 4
p1.grid.grid_line_alpha = 0.9
p1.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p1.line(x=df_sim['date'], y=df_sim['signal_savgol'], color='blue')
output_file("signal.html", title="Signal Test")
show(column(p, p1)) # open a browser
I expect a result that is similar to Savitzky-Golay but with valid final smoothed values for the data series. None of the other methods offer the same flexibility to adjust the degree of smoothing. Most other methods shift the curve to the right. I can provide the CSV file for testing.
This really depends on why you are smoothing the data. Every smoothing method has side effects, such as letting some kinds of 'noise' through more than others. Research 'phase response of filtering'.
A common technique to avoid the problem of missing data at the end of a symmetric filter is to just forecast your data a few points ahead and use that. For example, if you are using a 5-term moving average filter you will be missing 2 data points when you go to calculate your end value.
To forecast these two points, you could use the auto_arima() function from the pmdarima module, or look at the fbprophet module (which I find quite good for this kind of situation).
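The padding idea can be sketched without any forecasting library. In the sketch below the forecast is deliberately replaced by a straight line fitted to the last few samples, which is only a stand-in for a real model such as auto_arima; the function name smooth_with_forecast and its parameters are hypothetical.

```python
import numpy as np

def smooth_with_forecast(y, window=5, fit_points=10):
    """Centered moving average whose final values use forecast padding.

    `window` must be odd. The forecast is a plain linear extrapolation of the
    last `fit_points` samples (a placeholder for a proper forecasting model).
    """
    y = np.asarray(y, dtype=float)
    half = window // 2

    # Fit a line to the tail of the series and extend it `half` steps ahead.
    idx = np.arange(len(y))
    slope, intercept = np.polyfit(idx[-fit_points:], y[-fit_points:], 1)
    future = intercept + slope * np.arange(len(y), len(y) + half)

    # Smooth the padded series, then drop the forecast samples again.
    padded = np.concatenate([y, future])
    kernel = np.ones(window) / window
    smoothed = np.convolve(padded, kernel, mode="same")
    return smoothed[:len(y)]

# Usage: on a perfectly linear signal the last smoothed value equals the last sample.
signal = 2.0 * np.arange(50) + 1.0
result = smooth_with_forecast(signal)
print(result[-1], signal[-1])
```

The final smoothed value is only as good as the forecast it leans on, and the first window // 2 values are still biased because np.convolve pads the start with zeros.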
my dataset (patient No., time/millisecond, x, y, z, label)
1,15,70,39,-970,0
1,31,70,39,-970,0
1,46,60,49,-960,0
1,62,60,49,-960,0
1,78,50,39,-960,0
1,93,50,39,-960,0
.
.
.
I am trying to compute a spectrogram of the x-axis signal as a preprocessing step, so that I can use it as the input to a machine learning model instead of the raw x-axis data.
Here is what I tried:
import matplotlib.pyplot as plt
import numpy as np
dt = 0.0005
t = np.arange(0.0, 20.0, dt)
data = np.loadtxt("trainingdataset.txt", delimiter=",")
x = data[:]
NFFT = 1024 # the length of the windowing segments
Fs = int(1.0/dt) # the sampling frequency
ax1 = plt.subplot(211)
plt.plot(x)
plt.subplot(212, sharex=ax1)
Pxx, freqs, bins, im = plt.specgram(x, NFFT=NFFT, Fs=Fs, noverlap=900)
plt.show()
It gives me the following warning:
Warning (from warnings module):
File "C:\Users\hadeer.elziaat\AppData\Local\Programs\Python\Python36\lib\site-packages\matplotlib\axes\_axes.py", line 7221
Z = 10. * np.log10(spec)
RuntimeWarning: divide by zero encountered in log10
If x is your signal and you can assume that the sampling interval is the mean spacing of the time/millisecond column, then you can probably use the librosa library to compute a mel-spectrogram with librosa.feature.melspectrogram. It also has other utilities for computing signal-related features.
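As a rough sketch of the sampling-rate assumption, using only the six rows shown in the question (the variable names are made up, and a real estimate should use the whole file), the rate can be derived from the mean spacing of the millisecond timestamps, and the x column can be pulled out as a one-dimensional signal:

```python
import numpy as np

# Columns: patient No., time in milliseconds, x, y, z, label.
rows = np.array([
    [1, 15, 70, 39, -970, 0],
    [1, 31, 70, 39, -970, 0],
    [1, 46, 60, 49, -960, 0],
    [1, 62, 60, 49, -960, 0],
    [1, 78, 50, 39, -960, 0],
    [1, 93, 50, 39, -960, 0],
], dtype=float)

time_ms = rows[:, 1]
x_signal = rows[:, 2]          # 1-D x-axis signal, not the whole table

# Mean spacing between samples, converted from milliseconds to seconds.
mean_dt_s = np.mean(np.diff(time_ms)) / 1000.0
fs = 1.0 / mean_dt_s           # estimated sampling rate in Hz
print(round(fs, 2))
```

For these rows the mean spacing is 15.6 ms, which gives roughly 64 Hz. That is far below the 2000 Hz implied by dt = 0.0005 in the attempt above, so the frequency axis of the resulting spectrogram would be mislabeled with that value.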
I am trying to do a 3D quiver plot and combining it with odeint to solve a linearized equation. Basically, I want something similar to this but in 3D. The particular issue I am having is that near the end of the code, when I am doing the ax.quiver() plot, I keep getting the error that "val must be a float or nonzero sequence of floats", and I am unsure how to resolve it.
from scipy.integrate import odeint
from numpy import *
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax =fig.add_subplot(1, 1, 1, projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('u')
ax.set_zlabel('u1')
def testplot(X, t=0,c=0.2):
    x = X[0]
    u = X[1]
    u1=X[2]
    dxdt =x**2*(-1+x+u)*(1-x+(-1+c)*u**2)
    du1dt =c**2*u*(2+x*(-4+2.25*x)+(-4 + 4*x)*u**2 + 2*u**4 + x**2*u*u1)
    dudt=u1*dxdt
    return [dxdt, dudt,du1dt]
X0 = [0.01,0.995,-0.01]#initial values
t = linspace(0, 50, 250)
c=[0.2,0.5,1,2]#changing parameter
for m in c:
    sol = odeint(testplot,X0,t,mxstep=5000000,args=(m,))#solve ode
    ax.plot(sol[:,0],sol[:,1],sol[:,2],lw=1.5,label=r'$c=%.1f$'%m)
x = linspace(-3,3,15)
y = linspace(-4,4,15)
z= linspace(-2,2,15)
x,y,z = meshgrid(x,y,z) #create grid
X,Y,Z = testplot([x,y,z])
M = sqrt(X**2+Y**2+Z**2)#magnitude
M[M==0]=1.
X,Y,Z = X/M, Y/M, Z/M
ax.quiver(x,y,z,X,Y,Z,M,cmap=plt.cm.jet)
ax.minorticks_on()
ax.legend(handletextpad=0,loc='upper left')
plt.setp(ax.get_legend().get_texts(),fontsize=12)  # setp comes from pyplot, not numpy
fig.savefig("testplot.svg",bbox_inches="tight",
            pad_inches=.15)
Looks like you have an extra argument in ax.quiver(). From what I can tell, it looks like "M" is the extra argument. Taking that out, your quiver call looks like:
ax.quiver(x,y,z,X,Y,Z,cmap=plt.cm.jet)
The resulting image looks like:
I've plotted a 3-D mesh in Matlab with the short m-file below:
[x,n] = meshgrid(0:0.1:20, 1:1:100);
mu = 0;
sigma = sqrt(2)./n;
f = normcdf(x,mu,sigma);
mesh(x,n,f);
I would like to get the same result in Python, using the code snippet below:
import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt
sigma = 1
def integrand(x, n):
    return (n/(2*sigma*np.sqrt(np.pi)))*np.exp(-(n**2*x**2)/(4*sigma**2))
tt = np.linspace(0, 20, 2000)
nn = np.linspace(1, 100, 100)
T = np.zeros([len(tt), len(nn)])
for i,t in enumerate(tt):
    for j,n in enumerate(nn):
        T[i, j], _ = quad(integrand, -np.inf, t, args=(n,))
x, y = np.mgrid[0:20:0.01, 1:101:1]
plt.pcolormesh(x, y, T)
plt.show()
But the Python output is considerably different from the Matlab one, and as a matter of fact is unacceptable.
I suspect I am misusing functions such as linspace, enumerate or mgrid.
Does anybody have an idea what is going wrong?
PS. Unfortunately, I couldn't insert the output plots in this thread.
Best
..............................
Edit: I changed the linspace and mgrid intervals and switched to the plot_surface method. The output is now 3-D, with suitable accuracy and smoothness.
From what I see the equivalent solution would be:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
x, n = np.mgrid[0:20:0.01, 1:101:1]  # stop value is exclusive, so 101 is needed to include n = 100
mu = 0
sigma = np.sqrt(2)/n
f = norm.cdf(x, mu, sigma)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) no longer works in recent Matplotlib
ax.plot_surface(x, n, f, rstride=x.shape[0]//20, cstride=x.shape[1]//20, alpha=0.3)
plt.show()
Unfortunately, 3D plotting with matplotlib is not as straightforward as with Matlab.
Here is the plot from this code:
Your Matlab code generates 201 points along x:
[x,n] = meshgrid(0:0.1:20, 1:1:100);
While your Python code generates only 20 points:
tt = np.linspace(0, 19, 20)
Maybe that is causing the accuracy problems?
Try this instead:
tt = np.linspace(0, 20, 201)
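A quick way to check that the grids match is to build both axes and inspect their shapes. The sketch below also evaluates the surface in closed form rather than with quad: the integrand in the question is a normal density with standard deviation sqrt(2)*sigma/n, so its integral from minus infinity to x is 0.5*(1 + erf(n*x/(2*sigma))). That closed form is a derivation added here, not something stated in the thread, and the variable names are assumptions.

```python
import math
import numpy as np

tt = np.linspace(0, 20, 201)   # same 201 points as Matlab's 0:0.1:20
nn = np.arange(1, 101)         # same 100 points as Matlab's 1:1:100
x, n = np.meshgrid(tt, nn)     # shape (100, 201), like Matlab's meshgrid

sigma = 1.0
# math.erf works on scalars, so wrap it to apply it element-wise.
erf = np.vectorize(math.erf)

# Closed-form CDF of a normal distribution with std sqrt(2)*sigma/n.
f = 0.5 * (1.0 + erf(n * x / (2.0 * sigma)))

print(x.shape, f.shape)
print(f[0, 0], f[-1, -1])
```

With sigma = 1 this should agree with normcdf(x, 0, sqrt(2)./n) from the Matlab snippet, and it avoids the nested quad loop entirely.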
The key points for resolving the problem were:
1- Making the dimensions passed to linspace and mgrid match each other.
2- Using a denser mesh to get a smoother surface.
3- Using a 3D plotting function such as plot_surface.
The current code is now valid.