I'm trying to plot values from a recorded data set from an experiment.
When I fit the data with an exponential decay, the fit looks very good on a normal (linear-scale) plot.
But plotting the same data and fit on semi-log axes gives me this.
import matplotlib.pyplot as plt
%matplotlib inline
import pylab as plb
import numpy as np
import scipy as sp
import csv
from scipy.optimize import curve_fit
# Run-4 Data
DecayTime4 = []
DecayCount4 = []
with open('Half_Life_Run4_Decay_AD.csv', 'r') as h:
    reader = csv.reader(h, delimiter=',')
    for row in reader:
        DecayTime4.append(row[0])
        DecayCount4.append(row[1])
DecayTime4 = np.array(DecayTime4)
DecayCount4 = np.array(DecayCount4)
def model_func(x, a, k, b):
    return a * np.exp(-k*x) + b
# Run 4 Data Fitting plot
x4 = np.float32(DecayTime4)
y4 = np.float32(DecayCount4)
p0_R4 = (1.,1.e-5,1.)
optR4, pcovR4 = curve_fit(model_func, x4, y4, p0_R4)
aR4, kR4, bR4 = optR4
aR4p, kR4p, bR4p = pcovR4  # note: this unpacks the rows of the 3x3 covariance matrix; the parameter variances are on its diagonal
y4M = model_func(x4, aR4, kR4, bR4)
fig4 = plt.figure(figsize=(15,6))
ax4 = fig4.add_subplot(111)
# Plot of data
ax4.plot(DecayTime4, DecayCount4, ".", color='lightcoral')
# Plot of best fit
ax4.plot(x4, y4M, color='k', label='Fitting Function: $f(t) = %0.2f e^{-%0.3f\ t} %+0.2f$' % (aR4, kR4, bR4))
ax4.set_xlabel('Time (sec)')
ax4.set_ylabel('Count')
ax4.set_title('Run 4 of Cesium-137 Decay')
ax4.set_yscale('log')
ax4.legend(bbox_to_anchor=(1.0, 1.0), prop={'size':15}, fancybox=True, shadow=True)
The purpose of the semi-log is to show the accuracy of the exponential fit with the data.
It should really be a straight line, like this image:
The data set is large with a shape of (1401,).
Could it be that curve_fit doesn't work well with large data sets?
Can that be right?
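For what it's worth, here is a minimal sketch with purely synthetic data (all numbers below are made up, not the real measurements) suggesting that curve_fit copes fine with 1401 points; note that a*exp(-k*x) + b only plots as a straight line on semi-log axes once the constant offset b is subtracted:

# Minimal sketch, synthetic data only (values assumed, not the real run).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def model_func(x, a, k, b):
    return a * np.exp(-k * x) + b

rng = np.random.default_rng(0)
x = np.linspace(0, 1400, 1401)               # hypothetical time axis, shape (1401,)
y = model_func(x, 900.0, 0.002, 40.0)        # hypothetical decay plus constant background
y = y + rng.normal(0, 3, x.size)             # measurement noise

popt, pcov = curve_fit(model_func, x, y, p0=(500.0, 1e-3, 10.0))
a, k, b = popt

fig, ax = plt.subplots(figsize=(15, 6))
ax.plot(x, y - b, '.', color='lightcoral')   # data with the fitted offset removed
ax.plot(x, model_func(x, a, k, b) - b, 'k')  # fit minus offset: straight on semi-log
ax.set_yscale('log')
plt.show()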
My dataset contains signals; each signal has 9000 samples recorded over 36 seconds. Here I am trying to convert a signal from the time domain to the frequency domain by applying an FFT. I have read a lot about it but it still isn't clear to me, especially how to set the parameters.
After applying the FFT I get back what looks like the original signal (you can see the graph below).
Here is the .mat file where you can download the dataset I used.
original signal 'RED' + FFT signals 'BLUE'
import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft
# load dataset
m = sio.loadmat(r'C:\******\mill without Nan',struct_as_record=True)
data = m['mill']
X = data[0, 3:4]['vib_spindle']  # just chose 2 signals as an example
# Applying the Fourier transform to the time series
t_n = 36 # seconds "length of signal"
N = 9000 # measurments "samples"
T = t_n / N  # time step (s)
f_s = 1 / T  # sampling rate (Hz)
x_value = np.linspace(0,t_n,N)
y_values = X #my signals
composite_y_value = np.sum(y_values, axis=0) #sum signals
def get_fft_values(y_value, T, N, f_s):
    f_values = np.linspace(0.0, 1.0/(2.0*T), N)
    fft_values_ = fft(y_value)
    fft_values = 2.0/N * np.abs(fft_values_[0:N])
    return f_values, fft_values
f_values, fft_values = get_fft_values(composite_y_value, T, N, f_s)
plt.figure(figsize = (12, 6))
plt.subplot(121)
plt.plot( composite_y_value, linestyle='-', color='red')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.subplot(122)
plt.plot(f_values, fft_values, linestyle='-', color='blue')
plt.xlabel('Freq (Hz)')
plt.ylabel('FFT Amplitude |X(freq)|')
plt.tight_layout()
plt.show()
Then the output is this graph (original signal 'RED' + FFT signal 'BLUE').
Data acquisition and processing: the data were sent through a high-speed data acquisition board with a maximal sampling rate of 100 kHz. (Maybe this information is helpful.)
When I tried the same code on a simple random time series, it worked perfectly!
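For comparison, here is a minimal sketch of the usual one-sided amplitude-spectrum recipe on a made-up two-tone test signal (the 5 Hz and 40 Hz components are arbitrary); note that only the first N//2 FFT bins are kept, matching a frequency axis running from 0 to the Nyquist frequency 1/(2T):

import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft

t_n, N = 36, 9000                                    # same duration and sample count as above
T = t_n / N                                          # time step
t = np.linspace(0.0, t_n, N, endpoint=False)
y = np.sin(2*np.pi*5*t) + 0.5*np.sin(2*np.pi*40*t)   # made-up test signal

f_values = np.linspace(0.0, 1.0/(2.0*T), N//2)       # 0 .. Nyquist (125 Hz here)
fft_values = 2.0/N * np.abs(fft(y)[:N//2])           # one-sided amplitude spectrum

plt.plot(f_values, fft_values, color='blue')
plt.xlabel('Freq (Hz)')
plt.ylabel('FFT Amplitude |X(freq)|')
plt.show()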
I have a file with a table. I am trying to plot velDisp vs. ABSMAG. Here is my code:
import matplotlib.pyplot as plt
from astropy.io import fits
from astropy.io.fits import getdata
from astropy.table import Table
data = getdata("Subset.fits")
data, hdr = getdata("Subset.fits",1,header = True)
table = fits.open('Subset.fits')
data1 = Table(table[1].data)
#print("Columnns:", data1[0].columns)
graph = Table.read('Subset.fits')
mag = data1['ABSMAG']
r_mag = mag[:,2]
x = graph['ABSMAG']
y = graph['velDisp']
plt.scatter(x, y, color = 'r')
plt.title('Velocity Dispersion vs Absolute Magnitude')
plt.xlabel('Abs Mag(r_band)')
plt.ylabel('Velocity Dispersion')
plt.grid()
plt.show()
It's giving me the error that x and y must be the same size. The velDisp, I believe, is in 3D, so this may need to be done in log space. Any idea how to do this?
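As a minimal debugging sketch (the band index 2 for the r band is an assumption taken from the r_mag line above, and the velDisp indexing is hypothetical): print the shapes first, and if a column is multi-dimensional, select a single band/column before plotting:

from astropy.table import Table
import matplotlib.pyplot as plt

graph = Table.read('Subset.fits')
x = graph['ABSMAG']
y = graph['velDisp']
print(x.shape, y.shape)    # scatter() needs these to match

if x.ndim > 1:
    x = x[:, 2]            # assumed: one magnitude per band, r band at index 2
if y.ndim > 1:
    y = y[:, 0]            # hypothetical: reduce velDisp to a single column

plt.scatter(x, y, color='r')
plt.show()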
I have a Unix time series (x) with an associated signal value (y) which is generated every minute, dropping the first value and appending a new one. I am trying to smooth the resulting curve without losing time accuracy, with specific emphasis on the final value of the smoothed curve, which will be written to a database. I would like to be able to adjust the smoothing to a considerable degree.
I have studied (as a mathematical layman, more or less) all the options I could find and master. I came across Savitzky-Golay, which looked perfect until I realized it works well on past data but fails to produce a reliable final value if no future data is available for smoothing. I have tried many other methods which produced results but could not be adjusted like Savgol.
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import column
from math import pi
from scipy.signal import savgol_filter
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.interpolate import splrep, splev
from scipy.ndimage import gaussian_filter1d
from scipy.signal import lfilter
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
df_sim = pd.read_csv("/home/20190905_Signal_Smooth_Test.csv")
#sklearn Polynomial*****************************************
poly = PolynomialFeatures(degree=4)
X = df_sim.iloc[:, 0:1].values
print(X)
y = df_sim.iloc[:, 1].values
print(y)
X_poly = poly.fit_transform(X)
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.fit_transform(X)), color='red')
plt.title('Polynomial Regression')
plt.xlabel('Time')
plt.ylabel('Signal')
plt.show()
#scipy interpolate********************************************
bspl = splrep(df_sim['timestamp'], df_sim['signal'], s=5)
bspl_y = splev(df_sim['timestamp'], bspl)
df_sim['signal_spline'] = bspl_y
#scipy gaussian filter****************************************
smooth = gaussian_filter1d(df_sim['signal'], 3)
df_sim['signal_gauss'] = smooth
#scipy lfilter************************************************
n = 5 # the larger n is, the smoother curve will be
b = [1.0 / n] * n
a = 1
histo_filter = lfilter(b, a, df_sim['signal'])
df_sim['signal_lfilter'] = histo_filter
print(df_sim)
#scipy UnivariateSpline**************************************
s = UnivariateSpline(df_sim['timestamp'], df_sim['signal'], s=5)
xs = df_sim['timestamp']
ys = s(xs)
df_sim['signal_univariante'] = ys
#scipy savgol filter****************************************
sg = savgol_filter(df_sim['signal'], 11, 3)
df_sim['signal_savgol'] = sg
df_sim['date'] = pd.to_datetime(df_sim['timestamp'], unit='s')
#plotting it all********************************************
print(df_sim)
w = 60000
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
           title="Various Signals y vs Timestamp x")
p.xaxis.major_label_orientation = pi / 4
p.grid.grid_line_alpha = 0.9
p.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p.line(x=df_sim['date'], y=df_sim['signal_spline'], color='blue')
p.line(x=df_sim['date'], y=df_sim['signal_gauss'], color='red')
p.line(x=df_sim['date'], y=df_sim['signal_lfilter'], color='magenta')
p.line(x=df_sim['date'], y=df_sim['signal_univariante'], color='yellow')
p1 = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
            title="Savgol vs Signal")
p1.xaxis.major_label_orientation = pi / 4
p1.grid.grid_line_alpha = 0.9
p1.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p1.line(x=df_sim['date'], y=df_sim['signal_savgol'], color='blue')
output_file("signal.html", title="Signal Test")
show(column(p, p1)) # open a browser
I expect a result that is similar to Savitzky-Golay but with valid final smoothed values for the data series. None of the other methods offers the same flexibility to adjust the degree of smoothing. Most other methods shift the curve to the right. I can provide the csv file for testing.
This really depends on why you are smoothing the data. Every smoothing method has side effects, such as letting some 'noise' through more than others. Research 'phase response of filtering'.
A common technique to avoid the problem of missing data at the end of a symmetric filter is to forecast your data a few points ahead and use that. For example, if you are using a 5-term moving-average filter, you will be missing 2 data points when you go to calculate your end value.
To forecast these two points, you could use the auto_arima() function from the pmdarima module, or look at the fbprophet module (which I find quite good for this kind of situation).
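A minimal sketch of the idea, with a naive straight-line extrapolation standing in for a real forecast (auto_arima or fbprophet would replace the two 'forecast' lines):

import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=200))          # made-up signal for illustration

window, poly = 11, 3
pad = window // 2                            # points a centered filter is missing at the end

# Naive forecast: continue the series along the slope of its last few points.
slope = (y[-1] - y[-5]) / 4.0
forecast = y[-1] + slope * np.arange(1, pad + 1)
y_ext = np.concatenate([y, forecast])

y_smooth = savgol_filter(y_ext, window, poly)[:len(y)]  # drop the padding again
print(y_smooth[-1])                          # final value now supported by the forecast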
I have a set of integer values, and I want to fit them to a Weibull distribution and get the best-fit parameters. Then I draw the histogram of the data together with the pdf of the Weibull distribution, using the best-fit parameters. This is the code I used.
from jtlHandler import *
import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
def get_pdf(latencies):
    a = np.array(latencies)
    ag = st.gaussian_kde(a)
    ak = np.linspace(np.min(a), np.max(a), len(a))
    agv = ag(ak)
    plt.plot(ak, agv)
    plt.show()
    return (ak, agv)
def fit_to_distribution(distribution, data):
    params = distribution.fit(data)
    # fit() returns MLEs (maximum likelihood estimates) for the shape (if
    # applicable), location, and scale parameters. Starting estimates for the
    # fit may be given as input arguments; for any not provided,
    # self._fitstart(data) is called to generate them.
    return params
def make_distribution_pdf(dist, params, end):
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]
    # Build the PDF and turn it into a pandas Series
    x = np.linspace(0, end, end)
    y = dist.pdf(x, *arg, loc=loc, scale=scale)
    pdf = pd.Series(y, x)
    return pdf
latencies = getLatencyList("filename")
latencies = latencies[int(9*(len(latencies)/10)):len(latencies)]
data = pd.Series(latencies)
params = fit_to_distribution(st.weibull_max, data)
print("Parameters for the fit: "+str(params))
# Make PDF
pdf = make_distribution_pdf(st.weibull_max, params, max(latencies))
# Display
plt.figure()
ax = pdf.plot(lw=2, label='PDF', legend=True)
data.plot(kind='hist', bins=200, density=True, alpha=0.5, label='Data',
          legend=True, ax=ax)
ax.set_title('Weibull distribution')
ax.set_xlabel('Latency')
ax.set_ylabel('Frequency')
plt.savefig("image.png")
This is the resulting figure.
As can be seen, the Weibull approximation is not similar to the original distribution of the data.
How can I get the best Weibull approximation to my data?
You can fit a data set (a set of numbers) to any distribution using the following two functions.
import os
import matplotlib.pyplot as plt
import sys
import math
import numpy as np
import scipy.stats as st
from scipy.stats._continuous_distns import _distn_names
from scipy.optimize import curve_fit
def fit_to_distribution(distribution, latency_values):
    distribution = getattr(st, distribution)
    params = distribution.fit(latency_values)
    return params

def make_distribution_pdf(distribution, latency_list):
    distribution = getattr(st, distribution)
    params = distribution.fit(latency_list)
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]
    x = np.linspace(min(latency_list), max(latency_list), 10000)
    y = distribution.pdf(x, *arg, loc=loc, scale=scale)
    return x, y
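As a minimal usage sketch (synthetic data; 'weibull_min' is an arbitrary choice here):

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

data = st.weibull_min.rvs(1.5, loc=0, scale=10, size=1000, random_state=0)  # synthetic sample

params = fit_to_distribution('weibull_min', data)
print('Fit parameters:', params)

x, y = make_distribution_pdf('weibull_min', data)
plt.hist(data, bins=50, density=True, alpha=0.5, label='Data')
plt.plot(x, y, lw=2, label='Fitted PDF')
plt.legend()
plt.show()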
I am trying to interpolate a spectrogram obtained from matplotlib using scipy's interp2d function, but somehow I fail to get the same spectrogram back. The data is available here
The actual spectrogram is:
And interpolated spectrogram is:
The code looks okay to me, but something is still wrong. This is the code used:
from __future__ import division
from matplotlib import ticker as mtick
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np
from bisect import bisect
from scipy import interpolate
from matplotlib.ticker import MaxNLocator
data = np.genfromtxt('spectrogram.dat', skip_header=2, delimiter=',')
pressure = data[:, 1] * 0.065
time = data[:, 0]
cax = plt.specgram(pressure * 100000, NFFT=256, Fs=50000, noverlap=4, cmap=plt.cm.gist_heat, zorder=1)
f = interpolate.interp2d(cax[2], cax[1], cax[0], kind='cubic')
xnew = np.linspace(cax[2][0], cax[2][-1], 100)
ynew = np.linspace(cax[1][0], cax[1][-1], 100)
znew = 10 * np.log10(f(xnew, ynew))
fig = plt.figure(figsize=(6, 3.2))
ax = fig.add_subplot(111)
ax.set_title('colorMap')
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat)
# plt.colorbar()
plt.title('Interpolated spectrogram')
plt.colorbar(orientation='vertical')
plt.savefig('interp_spectrogram.pdf')
How to interpolate a spectrogram correctly with Python?
The key to your solution is in this warning, which you may or may not have seen:
RuntimeWarning: invalid value encountered in log10
znew = 10 * np.log10(f(xnew, ynew))
If your data is actually a power whose log you'd like to view explicitly as decibel power, take the log first, before fitting to the spline:
spectrum, freqs, t, im = cax
dB = 10*np.log10(spectrum)
#f = interpolate.interp2d(t, freqs, dB, kind='cubic') # docs for this recommend next line
f = interpolate.RectBivariateSpline(t, freqs, dB.T) # but this uses xy not ij, hence the .T
xnew = np.linspace(t[0], t[-1], 10*len(t))
ynew = np.linspace(freqs[0], freqs[-1], 10*len(freqs)) # was it wider spaced than freqs on purpose?
znew = f(xnew, ynew).T
Then plotting as you have:
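(For concreteness, that is the same pcolormesh call as in the question, now fed the log-scaled spline output:)

plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat)
plt.colorbar(orientation='vertical')
plt.title('Interpolated spectrogram (dB)')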
Previous answer:
If you just want to plot on a log scale, use matplotlib.colors.LogNorm:
from matplotlib import colors

znew = f(xnew, ynew)  # Don't take the log here
plt.figure(figsize=(6, 3.2))
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat, norm=colors.LogNorm())
And that looks like this:
Of course that still has gaps where its value is negative when plotted on a log scale. What your data means to you when the value is negative should dictate how you fill this in. One simple solution is to just set those values to the smallest positive value and they'd fill in as black:
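A minimal sketch of that fill-in, reusing the names from the snippet above:

znew = f(xnew, ynew)                    # spline output; may contain negatives
floor = znew[znew > 0].min()            # smallest positive value present
znew = np.where(znew > 0, znew, floor)  # negatives filled in; plot as before
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat, norm=colors.LogNorm())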