Display issue of fitted curve: cannot solve coarseness - python

Despite having a working script for curve fitting with the lmfit library, I am not able to solve a display issue: with only 5 dependent values, the resulting graph is rather coarse.
Before switching to lmfit, I was using curve_fit and could solve the display issue by simply building a dense axis with np.linspace and plotting the optimized values resulting from the fit, while displaying the "real" values through plt.errorbar. With lmfit, the same approach fails: it recognizes the "fake" independent variables and raises a shape-mismatch error.
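In sketch form, that older curve_fit workaround looks roughly like this (model_func and the data arrays below are placeholders for illustration, not the actual model or measurements):
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def model_func(x, a, b):               # stand-in model, not the real R1rho function
    return a * np.exp(-b * x)

xdata = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ydata = model_func(xdata, 8.0, 0.1)    # fake "measurements" for the sketch
yerr = np.full_like(ydata, 0.2)

popt, pcov = curve_fit(model_func, xdata, ydata, p0=(8.0, 0.1))
x_smooth = np.linspace(xdata.min(), xdata.max(), 1000)   # dense "fake" x-axis
plt.errorbar(xdata, ydata, yerr=yerr, fmt=".k")          # the measured points
plt.plot(x_smooth, model_func(x_smooth, *popt))          # smooth fitted curve
plt.show()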
My full script is the following:
import lmfit as lf
from lmfit import Model, Parameters
import numpy as np
import matplotlib.pyplot as plt
from math import atan
def on_res(omega_eff, thetas, R2avg=5, k_ex=0.1, phi_ex=500):
    return R2avg*(np.sin(thetas))**2 + ((np.sin(thetas))**2)*(phi_ex*k_ex/(k_ex**2 + omega_eff**2))
model = Model(on_res,independent_vars=['omega_eff','thetas'])
params = model.make_params(R2avg=5, k_ex=0.01, phi_ex=1500)
carrier = 6146.53
O_1 = 5846
spin_locks = (1000, 2000, 3000, 4000, 5000)
delta_omega = (O_1 - carrier)
omega_eff1 = ((delta_omega**2) + (spin_locks[0]**2))**0.5
omega_eff2 = ((delta_omega**2) + (spin_locks[1]**2))**0.5
omega_eff3 = ((delta_omega**2) + (spin_locks[2]**2))**0.5
omega_eff4 = ((delta_omega**2) + (spin_locks[3]**2))**0.5
omega_eff5 = ((delta_omega**2) + (spin_locks[4]**2))**0.5
theta_rad1 = atan(spin_locks[0]/delta_omega)
theta_rad2 = atan(spin_locks[1]/delta_omega)
theta_rad3 = atan(spin_locks[2]/delta_omega)
theta_rad4 = atan(spin_locks[3]/delta_omega)
theta_rad5 = atan(spin_locks[4]/delta_omega)
x = (omega_eff1/1000, omega_eff2/1000, omega_eff3/1000, omega_eff4/1000, omega_eff5/1000)# , omega_eff6/1000)# , omega_eff7/1000)
theta = (theta_rad1, theta_rad2, theta_rad3, theta_rad4, theta_rad5)
R1rho_vals = (7.9328, 6.2642, 6.0005, 5.9972, 5.988)
e = (0.2, 0.2, 0.2, 0.2, 0.2)
new_x = np.linspace(0, 6, 1000)
omega_eff = np.array(x, dtype=float)
thetas = np.array(theta, dtype=float)
R1rho_vals = np.array(R1rho_vals, dtype=float)
error = np.array(e, dtype=float)
R2avg = []
k_ex = []
phi_ex = []
result = model.fit(R1rho_vals, params, weights=1/error, thetas=thetas, omega_eff=omega_eff, method = "emcee", steps = 1000)
print(result.fit_report())
plt.errorbar(x, R1rho_vals, yerr = error, fmt = ".k", markersize = 8, capsize = 3)
plt.plot(new_x, result.best_fit)
plt.show()
As you can see by running it, it raises the shape-mismatch error. Changing the plt.plot line to plt.plot(x, result.best_fit) produces the graph correctly, but with a very coarse profile (as one would expect, having only 5 points on the x-axis).
Are you aware of any way to solve this? Checking the documentation, I noticed that the examples provided all plot the results against the actual independent-variable values, since they have enough experimental points.

You need to re-evaluate the ModelResult with your new values for the independent variables:
plt.plot(new_x, result.eval(omega_eff=new_x/1000., thetas=thetas))
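Since the model here has two independent variables, a smooth curve needs matching dense arrays for both of them. A small sketch building both grids from a dense spin-lock axis (the 100-5500 range is an assumption for illustration, not from the original data):
spin_lock_grid = np.linspace(100, 5500, 1000)                          # assumed spin-lock range
omega_eff_grid = np.sqrt(delta_omega**2 + spin_lock_grid**2) / 1000    # same scaling as x
theta_grid = np.arctan(spin_lock_grid / delta_omega)

plt.errorbar(x, R1rho_vals, yerr=error, fmt=".k", markersize=8, capsize=3)
plt.plot(omega_eff_grid, result.eval(omega_eff=omega_eff_grid, thetas=theta_grid))
plt.show()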

Related

Why am I getting a figure size plotting error in Python?

My Python code is no longer working. I messed around with terminal commands (on Mac), uninstalling and reinstalling matplotlib, etc., and I think I messed something up.
For this specific problem, code that I know worked before no longer does.
I have this code where I load in data and plot a fit over it:
# imports needed by the snippet below
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

data = np.loadtxt('peak4.txt', skiprows=1)
x = data[:,0]
y = data[:,1]
nn = len(x)

def f1(x, p0, p1, p2, p3, p4):
    return p0*np.exp(-(x-p1)**2/(2*(p2**2))) + p3*x + p4
pp = 5
guesses = (1,1,1,1,1)
(p0,p1,p2,p3,p4),cc = curve_fit(f1,x,y,p0=guesses)
ymod = f1(x,p0,p1,p2,p3,p4)
plt.figure(figsize=(7.5,5))
plt.rc('font', size=16)
plt.plot(x,y,'.b')
plt.plot(x,ymod,'r')
plt.show()
yfit = f1(x,p0,p1,p2,p3,p4)
yys = (yfit-y)**2
chisqr = sum(yys)/(nn-pp)
d = np.sqrt(np.diag(cc))
print(f'Fitting Parameters of f1: p0 = {p0:3.5f}, p1 = {p1:3.5f}, p2 = {p2:3.5f}, p3 = {p3:3.5f}, p4 = {p4:3.5f}')
print(f'\nParameter errors of f1: p0_err = {d[0]:3.5f}, p1_err = {d[1]:3.5f}, p2_err = {d[2]:3.5f}, p3_err = {d[3]:3.5f}, p4_err = {d[4]:3.5f}')
print(f'\nReduced chi-squared for f1: {chisqr:3.5f}')
I know for a fact this code worked before. It used to display a fitted plot, but now the output is just:
<Figure size 750x500 with 1 Axes>.
Please help
I ran part of your script that relates to matplotlib:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('sample_data.txt', skiprows = 1) #<============ Just to test my sample data
x = data[:,0]
y = data[:,1]
ymod = y #<============ Just to test
plt.figure(figsize=(7.5,5))
plt.rc('font', size=16)
plt.plot(x,y,'.b')
plt.plot(x,ymod,'r')
plt.show()
It works very well and produces the expected plot.
I'd suggest that you run a simple piece of code (removing the curve_fit function, etc.) to see whether your matplotlib works.
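If that simple script also displays nothing and only prints something like <Figure size 750x500 with 1 Axes>, the problem is likely the active backend rather than the code; a quick check (nothing here is specific to your data) is:
import matplotlib
print(matplotlib.get_backend())   # a non-GUI backend such as 'agg' will not open a window

# In a Jupyter notebook, enabling inline rendering usually brings the figures back:
# %matplotlib inline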

Use Python lmfit with a variable number of parameters in function

I am trying to deconvolve complex gas chromatogram signals into individual gaussian signals. Here is an example, where the dotted line represents the signal I am trying to deconvolve.
I was able to write the code to do this using scipy.optimize.curve_fit; however, once applied to real data the results were unreliable. I believe being able to set bounds to my parameters will improve my results, so I am attempting to use lmfit, which allows this. I am having a problem getting lmfit to work with a variable number of parameters. The signals I am working with may have an arbitrary number of underlying gaussian components, so the number of parameters I need will vary. I found some hints here, but still can't figure it out...
Creating a python lmfit Model with arbitrary number of parameters
Here is the code I am currently working with. The code will run, but the parameter estimates do not change when the model is fit. Does anyone know how I can get my model to work?
import numpy as np
from collections import OrderedDict
from scipy.stats import norm
from lmfit import Parameters, Model
def add_peaks(x_range, *pars):
    y = np.zeros(len(x_range))
    for i in np.arange(0, len(pars), 3):
        curve = norm.pdf(x_range, pars[i], pars[i+1]) * pars[i+2]
        y = y + curve
    return y
# generate some fake data
x_range = np.linspace(0, 100, 1000)
peaks = [50., 40., 60.]
a = norm.pdf(x_range, peaks[0], 5) * 2
b = norm.pdf(x_range, peaks[1], 1) * 0.1
c = norm.pdf(x_range, peaks[2], 1) * 0.1
fake = a + b + c
param_dict = OrderedDict()
for i in range(0, len(peaks)):
    param_dict['pk' + str(i)] = peaks[i]
    param_dict['wid' + str(i)] = 1.
    param_dict['mult' + str(i)] = 1.
# In case, you'd like to see the plot of fake data
#y = add_peaks(x_range, *param_dict.values())
#plt.plot(x_range, y)
#plt.show()
# Initialize the model and fit
pmodel = Model(add_peaks)
params = pmodel.make_params()
for i in param_dict.keys():
    params.add(i, value=param_dict[i])
result = pmodel.fit(fake, params=params, x_range=x_range)
print(result.fit_report())
I think you would be better off using lmfit's ability to build a composite model.
That is, with a single peak defined with
from scipy.stats import norm
def peak(x, amp, center, sigma):
    return amp * norm.pdf(x, center, sigma)
(see also lmfit.models.GaussianModel), you can build a model with many peaks:
npeaks = 3
model = Model(peak, prefix='p1_')
for i in range(1, npeaks):
    model = model + Model(peak, prefix='p%d_' % (i+1))
params = model.make_params()
Now model will be a sum of 3 Gaussian functions, and the params created for that model will have names like p1_amp, p1_center, p2_amp, ..., to which you can add sensible initial values and/or bounds and/or constraints.
Given your example data, you could pass in initial values to make_params like
params = model.make_params(p1_amp=2.0, p1_center=50., p1_sigma=2,
                           p2_amp=0.2, p2_center=40., p2_sigma=2,
                           p3_amp=0.2, p3_center=60., p3_sigma=2)
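# (Illustrative addition, not part of the original answer: bounds and constraints
#  attach to the same prefixed names; the limits below are made up.)
params['p1_sigma'].set(min=0)              # keep the first peak's width positive
params['p2_center'].set(min=30, max=50)    # confine the second peak's position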
result = model.fit(fake, params, x=x_range)
I was able to find a solution here:
https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes
Building on the code above, the following accomplishes what I was trying to do...
from lmfit.models import GaussianModel
import matplotlib.pyplot as plt
gauss1 = GaussianModel(prefix='g1_')
gauss2 = GaussianModel(prefix='g2_')
gauss3 = GaussianModel(prefix='g3_')
gauss4 = GaussianModel(prefix='g4_')
gauss5 = GaussianModel(prefix='g5_')
gauss = [gauss1, gauss2, gauss3, gauss4, gauss5]
prefixes = ['g1_', 'g2_', 'g3_', 'g4_', 'g5_']
mod = np.sum(gauss[0:len(peaks)])
pars = mod.make_params()
for i, prefix in zip(range(0, len(peaks)), prefixes[0:len(peaks)]):
    pars[prefix + 'center'].set(peaks[i])
init = mod.eval(pars, x=x_range)
out = mod.fit(fake, pars, x=x_range)
print(out.fit_report(min_correl=0.5))
out.plot_fit()
plt.show()

How to specify size for bernoulli distribution with pymc3?

In trying to make my way through Bayesian Methods for Hackers, which is in pymc, I came across this code:
first_coin_flips = pm.Bernoulli("first_flips", 0.5, size=N)
I've tried to translate this to pymc3 with the following, but it just returns a numpy array, rather than a tensor (?):
first_coin_flips = pm.Bernoulli("first_flips", 0.5).random(size=50)
The reason the size matters is that it's used later on in a deterministic variable. Here's the entirety of the code that I have so far:
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np
import mpld3
import theano.tensor as tt
model = pm.Model()

with model:
    N = 100
    p = pm.Uniform("cheating_freq", 0, 1)
    true_answers = pm.Bernoulli("truths", p)
    print(true_answers)
    first_coin_flips = pm.Bernoulli("first_flips", 0.5)
    second_coin_flips = pm.Bernoulli("second_flips", 0.5)
    # print(first_coin_flips.value)

    # Create model variables
    def calc_p(true_answers, first_coin_flips, second_coin_flips):
        observed = first_coin_flips * true_answers + (1 - first_coin_flips) * second_coin_flips
        # NOTE: Where I think the size param matters, since we're dividing by it
        return observed.sum() / float(N)

    calced_p = pm.Deterministic("observed", calc_p(true_answers, first_coin_flips, second_coin_flips))
    step = pm.Metropolis(model.free_RVs)
    trace = pm.sample(1000, tune=500, step=step)

pm.traceplot(trace)
html = mpld3.fig_to_html(plt.gcf())
with open("output.html", 'w') as f:
    f.write(html)
In the resulting traceplot, the coin flips and the uniform cheating_freq look correct, but the observed trace doesn't look like anything to me, and I think it's because I'm not translating that size param correctly.
The pymc3 way to specify the size of a Bernoulli distribution is by using the shape parameter, like:
first_coin_flips = pm.Bernoulli("first_flips", 0.5, shape=N)
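With shape=N every flip becomes a length-N vector, so the later arithmetic and the sum in the deterministic work on tensors directly. A minimal sketch of how the model above might then look (it reuses the question's names and is an untested outline, not a verified drop-in):
import pymc3 as pm

N = 100
with pm.Model() as model:
    p = pm.Uniform("cheating_freq", 0, 1)
    true_answers = pm.Bernoulli("truths", p, shape=N)
    first_coin_flips = pm.Bernoulli("first_flips", 0.5, shape=N)
    second_coin_flips = pm.Bernoulli("second_flips", 0.5, shape=N)

    # elementwise arithmetic on the length-N tensors, then a tensor sum
    observed = first_coin_flips * true_answers + (1 - first_coin_flips) * second_coin_flips
    calced_p = pm.Deterministic("observed", observed.sum() / N)

    trace = pm.sample(1000, tune=500, step=pm.Metropolis())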

Python matplotlib.psd fitting

I'm trying to make a fit using matplotlib.psd function. My datafile has 8 columns with displacement and speed for a particle (positionX, positionY, positionZ, AveragePositionXYZ, speedX, speedY, speedZ, AverageSpeedXYZ). Using the positionX for example, I try to get the Power Spectrum with matplotlib.psd:
power, freqs = plt.psd(data, len(data), Fs = 256, scale_by_freq=True, return_line=0)
Then, I try to fit a line using linear regression with scipy stats.linregress:
slope, inter, r2, p, stderr = stats.linregress(x, y)
However, my results are very bad. I try to plot with:
line = (inter + slope * (10 * np.log10(freqs)))
plt.semilogx(freqs, line)
plt.show()
The resulting image shows a very poor fit.
I know that I have made a lot of mistakes, and I have tried to find solutions on the web without much success. So, I'm asking if there's someone here who could help me.
The datafile has the following format (first 10 lines):
1.50000000,0.00000000,0.00000000,0.50000000,0.00000000,0.00000000,0.00000000,0.00000000
1.49788889,0.00000000,0.00000000,0.49929630,-0.06333333,0.00000000,0.00000000,-0.02111111
1.49367078,0.00000005,0.00000000,0.49789028,-0.12654314,0.00000165,0.00000000,-0.04218050
1.48735391,0.00000027,0.00000000,0.49578473,-0.18950635,0.00000659,0.00000000,-0.06316659
1.47895054,0.00000082,0.00000000,0.49298379,-0.25210085,0.00001647,0.00000000,-0.08402813
1.46847701,0.00000192,0.00000000,0.48949298,-0.31420588,0.00003296,0.00000000,-0.10472431
1.45595360,0.00000385,0.00000000,0.48531915,-0.37570257,0.00005769,0.00000000,-0.12521496
1.44140445,0.00000692,0.00000000,0.48047046,-0.43647431,0.00009232,0.00000000,-0.14546066
1.42485754,0.00001154,0.00000000,0.47495636,-0.49640723,0.00013851,0.00000000,-0.16542291
1.40634452,0.00001814,0.00000000,0.46878755,-0.55539066,0.00019789,0.00000000,-0.18506426
My complete Python code is as follows:
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
filename = 'datafile.txt'
# Load data
file = np.genfromtxt(filename,
                     skip_header=0,
                     skip_footer=0,
                     delimiter=',',
                     dtype='float32',
                     filling_values=0,
                     usecols=(0, 1, 2, 3, 4, 5, 6, 7),
                     names=['posX', 'posY', 'posZ', 'posMedias', 'velX', 'velY', 'velZ', 'velMedias'])
# Map values
posX = file['posX']
posY = file['posY']
posZ = file['posZ']
posMedia = file['posMedias']
velX = file['velX']
velY = file['velY']
velZ = file['velZ']
velMedia = file['velMedias']
# Column data that will be used
data = posMedia
# PSD calculation
power, freqs = plt.psd(data, len(data), Fs = 256, scale_by_freq=True, return_line=0)
# Linear fit
x = np.log10(freqs[1:])
y = np.log10(power[1:])
slope, inter, r2, p, stderr = stats.linregress(x, y)
print(slope, inter)
# Plot
line = (inter + slope * (10 * np.log10(freqs)))
plt.semilogx(freqs, line)
plt.show()
Thank you so much!

issue with sklearn.mixture.GMM (Gaussian Mixture Model)

I'm new to scikit-learn and GMMs in general... I have some problems with the fit quality of a Gaussian Mixture Model in Python (scikit-learn).
I have an array of data, which you may find at DATA HERE, that I want to fit with a GMM with n = 2 components.
As a benchmark, I superimpose a Normal fit.
Errors/weirdness:
setting n = 1 component, I cannot recover the Normal benchmark fit with GMM(1)
setting n = 2 components, the Normal fit is better than the GMM(2) fit
GMM(n) seems to always provide the same fit...
Here is what I get (the picture displays the fits with GMM(2)): what am I doing wrong here? Thanks in advance for your help.
Code below (to run it, save data in the same folder)
from numpy import *
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from collections import OrderedDict
from scipy.stats import norm
from sklearn.mixture import GMM
# Upload the data: "epsi" (array of floats)
file_xlsx = './db_X.xlsx'
data = pd.read_excel(file_xlsx)
epsi = data["epsi"].values;
t_ = len(epsi);
# Normal fit (for benchmark)
epsi_grid = arange(min(epsi),max(epsi)+0.001,0.001);
mu = mean(epsi);
sigma2 = var(epsi);
normal = norm.pdf(epsi_grid, mu, sqrt(sigma2));
# TENTATIVE - Gaussian mixture fit
gmm = GMM(n_components = 2); # fit quality doesn't improve if I set: covariance_type = 'full'
gmm.fit(reshape(epsi,(t_,1)));
gauss_mixt = exp(gmm.score(reshape(epsi_grid,(len(epsi_grid),1))));
# same result if I apply the definition of pdf of a Gaussian mixture:
# pdf_mixture = w_1 * N(mu_1, sigma_1) + w_2 * N(mu_2, sigma_2)
# as suggested in:
# http://stackoverflow.com/questions/24878729/how-to-construct-and-plot-uni-variate-gaussian-mixture-using-its-parameters-in-p
#
#gauss_mixt = array([p * norm.pdf(epsi_grid, mu, sd) for mu, sd, p in zip(gmm.means_.flatten(), sqrt(gmm.covars_.flatten()), gmm.weights_)]);
#gauss_mixt = sum(gauss_mixt, axis = 0);
# Create a figure showing the comparison between the estimated distributions
# setting the figure object
fig = plt.figure(figsize = (10,8))
fig.set_facecolor('white')
ax = plt.subplot(111)
# colors
red = [0.9, 0.3, 0.0];
grey = [0.9, 0.9, 0.9];
green = [0.2, 0.6, 0.3];
# x-axis limits
q_inf = float(pd.DataFrame(epsi).quantile(0.0025));
q_sup = float(pd.DataFrame(epsi).quantile(0.9975));
ax.set_xlim([q_inf, q_sup])
# empirical pdf of data
nb = int(10*log(t_));
ax.hist(epsi, bins = nb, normed = True, color = grey, edgecolor = 'k', label = "Empirical");
# Normal fit
ax.plot(epsi_grid, normal, color = green, lw = 1.0, label = "Normal fit");
# Gaussian Mixture fit
ax.plot(epsi_grid, gauss_mixt, color = red, lw = 1.0, label = "GMM(2)");
# title
ax.set_title("Issue: Normal fit out-performs the GMM fit?", size = 14)
# legend
ax.legend(loc='upper left');
plt.tight_layout()
plt.show()
The problem was the lower bound on the individual components' variances, min_covar, which defaults to 1e-3 and is meant to prevent overfitting.
Lowering that limit solved the problem:
gmm = GMM(n_components = 2, min_covar = 1e-12)
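For what it's worth, in current scikit-learn versions the old GMM class has been replaced by GaussianMixture, where reg_covar plays a similar role to min_covar; a rough, untested equivalent of the line above would be:
from sklearn.mixture import GaussianMixture
import numpy as np

# reg_covar is the floor added to each covariance; lowering it mirrors min_covar=1e-12
gmm = GaussianMixture(n_components=2, reg_covar=1e-12)
gmm.fit(epsi.reshape(-1, 1))
gauss_mixt = np.exp(gmm.score_samples(epsi_grid.reshape(-1, 1)))   # per-sample density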
