Using robust linear methods from python module "statsmodels" with weights?

Using robust linear methods from python module "statsmodels" with weights? - python

I have some data,y with errors, y_err, measured at x. I need to fit a straight line to this mimicking some code from matlab specifically the fit method with robust "on" and giving the weights as 1/yerr. The matlab documentation says it uses the bisquare method (also know as the TukeyBiweight method). My code so far is..
rlm_model = sm.RLM(y, x, M=sm.robust.norms.TukeyBiweight())
rlm_results = rlm_model.fit()
print rlm_results.params
however I need to find a way of including weights derived from yerr.
Hope people can help, this is the first time I have tried to used the statsmodel module.
In response to the first answer:
I tried;
y=y*(yerr)
x=x*(yerr)
x=sm.add_constant(x, prepend=False)
rlm_model = sm.RLM(y, x, M=sm.robust.norms.TukeyBiweight())
results=rlm_model.fit()
but sadly this doesnt match the matlab function.

Weights reflecting heteroscedasticity, that is unequal variance across observations, are not yet supported by statsmodels RLM.
As a workaround, you can divide your y and x by yerr in the call to RLM.
I think, in analogy to weighted least squares, the parameter estimates, their standard errors and other statistics are still correct in this case. But I haven't checked yet.
as reference:
Carroll, Raymond J., and David Ruppert. "Robust estimation in heteroscedastic linear models." The annals of statistics (1982): 429-441.
They also estimate the variance function, but for fixed weights 1/sigma_i the optimization just uses
(y_i - x_i beta) / sigma_i
The weights 1/sigma_i will only be relative weights and will still be multiplied with a robust estimate of the scale of the errors.

Related

Is MATLAB's PIQE function wrong?

I'm trying to train a Deep Learning model for image super resolution, and I wanted to implement the PIQE score as a loss function. Since I will be training the model with pytorch, I was trying to make an own Python implementation of the algorithm to compute the PIQE score.
As a first step, I looked at the MATLAB implementation of piqe (the link takes you to the main page but I am looking at the source code) to see how it's done and then adapt it to Python. There is one thing that bothers me, however.
The PIQE score starts off by calculating the Mean-Substracted Contrast-Normalized coefficients with the following formula:
But the matlab code at that step looks like this:
mu = imgaussfilt(ipImage,7/6,'FilterSize',7,'Padding','replicate');
sigma = sqrt(abs(imgaussfilt(ipImage.*ipImage,7/6,'FilterSize',7,'Padding','replicate') - mu.*mu));
imnorm = (ipImage-mu)./(sigma+1);
I'm puzzled about the calculation of the variance, sigma. In the algorithm of the paper, at each pixel, the mean of the 7x7 neighborhood is calculated and then subtracted from each value of said 7x7 neighborhood. Then, the differences are squared and multiplied by its corresponding Gaussian weight w(k,l)
Instead, the MATLAB algorithm multiplies the Gaussian weighting (by using imgausssfilt) with the squared pixel values, and then subtracts the squared means from that matrix, taking the absolute values of that operation. Correct me if I'm wrong, but isn't this a case of mistakenly using (a-b)² = a² - b² ?
Basically my question is if you could kindly confirm whether what I said before is true, or I misinterpreted the MATLAB code. Thanks in advance!

I understand why you are confused, but both are right. It uses the classical identity
Var(X) = E [(X-E(X))^2]= E(X^2) - E(X)^2
Just multiply your (I-mu)^2 out and compare the result with the definition of mu, then you will see that they cancel.

scikit KernelPCA unstable results

I'm trying to use KernelPCA for reducing the dimensionality of a dataset to 2D (both for visualization purposes and for further data analysis).
I experimented computing KernelPCA using a RBF kernel at various values of Gamma, but the result is unstable:
(each frame is a slightly different value of Gamma, where Gamma is varying continuously from 0 to 1)
Looks like it is not deterministic.
Is there a way to stabilize it/make it deterministic?
Code used to generate transformed data:
def pca(X, gamma1):
kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=gamma1)
X_kpca = kpca.fit_transform(X)
#X_back = kpca.inverse_transform(X_kpca)
return X_kpca

KernelPCA should be deterministic and evolve continuously with gamma. It is different from RBFSampler that does have built-in randomness in order to provide an efficient (more scalable) approximation of the RBF kernel.
However what can change in KernelPCA is the order of the principal components: in scikit-learn they are returned sorted in order of descending eigenvalue, so if you have 2 eigenvalues close to each other it could be that the order changes with gamma.
My guess (from the gif) is that this is what is happening here: the axes along which you are plotting are not constant so your data seems to jump around.
Could you provide the code you used to produce the gif?
I'm guessing it is a plot of the data points along the 2 first principal components but it would help to see how you produced it.
You could try to further inspect it by looking at the values of kpca.alphas_ (the eigenvectors) for each value of gamma.
Hope this makes sense.
EDIT: As you noted it looks like the points are reflected against the axis, the most plausible explanation is that one of the eigenvector flips sign (note this does not affect the eigenvalue).
I put in a simple gist to reproduce the issue (you'll need a Jupyter notebook to run it). You can see the sign-flipping when you change the value of gamma.
As a complement note that this kind of discrepancy happens only because you fit several times the KernelPCA object several times. Once you settled with a particular gamma value and you've fit kpca once you can call transform several times and get consistent results.
For the classical PCA the docs mention that:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
I don't know about the behavior of a single KernelPCA object that you would fit several times (I did not find anything relevant in the docs).
It does not apply to your case though as you have to fit the object with several gamma values.

So... I can't give you a definitive answer on why KernelPCA is not deterministic. The behavior resembles the differences I've observed between the results of PCA and RandomizedPCA. PCA is deterministic, but RandomizedPCA is not, and sometimes the eigenvectors are flipped in sign relative to the PCA eigenvectors.
That leads me to my vague idea of how you might get more deterministic results....maybe. Use RBFSampler with a fixed seed:
def pca(X, gamma1):
kernvals = RBFSampler(gamma=gamma1, random_state=0).fit_transform(X)
kpca = PCA().fit_transform(X)
X_kpca = kpca.fit_transform(X)
return X_kpca

What is scipy's equivalent to matlab's `mle` function?

I'm trying to fit some data to a mixed model using an expectation maximization approach. In Matlab, the code is as follows
% mixture model's PDF
mixtureModel = ...
#(x,pguess,kappa) pguess/180 + (1-pguess)*exp(kappa*cos(2*x/180*pi))/(180*besseli(0,kappa));
% Set up parameters for the MLE function
options = statset('mlecustom');
options.MaxIter = 20000;
options.MaxFunEvals = 20000;
% fit the model using maximum likelihood estimate
params = mle(data, 'pdf', mixtureModel, 'start', [.1 1/10], ...
'lowerbound', [0 1/50], 'upperbound', [1 50], ...
'options', options);
The data parameter is a 1-D vector of floats.
I'm wondering how the equivalent computation can be achieved in Python. I looked into scipy.optimize.minimize, but this doesn't seem to be a drop-in replacement for Matlab's mle.
I'm a bit lost and overwhelmed, can somebody point me in the right direction (ideally with some example code?)
Thanks very much in advance!
Edit: In the meantime I've found this, but I'm still rather lost as (1) this seems primarily focused on mixed guassian models (which mine is not) and (2) my mathematical skills are severely lacking. That said, I'll happily accept an answer that elucidates how this notebook relates to my specific problem!

This is a mixture model (not mixed model) of uniform and von mises distributions whose parameters you are trying to infer using direct maximum likelihood estimation (not EM, although that may be more appropriate). You can find theses written on this exact problem if you search on the internet. SciPy doesn't have anything that would be as clear a choice as matlab's fmincon which it uses as its default in your code, but you could look for scipy optimization methods that allow bounds on parameters. The scipy interface is different from that of matlab's mle, and you will want to pass the data in the 'args' argument of the scipy minimization functions, whereas the pguess and kappa parameters will need to be represented by a parameter array of length 2.

I believe the scikit-learn toolkit has what you need:
http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GMM.html.
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.

curve fitting in scipy is of poor quality. How can I improve it?

I'm doing a fit of a set results to a predicted function. The function might be interpreted as linear but I might have to change it a little so I am doing curve fitting instead of linear regression. I use the curve_fit function in scipy. Here is how I use it
kappa = 1
alpha=2
popt,pcov = curve_fit(fitFunc1,self.X[0:3],self.Y[0:3],sigma=self.Err[0:3],p0=[kappa,alpha])
and here is fitFunc1
def fitFunc1(X,kappa,alpha):
out = []
for x in X:
y = log(kappa)
y += 4*log(pi)
y += alpha*x
y -= 2*log(2)
out.append(-y)
return np.array(out)
Here is an example of the fit . The green line is a matlab fit. The red one is a scipy fit. I carry the fist over the first three dots.

You are using non-linear fitting routines to fit the data, not linear least-squares as invoked by A\b. The result is that the matlab and/or scipy minimization routines are getting stuck in local minima during the optimizations, leading to different results.
You should get the same results (to within numerical precision) if you apply logs to the raw data prior to linear fitting with A\b (in matlab).
edit
Inspecting function fitFunc1 it looks like the x/y data have already been transformed prior to the fit within scipy.
I performed a linear fit with the data shown, using matlab. The results using linear least squares with the operation polyfit(x,y,1) (essentially a linear fit) is very similar to the scipy result:
In any case, the data looks piecewise linear so a better solution may be to attempt a piecewise linear fit. On the other the log transformation can do all sorts of unwanted stuff, so performing nonlinear fits on the original data without performing a log tranform may be the best solution.

If you don't mind having a little bit of extra work I suggest using PyMinuit or iMinuit, both are minimisation packages based on Seal Minuit.
Then you can minimise a Chi Sq function or maximise the likelihood of your data in relation to your fit function. They also provide all the errors and everything you would like to know about the fit.
Hope this helps! xD

Maximum Likelihood Estimate pseudocode

I need to code a Maximum Likelihood Estimator to estimate the mean and variance of some toy data. I have a vector with 100 samples, created with numpy.random.randn(100). The data should have zero mean and unit variance Gaussian distribution.
I checked Wikipedia and some extra sources, but I am a little bit confused since I don't have a statistics background.
Is there any pseudo code for a maximum likelihood estimator? I get the intuition of MLE but I cannot figure out where to start coding.
Wiki says taking argmax of log-likelihood. What I understand is: I need to calculate log-likelihood by using different parameters and then I'll take the parameters which gave the maximum probability. What I don't get is: where will I find the parameters in the first place? If I randomly try different mean & variance to get a high probability, when should I stop trying?

I just came across this, and I know its old, but I'm hoping that someone else benefits from this. Although the previous comments gave pretty good descriptions of what ML optimization is, no one gave pseudo-code to implement it. Python has a minimizer in Scipy that will do this. Here's pseudo code for a linear regression.
# import the packages
import numpy as np
from scipy.optimize import minimize
import scipy.stats as stats
import time
# Set up your x values
x = np.linspace(0, 100, num=100)
# Set up your observed y values with a known slope (2.4), intercept (5), and sd (4)
yObs = 5 + 2.4*x + np.random.normal(0, 4, 100)
# Define the likelihood function where params is a list of initial parameter estimates
def regressLL(params):
# Resave the initial parameter guesses
b0 = params[0]
b1 = params[1]
sd = params[2]
# Calculate the predicted values from the initial parameter guesses
yPred = b0 + b1*x
# Calculate the negative log-likelihood as the negative sum of the log of a normal
# PDF where the observed values are normally distributed around the mean (yPred)
# with a standard deviation of sd
logLik = -np.sum( stats.norm.logpdf(yObs, loc=yPred, scale=sd) )
# Tell the function to return the NLL (this is what will be minimized)
return(logLik)
# Make a list of initial parameter guesses (b0, b1, sd)
initParams = [1, 1, 1]
# Run the minimizer
results = minimize(regressLL, initParams, method='nelder-mead')
# Print the results. They should be really close to your actual values
print results.x
This works great for me. Granted, this is just the basics. It doesn't profile or give CIs on the parameter estimates, but its a start. You can also use ML techniques to find estimates for, say, ODEs and other models, as I describe here.
I know this question was old, hopefully you've figured it out since then, but hopefully someone else will benefit.

If you do maximum likelihood calculations, the first step you need to take is the following: Assume a distribution that depends on some parameters. Since you generate your data (you even know your parameters), you "tell" your program to assume Gaussian distribution. However, you don't tell your program your parameters (0 and 1), but you leave them unknown a priori and compute them afterwards.
Now, you have your sample vector (let's call it x, its elements are x[0] to x[100]) and you have to process it. To do so, you have to compute the following (f denotes the probability density function of the Gaussian distribution):
f(x[0]) * ... * f(x[100])
As you can see in my given link, f employs two parameters (the greek letters µ and σ). You now have to calculate the values for µ and σ in a way such that f(x[0]) * ... * f(x[100]) takes the maximum possible value.
When you've done that, µ is your maximum likelihood value for the mean, and σ is the maximum likelihood value for standard deviation.
Note that I don't explicitly tell you how to compute the values for µ and σ, since this is a quite mathematical procedure I don't have at hand (and probably I would not understand it); I just tell you the technique to get the values, which can be applied to any other distributions as well.
Since you want to maximize the original term, you can "simply" maximize the logarithm of the original term - this saves you from dealing with all these products, and transforms the original term into a sum with some summands.
If you really want to calculate it, you can do some simplifications that lead to the following term (hope I didn't mess up anything):
Now, you have to find values for µ and σ such that the above beast is maximal. Doing that is a very nontrivial task called nonlinear optimization.
One simplification you could try is the following: Fix one parameter and try to calculate the other. This saves you from dealing with two variables at the same time.

You need a numerical optimisation procedure. Not sure if anything is implemented in Python, but if it is then it'll be in numpy or scipy and friends.
Look for things like 'the Nelder-Mead algorithm', or 'BFGS'. If all else fails, use Rpy and call the R function 'optim()'.
These functions work by searching the function space and trying to work out where the maximum is. Imagine trying to find the top of a hill in fog. You might just try always heading up the steepest way. Or you could send some friends off with radios and GPS units and do a bit of surveying. Either method could lead you to a false summit, so you often need to do this a few times, starting from different points. Otherwise you may think the south summit is the highest when there's a massive north summit overshadowing it.

As joran said, the maximum likelihood estimates for the normal distribution can be calculated analytically. The answers are found by finding the partial derivatives of the log-likelihood function with respect to the parameters, setting each to zero, and then solving both equations simultaneously.
In the case of the normal distribution you would derive the log-likelihood with respect to the mean (mu) and then deriving with respect to the variance (sigma^2) to get two equations both equal to zero. After solving the equations for mu and sigma^2, you'll get the sample mean and sample variance as your answers.
See the wikipedia page for more details.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.