Robust Multivariate Polynomial Regression in Python - python

Is there an easy way to do a multivariate robust polynomial regression in Python? E.g.
y = a + bx_1 + cx_2 + dx_1x_2 + ex_1^2 + fx_2^2
and possibly higher degree terms, where a,b,c,d,e,f are constants and the x_i are the dependent variables (there could be more than 2).
I have a set of data plagued by outliers, so the normality assumption does not hold. I have not much knowledge of regressions, but I've found that there are 'robust' methods to deal with this problem. Unfortunately I have not been able to find an easy way to do this in Python, without having to code the entire method. Have I overlooked something? Or should I maybe use another more suited language, such as R? (Since I don't know anything of R, and this is part of a larger problem which I've coded in Python, I would rather do it in Python. But maybe learning R is more efficient than trying to do this kind of stuff in Python.)
Thanks in advance.

R is very well suited for this, and there are libraries that let you talk to R from Python, like RPy2
http://rpy.sourceforge.net/rpy2.html
And here is a tutorial on robust regression with R:
http://www.ats.ucla.edu/stat/r/dae/rreg.htm

Related

Algorithms for implementing analytic calculus

I'm interested in writing a program in Python that can parse a mathematical expression and carry out operations on it in algebraic form. Most of these will be pretty easy, something like 2x+5x+xy. My program would take this and return the simplified 7x+xy. However I'd also like to eventually extend it's functionality to calculus as well, so if it's given something like integrate: 5x it should be able to return (5x^2)/2 + c. I'm aware that for this the sympy library can be used, but I'm interested in implementing it myself as a learning process. Are there any algorithims that exist to carry out algebraic calculus for any arbitrary integral/differential I could implement, or is this a more daunting task I've set for myself here? Any reasources would be appreciated.
It might be worthwhile to ask why you're looking at doing this, and understanding the sorts of things that you want to be able to integrate, because I think you might not know that you've actually stumbled onto a very difficult question.
If you are looking for an exercise to learn about simple low level polynomial integration and python - ie. something of the form x^2 + 2x + 7 - and the idea is to learn about string manipulation and simple math, then by all means I think you should attempt to do this and see how far you can get. Look at the various math related libaries (like Sympy or Numpy) and see what you can get.
If you are looking to produce a powerful tool to be used for Calculus, then I think you should rethink your idea and consider using something like Wolfram Alpha. A project like this could take years and you might not get a good result from it.

Method of moments in scipy?

Following from this question, is there a way to use any method other than MLE (maximum-likelihood estimation) for fitting a continuous distribution in scipy? I think that my data may be resulting in the MLE method diverging, so I want to try using the method of moments instead, but I can't find out how to do it in scipy. Specifically, I'm expecting to find something like
scipy.stats.genextreme.fit(data, method=method_of_moments)
Does anyone know if this is possible, and if so how to do it?
Few things to mention:
1) scipy does not have support for GMM. There is some support for GMM via statsmodels (http://statsmodels.sourceforge.net/stable/gmm.html), you can also access many R routines via Rpy2 (and R is bound to have every flavour of GMM ever invented): http://rpy.sourceforge.net/rpy2/doc-2.1/html/index.html
2) Regarding stability of convergence, if this is the issue, then probably your problem is not with the objective being maximised (eg. likelihood, as opposed to a generalised moment) but with the optimiser. Gradient optimisers can be really fussy (or rather, the problems we give them are not really suited for gradient optimisation, leading to poor convergence).
If statsmodels and Rpy do not give you the routine you need, it is perhaps a good idea to write out your moment computation out verbose, and see how you can maximise it yourself - perhaps a custom-made little tool would work well for you?

Multilateration Algorithm

I'm trying to call upon the famous multilateration algorithm in order to pinpoint a radiation emission source given a set of arrival times for various detectors. I have the necessary data, but I'm still having trouble implementing this calculation; I am relatively new with Python.
I know that, if I were to do this by hand, I would use matrices and carry out elementary row operations in order to find my 3 unknowns (x,y,z), but I'm not sure how to code this. Is there a way to have Python implement ERO, or is there a better way to carry out my computation?
Depending on your needs, you could try:
NumPy if your interested in numerical solutions. As far as I remember, it could solve linear equations. Don't know how it deals with non-linear resolution.
SymPy for symbolic math. It solves symbolically linear equations ... according to their main page.
The two above are "generic" math packages. I doubt you will find (easily) any dedicated (and maintained) library for your specific need. Their was already a question on that topic here: Multilateration of GPS Coordinates

How to convert Math Formula to Python code?

Are there any easy ways to convert mathematical formulas to Python code?
Perhaps translators, web reference, specific book chapters, anything ~
For regular expressions there are programs such as Kodos and sites such as pythonregex.com, so I was hoping there would be something similar for formula notation and Python.
No, this isn't possible in general. There are mathematical functions that aren't computable (for example, see wikipedia/Halting_problem). There are other mathematical functions where it's just not obvious how to code them up (consider a difficult integral or differential equations). There are many books written on finding numerical solutions to these sorts of problems (you can find some links here: wikipedia/Numerical_analysis).
For simple cases, you can transcribe mathematical formulae directly, but any automated means of translation would require a formal language for writing mathematical formulae in what would be a programming language in itself. This would beg the question, since you would be trading writing mathematical formulae in one language with writing them in another.
You could try to make your own with sympy and pyparsing.
so how to convert a math formula into python?
well, i myself dont know pretty much of python (i am just a beginner) but i have got some tips to help you:
first, you should like calculate the formula using x and y (or a and b) and write that down on a paper.
lets say phythagoras theory ( which is kinda basic).
c2 = a2 + b2;
so you have to input those a and b and write this formula to calculate c (which will be c = sqrt a2 + b**2)
now lets say you dont know this formulas (i can hear you). then? well, you should search it up on internet, listen to its explanation and try to transform them to python.
this is like noobs advice, but you know - no easy ways in programming.
(well, just search it on internet, i guess?)

Integrate stiff ODEs with Python

I'm looking for a good library that will integrate stiff ODEs in Python. The issue is, scipy's odeint gives me good solutions sometimes, but the slightest change in the initial conditions causes it to fall down and give up. The same problem is solved quite happily by MATLAB's stiff solvers (ode15s and ode23s), but I can't use it (even from Python, because none of the Python bindings for the MATLAB C API implement callbacks, and I need to pass a function to the ODE solver). I'm trying PyGSL, but it's horrendously complex. Any suggestions would be greatly appreciated.
EDIT: The specific problem I'm having with PyGSL is choosing the right step function. There are several of them, but no direct analogues to ode15s or ode23s (bdf formula and modified Rosenbrock if that makes sense). So what is a good step function to choose for a stiff system? I have to solve this system for a really long time to ensure that it reaches steady-state, and the GSL solvers either choose a miniscule time-step or one that's too large.
If you can solve your problem with Matlab's ode15s, you should be able to solve it with the vode solver of scipy. To simulate ode15s, I use the following settings:
ode15s = scipy.integrate.ode(f)
ode15s.set_integrator('vode', method='bdf', order=15, nsteps=3000)
ode15s.set_initial_value(u0, t0)
and then you can happily solve your problem with ode15s.integrate(t_final). It should work pretty well on a stiff problem.
(See also Link)
Python can call C. The industry standard is LSODE in ODEPACK. It is public-domain. You can download the C version. These solvers are extremely tricky, so it's best to use some well-tested code.
Added: Be sure you really have a stiff system, i.e. if the rates (eigenvalues) differ by more than 2 or 3 orders of magnitude. Also, if the system is stiff, but you are only looking for a steady-state solution, these solvers give you the option of solving some of the equations algebraically. Otherwise, a good Runge-Kutta solver like DVERK will be a good, and much simpler, solution.
Added here because it would not fit in a comment: This is from the DLSODE header doc:
C T :INOUT Value of the independent variable. On return it
C will be the current value of t (normally TOUT).
C
C TOUT :IN Next point where output is desired (.NE. T).
Also, yes Michaelis-Menten kinetics is nonlinear. The Aitken acceleration works with it, though. (If you want a short explanation, first consider the simple case of Y being a scalar. You run the system to get 3 Y(T) points. Fit an exponential curve through them (simple algebra). Then set Y to the asymptote and repeat. Now just generalize to Y being a vector. Assume the 3 points are in a plane - it's OK if they're not.) Besides, unless you have a forcing function (like a constant IV drip), the MM elimination will decay away and the system will approach linearity. Hope that helps.
PyDSTool wraps the Radau solver, which is an excellent implicit stiff integrator. This has more setup than odeint, but a lot less than PyGSL. The greatest benefit is that your RHS function is specified as a string (typically, although you can build a system using symbolic manipulations) and is converted into C, so there are no slow python callbacks and the whole thing will be very fast.
I am currently studying a bit of ODE and its solvers, so your question is very interesting to me...
From what I have heard and read, for stiff problems the right way to go is to choose an implicit method as a step function (correct me if I am wrong, I am still learning the misteries of ODE solvers). I cannot cite you where I read this, because I don't remember, but here is a thread from gsl-help where a similar question was asked.
So, in short, seems like the bsimp method is worth taking a shot, although it requires a jacobian function. If you cannot calculate the Jacobian, I will try with rk2imp, rk4imp, or any of the gear methods.

Categories