Converting from R to Python, trying to understand a line

I have a fairly simple question. I have been converting some statistical analysis code from R to Python. Up until now, I have been doing just fine, but I have gotten stuck on this particular line:
nlsfit <- nls(N~pnorm(m, mean=mean, sd=sd),data=data4fit,start=list(mean=mu, sd=sig), control=list(maxiter=100,warnOnly = TRUE))
Essentially, the program calculates the non-linear least-squares fit for a set of data via the "nls" command. In the original text, the "tilde" looks like an "eñe" (ñ); I'm not sure if that is significant.
As I understand it, the equivalent of pnorm in Python is norm.cdf from scipy.stats. What I want to know is: what does the "tilde/eñe" do before the pnorm function is invoked? "m" is a predefined variable, while "mean" and "sd" are not.
I also found some code that essentially reproduces nls in Python: nls Python code. However, given the date of the post (2013), I was wondering if there are any more recent equivalents, preferably written in Python 3.
Any advice is appreciated, thanks!

As you can see from ?nls, the first argument to nls is formula:
formula: a nonlinear model formula including variables and parameters. Will be coerced to a formula if necessary.
Now, if you do ?formula, you can read this:
The models fit by, e.g., the lm and glm functions are specified in a
compact symbolic form. The ~ operator is basic in the formation of
such models. An expression of the form y ~ model is interpreted as a
specification that the response y is modelled by a linear predictor
specified symbolically by model
Therefore, in your nls call the ~ joins the response/dependent variable on the left with the regressors/explanatory variables on the right-hand side of your nonlinear least-squares model.
Best!

This minimizes
sum((N - pnorm(m, mean=mean, sd=sd))^2)
using the starting values for mean and sd specified in start. It will perform a maximum of 100 iterations and, because of warnOnly = TRUE, it will return (with a warning) rather than signal an error if it terminates before converging.
The first argument to nls is an R formula which specifies the regression where the left hand side of the tilde (N) is the dependent variable and the right side is the function of the parameters (mean, sd) and data (m) used to predict it.
Note that formula objects do not have a fixed meaning in R; rather, each function can interpret them in any way it likes. For example, formula objects used by nls are interpreted differently than formula objects used by lm. In nls the formula y ~ a + b * x would be used to specify a linear regression, but in lm the same regression would be expressed as y ~ x.
See ?pnorm, ?nls, ?nls.control and ?formula.
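As for the Python 3 side: here is a minimal sketch of the same fit using scipy.optimize.curve_fit, assuming m, N, mu and sig are the arrays and starting guesses from your R session:
from scipy.stats import norm
from scipy.optimize import curve_fit

def model(m, mean, sd):
    # right-hand side of the R formula: pnorm(m, mean=mean, sd=sd)
    return norm.cdf(m, loc=mean, scale=sd)

# curve_fit minimizes sum((N - model(m, mean, sd))**2), just as nls does
popt, pcov = curve_fit(model, m, N, p0=[mu, sig])
mean_fit, sd_fit = popt
Note that curve_fit limits work via maxfev (the number of function evaluations) rather than an iteration count, so there is no exact analogue of maxiter=100.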

Related

Normalising Schrödinger Wavefunctions using np.linalg.norm

I've computed the eigenvalues and eigenstates of a Hamiltonian in Python. I have a matrix containing all the wavefunctions in discrete space psi. I'd like to normalise the total wavefunction (or the 'ket') (i.e the matrix of vectors) such that its modulus squared integrates to 1.
I've tried the following:
A = np.linalg.norm(abs(psi.T)**2)
normed_psi = psi.T / np.sqrt(A)
print(np.linalg.norm(normed_psi))
The matrix is transposed so I can access each state using psi[n].
However, the output of the print statement is:
20.44795885105457
When it should be 1. I feel like I'm not using linalg.norm correctly. I've also tried my own integration function using the trapezium rule, with no success.
I'm not really sure as to what to do at this point. Any help would be great.
It seems you're confusing np.linalg.norm and np.sum. Up to the usual floating-point issues, these two snippets should be identical:
normed_psi = psi.T / np.sqrt(np.sum(psi.T**2))
normed_psi = psi.T / np.linalg.norm(psi.T)
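As a quick sanity check with a small, made-up psi, the normalized matrix has (Frobenius) norm 1:
import numpy as np

psi = np.array([[1.0, 2.0], [3.0, 4.0]])    # hypothetical 2x2 state matrix
normed_psi = psi.T / np.linalg.norm(psi.T)  # matrix norm defaults to Frobenius
print(np.linalg.norm(normed_psi))           # ~1.0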

How to give an arbitrary initial condition to odeint in Python

I'm trying to solve a first-order linear differential equation in one variable, and am currently using odeint from scipy.integrate. However, the initial condition $y_0$ it takes must be given at the initial boundary of the domain $x_0$, while what I have is the value of $y$ at some arbitrary point $x$.
Suggestions on similar questions were to use solve_bvp, which doesn't quite solve my problem either.
How do I go about this?
Numerical integrators always march in only one direction from the initial point. To get a two-sided solution, one has to call the numerical integrator twice, forward and backward, for instance as
import numpy as np
from scipy.integrate import odeint
# backward leg: from the known point x0 down to the left end a
ta = np.linspace(x0, a, Na+1)
ya = odeint(f, y0, ta)
# forward leg: from x0 up to the right end b
tb = np.linspace(x0, b, Nb+1)
yb = odeint(f, y0, tb)
You can leave these two parts separate for further uses like plotting, or join them into one array each
t=np.concatenate([ta[::-1],tb[1:]])
y=np.concatenate([ya[::-1],yb[1:]])
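A self-contained illustration, with the ODE y' = -y, the known value y(1) = 2, and the domain [0, 3] all made up for the example:
import numpy as np
from scipy.integrate import odeint

f = lambda y, t: -y            # right-hand side of y' = f(y, t)
x0, y0 = 1.0, 2.0              # the point where y is known, and its value there
a, b, Na, Nb = 0.0, 3.0, 50, 100
ta = np.linspace(x0, a, Na+1)  # backward leg
ya = odeint(f, y0, ta)
tb = np.linspace(x0, b, Nb+1)  # forward leg
yb = odeint(f, y0, tb)
t = np.concatenate([ta[::-1], tb[1:]])
y = np.concatenate([ya[::-1], yb[1:]])[:, 0]
# maximum error against the exact solution is small (near the solver tolerance)
print(np.max(np.abs(y - y0*np.exp(-(t - x0)))))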

Constraint on parameters in lmfit

I am trying to fit 3 peaks using lmfit with a skewed-Voigt profile (this is not that important for my question). I want to set a constraint on the peak centers of the form:
peak1 = SkewedVoigtModel(prefix='sv1_')
pars = peak1.make_params()  # create the Parameters object from the first peak
pars['sv1_center'].set(x)
peak2 = SkewedVoigtModel(prefix='sv2_')
pars.update(peak2.make_params())
pars['sv2_center'].set(1000+x)
peak3 = SkewedVoigtModel(prefix='sv3_')
pars.update(peak3.make_params())
pars['sv3_center'].set(2000+x)
Basically I want them to be 1000 apart from each other, but I need to fit for the actual shift, x. I know that I can force some parameters to be equal using pars['sv2_center'].set(expr='sv1_center'), but what I would need is pars['sv2_center'].set(expr='sv1_center'+1000) (which doesn't work just like that). How can I achieve what I need? Thank you!
Just do:
pars['sv2_center'].set(expr='sv1_center+1000')
pars['sv3_center'].set(expr='sv1_center+2000')
The constraint expression is a Python expression that will be evaluated every time the constrained parameter needs to get its value.
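Putting it together, a minimal sketch; the composite model, the data arrays xdata/ydata, and the starting guess x_guess are assumptions for illustration:
from lmfit.models import SkewedVoigtModel

peak1 = SkewedVoigtModel(prefix='sv1_')
peak2 = SkewedVoigtModel(prefix='sv2_')
peak3 = SkewedVoigtModel(prefix='sv3_')
model = peak1 + peak2 + peak3                    # composite three-peak model

pars = model.make_params()
pars['sv1_center'].set(value=x_guess)            # the shift x, free to vary
pars['sv2_center'].set(expr='sv1_center+1000')   # tied: always 1000 above peak 1
pars['sv3_center'].set(expr='sv1_center+2000')   # tied: always 2000 above peak 1

result = model.fit(ydata, pars, x=xdata)
print(result.fit_report())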

Building a filter with Python & MATLAB, results are not the same

I want to translate this MATLAB code into Python. I think I did everything right, but I don't get the same results.
MATLAB script:
n = 2              % Filter order
Wn = [0.4 0.6]     % Normalized cutoff frequencies
[b,a] = butter(n, Wn, 'bandpass')   % Transfer function coefficients of the filter
Python script:
import numpy as np
from scipy import signal
n = 2                      # Filter order
Wn = np.array([0.4, 0.6])  # Normalized cutoff frequencies
b, a = signal.butter(n, Wn, btype='band')  # Transfer function coefficients of the filter
a coefficients in MATLAB: 1, -5.55e-16, 1.14, -1.66e-16, 0.41
a coefficients in Python: 1, -2.77e-16, 1.14, -1.94e-16, 0.41
Could it just be a question of precision, since the two different values (the 2nd and 4th) are both on the order of 10^(-16)?!
The b coefficients are the same on the other hand.
Your machine precision is about 1e-16 (in MATLAB this can be checked with eps(); in Python see np.finfo(float).eps). The 'error' you are dealing with is thus on the order of machine precision, i.e. those coefficients are numerically indistinguishable from zero.
Also of note is that MATLAB ~= Python (or != in Python): the implementations of butter() on one hand and signal.butter() on the other will differ slightly, even for the exact same inputs, because the order of floating-point operations differs between the two code bases.
Coefficients 16 orders of magnitude smaller than the leading ones rarely matter; they are essentially negligible. In case you do need exact values, consider using either symbolic math or some kind of variable-precision arithmetic (vpa() in MATLAB), but I guess that in your case the difference is irrelevant.
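A quick sketch of that check in Python; it confirms the suspect entries sit at the level of machine epsilon:
import numpy as np
from scipy import signal

b, a = signal.butter(2, [0.4, 0.6], btype='band')
print(np.finfo(float).eps)   # 2.220446049250313e-16, double-precision epsilon
# the 2nd and 4th entries of a are pure round-off noise around zero:
print(np.abs(a[[1, 3]]) < 10 * np.finfo(float).eps)   # [ True  True]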

Testing C++ Math functions with Python's C Extension - Precision issues

I wrote a C++ wrapper class for some functions in LAPACK. In order to test the class, I use the Python C API (embedding Python), through which I call numpy, do the same operations, and compare the results by taking the difference.
For example, for the inverse of a matrix, I generate a random matrix in C++, then pass it as a string (with many, many digits, like 30 digits) to Python's terminal using PyRun_SimpleString, and assign the matrix as numpy.matrix(...,dtype=numpy.double) (or numpy.complex128). Then I use numpy.linalg.inv() to calculate the inverse of the same matrix. Finally, I take the difference between numpy's result and my result, and use numpy.isclose with a specific relative tolerance to see whether the results are close enough.
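A condensed numpy-side sketch of that comparison; the matrices here are placeholders for what the C++ side actually sends:
import numpy as np

a = np.array([[4.0, 7.0], [2.0, 6.0]])        # stand-in for the random C++ matrix
b_cpp = np.array([[0.6, -0.7], [-0.2, 0.4]])  # stand-in for the wrapper's inverse
b_np = np.linalg.inv(a)                       # numpy's reference inverse
print(np.isclose(b_np, b_cpp, rtol=1e-10).all())  # elementwise closeness check -> True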
The problem: when I use C++ floats, the relative tolerance I need for the comparison to pass is about 1e-2!!! And even with this relative tolerance I get some statistical failures (with low probability).
Doubles are fine... I can do 1e-10 and it's statistically safe.
While I know that floats have an intrinsic bit precision of about 1e-6, I'm wondering why I have to go as low as 1e-2 to be able to compare the results, and it still sometimes fails!
So, going so low down to 1e-2 got me wondering whether I'm thinking about this whole thing the wrong way. Is there something wrong with my approach?
Please ask for more details if you need it.
Update 1: Eric requested an example of the Python calls. Here is one:
//create my matrices
Matrix<T> mat_d = RandomMatrix<T>(...);
auto mat_d_i = mat_d.getInverse();
//I store everything in the dict 'data'
PyRun_SimpleString(std::string("data={}").c_str());
//original matrix
//mat_d.asString(...) returns the matrix in the format [[1,2],[3,4]]; the 32 means 32 digits per number
PyRun_SimpleString(std::string("data['a']=np.matrix(" + mat_d.asString(32,'[',']',',') + ",dtype=np.complex128)").c_str());
//pass the inverted matrix to Python
PyRun_SimpleString(std::string("data['b_c']=np.matrix(" + mat_d_i.asString(32,'[',']',',') + ",dtype=np.complex128)").c_str());
//inverse in numpy
PyRun_SimpleString(std::string("data['b_p']=np.linalg.inv(data['a'])").c_str());
//flatten the matrices to make comparing them easier (make them 1-dimensional)
PyRun_SimpleString("data['fb_p']=((data['b_p']).flatten().tolist())[0]");
PyRun_SimpleString("data['fb_c']=((data['b_c']).flatten().tolist())[0]");
//make the comparison. The function compare_floats(f1,f2,t) calls numpy.isclose(f1,f2,rtol=t)
//prec is an integer that takes its value from a template function, where I choose the precision I want based on type
PyRun_SimpleString(std::string("res=list(set([compare_floats(data['fb_p'][i],data['fb_c'][i],1e-"+ std::to_string(prec) +") for i in range(len(data['fb_p']))]))[0]").c_str());
//the set above eliminates repeated True and False. If all results are True, we expect that res=[True], otherwise, the test failed somewhere
PyRun_SimpleString(std::string("res = ((len(res) == 1) and res[0])").c_str());
//Now if res is True, then success
Comments in the code describe the procedure step-by-step.
