MNLogit fit and summary displays all nan - python

I am new to ML.
I am trying to fit a logistic regression with statsmodels. However, when I run the fit, the reported current function value is nan.
I checked whether the dataframe contains only finite values, since I read that non-finite values can cause this, but that check passes.
I went through the links below, but their suggestions did not work in my case.
Update: I still have not figured it out.
Referred links:
MNLogit in statsmodel returning nan
numpy: Invalid value encountered in true_divide
Can someone please help me with this?
[Screenshot: Finite values result]
[Screenshot: Error with current function value as nan]
[Screenshot: All NaNs in summary]

I reviewed your problem and the solution identified in referred link #1, which reports the same error as the one in your screen capture.
It seems you need to use a different solver method.
In your code, you can try the following, which is the same fix as in link #1:
result = logit_model.fit(method='bfgs')
I'm sorry if you have already tried this. Unfortunately, I cannot test it without the dataset; let me know if it works.

Related

Using torchmetrics with nan values

I'm working on a DL project using PyTorch Lightning and torchmetrics.
One of my metrics is simply not applicable to some examples, so there are batches for which it evaluates to NaN.
The problem is that the epoch-level aggregate of the metric then also comes out as NaN.
Is there a possible workaround? torchmetrics is very convenient, and I would like to avoid switching away from it.
I have seen the torchmetrics.MeanMetric object (and similar ones), but I couldn't make it work. If the solution involves this kind of object, I would very much appreciate an example.
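The underlying idea, skipping NaN batches when accumulating the epoch value, can be sketched in plain Python. NanAwareMean below is a hypothetical helper written for illustration, not a torchmetrics class. As far as I know, torchmetrics.MeanMetric also documents a nan_strategy argument for this purpose, but check the documentation for your installed version before relying on it.

```python
import math


class NanAwareMean:
    """Running mean that ignores NaN updates (hypothetical helper, not part of torchmetrics)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Batches where the metric is undefined (NaN) are simply skipped.
        if not math.isnan(value):
            self.total += value
            self.count += 1

    def compute(self):
        # If every batch was NaN there is nothing to average, so report NaN.
        return self.total / self.count if self.count else float("nan")


epoch_metric = NanAwareMean()
for batch_value in [0.5, float("nan"), 1.5]:
    epoch_metric.update(batch_value)
print(epoch_metric.compute())
```

Note that this averages per-batch values with equal weight; if batches differ in size, a weighted variant may be closer to what you want.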

Cox PH on Lifelines shows convergence problem

I'm running a Cox PH model using the lifelines package in Python.
I find it strange that the model fits without problems on the whole dataset, but when I run cross-validation (using the package's own validation function) a convergence error appears.
Any idea how I can solve this? The documentation suggests using a penalizer, but I haven't found a value that lets the cross-validation run.
Here's my code:
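from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation
# Works: fit on the full dataset
cph = CoxPHFitter()
cph.fit(daten, "length_of_arrears2", event_col='cured2')
# Fails: 5-fold cross-validation raises ConvergenceError
cph = CoxPHFitter(penalizer=10)
scores = k_fold_cross_validation(cph, daten, 'length_of_arrears2', event_col='cured2', k=5)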
This is the error it outputs:
ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-modelMatrix is singular.
I checked the correlation table, and some variables are quite correlated, but it still seems odd to me that the fit works on the full dataset and not in cross-validation.
Is there a good way to deal with high correlation without removing a variable completely?
Edit:
I ran a few more tests. First, I removed all variables with a correlation above 0.74; that did not fix the k-fold approach.
Then I split the data manually. A 90/10 split worked, and so did every split down to 70/30, but 60/40 already failed.
Any ideas?
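One hypothesis worth ruling out (an assumption on my part, not something the error message confirms) is that a column which varies in the full data becomes constant inside some training folds, which would make the matrix singular only during cross-validation. A minimal plain-Python sketch of that check follows; constant_columns_per_fold and the toy rows are hypothetical, and the split is not the one lifelines uses internally.

```python
import random


def constant_columns_per_fold(rows, columns, k=5, seed=0):
    """Map fold index -> columns that take a single value in that fold's training part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal held-out groups
    report = {}
    for fold_index, held_out in enumerate(folds):
        held = set(held_out)
        train = [rows[j] for j in indices if j not in held]
        constant = [c for c in columns if len({row[c] for row in train}) <= 1]
        if constant:
            report[fold_index] = constant
    return report


# Toy data: "rare" is 1 for a single row, so it is constant whenever that row is held out.
rows = [{"age": i, "rare": 1 if i == 0 else 0} for i in range(10)]
print(constant_columns_per_fold(rows, ["age", "rare"], k=5))
```

With a real dataframe the same idea applies to near-constant columns too, for example dummy variables with only a handful of positive rows.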

Python - How to Select Certain Lags in ARIMA Model?

I want to fit model = ARIMA(ret_log, order=(5,0,0)), but with the second and third lags of the AR part set to zero because their autocorrelation is not significant. How can I do this in Python? I know it is easy to do in R.
I've seen similar questions asked for R, but only one such question asked for Python (link here). However, the answer there does not seem to work, and I don't think the person who asked the question was satisfied either.
I tried tsa.arima.model.ARIMA.fix_params and tsa.arima.model.ARIMA.fit_constrained, but both raised an AttributeError, such as 'ARMA' object has no attribute 'fit_constrained'.
Does anyone have any idea? Thanks.
As mentioned in the statsmodels.tsa.arima.model.ARIMA documentation, "p and q may either be integers or lists of integers", so you can pass the lags to include as a list.
Please note that I have not tried this and cannot guarantee that it will work.
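For the case in the question, and assuming the list form behaves as the documentation describes, the call would presumably look like ARIMA(ret_log, order=([1, 4, 5], 0, 0)); this is untested here. Written out, with \mu as the mean and \varepsilon_t as the error term (notation chosen here for illustration), that corresponds to an AR(5) with the second and third coefficients fixed at zero:

```latex
y_t - \mu = \phi_1 (y_{t-1} - \mu) + \phi_4 (y_{t-4} - \mu) + \phi_5 (y_{t-5} - \mu) + \varepsilon_t,
\qquad \phi_2 = \phi_3 = 0
```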

Can the NGBoost algorithm handle missing values automatically?

I came across a new GBDT algorithm named NGBoost, developed by stanfordmlgroup. I wanted to use it and ran
pip install ngboost==0.2.0
to install it.
Then I trained on a dataset without imputing or deleting the missing values.
However, I got an error:
Input contains NaN, infinity or a value too large for dtype('float32').
Does this mean NGBoost cannot handle missing values automatically like xgboost does?
There are two possible causes for this error.
1- You have some extremely large values. Check the maximum of each column.
2- The algorithm doesn't support NaN and inf values, so you have to handle them yourself, as with some other regression models.
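Both possibilities can be checked before training. The sketch below is a plain-Python illustration under the assumption that the data is available as a list of dicts of floats; summarize_bad_values and the toy rows are hypothetical names, not part of ngboost.

```python
import math

FLOAT32_MAX = 3.4028234663852886e38  # largest finite float32 value


def summarize_bad_values(rows, columns):
    """Count NaN, infinite, and float32-overflowing entries per column."""
    report = {}
    for col in columns:
        values = [row[col] for row in rows]
        counts = {
            "nan": sum(1 for v in values if math.isnan(v)),
            "inf": sum(1 for v in values if math.isinf(v)),
            "too_large": sum(
                1 for v in values if math.isfinite(v) and abs(v) > FLOAT32_MAX
            ),
        }
        if any(counts.values()):  # only report columns that have a problem
            report[col] = counts
    return report


rows = [
    {"a": 1.0, "b": float("nan")},
    {"a": 1e39, "b": 2.0},
    {"a": float("inf"), "b": 3.0},
]
print(summarize_bad_values(rows, ["a", "b"]))
```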
Here's a response from one of the ngboost creators on this topic:
Hey #omsuchak, thanks for the suggestion. There is no one "natural" or good way to generically handle missing data. If ngboost were to do this for you, we would be making a number of choices behind the scenes that would be obscured from the user.
If we limited ourselves to use cases where the base learner is a regression tree (like we do with the feature importances) there are some reasonable default choices for what to do with missing data. Implementing those strategies here is probably not crazy hard to do but it's also not a trivial task. Either way, I'd want the user to have a transparent choice about what is going on. I'd be open to review pull requests on that front as they satisfy that requirement, but it's not something I plan on working on myself in the foreseeable future. I'll close for now but if anyone wants to try to add this please feel free to comment.
You can then look at the other answers for ways to solve this, for example with sklearn.impute.MissingIndicator (to tell the model where values are missing) or one of the imputer classes.
If you need a practical example, you can try the survival example located in the repo.
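To make the "impute plus indicator" idea concrete, here is a minimal plain-Python sketch for a single column. It stands in for what sklearn.impute.SimpleImputer and MissingIndicator do on whole arrays; impute_with_indicator is a hypothetical helper, and mean imputation is just one possible choice.

```python
import math


def impute_with_indicator(column):
    """Mean-impute NaNs in one column and return (filled_values, missing_flags)."""
    observed = [v for v in column if not math.isnan(v)]
    # Fall back to 0.0 when the whole column is missing (an arbitrary choice).
    fill = sum(observed) / len(observed) if observed else 0.0
    filled = [fill if math.isnan(v) else v for v in column]
    flags = [1 if math.isnan(v) else 0 for v in column]
    return filled, flags


filled, flags = impute_with_indicator([1.0, float("nan"), 3.0])
print(filled, flags)
```

The filled column and the flag column would then both be passed to the model as features, so the information that a value was missing is not lost.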

Incorrect value of objective function in simple example solved with pyomo

I have recently started to use pyomo for my research, and I'm studying it with the book "Pyomo-Optimization modelling in Python".
As my research deals with heat exchanger networks, I am currently trying to build and solve a very simple problem before moving on to more complex and meaningful ones.
Here is the model I pass to pyomo.
from coopr.pyomo import *
model = AbstractModel()
Tcin1 = 300
Thin1 = 500
mc = 135
mh = 128
Cpc = 3.1
Cph = 2.2
model.Thout1 = Var(initialize=480, within=PositiveReals)
model.Tcout1 = Var(initialize=310, within=PositiveReals)
model.Q = Var(initialize=2000, within=PositiveReals)
import math
def HeatEx(model):
    return ((Thin1-model.Tcout1)-(model.Thout1-Tcin1))/(math.log(Thin1-model.Tcout1)-math.log(model.Thout1-Tcin1))
model.obj = Objective(rule=HeatEx, sense=minimize)
model.con1 = Constraint(expr=(mc*Cpc*(Thin1-model.Thout1) ==
                              mh*Cph*(model.Tcout1 - Tcin1)))
model.con2 = Constraint(expr=(model.Q == mc*Cpc*(Thin1-model.Thout1)))
model.con3 = Constraint(expr=(model.Tcout1 == 310))
I've been running it from the terminal with the ipopt solver, as pyomo --solver=ipopt --summary NoFouling.py.
My problem is that I get an incorrect value for the objective. It says the objective is -60.5025857388 (with variable Thout1 = 493.271206691), which is incorrect. To find out what the problem is, I replaced model.Thout1 in the objective function with the value 493.271206691, re-ran the model, and obtained the correct objective value, 191.630949982. This is very strange, because all the variable values coming out of pyomo are correct even though the objective value is wrong. In brief, if I take the values that seemingly give a wrong result and evaluate the function by hand, I get the correct result.
What is the cause of this difference? How can I resolve this problem?
For the record, I'm running Python 2.7 via Enthought Canopy on a computer running CentOS 6.5. I should also confess that I'm fairly new to both Python and Linux. I have searched the internet for pyomo answers, but this problem seems too specific and I have found nothing really useful.
Many thanks
In Python 2.7, '/' performs integer division by default when both operands are integers.
I'm assuming you want floating-point division in your objective function; if so, add the following line at the beginning of your script:
from __future__ import division
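A quick arithmetic check with the numbers from the question points to a second thing to look at. The reported objective is reproduced when the two math.log terms are evaluated at the initial value Thout1 = 480 while the numerator uses the solved value. That is consistent with math.log having converted the Pyomo expression to a plain float once, at model construction, but I have not run the model, so treat this as a hypothesis. The helper lmtd below is a hypothetical name used only for this check.

```python
import math

Thin1, Tcin1 = 500.0, 300.0
Tcout1 = 310.0               # fixed by con3
Thout1_init = 480.0          # the initialize= value of model.Thout1
Thout1_sol = 493.271206691   # value reported by ipopt in the question


def lmtd(thout_in_numerator, thout_in_log):
    """Objective from the question, with separate Thout1 values for numerator and log terms."""
    numerator = (Thin1 - Tcout1) - (thout_in_numerator - Tcin1)
    denominator = math.log(Thin1 - Tcout1) - math.log(thout_in_log - Tcin1)
    return numerator / denominator


frozen_logs = lmtd(Thout1_sol, Thout1_init)   # approx. -60.50, the "incorrect" objective
consistent = lmtd(Thout1_sol, Thout1_sol)     # approx. 191.63, the hand-computed value
print(frozen_logs, consistent)
```

If that is what happens, using Pyomo's own expression-aware log in HeatEx instead of math.log (in current releases it can be imported from pyomo.environ) would be the change to try.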
