statsmodels: panel regression - python

I am currently using from pandas.stats.plm import PanelOLS to run panel regressions. I need to switch to statsmodels so that I can output heteroskedasticity-robust results. I have been unable to find documentation on how to call a panel regression in statsmodels, and in general I find the statsmodels documentation not very user friendly. Is someone familiar with the panel regression syntax in statsmodels?

The linearmodels package was created to extend statsmodels with panel models such as PanelOLS (see https://github.com/bashtage/linearmodels). Here is the example from the package documentation:
import numpy as np
from statsmodels.datasets import grunfeld
data = grunfeld.load_pandas().data
data.year = data.year.astype(np.int64)
# MultiIndex, entity - time
data = data.set_index(['firm','year'])
from linearmodels import PanelOLS
# the keyword is entity_effects (plural)
mod = PanelOLS(data.invest, data[['value','capital']], entity_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)
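For readers who prefer to stay within statsmodels itself, a fixed-effects fit with heteroskedasticity-robust standard errors might be sketched as follows. This is an illustrative alternative, not the linearmodels method: it assumes that entering the firms as dummy variables through C(firm) is an acceptable way to model entity effects, and HC1 is just one of the robust covariance types statsmodels offers, picked here as an example.

```python
import statsmodels.formula.api as smf
from statsmodels.datasets import grunfeld

# Same Grunfeld data, kept as flat columns (no MultiIndex needed here)
data = grunfeld.load_pandas().data

# Entity fixed effects as firm dummies; HC1 requests heteroskedasticity-robust SEs
res = smf.ols("invest ~ value + capital + C(firm)", data=data).fit(cov_type="HC1")

# Slope estimates and their robust standard errors
print(res.params[["value", "capital"]])
print(res.bse[["value", "capital"]])
```

With many entities the dummy approach becomes slow, which is one reason a dedicated panel package may be preferable.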

Related

Get random effect intercept value for every level in statsmodels

I'm going through a tutorial on mixed-effects models in Python.
I'm building a model where litter is the random effect. In the tutorial, the output contains the variance across the litter intercepts. However, in Bayesian hierarchical modeling, I'm also able to see the intercepts for every level of the random effect variable.
How would I see that here?
import pandas as pd
import statsmodels.api as sm
import scipy.stats as stats
import statsmodels.formula.api as smf
df = pd.read_csv("http://www-personal.umich.edu/~bwest/rat_pup.dat", sep = "\t")
model = smf.mixedlm("weight ~ litsize + C(treatment) + C(sex, Treatment('Male')) + C(treatment):C(sex, Treatment('Male'))",
                    df,
                    groups="litter").fit()
model.summary()
I would also ideally like to see the estimate of the intercept across all litters. Then, how would I interpret that overall intercept compared to the intercept for each single litter?
If there's a better Python package for what I'm striving for, please suggest.
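One possible way to get at the per-litter intercepts is the random_effects attribute of the fitted MixedLM result, which holds the predicted deviation of each group from the overall intercept. The sketch below uses synthetic data in place of the rat pup file (the litter count, effect sizes, and the reduced formula are all made-up assumptions), so it only illustrates the mechanics.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the rat pup data: 10 litters, 8 pups each (assumed values)
rng = np.random.default_rng(0)
n_litters, n_pups = 10, 8
litter = np.repeat(np.arange(n_litters), n_pups)
litter_shift = rng.normal(0.0, 0.5, n_litters)[litter]  # true per-litter offsets
litsize = rng.integers(4, 16, n_litters)[litter]
weight = 7.0 - 0.1 * litsize + litter_shift + rng.normal(0.0, 0.3, litter.size)
df = pd.DataFrame({"weight": weight, "litsize": litsize, "litter": litter})

result = smf.mixedlm("weight ~ litsize", df, groups="litter").fit()

# Overall (fixed-effect) intercept shared by all litters
overall = result.fe_params["Intercept"]

# random_effects maps each group label to its predicted deviation from that intercept
offsets = pd.Series({g: re.iloc[0] for g, re in result.random_effects.items()})

# Intercept for each litter = overall intercept + that litter's deviation
litter_intercepts = overall + offsets
print(litter_intercepts)
```

Read this way, the overall intercept is the expected value for a typical litter at the reference levels of the covariates, and each litter's own intercept is that value shifted by its predicted random effect.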

Library errors with pmdarima and statsmodels

I have a problem with some libraries for time series.
In particular, the first error arises when I import this library:
from pmdarima.arima import auto_arima
As suggested in another post, I use the command !pip install pmdarima to solve this problem. But then I have to restart the runtime, otherwise the code does not run, and I also have to rerun the command every time I open my Colab/Jupyter notebook.
So my first question is related to this issue: is there any way to avoid repeating this process every time?
The second problem is connected to the first one, because I also import these other libraries:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import datetime as datetime
from pmdarima.arima import auto_arima
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse
from statsmodels.tsa.stattools import adfuller
from pandas.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_model import ARIMA
from pmdarima import auto_arima
from statsmodels.tsa.statespace.sarimax import SARIMAX
Assuming the first problem is solved, I have several lines of code for the time series prediction, including a function that uses the ARIMA model:
def Predict(train,test,Order1,Order2,Order3,parForecastLenght=31):
    # Build Model
    model = ARIMA(train.astype("float32"), order=(Order1, Order2, Order3))
    fitted = model.fit(disp=-1)
    # Forecast
    fc, se, conf = fitted.forecast(parForecastLenght, alpha=0.05)
    # Make as pandas series
    fc_series = pd.Series(fc, index=test.iloc[0:parForecastLenght].index)
    lower_series = pd.Series(conf[:, 0], index=test.iloc[0:parForecastLenght].index)
    upper_series = pd.Series(conf[:, 1], index=test.iloc[0:parForecastLenght].index)
    # Plot
    plt.figure(figsize=(12,5), dpi=100)
    plt.plot(train, label='training')
    plt.plot(test, label='actual')
    plt.plot(fc_series, label='forecast')
    plt.fill_between(lower_series.index, lower_series, upper_series, color='k', alpha=.15)
    plt.title('Forecast vs Actuals')
    plt.legend(loc='upper left', fontsize=8)
    plt.show()
    return fc_series
When I try to execute this code:
model1 = Predict(train_Att_Assunzioni,test_Att_Assunzioni,0,0,0,30)
this error appears:
NotImplementedError:
statsmodels.tsa.arima_model.ARMA and statsmodels.tsa.arima_model.ARIMA have
been removed in favor of statsmodels.tsa.arima.model.ARIMA (note the .
between arima and model) and statsmodels.tsa.SARIMAX.
statsmodels.tsa.arima.model.ARIMA makes use of the statespace framework and
is both well tested and maintained. It also offers alternative specialized
parameter estimators.
So I checked posts on Stack Overflow again and tried the suggested fixes, but nothing seems to work except replacing the import from statsmodels.tsa.arima_model import ARIMA with from statsmodels.tsa.arima.model import ARIMA,
but then the first problem arises again.
N.B. I tried installing statsmodels and pmdarima, and I tried changing my work environment from Colab to JupyterLab, but nothing helped.

No module named 'statsmodels.tsa.arima' in Colab but not in Pycharm

# ARIMA example
from statsmodels.tsa.arima.model import ARIMA
data = [200,30,30,35,30,20,26,35,30,33,40,29,29,30,30,30,30,20,26,35,30,33,40,29,29,30,30,30]
# fit model
model = ARIMA(data, order=(10, 1, 10))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data), typ='levels')
print(yhat)
The import from statsmodels.tsa.arima.model import ARIMA works perfectly in PyCharm, but running the same code in Colab throws No module named 'statsmodels.tsa.arima'.
There is very little support on the internet for this library, so I would appreciate any help or workaround.
Try,
from statsmodels.tsa.arima_model import ARIMA
If you don't have statsmodels installed, then also run:
pip install statsmodels
You import statsmodels like this:
import statsmodels.api as sm
And then you can use ARIMA like this:
model=sm.tsa.arima.ARIMA(data,order=(10, 1, 10))
The arima_model import is deprecated. You can read more about using ARIMA here.
You need a newer version. Try to run the following in your Colab:
!pip install statsmodels==0.12.1
It will allow the import that you want.
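Before pinning a version, it may help to check whether the running environment can see the newer module at all, since Colab and PyCharm can have different statsmodels versions installed. The helper below is a hypothetical convenience function (has_new_arima is not part of statsmodels) built only on the standard library's importlib.

```python
import importlib.util

def has_new_arima():
    """Return True if statsmodels.tsa.arima.model can be found in this environment."""
    try:
        return importlib.util.find_spec("statsmodels.tsa.arima.model") is not None
    except ModuleNotFoundError:
        # statsmodels itself, or its tsa.arima subpackage, is missing
        return False

print("new ARIMA module available:", has_new_arima())
```

If this prints False in Colab, upgrading statsmodels and restarting the runtime is the likely fix.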

Create file OLS in Python Statsmodels

I don't have much knowledge of Python, but I have to solve this to complete an assessment.
Question:
Run the following code to load the required libraries and create the data set to fit the model.
from sklearn.datasets import load_boston
import pandas as pd
boston = load_boston()
dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
dataset['target'] = boston.target
print(dataset.head())
I have to perform the following steps to complete this scenario.
For the boston dataset loaded in the above code snippet, perform linear regression.
Use the target variable as the dependent variable.
Use the RM variable as the independent variable.
Fit a single linear regression model using statsmodels package in python.
Import statsmodels packages appropriately in your code.
Upon fitting the model, Identify the coefficients.
Finally print the model summary in your code.
You can write your code using vim app.py .
Press i for insert mode.
Press esc and then :wq to save and quit the editor.
Please help me understand how to complete this. Your comments are much appreciated.
Thanks in advance.
from sklearn.datasets import load_boston
import pandas as pd
boston = load_boston()
dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
dataset['target'] = boston.target
print(dataset.head())
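import statsmodels.api as sm
X = dataset["RM"]
y = dataset['target']
# add an intercept column to the single regressor
X = sm.add_constant(X)
# array-style OLS comes from statsmodels.api; formula.api provides the lowercase ols for formula strings
model = sm.OLS(y, X).fit()
# coefficients: const (intercept) and RM (slope)
print(model.params)
predictions = model.predict(X)
print(model.summary())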
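The same regression can also be written with the formula interface. Because load_boston was removed from scikit-learn in version 1.2, the sketch below uses a synthetic data frame with made-up RM and target values in place of the Boston data. Only the column names are taken from the exercise, and the numbers carry no meaning.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the RM/target columns (values are made up)
rng = np.random.default_rng(42)
rm = rng.normal(6.3, 0.7, 200)
dataset = pd.DataFrame({"RM": rm, "target": -34.0 + 9.0 * rm + rng.normal(0.0, 1.0, 200)})

# The formula interface adds the intercept automatically
model = smf.ols("target ~ RM", data=dataset).fit()

print(model.params)      # Intercept and RM coefficients
print(model.summary())
```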

How to silence statsmodels.fit() in python

When I want to fit a model in Python,
I often use the fit() method in statsmodels.
In some cases I write a script to automate the fitting:
import statsmodels.formula.api as smf
import pandas as pd
df = pd.read_csv('mydata.csv') # contains column x and y
fitted = smf.poisson('y ~ x', df).fit()
My question is how to silence the fit() method.
In my environment it outputs some information about fitting to standard output like:
Optimization terminated successfully.
Current function value: 2.397867
Iterations 11
but I don't need it.
I couldn't find the argument which controls standard output printing.
How can I silence fit() method?
Python 3.3.4, IPython 2.0.0, pandas 0.13.1, statsmodels 0.5.0.
Use the disp argument to fit. It controls the verbosity of the optimizers in scipy.
mod.fit(disp=0)
See the documentation for fit.
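A small self-contained check of this behaviour, using synthetic count data in place of mydata.csv (the data-generating values are assumptions for illustration), could look like this. It captures standard output during the fit so the effect of disp=0 can be verified rather than just observed.

```python
import contextlib
import io

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic count data in place of mydata.csv (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": rng.poisson(np.exp(0.5 + 0.3 * x))})

# Capture stdout to confirm that disp=0 suppresses the optimizer report
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    fitted = smf.poisson("y ~ x", df).fit(disp=0)

captured = buffer.getvalue()
```

Redirecting stdout in this way is also a fallback for code that prints without offering a verbosity switch.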
