Do we have an implementation of Bayesian structural time series in Python?

We are looking for a close Pythonic implementation of the R library bsts.
To be precise, I'm looking for something that allows me to emulate the functionality of add_regressor from fbprophet.
I have already tried PyBSTS (the kernel kept dying), and according to a thread on the tensorflow_probability GitHub account, it doesn't support multivariate mode yet.
Any help would be appreciated.
Thanks

This blog post from TensorFlow Probability shows how to add an exogenous regressor with the TFP structural time series tools. In particular, check out the use of the temperature_effect variable in the "Example: Forecasting Demand for Electricity" section!
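A minimal sketch of that pattern, with synthetic placeholder data standing in for your own series (the blog post covers the actual fitting with variational inference):
import numpy as np
import tensorflow_probability as tfp

# Placeholder data: 100 observations and one exogenous regressor
observed_series = np.random.randn(100).astype(np.float32)
temperature = np.random.randn(100, 1).astype(np.float32)

# Local linear trend plus a regression component on the exogenous series
trend = tfp.sts.LocalLinearTrend(observed_time_series=observed_series)
temperature_effect = tfp.sts.LinearRegression(
    design_matrix=temperature, name='temperature_effect')
model = tfp.sts.Sum([trend, temperature_effect],
                    observed_time_series=observed_series)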

I recently wrote a version of R's bsts package in Python. It doesn't have all of bsts's features, but it does have options for level, trend, seasonality, and regression. The syntax closely follows statsmodels' UnobservedComponents module. You can find the code and description of the package here: https://github.com/devindg/pybuc.
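For reference, the analogous statsmodels UnobservedComponents call with an exogenous regressor looks roughly like this (synthetic data; pybuc's own syntax may differ in details):
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 120
exog = rng.standard_normal((n, 1))   # exogenous regressor
endog = np.cumsum(rng.standard_normal(n)) + 2.0 * exog[:, 0]

# Local linear trend plus a static regression coefficient on exog
model = sm.tsa.UnobservedComponents(endog, level='local linear trend', exog=exog)
res = model.fit(disp=False)
print(res.summary())

# Forecasting requires future values of the regressor
future_exog = rng.standard_normal((12, 1))
print(res.forecast(steps=12, exog=future_exog))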

Related

regime switching multivariate garch

I have a regression with 4 independent variables and a dependent variable. I want to implement a regime-switching GARCH model but have been unable to find a package in R, Python, or MATLAB. The MSGARCH package available in R is for univariate series only; apart from this I haven't come across any available packages.
Is there any available package?
In MATLAB, there is MS_Regress-Matlab, which can be connected to Sheppard's MFE toolbox for more functionality.
Let me know if this helps, or if you have found any other way.
There is MATLAB code developed recently to handle the multivariate MS-GARCH model; check this link.

mixed integer quadratic programming in python

I was wondering if someone could give me some guidance in setting up my objective.
I am trying to minimize variance in Python with some cardinality constraints on the number of assets in my portfolio. I am not sure which package would help me do this, and whether there is a working example for the above.
Below is a MIQP model that illustrates how we can model a portfolio problem with the number of assets limited to be between minAssets and maxAssets. Furthermore, if an asset is in the portfolio, its fraction is limited to be between fmin and fmax.
In this link you can also see how you can try to solve this problem with just a series of linear MIP problems.
MIQP solvers are readily available: CVXPY/ECOS_BB, Cplex, and Gurobi are a few examples. These are all callable from Python. A simple portfolio QP model would be a good starting point (no doubt such a model is available in the examples for any of these solvers).
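As a concrete illustration, here is a minimal CVXPY sketch of the cardinality-constrained model described above; Sigma, fmin, fmax, minAssets, and maxAssets are synthetic placeholders, and solving requires an installed mixed-integer-capable solver:
import cvxpy as cp
import numpy as np

n = 10
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n                 # placeholder covariance matrix
fmin, fmax = 0.05, 0.40             # per-asset weight bounds if held
minAssets, maxAssets = 3, 6         # cardinality bounds

x = cp.Variable(n)                  # portfolio weights
y = cp.Variable(n, boolean=True)    # 1 if asset is in the portfolio

constraints = [
    cp.sum(x) == 1,                 # fully invested
    x >= fmin * y,                  # if held, weight >= fmin
    x <= fmax * y,                  # if not held (y = 0), weight forced to 0
    cp.sum(y) >= minAssets,
    cp.sum(y) <= maxAssets,
]
prob = cp.Problem(cp.Minimize(cp.quad_form(x, Sigma)), constraints)
prob.solve()                        # needs a MIQP solver, e.g. Gurobi or CPLEX
print(x.value.round(3))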
You may also have a look at these links about the Python package CVXOPT:
https://cvxopt.org/examples/book/portfolio.html
https://scaron.info/blog/quadratic-programming-in-python.html

Complete separation of logistic regression data

I've been running some large logistic regression models in SAS, which take 4+ hours to converge. Recently however I acquired access to a Hadoop cluster and can use Python to fit the same models much faster (something more like 10-15 minutes).
Problematically, I have some complete/quasi-complete separation of data points in my data, which results in failure to converge; I was using the FIRTH option in SAS to produce robust parameter estimates despite that, but there seems to be no equivalent option for Python, either in sklearn or statsmodels (I'm mostly using the latter).
Is there another way to get around this problem in Python?
AFAIK, there is no Firth penalization available in Python. Statsmodels has an open issue but nobody is working on it at the moment.
As an alternative, it would be possible to use a different kind of penalization, e.g. the ridge (L2) penalty available in sklearn or possibly statsmodels; see the sketch below.
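For example, a minimal sketch of the L2 workaround in sklearn, on a toy dataset that is completely separated (this is not Firth's method, just ordinary ridge-penalized logistic regression):
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data with complete separation: y flips exactly at x = 2.5
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The L2 penalty keeps the coefficient finite even under separation;
# C is the inverse penalty strength (smaller C = stronger shrinkage)
model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs')
model.fit(X, y)
print(model.coef_, model.intercept_)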
The other option is to change the observed response variable: Firth's method can be implemented by augmenting the dataset. However, I don't know of any recipe or prototype for this in Python.
https://github.com/statsmodels/statsmodels/issues/3561
Statsmodels has ongoing work on penalization, but currently the emphasis is on feature/variable selection (elastic net, SCAD) and quadratic penalization for generalized additive models (GAM), especially for splines.
Firth uses data-dependent penalization, which does not fit the generic penalization framework where the penalization structure is a data-independent "prior".
Conditional likelihood is another way to work around perfect separation. This is in a Statsmodels PR that is basically ready to use:
https://github.com/statsmodels/statsmodels/pull/5304

PCA with missing values in Python

I'm trying to do a PCA analysis on a masked array. From what I can tell, matplotlib.mlab.PCA doesn't work if the original 2D matrix has missing values. Does anyone have recommendations for doing a PCA with missing values in Python?
Thanks.
Imputing data will skew the result in ways that might bias the PCA estimates. A better approach is to use a probabilistic PCA (PPCA) algorithm, which gives the same result as PCA but, in some implementations, can deal with missing data more robustly.
I have found two libraries:
the package PPCA on PyPI, which is called PCA-magic on GitHub, and
the package PyPPCA, which has the same name on PyPI and GitHub.
Since both packages are in low maintenance, you might want to implement PPCA yourself instead. Both build on the theory presented in the well-cited (and well-written!) 1999 paper by Tipping and Bishop, which is available on Tipping's home page if you want guidance on how to implement PPCA properly.
As an aside, the sklearn implementation of PCA is actually based on the PPCA formulation of Tipping and Bishop (1999), but they have not chosen to implement it in a way that handles missing values.
EDIT: both of the libraries above had issues, so I could not use them directly myself. I forked PyPPCA and fixed the bugs; it is available on GitHub.
I think you will probably need to do some preprocessing of the data before doing PCA.
You can use:
sklearn.impute.SimpleImputer
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
With this class you can automatically replace the missing values with the column mean, median, or most frequent value. Which of these options is best is hard to say; it depends on many factors, such as what the data look like.
By the way, you can also run PCA using the same library with:
sklearn.decomposition.PCA
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
along with many other statistical functions and machine learning techniques.
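For instance, a minimal sketch (synthetic data) chaining SimpleImputer and PCA in a pipeline:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [4.0, 5.0, 6.0]])

# Replace NaNs with column means, then project onto two components
pipeline = make_pipeline(SimpleImputer(strategy='mean'), PCA(n_components=2))
scores = pipeline.fit_transform(X)
print(scores)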

Johansen cointegration test in python

I can't find any reference to functionality for performing the Johansen cointegration test in any Python module dealing with statistics and time series analysis (pandas and statsmodels). Does anybody know if there's some code around that can perform such a test for cointegration among time series?
This is now implemented in Python's statsmodels:
from statsmodels.tsa.vector_ar.vecm import coint_johansen
x = getx() # dataframe of n series for cointegration analysis
jres = coint_johansen(x, det_order=0, k_ar_diff=1)
For a full description of inputs/results, see the documentation.
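For instance, a short sketch with synthetic random-walk data; on the result object, lr1 holds the trace statistics and cvt their 90/95/99% critical values:
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((200, 3)).cumsum(axis=0),
                 columns=['a', 'b', 'c'])   # three synthetic random walks

jres = coint_johansen(x, det_order=0, k_ar_diff=1)
print(jres.lr1)   # trace statistics, one per null hypothesis r <= 0, 1, 2
print(jres.cvt)   # corresponding 90%, 95%, 99% critical values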
statsmodels doesn't have a Johansen cointegration test, and I have never seen one in any other Python package either.
statsmodels has VAR and structural VAR, but no VECM (vector error correction models) yet.
update:
As Wes mentioned, there is now a pull request for Johansen's cointegration test for statsmodels. I have translated the MATLAB version in LeSage's spatial econometrics toolbox and written a set of tests to verify that we get the same results.
It should be available in the next release of statsmodels.
update 2:
The test for cointegration, coint_johansen, was included in statsmodels 0.9.0 together with vector error correction models (VECM).
(see also 3rd answer)
See http://github.com/statsmodels/statsmodels/pull/453
Check this: https://searchcode.com/codesearch/view/88477497/
It provides a library where you can find the Johansen cointegration test.
