ADF test in statsmodels in Python

I am trying to run an Augmented Dickey-Fuller test in statsmodels in Python, but I seem to be missing something.
This is the code that I am trying:
import numpy as np
import statsmodels.tsa.stattools as ts
x = np.array([1,2,3,4,3,4,2,3])
result = ts.adfuller(x)
I get the following error:
Traceback (most recent call last):
File "C:\Users\Akavall\Desktop\Python\Stats_models\stats_models_test.py", line 12, in <module>
result = ts.adfuller(x)
File "C:\Python27\lib\site-packages\statsmodels-0.4.1-py2.7-win32.egg\statsmodels\tsa\stattools.py", line 201, in adfuller
xdall = lagmat(xdiff[:,None], maxlag, trim='both', original='in')
File "C:\Python27\lib\site-packages\statsmodels-0.4.1-py2.7-win32.egg\statsmodels\tsa\tsatools.py", line 305, in lagmat
raise ValueError("maxlag should be < nobs")
ValueError: maxlag should be < nobs
My Numpy Version: 1.6.1
My statsmodels Version: 0.4.1
I am using windows.
I am looking at the documentation here but can't figure out what I am doing wrong. What am I missing?
Thanks in advance.

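I figured it out. By default maxlag is None, in which case adfuller chooses the lag length itself, and for a series this short that choice is apparently too large for the number of observations. Setting maxlag to a small integer explicitly works: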
import numpy as np
import statsmodels.tsa.stattools as ts
x = np.array([1,2,3,4,3,4,2,3])
result = ts.adfuller(x, 1) # maxlag is now set to 1
Output:
>>> result
(-2.6825663173365015, 0.077103947319183241, 0, 7, {'5%': -3.4775828571428571, '1%': -4.9386902332361515, '10%': -2.8438679591836733}, 15.971188911270618)
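A rough sketch of why the default fails here. The statsmodels documentation describes the default lag as 12*(nobs/100)^(1/4). The helper name below is hypothetical, and rounding up is an assumption of this sketch, not something confirmed for version 0.4.1:

```python
import math


def rule_of_thumb_maxlag(nobs):
    """Lag length from the rule of thumb 12 * (nobs / 100) ** (1/4).

    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(12.0 * (nobs / 100.0) ** 0.25)


nobs = 8                                   # length of x in the question
default_lag = rule_of_thumb_maxlag(nobs)   # lag picked when maxlag is None
usable = nobs - 1                          # differencing drops one observation

# The lag must be strictly smaller than the number of usable observations.
print(default_lag, usable, default_lag < usable)
```

Under these assumptions the default lag equals the number of differenced observations, which would trip the "maxlag should be < nobs" check, while maxlag=1 passes it.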

Related

Numexpr in Python doesn't recognise a declared symbol

I'm trying to do some plots of some symbolic data. I have some expression from a regression in the form:
expr = '(((((((((1.0)*(2.0)))-(ER)))-(-0.37419122066665467))*0.006633039574629684)*(0.006633039574629684*((((T)-(((1.0)+(P)))))-(P))))+0.1451920626347467)'
Where expr here is some prediction: f = f(T, P, ER). I know this particular example is a crazy expression but it's not really super important. Basically, supposing I have some dataframe, plotdata, I am trying to produce plots with:
import pandas
import sympy
import numexpr
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
expr = '(((((((((1.0)*(2.0)))-(ER)))-(-0.37419122066665467))*0.006633039574629684)*(0.006633039574629684*((((T)-(((1.0)+(P)))))-(P))))+0.1451920626347467)'
#Extract some data for surface plot but fixing one variable
plotdata = plotdata.loc[(plotdata.P == 1)]
#Extract data as lists for plotting
x = list(plotdata['T'])
y = list(plotdata['ER'])
f_real = list(plotdata['f'])
T_sympy = sympy.Symbol('T')
P_sympy = sympy.Symbol('P')
ER_sympy = sympy.Symbol('ER')
f_pred = numexpr.evaluate(expr)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(x,y,f_real, alpha = 0.3)
ax.plot_surface(x,y,f_pred)
However, I am getting an error with f_pred.
numexpr.evaluate(expr)
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/numexpr/necompiler.py", line 744, in getArguments
a = local_dict[name]
KeyError: 'ER'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<ipython-input-100-c765b0f1e5ce>", line 1, in <module>
numexpr.evaluate(expr)
File "/anaconda3/lib/python3.7/site-packages/numexpr/necompiler.py", line 818, in evaluate
arguments = getArguments(names, local_dict, global_dict)
File "/anaconda3/lib/python3.7/site-packages/numexpr/necompiler.py", line 746, in getArguments
a = global_dict[name]
KeyError: 'ER'
I am not super familiar with the numexpr package. However, I have been building this up from a 1D regression to now a 3D regression. ER was my 1D variable and was working fine. I have obviously slightly altered my code since the 1D case but I am still slightly at a loss as to why this error is popping up.
Any pointers would be greatly appreciated.

I've figured it out. Pretty silly error in the end. I needed to change:
#Extract data as lists for plotting
x = list(plotdata['T'])
y = list(plotdata['ER'])
to:
T = list(plotdata['T'])
ER = list(plotdata['ER'])
P = list(plotdata['P'])
i.e. numexpr.evaluate was looking for the input data, not the symbol!
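A minimal sketch of the same idea with the names passed explicitly. numexpr.evaluate looks up every name in the expression as data, either in the calling scope or in the local_dict argument. The expression and sample arrays below are hypothetical stand-ins for the regression expression and the plotdata columns:

```python
import numpy as np
import numexpr

# Hypothetical expression, much simpler than the regression output
expr = "2.0 * T - ER + P"

# Hypothetical sample data standing in for the plotdata columns
columns = {
    "T": np.array([1.0, 2.0, 3.0]),
    "ER": np.array([0.5, 0.5, 0.5]),
    "P": np.array([1.0, 1.0, 1.0]),
}

# Each name in expr is resolved to the array stored under that key
f_pred = numexpr.evaluate(expr, local_dict=columns)
print(f_pred)
```

Passing local_dict keeps the lookup explicit, so the result does not depend on which variables happen to exist in the surrounding scope. The sympy symbols are not needed for this.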

Medium numbers in Broyden1

I am using broyden1 as a solver in Python. The question was answered here for small numbers, but I need to use somewhat larger numbers without switching to newton_krylov. If I use numbers over 100, it starts throwing errors.
The code is here:
import numpy as np
import scipy.optimize
from scipy.optimize import fsolve
from functools import partial
from itertools import repeat
small_data = [100, 220, 350, 480]
period = len(small_data)  # assumed: period was not defined in the posted snippet
def G(small_data, x):
    # residual of the system; the solver looks for x with G(x) == 0
    return np.cos(x) + x[::-1] - small_data
G_partial = partial(G, small_data)
approximate = list(repeat(1, period))  # initial guess: all ones
y = scipy.optimize.broyden1(G_partial, approximate, f_tol=1e-14)
print(y)
The error is:
Warning (from warnings module):
File "C:\Python\Python38\lib\site-packages\scipy\optimize\nonlin.py", line 1004
d = v / vdot(df, v)
RuntimeWarning: invalid value encountered in true_divide
Traceback (most recent call last):
File "read_data.py", line 176, in <module>
y = scipy.optimize.broyden1(G_partial, approximate, f_tol=1e-14)
File "<string>", line 6, in broyden1
File "C:\Python\Python38\lib\site-packages\scipy\optimize\nonlin.py", line 350, in nonlin_solve
raise NoConvergence(_array_like(x, x0))
scipy.optimize.nonlin.NoConvergence: [ 99.49247662 219.22593164 350.14354166 480.95722345]

I found that the best method is to loosen the tolerance in the broyden1 call:
y = scipy.optimize.broyden1(G_partial, approximate, f_tol=5000e-14)
instead of:
f_tol=1e-14
so that larger values are accepted with good accuracy.
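One plausible reason a looser tolerance helps, sketched under the assumption that floating-point resolution is the limiting factor: near values of 100 to 480, adjacent double-precision numbers are already more than 1e-14 apart, so a residual below 1e-14 is only reached when every component is exactly zero. The check below uses math.ulp, which needs Python 3.9 or later:

```python
import math

targets = [100.0, 220.0, 350.0, 480.0]   # the values from small_data
strict_tol = 1e-14                       # tolerance that failed
loose_tol = 5000e-14                     # tolerance that worked

for value in targets:
    gap = math.ulp(value)  # distance to the next representable double
    print(value, gap, gap > strict_tol, gap < loose_tol)
```

Every gap is larger than the strict tolerance and far smaller than the loose one, which is consistent with the behaviour reported above. It does not rule out other causes, such as the division warning in the traceback.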

Running deseq2 through rpy2

I am trying to run DESeq2 from Python using rpy2.
How should I pass the design matrix?
My script is as follows:
from numpy import *
from numpy.random import multinomial, random
from rpy2 import robjects
import rpy2.robjects.numpy2ri
robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
deseq = importr('DESeq2')
# Generate some data. 1000 genes, 10 samples
n = 1000
probabilities = random(n)
probabilities /= sum(probabilities)
data = zeros((n,10), int)
for i in range(10):
    data[:,i] = multinomial(1000000, probabilities)
# Make the data frame
d = {}
categories = ('1','2') * 5
d["key_1"] = robjects.IntVector(categories)
dataframe = robjects.DataFrame(d)
# Create the design matrix, and run DESeqDataSetFromMatrix
design = "~ key_1" # <--- I guess this is wrong
dds = deseq.DESeqDataSetFromMatrix(countData=data, colData=dataframe,design=design)
The error I am getting is
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2-2.8.5-py3.6-macosx-10.11-x86_64.egg/rpy2/rinterface/__init__.py:186: RRuntimeWarning: Error: $ operator is invalid for atomic vectors
warnings.warn(x, RRuntimeWarning)
Traceback (most recent call last):
File "testrpy.py", line 23, in <module>
dds = deseq.DESeqDataSetFromMatrix(countData=data, colData=dataf,design=design)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2-2.8.5-py3.6-macosx-10.11-x86_64.egg/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2-2.8.5-py3.6-macosx-10.11-x86_64.egg/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error: $ operator is invalid for atomic vectors
My guess is that the design argument is not correct.
Does anybody have an example of running DESeq2 via rpy2?
Thanks.

Ah! You were almost there:
# Create the design matrix, and run DESeqDataSetFromMatrix
design = "~ key_1" # <--- I guess this is wrong
design is a string, but I guess that it should be a formula. Formulae are language objects in R.
Try with:
from rpy2.robjects import Formula
design = Formula("~ key_1")

Interpolate a discontinuous function with Scipy

I am having problems interpolating some data points using Scipy. I guess that it might depend on the fact that the function I'm trying to interpolate is discontinuous at x roughly 4.
Here is the code I'm using to interpolate:
from scipy import *
y_interpolated = interp1d(x,y,buonds_error=False,fill_value=0.,kind='cubic')
new_x_array = arange(min(x),max(x),0.05)
plot(new_x_array,x_interpolated(new_x_array),'r-')
The error I get is
File "<stdin>", line 2, in <module>
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scipy/interpolate/interpolate.py", line 357, in __call__
out_of_bounds = self._check_bounds(x_new)
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scipy/interpolate/interpolate.py", line 415, in _check_bounds
raise ValueError("A value in x_new is above the interpolation "
ValueError: A value in x_new is above the interpolation range.
These are my data points:
1.56916432074 -27.9998263169
1.76773750527 -27.6198430485
1.98360238449 -27.2397962268
2.25133982943 -26.8596491107
2.49319293195 -26.5518194791
2.77823462692 -26.1896935372
3.07201297519 -25.9540514619
3.46090507092 -25.7362456112
3.65968688527 -25.6453922172
3.84116464506 -25.53652509
3.97070419447 -25.3374215879
4.03087127145 -24.8493356465
4.08217147954 -24.0540196233
4.12470899596 -23.0960856364
4.17612639206 -22.4634289328
4.19318305992 -22.1380894034
4.2708234589 -21.902951035
4.3745696768 -21.9027079759
4.52158254627 -21.9565591238
4.65985875536 -21.8839570732
4.80666329863 -21.6486676004
4.91026629192 -21.4496126386
5.05709528961 -21.2685401725
5.29054655428 -21.2860476871
5.54129211534 -21.3215908912
5.73174988353 -21.6645019816
6.06035782465 -21.772138994
6.30243916407 -21.7715483093
6.59656410998 -22.0238656166
6.86481948673 -22.3665921479
7.01182409559 -22.4385289076
7.17609125906 -22.4200564296
7.37494987052 -22.4376476472
7.60844044988 -22.5093814451
7.79869207061 -22.5812017094
8.00616642549 -22.5445612485
8.17903446593 -22.4899243886
8.29141325457 -22.4715846981

What version of scipy are you using?
The script you posted has some syntax errors (I assume due to wrong copy and paste).
This script works with scipy.__version__ == 0.9.0:
import sys
from scipy import *
from scipy.interpolate import *
from pylab import plot
x = []
y = []
for line in sys.stdin:
    a, b = line.split()
    x.append(float(a))
    y.append(float(b))
y_interpolated = interp1d(x,y,bounds_error=False,fill_value=0.,kind='cubic')
new_x_array = arange(min(x),max(x),0.05)
plot(new_x_array,y_interpolated(new_x_array),'r-')
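A small self-contained check of what bounds_error=False and fill_value do, using hypothetical sample points, not the data from the question:

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical sample points; a cubic fit needs at least four of them
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 2

# With bounds_error=False, points outside [x.min(), x.max()] do not raise;
# they are filled with fill_value.
f = interp1d(x, y, kind="cubic", bounds_error=False, fill_value=0.0)

at_knot = float(f(3.0))   # evaluated at one of the sample points
outside = float(f(6.0))   # evaluated beyond the last sample point
print(at_knot, outside)
```

If the call raises "A value in x_new is above the interpolation range" even so, it is worth checking that the keyword is spelled bounds_error and that the interpolator actually being called was built with it.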

rpy2 problems, nls passing list() as argument from python to R

I am trying to fit a nonlinear curve using rpy2 from a numpy array, but I am stuck as I do not know how to pass the 'start' argument on the R side. I use R 2.12.1 and Python 2.6.6.
Error in function (formula, data = parent.frame(), start, control = nls.control(), :
parameters without starting value in 'data': responsev, predictorv
Traceback (most recent call last):
File "./employmentsHoro.py", line 279, in <module>
nls.nls2(formula=formula, data=dataf, start=mylist)
File "/usr/lib/python2.6/dist-packages/rpy2/robjects/functions.py", line 83, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/lib/python2.6/dist-packages/rpy2/robjects/functions.py", line 35, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in function (formula, data = parent.frame(),start, control = nls.control(), :
parameters without starting value in 'data': responsev, predictorv
Can anyone help me determine how to pass a list() object to the nls formula?
The relevant part of my code is this:
import rpy2.robjects as robjects
from rpy2.robjects import DataFrame, Formula
import rpy2.robjects.numpy2ri as npr
import numpy as np
from rpy2.robjects.packages import importr
nls = importr('nls2')
stats = importr('stats')
mylist = robjects.r('list(a=700,b=0.8,c=200000)')
dataf = DataFrame({'responsev': professions, 'predictorv': totalEmployment})
starter= DataFrame({'a':700,'b':0.80,'c':200000})
formula = Formula('responsev ~I( a*(predictorv/c)^b )/( 1+( predictorv/c )^b )')
nls.nls2(formula=formula, data=dataf, start=starter)

The main error is this one:
Error in function (formula, data = parent.frame(), start, control =
nls.control(), : parameters without starting value in
'data': responsev, predictorv
Where are the variables professions and totalEmployment declared?
It seems they don't have a starting value. Maybe you have to convert them into something that R understands?
