Call anova on lme4.lmer output via RPy - python

I am trying to analyze the deviance of a set of linear models generated with lme4.lmer() via RPy. This notebook shows a full example: I import my dependencies, load my files, run lme4.lmer(), and fail to get anova to run on the results.
For convenience, here again is the failing line that I would like to see work.
compare = stats.anova(res[0], res[1], res[2])
Error in Ops.data.frame(data, data[[1]]) :
list of length 3 not meaningful
In addition: Warning message:
In anova.merMod(<S4 object of class "lmerMod">, <S4 object of class "lmerMod">, :
failed to find unique model names, assigning generic names
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-47-fe0ffa3b55de> in <module>()
----> 1 compare = stats.anova(res[0], res[1], res[2])
/usr/lib64/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/usr/lib64/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in Ops.data.frame(data, data[[1]]) :
list of length 3 not meaningful
This code runs perfectly in R as:
> mydata = read.csv("http://chymera.eu/data/test/r_data.csv")
> library(lme4)
Loading required package: lattice
Loading required package: Matrix
> lme1 = lme4.lmer(formula='RT~cat2 + (1|ID)', data=mydata, REML=FALSE)
Error: could not find function "lme4.lmer"
> lme1 = lmer(formula='RT~cat1 + (1|ID)', data=mydata, REML=FALSE)
> lme2 = lmer(formula='RT~cat2 + (1|ID)', data=mydata, REML=FALSE)
> anova(lme1,lme2)
> lme3 = lmer(formula='RT~cat2*cat1 + (1|ID)', data=mydata, REML=FALSE)
> stats::anova(lme1, lme2, lme3)
Data: mydata
Models:
lme1: RT ~ cat1 + (1 | ID)
lme2: RT ~ cat2 + (1 | ID)
lme3: RT ~ cat2 * cat1 + (1 | ID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
lme1 4 116.68 122.29 -54.342 108.68
lme2 4 149.59 155.19 -70.793 141.59 0.000 0 1
lme3 6 117.19 125.59 -52.594 105.19 36.398 2 1.248e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Can you help me make it run in RPy as well?

In R, stats::anova() is presumably inferring the model names from the unevaluated expressions in the function call; here those are lme1, lme2, and lme3.
Now consider rewriting your R code without variable names, as this is closer to what happens in your current rpy2 implementation, where neither the data DataFrame nor the fitted models are bound to a variable name. This gives the following (note: "closer", not "equal" - the details would only distract from the main point):
stats::anova(lmer(formula='RT~cat1 + (1|ID)',
data=read.csv("http://chymera.eu/data/test/r_data.csv"),
REML=FALSE),
lmer(formula='RT~cat2 + (1|ID)',
data=read.csv("http://chymera.eu/data/test/r_data.csv"),
REML=FALSE),
lmer(formula='RT~cat2*cat1 + (1|ID)',
data=read.csv("http://chymera.eu/data/test/r_data.csv"),
REML=FALSE))
The outcome is an error in R.
Error in names(mods) <- sub("#env$", "", mNms) :
'names' attribute [6] must be the same length as the vector [3]
In addition: Warning message:
In anova.merMod(lmer(formula = "RT~cat1 + (1|ID)", data = read.csv("http://chymera.eu/data/test/r_data.csv"), :
failed to find unique model names, assigning generic names
What this suggests is that the R function lme4:::anova.merMod is making assumptions that can easily be violated, and the authors of the package should be notified.
It is also showing that expressions will be used to identify the model in the resulting text output.
The following is probably lacking a bit of elegance, but should be both a workaround and a way to keep labels for the models short.
# bind the DataFrame to an R symbol
robjects.globalenv['dataf'] = dfr
# build models, letting R fetch the symbol `dataf` when it is evaluating
# the parameters in the function call
res = list()
for formula in formulae:
    lme_res = lme4.lmer(formula=formula, data=base.as_symbol("dataf"), REML='false')
    res.append(lme_res)
# This is enough to work around the problem
compare = stats.anova(res[0], res[1], res[2])
# if not happy with the model names displayed by `compare`,
# globalenv can be filled further
names = list()
for i, value in enumerate(res):
    names.append('lme%i' % i)
    robjects.globalenv[names[i]] = value
# call `anova`
compare = stats.anova(*[base.as_symbol(x) for x in names])

This is a bug in the anova method for merMod objects: it's essentially caused by the names of the objects being passed to R being too long, so that when deparse()d they end up being character vectors with (unexpectedly) more than one element. This is fixed by https://github.com/lme4/lme4/commit/075c78d128db9d8398f43474621e49f32fdb5bd1 ; there is also now an (undocumented) argument model.names that can be specified to override the deparsing of model names.
You can install the development version using devtools::install_github("lme4","lme4"); otherwise you may have to wait a while for a patched version to be released ... I can't think of a workaround other than structuring your call so that the language objects passed to R are shorter when deparsed ...
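Once you are on a version of lme4 that includes that commit, the model.names argument should also be reachable from rpy2. The following is only a sketch under that assumption; it reuses the fitted models res and the stats handle from the workaround above, and passes the argument with its R spelling through a dict since the name contains a dot:
from rpy2.robjects.vectors import StrVector

# assumes a patched lme4 whose anova.merMod accepts model.names
compare = stats.anova(res[0], res[1], res[2],
                      **{'model.names': StrVector(['lme1', 'lme2', 'lme3'])})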


Constraint 'feasibility_cut[1]' does not have a proper value. Found 'True'

I'm new to Python and Pyomo, so I would appreciate your help.
I'm currently having trouble adding a constraint to my mathematical model in Pyomo. When I try to add the "feasibility_cut", it says "Constraint 'feasibility_cut[1]' does not have a proper value. Found 'True'". What I understand from this is that Pyomo sees this constraint as a plain logical comparison rather than a constraint expression, and I don't know why.
Here is a part of the code that I think is necessary to see:
RMP = ConcreteModel()
RMP.ymp = Var(SND.E, within=Integers)
RMP.z = Var(within = Reals)
S1 = (len(SND.A), len(SND.K))
S2 = (len(SND.A), len(SND.A))
uBar= np.zeros(S1)
vBar=np.zeros(S2)
RMP.optimality_cut = ConstraintList()
RMP.feasibility_cut = ConstraintList()
expr2 = (sum(SND.Fixed_Cost[i,j]*RMP.ymp[i,j] for i,j in SND.E) + RMP.z)
RMP.Obj_RMP = pe.Objective(expr = expr2, sense = minimize)
iteration=0
epsilon = 0.01
while (UB - LB) > epsilon:
    iteration = iteration + 1
    DSPsolution = Solver.solve(DSP)
    for i in SND.A:
        for k in SND.K:
            uBar[i-1, k-1] = value(DSP.u[i, k])
    for i, j in SND.E:
        vBar[i-1, j-1] = value(DSP.v[i, j])
    if value(DSP.Obj_DSP) == DSPinf:
        RMP.feasCut.add()
    else:
        RMP.optimCut.add()
    RMPsolution = solver.solve(RMP)
    UB = min(UB,)
    LB = max(LB, value(RMP.Obj_RMP))
    if value(DSP.Obj_DSP) == DSPinf:
        RMP.feasibility_cut.add(0 >= sum(-SND.Capacity[i, j]*vBar[i-1, j-1]*RMP.ymp[i, j] for i, j in SND.E)
                                + sum(uBar[i-1, k-1]*SND.New_Demand[k, i] for i in SND.A for k in SND.K
                                      if (k, i) in SND.New_Demand))
    else:
        RMP.optimality_cut.add(RMP.z >= sum(SND.Fixed_Cost[i, j]*RMP.ymp[i, j] for i, j in SND.E)
                               + sum(uBar[i-1, k-1]*SND.New_Demand[k, i] for i in SND.A for k in SND.K)
                               - sum(SND.Capacity[i, j]*vBar[i-1, j-1]*RMP.ymp[i, j] for i, j in SND.E))
Welcome to the site.
A couple of preliminaries... When you post code that generates an error, it is customary (and easier for those trying to help) to post the entire code necessary to reproduce the error along with the stack trace, or at least to identify which line is causing the error.
So when you use a constraint list in pyomo, everything you add to it must be a legal expression in terms of the model variables, parameters, and other constants etc. You are likely getting the error because you are adding an expression that evaluates to True. So it is likely that an expression you are adding does not depend on a model variable. See the example below.
Also, you need to be careful mingling numpy and pyomo models, numpy arrays etc. can cause some confusion and odd errors. I'd recommend putting all data into the model or using pure python data types (lists, sets, dictionaries).
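For instance, here is a minimal sketch (using names from your code, which are assumed to be defined) of pulling the dual values into plain Python dictionaries keyed by the model's index sets, so that only ordinary floats enter the cut expressions:
from pyomo.environ import value

# plain dicts instead of numpy arrays; keys follow the model's own index sets
uBar = {(i, k): float(value(DSP.u[i, k])) for i in SND.A for k in SND.K}
vBar = {(i, j): float(value(DSP.v[i, j])) for i, j in SND.E}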
The session below demonstrates two errors; note that the empty add() in your code will throw the first of them.
In [1]: from pyomo.environ import *
In [2]: m = ConcreteModel()
In [3]: m.my_constraints = ConstraintList()
In [4]: m.X = Var()
In [5]: m.my_constraints.add(m.X >= 5) # good!
Out[5]: <pyomo.core.base.constraint._GeneralConstraintData at 0x7f8778cfa880>
In [6]: m.my_constraints.add() # error!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-cf466911f3a1> in <module>
----> 1 m.my_constraints.add()
TypeError: add() missing 1 required positional argument: 'expr'
In [7]: m.my_constraints.add(3 <= 4) # error: trivial evaluation!
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-a0bec84404b0> in <module>
----> 1 m.my_constraints.add(3 <= 4)
...
ValueError: Invalid constraint expression. The constraint expression resolved to a trivial Boolean (True) instead of a Pyomo object. Please modify your rule to return Constraint.Feasible instead of True.
Error thrown for Constraint 'my_constraints[2]'
In [8]:

Getting error in student t-test using python

I would like to perform a Student's t-test on the following two samples, with the null hypothesis that they have the same mean:
$cat data1.txt
2
3
4
5
5
5
5
2
$cat data2.txt
4
7
9
10
8
7
3
I got the idea and a script to perform the t-test from https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/
My script is:
$cat ttest.py
from math import sqrt
from numpy import mean
from scipy.stats import sem
from scipy.stats import t
def independent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the critical value
    cv = t.ppf(1.0 - alpha, df)
    # calculate the p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p

data1 = open('data1.txt')
data2 = open('data2.txt')
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
When I run this script as python3 ttest.py, I get the following error. I think I need to change the print statement, but I am not able to figure out how.
Traceback (most recent call last):
File "t-test.py", line 28, in <module>
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
File "t-test.py", line 10, in independent_ttest
mean1, mean2 = mean(data1), mean(data2)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 3118, in mean
out=out, **kwargs)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/_methods.py", line 87, in _mean
ret = ret / rcount
TypeError: unsupported operand type(s) for /: '_io.TextIOWrapper' and 'int'
So your issue is that you are opening the files but not reading the data from them (or converting it to a list). Basically, opening a file just prepares the file to be read by Python - you need to read it separately.
Also as a quick sidenote, make sure to close the file when you are done or else you could run into issues if you run the code multiple times in quick succession. The code below should work for your needs, just replace the calls to open with this code, replacing file names and other details as needed. The array here is the data you are looking for to pass to independent_ttest.
array = []
with open("test1.txt") as file:
    while value := file.readline():
        array.append(int(value))
print(array)
We open our file using with to make sure it is closed at the end.
Then we use a while loop to read each line; the := (walrus) operator, available in Python 3.8 and newer, assigns each line to value as it is read.
Finally, for each value we convert it from string to int and then append it to our list.
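Putting it together, here is a minimal sketch of wiring that reading step into your existing script; independent_ttest is the function you already defined, and the read_ints helper is introduced here purely for illustration:
def read_ints(path):
    # read one integer per non-empty line
    values = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                values.append(int(line))
    return values

data1 = read_ints('data1.txt')
data2 = read_ints('data2.txt')
t_stat, df, cv, p = independent_ttest(data1, data2, 0.05)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))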
Hope this helped! Let me know if you have any questions!

Cost function using absolute value and division by decision variables

I am trying to implement a cost function in a pydrake MathematicalProgram, but I run into problems whenever I try to divide by a decision variable and use abs(). A shortened version of my attempted implementation follows; I tried to include only what I think may be relevant.
T = 50
na = 3
nq = 5
prog = MathematicalProgram()
h = prog.NewContinuousVariables(rows=T, cols=1, name='h')
qd = prog.NewContinuousVariables(rows=T+1, cols=nq, name='qd')
d = prog.NewContinuousVariables(1, name='d')
u = prog.NewContinuousVariables(rows=T, cols=na, name='u')
def energyCost(vars):
    assert vars.size == 2*na + 1 + 1
    split_at = [na, 2*na, 2*na + 1]
    qd, u, h, d = np.split(vars, split_at)
    return np.abs([qd.dot(u)*h/d])

for t in range(T):
    vars = np.concatenate((qd[t, 2:], u[t,:], h[t], d))
    prog.AddCost(energyCost, vars=vars)
initial_guess = np.empty(prog.num_vars())
solver = SnoptSolver()
result = solver.Solve(prog, initial_guess)
The error I am getting is:
RuntimeError Traceback (most recent call last)
<ipython-input-55-111da18cdce0> in <module>()
22 initial_guess = np.empty(prog.num_vars())
23 solver = SnoptSolver()
---> 24 result = solver.Solve(prog, initial_guess)
25 print(f'Solution found? {result.is_success()}.')
RuntimeError: PyFunctionCost: Output must be of .ndim = 0 (scalar) and .size = 1. Got .ndim = 2 and .size = 1 instead.
To the best of my knowledge the problem is the dimensions of the output, however I am unsure of how to proceed. I spent quite some time trying to fix this, but with little success. I also tried changing np.abs to pydrake.math.abs, but then I got the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-56-c0c2f008616b> in <module>()
22 initial_guess = np.empty(prog.num_vars())
23 solver = SnoptSolver()
---> 24 result = solver.Solve(prog, initial_guess)
25 print(f'Solution found? {result.is_success()}.')
<ipython-input-56-c0c2f008616b> in energyCost(vars)
14 split_at = [na, 2*na, 2*na + 1]
15 qd, u, h, d = np.split(vars, split_at)
---> 16 return pydrake.math.abs([qd.dot(u)*h/d])
17
18 for t in range(T):
TypeError: abs(): incompatible function arguments. The following argument types are supported:
1. (arg0: float) -> float
2. (arg0: pydrake.autodiffutils.AutoDiffXd) -> pydrake.autodiffutils.AutoDiffXd
3. (arg0: pydrake.symbolic.Expression) -> pydrake.symbolic.Expression
Invoked with: [array([<AutoDiffXd 1.691961398933386e-257 nderiv=8>], dtype=object)]
Any help would be greatly appreciated, thanks!
BTW, as Tobia has mentioned, dividing by a decision variable in the cost function can be problematic. There are two approaches to avoid the problem.
Impose a bound on your decision variable such that 0 is not included in the bound. For example, say you want to optimize
min f(x) / y
If you can impose a bound that y > 1, then SNOPT will not try to use y=0, thus you avoid the division by zero problem.
One trick is to introduce another variable as the result of division, and then minimize this variable.
For example, say you want to optimize
min f(x) / y
You could introduce a slack variable z = f(x) / y and formulate the problem as
min z
s.t. f(x) - y * z = 0
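Here is a minimal pydrake sketch of that reformulation; it assumes f(x) = x**2 purely for illustration, keeps y bounded away from zero, and the import path may differ between Drake versions:
from pydrake.all import MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')
y = prog.NewContinuousVariables(1, 'y')
z = prog.NewContinuousVariables(1, 'z')

# keep y in [1, 10] so the implicit division is well defined
prog.AddBoundingBoxConstraint(1.0, 10.0, y)
# z == f(x) / y, written without a division: f(x) - y * z == 0
prog.AddConstraint(x[0]**2 - y[0]*z[0] == 0)
# minimize z instead of f(x) / y
prog.AddLinearCost(z[0])

result = Solve(prog)
print(result.is_success(), result.GetSolution(z))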
Some observations:
The kind of cost function you are trying to use does not require a Python callback to be enforced. You could just write (even though, as is, it would raise other errors) prog.AddCost(np.abs([qd[t, 2:].dot(u[t,:])*h[t]/d])).
The argument of prog.AddCost must be a Drake scalar expression. So be sure that your numpy matrix multiplications return a scalar. In the case above they return a (1,1) numpy array.
To minimize the absolute value, you need something a little more sophisticated than that: in its current form you are passing a non-differentiable objective function, and solvers do not like that. Say you want to minimize abs(x). A standard trick in optimization is to add an extra (slack) variable, say s, add the constraints s >= x and s >= -x, and then minimize s itself. All of these constraints and this objective are differentiable and linear (see the sketch after these observations).
Regarding the division of the objective by an optimization variable: avoid it whenever you can. For example, I'm 90% sure that solvers like SNOPT or IPOPT set the initial guess to zero if you do not provide one. This implies that, if you do not provide a custom initial guess, the solver will hit a division by zero at the first evaluation of the constraints and crash.
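Here is a minimal pydrake sketch of the slack-variable trick for minimizing abs(x); the bounds on x are only there to make the example non-trivial, and the import path may differ between Drake versions:
from pydrake.all import MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')
s = prog.NewContinuousVariables(1, 's')

# keep x in [-3, -1] so the minimizer of |x| is not simply 0
prog.AddBoundingBoxConstraint(-3.0, -1.0, x)
# s >= x and s >= -x together give s >= |x|
prog.AddLinearConstraint(s[0] >= x[0])
prog.AddLinearConstraint(s[0] >= -x[0])
# minimizing s then minimizes |x|
prog.AddLinearCost(s[0])

result = Solve(prog)
print(result.GetSolution(x), result.GetSolution(s))  # expect x = -1, s = 1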

How to use the R 'with' operator in rpy2

I am doing an ordinal logistic regression, and following the guide here for the analysis: R Data Analysis Examples: Ordinal Logistic Regression
My dataframe (consult) looks like:
n raingarden es_score consult_case
garden_id
27436 7 0 3 0
27437 1 0 0 1
27439 1 1 1 1
37253 1 0 3 0
37256 3 0 0 0
I am at the part where I need to create a graph to test the proportional odds assumption, with the command in R as follows:
(s <- with(dat, summary(es_score ~ n + raingarden + consult_case, fun=sf)))
(es_score is an ordinal ranked score with values between 0 - 4; n is an integer; raingarden and consult_case, binary values of 0 or 1)
I have the sf function:
sf <- function(y) {
  c('Y>=1' = qlogis(mean(y >= 1)),
    'Y>=2' = qlogis(mean(y >= 2)),
    'Y>=3' = qlogis(mean(y >= 3)))
}
in a utils.r file that I access as follows:
from rpy2.robjects.packages import STAP
with open('/file_path/utils.r', 'r') as f:
    string = f.read()
sf = STAP(string, "sf")
And want to do something along the lines of:
R = ro.r
R.with(work_case_control, R.summary(formula, fun=sf))
The major problem is that the R with operator is seen as a Python keyword, so even if I access it as ro.r.with it is still treated as the Python keyword. (As a side note: I tried using R's apply method instead, but got the error TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable ... I assume this refers to my function sf?)
I also tried using the R assignment methods in rpy2 as follows:
R('sf = function(y) { c(\'Y>=1\' = qlogis(mean(y >= 1)), \'Y>=2\' = qlogis(mean(y >= 2)), \'Y>=3\' = qlogis(mean(y >= 3)))}')
R('s <- with({0}, summary(es_score~raingarden + consult_case, fun=sf)'.format(consult))
but ran into issues where the dataframe column names were somehow causing the error: RRuntimeError: Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), :
<text>:1:19: unexpected symbol
1: s <- with( n raingarden
I could of course do this all in R, but I have a very involved ETL script in Python, and would thus prefer to keep everything in Python using rpy2 (I did try mord for scikit-learn to run my regression, but it is pretty primitive).
Any suggestions would be most welcome right now.
EDIT
I tried various combinations of #Parfait's suggestions, and qualifying the fun argument is syntactically incorrect according to the PyCharm interpreter; it does not matter what the qualifier is, I always get the error SyntaxError: keyword can't be an expression.
On the other hand, with no qualifier there is no syntax error, but I do get TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable when using the function sf as obtained:
from rpy2.robjects.packages import STAP
with open('/Users/gregsilverman/development/python/rest_api/rest_api/scripts/utils.r', 'r') as f:
    string = f.read()
sf = STAP(string, "sf")
With that in mind, I created a package in R containing the function sf, imported it, and tried various combinations; the only one producing no error was print(base._with(consult_case_control, R.summary(formula, fun=gms.sf))) (gms is a reference to the R package I made).
The output, though, makes no sense:
Length Class Mode
3 formula call
I am expecting a table like the one on the UCLA site. Interesting. I am going to try recreating my analysis in R, just for the heck of it. I would still like to complete it in Python, though.
Consider bracketing the with call and be sure to qualify all arguments including fun:
ro.r['with'](work_case_control, ro.r.summary(formula, ro.r.summary.fun=sf))
Alternatively, import R's base package. And to avoid conflict with Python's named method with() translate the R name:
from rpy2.robjects.packages import importr
base = importr('base', robject_translations={'with': '_with'})
base._with(work_case_control, ro.r.summary(formula, ro.r.summary.fun=sf))
And be sure to properly create your formula. Consider using as.formula from R's stats package to build it from a string. Notice that another translation is made here, again due to a naming conflict:
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
formula = stats.as_formula('es_score ~ n + raingarden + consult_case')
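An alternative that sidesteps both the keyword clash and the eager evaluation of the summary() call is to bind the data frame to an R symbol and let R evaluate the whole with() expression from a string. This is only a sketch: it assumes consult converts to an R data.frame (e.g. via pandas2ri) and that whatever package provides summary.formula with a fun argument (Hmisc in the UCLA guide) is loaded:
import rpy2.robjects as ro

# bind the data frame to an R symbol (assumes an active pandas -> R conversion)
ro.globalenv['dat'] = consult
# define sf directly in R so it is visible when R evaluates the call
ro.r('''
sf <- function(y) {
  c('Y>=1' = qlogis(mean(y >= 1)),
    'Y>=2' = qlogis(mean(y >= 2)),
    'Y>=3' = qlogis(mean(y >= 3)))
}
''')
s = ro.r('with(dat, summary(es_score ~ n + raingarden + consult_case, fun = sf))')
print(s)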

What object to pass to R from rpy2?

I'm unable to make the following code work, though I don't see this error when working strictly in R.
from rpy2.robjects.packages import importr
from rpy2 import robjects
import numpy as np
forecast = importr('forecast')
ts = robjects.r['ts']
y = np.random.randn(50)
X = np.random.randn(50)
y = ts(robjects.FloatVector(y), start=robjects.IntVector((2004, 1)), frequency=12)
X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
It's especially confusing considering the following code works fine
forecast.auto_arima(y, xreg=X)
I see the following traceback no matter what I give for X, using numpy interface or not. Any ideas?
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-20-b781220efb93> in <module>()
13 X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
14
---> 15 forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in `colnames<-`(`*tmp*`, value = if (ncol(xreg) == 1) nmxreg else paste(nmxreg, :
length of 'dimnames' [2] not equal to array extent
Edit:
The problem is that the following lines of code do not evaluate to a column name, which seems to be the expectation on the R side.
sub = robjects.r['substitute']
deparse = robjects.r['deparse']
deparse(sub(X))
I don't know well enough what the expectations of this code should be in R, but I can't find an RPy2 object that passes this check by returning something of length == 1. This really looks like a bug to me.
R> length(deparse(substitute((rep(.2, 1000)))))
[1] 1
But in Rpy2
[~/]
[94]: robjects.r.length(robjects.r.deparse(robjects.r.substitute(robjects.r('rep(.2, 1000)'))))
[94]:
<IntVector - Python:0x7ce1560 / R:0x80adc28>
[ 78]
This is one manifestation (see this other related issue for example) of the same underlying problem: R expressions are evaluated lazily and can be manipulated within R, and this leads to idioms that do not translate well (in Python, expressions are evaluated immediately, and one has to move to the AST to manipulate code).
An answer to the second part of your question: in R, substitute(rep(.2, 1000)) is passing the unevaluated expression rep(.2, 1000) to substitute(). Doing the following in rpy2
substitute('rep(.2, 1000)')
is passing a string; the R equivalent would be
substitute("rep(.2, 1000)")
The following is letting you get close to R's deparse(substitute()):
from rpy2.robjects.packages import importr
base = importr('base')
from rpy2 import rinterface
# expression
e = rinterface.parse('rep(.2, 1000)')
dse = base.deparse(base.substitute(e))
>>> len(dse)
1
>>> print(dse) # not identical to R
"expression(rep(0.2, 1000))"
Currently, one way to work around this is to bind R objects to R symbols
(preferably in a dedicated environment rather than in GlobalEnv), and use
the symbols in an R call written as a string:
from rpy2.robjects import Environment, reval
env = Environment()
for k,v in (('y', y), ('xreg', X), ('order', robjects.IntVector((1, 0, 0)))):
    env[k] = v
# make an expression
expr = rinterface.parse("forecast::Arima(y, xreg=xreg, order=order)")
# evaluate in the environment
res = reval(expr, envir=env)
This is not something I am happy about as a solution, but I have never found the time to work on a better solution.
edit: With rpy2-2.4.0 it becomes possible to use R symbols and do the following:
RSymbol = robjects.rinterface.SexpSymbol
pairlist = (('x', RSymbol('y')),
            ('xreg', RSymbol('xreg')),
            ('order', RSymbol('order')))
res = forecast.Arima.rcall(pairlist, env)
This is not yet the most intuitive interface. Maybe something using a context manager would be better.
There is a way to simply pass your variables to R, without substitutions, and get the results back in Python. You can find a simple example here: https://stackoverflow.com/a/55900840/5350311. I think it makes it clearer what you are passing to R and what you get back in return, especially if you are working with for loops and a large number of variables.
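Along those lines, here is a hedged sketch of the bind-to-symbols approach applied to the Arima call from the question (it assumes the forecast package is installed; the deparse/substitute problem disappears because the call R sees only contains short symbol names):
import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

forecast = importr('forecast')
ts = ro.r['ts']

y = ts(ro.FloatVector(np.random.randn(50)),
       start=ro.IntVector((2004, 1)), frequency=12)
X = ts(ro.FloatVector(np.random.randn(50)),
       start=ro.IntVector((2004, 1)), frequency=12)

# bind the arguments to symbols in R's global environment ...
ro.globalenv['y'] = y
ro.globalenv['xreg'] = X
# ... and let R evaluate a call that refers to them by name
fit = ro.r('forecast::Arima(y, xreg = xreg, order = c(1, 0, 0))')
print(fit)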
