What object to pass to R from rpy2? - python

I'm unable to make the following code work, though I don't see this error working strictly in R.
from rpy2.robjects.packages import importr
from rpy2 import robjects
import numpy as np
forecast = importr('forecast')
ts = robjects.r['ts']
y = np.random.randn(50)
X = np.random.randn(50)
y = ts(robjects.FloatVector(y), start=robjects.IntVector((2004, 1)), frequency=12)
X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
It's especially confusing considering the following code works fine
forecast.auto_arima(y, xreg=X)
I see the following traceback no matter what I give for X, using numpy interface or not. Any ideas?
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-20-b781220efb93> in <module>()
13 X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
14
---> 15 forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in `colnames<-`(`*tmp*`, value = if (ncol(xreg) == 1) nmxreg else paste(nmxreg, :
length of 'dimnames' [2] not equal to array extent
Edit:
The problem is that the following lines of code do not evaluate to a column name, which seems to be the expectation on the R side.
sub = robjects.r['substitute']
deparse = robjects.r['deparse']
deparse(sub(X))
I don't know well enough what the expectations of this code should be in R, but I can't find an RPy2 object that passes this check by returning something of length == 1. This really looks like a bug to me.
R> length(deparse(substitute((rep(.2, 1000)))))
[1] 1
But in Rpy2
[~/]
[94]: robjects.r.length(robjects.r.deparse(robjects.r.substitute(robjects.r('rep(.2, 1000)'))))
[94]:
<IntVector - Python:0x7ce1560 / R:0x80adc28>
[ 78]

This is one manifestation (see this other related issue for example) of the same underlying issue: R expressions are evaluated lazily and can be manipulated within R, which leads to idioms that do not translate well to Python (in Python, expressions are evaluated immediately, and one has to move to the AST to manipulate code).
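To make the contrast concrete, here is a minimal sketch using only the Python standard library (ast.unparse needs Python 3.9 or later). The helpers rep and describe are made up for illustration; rep only mimics R's rep() and is not part of rpy2.

```python
import ast

def rep(value, times):
    """Hypothetical stand-in for R's rep(): repeat a value `times` times."""
    return [value] * times

def describe(arg):
    """A Python callee only ever receives the evaluated argument."""
    return type(arg).__name__, len(arg)

# The call expression is evaluated before describe() runs, so the
# source text "rep(.2, 1000)" is not available inside the function.
seen_by_callee = describe(rep(.2, 1000))

# To work with the unevaluated expression, parse it into an AST instead.
tree = ast.parse("rep(.2, 1000)", mode="eval")
source_text = ast.unparse(tree)  # rough analogue of deparse(substitute(...))

print(seen_by_callee)
print(source_text)
```

The callee sees a list of 1000 floats, while the AST route recovers a single short string describing the call, which is the kind of value the R side is expecting for a column name.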
An answer to the second part of your question: in R, substitute(rep(.2, 1000)) passes the unevaluated expression rep(.2, 1000) to substitute(). Doing this in rpy2
substitute('rep(.2, 1000)')
is passing a string; the R equivalent would be
substitute("rep(.2, 1000)")
The following gets you close to R's deparse(substitute()):
from rpy2.robjects.packages import importr
base = importr('base')
from rpy2 import rinterface
# expression
e = rinterface.parse('rep(.2, 1000)')
dse = base.deparse(base.substitute(e))
>>> len(dse)
1
>>> print(dse) # not identical to R
"expression(rep(0.2, 1000))"
Currently, one way to work around this is to bind R objects to R symbols
(preferably in a dedicated environment rather than in GlobalEnv), and use
the symbols in an R call written as a string:
from rpy2.robjects import Environment, reval
env = Environment()
for k, v in (('y', y), ('xreg', X), ('order', robjects.IntVector((1, 0, 0)))):
    env[k] = v
# make an expression; the symbol names must match the keys bound above
expr = rinterface.parse("forecast::Arima(y, xreg=xreg, order=order)")
# evaluate in the environment
res = reval(expr, envir=env)
This is not something I am happy about as a solution, but I have never found the time to work on a better solution.
edit: With rpy2-2.4.0 it becomes possible to use R symbols and do the following:
RSymbol = robjects.rinterface.SexpSymbol
pairlist = (('x', RSymbol('y')),
            ('xreg', RSymbol('xreg')),
            ('order', RSymbol('order')))
res = forecast.Arima.rcall(pairlist, env)
This is not yet the most intuitive interface. Maybe something using a context manager would be better.

There is a way to simply pass your variables to R without substitutions and return the results back to Python. You can find a simple example here: https://stackoverflow.com/a/55900840/5350311. I think it makes it clearer what you are passing to R and what you get back in return, especially if you are working with for loops and a large number of variables.

Related

Error with quantifier in Z3Py

I would like Z3 to check whether there exists an integer t that satisfies my formula. I'm getting the following error:
Traceback (most recent call last):
File "D:/z3-4.6.0-x64-win/bin/python/Expl20180725.py", line 18, in <module>
g = ForAll(t, f1(t) == And(t>=0, t<10, user[t].rights == ["read"] ))
TypeError: list indices must be integers or slices, not ArithRef
Code:
from z3 import *
import random
from random import randrange
class Struct:
    def __init__(self, **entries): self.__dict__.update(entries)
user = [Struct() for i in range(10)]
for i in range(10):
    user[i].uid = i
    user[i].rights = random.choice(["create","execute","read"])
s=Solver()
f1 = Function('f1', IntSort(), BoolSort())
t = Int('t')
f2 = Exists(t, f1(t))
g = ForAll(t, f1(t) == And(t>=0, t<10, user[t].rights == ["read"] ))
s.add(g)
s.add(f2)
print(s.check())
print(s.model())
You are mixing and matching Python and Z3 expressions, and while that is the whole point of Z3py, it definitely does not mean that you can mix/match them arbitrarily. In general, you should keep all the "concrete" parts in Python, and relegate the symbolic parts to "z3"; carefully coordinating the interaction in between. In your particular case, you are accessing a Python list (your user) with a symbolic z3 integer (t), and that is certainly not something that is allowed. You have to use a Z3 symbolic Array to access with a symbolic index.
The other issue is the use of strings ("create"/"read" etc.) and expecting them to have meanings in the symbolic world. That is also not how z3py is intended to be used. If you want them to mean something in the symbolic world, you'll have to model them explicitly.
I'd strongly recommend reading through http://ericpony.github.io/z3py-tutorial/guide-examples.htm which is a great introduction to z3py including many of the advanced features.
Having said all that, I'd be inclined to code your example as follows:
from z3 import *
import random
Right, (create, execute, read) = EnumSort('Right', ('create', 'execute', 'read'))
users = Array('Users', IntSort(), Right)
for i in range(10):
    users = Store(users, i, random.choice([create, execute, read]))
s = Solver()
t = Int('t')
s.add(t >= 0)
s.add(t < 10)
s.add(users[t] == read)
r = s.check()
if r == sat:
    print(s.model()[t])
else:
    print(r)
Note how the enumerated type Right in the symbolic land is used to model your "permissions."
When I run this program multiple times, I get:
$ python a.py
5
$ python a.py
9
$ python a.py
unsat
$ python a.py
6
Note how unsat is produced, if it happens that the "random" initialization didn't put any users with a read permission.
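Because the rights in this example are concrete values chosen in Python, the existence question can also be sanity-checked without a solver. The sketch below is plain Python, not Z3, and the fixed rights list is made-up sample data standing in for the random initialization. It also shows why the comparison against ["read"] in the question could never hold: a string never equals a list.

```python
# Made-up sample data standing in for the randomly assigned rights.
rights = ["create", "read", "execute", "create", "read",
          "execute", "execute", "create", "create", "execute"]

# Concrete counterpart of "there exists t with 0 <= t < 10 and users[t] == read".
witnesses = [t for t in range(10) if rights[t] == "read"]
verdict = "sat" if witnesses else "unsat"

# Comparing a string with a one-element list is always False in Python.
never_true = any(r == ["read"] for r in rights)

print(verdict, witnesses)
print(never_true)
```

A check like this is only a cross-check for the concrete case; once the table itself has to be symbolic, the Array/EnumSort modelling above is what Z3 needs.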

How to use the R 'with' operator in rpy2

I am doing an ordinal logistic regression, and following the guide here for the analysis: R Data Analysis Examples: Ordinal Logistic Regression
My dataframe (consult) looks like:
n raingarden es_score consult_case
garden_id
27436 7 0 3 0
27437 1 0 0 1
27439 1 1 1 1
37253 1 0 3 0
37256 3 0 0 0
I am at the part where I need to create a graph to test the proportional odds assumption, with the command in R as follows:
(s <- with(dat, summary(es_score ~ n + raingarden + consult_case, fun=sf)))
(es_score is an ordinal ranked score with values between 0 - 4; n is an integer; raingarden and consult_case, binary values of 0 or 1)
I have the sf function:
sf <- function(y) {
c('Y>=1' = qlogis(mean(y >= 1)),
'Y>=2' = qlogis(mean(y >= 2)),
'Y>=3' = qlogis(mean(y >= 3)))
}
in a utils.r file that I access as follows:
from rpy2.robjects.packages import STAP
with open('/file_path/utils.r', 'r') as f:
    string = f.read()
sf = STAP(string, "sf")
And want to do something along the lines of:
R = ro.r
R.with(work_case_control, R.summary(formula, fun=sf))
The major problem is that R's with function shares its name with a Python keyword, so even when I access it as ro.r.with it is still parsed as the keyword. (As a side note: I tried using R's apply method instead, but got the error TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable ... I assume this refers to my function sf?)
I also tried using the R assignment methods in rpy2 as follows:
R('sf = function(y) { c(\'Y>=1\' = qlogis(mean(y >= 1)), \'Y>=2\' = qlogis(mean(y >= 2)), \'Y>=3\' = qlogis(mean(y >= 3)))}')
R('s <- with({0}, summary(es_score~raingarden + consult_case, fun=sf)'.format(consult))
but ran into issues where the dataframe column names were somehow causing the error: RRuntimeError: Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), :
<text>:1:19: unexpected symbol
1: s <- with( n raingarden
I could of course do this all in R, but I have a very involved ETL script in Python, and would thus prefer to keep everything in Python using rpy2 (I did try using mord for scikit-learn to run my regression, but it is pretty primitive).
Any suggestions would be most welcome right now.
EDIT
I tried various combinations of @Parfait's suggestions, and qualifying the fun argument is syntactically incorrect according to PyCharm: it doesn't matter what the qualifier is, either, I always get the error
SyntaxError: keyword can't be an expression.
On the other hand, with no qualifier there is no syntax error, but I do get the error TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable when using the function sf as obtained:
from rpy2.robjects.packages import STAP
with open('/Users/gregsilverman/development/python/rest_api/rest_api/scripts/utils.r', 'r') as f:
    string = f.read()
sf = STAP(string, "sf")
With that in mind, I created a package in R with the function sf, imported it, and tried various combos with the only one producing no error, being: print(base._with(consult_case_control, R.summary(formula, fun=gms.sf))) (gms is a reference to the package in R I made).
The output though makes no sense:
Length Class Mode
3 formula call
I am expecting a table ala the one on the UCLA site. Interesting. I am going to try recreating my analysis in R, just for the heck of it. I still would like to complete it in python though.
Consider bracketing the with call and be sure to qualify all arguments including fun:
ro.r['with'](work_case_control, ro.r.summary(formula, ro.r.summary.fun=sf))
Alternatively, import R's base package and, to avoid the conflict with Python's with keyword, translate the R name:
from rpy2.robjects.packages import importr
base = importr('base', robject_translations={'with': '_with'})
base._with(work_case_control, ro.r.summary(formula, ro.r.summary.fun=sf))
And be sure to create your formula properly. Consider using as.formula from R's stats package to build it from a string. Note that another translation is made here because of a naming conflict:
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
formula = stats.as_formula('es_score ~ n + raingarden + consult_case')
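Two Python syntax rules are at play in this thread, and both can be checked without R installed. The sketch below uses a made-up stand-in object named R (a SimpleNamespace, not rpy2) and a hypothetical helper is_valid_python to show which spellings the Python parser accepts.

```python
import keyword
from types import SimpleNamespace

# Hypothetical stand-in for an object exposing R functions as attributes.
R = SimpleNamespace()
setattr(R, "with", lambda data, expr: ("with", data, expr))

def is_valid_python(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(keyword.iskeyword("with"))                           # `with` is reserved
print(is_valid_python("R.with(data, expr)"))               # attribute syntax
print(is_valid_python("getattr(R, 'with')(data, expr)"))   # lookup by string
print(is_valid_python("R.summary(f, R.summary.fun=sf)"))   # dotted keyword name
print(is_valid_python("R.summary(f, fun=sf)"))             # plain keyword name

# Looking the name up as a string sidesteps the keyword restriction.
result = getattr(R, "with")("data", "expr")
```

Item access such as ro.r['with'] and the robject_translations mapping both work for the same reason as getattr here: the reserved word only ever appears inside a string. Keyword argument names, on the other hand, must be bare identifiers, so fun=sf parses while a dotted name on the left of = does not.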

Python - integration with rpy2 and 'must be atomic' error

While using package rpy2, I get the error
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 86, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 35, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic
when executing
file.R_func.rdc([1,2,3,4,5],[1,3,4,5,6],20,1.67)
where file.py is defined as follows:
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage
string = """
rdc <- function(x,y,k,s) {
x <- cbind(apply(as.matrix(x),2,function(u) ecdf(u)(u)),1)
y <- cbind(apply(as.matrix(y),2,function(u) ecdf(u)(u)),1)
wx <- matrix(rnorm(ncol(x)*k,0,s),ncol(x),k)
wy <- matrix(rnorm(ncol(y)*k,0,s),ncol(y),k)
cancor(cbind(cos(x%*%wx),sin(x%*%wx)), cbind(cos(y%*%wy),sin(y%*%wy)))$cor[1]
}
"""
R_func = SignatureTranslatedAnonymousPackage(string, "R_func")
How do I have to pass x and y to rdc()?
When doing
file.R_func.rdc([1,2,3,4,5],[1,3,4,5,6],20,1.67)
an implicit conversion of Python objects is performed before passing them as parameters to the underlying R function.
By default, [1,2,3,4,5] (a Python list) is converted to an R list, and R lists are "non-atomic vectors": each element of the list can be an arbitrary object, as opposed to an "atomic" type such as a boolean ("logical" in R lingo), an integer, a string, etc.
Try:
from rpy2.robjects.vectors import IntVector, FloatVector
# FloatVector is imported as an alternative if you need/prefer floats
file.R_func.rdc(IntVector([1,2,3,4,5]),
IntVector([1,3,4,5,6]),
20,
1.67)
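If the inputs arrive as plain Python lists, a small pre-check can decide which atomic vector class to wrap them in. The helper below is hypothetical (it is not part of rpy2) and only returns the class name as a string so that it runs without R installed; the names it returns correspond to classes in rpy2.robjects.vectors.

```python
def atomic_vector_kind(values):
    """Hypothetical helper: name the rpy2 vector class suited to `values`.

    Returns None for empty input or when the elements are not all of one
    atomic kind, in which case the data needs cleaning before going to R.
    """
    if not values:
        return None
    # bool must be tested first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return "BoolVector"
    if any(isinstance(v, bool) for v in values):
        return None
    if all(isinstance(v, int) for v in values):
        return "IntVector"
    if all(isinstance(v, (int, float)) for v in values):
        return "FloatVector"
    if all(isinstance(v, str) for v in values):
        return "StrVector"
    return None

print(atomic_vector_kind([1, 2, 3, 4, 5]))
print(atomic_vector_kind([1, 3.5, 4]))
print(atomic_vector_kind([1, "a"]))
```

Mixed integers and floats are reported as FloatVector, mirroring how R itself promotes integers to doubles inside a numeric vector.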

Call anova on lme4.lmer output via RPy

I am trying to analyze the deviance of a set of linear models generated with lme4.lmer() via RPy. This notebook here shows a full example with me importing my deps, loading my files, running my lme4.lmer() and failing to get anova to run on them.
For your convenience here is again a paste of the line that is failing and which I would like to see work.
compare = stats.anova(res[0], res[1], res[2])
Error in Ops.data.frame(data, data[[1]]) :
list of length 3 not meaningful
In addition: Warning message:
In anova.merMod(<S4 object of class "lmerMod">, <S4 object of class "lmerMod">, :
failed to find unique model names, assigning generic names
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-47-fe0ffa3b55de> in <module>()
----> 1 compare = stats.anova(res[0], res[1], res[2])
/usr/lib64/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/usr/lib64/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in Ops.data.frame(data, data[[1]]) :
list of length 3 not meaningful
This code runs perfectly in R as:
> mydata = read.csv("http://chymera.eu/data/test/r_data.csv")
> library(lme4)
Loading required package: lattice
Loading required package: Matrix
> lme1 = lme4.lmer(formula='RT~cat2 + (1|ID)', data=mydata, REML=FALSE)
Error: could not find function "lme4.lmer"
> lme1 = lmer(formula='RT~cat1 + (1|ID)', data=mydata, REML=FALSE)
> lme2 = lmer(formula='RT~cat2 + (1|ID)', data=mydata, REML=FALSE)
> anova(lme1,lme2)
> lme3 = lmer(formula='RT~cat2*cat1 + (1|ID)', data=mydata, REML=FALSE)
> stats::anova(lme1, lme2, lme3)
Data: mydata
Models:
lme1: RT ~ cat1 + (1 | ID)
lme2: RT ~ cat2 + (1 | ID)
lme3: RT ~ cat2 * cat1 + (1 | ID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
lme1 4 116.68 122.29 -54.342 108.68
lme2 4 149.59 155.19 -70.793 141.59 0.000 0 1
lme3 6 117.19 125.59 -52.594 105.19 36.398 2 1.248e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Can you help me make it run in RPy as well?
In R, stats::anova() is presumably inferring the model names from the unevaluated expressions in the function call. Here those are lme1, lme2, and lme3.
Now consider rewriting your R code without the use of variable names, as this would be closer to what is happening in your current implementation with rpy2 as the data DataFrame and the fitted models are not bound to a variable name. This would give what follows (note: "closer" not "equal" - details about this would just distract from the main point):
stats::anova(lmer(formula='RT~cat1 + (1|ID)',
data=read.csv("http://chymera.eu/data/test/r_data.csv"),
REML=FALSE),
lmer(formula='RT~cat2 + (1|ID)',
data=read.csv("http://chymera.eu/data/test/r_data.csv"),
REML=FALSE),
lmer(formula='RT~cat2*cat1 + (1|ID)',
data=read.csv("http://chymera.eu/data/test/r_data.csv"),
REML=FALSE))
The outcome is an error in R.
Error in names(mods) <- sub("#env$", "", mNms) :
'names' attribute [6] must be the same length as the vector [3]
In addition: Warning message:
In anova.merMod(lmer(formula = "RT~cat1 + (1|ID)", data = read.csv("http://chymera.eu/data/test/r_data.csv"), :
failed to find unique model names, assigning generic names
What this suggests is that the R function lme4:::anova.merMod is making assumptions that can easily be violated, and the authors of the package should be notified.
It is also showing that expressions will be used to identify the model in the resulting text output.
The following is probably lacking a bit of elegance, but should be both a workaround and a way to keep labels for the models short.
# bind the DataFrame to an R symbol
robjects.globalenv['dataf'] = dfr
# build models, letting R fetch the symbol `dataf` when it is evaluating
# the parameters in the function call
res = list()
for formula in formulae:
    # REML is passed as a Python bool so that R receives a logical FALSE
    lme_res = lme4.lmer(formula=formula, data=base.as_symbol("dataf"), REML=False)
    res.append(lme_res)
# This is enough to work around the problem
compare = stats.anova(res[0], res[1], res[2])
# if not happy with the model names displayed by `compare`,
# globalenv can be filled further
names = list()
for i, value in enumerate(res):
    names.append('lme%i' % i)
    robjects.globalenv[names[i]] = value
# call `anova`
compare = stats.anova(*[base.as_symbol(x) for x in names])
This is a bug in the anova method for merMod objects: it's essentially caused by the names of the objects being passed to R being too long, so that when deparse()d they end up being character vectors with (unexpectedly) more than one element. This is fixed by https://github.com/lme4/lme4/commit/075c78d128db9d8398f43474621e49f32fdb5bd1 ; there is also now an (undocumented) argument model.names that can be specified to override the deparsing of model names.
You can install the development version using devtools::install_github("lme4","lme4"), otherwise you may have to wait a while for a patched version to be released ... can't think of a workaround other than structuring your call so that language objects that get passed to R are shorter when deparsed ...

Generating LaTeX tables from R summary with RPy and xtable

I am running a few linear model fits in python (using R as a backend via RPy) and I would like to export some LaTeX tables with my R "summary" data.
This thread explains quite well how to do it in R (with the xtable function), but I cannot figure out how to implement this in RPy.
The only relevant thing searches such as "Chunk RPy" or "xtable RPy" returned was this, which seems to load the package in python but not to use it :-/
Here's an example of how I use RPy and what happens.
And this would be the error without bothering to load any data:
from rpy2.robjects.packages import importr
xtable = importr('xtable')
latex = xtable('')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-131-8b38f31b5bb9> in <module>()
----> 1 latex = xtable(res_sum)
2 print latex
TypeError: 'SignatureTranslatedPackage' object is not callable
I have tried using the stargazer package instead of xtable and I get the same error.
Ok, I solved it, and I'm a bit ashamed to say that it was a total no-brainer.
You just have to call the functions as xtable.xtable() or stargazer.stargazer().
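The same error shape shows up in plain Python whenever a module is called instead of a function inside it. As a rough analogy only (no rpy2 involved, and the standard-library math module merely plays the role of the object returned by importr), consider:

```python
import math

# A module object groups functions; it is not itself a function.
print(callable(math))        # the package-like object
print(callable(math.sqrt))   # a function inside it

try:
    math(4)                  # same kind of mistake as xtable('') on the package
except TypeError as exc:
    message = str(exc)

root = math.sqrt(4)          # counterpart of xtable.xtable(...)
print(message)
print(root)
```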
To easily generate TeX data from Python, I wrote the following function:
import re
def tformat(txt, v):
    """Replace the variables between [] in raw text with the contents
    of the named variables. Between the [] there should be a variable name,
    a colon and a formatting specification. E.g. [smin:.2f] would give the
    value of the smin variable printed as a float with two decimal digits.
    :txt: The text to search for replacements
    :v: Dictionary to use for variables.
    :returns: The txt string with variables substituted by their formatted
    values.
    """
    rx = re.compile(r'\[(\w+)(\[\d+\])?:([^\]]+)\]')
    matches = rx.finditer(txt)
    for m in matches:
        nm, idx, fmt = m.groups()
        try:
            if idx:
                idx = int(idx[1:-1])
                r = format(v[nm][idx], fmt)
            else:
                r = format(v[nm], fmt)
            txt = txt.replace(m.group(0), r)
        except KeyError:
            raise ValueError('Variable "{}" not found'.format(nm))
    return txt
You can use any variable name from the dictionary in the text that you pass to this function and have it replaced by the formatted value of that variable.
What I tend to do is to do my calculations in Python, and then pass the output of the globals() function as the second parameter of tformat:
smin = 235.0
smax = 580.0
lst = [0, 1, 2, 3, 4]
t = r'''The strength of the steel lies between \SI{[smin:.0f]}{MPa} and \SI{[smax:.0f]}{MPa}. lst[2] = [lst[2]:d].'''
print(tformat(t, globals()))
Feel free to use this. I put it in the public domain.
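For comparison, here is a sketch of the same substitution using the built-in str.format. Every literal LaTeX brace has to be doubled, which is presumably why the function above uses square brackets as delimiters instead (that motivation is my assumption; the answer does not state it).

```python
smin = 235.0
smax = 580.0
lst = [0, 1, 2, 3, 4]

# With str.format, literal braces must be doubled, so \SI{...}{MPa}
# turns into \SI{{...}}{{MPa}} around each replacement field.
template = (r"The strength of the steel lies between "
            r"\SI{{{smin:.0f}}}{{MPa}} and \SI{{{smax:.0f}}}{{MPa}}. "
            r"lst[2] = {lst[2]:d}.")
text = template.format(smin=smin, smax=smax, lst=lst)
print(text)
```

The result is the same, but the template is noticeably harder to read and to paste into a .tex file, which is the trade-off between the two approaches.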
Edit: I'm not sure what you mean by "linear model fits", but might numpy.polyfit do what you want in Python?
To resolve your problem, please update stargazer to version 4.5.3, now available on CRAN. Your example should then work perfectly.
