How to use the R 'with' operator in rpy2 - python

I am doing an ordinal logistic regression, and following the guide here for the analysis: R Data Analysis Examples: Ordinal Logistic Regression
My dataframe (consult) looks like:
n raingarden es_score consult_case
garden_id
27436 7 0 3 0
27437 1 0 0 1
27439 1 1 1 1
37253 1 0 3 0
37256 3 0 0 0
I am at the part where I need to to create graph to test the proportional odds assumption, with the command in R as follows:
(s <- with(dat, summary(es_score ~ n + raingarden + consult_case, fun=sf)))
(es_score is an ordinal ranked score with values between 0 - 4; n is an integer; raingarden and consult_case, binary values of 0 or 1)
I have the sf function:
sf <- function(y) {
c('Y>=1' = qlogis(mean(y >= 1)),
'Y>=2' = qlogis(mean(y >= 2)),
'Y>=3' = qlogis(mean(y >= 3)))
}
in a utils.r file that I access as follows:
from rpy2.robjects.packages import STAP
with open('/file_path/utils.r', 'r') as f:
string = f.read()
sf = STAP(string, "sf")
And want to do something along the lines of:
R = ro.r
R.with(work_case_control, R.summary(formula, fun=sf))
The major problem is that the R withoperator is seen as a python keyword, so that even if I access it with ro.r.with it is still recognized as a python keyword. (As a side note: I tried using R's apply method instead, but got an error that TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable ... I assume this is referring to my function sf?)
I also tried using the R assignment methods in rpy2 as follows:
R('sf = function(y) { c(\'Y>=1\' = qlogis(mean(y >= 1)), \'Y>=2\' = qlogis(mean(y >= 2)), \'Y>=3\' = qlogis(mean(y >= 3)))}')
R('s <- with({0}, summary(es_score~raingarden + consult_case, fun=sf)'.format(consult))
but ran into issues where the dataframe column names were somehow causing the error: RRuntimeError: Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), :
<text>:1:19: unexpected symbol
1: s <- with( n raingarden
I could of course do this all in R, but I have a very involved ETL script in python, and would thus prefer to keep everything in python using rpy2 (I did try this using mord for scipy-learn to run my regreession, but it is pretty primitive).
Any suggestions would be most welcome right now.
EDIT
I tried various combinations #Parfait's suggestions, and qualifying the fun argument is syntactically incorrect, as per PyCharm interpreter (see image with red highlighting at end): ... it doesn't matter what the qualifier is, either, I always get an error
that SyntaxError: keyword can't be an expression.
On the other hand, with no qualifier, there is no syntax error: , but I do get the error TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable when using the function sf as obtained:
from rpy2.robjects.packages import STAP
with open('/Users/gregsilverman/development/python/rest_api/rest_api/scripts/utils.r', 'r') as f:
string = f.read()
sf = STAP(string, "sf")
With that in mind, I created a package in R with the function sf, imported it, and tried various combos with the only one producing no error, being: print(base._with(consult_case_control, R.summary(formula, fun=gms.sf))) (gms is a reference to the package in R I made).
The output though makes no sense:
Length Class Mode
3 formula call
I am expecting a table ala the one on the UCLA site. Interesting. I am going to try recreating my analysis in R, just for the heck of it. I still would like to complete it in python though.

Consider bracketing the with call and be sure to qualify all arguments including fun:
ro.r['with'](work_case_control, ro.r.summary(formula, ro.r.summary.fun=sf))
Alternatively, import R's base package. And to avoid conflict with Python's named method with() translate the R name:
from rpy2.robjects.packages import importr
base = importr('base', robject_translations={'with': '_with'})
base._with(work_case_control, ro.r.summary(formula, ro.r.summary.fun=sf))
And be sure to properly create your formula. Consider using R's stats packages' as.formula to build from string. Notice too another translation is made due to naming conflict:
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
formula = stats.as_formula('es_score ~ n + raingarden + consult_case')

Related

Python (for Revit Dynamo): unexpected token in while loop iteration

I'm a Python noobie trying to find my way around using it for Dynamo. I've had quite a bit of success using simple while loops/nested ifs to neaten my Dynamo scripts; however, I've been stumped by this recent error.
I'm attempting to pull in lists of data (flow rates from pipe fittings) and then output the maximum flow rate of each fitting by comparing the indices of each list (a cross fitting would have 4 flow rates in Revit, I'm comparing each pipe inlet/outlet flow rate and calculating the maximum for sizing purposes). For some reason, adding lists in the while loop and iterating the indices gives me the "unexpected token" error which I presume is related to "i += 1" according to online debuggers.
I've been using this while loop code format for a bit now and it has always worked for non-listed related iterations. Can anyone give me some guidance here?
Thank you in advance!
Error in Dynamo:
Warning: IronPythonEvaluator.EvaluateIronPythonScript
operation failed.
unexpected token 'i'
Code used:
import sys
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
dataEnteringNode = IN
a = IN[0]
b = IN[1]
c = IN[2]
d = IN[3]
start = 0
end = 3
i = start
y=[]
while i < end:
y.append(max( (a[i], b[i], c[i] ))
i += 1
OUT = y

Error with quantifier in Z3Py

I would like Z3 to check whether it exists an integer t that satisfies my formula. I'm getting the following error:
Traceback (most recent call last):
File "D:/z3-4.6.0-x64-win/bin/python/Expl20180725.py", line 18, in <module>
g = ForAll(t, f1(t) == And(t>=0, t<10, user[t].rights == ["read"] ))
TypeError: list indices must be integers or slices, not ArithRef
Code:
from z3 import *
import random
from random import randrange
class Struct:
def __init__(self, **entries): self.__dict__.update(entries)
user = [Struct() for i in range(10)]
for i in range(10):
user[i].uid = i
user[i].rights = random.choice(["create","execute","read"])
s=Solver()
f1 = Function('f1', IntSort(), BoolSort())
t = Int('t')
f2 = Exists(t, f1(t))
g = ForAll(t, f1(t) == And(t>=0, t<10, user[t].rights == ["read"] ))
s.add(g)
s.add(f2)
print(s.check())
print(s.model())
You are mixing and matching Python and Z3 expressions, and while that is the whole point of Z3py, it definitely does not mean that you can mix/match them arbitrarily. In general, you should keep all the "concrete" parts in Python, and relegate the symbolic parts to "z3"; carefully coordinating the interaction in between. In your particular case, you are accessing a Python list (your user) with a symbolic z3 integer (t), and that is certainly not something that is allowed. You have to use a Z3 symbolic Array to access with a symbolic index.
The other issue is the use of strings ("create"/"read" etc.) and expecting them to have meanings in the symbolic world. That is also not how z3py is intended to be used. If you want them to mean something in the symbolic world, you'll have to model them explicitly.
I'd strongly recommend reading through http://ericpony.github.io/z3py-tutorial/guide-examples.htm which is a great introduction to z3py including many of the advanced features.
Having said all that, I'd be inclined to code your example as follows:
from z3 import *
import random
Right, (create, execute, read) = EnumSort('Right', ('create', 'execute', 'read'))
users = Array('Users', IntSort(), Right)
for i in range(10):
users = Store(users, i, random.choice([create, execute, read]))
s = Solver()
t = Int('t')
s.add(t >= 0)
s.add(t < 10)
s.add(users[t] == read)
r = s.check()
if r == sat:
print s.model()[t]
else:
print r
Note how the enumerated type Right in the symbolic land is used to model your "permissions."
When I run this program multiple times, I get:
$ python a.py
5
$ python a.py
9
$ python a.py
unsat
$ python a.py
6
Note how unsat is produced, if it happens that the "random" initialization didn't put any users with a read permission.

Generating LaTeX tables from R summary with RPy and xtable

I am running a few linear model fits in python (using R as a backend via RPy) and I would like to export some LaTeX tables with my R "summary" data.
This thread explains quite well how to do it in R (with the xtable function), but I cannot figure out how to implement this in RPy.
The only relevant thing searches such as "Chunk RPy" or "xtable RPy" returned was this, which seems to load the package in python but not to use it :-/
Here's an example of how I use RPy and what happens.
And this would be the error without bothering to load any data:
from rpy2.robjects.packages import importr
xtable = importr('xtable')
latex = xtable('')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-131-8b38f31b5bb9> in <module>()
----> 1 latex = xtable(res_sum)
2 print latex
TypeError: 'SignatureTranslatedPackage' object is not callable
I have tried using the stargazer package instead of xtable and I get the same error.
Ok, I solved it, and I'm a bit ashamed to say that it was a total no-brainer.
You just have to call the functions as xtable.xtable() or stargazer.stargazer().
To easily generate TeX data from Python, I wrote the following function;
import re
def tformat(txt, v):
"""Replace the variables between [] in raw text with the contents
of the named variables. Between the [] there should be a variable name,
a colon and a formatting specification. E.g. [smin:.2f] would give the
value of the smin variable printed as a float with two decimal digits.
:txt: The text to search for replacements
:v: Dictionary to use for variables.
:returns: The txt string with variables substituted by their formatted
values.
"""
rx = re.compile(r'\[(\w+)(\[\d+\])?:([^\]]+)\]')
matches = rx.finditer(txt)
for m in matches:
nm, idx, fmt = m.groups()
try:
if idx:
idx = int(idx[1:-1])
r = format(v[nm][idx], fmt)
else:
r = format(v[nm], fmt)
txt = txt.replace(m.group(0), r)
except KeyError:
raise ValueError('Variable "{}" not found'.format(nm))
return txt
You can use any variable name from the dictionary in the text that you pass to this function and have it replaced by the formatted value of that variable.
What I tend to do is to do my calculations in Python, and then pass the output of the globals() function as the second parameter of tformat:
smin = 235.0
smax = 580.0
lst = [0, 1, 2, 3, 4]
t = r'''The strength of the steel lies between SI{[smin:.0f]}{MPa} and \SI{[smax:.0f]}{MPa}. lst[2] = [lst[2]:d].'''
print tformat(t, globals())
Feel free to use this. I put it in the public domain.
Edit: I'm not sure what you mean by "linear model fits", but might numpy.polyfit do what you want in Python?
To resolve your problem, please update stargazer to version 4.5.3, now available on CRAN. Your example should then work perfectly.

Python-RPy - "Error in x$terms : $ operator is invalid for atomic vectors"

I am new in RPy, so excuse me, if my question is trivial. I'm trying to write the top solution from this topic: Screening (multi)collinearity in a regression model in Python, but I get following error:
rpy.RPy_RException: Error in x$terms : $ operator is invalid for atomic vectors
Code I wrote:
from rpy import *
r.set_seed(42)
a=r.rnorm(100)
b=r.rnorm(100)
m=r.model_matrix('~a+b')
What am I doing wrong?
EDIT:
using reply written by agstudy (thank You for help!) I prepared solution working for rpy2
from rpy2 import robjects
rset_seed = robjects.r('set.seed')
fmla = robjects.Formula('~a+b')
model_matrix = robjects.r('model.matrix')
rnorm = robjects.r('rnorm')
rset_seed(42)
env = fmla.environment
env['a']=rnorm(100)
env['b']=rnorm(100)
m=model_matrix(fmla)
This should work
fmla = r.Formula('~a+b')
env = fmla.environment
env['a'] = a
env['b'] = b
r.model_matrix(fmla)
In R, you can reproduce the error
set_seed(42)
a=rnorm(100)
b=rnorm(100)
m=model.matrix('~a+b')
Error: $ operator is invalid for atomic vectors
m=model.matrix(formula('~a+b')) ## this works
(Intercept) a b
1 1 -0.1011361 0.4354445
2 1 0.3782215 -1.5322641
3 1 1.4772023 0.3280948
4 1 0.2892421 1.9012016
5 1 -0.2596562 0.2036678
6 1 -0.5585396 -0.1536021

ggplot2 hell with rpy2-2.0.7 + python 2.6 + r 2.11 (windows 7)

i am using rpy2-2.0.7 (i need this to work with windows 7, and compiling the binaries for the newer rpy2 versions is a mess) to push a two-column dataframe into r, create a few layers in ggplot2, and output the image into a <.png>.
i have wasted countless hours fidgeting around with the syntax; i did manage to output the files i needed at one point, but (stupidly) did not notice and continued fidgeting around with my code ...
i would sincerely appreciate any help; below is a (trivial) example for demonstration. Thank you very much for your help!!! ~ Eric Butter
import rpy2.robjects as rob
from rpy2.robjects import r
import rpy2.rlike.container as rlc
from array import array
r.library("grDevices") # import r graphics package with rpy2
r.library("lattice")
r.library("ggplot2")
r.library("reshape")
picpath = 'foo.png'
d1 = ["cat","dog","mouse"]
d2 = array('f',[1.0,2.0,3.0])
nums = rob.RVector(d2)
name = rob.StrVector(d1)
tl = rlc.TaggedList([nums, name], tags = ('nums', 'name'))
dataf = rob.RDataFrame(tl)
## r['png'](file=picpath, width=300, height=300)
## r['ggplot'](data=dataf)+r['aes_string'](x='nums')+r['geom_bar'](fill='name')+r['stat_bin'](binwidth=0.1)
r['ggplot'](data=dataf)
r['aes_string'](x='nums')
r['geom_bar'](fill='name')
r['stat_bin'](binwidth=0.1)
r['ggsave']()
## r['dev.off']()
*The output is just a blank image (181 b).
here are a couple common errors R itself throws as I fiddle around in ggplot2:
r['png'](file=picpath, width=300, height=300)
r['ggplot']()
r['layer'](dataf, x=nums, fill=name, geom="bar")
r['geom_histogram']()
r['stat_bin'](binwidth=0.1)
r['ggsave'](file=picpath)
r['dev.off']()
*RRuntimeError: Error: No layers in plot
r['png'](file=picpath, width=300, height=300)
r['ggplot'](data=dataf)
r['aes'](geom="bar")
r['geom_bar'](x=nums, fill=name)
r['stat_bin'](binwidth=0.1)
r['ggsave'](file=picpath)
r['dev.off']()
*RRuntimeError: Error: When setting aesthetics, they may only take one value. Problems: fill,x
I use rpy2 solely through Nathaniel Smith's brilliant little module called rnumpy (see the "API" link at the rnumpy home page). With this you can do:
from rnumpy import *
r.library("ggplot2")
picpath = 'foo.png'
name = ["cat","dog","mouse"]
nums = [1.0,2.0,3.0]
r["dataf"] = r.data_frame(name=name, nums=nums)
r("p <- ggplot(dataf, aes(name, nums, fill=name)) + geom_bar(stat='identity')")
r.ggsave(picpath)
(I'm guessing a little about how you want the plot to look, but you get the idea.)
Another great convenience is entering "R mode" from Python with the ipy_rnumpy module. (See the "IPython integration" link at the rnumpy home page).
For complicated stuff, I usually prototype in R until I have the plotting commands worked out. Error reporting in rpy2 or rnumpy can get quite messy.
For instance, the result of an assignment (or other computation) is sometimes printed even when it should be invisible. This is annoying e.g. when assigning to large data frames. A quick workaround is to end the offending line with a trailing statement that evaluates to something short. For instance:
In [59] R> long <- 1:20
Out[59] R>
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20
In [60] R> long <- 1:100; 0
Out[60] R> [1] 0
(To silence some recurrent warnings in rnumpy, I've edited rnumpy.py to add 'from warnings import warn' and replace 'print "error in process_revents: ignored"' with 'warn("error in process_revents: ignored")'. That way, I only see the warning once per session.)
You have to engage the dev() before you shut it off, which means that you have to print() (like JD guesses above) prior to throwing dev.off().
from rpy2 import robjects
r = robjects.r
r.library("ggplot2")
robjects.r('p = ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()')
r.ggsave('/stackBar.jpeg')
robjects.r('print(p)')
r['dev.off']()
To make it slightly more easy when you have to draw more complex plots:
from rpy2 import robjects
from rpy2.robjects.packages import importr
import rpy2.robjects.lib.ggplot2 as ggplot2
r = robjects.r
grdevices = importr('grDevices')
p = r('''
library(ggplot2)
p <- ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
p <- p + opts(title = "{0}")
# add more R code if necessary e.g. p <- p + layer(..)
p'''.format("stackbar"))
# you can use format to transfer variables into R
# use var.r_repr() in case it involves a robject like a vector or data.frame
p.plot()
# grdevices.dev_off()

Categories