I am running a few linear model fits in python (using R as a backend via RPy) and I would like to export some LaTeX tables with my R "summary" data.
This thread explains quite well how to do it in R (with the xtable function), but I cannot figure out how to implement this in RPy.
The only relevant result from searches such as "Chunk RPy" or "xtable RPy" was this, which seems to load the package in Python but not to use it :-/
Here's an example of how I use RPy and what happens.
And this would be the error without bothering to load any data:
from rpy2.robjects.packages import importr
xtable = importr('xtable')
latex = xtable('')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-131-8b38f31b5bb9> in <module>()
----> 1 latex = xtable(res_sum)
2 print latex
TypeError: 'SignatureTranslatedPackage' object is not callable
I have tried using the stargazer package instead of xtable and I get the same error.
Ok, I solved it, and I'm a bit ashamed to say that it was a total no-brainer.
You just have to call the functions as xtable.xtable() or stargazer.stargazer().
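The error message follows from what importr returns: a package object that acts as a namespace for the R functions, not a function itself. As a rough plain-Python analogy (using the standard math module as a stand-in; this is not rpy2 code), calling a module fails in the same way, while calling a function inside it works:

```python
import math

# Calling the namespace itself fails, much like xtable('') above.
try:
    math(4.0)
except TypeError as exc:
    message = str(exc)

# Calling a function that lives inside the namespace works,
# which mirrors xtable.xtable(...) or stargazer.stargazer(...).
root = math.sqrt(4.0)

print(message)
print(root)
```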
To easily generate TeX data from Python, I wrote the following function;
import re

def tformat(txt, v):
    """Replace the variables between [] in raw text with the contents
    of the named variables. Between the [] there should be a variable name,
    a colon and a formatting specification. E.g. [smin:.2f] would give the
    value of the smin variable printed as a float with two decimal digits.

    :txt: The text to search for replacements
    :v: Dictionary to use for variables.
    :returns: The txt string with variables substituted by their formatted
    values.
    """
    rx = re.compile(r'\[(\w+)(\[\d+\])?:([^\]]+)\]')
    matches = rx.finditer(txt)
    for m in matches:
        nm, idx, fmt = m.groups()
        try:
            if idx:
                idx = int(idx[1:-1])
                r = format(v[nm][idx], fmt)
            else:
                r = format(v[nm], fmt)
            txt = txt.replace(m.group(0), r)
        except KeyError:
            raise ValueError('Variable "{}" not found'.format(nm))
    return txt
You can use any variable name from the dictionary in the text that you pass to this function and have it replaced by the formatted value of that variable.
What I tend to do is to do my calculations in Python, and then pass the output of the globals() function as the second parameter of tformat:
smin = 235.0
smax = 580.0
lst = [0, 1, 2, 3, 4]
t = r'''The strength of the steel lies between \SI{[smin:.0f]}{MPa} and \SI{[smax:.0f]}{MPa}. lst[2] = [lst[2]:d].'''
print(tformat(t, globals()))
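Passing globals() is convenient, but an explicit dictionary makes it clearer which values can end up in the text. A minimal sketch, repeating the function in condensed form so the example is self-contained (the dictionary name values and the shortened template are only for illustration):

```python
import re

def tformat(txt, v):
    """Replace [name:fmt] and [name[i]:fmt] markers with formatted values."""
    rx = re.compile(r'\[(\w+)(\[\d+\])?:([^\]]+)\]')
    for m in rx.finditer(txt):
        nm, idx, fmt = m.groups()
        try:
            if idx:
                r = format(v[nm][int(idx[1:-1])], fmt)
            else:
                r = format(v[nm], fmt)
        except KeyError:
            raise ValueError('Variable "{}" not found'.format(nm))
        txt = txt.replace(m.group(0), r)
    return txt

# Only these names are available to the template.
values = {'smin': 235.0, 'smax': 580.0, 'lst': [0, 1, 2, 3, 4]}
template = r'Between \SI{[smin:.0f]}{MPa} and \SI{[smax:.0f]}{MPa}; lst[2] = [lst[2]:d].'
result = tformat(template, values)
print(result)
```

A name that is missing from the dictionary raises ValueError, which is easier to track down than an accidental match against some unrelated global.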
Feel free to use this. I put it in the public domain.
Edit: I'm not sure what you mean by "linear model fits", but might numpy.polyfit do what you want in Python?
To resolve your problem, please update stargazer to version 4.5.3, now available on CRAN. Your example should then work perfectly.
I am trying to reproduce R code in Python, and I have to generate the same random numbers in both languages. I know that using the same seed is not enough to get the same random numbers. Reading one of the answers on this platform, I discovered the SyncRNG library, which generates the same random numbers in R and Python. Everything looked fine until I found that on Python 3.7.3 I can generate just one number via SyncRNG: as soon as I repeat the call, for instance in a for loop, I get this error:
OverflowError: Python int too large to convert to C long.
As I was mentioning:
>>> from SyncRNG import SyncRNG
>>> s = SyncRNG(seed=123)
>>> r = s.rand()
>>> r
0.016173338983207965
and as we can see it works. The method ".rand()" generates random numbers between zero and one.
But if I try to iterate:
>>> from SyncRNG import SyncRNG
>>> s = SyncRNG(seed=123)
>>> b = []
>>> for i in range(5):
...     temp = s.rand()
...     b.append(temp)
and I get this:
OverflowError: Python int too large to convert to C long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<pyshell#41>", line 2, in <module>
temp = s.rand()
File "C:\Users\Stefano\AppData\Local\Programs\Python\Python37\lib\site-packages\SyncRNG\__init__.py", line 27, in rand
return self.randi() * 2.3283064365387e-10
File "C:\Users\Stefano\AppData\Local\Programs\Python\Python37\lib\site-packages\SyncRNG\__init__.py", line 22, in randi
tmp = syncrng.rand(self.state)
SystemError: <built-in function rand> returned a result with an error set
So, I humbly ask if someone is able to solve this problem. If I missed earlier answers on this topic, I'm sorry; please link them in the answer section.
Thank you!
This is not a solution, but a workaround. The overflow bug is not reproducible on my system. The workaround:
from SyncRNG import SyncRNG
s = SyncRNG(seed=123)
ct = 0
b = []
while ct < 5:
    try:
        temp = s.rand()
        b.append(temp)
        ct += 1
    except OverflowError:
        pass
Performance will be compromised due to the try/except on each loop.
I would like Z3 to check whether there exists an integer t that satisfies my formula. I'm getting the following error:
Traceback (most recent call last):
File "D:/z3-4.6.0-x64-win/bin/python/Expl20180725.py", line 18, in <module>
g = ForAll(t, f1(t) == And(t>=0, t<10, user[t].rights == ["read"] ))
TypeError: list indices must be integers or slices, not ArithRef
Code:
from z3 import *
import random
from random import randrange
class Struct:
    def __init__(self, **entries): self.__dict__.update(entries)
user = [Struct() for i in range(10)]
for i in range(10):
    user[i].uid = i
    user[i].rights = random.choice(["create","execute","read"])
s=Solver()
f1 = Function('f1', IntSort(), BoolSort())
t = Int('t')
f2 = Exists(t, f1(t))
g = ForAll(t, f1(t) == And(t>=0, t<10, user[t].rights == ["read"] ))
s.add(g)
s.add(f2)
print(s.check())
print(s.model())
You are mixing and matching Python and Z3 expressions, and while that is the whole point of Z3py, it definitely does not mean that you can mix/match them arbitrarily. In general, you should keep all the "concrete" parts in Python, and relegate the symbolic parts to "z3"; carefully coordinating the interaction in between. In your particular case, you are accessing a Python list (your user) with a symbolic z3 integer (t), and that is certainly not something that is allowed. You have to use a Z3 symbolic Array to access with a symbolic index.
The other issue is the use of strings ("create"/"read" etc.) and expecting them to have meanings in the symbolic world. That is also not how z3py is intended to be used. If you want them to mean something in the symbolic world, you'll have to model them explicitly.
I'd strongly recommend reading through http://ericpony.github.io/z3py-tutorial/guide-examples.htm which is a great introduction to z3py including many of the advanced features.
Having said all that, I'd be inclined to code your example as follows:
from z3 import *
import random
Right, (create, execute, read) = EnumSort('Right', ('create', 'execute', 'read'))
users = Array('Users', IntSort(), Right)
for i in range(10):
    users = Store(users, i, random.choice([create, execute, read]))
s = Solver()
t = Int('t')
s.add(t >= 0)
s.add(t < 10)
s.add(users[t] == read)
r = s.check()
if r == sat:
    print(s.model()[t])
else:
    print(r)
Note how the enumerated type Right in the symbolic land is used to model your "permissions."
When I run this program multiple times, I get:
$ python a.py
5
$ python a.py
9
$ python a.py
unsat
$ python a.py
6
Note how unsat is produced, if it happens that the "random" initialization didn't put any users with a read permission.
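Because the user data in this example is concrete, the same question can also be answered without a solver, which can be handy for cross-checking the z3 result. A small sketch in plain Python, with a fixed, made-up list of rights in place of the random initialization:

```python
# Hypothetical, fixed rights so the result is reproducible.
rights = ["create", "read", "execute", "read", "create"]

# Every index t with 0 <= t < len(rights) whose right is "read".
witnesses = [t for t, right in enumerate(rights) if right == "read"]

# Mirror the solver's two outcomes: a witness index or "unsat".
answer = witnesses[0] if witnesses else "unsat"
print(answer)
```

The solver version becomes worthwhile once some of the data is itself symbolic, which a plain list comprehension cannot express.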
I am doing an ordinal logistic regression, and following the guide here for the analysis: R Data Analysis Examples: Ordinal Logistic Regression
My dataframe (consult) looks like:
           n  raingarden  es_score  consult_case
garden_id
27436      7           0         3             0
27437      1           0         0             1
27439      1           1         1             1
37253      1           0         3             0
37256      3           0         0             0
I am at the part where I need to create a graph to test the proportional odds assumption, with the command in R as follows:
(s <- with(dat, summary(es_score ~ n + raingarden + consult_case, fun=sf)))
(es_score is an ordinal ranked score with values between 0 - 4; n is an integer; raingarden and consult_case, binary values of 0 or 1)
I have the sf function:
sf <- function(y) {
  c('Y>=1' = qlogis(mean(y >= 1)),
    'Y>=2' = qlogis(mean(y >= 2)),
    'Y>=3' = qlogis(mean(y >= 3)))
}
in a utils.r file that I access as follows:
from rpy2.robjects.packages import STAP
with open('/file_path/utils.r', 'r') as f:
    string = f.read()
sf = STAP(string, "sf")
And want to do something along the lines of:
R = ro.r
R.with(work_case_control, R.summary(formula, fun=sf))
The major problem is that R's with function shares its name with a Python keyword, so even when I access it as ro.r.with, Python still parses it as the keyword. (As a side note: I tried using R's apply method instead, but got the error TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable ... I assume this is referring to my function sf?)
I also tried using the R assignment methods in rpy2 as follows:
R('sf = function(y) { c(\'Y>=1\' = qlogis(mean(y >= 1)), \'Y>=2\' = qlogis(mean(y >= 2)), \'Y>=3\' = qlogis(mean(y >= 3)))}')
R('s <- with({0}, summary(es_score~raingarden + consult_case, fun=sf)'.format(consult))
but ran into issues where the dataframe column names were somehow causing the error: RRuntimeError: Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), :
<text>:1:19: unexpected symbol
1: s <- with( n raingarden
I could of course do this all in R, but I have a very involved ETL script in Python, and would thus prefer to keep everything in Python using rpy2. (I did try using mord with scikit-learn to run my regression, but it is pretty primitive.)
Any suggestions would be most welcome right now.
EDIT
I tried various combinations of @Parfait's suggestions, and qualifying the fun argument is syntactically incorrect according to the PyCharm interpreter. It doesn't matter what the qualifier is, either; I always get the error SyntaxError: keyword can't be an expression.
On the other hand, with no qualifier there is no syntax error, but I do get the error TypeError: 'SignatureTranslatedAnonymousPackage' object is not callable when using the function sf as obtained:
from rpy2.robjects.packages import STAP
with open('/Users/gregsilverman/development/python/rest_api/rest_api/scripts/utils.r', 'r') as f:
    string = f.read()
sf = STAP(string, "sf")
With that in mind, I created a package in R with the function sf, imported it, and tried various combos with the only one producing no error, being: print(base._with(consult_case_control, R.summary(formula, fun=gms.sf))) (gms is a reference to the package in R I made).
The output though makes no sense:
Length Class Mode
3 formula call
I am expecting a table à la the one on the UCLA site. Interesting. I am going to try recreating my analysis in R, just for the heck of it. I still would like to complete it in Python though.
Consider bracketing the with call and passing fun as a plain keyword argument (a dotted name such as ro.r.summary.fun=sf is not valid Python and raises SyntaxError: keyword can't be an expression):
ro.r['with'](work_case_control, ro.r.summary(formula, fun=sf))
Alternatively, import R's base package. To avoid the conflict with Python's with keyword, translate the R name:
from rpy2.robjects.packages import importr
base = importr('base', robject_translations={'with': '_with'})
base._with(work_case_control, ro.r.summary(formula, fun=sf))
And be sure to properly create your formula. Consider using as.formula from R's stats package to build it from a string. Notice that another translation is needed there because of a naming conflict:
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
formula = stats.as_formula('es_score ~ n + raingarden + consult_case')
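The reason R.with(...) cannot work is that Python rejects the keyword at parse time, before rpy2 is involved; looking the name up as a string, as ro.r['with'] does, avoids that. A sketch with a hypothetical stand-in class (FakeR is not part of rpy2):

```python
import keyword

class FakeR:
    """Hypothetical stand-in for an object that exposes R functions by name."""
    def __getattr__(self, name):
        # Return a dummy callable that just reports how it was called.
        return lambda *args: (name, args)

R = FakeR()

# 'with' is reserved, so R.with(...) is rejected before any code runs.
is_reserved = keyword.iskeyword("with")

# Looking the name up as a string sidesteps the parser.
with_fn = getattr(R, "with")
result = with_fn("data", "expr")
print(is_reserved, result)
```

The TypeError about SignatureTranslatedAnonymousPackage in the question looks like a separate issue: STAP returns a package-like object, so the function would presumably be reached as sf.sf rather than sf.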
I am trying to extract some dividend data and am having some success. What I want to do now is have the required string generated automatically from a list of ticker symbols (IBM, MSFT, GE, etc.).
The below python code works:
import Quandl
divdf=Quandl.get("SEC/DIV_IBM", authtoken="W3P77LRwVFzvFfL9siB4")
divdf.head()
Out[1]:
Date Dividend
1962-02-06 0.00100
1962-05-08 0.00100
1962-08-07 0.00100
1962-11-05 0.00100
1963-02-05 0.00133
But when I try to create the string inside get() it does not. Here is the code I have tried:
partA=('"SEC/DIV_')
symbol=('IBM"')
authtoken='"W3P77LRwVFzvFfL9siB4"'
totalGrab = partA + symbol + ", authtoken=" + authtoken
When I print totalGrab, it looks as if it should work, because it appears identical to the original string, but unfortunately it does not.
In [19]:
print(totalGrab)
"SEC/DIV_IBM", authtoken="W3P77LRwVFzvFfL9siB4"
try1=Quandl.get(totalGrab)
gives me this error:
ErrorDownloading: Error Downloading! HTTP Error 400: BAD_REQUEST
I tried this as well, with no luck:
divdf=Quandl.get(partA + symbol, authtoken="W3P77LRwVFzvFfL9siB4")
any thoughts on a fix?
thanks much for any attention to this.
John
In the code that works:
divdf=Quandl.get("SEC/DIV_IBM", authtoken="W3P77LRwVFzvFfL9siB4")
the get() method takes two arguments: the dataset code "SEC/DIV_IBM" and the authtoken keyword.
In the example that does not work, you are passing a single argument: one string that merely contains the text of both.
Here's an example to demonstrate what you are doing:
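def add(x, y):
    return x + y

a = 2
b = 3
add(a, b)  ## this works: two separate arguments
c = "{0}, {1}".format(a, b)  ## generates the single string "2, 3"
add(c)  ## this won't work: one string argument instead of two values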
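Applied to the ticker list from the question, this means building only the dataset code as a string and passing the token as a separate keyword argument. A sketch, with the Quandl call left as a comment because it needs network access and a valid token (the variable names are illustrative):

```python
tickers = ["IBM", "MSFT", "GE"]

# Build the dataset codes; note there are no quote characters inside the strings.
codes = ["SEC/DIV_{0}".format(symbol) for symbol in tickers]
print(codes)

# The concatenation from the question embeds literal double quotes,
# so the resulting code differs from the one that works.
bad_code = '"SEC/DIV_' + 'IBM"'
print(bad_code == "SEC/DIV_IBM")

# Each code would then be passed as its own argument, for example:
# divdf = Quandl.get(code, authtoken=my_token)  # my_token: your own token
```

The embedded quotes are probably also why the second attempt in the question, Quandl.get(partA + symbol, authtoken=...), returned a bad request.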
I'm unable to make the following code work, though I don't see this error when working strictly in R.
from rpy2.robjects.packages import importr
from rpy2 import robjects
import numpy as np
forecast = importr('forecast')
ts = robjects.r['ts']
y = np.random.randn(50)
X = np.random.randn(50)
y = ts(robjects.FloatVector(y), start=robjects.IntVector((2004, 1)), frequency=12)
X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
It's especially confusing considering the following code works fine
forecast.auto_arima(y, xreg=X)
I see the following traceback no matter what I give for X, using numpy interface or not. Any ideas?
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-20-b781220efb93> in <module>()
13 X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
14
---> 15 forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in `colnames<-`(`*tmp*`, value = if (ncol(xreg) == 1) nmxreg else paste(nmxreg, :
length of 'dimnames' [2] not equal to array extent
Edit:
The problem is that the following lines of code do not evaluate to a column name, which seems to be the expectation on the R side.
sub = robjects.r['substitute']
deparse = robjects.r['deparse']
deparse(sub(X))
I don't know well enough what the expectations of this code should be in R, but I can't find an RPy2 object that passes this check by returning something of length == 1. This really looks like a bug to me.
R> length(deparse(substitute((rep(.2, 1000)))))
[1] 1
But in Rpy2
[~/]
[94]: robjects.r.length(robjects.r.deparse(robjects.r.substitute(robjects.r('rep(.2, 1000)'))))
[94]:
<IntVector - Python:0x7ce1560 / R:0x80adc28>
[ 78]
This is one manifestation (see this other related issue for example) of the same underlying issue: R expressions are evaluated lazily and can be manipulated within R, and this leads to idioms that do not translate well (in Python, expressions are evaluated immediately, and one has to move to the AST to manipulate code).
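To make the contrast concrete on the Python side: a function only ever sees the value of its argument, and the code itself is reachable only by parsing source text into an AST. A small illustration using the standard ast module (ast.unparse needs Python 3.9 or later; describe is a made-up helper):

```python
import ast

def describe(arg):
    # The argument has already been evaluated; the expression is gone.
    return repr(arg)

seen = describe(1 + 2)

# Parsing source text keeps the expression unevaluated, loosely
# comparable to what substitute() gives access to in R.
tree = ast.parse("rep(.2, 1000)", mode="eval")
call = tree.body
func_name = call.func.id
arg_values = [a.value for a in call.args]

# Turn the tree back into source, loosely comparable to deparse().
source = ast.unparse(tree)
print(seen, func_name, arg_values, source)
```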
An answer to the second part of your question: in R, substitute(rep(.2, 1000)) passes the unevaluated expression rep(.2, 1000) to substitute(). In rpy2, calling
substitute('rep(.2, 1000)')
passes a string; the R equivalent would be
substitute("rep(.2, 1000)")
The following gets you close to R's deparse(substitute()):
from rpy2.robjects.packages import importr
base = importr('base')
from rpy2 import rinterface
# expression
e = rinterface.parse('rep(.2, 1000)')
dse = base.deparse(base.substitute(e))
>>> len(dse)
1
>>> print(dse) # not identical to R
"expression(rep(0.2, 1000))"
Currently, one way to work around this is to bind R objects to R symbols
(preferably in a dedicated environment rather than in GlobalEnv), and use
the symbols in an R call written as a string:
from rpy2.robjects import Environment, reval
env = Environment()
for k,v in (('y', y), ('xreg', X), ('order', robjects.IntVector((1, 0, 0)))):
    env[k] = v
# make an expression that refers to the symbols bound in env
expr = rinterface.parse("forecast::Arima(y, xreg=xreg, order=order)")
# evaluate in the environment
res = reval(expr, envir=env)
This is not something I am happy about as a solution, but I have never found the time to work on a better solution.
edit: With rpy2-2.4.0 it becomes possible to use R symbols and do the following:
RSymbol = robjects.rinterface.SexpSymbol
pairlist = (('x', RSymbol('y')),
('xreg', RSymbol('xreg')),
('order', RSymbol('order')))
res = forecast.Arima.rcall(pairlist,
env)
This is not yet the most intuitive interface. Maybe something using a context manager would be better.
There is a way to simply pass your variables to R without substitutions and return the results back to Python. You can find a simple example here: https://stackoverflow.com/a/55900840/5350311. I think it makes clearer what you are passing to R and what you get back in return, especially if you are working with for loops and a large number of variables.