I am attempting to use the R "NADA" package using the rpy2 interface in Python. The end goal is to perform survival analysis on left-censored environmental data. Things seem to be interacting correctly between Python and R for other functions, and I am able to perform a test function in R, but I get an error when attempting the same through rpy2.
This is my code in Python. It is entirely fictitious data.
from rpy2.robjects import FloatVector, BoolVector, FactorVector
from rpy2.robjects.packages import importr
nada = importr('NADA')
obs = FloatVector([1.0,2.0,3.0,5.0,56.0,1.0,4.0])
nds = BoolVector([False, True, True, True, True, False, True])
groups = FactorVector([1,0,1,0,1,1,0])
nada.cendiff(obs, nds, groups)
This is the error message I receive:
Traceback (most recent call last):
File "C:/Users/XXXXXXX/rpy2_test.py", line 9, in <module>
nada.cendiff(obs, nds, groups)
File "C:\Program Files\Python35\lib\site-packages\rpy2\robjects\functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "C:\Program Files\Python35\lib\site-packages\rpy2\robjects\functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in terms.formula(tmp, simplify = TRUE) :
invalid model formula in ExtractVars
This code works fine in the R terminal:
library("NADA")
cendiff(c(1.0,2.0,3.0,5.0,56.0,1.0,4.0), c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE), factor(c(1,0,1,0,1,1,0)))
I tried adding some print lines at the rpy2 error lines listed, and suspect there may be an issue with rpy2 removing the levels from the factor vector when sending them to the function. However, I'm in new territory and that may just be a red herring.
If anyone can lend some insight or offer an alternative, I would appreciate it. I have a lot of data processing coded in Python and going all R isn't a good option, but R has more analysis options so I was hoping rpy2 would do the trick.
When in doubt about whether rpy2 and/or one of its conversion rules
are doing something unexpected, it is relatively easy to check it.
For example here:
from rpy2.robjects.vectors import FactorVector
from rpy2.robjects import r, globalenv
# factor with rpy2
groups = FactorVector([1,0,1,0,1,1,0])
# bind it to symbol in R's GlobalEnv
globalenv['groups_rpy2'] = groups
# is it the same as building the factor in R?
r("""
groups <- factor(c(1,0,1,0,1,1,0))
print(identical(groups, groups_rpy2))
""")
[1] TRUE
# apparently so
I suspect this happens because the R library you are using relies on (unevaluated) expression statements, while rpy2 passes anonymous R objects. A quick glance at that code shows:
setMethod("cendiff",
signature(obs="numeric", censored="logical", groups="factor"),
cencen.vectors.groups)
and
cencen.vectors.groups =
function(obs, censored, groups, ...)
{
cl = match.call()
f = substitute(Cen(a, b)~g, list(a=cl[[2]], b=cl[[3]], g=cl[[4]]))
f = as.formula(f)
environment(f) = parent.frame()
callGeneric(f, ...)
}
One way to work around that is to bind your objects to symbols in an R namespace/environment and evaluate the call in that namespace. This can be done with any R environment; if you use "GlobalEnv", remember that its content persists until the embedded R is closed:
from rpy2.robjects import globalenv
from rpy2.robjects.packages import importr
base = importr('base')
# bind to R symbols
globalenv["obs"] = obs
globalenv["nds"] = nds
globalenv["groups"] = groups
# make the call
nada.cendiff(base.as_symbol('obs'),
             base.as_symbol('nds'),
             base.as_symbol('groups'))
(See another use of as_symbol in "Minimal example of rpy2 regression using pandas data frame".)
I simply tried the following:
import rpy2
import rpy2.robjects as RObjects
from rpy2.robjects.packages import importr
princurve = importr('princurve', robject_translations = {"plot_principal_curve": "plot.principal.curve"})
princurve = importr('princurve', robject_translations = {"points_principal_curve": "points.principal.curve"})
and got this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 498, in importr
symbol_resolve=symbol_resolve)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 202, in __init__
self.__fill_rpy2r__(on_conflict=on_conflict)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 328, in __fill_rpy2r__
.__fill_rpy2r__(on_conflict=on_conflict))
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 238, in __fill_rpy2r__
exception)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages_utils.py", line 120, in _fix_map_symbols
raise exception(msg)
rpy2.robjects.packages.LibraryError: Conflict when converting R symbols in the package "princurve" to Python symbols:
-lines_principal_curve -> lines.principal.curve, lines.principal_curve
- plot_principal_curve -> plot.principal.curve, plot.principal_curve
- points_principal_curve -> points.principal.curve, points.principal_curve
To turn this exception into a simple warning use the parameter `on_conflict="warn"`
Can anyone help?
You were almost there! In robject_translations you need to provide R name -> Python name mapping, but your dictionary seemed to be the other way around. You also need to have all the mappings in a single dictionary. To make it super clear, you can resolve the conflicts like this:
princurve_example_1 = importr(
"princurve",
robject_translations={
"plot.principal.curve": "plot_dot_principal_dot_curve",
"lines.principal.curve": "lines_dot_principal_dot_curve",
"points.principal.curve": "points_dot_principal_dot_curve",
# optional (if omitted, you will get them under plot_principal_curve, etc.):
"plot.principal_curve": "plot_dot_principal_curve",
"lines.principal_curve": "lines_dot_principal_curve",
"points.principal_curve": "points_dot_principal_curve"
}
)
# then, after creating the curve and storing it in curve variable:
princurve_example_1.plot_dot_principal_dot_curve(curve)
# or
princurve_example_1.plot_dot_principal_curve(curve)
However, after consulting the princurve documentation I see that principal.curve is deprecated and you should use principal_curve instead (good to see more R packages finally moving to the convention of using underscores in function and variable names when possible!); therefore you can just do:
princurve = importr(
"princurve",
robject_translations={
"plot.principal.curve": "plot_principal_curve_deprecated",
"lines.principal.curve": "lines_principal_curve_deprecated",
"points.principal.curve": "points_principal_curve_deprecated",
}
)
# auto-generated from "plot.principal_curve"
princurve.plot_principal_curve(curve)
# manually mapped from "plot.principal.curve"
princurve.plot_principal_curve_deprecated(curve)
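To see why the original import reported a conflict, here is a minimal pure-Python sketch of the name mapping. It assumes a simplified rule (an explicit translation wins, otherwise every "." in the R name becomes "_"); find_conflicts is a hypothetical helper written for illustration, not rpy2's actual implementation.

```python
from collections import defaultdict


def find_conflicts(r_names, translations=None):
    """Return {python_name: [r_names]} for Python names claimed by several R names.

    Simplified model (an assumption, not rpy2's real code): an explicit
    translation wins, otherwise every '.' in the R name is replaced by '_'.
    """
    translations = translations or {}
    by_python_name = defaultdict(list)
    for r_name in r_names:
        py_name = translations.get(r_name, r_name.replace(".", "_"))
        by_python_name[py_name].append(r_name)
    return {py: rs for py, rs in by_python_name.items() if len(rs) > 1}


# The six R symbols named in the error message
r_names = [
    "plot.principal.curve", "plot.principal_curve",
    "lines.principal.curve", "lines.principal_curve",
    "points.principal.curve", "points.principal_curve",
]

# Without translations, each pair collapses onto one Python name
conflicts = find_conflicts(r_names)

# Mapping R name -> Python name for the deprecated variants removes the clash
fixed = find_conflicts(r_names, {
    "plot.principal.curve": "plot_principal_curve_deprecated",
    "lines.principal.curve": "lines_principal_curve_deprecated",
    "points.principal.curve": "points_principal_curve_deprecated",
})

print(sorted(conflicts))
print(fixed)
```

The keys of the translation dictionary are the R names, which is the direction robject_translations expects; with the keys and values swapped, as in the question, none of the keys match an R symbol and the conflicts remain.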
I recently installed the python-WikEdDiff package on my system. I understand it is a Python extension of the original JavaScript WikEdDiff tool. I tried to use it but couldn't find any documentation for it. I am stuck at using WikEdDiff.diff(). I want to use the other functions of this class, such as getFragments() and others, but when I call them I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/WikEdDiff/diff.py", line 1123, in detectBlocks
self.getSameBlocks()
File "/usr/local/lib/python3.4/dist-packages/WikEdDiff/diff.py", line 1211, in getSameBlocks
while j is not None and self.oldText.tokens[j].link is None:
IndexError: list index out of range
On checking, I found out that the tokens[] structure in the object remains empty whereas it should have been initialized.
Is there an initialize function that I need to call apart from the default constructor? Or is it something to do with the `WikEdDiffConfig` config structure I passed to the constructor?
You get this error because the WikEdDiff object was cleared internally inside diff(), as shown in this section of the code:
def diff( self, oldString, newString ):
    ...
    # Free memory
    self.newText.tokens.clear()
    self.oldText.tokens.clear()
    # Assemble blocks into fragment table
    fragments = self.getDiffFragments()
    # Free memory
    self.blocks.clear()
    self.groups.clear()
    self.sections.clear()
    ...
    return fragments
If you just need the fragments, use the returned variable of diff() like this:
import WikEdDiff as WED
config=WED.WikEdDiffConfig()
w = WED.WikEdDiff(config)
f = w.diff("abc", "efg")
# do whatever you want with f, but don't use w
print(' '.join([i.text+i.type for i in f]))
# outputs '{ [ (> abc- ) abc< efg+ ] }'
I'm trying to get up and running using the TTreeReader approach to reading TTrees in PyROOT. As a guide, I am using the ROOT 6 Analysis Workshop (http://root.cern.ch/drupal/content/7-using-ttreereader) and its associated ROOT file (http://root.cern.ch/root/files/tutorials/mockupx.root).
from ROOT import *
fileName = "mockupx.root"
file = TFile(fileName)
tree = file.Get("MyTree")
treeReader = TTreeReader("MyTree", file)
After this, I am a bit lost. I attempt to access variable information using the TTreeReader object and it doesn't quite work:
>>> rvMissingET = TTreeReaderValue(treeReader, "missingET")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/ROOT/v6-03-01/root/lib/ROOT.py", line 198, in __call__
result = _root.MakeRootTemplateClass( *newargs )
SystemError: error return without exception set
Where am I going wrong here?
TTreeReaderValue is a templated class, as shown in the example on the TTreeReader documentation, so you need to specify the template type.
You can do this with
rvMissingET = ROOT.TTreeReaderValue(ROOT.Double)(treeReader, "missingET")
The Python built-ins can be used for int and float types, e.g.
rvInt = ROOT.TTreeReaderValue(int)(treeReader, "intBranch")
rvFloat = ROOT.TTreeReaderValue(float)(treeReader, "floatBranch")
Also note that using TTreeReader in PyROOT is not recommended. (If you're looking for faster ntuple branch access in Python, you might look into the Ntuple class I wrote.)
I am trying to compute VIF values before fitting a logistic regression model in Python using rpy2. My data resides in a MySQL database. The problem I am facing is how to pass that data to the R function. This is what I am trying to replicate in Python:
library(car)
vif_result <- vif(lm(target~var1+var2+var3+var4+var5+var6+var7+var8+var9+var10, data=modeldata))
My code
from rpy2 import robjects as ro
import MySQLdb
db = MySQLdb.connect(host="localhost", user="***", passwd="***")
cur = db.cursor()
cur.execute("use python")
q= "select identifier,target,var1,var2,var3,var4,var5,var6,var7,var8,var9,var10 from test"
cur.execute(q)
testdata=cur.fetchall()
ro.r.library('car')
vif=ro.r["vif"]
lm=ro.r["lm"]
mydata= ro.r["data.frame"]
modeldata=mydata(testdata)
This throws the following error:
Traceback (most recent call last):
File "<pyshell#54>", line 1, in <module>
test= ro.r['as.data.frame'](testdata)
File "C:\Python27\lib\site-packages\rpy2\robjects\functions.py", line 86, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "C:\Python27\lib\site-packages\rpy2\robjects\functions.py", line 31, in __call__
new_args = [conversion.py2ri(a) for a in args]
File "C:\Python27\lib\site-packages\rpy2\robjects\__init__.py", line 134, in default_py2ri
raise(ValueError("Nothing can be done for the type %s at the moment." %(type(o))))
ValueError: Nothing can be done for the type <type 'tuple'> at the moment.
I want to use the data stored in the tuple testdata in the following formula, but I am not able to convert the data extracted by the MySQL query into an R object or data frame.
formula=ro.Formula(target~var1+var2+var3+var4+var5+var6+var7+var8+var9+var10, data=modeldata)
vif_result=vif(lm(formula))
One approach that I am testing is to convert each field into a list and then convert those lists into a data frame to be used in R. But that approach will take a lot of computational time. There must be a better way to achieve this. I tried finding resources on rpy2 but couldn't find anything good other than the rpy2 documentation.
Any help is very much appreciated
Thanks in advance
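For what it's worth, converting the rows to columns does not have to be expensive: zip(*rows) transposes the whole result in a single pass. Below is a minimal sketch with made-up rows standing in for the result of cur.fetchall(); rows_to_columns is a hypothetical helper name. Each resulting list could then be wrapped in an rpy2 vector and the mapping passed to rpy2's DataFrame constructor, but that step is an assumption and is left out here so the example runs without R.

```python
from collections import OrderedDict


def rows_to_columns(rows, names):
    """Transpose DB-API row tuples into an ordered {column name: list} mapping."""
    columns = OrderedDict((name, []) for name in names)
    # zip(*rows) yields one tuple per column; it yields nothing for empty input
    for name, values in zip(names, zip(*rows)):
        columns[name] = list(values)
    return columns


# Fictitious rows shaped like the tuple of tuples returned by cur.fetchall()
testdata = (
    (101, 1, 0.5, 2.0),
    (102, 0, 1.5, 4.0),
    (103, 1, 2.5, 6.0),
)
names = ["identifier", "target", "var1", "var2"]

columns = rows_to_columns(testdata, names)
print(columns["var1"])
```

Keeping the column order in an OrderedDict means the columns line up with the names in the SELECT statement, which matters once the formula refers to them by name.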
I am thinking of making a progress bar in Python for the terminal. First, I have to get the width (columns) of the terminal window. In Python 2.7, no standard library module can do this on Windows, so I probably have to call the Windows Console API manually.
According to MSDN and Python Documentation, I wrote the following code:
import ctypes
import ctypes.wintypes
class CONSOLE_SCREEN_BUFFER_INFO(ctypes.Structure):
    _fields_ = [
        ('dwSize', ctypes.wintypes._COORD),
        ('dwCursorPosition', ctypes.wintypes._COORD),
        ('wAttributes', ctypes.c_ushort),
        ('srWindow', ctypes.wintypes._SMALL_RECT),
        ('dwMaximumWindowSize', ctypes.wintypes._COORD)
    ]
hstd = ctypes.windll.kernel32.GetStdHandle(ctypes.c_ulong(-11)) # STD_OUTPUT_HANDLE = -11
print hstd
csbi = CONSOLE_SCREEN_BUFFER_INFO()
print ctypes.sizeof(csbi) # <---------------
ret = ctypes.windll.kernel32.GetConsoleScreenBufferInfo(ctypes.c_ulong(hstd), csbi)
print ret
print csbi.dwSize.X
It works fine. I then set about deleting some of the print statements in the code, but after that it no longer works: GetLastError returns 6 (Invalid Handle). After many attempts, I found that there must be SOMETHING at the marked position in the code, such as print 'hello', import sys, or sys.stdout.flush(). At first I guessed that it might need time to do something, so I tried putting time.sleep(2) at that position, but it still doesn't work.
However, if I use struct instead of ctypes.Structure, there is no such problem.
import ctypes
import struct
hstd = ctypes.windll.kernel32.GetStdHandle(-11) # STD_OUTPUT_HANDLE = -11
csbi = ctypes.create_string_buffer(22)
res = ctypes.windll.kernel32.GetConsoleScreenBufferInfo(hstd, csbi)
width, height, curx, cury, wattr, left, top, right, bottom, maxx, maxy = struct.unpack("hhhhHhhhhhh", csbi.raw)
print width
Can anyone tell me why the irrelevant code makes such a difference?
You need to pass the struct by reference:
ret = ctypes.windll.kernel32.GetConsoleScreenBufferInfo(
ctypes.c_ulong(hstd),
ctypes.byref(csbi)
)
I would also recommend that you declare the restype for GetStdHandle. That will mean that your code is ready to run under a 64-bit process. I'd write it like this:
ctypes.windll.kernel32.GetStdHandle.restype = ctypes.wintypes.HANDLE
hstd = ctypes.windll.kernel32.GetStdHandle(-11) # STD_OUTPUT_HANDLE = -11
csbi = CONSOLE_SCREEN_BUFFER_INFO()
ret = ctypes.windll.kernel32.GetConsoleScreenBufferInfo(
hstd,
ctypes.byref(csbi)
)
Actually, in my version of Python, your code reports a much more useful error. I see this:
Traceback (most recent call last):
File "test.py", line 16, in <module>
ret = ctypes.windll.kernel32.GetConsoleScreenBufferInfo(ctypes.c_ulong(hstd), csbi)
ValueError: Procedure probably called with too many arguments (20 bytes in
excess)
This is enough to make it clear that there is a binary mismatch at the interface between the Python code and the native code.
I suspect that if you get a more recent version of Python, you'd also benefit from this stack imbalance checking.
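As a rough, platform-independent check of that mismatch, the sketch below rebuilds the structure from plain ctypes types (standing in for the wintypes definitions, so it also runs outside Windows) and compares sizes. The "20 bytes" arithmetic is my reading of the message and assumes a 32-bit stdcall call in which by-value arguments are padded to 4 bytes.

```python
import ctypes
import struct


# Plain-ctypes stand-ins for ctypes.wintypes._COORD and _SMALL_RECT
class COORD(ctypes.Structure):
    _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]


class SMALL_RECT(ctypes.Structure):
    _fields_ = [("Left", ctypes.c_short), ("Top", ctypes.c_short),
                ("Right", ctypes.c_short), ("Bottom", ctypes.c_short)]


class CONSOLE_SCREEN_BUFFER_INFO(ctypes.Structure):
    _fields_ = [
        ("dwSize", COORD),
        ("dwCursorPosition", COORD),
        ("wAttributes", ctypes.c_ushort),
        ("srWindow", SMALL_RECT),
        ("dwMaximumWindowSize", COORD),
    ]


# Both views of the structure should have the same size as the 22-byte buffer
struct_size = ctypes.sizeof(CONSOLE_SCREEN_BUFFER_INFO)
packed_size = struct.calcsize("hhhhHhhhhhh")

# Assumption: 32-bit stdcall. The function expects a 4-byte pointer, but
# passing the struct by value pushes its size rounded up to 4 bytes.
POINTER_SIZE_32 = 4
pushed_by_value = (struct_size + 3) // 4 * 4
excess = pushed_by_value - POINTER_SIZE_32

# byref() is what turns the instance into the pointer the API expects
csbi = CONSOLE_SCREEN_BUFFER_INFO()
csbi_ref = ctypes.byref(csbi)

print(struct_size)
print(packed_size)
print(excess)
```

Under those assumptions the excess matches the figure in the error message, which is consistent with the structure having been pushed onto the stack by value instead of as a pointer.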