rpy2: Conflict when converting R symbols in the package to Python symbols

I simply tried the following:
import rpy2
import rpy2.robjects as RObjects
from rpy2.robjects.packages import importr
princurve = importr('princurve', robject_translations = {"plot_principal_curve": "plot.principal.curve"})
princurve = importr('princurve', robject_translations = {"points_principal_curve": "points.principal.curve"})
and got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 498, in importr
symbol_resolve=symbol_resolve)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 202, in __init__
self.__fill_rpy2r__(on_conflict=on_conflict)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 328, in __fill_rpy2r__
.__fill_rpy2r__(on_conflict=on_conflict))
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages.py", line 238, in __fill_rpy2r__
exception)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\packages_utils.py", line 120, in _fix_map_symbols
raise exception(msg)
rpy2.robjects.packages.LibraryError: Conflict when converting R symbols in the package "princurve" to Python symbols:
- lines_principal_curve -> lines.principal.curve, lines.principal_curve
- plot_principal_curve -> plot.principal.curve, plot.principal_curve
- points_principal_curve -> points.principal.curve, points.principal_curve
To turn this exception into a simple warning use the parameter `on_conflict="warn"`
Can anyone help?

You were almost there! In robject_translations you need to provide an R name -> Python name mapping, but your dictionary was the other way around. You also need to put all the mappings in a single dictionary, passed to a single importr() call. To make it explicit, you can resolve the conflicts like this:
princurve_example_1 = importr(
    "princurve",
    robject_translations={
        "plot.principal.curve": "plot_dot_principal_dot_curve",
        "lines.principal.curve": "lines_dot_principal_dot_curve",
        "points.principal.curve": "points_dot_principal_dot_curve",
        # optional (if omitted, you will get them under plot_principal_curve, etc.):
        "plot.principal_curve": "plot_dot_principal_curve",
        "lines.principal_curve": "lines_dot_principal_curve",
        "points.principal_curve": "points_dot_principal_curve",
    },
)
# then, after creating the curve and storing it in a variable named curve:
princurve_example_1.plot_dot_principal_dot_curve(curve)
# or
princurve_example_1.plot_dot_principal_curve(curve)
However, the princurve documentation says that principal.curve is deprecated and that you should use principal_curve instead (good to see more R packages finally adopting underscores in function and variable names where possible!), so you can simply do:
princurve = importr(
    "princurve",
    robject_translations={
        "plot.principal.curve": "plot_principal_curve_deprecated",
        "lines.principal.curve": "lines_principal_curve_deprecated",
        "points.principal.curve": "points_principal_curve_deprecated",
    },
)
# auto-generated from "plot.principal_curve"
princurve.plot_principal_curve(curve)
# manually mapped from "plot.principal.curve"
princurve.plot_principal_curve_deprecated(curve)
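To see where the pairs in the error message come from, here is a simplified pure-Python model of the renaming step. It assumes the default rule replaces every `.` in an R name with `_` and refuses to create two identical Python names; `translate_symbols()` is a hypothetical helper, not rpy2's actual implementation. It also illustrates why the dictionaries in the question did not help: their keys are Python-style names, so they never match an R symbol.

```python
from collections import defaultdict


def translate_symbols(r_names, translations=None):
    """Simplified model of rpy2-style R -> Python name translation.

    Hypothetical helper: dots become underscores unless `translations`
    gives an explicit R name -> Python name mapping. Raises ValueError
    listing every Python name that two or more R names would map to.
    """
    translations = translations or {}
    by_py_name = defaultdict(list)
    for r_name in r_names:
        py_name = translations.get(r_name, r_name.replace(".", "_"))
        by_py_name[py_name].append(r_name)

    conflicts = {py: rs for py, rs in by_py_name.items() if len(rs) > 1}
    if conflicts:
        lines = [f"- {py} -> {', '.join(rs)}" for py, rs in sorted(conflicts.items())]
        raise ValueError("Conflict when converting R symbols:\n" + "\n".join(lines))
    # No conflicts: every list holds exactly one R name.
    return {rs[0]: py for py, rs in by_py_name.items()}


r_names = ["plot.principal.curve", "plot.principal_curve", "principal_curve"]

# Python name -> R name (the question's direction): the key matches no R
# symbol, so it is ignored and the conflict remains.
try:
    translate_symbols(r_names, {"plot_principal_curve": "plot.principal.curve"})
except ValueError as exc:
    print(exc)

# R name -> Python name (the answer's direction): the conflict is resolved.
mapping = translate_symbols(
    r_names, {"plot.principal.curve": "plot_principal_curve_deprecated"}
)
print(mapping["plot.principal_curve"])  # plot_principal_curve
```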

Python - Additional "members" appended to JSON object when passing it to function

I have the following JSON object located in its own file called build.json:
{
"name": "utils",
"version": "1.0.0",
"includes": [],
"libraries": [],
"testLibraries": []
}
I obtain this object in my Python program using the following method:
import json
import os

def getPackage(packageName):
    jsonFilePath = os.path.join(SRCDIR, packageName, "build.json")
    packageJson = None
    try:
        with open(jsonFilePath, "r") as jsonFile:
            packageJson = json.load(jsonFile)
    except:
        return None
    return packageJson
I verify that the JSON object for the current package (which is one of many packages I am iterating over) did not come back None in the following method. Note that I am temporarily printing out the keys of the dictionary:
def compileAllPackages():
    global COMPILED_PACKAGES
    for packageName in os.listdir(SRCDIR):
        package = getPackage(packageName)
        if package is None:
            continue
        # TEMP ==============
        for i in package:
            print(i)
        # ===================
        compiledSuccessfully = compilePackage(package)
        if not compiledSuccessfully:
            return False
    return True
Lastly, I am currently also printing out the keys of the dictionary once it is received in the compilePackage function:
def compilePackage(package):
    global COMPILED_PACKAGES, INCLUDE_TESTS
    # TEMP ==============
    for i in package:
        print(i)
    # ===================
    ...
Output from compileAllPackages function:
name
version
includes
libraries
testLibraries
Output from compilePackage function:
name
version
includes
libraries
testLibraries
u
t
i
l
s
I cannot for the life of me figure out what is happening to my dictionary during that function call. Please note that the build.json file is located in a directory named "utils".
Edit:
The Python script is located separately from the build.json file and works with absolute paths. It should also be noted that after getting that strange output, I also get the following exception when trying to access a valid key later (it seems to think the dictionary is a string?):
Traceback (most recent call last):
File "/Users/nate/bin/BuildTool/unix/build.py", line 493, in <module>
main()
File "/Users/nate/bin/BuildTool/unix/build.py", line 481, in main
compiledSuccessfully = compileAllPackages()
File "/Users/nate/bin/BuildTool/unix/build.py", line 263, in compileAllPackages
compiledSuccessfully = compilePackage(package)
File "/Users/nate/bin/BuildTool/unix/build.py", line 287, in compilePackage
compiledSuccessfully = compilePackage(include)
File "/Users/nate/bin/BuildTool/unix/build.py", line 279, in compilePackage
includes = getPackageIncludes(package)
File "/Users/nate/bin/BuildTool/unix/build.py", line 194, in getPackageIncludes
includes = [package["name"]] # A package always includes itself
TypeError: string indices must be integers
Edit: If I change the parameter name to something other than 'package', I no longer get that weird output or an exception later on. This is not necessarily a fix, however, as I do not know what could be wrong with the name 'package'. There are no globals named as such either.
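The answer ended up being very stupid. compilePackage() can be called recursively to compile the packages a package depends on. In those recursive calls I was passing the dependency's name (a string) to the function rather than its dictionary. Iterating over the string "utils" yields its characters u, t, i, l, s, which explains the extra output, and package["name"] on a string raises TypeError: string indices must be integers.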
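A minimal sketch of the fix, assuming dependency names are resolved to their dictionaries before recursing. `PACKAGES`, `get_package()` and `compile_package()` are hypothetical stand-ins (an in-memory dict replaces the build.json files and appending to a list replaces the real build step), not the asker's actual build script.

```python
# Hypothetical in-memory stand-ins for the build.json files on disk.
PACKAGES = {
    "utils": {"name": "utils", "version": "1.0.0", "includes": [],
              "libraries": [], "testLibraries": []},
    "app": {"name": "app", "version": "0.1.0", "includes": ["utils"],
            "libraries": [], "testLibraries": []},
}


def get_package(package_name):
    """Return the package dict for a name, or None if it is unknown."""
    return PACKAGES.get(package_name)


def compile_package(package, compiled=None):
    """Compile `package` (a dict) after recursively compiling its includes."""
    if compiled is None:
        compiled = []
    for include_name in package["includes"]:
        include = get_package(include_name)  # resolve the name (str) to a dict
        if include is None or not compile_package(include, compiled):
            return False
    if package["name"] not in compiled:
        compiled.append(package["name"])  # placeholder for the real build step
    return True


# The symptom: iterating over a str yields characters, not keys.
print(list("utils"))  # ['u', 't', 'i', 'l', 's']

order = []
print(compile_package(PACKAGES["app"], order), order)  # True ['utils', 'app']
```

Passing the bare string "utils" to compile_package() reproduces the original error, because "utils"["includes"] raises TypeError.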
I tried your code and got this result:
Output from compileAllPackages function:
name
version
includes
libraries
testLibraries
Output from compilePackage function:
name
version
includes
libraries
testLibraries
My directory structure is:
├── test.py
└── tt
    └── cc
        └── utils
            └── build.json
I think your code is correct; the path you passed is probably wrong.
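The bare except in getPackage() hides whether the path was wrong or the JSON was invalid. One way to make that visible is sketched below; `load_package()` is a hypothetical replacement that takes the source directory as an explicit parameter instead of the SRCDIR global.

```python
import json
import os


def load_package(src_dir, package_name):
    """Load <src_dir>/<package_name>/build.json, reporting why it failed."""
    json_path = os.path.join(src_dir, package_name, "build.json")
    try:
        with open(json_path, "r") as json_file:
            return json.load(json_file)
    except FileNotFoundError:
        print(f"no build.json at {json_path!r}")  # most likely a bad path
    except json.JSONDecodeError as exc:
        print(f"invalid JSON in {json_path!r}: {exc}")
    return None
```

Printing the full path on failure shows immediately whether the script is looking in the directory you expect.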

Python jsonpickle error: 'OrderedDict' object has no attribute '_OrderedDict__root'

I'm hitting this exception with jsonpickle when trying to pickle a rather complex object that, unfortunately, I'm not sure how to describe here. I know that makes it hard to say much, but for what it's worth:
>>> frozen = jsonpickle.encode(my_complex_object_instance)
>>> thawed = jsonpickle.decode(frozen)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/jsonpickle/__init__.py", line 152, in decode
return unpickler.decode(string, backend=backend, keys=keys)
:
:
File "/Library/Python/2.7/site-packages/jsonpickle/unpickler.py", line 336, in _restore_from_dict
instance[k] = value
File "/Library/Python/2.7/site-packages/botocore/vendored/requests/packages/urllib3/packages/ordered_dict.py", line 49, in __setitem__
root = self.__root
AttributeError: 'OrderedDict' object has no attribute '_OrderedDict__root'
I haven't found much help googling the error. I do see what looks like the same issue, resolved some time ago for simpler objects:
https://github.com/jsonpickle/jsonpickle/issues/33
The cited example in that report works for me:
>>> jsonpickle.decode(jsonpickle.encode(collections.OrderedDict()))
OrderedDict()
>>> jsonpickle.decode(jsonpickle.encode(collections.OrderedDict(a=1)))
OrderedDict([(u'a', 1)])
Has anyone run into this and found a solution? I ask with the understanding that my case may be "differently idiosyncratic" than another known example.
For me, the problem shows up in the requests module (vendored inside botocore) when I call .decode(). After looking at the jsonpickle code a bit, I decided to fork it and change the following lines to see what was going on (I ended up keeping a private copy of jsonpickle with the changes so I can move forward).
In jsonpickle/unpickler.py (line 368 in my version), find the if statement in the _restore_from_dict() method:
if (util.is_noncomplex(instance) or
        util.is_dictionary_subclass(instance)):
    instance[k] = value
else:
    setattr(instance, k, value)
and change it to the following. It logs an error for each assignment that fails; you can then either keep this code in place or switch to an OrderedDict implementation that has the __root attribute:
if (util.is_noncomplex(instance) or
        util.is_dictionary_subclass(instance)):
    # Currently requests.adapters.HTTPAdapter is using a non-standard
    # version of OrderedDict which doesn't have a _OrderedDict__root
    # attribute
    try:
        instance[k] = value
    except AttributeError:
        import logging
        import pprint
        warnmsg = 'Unable to unpickle {}[{}]={}'.format(
            pprint.pformat(instance), pprint.pformat(k), pprint.pformat(value))
        logging.error(warnmsg)
else:
    setattr(instance, k, value)
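Why would an OrderedDict lack its own __root? A plausible mechanism, which is an assumption and not confirmed in this thread: pure-Python OrderedDict backports create the name-mangled self.__root attribute in __init__, so if an unpickler builds the instance with __new__ and never runs __init__, the first item assignment fails with this kind of message. The class and the `restore_without_init()` helper below are hypothetical, stripped-down stand-ins, not the urllib3 or jsonpickle code.

```python
class OrderedDict(dict):
    """Hypothetical minimal stand-in for a pure-Python OrderedDict backport."""

    def __init__(self):
        super().__init__()
        # Name mangling stores this as self._OrderedDict__root.
        self.__root = []

    def __setitem__(self, key, value):
        root = self.__root  # AttributeError if __init__ never ran
        if key not in self:
            root.append(key)
        super().__setitem__(key, value)

    def keys_in_order(self):
        return list(self.__root)


def restore_without_init(cls, items):
    """Mimic an unpickler that creates the object via __new__ and then
    assigns items one by one (hypothetical helper)."""
    instance = cls.__new__(cls)  # __init__ is skipped
    for k, v in items:
        instance[k] = v
    return instance


ok = OrderedDict()
ok["a"] = 1
print(ok.keys_in_order())  # ['a']

try:
    restore_without_init(OrderedDict, [("a", 1)])
except AttributeError as exc:
    print(exc)  # the message names the missing _OrderedDict__root attribute
```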

rpy2: Invalid Model formula in Extract Vars (tmp, simplify = TRUE)

I am attempting to use the R "NADA" package through the rpy2 interface in Python. The end goal is to perform survival analysis on left-censored environmental data. Python and R seem to interact correctly for other functions, and I can run a test call in R, but I get an error when attempting the same through rpy2.
This is my code in Python. It is entirely fictitious data.
from rpy2.robjects import FloatVector, BoolVector, FactorVector
from rpy2.robjects.packages import importr
nada = importr('NADA')
obs = FloatVector([1.0,2.0,3.0,5.0,56.0,1.0,4.0])
nds = BoolVector([False, True, True, True, True, False, True])
groups = FactorVector([1,0,1,0,1,1,0])
nada.cendiff(obs, nds, groups)
This is the error message I receive:
Traceback (most recent call last):
File "C:/Users/XXXXXXX/rpy2_test.py", line 9, in <module>
nada.cendiff(obs, nds, groups)
File "C:\Program Files\Python35\lib\site-packages\rpy2\robjects\functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "C:\Program Files\Python35\lib\site-packages\rpy2\robjects\functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in terms.formula(tmp, simplify = TRUE) :
invalid model formula in ExtractVars
This code works fine in the R terminal:
library("NADA")
cendiff(c(1.0,2.0,3.0,5.0,56.0,1.0,4.0), c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE), factor(c(1,0,1,0,1,1,0)))
I tried adding some print lines at the rpy2 error lines listed, and suspect there may be an issue with rpy2 removing the levels from the factor vector when sending them to the function. However, I'm in new territory and that may just be a red herring.
If anyone can lend some insight or offer an alternative, I would appreciate it. I have a lot of data processing coded in Python and going all R isn't a good option, but R has more analysis options so I was hoping rpy2 would do the trick.
When in doubt about whether rpy2 and/or one of its conversion rules are doing something unexpected, it is relatively easy to check. For example:
from rpy2.robjects.vectors import FactorVector
from rpy2.robjects import r, globalenv
# factor with rpy2
groups = FactorVector([1,0,1,0,1,1,0])
# bind it to a symbol in R's GlobalEnv
globalenv['groups_rpy2'] = groups
# is it the same as building the factor in R?
r("""
groups <- factor(c(1,0,1,0,1,1,0))
print(identical(groups, groups_rpy2))
""")
# prints: [1] TRUE
# apparently so
I suspect this is caused by the R package relying on unevaluated expressions, while rpy2 passes anonymous R objects. A quick look at the package code shows:
setMethod("cendiff",
signature(obs="numeric", censored="logical", groups="factor"),
cencen.vectors.groups)
and
cencen.vectors.groups =
function(obs, censored, groups, ...)
{
cl = match.call()
f = substitute(Cen(a, b)~g, list(a=cl[[2]], b=cl[[3]], g=cl[[4]]))
f = as.formula(f)
environment(f) = parent.frame()
callGeneric(f, ...)
}
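The formula is built from the argument expressions captured by match.call(), so anonymous objects passed by rpy2 likely end up as an invalid formula. One way to work around that is to bind your objects to symbols in an R namespace/environment and evaluate the call in that namespace, so that the captured expressions are plain symbol names. This can be done with any R environment; here GlobalEnv is used (remember that the content of GlobalEnv persists until the embedded R is closed):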
from rpy2.robjects import globalenv
from rpy2.robjects.packages import importr
base = importr('base')
# bind to R symbols
globalenv["obs"] = obs
globalenv["nds"] = nds
globalenv["groups"] = groups
# make the call
nada.cendiff(base.as_symbol('obs'),
             base.as_symbol('nds'),
             base.as_symbol('groups'))
(See another use of as_symbol in "Minimal example of rpy2 regression using pandas data frame".)

How do I use python-WikEdDiff?

I recently installed the python-WikEdDiff package on my system. I understand it is a Python version of the original JavaScript WikEdDiff tool. I tried to use it but couldn't find any documentation for it. I got as far as WikEdDiff.diff(), but when I try to use other methods of the class, such as getFragments(), I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/WikEdDiff/diff.py", line 1123, in detectBlocks
self.getSameBlocks()
File "/usr/local/lib/python3.4/dist-packages/WikEdDiff/diff.py", line 1211, in getSameBlocks
while j is not None and self.oldText.tokens[j].link is None:
IndexError: list index out of range
Inspecting the object, I found that its tokens list stays empty, although it should have been populated.
Is there an initialization function I need to call besides the constructor? Or does it have something to do with the `WikEdDiffConfig` config structure I passed to the constructor?
You get this error because diff() clears the WikEdDiff object's internal data before returning, as this section of its code shows:
def diff( self, oldString, newString ):
    ...
    # Free memory
    self.newText.tokens.clear()
    self.oldText.tokens.clear()
    # Assemble blocks into fragment table
    fragments = self.getDiffFragments()
    # Free memory
    self.blocks.clear()
    self.groups.clear()
    self.sections.clear()
    ...
    return fragments
If you just need the fragments, use the value returned by diff(), like this:
import WikEdDiff as WED
config=WED.WikEdDiffConfig()
w = WED.WikEdDiff(config)
f = w.diff("abc", "efg")
# do whatever you want with f, but don't use w
print(' '.join([i.text+i.type for i in f]))
# outputs '{ [ (> abc- ) abc< efg+ ] }'
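The same pitfall can be reproduced without WikEdDiff. The sketch below uses a hypothetical toy class built on the standard library's difflib (not on WikEdDiff) whose diff() frees its working token lists before returning, so a follow-up method that expects them fails with IndexError while the returned fragments stay usable.

```python
import difflib


class ToyDiff:
    """Hypothetical diff engine that frees its working state, like WikEdDiff."""

    def __init__(self):
        self.old_tokens = []
        self.new_tokens = []

    def diff(self, old, new):
        self.old_tokens = old.split()
        self.new_tokens = new.split()
        matcher = difflib.SequenceMatcher(a=self.old_tokens, b=self.new_tokens)
        fragments = [
            (tag, " ".join(self.old_tokens[i1:i2]), " ".join(self.new_tokens[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        ]
        # Free memory: after this, only the returned fragments are valid.
        self.old_tokens.clear()
        self.new_tokens.clear()
        return fragments

    def first_old_token(self):
        # A follow-up method that assumes the tokens are still populated.
        return self.old_tokens[0]


d = ToyDiff()
frags = d.diff("a b c", "a x c")
print(frags)  # [('equal', 'a', 'a'), ('replace', 'b', 'x'), ('equal', 'c', 'c')]
```

Calling d.first_old_token() afterwards raises IndexError, mirroring the traceback in the question; the fix is the same as above: work with the returned value.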

How to use TTreeReader in PyROOT

I'm trying to get up and running using the TTreeReader approach to reading TTrees in PyROOT. As a guide, I am using the ROOT 6 Analysis Workshop (http://root.cern.ch/drupal/content/7-using-ttreereader) and its associated ROOT file (http://root.cern.ch/root/files/tutorials/mockupx.root).
from ROOT import *
fileName = "mockupx.root"
file = TFile(fileName)
tree = file.Get("MyTree")
treeReader = TTreeReader("MyTree", file)
After this, I am a bit lost. When I try to access variable information through the TTreeReader object, it doesn't work:
>>> rvMissingET = TTreeReaderValue(treeReader, "missingET")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/ROOT/v6-03-01/root/lib/ROOT.py", line 198, in __call__
result = _root.MakeRootTemplateClass( *newargs )
SystemError: error return without exception set
Where am I going wrong here?
TTreeReaderValue is a templated class, as shown in the example in the TTreeReader documentation, so you need to specify the template type.
You can do this with
import ROOT
rvMissingET = ROOT.TTreeReaderValue(ROOT.Double)(treeReader, "missingET")
The Python built-ins can be used for int and float types, e.g.
rvInt = ROOT.TTreeReaderValue(int)(treeReader, "intBranch")
rvFloat = ROOT.TTreeReaderValue(float)(treeReader, "floatBranch")
Also note that using TTreeReader in PyROOT is not recommended. (If you're looking for faster ntuple branch access in Python, you might look into the Ntuple class I wrote.)
