Python - integration with rpy2 and 'must be atomic' error

While using the rpy2 package, I get the error
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 86, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 35, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic
when executing
file.R_func.rdc([1,2,3,4,5],[1,3,4,5,6],20,1.67)
where file.py is defined as follows:
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage
string = """
rdc <- function(x,y,k,s) {
x <- cbind(apply(as.matrix(x),2,function(u) ecdf(u)(u)),1)
y <- cbind(apply(as.matrix(y),2,function(u) ecdf(u)(u)),1)
wx <- matrix(rnorm(ncol(x)*k,0,s),ncol(x),k)
wy <- matrix(rnorm(ncol(y)*k,0,s),ncol(y),k)
cancor(cbind(cos(x%*%wx),sin(x%*%wx)), cbind(cos(y%*%wy),sin(y%*%wy)))$cor[1]
}
"""
R_func = SignatureTranslatedAnonymousPackage(string, "R_func")
How should I pass x and y to rdc()?

When you call
file.R_func.rdc([1,2,3,4,5],[1,3,4,5,6],20,1.67)
rpy2 implicitly converts the Python objects before passing them as parameters to the underlying R function.
By default, [1,2,3,4,5] (a Python list) is converted to an R list. R lists are "non-atomic" vectors: each element can be an arbitrary object. An "atomic" vector, by contrast, holds elements of a single basic type such as boolean ("logical" in R lingo), integer, or string. sort.int() requires an atomic vector, hence the error.
Try:
from rpy2.robjects.vectors import IntVector, FloatVector
# FloatVector is imported as an alternative if you need/prefer floats
file.R_func.rdc(IntVector([1,2,3,4,5]),
                IntVector([1,3,4,5,6]),
                20,
                1.67)
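To see which atomic vector class a given Python list would need before it is handed to R, a small pure-Python check can help. The sketch below is only an illustration: the helper name atomic_vector_name is hypothetical, it does not import rpy2, and it merely returns the name of the rpy2.robjects.vectors class (BoolVector, IntVector, FloatVector or StrVector) that matches the element types.

```python
def atomic_vector_name(values):
    """Return the name of the rpy2 vector class suited to `values`.

    Hypothetical helper for illustration; it only inspects Python types.
    """
    # type() is used rather than isinstance() so that bool
    # (a subclass of int in Python) is kept apart from int
    kinds = {type(v) for v in values}
    if not kinds:
        raise ValueError("empty sequence: element type cannot be inferred")
    if kinds == {bool}:
        return "BoolVector"
    if kinds == {int}:
        return "IntVector"
    if kinds <= {int, float}:
        return "FloatVector"  # mixed ints and floats are treated as floats
    if kinds == {str}:
        return "StrVector"
    names = sorted(k.__name__ for k in kinds)
    raise TypeError("mixed element types %r do not fit one atomic vector" % names)


print(atomic_vector_name([1, 2, 3, 4, 5]))  # IntVector
print(atomic_vector_name([1, 3.5, 4]))      # FloatVector
```

A list that mixes, say, numbers and strings raises TypeError here, which mirrors the reason such a list cannot become a single atomic R vector.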

Related

Combining R and python through rpy2: How to read in a python list to R

I have found similar questions here and sort of here, but I can't seem to figure out how to do it for my own data.
I have a set of lists of floats in python (in reality, each list is about 1,000 floats long); e.g.
[0.01,0.02,0.03,0.04,0.05]
[0.1,0.2,0.4,0.5,0.6,0.7]
[0.01,0.2,0.05,0.4]
For each list, I want to convert the Python list to an R list, perform an FDR test on the R list to get a list of Q values, convert the R list of Q values back to a Python list, and then continue with my script.
The code I have:
for each_list in SetOfLists:
    ro.r("library('devtools')") #load necessary package for task
    ro.r("library('qvalue')") #load necessary package for task
    pvals = ro.FloatVector(each_list) #explain that each list is a set of floats
    print ro.r("qobj <-qvalue(p=" + pvals + ")") #run the r function on each list
    #ro.r("qobj$lfdr") #get the FDR values from the R output
    #Then convert this list of FDR values back to python
I'm having a problem with this line:
print ro.r("qobj <-qvalue(p=" + pvals + ")")
The error is:
Traceback (most recent call last):
File "CalculateFDR.py", line 33, in <module>
print ro.r("qobj <-qvalue(p=" + pvals + ")")
TypeError: cannot concatenate 'str' and 'FloatVector' objects
If I change the line slightly to:
print ro.r("qobj <-qvalue(p= pvals)")
The error is:
Traceback (most recent call last):
File "CalculateFDR.py", line 33, in <module>
print ro.r("qobj <-qvalue(p=pvals)")
File "/home/nis/aoife/env/local/lib/python2.7/site-packages/rpy2/robjects/__init__.py", line 321, in __call__
res = self.eval(p)
File "/home/nis/aoife/env/local/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/home/nis/aoife/env/local/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in qvalue(p = pvals) : object 'pvals' not found
I know the problem is that I'm not properly converting the Python list to an R list, but I'm not sure how to do this correctly. Any advice is appreciated.
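Just in case this helps anyone else, the answer is:
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
# load the R packages once, before the loop
devtools = importr("devtools")
qvalue = importr("qvalue")
for each_list in SetOfLists:
    # convert the Python list of floats to an R numeric vector
    pvals = ro.FloatVector(each_list)
    # r_repr() returns the vector as R source text, so it can be embedded in a string of R code
    rcode = 'qobj <-qvalue(p=%s)' %(pvals.r_repr())
    res = ro.r(rcode)
    r_output1 = 'qobj$pvalue'
    r_output2 = 'qobj$qvalue'
    # fetch the p-values and q-values back from the R object
    r_pvalue = ro.r(r_output1)
    r_qvalue = ro.r(r_output2)
    DictOfValues = dict(zip(r_pvalue,r_qvalue))
This is the code to get a Python list into R.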
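The TypeError in the question arises because a str cannot be concatenated with an object that is not a str; r_repr() sidesteps that by producing R source text first. The following is a rough pure-Python sketch of that idea, without rpy2. The helper r_numeric_literal is hypothetical: it builds an R c(...) literal from a list of numbers, which can then be embedded in a string of R code. It assumes finite values only (no NaN or infinity handling).

```python
def r_numeric_literal(values):
    """Build R source text such as 'c(0.01, 0.02)' from Python numbers.

    Hypothetical helper; assumes every value is a finite int or float.
    """
    # repr() keeps enough digits for each float to round-trip
    return "c(" + ", ".join(repr(float(v)) for v in values) + ")"


pvals = [0.01, 0.02, 0.03, 0.04, 0.05]
rcode = "qobj <- qvalue(p=%s)" % r_numeric_literal(pvals)
print(rcode)  # qobj <- qvalue(p=c(0.01, 0.02, 0.03, 0.04, 0.05))
```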

How do I fix the 'TypeError: hasattr(): attribute name must be string' error?

I have the following code:
import pymc as pm
from matplotlib import pyplot as plt
from pymc.Matplot import plot as mcplot
import numpy as np
from matplotlib import rc
res = [18.752, 12.450, 11.832]
v = pm.Uniform('v', 0, 20)
errors = pm.Uniform('errors', 0, 100, size = 3)
taus = 1/(errors ** 2)
mydist = pm.Normal('mydist', mu = v, tau = taus, value = res, observed = True)
model=pm.Model([mydist, errors, taus, v, res])
mcmc=pm.MCMC(model) # This is line 19 where the TypeError originates
mcmc.sample(20000,10000)
mcplot(mcmc.trace('mydist'))
For some reason it doesn't work. I get the 'TypeError: hasattr(): attribute name must be string' error, with the following trace:
Traceback (most recent call last):
File "<ipython-input-49-759ebaf4321c>", line 1, in <module>
runfile('C:/Users/Paul/.spyder2-py3/temp.py', wdir='C:/Users/Paul/.spyder2-py3')
File "C:\Users\Paul\Miniconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "C:\Users\Paul\Miniconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Paul/.spyder2-py3/temp.py", line 19, in <module>
mcmc=pm.MCMC(model)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\MCMC.py", line 82, in __init__
**kwds)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Model.py", line 197, in __init__
Model.__init__(self, input, name, verbose)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Model.py", line 99, in __init__
ObjectContainer.__init__(self, input)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Container.py", line 606, in __init__
conservative_update(self, input_to_file)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Container.py", line 549, in conservative_update
if not hasattr(obj, k):
TypeError: hasattr(): attribute name must be string
How do I make it work and output "mydist"?
Edit: I posted a wrong trace at first by accident.
Edit2: It must all be because res doesn't have a name, since it's an array, but I don't know how to assign a name to it to make this work.
I must admit that I'm not familiar with pymc, but changing it to the following at least made the application run:
mydist = pm.Normal('mydist', mu = v, tau = taus, value = res, observed = False)
mcmc=pm.MCMC([mydist, errors, taus, v, res])
This seems to be because you wrapped everything in a Model, which is an extension of ObjectContainer. Since you passed it a list, file_items in Container.py tried to assign index 4 of the list using replace, but because Model is an ObjectContainer it assigned the key 4 in its __dict__, causing the odd TypeError you got. Removing the wrapping Model made MCMC correctly use a ListContainer instead.
Now, there's probably a bug in Model.py on line 543 where observable stochastics aren't stored in the database: the expression is for object in self.stochastics | self.deterministics: but I suspect it should include self.observable_stochastics too. That is why I needed to change observed to False; otherwise the last line would throw a KeyError.
I'm not familiar enough with pymc to determine whether it's actually a bug or desired behaviour, so I leave it up to you to submit an issue about it.
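The TypeError itself can be reproduced without pymc. As a minimal illustration (the Container class here is a stand-in, not pymc's own), hasattr() rejects any attribute name that is not a string, which is what happens when an integer index such as 4 ends up being used as a key:

```python
class Container(object):
    """Stand-in object; unrelated to pymc's real container classes."""
    pass


obj = Container()
try:
    hasattr(obj, 4)  # an integer is not a valid attribute name
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# A string name is accepted even when the attribute does not exist
print(hasattr(obj, "4"))
```

The exact wording of the message differs between Python versions, so only the exception type should be relied on.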
You simply need to define res as a numpy array:
res = np.array([18.752, 12.450, 11.832])
Then you'll get an error at mcmc.trace('mydist'), because mydist is observed data and is therefore not sampled. You probably want to plot other variables instead.

Error: cannot perform reduce with flexible type

I'm trying to plot a histogram, but I keep getting this error:
Traceback (most recent call last):
File "<pyshell#62>", line 1, in <module>
plt.hist(a)
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2827, in hist
stacked=stacked, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 8312, in hist
xmin = min(xmin, xi.min())
File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 21, in _amin
out=out, keepdims=keepdims)
TypeError: cannot perform reduce with flexible type
I'm very new to Python, and what I'm trying to do is this:
import numpy, matplotlib.pyplot
line = " "
a = []
b = []
c = []
alpha = []
beta = []
gama = []
while x.readline():
    line = x.readline()
    a.append(line[16:23])
    b.append(line[25:32])
    c.append(line[27:34])
    alpha.append(line[40:47])
    beta.append(line[49:54])
    gama.append(line[56:63])
pyplot.hist(a)
Whenever I run this piece of code I get that error. Where did I go wrong? I would really appreciate some help.
It looks like you are attempting to draw the histogram based on strings, rather than on numbers. Try something like this instead:
from matplotlib import pyplot
import random
# generate a series of numbers
a = [random.randint(1, 10) for _ in xrange(100)]
# generate a series of strings that look like numbers
b = [str(n) for n in a]
# try to create histograms of the data
pyplot.hist(a) # it produces a histogram (approximately flat, as expected)
pyplot.hist(b) # produces the error as you reported.
In general it is better to use a pre-written library to read data from external files (see e.g., numpy's genfromtxt or the csv module).
But at the very least, you need to convert the data you have read to numbers, since readline returns strings. For instance:
for line in f:
    fields = line.strip().split()
    nums = [int(field) for field in fields]
Now nums holds the integers from that row (use float() instead of int() if the values have decimal points).
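For the fixed-width slices used in the question, a sketch along the following lines may be closer to what is needed. It assumes the column positions from the question (line[16:23] for a) and uses made-up sample lines in place of the real file. It also iterates over the file object directly, because calling readline() both in the while condition and in the loop body skips every other line.

```python
import io

# Made-up sample values; each line is built so that the number
# occupies columns 16-22, matching the question's line[16:23].
rows = [12.345, 23.5, 8.25]
text = "".join("%-16s%7.3f  trailing\n" % ("record", v) for v in rows)
f = io.StringIO(text)  # stands in for the real open file

a = []
for line in f:                      # reads every line exactly once
    field = line[16:23].strip()
    if field:                       # skip blank or short lines
        a.append(float(field))      # convert the text to a number

print(a)  # [12.345, 23.5, 8.25]
```

A list of floats like this is what pyplot.hist() expects, whereas a list of the raw string slices triggers the "flexible type" error.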

python error using PaCal statistical package

I recently started exploring Python and have encountered a problem with a package named PaCal.
Everything seems to work fine, except that I get this error whenever I want to print out some data (as in print A.mean()).
The error is:
Traceback (most recent call last):
File "C:\Users\rmobenta\Desktop\tt.py", line 12, in <module>
print A.interval(0.95)
File "C:\Python27\lib\site-packages\pacal\distr.py", line 229, in interval
return self.quantile(p_lim), self.quantile(1.0 - p_lim)
File "C:\Python27\lib\site-packages\pacal\distr.py", line 215, in quantile
return self.get_piecewise_cdf().inverse(y)
File "C:\Python27\lib\site-packages\pacal\segments.py", line 1721, in inverse
x = findinv(segi.f, a = segi.a, b = segi.b, c = y, rtol = params.segments.cumint.reltol, maxiter = params.segments.cumint.maxiter) # TODO PInd, MInf
File "C:\Python27\lib\site-packages\pacal\utils.py", line 384, in findinv
return brentq(lambda x : fun(x) - c, a, b, **kwargs)
File "C:\Python27\lib\site-packages\scipy\optimize\zeros.py", line 414, in brentq
raise ValueError("rtol too small (%g < %g)" % (rtol, _rtol))
ValueError: rtol too small (1e-16 < 4.44089e-16)
I am using a two-line script that I got for a demo (given by the author of this package) and have no idea how to tackle this issue.
Here is the script:
from pacal import *
Y = UniformDistr(1, 2)
X = UniformDistr(3, 4)
A = atan(Y / X)
A.plot()
print A.mean()
print A.interval(0.95)
The problem comes from PaCal, which defines segments.cumint.reltol = 1e-16 in l.141 of params.py.
This is the value passed as rtol in segments.py to the SciPy function brentq().
It is then compared to numpy.finfo(float).eps * 2 (l.413 and l.10 of scipy/optimize/zeros.py) and is unfortunately smaller.
So it is likely a problem in the PaCal implementation, not in your code.
Note that the value you provided to interval() corresponds to the default value (l.222 of distr.py).
I think you should contact the PaCal developers to get more information, and probably open an issue.
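The comparison that fails can be checked with the standard library alone. This small sketch assumes the bound is 2 * eps, as quoted in the answer and in the error message; other SciPy versions may use a different factor.

```python
import sys

eps = sys.float_info.epsilon   # same value as numpy.finfo(float).eps
min_rtol = 2 * eps             # lower bound quoted in the error message
pacal_rtol = 1e-16             # value reported in the traceback

print("%g < %g: %s" % (pacal_rtol, min_rtol, pacal_rtol < min_rtol))
# 1e-16 < 4.44089e-16: True
```

If the parameter shown in the traceback, params.segments.cumint.reltol, can be assigned from user code (an assumption, not verified here), setting it to a value above this bound before calling interval() may avoid the error.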

What object to pass to R from rpy2?

I'm unable to make the following code work, though I don't see this error working strictly in R.
from rpy2.robjects.packages import importr
from rpy2 import robjects
import numpy as np
forecast = importr('forecast')
ts = robjects.r['ts']
y = np.random.randn(50)
X = np.random.randn(50)
y = ts(robjects.FloatVector(y), start=robjects.IntVector((2004, 1)), frequency=12)
X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
It's especially confusing considering the following code works fine
forecast.auto_arima(y, xreg=X)
I see the following traceback no matter what I give for X, using numpy interface or not. Any ideas?
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-20-b781220efb93> in <module>()
13 X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
14
---> 15 forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in `colnames<-`(`*tmp*`, value = if (ncol(xreg) == 1) nmxreg else paste(nmxreg, :
length of 'dimnames' [2] not equal to array extent
Edit:
The problem is that the following lines of code do not evaluate to a column name, which seems to be the expectation on the R side.
sub = robjects.r['substitute']
deparse = robjects.r['deparse']
deparse(sub(X))
I don't know well enough what the expectations of this code should be in R, but I can't find an RPy2 object that passes this check by returning something of length == 1. This really looks like a bug to me.
R> length(deparse(substitute((rep(.2, 1000)))))
[1] 1
But in Rpy2
[~/]
[94]: robjects.r.length(robjects.r.deparse(robjects.r.substitute(robjects.r('rep(.2, 1000)'))))
[94]:
<IntVector - Python:0x7ce1560 / R:0x80adc28>
[ 78]
This is one manifestation (see this other related issue for example) of the same underlying issue: R expressions are evaluated lazily and can be manipulated within R, and this leads to idioms that do not translate well (in Python, expressions are evaluated immediately, and one has to move to the AST to manipulate code).
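The contrast can be seen from the Python side alone. In this small sketch, which uses only the standard-library ast module, the text rep(.2, 1000) is parsed rather than evaluated; rep is not defined in Python, so the code would fail if it were actually executed. This is only an analogy for what substitute() does in R, not an equivalent.

```python
import ast

source = "rep(.2, 1000)"

# Parsing keeps the expression unevaluated, loosely like R's substitute()
tree = ast.parse(source, mode="eval")
call = tree.body

print(type(call).__name__)  # Call
print(call.func.id)         # rep
print(len(call.args))       # 2
```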
An answer to the second part of your question: in R, substitute(rep(.2, 1000)) passes the unevaluated expression rep(.2, 1000) to substitute(). Doing this in rpy2
substitute('rep(.2, 1000)')
passes a string; the R equivalent would be
substitute("rep(.2, 1000)")
The following gets you close to R's deparse(substitute()):
from rpy2.robjects.packages import importr
base = importr('base')
from rpy2 import rinterface
# expression
e = rinterface.parse('rep(.2, 1000)')
dse = base.deparse(base.substitute(e))
>>> len(dse)
1
>>> print(dse) # not identical to R
"expression(rep(0.2, 1000))"
Currently, one way to work around this is to bind R objects to R symbols
(preferably in a dedicated environment rather than in GlobalEnv), and use
the symbols in an R call written as a string:
from rpy2.robjects import Environment, reval
env = Environment()
for k,v in (('y', y), ('xreg', X), ('order', robjects.IntVector((1, 0, 0)))):
    env[k] = v
# make an expression (the names must match the symbols bound in env)
expr = rinterface.parse("forecast.Arima(y, xreg=xreg, order=order)")
# evaluate in the environment
res = reval(expr, envir=env)
This is not something I am happy about as a solution, but I have never found the time to work on a better solution.
edit: With rpy2-2.4.0 it becomes possible to use R symbols and do the following:
RSymbol = robjects.rinterface.SexpSymbol
pairlist = (('x', RSymbol('y')),
('xreg', RSymbol('xreg')),
('order', RSymbol('order')))
res = forecast.Arima.rcall(pairlist,
env)
This is not yet the most intuitive interface. Maybe something using a context manager would be better.
There is a way to simply pass your variables to R without substitutions and return the results back to Python. You can find a simple example here: https://stackoverflow.com/a/55900840/5350311. I think it makes clearer what you are passing to R and what you get back in return, especially if you are working with for loops and a large number of variables.
