numpy2ri conversion problem with rpy2 2.2.2 - python

I am using rpy2-2.2.2 with the new free Enthought python distribution that includes numpy 1.6.0 and python 2.7.2. I easy_installed rpy2 which resulted in v. 2.2.2 being installed and all tests were successful.
The problem I'm having is with code I wrote that worked fine with rpy2 2.1.8 and python 2.6. The issue is in converting from numpy to R for arrays.
Here is a snippet of the relevant code:
import rpy2
import rpy2.rinterface as rinterface
import rpy2.robjects as rob
import rpy2.rlike.container as rlc
import numpy as np
import rpy2.robjects.numpy2ri
r = rob.r
...
HGr = rob.conversion.py2ri(HG_reg)
RHSr = rob.conversion.py2ri(RHS)
#
CalData = rlc.TaggedList([HGr,RHSr],tags=('hg','rhs'))
CalData = rob.DataFrame(CalData)
r('''library(pls)''')
#rob.globalEnv["HGr"] = HGr
#rob.globalEnv["RHSr"] = RHSr
rob.globalenv["CalData"] = CalData
# perform the PLS regression
if wetlflag:
    HGresults = r.plsr(r("hg ~ rhs.1 + rhs.2 + rhs.3 + rhs.4"),data=CalData,validation="LOO")
I will gladly admit it's not the most elegant way to do things, but it worked before and now when I need to provide results all is broken (!). The error I get is the following:
Traceback (most recent call last):
File "Mercury_PLS_WL_DF.py", line 224, in <module>
HGr = rob.conversion.py2ri(HG_reg)
File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/rpy2-2.2.2dev_20110726-py2.7-macosx-10.5-i386.egg/rpy2/robjects/__init__.py", line 134, in default_py2ri
raise(ValueError("Nothing can be done for the type %s at the moment." %(type(o))))
ValueError: Nothing can be done for the type <type 'numpy.ndarray'> at the moment.
I found the discussion here and got the impression that numpy arrays are now automatically converted to R arrays, but commenting out the rob.conversion.py2ri(HG_reg) statements and using the numpy arrays directly also seems to fail. Am I missing something obvious? Why would this break between 2.1.8 and 2.2.2?

From http://rpy.sourceforge.net/rpy2/doc-2.2/html/numpy.html#from-numpy-to-rpy2:
Warning
In earlier versions of rpy2, the import was all that was needed to have the conversion. A side-effect when importing a module can lead to problems, and there is now an extra step to make the conversion active: call the function rpy2.robjects.activate().
So call rpy2.robjects.numpy2ri.activate() after the import (the function lives in the numpy2ri module, not directly in rpy2.robjects) and you should be fine.
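As a rough sketch of what that looks like in practice, assuming rpy2 2.2.x or later (where numpy2ri.activate() exists) and a working R installation. The helper name numpy_to_r_dims and the R variable name x are made up for illustration:

```python
import numpy as np


def numpy_to_r_dims(array):
    """Send a NumPy array to R and return the dimensions R reports for it."""
    # rpy2 needs an R installation, so it is imported only when the
    # function is actually called.
    import rpy2.robjects as rob
    import rpy2.robjects.numpy2ri

    # Since rpy2 2.2.x the import alone no longer registers the converter;
    # the conversion has to be switched on explicitly.
    rpy2.robjects.numpy2ri.activate()

    # With the converter active, the ndarray can be assigned directly.
    rob.globalenv["x"] = array
    return list(rob.r("dim(x)"))


# Example call (requires rpy2 and R):
# numpy_to_r_dims(np.arange(6.0).reshape(2, 3))
```

With the converter active, the explicit rob.conversion.py2ri(...) calls in the question should no longer raise the ValueError, although whether the rest of the script then behaves as it did under 2.1.8 is something to check.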

Related

Calling R writeRaster from Python with rpy2

I am using an R package within my Python code to import, process and save a GeoTIFF raster. The importing and processing works just fine, but I struggle to save the raster file again. This is a simplified version of the code I'm running:
import rpy2.rinterface as ri
import rpy2.robjects as rob
import rpy2.robjects.packages as rpackages
raster = rpackages.importr('raster')
r_raster = raster.raster(geotiff_input_path)
# r_raster = process_raster(r_raster)
raster.writeRaster(r_raster, geotiff_output_path, overwrite=True)
However, the code fails with AttributeError: module 'raster' has no attribute 'writeRaster'.
I seem to misunderstand how to call writeRaster properly.

cython float64 error although float32 specifically set

I am trying to implement user @rkp's solution to their own question of how to speed up sparse matrix multiplications with cython by using the pycuda library (please note this is the second solution in their post).
After installing pycuda, pymetis etc and running their exact same code (in IDLE Python 3.5.2) I am getting:
TypeError: 'numpy.float64' object cannot be interpreted as an integer
It turns out the (reproducible) part that produces this error is:
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
from pycuda.sparse.packeted import PacketedSpMV
from pycuda.tools import DeviceMemoryPool
from scipy.sparse import csr_matrix
COUNT = 100
N = 5000
P = 0.1
DTYPE = np.float32
#construct objects
np.random.seed(0)
a_dense = np.random.rand(N, N).astype(DTYPE)
a_dense[np.random.rand(N, N) >= P] = 0
a_sparse = csr_matrix(a_dense)
#PacketedSpMV produces the error
spmv = PacketedSpMV(a_sparse, is_symmetric=False, dtype=DTYPE)
And the full error:
Traceback (most recent call last):
File "C:/Users/svobodov/Desktop/data/tests/cython/t.py", line 23, in <module>
spmv = PacketedSpMV(a_sparse, is_symmetric=False, dtype=DTYPE)
File "C:\Python35\lib\site-packages\pycuda\sparse\packeted.py", line 185, in __init__
local_row_costs)
File "pkt_build_cython.pyx", line 22, in pycuda.sparse.pkt_build_cython.build_pkt_data_structure
TypeError: 'numpy.float64' object cannot be interpreted as an integer
I initially thought this was the cython-related double-precision error, but it is obviously something different, as it specifically expects an integer rather than a float32.
I tried tweaking the pkt_build_cython.pyx but without any success or confidence that I did it properly.
Any ideas on how to resolve this please?
As identified in comments, this was a result of a missing integer cast within an internal routine in the PyCUDA codebase.
The bug was actually fixed in 2018, so if you use any PyCUDA 2019 release, you should have the corrected code and this issue should not occur.
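For anyone hitting the same message elsewhere: the snippet below is not PyCUDA's code, just a minimal illustration of this class of bug with made-up names (row_costs, rows_per_packet). It shows how an integer-typed input can still produce a numpy.float64 after a division, and how an explicit cast avoids the TypeError.

```python
import numpy as np

# Hypothetical integer-typed input, standing in for per-row cost counts.
row_costs = np.array([3, 1, 2], dtype=np.int32)

# True division promotes the result to numpy.float64,
# even though the input array holds integers.
rows_per_packet = row_costs.sum() / 2

try:
    list(range(rows_per_packet))  # range() requires a real integer
    failed = False
except TypeError:
    failed = True

# An explicit integer cast is the kind of fix the answer describes.
fixed = list(range(int(rows_per_packet)))
```

This is why setting DTYPE for the matrix values does not help: the offending value is a size or count computed internally, not one of the matrix entries.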

Running glmnet with rpy2 on sparse design matrix?

I have a python snippet which works just fine to run GLMNET on np.array X and y. However, when X is a column sparse matrix from scipy, the code fails as rpy2 is not able to convert X. Am I making an obvious mistake?
A MCVE is:
import numpy as np
from scipy import sparse
from rpy2 import robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri
if __name__ == "__main__":
    X = sparse.rand(5, 20, density=0.1)
    y = np.random.randn(5)
    numpy2ri.activate()
    pandas2ri.activate()
    utils = rpackages.importr('utils')
    utils.chooseCRANmirror(ind=1)
    if not rpackages.isinstalled('glmnet'):
        utils.install_packages("glmnet")
    glmnet = rpackages.importr('glmnet')
    glmnet = robjects.r['glmnet']
    glmnet_fit = glmnet(X, y, intercept=False, standardize=False)
And when I run it I get a NotImplementedError:
Conversion 'py2ri' not defined for objects of type '<class 'scipy.sparse.csc.csc_matrix'>'
Could I provide X in a different way? I'd be surprised if rpy2 could not handle sparse matrices.
You can create a sparse matrix with rpy2 as follows:
import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from scipy import sparse
X = sparse.rand(5, 20, density=0.1).tocoo()
r_Matrix = importr("Matrix")
r_Matrix.sparseMatrix(
    i=ro.IntVector(X.row + 1),
    j=ro.IntVector(X.col + 1),
    x=ro.FloatVector(X.data),
    dims=ro.IntVector(X.shape))
There is indeed no Python -> R converter for your object type included in rpy2. Your Python object is not a conventional array but a sparse matrix, as you note (scipy.sparse.csc.csc_matrix, to be specific), implemented in one of the numerical extensions built on numpy. As numpy itself is not even required to use rpy2, support for numpy extensions is rather sparse, with the notable exception of pandas, since data tables are ubiquitous.
You may want to write your own converter from csc_matrix to dgCMatrix in the R package Matrix (https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/dgCMatrix-class.html), as the package glmnet appears to be able to handle them.
Writing a custom converter will require working out how to map or copy the content of the Python object to its chosen R counterpart, but once that is done, plugging the code into rpy2 should be quite easy:
https://rpy2.github.io/doc/v2.9.x/html/generated_rst/s4class.html#custom-conversion
Consider opening an issue as a "feature request" on the rpy2 issue tracker, and reporting progress and outcome, with the hope of seeing this turn into a pull request complete with unit tests.
Also, a quick solution that might work would be to save the sparse matrix to a temporary file and read it back in R.
import rpy2.robjects as ro
from scipy.io import mmwrite
# X is the scipy sparse matrix from the question
mmwrite('temp.mtx', X)
# readMM is provided by the R package Matrix
ro.r('library(Matrix)')
ro.r('X <- readMM("temp.mtx")')
I would be very interested, though, if someone comes up with a custom converter that avoids that copy to disk.
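One possible direction for avoiding the disk round trip, offered as a sketch that has not been tried against glmnet: a scipy CSC matrix stores the same three arrays (row indices, column pointers, values) that dgCMatrix keeps in its i, p and x slots, so they can be handed to Matrix::sparseMatrix directly. The helper names csc_slots and to_r_dgcmatrix are hypothetical, and the second function assumes rpy2 and the R package Matrix are installed.

```python
import numpy as np
from scipy import sparse


def csc_slots(X):
    """Return the CSC arrays of X as plain Python lists (0-based indices)."""
    X = sparse.csc_matrix(X, copy=True)  # copy so the caller's matrix is untouched
    X.sum_duplicates()  # merge duplicate entries; also sorts row indices per column
    return {
        "i": X.indices.astype(np.int32).tolist(),   # row index of each stored value
        "p": X.indptr.astype(np.int32).tolist(),    # column pointers, length ncol + 1
        "x": X.data.astype(np.float64).tolist(),    # the stored values
        "dims": [int(d) for d in X.shape],
    }


def to_r_dgcmatrix(X):
    """Build an R sparse matrix from X without writing a file (needs rpy2 + R)."""
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    slots = csc_slots(X)
    r_Matrix = importr("Matrix")
    return r_Matrix.sparseMatrix(
        i=ro.IntVector(slots["i"]),
        p=ro.IntVector(slots["p"]),
        x=ro.FloatVector(slots["x"]),
        dims=ro.IntVector(slots["dims"]),
        index1=False,  # the row indices are 0-based, as in scipy
    )
```

Compared with the COO-based answer above, passing i and p keeps the data in column-compressed form on both sides, so no + 1 index shift is needed. Registering such a function as an automatic rpy2 converter is a separate step covered by the custom-conversion documentation linked above.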

NameError: global name 'imshow' is not defined but Matplotlib is imported

I'm currently writing a python script which plots a numpy matrix containing some data (which I'm not having any difficulty computing). For complicated reasons having to do with how I'm creating that data, I have to go through terminal. I've done problems like this a million times in Spyder using imshow(). So, I thought I'd try to do the same in terminal. Here's my code:
from numpy import *
from matplotlib import *
def make_picture():
    f = open("DATA2.txt")
    arr = zeros((200, 200))
    l = f.readlines()
    for i in l:
        j = i[:-1]
        k = j.split(" ")
        arr[int(k[0])][int(k[1])] = float(k[2])
    f.close()
    imshow(arr)
make_picture()
Suffice it to say, the array stuff works just fine. I've tested it, and it extracts the data perfectly well. So, I've got this 200 by 200 array of numbers floating around my RAM and I'd like to display it. When I run this code in Spyder, I get exactly what I expected. However, when I run this code in Terminal, I get an error message:
Traceback (most recent call last):
File "DATAmine.py", line 15, in <module>
make_picture()
File "DATAmine.py", line 13, in make_picture
imshow(arr)
NameError: global name 'imshow' is not defined
(My program's called DATAmine.py) What's the deal here? Is there something else I should be importing? I know I had to configure my Spyder paths, so I wonder if I don't have access to those paths or something. Any suggestions would be greatly appreciated. Thanks!
P.S. Perhaps I should mention I'm using Ubuntu. Don't know if that's relevant.
To make your life easier you can use
from pylab import *
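This imports the pylab namespace, which bundles the matplotlib.pyplot plotting functions (including imshow) together with numpy. By contrast, from matplotlib import * only pulls in the names of the top-level matplotlib package, and imshow is not one of them.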
Cheers
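If you would rather avoid star imports, here is a minimal sketch with explicit imports. The function names parse_triples and show_picture are made up, and the file layout (row, column, value separated by whitespace) is assumed from the code in the question.

```python
import numpy as np


def parse_triples(lines, shape=(200, 200)):
    """Fill an array from lines of the form 'row col value'."""
    arr = np.zeros(shape)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        row, col, value = fields
        arr[int(row), int(col)] = float(value)
    return arr


def show_picture(arr):
    """Display the array; imshow lives in matplotlib.pyplot."""
    # Imported here only so parse_triples can be used without matplotlib;
    # a top-level import works just as well.
    import matplotlib.pyplot as plt

    plt.imshow(arr)
    plt.show()  # needed outside interactive environments to open the window


# Example use with the file from the question:
# with open("DATA2.txt") as f:
#     show_picture(parse_triples(f))
```

Outside an interactive console nothing appears until plt.show() is called, which may be another difference from what you see in Spyder.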

Converting python objects for rpy2

The following code is supposed to create a heatmap in rpy2
import numpy as np
from rpy2.robjects import r
data = np.random.random((10,10))
r.heatmap(data)
However, it results in the following error
Traceback (most recent call last):
File "z.py", line 8, in <module>
labRow=rowNames, labCol=colNames)
File "C:\Python25\lib\site-packages\rpy2\robjects\__init__.py", line 418, in __call__
new_args = [conversion.py2ri(a) for a in args]
File "C:\Python25\lib\site-packages\rpy2\robjects\__init__.py", line 93, in default_py2ri
raise(ValueError("Nothing can be done for the type %s at the moment." %(type(o))))
ValueError: Nothing can be done for the type <type 'numpy.ndarray'> at the moment.
From the documentation I learn that r.heatmap expects "a numeric matrix". How do I convert np.array to the required data type?
You need to add
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
See more in rpy2 documentation numpy section (here for the older 2.x version)
Prior to 2.2.x the import alone was sufficient, as the older documentation explained:
That import alone is sufficient to switch an automatic conversion of numpy objects into rpy2 objects.
Why make this an optional import, while it could have been included in the function py2ri() (as done in the original patch submitted for that function)?
Although both are valid and reasonable options, the design decision was taken in order to decouple rpy2 from numpy the most, and do not assume that having numpy installed automatically meant that a programmer wanted to use it.
For rpy2 2.2.4 I had to add:
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
For me (2.2.1) the following also worked (as documented on http://rpy.sourceforge.net/rpy2/doc-2.2/html/numpy.html):
import rpy2.robjects as ro
from rpy2.robjects.numpy2ri import numpy2ri
ro.conversion.py2ri = numpy2ri
