In R we can use Rcpp to call a C++ function like the one below:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP critcpp(SEXP a, SEXP b){
  NumericMatrix X(a);
  NumericVector crit(b);
  int p = X.ncol();
  NumericMatrix critstep(p,p);
  NumericMatrix deltamin(p,p);
  List lst(2);
  for (int i = 0; i < (p-1); i++){
    for (int j = i+1; j < p; j++){
      // --some calculations
    }
  }
  lst[0] = critstep;
  lst[1] = deltamin;
  return lst;
}
I want to do the same thing in Python.
I have gone through Boost, SWIG, etc., but they seem complicated to my newbie Python eyes.
Can the Python wizards here kindly point me in the right direction?
I need to call this C++ function from inside a Python function.
Since I think the only real answer is spending some time rewriting the function you posted, or writing some sort of wrapper for it (absolutely possible but quite time consuming), I'm answering with a completely different approach...
Without going through any sort of compiled conversion, a much faster way (from a programming-time point of view, not in efficiency) may be to directly call the R interpreter, with the module containing the function you posted, from within Python, through the rpy2 module, as described here. It requires the pandas module to handle the data frames from R.
The modules to use (in Python) are:
import numpy as np # for handling numerical arrays
import scipy as sp # a good utility
import pandas as pd # for data frames
from rpy2.robjects.packages import importr # for importing your module
import rpy2.robjects as ro # for calling R interpreter from within python
import pandas.rpy.common as com # for storing R data frames in pandas data frames.
In your code you should import your module by calling importr:
importr('your-module-with-your-cpp-function')
and you can send commands directly to R by issuing:
ro.r('x = your.function( blah blah )')
x_rpy = ro.r('x')
type(x_rpy)
# => rpy2.robjects.your-object-type
You can store your data in a pandas data frame with:
py_df = com.load_data('variable.name')
and push a data frame back to R through:
r_df = com.convert_to_r_dataframe(py_df)
ro.globalenv['df'] = r_df
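Putting the pieces together, a minimal end-to-end sketch might look like this (the package name is a hypothetical placeholder for the R package wrapping your critcpp function):
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

# Load the R package that exposes the Rcpp function (hypothetical name).
importr('yourRpackage')

# Build some toy input in R and call the exported function there.
ro.r('X <- matrix(rnorm(9), 3, 3)')
ro.r('res <- critcpp(X, rnorm(3))')

# Pull the result into Python; it comes back as an rpy2 object.
res = ro.r('res')
print(type(res))  # e.g. rpy2.robjects.vectors.ListVector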
This is for sure a workaround for your question, but it may be considered a reasonable solution for certain applications, even if I do not suggest it for "production".
Even though numba and cython (and especially cython.inline) exist, in some cases it would be interesting to have inline C code in Python.
Is there a built-in way (in Python standard library) to have inline C code?
PS: scipy.weave used to provide this, but it's Python 2 only.
Directly in the Python standard library, probably not. But it's possible to have something very close to inline C in Python with the cffi module (pip install cffi).
Here is an example, inspired by this article and this question, showing how to implement a factorial function in Python + "inline" C:
from cffi import FFI
ffi = FFI()
ffi.set_source("_test", """
    long factorial(int n) {
        long r = n;
        while (n > 1) {
            n -= 1;
            r *= n;
        }
        return r;
    }
""")
ffi.cdef("""long factorial(int);""")
ffi.compile()

from _test import lib  # import the compiled library
print(lib.factorial(10))  # 3628800
Notes:
ffi.set_source(...) defines the actual C source code
ffi.cdef(...) is the equivalent of the .h header file
you can of course add some cleanup code afterwards, if you don't need the compiled library at the end (however, cython.inline does the same and the compiled .pyd files are not cleaned by default, see here)
this quick inline use is particularly useful during a prototyping/development phase. Once everything is ready, you can separate the build (which you do only once) from the rest of the code, which imports the pre-compiled library, as sketched below
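For instance, a sketch of that separation (the file names here are just examples): put the cffi definitions in a small build script that you run once, then only import the compiled module from the main code.
# build_test.py -- run once, e.g. "python build_test.py"
from cffi import FFI

ffi = FFI()
ffi.set_source("_test", """
    long factorial(int n) {
        long r = n;
        while (n > 1) {
            n -= 1;
            r *= n;
        }
        return r;
    }
""")
ffi.cdef("long factorial(int);")

if __name__ == "__main__":
    ffi.compile()

# main.py -- imports the pre-compiled library, no compilation at runtime
from _test import lib

print(lib.factorial(10))  # 3628800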
It seems too good to be true, but it seems to work!
I have a Python script (converted from an IPython Notebook) with
def train_mod_inR():
    get_ipython().magic(u'R -i myinput')
    get_ipython().magic(u'R -o res res <- Rfunction(myinput)')
    return res
I am trying to just run the script using Python without any of the magic, so I am using rpy2. Does anyone know an easy way to translate the function above so it just works with standard rpy2? E.g., is it:
def train_mod_inR():
    rpy2.robjects('-i myinput')
    rpy2.robjects('-o res res <- Rfunction(myinput)')
    return res
In fact, does anyone have a good workaround for this in general? I have a feeling this will require a bit more than what I've shown - is there any command in rpy2 that will directly allow us to evaluate R code the way the magic command does?
You were pretty close.
An exact translation of what is happening in the magic would be
(NOTE: the object "converter" is defined in the next code block to break down what is happening; just assume that it is an rpy2 converter for now):
import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter

# Take the Python object known as "myinput", pass it to R
# (going through the rpy2 conversion layer), and make it known to R
# as "myinput".
with localconverter(converter) as cv:
    ro.globalenv['myinput'] = myinput

# Run the R function known (to R) as "Rfunction", using the object known
# to R as "myinput" as the argument. Make the resulting object known to
# R as "res".
ro.r('res <- Rfunction(myinput)')

# Take the R object "res", pass it to Python (going through the rpy2
# conversion layer), and make it known to Python as "res".
with localconverter(converter) as cv:
    res = ro.globalenv['res']
The conversion can be customized in the R magic (https://rpy2.github.io/doc/v3.1.x/html/interactive.html#module-rpy2.ipython.rmagic), but assuming that you are using the current default it would be (testing whether numpy or pandas are installed is left as an exercise for the reader):
from rpy2.robjects.conversion import Converter

converter = Converter('ipython conversion')
if has_numpy:
    from rpy2.robjects import numpy2ri
    converter += numpy2ri.converter
if has_pandas:
    from rpy2.robjects import pandas2ri
    converter += pandas2ri.converter
Note that conversion can be an expensive step (for example, there is no C-level mapping between R and Python for arrays of strings so they have to be copied over a loop), and depending on how that R result is used elsewhere in the code you may want to go for a lighter conversion (or no conversion).
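For instance, a small sketch of the "no conversion" route (assuming no numpy/pandas conversion has been activated globally): if the result is only handed back to other R calls, it can stay an rpy2 proxy object and never be copied into numpy or pandas.
import rpy2.robjects as ro

# Evaluate R code; with only the default converter active, the result
# comes back as an rpy2 proxy object (no numpy/pandas copy is made).
res = ro.r('res')

# The proxy can be handed straight back to R, still without copying.
ro.globalenv['res_again'] = res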
Additional comment
Now this is if one wants to exactly replicate what is happening in an "R magic" cell in Jupyter. The core part is the execution of the cell, that is, the evaluation of the string:
ro.r('res <- Rfunction(myinput)')
The evaluation of that string can be moved to Python, progressively or all at once. In this case only the latter is possible because there is only one function call. The function known to R as "Rfunction" can be mapped to a Python object that is callable:
rfunction = ro.r('Rfunction')
If you are only interested in having res in Python, the creation of the symbols myinput and res in R's globalenv might no longer be necessary, and the whole thing can be shortened to:
with localconverter(converter) as cv:
    res = rfunction(myinput)
I primarily program in Python (using Jupyter notebooks) but on occasion need to use an R function. I currently do this by using rpy2 and R magic, which works fine.
Now I would like to write a function which will summarize part of my analysis procedure into one wrapper function (so I don't always need to run all of the code cells but can simply execute the function once). As part of this procedure I need to call an R function. I adapted my code to import the R function to Python using the rpy2.robjects interface with importr. This works, but it is extremely slow (more than triple the run time for an already lengthy procedure), which simply makes it not feasible for my analysis. I am assuming this has to do with me accessing R through the high-level interface of rpy2 instead of the low-level interface. I am unsure of how to use the low-level interface within a function call, though, and would need some help adapting my code.
I've tried looking into the rpy2 documentation but am struggling to understand it.
This is my code for executing the R function call from within python using R magic.
Activating rpy2 R magic
%load_ext rpy2.ipython
Load my required libraries
%%R
library(scran)
Actually call the R function
%%R -i data_mat -i input_groups -o size_factors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1)
This is my alternative code to import the R function using rpy2 importr.
from rpy2.robjects.packages import importr
scran = importr('scran')
computeSumFactors = scran.computeSumFactors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min_mean=0.1)
For some reason this second approach is orders of magnitude slower.
Any help would be much appreciated.
The only difference between the two that I can see having an influence on the observed execution speed is conversion.
When running in an "R magic" code cell (prefixed with %%R), in your example the result of calling computeSumFactors() is an R object bound to the symbol size_factors in R. In the other case, the result of calling the function computeSumFactors() will go through the conversion system (and what exactly happens there depends on which converters are active) before the result is bound to the Python symbol size_factors.
Conversion can be costly: you should consider trying to deactivate numpy/pandas conversion (the localconverter context manager can be a convenient way to temporarily use minimal conversion for a code block), as sketched below.
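A minimal sketch of what that could look like (assuming data_mat and input_groups are already rpy2 objects, so that this one call bypasses the numpy/pandas converters):
import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

scran = importr('scran')

# Use only rpy2's default (minimal) conversion for this block, so the
# result stays an rpy2 object instead of being copied into numpy/pandas.
with localconverter(ro.default_converter):
    size_factors = scran.computeSumFactors(
        data_mat, clusters=input_groups, min_mean=0.1)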
In this link: http://earthpy.org/speed.html I found the following:
%%cython
import numpy as np

def useless_cython(year):
    # define types of variables
    cdef int i, j, n
    cdef double a_cum

    from netCDF4 import Dataset
    f = Dataset('air.sig995.'+year+'.nc')
    a = f.variables['air'][:]

    a_cum = 0.
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for n in range(a.shape[2]):
                # here we have to convert numpy value to simple float
                a_cum = a_cum + float(a[i,j,n])

    # since a_cum is not numpy variable anymore,
    # we introduce new variable d in order to save
    # data to the file easily
    d = np.array(a_cum)
    d.tofile(year+'.bin')
    print(year)
    return d
It seems to be as easy as just writing %%cython above the function. However, this just doesn't work for me -> "Statement seems to have no effect", says my IDE.
After a bit of research I found that the %% syntax comes from IPython, which I did install (as well as Cython). It still doesn't work. I am using Python 3.6.
Any ideas?
Once you are in the IPython interpreter, you have to load the extension before using it. This can be done with the statement %load_ext, so in your case:
%load_ext cython
These two tools are pretty well documented; if you have not seen it yet, take a look at the relevant parts of the documentation on the Cython side and on the IPython side.
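After loading the extension, the cell magic from your snippet should work in an IPython/Jupyter session. A toy example of the full sequence (the function here is just an illustration, not your netCDF code):
%load_ext cython

%%cython
# a typed loop that Cython compiles to C
def csum(int n):
    cdef int i
    cdef long s = 0
    for i in range(n):
        s += i
    return s

csum(10)  # 45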
The basic question is this: let's say I was writing R functions which called Python via rPython, and I want to integrate this into a package. That's simple: it's irrelevant that the R function wraps around Python, and you proceed as usual, e.g.
# trivial example
# library(rPython)
add <- function(x, y) {
  python.assign("x", x)
  python.assign("y", y)
  python.exec("result = x+y")
  result <- python.get("result")
  return(result)
}
But what if the Python code within the R functions requires users to import Python libraries first? e.g.
# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))
# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
If you run this without importing the library numpy, you will get an error.
How do you import a Python library in an R package? How do you comment it with roxygen2?
It appears the R standard is this:
# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('import numpy as np')
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
This way, however, each time you run the R function you import an entire Python library.
As @Spacedman and @DirkEddelbuettel suggest, you could add a .onLoad/.onAttach function to your package that calls python.exec to import the modules that will typically always be required by users of your package.
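A minimal sketch of that idea (the file name R/zzz.R is just the conventional place for this; it assumes your package uses rPython):
# hypothetical R/zzz.R: runs once when the package is loaded, so the
# Python modules are imported a single time rather than on every call
.onLoad <- function(libname, pkgname) {
  rPython::python.exec("import numpy as np")
}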
You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of an infinite-regress problem, because you need to import sys in order to perform the test, and (b) the answers to that question suggest that, at least in terms of performance, it shouldn't matter, e.g.:
If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.
(although admittedly there is some quibbling elsewhere on that page about possible scenarios where there could be a performance cost).
But maybe your concern is stylistic rather than performance-oriented ...