QGIS: How to run R directly in python console? - python

I have been using QGIS python console to automate my needs. I've used a few processing algorithms (such as distance matrix) to work on my vector layers which outputs csv files.I need R to work on these files before bringing them back to my python console as variables.
Is there a way I can run R directly through the python console (maybe using packages such as rpy2?)

I guess you can easily interact with a R instance in the QGis python console using rpy2.
Try these following lines of code in the QGIS python console:
>>> import rpy2.rinterface as rinterface
>>> rinterface.set_initoptions((b'rpy2', b'--no-save'))
>>> rinterface.initr()
0
>>> from rpy2.robjects.packages import importr
>>> import rpy2.robjects as robjects
You can now interact with R like this :
>>> robjects.r("""seq(1,12);""")
<IntVector - Python:0x7fa5f6e4abd8 / R:0x769f4a8>
[ 1, 2, 3, ..., 10, 11, 12]
Or import some libraries for example :
>>> rutils = importr("utils")
>>> rgraphics = importr('graphics')
Take a look at the documentation of Rpy2, I have successfully used these methods to run some personals scripts or some libraries installed from the CRAN (running multiple statements in robjects.r("""...""") and grabbing the output in a python variable to use in QGIS).
(If i remember well, on windows I had to set some environnement variables first, like R_HOME or R_USER maybe)
Also, if you haven't seen it, take a look at this page of the QGIS documentation : 17.31. Use R scripts in Processing. It offers a convenient way to use your existing R script with some slight add.

Related

Using Python modules in Swift and Pythonkit

I was looking to get some help or clarification of the limitations of using PythonKit in Swift. Well I say PythonKit, I actually installed the Tensorflow toolchain in Xcode as I couldn't get Pythonkit to work on its own as a single dependancy (MacBook would spin its wheels with fans blasting trying to import numpy).
Anyway I wanted to say its brilliant that I can use Python modules in Swift, makes it much easier to potentially start using swift for more than just iOS apps.
My issue is that I have imported Python modules fine but its not clear how much functionality they will have. I assume ones like numpy will be pretty much the same but as a scientist I use netcdf files a lot so have been trying to use netCDF4. This imports fine and I can load the data object and attributes etc fine but I can't get the actual array out.
Here is an example:
import PythonKit
PythonLibrary.useVersion(3, 7)
let nc = Python.import("netCDF4")
var Data = nc.Dataset("ncfile path")
var lat_z = Data.variables["lat_z"][:]
The [:] is causing an error that is picked up by Xcode, removing it allows the script to run but results in the variable object rather than the array. I can add stuff to the end to get the attributes etc e.g. lat_z.long_name but not sure of how to extract the array without using [:]
I am hoping this is just a syntax difference that I need to learn with swift (very much early days with using it) or is it a limitation of the PythonKit? I have not found anyone atcually using netcdf4 (examples are mostly numpy and Matplotlib) If so are there some general limitations with using python modules in swift?
I am also trying to get Matplotlib to work but am pretty sure thats due to using an commandline tool project in Xcode which hasn't got a view so makes sense it can't show me an image.
Any pointers and maybe links to upto date documentation would be great there seems to be some changes that have occurred e.g. import PythonKit rather than import Python.
Many Thanks
You can use the count property on a python iterable, which is equivalent to len. You can index Numpy array in two ways: (i) with Swift range syntax and (ii) with Numpy range objects:
import Foundation
import PythonKit
let np = Python.import("numpy")
let array = np.array([1, 2, 3, 4, 5])
print(array) // [1, 2, 3, 4, 5]
let subArray = array[0..<array.count]
print(subArray) // [1, 2, 3, 4, 5]
let subArray2 = array[np.arange(0, 2)]
print(subArray2) // [1, 2]
// Swift equivalent of Python ":"
let subArray3 = array[...]
You can also convert numpy arrays to Swift arrays and use Swift methods and subscripts:
let swiftArray = Array(array)
let swiftSubArray = swiftArray[0..<3]
print(swiftSubArray) // [1, 2, 3]
Note that you should prefer using Python.len(...) over the count property while working with PythonObjects because count will incur performance penalty because of the implementation of PythonKit that does not automatically conforms Python Object to RandomAccessCollection (thus count is O(n)).

Improve Runtime for calling R function in python function using rpy2

I primarily program in python (using jupyter notebooks) but on occasion need to use an R function. I currently do this by using rpy2 and R magic, which works fine. Now I would like to write a function which will summarize part of my analysis procedure into one wrapper function (so I don't always need to run all of the code cells but can simply execute the function once). As part of this procedure I need to call an R function. I adapted my code to import the R function to python using the rpy2.robjects interface with importr. This works but is extremely slow (more than triple the run time for an already lengthy procedure) which makes this simply not feasible analysiswise. I am assuming this has to do with me accessing R through the high-level interface of rpy2 instead of the low-level interface. I am unsure of how to use the low-level interface within a function call though and would need some help adapting my code.
I've tried looking into the rpy2 documentation but am struggling to understand it.
This is my code for executing the R function call from within python using R magic.
Activating rpy2 R magic
%load_ext rpy2.ipython
Load my required libaries
%%R
library(scran)
Actually call the R function
%%R -i data_mat -i input_groups -o size_factors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1)
This is my alternative code to import the R function using rpy2 importr.
from rpy2.robjects.packages import importr
scran = importr('scran')
computeSumFactors = scran.computeSumFactors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min_mean=0.1)
For some reason this second approach is orders of magnitude slower.
Any help would be much apreciated.
The only difference between the two that I would see have an influence on the observe execution speed is conversion.
When running in an "R magic" code cell (prefixed with %%R), in your example the result of calling computeSumFactors() is an R object bound to the symbol size_factors in R. In the other case, the result of calling the function computeSumFactors() will go through the conversion system (and there what exactly happens depends on what are the active converters you have) before the result is bound to the Python symbol size_factors.
Conversion can be costly: you should consider trying to deactivate numpy / pandas conversion (the localconverter context manager can be a convenient way to temporarily use minimal conversion for a code block).

VSCode Itellisense with python C extension module (petsc4py)

I'm currently using a python module called petsc4py (https://pypi.org/project/petsc4py/). My main issue is that none of the typical intellisense features seems to work with this module.
I'm guessing it might have something to do with it being a C extension module, but I am not sure exactly why this happens. I initially thought that intellisense was unable to look inside ".so" files, but it seems that numpy is able to do this with the array object, which in my case is inside a file called multiarray.cpython-37m-x86_64-linux-gnu (check example below).
Does anyone know why I see this behaviour in the petsc4py module. Is there anything that I (or the developers of petsc4py) can do to get intellisense to work?
Example:
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
x_p = PETSc.Vec().create()
x_p.setSizes(10)
x_p.setFromOptions()
u_p = x_p.duplicate()
import numpy as np
x_n = np.array([1,2,3])
u_n = x_n.copy()
In this example, when trying to work with a Vec object from petsc4py, doing u_p.duplicate() cannot find the function and the suggestion is simply a repetition of the function immediately before. However, using an array from numpy, doing u_n.copy() works perfectly.
If you're compiling in-place then you're bumping up against https://github.com/microsoft/python-language-server/issues/197.

How do you import a Python library within an R package using rPython?

The basic question is this: Let's say I was writing R functions which called python via rPython, and I want to integrate this into a package. That's simple---it's irrelevant that the R function wraps around Python, and you proceed as usual. e.g.
# trivial example
# library(rPython)
add <- function(x, y) {
python.assign("x", x)
python.assign("y", y)
python.exec("result = x+y")
result <- python.get("result")
return(result)
}
But what if the python code with R functions require users to import Python libraries first? e.g.
# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))
# R function that call Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
python.assign("degree", degree)
python.exec('result = np.sin(np.deg2rad(degree))')
result <- python.get('result')
return(result)
}
If you run this without importing the library numpy, you will get an error.
How do you import a Python library in an R package? How do you comment it with roxygen2?
It appears the R standard is this:
# R function that call Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
python.assign("degree", degree)
python.exec('import numpy as np')
python.exec('result = np.sin(np.deg2rad(degree))')
result <- python.get('result')
return(result)
}
Each time you run an R function, you will import an entire Python library.
As #Spacedman and #DirkEddelbuettel suggest you could add a .onLoad/.onAttach function to your package that calls python.exec to import the modules that will typically always be required by users of your package.
You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of a regression problem because you need to import sys in order to perform the test, (b) the answers to that question suggest that at least in terms of performance, it shouldn't matter, e.g.
If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.
(although admittedly there is some quibblingdiscussion elsewhere on that page about possible scenarios where there could be a performance cost).
But maybe your concern is stylistic rather than performance-oriented ...

fitdistr in rpy2

I've a 1D list of data, that I want to fit into a distribution using either least squares or maximum likelihood, as presented here, but I want to do it from python instead of the R interactive shell.
I got rpy2 installed, and would like to use the fitdistr function from within the interactive ipython shell, as I have imported the data in a list.
Where is this function, and how do I use it?
The function is in the R package MASS
from rpy2.robjects.packages import importr
MASS = importr('MASS')
# the function is now at MASS.fitdistr

Categories