Unable to access Anaconda Environment in R w/ Reticulate - python

I am having issues selecting a particular Anaconda environment with the reticulate package in R.
So far I have succeeded in getting Python working with R on my machine using the code below (in R Markdown):
```{r setup, include=FALSE}
library(reticulate)
use_python("C:/Anaconda3/python.exe")
```
That's great and all but ideally I would like to use the particular environments that I've created previously within Anaconda. So, this is what I've tried along with the error message:
```{r setup, include=FALSE}
library(reticulate)
use_condaenv(condaenv = 'PFDA', conda = "C:/Anaconda3/Library/bin/conda")
```
Error: Specified conda binary 'C:/Anaconda3/Library/bin/conda' does not exist.
I have been referencing this documentation here and it suggests the code should go something like this:
use_condaenv(condaenv = "r-nlp", conda = "/opt/anaconda3/bin/conda")
However, my Anaconda3 folder does not have a subfolder named bin.
There's obviously something I am doing wrong here. Potentially, I am not sending R in the right direction to locate conda? Hopefully, someone can point me in the right direction as I've been really excited to implement this in my coding life.
My session specs:
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] reticulate_1.13
loaded via a namespace (and not attached):
[1] compiler_3.6.2 Matrix_1.2-18 tools_3.6.2
[4] yaml_2.2.0 Rcpp_1.0.3 grid_3.6.2
[7] knitr_1.26 jsonlite_1.6 xfun_0.11
[10] png_0.1-7 lattice_0.20-38
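A quick way to check where the conda launcher actually lives is to probe the usual locations under the install root. The sketch below is in Python and assumes that a Windows Anaconda install keeps the launcher at Scripts/conda.exe or condabin/conda.bat rather than under bin. The name find_conda and the candidate list are illustrative, not part of reticulate.

```python
from pathlib import Path

# Assumed locations of the conda launcher, relative to the install root.
# The first two are typical on Windows, the last on Linux and macOS.
CANDIDATES = ("Scripts/conda.exe", "condabin/conda.bat", "bin/conda")


def find_conda(root):
    """Return the first existing conda launcher under root, or None."""
    root = Path(root)
    for rel in CANDIDATES:
        candidate = root / rel
        if candidate.is_file():
            return candidate
    return None


# Example: prints the launcher path if one is found, otherwise None.
print(find_conda("C:/Anaconda3"))
```

If this finds a launcher, that path is a reasonable candidate to try as the conda argument of use_condaenv.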

Related

Problem with accessing virtual environment Python from R Markdown

Note: I'm on Windows using Git Bash.
So, I am trying to set up a dev environment for my class work. It is going to involve a combination of coding in R and Python.
I created virtual environments using pipenv and virtualenv and ran into the same problem with both. So, first, let's create a virtual environment for the project in a sub-folder dev_env:
cd project_folder/dev_env
pipenv --python 3.7
pipenv --py
Output
C:\Users\Ra Me\.virtualenvs\dev_env-5TUtSZI9\Scripts\python.exe
Now I'm going into my file.rmd and trying the reticulate package.
#install.packages("reticulate")
library(reticulate)
Next, I tried 2 methods:
Sys.setenv(RETICULATE_PYTHON = "C:/Users/Ra Me/.virtualenvs/dev_env-5TUtSZI9/Scripts")
or
use_virtualenv("C:/Users/Ra Me/.virtualenvs/dev_env-5TUtSZI9/", required = TRUE)
x = 1
if x:
    print('Hello!')
Both of them produced the error
Fatal Python error: initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'
However, when I change the path to the Python environment that's installed for all users on my machine, it works.
Sys.setenv(RETICULATE_PYTHON = "C:/Program Files/Python37/")
The following method also works. However, here we are not even using the reticulate package.
knitr::opts_chunk$set(engine.path = list(
python = "C:/Program Files/Python37/python.exe",
r = "C:/Program Files/R/R-3.6.1/bin/R.exe"
))
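One thing that may be worth checking: both attempts above pass a directory (the Scripts folder, or the environment root), whereas the interpreter itself sits one level further down. The sketch below is in Python and only builds the expected interpreter path for a virtualenv, assuming the standard layout (Scripts/python.exe on Windows, bin/python elsewhere). The name venv_python is hypothetical, and whether pointing RETICULATE_PYTHON at that file resolves the encodings error here is untested.

```python
import os
from pathlib import Path


def venv_python(venv_dir, windows=(os.name == "nt")):
    """Return the expected interpreter path inside a virtualenv directory."""
    venv_dir = Path(venv_dir)
    if windows:
        # Standard virtualenv layout on Windows.
        return venv_dir / "Scripts" / "python.exe"
    # Standard virtualenv layout on Linux and macOS.
    return venv_dir / "bin" / "python"


# Example with the environment from the question, forcing the Windows layout.
print(venv_python("C:/Users/Ra Me/.virtualenvs/dev_env-5TUtSZI9", windows=True))
```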

Cannot import geopandas using reticulate in RStudio when knitting with knitr

I am trying to knit an Rmd file using reticulate and Python inside of a virtualenv.
The following is my R set up chunk:
```{r r-setup}
library(reticulate)
venv_path <- "/path/to/venv/"
use_virtualenv(venv_path, required = TRUE)
```
This works as expected. However, the next step breaks when I try to import geopandas:
```{python}
import geopandas as gpd
```
The traceback is as follows:
Error in py_module_import... OSError: Could not find lib c or load any variants...
The traceback points to the shapely package, at the line from shapely.geometry import shape, Point. Other Python libraries load without issue within the chunk, e.g. import os.
From these messages, I'm guessing that it is not loading the OGR/GDAL bindings. However, I'm not sure how to solve this.
import geopandas runs without error when I run the chunk inside the notebook, i.e. without knitting. It also works within the repl_python() shell of my project. So the issue seems to lie principally with knitr and knitting.
My RStudio version is: 1.1.456.
The output of sessionInfo() is:
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin17.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reticulate_1.10 stringr_1.3.1 dplyr_0.7.6 ggplot2_3.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 pillar_1.3.0 compiler_3.5.1 plyr_1.8.4
[5] bindr_0.1.1 tools_3.5.1 digest_0.6.17 packrat_0.4.9-3
[9] jsonlite_1.5 evaluate_0.11 tibble_1.4.2 gtable_0.2.0
[13] lattice_0.20-35 pkgconfig_2.0.2 rlang_0.2.2 Matrix_1.2-14
[17] yaml_2.2.0 bindrcpp_0.2.2 withr_2.1.2 knitr_1.20
[21] rprojroot_1.3-2 grid_3.5.1 tidyselect_0.2.4 glue_1.3.0
[25] R6_2.2.2 rmarkdown_1.10 purrr_0.2.5 magrittr_1.5
[29] scales_1.0.0 backports_1.1.2 htmltools_0.3.6 assertthat_0.2.0
[33] colorspace_1.3-2 stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0
[37] crayon_1.3.4
I managed to solve this by removing the DYLD_FALLBACK_LIBRARY_PATH environment variable, which points to my Homebrew-installed R libraries.
The solution was within a python chunk as follows:
```{python}
import os
FALLBACK_PATH = {"DYLD_FALLBACK_LIBRARY_PATH" : "/usr/local/Cellar/r/3.5.1/lib/R/lib"}
del os.environ["DYLD_FALLBACK_LIBRARY_PATH"]
import geopandas
# Reset the environmental variable.
os.environ.update(FALLBACK_PATH)
```
I'm not sure if this is the cleanest solution, but it works. I'm also not sure whether this problem is specific to macOS.
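The same idea can be written so that the variable is restored automatically and nothing breaks when it is not set. This is a minimal Python sketch. The helper name without_env is made up for illustration, and it restores whatever value was present rather than a hard-coded path.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env(name):
    """Temporarily remove an environment variable, restoring it on exit."""
    # pop() with a default avoids a KeyError when the variable is not set.
    saved = os.environ.pop(name, None)
    try:
        yield saved
    finally:
        # Only restore if there was a value to begin with.
        if saved is not None:
            os.environ[name] = saved


# Intended use inside the Python chunk (not run here):
# with without_env("DYLD_FALLBACK_LIBRARY_PATH"):
#     import geopandas
```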

How to import newly compiled python module?

I have compiled lightgbm with GPU support for python from sources following this guide http://lightgbm.readthedocs.io/en/latest/GPU-Windows.html
Test usage from the console was successful:
C:\github_repos\LightGBM\examples\binary_classification>"../../lightgbm.exe" config=train.conf data=binary.train valid=binary.test objective=binary device=gpu
[LightGBM] [Warning] objective is set=binary, objective=binary will be ignored. Current value: objective=binary
[LightGBM] [Warning] data is set=binary.train, data=binary.train will be ignored. Current value: data=binary.train
[LightGBM] [Warning] valid is set=binary.test, valid_data=binary.test will be ignored. Current value: valid=binary.test
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Loading weights...
Then I tried to import it in Python with no luck. It imports the Anaconda version, which has no GPU support:
from sklearn.datasets import load_iris
iris = load_iris()
import lightgbm as lgb
lgtrain = lgb.Dataset(iris.data, iris.target)
lgb_clf = lgb.train(
    {
        'objective' : 'regression',
        'metric' : 'rmse',
        'num_leaves' : 350,
        #'max_depth': 14,
        'learning_rate' : 0.017,
        'feature_fraction' : 0.5,
        'bagging_fraction' : .8,
        'verbosity' : -1 ,
        'device' : 'gpu'
    },
    lgtrain,
    num_boost_round=3500,
    verbose_eval=100
)
LightGBMError: b'GPU Tree Learner was not enabled in this build. Recompile with CMake option -DUSE_GPU=1'
I believe I have to specify the location but how?
I think this might not be specific to lightGBM, but rather a problem with Anaconda's virtual environment. When working within the Anaconda virtual env, your system paths are modified to point to Anaconda installation directories.
As you point out, this leads to Anaconda loading its own version, rather than the external version you configured, compiled and tested.
There are several ways to force Anaconda to find your package, see this related discussion.
The suggestions that involve running ln -s are only for Linux and Mac, but you can do something similar in Windows.
You could start by uninstalling the Anaconda version of lightGBM, then create a copy of the custom-compiled version within the Anaconda path. You can discover the directories Anaconda searches using
import sys
sys.path
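To see which copy of a package Python would actually pick up, the standard library can report the file behind a module name without importing it. The sketch below uses importlib.util.find_spec. The helper name module_origin is illustrative, and json is used only as a stand-in for lightgbm so the example runs anywhere.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Replace "json" with "lightgbm" to see whether the Anaconda copy
# or the custom-compiled copy would be loaded.
print(module_origin("json"))
```

If the printed path lies inside the Anaconda site-packages directory, the import is still resolving to the build without GPU support.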
Remove the previously installed Python package with the following command:
pip uninstall lightgbm
or
conda uninstall lightgbm
After that, navigate to the Python package directory and install it with the library file you compiled:
cd LightGBM/python-package
python setup.py install --precompile

Cannot load R packages on Azure Batch nodes

I am having difficulty loading packages into R on my compute pool nodes using the Azure Batch Python API. The code that I am using is similar to what is provided in the Azure Batch Python SDK Tutorial, except the task is more complicated -- I want each node in the job pool to execute an R script which requires certain package dependencies.
Hence, in my start task commands below, I have each node (Canonical UbuntuServer SKU: 16) install R via apt and install R package dependencies (the reason why I added R package installation to the start task is that, even after creating a lib directory ~/Rpkgs with universal permissions, running install.packages(list_of_packages, lib="~/Rpkgs/", repos="http://cran.r-project.org") in the task script leads to "not writable" errors.)
task_commands = [
'cp -p {} $AZ_BATCH_NODE_SHARED_DIR'.format(_R_TASK_SCRIPT),
# Install pip
'curl -fSsL https://bootstrap.pypa.io/get-pip.py | python',
# Install the azure-storage module so that the task script can access Azure Blob storage, pre-cryptography version
'pip install azure-storage==0.32.0',
# Install R
'sudo apt -y install r-base-core',
'mkdir ~/Rpkgs/',
'sudo chown _azbatch:_azbatchgrp ~/Rpkgs/',
'sudo chmod 777 ~/Rpkgs/',
# Install R package dependencies
# *NOTE*: the double escape below is necessary because Azure strips the backslash
'printf "install.packages( c(\\"foreach\\", \\"iterators\\", \\"optparse\\", \\"glmnet\\", \\"doMC\\"), lib=\\"~/Rpkgs/\\", repos=\\"https://cran.cnr.berkeley.edu\\")\n" > ~/startTask.txt',
'R < startTask.txt --no-save'
]
Anyhow, I confirmed in the Azure portal that these packages installed as intended on the compute pool nodes (you can see them located at startup/wd/Rpkgs/, a.k.a. ~/Rpkgs/, in the node filesystem). However, while the _R_TASK_SCRIPT task was successfully added to the job pool, it terminated with a non-zero exit code because it wasn't able to load any of the packages (e.g. foreach, iterators, optparse, etc.) that had been installed in the start task.
More specifically, the _R_TASK_SCRIPT contained the following R code and returned the following output:
R code:
lapply( c("iterators", "foreach", "optparse", "glmnet", "doMC"), require, character.only=TRUE, lib.loc="~/Rpkgs/")
...
R stderr, stderr.txt on Azure Batch node:
Loading required package: iterators
Loading required package: foreach
Loading required package: optparse
Loading required package: glmnet
Loading required package: doMC
R stdout, stdout.txt on Azure Batch node:
[[1]]
[1] FALSE
[[2]]
[1] FALSE
[[3]]
[1] FALSE
[[4]]
[1] FALSE
[[5]]
[1] FALSE
FALSE above indicates that it was not able to load the R package. This is the issue I'm facing, and I'd like to figure out why.
It may be noteworthy that, when I spin up a comparable VM (Canonical UbuntuServer SKU: 16) and run the same installation manually, it successfully loads all packages.
myusername#rnode:~$ pwd
/home/myusername
myusername#rnode:~$ mkdir ~/Rpkgs/
myusername#rnode:~$ printf "install.packages( c(\"foreach\", \"iterators\", \"optparse\", \"glmnet\", \"doMC\"), lib=\"~/Rpkgs/\", repos=\"http://cran.r-project.org\")\n" > ~/startTask.txt
myusername#rnode:~$ R < startTask.txt --no-save
myusername#rnode:~$ R
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
...
> lapply( c("iterators", "foreach", "optparse", "glmnet", "doMC"), require, character.only=TRUE, lib.loc="~/Rpkgs/")
Loading required package: iterators
Loading required package: foreach
...
Loading required package: optparse
Loading required package: glmnet
Loading required package: Matrix
Loaded glmnet 2.0-10
Loading required package: doMC
Loading required package: parallel
[[1]]
[1] TRUE
[[2]]
[1] TRUE
[[3]]
[1] TRUE
[[4]]
[1] TRUE
[[5]]
[1] TRUE
Thanks in advance for your help and suggestions.
Each task runs in its own working directory, which is referenced by the environment variable $AZ_BATCH_TASK_WORKING_DIR. When the R session runs, the current R working directory [ getwd() ] will be $AZ_BATCH_TASK_WORKING_DIR, not $AZ_BATCH_NODE_STARTUP_DIR, where the packages live.
To get the exact package location ("startup/wd/Rpkgs") in the R code:
lapply( c("iterators", "foreach", "optparse", "glmnet", "doMC"), require,
character.only=TRUE, lib.loc=paste0(Sys.getenv("AZ_BATCH_NODE_STARTUP_DIR"),
"/wd/", "Rpkgs") )
or run this before the lapply call:
setwd(paste0(Sys.getenv("AZ_BATCH_NODE_STARTUP_DIR"), "/wd/"))
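Since the job here is driven from the Azure Batch Python API, the same path can also be built on the Python side, for instance in a wrapper script that runs on the node. This is a rough sketch: the helper name rpkgs_dir is hypothetical, and the directory value in the example is a made-up placeholder rather than a real node path.

```python
import posixpath


def rpkgs_dir(env, subdir="Rpkgs"):
    """Build the R library path under the start task's working directory."""
    # posixpath keeps forward slashes, since the nodes run Ubuntu.
    return posixpath.join(env["AZ_BATCH_NODE_STARTUP_DIR"], "wd", subdir)


# On a node one would pass os.environ; a placeholder dict is used here.
example_env = {"AZ_BATCH_NODE_STARTUP_DIR": "/hypothetical/startup"}
print(rpkgs_dir(example_env))  # /hypothetical/startup/wd/Rpkgs
```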
Added: You can also create a Batch pool of Azure data scientist virtual machines that has R already installed so you don't have to install it yourself.
Azure Batch also has the doAzureParallel R package, which supports package installation.
Here's a link: https://github.com/Azure/doAzureParallel (Disclaimer: I created the doAzureParallel R package)
It seems to be caused by the installed packages not being in R's default library paths. Try setting the library trees in which packages are looked for by adding .libPaths("~/Rpkgs") before loading the packages.
For reference, see the SO thread Changing R default library path using .libPaths in Rprofile.site fails to work.
Meanwhile, an official blog post introduces how to run R workloads on Azure Batch, but for a Windows environment. Hope it helps.

R igraph crashes R session when loading graphml file created by python igraph

I'm in a slightly strange situation: I have to create an igraph object in Python and post-process it in R.
I found the rPython package on CRAN, which allows me to execute Python code in an R session.
In R, I use the following code to generate a python-igraph object and save it in the graphml format.
library(rPython)
library(igraph)
python.exec(c("from igraph import *",
"g = Graph(directed = True)",
"g.add_vertices(3)",
"g.add_edges([(0,1),(1,2)])",
"fn = './test.graphml'",
"g.write_graphml(fn)"))
Now, if I try to load the file test.graphml in the very same R session with
g <- read.graph("./test.graphml", format = "graphml")
That session just crashes and returns the following error messages:
*** caught segfault ***
address 0x59, cause 'memory not mapped'
Traceback:
1: .Call("R_igraph_read_graph_graphml", file, as.numeric(index), PACKAGE = "igraph")
2: read.graph.graphml(file, ...)
3: read.graph("./test.graphml", format = "graphml")
aborting ...
Segmentation fault (core dumped)
If I start a new R session, I'm able to load the previously saved test.graphml with
g <- read.graph("./test.graphml", format = "graphml")
My python-igraph version is 0.7.0, and the R sessionInfo() is:
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] igraph_0.7.0 rPython_0.0-5 RJSONIO_1.3-0
loaded via a namespace (and not attached):
[1] tools_3.1.2
NOTE: I only have this issue when running Ubuntu 14.04, where I couldn't find the latest version (0.7.1) of python-igraph. I've tested the same code on Mac OS X, where both my Python (via MacPorts) and R igraph versions are 0.7.1, and it works just fine. I'm wondering if this can be resolved by simply installing version 0.7.1 of python-igraph.
