How to use R and python in a Kaggle Notebook?


I would like to use both R and Python inside a Kaggle Kernel. However, when running
!pip install rpy2
inside a Kaggle Notebook, I got the following error:
Error: rpy2 in API mode cannot be built without R in the PATH or R_HOME defined. Correct this or force ABI mode-only by defining the environment variable RPY2_CFFI_MODE=ABI
I've found a solution for using Python from within R, but no solution has been provided yet for calling R from Python in a Kaggle Kernel.
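The error says rpy2's API-mode build needs R on the PATH or R_HOME defined. A minimal sketch for checking that from Python before running pip, assuming a Unix-like system where `R RHOME` prints R's home directory; `find_r_home` is a hypothetical helper name, not part of rpy2:

```python
import os
import shutil
import subprocess
from typing import Optional


def find_r_home() -> Optional[str]:
    """Return the directory rpy2 would use as R_HOME, or None if R cannot be found."""
    # An explicit R_HOME takes precedence, as rpy2's error message suggests.
    explicit = os.environ.get("R_HOME")
    if explicit:
        return explicit
    # Otherwise look for the R executable on the PATH.
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    # Assumption: `R RHOME` prints the home directory of that R installation.
    result = subprocess.run([r_exe, "RHOME"], capture_output=True, text=True)
    home = result.stdout.strip()
    return home or None


if __name__ == "__main__":
    print(find_r_home() or "R not found: install R first, e.g. r-base via conda")
```

If this prints "R not found", installing rpy2 with pip will fail with the error above.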

Note that a Kaggle Kernel runs on top of an Anaconda environment; for example, the Python interpreter is
/opt/conda/bin/python3.7
R must therefore be installed in this conda environment. You can use the subprocess module to run the following command, which installs R:
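import subprocess
# -y skips conda's interactive confirmation prompt; check=True raises if the install fails
subprocess.run('conda install -y -c conda-forge r-base', shell=True, check=True)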
and then install the corresponding rpy2 package:
!pip install rpy2
I have provided a notebook on Kaggle with a complete explanation. I'd appreciate your comments.
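Both install steps can also be scripted from a single Python cell. A sketch, assuming `conda` is on the PATH (as it is under /opt/conda on Kaggle); pip is invoked through `sys.executable` so rpy2 is installed for the same interpreter the notebook runs. `install_r_and_rpy2` is a hypothetical helper, and `dry_run` defaults to True so nothing is installed unless you ask for it:

```python
import subprocess
import sys
from typing import List


def install_r_and_rpy2(dry_run: bool = True) -> List[List[str]]:
    """Install R (conda-forge r-base) and then rpy2; return the commands used."""
    commands = [
        # Step 1: R itself, into the notebook's conda environment (-y skips the prompt).
        ["conda", "install", "-y", "-c", "conda-forge", "r-base"],
        # Step 2: rpy2, which can now find the R installed in step 1.
        [sys.executable, "-m", "pip", "install", "rpy2"],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops before the pip step if the conda step fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in install_r_and_rpy2(dry_run=True):
        print(" ".join(cmd))
```

Calling `install_r_and_rpy2(dry_run=False)` actually runs the two commands in order.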

Related

psutil library installation issue on databricks

I am using the psutil library on my Databricks cluster, and it was running fine for the last couple of weeks. When I started the cluster today, this library failed to install. I noticed that a different version of psutil had been published on the site.
Currently my Python script fails with 'No module psutil'.
I tried installing a previous version of psutil using pip install, but my code still fails with the same error.
Is there any alternative to psutil, or is there a way to install it on Databricks?

As far as I know, there are two ways to install a Python package on an Azure Databricks cluster.
First, go to the Libraries tab of your cluster, click the Install New button, type the name of the package you want to install, and wait for it to install successfully.
Second, open a notebook and run the shell command below to install the package via pip. Note: to install into the Databricks cluster's current Python environment rather than the Linux system environment, you must use /databricks/python/bin/pip, not just pip.
%sh
/databricks/python/bin/pip install psutil
Finally, I ran the code below; it works with both methods.
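import os
import psutil
# List every running process with its PID and name
for proc in psutil.process_iter(attrs=['pid', 'name']):
    print(proc.info)
# Check that a PID exists; here the notebook's own PID, which appears in the list above
print(psutil.pid_exists(os.getpid()))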
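The original symptom was 'No module psutil'. A small sketch for checking, from a notebook cell, whether a package is importable and which version the driver sees. It uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`, Python 3.8+) and assumes the import name matches the distribution name, which holds for psutil but not for every package:

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing or has no metadata."""
    if importlib.util.find_spec(package) is None:
        return None  # the situation behind "No module named ..."
    try:
        # Assumption: the distribution name equals the import name.
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module).
        return None


if __name__ == "__main__":
    print("psutil:", installed_version("psutil"))
```

Running this after each install method shows whether the package actually landed in the environment the notebook uses.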

In addition to @Peter's response, you can also use the Library utilities to install Python libraries.
Library utilities allow you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in UDFs. This enables:
Library dependencies of a notebook to be organized within the notebook itself.
Notebook users with different library dependencies to share a cluster without interference.
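Example: to install the psutil library using the library utilities:
dbutils.library.installPyPI("psutil")
Note that dbutils.library.installPyPI has been removed in newer Databricks Runtime versions (11.0 and above); there, running %pip install psutil in a notebook cell does the same job.
Reference: Databricks - library utilities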
Hope this helps.

R kernel crashes while loading R package using rpy2

First of all, I’m new to rpy2 / Jupyter, so please don’t judge me if this isn’t the right place to ask my question.
I am trying to set up an integrated workflow for data analysis using R and Python and I encounter the following error:
I am on Ubuntu 19.04, running a conda environment with Jupyter 1.0.0, Python 3.7.4, R 3.5.1, r-irkernel 1.0.2 and rpy2 3.1.0, and I installed the R package Seurat through R.
When I create a Jupyter notebook using the R-kernel, I can load Seurat with library(Seurat) just fine.
I can also run R code from Python using rpy2 and the R magic, for example:
%load_ext rpy2.ipython
%%R
data(allen, package = 'scRNAseq')
adata_allen <- as(allen, 'SingleCellExperiment')
However, when I try to load Seurat using rpy2, the kernel crashes:
%%R
library(Seurat)
And I get the following message:
Kernel Restarting
The kernel appears to have died. It will restart automatically
Jupyter prints the following message on the command line:
[I 16:39:01.388 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel 23284ec0-63d5-4b61-9ffa-b52d19851eab restarted
Note that other libraries such as library(dplyr) load just fine using rpy2.
My complete conda environment can be found in the attached text file.
I just can’t seem to figure out what is causing the problem. Is there a way to get a more verbose error message from Jupyter?
Your help would be greatly appreciated!
Regards Felix

The R package Seurat uses another R package called reticulate, which provides a bridge from R to Python.
Unfortunately, whenever both rpy2 and reticulate are involved, R ends up being initialized twice, which inevitably results in a segfault. This is still an open bug at the time of writing. The issue is tracked on the rpy2 side here (it links to the corresponding reticulate issue):
https://bitbucket.org/rpy2/rpy2/issues/456/reticulate-rpy2-sharing-r-process
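Because the crash is a segfault inside the kernel process, Jupyter cannot show a Python traceback. One way to get more information without losing the kernel (a sketch, not something proposed in the thread) is to run the failing import in a child interpreter with Python's `-X faulthandler` option, which dumps the Python stack on a fatal signal. On POSIX, a negative return code is the number of the signal that killed the child (-11 for SIGSEGV). `probe_in_subprocess` is a hypothetical helper name:

```python
import subprocess
import sys


def probe_in_subprocess(code: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter so a crash cannot kill the notebook kernel."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    result = probe_in_subprocess("import rpy2.robjects as ro; ro.r('library(Seurat)')")
    # On POSIX, a negative return code means the child died from that signal number.
    print("return code:", result.returncode)
    # faulthandler writes its stack dump to stderr; show the tail of it.
    print(result.stderr[-2000:])
```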

I had the same problem. Downgrading to Seurat 3.0.2 fixed it for me. To make rpy2 use the R installation from your conda environment, run the following code at the very beginning (before importing rpy2):
# user defined R installation
import os
os.environ['R_HOME'] = '/path/to/miniconda/envs/seurat/lib/R' #path to your R installation
os.environ['R_USER'] = '/path/to/miniconda/lib/python3.7/site-packages/rpy2' #path depends on where you installed Python.
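R_HOME has to point at the right place before rpy2 is first imported. A sketch that derives it from the active conda environment instead of hard-coding the path, assuming conda's r-base layout on Linux/macOS (`<env prefix>/lib/R`, matching the paths used in this thread) and falling back from `CONDA_PREFIX` to `sys.prefix`. `conda_r_home` is a hypothetical helper; R_USER can still be set as shown above if you need it:

```python
import os
import sys
from pathlib import Path
from typing import Optional


def conda_r_home(prefix: Optional[str] = None) -> Optional[str]:
    """Return <prefix>/lib/R if it exists, else None (assumes conda's Linux/macOS layout)."""
    base = Path(prefix or os.environ.get("CONDA_PREFIX") or sys.prefix)
    candidate = base / "lib" / "R"
    return str(candidate) if candidate.is_dir() else None


# Must run before the first `import rpy2...` in the session.
r_home = conda_r_home()
if r_home is not None:
    os.environ["R_HOME"] = r_home
```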

This worked for me when the kernel died while importing robjects from rpy2:
import os
os.environ['R_HOME'] = '/Users/<your user>/anaconda3/envs/<env name>/lib/R'
# import your desired module
from rpy2.robjects.packages import importr

I had the same problem; I am also using R and Python in a Jupyter notebook, inside Docker.
I solved the Kernel crash issue by starting my notebook or Python code with this:
import os
os.environ['R_HOME'] = '/usr/lib/R'
/usr/lib/R is where my system's R installation and libraries live, and it should be an R version that rpy2 supports. Hope this helps.
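Pointing R_HOME at a directory that is not an R home can itself break R's initialization. A sketch that picks the first candidate containing R's launcher script `bin/R` (the Unix layout). The candidate list is an assumption: /usr/lib/R is the Debian/Ubuntu system location used in this answer, /usr/local/lib/R is typical of R built from source, so adjust it for your image. `pick_r_home` is a hypothetical name:

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Assumed locations; edit these for your own Docker image.
DEFAULT_CANDIDATES = ("/usr/lib/R", "/usr/local/lib/R")


def pick_r_home(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate that looks like an R home (contains bin/R), else None."""
    for candidate in candidates:
        if (Path(candidate) / "bin" / "R").is_file():
            return candidate
    return None


r_home = pick_r_home()
if r_home is not None:
    os.environ["R_HOME"] = r_home  # set before importing rpy2
```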

I tried to install rpy2 in the jupyter/r-notebook:hub-2.3.1 Docker image, which comes with Python 3.10.5, IPython 8.4.0, and R 4.1.3.
If I install rpy2 in a Terminal window with pip:
python3 -m pip install rpy2
and then start IPython in the Terminal and type import rpy2, this first step works. But the next step, import rpy2.robjects as robjects, fails with the following not-very-instructive error message:
Error in glue(.Internal(R.home()), "library", "base", "R", "base", sep = .Platform$file.sep) :
4 arguments passed to .Internal(paste) which requires 3
Error: could not find function "attach"
Error: object '.ArgsEnv' not found
Fatal error: unable to initialize the JIT
The reason is some subtle incompatibility between the rpy2 package on PyPI and the Python and R installations in the jupyter/r-notebook image. The incompatibility occurs because Python and R were installed using Conda in the r-notebook image.
If I install rpy2 also with Conda, like this:
conda install --yes rpy2
then everything works as advertised.
Lessons learned
If Python and R were installed with OS package installers, then you can probably install rpy2 with pip.
If Python and R were installed with Conda, then install rpy2 also with Conda.
(the most embarrassing bit): There is a jupyter/datascience-notebook image that comes with rpy2 preinstalled (plus a lot of other goodies), so there is no need to install anything:
jupyter/datascience-notebook includes libraries for data analysis from the Julia, Python, and R communities:
Everything in the jupyter/scipy-notebook and jupyter/r-notebook images, and their ancestor images
rpy2 package
The Julia compiler and base environment
IJulia to support Julia code in Jupyter notebooks
HDF5, Gadfly, RDatasets packages
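The first two lessons can be checked programmatically. A rough sketch, assuming that a `conda-meta` directory under the interpreter's prefix means this Python (and presumably R) was installed with Conda; `rpy2_install_command` is a hypothetical helper that only builds the command and does not run it:

```python
import sys
from pathlib import Path
from typing import List, Optional


def rpy2_install_command(prefix: Optional[str] = None) -> List[str]:
    """Suggest how to install rpy2, following the 'use the same installer' rule of thumb."""
    env = Path(prefix or sys.prefix)
    if (env / "conda-meta").is_dir():
        # Assumption: conda-meta marks a Conda-managed environment, so use Conda for rpy2 too.
        return ["conda", "install", "--yes", "rpy2"]
    # Otherwise assume Python and R came from OS packages; pip is usually fine.
    return [sys.executable, "-m", "pip", "install", "rpy2"]


if __name__ == "__main__":
    print(" ".join(rpy2_install_command()))
```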

saspy sas kernel not visible in jupyter

I can run this code fine using a Python kernel:
import saspy
sas = saspy.SASsession()
sas
cars = sas.sasdata('cars', 'sashelp')
cars.head()
Unfortunately, I can no longer choose a SAS kernel. I reinstalled saspy and the SAS kernel, and as you can see, SAS (the kernel?) is working from Python. However, this:
jupyter kernelspec list
only lists my Python and R kernels:
Available kernels:
ir C:\ProgramData\Anaconda3\share\jupyter\kernels\ir
python3 C:\ProgramData\Anaconda3\share\jupyter\kernels\python3
Can I somehow refresh or manually register the SAS kernel?

I can imagine two reasons why this did not work out:
When you installed sas_kernel, were you using the correct pip? (Often the same system has both Python 2 and Python 3, each with its own separate package directory.)
Have you tried installing it manually? I used the following command:
jupyter kernelspec install <path_to_sas_kernel>
I was in a similar situation: I could not use pip in my environment (no internet connection allowed), so I installed sas_kernel manually as a package (python setup.py install) and then registered the kernel using my environment-specific sas_kernel path:
jupyter kernelspec install .\newpackages\sas_kernel-2.1.7\sas_kernel
And it worked for me.
Hope this helps.
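To confirm whether the SAS kernel was registered after `jupyter kernelspec install`, `jupyter kernelspec list --json` prints the same list as JSON. A sketch that parses it, assuming the output shape `{"kernelspecs": {name: {"resource_dir": ...}}}` and that sas_kernel registers under the name `sas`; the helper names are hypothetical:

```python
import json
import shutil
import subprocess
from typing import Dict


def parse_kernelspecs(json_text: str) -> Dict[str, str]:
    """Map kernel name -> resource directory from `jupyter kernelspec list --json` output."""
    data = json.loads(json_text)
    return {name: info["resource_dir"] for name, info in data["kernelspecs"].items()}


def list_kernels() -> Dict[str, str]:
    """Run the jupyter CLI (if present) and return the registered kernels."""
    jupyter = shutil.which("jupyter")
    if jupyter is None:
        return {}
    out = subprocess.run([jupyter, "kernelspec", "list", "--json"], capture_output=True, text=True)
    if out.returncode != 0:
        return {}
    return parse_kernelspecs(out.stdout)


if __name__ == "__main__":
    # Assumption: sas_kernel registers itself under the name "sas".
    print("sas kernel registered:", "sas" in list_kernels())
```

Run it with the same Python that Jupyter uses, for the same reason the first point above raises about using the correct pip.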

Solve atom error message

I received an error in Atom asking me to install ipykernel using pip.
I'm not sure what to do. I have Anaconda on my system, not pip. Can someone explain what the error is about and how I can solve it using Anaconda?
I was running Python code saved in a .py file:
import pandas as pd
wd = pd.read_csv("winequality-red", sep=";")
five = wd.head()
print(five)
Error message:
No kernel for grammar Python found
Check that the language for this file is set in Atom and that you have a Jupyter kernel installed for it.
To detect your current Python install you will need to run:
python -m pip install ipykernel
python -m ipykernel install --user

This isn't really an answer, but you might have better luck on the dedicated Atom forums.
In your case though, it looks like you haven't installed the proper kernels Hydrogen needs to run Python with. (Of course, I'm just assuming you're using Hydrogen. You haven't actually provided any details about how you are trying to run it).
The Hydrogen documentation points to this page for Python kernels:
https://nteract.io/kernels/python
In particular, I think you want to run the command conda install ipykernel.

Installing / using rpy2 on DSX

I want to be able to use some of R functions / packages within jupyter notebook on DSX. In that case, I would need a python package called 'rpy2'. When I tried installing 'rpy2' following instructions on the DSX page, it gave me an error that says "it cannot locate the R_HOME".
Is there a solution / workaround to this problem? I'd appreciate your response!
Here's the error I get: [error message screenshot]
When I installed rpy2 on my PC, I had to create the R_HOME environment variable and point it to the folder where R is installed. On DSX, I could get the R home path ("/usr/lib64/R"), but when I tried to use setx in the DSX notebook to set this path, I got the following: setx cannot be used to include R_HOME in path

As of now, rpy2 is not supported when using a Notebook on DSX with the Spark service from Bluemix.
It complains about a missing header file, Rdefines.h. That can be fixed, but rpy2 also expects R to be built as a shared library, which isn't the case on DSX: notebooks on DSX use SparkR, and R there is not built as a shared library.
http://rpy2.readthedocs.io/en/version_2.7.x/overview.html#requirements
Thanks,
Charles.
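The key point in this answer is that rpy2 needs R built as a shared library. A quick sketch for checking that on a Linux host, assuming the usual layout where a shared-library build installs `libR.so` under `$R_HOME/lib` (`libR.dylib` on macOS); /usr/lib64/R is the DSX path from the question. In a Python notebook, the variable is set with `os.environ`, since `setx` is a Windows command:

```python
import os
from pathlib import Path

# Set R_HOME for this notebook's process only; /usr/lib64/R is the path from the question.
os.environ.setdefault("R_HOME", "/usr/lib64/R")


def has_shared_libr(r_home: str) -> bool:
    """True if R under `r_home` appears to be built as a shared library (assumed lib/ layout)."""
    lib = Path(r_home) / "lib"
    return (lib / "libR.so").exists() or (lib / "libR.dylib").exists()


print("R built as shared library:", has_shared_libr(os.environ["R_HOME"]))
```

If this prints False, setting R_HOME alone will not make rpy2 work, which matches the answer above.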
