R and Python in one Jupyter notebook

Is it possible to run R and Python code in the same Jupyter notebook? What alternatives are available?
1. Install r-essentials and create R notebooks in Jupyter.
2. Install rpy2 and use the rmagic functions.
3. Use a Beaker notebook.
Which of the above three options is reliable for running Python and R code snippets (sharing variables and visualizations), or is there a better option already?

Yes, it is possible! Use rpy2.
You can install rpy2 with: pip install rpy2
Then run %load_ext rpy2.ipython in one of your cells. (You only have to run this once.)
Now you can do the following:
Python cell:
# enables the %%R magic, not necessary if you've already done this
%load_ext rpy2.ipython
import pandas as pd
df = pd.DataFrame({
'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]
})
R cell:
%%R -i df -w 5 -h 5 --units in -r 200
# import df from global environment
# make default figure size 5 by 5 inches with 200 dpi resolution
install.packages("ggplot2", repos='http://cran.us.r-project.org', quiet=TRUE)
library(ggplot2)
ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()
And you'll get your pretty figure plotting data from a Python pandas DataFrame.

Using @uut's answer for running R in a Jupyter notebook within the Python kernel (on macOS), the following worked for me.
%%R should always be at the start of the cell; otherwise you will get an error.
Also, %load_ext rpy2.ipython should come before %%R, so put it in a separate cell above it.

UPDATE April 2018,
RStudio has also put out a package:
https://blog.rstudio.com/2018/03/26/reticulate-r-interface-to-python/
with which it is possible to run multiple code chunks in different languages in an R Markdown notebook, which is similar to a Jupyter notebook.
In my previous post, I said that the underlying representations of objects are different. Here is a more nuanced discussion of the underlying matrix representations in R and Python, from the same package:
https://rstudio.github.io/reticulate/articles/arrays.html
Old post:
It will be hard for you to use both R and Python syntax in the same notebook, mostly because the underlying representations of objects in the two languages are different. That said, there is a project that does try to allow conversion of objects between the different languages in the same notebook:
http://beakernotebook.com/features
I haven't used it myself, but it looks promising.

SoS kernel is another option.
I don't know how well it performs yet; I have just started using it.
The SoS kernel allows you to run different languages within the same notebook, including Python and R.
SoS Polyglot Notebook - Instructions for Installing Desired Languages
Here is an example of a notebook with Python and R cells.
Update:
In terms of sharing variables, one can use the magics %use and %with.
"SoS automatically shares variables with names starting with sos among all subkernels"1.
Ex.
Starting cell in R:
%use R
sos_var=read.csv('G:\\Somefile.csv')
dim(sos_var)
Output:
51 13
Switching to python:
%with Python3
sos_var.shape
Output:
(51, 13)

A small addition to @uut's answer and @msh's comment:
If you are using rpy2 in Jupyter Notebooks you also have access to R objects (e.g. data frames) from Python cells:
import rpy2.robjects as robjects
robjects.globalenv['some-variable-name']
To view the names of all available variables use:
list(robjects.globalenv.keys())
Details are explained here:
Pandas - how to convert r dataframe back to pandas?

Related

How to convert a pandas plot into an image

I am working on an app that will show a graph of a company's stock performance. I want to turn the pandas plot for that company into an image without saving it. Can someone tell me what to do?
from fastquant import get_pse_data
import matplotlib.pyplot as plt
import pandas as pd
df = get_pse_data(symbol, '2019-01-01', '2020-01-01')
ma30 = df.close.rolling(30).mean()
close_ma30 = pd.concat([df.close, ma30], axis=1).dropna()
I am actually thinking of adding this plot, derived from the pandas DataFrame close_ma30, into my HTML code.
I want to create a Python function that will let me return it as an image for my Django code. Thank you for the help!
You can use dataframe-image to convert a pandas DataFrame into an image; visit https://pypi.org/project/dataframe-image/ for details.
dataframe_image has the ability to export both normal and styled DataFrames as images from within a Python script. Pass your normal or styled DataFrame to the export function along with a file location to save it as an image.
>>> import dataframe_image as dfi
>>> dfi.export(df_styled, 'df_styled.png')
You may also export directly from the DataFrame or styled DataFrame using the dfi.export and export_png methods, respectively.
>>> df.dfi.export('df.png')
>>> df_styled.export_png('df_styled.png')
As a Python Library
Dataframe_image can also be used outside of the notebook as a normal Python library. In a separate Python script, import the dataframe_image package and pass the file name of your notebook to the convert function.
>>> import dataframe_image as dfi
>>> dfi.convert('path/to/your_notebook.ipynb',
                to='pdf',
                use='latex',
                center_df=True,
                max_rows=30,
                max_cols=10,
                execute=False,
                save_notebook=False,
                limit=None,
                document_name=None,
                table_conversion='chrome',
                chrome_path=None,
                latex_command=None,
                output_dir=None,
                )
By default, the new file(s) will be saved in the same directory where the notebook resides. Do not run this command within the same notebook that is being converted.
From the Command Line
The command line tool dataframe_image will be available upon installation with the same options as the convert function from above.
dataframe_image --to=pdf "my notebook with dataframes.ipynb"
Finding Google Chrome
You must have Google Chrome (or Brave) installed in order for dataframe_image to work. The path to Chrome should automatically be found. If Chrome is not in a standard location, set it with the chrome_path parameter.
Using matplotlib instead of Chrome
If you do not have Chrome installed or cannot get it to work properly, you can alternatively use matplotlib to convert the DataFrames to images. Select this option by setting the table_conversion parameter to 'matplotlib'.
Publish to Medium
Closely related to this package is jupyter_to_medium, which publishes your notebooks directly and quickly as Medium blog posts.
Dependencies
You must have the following Python libraries installed.

load code from a code cell from one jupyter notebook into another jupyter notebook

I want to load (i.e., copy the code as with %load) the code from a code cell in one jupyter notebook into another jupyter notebook (Jupyter running Python, but not sure if that matters). I would really like to enter something like
%load cell[5] notebookname.ipynb
This command should copy all code in cell 5 of notebookname.ipynb into the code cell of the notebook I am working on. Does anybody know a trick for doing that?
Adapting some code found here at Jupyter Notebook, the following will display the code of a specific cell in the specified notebook:
import io
from nbformat import read
def print_cell_code(fname, cellno):
    with io.open(fname, 'r', encoding='utf-8') as f:
        nb = read(f, 4)
    cell = nb.cells[cellno]
    print(cell.source)

print_cell_code("Untitled.ipynb", 2)
Not sure what you want to do once the code is there, but maybe this can be adapted to suit your needs. Try print(nb.cells) to see what read brings in.
You'll probably want to use or write your own nbconvert preprocessor to extract a cell from one notebook and insert it into another. It takes a good amount of research into the docs to understand how to write a preprocessor, but this is the preferred way.
The quick-fix option is that the nbformat specification is predicated on JSON, which means that if you read in an ipynb file with pure Python (i.e., with open and read), you can call json.loads on it to turn the entire file into a dict. From there, you can access cells in the cells entry (which is a list of cells). So, something like this:
import json

with open("nb1.ipynb", "r") as nb1, open("nb2.ipynb", "r") as nb2:
    nb1, nb2 = json.loads(nb1.read()), json.loads(nb2.read())

nb2["cells"].append(nb1["cells"][0])  # adds nb1's first cell to the end of nb2
This assumes (as does your question) there is no metadata conflict between the notebooks.
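For concreteness, here is a self-contained sketch of that JSON round trip, with throwaway minimal notebooks written to a temporary directory (the cell contents are made up for illustration):

```python
import json
import os
import tempfile

# Two minimal notebook dicts standing in for nb1.ipynb / nb2.ipynb.
nb1 = {
    "cells": [{"cell_type": "code", "source": "x = 1\n",
               "metadata": {}, "outputs": [], "execution_count": None}],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}
nb2 = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "nb1.ipynb"), os.path.join(d, "nb2.ipynb")
    for path, nb in ((p1, nb1), (p2, nb2)):
        with open(path, "w") as f:
            json.dump(nb, f)

    # Read both files back as plain dicts and copy a cell across.
    with open(p1) as f1, open(p2) as f2:
        a, b = json.load(f1), json.load(f2)
    b["cells"].append(a["cells"][0])  # nb1's first cell onto the end of nb2

    with open(p2, "w") as f:
        json.dump(b, f)
    with open(p2) as f:
        result = json.load(f)

print(result["cells"][0]["source"])  # prints the copied cell's source
```

Because an ipynb file is just JSON, no notebook-specific library is needed for this kind of surgery, though nbformat will validate the result for you if you want that safety net.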

vs code Python extension dataframe not shown in output

I just started using jupyter cells in Visual Studio Code through the Python extension. It is outputting plots fine, but my dataframe is not showing up like the blog example from Microsoft. Below is my code I am running in VS Code:
#%%
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show()
#%%
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df
My output looks like this:
VS Code Cell outputs
I am really excited to use jupyter in VS Code but I need to view the dataframes like in other variable explorers.
I am on windows using Anaconda as my environment.
jupyter=1.0.0=py36_7
jupyter_client=5.2.3=py36_0
jupyter_console=6.0.0=py36_0
jupyter_core=4.4.0=py36_0
numpy=1.15.4=py36h19fb1c0_0
pandas=0.23.4=py36h830ac7b_0
I uninstalled my Anaconda 3.6 and installed the newer Anaconda 3.7 and now it works in VS Code.
That error means we don't have the capability to render the df output for some reason.
The only thing I can think of is you might have a jupyter extension that's modifying the result of a df. (normally it returns an html table to us)
Do you know what jupyter extensions you have installed?
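That HTML-table behavior is easy to verify outside VS Code. A minimal sketch, assuming pandas is installed:

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# Jupyter front ends (including VS Code) normally render this HTML table;
# an extension that overrides it could explain the missing output.
html = df._repr_html_()
print("<table" in html)  # -> True
```

If this returns an HTML table but VS Code still shows nothing, the problem is on the rendering side rather than in pandas.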

Reading a pickle file (PANDAS Python Data Frame) in R

Is there an easy way to read pickle files (.pkl) from Pandas Dataframe into R?
One possibility is to export to CSV and have R read the CSV but that seems really cumbersome for me because my dataframes are rather large. Is there an easier way to do so?
Thanks!
Reticulate was quite easy and super smooth as suggested by russellpierce in the comments.
install.packages('reticulate')
After which I created a Python script like this from examples given in their documentation.
Python file:
import pandas as pd

def read_pickle_file(file):
    pickle_data = pd.read_pickle(file)
    return pickle_data
And then my R file looked like:
require("reticulate")
source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")
This gave me all my data in R stored earlier in pickle format.
You can also do this all in-line in R without leaving your R editor (provided your system python can reach pandas)... e.g.
library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")
Edit: If you can install and use the {reticulate} package, then this answer is probably outdated. See the other answers below for an easier path.
You could load the pickle in python and then export it to R via the python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to python. I suspect that what you'd want to do next would be to use that session to call R and saveRDS to a file or RAM disk. Then in RStudio you can read that file back in. Look at the R packages rJython and rPython for ways in which you could trigger the python commands from R.
Alternatively, you could write a simple Python script to load your data in Python and write a formatted data stream to stdout. Then the entire system call to that script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Alternatively, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
As usual, there are many ways to skin this particular cat. The basic steps are:
1. Load the data in Python.
2. Express the data to R (e.g., exporting the object via rpy2, or writing formatted text to stdout with R ready to receive it on the other end).
3. Serialize the expressed data in R to an internal data representation (e.g., via rpy2 or fread).
4. (Optional) Make the data in that R session accessible to another R session (i.e., the step that closes the loop with rpy2; if you've been using fread, you're already done).
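The stdout route can be sketched on the Python side with nothing beyond the standard library. The script and column names below are hypothetical, and a real pandas pickle would be loaded with pd.read_pickle instead of plain pickle:

```python
import csv
import pickle
import sys

def pickle_to_tsv(pickle_path, out=sys.stdout):
    """Load a pickled dict of columns and write it as TSV for R's fread."""
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)  # e.g. {'col1': [...], 'col2': [...]}
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(data.keys())           # header row
    writer.writerows(zip(*data.values()))  # one row per record

if __name__ == "__main__" and len(sys.argv) > 1:
    pickle_to_tsv(sys.argv[1])
```

On the R side, the whole system call then becomes the fread argument, e.g. data.table::fread("python dump_pickle.py mydata.pkl") (both names hypothetical).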
To add to the answer above: you might need to point to a different conda env to get to pandas:
use_condaenv("name_of_conda_env", conda = "<<result_of `which conda`>>")
pd <- import('pandas')
df <- pd$read_pickle(paste0(outdir, "df.pkl"))

Exporting data frames to Excel using write.xls using Windows 7 (using R)

I am trying to export a bunch of data frames to excel using the write.xls function in R
Desired outcome: the selected data frames should be exported to Excel.
Error Message: [1] "Does 'python' exist, and is it in the path?"
Reproducible Code:
purchase_year <- c(2007,2007,2007,2008,2008,2008,2009,2009,2009,2009,2009)
sold_year <- c(2007,2008,2009,2009,2010,2011,2009,2010,2011,2012,2013)
units <-c(1,4,4,8,3,1,3,1,1,0,2)
df <- data.frame(purchase_year,sold_year,units)
library(dataframes2xls)
write.xls(df,"C:/WORK/OUTPUT.xls", sh.names = "default", formats = "default",
t.formats = FALSE, fnt.metr = "default",
col.widths = 48, col.names = TRUE, row.names = FALSE,
to.floats = "default", python = "python",
py.script = "default", sh.return = FALSE)
Other Information:
I am working on a 32bit windows 7 machine
Installed Python 3.2.2 as well
went through the documentation of write.xls
Did try my best looking up this and the other forums
Tried including the full path of the Python 'exe' in the python argument; that didn't work either
Since I'll be writing out multiple sheets across multiple worksheets, CSV does not look like an option at this point in time.
Thanks to all your help, the issue has been RESOLVED
SOLUTION
Ensuring that the correct version of Python is installed. dataframes2xls is designed to work with Python 2.x. I used 2.7.5
Python installation in my system was not accessible to R. I tried the steps outlined by David Marx which I am quoting here: Try going to the command prompt and running the command python. If this returns an error, you need to add the path to the python executable to your PATH environment variable: computerhope.com/issues/ch000549.htm
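That PATH check can also be done programmatically. A minimal sketch using Python's standard library, where the command names are just examples:

```python
import shutil

# shutil.which returns the resolved path if the command is on PATH,
# or None if it is not -- which is what write.xls's error is about.
print(shutil.which("python"))
print(shutil.which("no-such-command-xyz"))  # -> None
```

If the first call prints None, R's system calls to python will fail the same way the command prompt does.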
I tested your code. I'm running CentOS, though. It worked fine after I changed the output file name to just "Output.xls". It seems like either a Windows problem, or perhaps you don't have Python/R configured properly.
I don't know this package, but I'd bet the problem is that it's designed for Python 2.x, not Python 3.x (which is a slightly different language). Try installing Python 2.7. You may need to uninstall your current Python 3.x.
