Display R plots running from Python

I have the following R script called Test.R:
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(2,4,6,8,10,12,14,16,18,20)
plot(x,y, type="o")
x
y
I am running it via Python using this Python script called Test.py:
import subprocess
proc = subprocess.Popen(['Path/To/Rscript.exe',
                         'Path/To/Test.R'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()
print(stdout)
# Alternative code to see the output:
# retcode = subprocess.call(['Path/To/Rscript.exe',
#                            'Path/To/Test.R'])
When I run the python script Test.py, I get the following output in Pycharm:
[1] 1 2 3 4 5 6 7 8 9 10
[1] 2 4 6 8 10 12 14 16 18 20
So the usual text results show up fine, but how do I get the plots to show? I've tried changing the executable from Rscript.exe to Rgui.exe, but I get the following error and it only opens Rgui:
ARGUMENT Path/To/Test.R __ignored__
Is there an easy way to display the plot output? I know this is a simple problem, but I'm wondering how this will extend to other plot commands in R like acf() or pacf(). Should I use ggplot2 to save the plots and just tell Python to open the saved files?
Thanks.

Add:
show()
after:
plot(x,y, type="o")
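Another route (a sketch of my own, not part of the answer above): since Rscript has no interactive graphics window, the R script can save the plot to a file with png()/dev.off(), and Python can then open that file. The file name plot.png and the helper name run_r_script are assumptions for illustration:

```python
# Sketch: run an R script that saves its plot to a file, then open the
# file from Python.  Assumes Test.R contains png("plot.png"); plot(...);
# dev.off() and that Rscript is installed and on PATH.
import shutil
import subprocess

def run_r_script(script_path):
    """Run an R script with Rscript and return its text output, or None
    if Rscript is not installed."""
    rscript = shutil.which("Rscript")  # locate Rscript on PATH, if present
    if rscript is None:
        return None
    result = subprocess.run([rscript, script_path],
                            capture_output=True, text=True)
    return result.stdout

out = run_r_script("Test.R")
if out is not None:
    print(out)
    # On Windows, the saved plot can then be opened with the default
    # image viewer, e.g.: os.startfile("plot.png")
```

This sidesteps Rgui entirely and extends naturally to acf(), pacf(), or ggplot2's ggsave().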

Related

Output without "print()" in SublimeText

Is there a way/plug-in to enable output without "print()" in SublimeText?
For example,
a = 1 + 2
print (a)
Output:
3
Wanted:
a = 1 + 2
a
Output:
3
P.s. I also tried below:
I am pretty sure that the answer is no. You can rename the print function to make it less noticeable, like this:
_ = print
a = 2
_(a)
Output is 2
Alternatively:
As a few people mentioned in the comments, what you are likely looking for is a REPL, which you can get by simply running the python command directly in your terminal,
like this:
$ python
That should take you to an interactive environment that gives you real-time results for the Python code you input. Below is an example...
>>> a = 1 + 2
>>> a
3
>>> a + 25
28
>>> a
3
>>> a = a + 25
>>> a
28

scriptExit 1 with pybedtools venn_mpl - snakemake 5.2.4

I want to create Venn diagrams with pybedtools. There is a special script using matplotlib called venn_mpl. It works perfectly when I try it out in my jupyter notebook; you can use it from Python or via shell commands.
Unfortunately, something goes wrong when I use it in my snakefile, and I can't really figure out what the problem is.
First of all, this is the script: venn_mpl.py
#!/gnu/store/3w3nz0h93h7jif9d9c3hdfyimgkpx1a4-python-wrapper-3.7.0/bin/python
"""
Given 3 files, creates a 3-way Venn diagram of intersections using matplotlib; \
see :mod:`pybedtools.contrib.venn_maker` for more flexibility.
Numbers are placed on the diagram. If you don't have matplotlib installed,
try venn_gchart.py to use the Google Chart API instead.
The values in the diagram assume:
* unstranded intersections
* no features that are nested inside larger features
"""
import argparse
import sys
import os
import pybedtools
def venn_mpl(a, b, c, colors=None, outfn='out.png', labels=None):
    """
    *a*, *b*, and *c* are filenames to BED-like files.

    *colors* is a list of matplotlib colors for the Venn diagram circles.

    *outfn* is the resulting output file. This is passed directly to
    fig.savefig(), so you can supply extensions of .png, .pdf, or whatever your
    matplotlib installation supports.

    *labels* is a list of labels to use for each of the files; by default the
    labels are ['a','b','c']
    """
    try:
        import matplotlib.pyplot as plt
        from matplotlib.patches import Circle
    except ImportError:
        sys.stderr.write('matplotlib is required to make a Venn diagram with %s\n' % os.path.basename(sys.argv[0]))
        sys.exit(1)

    a = pybedtools.BedTool(a)
    b = pybedtools.BedTool(b)
    c = pybedtools.BedTool(c)

    if colors is None:
        colors = ['r', 'b', 'g']

    radius = 6.0
    center = 0.0
    offset = radius / 2
    if labels is None:
        labels = ['a', 'b', 'c']
Then my code:
rule venndiagramm_data:
    input:
        data = expand("bed_files/{sample}_peaks.narrowPeak", sample=config["samples"]["data"])
    output:
        "figures/Venn_PR1_PR2_GUI_data.png"
    run:
        col = ['g','k','b']
        lab = ['PR1_data','PR2_data','GUI_data']
        venn_mpl(input.data[0], input.data[1], input.data[2], colors = col, labels = lab, outfn = output)
The error is:
SystemExit in line 62 of snakemake_generatingVennDiagramm.py:
1
The snakemake-log only gives me:
rule venndiagramm_data:
input: bed_files/A_peaks.narrowPeak, bed_files/B_peaks.narrowPeak, bed_files/C_peaks.narrowPeak
output: figures/Venn_PR1_PR2_GUI_data.png
jobid: 2
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I already tried to add as written in the documentation:
rule error:
    shell:
        """
        set +e
        somecommand ...
        exitcode=$?
        if [ $exitcode -eq 1 ]
        then
            exit 1
        else
            exit 0
        fi
        """
but this changed nothing.
Then my next idea was to just use the shell command instead, which I had also tested before and which worked perfectly. But then I got a different, though I think quite similar, error message, for which I didn't find a proper solution either:
rule venndiagramm_data_shell:
    input:
        data = expand("bed_files/{sample}_peaks.narrowPeak", sample=config["samples"]["data"])
    output:
        "figures/Venn_PR1_PR2_GUI_data.png"
    shell:
        "venn_mpl.py -a {input.data[0]} -b {input.data[1]} -c {input.data[2]} --color 'g,k,b' --labels 'PR1_data,PR2_data,GUI_data'"
The snakemake log:
[Thu May 23 16:37:27 2019]
rule venndiagramm_data_shell:
input: bed_files/A_peaks.narrowPeak, bed_files/B_peaks.narrowPeak, bed_files/C_peaks.narrowPeak
output: figures/Venn_PR1_PR2_GUI_data.png
jobid: 1
[Thu May 23 16:37:29 2019]
Error in rule venndiagramm_data_shell:
jobid: 1
output: figures/Venn_PR1_PR2_GUI_data.png
RuleException:
CalledProcessError in line 45 of snakemake_generatingVennDiagramm.py:
Command ' set -euo pipefail; venn_mpl.py -a input.data[0] -b input.data[1] -c input.data[2] --color 'g,k,b' --labels 'PR1_data,PR2_data,GUI_data' ' returned non-zero exit status 1.
Does anyone have an idea what could be the reason for this and how to fix it?
FYI: I said that I tested it without running it with snakemake. This is my working code:
from snakemake.io import expand
import yaml
import pybedtools
from pybedtools.scripts.venn_mpl import venn_mpl
config_text_real = """
samples:
    data:
        - A
        - B
        - C
    control:
        - A_input
        - B_input
        - C_input
"""
config_vennDiagramm = yaml.load(config_text_real)
config = config_vennDiagramm
data = expand("{sample}_peaks.narrowPeak", sample=config["samples"]["data"])
col = ['g','k','b']
lab = ['PR1_data','PR2_data','GUI_data']
venn_mpl(data[0], data[1], data[2], colors = col, labels = lab, outfn = 'Venn_PR1_PR2_GUI_data.png')
control = expand("{sample}_peaks.narrowPeak", sample=config["samples"]["control"])
lab = ['PR1_control','PR2_control','GUI_control']
venn_mpl(control[0], control[1], control[2], colors = col, labels = lab, outfn = 'Venn_PR1_PR2_GUI_control.png')
and within my jupyter notebook for shell:
!A='../path/to/file/A_peaks.narrowPeak'
!B='../path/to/file/B_peaks.narrowPeak'
!C='../path/to/file/C_peaks.narrowPeak'
!col=g,k,b
!lab='PR1_data, PR2_data, GUI_data'
!venn_mpl.py -a ../path/to/file/A_peaks.narrowPeak -b ../path/to/file/B_peaks.narrowPeak -c ../path/to/file/C_peaks.narrowPeak --color "g,k,b" --labels "PR1_data, PR2_data, GUI_data"
The reason why I used the full path instead of the variable is that, for some reason, the code didn't work when calling the variable with "$A".
Not sure if this fixes it, but one thing I notice is that:
shell:
    "venn_mpl.py -a input.data[0] -b input.data[1] -c input.data[2]..."
probably should be:
shell:
    "venn_mpl.py -a {input.data[0]} -b {input.data[1]} -c {input.data[2]}..."
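For what it's worth, snakemake fills those {...} placeholders into the shell string much like Python's own str.format, which is why the literal text input.data[0] shows up unchanged in the failing command in the log. A rough illustration of the difference (the file names here just echo the ones from the question):

```python
# Rough illustration of snakemake-style placeholder substitution using
# plain str.format.  Without braces, nothing is substituted and the
# shell receives the literal text data[0].
data = ["A_peaks.narrowPeak", "B_peaks.narrowPeak", "C_peaks.narrowPeak"]

with_braces = "venn_mpl.py -a {data[0]} -b {data[1]} -c {data[2]}"
without_braces = "venn_mpl.py -a data[0] -b data[1] -c data[2]"

print(with_braces.format(data=data))
# venn_mpl.py -a A_peaks.narrowPeak -b B_peaks.narrowPeak -c C_peaks.narrowPeak
print(without_braces.format(data=data))  # unchanged: no fields to fill
```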

Running R script in Python

I am running a simple R script in python using the following code.
import rpy2.robjects as robjects
r=robjects.r
output = r.source("sample.R")
Now when I print the output
print (output)
I am getting only the script's last variable as output, not all the variables, which I was not expecting. I was also thinking that if I call c or data the results would be printed, but Python isn't recognizing these variables defined in R. I am not sure how to access all these variables.
I am writing very simple code in R script just for testing. My R script looks like:
a <- 1
b <- 3
c <- a + b
data = 1:20
Now, on calling the script and printing the results, I am getting the following output. I am not sure what's happening.
$value
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$visible
[1] FALSE
I am not sure how to print each variable exactly as it is from R in Python. Please guide me; your help will be appreciated.
Regards
Your variable output will only store the output from the source file, which is exactly what you get, i.e. the last variable. But all the variables actually live somewhere, in an R environment, which you can get with robjects.globalenv.
Knowing that you can easily retrieve the value for each variable that you created in R:
import rpy2.robjects as robjects
robjects.r.source("sample.R")
print(robjects.globalenv["a"])
print(robjects.globalenv["b"])
print(robjects.globalenv["c"])
print(robjects.globalenv["data"])
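The objects retrieved from globalenv are R vectors; to get plain Python values you can convert them with list(). A small sketch (guarded so it degrades gracefully when rpy2 is not installed, and using inline R code instead of sample.R so it is self-contained):

```python
# Sketch: pull R variables into Python as plain lists via globalenv.
# Guarded with try/except in case rpy2 is not installed.
try:
    import rpy2.robjects as robjects
except ImportError:
    robjects = None

if robjects is not None:
    # Inline R code standing in for source("sample.R")
    robjects.r("a <- 1; b <- 3; c <- a + b; data <- 1:20")
    c_val = list(robjects.globalenv["c"])[0]  # R numerics arrive as floats
    data = list(robjects.globalenv["data"])   # a plain Python list of ints
```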

Run selected lines of a Python Script on CMD

I am executing Python scripts using cmd and have challenges using other interfaces, so I will be executing on CMD only. The Python script has 113 lines of code. I want to run and test selected subsets of lines before executing the complete script, without making new Python scripts, directly from the parent script.
From the example below (which has 28 lines):
To run the parent script we say in cmd:
C:\Users\X> python myMasterDummyScript.py
Can we run just lines 1 - 16?
Dummy Example:
import numpy as np
from six.moves import range
from six.moves import cPickle as pickle

pickle_file = "C:\\A.pickle"

with open(pickle_file, 'rb') as f:
    data = pickle.load(f, encoding='latin1')
    train_dataset = data['train_dataset']
    test_dataset = data['test_dataset']
    valid_dataset = data['valid_dataset']
    train_labels = data['train_labels']
    test_labels = data['test_labels']
    valid_labels = data['valid_labels']

a = 28
b = 1

def reformat(dataset, labels):
    dataset = dataset.reshape(-1, a, a, b).astype(np.float32)
    labels = (np.arange(10) == labels[:, None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
Open the parent script in an IDE like PyCharm, select the lines you want to execute, then right-click -> Execute Selection in Console.
It would theoretically be possible with a bit of work, however please note that this is not how scripts work in general. Instead, you should consider grouping coherent routine sequences into named functions, and call them from command line.
Among other issues, you'll have to modify all calling code to your script every time you shift the line numbers, you'll have to repeat any imports any subsection would potentially need and it's generally not a good idea. I am still going to address it after I make a case for refactoring though...
Refactoring the script and calling specific functions
Consider this answer to Python: Run function from the command line
Your python script:
import numpy as np
from six.moves import range
from six.moves import cPickle as pickle

def load_data():
    pickle_file = "C:\\A.pickle"
    with open(pickle_file, 'rb') as f:
        data = pickle.load(f, encoding='latin1')
    return data

def main():
    data = load_data()
    train_dataset = data['train_dataset']
    test_dataset = data['test_dataset']
    valid_dataset = data['valid_dataset']
    train_labels = data['train_labels']
    test_labels = data['test_labels']
    valid_labels = data['valid_labels']

    a = 28
    b = 1

    def reformat(dataset, labels):
        dataset = dataset.reshape(-1, a, a, b).astype(np.float32)
        labels = (np.arange(10) == labels[:, None]).astype(np.float32)
        return dataset, labels

    train_dataset, train_labels = reformat(train_dataset, train_labels)
    test_dataset, test_labels = reformat(test_dataset, test_labels)
    valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
Your cmd code would look like this:
REM
REM any logic to identify which function to call
REM
python -c "import myMasterDummyScript; myMasterDummyScript.load_data()"
It also enables you to pass arguments from cmd into the function call.
Now if you're really adamant about running an arbitrary subset of lines from an overall python script...
How to run specific lines from a script in cmd
cmd to read those lines out of the original script and write them to a temporary script
Look at a proposed answer for batch script - read line by line. Adapting it slightly, without so much error management (which would significantly bloat this answer):
@echo off
setlocal enabledelayedexpansion
SET startline=%1
SET endline=%2
SET originalscript=%3
SET tempscript=tempscript.py
SET line=1
REM erase tempscript
echo. > %tempscript%
for /f "tokens=*" %%a in (%originalscript%) do (
    if !line! GEQ %startline% (
        if !line! LEQ %endline% (
            echo %%a >> %tempscript%
        )
    )
    set /a line+=1
)
python %tempscript%
pause
You'd call it like this:
C:\> runlines.cmd 1 16 myMasterDummyScript.py
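The same trick works without a temporary file by slicing the script in Python itself (a sketch; `run_lines` is my own name, and note that a slice starting inside an indented block will still fail to compile):

```python
# Sketch: execute only lines start..end (1-based, inclusive) of a
# Python file.  Beware: a slice that begins in the middle of an
# indented block will raise an IndentationError.
def run_lines(path, start, end):
    """Run the given line range of a script and return its globals."""
    with open(path) as f:
        lines = f.readlines()
    snippet = "".join(lines[start - 1:end])
    env = {"__name__": "__main__"}
    exec(compile(snippet, path, "exec"), env)
    return env
```

Usage: `run_lines("myMasterDummyScript.py", 1, 16)`, or wrap it in a small CLI reading the range from sys.argv.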
You could use the command line debugger pdb. As an example, given the following script:
print('1')
print('2')
print('3')
print('4')
print('5')
print('6')
print('7')
print('8')
print('9')
print('10')
print('11')
print('12')
print('13')
print('14')
print('15')
Here's a debug session that runs only lines 5-9: it jumps to line 5, sets a breakpoint at line 10, prints a listing to show the current line and breakpoints, and then continues execution. Type help to see all the commands available.
C:\>py -m pdb test.py
> c:\test.py(1)<module>()
-> print('1')
(Pdb) jump 5
> c:\test.py(5)<module>()
-> print('5')
(Pdb) b 10
Breakpoint 1 at c:\test.py:10
(Pdb) longlist
1 print('1')
2 print('2')
3 print('3')
4 print('4')
5 -> print('5')
6 print('6')
7 print('7')
8 print('8')
9 print('9')
10 B print('10')
11 print('11')
12 print('12')
13 print('13')
14 print('14')
15 print('15')
(Pdb) cont
5
6
7
8
9
> c:\test.py(10)<module>()
-> print('10')
(Pdb)
Option 1 - You can use a debugger to inspect everything at any moment of the code execution. (python -m pdb myscript.py debugs your code too.)
Option 2 - You can create a main file plus sub-files with the pieces of your script, import them in the main script, and execute the main file or any separate file to test.
Option 3 - You can use arguments (using argparse, for example).
I have no more options at the moment.

How to redirect output from awk as a variable in python?

Say I have the data file given below. The given awk command splits the file into multiple parts using the value of the first column and writes each part to a file.
chr pos idx len
2 23 4 4
2 25 7 3
2 29 8 2
2 35 1 5
3 37 2 5
3 39 3 3
3 41 6 3
3 45 5 5
4 25 3 4
4 32 6 3
4 38 5 4
awk 'BEGIN {FS=OFS="\t"} {print > "file_"$1".txt"}' write_multiprocessData.txt
The above code will split the file into file_2.txt, file_3.txt, and so on. Since awk loads the file into memory first, I would rather write a python script that calls awk to split the file and loads the parts directly into memory (giving the data unique variable names such as file_1, file_2).
Would this be possible? If not what other variations can I try.
I think your awk code has a little bug. If you want to incorporate your awk code into a Python script that organizes all the things you want to do, try this:
import os
from numpy import *
os.system("awk '{if(NR>1) print >\"file_\"$1\".txt\"}' test.dat")
os.system works very well; however, I did not know it is considered obsolete. Anyway, as suggested, subprocess works as well:
import subprocess
cmd = "awk '{if(NR>1) print >\"file_\"$1\".txt\"}' test.dat"
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
There is no need for Awk here.
from collections import defaultdict

prefix = defaultdict(list)

with open('Data.txt', 'r') as data:
    next(data)  # skip the header line, like NR>1 in the awk version
    for line in data:
        line = line.rstrip('\r\n')
        prefix[line.split()[0]].append(line)
Now you have in the dict prefix all the first fields from all the lines as keys, and the list of lines with that prefix as the value for each key.
If you also wish to write the results into files at this point, that's an easy exercise.
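The "easy exercise" might look like this (a sketch with made-up sample lines, producing the same file_<key>.txt names the awk version writes):

```python
# Sketch: write each group of lines to file_<key>.txt, mirroring the
# awk command  print > "file_"$1".txt".  The sample lines are made up.
from collections import defaultdict

sample_lines = ["2\t23\t4\t4", "2\t25\t7\t3", "3\t37\t2\t5"]

prefix = defaultdict(list)
for line in sample_lines:
    prefix[line.split()[0]].append(line)

for key, lines in prefix.items():
    with open("file_%s.txt" % key, "w") as out:
        out.write("\n".join(lines) + "\n")
```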
Generally, simple Awk scripts are nearly always easy and natural to reimplement in Python. Because Awk is very specialized for a constrained set of tasks, the Python code will often be less succinct, but with the Python adage "explicit is better than implicit" in mind, this may actually be a feature from a legibility and maintainability point of view.
