Python and shell issue: .sh file not entirely run by Python

I am writing a Python script, in which I create then execute a shell file.
This shell file is composed of several lines, each line being a shell command (with some grep, sort, etc.).
Here is an example of one of these command lines from my_shell_file.sh:
nohup grep -A1 "pattern" myfile.fastq | grep -v "pattern2" | grep -v "^pattern3" | grep "^pattern4" | sort -T ./tmp | uniq -c > Result_directory/result_file.txt &
The only thing that changes between each line is the pattern4 and the name of the result_file.txt.
My Python script creates my_shell_file.sh correctly: it contains all the command lines I expected, with no syntax errors. The problem comes at the next step, when I try to run this shell file with:
os.system("sh ./my_shell_file.sh")
I expect one result_file.txt per command line that I wrote in my_shell_file.sh. The problem is that not all the command lines from my_shell_file.sh are run.
There are always a few lines, the last ones, that are never run, and I don't know why.
It seems very strange because 1) Python can clearly run my_shell_file.sh, so why doesn't it run the last lines; 2) I see no syntax difference between the lines of my_shell_file.sh, so I don't know why it stops a few lines before the end; 3) the line where it stops can differ from one try to another...
Sometimes in my log files, I see messages like:
./my_shell_file.sh: line 63: unexpected EOF while looking for matching `"'
./my_shell_file.sh: line 64: syntax error: unexpected end of file
It looks like a " is not closed, but... I see no syntax error. Plus, if I just change the order of the lines in my_shell_file.sh, the issue can be resolved...
What's more, if I run the my_shell_file.sh created by Python myself, there is no problem: all the command lines are run!
Does anyone have an idea why the last lines are randomly skipped when the script is run via Python?
Thank you in advance
Python version: 2.6.6
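A hedged sketch of one common cause and fix, assuming the shell file is written with open()/write() in the same Python script (that code isn't shown, so this is a guess): if the file object is not flushed and closed before os.system() runs it, sh reads a truncated file, which would explain both the missing last lines and the unterminated-quote errors. The pattern and result-file names below are hypothetical placeholders.

import os

# Hypothetical (pattern4, result_file) pairs standing in for the real ones.
commands = [("^AAAC", "result_AAAC.txt"), ("^GGGT", "result_GGGT.txt")]

with open("my_shell_file.sh", "w") as fh:
    for pattern4, result_file in commands:
        fh.write('nohup grep -A1 "pattern" myfile.fastq | grep -v "pattern2" | '
                 'grep -v "^pattern3" | grep "%s" | sort -T ./tmp | uniq -c '
                 '> Result_directory/%s &\n' % (pattern4, result_file))

# The with-block has closed (and flushed) the file here, so sh sees the complete script.
os.system("sh ./my_shell_file.sh")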

Related

How to automate commands from the command prompt?

I tried doing this in Python and found questions here and here that look relevant, but I couldn't get it working from those answers. I am looking for a way to automate a manual process where I first change the drive and folder from the command prompt as follows.
d:
then
cd vph
then from the command prompt I run the following:
krr filename.xdf -vph 1,129 -s "StartFrequency: " -StartFrequency -n
The last line causes a frequency to be displayed at the command prompt. How can I save the frequency the last line returns to a text file, and how can I automate all the steps above? These steps need to be automated so I can do the same for over a thousand files. I worry that nobody reading this has the krr software I need to use; if necessary, you can substitute the last line with something that will run on any Windows computer and return a number. The double quotes in the last line might be tricky if the whole line itself needs to be wrapped in double quotes.
import subprocess

# Commands can be chained with && ; each command runs only if the previous one succeeded.
# Note the drive change must be "d:" (a bare "d" is not a command), and cmd.exe does not
# treat single quotes as quoting characters, so the double quotes from the question are kept.
lst_cmd = 'd: && cd vph && krr filename.xdf -vph 1,129 -s "StartFrequency: " -StartFrequency -n'
out, err = subprocess.Popen(lst_cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
print("the output is {}".format(out))

issue parsing shell program through python

My function run_deinterleave() is meant to read the code from the file deinterleave.sh, replace the placeholder (sra_data) with a file name input by the user, and then run it on the command line.
def run_deinterleave():
    codes = open('Project/CODE/deinterleave.sh')
    codex = codes.read()
    print(inp_address)
    codex = codex.replace('sra_data', inp_address)
    #is opening this twice creating another pipeline?
    stream = os.popen(codex)
    codes.close()
    self.txtarea.insert(END, codex)
    #stuff
However, I keep getting this error:
/bin/sh: 5: Syntax error: "(" unexpected
The code in deinterleave.sh works fine and produces two individual files given an interleaved paired end sra_file (an output file from genetic sequencing machines, I think :P)
# deinterleave paired end fastq file
paste - - - - - - - - < sra_data \
| tee >(cut -f 1-4 | tr "\t" "\n" > /home/lols/Project/reads-1.fq) \
| cut -f 5-8 | tr "\t" "\n" > /home/lols/Project/reads-2.fq
As the error message shows, the code was interpreted by /bin/sh; if you executed
/bin/sh Project/CODE/deinterleave.sh, you'd get the same error, because the process substitution >(…) is a Bash extension not understood by /bin/sh.
Besides, since you don't communicate with the shell code, we don't need pipes at all. So instead of os.popen I'd use subprocess.run, which lets you specify Bash as the shell.
subprocess.run(codex, shell=True, executable="bash")
The absolutely best fix is probably to replace the shell script with native Python code; but without a specification and/or sample input, I don't think we can tell you exactly how to do that.
An immediate and trivial fix is to change deinterleave so that it accepts an input file parameter.
#!/usr/bin/env bash
paste - - - - - - - - < "${1-sra_data}" |
tee >(cut -f 1-4 | tr "\t" "\n" > "${2-/home/lols/Project/reads-1.fq}") |
cut -f 5-8 | tr "\t" "\n" > "${3-/home/lols/Project/reads-2.fq}"
This refactoring also allows you to specify the names of the output files as the second and third command-line arguments.
Also, a Bash script really should not have a .sh extension, so probably take that out.
Explicitly naming Bash in the shebang line should solve the error message you got when running Bash code in sh; perhaps see also Difference between sh and bash.
With that, your Python code can be reduced to something like
subprocess.run(
    ['Project/CODE/deinterleave', inp_address],
    # probably a good idea
    check=True)
though I don't exactly understand the rest of the surrounding function, so it's not clear how exactly to rewrite it.
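If you also want to pass the output file names that the refactored script accepts as its second and third arguments, the call could look like this sketch (reusing the default output paths from the original script):

import subprocess

# inp_address as in the question; the output paths are the defaults from deinterleave.
subprocess.run(
    ['Project/CODE/deinterleave', inp_address,
     '/home/lols/Project/reads-1.fq',
     '/home/lols/Project/reads-2.fq'],
    check=True)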
I think the shell script could be reimplemented something like
with open(inp_address, 'r') as sra_data, open(
        '/home/lols/Project/reads-1.fq', 'w') as first, open(
        '/home/lols/Project/reads-2.fq', 'w') as second:
    while True:
        # each interleaved record is 8 lines: 4 for the first read, 4 for the second
        read1 = [sra_data.readline() for idx in range(4)]
        if not read1[0]:
            break  # reached end of file
        read2 = [sra_data.readline() for idx in range(4)]
        first.writelines(read1)
        second.writelines(read2)

TensorFlow Get started page - print first 5 rows

I'm using PyCharm, and when I try to execute the statement from here:
!head -n5 {train_dataset_fp}
The IDE complains that this is SyntaxError: invalid syntax and the program never executes. I thought the entire TensorFlow tutorial was in Python, but this seems like code from a completely different language. Has anyone proceeded successfully through the TensorFlow: Get Started tutorial?
This is not a Python command; it is a Unix one that launches the head program. The leading ! is IPython/Jupyter notebook syntax for running a shell command, which plain Python (and therefore PyCharm's interpreter) does not understand.
You can use PyCharm to open a Terminal on your target machine, and type:
head -n5 {train_dataset_fp}
... replacing {train_dataset_fp} with the actual path to your dataset, which you obtained/printed in the previous step of the tutorial, cf. these lines:
train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url),
                                           origin=train_dataset_url)
print("Local copy of the dataset file: {}".format(train_dataset_fp))
Since you're on Windows, you need to use Windows commands to achieve what head would do. If you have PowerShell installed, you can use the gc command (an alias for Get-Content). If you don't, here's a workaround to print the first 5 lines of file.txt, prefixed with the line number:
findstr /n ".*" file.txt | findstr /b "[1-5]:"
inspired by this answer. Basically it numbers all lines in the file and then picks the first five. Obviously pretty inefficient for large files, though. Add the ! prefix only if you are running the command from a notebook cell rather than a terminal.
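If the goal is just to see the first five rows from inside PyCharm, a portable alternative is to skip the shell entirely. A minimal sketch, assuming train_dataset_fp holds the path printed in the earlier tutorial step:

# Print the first 5 lines of the downloaded file without any shell command.
# Assumes train_dataset_fp is the path returned by tf.keras.utils.get_file above.
with open(train_dataset_fp) as f:
    for _ in range(5):
        print(f.readline(), end="")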

load the data with numpy.loadtxt and replace a new line in a file with command

This is my first question here.
I tried to load a data file with Python.
The file demo.txt is similar to the one below.
12,23,34.5,56,
78,29,33,
44,55,66,78,59,100
(the number of lines differs between files, and the number of columns may differ between lines; I need to work on many data files)
numpy.loadtxt("demo.txt",delimiter=",")
gives the error message "could not convert string to float:".
To fix this problem, I tried to use the command
sed -i -e 's/,\n/,/g' demo.txt
to remove the line breaks at the end of each line and combine all lines into a single line, but it failed.
However, in Vim, it is OK to use ":s/,\n/,/g" to remove the line breaks.
Thus, my questions are:
1. Is it possible to load the data file in Python without modifying the files?
2. If not, how can I use a command like sed (I need to put this command into my script to handle a bunch of files, so a shell command like sed is necessary) to remove the line breaks at the end of each line and combine all lines into one single line? Without the line breaks, I can read the data with numpy.loadtxt easily.
Best regards,
Yiping
sed reads its input one line at a time and strips the trailing newline before the line reaches the pattern space, so s/,\n/,/g never sees a newline to match; that is why your sed command failed while the same substitution works in Vim, which operates on the whole buffer. To remove all newlines from a file, use tr -d '\n' instead:
$ echo -e "some\nfile\nwith\n\newlines" > file_with_newlines
$ cat file_with_newlines
some
file
with
ewlines
$ cat file_with_newlines | tr -d '\n' > file_without_newlines
$ cat file_without_newlines
somefilewithewlines$
I don't know if this will actually help you with your numpy problem, but it will remove all the (UNIX) newlines from a file.
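To answer the first question directly: the ragged file can be loaded without modifying it by flattening it in Python first. A minimal sketch (the trailing commas produce empty fields, which are skipped before conversion):

import numpy as np

# Read demo.txt as-is, split every line on commas, drop empty fields
# caused by trailing commas, and build one flat array of all values.
with open("demo.txt") as f:
    values = [float(tok) for line in f
              for tok in line.strip().split(",") if tok.strip()]
data = np.array(values)  # one 1-D array containing every number in the file
print(data)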

Passing individual lines from files into a python script using a bash script

This might be a simple question, but I am new to bash scripting and have spent quite a bit of time on this with no luck; I hope I can get an answer here.
I am trying to write a bash script that reads individual lines from a text file and passes them along as arguments to a Python script. I have a list of files (which I have saved into a single text file, all on individual lines) that I need to use as arguments in my Python script, and I would like to use a bash script to send them all through. Of course I could take the tedious route and copy/paste the rest of the Python command onto individual lines in the script, but I would think there is a way to do this with the "read line" command. I have tried all sorts of combinations of commands, but here is the most recent one I have:
#!/bin/bash
# Command Output Test
cat infile.txt << EOF
while read line
do
VALUE = $line
python fits_edit_head.py $line $line NEW_PARA 5
echo VALUE+"huh"
done
EOF
When I do this, all I get returned is the individual lines from the input file. I have the extra VALUE there to see if it will print that, but it does not. Clearly there is something simple about the "read line" command that I do not understand, but after messing with it for quite a long time, I do not know what it is. I admit I am still a rookie at this bash scripting game, and not a very good one at that. Any help would certainly be appreciated.
You probably meant:
while read line; do
    VALUE=$line ## No spaces allowed
    python fits_edit_head.py "$line" "$line" NEW_PARA 5 ## Quote properly to isolate arguments well
    echo "$VALUE+huh" ## You don't expand without $
done < infile.txt
Python may also read stdin, so it could accidentally consume input from infile.txt; to avoid that you can use another file descriptor:
while read -u 4 line; do
    ...
done 4< infile.txt
Better yet, if you're using Bash 4.0 or newer, it's safer and cleaner to use readarray:
readarray -t lines < infile.txt
for line in "${lines[#]}; do
...
done
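If it turns out to be easier to stay in Python entirely, here is a sketch that replaces the bash loop; the file and script names are taken from the question, and infile.txt is assumed to list one filename per line.

import subprocess

# Read infile.txt and run fits_edit_head.py once per listed file,
# mirroring the bash loop above.
with open('infile.txt') as listing:
    for line in listing:
        fname = line.strip()
        if not fname:
            continue  # skip blank lines
        subprocess.check_call(
            ['python', 'fits_edit_head.py', fname, fname, 'NEW_PARA', '5'])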
