Python subprocess.Popen runs slow when memory is occupied - python

I wrote a Python script that uses subprocess.Popen to call the command line tool look to do a binary search on a file. For example:
import subprocess

p = subprocess.Popen('look -b "abc" testfile.txt', executable='/bin/bash',
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
out, err = p.communicate()
result = out.decode()
print(result)
What this snippet of code does is call the system command look to perform a binary search on the file testfile.txt for the string abc.
It works fine if you just have this snippet of code.
However, when your memory is loaded with some large files, it becomes significantly slow.
For example, if you do:
a = read_a_large_file()  # like GBs of data
p = subprocess.Popen('look -b "abc" testfile.txt', executable='/bin/bash',
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
out, err = p.communicate()
result = out.decode()
print(result)
a[0]
The subprocess part takes a very long time to execute, even though running the look command directly in the shell is very fast, since it performs a binary search on a sorted file.
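For reference, a minimal sketch for reproducing the slowdown (assuming a sorted testfile.txt exists; the bytearray is just a hypothetical stand-in for read_a_large_file()):

import subprocess
import time

def timed_look():
    # Time one invocation of look; it may exit non-zero if no match is found.
    t0 = time.time()
    p = subprocess.Popen(['look', '-b', 'abc', 'testfile.txt'],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    p.communicate()
    print("look took %.3f s" % (time.time() - t0))

timed_look()                  # fast while the interpreter is small
a = bytearray(2 * 1024 ** 3)  # hold ~2 GB in memory
timed_look()                  # significantly slower once memory is occupied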
Any help will be appreciated! Thanks!

Related

Unexpected output of Popen

I am using the Popen constructor from subprocess to capture the output of the command that I am running in my Python script:
import os
from subprocess import Popen, PIPE
p = Popen(["my-cli", "ls", "/mypics/"], stdin=PIPE, stdout=PIPE,stderr=PIPE)
output, err = p.communicate()
print(output)
print(output.count('jpg'))
My objective is to save the output file names as an array of strings.
However, when I print the output, I notice that instead of saving the file names as strings, the script treats each byte (letter) of each filename as a separate string. Therefore, the printed output looks like this:
f
i
l
e
.
j
p
g
1
So instead of printing one filename, file.jpg, I am getting a printout of the 8 separate characters that make up the filename. But running the ls command directly in the terminal lists the filenames row by row as it should.
What am I doing wrong in this script and what is the workaround? I am running Python 2.7. Any suggestions would be appreciated.
What is that my-cli inside your Popen array? It looks like a newline character is being appended after each character of output. Just remove that my-cli and this could work for you:
p = Popen(["ls", "/mypics/"], stdin=PIPE, stdout=PIPE,stderr=PIPE)
I hope this will work for you.
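If the end goal is a list of filename strings, a small sketch building on that corrected call (on Python 2.7, output is a plain str):

from subprocess import Popen, PIPE

p = Popen(["ls", "/mypics/"], stdin=PIPE, stdout=PIPE, stderr=PIPE)
output, err = p.communicate()
filenames = output.splitlines()  # one entry per line of ls output
print(filenames)
print(sum(1 for name in filenames if name.endswith('jpg')))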

subprocess.Popen or proc.stdout.read() corrupting JSON data

I am running a PHP script through Python Django, as it contains legacy code for a client.
Data is passed through to the PHP script via JSON, and after the script runs a string is returned for display, like so:
proc = subprocess.Popen(["php -f script.php " + json.dumps(data_for_php, sort_keys=True)], shell=True, stdout=subprocess.PIPE)
script_response = proc.stdout.read()
return HttpResponse(script_response)
The issue I am having is that something in this process is corrupting the data.
For example, one JSON field from data_for_php has the key and value 'xxx_amount': u'$350,000.00', but comes back as ,000.00 in script_response.
It's not doing this for anything else.
I have done a bit of debugging and determined that json.dumps(data_for_php, sort_keys=True) is not causing the issue; data_for_php is good too.
It leads me to believe that proc.stdout.read() is somehow mutating $350 to a space.
Note: the same thing is occurring for other dictionary values.
Update
I have been led to believe that the command line call is the problem: the JSON variables are being passed on the command line when the script is invoked. This is probably the issue. Looking for a solution.
$350 in bash is a variable; in the shell it gets replaced by its value, which is not defined. Adding single quotes around the dump should do the trick to avoid interpreting special characters:
proc = subprocess.Popen(["php -f script.php '" + json.dumps(data_for_php, sort_keys=True) + "'"], shell=True, stdout=subprocess.PIPE)
script_response = proc.stdout.read()
return HttpResponse(script_response)
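For what it's worth, a sketch that sidesteps shell quoting entirely by passing the arguments as a list without shell=True, so the JSON travels as a single argv element and $350 never reaches a shell:

import json
import subprocess

# With an argument list and no shell, the JSON string is delivered
# to PHP verbatim; bash never gets a chance to expand $350.
proc = subprocess.Popen(
    ["php", "-f", "script.php", json.dumps(data_for_php, sort_keys=True)],
    stdout=subprocess.PIPE)
script_response = proc.stdout.read()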

Subprocess writing stdin and reading stdout python 3.4

I am writing a script which runs a Linux command, writes a string (up to EOL) to its stdin, and reads a string (until EOL) from its stdout. The easiest illustration is the cat - command:
import subprocess

p = subprocess.Popen(['cat', '-'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
stringin = "String of text\n"
p.stdin.write(stringin)
stringout = p.stdout.read()
print(stringout)
I aim to open the cat - process once and use it to write a string to its stdin multiple times, each time getting a string back from its stdout.
I googled quite a bit, and a lot of recipes don't work because the syntax is incompatible across Python versions (I use 3.4). This is my first Python script from scratch, and I find the Python documentation quite confusing so far.
Thank you for your solution, Salva.
Unfortunately, communicate() closes the cat - process. I did not find any solution with subprocess that talks to cat - without having to open a new cat - for every call. I found an easy solution with pexpect, though:
import pexpect

p = pexpect.spawn('cat -')
p.setecho(False)

def echoback(stringin):
    p.sendline(stringin)
    reply = p.readline()
    return reply.decode()

i = 1
while (i < 11):
    print(echoback("Test no: " + str(i)))
    i = i + 1
In order to use pexpect, Ubuntu users will have to install it through pip. If you wish to install it for Python 3.x, you will have to install pip3 (python3-pip) first from the Ubuntu repo.
Well, you need to communicate with the process:
from subprocess import Popen, PIPE
s = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
input = b'hello!' # notice the input data are actually bytes and not text
output, errs = s.communicate(input)
To use unicode strings, you would need to encode() the input and decode() the output:
from subprocess import Popen, PIPE
s = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
input = 'España'
output, errs = s.communicate(input.encode())
output, errs = output.decode(), errs.decode()
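Note that communicate() waits for the child to exit and closes the pipes, so with this approach each round-trip needs a fresh process; a minimal helper sketch:

from subprocess import Popen, PIPE

def roundtrip(text):
    # One cat - process per call; communicate() cannot be reused.
    s = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    output, errs = s.communicate(text.encode())
    return output.decode()

for i in range(1, 11):
    print(roundtrip("Test no: %d" % i))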

Getting output of a process at runtime

I am using a Python script to run a process with subprocess.Popen and simultaneously store the output in a text file as well as print it on the console. This is my code:
result = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
for line in result.stdout.readlines():  # read and store result in log file
    openfile.write("%s\n" % line)
    print("%s" % line)
The above code works, but it first waits for the process to complete and stores its output in the result variable; only then does the for loop write and print it.
But I want the output at runtime (my process can take hours to complete, and I don't get any output for all those hours).
So is there any other function that gives me the output dynamically (at runtime), meaning that as soon as the process emits its first line, it gets printed?
The problem here is that .readlines() reads the entire output before returning, as it constructs a full list. Just iterate directly:

for line in result.stdout:
    print(line)
.readlines() returns a list of all the lines the process will produce, i.e., it doesn't return anything until all output from the subprocess has been received. To read line by line in "real time":
import sys
from subprocess import Popen, PIPE

proc = Popen(cmd, shell=True, bufsize=1, stdout=PIPE)
for line in proc.stdout:
    openfile.write(line)
    sys.stdout.buffer.write(line)
    sys.stdout.buffer.flush()
proc.stdout.close()
proc.wait()
Note: the subprocess may use block buffering when it is run in non-interactive mode; you might need the pexpect or pty modules, or the stdbuf, unbuffer, or script commands.
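For instance, one common workaround on Linux is to prefix the command with stdbuf (a sketch, assuming GNU coreutils and the cmd string from above):

from subprocess import Popen, PIPE

# stdbuf -oL asks the child to line-buffer its stdout,
# so lines arrive as soon as they are produced.
proc = Popen('stdbuf -oL ' + cmd, shell=True, bufsize=1, stdout=PIPE)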
Note: on Python 2, you might also need to use iter() to get "real time" output:

for line in iter(proc.stdout.readline, ""):
    openfile.write(line)
    print line,
You can iterate over the lines one by one by using readline on the pipe:

while True:
    line = result.stdout.readline()
    if not line:
        break
    print line.strip()

The lines contain a trailing \n, which I stripped for printing.
When the process terminates, readline returns an empty string, so you know when to stop.

Reading/writing to a Popen() subprocess

I'm trying to talk to a child process using the python subprocess.Popen() call. In my real code, I'm implementing a type of IPC, so I want to write some data, read the response, write some more data, read the response, and so on. Because of this, I cannot use Popen.communicate(), which otherwise works well for the simple case.
This code shows my problem. It never even gets the first response; it hangs at the first "Reading result". Why? How can I make this work as I expect?
import subprocess

p = subprocess.Popen(["sed", 's/a/x/g'],
                     stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE)
p.stdin.write("abc\n")
print "Reading result:"
print p.stdout.readline()
p.stdin.write("cat\n")
print "Reading result:"
print p.stdout.readline()
sed's output is buffered and is only written out once enough data has accumulated or the input stream is exhausted and closed.
Try this:
import subprocess

p = subprocess.Popen(["sed", 's/a/x/g'],
                     stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE)
p.stdin.write("abc\n")
p.stdin.write("cat\n")
p.stdin.close()
print "Reading result 1:"
print p.stdout.readline()
print "Reading result 2:"
print p.stdout.readline()
Be aware that this cannot be done reliably with huge data, as writing to stdin blocks once the buffer is full. The best way to do it is to use communicate().
I would try to use Popen().communicate() if you can, as it does a lot of nice things for you, but if you need to use Popen() exactly as you described, you'll need to tell sed to flush its buffer after newlines with the -l option:
p = subprocess.Popen(['sed', '-l', 's/a/x/g'],
                     stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE)
and your code should work fine.
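A sketch of the fixed round-trip (assuming GNU sed's -l flag; the explicit flush matters on Python 3, where the pipe to the child is buffered by default):

import subprocess

p = subprocess.Popen(['sed', '-l', 's/a/x/g'],
                     stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE)
p.stdin.write(b"abc\n")
p.stdin.flush()
print(p.stdout.readline())  # b'xbc\n'
p.stdin.write(b"cat\n")
p.stdin.flush()
print(p.stdout.readline())  # b'cxt\n'
p.stdin.close()
p.wait()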
