subprocess replacement of popen2 with Python - python

I tried to run this code from the book 'Python Standard Library' by Fredrik Lundh.
import popen2, string
fin, fout = popen2.popen2("sort")
fout.write("foo\n")
fout.write("bar\n")
fout.close()
print fin.readline(),
print fin.readline(),
fin.close()
It runs well with a warning of
~/python_standard_library_oreilly_lunde/scripts/popen2-example-1.py:1:
DeprecationWarning: The popen2 module is deprecated. Use the subprocess module.
How do I translate the previous code to the subprocess module? I tried the following, but it doesn't work.
from subprocess import *
p = Popen("sort", shell=True, stdin=PIPE, stdout=PIPE, close_fds=True)
p.stdin("foo\n") #p.stdin("bar\n")

import subprocess
proc=subprocess.Popen(['sort'],stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write('foo\n')
proc.stdin.write('bar\n')
out,err=proc.communicate()
print(out)
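Note that on Python 3 the pipes carry bytes by default; a minimal sketch of the same translation (assuming Python 3.7+, where text=True is available):

```python
import subprocess

# Python 3 sketch: pass text=True so the pipes carry str instead of bytes,
# and send all input through communicate() to avoid pipe deadlocks.
proc = subprocess.Popen(['sort'], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
out, err = proc.communicate('foo\nbar\n')
print(out, end='')  # prints "bar" then "foo"
```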

The multiprocessing module provides a Pool class which might be perfect for your needs, considering you are planning to sort (not sure how huge the data is, but...).
It optimizes itself to the number of cores your system has, i.e. only as many worker processes are spawned as there are cores. This is of course customizable.
from multiprocessing import Pool

def main():
    po = Pool()
    po.apply_async(sort_fn, (any_args,), callback=save_data)
    po.close()
    po.join()
    return

def sort_fn(any_args):
    # do whatever it is that you want to do in a separate process
    return data

def save_data(data):
    # data is an object; store it in a file, MySQL, or ...
    return


Python subprocess always waits for program [duplicate]

I'm trying to port a shell script to the much more readable python version. The original shell script starts several processes (utilities, monitors, etc.) in the background with "&". How can I achieve the same effect in python? I'd like these processes not to die when the python scripts complete. I am sure it's related to the concept of a daemon somehow, but I couldn't find how to do this easily.
While jkp's solution works, the newer way of doing things (and the way the documentation recommends) is to use the subprocess module. For simple commands it's equivalent, but it offers more options if you want to do something complicated.
Example for your case:
import subprocess
subprocess.Popen(["rm","-r","some.file"])
This will run rm -r some.file in the background. Note that calling .communicate() on the object returned from Popen will block until it completes, so don't do that if you want it to run in the background:
import subprocess
ls_output=subprocess.Popen(["sleep", "30"])
ls_output.communicate() # Will block for 30 seconds
See the documentation here.
Also, a point of clarification: "Background" as you use it here is purely a shell concept; technically, what you mean is that you want to spawn a process without blocking while you wait for it to complete. However, I've used "background" here to refer to shell-background-like behavior.
Note: This answer is less current than it was when posted in 2009. Using the subprocess module shown in other answers is now recommended in the docs
(Note that the subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using these functions.)
If you want your process to start in the background you can either use system() and call it in the same way your shell script did, or you can spawn it:
import os
os.spawnl(os.P_DETACH, 'some_long_running_command')
(or, alternatively, you may try the less portable os.P_NOWAIT flag).
See the documentation here.
You probably want the answer to "How to call an external command in Python".
The simplest approach is to use the os.system function, e.g.:
import os
os.system("some_command &")
Basically, whatever you pass to the system function will be executed the same as if you'd passed it to the shell in a script.
I found this here:
On Windows (Win XP), the parent process will not finish until longtask.py has finished its work. That is not what you want in a CGI script. The problem is not specific to Python; the PHP community has the same problems.
The solution is to pass the DETACHED_PROCESS process-creation flag to the underlying CreateProcess function in the Win API. If you happen to have pywin32 installed, you can import the flag from the win32process module; otherwise you should define it yourself:
DETACHED_PROCESS = 0x00000008
pid = subprocess.Popen([sys.executable, "longtask.py"],
                       creationflags=DETACHED_PROCESS).pid
Use subprocess.Popen() with the close_fds=True parameter, which will allow the spawned subprocess to be detached from the Python process itself and continue running even after Python exits.
https://gist.github.com/yinjimmy/d6ad0742d03d54518e9f
import os, time, sys, subprocess

if len(sys.argv) == 2:
    time.sleep(5)
    print 'track end'
    if sys.platform == 'darwin':
        subprocess.Popen(['say', 'hello'])
else:
    print 'main begin'
    subprocess.Popen(['python', os.path.realpath(__file__), '0'], close_fds=True)
    print 'main end'
Capture output and run in the background with threading
As mentioned in this answer, if you capture the output with stdout= and then try to read(), the process blocks.
However, there are cases where you need this. For example, I wanted to launch two processes that talk over a port between them, and save their stdout both to a log file and to stdout.
The threading module allows us to do that.
First, have a look at how to do the output redirection part alone in this question: Python Popen: Write to stdout AND log file simultaneously
Then:
main.py
#!/usr/bin/env python3
import os
import subprocess
import sys
import threading
def output_reader(proc, file):
    while True:
        byte = proc.stdout.read(1)
        if byte:
            sys.stdout.buffer.write(byte)
            sys.stdout.flush()
            file.buffer.write(byte)
        else:
            break

with subprocess.Popen(['./sleep.py', '0'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc1, \
     subprocess.Popen(['./sleep.py', '10'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc2, \
     open('log1.log', 'w') as file1, \
     open('log2.log', 'w') as file2:
    t1 = threading.Thread(target=output_reader, args=(proc1, file1))
    t2 = threading.Thread(target=output_reader, args=(proc2, file2))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
sleep.py
#!/usr/bin/env python3
import sys
import time
for i in range(4):
    print(i + int(sys.argv[1]))
    sys.stdout.flush()
    time.sleep(0.5)
After running:
./main.py
stdout gets updated every 0.5 seconds, two lines at a time, to contain:
0
10
1
11
2
12
3
13
and each log file contains the respective log for a given process.
Inspired by: https://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/
Tested on Ubuntu 18.04, Python 3.6.7.
You probably want to start investigating the os module for forking separate processes (by opening an interactive session and issuing help(os)). The relevant functions are fork and the exec family. To give you an idea of how to start, put something like this in a function that performs the fork (the function needs to take a list or tuple 'args' as an argument that contains the program's name and its parameters; you may also want to define stdin, stdout and stderr for the new process):
try:
    pid = os.fork()
except OSError, e:
    ## some debug output
    sys.exit(1)
if pid == 0:
    ## eventually use os.putenv(..) to set environment variables
    ## os.execv expects args[0] to be the program name as well
    os.execv(args[0], args)
You can use
import os
pid = os.fork()
if pid == 0:
    # continue with other code ...
This will make the Python process run in the background.
I haven't tried this yet, but using .pyw files instead of .py files should help. .pyw files don't have a console, so in theory the script should not show a window and should work like a background process.

Handling interactive shells with Python subprocess

I am trying to run multiple instances of a console-based game (dungeon crawl stone soup -- for research purposes naturally) using a multiprocessing pool to evaluate each run.
In the past when I've used a pool to evaluate similar code (genetic algorithms), I've used subprocess.call to split off each process. However, with dcss being quite interactive having a shared subshell seems to be problematic.
I have the code I normally use for this kind of thing, with crawl replacing other applications I've thrown a GA at. Is there a better way to handle highly interactive shells than this? I'd considered kicking off a screen for each instance, but thought there was a cleaner way. My understanding was that shell=True should be spawning a sub-shell, but I guess it is spawning one in a way that is shared between each call.
I should mention I have a bot running the game, so I don't want any actual interaction from the user's end to occur.
# Kick off the GA execution
pool_args = zip(trial_ids, run_types, self.__population)
pool.map(self._GAExecute, pool_args)
---
# called by pool.map
def _GAExecute(self, pool_args):
    trial_id = pool_args[0]
    run_type = pool_args[1]
    genome = pool_args[2]
    self._RunSimulation(trial_id)

# Call the actual binary
def _RunSimulation(self, trial_id):
    command = "./%s" % self.__crawl_binary
    name = "-name %s" % trial_id
    rc = "-rc %s" % os.path.join(self.__output_dir, 'qw-%s' % trial_id, "qw -%s.rc" % trial_id)
    seed = "-seed %d" % self.__seed
    cdir = "-dir %s" % os.path.join(self.__output_dir, 'qw-%s' % trial_id)
    shell_command = "%s %s %s %s %s" % (command, name, rc, seed, cdir)
    call(shell_command, shell=True)
You can indeed associate stdin and stdout with files, as in the answer from @napuzba:
fout = open('stdout.txt','w')
ferr = open('stderr.txt','w')
subprocess.call(cmd, stdout=fout, stderr=ferr)
Another option would be to use Popen instead of call. The difference is that call waits for completion (it is blocking) while Popen does not; see What's the difference between subprocess Popen and call (how can I use them)?
Using Popen, you can then keep stdout and stderr inside your object, and then use them later, without having to rely on a file:
p = subprocess.Popen(cmd,stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.wait()
stderr = p.stderr.read()
stdout = p.stdout.read()
Another potential advantage of this method is that you could run multiple instances of Popen without waiting for completion instead of having a thread pool:
processes = [
    subprocess.Popen(cmd1, stdout=subprocess.PIPE, stderr=subprocess.PIPE),
    subprocess.Popen(cmd2, stdout=subprocess.PIPE, stderr=subprocess.PIPE),
    subprocess.Popen(cmd3, stdout=subprocess.PIPE, stderr=subprocess.PIPE),
]
for p in processes:
    if p.poll() is not None:
        pass  # process completed
    else:
        pass  # no completion yet (poll() returns None while still running)
On a side note, you should avoid shell=True if you can, and if you do not use it, Popen expects a list as the command instead of a string. Do not generate this list manually; use shlex.split, which will take care of all the corner cases for you, e.g.:
Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
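For instance, shlex.split keeps quoted arguments intact where a naive str.split would break them:

```python
import shlex

# a quoted filename containing a space stays a single argument
cmd = 'strings "my file.bin"'
args = shlex.split(cmd)
print(args)  # ['strings', 'my file.bin']
```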
Specify the standard input, standard output and standard error with unique file handles for each call:
import subprocess

cmd = ""
fout = open('stdout.txt', 'w')
fin = open('stdin.txt', 'r')
ferr = open('stderr.txt', 'w')
subprocess.call(cmd, stdout=fout, stdin=fin, stderr=ferr)

Alternative to subprocess.Popen.communicate() for streaming

If I'm using subprocess.Popen I can use communicate() for small outputs.
But if the subprocess is going to take substantial time and produce substantial output, I want to access it as streaming data.
Is there a way to do this? The Python docs say
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
I would really like to access a process output as a file-like object:
with someMagicFunction(['path/to/some/command', 'arg1', 'arg2', 'arg3']) as outpipe:
    # pass outpipe into some other function that takes a file-like object
but can't figure out how to do this.
communicate is a convenience method that starts background threads to read stdout and stderr. You can just read stdout yourself, but you need to figure out what to do with stderr. If you don't care about errors, you could add the param stderr=open(os.devnull, 'wb') or to a file stderr=open('somefile', 'wb'). Or, create your own background thread to do the read. It turns out that shutil already has such a function, so we can use it.
import subprocess
import threading
import shutil
import io

err_buf = io.BytesIO()
proc = subprocess.Popen(['ls', '-l'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
err_thread = threading.Thread(target=shutil.copyfileobj,
                              args=(proc.stderr, err_buf))
err_thread.start()
for line in proc.stdout:
    print(line.decode('utf-8'), end='')
retval = proc.wait()
err_thread.join()
print('error:', err_buf.getvalue())
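With that in place, the someMagicFunction from the question can be sketched as a small context manager (stream_command is a made-up name; this version streams stdout only and leaves stderr alone):

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def stream_command(args):
    # yield the process's stdout as a file-like object, then clean up
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    try:
        yield proc.stdout
    finally:
        proc.stdout.close()
        proc.wait()

with stream_command(['echo', 'hello']) as outpipe:
    data = outpipe.read()
print(data)  # b'hello\n'
```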

Calling multi-level commands/programs from python

I have a shell command 'fst-mor'. It takes an argument in the form of a file, e.g. NOUN.A, which is a lex file or something. Final command: fst-mor NOUN.A
It then produces following output:
analyze>INPUT_A_STRING_HERE
OUTPUT_HERE
Now I want to call fst-mor from my Python script, feed it an input string, and get the output back in the script.
So far I have:
import os
print os.system("fst-mor NOUN.A")
You want to capture the output of another command. Use the subprocess module for this.
import subprocess
output = subprocess.check_output(['fst-mor', 'NOUN.A'])
If your command requires interactive input, you have two options:
Use a subprocess.Popen() object, set the stdin parameter to subprocess.PIPE, and write the input to that pipe. For one input parameter, that's often enough. Study the documentation for the subprocess module for details, but the basic interaction is:
proc = subprocess.Popen(['fst-mor', 'NOUN.A'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output, err = proc.communicate('INPUT_A_STRING_HERE')
Use the pexpect library to drive a process. This lets you create more complex interactions with a subprocess by looking for patterns in the output it generates:
import pexpect
py = pexpect.spawn('fst-mor NOUN.A')
py.expect('analyze>')
py.send('INPUT_A_STRING_HERE')
output = py.read()
py.close()
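Since fst-mor itself may not be available to experiment with, the Popen/communicate approach above can be sanity-checked with any filter command; here tr (hypothetically) stands in for fst-mor:

```python
import subprocess

# `tr` stands in for fst-mor: one write to stdin, one read of stdout
proc = subprocess.Popen(['tr', 'a-z', 'A-Z'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True)
output, err = proc.communicate('input_a_string_here\n')
print(output)  # INPUT_A_STRING_HERE
```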
You could try:
from subprocess import Popen, PIPE
p = Popen(["fst-mor", "NOUN.A"], stdin=PIPE, stdout=PIPE)
output = p.communicate("INPUT_A_STRING_HERE")[0]
A sample that communicates with another process:
pipe = subprocess.Popen(['clisp'],stdin=subprocess.PIPE, stdout=subprocess.PIPE)
(response,err) = pipe.communicate("(+ 1 1)\n(* 2 2)")
#only print the last 6 lines to chop off the REPL intro text.
#Obviously you can do whatever manipulations you feel are necessary
#to correctly grab the input here
print '\n'.join(response.split('\n')[-6:])
Note that communicate will close the streams after it runs, so you have to know all your commands ahead of time for this method to work. It seems like the pipe.stdout doesn't flush until stdin is closed? I'd be curious if there is a way around that I'm missing.
You should use the subprocess module.
In your example you might run:
subprocess.check_output(["fst-mor", "NOUN.A"])

Multi processing subprocess

I'm new to Python's subprocess module, and currently my implementation is not multiprocessed.
import subprocess, shlex

def forcedParsing(fname):
    cmd = 'strings "%s"' % (fname)
    #print cmd
    args = shlex.split(cmd)
    try:
        sp = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = sp.communicate()
    except OSError, e:
        print "Error no %s Message %s" % (e.errno, e.strerror)
        return
    if sp.returncode == 0:
        #print "Processed %s" % fname
        return out

res=[]
for f in file_list: res.append(forcedParsing(f))
my questions:
Is sp.communicate a good way to go? Should I use poll?
If I use poll, do I need a separate process which monitors whether the process finished?
Should I fork at the for loop?
1) subprocess.communicate() seems the right option for what you are trying to do. And you don't need to poll the process; communicate() returns only when it's finished.
2) You mean forking to parallelize work? Take a look at multiprocessing (Python >= 2.6). Running parallel processes using subprocess is of course possible, but it's quite a bit of work; you cannot just call communicate(), which is blocking.
About your code:
cmd = 'strings "%s"' % (fname)
args= shlex.split(cmd)
Why not simply?
args = ["strings", fname]
As for this ugly pattern:
res=[]
for f in file_list: res.append(forcedParsing(f))
You should use list-comprehensions whenever possible:
res = [forcedParsing(f) for f in file_list]
About question 2: forking at the for loop will mostly speed things up if the script is supposed to run on a system with multiple cores/processors. It will consume more memory, though, and will stress IO harder. There will be a sweet spot somewhere that depends on the number of files in file_list, but only benchmarking on a realistic target system can tell you where it is. If you find that number, you could add an if len(file_list) > <your number>: with optional fork()'ing [Edit: rather, as @tokland says, via multiprocessing if it's available on your Python version (2.6+)] that chooses the most efficient strategy on a per-job basis.
Read about Python profiling here: http://docs.python.org/library/profile.html
If you're on Linux, you can also run time: http://linuxmanpages.com/man1/time.1.php
There are several warnings in the subprocess documentation that advise you to use communicate to avoid problems with a process blocking, so it would be a good idea to use that.
