Related
I have an assignment where we are making a shell for the Linux OS. And I have a lot of questions!
I was allowed to do it in python using some of the methods from the os library. The idea is that my program should communicate directly with the linux operating system calls.
So this include:
Create
Open
Close
Read
Write
Exit
Pipe
Exec
Fork
Dup2
Wait
So far I made a working shell which can execute commands with execvp but I am having trouble with the piping stuff.
I was reading this q/a and i felt that I almost understood what i have to do
I guess I have to use Dup2 to write (and maybe read later). Also I am a little confused if I should use read() and write() at some point regarding to piping.
from os import (
execvp,
wait,
fork,
close,
pipe,
dup2,
)
from os import _exit as kill
STDIN = 0
STDOUT = 1
STDERR = 2
CHILD = 0
def piping(cmd):
reading, writing = pipe()
pid = fork()
if pid > CHILD:
wait()
close(writing)
dup2(reading, STDIN)
execvp(cmd[1][0], cmd[1])
kill(127)
elif pid == CHILD:
close(reading)
dup2(writing, STDOUT)
execvp(cmd[0][0], cmd[0])
kill(127)
else:
print('Command not found:', cmd)
piping([['ls', '-l', '/'], ['grep', 'var']])
If I run this code it works. But I don't understand some things:
How can the execvp know that it gets extra arguments from the pipe?
Why should i kill in the end and why is it 127?
How is it possible to run the execvp inside the parent? Is this also possible in C?
If I have a nested pipe eg: ls -l / | grep var | xclip -selection clipboard should I create a new fork then? (maybe some recursion)
It is not a part of the assignment to write to a file, but I might implement it as well later, when I get the piping to work.
Should I use dup2 for that as well or maybe read/write
Thank you in advance! :)
How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far, can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk","-f","script.awk"],
stdin=subprocess.PIPE,
stdout=file("outfile.txt", "w"))
p_awk.communicate( "input data" )
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file and awk | sort will reveal of concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
Sidebar Why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an os pipe. (not a Python subprocess.PIPE) but call os.pipe() which returns two new file descriptors that are connected via common buffer. At this point the process has stdin, stdout, stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child closes replaces its stdin with the new b's stdin. Exec the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.exec() and os.fork(), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
However, it's easier to delegate that operation to the shell.
import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
stdin=subprocess.PIPE).communicate(some_string)
To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
with a.stdout, open("outfile.txt", "wb") as outfile:
b = Popen(["b"], stdin=a.stdout, stdout=outfile)
a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()] # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()
The accepted answer is sidestepping actual question.
here is a snippet that chains the output of multiple processes:
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1m count=100
# cmd2 : gzip
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# thoretically p1 and p2 may still be running, this ensures we are collecting their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)
http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
Inspired by #Cristian's answer. I met just the same issue, but with a different command. So I'm putting my tested example, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done
Declared lazy grep execution with stdin from pipe. This command will be executed at the ps command execution when the pipe will be filled with the stdout of ps.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Grep communicated to get stdout from the pipe.
I like this way because it is natural pipe conception gently wrapped with subprocess interfaces.
The previous answers missed an important point. Replacing shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work - it hangs forever. You need to create a a pipeline and a child manually like this:
import os
import subprocess
input = """\
input data
more input
""" * 10
rd, wr = os.pipe()
if os.fork() != 0: # parent
os.close(wr)
else: # child
os.close(rd)
os.write(wr, input)
os.close(wr)
exit()
p_awk = subprocess.Popen(["awk", "{ print $2; }"],
stdin=rd,
stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
stdin=p_awk.stdout,
stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print (out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrary long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile
tf = TemporaryFile()
tf.write(input)
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though, the result in stdout is the same). The reason seems to be that python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.
EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.
For me, the below approach is the cleanest and easiest to read
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
with open(output_filename, 'wb') as out_f:
p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
p1.communicate(input=bytes(input_s))
p1.wait()
p2.stdin.close()
p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')
Does the subprocess module in Python execute in the background while you are still able to continue with the program?
For example: If I were to "tail -f *.log" as a subprocess and I read those lines into Python, am I still able to perform other tasks concurrently with the "tail" process happening? Or is that more of a multithread?
subprocess.check_output will wait until the child process terminates and only then give you it's output.
subprocess.Popen does what you want. It will look something like this:
proc = subprocess.Popen(['/usr/bin/tail', '-f', '*.log'], stdout=subprocess.PIPE, shell=True)
You need the shell=True so that the wildcard in *.log will be interpreted. If not for that, you wouldn't need the shell. Be careful in passing arguments to Popen with shell=True; this can be a security risk if the arguments are user-supplied. It's usually better to find a way that doesn't require the shell. In your case, that would be simple enough - you'd simply have to build the list of log files using python and then pass the list elements as further arguments.
The stdout=subprocess.PIPE is here so that you have a way to read tail's output. You can access it via proc.stdout and read from it just like from a regular file.
But reading from it will probably turn out to be somewhat tricky, since you'll have to do it periodically to ensure that the pipe's buffer doesn't fill up and block the tail process. One way to do it is to start a thread that does nothing than read tail's output into a large-enough buffer.
I'm trying to write a Python script that starts a subprocess, and writes to the subprocess stdin. I'd also like to be able to determine an action to be taken if the subprocess crashes.
The process I'm trying to start is a program called nuke which has its own built-in version of Python which I'd like to be able to submit commands to, and then tell it to quit after the commands execute. So far I've worked out that if I start Python on the command prompt like and then start nuke as a subprocess then I can type in commands to nuke, but I'd like to be able to put this all in a script so that the master Python program can start nuke and then write to its standard input (and thus into its built-in version of Python) and tell it to do snazzy things, so I wrote a script that starts nuke like this:
subprocess.call(["C:/Program Files/Nuke6.3v5/Nuke6.3", "-t", "E:/NukeTest/test.nk"])
Then nothing happens because nuke is waiting for user input. How would I now write to standard input?
I'm doing this because I'm running a plugin with nuke that causes it to crash intermittently when rendering multiple frames. So I'd like this script to be able to start nuke, tell it to do something and then if it crashes, try again. So if there is a way to catch a crash and still be OK then that'd be great.
It might be better to use communicate:
from subprocess import Popen, PIPE, STDOUT
p = Popen(['myapp'], stdout=PIPE, stdin=PIPE, stderr=PIPE)
stdout_data = p.communicate(input='data_to_write')[0]
"Better", because of this warning:
Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
To clarify some points:
As jro has mentioned, the right way is to use subprocess.communicate.
Yet, when feeding the stdin using subprocess.communicate with input, you need to initiate the subprocess with stdin=subprocess.PIPE according to the docs.
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Also qed has mentioned in the comments that for Python 3.4 you need to encode the string, meaning you need to pass Bytes to the input rather than a string. This is not entirely true. According to the docs, if the streams were opened in text mode, the input should be a string (source is the same page).
If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.
So, if the streams were not opened explicitly in text mode, then something like below should work:
import subprocess
command = ['myapp', '--arg1', 'value_for_arg1']
p = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = p.communicate(input='some data'.encode())[0]
I've left the stderr value above deliberately as STDOUT as an example.
That being said, sometimes you might want the output of another process rather than building it up from scratch. Let's say you want to run the equivalent of echo -n 'CATCH\nme' | grep -i catch | wc -m. This should normally return the number characters in 'CATCH' plus a newline character, which results in 6. The point of the echo here is to feed the CATCH\nme data to grep. So we can feed the data to grep with stdin in the Python subprocess chain as a variable, and then pass the stdout as a PIPE to the wc process' stdin (in the meantime, get rid of the extra newline character):
import subprocess
what_to_catch = 'catch'
what_to_feed = 'CATCH\nme'
# We create the first subprocess, note that we need stdin=PIPE and stdout=PIPE
p1 = subprocess.Popen(['grep', '-i', what_to_catch], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We immediately run the first subprocess and get the result
# Note that we encode the data, otherwise we'd get a TypeError
p1_out = p1.communicate(input=what_to_feed.encode())[0]
# Well the result includes an '\n' at the end,
# if we want to get rid of it in a VERY hacky way
p1_out = p1_out.decode().strip().encode()
# We create the second subprocess, note that we need stdin=PIPE
p2 = subprocess.Popen(['wc', '-m'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We run the second subprocess feeding it with the first subprocess' output.
# We decode the output to convert to a string
# We still have a '\n', so we strip that out
output = p2.communicate(input=p1_out)[0].decode().strip()
This is somewhat different than the response here, where you pipe two processes directly without adding data directly in Python.
Hope that helps someone out.
Since subprocess 3.5, there is the subprocess.run() function, which provides a convenient way to initialize and interact with Popen() objects. run() takes an optional input argument, through which you can pass things to stdin (like you would using Popen.communicate(), but all in one go).
Adapting jro's example to use run() would look like:
import subprocess
p = subprocess.run(['myapp'], input='data_to_write', capture_output=True, text=True)
After execution, p will be a CompletedProcess object. By setting capture_output to True, we make available a p.stdout attribute which gives us access to the output, if we care about it. text=True tells it to work with regular strings rather than bytes. If you want, you might also add the argument check=True to make it throw an error if the exit status (accessible regardless via p.returncode) isn't 0.
This is the "modern"/quick and easy way to do to this.
One can write data to the subprocess object on-the-fly, instead of collecting all the input in a string beforehand to pass through the communicate() method.
This example sends a list of animals names to the Unix utility sort, and sends the output to standard output.
import sys, subprocess
p = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sys.stdout)
for v in ('dog','cat','mouse','cow','mule','chicken','bear','robin'):
p.stdin.write( v.encode() + b'\n' )
p.communicate()
Note that writing to the process is done via p.stdin.write(v.encode()). I tried using
print(v.encode(), file=p.stdin), but that failed with the message TypeError: a bytes-like object is required, not 'str'. I haven't figured out how to get print() to work with this.
You can provide a file-like object to the stdin argument of subprocess.call().
The documentation for the Popen object applies here.
To capture the output, you should instead use subprocess.check_output(), which takes similar arguments. From the documentation:
>>> subprocess.check_output(
... "ls non_existent_file; exit 0",
... stderr=subprocess.STDOUT,
... shell=True)
'ls: non_existent_file: No such file or directory\n'
I'm trying to write a Python script that starts a subprocess, and writes to the subprocess stdin. I'd also like to be able to determine an action to be taken if the subprocess crashes.
The process I'm trying to start is a program called nuke which has its own built-in version of Python which I'd like to be able to submit commands to, and then tell it to quit after the commands execute. So far I've worked out that if I start Python on the command prompt like and then start nuke as a subprocess then I can type in commands to nuke, but I'd like to be able to put this all in a script so that the master Python program can start nuke and then write to its standard input (and thus into its built-in version of Python) and tell it to do snazzy things, so I wrote a script that starts nuke like this:
subprocess.call(["C:/Program Files/Nuke6.3v5/Nuke6.3", "-t", "E:/NukeTest/test.nk"])
Then nothing happens because nuke is waiting for user input. How would I now write to standard input?
I'm doing this because I'm running a plugin with nuke that causes it to crash intermittently when rendering multiple frames. So I'd like this script to be able to start nuke, tell it to do something and then if it crashes, try again. So if there is a way to catch a crash and still be OK then that'd be great.
It might be better to use communicate:
from subprocess import Popen, PIPE, STDOUT
p = Popen(['myapp'], stdout=PIPE, stdin=PIPE, stderr=PIPE)
stdout_data = p.communicate(input='data_to_write')[0]
"Better", because of this warning:
Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
To clarify some points:
As jro has mentioned, the right way is to use subprocess.communicate.
Yet, when feeding the stdin using subprocess.communicate with input, you need to initiate the subprocess with stdin=subprocess.PIPE according to the docs.
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Also qed has mentioned in the comments that for Python 3.4 you need to encode the string, meaning you need to pass Bytes to the input rather than a string. This is not entirely true. According to the docs, if the streams were opened in text mode, the input should be a string (source is the same page).
If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.
So, if the streams were not opened explicitly in text mode, then something like below should work:
import subprocess
command = ['myapp', '--arg1', 'value_for_arg1']
p = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = p.communicate(input='some data'.encode())[0]
I've left the stderr value above deliberately as STDOUT as an example.
That being said, sometimes you might want the output of another process rather than building it up from scratch. Let's say you want to run the equivalent of echo -n 'CATCH\nme' | grep -i catch | wc -m. This should normally return the number characters in 'CATCH' plus a newline character, which results in 6. The point of the echo here is to feed the CATCH\nme data to grep. So we can feed the data to grep with stdin in the Python subprocess chain as a variable, and then pass the stdout as a PIPE to the wc process' stdin (in the meantime, get rid of the extra newline character):
import subprocess
what_to_catch = 'catch'
what_to_feed = 'CATCH\nme'
# We create the first subprocess, note that we need stdin=PIPE and stdout=PIPE
p1 = subprocess.Popen(['grep', '-i', what_to_catch], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We immediately run the first subprocess and get the result
# Note that we encode the data, otherwise we'd get a TypeError
p1_out = p1.communicate(input=what_to_feed.encode())[0]
# Well the result includes an '\n' at the end,
# if we want to get rid of it in a VERY hacky way
p1_out = p1_out.decode().strip().encode()
# We create the second subprocess, note that we need stdin=PIPE
p2 = subprocess.Popen(['wc', '-m'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We run the second subprocess feeding it with the first subprocess' output.
# We decode the output to convert to a string
# We still have a '\n', so we strip that out
output = p2.communicate(input=p1_out)[0].decode().strip()
This is somewhat different than the response here, where you pipe two processes directly without adding data directly in Python.
Hope that helps someone out.
Since subprocess 3.5, there is the subprocess.run() function, which provides a convenient way to initialize and interact with Popen() objects. run() takes an optional input argument, through which you can pass things to stdin (like you would using Popen.communicate(), but all in one go).
Adapting jro's example to use run() would look like:
import subprocess
p = subprocess.run(['myapp'], input='data_to_write', capture_output=True, text=True)
After execution, p will be a CompletedProcess object. By setting capture_output to True, we make available a p.stdout attribute which gives us access to the output, if we care about it. text=True tells it to work with regular strings rather than bytes. If you want, you might also add the argument check=True to make it throw an error if the exit status (accessible regardless via p.returncode) isn't 0.
This is the "modern"/quick and easy way to do to this.
One can write data to the subprocess object on-the-fly, instead of collecting all the input in a string beforehand to pass through the communicate() method.
This example sends a list of animals names to the Unix utility sort, and sends the output to standard output.
import sys, subprocess
p = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sys.stdout)
for v in ('dog','cat','mouse','cow','mule','chicken','bear','robin'):
p.stdin.write( v.encode() + b'\n' )
p.communicate()
Note that writing to the process is done via p.stdin.write(v.encode()). I tried using
print(v.encode(), file=p.stdin), but that failed with the message TypeError: a bytes-like object is required, not 'str'. I haven't figured out how to get print() to work with this.
You can provide a file-like object to the stdin argument of subprocess.call().
The documentation for the Popen object applies here.
To capture the output, you should instead use subprocess.check_output(), which takes similar arguments. From the documentation:
>>> subprocess.check_output(
... "ls non_existent_file; exit 0",
... stderr=subprocess.STDOUT,
... shell=True)
'ls: non_existent_file: No such file or directory\n'