Goal: I'm trying to put together a Python script that captures the network traffic generated by the execution of a block of code. For simplicity, let's assume I want to log the network traffic resulting from a call to socket.gethostbyname('example.com'). Note: I can't simply terminate tcpdump when gethostbyname() returns, as the actual code block I want to measure triggers other external code, and I have no way to determine when that external code finishes execution (so I have to leave tcpdump running "long enough" for it to be highly probable that I've logged all the traffic it generates).
Approach: I'm using subprocess to start tcpdump, telling tcpdump to terminate itself after duration seconds using its -G and -W options, e.g.:
import socket
import subprocess
import time

duration = 15
nif = 'en0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
tcpdump_proc = subprocess.Popen(cmd)
socket.gethostbyname('example.com')
time.sleep(duration + 5)  # sleep longer than tcpdump is running
The problem with this is that Popen() returns before tcpdump is fully up and running, thus some/all of the traffic resulting from the call to gethostbyname() will not be captured. I could obviously add a time.sleep(x) before calling gethostbyname() to give tcpdump a bit of time to spin up, but that's not a portable solution (I can't just pick some arbitrary x < duration as a powerful system would start capturing packets earlier than a less powerful system).
To deal with this, my idea is to parse tcpdump's output, looking for the moment the following line is written to its stderr, as that appears to indicate that the capture is fully up and running:
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
Thus I need to attach to stderr, but the problem is that I don't want to commit to reading all of its output, as I need my code to move on to actually executing the code block I want to measure (gethostbyname() in this example) instead of being stuck in a loop reading from stderr.
I could solve this by adding a semaphore that blocks the main thread from proceeding to the gethostbyname() call, and have a background thread read from stderr and release the semaphore (letting the main thread move on) when it reads the string above from stderr, but I'd like to keep the code single-threaded if possible.
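For concreteness, the threaded version I'd like to avoid would look roughly like this (just a sketch, using a threading.Event in place of a semaphore; cmd as defined above):

import socket
import subprocess
import threading

ready = threading.Event()

def watch_stderr(stream):
    # Signal the main thread once tcpdump reports it's listening,
    # then keep draining stderr so the pipe can never fill up.
    for line in stream:
        if 'listening on' in line:
            ready.set()

tcpdump_proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
threading.Thread(target=watch_stderr,
                 args=(tcpdump_proc.stderr,), daemon=True).start()
ready.wait()  # block until tcpdump is actually capturing
socket.gethostbyname('example.com')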
From my understanding, it's a big no-no to use subprocess.PIPE for stderr and stdout without committing to reading all of the output, as the child will end up blocking when the pipe buffer fills up. But can you "detach" (destroy?) the pipe mid-execution if you're only interested in reading the first part of the output? Essentially I'd like to end up with something like this:
import socket
import subprocess
import time

duration = 15
nif = 'en0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
tcpdump_proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
for line in tcpdump_proc.stderr:
    if 'tcpdump: listening on' in line:
        break
socket.gethostbyname('example.com')
time.sleep(duration)  # sleep at least as long as tcpdump is running
What else do I need to add within the if block to "reassign" who's in charge of reading stderr? Can I just set stderr back to None (tcpdump_proc.stderr = None)? Or should I call tcpdump_proc.stderr.close() (and will tcpdump terminate early if I do so)?
It could also very well be that I missed something obvious and that there is a much better approach to achieve what I want - if so, please enlighten me :).
Thanks in advance :)
You could use detach() or close() on stderr after receiving the "listening on" message:
import subprocess
import time

duration = 10
nif = 'eth0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
proc = subprocess.Popen(
    cmd, shell=False, stderr=subprocess.PIPE, bufsize=1, text=True
)
for i, line in enumerate(proc.stderr):
    print('read %d lines from stderr' % i)
    if 'listening on' in line:
        print('detach stderr!')
        proc.stderr.detach()
        break
while proc.poll() is None:
    print("doing something else while tcpdump is running!")
    time.sleep(2)
print(proc.returncode)
print(proc.stderr.read())  # raises ValueError: the buffer was detached
Out:
read 0 lines from stderr
detach stderr!
doing something else while tcpdump is running!
doing something else while tcpdump is running!
doing something else while tcpdump is running!
doing something else while tcpdump is running!
doing something else while tcpdump is running!
doing something else while tcpdump is running!
0
Traceback (most recent call last):
File "x.py", line 24, in <module>
print(proc.stderr.read())
ValueError: underlying buffer has been detached
Note:
I haven't checked what is really happening to the stderr data, but detaching stderr doesn't seem to have any impact on tcpdump.
Related
I am running two processes simultaneously in Python using the subprocess module:
from subprocess import Popen, PIPE

p_topic = Popen(['rostopic', 'echo', '/msg/address'], stdout=PIPE)
p_play = Popen(['rosbag', 'play', bagfile_path])
These are ROS processes: p_topic listens for a .bag file to be played and outputs certain information from that .bag file to the stdout stream; I want to then access this output using the p_topic.stdout object (which behaves as a file).
However, what I find happening is that the p_topic.stdout object only contains the first ~1/3 of the output lines it should have - that is, in comparison to running the two commands manually, simultaneously in two shells side by side.
I've tried waiting for many seconds for output to finish, but this doesn't change anything; it's approximately the same ratio of lines captured by p_topic.stdout each time. Any hints on what this could be would be greatly appreciated!
EDIT:
Here's the reading code:
# wait for playing to stop
while p_play.poll() is None:
    time.sleep(.1)
time.sleep(X)  # wait for some time for p_topic to finish
p_topic.terminate()
output = []
for line in p_topic.stdout:
    output.append(line)
Note that the value X in time.sleep(X) doesn't make any difference
By default, when a process's stdout is not connected to a terminal, the output is block buffered. When connected to a terminal, it's line buffered. You expect to get complete lines, but you can't unless rostopic unbuffers or explicitly line buffers its stdout (if it's a C program, you can use setvbuf to make this automatic).
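If you can't change the child program itself, you can sometimes force the buffering from the outside. Here's a rough sketch of two common workarounds (assumptions: stdbuf requires GNU coreutils and only affects programs that use C stdio, and 'some_c_tool' is a placeholder; PYTHONUNBUFFERED only helps when the child is a Python script, which I believe rostopic is):

import os
import subprocess
from subprocess import PIPE

# Option 1: stdbuf asks a C-stdio program for line-buffered stdout.
p1 = subprocess.Popen(['stdbuf', '-oL', 'some_c_tool'], stdout=PIPE)

# Option 2: for a Python-based child, PYTHONUNBUFFERED=1 disables buffering.
env = dict(os.environ, PYTHONUNBUFFERED='1')
p_topic = subprocess.Popen(['rostopic', 'echo', '/msg/address'],
                           stdout=PIPE, env=env)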
The other (possibly overlapping) possibility is that the pipe buffer itself is filling (pipe buffers are usually fairly small), and because you never drain it, rostopic fills the pipe buffer and then blocks indefinitely until you kill it, leaving only what managed to fit in the pipe to be drained when you read the process's stdout. In that case, you'd need to either spawn a thread to keep the pipe drained from Python, or have your main thread use select module components to monitor and drain the pipe (intermingled with polling the other process). The thread is generally easier, though you do need to be careful to avoid thread safety issues.
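A minimal sketch of the drain-thread approach, adapted to the question's names (untested against ROS, so treat it as an outline):

import subprocess
import threading

output = []

def drain(pipe):
    # Keep reading so rostopic can never block on a full pipe buffer.
    for line in pipe:
        output.append(line)

p_topic = subprocess.Popen(['rostopic', 'echo', '/msg/address'],
                           stdout=subprocess.PIPE)
t = threading.Thread(target=drain, args=(p_topic.stdout,))
t.start()
# ... run rosbag play here and wait for it to finish ...
p_topic.terminate()
t.join()
# `output` now holds every line rostopic managed to write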
Is it worth trying communicate()/wait() rather than sleep()? Would that solve your issue?
I have this snippet for general-purpose use, so I'm not sure if you can take it and change it to what you need:
import shlex
import subprocess

# (snippet from inside a helper function; my_Binary, the args and
# Process_Error_Codes are my own names)
executable_Params = "{0} {1} {2} {3} {4}".format(my_Binary,
                                                 arg1,
                                                 arg2,
                                                 arg3,
                                                 arg4)
# execute the process
process = subprocess.Popen(shlex.split(executable_Params),
                           shell=False,
                           stderr=subprocess.PIPE,
                           stdout=subprocess.PIPE)
stdout, stderr = process.communicate()
ret_code = process.wait()
if ret_code == 0:
    return 0
else:
    # get the correct message from my enum method
    error_msg = Process_Error_Codes(ret_code).name
    raise subprocess.CalledProcessError(returncode=ret_code,
                                        cmd=executable_Params)
I have a script that runs another command, waits for it to finish, logs the stdout and stderr and based the return code does other stuff. Here is the code:
p = subprocess.Popen(command, stdin=subprocess.PIPE, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
o, e = p.communicate()
if p.returncode:
# report error
# do other stuff
The problem I'm having is that if command takes a long time to run none of the other actions get done. The possible errors won't get reported and the other stuff that needs to happen if no errors doesn't get done. It essentially doesn't go past p.communicate() if it takes too long. Some times this command can takes hours (or even longer) to run and some times it can take as little as 5 seconds.
Am I missing something or doing something wrong?
As per the documentation located here, it's safe to say that your code is waiting for the subprocess to finish.
If you need to go do 'other things' while you wait you could create a loop like:
while p.poll() is None:  # poll() returns None while the process is running
    # 'other things'
    time.sleep(0.2)
Pick a sleep time that's reasonable for how often you want python to wake up and check the subprocess as well as doing its 'other things'.
Popen.communicate() waits for the process to finish before anything is returned. Thus it is not ideal for any long-running command, and even less so if the subprocess can hang waiting for input, say prompting for a password.
The stderr=subprocess.PIPE and stdout=subprocess.PIPE arguments are needed only if you want to capture the output of the command into a variable. If you are OK with the output going to your terminal, then you can remove them both, and even use subprocess.call instead of Popen. Also, if you do not provide input to your subprocess, do not use stdin=subprocess.PIPE at all; direct stdin from the null device instead (in Python 3.3+ you can use stdin=subprocess.DEVNULL; in Python < 3.3 use stdin=open(os.devnull, 'rb')).
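For example, a minimal sketch of that simplest case (Python 3.3+; command stands for the command list from the question):

import subprocess

# Output goes straight to the terminal; nothing is captured,
# and the child cannot hang waiting for input on stdin.
ret = subprocess.call(command, stdin=subprocess.DEVNULL)
if ret != 0:
    pass  # report error here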
If you need the contents too, then instead of calling p.communicate() you can read p.stdout and p.stderr yourself in chunks and echo them to the terminal, but it is a bit complicated, as it is easy to deadlock the program: the naive approach would try to read from the subprocess's stdout while the subprocess wants to write to stderr. For this case there are two remedies:
you could use select.select to poll both stdout and stderr to see whichever becomes ready first and read from it then
or, if you do not care for stdout and stderr being combined into one,
you can use STDOUT to redirect the stderr stream into the stdout stream: stdout=subprocess.PIPE, stderr=subprocess.STDOUT; now all the output comes to p.stdout that you can read easily in loop and output the chunks, without worrying about deadlocks:
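Something along these lines (a sketch; command as in the question, stdin redirected from the null device as suggested above, Python 3):

import subprocess
import sys

p = subprocess.Popen(command, stdin=subprocess.DEVNULL,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# read() returns b'' only at EOF, i.e. once the child has closed its
# end of the pipe, so this loop cannot deadlock.
for chunk in iter(lambda: p.stdout.read(4096), b''):
    sys.stdout.buffer.write(chunk)  # echo the output as it arrives
p.wait()
if p.returncode != 0:
    pass  # report error here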
If stdout and stderr are going to be huge, you can also spool them to files right there in Popen; say,
stdout = open('stdout.txt', 'w+b')
stderr = open('stderr.txt', 'w+b')
p = subprocess.Popen(..., stdout=stdout, stderr=stderr)
while p.poll() is None:
    # reading at the end of the file will return an empty string
    err = stderr.read()
    print(err)
    out = stdout.read()
    print(out)
    # if we met the end of the file, then we can sleep a bit
    # here to avoid spending excess CPU cycles just to poll;
    # another option would be to use `select`
    if not err and not out:  # no output, sleep a bit
        time.sleep(0.01)
What is the proper way of reading from a subprocess and its stdout?
Here are my files:
traffic.sh
code.py
traffic.sh:
sudo tcpdump -i lo -A | grep Host:
code.py:
import subprocess

proc = subprocess.Popen(['./traffic.sh'], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, shell=True)
# Do some network stuff like ping places, send an email,
# open a few web pages and wait for them to finish loading
# Stop all traffic and make sure it's over
data = proc.stdout.read()
proc.kill()
The code above sometimes works and sometimes doesn't.
When it fails, it's due to getting stuck on the proc.stdout.read().
I have followed a bunch of examples that recommend setting up a thread and queue for the proc and reading the queue as the proc writes. However, this turned out to work only intermittently.
I feel like I'm doing something wrong with the kill and the read, because I can guarantee that there is no communication happening on lo when I make that call, and therefore traffic.sh should not be printing anything at all.
Then why is the read blocking?
Any clean alternative to the thread?
Edit
I have also tried this, in the hope that the read would no longer block since the process is terminated:
proc = subprocess.Popen(['./traffic.sh'], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, shell=True)
# Do some network stuff like ping places, send an email,
# open a few web pages and wait for them to finish loading
# Stop all traffic and make sure it's over
proc.kill()
data = proc.stdout.read()
I have a Python program which launches subprocesses using Popen and consumes their output in near real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
    self.prepare_to_run()
    popen_args = self.get_popen_args()
    logging.debug("Calling popen with arguments %s" % popen_args)
    self.popen = subprocess.Popen(**popen_args)
    while True:
        outdata = self.popen.stdout.readline()
        if not outdata and self.popen.returncode is not None:
            # Terminate when we've read all the output and the returncode is set
            break
        output_consumer.process_output(outdata)
        self.popen.poll()  # updates returncode so we can exit the loop
    output_consumer.finish(self.popen.returncode)
    self.post_run()

def get_popen_args(self):
    return {
        'args': self.command,
        'shell': False,  # Just being explicit for security's sake
        'bufsize': 0,  # More likely to see what's being printed as it happens.
                       # Not guaranteed, since the process itself might buffer its output;
                       # run `python -u` to unbuffer output of a Python process.
        'cwd': self.get_cwd(),
        'env': self.get_environment(),
        'stdout': subprocess.PIPE,
        'stderr': subprocess.STDOUT,
        'close_fds': True,  # Doesn't seem to matter
    }
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then it calls readline() again and never returns. This method exits properly on the dev machine for most of the subprocesses I call, but consistently fails to exit for one complex bash script that itself calls many subprocesses.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running Python 2.7.3 on Ubuntu 12.04 LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if the script launches a background process that writes to stdout, then the pipe will not close. Are you sure no other child process is still running?
An idea is to change modes once you see that .returncode has been set. Once you know the main process is done, read all of its remaining output from the buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout: set a timeout of several seconds, and you can clear the buffer without getting stuck waiting on a child process.
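A rough sketch of that mode switch (POSIX-only, since select on pipes doesn't work on Windows; self.popen is the Popen object from the question's code):

import os
import select

# Run this after .returncode is known to be set: drain whatever is
# left, but never wait more than a few seconds for further output.
leftover = []
while True:
    ready, _, _ = select.select([self.popen.stdout], [], [], 3.0)
    if not ready:
        break  # nothing arrived within the timeout; assume we're done
    data = os.read(self.popen.stdout.fileno(), 4096)
    if not data:
        break  # real EOF: the pipe closed
    leftover.append(data)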
Without knowing the contents of the "one complex bash script" which causes the problem, there are too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess

f = subprocess.Popen(args='./test.sh',
                     shell=False,
                     bufsize=0,
                     stdin=open(os.devnull, 'rb'),
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     close_fds=True)
while 1:
    s = f.stdout.readline()
    if not s and f.returncode is not None:
        break
    print s.strip()
    f.poll()
print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
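For completeness, the non-blocking fallback might look roughly like this (a POSIX-only sketch; 'process complete' is the marker line from the question, and f is the Popen object from the example above):

import fcntl
import os
import time

fd = f.stdout.fileno()
# Put the pipe into non-blocking mode: os.read() then raises EAGAIN
# instead of hanging when no data is available.
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

buf = ''
while 'process complete' not in buf:
    try:
        chunk = os.read(fd, 4096)
        if not chunk:
            break  # EOF: the pipe actually closed
        buf += chunk
    except OSError:
        time.sleep(0.1)  # EAGAIN: no data available yet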
If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what the script starts, and whether any process survives the main script (check when the main script calls exit_group; if strace is still waiting after that, you have a child still alive).
I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and every time you encounter a \n in stdout_collected, yield the current line (see the sketch after the code below).
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1
begin = datetime.now()
stdout_collected = ""

while self.popen.poll() is None:
    try:
        fds = select.select([handle], [], [], 0.01)[0]
    except select.error, exc:
        print exc
        break

    if len(fds) == 0:
        # select timed out, no new data
        delta = (datetime.now() - begin).total_seconds()
        if delta > timeout:
            return stdout_collected
        # try longer
        continue
    else:
        # have data, timeout counter resets again
        begin = datetime.now()

    for fd in fds:
        if fd == handle:
            data = os.read(handle.fileno(), 1024)  # os.read wants a raw fd, not the file object
            # can handle the bytes as they come in here
            # self._handle_stdout(data)
            stdout_collected += data

# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()
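The generator wrapper mentioned above might look roughly like this (an untested sketch; it re-assembles the raw reads into complete lines):

def lines_from(chunks):
    # chunks: an iterable of raw os.read() results
    pending = ''
    for data in chunks:
        pending += data
        while '\n' in pending:
            line, pending = pending.split('\n', 1)
            yield line
    if pending:
        yield pending  # final line arrived without a trailing newline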
Why are you setting stderr to STDOUT?
The real benefit of making a communicate() call on a subprocess is that you are able to retrieve a tuple containing the stdout output as well as the stderr message.
Those might be useful if the logic depends on their success or failure.
Also, it would save you from the pain of having to iterate through lines. communicate() gives you everything, and there would be no unresolved questions about whether or not the full message was received.
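For example, a minimal sketch of what I mean (command stands for the question's command list):

import subprocess

proc = subprocess.Popen(command,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)
out, err = proc.communicate()  # blocks until exit, returns (stdout, stderr)
if proc.returncode != 0:
    print "stderr said:", err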
I wrote a demo with a bash subprocess that can be easily explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT

p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
    outdata = p.stdout.readline()
    if not outdata:
        break
    # output_consumer.process_output(outdata)
    print "* " + repr(outdata)
    out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: closing the pipe distinctly before terminating the process. This is why wait() should be used instead of poll():
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0
I'm trying to start a program (HandBrakeCLI) as a subprocess or thread from within Python 2.7. I have gotten as far as starting it, but I can't figure out how to monitor its stderr and stdout.
The program outputs its status (% done) and info about the encode to stderr and stdout, respectively. I'd like to be able to periodically retrieve the % done from the appropriate stream.
I've tried calling subprocess.Popen with stderr and stdout set to PIPE and using Popen.communicate, but it sits and waits until the process is killed or complete, and only then retrieves the output. That doesn't do me much good.
I've got it up and running as a thread, but as far as I can tell I still have to eventually call subprocess.Popen to execute the program, and I run into the same wall.
Am I going about this the right way? What other options do I have, or how do I get this to work as described?
I have accomplished the same with ffmpeg. This is a stripped-down version of the relevant portions. bufsize=1 means line buffering and may not be needed.
import subprocess

def Run(command):
    proc = subprocess.Popen(command, bufsize=1,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return proc

def Trace(proc):
    while proc.poll() is None:
        line = proc.stdout.readline()
        if line:
            # Process output here
            print 'Read line', line

proc = Run([handbrakePath] + allOptions)
Trace(proc)
Edit 1: I noticed that the subprocess (handbrake in this case) needs to flush after lines to use this (ffmpeg does).
Edit 2: Some quick tests reveal that bufsize=1 may not be actually needed.