Python subprocess: read stdout as it executes

One of the functions in my program checks the md5sum of a hash file:
def check():
    print "checking integrity status.."
    md5 = subprocess.Popen(["md5sum", "-c", hashfile], shell=False, stdout=subprocess.PIPE)
    #fopen = open(basefile, "r")
    for f in md5.stdout.readlines():
        fc = f.rstrip("\n")
        sys.stdout.write("\rChecking..." + fc)
        sys.stdout.flush()
What happens now is that the whole command executes first, and only then does the for loop read from md5.stdout.readlines(), so it is not dynamic, i.e. I don't get the output as the command executes. Is there a way I can get the output while the command is still running?
Fixed using glglgl's answer:
def check():
    print "checking integrity status.."
    md5 = subprocess.Popen(["md5sum", "-c", hashfile], shell=False, stdout=subprocess.PIPE)
    #fopen = open(basefile, "r")
    fc = "NULL"
    i = 0
    for f in iter(md5.stdout.readline, ''):
        k = fc
        fc = f.rstrip("\n")
        if "FAILED" in fc:
            print fc
            i = i + 1
        sys.stdout.write("\rChecking... " + str(i) + " " + fc + (' ' * (len(k) - len(fc))))
        sys.stdout.flush()

Of course. There are several ways.
First, you could work with md5.stdout.read(), but then you would have to do the line separation yourself.
Second, you could use the file object md5.stdout as an iterator. But there seems to be an issue with buffering, i.e. you don't get the results immediately.
Third, you can call md5.stdout.readline() repeatedly until it returns ''.
The third way is preferable in this case; I would suggest doing it like this:
...
for f in iter(md5.stdout.readline, ''):
    fc = f.rstrip("\n")
    sys.stdout.write("\rChecked " + fc)
    sys.stdout.flush()
I have also changed the output text, since there is only output once a check has already been done.
If that is not what you want, and you really need every piece of output captured separately, you should switch to the first approach. But that makes it more complicated. I will think about a solution if there is an indication that one is wanted.
There, one must consider the following points:
read() blocks, so one would have to read byte by byte (quite ugly).
There is the question of what should be output, and when there should be intermediate output.
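For reference, in Python 3 the buffering issue with iterating over the pipe is much less of a problem, and the same streaming read can be sketched like this (my own sketch, using a small child script as a stand-in for md5sum so it runs anywhere):

```python
import subprocess
import sys

# Spawn a child that prints two lines; iterate its stdout as the lines arrive.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('OK'); print('FAILED')"],
    stdout=subprocess.PIPE, text=True, bufsize=1)

lines = []
for line in proc.stdout:   # in Python 3, this yields each line as it becomes available
    lines.append(line.rstrip("\n"))
proc.wait()
```

With text=True and bufsize=1 the pipe is line-buffered on our side; whether lines actually arrive early still depends on the child flushing its own output.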

The original poster is correct that hashlib is not available in Python 2.4; however, the md5 module is. Example workaround:
try:
    # Python 2.5 and up.
    import hashlib
    md5Hash = hashlib.md5
except ImportError:
    # Python 2.4 and below.
    import md5
    md5Hash = md5.new

somedata = 'foobar'
hashstring = md5Hash(somedata).hexdigest()

What is the file size?
Popen creates a new child process to run the command. Maybe it finishes before you run the for loop.
You can check whether the subprocess has finished by looking at the returncode attribute (in your code: md5.returncode).
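A minimal illustration of that check, using a trivial child process as a stand-in:

```python
import subprocess
import sys

# Start a child process; returncode stays None until the child has exited.
p = subprocess.Popen([sys.executable, "-c", "pass"])
p.wait()
# After wait(), returncode holds the child's exit status (0 on success).
```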

Related

I'm not sure how to take the filename out of a sha256 checksum in Python 3

I'm still a Python noob, but I figured that instead of checking a checksum manually, I would write a quick program so it would take less time whenever I had to do it (also as practice). So I wrote this (excuse the extra useless lines and bad naming in my code; I was trying to pinpoint what I was doing wrong):
import subprocess

FileLocation = input("Enter File Location: ")
Garbage1 = str(input("Enter First Checksum: "))
Garbage2 = str(subprocess.call(['sha256sum', FileLocation]))
Garbage3 = Garbage2.split(None, 1)
if Garbage1 == Garbage3[0]:
    print("all good")
else:
    print("Still Not Working!!!")
When I run this code, it keeps leaving the file path at the end of the second checksum because of the Linux command. I tried getting rid of it in various ways with .split(), but when I ran the code it was still there. I also tried adding the file path to the end of the first checksum as a test, but that wouldn't add the file path to the end of it either.
I do know for a fact that the checksums match.
Any idea what's wrong? Any help would be appreciated.
From the docs, subprocess.call does: "Run command with arguments. Wait for command to complete or timeout, then return the returncode attribute." You can verify this in the Python shell by entering help(subprocess.call), or by going to https://docs.python.org and searching for the subprocess module.
Your code converts the integer return code to a string, not the checksum. There are other calls in subprocess that capture the process's stdout, which is where sha256sum sends its checksum. The captured stdout is a bytes object that needs to be decoded to a string:
import subprocess

FileLocation = input("Enter File Location: ")
Garbage1 = str(input("Enter First Checksum: "))
Garbage2 = subprocess.check_output(['sha256sum', FileLocation]).decode()
Garbage3 = Garbage2.split(None, 1)
if Garbage1 == Garbage3[0]:
    print("all good")
else:
    print("Still Not Working!!!")
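A small sketch of the difference the answer describes, with a stand-in command instead of sha256sum so it runs anywhere:

```python
import subprocess
import sys

# Stand-in for sha256sum: a child that prints "<checksum>  <filename>".
fake = [sys.executable, "-c", "print('deadbeef  file.iso')"]

rc = subprocess.call(fake)           # returns the exit status, an int
out = subprocess.check_output(fake)  # returns the command's stdout as bytes
checksum = out.decode().split(None, 1)[0]  # first whitespace-separated field
```

str(rc) is just '0' on success; only check_output actually captures the checksum text.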

How to best capture the final process progress output containing carriage returns

I use Python to connect multiple processing tools for NLP tasks together, but I also capture the output of each in case something fails, and write it to a log.
Some tools need many hours and output their current status as a progress percentage with carriage returns (\r). They perform many steps, so they mix normal messages and progress messages. That sometimes results in really large log files that are hard to view with less.
My log will look like this (for fast progresses):
[DEBUG ] [FILE] [OUT] ^M4% done^M8% done^M12% done^M15% done^M19% done^M23% done^M27% done^M31% done^M35% done^M38% done^M42% done^M46% done^M50% done^M54% done^M58% done^M62% done^M65% done^M69% done^M73% done^M77% done^M81% done^M85% done^M88% done^M92% done^M96% done^M100% doneFinished
What I want is an easy way to collapse those strings in Python. (I guess it is also possible to do this after the pipeline is finished and replace the progress messages with e.g. sed ...)
My code for running and capturing the output looks like this:
import os
import subprocess
from tempfile import NamedTemporaryFile

def run_command_of(command):
    try:
        out_file = NamedTemporaryFile(mode='w+b', delete=False, suffix='out')
        err_file = NamedTemporaryFile(mode='w+b', delete=False, suffix='err')
        debug('Redirecting command output to temp files ...', \
              'out =', out_file.name, ', err =', err_file.name)
        p = subprocess.Popen(command, shell=True, \
                             stdout=out_file, stderr=err_file)
        p.communicate()
        status = p.returncode

        def fr_gen(file):
            debug('Reading from %s ...' % file.name)
            file.seek(0)
            for line in file:
                # TODO: UnicodeDecodeError?
                # reload(sys)
                # sys.setdefaultencoding('utf-8')
                # unicode(line, 'utf-8')
                # no decoding ...
                yield line.decode('utf-8', errors='replace').rstrip()
            debug('Closing temp file %s' % file.name)
            file.close()
            os.unlink(file.name)

        return (fr_gen(out_file), fr_gen(err_file), status)
    except:
        from sys import exc_info
        error('Error while running command', command, exc_info()[0], exc_info()[1])
        return (None, None, 1)
except:
from sys import exc_info
error('Error while running command', command, exc_info()[0], exc_info()[1])
return (None, None, 1)
def execute(command, check_retcode_null=False):
    debug('run command:', command)
    out, err, status = run_command_of(command)
    debug('-> exit status:', status)
    if out is not None:
        is_empty = True
        for line in out:
            is_empty = False
            debug('[FILE]', '[OUT]', line.encode('utf-8', errors='replace'))
        if is_empty:
            debug('execute: no output')
    else:
        debug('execute: no output?')
    if err is not None:
        is_empty = True
        for line in err:
            is_empty = False
            debug('[FILE]', '[ERR]', line.encode('utf-8', errors='replace'))
        if is_empty:
            debug('execute: no error-output')
    else:
        debug('execute: no error-output?')
    if check_retcode_null:
        return status == 0
    return True
It is some older Python 2 code (fun times with unicode strings) that I want to rewrite in Python 3 and improve. (I'm also open to suggestions on how to process the output in real time rather than when everything is finished. Update: that is too broad and not exactly part of my problem.)
I can think of many approaches but don't know if there is a ready-to-use function/library/etc.; I could not find any. (My google-fu needs work.) The only things I found were ways to remove the CR/LF, but not the string portion that gets visually replaced. So I'm open to suggestions and improvements before I invest my time in reinventing the wheel. ;-)
My approach would be to use a regex to find the sections of a string/line between \r characters and remove them. Optionally I would keep a single percentage value for really long processes. Something like \r([^\r]*\r).
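That regex idea can be sketched as follows (my own variant, not the poster's tested code): drop every run of characters that ends in a \r, keeping only the text that would remain visible. Note this ignores the leftover-tail case where a later, shorter line only partially overwrites an earlier one.

```python
import re

def collapse_cr(line):
    # Remove every segment that is immediately overwritten by a carriage return.
    return re.sub(r'[^\r\n]*\r', '', line)

log = '\r4% done\r8% done\r100% doneFinished'
cleaned = collapse_cr(log)  # only the final visible text remains
```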
Note: a possible duplicate: How to pull the output of the most recent terminal command? It may require a wrapper script, but it can still be used to convert my old log files with script2log. I found/got a suggestion for a plain Python way that fulfills my needs.
I think the solution for my use case is as simple as this snippet:
# my data
segments = ['abcdef', '567', '1234', 'xy', '\n']
s = '\r'.join(segments)

# really fast approach:
last = s.rstrip().split('\r')[-1]

# or: simulate the overwrites
parts = s.rstrip().split('\r')
last = parts[-1]
last_len = len(last)
for part in reversed(parts):
    if len(part) > last_len:
        last = last + part[last_len:]
        last_len = len(last)

# result
print(last)
Thanks to the comments on my question, I could refine my requirements further. In my case the only control characters are carriage returns (CR, \r), and a rather simple solution works, as tripleee suggested.
Why not simply take the last part after \r? Because the output of
echo -e "abcd\r12"
can result in:
12cd
The questions under the subprocess tag (also suggested in a comment by tripleee) should help with realtime/interleaved output, but they are outside my current focus. I will have to test for the best approach. I was already using stdbuf to switch the buffering when needed.

Python's subprocess.check_output()

I'm working with Python's subprocess.check_output() and I'm using it to run a Python file that takes certain attributes (like fileName, title, etc.). Everything works fine; however, when I pass in a string variable instead of an actual string literal, it doesn't work and I'm not sure why. Does anyone see something that I don't?
import textFile
import upload
import subprocess

def upload(fileName):
    arr = []
    bunny = "big_buck_bunny.flv" #this is the variable
    arr = textFile.readLine(fileName)
    size = textFile.getLines(fileName)
    i = 0
    while(i < size):
        f = open("upload.py-oauth2.json", 'w').close()
        textFile.append("C:\\Users\\user1\\Desktop\\tester\\upload.py-oauth2.json", arr[i])
        #This below is where I would like to pass in a variable
        subprocess.check_output('python upload.py --file="C:\\Users\\...\\anniebot\\' + bunny)
        i += 1

upload("C:\\Users\\user1\\Desktop\\tester\\accountList.txt")
So basically I would like to change the path constantly. The problem is, I can't figure out a way to get subprocess to work without passing in a fixed string.
I would like to do something like:
subprocess.check_output('python upload.py --file="C:\\Users\\user1\\Videos\\anniebot\\" + bunny --title="title" --description="testing" --keywords="test" --category="22" --privacyStatus="public"')
Do you mean:
subprocess.check_output('python upload.py --file="C:\\Users\\...\\anniebot\\' + bunny + '" --title= ...')
So basically, concatenate the string using a single quote instead of the double quote you are using.
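An alternative that sidesteps the quoting entirely is to pass the arguments as a list, so no shell parsing is involved. A sketch (the paths and flags come from the question; the final call is commented out since upload.py isn't available here):

```python
import subprocess  # used by the commented-out call below

bunny = "big_buck_bunny.flv"
cmd = [
    "python", "upload.py",
    "--file=C:\\Users\\user1\\Videos\\anniebot\\" + bunny,
    "--title=title",
    "--privacyStatus=public",
]
# subprocess.check_output(cmd)  # each list element is passed verbatim as one argument
```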

Python bug, or do I not get how encoding works? len, find, and re.search do nothing with a non-empty, successful subprocess.communicate() result

For a reason I cannot understand, I cannot do anything with the output of Popen.communicate() except print it to the terminal.
If I save the output to a variable, the variable appears to contain the text, because I can print it to the terminal too, but len returns 0, re.search matches nothing, and find always returns -1.
The offending function:
#!/usr/bin/env python
# coding: utf-8
import os
import sys
import getopt
import subprocess
import os.path
import re

def get_video_duration(ifile):
    p = subprocess.Popen(["ffprobe", ifile], stdout=subprocess.PIPE)
    info_str = p.communicate()[0].decode(sys.stdout.encoding)
    print(info_str) # for debug, the information about the file prints ok
    duration_start = info_str.find("Duration")
    # duration_start = "AJDKLAJSL Duration:".find("Duration"), this test was ok
    duration_end = info_str.find("start", duration_start)
    number_start = info_str.find(" ", duration_start, duration_end) + 1
    number_end = info_str.find(",", number_start, duration_end)
    temp = info_str[number_start:number_end].split(":")
    return int(temp[0]) * 60 * 60 + int(temp[1]) * 60 + int(temp[2])
I attempted different variations: not using .decode(), replacing the find calls with a single re.search, implementing my own find by iterating over each character (the problem is that I need len for this, and len always returns 0 for that particular string/byte array).
Either it is a bug of some sort, or I am doing something wrong with the string encoding. I cannot figure out what exactly the problem is. Any help appreciated.
Ubuntu 12.10, 64-bit
Python 2.7
You're not doing anything wrong with encoding. Your problem is that ffprobe sends its output (including the duration info you're looking for) to stderr, not stdout. Do this and you should be fine:
def get_video_duration(ifile):
    p = subprocess.Popen(["ffprobe", ifile], stderr=subprocess.PIPE)
    info_str = p.communicate()[1].decode(sys.stderr.encoding)
The reason your print() call seems to be working is that it's printing nothing (because info_str truly is empty)... but the stderr output is being dumped to the console, which gives you the illusion that what you're seeing is the result of your print() call.
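The effect is easy to reproduce with a stand-in child that writes only to stderr, the way ffprobe does (a sketch, not the poster's code):

```python
import subprocess
import sys

# The child writes its 'info' to stderr only, like ffprobe.
p = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stderr.write('Duration: 00:01:02\\n')"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
# out is empty (len(out) == 0, find() fails), while the text is all in err.
```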

How to force wait() to completely wait for subprocess.Popen() to complete

I'm trying to transfer and rename some files in a while loop using subprocess.Popen(['cp', ...]) and wait(). Unfortunately, it appears that wait() is not working properly, i.e. not waiting for the file to completely copy to the new directory. Most of the time the files copy over fine; however, a small subset of random files do not (not the same files each time I run the script), and thus end up as zero-byte or incomplete files. I have also tried using subprocess.check_call(), but this does not work either. When I print the poll() value it's always zero, which should mean the process has finished. Note that all the files I'm dealing with are in the range of 150 KB. My Python script runs in PyRAF, utilising Python 2.7 and the Python version of iraf (Image Reduction and Analysis Facility), since I'm using iraf routines. Is there any way to force Popen() or check_call() to wait for the file transfer to complete?
while count <= ncross_correlate and skip_flag != 's':
    ...more stuff
    try:
        iraf.rv.fxcor(object_fits, template_fits, apertures="*", cursor="",
            continuum="both", filter="both", rebin="smallest", pixcorr="no",
            osample=osample_reg, rsample=osample_reg, apodize=0.1, function="gaussian",
            width="INDEF", height=0., peak="no", minwidth=3., maxwidth=21., weights=1.,
            background=0., window=300., wincenter="INDEF", output=output_name, verbose="long",
            imupdate="no", graphics="stdgraph", interactive="no", autowrite="yes",
            autodraw="yes", ccftype="image", observatory="aao", continpars="", filtpars="",
            keywpars="")

        # Create an eps file of the cross-correlation file.
        gki_output_name = output_name + '.gki'
        iraf.plot.sgikern(gki_output_name, device='eps', generic='no', debug='no',
            verbose='no', gkiunit='no')
Unfortunately, the only way to convert the .gki file created in fxcor to some readable format outside of iraf is to call the iraf task sgikern, which dumps an .eps file into my iraf/iraf/ directory without giving the option to change the file name or directory placement. In fact, the filename is randomly generated. Very frustrating! Also note that nothing is wrong with any of the eps files created using iraf.plot.sgikern (i.e. no 0 KB files to begin with). Copying and renaming is where I have issues.
# Find the eps file in /iraf/iraf/, rename it, and move to proper output location.
iraf_dir = '/iraf/iraf/'
eps_file_list = glob.glob(iraf_dir + 'sgi' + '*.eps')
...more code
At this point I have tried using check_call() or Popen():
subprocess.check_call(['cp', eps_file_list[0],
                       ccf_output_dir + object_name_sub + '.eps'],
                      stdout=subprocess.PIPE)
subprocess.check_call(['rm', eps_file_list[0]], stdout=subprocess.PIPE)
or
process1 = subprocess.Popen(['cp', eps_file_list[0],
                             ccf_output_dir + object_name_sub + '.eps'],
                            stdout=subprocess.PIPE)
process1.wait()
process2 = subprocess.Popen(['rm', eps_file_list[0]], stdout=subprocess.PIPE)
process2.wait()
...more stuff
# end of try statement
#end of while statement
I reckon that if I could somehow combine the two Popen statements into a single Popen statement, and also include a shell sleep of maybe 0.01 s to force the other two processes to finish before returning a completed process, that would probably fix it. Maybe something like this, though I'm not sure of the exact syntax:
process1 = subprocess.Popen(['cp', eps_file_list[0], ccf_output_dir +
object_name_sub + '.eps']; ['rm', eps_file_list[0]]; ['sleep', 0.01],
stdout=subprocess.PIPE)
process1.wait()
Hopefully this gives you an idea of what I'm trying to do. I've been trying lots of different things and looking all over for a solution to this problem and I'm truly stuck.
Cheers,
Brett
Perhaps the following would be sufficient:
subprocess.check_call(['mv', eps_file_list[0],
                       ccf_output_dir + object_name_sub + '.eps'],
                      stdout=subprocess.PIPE)
and
process1 = subprocess.Popen(['mv', eps_file_list[0],
                             ccf_output_dir + object_name_sub + '.eps'],
                            stdout=subprocess.PIPE)
process1.wait()
Have you considered using shutil.copyfile for the copy and os.remove for the deletion?
If you really want to use subprocess, I believe the syntax is something like this:
process1 = subprocess.Popen('cp ' + eps_file_list[0] + ' ' + ccf_output_dir +
                            object_name_sub + '.eps; rm ' + eps_file_list[0] +
                            '; sleep 0.01', shell=True, stdout=subprocess.PIPE)
This way, the command you're calling is all in one string: 'cp whatever foo/bar.eps; rm whatever; sleep 0.01'
You can also format the string with triple quotes and have the commands on separate lines:
'''
cp %s %s%s
rm %s
sleep %s
''' % (eps_file_list[0], ccf_output_dir, object_name_sub, eps_file_list[0], 0.01)
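The shutil route mentioned in this answer can be sketched like this (with a throwaway temp file standing in for eps_file_list[0] and the destination path):

```python
import os
import shutil
import tempfile

# Stand-in for the eps file produced by sgikern.
src = tempfile.NamedTemporaryFile(suffix='.eps', delete=False).name
dst = src + '.renamed'

# shutil.move renames/copies in-process and returns only after the copy has
# completed, so there is no race with a child process finishing.
shutil.move(src, dst)
```

Because no subprocess is involved, there is nothing to wait() on and no need for a sleep.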
This isn't a complete solution, nor a satisfying one, but it's the best I have come up with, and it works ~99.9% of the time (out of 4000+ eps files I create, 5 files were either 0 bytes or incomplete). This is an improvement over the original way I was doing this, which was successful about 95% of the time. I have pasted the code below:
try:
    iraf.rv.fxcor(object_fits, template_fits, apertures="*", cursor="",
        continuum="both", filter="both", rebin="smallest", pixcorr="no", osample=osample_reg,
        rsample=osample_reg, apodize=0.1, function="gaussian", width="INDEF", height=0., peak="no",
        minwidth=3., maxwidth=21., weights=1., background=0., window=300.,
        wincenter="INDEF", output=output_name, verbose="long", imupdate="no",
        graphics="stdgraph", interactive="no", autowrite="yes", autodraw="yes",
        ccftype="image", observatory="aao", continpars="", filtpars="", keywpars="")

    # Create an eps file of the cross-correlation file.
    gki_output_name = output_name + '.gki'
    iraf.plot.sgikern(gki_output_name, device='eps', generic='no', debug='no',
        verbose='no', gkiunit='no')
    time.sleep(0.25)
I put a sleep here because I discovered that some of the eps files being created in my iraf directory had not been fully written by the time my code tried to move them to another directory.
    # Find the eps file in /iraf/iraf/, rename it, move it to the proper
    # output location, and delete the old eps file.
    iraf_dir = '/iraf/iraf/'
    eps_file_list = glob.glob(iraf_dir + 'sgi' + '*.eps')
    ...continuation of code
    if len(eps_file_list) == 1:
        eps_file_sub = os.path.basename(eps_file_list[0])
        cmd1 = 'cp {0} {1}'.format(eps_file_list[0], ccf_output_dir + object_name_sub + '.eps')
        cmd2 = 'rm {0}'.format(eps_file_list[0])
        cmd3 = 'sleep 0.05'
        process1 = subprocess.Popen("{}; {}; {}".format(cmd1, cmd2, cmd3), shell=True, stdout=subprocess.PIPE)
        process1.wait()
With process1 I'm sending three shell commands as one subprocess. The first copies the eps file from my /iraf directory to another directory (the iraf function that creates these files in the first place does not let me give them a proper name or output location). The second removes the eps file from my /iraf directory. The third forces the shell to sleep. This way, Python does not receive a completed signal until the sleep command has been reached. This part I believe works perfectly. The only remaining issue is that, very rarely, the iraf routine used to create the eps file hasn't finished creating it by the time I reach this command.
#endif
num_cross = num_cross + 1
#Endtry
...more code
This is a very clunky solution and not satisfying in the least, but it does work 99.9% of the time. If anyone has a better solution, please let me know. This has been a very frustrating problem, and everyone I've asked hasn't been able to come up with anything better (including people who program in Python regularly in my astro department).
