I have a shared library implemented in C which provides a function F. Every time I call F, it logs its output to the file error.log.
Now I'm trying to capture the output produced by F from a python script (I'm using python 2.7.5 which I can't change for reasons out of my control).
I would like to stream the data written to error.log to a different file or to stdout. I can't just open the file and parse it, because it contains other logged output, including that of previous and later runs of F, and I can't recognize the specific execution I care about from the logging alone. That's why I'm trying to capture the output instead.
I tried opening error.log from Python and then changing its file descriptor so it points to stdout, but that doesn't seem to work (I tried the same with stdout and stderr and it did work).
What I'm doing is roughly this:
with open('error.log') as logfile:
    with redirect_output(logfile, sys.stdout):
        function_implemented_in_C()
where redirect_output is a context manager I implemented to do the redirection:
import os
from contextlib import contextmanager

@contextmanager
def redirect_output(orig, dest):
    orig_fd = orig.fileno()
    with os.fdopen(os.dup(orig_fd)) as old_orig_fd:
        os.dup2(dest.fileno(), orig_fd)
        try:
            yield orig
        finally:
            # clean up and restore the original fd
            os.dup2(old_orig_fd.fileno(), orig_fd)
I can't get this to work. Any ideas what I'm doing wrong?
UPDATE:
I reduced the problem to a simple script and it still doesn't seem to work. I guess it has something to do with the data being generated by a function in a shared lib (?), because if I do the same thing but redirect the writes of a file opened from Python, it works. This example works fine:
import sys
import os

def foo():
    f = open('dummy.txt', 'wb', buffering=0)
    os.dup2(sys.stdout.fileno(), f.fileno())
    f.write('some test data\n')
    f.close()

if __name__ == '__main__':
    foo()
But this doesn't:
import sys
import os

def foo():
    f = open('error.log', 'wb', buffering=0)
    os.dup2(sys.stdout.fileno(), f.fileno())
    function_implemented_in_C()
    f.close()

if __name__ == '__main__':
    foo()
I'm answering my own question in case somebody stumbles into it with the same problem.
The issue here is that the C function producing the output in error.log launches a different process to perform the task. That makes it impossible to simply redirect writes to the file to stdout, since file descriptors are process specific.
So, if your function happens to produce the output from the same process, then the following should work:
import sys
import os

def foo():
    f = open('dummy.txt', 'wb', buffering=0)
    os.dup2(sys.stdout.fileno(), f.fileno())
    f.write('some test data\n')
    f.close()

if __name__ == '__main__':
    foo()
If that's not the case, then you can approach the problem by setting up a loop that reads from the file, so it picks up the newly written data. Something like this (I haven't tried this code, so it may not run):
from multiprocessing import Process, Pipe

def foo(filename, pipe_conn):
    with open(filename) as f:
        while True:
            line = f.readline()
            do_something(line)
            if pipe_conn.poll(0.01):
                break

def bar(pipe_conn):
    function_implemented_in_C()
    pipe_conn.send(['done'])
    pipe_conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=bar, args=(child_conn,))
    p.start()
    foo('error.log', parent_conn)
    p.join()
Related
I'd like to launch a mp.Process which can write to a text file. But I'm finding that at the end of the script, the data written to the file isn't actually saved to disk. I don't know what's happening. Here's a minimum working example:
import os, time, multiprocessing

myfile = open("test.dat", "w")

def main():
    proc = multiprocessing.Process(target=writer)
    proc.start()
    time.sleep(1)
    print "Times up! Closing file..."
    myfile.flush()
    os.fsync(myfile.fileno())
    print "Closing %s" % (myfile)
    myfile.close()
    print "File closed. Have a nice day!"
    print "> cat test.dat"

def writer():
    data = "0000"
    for _ in xrange(5):
        print "Writing %s to %s" % (data, myfile)
        myfile.write(str(data) + '\n')
        # if you comment me, writing to disk works!
        # myfile.flush()
        # os.fsync(myfile.fileno())

if __name__ == "__main__":
    main()
Does anyone have suggestions? The context is that this Process will eventually be listening for incoming data, so it really needs to run independently of the other things happening in the script.
The problem is that you're opening the file in the main process. Open files are not passed to the subprocesses, so you need to open it inside your function.
Also, any code outside the function is executed once for each process, so you're overwriting the file multiple times.
def main():
    # create the file empty so it can be appended to
    open("test.dat", "w").close()
    proc = multiprocessing.Process(target=writer)
    proc.start()

def writer():
    with open('test.dat', 'a') as myfile:  # open the file for appending
        ...
        myfile.write(...)
        ...
Now, some OSes don't allow a file to be opened by multiple processes at the same time. The best solution is to use a Queue and pass the data to the main process which then writes to the file.
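A minimal sketch of that Queue-based approach, reusing the test.dat name from the question (the producer function and the None sentinel are just illustrative, not from the original code):

import multiprocessing

def producer(queue):
    # the worker never touches the file; it only puts data on the queue
    for i in xrange(5):
        queue.put("%04d" % i)
    queue.put(None)  # sentinel value telling the main process we're done

def main():
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=producer, args=(queue,))
    proc.start()
    with open("test.dat", "w") as myfile:
        while True:
            item = queue.get()
            if item is None:
                break
            myfile.write(item + "\n")
    proc.join()

if __name__ == "__main__":
    main()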
I have an issue with my Raspberry Pi, which starts up a Python script on boot. How do I save the printed output to a file while it is running? I found the command below on the internet, but it doesn't seem to write the printed text; it creates the file but the content stays empty.
sudo python /home/pi/python.py > /home/pi/output.log
It does write its output to the file, but you cannot see it until the Python script has finished executing, because the output buffer is never flushed.
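One option, sketched here on the assumption that the Pi is running Python 2, is to keep the shell redirection and make stdout unbuffered inside the script (starting the interpreter with python -u has a similar effect):

import os
import sys

# reopen stdout without buffering so every print reaches the redirected
# file immediately (Python 2 only; Python 3 does not allow unbuffered text streams)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

print("this line appears in output.log right away")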
Alternatively, if you redirect the output to a file within your Python script, you can periodically call flush in your code to push the output through to the file as and when you wish, something like this:
import sys
import time

outputFile = "output.txt"

with open(outputFile, "w+") as sys.stdout:
    while True:
        print("some output")
        sys.stdout.flush()  # force buffered content out to the file
        time.sleep(5)       # wait 5 seconds
If you want to switch the output back to the terminal afterwards, save a reference to the original stdout, like this:
import sys
import time

outputFile = "output.txt"
original_stdout = sys.stdout

with open(outputFile, "w+") as sys.stdout:
    print("some output in file")
    sys.stdout.flush()
    time.sleep(5)

sys.stdout = original_stdout
print("back in terminal")
I'm using the python module subprocess to call a program and redirect the possible std error to a specific file with the following command:
with open("std.err", "w") as err:
    subprocess.call(["exec"], stderr=err)
I want the std.err file to be created only if there are errors, but with the command above an empty file is created even when there are no errors.
How can I make Python create the file only if it's not empty?
I could check after execution whether the file is empty and remove it in that case, but I was looking for a "cleaner" way.
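For reference, that check-and-remove fallback would be roughly the following sketch (using the same exec placeholder as above):

import os
import subprocess

with open("std.err", "w") as err:
    subprocess.call(["exec"], stderr=err)

# afterwards, drop the file if nothing was written to it
if os.path.getsize("std.err") == 0:
    os.remove("std.err")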
You could use Popen, checking stderr:
from subprocess import Popen, PIPE

proc = Popen(["EXEC"], stderr=PIPE, stdout=PIPE, universal_newlines=True)
out, err = proc.communicate()

if err:
    with open("std.err", "w") as f:
        f.write(err)
On a side note, if you care about the return code you should use check_call; you could combine it with a NamedTemporaryFile:
import subprocess
from tempfile import NamedTemporaryFile
from os import stat, remove
from shutil import move

try:
    with NamedTemporaryFile(dir=".", delete=False) as err:
        subprocess.check_call(["exec"], stderr=err)
except (subprocess.CalledProcessError, OSError) as e:
    print(e)

if stat(err.name).st_size != 0:
    move(err.name, "std.err")
else:
    remove(err.name)
You can create your own context manager to handle the cleanup for you; you can't really do what you're describing here, because it boils down to asking how you can see into the future. Something like this (with better error handling, etc.):
import os
from contextlib import contextmanager

@contextmanager
def maybeFile(fileName):
    # open the file
    f = open(fileName, "w")
    # yield the file to be used by the block of code inside the with statement
    yield f
    # the block is over, do our cleanup
    f.flush()
    # if nothing was written, remember that we need to delete the file
    needsCleanup = f.tell() == 0
    f.close()
    if needsCleanup:
        os.remove(fileName)
...and then something like:
with maybeFile("myFileName.txt") as f:
    import random
    if random.random() < 0.5:
        f.write("There should be a file left behind!\n")
will either leave behind a file with a single line of text in it, or will leave nothing behind.
I am trying to run a diff on two named temporary files. I did not use difflib because its output was different from the Linux diff.
When I run this code, it does not output anything. I tried a diff on regular files and that works just fine.
# using python 2.6
import tempfile
import subprocess

temp_stage = tempfile.NamedTemporaryFile(delete=False)
temp_prod = tempfile.NamedTemporaryFile(delete=False)
temp_stage.write(stage_notes)
temp_prod.write(prod_notes)

# this does not work, shows no output; tried both call and Popen
subprocess.Popen(["diff", temp_stage.name, temp_prod.name])
# subprocess.call(["diff", temp_stage.name, temp_prod.name])
You need to force the files to be written out to disk by calling flush(); otherwise the data you wrote may only exist in a buffer.
In fact, if you do this, you can even use delete = True, assuming there's no other reason to keep the files around. This keeps the benefit of using tempfile.
#!/usr/bin/python2
import tempfile
import subprocess

temp_stage = tempfile.NamedTemporaryFile(delete=True)
temp_prod = tempfile.NamedTemporaryFile(delete=True)
temp_stage.write(stage_notes)
temp_prod.write(prod_notes)
temp_stage.flush()
temp_prod.flush()

subprocess.Popen(["diff", temp_stage.name, temp_prod.name])
Unrelated to your .flush() issue, you could pass one file via stdin instead of writing data to disk:
from tempfile import NamedTemporaryFile
from subprocess import Popen, PIPE

with NamedTemporaryFile() as file:
    file.write(prod_notes)
    file.flush()
    p = Popen(['diff', '-', file.name], stdin=PIPE)
    p.communicate(stage_notes)  # diff reads the first file from stdin

if p.returncode == 0:
    print('the same')
elif p.returncode == 1:
    print('different')
else:
    print('error %s' % p.returncode)
diff reads from stdin if the input filename is -.
If you use a named pipe then you don't need to write data to disk at all:
from subprocess import Popen, PIPE
from threading import Thread

with named_pipe() as path:
    p = Popen(['diff', '-', path], stdin=PIPE)
    # use a thread, to support content larger than the pipe buffer
    Thread(target=p.communicate, args=[stage_notes]).start()
    with open(path, 'wb') as pipe:
        pipe.write(prod_notes)

if p.wait() == 0:
    print('the same')
elif p.returncode == 1:
    print('different')
else:
    print('error %s' % p.returncode)
where the named_pipe() context manager is defined as:
import os
import tempfile
from contextlib import contextmanager
from shutil import rmtree

@contextmanager
def named_pipe(name='named_pipe'):
    dirname = tempfile.mkdtemp()
    try:
        path = os.path.join(dirname, name)
        os.mkfifo(path)
        yield path
    finally:
        rmtree(dirname)
The content of a named pipe doesn't touch the disk.
I would suggest bypassing the tempfile handling, since with a NamedTemporaryFile you're going to have to handle cleanup anyway. Create a new file, write your data, then close it. Flush the buffer and then call the subprocess commands. See if that gets it to run.
f = open('file1.blah', 'w')
f2 = open('file2.blah', 'w')

f.write(stage_notes)
f.flush()
f.close()

f2.write(prod_notes)
f2.flush()
f2.close()
Then run your subprocess calls.
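Using the placeholder filenames from the snippet above, that call might simply be:

import subprocess

subprocess.call(["diff", "file1.blah", "file2.blah"])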
I have a number of large files with many thousands of lines in python dict format. I'm converting them with json.dumps to json strings.
import json
import ast

mydict = open('input', 'r')
output = open('output.json', "a")

for line in mydict:
    line = ast.literal_eval(line)
    line = json.dumps(line)
    output.write(line)
    output.write("\n")
This works flawlessly; however, it does so in a single-threaded fashion. Is there an easy way to utilize the remaining cores in my system to speed things up?
Edit:
Based on the suggestions, here is what I've started with using the multiprocessing library:
import os
import json
import ast
from multiprocessing import Process, Pool

mydict = open('twosec.in', 'r')

def info(title):
    print title
    print 'module name:', __name__
    print 'parent process: ', os.getppid()
    print 'process id:', os.getpid()

def converter(name):
    info('converter function')
    output = open('twosec.out', "a")
    for line in mydict:
        line = ast.literal_eval(line)
        line = json.dumps(line)
        output.write(line)
        output.write("\n")

if __name__ == '__main__':
    info('main line')
    p = Process(target=converter, args=(mydict,))
    p.start()
    p.join()
I don't quite understand where Pool comes into play, can you explain more?
I don't know of an easy way for you to get a speedup from multithreading, but if any sort of speedup is really what you want then I would recommend trying the ujson package instead of json. It has produced very significant speedups for me, basically for free. Use it the same way you would use the regular json package.
http://pypi.python.org/pypi/ujson/
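A quick sketch of the drop-in usage, assuming the ujson package is installed:

import ujson

line = ujson.dumps({'key': [1, 2, 3]})  # same call shape as json.dumps
data = ujson.loads(line)                # and as json.loads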
Wrap the code above in a function that takes as its single argument a filename and that writes the json to an output file.
Then create a Pool object from the multiprocessing module, and use Pool.map() to apply your function in parallel to the list of all files. This will automagically use all cores on your CPU, and because it uses multiple processes instead of threads, you won't run into the global interpreter lock.
Edit: change the main portion of your program like so:
if __name__ == '__main__':
    files = ['first.in', 'second.in', 'third.in']  # et cetera
    info('main line')
    p = Pool()
    p.map(converter, files)
    p.close()
Of course you should also change converter() to derive the output name from the input name!
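A sketch of what that wrapped function might look like, keeping the converter name from the question and assuming a simple '.json' suffix for the output name:

import json
import ast

def converter(name):
    # derive the output filename from the input filename (assumed convention)
    outname = name + '.json'
    with open(name, 'r') as infile, open(outname, 'w') as output:
        for line in infile:
            line = ast.literal_eval(line)
            output.write(json.dumps(line))
            output.write("\n")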
Below is a complete example of a program that converts DICOM files into PNG format, using ImageMagick's convert program.
"Convert DICOM files to PNG format, remove blank areas."

import os
import sys  # for argv
import subprocess
from multiprocessing import Pool, Lock

def checkfor(args):
    try:
        subprocess.check_output(args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError:
        print "Required program '{}' not found! exiting.".format(args)
        sys.exit(1)

def processfile(fname):
    size = '1574x2048'
    args = ['convert', fname, '-units', 'PixelsPerInch', '-density', '300',
            '-crop', size+'+232+0', '-page', size+'+0+0', fname+'.png']
    rv = subprocess.call(args)
    globallock.acquire()
    if rv != 0:
        print "Error '{}' when processing file '{}'.".format(rv, fname)
    else:
        print "File '{}' processed.".format(fname)
    globallock.release()

## This is the main program ##
if __name__ == '__main__':
    if len(sys.argv) == 1:
        path, binary = os.path.split(sys.argv[0])
        print "Usage: {} [file ...]".format(binary)
        sys.exit(0)
    checkfor('convert')
    globallock = Lock()  # the workers inherit this lock when the Pool forks
    p = Pool()
    p.map(processfile, sys.argv[1:])
    p.close()