Python: Write to global text file from within a multiprocessing.Process - python

I'd like to launch a mp.Process which can write to a text file. But I'm finding that at the end of the script, the data written to the file isn't actually saved to disk. I don't know what's happening. Here's a minimum working example:
import os, time, multiprocessing

myfile = open("test.dat", "w")

def main():
    proc = multiprocessing.Process(target=writer)
    proc.start()
    time.sleep(1)
    print "Times up! Closing file..."
    myfile.flush()
    os.fsync(myfile.fileno())
    print "Closing %s" % (myfile)
    myfile.close()
    print "File closed. Have a nice day!"
    print "> cat test.dat"

def writer():
    data = "0000"
    for _ in xrange(5):
        print "Writing %s to %s" % (data, myfile)
        myfile.write(str(data) + '\n')
        # if you comment me, writing to disk works!
        # myfile.flush()
        # os.fsync(myfile.fileno())

if __name__ == "__main__":
    main()
Does anyone have suggestions? The context is that this Process will be eventually listening for incoming data, so it really needs to run independently of other things happening in the script.

The problem is that you're opening the file in the main process. Open files are not passed to the subprocesses, so you need to open the file inside your function.
Also, all code outside the function is executed once for each process, so you end up overwriting the file multiple times.
def main():
    # create the file empty so it can be appended to
    open("test.dat", "w").close()
    proc = multiprocessing.Process(target=writer)
    proc.start()

def writer():
    with open('test.dat', 'a') as myfile:  # opens the file for appending
        ...
        myfile.write(...)
        ...
Now, some OSes don't allow a file to be opened by multiple processes at the same time. The best solution is to use a Queue and pass the data to the main process which then writes to the file.
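A minimal sketch of that Queue-based approach (the sentinel value and line contents here are illustrative, not taken from the original code):
import multiprocessing

def writer(queue):
    # Producer: push lines onto the queue instead of touching the file.
    for i in range(5):
        queue.put("line %d" % i)
    queue.put(None)  # sentinel: tells the main process we're done

def main():
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=writer, args=(queue,))
    proc.start()
    with open("test.dat", "w") as myfile:
        while True:
            item = queue.get()
            if item is None:  # writer finished
                break
            myfile.write(item + '\n')
    proc.join()

if __name__ == "__main__":
    main()
Only the main process ever touches test.dat, so there is no contention over the file handle, and the data is flushed when the with block closes.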

Related

How to stream data written to file to stdout

I have a shared library implemented in C which provides a function F. Every time I call F it logs its output in the file error.log.
Now I'm trying to capture the output produced by F from a Python script (I'm using Python 2.7.5, which I can't change for reasons out of my control).
I would like to stream the data written into error.log to a different file or to stdout. I can't just open the file and parse it, because it contains other logged output as well, including the output of previous and later runs of F. I'm only interested in a specific execution, which I can't recognize from the logging alone. That's why I'm trying to capture the output instead.
I tried opening error.log from python and then changing the file descriptor to make it point to stdout, but that doesn't seem to work (I tried the same with stdout and stderr and it did work).
What I'm doing is roughly
with open('error.log') as logfile:
    with redirect_output(logfile, sys.stdout):
        function_implemented_in_C()
where redirect_output is a context manager I implemented to do the redirection:
@contextmanager
def redirect_output(orig, dest):
    orig_fd = orig.fileno()
    # keep a duplicate of the original descriptor so it can be restored
    with os.fdopen(os.dup(orig_fd)) as old_orig_fd:
        os.dup2(dest.fileno(), orig_fd)
        try:
            yield orig
        finally:
            # clean up and restore the original fd
            os.dup2(old_orig_fd.fileno(), orig_fd)
I can't get this to work. Any ideas what I'm doing wrong?
UPDATE:
I reduced the problem to a simple script and it still doesn't seem to work. I guess it has something to do with the data being generated by a function in a shared lib (?), because if I do the same thing but redirect writes to a file opened from Python, it works. This example works fine:
import sys
import os

def foo():
    f = open('dummy.txt', 'wb', buffering=0)
    os.dup2(sys.stdout.fileno(), f.fileno())
    f.write('some test data\n')
    f.close()

if __name__ == '__main__':
    foo()
But this doesn't
import sys
import os

def foo():
    f = open('error.log', 'wb', buffering=0)
    os.dup2(sys.stdout.fileno(), f.fileno())
    function_implemented_in_C()
    f.close()

if __name__ == '__main__':
    foo()
I'm answering my own question in case somebody stumbles into this question with the same problem.
The issue here is that the C function producing the output in error.log launches a separate process to perform the task. That makes it impossible to simply redirect writes to the file to stdout, since file descriptors are process-specific.
So, if your function happens to produce the output in the same process, then the following should work:
import sys
import os

def foo():
    f = open('dummy.txt', 'wb', buffering=0)
    os.dup2(sys.stdout.fileno(), f.fileno())
    f.write('some test data\n')
    f.close()

if __name__ == '__main__':
    foo()
If that's not the case, you can approach the problem by setting up a loop that reads from the file so it picks up the newly written data. Something like this (I haven't tried this code, so it might not run):
from multiprocessing import Process, Pipe

def foo(filename, pipe_conn):
    with open(filename) as f:
        while True:
            line = f.readline()
            do_something(line)
            if pipe_conn.poll(0.01):
                break

def bar(pipe_conn):
    function_implemented_in_C()
    pipe_conn.send(['done'])
    pipe_conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=bar, args=(child_conn,))
    p.start()
    foo('error.log', parent_conn)
    p.join()

Writing sys.stdout to multiple log files using Python?

I am having trouble figuring out what the issue is with my code snippet for writing print messages from my console to multiple log files.
The code snippet posted below is supposed to create a new directory test, then write 11 log files to this directory: 1 global log file and 10 loop log files. However, the first 2 print messages meant for my global log file are missing when I run this, and I cannot figure out what the issue is.
import sys
import os

# Create a test folder to store these global and loop log files.
path = os.getcwd()
test_dir_name = 'test'
test_dir_path = os.path.join(path, test_dir_name)
os.mkdir(test_dir_path)

# Keep a reference to the original stdout.
orig_stdout = sys.stdout

# Define global logfile path.
global_log_name = "global-log.txt"
global_log_path = os.path.join(test_dir_path, global_log_name)

# Problematic code-snippet
sys.stdout = open(global_log_path, 'w')
print("This is a global log file.")  # Why is my code omitting this line?
print("The loop is now creating 10 individual log files.")  # And this one?
sys.stdout.close()

for i in range(10):
    sys.stdout = open(global_log_path, 'w')
    print("Creating loop log file {}...".format(i))
    sys.stdout.close()

    loop_log_name = "local-log-{}.txt".format(i)
    loop_log_path = os.path.join(test_dir_path, loop_log_name)
    sys.stdout = open(loop_log_path, 'w')
    print("This is loop log file {}".format(i))
    print("Closing this loop log file...")
    sys.stdout.close()

sys.stdout = open(global_log_path, 'w')
print("Loops have concluded.")  # But then it includes this line.
print("Now closing global log file.")  # And this line in the global log file.
sys.stdout.close()

sys.stdout = orig_stdout
print("Back to original console.")
Some assistance would be greatly appreciated.
The principal issue with this code snippet is using open(global_log_path, 'w') to append further print messages to global-log.txt: mode 'w' truncates the file every time it is opened. After you have initially executed:
sys.stdout = open(global_log_path, 'w')
print("This is a global log file.") # Why is my code omitting this line?
print("The loop is now creating 10 individual log files.") # And this one?
Subsequent redirections of stdout to global-log.txt instead require passing the argument 'a', standing for append, to open(), like so:
sys.stdout = open(global_log_path, 'a')
print("Creating loop log file {}...".format(i))
This prevents previously redirected text from being overwritten, which was happening with your code snippet.
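For reference, here is a rough sketch of the corrected pattern, keeping the structure of the original snippet and changing only the mode arguments after the first open:
sys.stdout = open(global_log_path, 'w')   # first open creates/truncates the file
print("This is a global log file.")
print("The loop is now creating 10 individual log files.")
sys.stdout.close()

for i in range(10):
    sys.stdout = open(global_log_path, 'a')   # append instead of overwrite
    print("Creating loop log file {}...".format(i))
    sys.stdout.close()

    loop_log_path = os.path.join(test_dir_path, "local-log-{}.txt".format(i))
    sys.stdout = open(loop_log_path, 'w')     # each loop log is its own file
    print("This is loop log file {}".format(i))
    sys.stdout.close()

sys.stdout = open(global_log_path, 'a')       # append the closing messages too
print("Loops have concluded.")
print("Now closing global log file.")
sys.stdout.close()
sys.stdout = orig_stdout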

Autorun python script save output to txt file raspberry pi

I have an issue with my Raspberry Pi, which starts up a Python script on boot. How do I save the printed output to a file when it is running on boot? I found the command below on the internet, but it doesn't seem to write the printed text; it creates the file but the content is empty.
sudo python /home/pi/python.py > /home/pi/output.log
It does write its output to the file, but you cannot see it until the Python script has finished executing, because the buffer is never flushed.
If you redirect the output to a file within your Python script, you can periodically call flush in your code to push the output through to the file as and when you wish, something like this:
import sys
import time

outputFile = "output.txt"

with open(outputFile, "w+") as sys.stdout:
    while True:
        print("some output")
        sys.stdout.flush()  # force buffer content out to file
        time.sleep(5)       # wait 5 seconds
If you want to set the output back to the terminal afterwards, you may want to save a reference to the original stdout, like this:
import sys
import time

outputFile = "output.txt"
original_stdout = sys.stdout

with open(outputFile, "w+") as sys.stdout:
    print("some output in file")
    sys.stdout.flush()
    time.sleep(5)

sys.stdout = original_stdout
print("back in terminal")

Is it really that readline does not block on an empty file as it does on an empty file-like object?

Here is the code that I used to experiment with Python readline().
import threading, os, time

def worker():
    file.seek(0)
    print("First attempt on file: " + file.readline().strip())
    print("First attempt on pipe: " + Iget.readline().strip())
    print("Second attempt on pipe: " + Iget.readline().strip())
    file.seek(0)
    print("Second attempt on file: " + file.readline().strip())
    print("Third attempt on file: " + file.readline().strip())

fdIget, fdIset = os.pipe()
Iget = os.fdopen(fdIget)
Iset = os.fdopen(fdIset, 'w')
file = open("Test.txt", "w+")

t = threading.Thread(target=worker)
t.start()

time.sleep(2)
Iset.write("Parent pipe\n")
Iset.flush()
file.write("Parent file.\n")
file.flush()

time.sleep(2)
Iset.write("Again Parent pipe\n")
Iset.flush()
file.write("Again Parent file.\n")
file.flush()

t.join()
The output is
First attempt on file:
First attempt on pipe: Parent pipe
Second attempt on pipe: Again Parent pipe
Second attempt on file: Parent file.
Third attempt on file: Again Parent file.
It seems that readline() does not block on an empty file - perhaps it sees an EOF because the file is empty. On the other hand, readline() blocks on an empty file-like object - no EOF is seen until after we close the file-like object. I expect that I have got it wrong - that I am missing something basic. It would have been more uniform to have readline() block on an empty file until the handle is closed, as it does with a file-like object.
File objects don't know if anyone else has an open handle to the file, so there is no way for them to distinguish "empty file with writers" from "empty file without writers"; a writer closing the file is not visible to the handle reading it.
By contrast, pipes communicate that sort of information, they're streams that are explicitly closed by the writer to communicate data to the reader.
If files acted like pipes, given the lack of info on writers, you'd block indefinitely when you ran out of lines, waiting for another line that would never arrive.
Basically, they're for fundamentally different purposes, don't expect one to behave exactly like the other.
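If you do want pipe-like behaviour on top of a regular file, the usual workaround is to poll for new data yourself, tail -f style. Here is a minimal sketch (the poll interval and the reuse of Test.txt from the question are arbitrary choices):
import time

def follow(path, poll_interval=0.5):
    # Yield lines appended to a regular file, polling for new data.
    with open(path) as f:
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                # A regular file just reports EOF when it runs out of
                # lines, so sleep and retry instead of blocking.
                time.sleep(poll_interval)

# Usage: prints lines as they are appended to Test.txt
# for line in follow("Test.txt"):
#     print(line.strip())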

Named pipe won't block

I'm trying to make multiple programs communicate using named pipes under Python.
Here's how I'm proceeding:
import os

os.mkfifo("/tmp/p")
file = os.open("/tmp/p", os.O_RDONLY)

while True:
    line = os.read(file, 255)
    print("'%s'" % line)
Then, after starting it, I'm sending some simple data through the pipe:
echo "test" > /tmp/p
I expected test\n to show up here, and then Python to block at os.read() again.
What actually happens is that Python prints 'test\n' and then prints '' (an empty string) forever.
Why is that happening, and what can I do about it?
From http://man7.org/linux/man-pages/man7/pipe.7.html :
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
From https://docs.python.org/2/library/os.html#os.read :
If the end of the file referred to by fd has been reached, an empty string is returned.
So, you're closing the write end of the pipe (when your echo command finishes) and Python is reporting that as end-of-file.
If you want to wait for another process to open the FIFO, then you could detect when read() returns end-of-file, close the FIFO, and open it again. The open should block until a new writer comes along.
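A minimal sketch of that reopen-on-EOF loop, built around the same /tmp/p FIFO as the question (untested, so treat it as an illustration rather than a drop-in fix):
import os

path = "/tmp/p"   # FIFO created earlier with os.mkfifo
while True:
    fd = os.open(path, os.O_RDONLY)   # blocks until a writer opens the FIFO
    while True:
        data = os.read(fd, 255)
        if not data:                  # empty read: all writers have closed
            break
        print("'%s'" % data)
    os.close(fd)                      # close and reopen to wait for the next writer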
As an alternative to user9876's answer, you can open your pipe for writing right after creating it; this allows it to stay open for writing at all times.
Here's an example contextmanager for working with pipes:
import os
import contextlib

@contextlib.contextmanager
def pipe(path):
    try:
        os.mkfifo(path)
    except FileExistsError:
        pass
    try:
        with open(path, 'w'):  # dummy writer
            with open(path, 'r') as reader:
                yield reader
    finally:
        os.unlink(path)
And here is how you use it:
with pipe('myfile') as reader:
    while True:
        print(reader.readline(), end='')
