Python writing data in a file is sometimes delayed [duplicate]

Python writing data in a file is sometimes delayed [duplicate] - python

How often does Python flush to a file?
How often does Python flush to stdout?
I'm unsure about (1).
As for (2), I believe Python flushes to stdout after every new line. But, if you overload stdout to be to a file, does it flush as often?

For file operations, Python uses the operating system's default buffering unless you configure it do otherwise. You can specify a buffer size, unbuffered, or line buffered.
For example, the open function takes a buffer size argument.
http://docs.python.org/library/functions.html#open
"The optional buffering argument specifies the file’s desired buffer size:"
0 means unbuffered,
1 means line buffered,
any other positive value means use a buffer of (approximately) that size.
A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files.
If omitted, the system default is used.
code:
bufsize = 0
f = open('file.txt', 'w', buffering=bufsize)

You can also force flush the buffer to a file programmatically with the flush() method.
with open('out.log', 'w+') as f:
f.write('output is ')
# some work
s = 'OK.'
f.write(s)
f.write('\n')
f.flush()
# some other work
f.write('done\n')
f.flush()
I have found this useful when tailing an output file with tail -f.

You can also check the default buffer size by calling the read only DEFAULT_BUFFER_SIZE attribute from io module.
import io
print (io.DEFAULT_BUFFER_SIZE)

I don't know if this applies to python as well, but I think it depends on the operating system that you are running.
On Linux for example, output to terminal flushes the buffer on a newline, whereas for output to files it only flushes when the buffer is full (by default). This is because it is more efficient to flush the buffer fewer times, and the user is less likely to notice if the output is not flushed on a newline in a file.
You might be able to auto-flush the output if that is what you need.
EDIT: I think you would auto-flush in python this way (based
from here)
#0 means there is no buffer, so all output
#will be auto-flushed
fsock = open('out.log', 'w', 0)
sys.stdout = fsock
#do whatever
fsock.close()

Here is another approach, up to the OP to choose which one he prefers.
When including the code below in the __init__.py file before any other code, messages printed with print and any errors will no longer be logged to Ableton's Log.txt but to separate files on your disk:
import sys
path = "/Users/#username#"
errorLog = open(path + "/stderr.txt", "w", 1)
errorLog.write("---Starting Error Log---\n")
sys.stderr = errorLog
stdoutLog = open(path + "/stdout.txt", "w", 1)
stdoutLog.write("---Starting Standard Out Log---\n")
sys.stdout = stdoutLog
(for Mac, change #username# to the name of your user folder. On Windows the path to your user folder will have a different format)
When you open the files in a text editor that refreshes its content when the file on disk is changed (example for Mac: TextEdit does not but TextWrangler does), you will see the logs being updated in real-time.
Credits: this code was copied mostly from the liveAPI control surface scripts by Nathan Ramella

Related

How to put the output of ffmpeg into a pipe in Python? [duplicate]

I can successfully redirect my output to a file, however this appears to overwrite the file's existing data:
import subprocess
outfile = open('test','w') #same with "w" or "a" as opening mode
outfile.write('Hello')
subprocess.Popen('ls',stdout=outfile)
will remove the 'Hello' line from the file.
I guess a workaround is to store the output elsewhere as a string or something (it won't be too long), and append this manually with outfile.write(thestring) - but I was wondering if I am missing something within the module that facilitates this.

You sure can append the output of subprocess.Popen to a file, and I make a daily use of it. Here's how I do it:
log = open('some file.txt', 'a') # so that data written to it will be appended
c = subprocess.Popen(['dir', '/p'], stdout=log, stderr=log, shell=True)
(of course, this is a dummy example, I'm not using subprocess to list files...)
By the way, other objects behaving like file (with write() method in particular) could replace this log item, so you can buffer the output, and do whatever you want with it (write to file, display, etc) [but this seems not so easy, see my comment below].
Note: what may be misleading, is the fact that subprocess, for some reason I don't understand, will write before what you want to write. So, here's the way to use this:
log = open('some file.txt', 'a')
log.write('some text, as header of the file\n')
log.flush() # <-- here's something not to forget!
c = subprocess.Popen(['dir', '/p'], stdout=log, stderr=log, shell=True)
So the hint is: do not forget to flush the output!

Well the problem is if you want the header to be header, then you need to flush before the rest of the output is written to file :D

Are data in file really overwritten? On my Linux host I have the following behavior:
1) your code execution in the separate directory gets:
$ cat test
test
test.py
test.py~
Hello
2) if I add outfile.flush() after outfile.write('Hello'), results is slightly different:
$ cat test
Hello
test
test.py
test.py~
But output file has Hello in both cases. Without explicit flush() call stdout buffer will be flushed when python process is terminated.
Where is the problem?

Open() command buffer manual buffer manipulation does not work

I am currently creating a data logging function with the Raspberry Pi, and I am unsure as to whether I have found a slight bug. The code I am using is as follows:
import sys, time, os
_File = 'TemperatureData1.csv'
_newDir = '/home/pi/Documents/Temperature Data Logs'
_directoryList = os.listdir(_newDir)
os.chdir(_newDir)
# Here I am specifying the file, that I want to write to it, and that
# I want to use a buffer of 5kb
output = open(_File, 'w', 5000)
try:
while (1):
output.write('hi\n')
time.sleep(0.01)
except KeyboardInterrupt:
print('Keyboard has been pressed')
output.close()
sys.exit(1)
What I have found is that when I periodically view the created file properties, the file size increases in accordance with the default buffer setting 8192 bytes, and not the 5kb that I have specified. However, when I run the exact same program in Python 2.7.13, the buffer size changes to 5kb as requested.
I was wondering if anyone else had experienced this and had any ideas on a solution to getting the program working on Python 3.6.3? Thanks in advance. I can work with the problem on python 2.7.13, it is my pure curiosity which has led to me posting this question.

Python's definition of open in version 2 is what you are using:
open(name[, mode[, buffering]])
In Python 3, the open command is a little different, in that buffering is not a positional integer, but a keyword arg:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
The docs have the following note:
buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.
That special 8192 number is simply 2^13.
I would suggest trying buffering=5000.

I have done some more research, and managed to find some reasons as to why setting 'buffering' to a value more than 1, does not manually manipulate the buffer to a desired size (bytes) in python 3 or above.
It seems to be because the io library uses two buffers when working with files, a text buffer, and a binary buffer. When in text mode, the file is flushed in accordance to the text buffer (which does not seem to be able to be manipulated when buffering > 1). Instead the buffering argument, manipulates the binary buffer, which then feeds into the text buffer, therefore the buffering function does not work how the programmer expects. This is further explained in the following link:
https://bugs.python.org/issue30718
There is however a work around; you need to use open() in binary mode and not text mode, then use the io.TextIOWrapper function to write to a txt or csv file using the binary buffer. The work around is as follows:
import sys, time, os, io
_File = 'TemperatureData1.csv'
# Open or overwrite the file _file, and use a 20kb buffer in RAM
# before data is saved to disk.
output = open(_File, mode='wb', buffering=700)
output = io.TextIOWrapper(output, write_through=True)
try:
while (1):
output.write('h\n')
time.sleep(0.01)
except KeyboardInterrupt:
print('Keyboard has been pressed')
output.close()
sys.exit(1)

How do I pipe to a file or shell program via Pythons subprocess?

I am working with some fairly large gzipped text files that I have to unzip, edit and re-zip. I use Pythons gzip module for unzipping and zipping, but I have found that my current implementation is far from optimal:
input_file = gzip.open(input_file_name, 'rb')
output_file = gzip.open(output_file_name, 'wb')
for line in input_file:
# Edit line and write to output_file
This approach is unbearably slow – probably because there is a huge overhead involved in doing per line iteration with the gzip module: I initially also run a line-count routine where I - using the gzip module - read chunks of the file and then count the number of newline chars in each chunk and that is very fast!
So one of the optimizations should definitely be to read my files in chunks and then only do per line iterations once the chunks have been unzipped.
As an additional optimization, I have seen a few suggestions to unzip in a shell command via subprocess. Using this approach, the equivalent of the first line in the above could be:
from subprocess import Popen, PIPE
file_input = Popen(["zcat", fastq_filename], stdout=PIPE)
input_file = file_input.stdout
Using this approach input_file becomes a file-like object. I don't know exactly how it is different to a real file object in terms of available attributes and methods, but one difference is that you obviously cannot use seek since it is a stream rather than a file.
This does run faster and it should - unless you run your script in a single core machine the claim is. The latter must mean that subprocess automatically ships different threads to different cores if possible, but I am no expert there.
So now to my current problem: I would like to zip my output in a similar fashion. That is, instead of using Pythons gzip module, I would like to pipe it to a subprocess and then call the shell gzip. This way I could potentially get reading, editing and writing in separate cores, which sounds wildly effective to me.
I have made a puny attempt at this, but attempting to write to output_file resulted in an empty file. Initially, I create an empty file using the touch command because Popen fails if the file does not exist:
call('touch ' + output_file_name, shell=True)
output = Popen(["gzip", output_file_name], stdin=PIPE)
output_file = output.stdin
Any help is greatly appreciated, I am using Python 2.7 by the way. Thanks.

Here is a working example of how this can be done:
#!/usr/bin/env python
from subprocess import Popen, PIPE
output = ['this', 'is', 'a', 'test']
output_file_name = 'pipe_out_test.txt.gz'
gzip_output_file = open(output_file_name, 'wb', 0)
output_stream = Popen(["gzip"], stdin=PIPE, stdout=gzip_output_file) # If gzip is supported
for line in output:
output_stream.stdin.write(line + '\n')
output_stream.stdin.close()
output_stream.wait()
gzip_output_file.close()
If our script only wrote to console and we wanted the output zipped, a shell command equivalent of the above could be:
script_that_writes_to_console | gzip > output.txt.gz

You meant output_file = gzip_process.stdin. After that you can use output_file as you've used gzip.open() object previously (no-seeking).
If the result file is empty then check that you call output_file.close() and gzip_process.wait() at the end of your Python script. Also, the usage of gzip may be incorrect: if gzip writes the compressed output to its stdout then pass stdout=gzip_output_file where gzip_output_file = open(output_file_name, 'wb', 0).

redirect sys.stdout to a file without buffering in Python 3

I have a script that writes to a log file. In Python 2, my quick solution to allow tailing/viewing of the log as it progressed was by assigning sys.stdout to a file object with buffering set to 0:
original_stdout = sys.stdout
sys.stdout = open(log_file, 'w', 0)
Once set, any print statements in the script's functions redirect to the log file very nicely.
Running the 2to3-converted version under Python 3 gives the following error: ValueError: can't have unbuffered text I/O. Changing the 'w' above to 'wb' solves that, so the structure of the block is
original_stdout = sys.stdout
sys.stdout = open(log_file, 'wb', 0)
print("{}".format(first_message))
but now the first print statement errors with TypeError: 'str' does not support the buffer interface. I tried explicitly casting the string to bytes
print(bytes("{}".format(first_message), "UTF-8"))
but that produces the same TypeError as before.
What is the easiest way to write unbuffered text to a file in Python 3?

According to Python 3.4.3 documentation at https://docs.python.org/3/library/io.html#raw-i-o and 3.5 documenmtation at https://docs.python.org/3.5/library/io.html#raw-i-o the way to get unbuffered IO is with Raw IO which can be enabled as in:
f = open("myfile.jpg", "rb", buffering=0)
That means "wb" should work for writing.
Details on Raw IO are at https://docs.python.org/3/library/io.html#io.RawIOBase and https://docs.python.org/3.5/library/io.html#io.RawIOBase which appear to be the same.
I did some testing and found buffering of Text IO to be severe and can amount to hundreds of lines and this happens even when writing to sys.stderr and redirecting the error output to a file, on Windows 7 at least. The I tried Raw IO and it worked great! - each line printed came through immediately and in plain text in tail -f output. This is what worked for me on Windows 7 with Python 3.4.3 and using tail bundled with GitHub tools:
import time
import sys
f = open("myfile.txt", "ab", buffering=0)
c = 0
while True:
f.write(bytes("count is " + str(c) + '\n','utf-8'))
c += 1
time.sleep(1)

If by unbuffered you mean having the outputs immediately flushed to disk, you can simply do this:
original_stdout = sys.stdout
sys.stdout = open(log_file, 'w')
print(log_message, flush=True)
As print is now a first-class function you can also specify which file to print to, such as:
fd = open(log_file, 'w')
print(log_message, file=fd, flush=True)

The issue seems to be in the way you open the file -
open(log_file, 'w', 0)
From Python 3.x documentation -
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
The third argument for open() determines the buffering mode for the file, 0 means no buffering. I do not think you can make it work by just using 'wb' instead of 'w' .
You should remove that 0 third argument, and let open() use default line buffering for text files. Example -
open(log_file, 'w')

using Python subprocess to redirect stdout to stdin?

I'm making a call to a program from the shell using the subprocess module that outputs a binary file to STDOUT.
I use Popen() to call the program and then I want to pass the stream to a function in a Python package (called "pysam") that unfortunately cannot Python file objects, but can read from STDIN. So what I'd like to do is have the output of the shell command go from STDOUT into STDIN.
How can this be done from within Popen/subprocess module? This is the way I'm calling the shell program:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
This will read "my_cmd"'s STDOUT output and get a stream to it in p. Since my Python module cannot read from "p" directly, I am trying to redirect STDOUT of "my_cmd" back into STDIN using:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True).stdout
I then call my module, which uses "-" as a placeholder for STDIN:
s = pysam.Samfile("-", "rb")
The above call just means read from STDIN (denoted "-") and read it as a binary file ("rb").
When I try this, I just get binary output sent to the screen, and it doesn't look like the Samfile() function can read it. This occurs even if I remove the call to Samfile, so I think it's my call to Popen that is the problem and not downstream steps.
EDIT: In response to answers, I tried:
sys.stdin = subprocess.Popen(tagBam_cmd, stdout=subprocess.PIPE, shell=True).stdout
print "Opening SAM.."
s = pysam.Samfile("-","rb")
print "Done?"
sys.stdin = sys.__stdin__
This seems to hang. I get the output:
Opening SAM..
but it never gets past the Samfile("-", "rb") line. Any idea why?
Any idea how this can be fixed?
EDIT 2: I am adding a link to Pysam documentation in case it helps, I really cannot figure this out. The documentation page is:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html
and the specific note about streams is here:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams
In particular:
"""
Pysam does not support reading and writing from true python file objects, but it does support reading and writing from stdin and stdout. The following example reads from stdin and writes to stdout:
infile = pysam.Samfile( "-", "r" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
It will also work with BAM files. The following script converts a BAM formatted file on stdin to a SAM formatted file on stdout:
infile = pysam.Samfile( "-", "rb" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
Note, only the file open mode needs to changed from r to rb.
"""
So I simply want to take the stream coming from Popen, which reads stdout, and redirect that into stdin, so that I can use Samfile("-", "rb") as the above section states is possible.
thanks.

I'm a little confused that you see binary on stdout if you are using stdout=subprocess.PIPE, however, the overall problem is that you need to work with sys.stdin if you want to trick pysam into using it.
For instance:
sys.stdin = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
s = pysam.Samfile("-", "rb")
sys.stdin = sys.__stdin__ # restore original stdin
UPDATE: This assumed that pysam is running in the context of the Python interpreter and thus means the Python interpreter's stdin when "-" is specified. Unfortunately, it doesn't; when "-" is specified it reads directly from file descriptor 0.
In other words, it is not using Python's concept of stdin (sys.stdin) so replacing it has no effect on pysam.Samfile(). It also is not possible to take the output from the Popen call and somehow "push" it on to file descriptor 0; it's readonly and the other end of that is connected to your terminal.
The only real way to get that output onto file descriptor 0 is to just move it to an additional script and connect the two together from the first. That ensures that the output from the Popen in the first script will end up on file descriptor 0 of the second one.
So, in this case, your best option is to split this into two scripts. The first one will invoke my_cmd and take the output of that and use it for the input to a second Popen of another Python script that invokes pysam.Samfile("-", "rb").

In the specific case of dealing with pysam, I was able to work around the issue using a named pipe (http://docs.python.org/library/os.html#os.mkfifo), which is a pipe that can be accessed like a regular file. In general, you want the consumer (reader) of the pipe to listen before you start writing to the pipe, to ensure you don't miss anything. However, pysam.Samfile("-", "rb") will hang as you noted above if nothing is already registered on stdin.
Assuming you're dealing with a prior computation that takes a decent amount of time (e.g. sorting the bam before passing it into pysam), you can start that prior computation and then listen on the stream before anything gets output:
import os
import tempfile
import subprocess
import shutil
import pysam
# Create a named pipe
tmpdir = tempfile.mkdtemp()
samtools_prefix = os.path.join(tmpdir, "namedpipe")
fifo = samtools_prefix + ".bam"
os.mkfifo(fifo)
# The example below sorts the file 'input.bam',
# creates a pysam.Samfile object of the sorted data,
# and prints out the name of each record in sorted order
# Your prior process that spits out data to stdout/a file
# We pass samtools_prefix as the output prefix, knowing that its
# ending file will be named what we called the named pipe
subprocess.Popen(["samtools", "sort", "input.bam", samtools_prefix])
# Read from the named pipe
samfile = pysam.Samfile(fifo, "rb")
# Print out the names of each record
for read in samfile:
print read.qname
# Clean up the named pipe and associated temp directory
shutil.rmtree(tmpdir)

If your system supports it; you could use /dev/fd/# filenames:
process = subprocess.Popen(args, stdout=subprocess.PIPE)
samfile = pysam.Samfile("/dev/fd/%d" % process.stdout.fileno(), "rb")

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.