In this discussion about the easiest way to run a process and discard its output, I suggested the following code:
with open('/dev/null', 'w') as dev_null:
    subprocess.call(['command'], stdout=dev_null, stderr=dev_null)
Another developer suggested this version:
subprocess.call(['command'], stdout=open('/dev/null', 'w'), stderr=STDOUT)
The C++ programmer in me wants to say that when objects are released is an implementation detail, so to avoid leaving a filehandle open for an indeterminate period of time, I should use with. But a couple of resources suggest that Python always or almost always uses reference counting for code like this, in which case the filehandle should be reclaimed as soon as subprocess.call is done and using with is unnecessary.
(I guess that leaving a filehandle open to /dev/null in particular is unlikely to matter, so pretend it's an important file.)
Which approach is best?
You are correct, refcounting is not guaranteed. In fact, only CPython (which is the main implementation, yes, but not even remotely the only one) provides refcounting. In case CPython ever changes that implementation detail (unlikely, yes, but possible), or your code is ever run on an alternate implementation, or you lose refcounting for any other reason, the file won't be closed. Therefore, and given that the with statement makes cleanup very easy, I would suggest you always use a context manager when you open files.
When the pipe to the null device closes is irrelevant - it won't lead to data loss in the output or anything like that. While you may want to use the with variant anyway to ensure that your output files are always properly flushed and closed, this isn't an example where that matters.
The entire point of the with statement is to have a controlled cleanup process. You're doing it right, don't let anyone convince you otherwise.
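For what it's worth, on Python 3.3 and later you can sidestep the question entirely: subprocess provides subprocess.DEVNULL, so there is no file object for you to manage at all. A minimal sketch, reusing the placeholder command from the question:

import subprocess

# subprocess opens and closes the null device itself; nothing for us to clean up
subprocess.call(['command'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)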
Related
On a line such as
r = open(path, encoding="utf-8").read()
(actual line here),
Pylint 2.14.5 provides the following advice:
submodules-dedup.py:71:32: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
If I understand correctly, the suggestion is to change it to
with open(path, encoding="utf-8") as f:
    r = f.read()
But is this really better in any way?
Personally I don't find it any more readable, and as for other concerns, wouldn't the file be closed at the same time thanks to how reference counting works anyhow?
This suggestion is to ensure the resource is closed or freed when it exits the context. This is the point of using a context manager.
Of course using a context manager breaks the one-liner style to some extent, but it gives you better, safer code: there is no chance of forgetting the close statement. It is a trade-off between brevity and good coding practice.
The question then is: is the version with the explicit cleanup really any less readable?
Python documentation states it explicitly:
If you’re not using the with keyword, then you should call f.close()
to close the file and immediately free up any system resources used by
it.
Warning: Calling f.write() without using the with keyword or calling
f.close() might result in the arguments of f.write() not being
completely written to the disk, even if the program exits
successfully.
Anyway, the resource should be released when your program exits, but it may not be left in the state you think it should be.
If the resource is not critical, or you think an explicit close statement afterwards would not break the one-liner style either, you may ignore this warning.
The risks of keeping files open are few, but worth considering:
Deadlock: if the resource is locked when opened, other processes cannot access it until the lock is released;
Corruption and unintended behaviour when writing to the resource;
Reaching the OS limit on the number of open files.
The same applies to database connections:
Reaching the connection limit because of unclosed connections, leading to denial of service.
So, IMHO using a context manager is the right choice: it ensures the resource is released as soon as possible, it keeps the code clean, and it saves you from forgetting the required close statement, which would break the one-liner style anyway.
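To make the guarantee concrete, the with statement is roughly equivalent to an explicit try/finally, which is exactly the boilerplate people tend to forget to write by hand. A rough sketch of the two forms, using the line from the question:

with open(path, encoding="utf-8") as f:
    r = f.read()          # close() is called even if read() raises

f = open(path, encoding="utf-8")
try:
    r = f.read()
finally:
    f.close()             # hand-written equivalent of what with does for you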
Complementary to the other answer: FWIW pathlib provides utility methods which nicely replace open().read(), without the issues that pattern might have on alternative implementations:
r = open(path, encoding="utf-8").read()
can be written as
r = Path(path).read_text("utf-8")
The end result is the same, but the implementation will ensure the file is properly closed before returning. And the Path() invocation is not necessary if you work with path objects to start with, which is a pretty nice thing to do in modern python as the API is not the worst (though not all of the os and os.path utility functions are available through pathlib, sadly).
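For completeness, a sketch of that path-object style (the file name here is made up for illustration):

from pathlib import Path

p = Path("notes") / "readme.txt"      # hypothetical path object
r = p.read_text(encoding="utf-8")     # opens, reads and closes the file internally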
wouldn't the file be closed at the same time thanks to how reference counting works anyhow?
There are more implementations than CPython, and thus other memory reclamation schemes than refcounting (none of PyPy, Jython, and IronPython use refcounting). pylint should not assume you're using CPython, and those alternate implementations are a big part of why context managers were introduced to start with.
I noticed that in Python, when I use a print statement, it doesn't print immediately. I found you can use sys.stdout.flush() to make it show the print in the console.
Is this the proper way of getting immediate feedback from a script, or is there a better way?
I mainly want this for debugging. I had a hang and was trying to find where the code stalled, but since my print statements didn't show up I was searching in the wrong place, thinking my code didn't get to the print statement. (I finally found it with breakpoints, which is perhaps a better way of debugging, but immediate prints are sometimes just more convenient.)
Exactly how and when stdout will flush itself depends on the implementation of the print function you're using, the string you're trying to print, possibly the OS, etc.
Anecdotally, stdout seems to flush when it hits a newline character, but I don't always trust it, or I won't know until runtime whether my string will contain a newline, so I use flush() pretty regularly in C, Java, and Python when I want to be sure that something prints immediately and I am not overly concerned about performance. Places where I've found this especially useful (i.e. where timing of the messages really matters) are: near code that is likely to throw exceptions, right before blocking I/O calls, and in multi-threaded sections of code.
As a practical example, something like this seems like a natural usage for flush() (sorry it's in C++ rather than Python)
printf("Connecing to IP Camera at %s...", ipCamAddr);
cout.flush();
// open a network connection to the camera
printf("Done!\n");
Because without the flush, it would hold the first string until it gets a \n, which is after the connection has succeeded / failed.
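In Python specifically you rarely need an explicit flush() call for this pattern, since print() accepts a flush argument (Python 3.3+). A rough equivalent of the snippet above; the connection call is a made-up placeholder:

import sys

print("Connecting to IP camera...", end="", flush=True)  # shown immediately
# connect_to_camera()  # hypothetical blocking call
print("Done!")

# or flush the stream explicitly:
sys.stdout.write("Connecting...")
sys.stdout.flush()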
I'm new to Python and I'm writing a script to patch a file with something like:
def getPatchDatas(file):
    f = open(file, "rb")
    datas = f.read()
    f.close()
    return datas

f = open("myfile.bin", "r+b")
f.seek(0xC020)
f.write(getPatchDatas("mypatch.bin"))
f.close()
I would like to be sure the patch has been applied correctly.
So, if no error / exception is raised, does it mean I'm 100% sure the patch has been correctly written?
Or is it better to double check with something like:
f = open("myfile.bin","rb")
f.seek(0xC020)
if not f.read(0x20) == getPatchDatas("mypatch.bin"):
print "Patch not applied correctly!"
f.close()
??
Thanks.
No it doesn't, but roughly it does. It depends how much it matters.
Anything could go wrong - it could be a consumer hard disk which lies to the operating system about when it has finished writing data to disk. It could be corrupted in memory and that corrupt version gets written to disk, or it could be corrupted inside the disk during writing by electrical or physical problems.
It could be intercepted by kernel modules on Linux, filter drivers on Windows or a FUSE filesystem provider which doesn't actually support writing but pretends it does, meaning nothing was written.
It could be screwed up by a corrupted Python install where exceptions don't work or were deliberately hacked out of it, or file objects monkeypatched, or accidentally run in an uncommon implementation of Python which fakes supporting files but is otherwise identical.
These kinds of reasons are why servers have server-class hardware with higher tolerances to temperature and electrical variation, error-checking and correcting memory (ECC), RAID controller battery backups, the ZFS checksumming filesystem, Uninterruptible Power Supplies, and so on.
But, as far as normal people and low risk things go - if it's written without error, it's as good as written. Double-checking makes sense - especially as it's that easy. It's nice to know if something has failed.
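If you want a little more assurance than "no exception was raised", flushing and fsync-ing before reading the data back is cheap. A sketch along the lines of the question's code (same file names and offset; os.fsync just asks the OS to push the data to the device):

import os

def get_patch_data(path):
    with open(path, "rb") as f:
        return f.read()

patch = get_patch_data("mypatch.bin")

with open("myfile.bin", "r+b") as f:
    f.seek(0xC020)
    f.write(patch)
    f.flush()
    os.fsync(f.fileno())   # push the written bytes out of the OS buffers

# read back and compare
with open("myfile.bin", "rb") as f:
    f.seek(0xC020)
    if f.read(len(patch)) != patch:
        print("Patch not applied correctly!")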
In a single process, it is.
With multiple processes (e.g. one process writing while another reads; even if you make sure the reader only reads after the write call returns, the write takes some time to finish), you may need a file lock.
I'm trying to debug some code I wrote, which involves a lot of parallel processes, and I'm seeing some unwanted behaviour involving output to sys.stdout: certain messages are printed twice. For debugging purposes it would be very useful to know whether at a certain point sys.stdout has been flushed or not. I wonder if this is possible and if so, how?
Ps. I don't know if it matters but I'm using OS X (at least some sys commands depend on the operating system).
The answer is: you cannot tell (not without serious ugliness, an external C module, or similar).
The reason is that Python's file implementation is based on the C (stdio) implementation of FILE *. So an underlying Python file object basically just holds a reference to the opened FILE. When writing data, the C implementation writes it, and when you call flush(), Python just forwards the flush call. So Python itself does not hold this information. Even for the underlying C layer, a quick search suggests there is no documented API for querying it; the state is presumably somewhere inside the FILE object, so in theory it could be read out if it were that desperately needed.
How can I receive input from the terminal in Python?
I am using Python to interface with another program which generates output from user input.
I am using subprocess.Popen() to send input to the program, but I can't set stdout to subprocess.PIPE because the program does not seem to flush ever, so everything gets stuck in the buffer.
The program seems to print its output straight to the terminal, and I do see output when I do not redirect stdout. However, I need Python to read and interpret the output which is now in the terminal.
Sorry if this is a stupid question, but I can't seem to get this to work.
Buffering in child processes is a common problem. Here are four possible approaches.
First, and easiest, you could read one byte at a time from your pipe. This is what I would call a "dirty hack" and it carries a performance penalty, but it's easy and it guarantees that your read() calls will only block until the first byte comes in, rather than wait for a buffer to fill up that's never going to fill up. However, this does not force the other process to flush its write buffer, so if that is the issue this approach will not help you anyway.
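A rough sketch of that byte-at-a-time approach, using the same placeholder command as elsewhere; note again that it only helps if the child does write something, it cannot force the child to flush:

import subprocess

proc = subprocess.Popen(['command'], stdout=subprocess.PIPE)

line = b""
while True:
    byte = proc.stdout.read(1)   # blocks only until a single byte is available
    if not byte:                 # empty read: the child closed its end of the pipe
        break
    line += byte
    if byte == b"\n":
        print("got line:", line)
        line = b""
proc.wait()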
Second, and I think next-easiest, consider using the Twisted framework which has a facility for using a virtual terminal, or pty ("pseudo-teletype" I think) to talk to your child process. However, this can affect the design of your application (possibly for the better, but this may not be in the cards for you regardless). http://twistedmatrix.com/documents/current/core/howto/process.html
If neither of the above options works for you, you're reduced to solving gritty I/O concurrency issues yourself.
Third, try setting your pipes (all of them, before fork()) to non-blocking mode using fcntl() with O_NONBLOCK. Then you can use select() to test for read/write readiness before trying the read/write; but you still have to catch IOError and test for EAGAIN because it can happen even in this case. This may, depending on the behavior of the child process, allow you to wait until the data really shows up before trying to read it in.
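A sketch of that third approach, assuming the pipe comes from subprocess.Popen and keeping the error handling minimal:

import fcntl
import os
import select
import subprocess

proc = subprocess.Popen(['command'], stdout=subprocess.PIPE)  # placeholder command
fd = proc.stdout.fileno()

# switch the read end of the pipe to non-blocking mode
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

while proc.poll() is None:
    readable, _, _ = select.select([fd], [], [], 1.0)  # wait up to 1s for data
    if not readable:
        continue
    try:
        data = os.read(fd, 4096)
    except BlockingIOError:      # EAGAIN can still slip through; just retry
        continue
    if data:
        print("got:", data)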
The last resort is to implement the PTY logic yourself. If you've seen references to stuff like termio options, ioctl() calls, etc. then that's what you're up against. I have not done this before, because it's complicated and I have never really needed to. If this is your destiny, good luck.
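That said, before writing the termio/ioctl() layer yourself, it may be worth trying the standard-library pty module, which hands you a ready-made pseudo-terminal; many programs switch to line buffering when they think they are talking to a terminal. A minimal, Unix-only sketch with the usual placeholder command:

import os
import pty
import subprocess

master, slave = pty.openpty()             # the child will see a terminal, not a pipe
proc = subprocess.Popen(['command'], stdout=slave, stderr=slave)
os.close(slave)                           # the parent only needs the master end

try:
    while True:
        data = os.read(master, 4096)      # raises OSError once the child exits
        if not data:
            break
        print("got:", data)
except OSError:
    pass
finally:
    os.close(master)
    proc.wait()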
Have you tried setting the bufsize in your Popen object to 0? I'm not sure if you can force the buffer to be unbuffered from the receiving side, but I'd try it.
http://docs.python.org/library/subprocess.html#using-the-subprocess-module
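For reference, that would look something like this; note that bufsize only affects the parent's side of the pipe, not how the child buffers its own stdout:

import subprocess

# bufsize=0 makes the parent's pipe object unbuffered; the child's own
# stdio buffering is unaffected.
proc = subprocess.Popen(['command'], stdout=subprocess.PIPE, bufsize=0)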