Is it worth closing files in small functions? - python

Say you have:
def my_func():
    fh = open(...)
    try:
        print fh.read()
    finally:
        fh.close()
My first question is: Is it worth having the try/finally (or with) statement? Isn't the file closed anyway when the function terminates (via garbage collection)?
I came across this after reading a recipe from Martelli's "Python Cookbook" where
all_the_text = open('thefile.txt').read()
comes with the comment: "When you do so, you no longer have a reference to the file object as soon as the reading operation finishes. In practice, Python notices the lack of a reference at once, and immediately closes the file."
My function example is almost the same. You do have a reference, it's just that the reference has a very short life.
My second question is: What does "immediately" in Martelli's statement mean? Even though you don't have a reference at all, doesn't the file closing happen at garbage collection time anyway?

It is good practice to close the file yourself. Using the with statement leads to clean code and it automatically closes the file (which is a Good Thing).
Even though Python is a high-level programming language, you still need to be in control of what you're doing. As a rule of thumb: if you open a file, it also needs to be closed. There's never a good reason to be sloppy in your code :-)
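For reference, a minimal sketch of the same function rewritten with a with statement (the literal file name is a placeholder):

def my_func():
    # The with statement closes the file deterministically,
    # even if read() raises an exception.
    with open("thefile.txt") as fh:
        print(fh.read())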
Regarding your second question: it won't necessarily run immediately; it'll run when the garbage collector decides it is time to run. When the file object is deallocated, Python will close the file. Here are some articles on garbage collection in Python (also see the gc module); it's an interesting read.
Those articles also show that Python's garbage collector uses thresholds based on the number of allocated and deallocated objects before it decides to collect. Until that collection code has run, Python might hold the file open longer than necessary, regardless of the file's size.
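As a rough illustration, here is a sketch using the standard gc module (the threshold numbers shown are CPython defaults and may differ):

import gc

# Collection is triggered once allocations minus deallocations
# exceed a threshold; the defaults are tunable.
print(gc.get_threshold())   # e.g. (700, 10, 10) in CPython
gc.collect()                # force an immediate collection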

Related

Code block in python in order to free memory

Pretty simple question:
I have some code that prepares data for some graphs and then shows them, and I don't want to waste (limited) memory... is there a way to have a "local scope", so that when execution reaches the end of it, everything inside is freed?
I come from C++, where you can put code inside { ... } so that at the end everything is freed, and you don't have to care about anything.
Anything like that in python?
The only thing I can think of is:
def tmp():
    ... code ...
tmp()
but that is very ugly, and I certainly don't want to have to list all the del x statements at the end
If anything holds a reference to your object, it cannot be freed. By default, anything at the global scope is going to be held in the global namespace (globals()), and as far as the interpreter knows, the very next line of source code could reference it (or, another module could import it from this current module), so globals cannot be implicitly freed, ever.
This forces your hand to either explicitly delete references to objects with del, or to put them within the local scope of a function. This may seem ugly, but if you follow the philosophy that a function should do one thing and one thing well (thanks Unix!), you will already have segmented your code into functions. For the one-off exceptions where you allocate a lot of memory early on in your function and no longer need it midway through, you can del the reference to it.
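A sketch of that one-off pattern (the big list is just a stand-in for a large allocation):

def build_report():
    big = [0] * 10_000_000        # stand-in for a large allocation
    summary = sum(big)            # reduce it to a small result
    del big                       # drop the large list before the rest runs
    return "total: %d" % summary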
I know this isn't the answer you want to hear, but it's the reality of Python. You could accomplish something similar by nesting def or class statements inside, but this is kinda hacky (and in the class case, which wouldn't even require calling/instantiating, extremely hacky).
I will also mention that there is a built-in gc module for interacting with the garbage collector. With it, you can trigger an immediate garbage collection (otherwise Python will eventually get around to collecting the things you del refs to), as well as inspect what still holds references to a given object.
If you're curious where the allocations are happening, you can also use the built in tracemalloc module to trace said allocations.
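A minimal sketch combining the two modules (the big list is again a stand-in allocation):

import gc
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

data = [0] * 1_000_000      # stand-in for a large allocation
del data                    # drop the only reference
gc.collect()                # collect now instead of "eventually"

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)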
The mechanism that handles freeing memory in Python is called the garbage collector, and its existence means there's no reason to use del in the overwhelming majority of Python code.
When programming in Python, you are "not supposed" to care about such low-level things as allocating and freeing memory for your variables.
That being said, putting your code into functions (although preferably called something clearer than tmp()) is most definitely a good idea, as it will make your code much more readable and "Pythonic".
Coming from C++, you have already stumbled onto one of the main differences (drawbacks) of Python: memory management. Python's garbage collector will delete all the objects that fall out of scope. However, freeing an object's memory doesn't guarantee that the memory actually returns to the system; instead, a rather big portion may be kept reserved by the Python process even when unused. If you face a memory problem and want to give memory back to the system, the only safe method is to run the memory-intensive function in a separate process. Every Python process has its own interpreter, and any memory consumed by a process returns to the system when the process exits.
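A minimal sketch of that approach, assuming a stand-in workload:

import multiprocessing

def memory_intensive():
    data = [b"x" * 1024 for _ in range(100_000)]   # stand-in workload
    print("child done, items:", len(data))

if __name__ == "__main__":
    p = multiprocessing.Process(target=memory_intensive)
    p.start()
    p.join()
    # The child has exited, so all the memory it consumed is back with the OS.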

in python, ensure a file is closed when you can't use "with"

The standard answer to "how can I ensure a file is closed in Python" is to wrap the commands in a "with" statement, so that the file is automatically closed upon exiting the "with" block.
But what about a case where you can't do that because the file handle needs to remain open across a large swath of code? For example, you open the file handle in an object constructor, saving it to an object property, and then referring to the file handle in many object methods.
It would be possible to move the opening of the file handle into the methods themselves, but then I'd basically be opening/closing the file every time a method is called, which is far less efficient.
I have tried placing a "close" command in the object destructor (the __del__ method), but this does not work.
A dirty but easy win is to keep a record of files as you open them, and make sure a close() call appears at the end. Keeping a list, or wrapping the open() function (as suggested by this post), may do the job.
The post "check what files are open in Python" suggests several solutions, such as wrapping the built-in file object, command-line methods, the psutil module, etc.; maybe some of them would fit your situation.
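A sketch of the wrapping idea (Python 3; tracked_open and _close_all are hypothetical names, not a standard API):

import atexit
import builtins

_open_files = []                       # registry of handles we opened

def tracked_open(*args, **kwargs):
    # Thin wrapper around the built-in open() that records the handle.
    fh = builtins.open(*args, **kwargs)
    _open_files.append(fh)
    return fh

@atexit.register
def _close_all():
    # Close anything still open when the interpreter exits.
    for fh in _open_files:
        if not fh.closed:
            fh.close()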

Is file automatically closed if read in same line as opening?

If I do (in Python):
text = open("filename").read()
is the file automatically closed?
The garbage collector would be activated at some point, but you cannot be certain of when unless you force it.
The best way to ensure that the file is closed when it goes out of scope is to do this:
with open("filename") as f: text = f.read()
It's also a one-liner, but safer.
In CPython (the reference Python implementation) the file will be automatically closed. CPython destroys objects as soon as they have no references, which happens at the end of the statement at the very latest.
In other Python implementations this may not happen right away since they may rely on the memory management of an underlying virtual machine, or use some other memory management strategy entirely (see PyParallel for an interesting example).
Python, the language, does not specify any particular form of memory management, so you can't rely on the file being closed in the general case. Use the with statement to explicitly specify when it will be closed if you need to rely on it.
In practice, I often use this approach in short-lived scripts where it doesn't really matter when the file gets closed.
Since you keep no reference to the open file handle, CPython will close it automatically, either during garbage collection or at program exit. The problem is that you don't have any guarantees about when that will occur, which is why the with open(...) construct is preferred.

How does using the try statement avoid a race condition?

When determining whether or not a file exists, how does using the try statement avoid a "race condition"?
I'm asking because a highly upvoted answer (update: it was deleted) seems to imply that using os.path.exists() creates an opportunity that would not exist otherwise.
The example given is:
try:
    with open(filename): pass
except IOError:
    print 'Oh dear.'
But I'm not understanding how that avoids a race condition compared to:
if not os.path.exists(filename):
    print 'Oh dear.'
How does calling os.path.exists(filename) allow the attacker to do something with the file that they could not already do?
The race condition is, of course, between your program and some other code that operates on the file (a race condition always requires at least two parallel processes or threads; see this for details). That means using open() instead of exists() may really help only in two situations:
You check for the existence of a file that is created or deleted by some background process (note that if you run inside a web server, there are often many copies of your process running in parallel to handle HTTP requests, so for web apps a race condition is possible even if no other programs are involved).
There may be some malicious program running that tries to crash your code by destroying the file at the moment you expect it to exist.
exists() performs just a single check. If the file exists, it may be deleted a microsecond after exists() returns True; if the file is absent, it may be created immediately afterwards.
However, open() does not just test for file existence; it also opens the file, and it does these two actions atomically, so nothing can happen between the check and the opening. Once the file is open, your handle stays valid even if the name is removed (on Windows, deleting an open file is usually blocked outright; on POSIX systems the name can be unlinked, but the open handle keeps working). That means that inside the with block you can be completely sure the file really exists and is open, since opening it succeeded. This is true only inside the with block, and the file may still be deleted immediately after the block exits, but putting the code that needs the file inside with guarantees that that code will not fail for this reason.
Here's an example of usage:
try:
    with open('filename') as f:
        do_stuff_that_depends_on_the_existence_of_the_file(f)
except IOError as e:
    print 'Trouble opening file'
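The same atomicity argument works in the other direction: to create a file only if it does not already exist, ask open() to do the check and the creation in one step instead of calling exists() first. A sketch using Python 3's exclusive-creation mode (the file name is a placeholder):

try:
    with open("newfile.txt", "x") as f:   # "x" fails if the file exists
        f.write("created atomically\n")
except FileExistsError:
    print("Someone else created it first")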
If you are opening the file with any access at all, then the OS will guarantee that the file exists, or else it will fail with an error. If the access is exclusive, any other process in contention for the file will either be blocked by you, or block you.
The try is just a way to detect the error or success of the act of opening the file, since file I/O APIs in Python typically do not have return codes (exceptions are used instead). So to really answer your question, it's not the try that avoids the race condition, it's the open. It's basically the same in C (on which Python is based), but without exceptions. Read this for more information.
Note that you would probably want to execute code that depends on access to the file inside the try block. Once you close the file, its existence is no longer guaranteed.
Calling os.path.exists merely gives a snapshot at a moment in time when the file may or may not exist, and you have no knowledge of the existence of the file once os.path.exists returns. Malevolent code or unexpected logic may delete or change the file when you are not expecting it. It is akin to turning your head to check that a road is clear before driving into it. Once you turn your head back, you have nothing but a guess about what is going on where you are no longer looking. Holding the file open guarantees an extended consistent state, something not possible (for good or ill) when driving. :)
Your suggestion of checking that a file does not exist rather than using try/open is still insufficient because of the snapshot nature of os.path.exists. Unfortunately I know of no way to prevent files from being created in a directory in all cases, so I think it is best to check for the positive existence of a file, rather than its absence.
I think what you're asking about is this particular race condition:
file is opened
context is switched and the file is deleted
context is switched back and file operations are attempted on the "opened" file
The way you're "protected" in this case is by putting all the file-handling code in a try block: if at any point the file becomes inaccessible or corrupt, your file operations can fail "gracefully" via the except block.
Note that on modern OSes this often can't happen anyway: when a file is "deleted" while handles to it are open, the actual removal doesn't take place until all open handles on the file are released.
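A small sketch demonstrating that deferral on POSIX systems (on Windows the remove call itself would typically fail while the handle is open):

import os

with open("demo.txt", "w") as f:
    f.write("hello")

f = open("demo.txt")
os.remove("demo.txt")    # the name is gone...
print(f.read())          # ...but the open handle still reads "hello"
f.close()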

Do files automatically close if I don't assign them to a variable? [duplicate]

Possible Duplicate:
Is close() necessary when using iterator on a Python file object
for line in open("processes.txt").readlines():
    doSomethingWith(line)
Take that code for example. There's nothing to call close() on. So does it close itself automatically?
Files will close when the corresponding object is deallocated. The sample you give depends on that; there is no reference to the object, so the object will be removed and the file will be closed.
It's important to note that there is no guarantee as to when the object will be removed. With CPython, you have reference counting as the basis of memory management, so you would expect the file to close immediately. In, say, Jython, the garbage collector is not guaranteed to run at any particular time (or even at all), so you shouldn't count on the file being closed and should instead close the file manually or (better) use a with statement.
AFAIK they don't. To get auto-closing you need a context manager, i.e. a with statement. Although the object itself may eventually be reclaimed by garbage collection and closed, there is no definite time at which garbage collection occurs:
with open("processes.txt") as openfile:
    <do stuff>
