I have read that when a file is opened using the format below
with open(filename) as f:
    # My Code
f.close()
explicitly closing the file is not required. Can someone explain why that is? Also, if someone does explicitly close the file, will it have any undesirable effect?
The mile-high overview is this: When you leave the nested block, Python automatically calls f.close() for you.
It doesn't matter whether you leave by falling off the bottom, by calling break/continue/return to jump out of it, or by raising an exception; no matter how you leave that block, Python always knows you're leaving, so it always closes the file.*
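You can see this directly (a quick sketch; 'example.txt' is just a placeholder filename): the file's closed attribute flips to True no matter how you leave the block:

# Leaving normally: the file is closed when the block ends.
with open("example.txt", "w") as f:
    f.write("hello\n")
print(f.closed)  # True

# Leaving via an exception: still closed.
try:
    with open("example.txt") as f:
        raise RuntimeError("leaving via an exception")
except RuntimeError:
    pass
print(f.closed)  # True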
One level down, you can think of it as mapping to the try:/finally: statement:
f = open(filename)
try:
    # My Code
finally:
    f.close()
One level down: How does it know to call close instead of something different?
Well, it doesn't really. It actually calls special methods __enter__ and __exit__:
f = open(filename)
f.__enter__()
try:
    # My Code
finally:
    f.__exit__()
And the object returned by open (a file in Python 2, one of the wrappers in io in Python 3) has something like this in it:
def __exit__(self):
    self.close()
It's actually a bit more complicated than that last version, which makes it easier to generate better error messages, and lets Python avoid "entering" a block that it doesn't know how to "exit".
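To make the protocol concrete, here is a minimal sketch of a class you could use in a with statement yourself (ManagedFile is a hypothetical name; note that the real __exit__ receives three arguments describing any in-flight exception):

class ManagedFile:
    def __init__(self, filename):
        self.filename = filename

    def __enter__(self):
        # Whatever __enter__ returns is bound to the name after "as".
        self.f = open(self.filename)
        return self.f

    def __exit__(self, exc_type, exc_value, traceback):
        self.f.close()
        return False  # False means: don't suppress any exception

with ManagedFile("example.txt") as f:
    data = f.read()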
To understand all the details, read PEP 343.
Also, if someone does explicitly close the file, will it have any undesirable effect?
In general, doing anything to an object after its with block has exited is a bad thing to do.
However, file objects go out of their way to make it safe. It's an error to do anything to a closed file—except to close it again.
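A tiny demonstration (any writable path will do):

f = open("example.txt", "w")
f.close()
f.close()     # fine: closing an already-closed file does nothing
f.write("x")  # ValueError: I/O operation on closed file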
* Unless you leave by, say, pulling the power cord on the server in the middle of it executing your script. In that case, obviously, it never gets to run any code, much less the close. But an explicit close would hardly help you there.
Closing is not required because the with statement automatically takes care of that.
Within the with statement, the __enter__ method of the object returned by open(...) is called, and as soon as you leave that block the __exit__ method is called.
So closing it manually is just futile since the __exit__ method will take care of that automatically.
As for the f.close() after, it's not wrong but useless. It's already closed so it won't do anything.
Also see this blogpost for more info about the with statement: http://effbot.org/zone/python-with-statement.htm
Related
As in the thread How do you append to a file?, most answers are about opening a file and appending to it, for instance:
def FileSave(content):
    with open(filename, "a") as myfile:
        myfile.write(content)

FileSave("test1 \n")
FileSave("test2 \n")
Why don't we just extract myfile out and only write to it when FileSave is invoked?
global myfile
myfile = open(filename)

def FileSave(content):
    myfile.write(content)

FileSave("test1 \n")
FileSave("test2 \n")
Is the latter code better because it opens the file only once and writes to it multiple times?
Or is there no difference, because Python internally guarantees the file is opened only once even though the open method is invoked multiple times?
There are a number of problems with your modified code that aren't really relevant to your question: you open the file in read-only mode, you never close the file, you have a global statement that does nothing…
Let's ignore all of those and just talk about the advantages and disadvantages of opening and closing a file over and over:
Wastes a bit of time. If you're really unlucky, the file could even just barely keep falling out of the disk cache and waste even more time.
Ensures that you're always appending to the end of the file, even if some other program is also appending to the same file. (This is pretty important for, e.g., syslog-type logs.)1
Ensures that you've flushed your writes to disk at some point, which reduces the chance of lost data if your program crashes or gets killed.
Ensures that you've flushed your writes to disk as soon as you write them. If you try to open and read the file elsewhere in the same program, or in a different program, or if the end user just opens it in Notepad, you won't be missing the last 1.73KB worth of lines because they're still in a buffer somewhere and won't be written until later.2
So, it's a tradeoff. Often you want one of those guarantees and the performance cost isn't a big deal. Sometimes it is a big deal and the guarantees don't matter. Sometimes you really need both, so you have to write something more complicated where you manually buffer up bits and write-and-flush them all at once (a middle-ground sketch follows after the footnotes).
1. As the Python docs for open make clear, this will happen anyway on some Unix systems. But not on other Unix systems, and not on Windows.
2. Also, if you have multiple writers, they're all appending a line at a time, rather than appending whenever they happen to flush, which is again pretty important for logfiles.
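As promised above, here is a middle-ground sketch (file_save and app.log are hypothetical names): keep the file open to avoid the repeated open/close cost, but flush each write so that readers and crashes lose as little as possible:

import os

f = open("app.log", "a")

def file_save(content):
    f.write(content)
    f.flush()             # push Python's buffer to the OS
    os.fsync(f.fileno())  # ask the OS to push it all the way to disk

file_save("test1\n")
file_save("test2\n")
f.close()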
In general, global should be avoided if possible.
The reason people use the with statement when dealing with files is that it explicitly controls the lifetime of the resource: once the with block is done, the file is closed (the variable itself still exists, but the file object is no longer usable).
You can avoid the with statement, but then you must remember to call myfile.close() yourself, particularly if you're dealing with a lot of files.
One way that avoids the with block and also avoids using global is:
def filesave(f_obj, string):
    f_obj.write(string)

f = open(filename, 'a')
filesave(f, "test1\n")
filesave(f, "test2\n")
f.close()
However, at this point you'd be better off getting rid of the function and simply doing:
f = open(filename, 'a')
f.write("test1\n")
f.write("test2\n")
f.close()
At which point you could easily put it within a with block:
with open(filename, 'a') as f:
    f.write("test1\n")
    f.write("test2\n")
So yes. There's no hard reason to not do what you're doing. It's just not very Pythonic.
The latter code may be more efficient, but the former code is safer: it makes sure that the content written by each call to FileSave is flushed to the filesystem, so other processes can read the updated content. And by closing the file handle on each call (using open as a context manager), you give other processes a chance to write to the file as well (this matters specifically on Windows).
It really depends on the circumstances, but here are some thoughts:
A with block absolutely guarantees that the file will be closed once the block is exited. Python does not make any weird optimizations for appending to files.
In general, globals make your code less modular, and therefore harder to read and maintain. You would think that the original FileSave function is attempting to avoid globals, but it's using the global name filename, so you may as well use a global file altogether at that point, as it will save you some I/O overhead.
A better option would be to avoid globals altogether, or at least to use them properly. You really don't need a separate function to wrap file.write, but if it represents something more complex, here is a design suggestion:
def save(file, content):
    print(content, file=file)

def my_thing(filename):
    with open(filename, 'a') as f:
        # do some stuff
        save(f, 'test1')
        # do more stuff
        save(f, 'test2')

if __name__ == '__main__':
    my_thing('myfile.txt')
Notice that when you call the module as a script, a file name defined in the global scope will be passed in to the main routine. However, since the main routine does not reference global variables, you can (a) read it more easily because it's self-contained, and (b) test it without having to wonder how to feed it inputs without breaking everything else.
Also, by using print instead of file.write, you avoid having to append newlines manually.
I have a file that I open before a loop starts, and I'm writing to that file almost at each iteration of the loop. Then I close the file once the loop has finished. So e.g. something like:
testfile = open('datagathered', 'w')
for i in range(n):
    ...
    testfile.write(line)
testfile.close()
The issue I'm having is that, in case the program crashes or I want to crash it, what has already been written to testfile will be deleted, and the text file datagathered will be empty. I understand that this happens because I'm closing the file only after the loop, but if I close and open the file after each write (i.e. in the loop) doesn't that lead to an incredible slow-down?
If so, what efficient alternatives do I have for doing the writing while making sure that the already-written lines won't get lost in case of a crash?
The linked posts do bring up good suggestions that arguably answer this question, but they don't cover the risks and efficiency differences involved. More precisely: are there any risks involved in playing with the buffer size, e.g. testfile = open('datagathered','w',0)? Finally, is using with open ... still a viable alternative if there are multiple files to write to?
Small note: This is asked in the context of a very long run, where the file is being written to for 2-3 days. Thus having a speedy and safe way of doing the writing is definitely valuable here.
From the question I understood that you are talking about exceptions that may occur at runtime, and about SIGINT.
You may use a try/except/finally block to achieve your goal. It enables you to catch both exceptions and the SIGINT signal. Since the finally block is executed whether an exception is caught or everything goes well, closing the file there is the best choice. The following sample code should solve your problem.
testfile = open('datagathered', 'w')
try:
    for i in range(n):
        ...
        testfile.write(line)
except KeyboardInterrupt:
    print "Interrupt from keyboard"
except:
    print "Other exception"
finally:
    testfile.close()
Use a context manager:
with open('datagathered', 'w') as f:
    f.write(data)
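If the worry is specifically about losing buffered lines on a crash, a hedged variant combines the context manager with line buffering (passing 1 as the buffering argument is valid for text-mode files; n and line are assumed from the question):

with open('datagathered', 'w', 1) as f:  # 1 = line buffered (text mode)
    for i in range(n):
        ...
        f.write(line)  # each newline-terminated write reaches the OS promptly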
For file I/O, what is the purpose of:
with open
and should I use it instead of:
f = open('file', 'w')
f.write('foo')
f.close()
Always use the with statement.
From the docs:
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks.
If you don't close the file explicitly, the file object may hang around in memory until it is garbage collected, which implicitly calls close() on it. So it's better to use the with statement, as it will close the file even if an error occurs.
Related: Does a File Object Automatically Close when its Reference Count Hits Zero?
Yes. You should use with whenever possible.
This is using the return value of open as a context manager. Thus with is not specific to open: it should be preferred whenever some cleanup needs to occur for an object (the kind of cleanup you would normally put in a finally block). In this case, on exiting the context, the .close() method of the file object is invoked.
Another good example of a context manager "cleaning up" is threading's Lock:
from threading import Lock

lock = Lock()
with lock:
    # do thing
# lock is released outside the context
In this case, the context manager is .release()-ing the lock.
Anything with an __enter__ and an __exit__ method can be used as a context manager. Or, better, you can use contextlib to make context managers with the @contextmanager decorator; see the contextlib documentation for more.
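For example, a minimal sketch using @contextmanager (managed_file is a hypothetical name):

from contextlib import contextmanager

@contextmanager
def managed_file(filename):
    f = open(filename)
    try:
        yield f    # the body of the with block runs here
    finally:
        f.close()  # runs no matter how the block is left

with managed_file("example.txt") as f:
    print(f.read())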
Basically what it is trying to avoid is this:
set things up
try:
    do something
finally:
    tear things down
but with the with statement you can safely, say, open a file, and as soon as you exit the scope of the with statement the file will be closed.
The with statement calls the __enter__ method of an object, which does your initial setup, and it makes sure the __exit__ method is called at the end, which ensures that everything is closed properly.
The with statement is a shortcut for easily writing more robust code. This:
with open('file', 'w') as f:
    f.write('foo')
is equivalent to this:
f = open('file', 'w')  # open before the try: if open() fails, there is no file to close
try:
    f.write('foo')
finally:
    f.close()
I have a generator that returns lines from a number of files, through a filter. It looks like this:
def line_generator(self):
    # Find the relevant files
    files = self.get_files()
    # Read lines
    input_object = fileinput.input(files)
    for line in input_object:
        # Apply filter and yield if it is not *None*
        filtered = self.__line_filter(input_object.filename(), line)
        if filtered is not None:
            yield filtered
    input_object.close()
The method self.get_files() returns a list of file paths or an empty list.
I have tried to do s = fileinput.input([]) and then call s.next(). This is where it hangs, and I cannot understand why. I'm trying to be Pythonic and not handle all errors myself, but I guess this is one where there is no way around it. Or is there?
Unfortunately I have no means of testing this on Linux right now, but could someone please try the following on Linux, and comment what they get?
import fileinput
s = fileinput.input([])
s.next()
I'm on Windows with Python 2.7.5 (64 bit).
All in all, I'd really like to know:
Is this a bug in Python, or me that is doing something wrong?
Shouldn't .next() always return something, or raise a StopIteration?
fileinput defaults to stdin if the list is empty, so it's just waiting for you to type something.
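If you want to keep fileinput anyway, a minimal guard (a sketch reusing the names from the question) avoids the stdin fallback:

files = self.get_files()
if not files:
    return  # nothing to read; don't let fileinput fall back to stdin
input_object = fileinput.input(files)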
An obvious fix would be to get rid of fileinput (it is not terribly useful anyway) and to be explicit, as the Zen of Python suggests:
for path in self.get_files():
    with open(path) as fp:
        for line in fp:
            # etc.
As others have already answered the main question, I'll try to answer one specific sub-item:
Shouldn't .next() always return something, or raise a StopIteration?
Yes, but it is not specified when this return is supposed to happen: within some milliseconds, seconds or even longer.
If you have a blocking iterator, you can define some wrapper around it so that it runs inside a different thread, filling a list or something, and the originating thread gets an interface to determine if there are data, if there are currently no data or if the source is exhausted.
I can elaborate on this even more if needed.
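As a rough sketch of that idea (hypothetical names; a Queue acts as the buffer between the worker thread and the consumer):

import threading
from queue import Queue  # the module is named Queue in Python 2

_DONE = object()  # sentinel marking an exhausted source

def buffered_iter(blocking_iterable):
    q = Queue()

    def worker():
        for item in blocking_iterable:
            q.put(item)
        q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    # The consumer can poll q.empty() to ask "is there data right now?"
    # without blocking; q.get() blocks until data or the sentinel arrives.
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item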
Sometimes when using IPython you might hit an exception in a function which has opened a file in write mode. This means that the next time you run the function you get a ValueError:
ValueError: The file 'filename' is already opened. Please close it before reopening in write mode.
However since the function bugged out, the file handle (which was created inside the function) is lost, so it can't be closed. The only way round it seems to be to close the ipython session, at which point you get the message:
Closing remaining open files: filename... done
Is there a way to instruct ipython to close the files without quitting the session?
You should try to always use the with statement when working with files. For example, use something like
with open("x.txt") as fh:
...do something with the file handle fh
This ensures that if something goes wrong during the execution of the with block, and an exception is raised, the file is guaranteed to be closed. See the with documentation for more information on this.
Edit: Following a discussion in the comments, it seems that the OP needs to have a number of files open at the same time and needs to use data from multiple files at once. Clearly having lots of nested with statements, one for each file opened, is not an option and goes against the ideal that "flat is better than nested".
One option would be to wrap the calculation in a try/finally block. For example
file_handles = []
try:
    for file in file_list:
        file_handles.append(open(file))
    # Do some calculations with open files
finally:
    for fh in file_handles:
        fh.close()
The finally block contains code which should be run after any try, except or else block, even if an exception occurred. From the documentation:
If finally is present, it specifies a "cleanup" handler. The try clause is executed, including any except and else clauses. If an exception occurs in any of the clauses and is not handled, the exception is temporarily saved. The finally clause is executed. If there is a saved exception, it is re-raised at the end of the finally clause. If the finally clause raises another exception or executes a return or break statement, the saved exception is lost. The exception information is not available to the program during execution of the finally clause.
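On Python 3.3+ (the contextlib2 package backports it), contextlib.ExitStack gives a flatter alternative to the manual try/finally, reusing file_list from the snippet above:

from contextlib import ExitStack

with ExitStack() as stack:
    files = [stack.enter_context(open(name)) for name in file_list]
    # Do some calculations with the open files; every file is closed on
    # exit, even if one of the open() calls or the calculations fails.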
A few ideas:
always use finally (or a with block) when working with files, so they are properly closed.
you can blindly close the non-standard file descriptors using os.close(n), where n is a number greater than 2 (this is Unix-specific, so you might want to peek at /proc/<ipython_pid>/fd/ to see which descriptors the process has opened so far).
you can inspect the captured stack frames' locals to see if you can find the reference to the wayward file and close it; take a look at sys.last_traceback (a sketch follows).
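As a sketch of that last idea (the heuristic for spotting file objects is hypothetical):

import sys

tb = sys.last_traceback  # traceback of the last uncaught exception
while tb is not None:
    frame = tb.tb_frame
    for name, value in frame.f_locals.items():
        # Heuristic: anything with close() and a False .closed looks like an open file.
        if hasattr(value, "close") and getattr(value, "closed", True) is False:
            print("closing %r from frame %s" % (name, frame.f_code.co_name))
            value.close()
    tb = tb.tb_next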