line-by-line file processing, for-loop vs with - python

I am trying to understand the trade-offs/differences between these two
ways of opening files for line-by-line processing:
with open('data.txt') as inf:
    for line in inf:
        # etc
vs
for line in open('data.txt'):
    # etc
I understand that using with ensures the file is closed when the
"with-block" (suite?) is exited (or an exception is encountered). So I have been using with ever since I learned about it here.
Re for-loop: From searching around the net and SO, it seems that whether the file
is closed when the for-loop is exited is implementation dependent? And
I couldn't find anything about how this construct would deal with
exceptions. Does anyone know?
If I am mistaken about anything above, I'd appreciate corrections,
otherwise is there a reason to ever use the for construct over the
with? (Assuming you have a choice, i.e., aren't limited by Python version)

The problem with this
for line in open('data.txt'):
    # etc
is that you don't keep an explicit reference to the open file, so how do you close it?
The lazy way is to wait for the garbage collector to clean it up, but that may mean that the resources aren't freed in a timely manner.
So you can say
inf = open('data.txt')
for line in inf:
    # etc
inf.close()
Now what happens if there is an exception while you are inside the for loop? The file won't get closed explicitly.
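A quick way to see the problem (a sketch, assuming data.txt exists and has at least one line):
inf = open('data.txt')
try:
    for line in inf:
        raise ValueError("simulated failure mid-loop")
    inf.close()        # skipped: the exception jumps straight past this line
except ValueError:
    print(inf.closed)  # prints False -- the file is still open
inf.close()            # tidy up for the demo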
Add a try/finally
inf = open('data.txt')
try:
    for line in inf:
        # etc
finally:
    inf.close()
This is a lot of code to do something pretty simple, so Python added with to enable this code to be written in a more readable way, which gets us to here:
with open('data.txt') as inf:
    for line in inf:
        # etc
So, that is the preferred way to open the file. If your Python is too old for the with statement, you should use the try/finally version for production code.

The with statement was only introduced in Python 2.5; only if you have backward compatibility requirements for earlier versions should you use the older non-with approach.
Bit more clarity
The with statement was introduced (as you're aware) to encompass the try/except/finally pattern, which isn't terrific to read, but okay. In CPython (the reference implementation written in C), the implementation will close open files once they are no longer referenced. The language specification itself doesn't require this, so IronPython, Jython, etc. may choose to keep files open, keep memory around, whatever, and not free resources until the next GC cycle (or at all; the CPython collector is different from the .NET or Java ones).
I think the only thing I've heard against it is that it adds another indentation level.
So to summarise: won't work < 2.5, introduces the 'as' keyword and adds an indentation level.
Otherwise, you stay in control of handling exceptions as normal, and the finally block closes resources if something escapes.
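For example (a sketch; process() here is just a hypothetical per-line handler):
def process(line):
    pass   # hypothetical: whatever you do with each line

try:
    with open('data.txt') as inf:
        for line in inf:
            process(line)
except OSError as exc:
    print('could not read data.txt:', exc)
# whichever way the block above exits, the file is closed by this point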
Works for me!

import os

path = "c:\\fio"
longer_path = "c:\\fio\\"

# Read every file in directory
for filename in os.listdir(path):
    print()
    print("Here is the file name", filename)
    inf = open(longer_path + filename)
    try:
        for line in inf:
            print(line, end='')
    finally:
        inf.close()
#output
Here is the file name a.txt
mouse
apple
Here is the file name New Text Document - Copy.txt
cat
Here is the file name New Text Document.txt
dog
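For comparison, here is the same loop written with with instead of try/finally (a sketch, using the same path assumptions as above):
import os

path = "c:\\fio"
longer_path = "c:\\fio\\"

for filename in os.listdir(path):
    print()
    print("Here is the file name", filename)
    with open(longer_path + filename) as inf:   # closed automatically, even if printing fails
        for line in inf:
            print(line, end='')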

Related

Exception in "with" block blanks file opened for writing

This simple code
# This code will BLANK the file 'myfile'!
with open('myfile', 'w') as file:
    raise Exception()
rather than merely throwing an exception, deletes all data in "myfile", although no actual write operation is even attempted.
This is dangerous to say the least, and certainly not how other languages treat such situations.
How can I prevent this? Do I have to handle every possible exception in order to be certain that the target file will not be blanked by some unforeseen condition? Surely there must be a standard pattern to solve this problem. And, above all: What is happening here in the first place?
You are opening a file for writing. It is that simple action that blanks the file, regardless of what else you do with it. From the open() function documentation:
'w'
open for writing, truncating the file first
Emphasis mine. In essence, the file is empty because you didn't write anything to it, not because you opened it.
Postpone opening the file to a point where you actually have data to write if you don't want this to happen. Writing a list of strings to a file is not going to cause exceptions at the Python level.
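For instance, a sketch of that approach (items and transform() are hypothetical placeholders for whatever produces your data):
items = ['a', 'b', 'c']                       # hypothetical source data

def transform(item):
    return item.upper() + '\n'                # hypothetical per-item formatting

lines = [transform(item) for item in items]   # do all the work that might fail first...
with open('myfile', 'w') as out:              # ...and only truncate the file once the data is ready
    out.writelines(lines)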
Alternatively, write to a new file, and rename (move) it afterwards to replace the original. Renaming a file is left to the OS.
The statement open('myfile', 'w') will delete all the contents on execution i.e. truncate the file.
If you want to retain the lines you have to use open('myfile', 'a'). Here the a option is for append.
Opening a file for writing erases the contents. The best way to avoid loss of data, not only in case of exceptions but also computer shutdowns, etc., is to create a new temporary file and rename it to the original name when everything is done.
import os
import tempfile

yourfile = "myfile"
try:
    with tempfile.NamedTemporaryFile(dir=os.path.dirname(yourfile) or '.', delete=False) as output:
        do_something()
except Exception:
    handle_exception()
else:
    os.rename(output.name, yourfile)
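One caveat worth noting: os.rename() fails on Windows if the target already exists, whereas os.replace() (Python 3.3+) overwrites it atomically on every platform. A variant sketch along the same lines:
import os
import tempfile

yourfile = "myfile"
tmp = tempfile.NamedTemporaryFile(dir=os.path.dirname(yourfile) or '.', delete=False)
try:
    with tmp as output:
        output.write(b"new contents")      # hypothetical: write the replacement data here
    os.replace(tmp.name, yourfile)         # swaps the files atomically, even on Windows
except Exception:
    os.remove(tmp.name)                    # drop the half-written temporary file
    raise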

Is there a more concise way to read csv files in Python?

with open(file, 'rb') as readerfile:
    reader = csv.reader(readerfile)
In the above syntax, can I perform the first and second line together? It seems unnecessary to use 2 variables ('readerfile' and 'reader' above) if I only need to use the latter.
Is the former variable ('readerfile') ever used?
Can I use the same variable name for both, or is that bad form?
You can do:
reader = csv.reader(open(file, 'rb'))
but that would mean you are not closing your file explicitly.
with open(file, 'rb') as readerfile:
The first line opens the file and stores the file object in readerfile. The with statement ensures that the file is closed when you exit the block by any means, including exceptions.
reader = csv.reader(readerfile)
The second line creates a CSV reader object using the file object. It needs the file object (otherwise where would it read the data from?). Of course you could conceivably store it in the same variable
readerfile = csv.reader(readerfile)
if you wanted to (and don't plan on using the file object again), but this will likely lead to confusion for readers of your code.
Note that you haven't read anything yet! You still need to iterate over the reader object in order to get the data that you're interested in, and if you close the file before that happens then the reader object won't work. The file object is used behind the scenes by the reader object, even if you "hide" it by overwriting the readerfile variable.
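To see why the file has to stay open, here is a short Python 3 sketch (assuming some data.csv exists):
import csv

with open('data.csv', newline='') as f:
    rows = list(csv.reader(f))     # fine: the file is still open while the reader pulls from it

with open('data.csv', newline='') as f:
    reader = csv.reader(f)
# the file is closed once the block above exits, so iterating now would raise
# "ValueError: I/O operation on closed file":
# rows = list(reader)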
Lastly, if you really want to do everything on one line, you could conceivably define a function that abstracts the with statement:
def with1(context, func):
    with context as x:
        return func(x)
Now you can write this as one line:
data = with1(open(file, 'rb'), lambda readerfile: list(csv.reader(readerfile)))
It's by no means clearer, however.
This is not recommended at all
Why is it important to use one line?
Most Python programmers know well the benefits of using the with statement. Keep in mind that readers might be lazy (that is, read line by line) in some cases. You want to be able to handle the file with the correct statement, ensuring the correct closing, even if errors arise.
Nevertheless, you can use a one liner for this, as stated in other answers:
reader = csv.reader(open(file, 'rb'))
So basically you want a one-liner?
reader = csv.reader(open(file, 'rb'))
As said before, the problem with that is that with open() lets you do the following steps in one go:
Open the file
Do what you want with the file (inside your open block)
Close the file (that is implicit and you don't have to specify it)
If you don't use with open but open directly, your file stays open until the object is garbage collected, and that could lead to unpredictable behaviour in some cases.
Plus, your original code (two lines) is much more readable than a one-liner.
If you put them together, then the file won't be closed automatically -- but that often doesn't really matter, since it will be closed automatically when the script terminates.
It's not common to need to reference the raw file once a csv.reader instance has been created from it (except possibly to explicitly close it if you're not using a with statement).
If you use the same variable name for both, it will probably work because the csv.reader instance will still hold a reference to the file object, so it won't be garbage collected until the program ends. It's not a common idiom, however.
Since csv files are often processed sequentially, the following can be a fairly concise way to do it, since the csv.reader instance frequently doesn't really need to be given a variable name, and it will close the file properly even if an exception occurs:
with open(file, 'rb') as readerfile:
    for row in csv.reader(readerfile):
        pass  # process the data...

Which method should I use for accessing files and why?

I am in Python and there are a lot of ways to access files.
Method 1:
fp = open("hello.txt", "w")
fp.write("No no no");
fp.close()
fp = open("hello.txt", "r")
print fp.read()
fp.close()
Method 2:
open("hello.txt", "w").write("hello world!")
print open("hello.txt", "r").read()
Method 3:
with open("hello.txt","w") as f:
f.write("Yes yes yes")
with open("hello.txt") as f:
print f.read()
Is there a specific advantage in using each of these?
Stuff I know:
Method 2 and Method 3 close the file automatically, but
Method 1 doesn't.
Method 2 doesn't give you a handle to do multiple operations.
You should use the third method.
There is a common pattern in programming where, to use some object, you have to set it up, run your code, and tear it down again. File handles are one example of this: you have to open the file, run your code, and then close the file. This last step is not optional -- it's important for the operating system to know that you are done with it, and for Python to flush all the data out of its IO buffers.
Now, CPython is a reference counted language. That means that it counts how many pieces of code 'know about' a given object, so that when that count becomes zero it can clean up said object and reuse its space in memory. In method 2, the reference count of the file object becomes zero, which allows Python to clean it up. And file objects' cleanup method also closes them. However, you should in general not rely on this -- reference counting is an implementation detail of the standard version of Python, and there's no guarantee that whatever you're using to run the program will do the same. That's why you shouldn't use method 2.
Method 1 is better, because you explicitly close the file -- as long as you reach the .close() function call! If an exception was thrown in the middle of that code block, the close would not be reached, and the file would not be explicitly closed. So you should really wrap the middle code in a try... finally block.
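In other words, something like this sketch of method 1 with the close made exception-safe:
fp = open("hello.txt", "w")
try:
    fp.write("No no no")
finally:
    fp.close()      # runs whether or not write() raised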
Method 3 is therefore best: you use the with statement -- an idiomatic way of enclosing the .close in a finally block -- to close the file, so you don't have to worry about the extra syntactic fluff of try... finally.
I'd use this, kind of extended version of method 3:
with open("hello.txt","w+") as f:
f.write("Yes yes yes")
f.seek(0) #places the cursor back to the start of the file
print f.read() #now read the file
Advantages:
It opens the file only once
w+ mode allows both read and write on the same file object
with takes care of the closing of file
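For anyone on Python 3, the same idea with print as a function (a sketch):
with open("hello.txt", "w+") as f:
    f.write("Yes yes yes")
    f.seek(0)           # move back to the start of the file
    print(f.read())     # now read what was just written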
I would think it is best to go with method 1, as it is explicit and you can surround it with a try/except block, or else with method 3.

Closing a file in python opened with a shortcut

I am just beginning with Python with LPTHW and had a specific question about closing a file.
I can open a file with:
input = open(from_file)
indata = input.read()
#Do something
input.close()
However, if I try to simplify the code into a single line:
indata = open(from_file).read()
How do I close the file I opened, or is it already automatically closed?
Thanks in advance for the help!
You simply have to use more than one line; however, a more pythonic way to do it would be:
with open(path_to_file, 'r') as f:
    contents = f.read()
Note that with what you were doing before, you could miss closing the file if an exception was thrown. The 'with' statement here will cause it to be closed even if an exception is propagated out of the 'with' block.
Files are automatically closed when the file object is no longer referenced; this is taken care of by Python's garbage collection.
In this case, the call to open() creates a file object, on which the read() method is run. After the method is executed, no reference to the object exists, and it is closed (at least by the end of script execution).
Although this works, it is not good practice. It is always better to explicitly close a file, or (even better) to follow the with suggestion of the other answer.
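Worth noting: on Python 3 there is also pathlib, which gives a genuine one-liner that still closes the file for you (a sketch, reusing the question's from_file name):
from pathlib import Path

from_file = 'data.txt'                  # stand-in for the question's variable
indata = Path(from_file).read_text()    # opens, reads, and closes the file internally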

How to know when to manage resources in Python

I hope I framed the question right. I am trying to force myself to be a better programmer. By better I mean efficient. I want to write a program to identify the files in a directory and read each file for further processing. After some shuffling I got to this:
for file in os.listdir(dir):
    y=open(dir+'\\'+file,'r').readlines()
    for line in y:
        pass
    y.close()
It should be no surprise that I get an AttributeError since y is a list. I didn't think about that when I wrote the snippet.
I am thinking about this and am afraid that I have five open files (there are five files in the directory specified by dir).
I can fix the code so it runs and I explicitly close the files after opening them. I am curious if I need to or if Python handles closing the file in the next iteration of the loop. If so then I only need to write:
for file in os.listdir(dir):
    y=open(dir+'\\'+file,'r').readlines()
    for line in y:
        pass
I am guessing that it (Python) does handle this effortlessly. The reason I think that this might be handled is that I have changed the object/thing that y is referencing. When I start the second iteration there are no more memory references to the file that was opened and read using the readlines method.
Python will close open files when they get garbage-collected, so generally you can forget about it -- particularly when reading.
That said, if you want to close explicitly, you could do this:
for file in os.listdir(dir):
    f = open(dir+'\\'+file,'r')
    y = f.readlines()
    for line in y:
        pass
    f.close()
However, we can immediately improve this, because in python you can iterate over file-like objects directly:
for file in os.listdir(dir):
    y = open(dir+'\\'+file,'r')
    for line in y:
        pass
    y.close()
Finally, in recent python, there is the 'with' statement:
for file in os.listdir(dir):
    with open(dir+'\\'+file,'r') as y:
        for line in y:
            pass
When the with block ends, python will close the file for you and clean it up.
(you also might want to look into os.path for more pythonic tools for manipulating file names and directories)
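Putting the with statement and os.path together, a sketch (dir and file shadow built-ins, so the names are changed here):
import os

directory = 'some_dir'                  # hypothetical stand-in for the question's dir
for filename in os.listdir(directory):
    with open(os.path.join(directory, filename), 'r') as f:
        for line in f:
            pass                        # process the line here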
Don't worry about it. Python's garbage collector is good, and I've never had a problem with not closing file-pointers (for read operations at least)
If you did want to explicitly close the file, just store the open() in one variable, then call readlines() on that, for example..
f = open("thefile.txt")
all_lines = f.readlines()
f.close()
Or, you can use the with statement, which was added in Python 2.5 as a from __future__ import, and "properly" added in Python 2.6:
from __future__ import with_statement # for python 2.5, not required for >= 2.6
with open("thefile.txt") as f:
    print f.readlines()
# or
the_file = open("thefile.txt")
with the_file as f:
    print f.readlines()
The file will automatically be closed at the end of the block.
..but, there are other more important things to worry about in the snippets you posted, mostly stylistic things.
Firstly, try to avoid manually constructing paths using string-concatenation. The os.path module contains lots of methods to do this, in a more reliable, cross-platform manner.
import os
y = open(os.path.join(dir, file), 'r')
Also, you are using two variable names, dir and file - both of which are built-in functions. Pylint is a good tool to spot things like this, in this case it would give the warning:
[W0622] Redefining built-in 'file'
