I'm working in Python, and there are a lot of ways to access files.
Method 1:
fp = open("hello.txt", "w")
fp.write("No no no");
fp.close()
fp = open("hello.txt", "r")
print fp.read()
fp.close()
Method 2:
open("hello.txt", "w").write("hello world!")
print open("hello.txt", "r").read()
Method 3:
with open("hello.txt","w") as f:
f.write("Yes yes yes")
with open("hello.txt") as f:
print f.read()
Is there a specific advantage in using each of these?
Stuff I know:
Methods 2 and 3 close the file automatically, but Method 1 doesn't.
Method 2 doesn't give you a handle to do multiple operations.
You should use the third method.
There is a common pattern in programming where, to use some object, you have to set it up, run your code, and tear it down again. File handles are one example of this: you have to open the file, run your code, and then close the file. This last step is not optional -- it's important for the operating system to know that you are done with the file, and for Python to flush all the data out of its I/O buffers.
Now, CPython (the standard implementation of Python) is reference counted. That means it keeps track of how many pieces of code 'know about' a given object, so that when that count drops to zero it can clean the object up and reuse its space in memory. In method 2, the reference count of the file object drops to zero as soon as the line finishes, which allows Python to clean it up -- and the file object's cleanup method also closes it. However, you should in general not rely on this: reference counting is an implementation detail of the standard version of Python, and there's no guarantee that whatever you're using to run the program will do the same. That's why you shouldn't use method 2.
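As a rough, CPython-specific sketch of what reference counting means in practice (note that sys.getrefcount itself adds one temporary reference during the call):
import sys
fp = open("hello.txt", "w")
print(sys.getrefcount(fp))   # typically 2: the name 'fp' plus the temporary reference made for the call
fp2 = fp                     # a second name for the same object
print(sys.getrefcount(fp2))  # now 3
del fp, fp2                  # the count drops to zero; CPython cleans the object up right here
                             # and closes the file as a side effect -- other implementations may delay this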
Method 1 is better, because you explicitly close the file -- as long as you actually reach the .close() call! If an exception is raised in the middle of that code block, the close is never reached and the file is not explicitly closed. So you should really wrap the middle code in a try... finally block.
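A safer version of method 1 would therefore look roughly like this:
fp = open("hello.txt", "w")
try:
    fp.write("No no no")   # if this raises, the file is still closed below
finally:
    fp.close()             # runs whether or not an exception occurred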
Method 3 is therefore best: you use the with statement -- an idiomatic way of enclosing the .close() in a finally block -- to close the file, so you don't have to write the try... finally boilerplate yourself.
I'd use this slightly extended version of method 3:
with open("hello.txt","w+") as f:
f.write("Yes yes yes")
f.seek(0) #places the cursor back to the start of the file
print f.read() #now read the file
Advantages:
It opens the file only once
w+ mode allows both read and write on the same file object
with takes care of closing the file
I would think it is best to go with method 1, as it is explicit and you can surround it with a try/except block, or else to use method 3.
Related
Method 1:
json.dump(object, file('object.json', 'w'))
Method 2:
f = file('object.json', 'w')
json.dump(object, f)
f.close()
I often use method 2 to dump objects to JSON files, but it looks ugly. Method 1 looks good and clear, but I'm confused about whether it is right to open a file object as a parameter: who will take control of that object after the object has been stored in the JSON file?
I'm confused about whether it is right to open a file object as a parameter: who will take control of that object after the object has been stored in the JSON file?
No one takes control; the file object you created in the json.dump call of method 1 goes out of scope after that line and is therefore closed implicitly (at least in CPython, which uses reference counting). Both methods are therefore equivalent.
If you want to be more verbose, I'd suggest using a context manager:
with file('object.json', 'w') as f:
    json.dump(object, f)
This ensures that the file is always closed properly at the end of the block, even before control is passed to any exception handlers if an error occurred.
I would say there is not that much difference, but the second method could be considered better from a readability standpoint; Readability counts.
In your example, the filename and mode that you are using to open the file are hardcoded strings. If they were both variable names (as they most likely would be in a real world example), I feel that method #1 would be a little harder to parse while reading.
One more observation: using method #1, you wouldn't be able to continue writing to that file object, as you have no variable referencing it; you would need to reopen the file in order to append more data.
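A small hypothetical sketch of that limitation (the data dict just stands in for the question's object):
import json
data = {"answer": 42}                       # stands in for the question's object
json.dump(data, open('object.json', 'w'))   # method #1: no variable keeps the file object
with open('object.json', 'a') as f:         # to write anything more later, the file must be reopened
    f.write('\n')                           # e.g. a hypothetical follow-up write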
They are the same internally; you can check the source code.
with open(file, 'rb') as readerfile:
    reader = csv.reader(readerfile)
In the above syntax, can I combine the first and second lines? It seems unnecessary to use two variables ('readerfile' and 'reader' above) if I only need to use the latter.
Is the former variable ('readerfile') ever used?
Can I use the same variable name for both, or is that bad form?
You can do:
reader = csv.reader(open(file, 'rb'))
but that would mean you are not closing your file explicitly.
with open(file, 'rb') as readerfile:
The first line opens the file and stores the file object in readerfile. The with statement ensures that the file is closed when you exit the block by any means, including exceptions.
reader = csv.reader(readerfile)
The second line creates a CSV reader object using the file object. It needs the file object (otherwise where would it read the data from?). Of course you could conceivably store it in the same variable
readerfile = csv.reader(readerfile)
if you wanted to (and don't plan on using the file object again), but this will likely lead to confusion for readers of your code.
Note that you haven't read anything yet! You still need to iterate over the reader object in order to get the data that you're interested in, and if you close the file before that happens then the reader object won't work. The file object is used behind the scenes by the reader object, even if you "hide" it by overwriting the readerfile variable.
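To make that pitfall concrete, here is a minimal sketch (data.csv is just a placeholder filename):
import csv
with open('data.csv', 'rb') as readerfile:
    reader = csv.reader(readerfile)
# readerfile is closed once the block ends, so iterating now fails:
# for row in reader:
#     print(row)   # would raise ValueError: I/O operation on closed file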
Lastly, if you really want to do everything on one line, you could conceivably define a function that abstracts the with statement:
def with1(context, func):
    with context as x:
        return func(x)
Now you can write this as one line:
data = with1(open(file, 'rb'), lambda readerfile: list(csv.reader(readerfile)))
It's by no means clearer, however.
This is not recommended at all
Why is it important to use one line?
Most Python programmers know the benefits of using the with statement well. Keep in mind that readers may be lazy (that is, they read line by line) in some cases. You want to handle the file with the correct statement, ensuring that it is closed correctly even if errors arise.
Nevertheless, you can use a one liner for this, as stated in other answers:
reader = csv.reader(open(file, 'rb'))
So basically you want a one-liner?
reader = csv.reader(open(file, 'rb'))
As said before, the problem with that is that with open() lets you do the following steps in one go:
Open the file
Do what you want with the file (inside your open block)
Close the file (that is implicit and you don't have to specify it)
If you use open directly instead of with open, your file stays open until the object is garbage collected, and that can lead to unpredictable behaviour in some cases.
Plus, your original code (two lines) is much more readable than a one-liner.
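For comparison, here is a rough sketch of the with-based version written out in full (data.csv is just a placeholder for the question's file variable):
import csv
file = 'data.csv'                  # placeholder path, standing in for the question's variable
with open(file, 'rb') as readerfile:
    reader = csv.reader(readerfile)
    for row in reader:
        print(row)                 # process each row while the file is still open
# the file is closed automatically here, even if an exception was raised above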
If you put them together, the file won't be closed automatically as soon as you're done with it -- but that often doesn't really matter, since it will be closed when the script terminates.
It's not common to need to reference the raw file object once a csv.reader instance has been created from it (except possibly to close it explicitly if you're not using a with statement).
If you use the same variable name for both, it will probably work, because the csv.reader instance still holds a reference to the file object, so it won't be garbage collected until the program ends. It's not a common idiom, however.
Since csv files are often processed sequentially, the following can be a fairly concise way to do it: the csv.reader instance frequently doesn't really need a variable name of its own, and the file will be closed properly even if an exception occurs:
with open(file, 'rb') as readerfile:
    for row in csv.reader(readerfile):
        # process the data...
Opening & closing file using file object:
fp=open("ram.txt","w")
fp.close()
If we want to open & close a file without using a file object, i.e.:
open("ram.txt","w")
Do we need to write close("poem.txt"), or is writing close() fine?
Neither of them gives any error...
If we only write close(), how would it know which file we are referring to?
For every object in memory, Python keeps a reference count. Once there are no references to an object left, it will be garbage collected.
The open() function returns a file object.
f = open("myfile.txt", "w")
And in the line above, you keep a reference to the object around in the variable f, and therefore the file object keeps existing. If you do
del f
Then the file object has no references anymore and will be cleaned up. It'll be closed in the process, but that can take a little while, which is why it's better to use the with construct.
However, if you just do:
open("myfile.txt")
Then the file object is created and immediately discarded again, because there are no references to it. It's gone, and closed. You can't close it anymore, because you can't say what exactly you want to close.
open("myfile.txt", "r").readlines()
To evaluate this whole expression, first open is called, which returns a file object, and then the method readlines is called on that. Then the result of that is returned. As there are now no references to the file object, it is immediately discarded again.
I would use with open(...), if I understand the question correctly.
This answer might help you: What is the Python keyword "with" used for?
In answer to your actual question... a file object (what you get back when you call open) has the reference to the file in it. So when you do something like:
fp = open(myfile, 'w')
fp.write(...)
fp.close()
Everything in the above, including both write and close, knows it refers to myfile, because that's the file fp is associated with. (Incidentally, fp.close() takes no arguments, so fp.close(myfile) would raise a TypeError -- the file object certainly doesn't need the filename again after it's open.)
Better constructions like
with open(myfile,'w') as fp:
    fp.write(...)
don't change this; in this case, fp is also a context manager, but still contains the pointer to myfile; there's no need to remind it.
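A tiny illustration of that point -- the file object keeps track of its own file, so close() never needs the name again:
fp = open("myfile.txt", "w")
print(fp.name)     # 'myfile.txt' -- the object remembers which file it refers to
fp.write("hello")
fp.close()         # no filename needed; fp already knows which file it has open
print(fp.closed)   # True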
I am trying to understand the trade-offs/differences between these two ways of opening files for line-by-line processing:
with open('data.txt') as inf:
    for line in inf:
        # etc
vs
for line in open('data.txt'):
    # etc
I understand that using with ensures the file is closed when the "with-block" (suite?) is exited or an exception is encountered. So I have been using with ever since I learned about it here.
Re the for-loop: from searching around the net and SO, it seems that whether the file is closed when the for-loop is exited is implementation-dependent? And I couldn't find anything about how this construct deals with exceptions. Does anyone know?
If I am mistaken about anything above, I'd appreciate corrections. Otherwise, is there a reason to ever use the for construct over the with? (Assuming you have a choice, i.e., you aren't limited by your Python version.)
The problem with this
for line in open('data.txt'):
    # etc
is that you don't keep an explicit reference to the open file, so how do you close it?
The lazy way is to wait for the garbage collector to clean it up, but that may mean the resources aren't freed in a timely manner.
So you can say
inf = open('data.txt')
for line in inf:
    # etc
inf.close()
Now what happens if there is an exception while you are inside the for loop? The file won't get closed explicitly.
Add a try/finally
inf = open('data.txt')
try:
    for line in inf:
        # etc
finally:
    inf.close()
This is a lot of code to do something pretty simple, so Python added the with statement to let this be written in a more readable way, which gets us to:
with open('data.txt') as inf:
    for line in inf:
        # etc
So, that is the preferred way to open the file. If your Python is too old for the with statement, you should use the try/finally version for production code.
The with statement was only introduced in Python 2.5 - only if you have backward compatibility requirements for earlier versions should you use the latter.
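For completeness: on Python 2.5 itself the statement has to be enabled with a future import (from 2.6 onward it is always available):
from __future__ import with_statement   # needed on Python 2.5 only
with open('data.txt') as inf:
    for line in inf:
        pass  # process the line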
Bit more clarity
The with statement was introduced (as you're aware) to wrap up the try/except/finally pattern, which isn't terrific to read, but okay. In CPython (the implementation written in C), an open file will also be closed when its object is garbage collected. The specification of the language itself doesn't require this, so IronPython, Jython, etc. may choose to keep files open, keep memory around, whatever, and not free resources until the next GC cycle (or at all -- and the CPython GC is different from the .NET or Java ones...).
I think the only thing I've heard against it is that it adds another indentation level.
So, to summarise: it won't work below Python 2.5, it introduces the as keyword, and it adds an indentation level.
Otherwise, you stay in control of handling exceptions as normal, and the finally block closes resources if something escapes.
Works for me!
import os
path = "c:\\fio"
longer_path = "c:\\fio\\"
# Read every file in directory
for filename in os.listdir(path):
    print()
    print("Here is the file name", filename)
    inf = open(longer_path + filename)
    try:
        for line in inf:
            print(line, end='')
    finally:
        inf.close()
#output
Here is the file name a.txt
mouse
apple
Here is the file name New Text Document - Copy.txt
cat
Here is the file name New Text Document.txt
dog
If you read an entire file with content = open('Path/to/file', 'r').read(), is the file handle left open until the script exits? Is there a more concise method to read a whole file?
The answer to that question depends somewhat on the particular Python implementation.
To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.
This means that the file object is garbage. The only remaining question is "When will the garbage collector collect the file object?".
In CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other Python implementations.
A better solution, to make sure that the file is closed, is this pattern:
with open('Path/to/file', 'r') as content_file:
    content = content_file.read()
which always closes the file as soon as the block ends, even if an exception occurs.
Edit: To put a finer point on it:
Other than file.__exit__(), which is "automatically" called in a with context-manager setting, the only other way file.close() is called automatically (that is, other than calling it explicitly yourself) is via file.__del__(). This leads us to the question of when __del__() gets called.
A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.
-- https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203
In particular:
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
[...]
CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.
-- https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types
(Emphasis mine)
But as the quote suggests, other implementations may behave differently. As an example, PyPy has six different garbage-collection implementations!
You can use pathlib.
For Python 3.5 and above:
from pathlib import Path
contents = Path(file_path).read_text()
For older versions of Python use pathlib2:
$ pip install pathlib2
Then:
from pathlib2 import Path
contents = Path(file_path).read_text()
This is the actual read_text implementation:
def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()
Well, if you have to read the file line by line to work with each line, you can use:
with open('Path/to/file', 'r') as f:
    s = f.readline()
    while s:
        # do whatever you want to
        s = f.readline()
Or, an even better way:
with open('Path/to/file') as f:
    for line in f:
        # do whatever you want to
Instead of retrieving the file content as a single string,
it can be handy to store the content as a list of all lines the file comprises:
with open('Path/to/file', 'r') as content_file:
    content_list = content_file.read().strip().split("\n")
As can be seen, one only needs to chain the methods .strip().split("\n") onto the read() call from the main answer in this thread.
Here, .strip() just removes whitespace and newline characters at the ends of the entire file string, and .split("\n") produces the actual list by splitting the entire file string at every newline character \n.
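For instance, on a small hypothetical file string the two calls behave like this:
text = "first line\nsecond line\nthird line\n"
print(text.strip().split("\n"))   # ['first line', 'second line', 'third line']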
Moreover,
this way the entire file content can be stored in a variable, which might be desired in some cases, instead of looping over the file line by line as pointed out in this previous answer.