If i've done the following:
import codecs
lines = codecs.open(somefile, 'r','utf8').readlines()
Is there a way to close the file that i've not initialized? If so, how? Normally, i could have done:
import codecs
reader = codecs.open(somefile, 'r','utf8')
lines = reader.readlines()
reader.close()
In CPython, the file object will close on its own once the reference count drops to 0, which is right after .readlines() returns. For other Python implementations it may take a little longer depending on the garbage collection algorithm used. The file is certainly going to be closed no later than program exit.
You should really use the file object as a context manager and have the with statement call close on it:
with codecs.open(somefile, 'r','utf8') as reader:
lines = reader.readlines()
As soon as the block of code indented under the with statement exits (be it with an exception, a return, continue or break statement, or simply because all code in the block finished executing), the reader file object will be closed.
Bonus tip: file objects are iterables, so the following also works:
with codecs.open(somefile, 'r','utf8') as reader:
lines = list(reader)
for the exact same result.
Related
So I am tasked with creating a function that returns the amount of times a substring appears in a given string and the index of the substring every time it appears.
But when I run my code, I get a "I/O operation on closed file" error. Anyone know how to fix this?
# 1. Import the text.csv file
import csv
with open('text.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
# 2. Complete function counter. The function should return the number
of times the substring appears & their index
def counter(substring):
substring_counter = 0
string = csv_reader
for substring in csv_file:
substring_counter = substring_counter + 1
print('Counter = ', substring_counter)
print(string.find(substring))
# do not edit the code below
counter("TCA")
This is happening because csv.reader is local to the module, not your function. The reader object returned by csv module maintains a reference a file handle (returned by open). On the first iteration of the for loop, that file handle ends up going to the end, i.e., it gets exhausted but on the second iteration, you are causing csv reader object to try to read from the end of a file which causes that error.
Before or after every iteration of the loop, you can reset file pointer to beginning by doing something like this: csv_file.seek(0)
A better solution would be to store all the file contents into a buffer and access it repeatedly (without having to re-read file contents from file handle).
Since you are using with (which is good), you have explicitly limited the scope of the open file, yet as the previous response pointed out, the csv reader object is used outside that scope. The reader in this case is just a wrapper for the file and does not read everything initially. You either need to read the whole file within the with or move the with inside the function and everything that references the file under it.
with open(file, 'rb') as readerfile:
reader = csv.reader(readerfile)
In the above syntax, can I perform the first and second line together? It seems unnecessary to use 2 variables ('readerfile' and 'reader' above) if I only need to use the latter.
Is the former variable ('readerfile') ever used?
Can I use the same variable name for both is that bad form?
You can do:
reader = csv.reader(open(file, 'rb'))
but that would mean you are not closing your file explicitly.
with open(file, 'rb') as readerfile:
The first line opens the file and stores the file object in readerfile. The with statement ensures that the file is closed when you exit the block by any means, including exceptions.
reader = csv.reader(readerfile)
The second line creates a CSV reader object using the file object. It needs the file object (otherwise where would it read the data from?). Of course you could conceivably store it in the same variable
readerfile = csv.reader(readerfile)
if you wanted to (and don't plan on using the file object again), but this will likely lead to confusion for readers of your code.
Note that you haven't read anything yet! You still need to iterate over the reader object in order to get the data that you're interested in, and if you close the file before that happens then the reader object won't work. The file object is used behind the scenes by the reader object, even if you "hide" it by overwriting the readerfile variable.
Lastly, if you really want to do everything on one line, you could conceivably define a function that abstracts the with statement:
def with1(context, func):
with context as x:
return func(x)
Now you can write this as one line:
data = with1(open(file, 'rb'), lambda readerfile: list(csv.reader(readerfile)))
It's by no means clearer, however.
This is not recommended at all
Why is it important to use one line?
Most python programmers know well the benefits of using the with statement. Keep in mind that readers might be lazy (that is -read line by line-) on some cases. You want to be able to handle the file with the correct statement, ensuring the correct closing, even if errors arise.
Nevertheless, you can use a one liner for this, as stated in other answers:
reader = csv.reader(open(file, 'rb'))
So basically you want a one-liner?
reader = csv.reader(open(file, 'rb'))
As said before, the problem with that is with open() allows you to do the following steps in one time:
Open the file
Do what you want with the file (inside your open block)
Close the file (that is implicit and you don't have to specify it)
If you don't use with open but directly open, you file stays opened until the object is garbage collected, and that could lead to unpredicted behaviour in some cases.
Plus, your original code (two lines) is much more readable than a one-liner.
If you put them together, then the file won't be closed automatically -- but that often doesn't really matter, since it will be closed automatically when the script terminates.
It's not common to need to reference the raw file once acsv.readerinstance has been created from (except possibly to explicitly close it if you're not using awithstatement).
If you use the same variable name for both, it will probably work because thecsv.readerinstance will still hold a reference to the file object, so it won't be garbage collected until the program ends. It's not a commonly idiom, however.
Since csv files are often processed sequentially, the following can be a fairly concise way to do it since thecsv.readerinstance frequently doesn't really need to be given a variable name and it will close the file properly even if an exception occurs:
with open(file, 'rb') as readerfile:
for row in csv.reader(readerfile):
process the data...
I am in python and there is a lot of ways to access files.
Method 1:
fp = open("hello.txt", "w")
fp.write("No no no");
fp.close()
fp = open("hello.txt", "r")
print fp.read()
fp.close()
Method 2:
open("hello.txt", "w").write("hello world!")
print open("hello.txt", "r").read()
Method 3:
with open("hello.txt","w") as f:
f.write("Yes yes yes")
with open("hello.txt") as f:
print f.read()
Is there a specific advantage in using each of these?
Stuff I know:
Method 2 and Method 3 closes the file automatically, but
Method 1 doesn't.
Method 2 doesn't give you a handle to do multiple operations.
You should use the third method.
There is a common pattern in programming where to use some object you have set it up, run your code, and tear it down again. File handles are one example of this: you have to open the file, run your code, and then close the file. This last is not optional -- it's important for the operating system to know that you are done with it, and for Python to flush all the data out of its IO buffers.
Now, CPython is a reference counted language. That means that it counts how many pieces of code 'know about' a given object, so that when that count becomes zero it can clean up said object and reuse its space in memory. In method 2, the reference count of the file object becomes zero, which allows Python to clean it up. And file objects' cleanup method also closes them. However, you should in general not rely on this -- reference counting is an implementation detail of the standard version of Python, and there's no guarantee that whatever you're using to run the program will do the same. That's why you shouldn't use method 2.
Method 1 is better, because you explicitly close the file -- as long as you reach the .close() function call! If an exception was thrown in the middle of that code block, the close would not be reached, and the file would not be explicitly closed. So you should really wrap the middle code in a try... finally block.
Method 3 is therefore best: you use the with statement -- an idiomatic way of enclosing the .close in a finally block -- to close the file, so you don't have to worry about the extra syntactic fluff of try... except.
I'd use this, kind of extended version of method 3:
with open("hello.txt","w+") as f:
f.write("Yes yes yes")
f.seek(0) #places the cursor back to the start of the file
print f.read() #now read the file
Advantages:
It opens the file only once
w+ mode allows both read and write on the same file object
with takes care of the closing of file
I would think it is best to go method one as it is explicit and you can surround it with a try and except block or the method 3
If you read an entire file with content = open('Path/to/file', 'r').read() is the file handle left open until the script exits? Is there a more concise method to read a whole file?
The answer to that question depends somewhat on the particular Python implementation.
To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.
This means that the file object is garbage. The only remaining question is "When will the garbage collector collect the file object?".
in CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other python implementations.
A better solution, to make sure that the file is closed, is this pattern:
with open('Path/to/file', 'r') as content_file:
content = content_file.read()
which will always close the file immediately after the block ends; even if an exception occurs.
Edit: To put a finer point on it:
Other than file.__exit__(), which is "automatically" called in a with context manager setting, the only other way that file.close() is automatically called (that is, other than explicitly calling it yourself,) is via file.__del__(). This leads us to the question of when does __del__() get called?
A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.
-- https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203
In particular:
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
[...]
CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.
-- https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types
(Emphasis mine)
but as it suggests, other implementations may have other behavior. As an example, PyPy has 6 different garbage collection implementations!
You can use pathlib.
For Python 3.5 and above:
from pathlib import Path
contents = Path(file_path).read_text()
For older versions of Python use pathlib2:
$ pip install pathlib2
Then:
from pathlib2 import Path
contents = Path(file_path).read_text()
This is the actual read_text implementation:
def read_text(self, encoding=None, errors=None):
"""
Open the file in text mode, read it, and close the file.
"""
with self.open(mode='r', encoding=encoding, errors=errors) as f:
return f.read()
Well, if you have to read file line by line to work with each line, you can use
with open('Path/to/file', 'r') as f:
s = f.readline()
while s:
# do whatever you want to
s = f.readline()
Or even better way:
with open('Path/to/file') as f:
for line in f:
# do whatever you want to
Instead of retrieving the file content as a single string,
it can be handy to store the content as a list of all lines the file comprises:
with open('Path/to/file', 'r') as content_file:
content_list = content_file.read().strip().split("\n")
As can be seen, one needs to add the concatenated methods .strip().split("\n") to the main answer in this thread.
Here, .strip() just removes whitespace and newline characters at the endings of the entire file string,
and .split("\n") produces the actual list via splitting the entire file string at every newline character \n.
Moreover,
this way the entire file content can be stored in a variable, which might be desired in some cases, instead of looping over the file line by line as pointed out in this previous answer.
I am just beginning with python with lpthw and had a specific question for closing a file.
I can open a file with:
input = open(from_file)
indata = input.read()
#Do something
indata.close()
However, if I try to simplify the code into a single line:
indata = open(from_file).read()
How do I close the file I opened, or is it already automatically closed?
Thanks in advance for the help!
You simply have to use more than one line; however, a more pythonic way to do it would be:
with open(path_to_file, 'r') as f:
contents = f.read()
Note that with what you are doing before, you could miss closing the file if an exception was thrown. The 'with' statement here will cause it be closed even if an exception is propagated out of the 'with' block.
Files are automatically closed when the relevant variable is no longer referenced. It is taken care of by Python garbage collection.
In this case, the call to open() creates a File object, of which the read() method is run. After the method is executed, no reference to it exists and it is closed (at least by the end of script execution).
Although this works, it is not good practice. It is always better to explicitly close a file, or (even better) to follow the with suggestion of the other answer.