This question already has answers here:
How to read a file line-by-line into a list?
(28 answers)
Closed 7 months ago.
I am pretty new to Python, so I was trying out my first basic piece of code: reading a file and printing it line by line. Here is my code:
class ReadFile(object):
    def main(self):
        readFile = ReadFile()
        readFile.printData()

    def printData(self):
        filename = "H:\\Desktop\\TheFile.txt"
        try:
            with open(filename, 'r') as f:
                value = f.readline()
                print(value)
                f.close()
        except Exception as ex:
            print(ex)
Now when I run it, I get no output. So I tried debugging it. I see the control jump from one method to another (main --> printData) and then exit. It doesn't execute anything within the method. Can you tell me what I am doing wrong here? I am new, so a little insight as to why the code is behaving this way would be nice as well.
If the idea here is to understand how to read a file line by line then all you need to do is:
with open(filename, 'r') as f:
    for line in f:
        print(line)
It's not typical to put this in a try-except block.
Coming back to your original code, there are several mistakes there, which I'm assuming stem from a lack of understanding of how classes are defined and work in Python.
The way you've written that code suggests you perhaps come from a Java background. I highly recommend doing one of the myriad free and really good online Python courses offered on Coursera or edX.
Anyways, here's how I would do it using a class:
class ReadFile:
    def __init__(self, path):
        self.path = path

    def print_data(self):
        with open(self.path, 'r') as f:
            for line in f:
                print(line)

if __name__ == "__main__":
    reader = ReadFile("H:\\Desktop\\TheFile.txt")
    reader.print_data()
You don't really need a class for this, and neither do you need a try block or an explicit f.close() when using a context manager (with open(...) as f:).
Please read up on how classes are used in Python. A function will do for this:
def read():
    filename = "C:\\Users\\file.txt"
    with open(filename, 'r') as f:
        for line in f:
            print(line)
You don't usually put a main method in a class. Python is not like Java or C#: all the code outside of a class will execute when you load the file.
You only create classes when you want to encapsulate some data together with methods in an object. In your case it looks like you don't need a class at all, but if you want one you have to explicitly create an instance and call it, for example:
class A:
    def __init__(self):
        print('constructor')

    def bang(self):
        print('bang')

# code outside of the class gets executed (like a 'main' method in Java/C#)
a = A()
a.bang()
There are a few problems here.
The first is that you are declaring a class but then not using it (except from within itself). You would need to create an instance of the class outside of the class (or call a class method on it) in order for it to be instantiated.
class ReadFile:
    def print_data(self):
        ...

# Create a new object which is an instance of the class ReadFile
an_object = ReadFile()

# Call the print_data() method on an_object
an_object.print_data()
Now, you don't actually need to use classes to solve this problem, so you could ignore all this, and just use the code you have inside your printData method:
filename = "H:\\Desktop\\TheFile.txt"
try:
with open(filename, 'r') as f:
value = f.readline()
print(value)
# don't need to do this, as f is only valid within the
# scope of the 'with' block
# f.close()
except Exception as ex:
print(ex)
You will find this almost does what you want. You just need to modify it to print the entire file, not just the first line. Here, instead of just reading a single line with f.readline() we can iterate over the result of f.readlines():
filename = "H:\\Desktop\\TheFile.txt"
try:
with open(filename, 'r') as f:
for value in f.readlines(): # read all lines
print(value)
except Exception as ex:
print(ex)
I've created a script using Python to parse movie names and their years spread across multiple pages of a torrent site and write them to a CSV file. It is working without errors and writing the data to the CSV file without any issues.
I did the whole thing without using the line return itemlist within my get_data() function. Since I wrote write_data() as a fully independent function, it writes the data to the CSV file from the list itemlist, which is defined right below the variable URLS.
If I keep the existing design intact, is it necessary to use the line return itemlist, which is commented out now? If so, why?
import requests
from bs4 import BeautifulSoup
import csv

URLS = ["https://yts.am/browse-movies?page={}".format(page) for page in range(1, 6)]

itemlist = []

def get_data(links):
    for url in links:
        res = requests.get(url)
        soup = BeautifulSoup(res.text, "lxml")
        for record in soup.select('.browse-movie-bottom'):
            items = {}
            items["Name"] = record.select_one('.browse-movie-title').text
            items["Year"] = record.select_one('.browse-movie-year').text
            itemlist.append(items)
    # return itemlist

def write_data():
    with open("outputfile.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, ['Name', 'Year'])
        writer.writeheader()
        for data in itemlist:
            writer.writerow(data)

if __name__ == '__main__':
    get_data(URLS)
    write_data()
With the existing design you don't need that line, because get_data modifies a list from the outer scope instead of returning one.
But if you ever want to rename itemlist, you need to rename it in both get_data and write_data (in all functions that might use it).
You would need return itemlist if you defined write_data as
def write_data(some_list):
    ...
and use it as
if __name__ == '__main__':
    write_data(get_data(URLS))
In this case write_data receives the list returned by get_data, and you don't need to define itemlist = [] outside get_data.
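To make that concrete, here is a minimal sketch of the refactored version (assuming the same imports and URLS list as in the question; only the data flow changes):

def get_data(links):
    itemlist = []  # local list instead of a module-level global
    for url in links:
        res = requests.get(url)
        soup = BeautifulSoup(res.text, "lxml")
        for record in soup.select('.browse-movie-bottom'):
            items = {}
            items["Name"] = record.select_one('.browse-movie-title').text
            items["Year"] = record.select_one('.browse-movie-year').text
            itemlist.append(items)
    return itemlist  # hand the data back to the caller

def write_data(some_list):
    with open("outputfile.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, ['Name', 'Year'])
        writer.writeheader()
        for data in some_list:
            writer.writerow(data)

if __name__ == '__main__':
    write_data(get_data(URLS))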
In this one specific case your script will work. But wouldn't it be nice to reuse your function somewhere else? In a different file, you could say:
from xy import get_data
links = ["url1", "url2",...]
a = get_data(links)
and work with it without the need of rewriting the function.
Let's do this in a Python console:
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
...
Aphorism 2 states that "Explicit is better than implicit." Therefore, I think it's better for readability to inform a potential user of these functions what side effects to expect when calling them. To achieve that, you need to return the list of data you get and give it to write_data to do its work.
As a self-contained script, your script is good. As a library, it's not practical unless you return the list you populate.
Using global variables in any programming language is discouraged. If you want to reuse your functions, you can have the return statement and pass the result to the write_data function. I would suggest you have a look at what @Andersson has given above.
In your code, itemlist is a global variable, so the return statement is not required in that case.
I know that a decorator can extend a function's behavior without changing the original function, so in real-world examples decorators are useful for things like logging or timing functions.
My question is: I made a decorator for only one function, not for any other function. The decorator is dependent on the original function, like this:
def read_file(file_path):
    file_data = ''
    with open(file_path) as f:
        file_data = f.read()
    return file_data

def only_compare_file_value_decorator(f):
    def wrapper(*args):
        d1 = read_file(args[0])
        d2 = read_file(args[1])
        return f(d1, d2)
    return wrapper

@only_compare_file_value_decorator
def compare_file_value(*args):
    return compare_hash(args[0], args[1])
In the case above, only_compare_file_value_decorator is only for the function compare_file_value. I wonder whether this kind of decorator is generally considered good or not.
Please tell me your advice.
As stated in the Zen of Python, "Simple is better than complex." Adding a wrapper in this case only obfuscates the code and makes it harder to read. Knowing how to write decorators is important, but your code would be easier to read if you moved the only_compare_file_value_decorator implementation into the compare_file_value body. You also use *args when you really expect two filenames as input, which also obfuscates the code.
def read_file(file_path):
    file_data = ''
    with open(file_path) as f:
        file_data = f.read()
    return file_data

def compare_file_value(file_name_1, file_name_2):
    file1, file2 = read_file(file_name_1), read_file(file_name_2)
    return compare_hash(file1, file2)
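For contrast, decorators tend to pay off when they are generic enough to wrap many functions. Here is a minimal, hedged sketch of a reusable timing decorator (the name timed and the printed message are illustrative, not from the original code; read_file and compare_hash are the same functions as above):

import functools
import time

def timed(func):
    """Print how long the wrapped function took to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print("{} took {:.4f}s".format(func.__name__, time.perf_counter() - start))
        return result
    return wrapper

@timed
def compare_file_value(file_name_1, file_name_2):
    file1, file2 = read_file(file_name_1), read_file(file_name_2)
    return compare_hash(file1, file2)

Because timed knows nothing about files or hashes, the same decorator can be reused on any other function, which is the situation where a decorator is usually worth the indirection.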
I know the best way to open a file for reading/writing is to use with.
Instead of using
f = open('file.txt', 'w')
f.write('something')
f.close()
we should write --
with open('file.txt', 'w') as f:
    f.write('something')
But what if I want to simply read a file? I can do this:
with open('file.txt') as f:
    print (f.read())
But what is the problem with the line below?
print (open('file.txt').read())
OR
alist = open('file.txt').readlines()
print (alist)
Does it automatically close the file after executing this statement? Is this a standard way to write it? Should we write like this?
Other than this: should I open a file in a function and pass the file object to another function for writing, or should I declare it as a global variable? i.e.
def writeto(f):
    # do some stuff
    f.write('write stuff')

def main():
    f = open('file.txt', 'w')
    while somecondition:
        writeto(f)
    f.close()
OR
f = open('file.txt', 'w')

def writeto():
    # do some stuff
    f.write('write stuff')

def main():
    while somecondition:
        writeto()
    f.close()
Addressing your questions in order,
"What's wrong with print(open(file).readlines())?"
Well, you throw away the file object after using it so you cannot close it. Yes, Python will eventually automatically close your file, but on its terms rather than yours. If you are just playing around in a shell or terminal, this is probably fine, because your session will likely be short and there isn't any resource competition over the files usually. However, in a production environment, leaving file handles open for the lifetime of your script can be devastating to performance.
As far as creating a function that takes a file object and writes to it, well, that's essentially what file.write is. Consider that a file handle is an object with methods, and those methods behind the scenes take self (that is, the object) as the first argument. So write itself is already a function that takes a file handle and writes to it! You can create other functions if you want, but you're essentially duplicating the default behavior for no tangible benefit.
Consider that your first function looks sort of like this:
def write_to(file_handle, text):
    return file_handle.write(text)
But what if I named file_handle self instead?
def write_to(self, text):
    return self.write(text)
Now it looks like a method instead of a stand-alone function. In fact, if you do this:
f = open(some_file, 'w') # or 'a' -- some write mode
write_to = f.write
you have almost the same function (just bound to that particular file_handle)!
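A small sketch of that bound-method idea in use (the file name here is just illustrative):

with open('example.txt', 'w') as f:
    write_to = f.write   # bound method: f is already baked in as self
    write_to('Hi!')      # exactly equivalent to f.write('Hi!')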
As an exercise, you can also create your own context managers in Python (to be used with the with statement). You do this by defining __enter__ and __exit__. So technically you could redefine this as well:
class FileContextManager():
    def __init__(self, filename):
        self.filename = filename
        self._file = None

    def __enter__(self):
        self._file = open(self.filename, 'w')
        return self._file  # returned value is what 'with ... as' binds to

    def __exit__(self, type, value, traceback):
        self._file.close()
and then use it like:
with FileContextManager('hello.txt') as filename:
    filename.write('Hi!')
and it would do the same thing.
The point of all of this is just to say that Python is flexible enough to do all of this if you need to reimplement this and add to the default behavior, but in the standard case there's no real benefit to doing so.
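As a side note (not part of the original answer), the standard library's contextlib module gives a shorter way to write the same kind of context manager; a minimal sketch:

from contextlib import contextmanager

@contextmanager
def managed_file(filename):
    f = open(filename, 'w')
    try:
        yield f          # the value bound by 'with ... as'
    finally:
        f.close()        # always runs, even if the body raises

with managed_file('hello.txt') as f:
    f.write('Hi!')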
As far as the program in your example, there's nothing wrong with virtually any of these ways in the trivial case. However, you are missing an opportunity to use the with statement in your main function:
def main():
    with open('file.txt', 'w') as filename:
        while some_condition:
            filename.write('some text')
    # file closed here after we fall off the loop then the with context

if __name__ == '__main__':
    main()
If you want to make a function that takes a file handle as an argument:
def write_stuff(file_handle, text):
    return file_handle.write(text)

def main():
    with open('file.txt', 'w') as filename:
        while some_condition:
            write_stuff(filename, 'some text')
    # file closed here after we fall off the loop then the with context

if __name__ == '__main__':
    main()
Again, you can do this numerous different ways, so what's best for what you're trying to do? What's the most readable?
"Should I open a file in a function and pass the pointer to other for writing or should I declared it as a module variable?"
Well, as you've seen, either will work. This question is highly context-dependent and normally best practice dictates having the file open for the least amount of time and in the smallest reasonable scope. Thus, what needs access to your file? If it is many things in the module, perhaps a module-level variable or class to hold it is a good idea. Again, in the trivial case, just use with as above.
The problem is you put a space between print and ().
This code works fine in Python 3:
print(open('yourfile.ext').read())
I was wondering if there was any difference between doing:
var1 = open(filename, 'w').write("Hello world!")
and doing:
var1 = open(filename, 'w')
var1.write("Hello world!")
var1.close()
I find that I can't call close() after using the first method (all in one line); doing so raises an AttributeError.
I was wondering if one way was actually any different/'better' than the other, and secondly, what is Python actually doing here? I understand that open() returns a file object, but how come running all of the code in one line automatically closes the file too?
Using the with statement is the preferred way:
with open(filename, 'w') as f:
    f.write("Hello world!")
It ensures the file object is closed when execution leaves the with block.
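You can verify this yourself; a quick sketch:

with open(filename, 'w') as f:
    f.write("Hello world!")
print(f.closed)  # True -- the file was closed on exiting the with block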
Let me explain why your first example won't work if you call the close() method. This will be useful for your future venture into object-oriented programming in Python.
Example 1
When you run open(filename, 'w'), it initialises and returns a file handle object.
When you call open(filename, 'w').write('helloworld'), you are calling the write method on the file object you just created. Since the write method does not return any value/object, var1 in your code above will be of NoneType.
Example 2
Now in your second example, you are storing the file object as var1.
var1 will have the write method as well as the close method and hence it will work.
This is in contrast to what you have done in your first example.
falsetru has provided a good example of how you can read and write files using the with statement.
Reading and writing files using the with statement
To write:
with open(filename, 'w') as f:
    f.write("helloworld")
To read:
with open(filename) as f:
    for line in f:
        ## do your stuff here
Using nested with statements to read/write multiple files at once
Here's an update based on your question in the comments. I'm not too sure if this is the most Pythonic way, but if you would like to use the with statement to read/write multiple files at the same time, you can nest the with statements within one another.
For instance:
with open('a.txt', 'r') as a:
    with open('b.txt', 'w') as b:
        for line in a:
            b.write(line)
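As a side note (not in the original answer), since Python 2.7 / 3.1 a single with statement can manage both files at once, which avoids the extra nesting:

with open('a.txt', 'r') as a, open('b.txt', 'w') as b:
    for line in a:
        b.write(line)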
How and Why
The file object itself is an iterator. Therefore, you can iterate through the file with a for loop. The file object provides the next() method, which is called on each iteration until the end of the file is reached.
The with statement was introduced in Python 2.5. Prior to Python 2.5, to achieve the same effect one had to write:
f = open("hello.txt")
try:
for line in f:
print line,
finally:
f.close()
Now the with statement does that automatically for you. The try and finally statements are in place to ensure that if any exception/error is raised in the for loop, the file will still be closed.
Source: Python built-in documentation
Official documentation:
Using the with statement, f.close() will be called automatically when it finishes. https://docs.python.org/2/tutorial/inputoutput.html
Happy venture into Python.
cheers,
biobirdman
@falsetru's answer is correct, in terms of telling you how you're "supposed" to open files. But you also asked what the difference was between the two approaches you tried, and why they do what they do.
The answer to those parts of your question is that the first approach doesn't do what you probably think it does. The following code
var1 = open(filename, 'w').write("Hello world!")
is roughly equivalent to
tmp = open(filename, 'w')
var1 = tmp.write("Hello world!")
del tmp
Notice that the open() function returns a file object, and that file object has a write() method. But write() doesn't have any return value, so var1 winds up being None. From the official documentation for file.write(str):
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
Now, the reason you don't need to close() is that the main implementation of Python (the one found at python.org, also called CPython) happens to garbage-collect objects that no longer have references to them, and in your one-line version, you don't have any reference to the file object once the statement completes. You'll find that your multiline version also doesn't strictly need the close(), since all references will be cleaned up when the interpreter exits. But see answers to this question for a more detailed explanation about close() and why it's still a good idea to use it, unless you're using with instead.
This question already has answers here:
How to do unit testing of functions writing files using Python's 'unittest'
(7 answers)
Closed 8 years ago.
I have a Python function that takes a list as an argument and writes it to a file:
def write_file(a):
    try:
        f = open('testfile', 'w')
        for i in a:
            f.write(str(i))
    finally:
        f.close()
How do I test this function?
def test_write_file(self):
    a = [1, 2, 3]
    # what next?
Call the write_file function and check whether testfile is created with the expected content.
def test_write_file(self):
    a = [1, 2, 3]
    write_file(a)
    with open('testfile') as f:
        assert f.read() == '123'  # Replace this line with the method
                                  # provided by your testing framework.
If you don't want the test case to write to the actual filesystem, use something like mock.mock_open.
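For illustration, here is a hedged sketch of that approach using the standard library's unittest.mock (Python 3; it assumes write_file is importable from the module under test, here given the hypothetical name mymodule):

import unittest
from unittest import mock

from mymodule import write_file  # hypothetical module name for the code under test

class WriteFileTest(unittest.TestCase):
    def test_write_file(self):
        a = [1, 2, 3]
        m = mock.mock_open()
        # Patch the built-in open() so no real file is touched.
        with mock.patch('builtins.open', m):
            write_file(a)
        m.assert_called_once_with('testfile', 'w')
        handle = m()  # the mocked file handle returned by open()
        handle.write.assert_has_calls(
            [mock.call('1'), mock.call('2'), mock.call('3')])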
First solution: rewrite your function to accept a writable file-like object. You can then pass a StringIO instead and test the StringIO's value after the call.
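A minimal sketch of that first solution (assuming the function is refactored to take the file-like object as a parameter):

import io

def write_file(a, f):  # refactored: the caller supplies a writable file-like object
    for i in a:
        f.write(str(i))

def test_write_file():
    buf = io.StringIO()
    write_file([1, 2, 3], buf)
    assert buf.getvalue() == '123'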
Second solution: use a mock library that lets you patch builtins.