Correct way to write to files? - python

I was wondering if there was any difference between doing:
var1 = open(filename, 'w').write("Hello world!")
and doing:
var1 = open(filename, 'w')
var1.write("Hello world!")
var1.close()
I find that I get an AttributeError if I try to call close() after using the first method (all on one line).
I was wondering if one way was actually any different/'better' than the other, and secondly, what is Python actually doing here? I understand that open() returns a file object, but how come running all of the code in one line automatically closes the file too?

Using a with statement is the preferred way:
with open(filename, 'w') as f:
    f.write("Hello world!")
It ensures the file object is closed when the with block is exited.
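For example, you can check the file object's closed attribute afterwards to see that the with block closed it:
with open(filename, 'w') as f:
    f.write("Hello world!")
print(f.closed)  # True - the file was closed on leaving the with block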

Let me explain why your first example won't work when you call the close() method on it. This will be useful for your future ventures into object-oriented programming in Python.
Example 1
When you run open(filename, 'w'), it initialises and returns a file object.
When you call open(filename, 'w').write('helloworld'), you are calling the write method on the file object you just created. Since the write method does not return any value/object, var1 in your code above will be of NoneType.
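You can verify this with a quick sketch (on Python 2; on Python 3, write() returns the number of characters written, so var1 would be an int there, and calling close() on it still fails):
var1 = open(filename, 'w').write("Hello world!")
print(type(var1))   # <type 'NoneType'> - write() returned nothing
var1.close()        # AttributeError: 'NoneType' object has no attribute 'close'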
Example 2
Now in your second example, you are storing the file object as var1.
var1 has the write method as well as the close method, and hence it will work.
This is in contrast to what you have done in your first example.
falsetru has provided a good example of how you can read and write a file using the with statement.
Reading and Writing file using the with statement
to write
with open(filename, 'w') as f:
    f.write("helloworld")
to read
with open(filename) as f:
    for line in f:
        ## do your stuff here
Using nested with statements to read/write multiple files at once
Here's an update addressing your question in the comments. I'm not sure if this is the most Pythonic way, but if you would like to use the with statement to read/write multiple files at the same time, you can nest one with statement within another.
For instance:
with open('a.txt', 'r') as a:
    with open('b.txt', 'w') as b:
        for line in a:
            b.write(line)
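Note that since Python 2.7 / 3.1 a single with statement also accepts multiple context managers, so the nesting above can be flattened:
with open('a.txt', 'r') as a, open('b.txt', 'w') as b:
    for line in a:
        b.write(line)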
How and Why
The file object itself is an iterator. Therefore, you can iterate through the file with a for loop. The file object provides the next() method, which is called on each iteration until the end of the file is reached.
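For example, each call to next() (the built-in, which works in both Python 2 and 3) returns the next line until StopIteration is raised at the end of the file:
f = open("hello.txt")
print(next(f))   # first line of hello.txt
print(next(f))   # second line
f.close()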
The with statement was introduced in Python 2.5. Prior to Python 2.5, to achieve the same effect one had to write:
f = open("hello.txt")
try:
    for line in f:
        print line,
finally:
    f.close()
Now the with statement does that automatically for you. The try and finally statements are there to ensure that if any exception/error is raised in the for loop, the file will still be closed.
Source: Python built-in documentation
Official documentation
Using the with statement, f.close() will be called automatically when it finishes. https://docs.python.org/2/tutorial/inputoutput.html
Happy venturing into Python
cheers,
biobirdman

falsetru's answer is correct, in terms of telling you how you're "supposed" to open files. But you also asked what the difference was between the two approaches you tried, and why they do what they do.
The answer to those parts of your question is that the first approach doesn't do what you probably think it does. The following code
var1 = open(filename, 'w').write("Hello world!")
is roughly equivalent to
tmp = open(filename, 'w')
var1 = tmp.write("Hello world!")
del tmp
Notice that the open() function returns a file object, and that file object has a write() method. But write() doesn't have any return value, so var1 winds up being None. From the official documentation for file.write(str):
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
Now, the reason you don't need to close() is that the main implementation of Python (the one found at python.org, also called CPython) happens to garbage-collect objects that no longer have references to them, and in your one-line version, you don't have any reference to the file object once the statement completes. You'll find that your multiline version also doesn't strictly need the close(), since all references will be cleaned up when the interpreter exits. But see answers to this question for a more detailed explanation about close() and why it's still a good idea to use it, unless you're using with instead.
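As a rough illustration of that reference-counting behaviour (CPython-specific; exactly when the file gets closed is an implementation detail you shouldn't rely on):
f = open("out.txt", "w")
f.write("Hello world!")
print(f.closed)   # False - we still hold a reference to the file object
del f             # last reference dropped; CPython flushes and closes the file here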

Related

When are file objects passed directly as argument to another function garbage collected?

If I'm going to pass an open(filename) object directly as an argument to another method, say json.load(), in this way:
data = json.load(open("filename.json"))
can I be sure that the open stream is garbage-collected, and thus closed, as soon as json.load() finishes its execution, or am I going to face corruption issues very soon?
I know the best practice would be the with open(filename) as f: syntax, but what if I insisted on a one-liner solution?

How to patch method io.RawIOBase.read with unittest?

I've recently learned about unittest.mock.patch and its variants, and I'd like to use it to unit test the atomicity of a file read function. However, the patch doesn't seem to have any effect.
Here's my set-up. The method under scrutiny is roughly like this (abridged):
#local_storage.py
def read(uri):
    with open(uri, "rb") as file_handle:
        result = file_handle.read()
    return result
And the module that performs the unit tests (also abridged):
#test/test_local_storage.py
import unittest.mock
import local_storage

def _read_while_writing(io_handle, size=-1):
    """ The patch function, to replace io.RawIOBase.read. """
    _write_something_to(TestLocalStorage._unsafe_target_file)  # Appends "12".
    result = io_handle.read(size)  # Should call the actual read.
    _write_something_to(TestLocalStorage._unsafe_target_file)  # Appends "34".
    return result

class TestLocalStorage(unittest.TestCase):
    _unsafe_target_file = "test.txt"

    def test_read_atomicity(self):
        with open(self._unsafe_target_file, "wb") as unsafe_file_handle:
            unsafe_file_handle.write(b"Test")
        with unittest.mock.patch("io.RawIOBase.read", _read_while_writing):  # <--- This doesn't work!
            result = local_storage.read(TestLocalStorage._unsafe_target_file)  # The actual test.
        self.assertIn(result, [b"Test", b"Test1234"], "Read is not atomic.")
This way, the patch should ensure that every time you try to read it, the file gets modified just before and just after the actual read, as if it happens concurrently, thus testing for atomicity of our read.
The unit test currently succeeds, but I've verified with print statements that the patch function doesn't actually get called, so the file never gets the additional writes (it just says "Test"). I've also modified the code to be non-atomic on purpose.
So my question: How can I patch the read function of an IO handle inside the local_storage module? I've read elsewhere that people tend to replace the open() function to return something like a StringIO, but I don't see how that could fix this problem.
I need to support Python 3.4 and up.
I've finally found a solution myself.
The problem is that mock can't mock any methods of objects that are written in C. One of these is the RawIOBase that I was encountering.
So indeed the solution was to mock open to return a wrapper around RawIOBase. I couldn't get mock to produce a wrapper for me, so I implemented it myself.
There is one pre-defined file that's considered "unsafe". The wrapper writes to this "unsafe" file every time any call is made to the wrapper. This allows for testing the atomicity of file writes, since it writes additional things to the unsafe file while writing. My implementation prevents this by writing to a temporary ("safe") file and then moving that file over the target file.
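A minimal sketch of that write-then-move idea (not the author's actual code; the function name is invented, and os.replace needs Python 3.3+):
import os
import tempfile

def atomic_write(path, data):
    """Write data to path by writing a temporary file and moving it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
        os.replace(temp_path, path)  # atomically replaces the target file
    except BaseException:
        os.remove(temp_path)
        raise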
The wrapper has a special case for the read function, because to test atomicity properly it needs to write to the file during the read. So it reads first halfway through the file, then stops and writes something, and then reads on. This solution is now semi-hardcoded (in how far is halfway), but I'll find a way to improve that.
You can see my solution here: https://github.com/Ghostkeeper/Luna/blob/0e88841d19737fb1f4606917f86e3de9b5b9f29b/plugins/storage/localstorage/test/test_local_storage.py
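And a rough sketch of the open()-mocking approach described above, with invented names (the real, complete implementation is in the linked file):
_real_open = open  # keep a reference to the unpatched built-in

class InterruptingFileWrapper:
    """Wraps a real file object and appends to an 'unsafe' file around each
    read, simulating a concurrent writer."""
    def __init__(self, real_file, unsafe_path):
        self._real_file = real_file
        self._unsafe_path = unsafe_path

    def read(self, size=-1):
        first_part = self._real_file.read(2)          # read only part of the file...
        with _real_open(self._unsafe_path, "ab") as unsafe:
            unsafe.write(b"12")                       # ...write "concurrently"...
        return first_part + self._real_file.read()    # ...then finish the read

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._real_file.close()

    def __getattr__(self, name):
        return getattr(self._real_file, name)         # delegate everything else

def _interrupting_open(path, *args, **kwargs):
    return InterruptingFileWrapper(_real_open(path, *args, **kwargs), "test.txt")

# In the test, patch the built-in open so local_storage gets the wrapper:
# import unittest.mock
# with unittest.mock.patch("builtins.open", _interrupting_open):
#     result = local_storage.read("test.txt")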

Python opening and reading files one liner

I know the best way to open a file for reading/writing is to use with
instead of using
f = open('file.txt', 'w')
f.write('something')
f.close()
we should write --
with open('file.txt', 'w') as f:
    f.write('something')
but what if I want to simply read a file? I can do this
with open('file.txt') as f:
    print(f.read())
but what is the problem with the line below?
print(open('file.txt').read())
OR
alist = open('file.txt').readlines()
print(alist)
Does it automatically close the file after executing this statement? Is this a standard way to write it? Should we write like this?
Other than this - should I open a file in one function and pass the handle to another for writing, or should I declare it as a global variable? i.e.
def writeto(f):
    #do some stuff
    f.write('write stuff')

def main():
    f = open('file.txt', 'w')
    while somecondition:
        writeto(f)
    f.close()
OR
f = open('file.txt', 'w')

def writeto():
    #do some stuff
    f.write('write stuff')

def main():
    while somecondition:
        writeto()
    f.close()
Addressing your questions in order,
"What's wrong with print(open(file).readlines())?"
Well, you throw away the file object after using it so you cannot close it. Yes, Python will eventually automatically close your file, but on its terms rather than yours. If you are just playing around in a shell or terminal, this is probably fine, because your session will likely be short and there isn't any resource competition over the files usually. However, in a production environment, leaving file handles open for the lifetime of your script can be devastating to performance.
As far as creating a function that takes a file object and writes to it, well, that's essentially what file.write is. Consider that a file handle is an object with methods, and those methods behind the scenes take self (that is, the object) as the first argument. So write itself is already a function that takes a file handle and writes to it! You can create other functions if you want, but you're essentially duplicating the default behavior for no tangible benefit.
Consider that your first function looks sort of like this:
def write_to(file_handle, text):
    return file_handle.write(text)
But what if I named file_handle self instead?
def write_to(self, text):
    return self.write(text)
Now it looks like a method instead of a stand-alone function. In fact, if you do this:
f = open(some_file, 'w') # or 'a' -- some write mode
write_to = f.write
you have almost the same function (just bound to that particular file_handle)!
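To make that concrete, a quick sketch (the file name is just an example):
f = open('demo.txt', 'w')
write_to = f.write       # a bound method: the file object is baked in as self
write_to('Hi!')          # exactly equivalent to f.write('Hi!')
f.close()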
As an exercise, you can also create your own context managers in Python (to be used with the with statement). You do this by defining __enter__ and __exit__. So technically you could redefine this as well:
class FileContextManager():
    def __init__(self, filename):
        self.filename = filename
        self._file = None

    def __enter__(self):
        self._file = open(self.filename, 'w')
        return self._file

    def __exit__(self, type, value, traceback):
        self._file.close()
and then use it like:
with FileContextManager('hello.txt') as filename:
    filename.write('Hi!')
and it would do the same thing.
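The standard library also offers a shortcut for writing such context managers; here is a sketch of the same idea using contextlib.contextmanager (not part of the class-based example above):
from contextlib import contextmanager

@contextmanager
def managed_file(filename):
    f = open(filename, 'w')
    try:
        yield f        # the value bound by 'as' in the with statement
    finally:
        f.close()      # runs even if the body raises

with managed_file('hello.txt') as f:
    f.write('Hi!')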
The point of all of this is just to say that Python is flexible enough to let you reimplement and extend the default behavior if you need to, but in the standard case there's no real benefit to doing so.
As far as the program in your example goes, there's nothing wrong with virtually any of these ways in the trivial case. However, you are missing an opportunity to use the with statement in your main function:
def main():
    with open('file.txt', 'w') as filename:
        while some_condition:
            filename.write('some text')
    # file closed here after we fall off the loop, then leave the with context

if __name__ == '__main__':
    main()
If you want to make a function that takes a file handle as an argument:
def write_stuff(file_handle, text):
    return file_handle.write(text)

def main():
    with open('file.txt', 'w') as filename:
        while some_condition:
            write_stuff(filename, 'some text')
    # file closed here after we fall off the loop, then leave the with context

if __name__ == '__main__':
    main()
Again, you can do this numerous different ways, so what's best for what you're trying to do? What's the most readable?
"Should I open a file in a function and pass the pointer to other for writing or should I declared it as a module variable?"
Well, as you've seen, either will work. This question is highly context-dependent and normally best practice dictates having the file open for the least amount of time and in the smallest reasonable scope. Thus, what needs access to your file? If it is many things in the module, perhaps a module-level variable or class to hold it is a good idea. Again, in the trivial case, just use with as above.
The problem is you put a space between print and ().
This code works fine in Python 3:
print(open('yourfile.ext').read())

Using a variable at once or passing it around

This question isn't really specific to Python, but I'll ask about Python in particular. Say I've read a file:
f = open(file_name, "rb")
body = f.read()
f.close()
Since the file has already been read into the variable and closed, does it matter in terms of performance whether I use the variable body within the same method or pass it around to another method? Is there enough information to answer my question?
Python doesn't create copies of objects passed to functions (including strings)
def getidof(s):
    return id(s)

s = 'blabla'
id(s) == getidof(s)  # True
So even passing a huge string doesn't affect performance. Of course you will have a slight overhead because you called a function, but the type of the argument and its length don't matter.

Why the syntax for open() and .read() is different?

This is a newbie question, but I looked around and I'm having trouble finding anything specific to this question (perhaps because it's too simple/obvious to others).
So, I am working through Zed Shaw's "Learn Python the Hard Way" and I am on exercise 15. This isn't my first exposure to Python, but this time I'm really trying to understand it at a more fundamental level so I can really do something with a programming language for once. I should also warn that I don't have a good background in object-oriented programming, nor have I fully internalized what objects, classes, etc. are.
Anyway, here is the exercise. The idea is to understand basic file opening and reading:
from sys import argv
script, filename = argv
txt = open(filename)
print "Here's your file %r:" % filename
print txt.read()
print "I'll also ask you to type it again:"
file_again = raw_input("> ")
txt_again = open(file_again)
print txt_again.read()
txt.close()
txt_again.close()
My question is, why are the open and read functions used differently?
For example, to read the example file, why don't/can't I type print read(txt) on line 8?
Why do I put a period in front of the variable and the function after it?
Alternatively, why isn't line 5 written txt = filename.open()?
This is so confusing to me. Is it simply that some functions have one syntax and others another syntax? Or am I not understanding something with respect to how one passes variables to functions?
Syntax
Specifically to the syntactical differences: open() is a function, read() is an object method.
When you call the open() function, it returns an object (first txt, then txt_again).
txt is an object of class file. Objects of class file are defined with the method read(). So, in your code above:
txt = open(filename)
Calls the open() function and assigns an object of class file into txt.
Afterwards, the code:
txt.read()
calls the method read() that is associated with the object txt.
Objects
In this scenario, it's important to understand that objects are defined not only as data entities, but also with built-in actions against those entities.
e.g. A hypothetical object of class car might be defined with methods like start_engine(), stop_engine(), open_doors(), etc.
So as a parallel to your file example above, code for creating and using a car might be:
my_car = create_car(type_of_car)
my_car.start_engine()
(Wikipedia entry on OOP.)
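For illustration, a toy version of that hypothetical car (the names are invented for the analogy):
class Car:
    def __init__(self, type_of_car):
        self.type_of_car = type_of_car

    def start_engine(self):
        print("Starting the %s engine" % self.type_of_car)

def create_car(type_of_car):
    return Car(type_of_car)     # like open(), this returns an object

my_car = create_car("hatchback")
my_car.start_engine()           # a method called on the object, like txt.read()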
To answer this you should have some understanding of object-oriented programming.
open() is a normal function, and its first parameter is a string with the path to the file. The return value of this function is an object.
The further work is done using this object. An object also has functions, but they are called methods. These methods are called in the context of the object, and the dot connects the object with the method. So txt.read() means that you are calling the read method of the txt object.
But if you really want to understand this, you should have a look at OOP.
You're coming up against methods vs functions.
open is a global function, and it takes as its parameters simply the things that go between the brackets.
read is a method of file objects. The expression txt.read() calls the read method of the txt object. Under the hood, the txt object is passed as the first parameter of its read method. The read method will be defined something like this:
class File(object):
    def read(self):
        # do whatever here
        # self is whatever object appears to the left of the dot in foo.read
        pass
It follows from the above definition that you can only use a method like read on an object which has a read method defined for it.
