Let's say we want to implement an equivalent of PHP's file_get_contents.
What is the best practice? (elegant and reliable)
Here are some propositions; are they correct?
Using a with statement:
def file_get_contents(filename):
    with file(filename) as f:
        s = f.read()
    return s
Is using standard open() safe?
def file_get_contents(filename):
    return open(filename).read()
What happens to file descriptor in either solution?
In the current implementation of CPython, both will generally close the file immediately. However, the Python language makes no such guarantee for the second one - the file will eventually be closed, but the finaliser may not be called until the next GC cycle. Implementations like Jython and IronPython work this way, so it's good practice to explicitly close your files.
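If you don't want to rely on any particular implementation's garbage collector, the portable way to get deterministic closing (short of the with statement shown below) is an explicit try/finally; a minimal sketch:
def file_get_contents(filename):
    f = open(filename)
    try:
        return f.read()   # the finally block still runs after the return
    finally:
        f.close()         # closed deterministically on CPython, Jython and IronPython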
I'd say using the first solution is the best practice, though open is generally preferred to file. Note that you can shorten it a little though if you prefer the brevity of the second example:
def file_get_contents(filename):
    with open(filename) as f:
        return f.read()
The __exit__ part of the context manager will execute when you leave the body for any reason, including exceptions and returning from the function - there's no need to use an intermediate variable.
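To see that __exit__ really does fire on an early return or an exception, here is a quick illustrative sketch with a throwaway context manager (the Noisy class is made up just for this demonstration):
class Noisy(object):
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        print("__exit__ called")   # runs on normal exit, early return, or exception
        return False               # do not suppress exceptions

def early_return():
    with Noisy():
        return "done"              # __exit__ still fires before the function returns

early_return()                     # prints: __exit__ called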
import os

def file_get_contents(filename):
    if os.path.exists(filename):
        fp = open(filename, "r")
        content = fp.read()
        fp.close()
        return content
In this case it will return None if the file doesn't exist, and the file descriptor will be closed before we exit the function.
Using the with statement is actually the nicest way to be sure that the file is really closed.
Relying on the garbage collector's behaviour might work for this task, but here there is a nice way to be sure in all cases, so...
with will make sure that the file is closed when the block is left.
In your second example, the file handle might remain open (Python makes no guarantee that it's closed or when if you don't do it explicitly).
You can also use this Python 3 feature:
>>> ''.join(open('htdocs/config.php', 'r').readlines())
"This is the first line of the file.\nSecond line of the file"
Read more here http://docs.python.org/py3k/tutorial/inputoutput.html
From what I gather, open() is really io.open(), a high-level "wrapper" for os.open().
For other file operations like renaming and removing files I have to use os functions like os.remove and os.rename, or even shutil.move in some cases, like below:
import shutil

with open("/tmp/workfile", "w") as f:
    f.write("some stuff")

shutil.move(f.name, "finalfile")
Why is there no similar wrapper like open for removal/renaming?
Is there a better, perhaps more pythonic way of accomplishing above task?
It seems strange to have to do imports instead of, say, having rename and remove be methods on f that point it at another file. Especially when open() requires no import.
Edit: I removed the del f at the end since it seemed to anger a lot of people. I know it's not needed. I had it there to highlight that an f object that no longer points to a removed file has very little use.
@micke, wanna bet this is due to history? I'm guessing the open function was one of the first added to Python because, well... the creator of the language needed that early on.
I'd argue that having open as a built-in function is weird, not the other way around.
Also note that you're using the variable f outside of the with block. And the with block automatically calls close on f when the block exits, so the del f statement is not necessary. That's the whole point of using with blocks (forgetting to .close() is a very common mistake)
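To make that concrete, here is a small sketch mirroring the question's snippet (the paths are just placeholders): after the with block, f is already closed, and renaming/removing is done by path via os rather than through the file object:
import os

with open("/tmp/workfile", "w") as f:
    f.write("some stuff")

print(f.closed)                                 # True - the with block closed it
os.rename("/tmp/workfile", "/tmp/finalfile")    # operate on the path, not on f
os.remove("/tmp/finalfile")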
Example:
subprocess.call(cmd, stdout=open('status_grid','a'), cwd = folder)
is the file status_grid closed automatically?
No, it isn't:
import subprocess
f = open('b','a')
subprocess.call('ls', stdout=f)
print f.closed
Output:
False
Now a better answer might come from unutbu. You don't give your open file a reference, so once your subprocess completes, it's up to the garbage collector how much longer the file is open.
One way to be sure is
with open('status_grid', 'a') as my_file:
    subprocess.call(cmd, stdout=my_file, cwd=folder)
If not done explicitly, the file will be closed when it is garbage collected. When the file is garbage collected is not specified by the Python language per se.
In CPython, the file is garbage collected when there are no more references to the file object.
With other implementations of Python, such as Jython, garbage collection may happen completely differently:
Jython has "true" garbage collection whereas CPython uses reference
counting. This means that in Jython users don't need to worry about
handling circular references as these are guaranteed to be collected
properly. On the other hand, users of Jython have no guarantees of
when an object will be finalized -- this can cause problems for people
who use open("foo", 'r').read() excessively. Both behaviors are
acceptable -- and highly unlikely to change.
As EMS and Charles Salvia point out, to be sure when the file is closed, it is best to not leave it up to the garbage collector. The best way to do that is to use a with statement, which guarantees the file will be closed when Python leaves the with-suite:
with open('status_grid', 'a') as f:
    subprocess.call(cmd, stdout=f, cwd=folder)
No it's not. You can wrap your call in a with statement to ensure the file closes automatically:
with open('status_grid', 'a') as myfile:
    subprocess.call(cmd, stdout=myfile, cwd=folder)
Note: with current CPython implementations based on reference counting, the file will be closed when the reference count reaches 0, which will happen immediately in the code you posted. However, this is just an implementation detail of CPython. Other implementations may leave the file open indefinitely. Use the with statement to ensure you've written portable code.
Note: I'm used to using Dependency Injection with C# code, but from what I understand, dynamic languages like Ruby and Python are like play-doh, not LEGOs, and thus don't need IoC containers, though there is some debate about whether IoC patterns are still useful. In the code below I used fudge's .patch feature, which provides the seams needed to mock/stub the code. However, the components of the code are thus coupled. I'm not sure I like this. This SO answer also explains that coupling in dynamic languages is looser than in static ones, but it references another answer in that question which says the tools for IoC are unneeded but the patterns are not. So a side question would be, "Should I have used DI for this?"
I'm using the following python frameworks:
Nose for unit testing
Fudge for fakes (stubs, mocking, etc)
Here is the resulting production code:
import Bio.SeqIO

def to_fasta(seq_records, file_name):
    file_object = open(file_name, "w")
    Bio.SeqIO.write(seq_records, file_object, "fasta")
    file_object.close()
Now I did TDD this code, but I did it with the following test (which wasn't all that thorough):
@istest
@fudge.patch('__builtin__.open', 'Bio.SeqIO.write')
def to_fasta_writes_file(fake_open, fake_SeqIO):
    fake_open.is_a_stub()
    fake_SeqIO.expects_call()
    seq_records = build_expected_output_sequences()
    file_path = "doesn't matter"
    to_fasta(seq_records, file_path)
Here is the updated test along with explicit comments to ensure I'm following the Four-Phase Test pattern:
@istest
@fudge.patch('__builtin__.open', 'Bio.SeqIO')
def to_fasta_writes_file(fake_open, fake_SeqIO):
    # Setup
    seq_records = build_expected_output_sequences()
    file_path = "doesn't matter"
    file_type = 'fasta'
    file_object = fudge.Fake('file').expects('close')
    (fake_open
        .expects_call()
        .with_args(file_path, 'w')
        .returns(file_object))
    (fake_SeqIO
        .is_callable()
        .expects("write")
        .with_args(seq_records, file_object, file_type))

    # Exercise
    to_fasta(seq_records, file_path)

    # Verify (not needed due to '.patch')

    # Teardown
While the second example is more thorough, is this test overkill? Are there better ways to TDD python code? Basically, I'm looking for feedback on how I did with TDDing this operation, and would welcome any alternative ways to write either the test code or the production code.
Think about what this function does and what you actually have responsibility for. It looks to me like: given some data and a file name, write the records into the file in a particular format (fasta). You aren't actually responsible for the workings of Python file I/O, or for how Bio.SeqIO works.
Your second version tests that:
The file is opened for writing.
That Bio.SeqIO.write is called with the expected parameters.
The file is closed.
That looks pretty good. Most of this is simple, and some people may call it overkill, but the TDD approach can help remind you to do something like close the file (obvious, but we all forget stuff like that all the time). These tests also guard against such things as Bio.SeqIO.write being changed in the future to expect different parameters. You can either upgrade your version of the library and wonder why your program breaks, or upgrade your version of the library, run your tests, and know why and where it breaks.
Naturally you should write other tests for the case when you can't open the file, or any exceptions that Bio.SeqIO.write might throw.
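For instance, a rough sketch of the "open fails" case might look like this, assuming the same nose/fudge helpers as the question (assert_raises comes from nose.tools; fudge's raises() makes the faked call throw):
import fudge
from nose.tools import istest, assert_raises

@istest
@fudge.patch('__builtin__.open')
def to_fasta_propagates_open_errors(fake_open):
    # Setup: make open() fail as if the path were not writable
    fake_open.expects_call().raises(IOError("can't open file"))
    seq_records = build_expected_output_sequences()
    # Exercise / Verify: the error should propagate to the caller
    assert_raises(IOError, to_fasta, seq_records, "doesn't matter")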
I'm writing a Python script that needs to write some data to a temporary file, then create a subprocess running a C++ program that will read the temporary file. I'm trying to use NamedTemporaryFile for this, but according to the docs,
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
And indeed, on Windows if I flush the temporary file after writing, but don't close it until I want it to go away, the subprocess isn't able to open it for reading.
I'm working around this by creating the file with delete=False, closing it before spawning the subprocess, and then manually deleting it once I'm done:
fileTemp = tempfile.NamedTemporaryFile(delete=False)
try:
    fileTemp.write(someStuff)
    fileTemp.close()
    # ...run the subprocess and wait for it to complete...
finally:
    os.remove(fileTemp.name)
This seems inelegant. Is there a better way to do this? Perhaps a way to open up the permissions on the temporary file so the subprocess can get at it?
Since nobody else appears to be interested in leaving this information out in the open...
tempfile does expose a function, mkdtemp(), which can trivialize this problem:
try:
    temp_dir = mkdtemp()
    temp_file = make_a_file_in_a_dir(temp_dir)
    do_your_subprocess_stuff(temp_file)
    remove_your_temp_file(temp_file)
finally:
    os.rmdir(temp_dir)
I leave the implementation of the intermediate functions up to the reader, as one might wish to do things like use mkstemp() to tighten up the security of the temporary file itself, or overwrite the file in-place before removing it. I don't particularly know what security restrictions one might have that are not easily planned for by perusing the source of tempfile.
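Purely for illustration, the helpers above might be sketched roughly like this, with mkstemp() used inside the temporary directory and a stand-in command in place of the real C++ program:
import os
import subprocess
from tempfile import mkstemp

def make_a_file_in_a_dir(temp_dir, data=b"some stuff"):
    fd, path = mkstemp(dir=temp_dir)   # created with owner-only permissions
    os.write(fd, data)
    os.close(fd)                       # close it so a subprocess can open it, even on Windows
    return path

def do_your_subprocess_stuff(temp_file):
    subprocess.check_call(["cat", temp_file])   # stand-in for the real C++ program

def remove_your_temp_file(temp_file):
    os.remove(temp_file)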
Anyway, yes, using NamedTemporaryFile on Windows might be inelegant, and my solution here might also be inelegant, but you've already decided that Windows support is more important than elegant code, so you might as well go ahead and do something readable.
According to Richard Oudkerk
(...) the only reason that trying to reopen a NamedTemporaryFile fails on
Windows is because when we reopen we need to use O_TEMPORARY.
and he gives an example of how to do this in Python 3.3+
import os, tempfile

DATA = b"hello bob"

def temp_opener(name, flag, mode=0o777):
    return os.open(name, flag | os.O_TEMPORARY, mode)

with tempfile.NamedTemporaryFile() as f:
    f.write(DATA)
    f.flush()

    with open(f.name, "rb", opener=temp_opener) as f:
        assert f.read() == DATA

assert not os.path.exists(f.name)
Because there's no opener parameter in the built-in open() in Python 2.x, we have to combine lower level os.open() and os.fdopen() functions to achieve the same effect:
import subprocess
import tempfile

DATA = b"hello bob"

with tempfile.NamedTemporaryFile() as f:
    f.write(DATA)
    f.flush()

    subprocess_code = \
    """import os
    f = os.fdopen(os.open(r'{FILENAME}', os.O_RDWR | os.O_BINARY | os.O_TEMPORARY), 'rb')
    assert f.read() == b'{DATA}'
    """.replace('\n', ';').format(FILENAME=f.name, DATA=DATA)

    subprocess.check_output(['python', '-c', subprocess_code]) == DATA
You can always go low-level, though I'm not sure if it's clean enough for you:
fd, filename = tempfile.mkstemp()
try:
    os.write(fd, someStuff)
    os.close(fd)
    # ...run the subprocess and wait for it to complete...
finally:
    os.remove(filename)
At least if you open a temporary file using the existing Python libraries, accessing it from multiple processes is not possible on Windows. According to MSDN, you can pass the CreateFile() function a third parameter (dwShareMode) with the share-mode flag FILE_SHARE_READ, which:
Enables subsequent open operations on a file or device to request read
access. Otherwise, other processes cannot open the file or device if
they request read access. If this flag is not specified, but the file
or device has been opened for read access, the function fails.
So, you can write a Windows specific C routine to create a custom temporary file opener function, call it from Python and then you can make your sub-process access the file without any error. But I think you should stick with your existing approach as it is the most portable version and will work on any system and thus is the most elegant implementation.
Discussion on Linux and windows file locking can be found here.
EDIT: Turns out it is possible to open & read the temporary file from multiple processes in Windows too. See Piotr Dobrogost's answer.
Using mkstemp() instead with os.fdopen() in a with statement avoids having to call close():
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as fileTemp:
        fileTemp.write(someStuff)
    # ...run the subprocess and wait for it to complete...
finally:
    os.remove(path)
I know this is a really old post, but I think it's relevant today given that the API is changing and functions like mktemp and mkstemp are being replaced by functions like TemporaryFile() and TemporaryDirectory(). I just wanted to demonstrate in the following sample how to make sure that a temp directory is still available downstream:
Instead of coding:
tmpdirname = tempfile.TemporaryDirectory()
and using tmpdirname throughout your code, you should try to put your code in a with statement block to ensure that the directory is available to your code calls... like this:
with tempfile.TemporaryDirectory() as tmpdirname:
    # do dependent code nested here so it's part of the with statement
If you reference it outside of the with block, it's likely that the directory and its contents won't exist anymore.
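A small self-contained sketch of that pattern (the file name inside the directory is arbitrary):
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdirname:
    path = os.path.join(tmpdirname, "workfile.txt")
    with open(path, "w") as f:
        f.write("some stuff")
    print(os.path.exists(path))        # True - still inside the with block

print(os.path.exists(tmpdirname))      # False - the directory is gone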
I have a bunch of print calls that I need to write to a file instead of stdout. (I don't need stdout at all.)
I am considering three approaches. Are there any advantages (including performance) to any one of them?
Full redirect, which I saw here:
import sys
saveout = sys.stdout
fsock = open('out.log', 'w')
sys.stdout = fsock
print(x)
# and many more print calls
# later if I ever need it:
# sys.stdout = saveout
# fsock.close()
Redirect in each print statement:
fsock = open('out.log', 'w')
print(x, file = fsock)
# and many more print calls
Write function:
fsock = open('out.log', 'w')
fsock.write(str(x))
# and many more write calls
I would not expect any durable performance differences among these approaches.
The advantage of the first approach is that any reasonably well-behaved code which you rely upon (modules you import) will automatically pick up your desired redirection.
The second approach has no advantage. It's only suitable for debugging or throwaway code ... and not even a good idea for that. You want your output decisions to be consolidated in a few well-defined places, not scattered across your code in every call to print(). In Python 3 print() is a function rather than a statement, which allows you to re-define it if you like. So you can def print(*args) if you want, and within the definition of your own custom print() you can call builtins.print() if you need access to the original.
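As a sketch of that idea, here is a module-level print() that forwards everything to a log file (Python 3; the file name is arbitrary):
import builtins

fsock = open('out.log', 'w')

def print(*args, **kwargs):
    # route output to the log file unless the caller passes file= explicitly
    kwargs.setdefault('file', fsock)
    return builtins.print(*args, **kwargs)

print("this line goes to out.log")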
The third approach ... and by extension the principle that all of your output should be generated in specific functions and class methods that you define for that purpose ... is probably best.
You should keep your output and formatting separated from your core functionality as much as possible. By keeping them separate you allow your core to be re-used. (For example you might start with something that's intended to run from a text/shell console, and later need to provide a Web UI, a full-screen (curses) front end or a GUI for it.) You may also build entirely different functionality around it ... in situations where the resulting data needs to be returned in its native form (as objects) rather than pulled in as text (output) and re-parsed into new objects.
For example, I've had more than one occasion where something I wrote to perform some complex queries and data gathering from various sources and print a report ... say of the discrepancies ... later needed to be adapted to spit out the data in some form (such as YAML/JSON) that could be fed into some other system (say, for reconciling one data source against another).
If, from the outset, you keep the main operations separate from the output and formatting then this sort of adaptation is relatively easy. Otherwise it entails quite a bit of refactoring (sometimes tantamount to a complete re-write).
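As a toy sketch of that separation (all names here are made up): the core returns plain data, and a thin presentation layer decides where and how it gets written:
import json
import sys

def find_discrepancies(source_a, source_b):
    # core logic: return data, do no printing here
    return [key for key in source_a if source_a[key] != source_b.get(key)]

def report(discrepancies, out=sys.stdout, as_json=False):
    # presentation layer: swap in a file, a GUI, or a JSON feed later
    if as_json:
        out.write(json.dumps(discrepancies) + "\n")
    else:
        for key in discrepancies:
            out.write("mismatch: %s\n" % key)

report(find_discrepancies({"a": 1, "b": 2}, {"a": 1, "b": 3}))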
From the filenames you're using in your question, it sounds like you're wanting to create a log file. Have you considered the Python logging module instead?
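A minimal sketch of that alternative (the file name is arbitrary):
import logging

logging.basicConfig(filename='out.log', level=logging.INFO,
                    format='%(asctime)s %(message)s')

logging.info("this goes to out.log instead of stdout")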
I think that semantics is important:
I would suggest the first approach for situations when you're printing the same stuff you would print to the console. The semantics will be the same. For more complex situations I would use the standard logging module.
The second and third approaches are a bit different when you are printing text lines: print adds the newline and write does not (see the sketch below).
I would use the third approach when writing mainly binary or non-textual formats, and redirecting in the print call in most other cases.
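To make the newline difference concrete, a tiny sketch using Python 3's print function as in the second approach above:
fsock = open('out.log', 'w')
print("a line", file=fsock)        # print appends the newline for you
fsock.write("another line\n")      # write needs an explicit newline
fsock.close()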