Python: capturing all writes to a file in memory


Is there some way to "capture" all attempted writes to a particular file, /my/special/file, and write them to a BytesIO or StringIO object instead, or some other way to get that output without actually writing to disk?
The use case: there's a 'handler' function whose contract is that it writes its output to /my/special/file. I have no control over this handler function: I didn't write it, I don't know its contents, I can't change its contents, and the contract cannot change. I'd like to be able to do something like this:
# 'output' has whatever 'handler' has written to `/my/special/file`
output = handler.run(data)
Even if this is an odd request, I'd accept a 'hackier' answer.
EDIT: my code (and the handler) will be invoked many times on many chunks of data, so performance (both latency and throughput) is important.
Thanks.

If you're talking about code in your own Python program, you could monkey-patch the built-in open function before that code gets called. Here's a really stupid example, but it shows that you can do this. It causes code that thinks it's writing to a file to write into an in-memory buffer instead. The calling code then prints what the foreign code wrote to the file:
# The function you don't have access to that writes to a file
def foo():
    f = open("/tmp/foo", "w")
    f.write("blahblahblah\n")
    f.close()
# The buffer to contain the captured text
capture_buffer = ""
# My silly file-like object that only handles write(str) and close()
class MyFileClass:
    def write(self, text):
        global capture_buffer
        capture_buffer += text
        return len(text)  # real file objects return the number of characters written
    def close(self):
        pass
# Patch open to return a MyFileClass instance. Assigning to the module-level
# name shadows the built-in open for code in this module, including foo()
def my_open2(*args, **kwargs):
    return MyFileClass()
open = my_open2
# Call the target function
foo()
# Print what the function wrote to "the file"
print(capture_buffer)
Result:
blahblahblah
Sorry for not spending more time on this; I'm just showing you it's possible. As others have said, a mocking module might be the way to go, so you don't have to roll your own. I don't know whether such modules give you access to what was written, though I'd guess they must. Such a module is just going to do a better job of what I've shown here.
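For what it's worth, the standard library's unittest.mock does record what gets written. A minimal sketch, assuming the handler opens the file through the built-in open (here `handler` is a hypothetical stand-in, not the real third-party function): patch builtins.open with mock_open and join the arguments of every write() call. Mock objects add per-call overhead, so for the many-chunks use case in the question a lighter hand-written patch may well be faster.

```python
from unittest import mock


def handler(data):
    # Hypothetical stand-in for the third-party handler
    with open("/my/special/file", "w") as f:
        f.write(data.upper())


fake_open = mock.mock_open()
with mock.patch("builtins.open", fake_open):
    handler("hello")

# mock_open makes `with open(...) as f` yield the same handle mock,
# which records every write() call made on it
handle = fake_open.return_value
output = "".join(c.args[0] for c in handle.write.call_args_list)
print(output)  # HELLO
```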
If your program does other file I/O with open (or with whichever function the mystery code uses to open the file), check the incoming path and only return your special object for the one path you're interested in. Otherwise, call the original open, which you can stash away under another name beforehand.

Related

Understanding python close method

Have I understood correctly that the following two functions do exactly the same thing, no matter how they are invoked?
def test():
    file = open("testfile.txt", "w")
    file.write("Hello World")
def test_2():
    with open("testfile.txt", "w") as f:
        f.write("Hello World")
My reasoning is that Python invokes the close method when an object is no longer referenced.
If not then this quote confuses me:
Python automatically closes a file when the reference object of a file
is reassigned to another file. It is a good practice to use the
close() method to close a file.
from https://www.tutorialspoint.com/python/file_close.htm

No. In the first case, close is invoked by Python's garbage-collection (finalizer) machinery; in the second case, it is invoked immediately when the with block ends. If you call your test or test_2 functions in a loop thousands of times, the observed behavior could differ.
File descriptors are (at least on Linux) a precious and scarce resource: when they are exhausted, the open(2) syscall fails. On Linux, use getrlimit(2) with RLIMIT_NOFILE to query the limit on the number of file descriptors for your process. You should prefer that the close(2) syscall be invoked promptly once a file handle is no longer needed.
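From Python itself, the same limit can be read with the standard resource module, which wraps getrlimit(2). A small sketch, assuming a Unix-like system (the resource module is not available on Windows):

```python
import resource  # Unix-only standard-library module

# Same information as getrlimit(2) with RLIMIT_NOFILE:
# the soft limit is what applies now, the hard limit is the ceiling
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"max open files: soft={soft}, hard={hard}")
# A value equal to resource.RLIM_INFINITY means "no limit"
```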
Your question is implementation specific, operating system specific, and computer specific. You may want to understand more about operating systems by reading Operating Systems: Three Easy Pieces.
On Linux, also try the cat /proc/$$/limits or cat /proc/self/limits command in a terminal. You will see a line starting with Max open files (on my Debian desktop computer, in December 2019, the soft limit was 1024). See proc(5).

No. The first one will not save the information correctly. You need to use file.close() to ensure that file is closed properly and data is saved.
The with statement, on the other hand, handles file operations for you: it keeps the file open while execution stays inside the indented with block, and as soon as execution leaves that block, it automatically closes the file and saves the data.

In the case of the test function, close is not called until Python's garbage collector finalizes the file object; it is then invoked by the file object's __del__ magic method, which runs when the object is deleted.
In the case of the test_2 function, close is called as soon as execution leaves the with statement. Read more about Python context managers, which are what the with statement uses.
with foo as f:
    do_something()
is, roughly, syntactic sugar for:
f = foo.__enter__()
try:
    do_something()
finally:
    foo.__exit__(None, None, None)
and for a file object, __exit__ calls close.
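A quick way to check this yourself, as a sketch using a throwaway file in a temporary directory: the with form leaves f closed, and calling __enter__ and __exit__ by hand (the rough expansion above) does the same thing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

# The with statement: __exit__ closes the file when the block ends
with open(path, "w") as f:
    f.write("Hello World")
print(f.closed)  # True

# Roughly the same, spelled out by hand
g = open(path, "w")
h = g.__enter__()  # for file objects, __enter__ returns the file itself
try:
    h.write("Hello World")
finally:
    g.__exit__(None, None, None)  # calls g.close()
print(h is g, g.closed)  # True True
```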

No, that is not correct. The close method is invoked via the __exit__ method, which is only invoked when exiting a with statement, not when exiting a function. See the code example below:
class Temp:
    def __exit__(self, exc_type, exc_value, tb):
        print('exited')
    def __enter__(self):
        pass
def make_temp():
    temp = Temp()
make_temp()
print('temp_make')
with Temp() as temp:
    pass
print('temp_with')
Which outputs:
temp_make
exited
temp_with

How do I ignore characters using the python pty module?

I want to write a command-line program that communicates with other interactive programs through a pseudo-terminal. In particular, I want to be able to decide, keystroke by keystroke, whether to forward what was received to the underlying process. Let's say, for example, that I would like to silently ignore any "e" characters that are sent.
I know that Python has a pty module for working with pseudo-terminals and I have a basic version of my program working using it:
import os
import pty
def script_read(stdin):
    data = os.read(stdin, 1024)
    if data == b"e":
        return ...  # What goes here?
    return data
pty.spawn(["bash"], script_read)
From experimenting, I know that returning an empty bytes object b"" causes the pty.spawn implementation to think that the underlying file descriptor has reached the end of file and should no longer be read from, which causes the terminal to become totally unresponsive (I had to kill my terminal emulator!).

For interactive use, the simplest way to do this is probably to just return a bytes object containing a single null byte: b"\0". The terminal emulator will not print anything for it and so it will look like that input is just completely ignored.
This probably isn't great for certain uses of pseudo-terminals. In particular, if the content written to the pseudo-terminal is going to be written out again by the attached program, this would probably cause random null bytes to appear in the file. Testing with cat as the attached program, the sequence ^@ is printed to the terminal whenever a null byte is sent to it.
So, YMMV.
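A sketch of the null-byte workaround generalized to dropping a byte anywhere in the chunk read from stdin (the helper name filter_keys is hypothetical). It keeps a genuine end of file (an empty read) intact, so pty.spawn can still stop when stdin really closes, and only substitutes b"\0" when filtering left nothing to send.

```python
import os


def filter_keys(data: bytes, drop: bytes = b"e") -> bytes:
    """Strip unwanted bytes from a chunk read from stdin (hypothetical helper)."""
    if not data:
        return data  # a genuine EOF on stdin: pass the empty read through
    kept = data.replace(drop, b"")
    # Returning b"" would look like EOF to pty.spawn, so send a NUL instead
    return kept or b"\0"


def script_read(fd):
    return filter_keys(os.read(fd, 1024))


# Interactive use (not run here):
#   import pty
#   pty.spawn(["bash"], script_read)
```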
A more proper solution would be to create a wrapper type that can masquerade as an empty bytes object for the purposes of os.write, but that evaluates as "truthy" in a boolean context so it doesn't trigger the end-of-file check. I did some experimenting with this and couldn't figure out what needs to be faked to make os.write fully accept the wrapper as a bytes type. I'm not sure it's even possible. :(
Here's my initial attempt at creating such a wrapper type:
class EmptyBytes():
    def __init__(self):
        self.sliced = False
    def __class__(self):
        return type(b"")
    def __getitem__(self, _key):
        return b""

Why it's needed to open file every time we want to append the file

In the thread "How do you append to a file?", most answers open a file and append to it, for instance:
def FileSave(content):
    with open(filename, "a") as myfile:
        myfile.write(content)
FileSave("test1 \n")
FileSave("test2 \n")
Why don't we just open myfile once, outside the function, and only write to it when FileSave is invoked?
global myfile
myfile = open(filename)
def FileSave(content):
    myfile.write(content)
FileSave("test1 \n")
FileSave("test2 \n")
Is the latter code better because it opens the file only once and writes to it multiple times?
Or is there no difference, because Python internally guarantees that the file is opened only once even though open is invoked multiple times?

There are a number of problems with your modified code that aren't really relevant to your question: you open the file in read-only mode, you never close the file, you have a global statement that does nothing…
Let's ignore all of those and just talk about the advantages and disadvantages of opening and closing a file over and over:
- It wastes a bit of time. If you're really unlucky, the file could even keep just barely falling out of the disk cache and waste even more time.
- It ensures that you're always appending to the end of the file, even if some other program is also appending to the same file. (This is pretty important for, e.g., syslog-type logs.) [1]
- It ensures that your writes get flushed to disk at some point, which reduces the chance of lost data if your program crashes or gets killed.
- It ensures that your writes get flushed as soon as you make them. If you try to open and read the file elsewhere in the same program, or in a different program, or if the end user just opens it in Notepad, you won't be missing the last 1.73KB worth of lines because they're still sitting in a buffer somewhere waiting to be written. [2]
So, it's a tradeoff. Often, you want one of those guarantees, and the performance cost isn't a big deal. Sometimes, it is a big deal and the guarantees don't matter. Sometimes, you really need both, so you have to write something complicated where you manually buffer up bits and write-and-flush them all at once.
[1] As the Python docs for open make clear, this will happen anyway on some Unix systems. But not on other Unix systems, and not on Windows.
[2] Also, if you have multiple writers, they each append a line at a time, rather than appending whenever they happen to flush, which is again pretty important for logfiles.
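The "write something complicated" option could look roughly like this: a hypothetical BatchAppender class (not from any library) that buffers lines in memory and, once per batch, opens the file in append mode, writes the whole batch in one call, and closes it. Closing hands the data to the operating system but does not fsync it, and the batch size is an arbitrary assumption to tune for your workload.

```python
import os
import tempfile


class BatchAppender:
    """Buffer lines in memory and append them to a file in batches (hypothetical helper)."""

    def __init__(self, path, batch_size=100):
        self.path = path
        self.batch_size = batch_size  # arbitrary default; tune for your workload
        self.pending = []

    def write(self, line):
        self.pending.append(line)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One open/write/close per batch instead of per line; reopening in "a"
        # mode keeps appending at the current end of the file
        with open(self.path, "a") as f:
            f.write("".join(self.pending))
        self.pending.clear()


path = os.path.join(tempfile.mkdtemp(), "log.txt")
log = BatchAppender(path, batch_size=2)
log.write("test1\n")
log.write("test2\n")  # second line fills the batch and triggers a flush
log.write("test3\n")  # stays in memory...
log.flush()           # ...until flushed explicitly
```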

In general, globals should be avoided where possible.
The reason people use the with statement when dealing with files is that it explicitly controls the scope: once the with block is done, the file is closed (the name still exists afterwards, but it refers to a closed file).
You can avoid the with statement, but then you must remember to call myfile.close(), particularly if you're dealing with a lot of files.
One way that avoids the with block and also avoids global is:
def filesave(f_obj, string):
    f_obj.write(string)
f = open(filename, 'a')
filesave(f, "test1\n")
filesave(f, "test2\n")
f.close()
However at this point you'd be better off getting rid of the function and just simply doing:
f = open(filename, 'a')
f.write("test1\n")
f.write("test2\n")
f.close()
At which point you could easily put it within a with block:
with open(filename, 'a') as f:
    f.write("test1\n")
    f.write("test2\n")
So yes. There's no hard reason to not do what you're doing. It's just not very Pythonic.

The latter code may be more efficient, but the former is safer: it ensures that whatever each FileSave call writes is flushed to the filesystem, so other processes can read the updated content. And because open is used as a context manager, the file handle is closed after every call, which gives other processes a chance to write to the file as well (particularly on Windows).

It really depends on the circumstances, but here are some thoughts:
A with block absolutely guarantees that the file will be closed once the block is exited. Python does not make any weird optimizations for appending to files.
In general, globals make your code less modular, and therefore harder to read and maintain. You would think that the original FileSave function is attempting to avoid globals, but it's using the global name filename, so you may as well use a global file altogether at that point, as it will save you some I/O overhead.
A better option would be to avoid globals at all, or to at least use them properly. You really don't need a separate function to wrap file.write, but if it represents something more complex, here is a design suggestion:
def save(file, content):
    print(content, file=file)
def my_thing(filename):
    with open(filename, 'a') as f:
        # do some stuff
        save(f, 'test1')
        # do more stuff
        save(f, 'test2')
if __name__ == '__main__':
    my_thing('myfile.txt')
Notice that when you run the module as a script, the file name is supplied at the top level and passed in to the main routine. However, since the main routine does not reference global variables, you can A) read it more easily because it's self-contained, and B) test it without having to figure out how to feed it inputs without breaking everything else.
Also, by using print instead of file.write, you avoid having to append newlines manually.

Three ways to print in Python -- when to use each?

According to Tim Peters, "There should be one-- and preferably only one --obvious way to do it." In Python, there appear to be three ways to print information:
import os
import sys
print('Hello World', end='')
sys.stdout.write('Hello World')
os.write(1, b'Hello World')
Question: Are there best-practice policies that state when each of these three different methods of printing should be used in a program?

Note that Tim's statement is perfectly correct: there is only one obvious way to do it, print().
The other two possibilities that you mention have different goals.
If we want to summarize the goals of the three alternatives:
print is the high-level function that allows you to write something to stdout (or another file). It provides a simple and readable API, with some handy options for how the individual items are separated, whether to add a terminator, etc. This is what you want most of the time.
sys.stdout.write is just a method of the file objects. So the real point of sys.stdout is that you can pass it around as if it were any other file. This is useful when you have to deal with a function that is expecting a file and you want it to print the text directly on stdout.
In other words, you shouldn't use sys.stdout.write at all. You just pass sys.stdout around to code that expects a file.
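To illustrate passing sys.stdout around, here is a sketch with a hypothetical report function that only assumes its out argument has a write method, so the same code can target standard output or an in-memory io.StringIO:

```python
import io
import sys


def report(results, out):
    """Write one line per result to any object that has a write() method."""
    for name, value in results.items():
        out.write(f"{name}: {value}\n")


data = {"passed": 3, "failed": 0}
report(data, sys.stdout)  # goes to standard output

buffer = io.StringIO()
report(data, buffer)  # same code, captured in memory instead
print(buffer.getvalue() == "passed: 3\nfailed: 0\n")  # True
```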
Note: in python2 there were some situations where using the print statement produced worse code than calling sys.stdout.write. However the print function allows you to define the separator and terminator and thus avoids almost all these corner cases.
os.write is a low-level call to write to a file. You must manually encode the contents and you also have to pass the file descriptor explicitly. This is meant to handle only low level code that, for some reason, cannot be implemented on top of the higher-level interfaces. You almost never want to call this directly, because it's not required and has a worse API than the rest.
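One practical consequence of os.write being low level: it returns the number of bytes actually written, which can be fewer than you passed in (for example on pipes or sockets). Below is a sketch of a hypothetical write_all helper that encodes the text and loops until everything is out. If you mix this with print, call sys.stdout.flush() first, since print goes through Python's own buffer while os.write bypasses it.

```python
import os


def write_all(fd: int, text: str, encoding: str = "utf-8") -> None:
    """Encode text and write all of it to a raw file descriptor (hypothetical helper)."""
    data = memoryview(text.encode(encoding))
    while len(data) > 0:
        written = os.write(fd, data)  # may be less than len(data)
        data = data[written:]  # retry with whatever was not written yet


if __name__ == "__main__":
    write_all(1, "Hello World\n")  # fd 1 is normally standard output
```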
Note that if you have code that should write down things on a file, it's better to do:
my_file.write(a)
# ...
my_file.write(b)
# ...
my_file.write(c)
Than:
print(a, file=my_file)
# ...
print(b, file=my_file)
# ...
print(c, file=my_file)
Because it's more DRY: with print you have to repeat file= every time. That's fine if you write from only one place in the code, but if you have 5 or 6 different writes, it is much easier to simply call the write method directly.

To me, print is the right way to print to stdout, but:
There is a good reason why sys.stdout.write exists. Imagine a class that generates some text output, and you want it to write either to stdout, to a file on disk, or to a string. Ideally the class shouldn't care what kind of output it is writing to. The class can simply be given a file object, and as long as that object supports the write method, the class can use it to output the text.

Two of these methods require importing entire modules. Based on this alone, print() is the best standard use option.
sys.stdout is useful whenever stdout may change. This gives quite a bit of power for stream handling.
os.write is useful for OS-specific writing tasks (non-blocking writes, for instance).
This question has been asked a number of times on this site for sys.stdout vs. print:
Python - The difference between sys.stdout.write and print
print() vs sys.stdout.write(): which and why?
One example of using os.write is non-blocking file writes, demonstrated in the question below. Such a function may only be useful on some OSes, but it still must remain portable even when certain OSes don't support the special behaviors.
How to write to a file using non blocking IO?
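A minimal sketch of such a non-blocking write, assuming a Unix-like system (os.set_blocking may not support pipes on Windows, depending on the Python version): the write end of a pipe is switched to non-blocking mode, and once the pipe's kernel buffer is full, os.write raises BlockingIOError instead of waiting. The helper name is hypothetical.

```python
import os


def fill_pipe_nonblocking(chunk: bytes = b"x" * 4096) -> int:
    """Write to a non-blocking pipe until it is full; return bytes accepted (hypothetical helper)."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # sets O_NONBLOCK on the write end
    total = 0
    try:
        while True:
            try:
                total += os.write(w, chunk)
            except BlockingIOError:
                # A blocking write would sleep here until a reader drained the pipe
                break
    finally:
        os.close(r)
        os.close(w)
    return total


if __name__ == "__main__":
    print(fill_pipe_nonblocking())  # pipe capacity; the exact value is platform dependent
```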

File open and close in python

I have read that when a file is opened using the format below
with open(filename) as f:
    # My Code
    f.close()
explicitly closing the file is not required. Can someone explain why? Also, if someone does explicitly close the file, will it have any undesirable effect?

The mile-high overview is this: When you leave the nested block, Python automatically calls f.close() for you.
It doesn't matter whether you leave by just falling off the bottom, or calling break/continue/return to jump out of it, or raise an exception; no matter how you leave that block. It always knows you're leaving, so it always closes the file.*
One level down, you can think of it as mapping to the try:/finally: statement:
f = open(filename)
try:
    pass  # My Code
finally:
    f.close()
One level down: How does it know to call close instead of something different?
Well, it doesn't really. It actually calls special methods __enter__ and __exit__:
f = open(filename)
f.__enter__()
try:
    pass  # My Code
finally:
    f.__exit__(None, None, None)
And the object returned by open (a file in Python 2, one of the wrappers in io in Python 3) has something like this in it:
def __exit__(self, exc_type, exc_value, traceback):
    self.close()
It's actually a bit more complicated than that last version, which makes it easier to generate better error messages, and lets Python avoid "entering" a block that it doesn't know how to "exit".
To understand all the details, read PEP 343.
Also if someone does explicitly close the file, will it have any undesirable effect ?
In general, this is a bad thing to do.
However, file objects go out of their way to make it safe. It's an error to do anything to a closed file—except to close it again.
* Unless you leave by, say, pulling the power cord on the server in the middle of it executing your script. In that case, obviously, it never gets to run any code, much less the close. But an explicit close would hardly help you there.
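A quick sketch (using a throwaway file in a temporary directory) showing that an extra close is harmless, while any other operation on the closed file raises ValueError:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("data")
    f.close()  # explicit close inside the block
# Leaving the block calls f.__exit__, which closes f again: harmless

f.close()  # yet another close is still a no-op
try:
    f.write("more")  # any real I/O on a closed file is an error
except ValueError as exc:
    error = exc
print(repr(error))
```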

Closing is not required because the with statement automatically takes care of that.
When the with statement is entered, the __enter__ method of the object returned by open(...) is called, and as soon as you leave that block, its __exit__ method is called.
So closing it manually is just futile since the __exit__ method will take care of that automatically.
As for the f.close() after, it's not wrong but useless. It's already closed so it won't do anything.
Also see this blogpost for more info about the with statement: http://effbot.org/zone/python-with-statement.htm

We Keep Coding
