How to efficiently close and open files in a Python logger class - python

I'm implementing a simple logging class that writes out some messages to a log file. I have a doubt on how to manage the opening/closing of the file in a sensible and pythonic way.
I understood that the idiomatic way to do the writing in files is via the with statement. Therefore this is a simplified version of the code I have:
class Logger():
    def __init__(self, filename, mode='w', name='root'):
        self.filename = filename
        self.name = name
        # remove the previous content of the file if mode for logger is 'w'
        if mode == 'w':
            with open(self.filename, 'w') as f:
                f.write('')

    def info(self, msg):
        with open(self.filename, 'a') as f:
            f.write(f'INFO:{self.name}:{msg}\n')
logger = Logger('log.txt')
logger.info('Starting program')
The problem is that this implementation will open and close the file every time the logger is called, which may be hundreds of times. I'm concerned about this being an overhead for the program (its runtime is important). It would perhaps be more sensible to open the file when the logger is created and close it when the program finishes, but this goes against the "use with" rule, and there is certainly a serious risk that I (or the user of the class) will forget to manually close the file at the end. Another problem with this approach is that if I want to create different loggers that dump to the same file, I'll have to add careful checks to know whether the file has already been opened by a previous logger...
So all in all, what's the most pythonic and sensible way to handle the opening/closing of files in this context?

While I agree with the other comments that the most pythonic way is to use the standard lib, I'll try to answer your question as it was asked.
I think the with construct is a great construct, but it doesn't mean it works in every situation. Opening a file handle and keeping it around for continual use is not unpythonic if it makes sense in your situation (IMO). Opening a file, doing something with it, and closing it in the same function with try/except/finally blocks would be unpythonic. I think it may be preferable to open the file only when you first try to use it (instead of at creation time), but that can depend on the rest of the application.
If you start creating different loggers that write to the same file, and they are in the same process, I would think the goal would be a single open file handle that all the loggers write to, instead of each logger having its own handle. But multi-instance and multi-process logging synchronization is where the stdlib shines, so...you know...your mileage may vary.
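For illustration, here is a rough sketch of that idea: open lazily on first use and share one handle per filename across logger instances. Names like `_handles` and `close_all` are my own inventions, not part of the question's code.

```python
class Logger:
    """Sketch: loggers for the same file share one lazily-opened handle.

    Hypothetical illustration; `_handles` and `close_all` are assumptions.
    """
    _handles = {}  # one shared handle per filename, for all loggers in this process

    def __init__(self, filename, mode='w', name='root'):
        self.filename = filename
        self.name = name
        if mode == 'w':
            open(filename, 'w').close()  # truncate once, up front

    def _file(self):
        # open on first use, then reuse the cached handle
        f = Logger._handles.get(self.filename)
        if f is None or f.closed:
            f = Logger._handles[self.filename] = open(self.filename, 'a')
        return f

    def info(self, msg):
        self._file().write(f'INFO:{self.name}:{msg}\n')

    @classmethod
    def close_all(cls):
        # still needs an explicit (or atexit-registered) call at shutdown
        for f in cls._handles.values():
            f.close()
```

The remaining weakness is exactly the one the question worries about: someone has to remember to call `close_all()` at the end, e.g. via `atexit.register`.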


Python Logging - moving file on startup

In my Flask application I have implemented a logging system using the logging library. It is currently run in the block below:
if __name__ == "__main__":
    """[Runs the webserver.
    Finally block is used for some logging management. It will first shut down
    logging, to ensure no files are open, then renames the file to 'log_'
    + the current date, and finally moves the file to the /logs archive
    directory]
    """
    try:
        session_management.clean_uploads_on_start(UPLOAD_FOLDER)
        app.run(debug=False)
    finally:
        try:
            logging.shutdown()
            new_log_file_name = log_management.rename_log(app.config['DEFAULT_LOG_NAME'])
            log_management.move_log(new_log_file_name)
        except FileNotFoundError:
            logging.warning("Current log file not found")
        except PermissionError:
            logging.warning("Permissions lacking to rename or move log.")
I discovered that the file is not renamed and moved if (either) the cmd prompt is force closed, or if the server crashes. I thought it might be better to put the rename and move into the initial 'try' block of the function, prior to the server starting, but I run into issues because I have a config file (which is imported in this script) which has the following code:
logging.basicConfig(filename='current_log.log', level=logging.INFO,
                    filemode='a',
                    format='%(asctime)s:%(levelname)s:%(message)s')
I have tried to do something like the below, but I still run into permission errors; I think that is because the log_management script also imports config. Further, I could not find a function that starts the logging system, analogous to the logging.shutdown() that is used when the system ends, otherwise I would shut logging down, move the file (if it exists) and then start it back up.
try:
    session_management.clean_uploads_on_start(UPLOAD_FOLDER)
    log_management.check_log_on_startup(app.config['DEFAULT_LOG_NAME'])
    import config
    app.run(debug=False)
finally:
    try:
        logging.shutdown()
        new_log_file_name = log_management.rename_log(app.config['DEFAULT_LOG_NAME'])
        log_management.move_log(new_log_file_name)
    except FileNotFoundError:
        logging.warning("Current log file not found")
    except PermissionError:
        logging.warning("Permissions lacking to rename or move log.")

# (in another script)
def check_log_on_startup(file_name):
    if os.path.exists(file_name):
        move_log(rename_log(file_name))
Any suggestions much welcomed, because I feel like I'm at a brick wall!
As you have already found out, trying to perform cleanups at the end of your process life cycle has the potential to fail if the process terminates uncleanly.
The issue with performing the cleanup at the start is that you apparently call logging.basicConfig from your import before attempting to move the old log file.
This leads to the implicitly created FileHandler holding an open file object on the existing log when you attempt to rename and move it. Depending on the file system you are using, this might not be met with joy.
If you want to move the handling of potential old log files to the start of your application completely, you have to perform the renaming and moving before you call logging.basicConfig, so you'll have to remove it from your import and add it to the log_management somehow.
As an alternative, you could move the whole handling of log files to the logging file handler by subclassing the standard FileHandler class, e.g:
import logging
import os
from datetime import datetime

class CustomFileHandler(logging.FileHandler):
    def __init__(self, filename, archive_path='archive', archive_name='log_%Y%m%d', **kwargs):
        self._archive = os.path.join(archive_path, archive_name)
        self._archive_log(filename)
        super().__init__(filename, **kwargs)

    def _archive_log(self, filepath):
        if os.path.exists(filepath):
            os.rename(filepath, datetime.now().strftime(self._archive))

    def close(self):
        super().close()
        self._archive_log(self.baseFilename)
With this, you would configure your logging like so:
hdler = CustomFileHandler('current.log')
logging.basicConfig(level=logging.INFO, handlers=[hdler],
                    format='%(asctime)s:%(levelname)s:%(message)s')
The CustomFileHandler will check for, and potentially archive, old logs during initialization. This will deal with leftovers after an unclean process termination where the shutdown cleanup cannot take place. Since the parent class initializer is called after the log archiving is attempted, there is not yet an open handle on the log that would cause a PermissionError.
The overridden close() method will perform the archiving on a clean process shutdown.
This should remove the need for the dedicated log_management module, at least as far as the functions you show in your code are concerned. rename_log, move_log and check_log_on_startup are all encapsulated in the CustomFileHandler. There is also no need to explicitly call logging.shutdown().
Some notes:
The reason you cannot find a start function equivalent to logging.shutdown() is that the logging system is started/initialized when you import the logging module. Among other things, it instantiates the implicit root logger and registers logging.shutdown as exit handler via atexit.
The latter is the reason why there is no need to explicitly call logging.shutdown() with the above solution. The Python interpreter will call it during finalization when preparing for interpreter shutdown due to the exit handler registration. logging.shutdown() then iterates through the list of registered handlers and calls their close() methods, which will perform the log archiving during a clean shutdown.
Depending on the method you choose for moving (and renaming) the old log file, the above solution might need some additional safeguards against exceptions. On Windows, os.rename raises an exception if the destination path already exists, i.e. when you have already stopped and started your process previously on the same day, while os.replace silently overwrites an existing file. See the Python os module documentation for more details about moving files.
Thus I would recommend to name the archived logs not only by current date but also by time.
In the above, adding the current date to the archive file name is done via datetime's strftime, hence the 'log_%Y%m%d' as default for the archive_name parameter of the custom file handler. The characters with a preceding % are valid format codes that strftime() replaces with the respective parts of the datetime object it is called on. To append the current time to the archive log file name you would simply append the respective format codes to the archive_name, e.g.: 'log_%Y%m%d_%H%M%S' which would result in a log name such as log_20200819_123721.

Can I use Python 3 'with open(filename, mode) as file:' and keep it in an object to send it events?

I'm writing a manager class that will allow creation of different types of log files (raw, CSV, custom data format), and I'd like to keep the log file open to write lines as they come in. The log file can also be started and stopped by events (button presses, conditions).
I want to know if I can combine the with open('file') as file: syntax, with an instance variable in a class - so I'm not stuck polling in a loop while the file is open, but instead can write to the file by event.
I know how to use the open and close methods, but everyone says "with" is much more robust, and (more) certain to write the file to disk.
I want to do something like this:
class logfile:
    def __init__(self, filename, mode):
        with open(filename, mode) as self.f:
            return

    def write(self, input):
        self.f.write(input)
and use it like:
lf = logfile("junk.txt","wt") # yeah, lf has to be global to use like this. Keeping demo simple here.
...then leave method, do other stuff to refresh screen, respond to other events, and later when a data line to log comes in:
lf.write(dataline)
I then expect things to close cleanly, file to get flushed to disk when lf disappears - either implicitly at program close, or explicitly when I set lf to None.
When I try this, the file is (apparently) closed at return from creation of lf. I can inspect lf and see that
lf.f == <_io.TextIOWrapper name='junk.txt' mode='wt' encoding='UTF-8'>
but when I try to use lf.write("text"), I get:
ValueError: I/O operation on closed file.
Is there a way of using "with" and keeping it in an instance?
Failing that, should I just use open, close and write, and ensure I have try/except/finally in init and close in exit?
The with statement invokes a context manager: it calls the file object's close() method as soon as execution leaves the with block. So by the time __init__ returns, the file has already been closed. You could write your class this way instead:
class logfile:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def write(self, text):
        with open(self.filename, self.mode) as f:
            f.write(text)
        self.mode = 'a'  # after the first write, append rather than truncate

Why it's needed to open file every time we want to append the file

As in the thread How do you append to a file?, most answers are about opening a file and appending to it, for instance:
def FileSave(content):
    with open(filename, "a") as myfile:
        myfile.write(content)

FileSave("test1 \n")
FileSave("test2 \n")
Why don't we just pull myfile out and write to it only when FileSave is invoked?
global myfile
myfile = open(filename)

def FileSave(content):
    myfile.write(content)

FileSave("test1 \n")
FileSave("test2 \n")
Is the latter code better because it opens the file only once and writes to it multiple times?
Or is there no difference, because Python will internally guarantee the file is opened only once even though the open method is invoked multiple times?
There are a number of problems with your modified code that aren't really relevant to your question: you open the file in read-only mode, you never close the file, you have a global statement that does nothing…
Let's ignore all of those and just talk about the advantages and disadvantages of opening and closing a file over and over:
Disadvantage: it wastes a bit of time. If you're really unlucky, the file could even just barely keep falling out of the disk cache and waste even more time.
Advantages:
Ensures that you're always appending to the end of the file, even if some other program is also appending to the same file. (This is pretty important for, e.g., syslog-type logs.)1
Ensures that you've flushed your writes to disk at some point, which reduces the chance of lost data if your program crashes or gets killed.
Ensures that you've flushed your writes to disk as soon as you write them. If you try to open and read the file elsewhere in the same program, or in a different program, or if the end user just opens it in Notepad, you won't be missing the last 1.73KB worth of lines because they're still in a buffer somewhere and won't be written until later.2
So, it's a tradeoff. Often, you want one of those guarantees, and the performance cost isn't a big deal. Sometimes, it is a big deal and the guarantees don't matter. Sometimes, you really need both, so you have to write something complicated where you manually buffer up bits and write-and-flush them all at once.
1. As the Python docs for open make clear, this will happen anyway on some Unix systems. But not on other Unix systems, and not on Windows.
2. Also, if you have multiple writers, they're all appending a line at a time, rather than appending whenever they happen to flush, which is again pretty important for logfiles.
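The "write something complicated where you manually buffer up bits and write-and-flush them all at once" option mentioned above could look roughly like this. `BufferedAppender` and its parameters are invented for illustration, not part of the original answer:

```python
import os

class BufferedAppender:
    """Sketch of the manual-buffering middle ground: collect lines in
    memory, then append-and-flush them to the file in one burst."""

    def __init__(self, filename, max_lines=100):
        self.filename = filename
        self.max_lines = max_lines
        self.lines = []

    def write(self, line):
        self.lines.append(line)
        if len(self.lines) >= self.max_lines:
            self.flush()

    def flush(self):
        if not self.lines:
            return
        # open in append mode so concurrent appenders stay at end-of-file,
        # then flush and fsync so the data really reaches the disk
        with open(self.filename, 'a') as f:
            f.writelines(self.lines)
            f.flush()
            os.fsync(f.fileno())
        self.lines = []
```

This trades the per-write open/close cost for a bounded window of unflushed data (`max_lines` lines at most), which is the tradeoff the answer describes.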
In general global should be avoided if possible.
The reason that people use the with statement when dealing with files is that it explicitly controls the scope. Once the with block is done, the file is closed; the file variable is still around, but it refers to a closed file.
You can avoid using the with statement, but then you must remember to call myfile.close(), particularly if you're dealing with a lot of files.
One way that avoids using the with block that also avoids using global is
def filesave(f_obj, string):
    f_obj.write(string)

f = open(filename, 'a')
filesave(f, "test1\n")
filesave(f, "test2\n")
f.close()
However at this point you'd be better off getting rid of the function and just simply doing:
f = open(filename, 'a')
f.write("test1\n")
f.write("test2\n")
f.close()
At which point you could easily put it within a with block:
with open(filename, 'a') as f:
    f.write("test1\n")
    f.write("test2\n")
So yes. There's no hard reason to not do what you're doing. It's just not very Pythonic.
The latter code may be more efficient, but the former code is safer: it makes sure that the content each call to FileSave writes is flushed to the filesystem, so other processes can read the updated content, and by closing the file handle on each call (using open as a context manager) you give other processes a chance to write to the file as well (this matters specifically on Windows).
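There is also a middle ground between the two versions in the question: keep one handle open (cheap), but flush after every write so other processes see the content promptly. A sketch; `make_saver` is an invented helper, not from the question:

```python
import os

def make_saver(filename):
    """Return a save function that holds one open handle but flushes
    (and fsyncs) each write, so readers never lag behind."""
    f = open(filename, 'a')

    def save(content):
        f.write(content)
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # ask the OS to push it to disk

    return save
```

This keeps the open/close cost out of the hot path while preserving the "writes are visible immediately" guarantee; it does not, however, release the handle for other writers the way open-per-call does.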
It really depends on the circumstances, but here are some thoughts:
A with block absolutely guarantees that the file will be closed once the block is exited. Python does not perform any weird optimizations for appending to files.
In general, globals make your code less modular, and therefore harder to read and maintain. You would think the original FileSave function is attempting to avoid globals, but it's using the global name filename anyway, so you may as well use a global file object at that point, as it will save you some I/O overhead.
A better option would be to avoid globals altogether, or at least to use them properly. You really don't need a separate function to wrap file.write, but if it represents something more complex, here is a design suggestion:
def save(file, content):
    print(content, file=file)

def my_thing(filename):
    with open(filename, 'a') as f:
        # do some stuff
        save(f, 'test1')
        # do more stuff
        save(f, 'test2')

if __name__ == '__main__':
    my_thing('myfile.txt')
Notice that when you call the module as a script, a file name defined in the global scope will be passed in to the main routine. However, since the main routine does not reference global variables, you can A) read it easier because it's self contained, and B) test it without having to wonder how to feed it inputs without breaking everything else.
Also, by using print instead of file.write, you avoid having to append newlines manually.

Python 'with' implementation [duplicate]

I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement designed to be used for?
What do you use it for?
Are there any gotchas I need to be aware of, or common anti-patterns associated with its use? Any cases where it is better to use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343. For instance, the file object returned by open is itself a context manager, which lets you open a file, keep it open as long as execution is inside the with block where you used it, and close it as soon as you leave the context, no matter whether you left it because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os

@contextmanager
def working_directory(path):
    current_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(current_dir)

with working_directory("data/stuff"):
    # do something within data/stuff
    pass
# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys

@contextmanager
def redirected(**kwds):
    stream_names = ["stdin", "stdout", "stderr"]
    old_streams = {}
    try:
        for sname in stream_names:
            stream = kwds.get(sname, None)
            if stream is not None and stream != getattr(sys, sname):
                old_streams[sname] = getattr(sys, sname)
                setattr(sys, sname, stream)
        yield
    finally:
        for sname, stream in old_streams.iteritems():
            setattr(sys, sname, stream)

with redirected(stdout=open("/tmp/log.txt", "w")):
    # these print statements will go to /tmp/log.txt
    print "Test entry 1"
    print "Test entry 2"

# back to the normal stdout
print "Back to normal stdout again"
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from contextlib import contextmanager
from tempfile import mkdtemp
from shutil import rmtree

@contextmanager
def temporary_dir(*args, **kwds):
    name = mkdtemp(*args, **kwds)
    try:
        yield name
    finally:
        rmtree(name)

with temporary_dir() as dirname:
    # do whatever you want
    pass
I would suggest two interesting reads:
PEP 343 The "with" Statement
Effbot's Understanding Python's "with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
    data = foo_file.read()
OR
from contextlib import nested
with nested(A(), B(), C()) as (X, Y, Z):
    do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
    for line in input_file:
        output_file.write(parse(line))
OR
lock = threading.Lock()
with lock:
    # Critical section of code
    ...
3.
I don't see any Antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers' habit of using try..catch..finally statements from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits the resources are cleanly released regardless of the outcome of the code in the block (that is whether the block exits normally or because of an exception).
Many resources in the Python library obey the protocol required by the with statement and so can be used with it out-of-the-box. However, anyone can make resources usable in a with statement by implementing the well-documented protocol: PEP 0343
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
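A minimal illustration of that protocol — `ManagedResource` is a made-up class, purely for demonstration:

```python
class ManagedResource:
    """Minimal sketch of the protocol PEP 343 describes: any object with
    __enter__/__exit__ works in a with statement."""

    def __enter__(self):
        self.acquired = True   # acquire the resource here
        return self            # this is what `as` binds to

    def __exit__(self, exc_type, exc_value, traceback):
        self.acquired = False  # release it, even if the block raised
        return False           # False means: do not suppress exceptions
```

With this, `with ManagedResource() as res:` guarantees that `acquired` is reset on exit, whether the block finished normally or raised.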
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing and for some activities I need the Decimal library for arbitrary precision calculations. Some part of my code I need high precision and for most other parts I need less precision.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext

with localcontext() as ctx:
    ctx.prec = 42  # Perform a high precision calculation
    s = calculate_something()
s = +s  # Round the final result back to the default precision
I use this a lot with the Hypergeometric Test, which requires the division of large numbers resulting from factorials. When you do genomic-scale calculations you have to be careful of round-off and overflow errors.
An example of an antipattern might be to use the with inside a loop when it would be more efficient to have the with outside the loop
for example
for row in lines:
    with open("outfile", "a") as f:
        f.write(row)
vs
with open("outfile", "a") as f:
    for row in lines:
        f.write(row)
The first way opens and closes the file for each row, which may cause performance problems compared to the second way, which opens and closes the file just once.
See PEP 343 - The 'with' statement; there is an example section at the end.

... new statement "with" to the Python language to make it possible to factor out standard uses of try/finally statements.
points 1, 2, and 3 being reasonably well covered:
4: it is relatively new, only available in python2.6+ (or python2.5 using from __future__ import with_statement)
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
Another example for out-of-the-box support, and one that might be a bit baffling at first when you are used to the way built-in open() behaves, are connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out-of-the-box in a with-statement, however when using the above note that:
When the with-block is finished, either with an exception or without, the connection is not closed. In case the with-block finishes with an exception, the transaction is rolled back; otherwise the transaction is committed.
This means that the programmer has to take care to close the connection themselves, but it allows them to acquire a connection and use it in multiple with-statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)

conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
In Python, the with statement is generally used to open a file, process the data present in the file, and then close the file without calling a close() method. The with statement makes exception handling simpler by providing cleanup activities.
General form of with:
with open("file name", "mode") as file_var:
    processing statements
Note: there is no need to close the file by calling file_var.close().
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
    data = file.read()
open returns a file object
Since 2.6, Python file objects have had the methods __enter__ and __exit__.
with calls __enter__, runs the block once, and then calls __exit__ (even if the block raises).
with works with any instance that has __enter__ and __exit__.
On Windows, a file is locked and not re-usable by other processes until it's closed; __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm
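Roughly, the expansion those methods imply can be written out by hand. This sketch simplifies the real expansion, which also passes exception details to __exit__:

```python
def manual_with(filename):
    """Roughly what `with open(filename) as f: data = f.read()` does,
    spelled out with explicit __enter__/__exit__ calls (simplified)."""
    manager = open(filename)
    f = manager.__enter__()       # runs before the block
    try:
        return f.read()           # the body of the with block
    finally:
        manager.__exit__(None, None, None)  # runs on any exit path
```

The real with statement also feeds the exception type, value, and traceback to __exit__ when the block raises, instead of the three Nones used here.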

subprocess stdout/stderr to finite size logfile

I have a process which chats a lot to stderr, and I want to log that stuff to a file.
foo 2> /tmp/foo.log
Actually I'm launching it with python subprocess.Popen, but it may as well be from the shell for the purposes of this question.
with open('/tmp/foo.log', 'w') as stderr:
    foo_proc = subprocess.Popen(['foo'], stderr=stderr)
The problem is after a few days my log file can be very large, like >500 MB. I am interested in all that stderr chat, but only the recent stuff. How can I limit the size of the logfile to, say, 1 MB? The file should be a bit like a circular buffer in that the most recent stuff will be written but the older stuff should fall out of the file, so that it never goes above a given size.
I'm not sure if there's an elegant Unixey way to do this already which I'm simply not aware of, with some sort of special file.
An alternative solution with log rotation would be sufficient for my needs as well, as long as I don't have to interrupt the running process.
You should be able to use the stdlib logging package to do this. Instead of connecting the subprocess' output directly to a file, you can do something like this:
import logging

logger = logging.getLogger('foo')

def stream_reader(stream):
    while True:
        line = stream.readline()
        if not line:
            break  # EOF: the subprocess closed its end of the pipe
        logger.debug('%s', line.strip())
This just logs every line received from the stream, and you can configure logging with a RotatingFileHandler which provides log file rotation. You then arrange to read this data and log it.
import threading

foo_proc = subprocess.Popen(['foo'], stderr=subprocess.PIPE)
thread = threading.Thread(target=stream_reader, args=(foo_proc.stderr,))
thread.daemon = True  # optional
thread.start()
# do other stuff
thread.join()  # await thread termination (optional for daemons)
Of course you can call stream_reader(foo_proc.stderr) too, but I'm assuming you might have other work to do while the foo subprocess does its stuff.
Here's one way you could configure logging (code that should only be executed once):
import logging, logging.handlers
handler = logging.handlers.RotatingFileHandler('/tmp/foo.log', 'a', 100000, 10)
logging.getLogger().addHandler(handler)
logging.getLogger('foo').setLevel(logging.DEBUG)
This will create up to 10 files of 100K named foo.log (and after rotation foo.log.1, foo.log.2 etc., where foo.log is the latest). You could also pass in 1000000, 1 to give you just foo.log and foo.log.1, where the rotation happens when the file would exceed 1000000 bytes in size.
A true circular buffer would be hard to implement, as you would constantly have to rewrite the whole file as soon as something falls out.
The approach with logrotate or something similar would be the way to go. In that case, you would simply do something like this:
import subprocess
import signal

def hupsignal(signum, frame):
    # logrotate sends SIGHUP after rotating; reopen the new file
    global logfile
    logfile.close()
    logfile = open('/tmp/foo.log', 'ab')

logfile = open('/tmp/foo.log', 'ab')
signal.signal(signal.SIGHUP, hupsignal)

foo_proc = subprocess.Popen(['foo'], stderr=subprocess.PIPE)
for chunk in iter(lambda: foo_proc.stderr.read(8192), b''):
    # iterate until EOF occurs
    logfile.write(chunk)
    # or do you want to rotate yourself?
    # Then omit the signal stuff and do it here.
    # if logfile.tell() > MAX_FILE_SIZE:
    #     logfile.close()
    #     logfile = open('/tmp/foo.log', 'ab')
It is not a complete solution; think of it as pseudocode, as it is untested and I am not sure about the syntax in one or two places. It probably needs some modification to make it work, but you should get the idea. It is also an example of how to make it work with logrotate; of course, you can rotate your logfile yourself, if needed.
You may be able to use the properties of 'open file descriptions' (distinct from, but closely related to, 'open file descriptors'). In particular, the current write position is associated with the open file description, so two processes that share a single open file description can each adjust the write position.
So, in context, the original process could retain the file descriptor for standard error of the child process, and periodically, when the position reaches your 1 MiB size, reposition the pointer to the start of the file, thus achieving your required circular buffer effect.
The biggest problem is determining where the current messages are being written, so that you can read from the oldest material (just in front of the file position) to the newest material. It is unlikely that new lines overwriting the old will match exactly, so there'd be some debris. You might be able to follow each line from the child with a known character sequence (say 'XXXXXX'), and then have each write from the child reposition to overwrite the previous marker...but that definitely requires control over the program that's being run. If it is not under your control, or cannot be modified, that option vanishes.
An alternative would be to periodically truncate the file (maybe after copying it), and to have the child process write in append mode (because the file is opened in the parent in append mode). You could arrange to copy the material from the file to a spare file before truncating to preserve the previous 1 MiB of data. You might use up to 2 MiB that way, which is a lot better than 500 MiB and the sizes could be configured if you're actually short of space.
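The copy-then-truncate idea could be sketched like this. `rotate_keep_last` is an invented helper name, and as the answer notes, the child must be writing in append mode for this to be safe:

```python
import os
import shutil

def rotate_keep_last(path, spare_path, max_bytes=1024 * 1024):
    """Sketch: when the live log grows past max_bytes, copy it aside
    and truncate it in place, keeping the child's append-mode fd valid."""
    if os.path.getsize(path) > max_bytes:
        shutil.copyfile(path, spare_path)   # preserve the previous chunk
        with open(path, 'r+') as f:
            f.truncate(0)                   # empty the live log in place
```

Run periodically from the parent, this keeps at most `max_bytes` in the live file plus one spare copy, roughly the 2 MiB worst case the answer describes.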
Have fun!