In my Flask application I have implemented logging using the logging library. It is currently handled in the block below:
if __name__ == "__main__":
    """Runs the webserver.

    Finally block is used for some logging management. It will first shut down
    logging, to ensure no files are open, then renames the file to 'log_'
    + the current date, and finally moves the file to the /logs archive
    directory.
    """
    try:
        session_management.clean_uploads_on_start(UPLOAD_FOLDER)
        app.run(debug=False)
    finally:
        try:
            logging.shutdown()
            new_log_file_name = log_management.rename_log(app.config['DEFAULT_LOG_NAME'])
            log_management.move_log(new_log_file_name)
        except FileNotFoundError:
            logging.warning("Current log file not found")
        except PermissionError:
            logging.warning("Permissions lacking to rename or move log.")
I discovered that the file is not renamed and moved if the cmd prompt is force-closed or the server crashes. I thought it might be better to put the rename and move into the initial 'try' block, before the server starts, but I run into issues because I have a config file (imported by this script) which contains the following code:
logging.basicConfig(filename='current_log.log', level=logging.INFO,
                    filemode='a',
                    format='%(asctime)s:%(levelname)s:%(message)s')
I have tried something like the code below, but I still run into permission errors, I think because the log_management script also imports config. Further, I could not find a function that starts the logging system, analogous to the logging.shutdown() used when the system ends; otherwise I would shut logging down, move the file (if it exists), and then start it back up.
try:
    session_management.clean_uploads_on_start(UPLOAD_FOLDER)
    log_management.check_log_on_startup(app.config['DEFAULT_LOG_NAME'])
    import config
    app.run(debug=False)
finally:
    try:
        logging.shutdown()
        new_log_file_name = log_management.rename_log(app.config['DEFAULT_LOG_NAME'])
        log_management.move_log(new_log_file_name)
    except FileNotFoundError:
        logging.warning("Current log file not found")
    except PermissionError:
        logging.warning("Permissions lacking to rename or move log.")
# (in another script)
def check_log_on_startup(file_name):
    if os.path.exists(file_name):
        move_log(rename_log(file_name))
Any suggestions much welcomed, because I feel like I'm at a brick wall!
As you have already found out, trying to perform cleanups at the end of your process life cycle has the potential to fail if the process terminates uncleanly.
The issue with performing the cleanup at the start is that you apparently call logging.basicConfig from your import before attempting to move the old log file.
This leads to the implicitly created FileHandler holding an open file object on the existing log when you attempt to rename and move it. Depending on the file system you are using, this might not be met with joy.
If you want to move the handling of potential old log files to the start of your application completely, you have to perform the renaming and moving before you call logging.basicConfig, so you'll have to remove it from your import and add it to the log_management somehow.
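A minimal sketch of that ordering might look like the following (LOG_NAME, ARCHIVE_DIR and archive_old_log are illustrative names, and the archive directory is assumed to already exist):

import logging
import os
from datetime import datetime

LOG_NAME = 'current_log.log'   # must match the filename passed to basicConfig
ARCHIVE_DIR = 'logs'           # assumed to exist

def archive_old_log():
    # Rename and move a leftover log before any handler opens it.
    if os.path.exists(LOG_NAME):
        archived_name = datetime.now().strftime('log_%Y%m%d_%H%M%S.log')
        os.rename(LOG_NAME, os.path.join(ARCHIVE_DIR, archived_name))

archive_old_log()  # must run before anything calls logging.basicConfig
logging.basicConfig(filename=LOG_NAME, level=logging.INFO, filemode='a',
                    format='%(asctime)s:%(levelname)s:%(message)s')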
As an alternative, you could move the whole handling of log files to the logging file handler by subclassing the standard FileHandler class, e.g.:
import logging
import os
from datetime import datetime


class CustomFileHandler(logging.FileHandler):
    def __init__(self, filename, archive_path='archive', archive_name='log_%Y%m%d', **kwargs):
        self._archive = os.path.join(archive_path, archive_name)
        self._archive_log(filename)
        super().__init__(filename, **kwargs)

    def _archive_log(self, filepath):
        if os.path.exists(filepath):
            os.rename(filepath, datetime.now().strftime(self._archive))

    def close(self):
        super().close()
        self._archive_log(self.baseFilename)
With this, you would configure your logging like so:
hdler = CustomFileHandler('current.log')

logging.basicConfig(level=logging.INFO, handlers=[hdler],
                    format='%(asctime)s:%(levelname)s:%(message)s')
The CustomFileHandler will check for, and potentially archive, old logs during initialization. This will deal with leftovers after an unclean process termination where the shutdown cleanup cannot take place. Since the parent class initializer is called after the log archiving is attempted, there is not yet an open handle on the log that would cause a PermissionError.
The overridden close() method will perform the archiving on a clean process shutdown.
This should remove the need for the dedicated log_management module, at least as far as the functions you show in your code are concerned. rename_log, move_log and check_log_on_startup are all encapsulated in the CustomFileHandler. There is also no need to explicitly call logging.shutdown().
Some notes:
The reason you cannot find a start function equivalent to logging.shutdown() is that the logging system is started/initialized when you import the logging module. Among other things, this instantiates the implicit root logger and registers logging.shutdown as an exit handler via atexit.
The latter is the reason why there is no need to explicitly call logging.shutdown() with the above solution. The Python interpreter will call it during finalization when preparing for interpreter shutdown due to the exit handler registration. logging.shutdown() then iterates through the list of registered handlers and calls their close() methods, which will perform the log archiving during a clean shutdown.
Depending on the method you choose for moving (and renaming) the old log file, the above solution might need some additional safeguards against exceptions. On Windows, os.rename will raise an exception if the destination path already exists, i.e. when you have already stopped and restarted your process on the same day, while os.replace would silently overwrite the existing file. See more details about moving files via Python here.
Thus I would recommend naming the archived logs not only by the current date but also by the time.
In the above, adding the current date to the archive file name is done via datetime's strftime, hence the 'log_%Y%m%d' as default for the archive_name parameter of the custom file handler. The characters with a preceding % are valid format codes that strftime() replaces with the respective parts of the datetime object it is called on. To append the current time to the archive log file name you would simply append the respective format codes to the archive_name, e.g.: 'log_%Y%m%d_%H%M%S' which would result in a log name such as log_20200819_123721.
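For example, sticking with the handler above (purely illustrative; the archive directory is assumed to exist):

# Archives a leftover log as e.g. logs/log_20200819_123721
hdler = CustomFileHandler('current.log', archive_path='logs',
                          archive_name='log_%Y%m%d_%H%M%S')
logging.basicConfig(level=logging.INFO, handlers=[hdler],
                    format='%(asctime)s:%(levelname)s:%(message)s')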
Related
I am using the logging module in Python inside a function. A simplified structure of the code is shown below.
def testfunc(df):
    import logging
    import sys
    from datetime import datetime

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # to print to the screen
    ch = logging.StreamHandler(sys.__stdout__)
    ch.setLevel(logging.INFO)
    logger.addHandler(ch)

    # to print to file
    fh = logging.FileHandler('./data/treatment/Treatment_log_' + str(datetime.today().strftime('%Y-%m-%d')) + '.log')
    fh.setLevel(logging.INFO)
    logger.addHandler(fh)

    # several lines of code and some information like:
    logger.info('Loop starting...')
    for i in range(6):  # actually a long for-loop
        # several lines of somewhat slow code (even with multiprocessing) and some information like:
        logger.info('test ' + str(i))

    logging.shutdown()
    return None
So, I know:
the logger needs to be shut down (logging.shutdown());
and it is included at the end of the function.
The issue is:
the actual function deals with subsets of a data frame, and sometimes it results in an error because of insufficient data, etc.
If I run the function again, all the messages are repeated twice (or even more times, if I have to run it again).
The situation is similar to the ones reported here, here, and here, for example... but slightly different...
I get it: this is because the logging module was not shut down and the handlers were not removed... And I understand that in the final function I should anticipate such situations and include steps to avoid raising errors, like shutting down the logger before finishing the function, etc... But currently I am even using the log information to identify such situations...
My question is: how can I shut it down once such a situation (the function aborted because of an error) has happened, given that at the moment I am just testing the code? Currently, the only way to make it stop is to start a new console in Spyder (in my understanding, restarting the kernel). What is the correct procedure in this situation?
I appreciate any help...
I suppose you can check first to see if there is any existing logger
loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
if there isn't, you can create the logger
if there is, don't create a new one
Alternatively, you can have another file set up the logger and call that file through a subprocess.Popen() or similar.
The code for the first option is from here: How to list all existing loggers using python.logging module
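Applied to the testfunc above, that first option amounts to only attaching handlers when the root logger does not have any yet. A rough sketch of that variation (the log path is copied from the question and assumed to exist):

import logging
import sys
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)

if not logger.handlers:  # only attach handlers the first time this runs
    ch = logging.StreamHandler(sys.__stdout__)
    ch.setLevel(logging.INFO)
    logger.addHandler(ch)

    fh = logging.FileHandler('./data/treatment/Treatment_log_'
                             + datetime.today().strftime('%Y-%m-%d') + '.log')
    fh.setLevel(logging.INFO)
    logger.addHandler(fh)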
I'm implementing a simple logging class that writes out some messages to a log file. I am not sure how to manage the opening/closing of the file in a sensible and pythonic way.
I understand that the idiomatic way to write to files is via the with statement. Therefore this is a simplified version of the code I have:
class Logger():
    def __init__(self, filename, mode='w', name='root'):
        self.filename = filename
        self.name = name
        # remove the previous content of the file if mode for logger is 'w'
        if mode == 'w':
            with open(self.filename, 'w') as f:
                f.write('')

    def info(self, msg):
        with open(self.filename, 'a') as f:
            f.write(f'INFO:{self.name}:{msg}\n')


logger = Logger('log.txt')
logger.info('Starting program')
The problem is that this implementation will open and close the file as many times as the logger is called, which will be hundreds of times. I'm concerned about the overhead this adds to the program (its runtime is important). It would perhaps be more sensible to open the file when the logger is created and close it when the program finishes. But this goes against the "use with" rule, and there is a serious risk that I (or the user of the class) will forget to manually close the file at the end. Another problem with this approach is that if I want to create different loggers that dump to the same file, I'll have to add careful checks to know whether the file has already been opened by previous loggers...
So all in all, what's the most pythonic and sensible way to handle the opening/closing of files in this context?
While I agree with the other comments that the most pythonic way is to use the standard lib, I'll try to answer your question as it was asked.
I think the with construct is great, but that doesn't mean it works in every situation. Opening a file handle and keeping it for continual use is not unpythonic if it makes sense in your situation (IMO). Opening, doing something, and closing it in the same function with try/except/finally blocks would be. I think it may be preferable to only open the file when you first try to use it (instead of at creation time), but that can depend on the rest of the application.
If you start creating different loggers that write to the same file within the same process, I would think the goal would be to have a single open file handle that all the loggers write to, instead of each logger having its own handle. But multi-instance and multi-process logging synchronization is where the stdlib shines, so... your mileage may vary.
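To illustrate the "open on first use, keep the handle" idea, here is a rough sketch (not a replacement for the stdlib logging module; atexit is used only as a best-effort safety net for the final close):

import atexit

class Logger:
    def __init__(self, filename, mode='w', name='root'):
        self.filename = filename
        self.name = name
        self.mode = mode
        self._file = None            # opened lazily on first write
        atexit.register(self.close)  # best-effort close at interpreter exit

    def info(self, msg):
        if self._file is None:
            self._file = open(self.filename, self.mode)
            self.mode = 'a'          # any later reopen appends instead of truncating
        self._file.write(f'INFO:{self.name}:{msg}\n')

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None

logger = Logger('log.txt')
logger.info('Starting program')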
I'm working on a project that requires me to use psutil. I was trying to compare some values to those in a .txt file, but for some reason, whenever I called the psutil.Process.exe() method outside an if statement it ended up with an Access Denied exception. Let me show you what I mean:
import psutil
import time

ini = 'start'

def getTaskList():
    list_of_ran_proccesses = []
    for procs in psutil.process_iter():
        list_of_ran_proccesses.append(procs)
    return list_of_ran_proccesses

def CompareRunningFiles():
    if ini == "start":
        list_of_old_procs = getTaskList()
        while list_of_old_procs == getTaskList():
            time.sleep(0.01)
        for new_procs in psutil.process_iter():
            if not new_procs in list_of_old_procs:
                print(new_procs.exe())

CompareRunningFiles()
This example works completely fine, but if I do this:
import psutil
import time

ini = 'start'

def getTaskList():
    list_of_ran_proccesses = []
    for procs in psutil.process_iter():
        list_of_ran_proccesses.append(procs)
    return list_of_ran_proccesses

def CompareRunningFiles():
    if ini == "start":
        list_of_old_procs = getTaskList()
        while list_of_old_procs == getTaskList():
            time.sleep(0.01)
        for new_procs in psutil.process_iter():
            print(new_procs.exe())

CompareRunningFiles()
This for some reason ends up with an Access Denied exception.
Thank you for all your answers :)
Edit: I'm not sure, but could this be because the module is trying to access some protected directories?
With the if statement it would only try to get the directory of whatever process was just launched, but without the if statement it would try to access all sorts of running processes.
So when it comes across a system process, it would try to get its directory too, and if that process runs inside a protected directory, it would raise an Access Denied exception.
Basically, the if statement prevents the program from trying to get the directory of some system processes (they are always running). Some directories in Windows are protected and cannot be accessed directly; without the if statement, the module tries to get the directories of all running processes, which raises an exception when it does so for a process running from a protected directory (for example, System Idle Process). Using (as Omer said) a "try: ... except psutil.AccessDenied: pass" will skip those processes and prevent this issue. Thank you Omer for the explanation, and tripleee for the same :D
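For reference, the skip pattern mentioned above might look like this (a sketch; psutil.NoSuchProcess is caught as well, since a process can exit between iteration and the .exe() call):

import psutil

for proc in psutil.process_iter():
    try:
        print(proc.exe())
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        # protected system processes (or ones that just exited) are skipped
        pass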
I'd like to ask if there's any way to watch a directory in Python and parse the latest text file being generated in it.
Though I have this startup code which parses a certain text file:
import time

def follow(thefile):
    thefile.seek(0, 2)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

if __name__ == '__main__':
    logfile = open(r'\\some directory\files.txt', "r")
    loglines = follow(logfile)
    for line in loglines:
        print line,
See files.txt above: I need that to be dynamic, watching the directory for newly generated text files and switching to the latest text file to parse it.
It will run on Windows XP Service Pack 3.
I'm using Python 2.7.
The directory I'm watching is also on Windows XP.
Thank you.
To check for new files, repeatedly get a list of files currently in the directory with os.listdir('directory'). Save the entries in a set and calculate the difference of the set with the previous set.
# Initialize before the event loop:
old_entries = set()

# You need a loop that calls two handlers, each handler returning soon.
# Inside your loop, check for a "new file" event this way:
now_entries = set(os.listdir(r'\\some directory'))
for new_entry in now_entries - old_entries:
    handle_new_file(new_entry)
old_entries = now_entries
Your program needs to listen for two events:
New file in the directory.
New line in the old file.
You call follow(), which is like an event handler that never returns. I think you want that handler to return to a single main event loop that checks for each kind of event. Your follow() function never returns because it stays inside its while True loop, only yielding when a new line is added to the file; it will never yield if no more lines are being added to that file.
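A rough sketch of such a main loop, combining the directory check above with non-blocking reads of the current file (the path, the sort order, and the "switch to whichever file appeared last" behaviour are all assumptions):

import os
import time

WATCH_DIR = r'\\some directory'  # placeholder path from the question

def main_loop():
    old_entries = set()
    current_file = None
    while True:
        # Event 1: a new file appeared in the directory
        now_entries = set(os.listdir(WATCH_DIR))
        for new_entry in sorted(now_entries - old_entries):
            if current_file:
                current_file.close()
            # switch to the newly generated file and parse it from the start
            current_file = open(os.path.join(WATCH_DIR, new_entry), 'r')
        old_entries = now_entries

        # Event 2: a new line in the file currently being followed
        if current_file:
            line = current_file.readline()
            if line:
                print(line.rstrip())
                continue  # look for further lines right away

        time.sleep(0.1)

main_loop()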
Take a look at the FindFirstChangeNotification API:
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
The approach here is to use the MS FindFirstChangeNotification API, exposed via the pywin32 win32file module. It needs a little explanation: you get a change handle for a directory (optionally with its subdirectories) for certain kinds of change. You then use the ubiquitous WaitForSingleObject call from win32event, which fires when something's changed in one of your directories.
Essentially, because the Windows OS is responsible for managing the creation/modification of files, you can ask it to let you know immediately when a file is changed or created.
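A condensed sketch along the lines of that article, assuming pywin32 is installed (the path is a placeholder):

import os
import win32con
import win32event
import win32file

PATH_TO_WATCH = r'\\some directory'  # placeholder path

# Ask Windows for a handle that gets signalled on file-name changes
change_handle = win32file.FindFirstChangeNotification(
    PATH_TO_WATCH, 0, win32con.FILE_NOTIFY_CHANGE_FILE_NAME)
try:
    old_contents = set(os.listdir(PATH_TO_WATCH))
    while True:
        # Wait up to 500 ms for the handle to be signalled
        result = win32event.WaitForSingleObject(change_handle, 500)
        if result == win32con.WAIT_OBJECT_0:
            new_contents = set(os.listdir(PATH_TO_WATCH))
            for added in sorted(new_contents - old_contents):
                print('New file: %s' % added)
            old_contents = new_contents
            # Re-arm the notification for the next change
            win32file.FindNextChangeNotification(change_handle)
finally:
    win32file.FindCloseChangeNotification(change_handle)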
I'm writing a program that adds normal UNIX accounts (i.e. modifying /etc/passwd, /etc/group, and /etc/shadow) according to our corp's policy. It also does some slightly fancy stuff like sending an email to the user.
I've got all the code working, but there are three pieces of code that are very critical, which update the three files above. The code is already fairly robust because it locks those files (e.g. /etc/passwd.lock), writes to a temporary file (e.g. /etc/passwd.tmp), and then overwrites the original file with the temporary one. I'm fairly pleased that it won't interfere with other running versions of my program or the system useradd, usermod, passwd, etc. programs.
The thing that I'm most worried about is a stray ctrl+c, ctrl+d, or kill command in the middle of these sections. This has led me to the signal module, which seems to do precisely what I want: ignore certain signals during the "critical" region.
I'm using an older version of Python, which doesn't have signal.SIG_IGN, so I have an awesome "pass" function:
def passer(*a):
    pass
The problem that I'm seeing is that signal handlers don't work the way that I expect.
Given the following test code:
import signal
import sys
import time

def passer(a=None, b=None):
    pass

def signalhander(enable):
    # note: SIGKILL cannot be caught or ignored; signal.signal() raises an error for it
    signallist = (signal.SIGINT, signal.SIGQUIT, signal.SIGABRT, signal.SIGPIPE, signal.SIGALRM, signal.SIGTERM, signal.SIGKILL)
    if enable:
        for i in signallist:
            signal.signal(i, passer)
    else:
        for i in signallist:
            signal.signal(i, abort)
    return

def abort(a=None, b=None):
    sys.exit('\nAccount was not created.\n')
    return

signalhander(True)
print('Enabled')
time.sleep(10)  # ^C during this sleep
The problem with this code is that a ^C (SIGINT) during the time.sleep(10) call causes that function to stop, and then my signal handler takes over as desired. However, that doesn't solve my "critical" region problem above, because I can't tolerate having whatever statement encounters the signal fail.
I need some sort of signal handler that will just completely ignore SIGINT and SIGQUIT.
The Fedora/RH command "yum" is written in Python and does basically exactly what I want. If you do a ^C while it's installing anything, it will print a message like "Press ^C within two seconds to force kill." Otherwise, the ^C is ignored. I don't really care about the two-second warning since my program completes in a fraction of a second.
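(Roughly, I picture that yum-style behaviour as something like the sketch below; this is only my guess at the pattern, not yum's actual code.)

import signal
import sys
import time

_last_interrupt = [0.0]  # mutable holder so the handler can update it

def delayed_interrupt_handler(signum, frame):
    # First ^C only warns; a second ^C within two seconds really exits.
    now = time.time()
    if now - _last_interrupt[0] < 2:
        sys.exit('\nAborted by user.\n')
    _last_interrupt[0] = now
    print('Press ^C again within two seconds to abort.')

signal.signal(signal.SIGINT, delayed_interrupt_handler)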
Could someone help me implement a signal handler for CPython 2.3 that doesn't cause the current statement/function to cancel before the signal is ignored?
As always, thanks in advance.
Edit: After S.Lott's answer, I've decided to abandon the signal module.
I'm just going to go back to try: except: blocks. Looking at my code, there are two things that happen for each critical region and cannot be aborted: overwriting the file with file.tmp, and removing the lock once finished (otherwise other tools will be unable to modify the file until it is manually removed). I've put each of those in its own function inside a try: block, and the except: simply calls the function again. That way the function will just re-call itself in the event of a KeyboardInterrupt or EOFError, until the critical code is completed.
I don't think I can get into too much trouble since I'm only catching user-provided exit commands, and even then, only for two to three lines of code. Theoretically, if those exceptions could be raised fast enough, I suppose I could get the "maximum recursion depth exceeded" error, but that seems unlikely.
Any other concerns?
Pseudo-code:
import os
import shutil

def criticalRemoveLock(file):
    try:
        if os.path.isfile(file):
            os.remove(file)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalRemoveLock(file)

def criticalOverwrite(tmp, file):
    try:
        if os.path.isfile(tmp):
            shutil.copy2(tmp, file)
            os.remove(tmp)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalOverwrite(tmp, file)
There is no real way to make your script truly safe. Of course you can ignore signals and catch a keyboard interrupt using try: except:, but it is up to your application to be idempotent against such interrupts, and it must be able to resume operations after dealing with an interrupt from some kind of savepoint.
The only thing that you can really do is work on temporary files (not the original files) and move them into the final destination after doing the work. I think such file operations are supposed to be "atomic" from the filesystem perspective. Otherwise, in case of an interrupt, restart your processing from the start with clean data.
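As an illustration of that temp-file-then-move pattern (a sketch only; atomic_overwrite is a made-up name, and os.replace needs Python 3.3+, while plain os.rename gives the same overwrite behaviour on POSIX):

import os
import tempfile

def atomic_overwrite(path, data):
    # Write to a temp file in the same directory, then move it over the
    # original in one step so readers never see a partially written file.
    dir_name = os.path.dirname(path) or '.'
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic replacement of the destination
    except BaseException:
        os.remove(tmp_path)
        raise

atomic_overwrite('settings.conf', 'new contents\n')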