Print status update without loop iteration (Python)

I have a very simple function that is just a list of data writes, but each write takes 5-10s, so the function takes about an hour to run. Since there is no loop, there is no iteration variable. What is the best way to update progress to the user?

Have you considered the logging module? You can create different kinds of handlers to, well, handle log messages. Here's a simple example, but the general idea is that you can just put logging calls in your script that write to a file, to a stream that prints to the console, or somewhere else.
import datetime
import logging
import time

logger = logging.getLogger('a_name')
logger.setLevel(logging.DEBUG)
sh = logging.StreamHandler()  # prints to the console
logger.addHandler(sh)

with open('/tmp/test_file.txt', 'w') as f:
    logger.info('beginning writing file at ' + str(datetime.datetime.now()))
    time.sleep(30)  # this is a proxy for doing some file writing
    logger.info('the time now is ' + str(datetime.datetime.now()))
    ...
    logger.info('file done being written')
You might want to look into formatting the log messages so you don't have to include the datetime string in an inelegant way like I did.
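For example, here is a minimal sketch of the same idea with a Formatter attached, so the timestamp is added for you (the format string is just one possibility):
import logging
import time

logger = logging.getLogger('a_name')
logger.setLevel(logging.DEBUG)
sh = logging.StreamHandler()
# the formatter prepends the timestamp to every message
sh.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(sh)

logger.info('beginning writing file')  # e.g. "2019-05-28 08:10:03,123 INFO beginning writing file"
time.sleep(30)
logger.info('file done being written')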

Related

How to shut down the logger in Python after a function containing it is interrupted?

I am using the logging module in python inside a function. A simplified structure of the code is like below.
def testfunc(df):
    import logging
    import sys
    from datetime import datetime

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    # to print to the screen
    ch = logging.StreamHandler(sys.__stdout__)
    ch.setLevel(logging.INFO)
    logger.addHandler(ch)
    # to print to a file
    fh = logging.FileHandler('./data/treatment/Treatment_log_' + str(datetime.today().strftime('%Y-%m-%d')) + '.log')
    fh.setLevel(logging.INFO)
    logger.addHandler(fh)
    # several lines of code and some information like:
    logger.info('Loop starting...')
    for i in range(6):  # actually a long for-loop
        # several lines of somewhat slow code (even with multiprocessing) and some information like:
        logger.info('test ' + str(i))
    logging.shutdown()
    return None
So, I know:
the logger needs to be shut down (logging.shutdown());
and that call is included at the end of the function.
The issue is:
the actual function deals with subsets of a data frame, and sometimes it fails with an error because there is insufficient data, etc.
If I then run the function again, all messages are logged twice (or even more times, if I have to run it again).
The situation resembles the ones reported here, here, and here, for example... but it is slightly different...
I get it: this happens because the logging module was not shut down and the handlers were not removed... And I understand that the final version of the function should anticipate such situations and include steps to avoid raising errors, like shutting down the logger before finishing the function... But at the moment I am using the log information precisely to identify such situations...
My question is: how can I shut the logger down once such a situation (function aborted because of an error) has happened, given that I am just testing the code? Currently, the only way I have found to make it stop is to start a new console in Spyder (as I understand it, restarting the kernel). What is the correct procedure in this situation?
I appreciate any help...
I suppose you can first check whether any loggers already exist:
loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
If there aren't any, create the logger; if there are, don't create a new one.
Alternatively, you can have another file set up the logger and call that file through subprocess.Popen() or similar.
The code for the first option is from here: How to list all existing loggers using python.logging module
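Another option, since the real problem is handlers that survive an aborted run (a sketch, not from the original answer): attach the handlers inside a try/finally so they are removed even when the function raises partway through:
import logging
import sys

def testfunc(df):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    ch = logging.StreamHandler(sys.__stdout__)
    logger.addHandler(ch)
    try:
        logger.info('Loop starting...')
        # ... the actual work, which may raise on insufficient data ...
    finally:
        # runs on success and on error, so reruns do not accumulate handlers
        logger.removeHandler(ch)
        ch.close()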

How to append a script's printed output to a text file? [duplicate]

I found some code online that generally works, but I want to use it multiple times in the same program (write different things to different files, while still printing to the screen the whole time).
The trouble is that when it closes, I think sys.stdout closes too, so printing at all, as well as using this class again, fails. I tried reimporting sys, and other dumb stuff, but I can't get it to work.
Here's the site, and the code
groups.google.com/group/comp.lang.python/browse_thread/thread/d25a9f5608e473af/
import sys

class MyWriter:
    def __init__(self, stdout, filename):
        self.stdout = stdout
        self.logfile = file(filename, 'a')
    def write(self, text):
        self.stdout.write(text)
        self.logfile.write(text)
    def close(self):
        self.stdout.close()
        self.logfile.close()

writer = MyWriter(sys.stdout, 'log.txt')
sys.stdout = writer
print 'test'
You are trying to reproduce poorly something that is done very well by the Python Standard Library; please check the logging module.
With this module you can do exactly what you want, but in a much simpler, standard, and extensible manner. You can proceed as follows (this example is a copy/paste from the logging cookbook):
Let’s say you want to log to console and file with different message
formats and in differing circumstances. Say you want to log messages
with levels of DEBUG and higher to file, and those messages at level
INFO and higher to the console. Let’s also assume that the file should
contain timestamps, but the console messages should not. Here’s how
you can achieve this:
import logging

# set up logging to file - see previous section for more details
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
                    datefmt='%m-%d %H:%M',
                    filename='/tmp/myapp.log',
                    filemode='w')
# define a Handler which writes INFO messages or higher to sys.stderr
console = logging.StreamHandler()
console.setLevel(logging.INFO)
# set a format which is simpler for console use
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
# tell the handler to use this format
console.setFormatter(formatter)
# add the handler to the root logger
logging.getLogger().addHandler(console)

# Now, we can log to the root logger, or any other logger. First the root...
logging.info('Jackdaws love my big sphinx of quartz.')

# Now, define a couple of other loggers which might represent areas in your
# application:
logger1 = logging.getLogger('myapp.area1')
logger2 = logging.getLogger('myapp.area2')

logger1.debug('Quick zephyrs blow, vexing daft Jim.')
logger1.info('How quickly daft jumping zebras vex.')
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
When you run this, on the console you will see
root : INFO Jackdaws love my big sphinx of quartz.
myapp.area1 : INFO How quickly daft jumping zebras vex.
myapp.area2 : WARNING Jail zesty vixen who grabbed pay from quack.
myapp.area2 : ERROR The five boxing wizards jump quickly.
and in the file you will see something like
10-22 22:19 root INFO Jackdaws love my big sphinx of quartz.
10-22 22:19 myapp.area1 DEBUG Quick zephyrs blow, vexing daft Jim.
10-22 22:19 myapp.area1 INFO How quickly daft jumping zebras vex.
10-22 22:19 myapp.area2 WARNING Jail zesty vixen who grabbed pay from quack.
10-22 22:19 myapp.area2 ERROR The five boxing wizards jump quickly.
As you can see, the DEBUG message only shows up in the file. The other
messages are sent to both destinations.
This example uses console and file handlers, but you can use any
number and combination of handlers you choose.
Easy-peasy with Python 3.3 and above
Starting with Python 3.3, this has become significantly easier, since logging.basicConfig now accepts the handlers argument.
import logging

level = logging.INFO
format = ' %(message)s'
handlers = [logging.FileHandler('filename.log'), logging.StreamHandler()]
logging.basicConfig(level=level, format=format, handlers=handlers)
logging.info('Hey, this is working!')
Note, however, that certain Python modules may also post logging messages at the INFO level.
This is where it comes in handy to create a custom logging level, called, for example, OK: 5 levels above the default INFO level and 5 levels below the default WARNING level.
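A minimal sketch of such a custom level (the name OK and the convenience method are illustrative, not part of the standard library):
import logging

OK = logging.INFO + 5  # 25: above INFO (20), below WARNING (30)
logging.addLevelName(OK, 'OK')

def ok(self, message, *args, **kwargs):
    # convenience method, mirroring Logger.info() / Logger.warning()
    if self.isEnabledFor(OK):
        self._log(OK, message, args, **kwargs)

logging.Logger.ok = ok

logging.basicConfig(level=OK, format='%(levelname)s %(message)s')
logging.getLogger(__name__).ok('chatty INFO messages from other modules are filtered out')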
I know this is an old question, and the best answer is just to use logging for its intended purpose, but I just wanted to point out that if you're concerned only with affecting calls specifically to print (and not other interaction with sys.stdout), and you just want to paste a few lines into some old one-off script, nothing stops you from simply reassigning the name to a different function that writes to both destinations, since print is a function in Python 3+. You could even, god forbid, use a lambda with an or chain for the quickest, dirtiest solution out there:
old_print = print
log_file = open("logfile.log", "a")
print = lambda *args, **kw: old_print(*args, **kw) or old_print(*args, file=log_file, **kw)
print("Hello console and log file")
# ... more calls to print() ...
log_file.close()
Or for true fire-and-forget:
import atexit
old_print = print
log_file = open("logfile.log", "a")
atexit.register(log_file.close)
print = lambda *args, **kw: old_print(*args, **kw) or old_print(*args, file=log_file, **kw)
# ... do calls to print(), and you don't even have to close the file afterwards ...
It works fine assuming the program exits properly, but please no one use this in production code, just use logging :)
Edit: If you value some form of structure and want to write to the log file in real-time, consider something like:
from typing import Callable

def print_logger(
    old_print: Callable,
    file_name: str,
) -> Callable:
    """Returns a function which calls `old_print` twice, specifying a `file=` on the second call.

    Arguments:
        old_print: The `print` function to call twice.
        file_name: The name to give the log file.
    """
    def log_print(*args, **kwargs):
        old_print(*args, **kwargs)
        with open(file_name, "a") as log_file:
            old_print(*args, file=log_file, **kwargs)
    return log_print
And then invoke as follows:
print = print_logger(print, "logs/my_log.log")
Remove the line that's doing what you explicitly say you don't want done: the first line of close(), which closes stdout.
The trouble is that when it closes, I think sys.stdout closes too, so
printing at all, as well as using this class again, fails. I tried
reimporting sys, and other dumb stuff, but I can't get it to work.
To answer your question, you should not be closing stdout. The Python interpreter opens stdout, stdin, and stderr at startup. For print to work, the interpreter requires stdout to be open. Reimporting sys does not do anything once a module has been loaded; you would need to reload the module, and in this particular case I am not sure even a reload would fix the problem, since sys.stdout exposes stdout as a file object.
Additionally, I think you have a bug in your code that may be causing print to break. In line 2 you are assigning a MyWriter object to sys.stdout. This may be closing stdout when the garbage collector deletes the unused stdout file object.
writer = MyWriter(sys.stdout, 'log.txt')
sys.stdout = writer
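Concretely, here is a corrected sketch of the class (updated to Python 3 open()/print(), with a flush() added since print() may call it): close only the log file and leave stdout alone:
import sys

class MyWriter:
    def __init__(self, stdout, filename):
        self.stdout = stdout
        self.logfile = open(filename, 'a')
    def write(self, text):
        self.stdout.write(text)
        self.logfile.write(text)
    def flush(self):
        self.stdout.flush()
        self.logfile.flush()
    def close(self):
        # close only the log file; the interpreter still needs stdout
        self.logfile.close()
        sys.stdout = self.stdout  # restore the real stdout

writer = MyWriter(sys.stdout, 'log.txt')
sys.stdout = writer
print('test')   # goes to both the console and log.txt
writer.close()  # printing keeps working afterwards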

Append environment variables in log format in Python

I have a RabbitMQ consumer that gets messages with a unique_id from another service. I need to set some environment variables that are printed in every log line within that message-processing block.
import logging
import os
LOG_FORMAT = '%(asctime)s - %(levelname)-7s %(message)s : unique_id : ' + os.getenv("unique_id", "1")
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
I am also updating the os variable before the message-processing block:
os.environ["unique_id"] = new_unique_id
However, with the above code the log format does not pick up the latest value of unique_id; it always prints 1. It seems the value is only read at initialization.
I only want to append the updated unique_id to each log message in that particular block.
As this is a very big project, I don't want to add the id to each log call manually; I want to use a log format here.
I am new to Python, so I am not able to understand what I am doing wrong.
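The usual approach to this (a sketch; the filter class name is illustrative) is to inject the current value into each record with a logging.Filter, because the format string is only evaluated once, when the logger is configured:
import logging
import os

class UniqueIdFilter(logging.Filter):
    def filter(self, record):
        # read the env var at log time, so each record sees the latest value
        record.unique_id = os.getenv('unique_id', '1')
        return True  # never drop the record

LOG_FORMAT = '%(asctime)s - %(levelname)-7s %(message)s : unique_id : %(unique_id)s'
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
for handler in logging.getLogger().handlers:
    handler.addFilter(UniqueIdFilter())

os.environ['unique_id'] = 'abc-123'
logging.info('processing message')  # ... : unique_id : abc-123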

How to avoid creating an empty log file if there is no valid parsing output in Python?

I have a Python script that runs log parsing: it periodically scans through some log files on the disk and tries to parse them. If a file is not parsable or has no data, my code should just exit without parsing.
The problem is that my script generates an empty log file even when there is no valid data, i.e.:
-rw-r--r-- 1 user userid 0 May 28 08:10 parse.py_20190528_08_10_03.log
I guess this is probably because the logger is already initialized when my script starts.
What I want to know is whether there is some setting that avoids this. I checked a few places but could not figure out how.
This is my logger setup in my script:
import logging
import os
import sys
import time

logger = logging.getLogger('upgrade.py')
formatter = logging.Formatter("%(asctime)s - %(levelname)-8s %(message)s")
log_filename = '{}/{}_{}.log'.format(os.getcwd(), os.path.basename(sys.argv[0]), time.strftime("%Y%m%d_%H_%M_%S"))
fh = logging.FileHandler(log_filename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)
After my parsing function, I use the checks below to make sure it does not dump data if there is no valid data.
main()
    # ...parsing...
    if len(outputs) != 0:
        logger.info(outputs)
    # ...
    # ...output filtering...
    if len(out_list) == 0:
        exit(0)
    # ...
However, this still does not prevent the script from creating a 0 KB file in my directory. I trigger this tool from crontab, and it runs periodically, generating lots of such files, which is annoying and makes checking harder.
I know I could also have some outside watcher script clear those files, but that is not an elegant solution.
You can achieve this by setting the delay parameter to True for the FileHandler:
fh = logging.FileHandler(log_filename, delay=True)
From the docs:
https://docs.python.org/3/library/logging.handlers.html#logging.FileHandler
If delay is true, then file opening is deferred until the first call
to emit().
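A quick way to see the effect (file and logger names are illustrative):
import logging
import os

fh = logging.FileHandler('parse.log', delay=True)
logger = logging.getLogger('parser')
logger.addHandler(fh)

print(os.path.exists('parse.log'))  # False: nothing has been emitted yet
logger.warning('found a problem')   # the first emit() opens the file
print(os.path.exists('parse.log'))  # True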

Python logging adds duplicate entries

I am adding logging to my Python code. Messages are logged to the file correctly, but duplicate messages appear, as if already-logged entries are re-logged to the file each time.
This is my code:
import logging

logger = logging.getLogger('Sample')
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(message)s')
handler = logging.FileHandler('./sample.log')
handler.setFormatter(formatter)
logger.addHandler(handler)

def add(x, y):
    return x + y

num_1 = 10
num_2 = 5
add_result = add(num_1, num_2)
logger.debug("Result: %s " % add_result)
Output:
1st run: one entry
2nd run: three entries
3rd run: six entries
Try saving your script to a file test_log.py and then running python test_log.py from the terminal to start your script. This way, each run appends exactly one log message to sample.log, as expected.
I guess you ran your code multiple times in an interactive Python shell. The line logger.addHandler(handler) then adds a new handler to your logger object on every run, so after running your code twice you actually have two handlers that are both writing into your sample.log; hence the duplicated entries.
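If you do want to re-run the code in the same interactive session, a common guard (a sketch, not from the original answer) is to attach the handler only when none is present yet:
import logging

logger = logging.getLogger('Sample')
logger.setLevel(logging.DEBUG)
if not logger.handlers:  # only attach a handler on the first run
    handler = logging.FileHandler('./sample.log')
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)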
Also, try changing your formatter to
formatter = logging.Formatter('%(asctime)-15s %(message)s').
This will add a timestamp to your log messages (format year-month-day hour:minutes:seconds,milliseconds), allowing you to better debug your code.
