Python logging adds duplicate entries

I am adding logging to my Python code. Messages are written to the file, but duplicates appear: entries that were already logged are logged again on subsequent runs.
This is my code:
import logging

logger = logging.getLogger('Sample')
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(message)s')
handler = logging.FileHandler('./sample.log')
handler.setFormatter(formatter)
logger.addHandler(handler)

def add(x, y):
    return x + y

num_1 = 10
num_2 = 5

add_result = add(num_1, num_2)
logger.debug("Result: %s" % add_result)
Output:
1st run: one entry in sample.log
2nd run: three entries
3rd run: six entries

Try saving your script to a file test_log.py and then run python test_log.py from the terminal. This way, each run should append exactly one log message to sample.log, as expected.
I guess you ran your code multiple times in an interactive Python shell. The line logger.addHandler(handler) then adds another handler to your logger object on every run, so after running your code twice you actually have two handlers that are both writing into your sample.log, hence the duplicated entries.
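If you do want to re-run the setup in the same interpreter session, a guard like the following (a minimal sketch, not part of the original answer) keeps handlers from piling up:

import logging

logger = logging.getLogger('Sample')
logger.setLevel(logging.DEBUG)

# Only attach a FileHandler if this logger does not have one yet,
# so re-running this block in the same session does not stack handlers.
if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
    handler = logging.FileHandler('./sample.log')
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)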
Also, try changing your formatter to
formatter = logging.Formatter('%(asctime)-15s %(message)s').
This will add a timestamp to your log messages (format year-month-day hour:minutes:seconds,milliseconds), allowing you to better debug your code.

Related

How to shut down the logger in Python after a function containing it is interrupted?

I am using the logging module in Python inside a function. A simplified structure of the code is shown below.
def testfunc(df):
    import logging
    import sys
    from datetime import datetime

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # to print to the screen
    ch = logging.StreamHandler(sys.__stdout__)
    ch.setLevel(logging.INFO)
    logger.addHandler(ch)

    # to print to file
    fh = logging.FileHandler('./data/treatment/Treatment_log_' + str(datetime.today().strftime('%Y-%m-%d')) + '.log')
    fh.setLevel(logging.INFO)
    logger.addHandler(fh)

    # several lines of code and some information like:
    logger.info('Loop starting...')
    for i in range(6):  # actually a long for-loop
        # several lines of somewhat slow code (even with multiprocessing) and some information like:
        logger.info('test ' + str(i))

    logging.shutdown()
    return None
So, I know:
the logger needs to be shut down (logging.shutdown());
and it is included at the end of the function.
The issue is:
the actual function deals with subsets of a data frame, and sometimes it results in an error because there is not enough data, etc.
If I run the function again, all messages are repeated twice (or even more times, if I need to run it again).
The situation resembles the ones reported here, here, and here, for example... but is slightly different.
I get that this is because the logging module was not shut down and the handlers were not removed... And I understand that, for the final function, I should anticipate such situations and include steps to avoid raising errors, like shutting down the logger and finishing the function cleanly. But currently I am even using the log information to identify such situations...
My question is: how can I shut the logger down once such a situation (the function aborted because of an error) has happened, in my current setting, where I am just testing the code? Currently, the only way to make it stop is to start a new console in Spyder (in my understanding, restarting the kernel). What is the correct procedure in this situation?
I appreciate any help...
I suppose you can first check whether there is any existing logger:
loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
If there isn't one, you can create the logger; if there is, don't create a new one.
Alternatively, you can have another file set up the logger and call that file through a subprocess.Popen() or similar.
The code for the first option is from here: How to list all existing loggers using python.logging module
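For the interrupted-function case itself, one option (a minimal sketch, not taken from the answer above, assuming the goal is simply that handlers never survive a failed run) is to detach and close the handlers in a finally block, which runs even when the function raises or is interrupted:

import logging
import sys

def testfunc(df):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    ch = logging.StreamHandler(sys.__stdout__)
    fh = logging.FileHandler('./treatment.log')  # hypothetical path for this sketch
    logger.addHandler(ch)
    logger.addHandler(fh)
    try:
        logger.info('Loop starting...')
        # ... the actual processing, which may raise ...
    finally:
        # Executed on success, on an exception, and on KeyboardInterrupt,
        # so repeated runs do not accumulate handlers on the root logger.
        for handler in (ch, fh):
            logger.removeHandler(handler)
            handler.close()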

Append environment variables in log format in Python

I have a RabbitMQ consumer that gets messages from another service with a unique_id. I need to set an environment variable that is printed in every log line within the block that processes that message.
import logging
import os
LOG_FORMAT = f'%(asctime)s - %(levelname)-7s %(message)s : unique_id : {os.getenv("unique_id", "1")}'
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
I am also updating the environment variable before the message-processing block:
os.environ["unique_id"] = new_unique_id
However, with the above code the log format does not pick up the latest value of unique_id; it always prints 1. It seems the value is only read at initialization.
I only want to append the unique_id, with its updated value, to each log message within that particular block.
As it is a very big project, I don't want to append this to every log call manually; I want to use a log format for it.
I am new to Python, so I am not able to understand what I am doing wrong.
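One common way to get a per-record value into the format string (shown here as a sketch, not taken from this thread, and assuming the value should be read from os.environ each time a record is emitted) is a logging.Filter attached to the handlers:

import logging
import os

class UniqueIdFilter(logging.Filter):
    """Attach the current value of the unique_id env var to every record."""
    def filter(self, record):
        record.unique_id = os.getenv("unique_id", "1")  # read at emit time, not at setup
        return True

LOG_FORMAT = '%(asctime)s - %(levelname)-7s %(message)s : unique_id : %(unique_id)s'
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
for handler in logging.getLogger().handlers:
    handler.addFilter(UniqueIdFilter())

os.environ["unique_id"] = "abc-123"   # hypothetical id set before processing a message
logging.info("processing message")    # this line now shows abc-123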

pypi opencensus-ext-azure not functioning properly (sends the same data multiple times + not sending logging.info() traces)

I am using the following function to send standard logging output from Databricks to Azure Application Insights logs.
My function:
import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler
from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer

def custom_logging_function(log_type, instrumentation_key_value, input_x):
    """
    Purpose: The standard output sent to Application Insights logs
    Inputs: -
    Return: -
    """
    config_integration.trace_integrations(['logging'])
    logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
    tracer = Tracer(sampler=AlwaysOnSampler())
    logger = logging.getLogger(__name__)
    logger.addHandler(AzureLogHandler(connection_string='InstrumentationKey={0}'.format(instrumentation_key_value)))

    if log_type == "INFO" or log_type == "SUCESSFULL":
        # [UPDATE]
        logger.setLevel(logging.INFO)
        logger.info(input_x)
        #logging.info(input_x)
    elif log_type == "ERROR":
        # [UPDATE]
        logger.setLevel(logging.ERROR)
        logger.exception(input_x)
        #logging.exception(input_x)
    else:
        logger.warning(input_x)
[UPDATE]
By setting the logging level to INFO or ERROR you can log the different types of traces.
Even though this function executes correctly, it is faulty for the following two reasons:
Reason 1
When I want to log a logger.info() message, it is not logged successfully in Application Insights. For some inexplicable reason, only the logger.warning() messages are successfully sent to Application Insights logs.
For example,
custom_logging_function("INFO", instrumentation_key_value, "INFO: {0} chronical dates in the specified time-frame have been created!".format(len(date_list)))
# Uses logger.info() based on my function!
Output
This is never logged. Instead, only the following is logged:
custom_logging_function("WARNING", instrumentation_key_value, "INFO: {0} chronical dates in the specified time-frame have been created!".format(len(date_list)))
# Uses logger.warning() based on my function!
Output
Reason 1 has been solved by me; please check my function edit.
------------------------------------------------------------------------
Reason 2
The same message is logged multiple times, instead of only once.
Some code to reproduce the problem:
import math
from datetime import datetime, timedelta

# Set keyword parameters
time_scale = 12
time_frame_repetition = 1
timestamp_snapshot = datetime.utcnow()

round_up = math.ceil(time_frame_repetition * 365 / time_scale)

day_list = [(timestamp_snapshot - timedelta(days=x)).strftime("%d") for x in range(round_up)]
month_list = [(timestamp_snapshot - timedelta(days=x)).strftime("%m") for x in range(round_up)]
year_list = [(timestamp_snapshot - timedelta(days=x)).strftime("%Y") for x in range(round_up)]

date_list = [[day_list[i], month_list[i], year_list[i]] for i in range(0, len(day_list))]

custom_logging_function("INFO", instrumentation_key_value, "INFO: {0} chronical dates in the specified time-frame have been created!".format(len(date_list)))  # the function already written at the start of my post
The output of the above code snippet is logged more than once in Application Insights, and I am trying to figure out why.
Output log in Application Insights
As you can see from the output of the query, the same row is logged multiple times.
What are your suggestions on the second matter, since the first one was solved?
[UPDATE] based on the answer provided below by @Izchen
def instantiate_logger(instrumentation_key_value):
    config_integration.trace_integrations(['logging'])
    logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
    tracer = Tracer(sampler=AlwaysOnSampler())
    logger = logging.getLogger(__name__)
    logger.addHandler(AzureLogHandler(connection_string='InstrumentationKey={0}'.format(instrumentation_key_value)))
    return logger

logging_instance = instantiate_logger(instrumentation_key_value)

def custom_logging_function(logging_instance, disable_logging, log_type, input_x, *arguments):
    """
    Purpose: The standard output sent to Application Insights logs
    Inputs: -
    Return: The logger object.
    """
    if disable_logging == 0:
        if log_type == "INFO" or log_type == "SUCCESSFUL":
            logging_instance.setLevel(logging.INFO)
            logging_instance.info(input_x)
            print(input_x, *arguments)
        elif log_type == "ERROR":
            logging_instance.setLevel(logging.ERROR)
            logging_instance.exception(input_x)
            print(input_x, *arguments)
        else:
            logging_instance.warning(input_x)
            print(input_x, *arguments)
    else:
        print(input_x, *arguments)
Still, the code above duplicates the output of this function:
date_list=merge_hierarchy_list(year_list, month_list, day_list, None, None)
custom_logging_function(logging_instance, disable_logging_value, "INFO", "INFO: {0} chronological dates in the specified time-frame have been created!".format(len(date_list)))
Output (logged 2 times in Application Insights Log traces):
"INFO: 31 chronological dates in the specified time-frame have been created!"
For reason 2:
Are you running your Python file within Databricks notebooks? The notebooks keep the state of all objects that are instantiated (including the Python logger used). We have come across duplicated log entries before when users run their code multiple times in notebooks, because the AzureLogHandler is added as a handler to the logger every time the code is executed again. Running as a normal Python module should not cause this behaviour, since state is not kept between subsequent runs.
If you are not using notebooks, then the issue seems to be something adding the AzureLogHandler multiple times. Are there multiple workers of some sort in your Databricks pipeline executing the same logic?
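One way to protect against this when re-running notebook cells (a minimal sketch, assuming the duplication comes from attaching the handler repeatedly to the same logger object) is to add the AzureLogHandler only if it is not already attached:

import logging
from opencensus.ext.azure.log_exporter import AzureLogHandler

def get_logger(instrumentation_key_value):
    logger = logging.getLogger(__name__)
    # Re-running this cell reuses the same logger object, so only add the
    # handler if an AzureLogHandler is not attached yet.
    if not any(isinstance(h, AzureLogHandler) for h in logger.handlers):
        logger.addHandler(AzureLogHandler(
            connection_string='InstrumentationKey={0}'.format(instrumentation_key_value)))
    return logger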

How to avoid creating an empty log file if there is no valid parsing output in Python?

I have a Python script that runs log parsing: it periodically scans some log files on disk and tries to parse them. If a file is not parsable or has no data, the code should simply exit the parsing.
The problem is that my script generates an empty log file even when there is no valid data, i.e.:
-rw-r--r-- 1 user userid 0 May 28 08:10 parse.py_20190528_08_10_03.log
I guess this is because the logger is already initialized when my script starts.
What I want to know is whether there is a setting that avoids this. I checked a few places but could not figure out how.
This is how the logger is set up in my script:
import logging
import os
import sys
import time

logger = logging.getLogger('upgrade.py')
formatter = logging.Formatter("%(asctime)s - %(levelname)-8s %(message)s")
log_filename = '{}/{}_{}.log'.format(os.getcwd(), os.path.basename(sys.argv[0]), time.strftime("%Y%m%d_%H_%M_%S"))

fh = logging.FileHandler(log_filename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)
After my parsing function, I use the checks below to make sure nothing is written if there is no valid data.
main()
......parsing....
if len(outputs) != 0:
    logger.info(outputs)
.......
.... output filtering.....
if len(out_list) == 0:
    exit(0)
.....
However, this still does not prevent a 0 KB file from being created in my directory. I trigger this tool from crontab and it runs periodically, which generates lots of such files, which is annoying and makes checking harder.
I know I could also have an outside watcher script clear those files, but that is not a clean solution.
You can achieve this by setting the delay parameter to True for the FileHandler:
fh = logging.FileHandler(log_filename, delay=True)
From the docs:
https://docs.python.org/3/library/logging.handlers.html#logging.FileHandler
If delay is true, then file opening is deferred until the first call
to emit().
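As a quick illustration (a sketch, assuming the rest of the setup above stays the same), no file appears on disk until the first record is actually emitted:

import logging

logger = logging.getLogger('upgrade.py')
logger.setLevel(logging.INFO)

fh = logging.FileHandler('parse_example.log', delay=True)  # hypothetical filename
fh.setLevel(logging.INFO)
logger.addHandler(fh)

# At this point parse_example.log does not exist yet.
logger.info('only now is the file created and written')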

Print status update without loop iteration (python)

I have a very simple function that is just a list of data writes, but each write takes 5-10s, so the function takes about an hour to run. Since there is no loop, there is no iteration variable. What is the best way to update progress to the user?
Have you considered the logging module? You can create different kinds of handlers to, well, handle log messages. Here's a simple example, but the general idea is you can just put logging messages in your script that write to a file, a stream that prints to the console, or something else.
import datetime
import logging
import time

logger = logging.getLogger('a_name')
logger.setLevel(logging.DEBUG)
sh = logging.StreamHandler()  # prints to console
logger.addHandler(sh)

with open('/tmp/test_file.txt', 'w') as f:
    logger.info('beginning writing file at ' + str(datetime.datetime.now()))
    time.sleep(30)  # this is a proxy for doing some file writing
    logger.info('the time now is ' + str(datetime.datetime.now()))
    ...
    logger.info('file done being written')
You might want to look into formatting the log messages so you don't have to include the datetime string in an inelegant way like I did.
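For instance, a Formatter with asctime (a small sketch along those lines) adds the timestamp automatically:

import logging

logger = logging.getLogger('a_name')
logger.setLevel(logging.DEBUG)

sh = logging.StreamHandler()
sh.setFormatter(logging.Formatter('%(asctime)s %(message)s'))  # timestamp added by the formatter
logger.addHandler(sh)

logger.info('beginning writing file')  # e.g. "2019-05-28 08:10:03,123 beginning writing file"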
