Writing custom log files in Databricks Repos using the logging package

I would like to capture custom metrics as a notebook runs in Databricks and write them to a file using the logging package. The code below seems to run fine, but it never writes to the file. How do you achieve this in Databricks Runtime 9.1?
Also note that I am running this in Repos, so I have to explicitly write it to a location. Furthermore, this code runs perfectly fine when run from my workspace.
import logging
logger = logging.getLogger('server_logger')
logger.setLevel(logging.INFO)
fh = logging.FileHandler('/dbfs/tmp/my_log.log')
fh.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
fh.setFormatter(formatter)
logger.addHandler(fh)
logger.warning('starting to log the process')

Perhaps the /dbfs/tmp directory doesn't exist, or you don't have write access to it. When I change the log filename to just my_log.log, it works as expected:
~/SO-logging-misc$ python so_74519222.py
~/SO-logging-misc$ more my_log.log
2022-11-21 14:33:22 - WARNING - starting to log the process
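
A minimal sketch of how one might rule out the two causes suggested above (missing directory, no write access) before attaching the handler. The helper name make_file_logger is hypothetical, and whether this resolves the Repos case on Databricks is an assumption, not something the answer confirms.

```python
import logging
import os


def make_file_logger(name, log_path):
    """Attach a FileHandler to logger `name`, failing loudly if the path is unusable."""
    log_dir = os.path.dirname(os.path.abspath(log_path))
    os.makedirs(log_dir, exist_ok=True)  # create the directory if it is missing
    if not os.access(log_dir, os.W_OK):
        raise PermissionError(f"cannot write to {log_dir}")

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid stacking a second file handler if the cell is re-run.
    if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
        fh = logging.FileHandler(log_path)
        fh.setFormatter(logging.Formatter(
            "%(asctime)s - %(levelname)s - %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        ))
        logger.addHandler(fh)
    return logger
```

Calling make_file_logger('server_logger', '/dbfs/tmp/my_log.log') would then either log to the file or raise an error that says why it cannot, instead of silently writing nothing.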

Related

logging to a file from bonobo etl

I have written a bonobo script to extract some data, and I would like to use python's logging module to write some status messages to a file while my job runs. I've done the following:
import logging
from datetime import date
logging.basicConfig(filename=INFO["LOGFILE_PATH"] + r'\bonobo_job_' + date.today().isoformat(),
                    filemode='a',
                    format='%(name)s - %(levelname)s - %(message)s')
If I simply run the script in PyCharm, it logs to the file as I would expect. But if I run it from the command line with the bonobo run command, it ignores the filename and logs to stdout. How do I fix this? Is there a flag or environment variable I need to set somewhere?

Okay, I figured it out. For some reason, basicConfig doesn't work. I had to use getLogger and add a FileHandler. So in main I did this:
logger = logging.getLogger('bonobo_logger')
ch = logging.FileHandler(logfilename)
formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)
Then in every node in my graph where I wanted to do logging, I called:
logger = logging.getLogger('bonobo_logger')
and used the logger object to write out all messages. If anyone knows a better way of doing it, please let me know.
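
One possible explanation, offered as an assumption rather than something verified against bonobo: logging.basicConfig does nothing if the root logger already has handlers, so if the bonobo run command configures logging first, the later basicConfig call is silently ignored. On Python 3.8+, the force=True argument removes the existing root handlers before applying the new configuration. The function name configure_file_logging below is hypothetical.

```python
import logging


def configure_file_logging(logfile_path):
    """Point the root logger at a file, replacing any handlers already installed."""
    logging.basicConfig(
        filename=logfile_path,
        filemode="a",
        format="%(name)s - %(levelname)s - %(message)s",
        level=logging.INFO,
        force=True,  # Python 3.8+: remove and close existing root handlers first
    )
```

Note that force=True also discards whatever handlers the framework installed on the root logger, so the named-logger approach above is the less invasive choice if that output is still wanted.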

Python: How to color logs while printing to a file?

A little context on what I am doing: I am running some Python scripts through a different programming language on an industrial controller. Since I am not running the Python scripts directly, I can't watch any print or log statements from the terminal, so I need to send the detailed logs to a log file.
Since we are logging a lot of information when debugging, I wanted to find a way to color the log file the way coloredlogs colors logs printed to the terminal. I looked at coloredlogs, but it appears that it can only print colored logs to files when using Vim. Does anyone know a way to print colored logs to a file using Python that can be opened with a program such as WordPad (maybe a .rtf file)?

One solution is to write ANSI escape sequences into the log file and use the Windows PowerShell Get-Content cmdlet to print it, which renders the log in color.
For example:
import coloredlogs
import logging
# Create a logger object.
logger = logging.getLogger(__name__)
# Create a filehandler object
fh = logging.FileHandler('spam.log')
fh.setLevel(logging.DEBUG)
# Create a ColoredFormatter to use as formatter for the FileHandler
formatter = coloredlogs.ColoredFormatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)
# Install the coloredlogs module on the root logger
coloredlogs.install(level='DEBUG')
logger.debug("this is a debugging message")
logger.info("this is an informational message")
logger.warning("this is a warning message")
logger.error("this is an error message")
logger.critical("this is a critical message")
In a Windows PowerShell session, you can then use Get-Content .\spam.log to print the logs in color.
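
If installing coloredlogs is not an option, the same idea can be sketched with the standard library alone: a Formatter subclass that wraps each line in ANSI escape sequences. The class name AnsiColorFormatter and the color palette are illustrative choices, not part of any library, and the file will only appear colored in a viewer that interprets ANSI sequences.

```python
import logging

# ANSI SGR color codes per level; the palette is an arbitrary choice.
COLORS = {
    logging.DEBUG: "\x1b[32m",     # green
    logging.INFO: "\x1b[36m",      # cyan
    logging.WARNING: "\x1b[33m",   # yellow
    logging.ERROR: "\x1b[31m",     # red
    logging.CRITICAL: "\x1b[35m",  # magenta
}
RESET = "\x1b[0m"


class AnsiColorFormatter(logging.Formatter):
    """Wrap each formatted line in the ANSI color for its level."""

    def format(self, record):
        line = super().format(record)
        color = COLORS.get(record.levelno, "")
        return f"{color}{line}{RESET}" if color else line
```

It is used like any other formatter, e.g. fh.setFormatter(AnsiColorFormatter('%(asctime)s - %(levelname)s - %(message)s')).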

Python logger: won't overwrite the original log?

When I copy and paste the following x times into the Python prompt, it adds the log entry x times to the end of the designated file. How can I change the code so that each time I paste it into the prompt, it simply overwrites the existing file? (The code does not seem to accept the mode='w' option, or I do not understand its meaning.)
def MinimalLogginf():
    import logging
    import os
    paths = {'work': ''}
    logger = logging.getLogger('oneDayFileLoader')
    LogHandler = logging.FileHandler(os.path.join(paths["work"], "oneDayFileLoader.log"), mode='w')
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    LogHandler.setFormatter(formatter)
    logger.addHandler(LogHandler)
    logger.setLevel(logging.DEBUG)
    # Let's say this is an error:
    if 1 == 1:
        logger.error('overwrite')
So I run it once:
MinimalLogginf()
Now, I want the new log file to overwrite the log file created on the previous run:
MinimalLogginf()

If I understand correctly, you're running a certain Python process for days at a time, and want to rotate the log every day. I'd recommend you go a different route, using a handler that automatically rotates the log file, e.g. http://www.blog.pythonlibrary.org/2014/02/11/python-how-to-create-rotating-logs/
But, if you want to control the log using the process in the same method you're comfortable with (Python console, pasting in code.. extremely unpretty and error prone, but sometimes quick-n-dirty is sufficient for the task at hand), well...
Your issue is that you create a new FileHandler each time you paste in the code, and you add it to the Logger object. You end up with a logger that has X FileHandlers attached to it, all of them writing to the same file. Try this:
import logging
import os
paths = {'work': ''}
logger = logging.getLogger('oneDayFileLoader')
# Close and drop the handler left over from a previous paste
if logger.handlers:
    logger.handlers[0].close()
    logger.handlers = []
logHandler = logging.FileHandler(os.path.join(paths["work"], "oneDayFileLoader.log"), mode='w')
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)
logger.setLevel(logging.DEBUG)
logger.error('overwrite')
Based on your request, I've also added an example using TimedRotatingFileHandler. Note I haven't tested it locally, so if you have issues ping back.
import logging
import os
from logging.handlers import TimedRotatingFileHandler
logPath = os.path.join('', "fileLoaderLog")
logger = logging.getLogger('oneDayFileLoader')
# Roll over to a new file once a day at midnight
logHandler = TimedRotatingFileHandler(logPath,
                                      when="midnight",
                                      interval=1)
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)
logger.setLevel(logging.DEBUG)
logger.error('overwrite')
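
The handler-reset approach can also be wrapped in a function that is safe to call repeatedly. This is a sketch, not the answerer's code: the name reset_file_logger is hypothetical, and unlike the snippet above it closes every attached handler rather than only the first one.

```python
import logging


def reset_file_logger(name, log_path):
    """Return logger `name` with exactly one FileHandler that truncates log_path."""
    logger = logging.getLogger(name)
    # Iterate over a copy because removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```

Each call to reset_file_logger('oneDayFileLoader', 'oneDayFileLoader.log') starts the file from scratch, because mode='w' truncates it when the new handler opens it and no stale handlers remain to append duplicates.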

Your log messages are being duplicated because you call addHandler more than once. Each call to addHandler adds an additional log handler.
If you want to make sure the file is created from scratch, add an extra line of code to remove it:
os.remove(os.path.join(paths["work"], "oneDayFileLoader.log"))

The mode is specified as part of logging.basicConfig and is passed through using filemode.
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(message)s',
    filename='oneDayFileLoader.log',
    filemode='w'
)
https://docs.python.org/3/library/logging.html#simple-examples

How to prevent FileHandler logger from impacting other threads?

I've got a custom django admin command and I want to capture the log output for when that command is run and make it available for download in a separate file. Similar to "Console Output" functionality in Jenkins. This command is invoked using django-after-response and I'm running uWSGI.
At the beginning of the admin command, I do this:
import logging
from tempfile import NamedTemporaryFile
deploy_log = NamedTemporaryFile()
formatter = logging.Formatter("%(asctime)-15s %(levelname)-8s %(message)s")
file_handler = logging.FileHandler(deploy_log.name)
file_handler.setFormatter(formatter)
file_handler.setLevel(logging.INFO)
# Attach to the root logger so every logger's output is captured
logging.getLogger('').addHandler(file_handler)
Then at the end of the admin command:
logging.getLogger('').removeHandler(file_handler)
The problem I'm running into is that when there are multiple 'deploys' running simultaneously, the deploy_log for one thread will have entries from other threads. How do I avoid this?

I believe I have found the solution. I had to add the following to my uwsgi vassal ini file:
enable-threads = true
Now the log files are not getting jumbled together.
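
Independently of the uWSGI setting, a handler attached to the root logger receives records from every thread in the process. A different technique from the fix above, sketched here under the assumption that each deploy runs entirely in its own thread, is to give the handler a filter that only accepts records from the thread that created it. The names CurrentThreadFilter and attach_thread_local_handler are hypothetical.

```python
import logging
import threading


class CurrentThreadFilter(logging.Filter):
    """Pass only records emitted by the thread that created this filter."""

    def __init__(self):
        super().__init__()
        self._thread_id = threading.get_ident()

    def filter(self, record):
        # LogRecord.thread holds the ident of the thread that emitted the record.
        return record.thread == self._thread_id


def attach_thread_local_handler(log_path, logger=None):
    """Attach a FileHandler to `logger` (root by default) that ignores other threads."""
    if logger is None:
        logger = logging.getLogger()
    handler = logging.FileHandler(log_path)
    handler.setLevel(logging.INFO)
    handler.setFormatter(
        logging.Formatter("%(asctime)-15s %(levelname)-8s %(message)s"))
    handler.addFilter(CurrentThreadFilter())
    logger.addHandler(handler)
    return handler
```

The returned handler should still be removed and closed at the end of the command, since thread identifiers can be reused after a thread exits.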

Python logging module logs on Mac, but not Linux

I am experiencing an issue with the logging module in my app. I am working in Eclipse against the LDT Python (Py 2.7) interface (rather than PyDev) on my MacBook Pro. The logging module works through Eclipse; however, when I transfer my app over to a RHEL5 machine running Python 2.7, logging does not seem to work at all. It does not throw any exceptions; it just does not log anything to the console or the file (it creates the file, though).
Code:
import datetime
import logging
# Initialize logging
log = logging.getLogger('pepPrep')
# Log to stderr
console = logging.StreamHandler()
console.setLevel(logging.INFO)
# Log to file
logname = 'pepPrep.' + datetime.datetime.now().strftime("%Y%m%d_%H:%M") + '.log'
filelog = logging.FileHandler(logname)
filelog.setLevel(logging.DEBUG)
# set a format
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
# tell the handlers to use this format
console.setFormatter(formatter)
filelog.setFormatter(formatter)
# add the handlers to the 'pepPrep' logger
log.addHandler(console)
log.addHandler(filelog)
log.info('This is a test')
log.debug('This is a test2')
Any pointers on how I can make this work?

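The default threshold for logging is WARNING, so INFO and DEBUG messages are not output by default. The handler levels only filter records that the logger has already let through. To lower the logger threshold, add e.g.
logging.getLogger().setLevel(logging.DEBUG)
to get DEBUG and INFO messages.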
You can confirm this is your problem by doing
log.warning('This is a test3')
before adding that setLevel, and confirming that the warning is actually output.
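
The effect of the logger level can be checked directly with isEnabledFor. This is a small sketch; the logger name pepPrep_demo is made up for illustration, and the root level is set explicitly to WARNING only to mirror the default.

```python
import logging

root = logging.getLogger()
log = logging.getLogger("pepPrep_demo")  # hypothetical name for this demo

# WARNING is the root logger's default level; set it explicitly for the demo.
root.setLevel(logging.WARNING)
# With no level of its own, the logger inherits WARNING from the root.
before = log.isEnabledFor(logging.INFO)

# Give the logger its own, lower threshold.
log.setLevel(logging.DEBUG)
after = log.isEnabledFor(logging.INFO)

print(before, after)
```

Here before is False and after is True: until the logger (or the root it inherits from) is lowered to INFO or DEBUG, those records are dropped before any handler sees them.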
