Python: why is logging in multiprocessing not working?

After porting my script from Mac to Windows (both Python 2.7.*), I find that none of the logging in the subprocess works; only the parent's log messages are written to the file. Here is my example code:
# test log among multiple process env
import logging, os
from multiprocessing import Process

def child():
    logging.info('this is child')

if __name__ == '__main__':
    logging.basicConfig(filename=os.path.join(os.getcwd(), 'log.out'),
                        level = logging.DEBUG, filemode='w',
                        format = '[%(filename)s:%(lineno)d]: %(asctime)s - %(levelname)s: %(message)s')
    p = Process(target = child, args = ())
    p.start()
    p.join()
    logging.info('this is father')
The output only writes "this is father" into log.out, and the child's log is missing. How can I make logging work in the child process?

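Each child is an independent process with its own logging configuration. On Windows there is no fork(): the child starts a fresh interpreter and re-imports your module, so the logging.basicConfig() call inside the if __name__ == '__main__': block never runs there, and the child's root logger keeps its default WARNING level with no file handler, which silently drops logging.info(). On a Mac with Python 2.7 the child is forked and inherits the parent's handler, which is why it appeared to work there. In any case, logging to the same file from multiple processes is not supported. See the documentation for suggested approaches.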
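A minimal sketch of that fix, assuming Python 3.8+ (for basicConfig(force=True)): the logging setup lives in a function that the child calls itself, so it runs whether the child was forked (Linux, or macOS with Python 2.7) or spawned (Windows). To stay clear of the shared-file problem, each child writes to its own file; setup_logging() and the log-<pid>.out naming are illustrative choices, not taken from the question.

```python
import logging
import multiprocessing as mp
import os

LOG_FORMAT = '[%(filename)s:%(lineno)d]: %(asctime)s - %(processName)s - %(levelname)s: %(message)s'


def setup_logging(filename, level=logging.DEBUG):
    """Configure the root logger of the *current* process.

    force=True (Python 3.8+) closes and replaces any handlers inherited
    through fork, so this behaves the same under fork and spawn.
    """
    logging.basicConfig(filename=filename, level=level, format=LOG_FORMAT, force=True)


def child(log_dir):
    # Runs in the child: under spawn, nothing from the parent's __main__ block ran here
    setup_logging(os.path.join(log_dir, f'log-{os.getpid()}.out'))
    logging.info('this is child')


if __name__ == '__main__':
    log_dir = os.getcwd()
    setup_logging(os.path.join(log_dir, 'log.out'))
    p = mp.Process(target=child, args=(log_dir,))
    p.start()
    p.join()
    logging.info('this is father')
```

After a run, log.out holds the parent's line and log-<pid>.out the child's.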

Related

Python create multiple log files

I am using batch processing and call the function below in parallel. I need to create a new log file for each process.
Below is sample code:
import logging

def processDocument(inputfilename):
    logfile=inputfilename+'.log'
    logging.basicConfig(
        filename=logfile,
        level=logging.INFO)
    # performing some function
    logging.info("process completed for file")
    logging.shutdown()
It creates a log file. But when I call this function 20 times in a batch, only 16 log files are created.

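These issues can happen with race conditions between threads. Another likely cause: logging.basicConfig() does nothing if the root logger already has handlers, and logging.shutdown() does not remove them, so only the first call in each process creates a file; later documents handled by the same process (or by other threads in it) log into that first file.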
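A small single-process sketch of that basicConfig() behaviour (Python 3.8+ for force=True): the second call is silently ignored, and logging.shutdown() would not change that, since it only flushes and closes handlers without removing them from the root logger. The helper names reset_root_logger(), root_log_files() and demo() are hypothetical.

```python
import logging
import os
import tempfile


def reset_root_logger():
    """Remove and close every root handler, so the demo starts from a clean slate."""
    root = logging.getLogger()
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()


def root_log_files():
    """File names the root logger currently writes to."""
    return [os.path.basename(h.baseFilename)
            for h in logging.getLogger().handlers
            if isinstance(h, logging.FileHandler)]


def demo(directory):
    reset_root_logger()
    # The same call as in processDocument(), made twice in one process
    logging.basicConfig(filename=os.path.join(directory, 'doc1.log'), level=logging.INFO)
    logging.basicConfig(filename=os.path.join(directory, 'doc2.log'), level=logging.INFO)
    after_two_calls = root_log_files()
    # force=True closes the existing root handlers and installs the new one
    logging.basicConfig(filename=os.path.join(directory, 'doc2.log'),
                        level=logging.INFO, force=True)
    return after_two_calls, root_log_files()


if __name__ == '__main__':
    print(demo(tempfile.mkdtemp()))  # (['doc1.log'], ['doc2.log'])
```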
If the documents are processed independently of each other, I would suggest using multiprocessing via the high-level class concurrent.futures.ProcessPoolExecutor.
If you want to stick to threads because the document processing is more I/O-bound, there is concurrent.futures.ThreadPoolExecutor, which provides the same interface but with threads.
Last but not least, configure your logging properly, as in this toy example (which uses the standard threading library):
import logging
from sys import stdout
import time
import threading

def processDocument(inputfilename: str):
    logfile = inputfilename + '.log'
    this_thread = threading.current_thread().name
    this_thread_logger = logging.getLogger(this_thread)
    # without this, the logger inherits the root's WARNING level and drops info()
    this_thread_logger.setLevel(logging.INFO)
    file_handler = logging.FileHandler(filename=logfile)
    out_handler = logging.StreamHandler(stdout)
    file_handler.setLevel(logging.INFO)
    out_handler.setLevel(logging.INFO)
    this_thread_logger.addHandler(file_handler)
    this_thread_logger.addHandler(out_handler)
    this_thread_logger.info(f'Processing {inputfilename} from {this_thread}...')
    time.sleep(1)  # processing
    this_thread_logger.info(f'Processing {inputfilename} from {this_thread}... Done')
    # detach and close the handlers so a reused thread name doesn't log twice
    for handler in (file_handler, out_handler):
        this_thread_logger.removeHandler(handler)
        handler.close()

def main():
    filenames = ['hello', 'hello2', 'hello3']
    threads = [threading.Thread(target=processDocument, args=(name,)) for name in filenames]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    logging.shutdown()

if __name__ == '__main__':
    main()
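A sketch of the ProcessPoolExecutor route mentioned above, keeping the question's processDocument() name but giving each document its own named logger and FileHandler instead of calling basicConfig(). The handler is removed and closed after every document because pool workers are reused; the 'doc.' logger prefix and the temporary working directory are illustrative assumptions.

```python
import logging
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor


def processDocument(inputfilename):
    logfile = inputfilename + '.log'
    # One logger per document; propagate=False keeps records away from root handlers
    logger = logging.getLogger('doc.' + os.path.basename(inputfilename))
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter('%(asctime)s %(processName)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    try:
        # ... performing some function ...
        logger.info('process completed for file')
    finally:
        # Pool workers handle many documents: detach and close the handler every time
        logger.removeHandler(handler)
        handler.close()
    return logfile


if __name__ == '__main__':
    workdir = tempfile.mkdtemp()
    names = [os.path.join(workdir, f'document{i}') for i in range(20)]
    with ProcessPoolExecutor() as executor:
        logfiles = list(executor.map(processDocument, names))
    print(sum(os.path.exists(path) for path in logfiles), 'log files created')
```

Run as a script, it should report 20 log files for the 20 documents, however many worker processes the executor uses.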

how to log messages to a file when running multiple processes

I am learning about logging when running multiple processes.
Below is how I normally log things when running a single process.
import logging
log_format = '%(asctime)s - %(name)s - %(levelname)s - %(messsage)s'
logger = logging.getLogger(__name__)
logger.setLevel('Debug')
file_handler = logging.FileHandler('C:/my_directory/logs/file_name.log')
formatter = logging.Formatter(log_format)
file_handler.setFormatter(formatter)
# to stop duplication
if not len(logger.handlers):
    logger.addHandler(file_handler)
So after my code has run, I can go to C:/my_directory/logs/file_name.log and check what I need to.
With multiple processes, I understand it's not so simple. I have read this great article and have copied the example code below. What I don't understand is how to output the logged messages to a file like above, so that I can read it after the code has finished.
from random import random
from time import sleep
from multiprocessing import current_process
from multiprocessing import Process
from multiprocessing import Queue
from logging.handlers import QueueHandler
import logging

# executed in a process that performs logging
def logger_process(queue):
    # create a logger
    logger = logging.getLogger('app')
    # configure a stream handler
    logger.addHandler(logging.StreamHandler())
    # log all messages, debug and up
    logger.setLevel(logging.DEBUG)
    # run forever
    while True:
        # consume a log message, block until one arrives
        message = queue.get()
        # check for shutdown
        if message is None:
            break
        # log the message
        logger.handle(message)

# task to be executed in child processes
def task(queue):
    # create a logger
    logger = logging.getLogger('app')
    # add a handler that uses the shared queue
    logger.addHandler(QueueHandler(queue))
    # log all messages, debug and up
    logger.setLevel(logging.DEBUG)
    # get the current process
    process = current_process()
    # report initial message
    logger.info(f'Child {process.name} starting.')
    # simulate doing work
    for i in range(5):
        # report a message
        logger.debug(f'Child {process.name} step {i}.')
        # block
        sleep(random())
    # report final message
    logger.info(f'Child {process.name} done.')

# protect the entry point
if __name__ == '__main__':
    # create the shared queue
    queue = Queue()
    # create a logger
    logger = logging.getLogger('app')
    # add a handler that uses the shared queue
    logger.addHandler(QueueHandler(queue))
    # log all messages, debug and up
    logger.setLevel(logging.DEBUG)
    # start the logger process
    logger_p = Process(target=logger_process, args=(queue,))
    logger_p.start()
    # report initial message
    logger.info('Main process started.')
    # configure child processes
    processes = [Process(target=task, args=(queue,)) for i in range(5)]
    # start child processes
    for process in processes:
        process.start()
    # wait for child processes to finish
    for process in processes:
        process.join()
    # report final message
    logger.info('Main process done.')
    # shutdown the queue correctly
    queue.put(None)
Update
I added the code below to the logger_process function just before the while True: loop. However, when I look in the file, there is nothing there. I'm not seeing any output and am not sure what I'm missing.
# add file handler
log_format = '%(asctime)s - %(name)s - %(levelname)s - %(messsage)s'
file_handler = logging.FileHandler('C:/my_directory/logs/file_name.log')
formatter = logging.Formatter(log_format)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
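Two details in the question's snippets are likely why the file stays empty, assuming they were copied verbatim: the format string spells %(messsage)s with three s's, so formatting fails inside the file handler and logging only prints a "--- Logging error ---" traceback to stderr instead of writing the line; and setLevel('Debug') raises ValueError because level names are upper case ('DEBUG'). Separately, under the fork start method (the Linux default) the children inherit the parent's 'app' logger together with its QueueHandler, so the listener's logger.handle() can put records back on the queue. The sketch below keeps the listener-process design but builds the handlers inside the listener and calls them directly; configure_queue_logger() and the relative file_name.log path are illustrative choices.

```python
import logging
import multiprocessing as mp
from logging.handlers import QueueHandler
from random import random
from time import sleep

# %(message)s, not %(messsage)s
LOG_FORMAT = '%(asctime)s - %(name)s - %(processName)s - %(levelname)s - %(message)s'


def configure_queue_logger(queue):
    """Send everything logged on 'app' in the calling process to the shared queue."""
    logger = logging.getLogger('app')
    # Under fork, a handler inherited from the parent would queue each record twice
    logger.handlers.clear()
    logger.addHandler(QueueHandler(queue))
    logger.setLevel(logging.DEBUG)  # a level constant, or the upper-case name 'DEBUG'
    return logger


def logger_process(queue, log_path):
    """The single writer: only this process touches the log file."""
    formatter = logging.Formatter(LOG_FORMAT)
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(formatter)
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(formatter)
    try:
        while True:
            record = queue.get()
            if record is None:  # shutdown sentinel
                break
            # Call the handlers directly instead of logger.handle(record), so a
            # QueueHandler inherited on the 'app' logger cannot re-queue the record
            for handler in (file_handler, stream_handler):
                handler.handle(record)
    finally:
        file_handler.close()


def task(queue):
    logger = configure_queue_logger(queue)
    name = mp.current_process().name
    logger.info(f'Child {name} starting.')
    for i in range(5):
        logger.debug(f'Child {name} step {i}.')
        sleep(random() / 10)
    logger.info(f'Child {name} done.')


if __name__ == '__main__':
    queue = mp.Queue()
    # Start the listener before the main process attaches its QueueHandler
    listener = mp.Process(target=logger_process, args=(queue, 'file_name.log'))
    listener.start()
    logger = configure_queue_logger(queue)
    logger.info('Main process started.')
    workers = [mp.Process(target=task, args=(queue,)) for _ in range(5)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    logger.info('Main process done.')
    queue.put(None)  # tell the listener to stop
    listener.join()  # and wait until it has written everything
```

The final listener.join(), missing from the copied example, makes sure the main process only exits after the last record has reached the file.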

Python: How to use different logfiles for processes in multiprocessing.Pool?

I am using multiprocessing.Pool to run a number of independent processes in parallel, not much different from the basic example in the Python docs:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
I would like each process to have a separate log file. I log various info from other modules in my codebase and some third-party packages (none of them is multiprocessing aware). So, for example, I would like this:
import logging
from multiprocessing import Pool

def f(x):
    logging.info(f"x*x={x*x}")
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, range(10)))
to write on disk:
log1.log
log2.log
log3.log
log4.log
log5.log
How do I achieve it?

You'll need to use Pool's initializer() to set up and register the separate loggers immediately after workers start up. Under the hood the arguments to Pool(initializer) and Pool(initargs) end up being passed to Process(target) and Process(args) for creating new worker-processes...
Pool workers get named in the format {StartMethod}PoolWorker-{number}, e.g. SpawnPoolWorker-1 if you use spawn as the start method for new processes.
The file number for the logfiles then can be extracted from the assigned worker-names with mp.current_process().name.split('-')[1].
import logging
import multiprocessing as mp

def f(x):
    logger.info(f"x*x={x*x}")
    return x*x

def _init_logging(level=logging.INFO, mode='a'):
    worker_no = mp.current_process().name.split('-')[1]
    filename = f"log{worker_no}.log"
    fh = logging.FileHandler(filename, mode=mode)
    fmt = logging.Formatter(
        '%(asctime)s %(processName)-10s %(name)s %(levelname)-8s --- %(message)s'
    )
    fh.setFormatter(fmt)
    logger = logging.getLogger()
    logger.addHandler(fh)
    logger.setLevel(level)
    globals()['logger'] = logger

if __name__ == '__main__':
    with mp.Pool(5, initializer=_init_logging, initargs=(logging.DEBUG,)) as pool:
        print(pool.map(f, range(10)))
Note that, due to the nature of multiprocessing, there's no guarantee for the exact number of files that actually receive log lines in your small example.
Since multiprocessing.Pool (unlike concurrent.futures.ProcessPoolExecutor) starts its workers as soon as you create the instance, and each worker's initializer opens its log file right away, you're bound to get the Pool(processes) number of files, so 5 in your case. Actual process scheduling by your OS might leave some of them empty, though.
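A variation on the initializer approach, as a sketch: delay=True makes FileHandler create its file only when the first record arrives, so a worker that never logs leaves no empty file, and splitting on the last hyphen also copes with worker names that contain a colon (a pool created inside another child process gets names such as ForkPoolWorker-1:2, and Windows does not allow colons in file names). log_filename_for() and init_worker_logging() are hypothetical names. Because f() logs through the root logger, records from other modules' loggers that propagate to the root end up in the same per-worker file.

```python
import logging
import multiprocessing as mp


def log_filename_for(process_name, prefix='log'):
    """'SpawnPoolWorker-3' -> 'log3.log'; 'ForkPoolWorker-1:2' -> 'log1_2.log'."""
    suffix = process_name.rsplit('-', 1)[-1].replace(':', '_')
    return f'{prefix}{suffix}.log'


def init_worker_logging(level=logging.INFO):
    handler = logging.FileHandler(
        log_filename_for(mp.current_process().name),
        mode='a',
        delay=True,  # create/open the file only when the first record is emitted
    )
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(processName)-10s %(name)s %(levelname)-8s --- %(message)s'))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)


def f(x):
    # Root-logger call; named loggers from other modules propagate here as well
    logging.info(f'x*x={x*x}')
    return x * x


if __name__ == '__main__':
    with mp.Pool(5, initializer=init_worker_logging, initargs=(logging.DEBUG,)) as pool:
        print(pool.map(f, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```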

python-daemon and logging: set logging level interactively

I have a python-daemon process that logs to a file via a ThreadedTCPServer (inspired by the cookbook example: https://docs.python.org/2/howto/logging-cookbook.html#sending-and-receiving-logging-events-across-a-network, as I will have many such processes writing to the same file). I am controlling the spawning of the daemon process using subprocess.Popen from an IPython console, and this is how the application will be run. I am able to write to the log file from both the main IPython process and the daemon process, but I am unable to change the level of both by simply setting the level of the root logger in IPython. Is this something that should be possible? Or will it require custom functionality to set the logging level of the daemon separately?
Edit: As requested, here is an attempt to provide a pseudo-code example of what I am trying to achieve. I hope that this is a sufficient description.
daemon_script.py
import logging
import daemon
from other_module import function_to_run_as_daemon

class Daemon(object):  # named Daemon so it doesn't shadow the daemon module
    def __init__(self):
        self.daemon_name = __name__
        logging.basicConfig() # <--- required, or I don't get any log messages
        self.logger = logging.getLogger(self.daemon_name)
        self.logger.debug( "Created logger successfully" )

    def run(self):
        # basicConfig() attaches its handler to the root logger, not to self.logger
        root_stream = logging.getLogger().handlers[0].stream
        with daemon.DaemonContext( files_preserve = [root_stream] ):
            self.logger.debug( "Daemonised successfully - about to enter function" )
            function_to_run_as_daemon()

if __name__ == "__main__":
    d = Daemon()
    d.run()
Then in IPython I would run something like:
>>> import logging
>>> import subprocess
>>> rootlogger = logging.getLogger()
>>> rootlogger.info( "test" )
INFO:root:"test"
>>> subprocess.Popen( ["python" , "daemon_script.py"] )
DEBUG:__main__:"Created logger successfully"
DEBUG:__main__:"Daemonised successfully - about to enter function"
# now I'm finished debugging and testing, I want to reduce the level for all the loggers by changing the level of the handler
# Note that I also tried changing the level of the root handler, but saw no change
>>> rootlogger.handlers[0].setLevel(logging.INFO)
>>> rootlogger.info( "test" )
INFO:root:"test"
>>> print( rootlogger.debug("test") )
None
>>> subprocess.Popen( ["python" , "daemon_script.py"] )
DEBUG:__main__:"Created logger successfully"
DEBUG:__main__:"Daemonised successfully - about to enter function"
I think that I may not be approaching this correctly, but it's not clear to me what would work better. Any advice would be appreciated.

The logger you create in your daemon won't be the same as the logger you made in ipython. You could test this to be sure, by just printing out both logger objects themselves, which will show you their memory addresses.
I think a better pattern would be to pass, when you run the daemon, whether you want to be in "debug" mode or not. In other words, call Popen like this:
subprocess.Popen( ["python" , "daemon_script.py", "debug"] )
It's up to you: you could pass a string meaning "debug mode is on" as above, or you could pass the log level constant that means "debug", e.g.:
subprocess.Popen( ["python" , "daemon_script.py", "10"] )
(https://docs.python.org/2/library/logging.html#levels)
Then, in the daemon's __init__ method, read that argument from sys.argv, for example, and use it:
...
import sys

def __init__(self):
    self.daemon_name = __name__
    logging.basicConfig() # <--- required, or I don't get any log messages
    log_level = int(sys.argv[1]) # Probably don't actually just blindly convert it without error handling
    self.logger = logging.getLogger(self.daemon_name)
    self.logger.setLevel(log_level)
...
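As a sketch of the error handling that comment alludes to, the helper below (hypothetical, Python 3) accepts either form suggested above, a level name such as "debug" or a number such as "10", falls back to a default when no argument is given, and rejects anything else with a ValueError.

```python
import logging
import sys


def parse_log_level(arg, default=logging.INFO):
    """Turn 'debug', 'DEBUG', '10', ... into a logging level; None gives the default."""
    if arg is None:
        return default
    arg = arg.strip()
    if arg.isdigit():
        return int(arg)
    level = getattr(logging, arg.upper(), None)
    # Accept only the level constants (ints), not other attributes such as BASIC_FORMAT
    if isinstance(level, int):
        return level
    raise ValueError(f'unknown log level: {arg!r}')


if __name__ == '__main__':
    logging.basicConfig()
    logger = logging.getLogger(__name__)
    logger.setLevel(parse_log_level(sys.argv[1] if len(sys.argv) > 1 else None))
    logger.debug('only shown when started with "debug" or "10"')
```

In the daemon's __init__, the int(sys.argv[1]) line would then become log_level = parse_log_level(sys.argv[1] if len(sys.argv) > 1 else None).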

Does python logging support multiprocessing?

I have been told that logging cannot be used with multiprocessing, and that you have to do the concurrency control yourself in case multiprocessing messes up the log.
But I did some tests, and it seems there is no problem using logging with multiprocessing:
import time
import logging
from multiprocessing import Process, current_process, pool

# setup log
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='/tmp/test.log',
                    filemode='w')

def func(the_time, logger):
    proc = current_process()
    while True:
        if time.time() >= the_time:
            logger.info('proc name %s id %s' % (proc.name, proc.pid))
            return

if __name__ == '__main__':
    the_time = time.time() + 5
    for x in xrange(1, 10):
        proc = Process(target=func, name=x, args=(the_time, logger))
        proc.start()
As you can see from the code, I deliberately let the subprocesses write their log at the same moment (5 s after start) to increase the chance of a conflict. But there is no conflict at all.
So my question is: can we use logging with multiprocessing?
Why do so many posts say we cannot?

As Matino correctly explained: logging in a multiprocessing setup is not safe, as multiple processes (which know nothing about each other) are writing into the same file, potentially interfering with each other.
Now what happens is that every process holds an open file handle and does an "append write" into that file. The question is under what circumstances the append write is "atomic" (that is, cannot be interrupted by e.g. another process writing to the same file and intermingling its output). This problem applies to every programming language, as in the end they all make a syscall to the kernel. This answer explains under which circumstances a shared log file is OK.
It comes down to checking your pipe buffer size (PIPE_BUF): on Linux it is defined in /usr/include/linux/limits.h as 4096 bytes. For other OSes you can find a good list here.
That means: if your log line is shorter than 4,096 bytes (on Linux), the append is safe, provided the disk is directly attached (i.e. no network in between). For more details, please check the first link in my answer. To test this, you can do logger.info('proc name %s id %s %s' % (proc.name, proc.pid, str(proc.name)*5000)) with different lengths. With 5000, for instance, I already got mixed-up log lines in /tmp/test.log.
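A sketch of that experiment, assuming Python 3 and a local disk: several processes each open their own append-mode FileHandler on one file and write lines made of a single repeated character, and count_garbled() then reports every line that is not exactly one character repeated `length` times. The file name, process count and repeat count are arbitrary. Whether you actually see mixed lines depends on the OS, the filesystem and Python's I/O buffering, so a result of 0 does not prove the setup is safe.

```python
import logging
import multiprocessing as mp
import os

LOG_PATH = 'interleave_test.log'  # hypothetical scratch file in the working directory


def writer(char, length, repeats):
    # Each process opens its own append-mode handler on the same file, as in the question
    logger = logging.getLogger('writer-' + char)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(LOG_PATH, mode='a')
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
    for _ in range(repeats):
        logger.info(char * length)
    handler.close()


def count_garbled(lines, expected_length):
    """Count lines that are not exactly one character repeated expected_length times."""
    return sum(1 for line in lines
               if len(line) != expected_length or len(set(line)) != 1)


if __name__ == '__main__':
    length = 5000  # compare runs below and above PIPE_BUF (4096 on Linux)
    if os.path.exists(LOG_PATH):
        os.remove(LOG_PATH)
    procs = [mp.Process(target=writer, args=(c, length, 100)) for c in 'abcdefgh']
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(LOG_PATH) as f:
        lines = f.read().splitlines()
    print(f'{len(lines)} lines, {count_garbled(lines, length)} garbled')
```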
In this question there are already quite a few solutions to this, so I won't add my own solution here.
Update: Flask and multiprocessing
Web frameworks like Flask will run in multiple workers if hosted by uwsgi or nginx. In that case, multiple processes may write into one log file. Will that cause problems?
The error handling in Flask is done via stdout/stderr, which is then caught by the web server (uwsgi, nginx, etc.), which needs to take care that logs are written correctly (see e.g. this flask+nginx example), probably also adding process information so you can associate error lines with processes. From Flask's docs:
By default as of Flask 0.11, errors are logged to your webserver’s log automatically. Warnings however are not.
So you'd still have this issue of intermingled log files if you use warn and the message exceeds the pipe buffer size.

It is not safe to write to a single file from multiple processes.
According to https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes
Although logging is thread-safe, and logging to a single file from
multiple threads in a single process is supported, logging to a single
file from multiple processes is not supported, because there is no
standard way to serialize access to a single file across multiple
processes in Python.
One possible solution would be to have each process write to its own file. You can achieve this by writing your own handler that appends the process PID to the file name:
import logging.handlers
import os

class PIDFileHandler(logging.handlers.WatchedFileHandler):

    def __init__(self, filename, mode='a', encoding=None, delay=False):
        filename = self._append_pid_to_filename(filename)
        super(PIDFileHandler, self).__init__(filename, mode, encoding, delay)

    def _append_pid_to_filename(self, filename):
        pid = os.getpid()
        path, extension = os.path.splitext(filename)
        return '{0}-{1}{2}'.format(path, pid, extension)
Then you just need to call addHandler:
logger = logging.getLogger('foo')
fh = PIDFileHandler('bar.log')
logger.addHandler(fh)
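One caveat when using such a handler with the fork start method: the PID is read when the handler is created, so a handler built in the parent before starting the children keeps the parent's PID and is shared by all of them. The sketch below (Python 3; pid_filename() and worker() are hypothetical helpers, and the class is a condensed copy of the one above) therefore creates the handler inside each child.

```python
import logging
import logging.handlers
import multiprocessing as mp
import os


def pid_filename(filename, pid):
    """'bar.log' -> 'bar-<pid>.log', the same naming as PIDFileHandler above."""
    path, extension = os.path.splitext(filename)
    return f'{path}-{pid}{extension}'


class PIDFileHandler(logging.handlers.WatchedFileHandler):
    def __init__(self, filename, mode='a', encoding=None, delay=False):
        # os.getpid() is evaluated here, i.e. in whichever process creates the handler
        super().__init__(pid_filename(filename, os.getpid()), mode, encoding, delay)


def worker(n):
    logger = logging.getLogger('foo')
    logger.setLevel(logging.INFO)
    # Built inside the child, so the file name carries the child's own PID
    handler = PIDFileHandler('bar.log')
    logger.addHandler(handler)
    logger.info('worker %d running in pid %d', n, os.getpid())
    logger.removeHandler(handler)
    handler.close()


if __name__ == '__main__':
    children = [mp.Process(target=worker, args=(n,)) for n in range(3)]
    for child in children:
        child.start()
    for child in children:
        child.join()
    print(sorted(name for name in os.listdir('.') if name.startswith('bar-')))
```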

Use a queue to handle concurrency correctly and to recover from errors, by feeding everything to the parent process via a pipe.
from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback

class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # ensure that exc_info and args
        # have been stringified. Removes any chance of
        # unpickleable things inside and possibly reduces
        # message size sent over the pipe
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)
The handler does all the file writing from the parent process and uses just one thread to receive messages passed from child processes.

QueueHandler is native in Python 3.2+, and safely handles multiprocessing logging.
Python docs have two complete examples: Logging to a single file from multiple processes
For those using Python < 3.2, just copy QueueHandler into your own code from: https://gist.github.com/vsajip/591589 or alternatively import logutils.
Each process (including the parent process) puts its logging on the Queue, and then a listener thread or process (one example is provided for each) picks those up and writes them all to a file - no risk of corruption or garbling.
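A compact sketch of the listener-thread variant, assuming Python 3.2+ and a multiprocessing.Pool: each worker's root logger gets a QueueHandler in the pool initializer, and a QueueListener thread in the parent is the only writer of the log file. init_worker(), work(), run() and combined.log are illustrative names, not taken from the docs.

```python
import logging
import logging.handlers
import multiprocessing as mp


def init_worker(queue):
    """Pool initializer: every record from this worker goes onto the shared queue."""
    root = logging.getLogger()
    root.handlers.clear()  # forget handlers inherited from the parent under fork
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)


def work(x):
    logging.info('processing %d', x)
    return x * x


def run(log_path, processes=4):
    queue = mp.Queue()
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(
        logging.Formatter('%(asctime)s %(processName)-16s %(levelname)s %(message)s'))
    # The listener thread in the parent is the only code that writes to log_path
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    try:
        with mp.Pool(processes, initializer=init_worker, initargs=(queue,)) as pool:
            results = pool.map(work, range(10))
            pool.close()
            pool.join()  # let workers exit normally so their queued records are flushed
    finally:
        listener.stop()  # handles the records already queued, then stops the thread
        file_handler.close()
    return results


if __name__ == '__main__':
    print(run('combined.log'))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Calling pool.close() and pool.join() before leaving the with block lets the workers exit normally, which gives each worker's queue feeder thread the chance to send its last records; leaving the block directly would call terminate() instead.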
Note: this question is basically a duplicate of How should I log while using multiprocessing in Python? so I've copied my answer from that question as I'm pretty sure it's currently the best solution.
