I just got thrown into the deep end with my new contract. The current system uses the Python logging module to do timed log-file rotation. The problem is that the log file of the process running as a daemon rotates correctly, while the log file of the process instances that get created and destroyed on demand never rotates. I now have to find a solution to this. After two days of research on the internet and in the Python documentation I'm only halfway out of the dark. Since I'm new to the logging module I can't see the answer to the problem; I'm probably looking with my eyes closed!
The process is started with:
python /admin/bin/fmlog.py -l 10 -f /tmp/fmlog/fmapp_log.log -d
where:
-l 10 => DEBUG logging-level
-f ... => Filename to log to for app-instance
-d => run as daemon
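For context, a rough argparse sketch of how flags like these might be parsed (the real fmlog.py option handling is not shown, so the option names below are assumptions):
import argparse

parser = argparse.ArgumentParser(prog="fmlog.py")
parser.add_argument("-l", dest="loglevel", type=int, default=10,
                    help="numeric logging level, e.g. 10 for DEBUG")
parser.add_argument("-f", dest="logfile",
                    help="filename to log to for this app instance")
parser.add_argument("-d", dest="daemon", action="store_true",
                    help="run as a daemon")
args = parser.parse_args()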
The following shows a heavily edited version of my code:
#!/usr/bin/env python
from comp.app import app, yamlapp
...
from comp.utils.log4_new import *
# Exceptions handling class
class fmlogException(compException): pass
class fmlog(app):
# Fmlog application class
def __init__(self, key, config, **kwargs):
# Initialise the required variables
app.__init__(self, key, config, **kwargs)
self._data = {'sid': self._id}
...
def process(self, tid=None):
if tid is not None:
self.logd("Using thread '%d'." % (tid), data=self._data)
# Run the fmlog process
self.logi("Processing this '%s'" % (filename), data=self._data)
...
def __doDone__(self, success='Failure', msg='', exception=None):
...
self.logd("Process done!")
if __name__ == '__main__':
def main():
with yamlapp(filename=config, cls=fmlog, configcls=fmlogcfg, sections=sections, loglevel=loglevel, \
logfile=logfile, excludekey='_dontrun', sortkey='_priority', usethreads=threads, maxthreads=max, \
daemon=daemon, sleep=sleep) as a:
a.run()
main()
The yamlapp process (a sub-class of app) is instantiated and runs as a daemon until it is stopped manually. This process only creates one or more instances of the fmlog class and calls their process() function when needed (when certain conditions are met). Up to x instances can be created per thread if the yamlapp process runs in thread mode.
The app process code:
#!/usr/bin/env python
...
from comp.utils.log4_new import *
class app(comp.base.comp, logconfig, log):
def __init__(self, cls, **kwargs):
self.__setdefault__('_configcls', configitem)
self.__setdefault__('_daemon', True)
self.__setdefault__('_maxthreads', 5)
self.__setdefault__('_usethreads', False)
...
comp.base.comp.__init__(self, **kwargs)
logconfig.__init__(self, prog(), **getlogkwargs(**kwargs))
log.__init__(self, logid=prog())
def __enter__(self):
self.logi(msg="Starting application '%s:%s' '%d'..." % (self._cls.__name__, \
self.__class__.__name__, os.getpid()))
return self
def ...
def run(self):
...
if self._usethreads:
...
while True:
self.logd(msg="Start of run iteration...")
if not self._usethreads:
while not self._q.empty():
item = self._q.get()
try:
item.process()
self.logd(msg="End of run iteration...")
time.sleep(self._sleep)
The logging config and setup is done via the log4_new.py classes:
#!/usr/bin/env python
import logging
import logging.handlers
import re
from logging import DEBUG, FATAL
class logconfig(comp):
def __init__(self, logid, **kwargs):
comp.__init__(self, **kwargs)
self.__setdefault__('_logcount', 20)
self.__setdefault__('_logdtformat', None)
self.__setdefault__('_loglevel', DEBUG)
self.__setdefault__('_logfile', None)
self.__setdefault__('_logformat', '[%(asctime)-15s][%(levelname)5s] %(message)s')
self.__setdefault__('_loginterval', 'S')
self.__setdefault__('_logintervalnum', 30)
self.__setdefault__('_logsuffix', '%Y%m%d%H%M%S')
self._logid = logid
self.__loginit__()
def __loginit__(self):
format = logging.Formatter(self._logformat, self._logdtformat)
if self._logfile:
hnd = logging.handlers.TimedRotatingFileHandler(self._logfile, when=self._loginterval, interval=self._logintervalnum, backupCount=self._logcount)
hnd.suffix = self._logsuffix
hnd.extMatch = re.compile(strftoregex(self._logsuffix))
else:
hnd = logging.StreamHandler()
hnd.setFormatter(format)
l = logging.getLogger(self._logid)
        for h in list(l.handlers):
l.removeHandler(h)
l.setLevel(self._loglevel)
l.addHandler(hnd)
class log():
def __init__(self, logid):
self._logid = logid
def __log__(self, msg, level=DEBUG, data=None):
l = logging.getLogger(self._logid)
l.log(level, msg, extra=data)
def logd(self, msg, **kwargs):
self.__log__(level=DEBUG, msg=msg, **kwargs)
def ...
def logf(self, msg, **kwargs):
self.__log__(level=FATAL, msg=msg, **kwargs)
def getlogkwargs(**kwargs):
logdict = {}
for key, value in kwargs.iteritems():
if key.startswith('log'): logdict[key] = value
return logdict
Logging is done as expected: logs from yamlapp (sub-class of app) are written to fmapp_log.log, and logs from fmlog are written to fmlog.log.
The problem is that fmapp_log.log is rotated as expected, but fmlog.log is never rotated. How do I solve this? I know the process must run continuously for the rotation to happen, which is why only one logger is used. I suspect another handler must be created for the fmlog process, one that is never destroyed when the process exits.
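For reference, a minimal stand-alone sketch (stdlib only, not the framework above) showing that TimedRotatingFileHandler only checks for rollover while a live process is emitting records:
import logging
import logging.handlers
import time

logger = logging.getLogger("rotation_demo")
logger.setLevel(logging.DEBUG)

# Roll over every 5 seconds, keep 3 old files.
handler = logging.handlers.TimedRotatingFileHandler(
    "/tmp/rotation_demo.log", when="S", interval=5, backupCount=3)
handler.setFormatter(logging.Formatter("[%(asctime)-15s][%(levelname)5s] %(message)s"))
logger.addHandler(handler)

# The rollover check happens inside emit(): each log call compares the
# current time against the next rollover time.  A process that logs a
# few records and exits before the interval elapses never rotates.
for i in range(10):
    logger.debug("iteration %d", i)
    time.sleep(1)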
Requirements:
The app (framework or main) log and the fmlog (process) log must go to different files.
Both log files must be time-rotated.
Hopefully someone will understand the above and be able to give me a couple of pointers.
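To make the requirements concrete, here is a minimal stdlib-only sketch of two independently time-rotated loggers (the names, paths and intervals below are just illustrative, not the comp framework):
import logging
import logging.handlers

def make_rotating_logger(name, path, when="M", interval=30, backup=20):
    """Create a named logger with its own TimedRotatingFileHandler."""
    handler = logging.handlers.TimedRotatingFileHandler(
        path, when=when, interval=interval, backupCount=backup)
    handler.setFormatter(
        logging.Formatter("[%(asctime)-15s][%(levelname)5s] %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger

app_log = make_rotating_logger("fmapp", "/tmp/fmlog/fmapp_log.log")
proc_log = make_rotating_logger("fmlog", "/tmp/fmlog/fmlog.log")

app_log.info("framework message")
proc_log.info("process message")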
Sentry can track performance for Celery tasks and API endpoints:
https://docs.sentry.io/product/performance/
I have a custom script that is launched by cron and does a set of similar tasks.
I want to incorporate sentry_sdk into my script to get performance tracing of my tasks.
Any advice on how to do it with
https://getsentry.github.io/sentry-python/api.html#sentry_sdk.capture_event
You don't need to use capture_event.
I would suggest using sentry_sdk.start_transaction instead. It also lets you track your function's performance.
Look at my example:
from time import sleep
from sentry_sdk import Hub, init, start_transaction
init(
dsn="dsn",
traces_sample_rate=1.0,
)
def sentry_trace(func):
def wrapper(*args, **kwargs):
transaction = Hub.current.scope.transaction
        if transaction:
            # Already inside a transaction: record this call as a child span.
            with transaction.start_child(op=func.__name__):
                return func(*args, **kwargs)
        else:
            # Otherwise start a new transaction named after the function.
            with start_transaction(op=func.__name__, name=func.__name__):
                return func(*args, **kwargs)
return wrapper
@sentry_trace
def b():
for i in range(1000):
print(i)
@sentry_trace
def c():
sleep(2)
print(1)
@sentry_trace
def a():
sleep(1)
b()
c()
if __name__ == '__main__':
a()
After starting this code you can see basic info for transaction a with children b and c.
I have a Python program that uses pygame. I want to create another pygame window for some additional content and have a separate script for that. I use socket and localhost for communication.
I am using subprocess to run the script that displays the second pygame window. This script has a number of logging messages that are not displayed on the stdout of the terminal I am using. Is there a way to redirect the logging messages so that they are printed to the console alongside the logging messages of the main program?
So far I have set up a logwrapper that captures the logged output:
import collections
import io
import logging
import logging.config

level = logging.DEBUG
def getLogger(module_name):
wrapper = LogWrapper(module_name)
return wrapper.getLogger(),wrapper
class LogWrapper():
def __init__(self,module_name):
self.module_name = module_name
self.log_capture_string = None
self.log_trace = []
    @property
def trace(self):
values = self.log_capture_string.getvalue()
self.log_trace = self.log_trace+values.split("\n")
return self.log_trace
def getLogger(self, **kwargs):
        ### Create the logger
        # 'configuration' is a logging dictConfig dictionary defined elsewhere
        logging.config.dictConfigClass(configuration).configure()
logger = logging.getLogger(self.module_name)
logger.setLevel(level)
### Setup the console handler with a FIFOIO object
self.log_capture_string = FIFOIO(32768)
ch = logging.StreamHandler(self.log_capture_string)
ch.setLevel(logging.DEBUG)
### Optionally add a formatter
### Add the console handler to the logger
logger.addHandler(ch)
logger.info("set up logwrap for {}".format(self.module_name))
return logger
class FIFOIO(io.TextIOBase):
def __init__(self, size, *args):
self.maxsize = size
io.TextIOBase.__init__(self, *args)
self.deque = collections.deque()
def getvalue(self):
return ''.join(self.deque)
def write(self, x):
self.deque.append(x)
self.shrink()
def shrink(self):
if self.maxsize is None:
return
size = sum(len(x) for x in self.deque)
while size > self.maxsize:
x = self.deque.popleft()
size -= len(x)
But after running the main program, calling subprogram.logwrapper.trace on termination doesn't capture any of the error messages from the runtime, only the initialization message, so I am looking for a better way to access this information.
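A sketch of one possible direction (the helper below is hypothetical, not part of the LogWrapper above): a logger can carry several handlers, so a plain stderr StreamHandler could sit next to the FIFOIO capture:
import logging
import sys

def add_console_mirror(logger):
    # Send the same records to stderr as well as to the capture buffer.
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.DEBUG)
    console.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(console)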
As an example, I have a Django custom management command which periodically (APScheduler + CronTrigger) sends tasks to Dramatiq.
Why is the following code with separate functions:
def get_crontab(options):
"""Returns crontab whether from options or settings"""
crontab = options.get("crontab")
if crontab is None:
if not hasattr(settings, "REMOVE_TOO_OLD_CRONTAB"):
            raise ImproperlyConfigured("Either set settings.REMOVE_TOO_OLD_CRONTAB or use the --crontab argument")
crontab = settings.REMOVE_TOO_OLD_CRONTAB
return crontab
def add_cron_job(scheduler: BaseScheduler, actor, crontab):
"""Adds cron job which triggers Dramatiq actor"""
module_path = actor.fn.__module__
actor_name = actor.fn.__name__
trigger = CronTrigger.from_crontab(crontab)
job_path = f"{module_path}:{actor_name}.send"
job_name = f"{module_path}.{actor_name}"
scheduler.add_job(job_path, trigger=trigger, name=job_name)
def run_scheduler(scheduler):
"""Runs scheduler in a blocking way"""
def shutdown(signum, frame):
scheduler.shutdown()
signal.signal(signal.SIGINT, shutdown)
signal.signal(signal.SIGTERM, shutdown)
scheduler.start()
class Command(BaseCommand):
help = "Periodically removes too old publications from the RSS feed"
def add_arguments(self, parser: argparse.ArgumentParser):
parser.add_argument("--crontab", type=str)
def handle(self, *args, **options):
scheduler = BlockingScheduler()
add_cron_job(scheduler, tasks.remove_too_old_publications, get_crontab(options))
run_scheduler(scheduler)
better than this code with methods?
class Command(BaseCommand):
help = "Periodically removes too old publications from the RSS feed"
def add_arguments(self, parser: argparse.ArgumentParser):
parser.add_argument("--crontab", type=str)
def get_crontab(self, options):
"""Returns crontab whether from options or settings"""
crontab = options.get("crontab")
if crontab is None:
if not hasattr(settings, "REMOVE_TOO_OLD_CRONTAB"):
                raise ImproperlyConfigured(
                    "Either set settings.REMOVE_TOO_OLD_CRONTAB or use the --crontab argument"
                )
crontab = settings.REMOVE_TOO_OLD_CRONTAB
return crontab
def handle(self, *args, **options):
scheduler = BlockingScheduler()
self.add_cron_job(scheduler, tasks.remove_too_old_publications, self.get_crontab(options))
self.run_scheduler(scheduler)
def add_cron_job(self, scheduler: BaseScheduler, actor, crontab):
"""Adds cron job which triggers Dramatiq actor"""
module_path = actor.fn.__module__
actor_name = actor.fn.__name__
trigger = CronTrigger.from_crontab(crontab)
job_path = f"{module_path}:{actor_name}.send"
job_name = f"{module_path}.{actor_name}"
scheduler.add_job(job_path, trigger=trigger, name=job_name)
def run_scheduler(self, scheduler):
"""Runs scheduler in a blocking way"""
def shutdown(signum, frame):
scheduler.shutdown()
signal.signal(signal.SIGINT, shutdown)
signal.signal(signal.SIGTERM, shutdown)
scheduler.start()
This code is used in one single place and will not be reused.
StackOverflow requires more details, so:
The second code is the version that I originally wrote. After that I ran Prospector with Pylint and, among other useful messages, I got pylint: no-self-use / Method could be a function (col 4). To solve this issue I rewrote my code as in the first example. But I still don't understand why it is better this way.
At least, in this case, it is not better. Pylint is notifying you about "self" being unused, just like it would notify you about a variable or an import being unused.
A couple of other options for fixing the pylint messages would be to actually use "self" in the functions, or to add the staticmethod (or classmethod) decorator. Examples for both are below. Here are the docs for staticmethod, and here's the difference between staticmethod and classmethod.
Since this is a Django command and you likely won't have multiple instances of the class, other classes that inherit Command (and would, for example, override the functions), or anything else that would benefit from the functions being inside the class, pick the one you find most readable/easiest to change.
And just for completeness, StackExchange Code Review could have further insight into which is best, if any.
Example that uses self; the main difference is that the scheduler is created in __init__ and not passed as an argument to the functions that use it:
class Command(BaseCommand):
help = "Periodically removes too old publications from the RSS feed"
def __init__(self):
super().__init__()
self.scheduler = BlockingScheduler()
def add_arguments(self, parser: argparse.ArgumentParser):
parser.add_argument("--crontab", type=str)
def handle(self, *args, **options):
self.add_cron_job(tasks.remove_too_old_publications, self.get_crontab(options))
self.run_scheduler()
# ...
def run_scheduler(self):
"""Runs scheduler in a blocking way"""
def shutdown(signum, frame):
self.scheduler.shutdown()
signal.signal(signal.SIGINT, shutdown)
signal.signal(signal.SIGTERM, shutdown)
self.scheduler.start()
Example that uses staticmethod, where the only differences are the staticmethod decorator and that the decorated functions don't have a self argument:
class Command(BaseCommand):
help = "Periodically removes too old publications from the RSS feed"
def add_arguments(self, parser: argparse.ArgumentParser):
parser.add_argument("--crontab", type=str)
def handle(self, *args, **options):
scheduler = BlockingScheduler()
self.add_cron_job(scheduler, tasks.remove_too_old_publications, self.get_crontab(options))
self.run_scheduler(scheduler)
# ...
    @staticmethod
def run_scheduler(scheduler):
"""Runs scheduler in a blocking way"""
def shutdown(signum, frame):
scheduler.shutdown()
signal.signal(signal.SIGINT, shutdown)
signal.signal(signal.SIGTERM, shutdown)
scheduler.start()
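For illustration, the classmethod variant mentioned above would look much the same; only the decorator changes and the method receives the class instead of an instance (a sketch, not part of the original examples):
class Command(BaseCommand):
    help = "Periodically removes too old publications from the RSS feed"
    # ...
    @classmethod
    def run_scheduler(cls, scheduler):
        """Runs scheduler in a blocking way"""
        def shutdown(signum, frame):
            scheduler.shutdown()
        signal.signal(signal.SIGINT, shutdown)
        signal.signal(signal.SIGTERM, shutdown)
        scheduler.start()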
Given two Luigi tasks, how can I add one as a requirement for the other, in such a way that once the required task is done, the second task can start, with no output involved?
Currently I get RuntimeError: Unfulfilled dependency at run time: MyTask___home_... even though the task completed ok, because my requires / output methods are not configured right...
class ShellTask(ExternalProgramTask):
"""
ExternalProgramTask's subclass dedicated for one task with the capture output ability.
Args:
shell_cmd (str): The shell command to be run in a subprocess.
capture_output (bool, optional): If True the output is not displayed to console,
and printed after the task is done via
logger.info (both stdout + stderr).
Defaults to True.
"""
shell_cmd = luigi.Parameter()
requirement = luigi.Parameter(default='')
succeeded = False
def on_success(self):
self.succeeded = True
def requires(self):
return eval(self.requirement) if self.requirement else None
def program_args(self):
"""
Must be implemented in an ExternalProgramTask subclass.
Returns:
A script that would be run in a subprocess.Popen.
Args:
shell_cmd (luigi.Parameter (str)): the shell command to be passed as args
to the run method (run should not be overridden!).
"""
return self.shell_cmd.split()
class MyTask(ShellTask):
"""
    Example:
    if __name__ == '__main__':
clean_output_files(['_.txt'])
task = MyTask(
shell_cmd='...',
requirement="MyTask(shell_cmd='...', output_file='_.txt')",
)
"""
pass
if __name__ == '__main__':
task_0 = MyTask(
shell_cmd='...',
requirement="MyTask(shell_cmd='...')",
)
luigi.build([task_0], workers=2, local_scheduler=False)
I hoped that using on_success could signal something to the calling task, but I couldn't figure out how.
I'm currently working around this in the following way:
1) implement the output method based on the input of the task (much like the eval(requirement) I did),
2) implement the run method (calling the super run and then writing "ok" to the output),
3) delete the output files from main,
4) call it something like this:
if __name__ == '__main__':
clean_output_files(['_.txt'])
task = MyTask(
shell_cmd='...',
requirement="MyTask(shell_cmd='...', output_file='_.txt')",
)
So within your first Luigi task, you could call your second task by making it a requirement.
For example:
class TaskB(luigi.Task):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.complete_flag = False
def run(self):
self.complete_flag = True
print('do something')
def complete(self):
        return self.complete_flag
class TaskA(luigi.Task):
def requires(self):
return TaskB()
def run(self):
print('Carry on with other logic')
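A quick usage sketch (assuming both classes are importable and that a local scheduler is fine for testing):
import luigi

if __name__ == '__main__':
    # TaskA.requires() returns TaskB(), and TaskB.complete() is False
    # until its run() has executed, so building TaskA runs TaskB first.
    luigi.build([TaskA()], workers=1, local_scheduler=True)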
I'm pretty new to Twisted and I'm attempting to write some unit tests using the trial test framework. My tests run and pass as expected, but for some reason trial hangs between tests. I have to hit CTRL+C after each test to get it to move on to the next one. I'm guessing I have something configured incorrectly, or I'm not calling some method I should be calling to tell trial the test is done.
Here is the class under test:
from twisted.internet import reactor, defer
import threading
import time
class SomeClass:
def doSomething(self):
return self.asyncMethod()
def asyncMethod(self):
d = defer.Deferred()
t = SomeThread(d)
t.start()
return d
class SomeThread(threading.Thread):
def __init__(self, d):
super(SomeThread, self).__init__()
self.d = d
def run(self):
time.sleep(2) # pretend to do something
retVal = 123
self.d.callback(retVal)
Here is the unit test class:
from twisted.trial import unittest
import tested
class SomeTest(unittest.TestCase):
def testOne(self):
sc = tested.SomeClass()
d = sc.doSomething()
return d.addCallback(self.allDone)
def allDone(self, retVal):
self.assertEquals(retVal, 123)
def testTwo(self):
sc = tested.SomeClass()
d = sc.doSomething()
return d.addCallback(self.allDone2)
def allDone2(self, retVal):
self.assertEquals(retVal, 123)
This is what the command line output looks like:
me$ trial test.py
test
SomeTest
testOne ... ^C [OK]
testTwo ... ^C [OK]
-------------------------------------------------------------------------------
Ran 2 tests in 8.499s
PASSED (successes=2)
I guess your problem has to do with your threads. Twisted is not thread-safe, and if you need to interface with threads you should let the reactor handle things by using deferToThread, callInThread, or callFromThread.
See here for info on how to be thread-safe with Twisted.
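As a rough sketch of that suggestion (the helper method name below is made up), asyncMethod could hand the blocking work to the reactor's thread pool with deferToThread, so the Deferred fires from the reactor thread and trial can finish the test cleanly:
import time
from twisted.internet.threads import deferToThread

class SomeClass:
    def doSomething(self):
        # deferToThread runs the blocking call in the reactor's thread
        # pool and fires the returned Deferred from the reactor thread
        # when it finishes, so trial sees the test complete without hanging.
        return deferToThread(self._blockingWork)

    def _blockingWork(self):
        time.sleep(2)  # pretend to do something
        return 123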