How to use the twistd log system for logging my data? (Python)

I am interested in the main features of the twistd logging system, and I'm using it to log some data I need, much more than logging the real state of the Twisted application. By the way, it is very noisy; I've read Twisted: disable logging of Twisted-framework classes but I am not sure I get the point. Also, using .noise will not fit my need.
I would like to know if it is possible, and easy, to cleanly separate things into two logging systems: on one hand the log necessary for the Twisted application, and on the other only the log that will contain my important data.
(So that I can have the timestamping, the rotation and so on for my own data, as I've already spent some effort, with the kind help of many people here, to adapt the Twisted logging process to my needs.)
Can someone give me some hints on how to achieve this?
Or maybe your main advice will be that I should simply open a file, print my time format and data lines into it, and implement my own rotation on this file, rather than hijacking the Twisted logging system for my needs?
I have also thought of using log.msg(mydata, system="myownflag") and then something like grep myownflag mylog > only-my-data, but there may be better ideas...
(I'm new to Twisted, and learning it the wrong way: starting from the end and diving too fast into my own needs rather than into the library basics, so I miss a lot of things. Please forgive me for that.)
Best regards.

Here is a log observer wrapper that filters out events from the wrong system:
from functools import wraps

def makeObserver(system, originalObserver):
    @wraps(originalObserver)
    def observe(event):
        # Only pass through events tagged with the requested system name.
        if event.get("system", None) == system:
            originalObserver(event)
    return observe
You can use this by wrapping any existing observer and adding it to the logging system:
from twisted.python.log import FileLogObserver, addObserver

fileObs = FileLogObserver(open("myownflag.log", "at"))
addObserver(makeObserver("myownflag", fileObs.emit))
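Anything you then log with a matching system keyword (the approach the question already mentions) will end up in that file, and nothing else will:
from twisted.python import log

# Only events tagged with system="myownflag" pass the filter above.
log.msg("my important data", system="myownflag")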


How can I measure the coverage (in a production system)?

I would like to measure the coverage of my Python code which gets executed in the production system.
I want an answer to this question:
Which lines get executed often (hot spots) and which lines are never used (dead code)?
Of course this must not slow down my production site.
I am not talking about measuring the coverage of tests.
I assume you are not talking about test suite code coverage which the other answer is referring to. That is a job for CI indeed.
If you want to know which code paths are hit often in your production system, then you're going to have to do some instrumentation/profiling. This will have a cost; you cannot add measurements for free. You can do it cheaply though, and typically you would only run it for short periods, long enough to gather the data you need.
Python has cProfile to do full profiling, measuring call counts per function, etc. This will give you the most accurate data but will likely have a relatively high impact on performance.
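For reference, here is a minimal sketch of how cProfile is typically driven from code; main() is just a stand-in for your own entry point:
import cProfile
import pstats

# Profile a hypothetical entry point and write the raw stats to disk.
cProfile.run("main()", "app.prof")

# Later, inspect the functions with the highest cumulative time.
stats = pstats.Stats("app.prof")
stats.sort_stats("cumulative").print_stats(20)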
Alternatively, you can do statistical profiling, which basically means you sample the stack on a timer instead of instrumenting everything. This can be much cheaper, even with a high sampling rate! The downside, of course, is a loss of precision.
Even though it is surprisingly easy to do in Python, this stuff is still a bit much to put into an answer here. There is an excellent blog post by the Nylas team on this exact topic though.
The sampler below was lifted from the Nylas blog with some tweaks. After you start it, it fires an interrupt every millisecond and records the current call stack:
import collections
import signal


class Sampler(object):
    def __init__(self, interval=0.001):
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval

    def start(self):
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def _sample(self, signum, frame):
        # Walk the current call stack and record it as a single string.
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(
                frame.f_code.co_name,
                frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back
        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        # Re-arm the timer for the next sample.
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)
You inspect stack_counts to see what your program has been up to. This data can be plotted in a flame graph, which makes it really obvious which code paths your program is spending most of its time in.
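For example, something along these lines (run_my_workload is just a placeholder) prints the ten most frequently sampled stacks:
sampler = Sampler()
sampler.start()

run_my_workload()  # placeholder for the code you actually want to measure

# Most frequently observed stacks first; the counts are sample counts,
# roughly proportional to the CPU time spent in each code path.
top = sorted(sampler.stack_counts.items(), key=lambda kv: kv[1], reverse=True)
for stack, count in top[:10]:
    print(count, stack)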
If I understand it right, you want to learn which parts of your application are used most often by users.
TL;DR:
Use one of the metrics frameworks for Python if you do not want to do it by hand. Some of them are listed below:
DataDog
Prometheus
Prometheus Python Client
Splunk
It is usually done at the function level, and the approach depends on the application.
If it is a desktop app with internet access:
You can create a simple database and collect how many times your functions are called. To accomplish this you can write a simple helper and call it inside every function that you want to track (see the sketch below). After that you can define an asynchronous task to upload your data to the internet.
If it is a web application:
You can track which functions are called from JS (mostly preferred for user behaviour tracking) or from the web API. It is good practice to start from the outside and work inward. First detect which endpoints are frequently called (if you are using a proxy like nginx, you can analyze server logs to gather this information; it is the easiest and cleanest way). After that, insert a logger into every other function that you want to track and simply analyze your logs every week or month.
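As a rough sketch of the "simple helper called inside every function you want to track" idea above (the names are made up, and persisting or uploading the counters is left out):
import collections
import functools

call_counts = collections.Counter()

def track(func):
    """Count how many times each decorated function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__qualname__] += 1
        return func(*args, **kwargs)
    return wrapper

@track
def checkout():  # hypothetical application function
    pass

# Periodically dump or upload call_counts, e.g. from a background task.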
But if you want to analyze your production code line by line (which is usually a very bad idea), you can run your application under a Python profiler. Python already ships with one: cProfile.
Alternatively, make a text file and, in every method of your program, append a line that identifies it, like "Method one executed". Run the web application about 10 times, end to end, the way a visitor would, and afterwards write a small Python program that reads the file, counts specific parts of it (or even a pattern), and prints the totals.

How to tell if WikiCorpus from gensim is working?

I downloaded the full Wikipedia archive (14.9 GB) and I am running this line of code:
wiki = WikiCorpus("enwiki-latest-pages-articles.xml.bz2")
My code doesn't seem to be getting past this point and it has been running for an hour now. I understand that the target file is massive, but I was wondering how I can tell whether it is working, and what the expected time to complete is.
You can often use an OS-specific monitoring tool, such as top on Linux/Unix/MacOS systems, to get an idea whether your Python process is intensely computing, using memory, or continuing with IO.
Even the simple vocabulary scan done when first instantiating WikiCorpus may take a long time, since it has to both decompress and tokenize/tally, so I wouldn't be surprised by a runtime longer than an hour. (And if it's relying on any virtual-memory/swapping during this simple operation, as may be clear from the output of top or similar monitoring, that'd slow things down even more.)
As a comparative baseline, you could time how long decompression-only takes with a shell command like:
% time bzcat enwiki-latest-pages-articles.xml.bz2 | wc
(A quick test on my MacBook Pro suggests 15GB of BZ2 data might take 30-minutes-plus just to decompress.)
In some cases, turning on Python logging at the INFO level will display progress information with gensim modules, though I'm not sure WikiCorpus shows anything until it finishes. Enabling INFO-level logging can be as simple as:
import logging
logging.basicConfig(level=logging.INFO)

Logging every step and action by Python in a Large Script

I have finally created a large Python script, but now I need a logger for it. I have input steps, prompts, function calls, while loops, etc. in the script.
The logger also has to log successful operations.
I couldn't find a suitable answer, so I'm searching the internet again, and wanted to ask you too.
What's your opinion?
Thanks
There's a module logging in the standard library. Basic usage is very simple; in every module that needs to do logging, put
logger = logging.getLogger(__name__)
and log with, e.g.,
logger.info("Doing something interesting")
logger.warning("Oops, something's not right")
Then in the main module, put something like
logging.basicConfig(level=logging.INFO)
to print all logs with a severity of INFO or worse to standard error. The module is very configurable, see its documentation for details.
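For a long-running script you will usually also want timestamps and a log file; here is a minimal sketch, where the file name and format are only examples:
import logging

logging.basicConfig(
    filename="script.log",   # example path; omit to log to stderr instead
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

logger = logging.getLogger(__name__)
logger.info("Input step completed successfully")
logger.warning("Prompt answered with an unexpected value")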

Threading with PyGTK

To begin, I must say that I have searched for quite a long time on this subject and I probably know most of the basic resources. I am attempting to use this: https://github.com/woodenbrick/gtkPopupNotify to add a system of notifications to a previously all-command-line program. Sadly, this usually hangs, because I perform lots of sleep operations, etc. I assume it would work if I could get a system of threading in place. Essentially, all I want is to make a notification that doesn't interfere with any other operations of the program, including other PyGTK components. The functions that create these notifications currently look like this:
def showMessage(title, message):
    notifier1 = gtkPopupNotify.NotificationStack(timeout=4)
    notifier1.bg_color = gtk.gdk.Color("black")
    notifier1.fg_color = gtk.gdk.Color("white")
    notifier1.edge_offset_x = 5 - 27  # -27 for odd bugginess
    notifier1.edge_offset_y = 5
    notifier1.new_popup(title=title, message=message)
Any help would be greatly appreciated as I am becoming really fed up with this problem.
With PyGTK, I highly recommend avoiding threads altogether. The GTK libraries aren't fully thread-safe and, under Win32, they don't support threads at all, so trying to combine them ends up being a pain. You can get some really nice results by "faking it" using Python generators and the gobject.idle_add() method.
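For illustration, here is a minimal sketch of that generator-plus-idle_add pattern in old-style PyGTK (Python 2 era, matching the question's code); do_some_work is a placeholder for one slice of your actual work:
import gobject

def long_running_task():
    # Do the work in small chunks, handing control back to the GTK
    # main loop between chunks so the UI stays responsive.
    for chunk in range(100):
        do_some_work(chunk)  # placeholder for one slice of your work
        yield True           # True: schedule this generator again when idle
    showMessage("Done", "Task finished")  # the question's notification helper
    yield False              # False: stop calling this generator

# .next (Python 2) is invoked each time the main loop is idle.
gobject.idle_add(long_running_task().next)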
As an alternative to coding it yourself, you can also just use Zenity, which is a Gnome program for launching notification dialogs from the command line. This should be thread-safe.
import subprocess
subprocess.call(["zenity", "--notification", "--text=You have been notified"])

Complete log management (python)

Similar questions have been asked, but I have not come across an easy way to do it.
We have application logs of various kinds which fill up the disk space, and we face other unwanted issues because of it. How do I write a maintenance/monitoring script (zipping files over a particular size, moving them, watching them, etc.) for this? I am looking for a simple solution (as in: what should I use?), if possible in Python, or maybe just a shell script.
Thanks.
The "standard" way of doing this (atleast on most Gnu/Linux distros) is to use logrotate. I see a /etc/logrotate.conf on my Debian machine which has details on which files to rotate and at what frequency. It's triggered by a daily cron entry. This is what I'd recommend.
If you want your application itself to do this (which is a pain really, since it's not its job), you could consider writing a custom log handler. A RotatingFileHandler (or TimedRotatingFileHandler) might work, but you can also write a custom one.
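For the in-application route, a minimal sketch with RotatingFileHandler (the path and limits are only examples):
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("myapp")  # example logger name
handler = RotatingFileHandler(
    "myapp.log",                 # example path
    maxBytes=10 * 1024 * 1024,   # rotate once the file reaches ~10 MB
    backupCount=5,               # keep at most 5 rotated files around
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)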
Most systems are by default set up to automatically rotate log files which are emitted by syslog. You might want to consider using the SysLogHandler and logging to syslog (from all your apps regardless of language) so that the system infrastructure automatically takes care of things for you.
Use logrotate to do the work for you.
Remember that there are a few cases where it may not work properly, for example if the logging application keeps the log file open at all times and cannot resume writing if the file is removed and recreated.
Over the years I have encountered a few applications like that, but even for them you can configure logrotate to restart them when it rotates the logs.
