Similar questions have been asked, but I have not come across an easy way to do it.
We have application logs of various kinds that fill up the disk and cause other unwanted issues. How do I write a monitoring/maintenance script (zipping files over a particular size, moving them, watching them, etc.) for this? I am looking for a simple solution (as in, what should I use?), if possible in Python, or maybe just a shell script.
Thanks.
The "standard" way of doing this (atleast on most Gnu/Linux distros) is to use logrotate. I see a /etc/logrotate.conf on my Debian machine which has details on which files to rotate and at what frequency. It's triggered by a daily cron entry. This is what I'd recommend.
If you want your application itself to do this (which is a pain, really, since it's not its job), you could consider writing a custom log handler. A RotatingFileHandler (or TimedRotatingFileHandler) might work, but you can also write a custom one.
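If you do go the handler route, here is a minimal sketch of a size-based RotatingFileHandler; the logger name, file name and sizes are assumptions:

    import logging
    import logging.handlers

    logger = logging.getLogger("myapp")  # hypothetical logger name
    handler = logging.handlers.RotatingFileHandler(
        "myapp.log", maxBytes=5 * 1024 * 1024, backupCount=5)  # keep 5 files of ~5 MB
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info("this goes to myapp.log and rolls over automatically")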
Most systems are by default set up to automatically rotate log files which are emitted by syslog. You might want to consider using the SysLogHandler and logging to syslog (from all your apps regardless of language) so that the system infrastructure automatically takes care of things for you.
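A sketch of the syslog route, assuming a local syslog daemon listening on /dev/log (the socket path differs per platform):

    import logging
    import logging.handlers

    logger = logging.getLogger("myapp")
    # /dev/log is the usual local syslog socket on Linux; on macOS it is
    # /var/run/syslog, and ('localhost', 514) works for a network syslog.
    handler = logging.handlers.SysLogHandler(address="/dev/log")
    handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info("rotation is now handled by the system's syslog/logrotate setup")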
Use logrotate to do the work for you.
Remember that there are a few cases where it may not work properly, for example if the logging application keeps the log file open and cannot reopen it when the file is removed and recreated.
Over the years I have encountered a few applications like that, but even for them you can configure logrotate to restart them when it rotates the logs.
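For example, a minimal logrotate stanza (the path is an assumption) using copytruncate, which exists precisely for applications that keep the file open:

    # /etc/logrotate.d/myapp  -- hypothetical drop-in file
    /var/log/myapp/*.log {
        daily
        rotate 7
        compress
        missingok
        notifempty
        copytruncate    # truncate in place instead of moving the open file
    }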
We use Python (2.7)/Django (1.8.1) and Gunicorn (19.4.5) for our web application, with supervisor (3.0) to monitor it. I recently ran into two logging issues:
Django was logging into the previous day's logs (we have log rotation enabled).
Django was not logging anything at all.
The first scenario is understandable: log rotation swapped the file, but Django was not updated with the new file descriptor.
The second scenario was fixed when I restarted the supervisor process, which again led me to believe the file descriptor was not updated in the Django process.
I came across this SO thread, which states:
Each child is an independent process, and file handles in the parent may be closed in the child after a fork (assuming POSIX). In any case, logging to the same file from multiple processes is not supported.
So I have a few questions:
1. My gunicorn has 4 child processes; if one of them fails while writing to a log file, will the other child processes be unable to use it? And how do I debug these kinds of scenarios?
2. Personally I find debugging errors in the Python logging module difficult. Can someone point out how to debug errors such as this, or is there any way I can monkey-patch logging so it does not fail silently? (Kindly read the update section.)
3. I have seen that Django's own log rotation causes issue type 1 as explained above, rather than some script scheduled via cron. So which is preferable?
Note: The logging config is not the problem. I have already spent a fair amount of time ruling that out. Also, if the config were the issue, Django would not write log files even after a process restart.
Update:
For my second question, I see that the logging module provides a raiseExceptions option to surface failures, although this is discouraged in production environments. Documentation here. So now my question becomes: how do I set this in Django?
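For reference, the flag in question is the module-level logging.raiseExceptions; a minimal sketch of flipping it, with the placement in settings.py being an assumption rather than an established Django idiom:

    # settings.py -- assumption: set the module-level flag early, when settings are imported
    import logging

    logging.raiseExceptions = True   # handler errors get reported instead of being swallowed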
I felt like closing this question. It seems a bit awkward and stupid two months later, but I guess being stupid is part of learning, and I want this to stay as a reference for people who stumble across it.
Scenario 1: Django, when using TimedRotatingFileHandler, sometimes does not update the file descriptor and hence writes to the old log files unless we restart supervisor. We have yet to find the reason for this behaviour and will update this if we do. For now we are using WatchedFileHandler and the logrotate utility to rotate the logs (see the sketch after the list below).
Scenario 2: This is the stupid one. When logging with some string formatting I did not pass enough variables, which is why the logger was erroring. But this never got propagated. When testing locally I found that the logging module was indeed throwing that error, but silently, and any log calls after it in the module never got printed. Lessons learned from this scenario:
If there is a problem with logging, check whether the string formatting errs.
Use log.debug('example: {msg}'.format(msg=msg)) instead of log.debug('example: %s', msg), so formatting errors surface at the call site rather than inside the handler.
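For reference, a minimal sketch of the kind of Django LOGGING config described above, pairing WatchedFileHandler with external rotation by logrotate; the handler name, path, format and logger name are assumptions:

    # settings.py -- hypothetical paths and logger names
    LOGGING = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'verbose': {
                'format': '%(asctime)s %(process)d %(levelname)s %(name)s %(message)s',
            },
        },
        'handlers': {
            'app_file': {
                # WatchedFileHandler reopens the file when logrotate moves it aside,
                # so stale file descriptors stop being an issue.
                'class': 'logging.handlers.WatchedFileHandler',
                'filename': '/var/log/myapp/app.log',
                'formatter': 'verbose',
            },
        },
        'loggers': {
            'myapp': {
                'handlers': ['app_file'],
                'level': 'INFO',
            },
        },
    }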
I am running my Python application in an Apache environment and using TimedRotatingFileHandler to log.
I have set up the logger so that it rotates at midnight every day. All my processes write into the same log file. Somehow the logger misses records at times, and sometimes I see it writing into two files (the old file and the rotated file) at the same time.
I can't understand why this is happening. Doesn't TimedRotatingFileHandler work in a multiprocess environment? If not, why is that so?
Please help me understand.
You can't naively log (and rotate) to the same file from multiple processes, because the OS has no way to serialize the write and rotate instructions coming from two different processes. What you are experiencing is known as a race condition: two processes compete to write to the same file and to close it and reopen it with a new file handle at rotation time. Only one process wins the new file handle when you rotate, which may explain the missing log events.
Here's a recipe from Python's documentation with hints about how to log to the same place.
http://docs.python.org/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes
Essentially you want a separate process listening for logging events coming from multiple places; that process then writes the events to a single file. You can configure rotation in that listener process too.
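A minimal sketch of that pattern, assuming Python 3 (logging.handlers.QueueHandler and QueueListener were added in 3.2); the file name and format are placeholders:

    import logging
    import logging.handlers
    import multiprocessing

    def worker(queue):
        # Workers only ever talk to the queue; they never touch the log file.
        root = logging.getLogger()
        root.addHandler(logging.handlers.QueueHandler(queue))
        root.setLevel(logging.INFO)
        root.info("hello from %s", multiprocessing.current_process().name)

    if __name__ == "__main__":
        queue = multiprocessing.Queue()

        # Only the listener owns (and rotates) the actual log file.
        file_handler = logging.handlers.TimedRotatingFileHandler(
            "app.log", when="midnight", backupCount=7)
        file_handler.setFormatter(logging.Formatter(
            "%(asctime)s %(processName)s %(levelname)s %(message)s"))
        listener = logging.handlers.QueueListener(queue, file_handler)
        listener.start()

        procs = [multiprocessing.Process(target=worker, args=(queue,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        listener.stop()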
If you are not sure how to write this, you can try a package such as Sentry or Facebook's Scribe. I'd recommend Sentry, because Scribe is not trivial to set up.
Sorry, I wasn't sure how best to word this question.
My scenario is that I have some Python code (on a Linux machine) that uses an XML file to acquire its arguments to perform a task; on completion of the task it disposes of the XML file and waits for another XML file to arrive, then does it all over again.
I'm trying to find the best way to be alerted when an XML file arrives in a specified folder.
One way would be to continually poll the folder from the Python code, but that would waste resources while waiting for something to turn up (which may be as little as a few times a day). Another way would be to set up a cron job, but its efficiency wouldn't be any better than polling from within the code. The option I was hoping for is some sort of interrupt that alerts the code when an XML file appears.
Any thoughts?
Thanks.
If you're looking for something "easy" to just run a specific script when new files arrive, the incron daemon provides a very handy combination of inotify(7) and cron(8)-like support for executing programs on demand.
If you want something a little better integrated into your application, or if you can't afford the constant fork(2) and execve(2) of the incron approach, then you should probably use the inotify(7) interface directly from your script. The pyinotify module provides Python bindings for the underlying inotify(7) interfaces.
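A minimal sketch using pyinotify; the watch directory and the handle_xml function are assumptions. The script sleeps inside the kernel's inotify wait instead of polling, and only wakes when a file lands in the folder:

    import pyinotify

    WATCH_DIR = "/path/to/incoming"   # hypothetical drop folder

    class XmlHandler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            # Fires once the writer has finished and closed the file.
            if event.pathname.endswith(".xml"):
                handle_xml(event.pathname)   # your existing task code

        # Also catch files moved into the folder (e.g. mv, or rsync's temp-rename).
        process_IN_MOVED_TO = process_IN_CLOSE_WRITE

    def handle_xml(path):
        print("new job file:", path)

    wm = pyinotify.WatchManager()
    wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO)
    notifier = pyinotify.Notifier(wm, XmlHandler())
    notifier.loop()   # blocks, using no CPU while idle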
I work on a server along with a number of other people who, because of unexplained occurrences, change files, symlinks, etc. Unfortunately, everyone has the same level of file system rights. Is there a straightforward way to capture activity: login/logout times (I know the 'last' command shows this), files changed (deleted, added, etc.), and symlinks created, changed or deleted? I'm wondering whether it's more straightforward to do something like this in bash or Python, and which direction to go. Thanks for all help.
First, you really should lock your user accounts down on your server. But if you really want to monitor activity within the file system, you've got a couple of options.
Write a script to parse their bash history on logout and save the logout time (poor).
Install iwatch, point it at the locations to monitor, and write a script to record pertinent information to a log file whenever something changes.
If you want to monitor user activity as if you were watching over their shoulder, that's a little harder, especially because some things, such as sftp, don't make any entries in bash_history at all; it's an entirely different subsystem.
The best thing to do is simply monitor the areas of the filesystem that the users have access to and log any changes, which is another reason to lock down your users. Linux has per-user home directories for a reason: everybody gets their own sandbox so that they don't touch each other's files.
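If you go the Python route instead of iwatch, here is a minimal sketch along those lines (the watched path and audit log location are assumptions), using pyinotify to record create/delete/modify/move events with timestamps:

    import logging
    import pyinotify

    WATCH_DIR = "/srv/shared"              # hypothetical shared area
    AUDIT_LOG = "/var/log/shared-audit.log"

    logging.basicConfig(filename=AUDIT_LOG, level=logging.INFO,
                        format="%(asctime)s %(message)s")

    class AuditHandler(pyinotify.ProcessEvent):
        def process_default(self, event):
            # event.maskname is e.g. IN_CREATE, IN_DELETE, IN_MODIFY, IN_MOVED_TO
            logging.info("%s %s", event.maskname, event.pathname)

    wm = pyinotify.WatchManager()
    mask = (pyinotify.IN_CREATE | pyinotify.IN_DELETE |
            pyinotify.IN_MODIFY | pyinotify.IN_MOVED_TO | pyinotify.IN_MOVED_FROM)
    # rec=True also watches subdirectories; auto_add keeps up with new ones.
    wm.add_watch(WATCH_DIR, mask, rec=True, auto_add=True)
    pyinotify.Notifier(wm, AuditHandler()).loop()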
I am writing a file manager in (wx)Python; a lot of it already works. Copying files already has a progress dialog, overwrite handling, etc.
Now, on Vista, when the user wants to copy a file to certain directories (e.g. %ProgramFiles%), the application/script needs elevation, which cannot be requested at runtime. So I have to start another app/script elevated, which does the work but needs to communicate with the main app so that the latter can update the progress, etc.
I searched and found a lot of articles saying shared memory and pipes are the easiest way. So what I am looking for is a 'high level', platform-independent IPC library with Python bindings that uses shared memory or pipes.
I already found omniORB, Fnorb, etc. They look very interesting, but they use TCP/IP. Is there an equivalent library using shared memory or pipes? Since the IPC client is always on the same machine, sockets don't seem necessary here, and I am also afraid the user would have to allow IPC socket communication in his/her personal firewall.
EDIT: I really do mean high level: it would be great to be able to just call some functions, as with omniORB, instead of sending strings to stdin/stdout.
How about just communicating with the second process using stdin/stdout?
There are some caveats due to input and output buffering, but take a look at this Python Cookbook recipe, and also Pexpect, for ideas on how to do this.
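A minimal sketch of that stdin/stdout approach; the line-based protocol and the copy_helper.py script name are assumptions. The parent launches the (elevated) helper and reads progress lines as they arrive:

    # parent side -- hypothetical; launches a helper and reads its progress lines
    import subprocess
    import sys

    proc = subprocess.Popen(
        [sys.executable, "-u", "copy_helper.py"],   # -u keeps the child's output unbuffered
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )

    # Send the job description on stdin, one line per instruction.
    proc.stdin.write("copy src.bin dst.bin\n")
    proc.stdin.flush()
    proc.stdin.close()      # signal the helper that no more jobs are coming

    # Read progress updates line by line and feed them to the progress dialog.
    for line in proc.stdout:
        print("helper says:", line.strip())

    proc.wait()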