We use Python (2.7), Django (1.8.1), and Gunicorn (19.4.5) for our web application, and supervisor (3.0) to monitor it. I have recently encountered two issues in logging:
1. Django was logging into the previous day's logs (we have log rotation enabled).
2. Django was not logging anything at all.
The first scenario is understandable: log rotation replaced the file, but Django's file handle was not updated.
The second scenario was fixed when I restarted the supervisor process, which again led me to believe the file descriptor was stale in the Django process.
I came across this SO thread, which states:
Each child is an independent process, and file handles in the parent may be closed in the child after a fork (assuming POSIX). In any case, logging to the same file from multiple processes is not supported.
So I have a few questions:
1. My Gunicorn server has 4 child processes. If one of them fails while writing to a log file, will the other child processes be unable to use it? And how do I debug these kinds of scenarios?
2. Personally, I have found debugging errors in the Python logging module difficult. Can someone point out how to debug errors such as this, or is there any way I can monkey-patch logging so that it does not fail silently? (Kindly read the update section.)
3. I have seen that Django's own log rotation causes issue 1 as explained above, whereas a script scheduled via cron does not. So which is preferable?
Note: The logging config is not the problem; I have already spent a fair amount of time ruling that out. Also, if the config were the issue, Django would not write log files even after a process restart.
Update:
For my second question, I see that the logging module provides a raiseExceptions option to surface failures, although this is discouraged in production environments. Documentation here. So now my question becomes: how do I set this in Django?
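For reference, logging.raiseExceptions is just a module-level flag, so one way to flip it is simply to set it in code that runs at startup. Putting it in settings.py is my assumption about placement, not an official Django hook; a minimal sketch:

# settings.py
import logging

# Make handler-level errors visible instead of being swallowed.
# Discouraged in production; useful while debugging.
logging.raiseExceptions = True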
I felt like closing this question; it feels a bit awkward and stupid after 2 months. But I guess being stupid is part of learning, and I want this to remain as a reference for people who stumble across it.
Scenario 1: Django, when using TimedRotatingFileHandler, sometimes does not update the file descriptor and hence writes to old log files unless we restart supervisor. We have yet to find the reason for this behaviour and will update this answer if we do. For now we are using WatchedFileHandler and rotating the logs with the logrotate utility.
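A minimal sketch of that setup, assuming Django's LOGGING dictConfig in settings.py (the file path and log level are illustrative):

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'file': {
            # WatchedFileHandler reopens the file when logrotate swaps it out
            'class': 'logging.handlers.WatchedFileHandler',
            'filename': '/var/log/myapp/django.log',
        },
    },
    'root': {
        'handlers': ['file'],
        'level': 'INFO',
    },
}

WatchedFileHandler stats the file before each emit and reopens it if the inode has changed, which is what makes external rotation via logrotate safe here.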
Scenario 2: This is the stupid part. When logging with string formatting, I forgot to supply enough variables, which is why the logger was erring. But this error never got propagated. When I tested locally, I found that the logging module was indeed throwing that error, but silently, and no logs after it in that module were printed. Lessons learned from this scenario:
1. If there is a problem with logging, first check that the string formatting does not err.
2. Use log.debug('example: {msg}'.format(msg=msg)) instead of log.debug('example: %s', msg), so formatting errors surface at the call site (see the sketch below).
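To illustrate lesson 2 (a minimal sketch, not from the original code): with lazy %-formatting the mismatch is only detected inside the handler, where logging swallows it (at most printing a traceback to stderr and dropping the record), whereas eager .format() raises at the call site:

import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

# Lazy formatting: the missing argument is only noticed inside the
# handler; the record is dropped and execution continues.
log.debug('two values: %s %s', 'only-one')

# Eager formatting: the mismatch raises KeyError right here.
log.debug('two values: {a} {b}'.format(a='only-one'))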
Related
I spent hours digging into this behavior, starting with these questions:
Atomicity of `write(2)` to a local filesystem
How can I synchronize -- make atomic -- writes on one file from two processes?
How does one programmatically determine if "write" system call is atomic on a particular file?
What happens if a write system call is called on same file by 2 different processes simultaneously
http://article.gmane.org/gmane.linux.kernel/43445
It seems that if we use the O_APPEND flag when opening the file, it will always be OK to log to the same file from multiple processes on Linux. And I believe Python does use O_APPEND in its logging module, since FileHandler opens the file in append mode by default.
And from a small test:
#!/usr/bin/env python
import os
import logging

logger = logging.getLogger('spam_application')
logger.setLevel(logging.DEBUG)

# create a file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)

for i in xrange(10000):
    p = os.getpid()
    logger.debug('Log line number %s in %s', i, p)
And I ran it with:
./test.py & ./test.py & ./test.py & ./test.py &
I found nothing wrong in spam.log. This behavior may support the conclusion above.
But then problems come up:
What does this mean here?
And in what scenarios should this be used? Just for file rotation?
Finally, if two processes are writing to the same file, i.e. both are invoking write(2) on it, what ensures that the data from the two processes does not interleave (the kernel, or the filesystem?), and how? [NOTE: I just want to understand the write syscall in depth; any hint about this is welcome.]
EDIT 1:
Do this and this exist just for compatibility between different OS environments, like Windows, Linux, or Mac?
EDIT 2:
One more test: feed 8 KB strings to logging.debug each time. This time I can see the "interleaving" behavior in spam.log.
This behavior is just what the page above specifies about PIPE_BUF. So it seems the behavior is clear on Linux: using O_APPEND is OK if the size passed to write(2) is less than PIPE_BUF.
I dug deeper and deeper. Now I think these facts are clear:
With O_APPEND, parallel write(2) from multiple processes is OK. The order of lines is undetermined, but lines don't interleave or overwrite each other, and this holds for any amount of data, according to Niall Douglas's answer to Understanding concurrent file writes from multiple processes. I have tested this "any amount" claim on Linux and have not found an upper limit, so I guess it's right.
Without O_APPEND, it will be a mess. Here is what POSIX says: "This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control."
Now we come to Python. About the 8 K I saw in the test in EDIT 2: I found its origin. Python's write() actually uses fwrite(3), and my Python build sets the buffer size there to 8192, according to an answer from abarnert in Default buffer size for a file on Linux. This 8192 has a long story.
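A quick way to check that buffer size on your own build (a sketch; io.DEFAULT_BUFFER_SIZE exists on both Python 2.7 and 3):

import io
print(io.DEFAULT_BUFFER_SIZE)  # typically 8192 on Linux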
However, more information is welcome.
I would not rely on tests here. Weird things only happen in race conditions, and exhibiting a race condition by testing is almost pointless, because the race is unlikely to occur on demand. So it can work nicely for 1000 test runs and then randomly break later in production... The page you cite says:
logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python
That does not mean it will break... it could even be safe in a particular implementation on a particular filesystem. It just means that it can break, with no hope of a fix, on any other version of Python or on any other filesystem.
If you really want to be sure of it, you will have to dive into the Python source code (for your version) to check how logging is actually implemented, and to verify whether it is safe on your filesystem. And you will always be threatened by the possibility that a later optimization in the logging module breaks your assumptions.
IMHO that is the reason for the warning in the Logging Cookbook, and for the existence of a special module that allows concurrent logging to the same file. That module does not rely on anything unspecified; it just uses explicit locking.
I have tried similar code to this (in Python 3):

import threading

# function_to_call_logger is whatever function actually writes to the logger
for i in range(100000):
    t1 = threading.Thread(target=function_to_call_logger, args=(i,))
    t1.start()
This worked completely fine for me; a similar issue is addressed here. It took a lot of CPU time but not much memory.
EDIT:
"Fine" means that everything requested was logged, but ordering was not preserved. Hence the race condition is still not fixed.
I am running my Python application in an Apache environment and using TimedRotatingFileHandler to log.
I have set up the logger so that it is supposed to rotate at midnight every day. All my processes write to the same log file. Somehow the logger misses log records at times, and sometimes I see it writing to two files (the old file and the rotated file) at the same time.
I can't understand why this is happening. Doesn't TimedRotatingFileHandler work in a multiprocess environment? If not, why is that?
Please help me understand.
You can't naively log (and rotate) to the same file from multiple processes, because the OS has no way to serialize the write and rotate instructions coming from two different processes. What you are experiencing is known as a race condition: at rotation time, the two processes compete to write to the same file, close it, and open a new file handle. Only one process wins a new file handle when you rotate, which may explain the missing log events.
Here's a recipe from Python's documentation with hints about how to log to the same place.
http://docs.python.org/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes
Essentially, you will want a separate process listening for logging events coming from multiple places; that single process then writes the events to one file. You can configure rotation in that listener process too.
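A minimal sketch of that recipe, assuming Python 3.2+ where logging.handlers.QueueHandler is available (the original question is on Python 2.7, where the same idea can be built by hand around a multiprocessing.Queue consumer); file name, format, and rotation settings are illustrative:

import logging
import logging.handlers
import multiprocessing

def listener(queue):
    # The only process that owns the log file, so rotation is race-free.
    handler = logging.handlers.TimedRotatingFileHandler(
        'app.log', when='midnight')
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(processName)s %(name)s %(message)s'))
    logging.getLogger().addHandler(handler)
    while True:
        record = queue.get()
        if record is None:  # sentinel: shut the listener down
            break
        logging.getLogger(record.name).handle(record)

def worker(queue, n):
    # Workers only enqueue records; they never touch the file.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.DEBUG)
    for i in range(5):
        root.debug('worker %s, message %s', n, i)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    lp = multiprocessing.Process(target=listener, args=(queue,))
    lp.start()
    workers = [multiprocessing.Process(target=worker, args=(queue, n))
               for n in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    queue.put(None)
    lp.join()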
If you are not sure how to write this, you can try using a package such as Sentry or Facebook's Scribe. I recommend Sentry, because Scribe is not trivial to set up.
Similar questions have been asked, but I have not come across an easy way to do it.
We have application logs of various kinds which fill up the disk, and we face other unwanted issues. How do I write a maintenance script (zipping files over a particular size, moving them, watching them, etc.) for this? I am looking for a simple solution (as in: what should I use?), if possible in Python, or maybe just a shell script.
Thanks.
The "standard" way of doing this (atleast on most Gnu/Linux distros) is to use logrotate. I see a /etc/logrotate.conf on my Debian machine which has details on which files to rotate and at what frequency. It's triggered by a daily cron entry. This is what I'd recommend.
If you want your application itself to do this (which is a pain, really, since it's not its job), you could consider writing a custom log handler. A RotatingFileHandler (or TimedRotatingFileHandler) might work, but you can also write a custom one.
Most systems are set up by default to automatically rotate log files emitted by syslog. You might want to consider using SysLogHandler and logging to syslog (from all your apps, regardless of language) so that the system infrastructure automatically takes care of things for you.
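A minimal sketch of that, assuming a Unix host where /dev/log is the local syslog socket (the logger name and format are illustrative):

import logging
import logging.handlers

logger = logging.getLogger('myapp')
logger.setLevel(logging.INFO)
handler = logging.handlers.SysLogHandler(address='/dev/log')
handler.setFormatter(logging.Formatter('myapp: %(levelname)s %(message)s'))
logger.addHandler(handler)

logger.info('this line is routed through syslog')

From there, the system's existing rotation infrastructure handles the files.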
Use logrotate to do the work for you.
Remember that there are a few cases where it may not work properly, for example if the logging application keeps the log file open at all times and cannot resume writing when the file is removed and recreated.
Over the years I have encountered a few applications like that, but even for them you can configure logrotate to restart them when it rotates the logs.
This is related to Configure Apache to recover from mod_python errors, although I've since stopped assuming that this has anything to do with mod_python. Essentially, I have a problem that I wasn't able to reproduce consistently, and I wanted some feedback on whether the proposed solution seems likely, plus some potential ways to try to reproduce the problem.
The setup: a Django-powered site would begin throwing errors after a few days of use. They were always ImportErrors or ImproperlyConfigured errors, which amount to the same thing, since the message always specified trouble loading some module referenced in the settings.py file. It was not generally the same class. I am using prefork Apache with 8 forked children, and whenever this problem came up, one process would be broken and seven would be fine. Once broken, the process would display the same trace (with Debug On in the Apache conf) on every request it served, even if the failed import was not relevant to that particular request. An httpd restart always made the problem go away in the short run.
Noted problems: installation and updates are performed via svn with some post-update scripts. A few .pyc files were accidentally checked into the repository. Additionally, the project itself was owned by one user (not apache, although apache had permissions on the project), and there was a persistent plugin that ended up being backgrounded as root. I call these noted problems because they would be wrong whether or not I had noticed this error, and hence I have fixed them: the project is now owned by apache, the plugin is backgrounded as apache, all .pyc files are out of the repository, and they are all force-recompiled after each checkout while the server and plugin are stopped.
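For what it's worth, a sketch of what that force-recompile step could look like, assuming it runs from the project root (the poster's actual script isn't shown):

import compileall
import os

# delete every stale .pyc, then rebuild them from the current sources
for dirpath, _, filenames in os.walk('.'):
    for name in filenames:
        if name.endswith('.pyc'):
            os.remove(os.path.join(dirpath, name))
compileall.compile_dir('.', force=True)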
What I want to know is
Do these configuration disasters seem like a likely explanation for sporadic ImportErrors?
If there is still a problem somewhere else in my code, how would I best reproduce it?
As for 2, my approach thus far has been to write some stress tests that repeatedly request the same page so as to execute common code paths.
Incidentally, this has been running without incident for about 2 days since the fix, but the problem was previously observed at intervals of 1 to 10 days.
"Do these configuration disasters seem like a likely explanation for sporadic ImportErrors"
Yes. An old .pyc file is a disaster of the first magnitude.
We develop on Windows but run production on Red Hat Linux. An accidentally moved .pyc file is an absolute mystery to debug because (1) it usually runs and (2) it carries a Windows filename for the original source, making the traceback absolutely senseless. I spent hours staring at logs -- on Linux -- wondering why the file was "C:\This\N\That".
"If there is still a problem somewhere else in my code, how would I best reproduce it?"
Before reproducing errors, you should try to prevent them.
First, create unit tests to exercise everything.
Start with Django's tests.py testing. Then expand to unittest for all non-Django components. Then write yourself a "run_tests" script that runs every test you own. Run this periodically. Daily isn't often enough.
Second, be sure you're using logging. Heavily.
Third, wrap anything that uses external resources in a generic exception-logging block like this:
try:
    some_external_resource_processing()
except Exception as e:
    logger.exception(e)  # logs the message plus the full traceback
    raise                # re-raise so callers still see the failure
This will help you pinpoint problems with external resources. Files and databases are often the source of bad behavior due to permission or access problems.
At this point, you have prevented a large number of errors. If you want to run cyclic load testing, that's not a bad idea either. Use unittest for this:
import unittest
import urllib2

class SomeLoadtest(unittest.TestCase):
    def test_something(self):
        # note: urlopen needs an explicit scheme in the URL
        self.connection = urllib2.urlopen("http://localhost:8000/some/path")
        results = self.connection.read()
This isn't the best way to do things, but it shows one approach. You might want to start using Selenium to test the website "from the outside" as a complement to your unit tests.
I'm having a very peculiar problem in my Python FastCGI code: sys.stdout has a file descriptor of -1, so I can't write to it.
I'm checking this at the first line of my program, so I know it's not any of my code changing it.
I've tried sys.stdout = os.fdopen(1, 'w'), but anything written there won't get to my browser.
The same application works without difficulty under Apache.
I'm using the Microsoft-provided FastCGI extension for IIS documented here: http://learn.iis.net/page.aspx/248/configuring-fastcgi-extension-for-iis60/
I am using these settings in fcgiext.ini:
ExePath=C:\Python23\python.exe
Arguments=-u C:\app\app_wsgi.py
FlushNamedPipe=1
RequestTimeout=45
IdleTimeout=120
ActivityTimeout=30
Can anyone tell me what's wrong, or where I should look to find out?
All suggestions greatly appreciated...
Forgive me if this is a dumb question, but I notice this line in your config file:
Arguments=-u C:\app\app_wsgi.py
Are you running a WSGI application or a FastCGI app? There is a difference. In WSGI, writing to stdout isn't a good idea. Your program should expose an application object that can be called with an environ dict and a start_response function (for more info, see PEP 333). In any case, your application returns its response by returning an iterable containing the response body, not by writing to stdout.
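For illustration, a minimal application object in the PEP 333 shape (everything here is generic, not taken from the poster's app):

def application(environ, start_response):
    body = 'Hello from WSGI\n'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]  # the response body is returned, not written to stdout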
Either way, you should also consider using isapi-wsgi. I've never used it myself, but I hear good things about it.
Do you have to use FastCGI? If not, you may want to try an ISAPI WSGI method. I have had success using:
http://code.google.com/p/isapi-wsgi/
and have also used PyISAPIe in the past:
http://sourceforge.net/apps/trac/pyisapie
I believe having stdout closed/invalid is in accordance with the FastCGI spec:
The Web server leaves a single file descriptor, FCGI_LISTENSOCK_FILENO, open when the application begins execution. This descriptor refers to a listening socket created by the Web server.
FCGI_LISTENSOCK_FILENO equals STDIN_FILENO. The standard descriptors STDOUT_FILENO and STDERR_FILENO are closed when the application begins execution. A reliable method for an application to determine whether it was invoked using CGI or FastCGI is to call getpeername(FCGI_LISTENSOCK_FILENO), which returns -1 with errno set to ENOTCONN for a FastCGI application.
On Windows, it's possible to launch a process without a valid stdin and stdout. For example, if you execute a Python script with pythonw.exe, stdout is invalid, and if you insist on writing to it, it will block after 140 characters or so.
Writing to a destination other than stdout looks like the safest solution.
Following PEP 333, you can try to log to environ['wsgi.errors'], which is usually the web server's own log when you use FastCGI. Of course, this is only available while a request is being handled, not during application startup.
You can get an example in the pylons code: http://pylonshq.com/docs/en/0.9.7/logging/#logging-to-wsgi-errors
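For illustration, a minimal sketch of writing to that stream from inside a request (generic names, not taken from the Pylons example):

def application(environ, start_response):
    # wsgi.errors is the server-provided error stream (PEP 333);
    # under FastCGI it usually ends up in the web server's log.
    environ['wsgi.errors'].write('something worth logging\n')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['ok\n']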