How to log memory usage of a Django app per request - python

Do you know of an efficient way to log the memory usage of a Django app per request?
I have an Apache/mod_wsgi/Django stack which usually runs well, but sometimes one process ends up eating a huge amount of memory. The server ends up short on memory, swapping a lot, and services are dramatically slowed down.
This situation is quite hard to fix because I don't know which request is to blame for this behavior, and I can't reproduce it.
I'd like to have something deployed in production which logs the memory usage of the process before and after each request, with minimal overhead.
Before I start reinventing the wheel, does the community of my fellow Djangoists know of any existing solution to address this problem?
Advice, middleware, snippets, or maybe Apache log configuration would be appreciated.
What (I think) I don't need is:
a set of dev-stage profiling/debugging tools; I already know some, and I'd use them if I knew what to profile/debug, but it seems like too much to keep them permanently monitoring services running in production. On top of that, what those tools usually display is a memory usage report of the code broken down piece by piece; it would be far more helpful to simply pinpoint the faulty request.
generic advice on how to optimize the memory usage of a Django app; it's always good to read, but the idea here is rather «how to efficiently track down requests which need to be optimized».
My closest search results:
Django / WSGI - How to profile partial request? My profiling tools are per-request but app runs out of memory before then
Django memory usage going up with every request
Average php memory usage per request?

A Django middleware for tracking memory usage and generating a usable result immediately needs to hook both process_request and process_response. In other words, look at the difference between the start and finish of the request and log a warning if it exceeds some threshold.
A complete middleware example is:
import os
import psutil
import sys

THRESHOLD = 2 * 1024 * 1024

class MemoryUsageMiddleware(object):

    def process_request(self, request):
        request._mem = psutil.Process(os.getpid()).memory_info()

    def process_response(self, request, response):
        mem = psutil.Process(os.getpid()).memory_info()
        diff = mem.rss - request._mem.rss
        if diff > THRESHOLD:
            print >> sys.stderr, 'MEMORY USAGE %r' % ((diff, request.path),)
        return response
This requires the 'psutil' module to be installed for the memory calculation.
This is brute force and can lead to false positives in a multithreaded system. Because of lazy loading, you will also see it trigger on the first few requests against a new process as things load up.
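For recent Django versions (1.10+), the same idea can be written as a new-style middleware. Below is a minimal Python 3 sketch along the same lines; the threshold and logging destination are placeholders you would adjust:

import logging
import os

import psutil

THRESHOLD = 2 * 1024 * 1024
logger = logging.getLogger(__name__)

class MemoryUsageMiddleware:

    def __init__(self, get_response):
        self.get_response = get_response
        self.process = psutil.Process(os.getpid())

    def __call__(self, request):
        # RSS of this worker process before and after handling the request.
        rss_before = self.process.memory_info().rss
        response = self.get_response(request)
        diff = self.process.memory_info().rss - rss_before
        if diff > THRESHOLD:
            logger.warning('MEMORY USAGE %r', (diff, request.path))
        return response

The same caveats apply: the delta is per process, not per request, so concurrent requests in a multithreaded worker can be attributed to whichever request happens to finish while memory is high.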

This may not fully cover your question, but I recommend trying nginx + uWSGI instead of Apache2 + mod_wsgi. In my tests it turned out to be much more stable (mod_wsgi choked completely at some point), much faster, and it uses a lot less memory (it may just fix all your issues altogether).
About tracking memory usage, you can create a simple middleware:
class SaveMemoryUsageMiddleware(object):

    def process_response(self, request, response):
        # track memory usage here and append to file or db
        return response
and add it to your middleware settings.
For memory tracking code I recommend checking out:
Total memory used by Python process?
However, it would probably be better if you could avoid doing this in production; use it just in dev and test environments to track down the real problem.
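As a rough illustration of how the body of that middleware could be filled in (assuming a Unix host and logging through the standard logging module rather than a file or db):

import logging
import resource

logger = logging.getLogger(__name__)

class SaveMemoryUsageMiddleware(object):

    def process_response(self, request, response):
        # ru_maxrss is the process's peak resident set size so far
        # (kilobytes on Linux, bytes on macOS).
        peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        logger.info('peak_rss=%s path=%s', peak_rss, request.path)
        return response

Because ru_maxrss is a high-water mark, you would typically log it on every request and look for the requests during which it jumps.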

Related

Memory leak in Google App Engine / Datastore / Flask / Python app

I have built a simple news aggregator site, in which the memory usage of all my App Engine instances keeps growing until it reaches the limit and the instance is therefore shut down.
I have started to eliminate everything from my app to arrive at a minimal reproducible version. This is what I have now:
from flask import Flask
from google.cloud import datastore

app = Flask(__name__)
datastore_client = datastore.Client()

@app.route('/')
def root():
    query = datastore_client.query(kind='source')
    query.order = ['list_sequence']
    sources = query.fetch()
    for source in sources:
        pass
Stats show a typical saw-tooth pattern: at instance startup, memory goes to 190 - 210 MB, then upon some requests, but NOT ALL requests, it increases by 20 - 30 MB. (This, by the way, roughly corresponds to the estimated size of the query results, although I cannot be sure this is relevant info.) This keeps happening until it exceeds 512 MB, when the instance is shut down. It usually happens at around the 50th - 100th request to "/". No other requests are made to anything else in the meantime.
Now, if I eliminate the "for" cycle, and only the query remains, the problem goes away: memory usage remains flat at 190 MB, with no increase even after 100+ requests.
gc.collect() at the end does not help. I have also tried looking at the difference in tracemalloc stats at the beginning and end of the function, but I have not found anything useful.
Has anyone experienced anything similar, please? Any ideas what might go wrong here? What additional tests / investigations can you recommend? Is this possibly a Google App Engine / Datastore issue I have no control of?
Thank you.
Now, if I eliminate the "for" cycle, and only the query remains, the problem goes away: memory usage remains flat at 190 MB, with no increase even after 100+ requests.
query.fetch() returns an iterator, not an actual array of the results
https://googleapis.dev/python/datastore/latest/queries.html#google.cloud.datastore.query.Query.fetch
Looking at the source code, it looks like this iterator has code for fetching the next pages of the query. So your for-loop forces it to fetch all the pages of the results. In fact, I don't think it actually fetches anything until you start iterating. So that would be why removing your for-loop makes a difference.
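A minimal sketch of the distinction being made, reusing the client and kind from the question (limit is a real fetch() parameter, shown here only to illustrate bounding how much gets pulled):

query = datastore_client.query(kind='source')

results = query.fetch()                   # lazy: no RPC has happened yet
first_100 = list(query.fetch(limit=100))  # pulls at most 100 entities
everything = list(query.fetch())          # iterating walks every result page,
                                          # so memory grows with the result set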
Unfortunately beyond that I'm not sure, since as you dig through the source code you pretty quickly run into the GRPC stubs and it's unclear if the problem is in there.
There is this question that is similar to yours, where the asker found a memory leak involved with instantiating datastore.Client().
How should I investigate a memory leak when using Google Cloud Datastore Python libraries?
This ultimately got linked to an issue in GRPC where GRPC would leak if it doesn't get closed
https://github.com/grpc/grpc/issues/22123
Hopefully, this points you in the right direction
@Alex in the other answer did some pretty good research, so I will follow up with this recommendation: try using the NDB library. All calls with this library have to be wrapped in a context manager, which should guarantee cleanup after closing. That could help fix your problem:
from google.cloud import ndb

ndb_client = ndb.Client(**init_client)

with ndb_client.context():
    query = MyModel.query().order(MyModel.my_column)
    sources = query.fetch()
    for source in sources:
        pass

# if you try to query Datastore outside the context manager, it will raise an error
query = MyModel.query().order(MyModel.my_column)

How to measure memory usage of a web request when using Werkzeug/Flask?

Is there a way to measure the amount of memory allocated by an arbitrary web request in a Flask/Werkzeug app? By arbitrary, I mean I'd prefer a technique that lets me instrument code at a high enough level that I don't have to change it to test memory usage of different routes. If that's not possible but it's still possible to do this by wrapping individual requests with a little code, so be it.
In a PHP app I wrote a while ago, I accomplished this by calling the memory_get_peak_usage() function both at the start and the end of the request and taking the difference.
Is there an analog in Python/Flask/Werkzeug? Using Python 2.7.9 if it matters.
First of all, one should understand the main difference between PHP and Python request processing. Roughly speaking, each PHP worker accepts only one request, handles it, and then dies (or reinitializes the interpreter). PHP was designed directly for this; it's a request-processing language by nature. So it's pretty simple to measure per-request memory usage: a request's peak memory usage is equal to the worker's peak memory usage. It's a language feature.
At the same time, Python usually uses another approach to handling requests. There are two main models - synchronous and asynchronous request processing. However, both of them have the same difficulty when it comes to measuring per-request memory usage. The reason is that one Python worker handles many requests (concurrently or sequentially) during its lifetime. So it's hard to get the memory usage of exactly one request.
However, one can adapt the underlying framework and application code to collect memory usage. One possible solution is to use some kind of events. For example, one can raise an abstract mem_usage event before the request, at the beginning of a view function, at the end of a view function, at important places within the business logic, and so on. There should then be a subscriber for such events, doing the following:
import resource
mem_usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
This subscriber has to accumulate such usage data and, on app_request_teardown/after_request, send it to the metrics collection system along with the current request.endpoint, route, or whatever.
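In Flask, those events map naturally onto the before_request/after_request hooks. A minimal sketch under that assumption (ru_maxrss is reported in kilobytes on Linux and bytes on macOS, and it is a per-process high-water mark rather than a true per-request figure):

import resource

from flask import Flask, g, request

app = Flask(__name__)

@app.before_request
def record_memory_before():
    g.mem_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

@app.after_request
def record_memory_after(response):
    mem_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    delta = mem_after - g.mem_before
    if delta > 0:
        # The peak only ever grows, so a positive delta means this
        # request pushed the process to a new memory high-water mark.
        app.logger.warning('peak RSS grew by %s on %s', delta, request.path)
    return response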
Also, using a memory profiler is a good idea, but usually not for production use.
Further reading about request processing models:
CGI
FastCGI
PHP specific
Another possible solution is to use sys.settrace. Using this tool one can measure memory usage even per line of code. Usage examples can be found in the memory_profiler project. Of course, it will slow down the code significantly.
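For instance, a small sketch of line-by-line profiling with memory_profiler (pip install memory_profiler; far too slow for production, but useful when reproducing a suspect request locally):

from memory_profiler import profile

@profile
def suspect_view_logic():
    # placeholder workload standing in for the code path you want to inspect
    data = [object() for _ in range(100000)]
    return len(data)

if __name__ == '__main__':
    suspect_view_logic()   # running the script prints a per-line memory report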

Django / WSGI - How to profile partial request? My profiling tools are per-request but app runs out of memory before then

How can I profile my python / django application which is crashing on a single request after 100 seconds of hogging more memory?
All I see in top is that the wsgi process is consuming memory slowly until it crashes.
The only profiling techniques I know run on a full request/response cycle but I'm not able to finish a request. What then?
I might even run the dev server and try to kill it mid-request and see where the stack is.
A bit fiddly, and it will have some overhead, but you could use sys.setprofile() to provide a function to be called on entry to and exit from functions, and dump the progress of calls out to a log file yourself, potentially with a check of the memory in use at the same time.
http://docs.python.org/dev/library/sys.html#sys.setprofile
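A rough sketch of what such a profile hook might look like (the logging destination and the Unix-only resource module are assumptions; a real version would filter which functions it records):

import logging
import resource
import sys

def profile_hook(frame, event, arg):
    # Called by the interpreter on every function call and return.
    if event in ('call', 'return'):
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        logging.debug('%s %s:%s rss=%s', event,
                      frame.f_code.co_filename, frame.f_code.co_name, rss)

sys.setprofile(profile_hook)    # install; sys.setprofile(None) removes it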
Also perhaps check out heapy as a way of getting console type access into your live process to dump out memory/object usage.
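For reference, dumping heap statistics with heapy looks roughly like this (heapy ships with the guppy package; hooking it into a live process usually means exposing it through some kind of backdoor console):

from guppy import hpy

heap = hpy()
print(heap.heap())   # breakdown of live objects by type and size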

web.py memory leak

Am I doing something wrong or does web.py leak memory?
import web

class Index:
    def GET(self): return 'hello web.py'

app = web.application(('/*', 'Index'), globals())
app.run()
Run the above file. Watch how much memory the task uses. Go to localhost:8080 in your browser. Close the browser (to keep the page from getting cached), then open the page again, and see how the memory usage rises. It goes up every time you close the browser and re-visit the page.
Running python 2.6 on Win XP.
After running your code and sending it thousands of requests (via another Python process using urllib2), I find that it grows by about 200k over the course of the first few hundred requests and then stops growing. That doesn't seem unreasonable, and it needn't indicate a memory leak. Remember that Python uses automatic memory management via a combination of reference counting and garbage collection, so there's no guarantee that every bit of memory it uses is reusable the instant it's no longer in use; and it may request memory from the OS and then not return it even though it isn't needed any more.
So I think the answer is: you aren't doing anything wrong, and web.py doesn't leak memory.

Is there a better way to serve the results of an expensive, blocking python process over HTTP?

We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still with Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processes) on the server under real-world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the mean-time, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.
Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.
Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.
You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). That queuing operation should return an ID of an object that you can query to get its status.
Next you have to write a background service that spins up worker threads. These workers simply dequeue a request, generate the data, then save the data's location in the request object.
The webpage can make AJAX calls to your server to find out the progress of the generation and to give a link to the file once it's available.
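A bare-bones sketch of that pattern, with an in-process queue and dict standing in for a real broker and datastore (render_mp3 is a hypothetical placeholder for the expensive generation step):

import queue
import threading
import uuid

jobs = queue.Queue()
status = {}   # job_id -> {'state': ..., 'location': ...}

def render_mp3(params):
    # hypothetical stand-in for the expensive MP3 generation
    return '/tmp/%s.mp3' % params['segment']

def worker():
    while True:
        job_id, params = jobs.get()
        status[job_id] = {'state': 'running'}
        location = render_mp3(params)
        status[job_id] = {'state': 'done', 'location': location}

threading.Thread(target=worker, daemon=True).start()

def enqueue(params):
    # Called from the web layer; returns an ID the client polls for status.
    job_id = str(uuid.uuid4())
    status[job_id] = {'state': 'queued'}
    jobs.put((job_id, params))
    return job_id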
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work however.
Alternatively, look into running a couple of machines to distribute the load. Your threads on Apache will still block, but at least you won't consume resources on the web server.
Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork mpm, and make sure you're limiting yourself to an appropriate number of processes (no less than one per CPU, no more than two).
It looks like you are doing things right -- just lacking CPU power: can you determine what the CPU load is in the process of generating these MP3s?
I think the next thing to do is to add more hardware to render the MP3s on other machines, or else find a way to deliver pre-rendered MP3s (maybe you can cache some of your media?).
BTW, scaling for the web was the theme of a keynote lecture by Jacob Kaplan-Moss at PyCon Brasil this year, and it is far from being a closed problem. The stack of technologies one needs to handle is quite impressive (I could not find an online copy of the presentation, though; sorry for that).
