I have built a simple news aggregator site, in which the memory usage of all my App Engine instances keeps growing until it reaches the limit and the instance is shut down.
I have started to eliminate everything from my app to arrive at a minimal reproducible version. This is what I have now:
from flask import Flask
from google.cloud import datastore

app = Flask(__name__)
datastore_client = datastore.Client()

@app.route('/')
def root():
    query = datastore_client.query(kind='source')
    query.order = ['list_sequence']
    sources = query.fetch()
    for source in sources:
        pass
    return ''  # response body stripped down to nothing; the leak still occurs
Stats show a typical saw-tooth pattern: at instance startup, it goes to 190 - 210 MB, then upon some requests, but NOT ALL requests, memory usage increases by 20 - 30 MB. (This, by the way, roughly corresponds to the estimated size of the query results, although I cannot be sure this is relevant info.) This keeps happening until it exceeds 512 MB, when it is shut down. It usually happens at around the 50th - 100th request to "/". No other requests are made to anything else in the meantime.
Now, if I eliminate the "for" cycle, and only the query remains, the problem goes away, the memory usage remains at 190 MB flat, no increase even after 100+ requests.
gc.collect() at the end does not help. I have also tried looking at the difference in tracemalloc stats at the beginning and end of the function, but I have not found anything useful.
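For reference, a simplified sketch of the kind of tracemalloc comparison I mean (measure_root is just a stand-in for the real handler):

import tracemalloc

tracemalloc.start()

def measure_root():
    before = tracemalloc.take_snapshot()
    query = datastore_client.query(kind='source')
    for source in query.fetch():
        pass
    after = tracemalloc.take_snapshot()
    for stat in after.compare_to(before, 'lineno')[:10]:
        print(stat)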
Has anyone experienced anything similar, please? Any ideas what might go wrong here? What additional tests / investigations can you recommend? Is this possibly a Google App Engine / Datastore issue I have no control of?
Thank you.
query.fetch() returns an iterator, not an actual array of the results
https://googleapis.dev/python/datastore/latest/queries.html#google.cloud.datastore.query.Query.fetch
Looking at the source code, it looks like this iterator has code for fetching the next pages of the query. So your for-loop forces it to fetch all the pages of the results. In fact I don't think it actually fetches anything until you start iterating. So that would be why removing your for-loop makes a difference.
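A minimal sketch of that laziness (assuming the google-cloud-datastore client; .pages is the page-level view of the same iterator):

query = datastore_client.query(kind='source')
results = query.fetch()      # no RPC has happened yet; this is just an iterator
page = next(results.pages)   # pulling a page (or iterating) is what actually fetches
entities = list(page)        # entities from the first page only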
Unfortunately beyond that I'm not sure, since as you dig through the source code you pretty quickly run into the GRPC stubs and it's unclear if the problem is in there.
There is this question that is similar to yours, where the asker found a memory leak involved with instantiating datastore.Client().
How should I investigate a memory leak when using Google Cloud Datastore Python libraries?
This was ultimately linked to an issue in gRPC, where gRPC would leak if it doesn't get closed:
https://github.com/grpc/grpc/issues/22123
Hopefully, this points you in the right direction
@Alex in the other answer did some pretty good research, so I will follow up with this recommendation: try using the NDB Library. All calls with this library have to be wrapped in a context manager, which should guarantee cleanup after closing. That could help fix your problem:
from google.cloud import ndb

ndb_client = ndb.Client(**init_client)

with ndb_client.context():
    query = MyModel.query().order(MyModel.my_column)
    sources = query.fetch()
    for source in sources:
        pass

# if you try to query Datastore outside the context manager, it will raise an error
query = MyModel.query().order(MyModel.my_column)
Related
I have a Flask application that allows users to query a ~small database (2.4M rows) using SQL. It's similar to a HackerRank but more limited in scope. It's deployed on Heroku.
I've noticed during testing that I can predictably hit an R14 error (memory quota exceeded) or R15 (memory quota greatly exceeded) by running large queries. The queries that typically cause this are outside what a normal user might do, such as SELECT * FROM some_huge_table. That said, I am concerned that these errors will become a regular occurrence for even small queries when 5, 10, 100 users are querying at the same time.
I'm looking for some advice on how to manage memory quotas for this type of interactive site. Here's what I've explored so far:
Changing the # of gunicorn workers. This has had some effect but I still hit R14 and R15 errors consistently.
Forced limits on user queries, based on either text or the EXPLAIN output. This does work to reduce memory usage, but I'm afraid it won't scale to even a very modest # of users.
Moving to a higher Heroku tier. The plan I use currently provides ~512MB RAM. The largest plan is around 14GB. Again, this would help but won't even moderately scale, to say nothing of the associated costs.
Reducing the size of the database significantly. I would like to avoid this if possible. Doing the napkin math on a table going from 1.9M rows to 10k or 50k, the application would have greatly reduced memory needs and would scale better, but would still have some moderate maximum usage.
As you can see, I'm a novice at best when it comes to memory management. I'm looking for some strategies/ideas on how to solve this general problem, and if it's the case that I need to either drastically cut the data size or throw tons of $ at this, that's OK too.
Thanks
Coming from my personal experience, I see two approaches:
1. plan for it
Coming from your example, this means you try to calculate the maximum memory that the request would use, multiply it by the number of gunicorn workers, and use dynos big enough.
For a different kind of application this could be valid; I don't think it is for you.
2. reduce memory usage, solution 1
The fact that too much application memory is used makes me think that likely in your code you are loading the whole result-set into memory (probably even multiple times in multiple formats) before returning it to the client.
In the end, your application is only getting the data from the database and converting it to some output format (JSON/CSV?).
What you are probably searching for is streaming responses.
Your Flask view will then work on a record-by-record basis: it reads a single record, converts it to your output format, and yields it to the client.
Both your database client library and Flask support this (on the database side this is usually done with cursors / iterators).
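A rough sketch of what this can look like with Flask and a psycopg2 server-side cursor (db_pool and the table name are placeholders, not from your setup):

import csv
import io
from flask import Response

@app.route('/export')
def export():
    def generate():
        conn = db_pool.getconn()               # hypothetical psycopg2 connection pool
        cur = conn.cursor(name='export_cur')   # named cursor: rows are streamed in batches
        cur.execute('SELECT * FROM some_table')
        for row in cur:
            buf = io.StringIO()
            csv.writer(buf).writerow(row)
            yield buf.getvalue()               # one record at a time, never the full result set
        cur.close()
        db_pool.putconn(conn)
    return Response(generate(), mimetype='text/csv')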
2. reduce memory usage, solution 2
Other services often go for simple pagination or limiting result sets to manage server-side memory.
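A crude sketch of such a cap, wrapping whatever the user submits in an outer LIMIT (assumes a single SELECT statement; this is only one of several ways to do it):

MAX_ROWS = 1000

def capped(user_sql):
    # naive: the database never returns more than MAX_ROWS, even if the inner query has no LIMIT
    return "SELECT * FROM ({}) AS capped_sub LIMIT {}".format(user_sql.rstrip('; \n'), MAX_ROWS)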
security sidenote
It sounds like users can actually define the SQL statement in their API requests. This is a security and application risk: apart from issuing INSERT, UPDATE, or DELETE statements, a user could craft a SQL statement that will not only blow your application memory, but also break your database.
I'm experiencing occasional 'Exceeded soft private memory limit' errors in a wide variety of request handlers in App Engine. I understand that this error means that the RAM used by the instance has exceeded the amount allocated, and how that causes the instance to shut down.
I'd like to understand what might be the possible causes of the error, and to start, I'd like to understand how app engine python instances are expected to manage memory. My rudimentary assumptions were:
1. An F2 instance starts with 256 MB
2. When it starts up, it loads my application code - let's say 30 MB
3. When it handles a request it has 226 MB available
   - so long as that request does not exceed 226 MB (+ margin of error) the request completes w/o error
   - if it does exceed 226 MB + margin, the instance completes the request, logs the 'Exceeded soft private memory limit' error, then terminates - now go back to step 1
4. When that request returns, any memory used by it is freed up - i.e. the unused RAM goes back to 226 MB
5. Steps 3-4 are repeated for each request passed to the instance, indefinitely
That's how I presumed it would work, but given that I'm occasionally seeing this error across a fairly wide set of request handlers, I'm now not so sure. My questions are:
a) Does step #4 happen?
b) What could cause it not to happen? or not to fully happen? e.g. how could memory leak between requests?
c) Could storage in module-level variables cause memory usage to leak? (I'm not knowingly using module-level variables in that way)
d) What tools / techniques can I use to get more data? E.g. measure memory usage at entry to request handler?
In answers/comments, where possible, please link to the gae documentation.
[edit] Extra info: my app is configured as threadsafe: false. If this has a bearing on the answer, please state what it is. I plan to change to threadsafe: true soon.
[edit] Clarification: This question is about the expected behavior of gae for memory management. So while suggestions like 'call gc.collect()' might well be partial solutions to related problems, they don't fully answer this question. Up until the point that I understand how gae is expected to behave, using gc.collect() would feel like voodoo programming to me.
Finally: If I've got this all backwards then I apologize in advance - I really can't find much useful info on this, so I'm mostly guessing.
App Engine's Python interpreter does nothing special, in terms of memory management, compared to any other standard Python interpreter. So, in particular, there is nothing special that happens "per request", such as your hypothetical step 4. Rather, as soon as any object's reference count decreases to zero, the Python interpreter reclaims that memory (module gc is only there to deal with garbage cycles -- when a bunch of objects never get their reference counts down to zero because they refer to each other even though there is no accessible external reference to them).
So, memory could easily "leak" (in practice, though technically it's not a leak) "between requests" if you use any global variable -- such variables will survive the instance of the handler class and its (e.g.) get method -- i.e., your point (c), though you say you are not doing that.
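A tiny illustration of point (c), sketched as a webapp2-style handler (the list here is deliberately a leak):

import webapp2

_seen = []  # module-level ("global") variable: it outlives each handler instance

class LeakyHandler(webapp2.RequestHandler):
    def get(self):
        _seen.append(self.request.body)  # retained across every request this instance serves
        self.response.write('held items: %d' % len(_seen))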
Once you declare your module to be threadsafe, an instance may happen to serve multiple requests concurrently (up to what you've set as max_concurrent_requests in the automatic_scaling section of your module's .yaml configuration file; the default value is 8). So, your instance's RAM will need to be a multiple of what each request needs.
As for (d), to "get more data" (I imagine you actually mean, get more RAM), the only thing you can do is configure a larger instance_class for your memory-hungry module.
To use less RAM, there are many techniques -- which have nothing to do with App Engine, everything to do with Python, and in particular, everything to do with your very specific code and its very specific needs.
The one GAE-specific issue I can think of is that ndb's caching has been reported to leak -- see https://code.google.com/p/googleappengine/issues/detail?id=9610 ; that thread also suggests workarounds, such as turning off ndb caching or moving to old db (which does no caching and has no leak). If you're using ndb and have not turned off its caching, that might be the root cause of "memory leak" problems you're observing.
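If you want to rule that out, ndb's caching can be switched off (sketch for the legacy google.appengine.ext.ndb API; MyModel is a placeholder):

from google.appengine.ext import ndb

# per request context: disable both the in-process cache and the memcache layer
ndb.get_context().set_cache_policy(False)
ndb.get_context().set_memcache_policy(False)

# or per model, via these class attributes
class MyModel(ndb.Model):
    _use_cache = False
    _use_memcache = False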
Point 4 is an invalid assumption: Python's garbage collector doesn't return memory that easily. The process keeps holding that memory even though it's no longer used, until the garbage collector makes a pass. In the meantime, if some other request requires more memory, new memory might be allocated on top of the memory from the first request. If you want to force Python to garbage collect, you can use gc.collect() as mentioned here.
Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage
We have a small amount of data that is almost never updated but read frequently (site config and some selection items like state and county information). I think that if I can move it into application memory instead of any database, our I/O performance would improve a lot.
But we have a lot of web servers, and I cannot figure out a good way to notify all the servers to reload this data.
You are likely looking for a cache pattern: Is there a Python caching library? You just need to ask how stale you can afford to be. If you are looking this data up on every request, even a short-lived cache can massively improve performance. It's likely, though, that this information can live for minutes or hours without too much risk of being "stale".
If you can't live with a stale cache, I've implemented solutions that have a single database call, which keeps track of the last updated date for any of the cached data. This at least reduces the cache lookups to a single database call.
Be aware though, as soon as you are sharing updateable information, you have to deal with multi-threaded updates of shared state. Make sure you understand the implications of this. Hopefully your caching library handles this gracefully.
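A minimal sketch of a time-bounded, thread-safe in-process cache for this kind of rarely changing data (load_site_config_from_db is a placeholder for your actual lookup):

import threading
import time

TTL_SECONDS = 300  # tolerate up to 5 minutes of staleness
_lock = threading.Lock()
_cache = {'value': None, 'loaded_at': 0.0}

def get_site_config():
    now = time.time()
    with _lock:
        if _cache['value'] is None or now - _cache['loaded_at'] > TTL_SECONDS:
            _cache['value'] = load_site_config_from_db()  # hypothetical database call
            _cache['loaded_at'] = now
        return _cache['value']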
I am trying to optimize performance on GAE but once I deploy I get very unstable results. It's really hard to see if each optimization actually works because datastore and memcache operations take a very variable time (it ranges from milliseconds to seconds for the same operations).
For these tests I am the only one making requests, one at a time, by refreshing the homepage. There is no other traffic happening (besides my own browser requesting images/css/js files from the page).
Edit: To make sure that the drops were not due to concurrent requests from the browser (images/css/js), I've redone the test by requesting ONLY the page with urllib2.urlopen(). Problem persists.
My questions are:
1) Is this something to expect due to the fact that machines/resources are shared?
2) What are the most common cases where this behavior can happen?
3) Where can I go from there?
Here is a very slow datastore get (memcache was just flushed): [screenshot]
Here is a very slow memcache get (things are cached because of the previous request): [screenshot]
Here is a slow but faster memcache get (same repro steps as the previous one; different calls are slow): [screenshot]
To answer your questions,
1) yes, you can expect variance in remote calls because of the shared network;
2) the most common place you will see variance is in datastore requests -- the larger/further the request, the more variance you will see;
3) here are some options for you:
It looks like you are trying to fetch large amounts of data from the datastore/memcache. You may want to re-think the queries and caches so they retrieve smaller chunks of data. Does your app need all that data for a single request?
If the app really needs to process all that data on every request, another option is to preprocess it with a background task (cron, task queue, etc.) and put the results into memcache. The request that serves up the page should simply pick the right pieces out of the memcache and assemble the page.
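A sketch of that "precompute in a task, serve from memcache" idea (legacy GAE memcache API; build_expensive_summary is a placeholder):

from google.appengine.api import memcache

def rebuild_cache():  # run from cron or a task queue
    summary = build_expensive_summary()
    memcache.set('page_summary', summary, time=600)

def handle_request():
    summary = memcache.get('page_summary')
    if summary is None:
        summary = build_expensive_summary()  # fall back if the cache is cold
    return summary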
@proppy's suggestion to use NDB is a good one. It takes some work to rewrite serial queries into parallel ones, but the savings from async calls can be huge. If you can benefit from parallel tasks (using map), all the better.
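A hedged sketch of what the parallel version can look like with ndb's async API (Article and Source are stand-ins for your models):

from google.appengine.ext import ndb

@ndb.tasklet
def load_page_data():
    # both queries are issued immediately; the yield waits on them in parallel
    articles, sources = yield (
        Article.query().order(-Article.published).fetch_async(20),
        Source.query().fetch_async(),
    )
    raise ndb.Return((articles, sources))

articles, sources = load_page_data().get_result()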
Do you know of an efficient way to log the memory usage of a Django app per request?
I have an apache/mod_wsgi/django stack which usually runs well, but sometimes one process ends up eating a huge amount of memory. The server ends up being short on memory, swapping a lot, and services are dramatically slowed down.
This situation is quite hard to fix because I don't know which request is to blame for this behavior, and I can't reproduce it.
I'd like to have something deployed in production which logs the memory usage of the process before and after each request, with minimal overhead.
Before I start reinventing the wheel, does the community of my fellow Djangoists know of any existing solution to address this problem?
Advice, middleware, snippets, or maybe Apache log configuration appreciated.
What (I think) I don't need is:
a set of dev-stage profiling/debugging tools; I already know some, and I'd use them if I knew what to profile/debug, but it seems like too much to keep them forever monitoring services running in production. On top of that, what those tools usually display is a memory usage report of the code shredded to pieces; it would really be helpful to just pinpoint the faulty request.
generic advice on how to optimize the memory usage of a Django app; it's always good to read, but the idea here is rather «how to efficiently track down the requests which need to be optimized».
My closest search results:
Django / WSGI - How to profile partial request? My profiling tools are per-request but app runs out of memory before then
Django memory usage going up with every request
Average php memory usage per request?
A Django middleware for tracking memory usage and generating a usable result immediately needs to hook both process_request and process_response. In other words, look at the difference between the start and finish of the request and log a warning if it exceeds some threshold.
A complete middleware example is:
import os
import psutil
import sys

THRESHOLD = 2 * 1024 * 1024  # only log requests that grow the process by more than 2 MB

class MemoryUsageMiddleware(object):

    def process_request(self, request):
        request._mem = psutil.Process(os.getpid()).memory_info()

    def process_response(self, request, response):
        mem = psutil.Process(os.getpid()).memory_info()
        diff = mem.rss - request._mem.rss
        if diff > THRESHOLD:
            print >> sys.stderr, 'MEMORY USAGE %r' % ((diff, request.path),)
        return response
This requires the 'psutil' module to be installed for doing memory calculation.
This is brute force and can lead to false positives in a multithreaded system. Because of lazy loading, you will also see it trigger on the first few requests against a new process as things load up.
This may not fully cover your question, but I recommend trying nginx+uwsgi instead of apache2+mod_wsgi. In my tests it turned out to be much more stable (mod_wsgi choked completely at some point), much faster, and used a lot less memory (it may just fix all your issues altogether).
About tracking memory usage, you can create a simple middleware:
class SaveMemoryUsageMiddleware(object):

    def process_response(self, request, response):
        # track memory usage here and append to file or db
        return response
and add it to your middlewares.
For memory tracking code I recommend checking out:
Total memory used by Python process?
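For example, a minimal way to get the number you would log in process_response (sketch, using psutil as in the other answer):

import os
import psutil

def current_rss_mb():
    # resident set size of the current process, in megabytes
    return psutil.Process(os.getpid()).memory_info().rss / (1024.0 * 1024.0)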
However, it would probably be better if you could avoid doing this in production. Use it just in dev and test to track down the real problem.