I am trying to optimize performance on GAE, but once I deploy I get very unstable results. It's really hard to see whether each optimization actually works, because datastore and memcache operations take highly variable amounts of time (ranging from milliseconds to seconds for the same operation).
For these tests I am the only user, making a single request on the application by refreshing the homepage. There is no other traffic (besides my own browser requesting the images/css/js files on the page).
Edit: To make sure that the drops were not due to concurrent requests from the browser (images/css/js), I've redone the test by requesting ONLY the page with urllib2.urlopen(). Problem persists.
My questions are:
1) Is this something to expect due to the fact that machines/resources are shared?
2) What are the most common cases where this behavior can happen?
3) Where can I go from there?
Here is a very slow datastore get (memcache was just flushed): [screenshot]
Here is a very slow memcache get (things are cached because of the previous request): [screenshot]
Here is a slow, but faster, memcache get (same repro steps as the previous one; different calls are slow): [screenshot]
To answer your questions,
1) yes, you can expect variance in remote calls because of the shared network;
2) the most common place you will see variance is in datastore requests -- the larger/further the request, the more variance you will see;
3) here are some options for you:
It looks like you are trying to fetch large amounts of data from the datastore/memcache. You may want to re-think the queries and caches so they retrieve smaller chunks of data. Does your app need all that data for a single request?
If the app really needs to process all that data on every request, another option is to preprocess it with a background task (cron, task queue, etc.) and put the results into memcache. The request that serves up the page should simply pick the right pieces out of the memcache and assemble the page.
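For example, a rough sketch of that pattern on App Engine could look like the following (the Item model, the homepage_data cache key, and the URL names are hypothetical, not from your app; the warm-up handler would be mapped in cron.yaml or enqueued as a task):

from google.appengine.api import memcache
from google.appengine.ext import ndb
import webapp2


class Item(ndb.Model):
    title = ndb.StringProperty()
    rank = ndb.IntegerProperty()


class WarmCacheHandler(webapp2.RequestHandler):
    """Runs from cron or a task queue, off the user's request path."""
    def get(self):
        items = Item.query().order(-Item.rank).fetch(50)
        payload = [{'title': item.title, 'rank': item.rank} for item in items]
        memcache.set('homepage_data', payload, time=600)  # refresh every run, keep for 10 minutes


class HomePageHandler(webapp2.RequestHandler):
    def get(self):
        payload = memcache.get('homepage_data') or []
        # Render your real template here; plain output shown for brevity.
        self.response.write('\n'.join(item['title'] for item in payload))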
proppy's suggestion to use NDB is a good one. It takes some work to rewrite serial queries into parallel ones, but the savings from async calls can be huge. If you can benefit from parallel tasks (using map), all the better.
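As an illustration, the async calls let you overlap independent datastore round trips instead of paying for them one after another (the models here are hypothetical placeholders):

from google.appengine.ext import ndb


class Post(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)


class UserProfile(ndb.Model):
    name = ndb.StringProperty()


def get_page_data(user_id):
    # Kick off both RPCs first...
    posts_future = Post.query().order(-Post.created).fetch_async(20)
    profile_future = UserProfile.get_by_id_async(user_id)
    # ...then block; the two datastore calls have been running in parallel.
    return posts_future.get_result(), profile_future.get_result()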
I have a Flask application that allows users to query a fairly small database (2.4M rows) using SQL. It's similar to HackerRank but more limited in scope. It's deployed on Heroku.
I've noticed during testing that I can predictably hit an R14 error (memory quota exceeded) or R15 (memory quota greatly exceeded) by running large queries. The queries that typically cause this are outside what a normal user might do, such as SELECT * FROM some_huge_table. That said, I am concerned that these errors will become a regular occurrence for even small queries when 5, 10, 100 users are querying at the same time.
I'm looking for some advice on how to manage memory quotas for this type of interactive site. Here's what I've explored so far:
Changing the # of gunicorn workers. This has had some effect but I still hit R14 and R15 errors consistently.
Forced limits on user queries, based on either text or the EXPLAIN output. This does work to reduce memory usage, but I'm afraid it won't scale to even a very modest # of users.
Moving to a higher Heroku tier. The plan I use currently provides ~512MB RAM. The largest plan is around 14GB. Again, this would help but won't even moderately scale, to say nothing of the associated costs.
Reducing the size of the database significantly. I would like to avoid this if possible. Doing the napkin math on a table going from 1.9M rows to 10k or 50k, the application would have greatly reduced memory needs and would scale better, but would still have some moderate maximum usage limit.
As you can see, I'm a novice at best when it comes to memory management. I'm looking for some strategies/ideas on how to solve this general problem, and if it's the case that I need to either drastically cut the data size or throw tons of $ at this, that's OK too.
Thanks
From my personal experience, I see two approaches:
1. plan for it
Going by your example, this means you try to calculate the maximum memory a request can use, multiply it by the number of gunicorn workers, and pick dynos big enough.
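For instance, with made-up numbers: if a worst-case query peaks at roughly 150 MB and you run 3 gunicorn workers, you need about 450 MB of headroom just for the workers, which already brushes against a 512 MB dyno.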
For a different workload this could be a valid approach; I don't think it is for yours.
2. reduce memory usage, solution 1
The fact that so much application memory is used makes me think that your code likely loads the whole result set into memory (probably even multiple times, in multiple formats) before returning it to the client.
In the end, your application is only getting the data from the database and converting it to some output format (JSON/CSV?).
What you are probably searching for is streaming responses.
Your Flask view then works on a record-by-record basis: it reads a single record, converts it to your output format, and yields it before moving on to the next one.
Both your database client library and Flask support this (on the database side this is usually done with server-side cursors/iterators).
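A minimal sketch of such a streaming view, assuming PostgreSQL with psycopg2 and CSV output (the connection string and query are placeholders):

import csv
import io

import psycopg2
from flask import Flask, Response

app = Flask(__name__)

@app.route('/query')
def run_query():
    sql = "SELECT id, name FROM some_table"  # placeholder for the (validated) user query

    def generate():
        conn = psycopg2.connect("dbname=mydb")  # your connection settings
        try:
            # A named (server-side) cursor keeps the result set on the database
            # and fetches it in batches instead of loading every row into memory.
            with conn.cursor(name='streaming_cursor') as cur:
                cur.itersize = 1000
                cur.execute(sql)
                for row in cur:
                    buf = io.StringIO()
                    csv.writer(buf).writerow(row)
                    yield buf.getvalue()
        finally:
            conn.close()

    return Response(generate(), mimetype='text/csv')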
2. reduce memory usage, solution 2
Other services often go for simple pagination or for limiting result sets in order to manage server-side memory.
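A crude way to do that (a sketch, assuming PostgreSQL-style LIMIT/OFFSET) is to wrap whatever the user submitted in a subselect and cap the row count yourself:

MAX_ROWS = 1000  # hard ceiling, regardless of what the user asked for

def paginate_sql(user_sql, page=0, page_size=100):
    limit = min(page_size, MAX_ROWS)
    offset = page * limit
    inner = user_sql.strip().rstrip(';')
    return "SELECT * FROM ({}) AS user_query LIMIT {} OFFSET {}".format(inner, limit, offset)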
security sidenote
It sounds like users can actually define the SQL statement in their API requests. This is both a security and an application risk: apart from issuing INSERT, UPDATE, or DELETE statements, a user could craft a SQL statement that will not only blow up your application memory, but also break your database.
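One way to contain that risk (a sketch, assuming PostgreSQL and psycopg2): run user SQL on a connection opened as a role that only has SELECT privileges, mark the session read-only, and set a statement timeout so runaway queries get cancelled. The role name is hypothetical.

import psycopg2

# readonly_user: a role granted SELECT only on the tables you expose
conn = psycopg2.connect("dbname=mydb user=readonly_user")
conn.set_session(readonly=True, autocommit=True)

def run_user_query(user_sql, max_rows=1000):
    with conn.cursor() as cur:
        cur.execute("SET statement_timeout = '5s'")  # abort queries that run too long
        cur.execute(user_sql)
        return cur.fetchmany(max_rows)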
I am building a website with the idea that hundreds (I wish thousands!) of 'get' queries per day will be cached for a couple of months in the filesystem.
Reading the cache documentation, however, I observe that the default values lean towards a small and fast cache cycle.
An old post describes how a strategy like the one I imagine wreaked havoc on their servers.
Of course, the current Django code seems to have evolved since 2012. However, the cache defaults still remain the same...
I wonder whether I am on the right track or not.
My familiarity with caching is limited to enjoying the results of W3 Total Cache after it saved thousands of files in the relevant directories, without understanding anything beyond its basic settings.
How would an experienced developer approach "stage 1" of this task:
Without the budget, yet, to support solutions based on Redis, for example (not a valid argument, I know):
How would you cache a steadily growing number of queries, which can add up to quite a bulk, for a long period of time, on rather basic server resources?
Django's cache backend should be implementation agnostic. For example, whether you start with the filesystem cache, Redis, or memcached shouldn't really matter to Django.
I can think of a couple issues with your approach:
How fast is your dataset growing? If your dataset has a fairly stable size, it shouldn't matter that the cache entries are long-lived.
How will you invalidate your queries? If the queries are being cached for months, that suggests the data does not change; cache invalidation is a big thing to consider, since clients shouldn't see stale data.
Are you using the filesystem cache? If data is cached per server, are requests consistently routed to the same servers? If not, multiple servers can end up with duplicate caches; avoiding this is one of the benefits of a centralized cache (Redis/memcached).
You should be able to calculate a pretty good estimate of how large a cache you'd need, based on your current dataset size, how much of it you'd like to cache, and the rate of growth of your data. I feel like a shared cache will go very far, and it can be run on "basic server" resources.
For stage 1, I would:
choose a shared cache, either Redis or memcached; this will be a lot less painful when you start to scale to multi-server setups
estimate how much data you will need to cache, and what sort of data size growth you predict, to make sure your cache is of an appropriate size
I feel that cache invalidation is usually not a fixed policy on how long the data should persist in the cache; it is governed by when your data changes, which should force-invalidate the cache so that clients don't see stale data (a rough sketch follows).
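A rough sketch of what that could look like in Django (the backend class depends on your Django version; Entry, myapp, and the 'entry-list' key are hypothetical stand-ins for whatever your views cache):

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'TIMEOUT': 60 * 60 * 24 * 60,  # roughly two months, matching the question
    }
}

# signals.py (imported at app startup): invalidate when the data actually changes
from django.core.cache import cache
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

from myapp.models import Entry

@receiver(post_save, sender=Entry)
@receiver(post_delete, sender=Entry)
def invalidate_entry_cache(sender, **kwargs):
    cache.delete('entry-list')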
Is there a way to measure the amount of memory allocated by an arbitrary web request in a Flask/Werkzeug app? By arbitrary, I mean I'd prefer a technique that lets me instrument code at a high enough level that I don't have to change it to test memory usage of different routes. If that's not possible but it's still possible to do this by wrapping individual requests with a little code, so be it.
In a PHP app I wrote a while ago, I accomplished this by calling the memory_get_peak_usage() function both at the start and the end of the request and taking the difference.
Is there an analog in Python/Flask/Werkzeug? Using Python 2.7.9 if it matters.
First of all, one should understand the main difference between how PHP and Python process requests. Roughly speaking, each PHP worker accepts only one request, handles it, and then dies (or reinitializes the interpreter). PHP was designed directly for this; it is a request-processing language by nature. So it's pretty simple to measure per-request memory usage: the request's peak memory usage is equal to the worker's peak memory usage. It's a language feature.
At the same time, Python usually takes another approach to handling requests. There are two main models - synchronous and asynchronous request processing. However, both of them have the same difficulty when it comes to measuring per-request memory usage. The reason is that a single Python worker handles many requests (concurrently or sequentially) during its lifetime, so it's hard to attribute memory usage to an individual request.
However, one can adapt the underlying framework and the application code to collect this memory usage data. One possible solution is to use some kind of events. For example, one can raise an abstract mem_usage event before the request, at the beginning of a view function, at the end of a view function, at important places within the business logic, and so on. There should then be a subscriber for such events that does the following:
import resource

# Peak resident set size of the current process (kilobytes on Linux, bytes on macOS)
mem_usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
This subscriber has to accumulate the usage data and, on app_request_teardown/after_request, send it to the metrics collection system together with the current request.endpoint, route, or whatever identifies the request.
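In Flask that event idea could look roughly like this (a sketch, assuming Linux where ru_maxrss is in kilobytes; since ru_maxrss is a process-wide high-water mark that only grows, the per-request delta is only a rough signal and really only meaningful with a single-threaded worker):

import resource

from flask import Flask, g, request

app = Flask(__name__)

def peak_rss_kb():
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

@app.before_request
def record_start_memory():
    g.start_mem = peak_rss_kb()

@app.teardown_request
def report_memory(exc=None):
    delta = peak_rss_kb() - g.start_mem
    if delta > 0:
        # Replace the logger call with a push to your metrics collection system.
        app.logger.info("peak RSS grew by %d KB during %s", delta, request.endpoint)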
Also, using a memory profiler is a good idea, but usually not in production.
Further reading about request processing models:
CGI
FastCGI
PHP specific
Another possible solution is to use sys.settrace. With this tool one can measure memory usage even per line of code. Usage examples can be found in the memory_profiler project. Of course, it will slow down the code significantly.
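For completeness, a minimal memory_profiler usage sketch (the function and its body are placeholders):

from memory_profiler import profile

@profile  # prints line-by-line memory usage when the function runs
def build_report(rows):
    data = [dict(r) for r in rows]
    return data

Running the decorated function prints a line-by-line report; the overhead makes it better suited to local investigation than to production.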
We have a small amount of data that is almost never updated but is read frequently (site config and some selection items like state and county information). I think that if I can move it into application memory instead of any database, our I/O performance would improve significantly.
But we have a lot of web servers, and I cannot figure out a good way to notify all of them to reload this data.
You are likely looking for a cache pattern: Is there a Python caching library? You just need to ask how stale you can afford to be. If you are looking this data up on every request, even a short-lived cache can massively improve performance. It's likely, though, that this information can live for minutes or hours without too much risk of being "stale".
If you can't live with a stale cache, I've implemented solutions that have a single database call, which keeps track of the last updated date for any of the cached data. This at least reduces the cache lookups to a single database call.
Be aware though, as soon as you are sharing updateable information, you have to deal with multi-threaded updates of shared state. Make sure you understand the implications of this. Hopefully your caching library handles this gracefully.
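As a concrete illustration of the short-lived cache plus a cheap staleness check (a sketch; load_site_config and the config_meta table are hypothetical, and conn stands for whatever DB-API connection you use):

import threading
import time

_CHECK_INTERVAL = 60  # seconds between "has anything changed?" checks
_lock = threading.Lock()
_cache = {'data': None, 'version': None, 'checked_at': 0.0}

def _latest_version(conn):
    # Single cheap call: the most recent change timestamp for the cached data.
    cur = conn.cursor()
    cur.execute("SELECT max(updated_at) FROM config_meta")
    return cur.fetchone()[0]

def get_site_config(conn):
    with _lock:
        now = time.time()
        if _cache['data'] is not None and now - _cache['checked_at'] < _CHECK_INTERVAL:
            return _cache['data']
        version = _latest_version(conn)
        if _cache['data'] is None or version != _cache['version']:
            _cache['data'] = load_site_config(conn)  # the expensive full load (your code)
            _cache['version'] = version
        _cache['checked_at'] = now
        return _cache['data']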
Is it viable to have a logger entity in App Engine for writing logs? I'll have an app with ~1500 req/sec and am thinking about doing it with a task queue. Whenever I receive a request, I would create a task and put it in a queue to write something to a log entity (with a date and string properties).
I need this because I have to show statistics on the site, and I think that writing logs this way and reading them with a backend later would solve the problem. It would rock if I had programmatic access to the App Engine logs (from logging), but since that's unavailable, I don't see any other way to do it.
Feedback is very welcome.
There are a few ways to do this:
Accumulate logs and write them in a single datastore put at the end of the request. This is the highest latency option, but only slightly - datastore puts are fairly fast. This solution also consumes the least resources of all the options (a minimal sketch of it follows the list).
Accumulate logs and enqueue a task queue task with them, which writes them to the datastore (or does whatever else you want with them). This is slightly faster (task queue enqueues tend to be quick), but it's slightly more complicated, and limited to 100kb of data (which hopefully shouldn't be a limitation).
Enqueue a pull task with the data, and have a regular push task or a backend consume the queue and batch-and-insert into the datastore. This is more complicated than option 2, but also more efficient.
Run a backend that accumulates and writes logs, and make URLFetch calls to it to store logs. The urlfetch handler can write the data to the backend's memory and return asynchronously, making this the fastest in terms of added user latency (less than 1ms for a urlfetch call)! This will require waiting for Python 2.7, though, since you'll need multi-threading to process the log entries asynchronously.
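Here is roughly what option 1 could look like (a sketch with hypothetical entity and handler names, using webapp2 and ndb):

from google.appengine.ext import ndb
import webapp2


class RequestLog(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)
    lines = ndb.StringProperty(repeated=True)


class LoggingHandler(webapp2.RequestHandler):
    """Base handler: collect log lines in memory, write them in one put at the end."""

    def initialize(self, request, response):
        super(LoggingHandler, self).initialize(request, response)
        self.log_lines = []

    def log(self, message):
        self.log_lines.append(message)

    def dispatch(self):
        try:
            super(LoggingHandler, self).dispatch()
        finally:
            if self.log_lines:
                RequestLog(lines=self.log_lines).put()  # single datastore put per request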
You might also want to take a look at the Prospective Search API, which may allow you to do some filtering and pre-processing on the log data.
How about keeping a memcache data structure of request info (recorded as requests arrive), and then running a cron job every 5 minutes (or faster) that crunches the stats for the last 5 minutes of requests from memcache and records those stats in the datastore for that 5-minute interval? The same (or a different) cron job could then clear the memcache too, so that it doesn't get too big.
Then you can run big-picture analysis based on the aggregate of the 5-minute interval stats, which might be more manageable than analyzing hours of 1500 req/s data.
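A rough sketch of that idea (the bucket key format, interval, and stats entity are all made up for illustration):

import datetime

from google.appengine.api import memcache
from google.appengine.ext import ndb


class IntervalStats(ndb.Model):
    interval_start = ndb.DateTimeProperty()
    request_count = ndb.IntegerProperty()


def bucket(now=None):
    now = now or datetime.datetime.utcnow()
    slot = now.replace(minute=now.minute - now.minute % 5, second=0, microsecond=0)
    return slot, 'reqcount:' + slot.strftime('%Y%m%d%H%M')

# In every request handler, as the request arrives:
#     memcache.incr(bucket()[1], initial_value=0)

def crunch_previous_bucket():
    """Cron handler body, scheduled every 5 minutes: fold the finished bucket into the datastore."""
    slot, key = bucket(datetime.datetime.utcnow() - datetime.timedelta(minutes=5))
    count = memcache.get(key) or 0
    IntervalStats(interval_start=slot, request_count=int(count)).put()
    memcache.delete(key)  # keep memcache from growing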