GAE Soft Memory Use - python

I am new to Google App Engine and am trying to find the authoritative source for how much memory my application is actually consuming relative to the soft memory limit.
I am running the F1 instance class (128MB Memory limit) in the standard environment and have not yet had a soft memory exceeded error.
The tools I am using to check memory are:
Google App Engine Dashboard (Memory usage chart) - shows memory use gradually increasing over the past week from 250MB to over 1GB. Refer to first image below.
Google App Engine Dashboard (Instance summary table) - shows the average Memory usage at 122MB. Refer to first image below.
logging runtime.memory_usage() - shows a range between 120MB and 160MB throughout the day (see the sketch after this list).
Stackdriver Monitoring - shows memory mostly hovering around 150MB, but spiking as new instances are spawned. Refer to second image below.
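For reference, the runtime.memory_usage() figures above were logged with something like this (a sketch assuming the first-generation Python 2.7 standard runtime; the helper name and log format are mine):

    import logging

    from google.appengine.api.runtime import runtime

    def log_instance_memory(tag):
        # memory_usage() reports memory for this one instance, in MB,
        # not aggregate usage across all instances.
        logging.info('%s memory_usage: %s', tag, runtime.memory_usage())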
Appreciate any guidance on which information source I should be using to determine the actual memory use of the application, and which figure Google uses to decide that the soft memory limit has been exceeded.
App Engine Dashboard:
Stackdriver Monitoring:

App Engine won't throw an exception when you reach the soft limit. Instead, your instance will be gracefully restarted: it stops accepting new requests, finishes any in-flight requests, and shuts down.
In your first graph, the "250MB to over 1GB" is the aggregate memory usage over all your App Engine instances. You can see in the instance summary table that the average memory per instance is 122.3MB, so it's under the soft limit.
The Stackdriver graph is showing aggregate memory usage over a region. You can see that the spikes in memory correlate with multiple instances running simultaneously.

Related

Managing Heroku RAM for Unique Application

I have a Flask application that allows users to query a relatively small database (2.4M rows) using SQL. It's similar to HackerRank but more limited in scope. It's deployed on Heroku.
I've noticed during testing that I can predictably hit an R14 error (memory quota exceeded) or R15 (memory quota greatly exceeded) by running large queries. The queries that typically cause this are outside what a normal user might do, such as SELECT * FROM some_huge_table. That said, I am concerned that these errors will become a regular occurrence for even small queries when 5, 10, 100 users are querying at the same time.
I'm looking for some advice on how to manage memory quotas for this type of interactive site. Here's what I've explored so far:
Changing the # of gunicorn workers. This has had some effect but I still hit R14 and R15 errors consistently.
Forced limits on user queries, based on either text or the EXPLAIN output. This does work to reduce memory usage, but I'm afraid it won't scale to even a very modest # of users.
Moving to a higher Heroku tier. The plan I use currently provides ~512MB RAM. The largest plan is around 14GB. Again, this would help but won't even moderately scale, to say nothing of the associated costs.
Reducing the size of the database significantly. I would like to avoid this if possible. Doing the napkin math on a table with 1.9M rows going to 10k or 50k, the application would have greatly reduced memory needs and will scale better, but will still have some moderate max usage limit.
As you can see, I'm a novice at best when it comes to memory management. I'm looking for some strategies/ideas on how to solve this general problem, and if it's the case that I need to either drastically cut the data size or throw tons of $ at this, that's OK too.
Thanks
Speaking from personal experience, I see two approaches:
1. Plan for it
Based on your example, this means you estimate the maximum memory a single request can use, multiply it by the number of gunicorn workers, and choose dynos big enough to cover that. For instance, if a worst-case query peaks at around 150MB and you run 3 workers, you need roughly 450MB, which already pushes a 512MB dyno to its limit.
For some workloads this is a valid approach; I don't think it is for yours.
2. Reduce memory usage, solution 1
The fact that so much application memory is used makes me think that your code is probably loading the whole result set into memory (possibly even multiple times, in multiple formats) before returning it to the client.
In the end, your application is only getting the data from the database and converting it to some output format (JSON/CSV?).
What you are probably searching for is streaming responses.
Your Flask view then works on a record-by-record basis: it reads a single record, converts it to your output format, and yields it.
Both your database client library and Flask support this (most database libraries call these cursors or iterators).
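A minimal sketch of such a streaming response with Flask and psycopg2 (assuming a PostgreSQL backend; the connection string, route, and CSV output are placeholders, not your actual setup):

    import csv
    import io

    import psycopg2
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/query")
    def run_query():
        user_sql = "SELECT ..."  # the (already validated) user query goes here

        def generate():
            conn = psycopg2.connect("dbname=app")  # placeholder DSN
            try:
                # A named cursor is a server-side cursor: rows are fetched in
                # batches instead of loading the whole result set into memory.
                with conn.cursor(name="streaming_cursor") as cur:
                    cur.itersize = 1000
                    cur.execute(user_sql)
                    for row in cur:
                        buf = io.StringIO()
                        csv.writer(buf).writerow(row)
                        yield buf.getvalue()  # one CSV line per record
            finally:
                conn.close()

        return Response(generate(), mimetype="text/csv")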
2. Reduce memory usage, solution 2
Other services often go for simple pagination or limit result sets to manage server-side memory.
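For example, a hypothetical server-side cap (the page size and wrapper query are purely illustrative):

    PAGE_SIZE = 500

    def paginate(user_sql, page):
        # Wrap the user's query so at most PAGE_SIZE rows ever reach the app.
        offset = page * PAGE_SIZE
        return "SELECT * FROM ({}) AS user_query LIMIT {} OFFSET {}".format(
            user_sql, PAGE_SIZE, offset)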
Security sidenote
It sounds like users can define the SQL statement directly in their API requests. This is both a security and an application risk: beyond issuing INSERT, UPDATE, or DELETE statements, a user could craft a SQL statement that not only blows up your application memory but also brings down your database.
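One common mitigation, sketched here under the assumption of a PostgreSQL backend (the role name and timeout value are illustrative), is to run user queries through a read-only role with a statement timeout:

    import psycopg2

    def readonly_connection():
        # Connect as a role that only has SELECT privileges (illustrative name).
        conn = psycopg2.connect("dbname=app user=readonly_user")
        conn.set_session(readonly=True)  # reject writes at the session level
        with conn.cursor() as cur:
            cur.execute("SET statement_timeout = '5s'")  # cap per-query runtime
        conn.commit()
        return conn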

How to diagnose Google App Engine Flask Memory Leak

I have a Google App Engine app running (Flask app) that seems to have a memory leak. See the plot of the memory usage below. The memory usage continually creeps up until it hits the limit and the instance is shutdown and a new one is started up.
It's a simple API with about 8 endpoints. None of them handle large amounts of data.
I added an endpoint that takes a memory snapshot with the tracemalloc package, and compares it to the last snapshot and then writes the output to Google Cloud Storage.
I don't see anything in the reports that indicates a memory leak. The peak memory usage is recorded as about 0.12 GiB.
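The endpoint is roughly along these lines (a simplified sketch; the route name is illustrative and the Cloud Storage write is omitted):

    import tracemalloc

    from flask import Flask

    app = Flask(__name__)
    tracemalloc.start()
    _last_snapshot = None

    @app.route("/debug/memory")
    def memory_diff():
        global _last_snapshot
        snapshot = tracemalloc.take_snapshot()
        report = []
        if _last_snapshot is not None:
            # Top 20 allocation differences since the previous snapshot.
            for stat in snapshot.compare_to(_last_snapshot, "lineno")[:20]:
                report.append(str(stat))
        _last_snapshot = snapshot
        return "\n".join(report) or "first snapshot taken"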
I am also calling gc.collect() at the end of every function that is called by each endpoint.
Any ideas on how to diagnose this, or what might be causing it?
There could be many reasons for this. Is your app creating temporary files? In the standard environment, /tmp is an in-memory filesystem, so temporary files count against instance memory and can look like a leak. Temporary files can also be left behind by errors or warnings, so first of all I would check the Stackdriver logs for errors and warnings and try to fix them.
Is your application interacting with databases or storage buckets? Some memory issues come from how your app interacts with a data storage service. A similar issue was encountered here and was mitigated by handling the Google Cloud Storage errors.
Another thing you can do is investigate how memory is actually used in your functions. There are some nice tools for this, such as Pympler and Heapy, and playing with them may give you valuable clues about what your issue is.
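For example, a quick heap summary with Pympler (a sketch; where and how often you call it is up to you):

    from pympler import muppy, summary

    def log_heap_summary():
        all_objects = muppy.get_objects()      # every object the GC knows about
        rows = summary.summarize(all_objects)  # aggregate counts/sizes by type
        summary.print_(rows)                   # look for types that keep growing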

Cloud PubSub to App Engine memory Issue (and Should I move to DataFlow?)

I currently run a "small" Flask application on Google Cloud App Engine that is used to integrate applications (it listens to webhooks and calls other APIs). The issue is that I consistently exceed the soft memory limit after 35-45 requests.
Memory footprint of the combined instances:
Since I intend to increase the load on this system by orders of magnitude this worries me.
There seem to be three possible solutions to me, but I don't know where to start:
Switch to Dataflow: I already use Pub/Sub between two App Engine instances for higher consistency, but maybe App Engine is the wrong platform for this kind of workload.
Fix the memory leak: the issue here could be a memory leak, but I can't find the right tools to analyse memory usage on the App Engine platform (on my local machine, memory usage of the Python process hovers around 51MB).
Divide the system into multiple microservices to decrease the footprint per instance. (Maintaining the code base will probably be harder though).
Any advice or experience is very welcome.
If your case is indeed a memory leak, you need to review your code, as a leak will consistently lead to your application crashing. There are other posts, like this one, that discuss tools and strategies for addressing memory issues in Python code.
You could potentially use Dataflow or Cloud Functions in your project. If you provide more details about the nature of your use case in a separate question, one could evaluate if these options could be a better alternative to your current App Engine approach.
Finally, dividing your application into multiple services is likely the best long term solution to your issue, as it will make it easier to find any memory leak, to control costs and to generally maintain your application.
There are a few pages in App Engine’s documentation that discuss best practices for microservice design on App Engine [1] [2] [3]. Proper microservice-based applications permit clear logging and monitoring, as well as better application reliability and scalability, among other benefits [1]. These benefits are important in your case. Following the layout discussed in [4], you can scale your services individually and independently of each other. If one of your services is more resource-demanding, you can adjust its scaling parameters to provide optimal performance for that service; for example, you can manage the number of instances that are spun up during operation.
You can use the max_concurrent_requests and target_throughput_utilization elements, which you define in App Engine’s configuration file, app.yaml. See [5]. To be clear, in your case you would want to reduce max_concurrent_requests.
Please note that, as discussed in previous comments, this road could lead to higher costs. If you are using the free tier, you will need to check [4] for the resources available to you in that tier.
Regarding your instances running out of memory, if you find that it is not due to a memory leak, another solution would be to use a different instance_class, which lets you run App Engine instances with more compute resources (at a higher cost). Please see [5] and [6].
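For illustration, the relevant app.yaml settings might look like this (the runtime and values shown are examples, not recommendations):

    runtime: python39            # example runtime
    instance_class: F2           # more memory/CPU than the default F1

    automatic_scaling:
      max_concurrent_requests: 10          # fewer concurrent requests per instance
      target_throughput_utilization: 0.6   # start new instances earlier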
[1] https://cloud.google.com/appengine/docs/standard/python/microservices-on-app-engine
[2] https://cloud.google.com/appengine/docs/standard/python/designing-microservice-api
[3] https://cloud.google.com/appengine/docs/standard/python/microservice-performance
[4] https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine
[5] https://cloud.google.com/appengine/docs/standard/python/config/appref
[6] https://cloud.google.com/appengine/docs/standard/#instance_classes

Difference between cAdvisor memory report and resources maxrss in python

I'm running a simple Flask web app on Kubernetes. Recently I noticed some curious behavior while testing memory consumption. Using the following Python code, I report the total RSS used by my process.
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
After some warmup (making requests to the server), the reported resident memory was about 128MB.
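For context, on Linux ru_maxrss is the peak resident set size and is reported in kilobytes, so the figure above comes from a conversion roughly like this:

    import resource

    # ru_maxrss is a peak value, in kilobytes on Linux (bytes on macOS).
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("peak RSS: %.1f MB" % (peak_kb / 1000.0))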
cAdvisor, on the other hand, reported 102MiB of RSS. If it were the other way around it would make some sense, since the container could be using some memory for things besides running my app, but the weird thing is that the Python process apparently uses more memory than the container is aware of.
Converting from MiB to MB does not explain the gap either, since 102MiB ≈ 107MB.
What does the memory usage reported by cAdvisor stand for? Which number should I use as a reliable report of memory usage?

Google App Engine, Datastore and Task Queues, Performance Bottlenecks?

We're designing a system that will take thousands of rows at a time and send them via JSON to a REST API built on Google App Engine. Typically 3-300KB of data, but in extreme cases let's say a few MB.
The REST API app will then adapt this data to models on the server and save them to the Datastore. Are we likely to (eventually if not immediately) encounter any performance bottlenecks here with Google App Engine, whether it's working with that many models or saving so many rows of data at a time to the datastore?
The client does a GET to fetch thousands of records, then a PUT with thousands of records. Is there any reason for this to take more than a few seconds and necessitate the use of the Task Queue API?
The only bottleneck in App Engine (apart from the single entity group limitation) is how many entities you can process in a single thread on a single instance. This number depends on your use case and the quality of your code. Once you reach that limit, you can (a) use a more powerful instance class, (b) use multi-threading, and/or (c) add more instances to scale your processing capacity to any level you desire.
The Task Queue API is a very useful tool for large data loads. It allows you to split your job into a large number of smaller tasks, set the desired processing rate, and let App Engine automatically adjust the number of instances to meet that rate. Another option is the MapReduce API.
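A rough sketch of that pattern in the first-generation Python runtime (the model, chunk size, and helper names are made up for illustration):

    from google.appengine.ext import deferred, ndb

    class Row(ndb.Model):
        payload = ndb.JsonProperty()

    def save_chunk(rows):
        # put_multi writes the whole batch in one RPC instead of one put per row.
        ndb.put_multi([Row(payload=r) for r in rows])

    def enqueue_rows(rows, chunk_size=500):
        # Split the upload into smaller tasks so no single request does all the
        # work; deferred.defer pushes each chunk onto the task queue.
        for i in range(0, len(rows), chunk_size):
            deferred.defer(save_chunk, rows[i:i + chunk_size])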
This is a really good question, one that I've been asked in interviews and have seen pop up in a lot of different situations. Your system essentially consists of two things:
Saving (or writing) models to the datastore
Reading from the datastore.
From my experience with this problem, when you look at these two things separately you can come up with solid solutions to both. I typically use a cache, such as memcached, to keep data easily accessible for reading. At the same time, for writing, I try to have a main database and a few slave instances as well. All the writes go to the slave instances (thereby not tying up the main database for the reads that sync to the cache), and the writes to the slave databases can be distributed in a round-robin approach, ensuring that your insert statements are not skewed by any of the model's attributes having a high occurrence.
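As a concrete illustration of the read side, a minimal read-through cache on App Engine could look like this (first-generation Python runtime; the model and key scheme are made up):

    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    class Record(ndb.Model):
        payload = ndb.JsonProperty()

    def get_record(record_id):
        cache_key = "record:%s" % record_id
        cached = memcache.get(cache_key)
        if cached is not None:
            return cached                     # cache hit: skip the datastore
        record = Record.get_by_id(record_id)  # cache miss: read from the datastore
        if record is None:
            return None
        memcache.set(cache_key, record.payload, time=300)  # cache for five minutes
        return record.payload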
