Full proto too large to save, cleared variables - python

I got this error while running Google App Engine code.
Does anybody know what causes this error?

Are you using appstats? It looks like this can happen when appstats is recording state about your app, especially if you're storing lots of data on the stack. It isn't harmful, but you won't be able to see everything when inspecting calls in appstats.

Handle load time of big JSON for autocomplete on the server side

Hello, I have an app that offers autocomplete suggestions to the user based on input in the search bar.
I use a package named fast_autocomplete, which works great, but I have a problem: each time I want to use the prediction I have to load 50MB of JSON data that I made for the autocomplete (500K records).
Running it on the server side, loading, parsing and sending back the data is quite slow for what you would expect from autocomplete functionality. It should take less than a second, and right now (testing locally) it takes a few seconds.
Looking into the issue, it seems that loading the 50MB JSON on each request takes most of the time. Loading the data and building the autocomplete object every time a new request comes in is a waste of time.
So I wondered whether there is a way to keep the object loaded all the time, so that when a new HTTP request comes in the JSON is already in memory.
How do big sites like Amazon, eBay and Google make their autocomplete so fast?
If I understand correctly, you're loading your data for every request even though that data stays the same for every user. Why not cache that data, or store it outside of your request handler's scope?
# loaded once at import time, not inside the handler
auto_complete_data = open("...").read()

@route('/autocomplete')
def autocomplete():
    # keep reusing the already-loaded `auto_complete_data`
    ...
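For illustration, a rough sketch of that idea with your package (untested against your data; it assumes Flask and the AutoComplete/search usage from the fast_autocomplete README, and "words.json" and the "q" parameter are placeholder names):

import json
from flask import Flask, jsonify, request
from fast_autocomplete import AutoComplete

app = Flask(__name__)

# Done once at import time: every request served by this worker process
# reuses the same in-memory index instead of re-reading the 50MB JSON.
with open("words.json") as f:       # placeholder path
    words = json.load(f)            # assumed shape: {"some word": {}, ...}
autocomplete_index = AutoComplete(words=words)

@app.route("/autocomplete")
def autocomplete():
    query = request.args.get("q", "")
    results = autocomplete_index.search(word=query, max_cost=3, size=10)
    return jsonify(results=results)

Big sites go much further (dedicated suggestion services, prefix indexes, heavy caching and CDNs), but keeping the index resident in the process instead of rebuilding it per request is the first step.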

Does python with wsgi (uwsgi) under nginx have some small default cache?

On my small web site I feel the need to make some data widely available, to avoid querying the database for every request. E.g. this could be the list of current users shown at the bottom of every page, or the time of the last ranking update.
The app runs in Python (Flask) on top of nginx + uwsgi (this docker image).
I wonder: do I get some small cache or shared memory for keeping such information "out of the box", or do I need to explicitly set up some dedicated cache? Or is something like this perhaps provided by nginx?
Alternatively, I can still use the database, which I think has its own cache anyway.
Sorry if the question seems naive/silly - I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of the Java application) and I have some difficulty grasping what wsgi/uwsgi provides. Thanks in advance!
Firstly, nginx has a cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at the caching section of the Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
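For the "current users" example from your question, the usage pattern would look roughly like this (the helper and DB call names are made up, and note that SimpleCache lives inside a single worker process, so with several uwsgi workers each keeps its own copy; in newer Werkzeug versions this class moved out to the cachelib package):

from werkzeug.contrib.cache import SimpleCache

cache = SimpleCache()

def get_current_users():
    # Return the cached list if present, otherwise hit the database
    # and keep the result around for 5 minutes.
    users = cache.get("current-users")
    if users is None:
        users = query_users_from_db()   # placeholder for your DB call
        cache.set("current-users", users, timeout=5 * 60)
    return users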
-- UPDATE --
Or you could solve it on the frontend by storing the data in the web browser's local storage.
If there's nothing in local storage you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.

GAE - Upload optimized image to cloud storage

I'm working on a simple app that takes images, optimizes them and saves them in cloud storage. I found an example that takes the file and uses PIL to optimize it. The code looks like this:
import StringIO

from PIL import Image
from google.appengine.api import files


def inPlaceOptimizeImage(photo_blob):
    blob_key = photo_blob.key()
    new_blob_key = None
    img = Image.open(photo_blob.open())
    output = StringIO.StringIO()
    # NB: PIL's JPEG writer looks for "optimize", so "optimized" is ignored
    img.save(output, img.format, optimized=True, quality=90)
    opt_img = output.getvalue()
    output.close()
    # Create the file
    file_name = files.blobstore.create(mime_type=photo_blob.content_type)
    # Open the file and write to it
    with files.open(file_name, 'a') as f:
        f.write(opt_img)
    # Finalize the file. Do this before attempting to read it.
    files.finalize(file_name)
    # Get the file's blob key
    return files.blobstore.get_blob_key(file_name)
This works fine locally (although I don't know how well it's being optimized, because when I run the uploaded image through something like http://www.jpegmini.com/ it still gets reduced by 2.4x). However, when I deploy the app and try uploading images I frequently get 500 errors and these messages in the logs:
F 00:30:33.322 Exceeded soft private memory limit of 128 MB with 156 MB after servicing 7 requests total
W 00:30:33.322 While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
I have two questions:
Is this even the best way to optimize and save images in cloud storage?
How do I prevent these 500 errors from occurring?
Thanks in advance.
The error you're experiencing is due to the memory limit of your instance class.
What I would suggest is editing your .yaml file to configure your module and specify an instance class of F2 or higher.
In case you are not using modules, you should also add “module: default” at the beginning of your app.yaml file to let GAE know that this is your default module.
You can take a look at this article from the docs to see the different instance classes available and how to easily configure them.
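If it helps, a minimal app.yaml along those lines might look like this (the runtime, module name and handler are only illustrative; instance_class is the relevant line):

module: default
runtime: python27
api_version: 1
threadsafe: true
instance_class: F2    # raises the soft memory limit above the 128 MB F1 default

handlers:
- url: /.*
  script: main.app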
Another more basic workaround would be to limit the image size when uploading, but you would eventually run into a similar issue.
As for a better way to optimize your images, you may want to take a look at the App Engine Images API, which provides the ability to manipulate image data using a dedicated Images service. In your case, you might like the "I'm Feeling Lucky" transformation. By using this API you might not need to change your instance class.
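A rough sketch of that approach (using the Python 2 Images API; the quality value is just an example, and you would still write the returned bytes out to blobstore/Cloud Storage as in your existing code):

from google.appengine.api import images

def optimize_with_images_service(photo_blob):
    # The transform runs in the dedicated Images service, so the full
    # image does not have to be decoded inside your instance's memory.
    img = images.Image(blob_key=photo_blob.key())
    img.im_feeling_lucky()    # the "I'm Feeling Lucky" transformation
    return img.execute_transforms(output_encoding=images.JPEG, quality=90)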

How AppEngine instances work on the local server

I'm a newbie on App Engine and I really don't know how to phrase this question, which sadly means I don't know what keywords to google; I hope I get actual help rather than the bashing a lot of people hand out.
I'm confused about the difference between App Engine's behavior online and on the local server.
Background info:
Btw this is in Python
Initially I assumed that, when needed or as configured, an instance of the app or module would be created, and that instance would serve multiple requests from different clients. Under that behavior, any initialization code would only run once.
But on the local development server, every time I add something new, especially in main.py, the server picks up the changes and runs the new code on a browser refresh. This made me think: wait... does it run the entire script over and over again on every request?
Question:
Does an instance/module run the entire code on every request, or is this just extra behavior in the dev server to make development easier?
Both your assumptions - about behaviour in production and development - are wrong.
In production, GAE spins up instances as required. This may be in response to increased load, or the host may simply decide after a certain amount of time to recycle an instance by killing it and starting a new one. Initialization code will always be run whenever a new instance is started.
In development, you only get a single instance. However, the server watches your file system for changes. If it detects a change to the code itself, it will restart itself, and therefore re-run the initialization code. But if you don't make any code changes between requests, the existing process continues indefinitely, and init code will not be re-run.
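To make the distinction concrete, here is a minimal webapp2 sketch (purely illustrative, not from your app): module-level code runs once per instance start, or whenever the dev server reloads after a code change, while handler methods run on every request.

import logging
import webapp2

# Runs once per instance start - and again whenever the dev server
# restarts the process after detecting a code change.
logging.info("running module-level initialization")
EXPENSIVE_DATA = {"loaded": True}   # stand-in for one-time setup work

class MainHandler(webapp2.RequestHandler):
    def get(self):
        # Runs on every request, reusing EXPENSIVE_DATA from above.
        self.response.write("data: %r" % EXPENSIVE_DATA)

app = webapp2.WSGIApplication([("/", MainHandler)])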

Does local GAE read and write to a local datastore file on the hard drive while it's running?

I have just noticed that when I have a running instance of my GAE application, nothing happens to the datastore file when I add or remove entries using Python code or the admin console. I can even remove the file and still have all the data safe and sound in the admin area and accessible from code. But when I restart my application, all the data obviously goes away and I have a blank datastore. So, the question: does GAE read all the data from the file only when it starts, deal with it in memory, and save it back once I stop the application? Does it touch the datastore file at all while the application is running? If it doesn't save anything to the file while running, could data be lost if the application stops unexpectedly? Please clarify how this works if you know.
How the datastore reads and writes its underlying files varies - the standard datastore is read on startup, and written progressively, journal-style, as the app modifies data. The SQLite backend uses a SQLite database.
You shouldn't have to care, though - neither backend is designed for robustness in the face of failure, as they're development backends. You shouldn't be modifying or deleting the underlying files, either.
By default the dev_appserver will store its data in a temporary location (which is why it disappears and why you can't see anything changing).
If you don't want your data to disappear on restart, set --datastore_path when running your dev server:
dev_appserver.py --datastore_path /path/to/app/myapp.db /path/to/app
As Nick said, the dev server is not built to be bulletproof; it's designed to help you quickly develop your app. The production setup is very different and will not do anything unexpected when you are dealing with exceptional circumstances.
