I am using the Google App Engine standard (not flex) environment with Python 2.7, and I need to load some pre-trained models (Gensim's Word2vec and Keras's LSTM). I need to load them once (since loading is very slow, around 1.5 seconds) and keep them quickly accessible for several hours.
What is the best & fastest way to do so?
Thanks!
IMHO the best place for read-only data (including imported code!) that needs to be accessible at any time by individual requests is the global application variables area.
Such variables would typically be loaded exactly once per GAE instance lifetime and available until the instance goes away.
Since loading of the data is expensive you need to be aware that it could impact the response time for requests coming in while the instance is starting up (i.e. while the loading request is still active). There are 2 ways to address this:
One would be to use "lazy" loading of the data, which is effective if only a small percentage of the incoming requests actually need it. But the requests that do need the data when it's not yet available will still be affected, so this only reduces the impact of the problem. The method is described in detail in the App Engine Startup time and the Global Variable problem article:
from google.appengine.ext import ndb

# Settings is assumed to be an ndb model holding name/value pairs
class Settings(ndb.Model):
    name = ndb.StringProperty()
    value = ndb.StringProperty()

gCDNServer = None  # a global variable, populated once per instance lifetime

def getCDN():
    global gCDNServer
    if gCDNServer is None:
        gCDNServer = Settings.query(Settings.name == "gCDNServer").get().value
    return gCDNServer
Another approach, which would completely eliminate the problem, is to make your app support warmup requests (available only if you're using automatic scaling). The data would be loaded by the warmup request handler and will always be available for "live" requests, because no "live" requests will be routed to the instance until the warmup request handling completes.
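As an illustration, here is a minimal sketch of a warmup handler for the Python 2.7 standard environment, assuming webapp2 and a hypothetical do_expensive_load() placeholder for your Gensim/Keras loading code (warmup requests also need inbound_services: warmup enabled in app.yaml):

import webapp2

MODEL = None  # global cache; lives for the instance's lifetime

def load_model():
    # hypothetical helper wrapping the expensive model loading
    global MODEL
    if MODEL is None:
        MODEL = do_expensive_load()  # placeholder for your loading code
    return MODEL

class WarmupHandler(webapp2.RequestHandler):
    def get(self):
        # App Engine sends GET /_ah/warmup before routing live traffic
        # to a new instance, so the model is ready for real requests
        load_model()
        self.response.set_status(200)

app = webapp2.WSGIApplication([('/_ah/warmup', WarmupHandler)])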
It might be possible to add logic to drop the data from memory (to reduce the app's memory footprint) if/when you know it will no longer be needed (i.e. after the several hours you mentioned have passed), but that complicates the picture, especially if you configured your app as threadsafe. I'd simply split the code that doesn't need the data from the code that does into different services, and let autoscaling shut down the instances holding the global data when they're no longer needed.
Related
I have a program that contains a variable I want to change manually, whenever I want, without stopping or pausing the main loop (e.g. with input()). I can't find a library that lets me set it manually at runtime, or a way to reach into the process memory to change that value.
For now I have set up a file watcher that re-reads the parameters every minute, but I presume this is an inefficient way to do it.
I guess you just want to expose an API. You did it with files, which works but is less common. You could instead use common best practices such as:
An HTTP web server. You can put one together quickly with Flask or Bottle.
gRPC
A pub/sub mechanism: Redis, Kafka (more complicated, and requires another storage component, the broker/DB itself).
There are tons of other solutions, but you get the idea. I hope that's what you're looking for.
With any of those solutions you can manually trigger your endpoint and change whatever you want in the running application, as sketched below.
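For instance, a minimal Flask sketch of such an endpoint; the parameter name threshold and the port are assumptions, and the lock guards against the main loop reading while a request writes:

import threading
from flask import Flask, request

app = Flask(__name__)

state = {'threshold': 0.5}  # parameter your main loop reads each iteration
lock = threading.Lock()

@app.route('/set', methods=['POST'])
def set_param():
    with lock:
        state['threshold'] = float(request.form['value'])
    return 'ok'

# Run the server in a background thread so the main loop keeps going:
# threading.Thread(target=lambda: app.run(port=8000)).start()

You can then change the value at any time with e.g. curl -X POST -d value=0.9 http://localhost:8000/set.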
In my Bottle app running on pythonanywhere, I want objects to be persisted between requests.
If I write something like this:
from bottle import route, SimpleTemplate

X = {'count': 0}

@route('/count')
def count():
    X['count'] += 1
    tpl = SimpleTemplate('Hello {{count}}!')
    return tpl.render(count=X['count'])
The count increments, meaning that X persists between requests.
I am currently running this on PythonAnywhere, a managed service where I have no control over the web server (nginx, I presume?), threading, load balancing (if any), etc.
My question is, is this coincidence because it's only using one thread while on minimal load from me doing my tests?
More generally, at which point will this stop working? E.g. I have more than one thread/socket/instance/load-balanced server etc...?
Beyond that, what are my best options to make something like this work (sticking with Bottle), even if I have to move to a bare-bones server?
Here's what Bottle docs have to say about their request object:
A thread-safe instance of LocalRequest. If accessed from within a request callback, this instance always refers to the current request (even on a multi-threaded server).
But I don't fully understand what that means, or where global variables like the one I used stand with regards to multi-threading.
TL;DR: You'll probably want to use an external database to store your state.
If your application is tiny, and you're planning to always have exactly one server process running, then your current approach can work; "all" you need to do is acquire a lock around every (!) access to the shared state (the dict X in your sample code). (I put "all" in scare quotes there because it's likely to become more complicated than it sounds at first.)
But, since you're asking about multithreading, I'll assume that your application is more than a toy, meaning that you plan to receive substantial traffic and/or want to handle multiple requests concurrently. In this case, you'll want multiple processes, which means that your approach (storing state in memory) cannot work. Memory is not shared across processes. The (general) way to share state across processes is to store the state externally, e.g. in a database.
Are you familiar with Redis? That'd be on my short list of candidates.
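For illustration, here is a sketch of the counter example backed by Redis instead of a module-level dict, assuming a Redis server on localhost:6379 and the redis-py client:

import redis
from bottle import route, SimpleTemplate

r = redis.StrictRedis(host='localhost', port=6379)

@route('/count')
def count():
    # INCR is atomic in Redis, so concurrent processes stay consistent
    n = r.incr('count')
    return SimpleTemplate('Hello {{count}}!').render(count=n)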
I got the answers by contacting PythonAnywhere support, who had this to say:
When you run a website on a free PythonAnywhere account, just one process handles all of your requests -- so a global variable like the one you use there will be fine. But as soon as you want to scale up, and get (say) a hacker account, then you'll have multiple processes (note, not threads) -- and of course each one will have its own global variables, so things will go wrong.
So that part deals with the PythonAnywhere specifics on why it works, and when it would stop working on there.
The answer to the second part, about how to share variables between multiple Bottle processes, I also got from their support (most helpful!) once they understood that a database would not work well in this situation.
Different processes cannot of course share variables, and the most viable solution would be to:
write your own kind of caching server to handle keeping stuff in memory [...] You'd have one process that ran all of the time, and web API requests would access it somehow (an internal REST API?). It could maintain stuff in memory [...]
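One way to sketch that "caching server" idea with only the standard library is a multiprocessing manager: a single long-running process owns the state, and web workers talk to it over a socket. The address and authkey below are placeholder assumptions.

# cache_server.py: the one long-running process that keeps state in memory
from multiprocessing.managers import BaseManager

store = {}  # lives as long as this process does

class CacheManager(BaseManager):
    pass

# expose the dict to other processes via a proxy
CacheManager.register('get_store', callable=lambda: store)

if __name__ == '__main__':
    manager = CacheManager(address=('127.0.0.1', 50000), authkey=b'secret')
    manager.get_server().serve_forever()

And from inside any web worker process:

from multiprocessing.managers import BaseManager

class CacheManager(BaseManager):
    pass

CacheManager.register('get_store')
manager = CacheManager(address=('127.0.0.1', 50000), authkey=b'secret')
manager.connect()
store = manager.get_store()  # a proxy to the dict inside cache_server.py
store.update({'count': store.get('count', 0) + 1})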
PS: I didn't expect the other replies to tell me to store state in a database; I figured the fact that I'm asking this means I have a good reason not to use one. Apologies for the time wasted!
Is there a way to measure the amount of memory allocated by an arbitrary web request in a Flask/Werkzeug app? By arbitrary, I mean I'd prefer a technique that lets me instrument code at a high enough level that I don't have to change it to test memory usage of different routes. If that's not possible but it's still possible to do this by wrapping individual requests with a little code, so be it.
In a PHP app I wrote a while ago, I accomplished this by calling the memory_get_peak_usage() function both at the start and the end of the request and taking the difference.
Is there an analog in Python/Flask/Werkzeug? Using Python 2.7.9 if it matters.
First of all, one should understand the main difference between PHP and Python request processing. Roughly speaking, each PHP worker accepts a single request, handles it, and then dies (or reinitializes the interpreter). PHP was designed directly for this; it's a request-processing language by nature. So it's pretty simple to measure per-request memory usage: a request's peak memory usage equals the worker's peak memory usage. It's a language feature.
Python, on the other hand, usually takes a different approach to handling requests. There are two main models, synchronous and asynchronous request processing, but both present the same difficulty when it comes to measuring per-request memory usage: one Python worker handles many requests (concurrently or sequentially) during its lifetime, so it's hard to attribute memory usage to a single request.
However, one can adapt the underlying framework and application code to collect memory usage. One possible solution is to use some kind of events. For example, one can raise an abstract mem_usage event before the request, at the beginning of a view function, at the end of a view function, at important places within the business logic, and so on. Then there should be a subscriber for such events, doing the following:
import resource

# peak resident set size of the process so far (KB on Linux, bytes on macOS)
mem_usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
This subscriber has to accumulate the usage data and, on app_request_teardown/after_request, send it to the metrics collection system together with the current request.endpoint, route, or whatever identifies the request.
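Here is a minimal sketch of such a subscriber using Flask's request hooks, with logging standing in for a real metrics system. Note that ru_maxrss is a high-water mark, so the delta is nonzero only when a request pushes the process to a new peak:

import resource
from flask import Flask, g, request

app = Flask(__name__)

def peak_rss():
    # peak resident set size so far (KB on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

@app.before_request
def memory_before():
    g.peak_before = peak_rss()

@app.teardown_request
def memory_after(exc=None):
    delta = peak_rss() - g.peak_before
    # replace with a call into your metrics collection system
    app.logger.info('endpoint %s raised peak RSS by %s', request.endpoint, delta)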
Also, using a memory profiler is a good idea, but usually not for production use.
Further reading about request processing models:
CGI
FastCGI
PHP specific
Another possible solution is to use sys.settrace. With this tool one can measure memory usage even per line of code; usage examples can be found in the memory_profiler project. Of course, it will slow down the code significantly.
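For completeness, line-by-line measurement with memory_profiler looks roughly like this; the decorated function is just a stand-in:

# pip install memory_profiler
from memory_profiler import profile

@profile  # prints per-line memory increments when the function runs
def view_logic():
    data = [0] * 10 ** 6  # stand-in for real request handling
    return sum(data)

if __name__ == '__main__':
    view_logic()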
I am trying to optimize performance on GAE, but once I deploy I get very unstable results. It's really hard to tell whether each optimization actually works, because datastore and memcache operations take a highly variable time (ranging from milliseconds to seconds for the same operation).
For these tests I am the only user, making a single request to the application by refreshing the homepage. There is no other traffic (besides my own browser requesting images/css/js files from the page).
Edit: To make sure that the drops were not due to concurrent requests from the browser (images/css/js), I've redone the test requesting ONLY the page with urllib2.urlopen(). The problem persists.
My questions are:
1) Is this something to expect due to the fact that machines/resources are shared?
2) What are the most common cases where this behavior can happen?
3) Where can I go from there?
Here is a very slow datastore get (memcache was just flushed):
Here is a very slow memcache get (things are cached because of the previous request):
Here is a slow but faster memcache get (same repro step as the previous one, different calls are slow):
To answer your questions,
1) yes, you can expect variance in remote calls because of the shared network;
2) the most common place you will see variance is in datastore requests -- the larger/further the request, the more variance you will see;
3) here are some options for you:
It looks like you are trying to fetch large amounts of data from the datastore/memcache. You may want to re-think the queries and caches so they retrieve smaller chunks of data. Does your app need all that data for a single request?
If the app really needs to process all that data on every request, another option is to preprocess it with a background task (cron, task queue, etc.) and put the results into memcache. The request that serves the page then simply picks the right pieces out of memcache and assembles the page (see the sketch after this list).
@proppy's suggestion to use NDB is a good one. It takes some work to rewrite serial queries as parallel ones, but the savings from async calls can be huge. If you can benefit from parallel tasks (using map), all the better.
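To illustrate the preprocessing option above, here is a rough sketch with webapp2 and memcache; compute_expensive_summary() and render_page() are hypothetical placeholders, and the cron handler would be wired up in cron.yaml:

import webapp2
from google.appengine.api import memcache

class PrecomputeHandler(webapp2.RequestHandler):
    """Hit periodically by cron to refresh the cache in the background."""
    def get(self):
        data = compute_expensive_summary()  # hypothetical heavy datastore work
        memcache.set('summary', data, time=3600)

class PageHandler(webapp2.RequestHandler):
    def get(self):
        data = memcache.get('summary')
        if data is None:
            # cache miss fallback: recompute inline and repopulate
            data = compute_expensive_summary()
            memcache.set('summary', data, time=3600)
        self.response.write(render_page(data))  # hypothetical renderer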
INTRO
I've recently switched to Python, after about 10 years of PHP development and habits.
E.g. in Symfony2, every request to the server (Apache, for instance) has to load e.g. the container class and instantiate it in order to construct the "rest" of the objects.
As far as I understand Python's WSGI environment (I hope), an app is created once, and until that app shuts down, every request just calls its methods/functions.
This means that I can have e.g. one instance of some class that can be accessed every time a request is dispatched, without having to instantiate it on every request. Am I right?
QUESTION
I want to have one instance of a class, since the call to __init__ is very expensive (in both computation and resource locking). In PHP, instantiating it on every request degrades performance. Am I right that with Python's WSGI I can instantiate it once, on app startup, and use it across requests? If so, how do I achieve this?
WSGI is merely a standardized interface that makes it possible to build the various components of a web-server architecture so that they can talk to each other.
Pyramid is a framework whose components are glued with each other through WSGI.
Pyramid, like other WSGI frameworks, makes it possible to choose the actual server part of the stack, like gunicorn, Apache, or others. That choice is for you to make, and there lies the ultimate answer to your question.
What you need to know is whether your server is multi-threaded or multi-process. In the latter case, it's not enough to check whether a global variable has been instantiated in order to initialize costly resources only once, because subsequent requests might end up in separate processes that don't share state.
If your model is multi-threaded, then you might indeed rely on global state, but be aware of the fact that you are introducing a strong dependency in your code. Maybe a singleton pattern coupled with dependency-injection can help to keep your code cleaner and more open to change.
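As a sketch of that caveat, here is a lock-protected lazy singleton that is safe under multi-threading; under a multi-process server each process still builds its own copy. ExpensiveClass is a placeholder for your costly class:

import threading

_instance = None
_lock = threading.Lock()

def get_instance():
    global _instance
    if _instance is None:          # fast path once initialized
        with _lock:
            if _instance is None:  # re-check inside the lock
                _instance = ExpensiveClass()  # placeholder costly __init__
    return _instance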
The best method I found was mentioned (and I had missed it earlier) in the Pyramid docs:
From Pyramid Docs#Startup
Note that an augmented version of the values passed as **settings to the Configurator constructor will be available in Pyramid view callable code as request.registry.settings. You can create objects you wish to access later from view code, and put them into the dictionary you pass to the configurator as settings. They will then be present in the request.registry.settings dictionary at application runtime.
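In practice that might look like the following sketch, where load_expensive_model() is a hypothetical stand-in for your costly initialization:

from pyramid.config import Configurator

def main(global_config, **settings):
    # build the expensive object exactly once, at application startup
    settings['heavy_model'] = load_expensive_model()  # hypothetical loader
    config = Configurator(settings=settings)
    return config.make_wsgi_app()

# later, inside any view callable:
#     model = request.registry.settings['heavy_model']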
There are a number of ways to do this in pyramid, depending on what you want to accomplish in the end. It might be useful to look closely at the Pyramid/SQLAlchemy tutorial as an example of how to handle an expensive initialization (database connection and metadata setup) and then pass that into the request-handling engine.
Note that in the referenced link, the important part for your question is the __init__.py file's handling of initialize_sql and the subsequent creation of DBSession.