sharing objects between module in GAE

sharing objects between module in GAE - python

To share a state(e.g. user) between a module in django people sometime use thread local storage, but as google app engine follows CGI standard and keeps state of a request in os.environ , can I share objects between two modules just by setting it e.g.
mod1.my_data = {} and now any other module can get handle to my_data?
without worrying about other threads/requests sharing/overwriting it?

Later requests that happen to be served on the same process (you can't control that) would access just the same mod1.my_data object (unless you take pains to reassign it as a fresh object at the start of each request, of course).

Related

What is the scope of the Google App Engine api's namespace manager?

I'm building an API with Flask in Google App Engine Python Standard Environment that is served through multiple domains.
The API can be used to store data and fetch data.
I want to use cloud datastore's multitenancy to store or fetch data only in a namespace determined by the domain from which the API is accessed.
The only way I can see to do this is to use google.appengine.api.namespace_manager to set the namespace either at request time or in a context manager at I/O time.
I wrote this context manager:
#contextmanager
def multitenancy_namespace(namespace):
original_namespace = namespace_manager.get_namespace()
if namespace:
new_namespace = to_namespace_safe_url(namespace)
namespace_manager.set_namespace(new_namespace)
yield
namespace_manager.set_namespace(original_namespace)
And it works as expected.
My worry is the scope of namespace_manager. I have not been able to find any documentation about this.
If my API is threaded and used simultaneously by >1000 users, assuming that the namespace as set by namespace_manager.set_namespace(...) is global, I would expect some collisions - data being stored in the wrong namespace because another request called set_namespace after the first request, but before the first request did its I/O.
I wrote a threaded test here that passes, which tells me that the scope of namespace is at least confined to individual threads (which is good enough for my Flask application).
But what is the context of namespace_manager? What does set_namespace actually do? Where is the namespace setting saved? Is there a use case where a namespace collision could happen?

If you look to the source code of namespace_manager.set_namespace(...) you'll see it does it by settings the namespace to an environment variable:
def set_namespace(namespace):
"""Set the default namespace for the current HTTP request.
Args:
namespace: A string naming the new namespace to use. A value of None
will unset the default namespace value.
"""
if namespace is None:
os.environ.pop(_ENV_CURRENT_NAMESPACE, None)
else:
validate_namespace(namespace)
os.environ[_ENV_CURRENT_NAMESPACE] = namespace
When a thread switches context the AppEngine backups & restores the environment variables as needed. They are guaranteed to be request bounded so it is opaque to user's code. I don't remember if there is docs for this, I think I've learned this at some forum thread.
The comment Set the default namespace for the current HTTP request is implicitly confirms this.
We use it at www.myclasses.org for few years and it's never been a problem.
So relax, you are safe to use it in multi-threaded environment!

Starting app engine modules in Google App Engine

App engine "modules" are a new (and experimental, and confusingly-named) feature in App Engine: https://developers.google.com/appengine/docs/python/modules. Developers are being urged to convert use of the "backends" feature to use of this new feature.
There seem to be two ways to start an instance of a module: to send a HTTP request to it (i.e. at http://modulename.appname.appspot.com for the appname application and modulename module), or to call google.appengine.api.modules.start_module().
The Simple Way
The simple way to start an instance of a module would seem to be to create an HTTP request. However, in my case this results in only two outcomes, neither of which is what I want:
If I use the name of the backend that my application defines, i.e. http://backend.appname.appspot.com, the request is properly routed to the backend and properly denied (because backend access is defined by default to be private).
Anything else results in the request being routed to the sole frontend instance of the default module, even using random character strings as module names, such as http://sdlsdjfsldfsdf.appname.appspot.com. This even holds for made-up instance IDs such as in the case of http://99.sdlsdjfsldfsdf.appname.appspot.com, etc. And of course (this is the problem) for the actual name of my module as well.
Starting via the API
The documentation says that calling start_module() with the name of a module and version should cause the specified version of the specified module to start up. However, I'm getting an UnexpectedStateError whenever I call this function with valid arguments.
The Unfortunate State of Affairs
Because I can't get this to work, I'm wondering if there is some subtlety that the documentation might not have mentioned. My setup is pretty straightforward, so I'm wondering if this is a widespread problem to which someone has found a solution.

It turns out that versions cannot be numeric. This problem seems to have been happening because our module's version was "1" and not (for example) "v1".

With modules, they changed the terminology around a little bit. What used to be "backends" are now "basic scaling" or "manual scaling" instances.
"Automatic scaling" and "basic scaling" instances start when they process a request, while "manual scaling" instances run constantly.
Generally to start an instance you would send an HTTP request to your module's URL.
start_module() seems to have limited use for modules with "manual scaling" instances, or restarting modules that have been stopped with stop_module().

You can add:
login: admin
To the handler for your backend. This way an admin user can call your backend and trigger it to run. With login: admin, you can also have issue URLFetch requests froom elsewhwere in your app (ie from a frontend) trigger your backend.

Accessing Request from init.py in Pyramid

When setting up a Pyramid app and adding settings to the Configurator, I'm having issues understanding how to access information from request, like request.session and such. I'm completely new at using Pyramid and I've searched all over the place for information on this but found nothing.
What I want to do is access information in the request object when sending out exception emails on production. I can't access the request object, since it's not global in the __init__.py file when creating the app. This is what I've got now:
import logging
import logging.handlers
from logging import Formatter
config.include('pyramid_exclog')
logger = logging.getLogger()
gm = logging.handlers.SMTPHandler(('localhost', 25), 'email#email.com', ['email#email.com'], 'Error')
gm.setLevel(logging.ERROR)
logger.addHandler(gm)
This works fine, but I want to include information about the logged in user when sending out the exception emails, stored in session. How can I access that information from __init__.py?

Attempting to make request a global variable, or somehow store a pointer to "current" request globally (if that's what you're going to try with subscribing to NewRequest event) is not a terribly good idea - a Pyramid application can have more than one thread of execution, so more than one request can be active within a single process at the same time. So the approach may appear to work during development, when the application runs in a single thread mode and just one user accesses it, but produce really funny results when deployed to a production server.
Pyramid has pyramid.threadlocal.get_current_request() function which returns thread-local request variable, however, the docs state that:
This function should be used extremely sparingly, usually only in unit
testing code. it’s almost always usually a mistake to use
get_current_request outside a testing context because its usage makes
it possible to write code that can be neither easily tested nor
scripted.
which suggests that the whole approach is not "pyramidic" (same as pythonic, but for Pyramid :)
Possible other solutions include:
look at exlog.extra_info parameter which should include environ and params attributes of the request into the log message
registering exception views would allow completely custom processing of exceptions
Using WSGI middleware, such as WebError#error_catcher or Paste#error_catcher to send emails when an exception occurs
if you want to log not only exceptions but possibly other non-fatal information, maybe just writing a wrapper function would be enough:
if int(request.POST['donation_amount']) >= 1000000:
send_email("Wake up, we're rich!", authenticated_userid(request))

Django statelessness?

I'm just wondering if Django was designed to be a fully stateless framework?
It seems to encourage statelessness and external storage mechanisms (databases and caches) but I'm wondering if it is possible to store some things in the server's memory while my app is in develpoment and runs via manage.py runserver.

Sure it's possible. But if you are writing a web application you probably won't want to do that because of threading issues.

That depends on what you mean by "store things in the server's memory." It also depends on the type of data. If you can, you're better off storing "global data" in a database or in the file system somewhere. Unless it is needed every request it doesn't really make sense to store it in the Django instance itself. You'll need to implement some form of locking to prevent race conditions, but you'd need to worry about race conditions if you stored everything on the server object anyway.
Of course, if you're talking about user-by-user data, Django does support sessions. Or, and this is another perfectly good option if you're willing to make the user save the data, cookies.

The best way to maintain state in a django app on a per-user basis is request.session (see django sessions) which is a dictionary you can use to remember things about the current user.
For Application-wide state you should use the a persistent datastore (database or key/value store)
example view for sessions:
def my_view(request):
pages_viewed = request.session.get('pages_viewed', 1) + 1
request.session['pages_viewed'] = pages_viewed
...
If you wanted to maintain local variables on a per app-instance basis you can just define module level variables like so
# number of times my_view has been served since by this server
# instance since the last restart
served_since_restart = 0
def my_view(request):
served_since_restart += 1
...
If you wanted to maintain some server state across ALL app servers (like total number of pages viewed EVER) you should probably use a persistent key/value store like redis, memcachedb, or riak. There is a decent comparison of all these options here: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
You can do it with redis (via redis-py) like so (assuming your redis server is at "127.0.0.1" (localhost) and it's port 6379 (the default):
import redis
def my_view(request):
r = redis.Redis(host='127.0.0.1', port="6379")
served = r.get('pages_served_all_time', 0)
served += 1
r.set('pages_served_all_time', served)
...

There is LocMemCache cache backend that stores data in-process. You can use it with sessions (but with great care: this cache is not cross-process so you will have to use single process for deployment because it will not be guaranteed that subsequent requests will be handled by the same process otherwise). Global variables may also work (use threadlocals if they shouldn't be shared for all process threads; the warning about cross-process communication also applies here).
By the way, what's wrong with external storage? External storage provides easy cross-process data sharing and other features (like memory limiting algorithms for cache or persistance with databases).

Shared object between requests in Django

I am using a Python module (PyCLIPS) and Django 1.3.
I want develop a thread-safety class which realizes the Object Pool and the Singleton patterns and also that have to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, do
something with it and push it back to the pool, then send response
with the object's ID.
Another request, that has the object's ID, gets
the object with the given ID from the pool and repeats the steps from the above request.
But the state of the object will has to be kept while it'll be at the pool while the server is running.
It should be like a Singleton Session Bean in Java EE
How I should do it? Is there something I'll should read?
Update:
I can't store objects from the pool in a database, because these objects are wrappers under a library written on C-language which is API for the Expert System Engine CLIPS.
Thanks!

Well, I think a different angle is necessary here. Django is not like Java, the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per object data, which should always go in the DB layer.
Otherwise, you can always save stuff on the session, which Django provides for both logged-in users as well as for anonymous ones - see the docs on Django sessions.
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app rather accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer but rather by some sort of external service which you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.

Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single process wsgi app (on a different port) that exposes an api to do all of the operations that you need. Whether you use a RESTful api or form posts is up to your personal preference.

Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (ie the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A short in the dark, but are they serializable using Pickle? If so, then perhaps memcache might work.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.