Google App Engine - Python: How to use Namespaces and Ancestors?

I'm trying to understand how namespaces and ancestors work in GAE.
So I made a simple multitenant "ToDo List" application where each tenant is a user, and I set the namespace in appengine_config.py like so:
from google.appengine.api import users

def namespace_manager_default_namespace_for_request():
    # Partition the datastore per user by using the user ID as the namespace.
    name = users.GetCurrentUser().user_id()
    return name
And then in my main.py I manage the views like so:
# Make tasks with strong consistency by giving them a parent
DEFAULT_TASKS_ANCESTOR = ndb.Key('Agenda', 'default_agenda')

class TasksList(webapp2.RequestHandler):
    def get(self):
        namespace = namespace_manager.get_namespace()
        tasks_query = Task.query(ancestor=DEFAULT_TASKS_ANCESTOR)
        tasks = tasks_query.fetch(10, use_cache=False)
        context = {
            'tasks': tasks,
            'namespace': namespace,
        }
        template = JINJA_ENVIRONMENT.get_template('templates/tasks_list.html')
        self.response.write(template.render(context))

    def post(self):
        name = cgi.escape(self.request.get('name'))
        task = Task(parent=DEFAULT_TASKS_ANCESTOR)
        task.name = name
        task.put()
        self.redirect('/tasks_list')
...And that leaked data between tenants:
I logged in as userA, created a task, then logged out and logged in again as a different user (userB), and I could see that task from userA.
I confirmed from the admin panel that the task entities are in different namespaces and that their ancestors also differ (since the key includes the namespace?).
I also turned caching off with .fetch(10, use_cache=False), but the problem was still there, so it was not a caching issue.
So finally I created the parent key just before .query() and .put(), and that worked! Like so:
DEFAULT_TASKS_ANCESTOR = ndb.Key('Agenda', 'default_agenda')
tasks_query = Task.query(ancestor=DEFAULT_TASKS_ANCESTOR)
But now I have questions...
Why did it work? Since those task entities don't share the same namespace, how could I query data from another namespace? Even if I mistakenly query with an ancestor from another namespace, should I get the data?
Is DEFAULT_TASKS_ANCESTOR from namespaceA the same ancestor as DEFAULT_TASKS_ANCESTOR from namespaceB, or are they two completely different ancestors?
Do namespaces really compartmentalize data in the datastore, or is that not the case?
If ancestors play such a big role even with namespaces, should I use namespaces or ancestors to compartmentalize data in a multitenant application?
Thank you in advance!

I think your problem was defining the DEFAULT_TASKS_ANCESTOR constant outside of the request handlers, at module load time. In that case, the datastore used the default namespace to create the key, so by querying with that ancestor key you queried the same (default) namespace for every user.
From Google's datastore documentation:
By default, the datastore uses the current namespace setting in the namespace manager for datastore requests. The API applies this current namespace to Key or Query objects when they are created. Therefore, you need to be careful if an application stores Key or Query objects in serialized forms, since the namespace is preserved in those serializations.
As you have found out, it worked when you defined the ancestor key inside the request handler. You can also set the namespace for a particular query in the constructor call (e.g. ndb.Query(namespace='1234')), so it is possible to query entities from a different namespace.
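As a minimal sketch (reusing the handler and Task model from the question), a key created inside the handler picks up the namespace that has already been set for the current request:

class TasksList(webapp2.RequestHandler):
    def get(self):
        # Created here, the key inherits the current request's namespace.
        ancestor = ndb.Key('Agenda', 'default_agenda')
        tasks = Task.query(ancestor=ancestor).fetch(10)
        # ... render the template as before ...

    def post(self):
        ancestor = ndb.Key('Agenda', 'default_agenda')
        task = Task(parent=ancestor, name=cgi.escape(self.request.get('name')))
        task.put()
        self.redirect('/tasks_list')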
For your other questions:
These are in my understanding different keys, since they include different namespaces.
To make a hard compartmentalization, you could check on the server side whether the namespace used for a query is equal to the userID of the current user.
It really depends on the overall structure of the application. IMHO, ancestor keys are more relevant with regard to strong consistency. Think of the situation where a user might have several todo-lists. He should be able to see them all, but strong consistency may only be required at the list level.
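For example, a rough sketch of keying tasks per list instead of per tenant (the 'TodoList' kind and key name here are made up for illustration):

list_key = ndb.Key('TodoList', 'groceries')  # one entity group per todo-list
Task(parent=list_key, name='buy milk').put()

# Strongly consistent only within that single list:
tasks = Task.query(ancestor=list_key).fetch()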

Related

How to replace template method pattern with functional style?

You can see the code here
The concrete problem that I'm trying to solve is this.
Say that I need to provide a REST interface to some entities modeled with some tool (SQLAlchemy in my case) and stored in a database. Say that this collection is called parents.
I would need handlers like this
GET /parents
GET /parents/some_id
POST /parents
DELETE /parents/some_id
Conceptually, all these handlers are very similar.
They all take ids from the URL, then create an appropriate query, fetch data with that query, turn the data into a dict, and finally call jsonify to create the correct HTTP response.
So with OOP I could design it like this:
class AbstractHandler(object):
    def __init__(self, serializer, **specs):
        self.serializer = serializer
        self.specs = specs

    def handle_request(self, spec_data, *_ids):
        query = self.create_query(_ids)
        # Look up the requested spec and refine the base query with it.
        spec = self.specs[spec_data['name']]
        data = self.fetch_data(spec(*(spec_data['args'] + [query])))
        out = self.serializer(data)
        return jsonify(out)
A spec is a function that takes some parameters and a query and produces a more refined query based on those parameters.
So for example
GET /parents?spec={'name': 'by_name', 'args': ['adam']}
would return the parent named Adam from the collection.
This code has some flaws, but I hope you see the point: the template method controls the flow here, and subclasses can change how they create the query and how they fetch the data (an item handler would call query.one() and a collection handler would call query.all(), for example).
I could replace create_query and fetch_data with dependency injection instead, but that creates the problem that someone could build a wrong configuration by supplying the wrong dependency. That's basically what I've done, but using partial functions instead.
So what I'm thinking right now is that I can solve this problem by creating a factory function for every type of handler that I need, which would give the appropriate dependencies to the handler.
That's very much like the template method solution, I think. The difference is basically that with the template method the correctness of the dependencies is guaranteed by the object type, while in my solution it's guaranteed by the type of the factory function.
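To make that concrete, a minimal sketch of the factory-function idea described above; make_handler, fetch_one and fetch_all are hypothetical names, not from the original code, and the spec handling and jsonify step are elided:

from functools import partial

def fetch_one(query):
    return query.one()

def fetch_all(query):
    return query.all()

def make_handler(create_query, fetch_data, serializer):
    # Return a request handler with its dependencies already wired in.
    def handle_request(*ids):
        query = create_query(ids)
        return serializer(fetch_data(query))
    return handle_request

# Factories that guarantee a consistent configuration per handler type:
make_item_handler = partial(make_handler, fetch_data=fetch_one)
make_collection_handler = partial(make_handler, fetch_data=fetch_all)

# usage (hypothetical):
# get_parent = make_item_handler(create_query=parents_by_id, serializer=parent_to_dict)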
So enough of what I think; I'd like to know what you think about it.
How do people in the functional world solve this kind of problem?

Is querying NDB JsonProperty in Google App Engine possible? If not, any alternatives?

Is there any way of using JsonProperties in queries in NDB/GAE? I can't seem to find any information about this.
Person.query(Person.custom.eye_color == "blue").fetch()
With a model looking something like this:
class Person(ndb.Model):
    height = ndb.IntegerProperty(default=-1)
    # ...
    # ...
    custom = ndb.JsonProperty(indexed=False, compressed=False)
The use case is this: I'm storing data about customers, where at first we only needed to query specific data. Now we want to be able to query any type of registered data about a person: for example eye color, which some may have put into the system, or any other custom key/value pair in our JsonProperty.
I know about the Expando class, but to me it seems a lot easier to be able to query the JsonProperty and to keep all the custom properties under the same name, custom. That means the front end can just loop over the properties in custom. If an Expando class were used, it would be harder to differentiate.
Rather than using a JsonProperty, have you considered using a StructuredProperty? You maintain the same structure, just stored differently, and you can filter by sub-components of the StructuredProperty with some restrictions, which may be sufficient.
See https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties for querying StructuredProperties.
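A rough sketch of what that could look like for the Person model from the question; the CustomData sub-model and its fields are made up for illustration:

class CustomData(ndb.Model):
    eye_color = ndb.StringProperty()
    shoe_size = ndb.IntegerProperty()

class Person(ndb.Model):
    height = ndb.IntegerProperty(default=-1)
    custom = ndb.StructuredProperty(CustomData)

# Filtering on a sub-property of the StructuredProperty:
blue_eyed = Person.query(Person.custom.eye_color == "blue").fetch()

Note that, unlike a free-form JsonProperty, the sub-model's fields have to be declared up front, so this only helps if the set of custom keys is known.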

Managing global data in a namespaced, multitenant Appengine application

I'm designing a multitenant system using namespaces.
Users authenticate via OpenID, and a User model is saved in Cloud Datastore. Users will be grouped into Organizations, also modeled in the database. Application data needs to be partitioned by organization.
So the idea is to map namespaces to "Organizations".
When a user logs in, their Organization is looked up and saved in a session.
WSGI middleware inspects the session and sets the namespace accordingly.
My question relates to how best to manage switching between data that is "global" (i.e. User and Organization) and application data (namespaced by organization).
My current approach is to use Python decorators and context managers to temporarily switch to the global namespace for operations that access such global data, e.g.
standard_datastore_op()

with global_namespace():
    org = Organization.query(Organization.key == org_key).get()

another_standard_datastore_op(org.name)
or
@global_namespace
def process_login(user_id):
    user = User.get_by_id(user_id)
This also implies that models have cross-namespace KeyProperties:
class DomainData(ndb.Model):  # in the current user's namespace
    title = ndb.StringProperty()
    foreign_org = ndb.KeyProperty(Organization)  # in the "global" namespace
Does this seem a reasonable approach? It feels a little fragile to me, but I suspect that is because I'm new to working with namespaces in App Engine. My alternative idea is to extract all "global" data from Cloud Datastore into an external web service, but I'd rather avoid that if possible.
Advice gratefully received. Thanks in advance
Decorators are a perfectly fine approach, which also has the benefit of clearly labeling which functions operate outside of organization-specific namespace bounds.
def global_namespace(global_namespace_function):
    def wrapper(*args, **kwargs):
        # Save the current namespace.
        previous_namespace = namespace_manager.get_namespace()
        try:
            # Empty string = default namespace; change to whatever you want to use as 'global'.
            global_namespace = ''
            # Switch to the 'global' namespace.
            namespace_manager.set_namespace(global_namespace)
            # Run the code that requires the global namespace.
            return global_namespace_function(*args, **kwargs)
        finally:
            # Restore the saved namespace.
            namespace_manager.set_namespace(previous_namespace)
    return wrapper
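Since the question also uses global_namespace() as a context manager, here is a hedged sketch of the same idea with contextlib (not part of the answer above):

import contextlib

from google.appengine.api import namespace_manager

@contextlib.contextmanager
def global_namespace():
    previous_namespace = namespace_manager.get_namespace()
    try:
        namespace_manager.set_namespace('')  # '' = the default/'global' namespace
        yield
    finally:
        namespace_manager.set_namespace(previous_namespace)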
On a related note, we also have documentation on using namespaces for multitenancy.

Alternative to singleton?

I'm a Python & App Engine (and server-side!) newbie, and I'm trying to create a very simple CMS. Each deployment of the application would have one (and only one) Company object, instantiated from something like:
class Company(db.Model):
    name = db.StringProperty()
    profile = db.TextProperty()
    addr = db.TextProperty()
I'm trying to provide the facility to update the company profile and other details.
My first thought was to have a Company entity singleton. But having looked at (although far from totally grasped) this thread I get the impression that it's difficult, and inadvisable, to do this.
So then I thought that perhaps for each deployment of the CMS I could, as a one-off, run a script (triggered by a totally obscure URL) which simply instantiates Company. From then on, I would get this instance with theCompany = Company.all()[0]
Is this advisable?
Then I remembered that someone in that thread suggested simply using a module. So I just created a Company.py file and stuck a few variables in it. I've tried this in the SDK and it seems to work; to my surprise, modified variable values "survived" between requests.
Forgive my ignorance, but I assume these values are only held in memory rather than on disk, unlike Datastore entities? Is this a robust solution? (And would the module variables be in scope for all invocations of my application's scripts?)
Global variables are "app-cached." This means that each particular instance of your app will remember these variables' values between requests. However, when an instance is shut down these values will be lost. Thus I do not think you really want to store these values in module-level variables (unless they are constants which do not need to be updated).
I think your original solution will work fine. You could even create the original entity using the remote API tool so that you don't need an obscure page to instantiate the one and only Company object.
You can also make the retrieval of the singleton Company entity a bit faster if you retrieve it by key.
If you will need to retrieve this entity frequently, then you can avoid round-trips to the datastore by using a caching technique. The fastest would be to app-cache the Company entity after you've retrieved it from the datastore. To protect against the entity from becoming too out of date, you can also app-cache the time you last retrieved the entity and if that time is more than N seconds old then you could re-fetch it from the datastore. For more details on this option and how it compares to alternatives, check out Nick Johnson's article Storage options on App Engine.
It sounds like you are trying to provide a way for your app to be configurable on a per-application basis.
Why not use the datastore to store your company entity with a key_name? Then you will always know how to fetch the company entity, and you'll be able to edit the company without redeploying.
company = Company(key_name='c')
# set stuff on company....
company.put()
# later in code...
company = Company.get_by_key_name('c')
Use memcache to store the details of the company and avoid repeated datastore calls.
In addition to memcache, you can use module variables to cache the values. They are cached, as you have seen, between requests.
I think the approach you read about is the simplest:
Use module variables, initialized to None.
Provide accessors (get/setters) for these variables.
When a variable is accessed, if its value is None, fetch it from the database. Otherwise, just use it.
This way, you'll have app-wide variables provided by the module (which won't be instantiated again and again); they will be shared and you won't lose them.
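A minimal sketch of those module-level accessors, assuming the db Company model from the question; the _company variable and the function names are made up here:

_company = None  # app-cached per instance, lost when the instance shuts down

def get_company():
    global _company
    # Lazily fetch the singleton entity from the datastore on first access.
    if _company is None:
        _company = Company.all().get()
    return _company

def set_company(company):
    global _company
    company.put()
    _company = company  # keep the app-cached copy in sync with the datastore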

Design pattern to organize non-trivial ORM queries?

I am developing a web API with 10 tables or so in the backend, with several one-to-many and many-to-many associations. The API essentially is a database wrapper that performs validated updates and conditional queries. It's written in Python, and I use SQLAlchemy for ORM and CherryPy for HTTP handling.
So far I have separated the 30-some queries the API performs into functions of their own, which look like this:
# in module "services.inventory"

def find_inventories(session, user_id, *inventory_ids, **kwargs):
    query = session.query(Inventory, Product)
    query = query.filter_by(user_id=user_id, deleted=False)
    ...
    return query.all()

def find_inventories_by(session, app_id, user_id, by_app_id, by_type, limit, page):
    ....

# in another service module

def remove_old_goodie(session, app_id, user_id):
    try:
        old = _current_goodie(session, app_id, user_id)
        services.inventory._remove(session, app_id, user_id, [old.id])
    except ServiceException, e:
        # log it and do stuff
        ....
The CherryPy request handler calls the query methods, which are scattered across several service modules, as needed. The rationale behind this is that, since these queries need to access multiple model classes, they don't belong to individual models, and the database queries should be kept separate from the direct handling of API access.
I realize that the above code might be called Foreign Methods in the realm of refactoring. I could well live with this way of organizing for a while, but as things are starting to look a little messy, I'm looking for a way to refactor this code.
Since the queries are tied directly to the API and its business logic, they are hard to generalize like getters and setters.
It smells to repeat the session argument like that, but since the current implementation of the API creates a new CherryPy handler instance (and therefore a new session object) for each API call, there is no global way of getting at the current session.
Is there a well-established pattern to organize such queries? Should I stick with the Foreign Methods and just try to unify the function signature (argument ordering, naming conventions etc.)? What would you suggest?
The standard way to have global access to the current session in a threaded environment is ScopedSession. There are some important aspects to get right when integrating with your framework, mainly transaction control and clearing out sessions between requests. A common pattern is to have an autocommit=False (the default) ScopedSession in a module, wrap any business-logic execution in a try/except block that rolls back in case of exception and commits if the method succeeded, and then finally call Session.remove(). The business logic would then import the Session object into global scope and use it like a regular session.
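A hedged sketch of that pattern; the module layout, engine URL, and run_in_transaction wrapper are illustrative, not from the question's code:

# db.py (hypothetical module)
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite:///example.db')
Session = scoped_session(sessionmaker(bind=engine))

def run_in_transaction(business_logic, *args, **kwargs):
    # Wrap one unit of business logic in commit/rollback and clean up the session.
    try:
        result = business_logic(*args, **kwargs)
        Session.commit()
        return result
    except Exception:
        Session.rollback()
        raise
    finally:
        Session.remove()

Service functions such as find_inventories would then import Session from that module instead of taking a session argument.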
There seems to be an existing CherryPy-SQLAlchemy integration module, but as I'm not too familiar with CherryPy, I can't comment on its quality.
Having queries encapsulated as functions is just fine. Not everything needs to be in a class. If they get too numerous, just split them into separate modules by topic.
What I have found useful is to factor out common criteria fragments. They usually fit rather well as classmethods on model classes. Aside from increasing readability and reducing duplication, they work as implementation-hiding abstractions to some extent, making refactoring the database less painful. (Example: instead of (Foo.valid_from <= func.current_timestamp()) & (Foo.valid_until > func.current_timestamp()) you'd have Foo.is_valid().)
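For instance, a minimal sketch of such a classmethod on a hypothetical Foo model:

from sqlalchemy import Column, DateTime, Integer, func
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    valid_from = Column(DateTime)
    valid_until = Column(DateTime)

    @classmethod
    def is_valid(cls):
        # Reusable criterion fragment; callers just write session.query(Foo).filter(Foo.is_valid()).
        now = func.current_timestamp()
        return (cls.valid_from <= now) & (cls.valid_until > now)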
SQLAlchemy strongly suggests that the session maker be part of some global configuration.
It is intended that the sessionmaker() function be called within the global scope of an application, and the returned class be made available to the rest of the application as the single class used to instantiate sessions.
Having queries in separate modules isn't an interesting problem. The Django ORM works this way. A web site usually consists of multiple Django "applications", which sounds like your site with its many "service modules".
Knitting together multiple services is the point of an application. There aren't a lot of alternatives that are better.
