I'm a Python & App Engine (and server-side!) newbie, and I'm trying to create a very simple CMS. Each deployment of the application would have one (and only one) company object, instantiated from something like:
class Company(db.Model):
    name = db.StringProperty()
    profile = db.TextProperty()
    addr = db.TextProperty()
I'm trying to provide the facility to update the company profile and other details.
My first thought was to have a singleton Company entity. But having looked at (although far from totally grasped) this thread, I get the impression that doing this is difficult and inadvisable.
So then I thought that perhaps for each deployment of the CMS I could, as a one-off, run a script (triggered by a totally obscure URL) which simply instantiates Company. From then on, I would get this instance with theCompany = Company.all()[0]
Is this advisable?
Then I remembered that someone in that thread suggested simply using a module. So I just created a Company.py file and stuck a few variables in it. I've tried this in the SDK and it seems to work; to my surprise, modified variable values "survived" between requests.
Forgive my ignorance, but I assume these values are only held in memory rather than on disk, unlike Datastore entities? Is this a robust solution? (And would the module variables be in scope for all invocations of my application's scripts?)
Global variables are "app-cached." This means that each particular instance of your app will remember these variables' values between requests. However, when an instance is shut down, those values are lost. Thus I do not think you really want to store these values in module-level variables (unless they are constants which do not need to be updated).
I think your original solution will work fine. You could even create the original entity using the remote API tool so that you don't need an obscure page to instantiate the one and only Company object.
You can also make the retrieval of the singleton Company entity a bit faster if you retrieve it by key.
If you will need to retrieve this entity frequently, then you can avoid round-trips to the datastore by using a caching technique. The fastest would be to app-cache the Company entity after you've retrieved it from the datastore. To protect against the entity from becoming too out of date, you can also app-cache the time you last retrieved the entity and if that time is more than N seconds old then you could re-fetch it from the datastore. For more details on this option and how it compares to alternatives, check out Nick Johnson's article Storage options on App Engine.
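A minimal sketch of that app-caching technique (the key_name 'c' and the 60-second threshold are illustrative assumptions, not from the original):

import time

_CACHE_SECONDS = 60   # "N seconds": how stale the cached entity may get
_company = None       # app-cached entity, held per instance
_fetched_at = 0.0

def get_company():
    global _company, _fetched_at
    if _company is None or time.time() - _fetched_at > _CACHE_SECONDS:
        _company = Company.get_by_key_name('c')  # fast retrieval by key
        _fetched_at = time.time()
    return _company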
It sounds like you are trying to provide a way for your app to be configurable on a per-application basis.
Why not use the datastore to store your company entity with a key_name? Then you will always know how to fetch the company entity, and you'll be able edit the company without redeploying.
company = Company(key_name='c')
# set stuff on company....
company.put()
# later in code...
company = Company.get_by_key_name('c')
Use memcache to store the details of the company and avoid repeated datastore calls.
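For instance (a sketch; the memcache key and the one-hour expiry are arbitrary choices):

from google.appengine.api import memcache

def get_company():
    company = memcache.get('company')
    if company is None:
        # Cache miss: fall back to the datastore and repopulate memcache.
        company = Company.get_by_key_name('c')
        memcache.set('company', company, time=3600)
    return company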
In addition to memcache, you can use module variables to cache the values. They are cached, as you have seen, between requests.
I think the approach you read about is the simplest:
Use module variables, initialized to None.
Provide accessors (get/setters) for these variables.
When a variable is accessed, if its value is None, fetch it from the database. Otherwise, just use it.
This way, you'll have app-wide variables provided by the module (which won't be instantiated again and again); they will be shared, and you won't lose them.
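A minimal sketch of those accessors (the names are illustrative):

_company = None  # module-level cache, initialized to None

def get_company():
    global _company
    if _company is None:                # first access on this instance
        _company = Company.all().get()  # fetch it from the datastore
    return _company

def set_company(company):
    global _company
    company.put()       # persist to the datastore
    _company = company  # keep the app-cached copy in sync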
Related
This is the use case: I have a server that receives instructions from many clients. Each client's instructions are handled by its own Session object, which holds all the information about the state of the session and queries mongoengine for the data it needs.
Now, suppose session1 queries mongoengine and gets document "A" as a document object.
Later, session2 also queries and gets document "A", as another separate document object.
Now we have two document objects representing document "A", and to keep them consistent I need to call A.update() and A.reload() all the time, which seems unnecessary.
Is there any way I can get a reference to the same document object over the two queries? This way both sessions could make changes to the document object and those changes would be seen by the other sessions, since they would be made to the same python object.
I've thought about writing a wrapper for mongoengine that caches the documents we currently hold as document objects and ensures there are no duplicate objects for the same document at any given time. But my knowledge of mongoengine is too rudimentary to do that at the moment.
Any thoughts on this? Is my entire design flawed? Is there any easy solution?
I don't think going in that direction is a good idea. From what I understand you are in a web application context; you might be able to get something working for threads within a single process, but you won't be able to share instances across different processes (and it gets even worse if you have processes running on different machines).
One way to address this is optimistic concurrency control: you maintain a field like "version-identifier" that gets updated whenever the instance is updated, and whenever you save/update the object you run a query like "update object if version-identifier=... else fail".
This means that if there are concurrent requests, one of them will succeed (the first one to be flushed) and the other will fail, because the version-identifier it holds is outdated. MongoEngine has no built-in support for that, but more info can be found here: https://github.com/MongoEngine/mongoengine/issues/1563
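A minimal sketch of that pattern in MongoEngine (the model and field names are illustrative assumptions):

from mongoengine import Document, StringField, IntField

class Doc(Document):
    name = StringField()
    version = IntField(default=0)  # the "version-identifier"

def save_name(doc, new_name):
    # Atomic update that only succeeds if nobody bumped the version meanwhile.
    updated = Doc.objects(id=doc.id, version=doc.version).update_one(
        set__name=new_name,
        inc__version=1)
    if updated == 0:
        raise RuntimeError('stale document: reload and retry')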
I'm trying to understand how namespaces and ancestors work in GAE.
So I made a simple multitenant "ToDo List" application where each tenant is a user. So I set the namespaces in appengine_config.py like so:
from google.appengine.api import users

def namespace_manager_default_namespace_for_request():
    # Assumes the user is logged in; user_id() is unique per user.
    name = users.GetCurrentUser().user_id()
    return name
And then in my main.py I manage the views like so:
# Make tasks with strong consistency by giving them a parent
DEFAULT_TASKS_ANCESTOR = ndb.Key('Agenda', 'default_agenda')

class TasksList(webapp2.RequestHandler):
    def get(self):
        namespace = namespace_manager.get_namespace()
        tasks_query = Task.query(ancestor=DEFAULT_TASKS_ANCESTOR)
        tasks = tasks_query.fetch(10, use_cache=False)
        context = {
            'tasks': tasks,
            'namespace': namespace,
        }
        template = JINJA_ENVIRONMENT.get_template('templates/tasks_list.html')
        self.response.write(template.render(context))

    def post(self):
        name = cgi.escape(self.request.get('name'))
        task = Task(parent=DEFAULT_TASKS_ANCESTOR)
        task.name = name
        task.put()
        self.redirect('/tasks_list')
...And that leaked data:
I logged in as userA, created a task, then logged out and logged in again as a different user (userB), and I could see that task from userA.
I confirmed from the admin panel that the task entities are in different namespaces and that the ancestors are also different (since the key includes the namespace?).
I also turned caching off by using .fetch(10, use_cache=False). But the problem was still there. So it was not a caching issue.
So finally I created the parent key just before .query() and .put(), and it worked! Like so:
DEFAULT_TASKS_ANCESTOR = ndb.Key('Agenda', 'default_agenda')
tasks_query = Task.query(ancestor=DEFAULT_TASKS_ANCESTOR)
But now I have questions...
1. Why did it work? Since those task entities don't share the same namespace, how could I query data from another namespace? Even if I query for an ancestor from another namespace by mistake, should I get the data?
2. Is DEFAULT_TASKS_ANCESTOR from namespaceA the same ancestor as DEFAULT_TASKS_ANCESTOR from namespaceB, or are they two completely different ancestors?
3. Do namespaces really compartmentalize data in the datastore, or is that not the case?
4. If ancestors play that big a role even with namespaces, should I use namespaces or ancestors to compartmentalize data in a multitenant application?
Thank you in advance!
I think your problem was defining the DEFAULT_TASKS_ANCESTOR constant outside of the request handlers. In that case the datastore used the default namespace to create this key, so by querying with that ancestor key you were always using the same (default) namespace.
From Google's datastore documentation:
By default, the datastore uses the current namespace setting in the namespace manager for datastore requests. The API applies this current namespace to Key or Query objects when they are created. Therefore, you need to be careful if an application stores Key or Query objects in serialized forms, since the namespace is preserved in those serializations.
As you found out, it worked when you defined the ancestor key inside the request. You can also set the namespace for a particular query in the constructor call (e.g. ndb.Query(namespace='1234')), so you can query entities from a different namespace.
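For instance (a sketch; the kind and key name are from the question, the namespace string is illustrative):

from google.appengine.ext import ndb

# Built inside the request: picks up the current namespace automatically.
ancestor = ndb.Key('Agenda', 'default_agenda')
tasks = Task.query(ancestor=ancestor).fetch(10)

# Or pin a namespace explicitly, e.g. to read another tenant's data.
other = ndb.Key('Agenda', 'default_agenda', namespace='1234')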
For your other questions:
2. These are, in my understanding, different keys, since they include different namespaces.
3. To make a hard compartmentalization, you could check on the server side whether the namespace used for a query equals the user ID of the current user.
4. It really depends on the overall structure of the application. IMHO, ancestor keys are more relevant with regard to strong consistency. Think of the situation where a user might have several todo lists: he should be able to see them all, but strong consistency may only be required at the list level.
I am interested in adding a spell checker to my app -- I'm planning on using difflib with a custom word list that's ~147 kB (13,025 words).
When testing user queries against this list, would it make more sense to:
load the dictionary into memcache (I guess from the datastore?) and keep it there, or
build an index for the Search API and pass the user query against it there?
I guess what I'm asking is: which is faster, memcache or a search index?
Thanks.
Memcache is definitely faster.
Another important consideration is cost. Memcache API calls are free, while Search API calls have their own quota and pricing.
By the way, you could store your word list as a static file, because it's small and doesn't change. There is no need to store it in the Datastore.
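For example, a sketch assuming the list ships with the app as a hypothetical words.txt, one word per line:

import difflib

# Loaded once at module import; app-cached for the life of the instance.
with open('words.txt') as f:
    WORDS = [line.strip() for line in f if line.strip()]

def suggest(query, n=3):
    # Returns up to n words from the list that closely match the query.
    return difflib.get_close_matches(query, WORDS, n=n)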
Memcache is faster; however, you need to consider the following.
It is not reliable: at any moment entries can be purged, so your code needs a fallback for non-cached data.
You can only fetch by key, so as you said you would need to store the whole dictionary in a memcache object.
Each memcache entry can only store 1 MB. If your dictionary is larger you would have to span multiple entries. OK, not relevant in your case.
There are some other alternatives. How often will the dictionary be updated?
Here is one alternate strategy.
You could store it in the filesystem (this requires an app update to change it) or in GCS if you want to update the dictionary outside of app updates. Then you can load the dictionary into memory in each instance, at startup or on first request, and cache it at the running-instance level. That way you won't have any round trips to services adding latency, and the code is simpler (no fallbacks for cache misses, etc.).
Here is an example. In this case the code lives in a module, which is imported as required. I am using a yaml file for additional configuration, it could just as easily json load a dictionary, or you could define a python dictionary in the module.
from yaml import load

_modsettings = {}

def loadSettings(settings='settings.yaml'):
    if not _modsettings:  # only parse the file once per instance
        try:
            _modsettings.update(load(open(settings, 'r').read()))
        except IOError:
            pass
    return _modsettings

settings = loadSettings()
Then whenever I want the settings dictionary my code just refers to mymodule.settings.
By importing this module during a warmup request you won't get a race condition, or have to import/parse the dictionary during a user facing request. You can put in more error traps as appropriate ;-)
I'm designing a multitenant system using namespaces.
Users authenticate via OpenID, and a User model is saved in Cloud Datastore. Users will be grouped into Organizations, also modeled in the database. Application data needs to be partitioned by organization.
So the idea is to map namespaces to "Organizations".
When a user logs in, their Organization is looked up and saved in a session.
WSGI middleware inspects the session and sets the namespace accordingly.
My question relates to how best to manage switching between data that is "global" (i.e. User and Organization) and application data (namespaced by organization).
My current approach is to use Python decorators and context managers to temporarily switch to the global namespace for operations that access such global data, e.g.
standard_datastore_op()
with global_namespace():
    org = org_key.get()  # fetch the Organization by key in the global namespace
another_standard_datastore_op(org.name)
or
@global_namespace
def process_login(user_id):
    user = User.get_by_id(user_id)
This also implies that models have cross-namespace KeyProperties:
class DomainData(ndb.Model):  # in the current user's namespace
    title = ndb.StringProperty()
    foreign_org = ndb.KeyProperty(Organization)  # in the "global" namespace
Does this seem a reasonable approach? It feels a little fragile to me, but I suspect that is because I'm new to working with namespaces in App Engine. My alternative idea is to extract all "global" data from Cloud Datastore into an external web service, but I'd rather avoid that if possible.
Advice gratefully received. Thanks in advance
Decorators are a perfectly fine approach, which also has the benefit of clearly labeling which functions operate outside of organization-specific namespace bounds.
from google.appengine.api import namespace_manager

def global_namespace(global_namespace_function):
    def wrapper(*args, **kwargs):
        # Save the current namespace.
        previous_namespace = namespace_manager.get_namespace()
        try:
            # Empty string = default namespace; change this to whatever
            # you want to use as 'global'.
            namespace_manager.set_namespace('')
            # Run the wrapped code inside the global namespace.
            return global_namespace_function(*args, **kwargs)
        finally:
            # Restore the saved namespace.
            namespace_manager.set_namespace(previous_namespace)
    return wrapper
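Since the question also uses a with-block, the same switch can be written as a context manager (a sketch; the function name is illustrative):

from contextlib import contextmanager
from google.appengine.api import namespace_manager

@contextmanager
def global_namespace_scope():
    previous_namespace = namespace_manager.get_namespace()
    try:
        namespace_manager.set_namespace('')  # '' = the 'global' namespace
        yield
    finally:
        namespace_manager.set_namespace(previous_namespace)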
On a related note, we also have documentation on using namespaces for multitenancy.
I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page with a table of Projects belonging to a given Account, and I want to create a link to each project's details page. I generate the link by converting the Project's key to a string and including it in the URL, which makes it easy to look up the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There are a few examples in the GAE docs that use the same approach, and keys use characters that are safe for inclusion in URLs. So there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as its identifier, just because it looks less ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name and cause you to fetch a different entity kind from the one you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing its data) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check that the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
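For example (a sketch using a hypothetical Project model):

# When building the URL, pass only the numeric ID:
project_id = project.key().id()

# When handling the request, reconstruct it with the kind fixed:
project = Project.get_by_id(project_id)
# ...or equivalently:
project = db.get(db.Key.from_path('Project', project_id))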
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or IDs was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned web page or visit URLs they build manually. This could let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
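A minimal sketch of such a check inside a handler (the owner property and the handler shape are hypothetical):

class ProjectHandler(webapp2.RequestHandler):
    def get(self, project_id):
        project = Project.get_by_id(int(project_id))
        # Hypothetical 'owner' property: refuse anything not owned by the caller.
        if project is None or project.owner != users.get_current_user():
            self.abort(403)  # forbidden
        # ...render the project details...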
I know this is an old post, but I want to clarify one thing: sometimes you NEED to work with keys.
When you have an entity with a parent relationship, you can't get it by its ID alone; you need the whole key to get it back from the Datastore. In these cases you have to work with the key all the time if you want to retrieve your entity.
The IDs aren't simply increasing; I only have 10 entities in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.