Python, GAE, ndb - get all keys in a kind

I know how to get an entity by its ID using Book.get_by_id(id),
where Book is an ndb.Model.
How do I get all the keys within my kind?
Is it done with fetch() (https://cloud.google.com/appengine/docs/python/ndb/queryclass#Query_fetch)?
I don't want to get the keys/IDs from a given entity or some value. I just want to retrieve all the available keys, so I can fetch their respective entities and display them all to the user.

If you only want the keys, use the keys_only keyword argument of the fetch() method:
Book.query().fetch(keys_only=True)
Then you can fetch all the entities using ndb.get_multi(keys). According to Guido, this may be more efficient than returning the entities in the query (if the entities are already in the cache).
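A minimal sketch of this approach, assuming a Book model like the one in the question (the name property is just for illustration):
from google.appengine.ext import ndb

class Book(ndb.Model):
    name = ndb.StringProperty()

# Fetch only the keys; a keys-only query is cheaper than fetching full entities.
keys = Book.query().fetch(keys_only=True)

# Resolve the keys to entities in one batch call. Entities already in ndb's
# cache are served from there, which is where the potential saving comes from.
books = ndb.get_multi(keys)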

With all_books = Book.query().fetch(), the all_books variable will contain every entity of your Book model.
Note, though, that when you have lots of entities in the Book model it is not a good idea to load and show them all at once. You will need some kind of pagination (depending on what exactly you're doing); otherwise your pages will take forever to load, which makes for a bad experience for your users.
Read more at https://cloud.google.com/appengine/docs/python/ndb/queries
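A rough sketch of cursor-based paging with fetch_page, assuming the same Book model and a hypothetical page size of 20:
from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import ndb

class Book(ndb.Model):
    name = ndb.StringProperty()

PAGE_SIZE = 20

def get_books_page(cursor_token=None):
    # cursor_token is the urlsafe cursor string from the previous page, if any.
    cursor = Cursor(urlsafe=cursor_token) if cursor_token else None
    books, next_cursor, more = Book.query().fetch_page(PAGE_SIZE, start_cursor=cursor)
    # Hand the next cursor back to the client so it can request the next page.
    next_token = next_cursor.urlsafe() if more and next_cursor else None
    return books, next_token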

If you only wish to get all the keys, just use
Entity.query().fetch(keys_only=True)
which will return a list of all keys of that kind. If you want the IDs rather than the keys, you can also use:
map(lambda key: key.id(), Entity.query().fetch(keys_only=True))

Related

how to get all the key from one bucket in couchbase?

Using the Python SDK, I could not find how to get all the keys from one bucket in Couchbase.
Docs references:
http://docs.couchbase.com/sdk-api/couchbase-python-client-2.2.0/api/couchbase.html#item-api-methods
https://github.com/couchbase/couchbase-python-client/tree/master/examples
https://stackoverflow.com/questions/27040667/how-to-get-all-keys-from-couchbase
Is there a simple way to get all the keys?
I'm a little concerned as to why you would want every single key. The number of documents can get very large, and I can't think of a good reason to want every single key.
That being said, here are a couple of ways to do it in Couchbase:
N1QL. First, create a primary index (CREATE PRIMARY INDEX ON bucketname), then select the keys: SELECT META().id FROM bucketname. In Python, you can use N1QLQuery and N1QLRequest to execute these; see the sketch after this list.
Create a map/reduce view index. Literally the default map function when you create a new map/reduce view index is exactly that: function (doc, meta) { emit(meta.id, null); }. In Python, use the View class.
You don't need Python to do these things, by the way, but you can use it if you'd like. Check out the documentation for the Couchbase Python SDK for more information.
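For example, a minimal sketch of the N1QL route with the Python SDK 2.x, assuming a bucket named 'bucketname' on localhost and that the primary index already exists:
from couchbase.bucket import Bucket
from couchbase.n1ql import N1QLQuery

bucket = Bucket('couchbase://localhost/bucketname')

# META().id is the document key, so this streams every key in the bucket.
query = N1QLQuery('SELECT META().id AS id FROM bucketname')
for row in bucket.n1ql_query(query):
    print(row['id'])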
I'm a little concerned as to why you would want every single key. The number of documents can get very large, and I can't think of a good reason to want every single key.
There is a document for every customer with the key being the username for the customer. That username is only held as a one-way hash (along with the password) for authentication. It is not stored in its original form or in a form from which the original can be recovered. It's not feasible to ask the 100 million customers to provide their userids. This came from an actual customer on #seteam.

When an entry is deleted from the datastore, is its corresponding search document also deleted?

I am using Google App Engine's Search API to index entities from the Datastore. After I create or modify an object, I have to add it to the search index. I do this by creating an add_to_search_index method for each model whose entities are indexed, for example:
from google.appengine.api import search
from google.appengine.ext import ndb

class Location(ndb.Model):
    ...
    def add_to_search_index(self):
        fields = [
            search.TextField(name="name", value=self.name),
            search.GeoField(name="location", value=search.GeoPoint(self.location.lat, self.location.lon)),
        ]
        document = search.Document(doc_id=str(self.key.id()), fields=fields)
        index = search.Index(name='Location_index')
        index.put(document)
Does the search API automatically maintain any correspondence between indexed documents and datastore entities?
I suspect they are not, meaning that the Search API will keep documents for deleted, obsolete entities in its index. If that's the case, then I suppose the best approach would be to use the NDB hook methods to create a remove_from_search_index method that is called before put (for edits/updates) and before delete. Please advise if there is a better solution for keeping the datastore and search indexes in sync.
Since the datastore (NDB) and the Search API are separate back ends, they have to be maintained separately. I see you're using key.id() as the document id. You can use this document id to get a document or to delete it. Creating and removing the search document can be done in the model's _post_put_hook and _post_delete_hook. You may also use the repository pattern to do this; how you do it is up to you.
index = search.Index(name='Location_index')
index.delete([doc_id])
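A minimal sketch of the hook approach, assuming the Location model and 'Location_index' from the question:
from google.appengine.api import search
from google.appengine.ext import ndb

class Location(ndb.Model):
    name = ndb.StringProperty()

    def _post_put_hook(self, future):
        # Re-index the entity after every put (create or update).
        # add_to_search_index is the method shown in the question above.
        self.add_to_search_index()

    @classmethod
    def _post_delete_hook(cls, key, future):
        # Delete the matching search document when the entity is deleted.
        index = search.Index(name='Location_index')
        index.delete(str(key.id()))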

appengine: how can I check if a property from an entity exists in the datastore?

I know it is not possible to query the datastore for missing values (see this question).
What about from Python code? Is it possible to check whether the value of an entity property comes from the datastore or is just the default value?
Use case:
Model Kind_X has 1000 entities. For the property Kind_X.my_property:
500 entities do not have my_property at all
in 400 entities my_property is None
in 100 entities it has some other value
I would like to set my_property to ABC only for the 500 entities that do not have the property. The 400 entities whose value is None must not be modified.
Note: setting ABC as the default for my_property is not an acceptable solution.
It's not possible to do this using the high-level ext.db framework. You could retrieve data using the lower level google.appengine.api.datastore framework (documentation is in the docstrings).
Why do you need to distinguish these two cases? It may be that there's a better approach.
You could iterate over all the entities of a given kind and check each one programmatically:
entities = Model.all()
for entity in entities:
    if not entity.newproperty:
        print "Hey, this entity is missing something"
If the number of entities is large, you should use the mapreduce library to avoid timeouts.
If you don't have lots of data you could run a MapReduce job and store the keys of the entities you want in a new model that has only a ListProperty holding the keys.
It's kind of a dirty hack and works only for fewer than about 5,000 entities. It will also create lots of metadata, so be careful.
from google.appengine.api import datastore

# entity_key is the encoded key string of the entity you want to inspect.
entity_key = 'ag1lbmdlbG1pbmFzd2VicgoLEgRVc2VyGGIM'
# The low-level API returns a dict-like Entity, so you can test whether the
# property is actually stored rather than merely defaulting to None.
entity = datastore.Get(entity_key)
print 'my_property' in entity
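Building on that, a rough sketch of the use case above with the low-level API, assuming the kind is called Kind_X and is small enough to scan in one request (use the mapreduce library otherwise):
from google.appengine.api import datastore

for entity in datastore.Query('Kind_X').Run():
    if 'my_property' not in entity:
        # The property is genuinely absent, not merely stored as None.
        entity['my_property'] = 'ABC'
        datastore.Put(entity)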

Is it safe to pass Google App Engine Entity Keys into web pages to maintain context?

I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a table of the Projects belonging to a given Account, and I want to create a link to each project's details page. I am generating a link that converts the Project's key to a string and includes that string in the link, to make it easy to look up the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There are a few examples in the GAE docs that use the same approach, and keys use only characters that are safe to include in URLs, so there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as the identifier, just because it looks less ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
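A minimal sketch of that last suggestion, assuming a db.Model called Project and a hypothetical /projects/<id> URL scheme:
from google.appengine.ext import db

class Project(db.Model):
    name = db.StringProperty()

def project_url(project):
    # Expose only the numeric ID in the URL instead of the encoded key.
    return '/projects/%d' % project.key().id()

def load_project(project_id):
    # The kind is fixed here, so a user cannot trick the handler into
    # fetching an entity of a different kind.
    return Project.get_by_id(int(project_id))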
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned web page or visit URLs that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
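A rough sketch of such a check, assuming a hypothetical Project model that stores its owner as a UserProperty:
from google.appengine.api import users
from google.appengine.ext import db

class Project(db.Model):
    name = db.StringProperty()
    owner = db.UserProperty()

def get_project_for_current_user(project_id):
    project = Project.get_by_id(int(project_id))
    current_user = users.get_current_user()
    if project is None or project.owner != current_user:
        # Treat "not yours" the same as "not found" so we don't leak existence.
        return None
    return project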
I know this is an old post, but I want to clarify one thing: sometimes you NEED to work with keys.
When you have an entity with a parent, you can't get it by its ID alone; you need the whole key to get it back from the Datastore. In these cases you have to work with the key all the time if you want to retrieve your entity.
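A minimal sketch with ndb, assuming a hypothetical Account/Project parent-child pair:
from google.appengine.ext import ndb

class Account(ndb.Model):
    pass

class Project(ndb.Model):
    name = ndb.StringProperty()

# With a parent, the ID alone is not enough to rebuild the key; you need the
# full key, or its urlsafe() form, to fetch the entity again.
account_key = ndb.Key(Account, 123)
project_key = ndb.Key(Project, 456, parent=account_key)
url_token = project_key.urlsafe()       # safe to embed in a URL
same_key = ndb.Key(urlsafe=url_token)   # reconstructs the full key
project = same_key.get()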
They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.

App Engine Django Form Uniqueness Validation?

Is there a simpler way to use uniqueness validation with Django Forms in AppEngine?
I understand that performance would be a problem if we enforced a uniqueness constraint, but since the amount of data being added is very small, performance is not a big concern here; development time is.
Any help is appreciated.
You can use keys for uniqueness:
The complete key of an entity, including the path, the kind and the name or numeric ID, is unique and specific to that entity. The complete key is assigned when the entity is created in the datastore, and none of its parts can change...
Every entity has an identifier. An application can assign its own identifier for use in the key by giving the instance constructor a key_name argument (a str value):
s = Story(key_name="xzy123")
...Once the entity has been created, its ID or name cannot be changed.
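A minimal sketch of enforcing uniqueness with a key_name, assuming a hypothetical Account model where the username must be unique:
from google.appengine.ext import db

class Account(db.Model):
    username = db.StringProperty(required=True)

def create_account(username):
    def txn():
        # A get-then-put inside a transaction, so two concurrent requests
        # cannot both create an entity with the same key_name.
        if Account.get_by_key_name(username) is not None:
            return None  # duplicate username
        account = Account(key_name=username, username=username)
        account.put()
        return account
    return db.run_in_transaction(txn)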
EDIT
As jbochi noted, this could be dangerous and you could lose data. Another way to achieve the same is to use a hash function with shard counters. A good example is shown in the "Paging through large datasets" article.
