Serialize an entity key to a string in Python for GAE - python

In the Java low-level API, there is a way to turn an entity key into a string so you can pass it around to a client via JSON if you want. Is there a way to do this in Python?

Depending on whether you use key names or not, obj.key().name() or obj.key().id() can be used to retrieve the key name or ID, respectively. Neither of these contains the name of the entity class, so on their own they are not sufficient to retrieve the original object from the datastore. Granted, in most cases you know the entity kind when working with it, so that's not a problem.
A universal solution, working in both cases (key names or not), is obj.key().id_or_name(). This way you can retrieve the original object as follows:
from google.appengine.ext import db
#...
obj_key = db.Key.from_path('EntityClass', id_or_name)
obj = db.get(obj_key)
If you don't mind passing around a long, cryptic string that also contains some extra data (like the name of your GAE app), you can use the string representation of the key, str(obj.key()), and pass it directly to db.get to retrieve the object.
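A minimal sketch of that round trip under the old db API (obj is any saved entity):
str_key = str(obj.key())  # opaque, url-safe encoded key string
same_obj = db.get(str_key)  # db.get accepts the encoded string directly
# or rebuild the Key object explicitly first:
same_obj = db.get(db.Key(str_key))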

str(entity.key()) will return a base64-encoded representation of the key.
entity.key().name() or entity.key().id() will return just the name or ID, omitting the kind and the ancestry.

Better, with the ndb API (where key is a property, not a method):
string_key = entity.key.urlsafe()
and afterwards you can decode the key with:
key = ndb.Key(urlsafe=string_key)
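A fuller round trip under that ndb API; the Note model here is hypothetical, purely for illustration:
from google.appengine.ext import ndb

class Note(ndb.Model):  # hypothetical model
    text = ndb.StringProperty()

entity = Note(text='hello')
entity.put()
string_key = entity.key.urlsafe()  # safe to embed in JSON or a URL
entity_again = ndb.Key(urlsafe=string_key).get()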

You should be able to do:
entity.key().name()
This returns the key's name, i.e. the string you assigned as key_name when creating the entity (not the full encoded key).
Is that what you're looking for?

Related

Cannot deserialize properly a response using pymongo

I was using an API written in NodeJS, but for some reason I had to rewrite the code in Python. The thing is that the database is in MongoDB, and the response to all the queries (with large results) includes a serialized version of the document IDs with a nested $oid, for example {"$oid": "f54h5b4jrhnf"}.
This object id representation with the nested $oid, instead of just the plain string that Node used to return, is messing with the front end, and I haven't been able to find a way to get just the string rather than this nested object (other than iterating over every single document and extracting the id string) without also changing the way the front end treats the response.
Is there a solution to get a JSON response of the shape [{"_id": "63693f438cdbc3adb5286508", etc...}]?
I tried using pymongo and mongoengine; both seem unable to deserialize this in a simple way.
You have several options (more than mentioned below).
MongoDB
In a MongoDB query, you could project/convert all ObjectIds to a string using "$toString".
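For example, a sketch of such a projection through pymongo's aggregation API ($toString requires MongoDB 4.0+; the docs collection name is illustrative):
# assumes `db` is an existing pymongo Database object
cursor = db.docs.aggregate([
    {"$addFields": {"_id": {"$toString": "$_id"}}}
])
results = list(cursor)  # every _id is now a plain string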
Python
Iterate, like you mention in your question.
--OR--
You could also define/use a custom pymongo TypeRegistry class with a custom TypeDecoder and use it with a collection's CodecOptions so that every ObjectId read from a collection is automatically decoded as a string.
Here's how I did it with a toy database/collection.
from bson.objectid import ObjectId
from bson.codec_options import TypeDecoder, TypeRegistry, CodecOptions

class myObjectIdDecoder(TypeDecoder):
    bson_type = ObjectId

    def transform_bson(self, value):
        return str(value)

type_registry = TypeRegistry([myObjectIdDecoder()])
codec_options = CodecOptions(type_registry=type_registry)
collection = db.get_collection('geojson', codec_options=codec_options)

# the geojson collection _id field values have type ObjectId,
# but because of the custom CodecOptions/TypeRegistry/TypeDecoder
# all ObjectIds are decoded as strings for python
collection.find_one()["_id"]
# returns '62ae621406926107b33b523c', i.e. just a string

What is the "Entity" part of the Datastore query results list?

From the Google docs, one can store Datastore query results like so:
from google.cloud import datastore
client = datastore.Client()
query = client.query()
results = list(query.fetch())
Which returns e.g.:
[<Entity('my_kind', '00001') {'my_property_01': 'abc', 'my_property_02': 'xyz'}>]
My question, which might be more related to Python than to Google Datastore, is what's going on with this list? It's enclosed in angle brackets, and has no separator between the round brackets and curly brackets. Is this a normal list, and does this format have much use in Python coding?
results is a normal list. However, since you only have one result, you are only seeing one entity, with the key ('my_kind', '00001') and the properties my_property_01 and my_property_02. You are probably surprised by how the Entity class overrides the __repr__ method to describe the entity this way.
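A minimal sketch of poking at one of those results (names as in the snippet above; Entity subclasses dict, so properties are read with plain dict access):
entity = results[0]
print(type(entity))  # <class 'google.cloud.datastore.entity.Entity'>
print(entity.key.kind)  # 'my_kind'
print(entity['my_property_01'])  # 'abc'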

Python-Eve: Prevent inserting duplicates without using unique fields

I am trying to prevent inserting duplicate documents with the following approach:
Get a list of all documents from the desired endpoint, which will contain all the documents in JSON format. This list is called available_docs.
Use a pre_POST_<endpoint> hook in order to handle the request before the data is inserted. I am not using the on_insert hook since I need to do this before validation.
Since we can access the request object, use request.json to get the JSON-formatted payload.
Check if request.json is already contained in available_docs.
Insert the new document only if it is not a duplicate; abort otherwise.
Using this approach I got the following snippet:
import flask

def check_duplicate(request):
    if request.json not in available_docs:
        print('Not a duplicate')
    else:
        print('Duplicate')
        flask.abort(422, description='Document is a duplicate and already in database.')
The available_docs list looks like this:
available_docs = [{'foo': ObjectId('565e12c58b724d7884cd02bb'), 'bar': [ObjectId('565e12c58b724d7884cd02b9'), ObjectId('565e12c58b724d7884cd02ba')]}]
The payload request.json looks like this:
{'foo': '565e12c58b724d7884cd02bb', 'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
As you can see, the only difference between the document passed to the API and the document already stored in the DB is the datatype of the IDs (ObjectId vs. plain string). Because of that, the if-statement in the snippet above evaluates to True and judges the incoming document not to be a duplicate, whereas it definitely is one.
Is there a way to check if a passed document is already in the database? I am not able to use unique fields since the combination of all document fields needs to be unique only. There is an unique identifier (which I left out in this example), but this is not suitable for the desired comparison since it is kind of a time stamp.
I think something like casting the given IDs at the keys foo and bar to ObjectIds would do the trick, but I do not know how to do this since I do not know where to get the ObjectId datatype from.
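For reference, ObjectId lives in bson, which ships with pymongo; a minimal sketch of that cast, with the field names from the example above:
from bson.objectid import ObjectId

candidate = dict(request.json)
candidate['foo'] = ObjectId(candidate['foo'])
candidate['bar'] = [ObjectId(x) for x in candidate['bar']]
# now `candidate in available_docs` compares like with like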
Your approach would be much slower than setting a unique rule for the field.
Since, from your example, you are going to compare objectids, can't you simply use those as the _id field for the collection? In Mongo (and Eve of course) that field is unique by default. Actually, you typically don't even define it. You would not need to do anything at all, as a POST of a document with an already existing id would fail right away.
If you can't go that way (maybe you need to compare a different objectid field and still, for some reason, can't simply set a unique rule for it), I would look at querying the db for the field value rather than getting all the documents from the db and scanning them sequentially in code. Something like db.find({db_field: new_document_field_value}). If that returns a match, the new document is a duplicate. Make sure db_field is indexed (which usually holds true also for fields tagged with the unique rule).
EDIT after the comments. A trivial implementation would probably be something like this:
def pre_POST_callback(resource, request):
    # retrieve the mongodb collection using the eve connection
    docs = app.data.driver.db['docs']
    if docs.find_one({'foo': <value>}):
        flask.abort(422, description='Document is a duplicate and already in database.')

app = Eve()
app.run()
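For completeness, the callback above still needs to be attached to Eve's event hooks; a minimal sketch using Eve's generic pre-request event (resource-specific variants such as on_pre_POST_<endpoint> also exist):
app = Eve()
app.on_pre_POST += pre_POST_callback
app.run()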
Here's my approach on preventing duplicate records:
def on_insert_subscription(items):
    c_subscription = app.data.driver.db['subscription']
    user = decode_token()
    if user:
        for item in items:
            if c_subscription.find_one({
                'topic': ObjectId(item['topic']),
                'client': ObjectId(user['user_id'])
            }):
                abort(422, description="Client already subscribed to this topic")
            else:
                item['client'] = ObjectId(user['user_id'])
    else:
        abort(401, description='Please provide proper credentials')
What I'm doing here is creating subscriptions for clients. If a client is already subscribed to a topic, I throw a 422.
Note: the client ID is decoded from the JWT token.
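decode_token is the author's own helper; a purely hypothetical sketch of it with PyJWT (the secret, header format, and claim names are all assumptions):
import jwt  # PyJWT
from flask import request

def decode_token():
    auth = request.headers.get('Authorization', '')
    if not auth.startswith('Bearer '):
        return None
    try:
        # returns the claims dict, e.g. {'user_id': '...'}
        return jwt.decode(auth[len('Bearer '):], 'secret', algorithms=['HS256'])
    except jwt.InvalidTokenError:
        return None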

Store dictionary in database

I created a Berkeley DB database and operate on it with the bsddb module. I need to store information in it in a style like this:
username = '....'
notes = {
    'name_of_note1': {
        'password': '...',
        'comments': '...',
        'title': '...'
    },
    'name_of_note2': {
        # same keys as above, but different values
    }
}
This is how I open the database:
db = bsddb.btopen['data.db','c']
How do I do that?
So, first, I guess you should open your database using parentheses:
db = bsddb.btopen('data.db','c')
Keep in mind that Berkeley's pattern is key -> value, where both key and value are string objects (not unicode). The best way in your case would be to use:
db[str(username)] = json.dumps(notes)
since your notes are compatible with the json syntax.
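Putting it together, a small sketch assuming the db handle opened above:
import json

db[str(username)] = json.dumps(notes)  # serialize the dict to a JSON string
db.sync()  # flush the change to disk
notes_again = json.loads(db[str(username)])  # parse it back into a dict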
However, this is not a very good choice if, say, you want to query only usernames' comments. You should use a relational database, such as sqlite, which is also built into Python.
A simple solution was described by #Falvian.
For a start, there is also a column pattern for ordered key/value stores, so the plain key/value pattern is not the only option.
I think that bsddb is a viable solution when you don't want to rely on sqlite. The first approach is to create a documents = bsddb.btopen('documents.db', 'c') and store JSON values inside it. Regarding the keys you have several options:
Name the keys yourself, like you do "name_of_note_1", "name_of_note_2"
Generate random identifiers using uuid.uuid4 (don't forget to check it's not already used ;)
Or use a row inside this documents database with key=0 to store a counter that you will use to create uids (unique identifiers); see the sketch below.
If you use integer keys, don't forget to pack them with struct.pack('>q', uid) before storing them, so that they sort correctly as raw bytes.
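A minimal sketch of that counter idea under the assumptions above (the row at key 0 holds the last uid handed out):
import json
import struct
import bsddb

documents = bsddb.btopen('documents.db', 'c')
COUNTER_KEY = struct.pack('>q', 0)

def next_uid():
    # read, increment and write back the counter stored under key 0
    if documents.has_key(COUNTER_KEY):
        uid = struct.unpack('>q', documents[COUNTER_KEY])[0] + 1
    else:
        uid = 1
    documents[COUNTER_KEY] = struct.pack('>q', uid)
    return uid

note = {'title': '...', 'comments': '...'}
documents[struct.pack('>q', next_uid())] = json.dumps(note)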
If you need to create indexes, I recommend you have a look at my other answer introducing composite keys to build indexes in bsddb.

Create dictionary of a sqlalchemy query object in Pyramid

I am new to Python and Pyramid. In a test application I am using to learn more about Pyramid, I want to query a database and create a dictionary based on the results of a sqlalchemy query object and finally send the dictionary to the chameleon template.
So far I have the following code (which works fine), but I wanted to know if there is a better way to create my dictionary.
...
index = 0
clients = {}
q = self.request.params['q']
for client in DBSession.query(Client).filter(Client.name.like('%%%s%%' % q)).all():
    clients[index] = {"id": client.id, "name": client.name}
    index += 1
output = {"clients": clients}
return output
While learning Python, I found a nice way to create a list in a for loop statement like the following:
myvar = [user.name for user in users]
So, the other question I had: is there a similar 'one line' way like the above to create a dictionary of a sqlalchemy query object?
Thanks in advance.
well, yes, we can tighten this up a bit.
First, this pattern:
index = 0
for item in seq:
    frobnicate(index, item)
    index += 1
is common enough that there's a builtin function that does it automatically, enumerate(), used like this:
for index, item in enumerate(seq):
    frobnicate(index, item)
but I'm not sure you need it. Associating things with an integer index starting from zero is exactly what a list does; you don't really need a dict for that. Unless you want to have holes, or need some of the other special features of dicts, just do:
stuff = []
stuff.extend(seq)
When you're only interested in a small subset of the attributes of a database entity, it's a good idea to tell sqlalchemy to emit a query that returns only those:
query = (DBSession.query(Client.id, Client.name)
         .filter(Client.name.contains(q)))
In the above I've also shortened the .name.like('%%%s%%' % q) into Client.name.contains(q); they mean the same thing (sqlalchemy expands contains() into the correct LIKE expression for you).
Queries constructed in this way return a special row object that looks like a named tuple, and it can easily be turned into a dict by calling _asdict() on it.
So, to put it all together:
output = [row._asdict()
          for row in DBSession.query(Client.id, Client.name)
                              .filter(Client.name.contains(q))]
or, if you really desperately need it to be a dict, you can use a dict comprehension:
output = {index: row._asdict()
          for index, row
          in enumerate(DBSession.query(Client.id, Client.name)
                       .filter(Client.name.contains(q)))}
#TokenMacGuy gave a nice and detailed answer to your question. However, I have a feeling you've asked a wrong question :)
You don't need to convert SQLAlchemy objects to dictionaries before passing them to the template - that would be quite inconvenient. You can pass the result of a query as is and directly use SQLAlchemy mapped objects in your template:
q = self.request.params['q']
clients = DBSession.query(Client).filter(Client.name.contains(q)).all()
return {'clients': clients}
If you want to turn a SqlAlchemy object into a dict, you can use this code:
from sqlalchemy import orm as sqlalchemy_orm

def obj_to_dict(obj):
    return dict(
        (col.name, getattr(obj, col.name))
        for col in sqlalchemy_orm.class_mapper(obj.__class__).mapped_table.c
    )
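For instance, with the Client model from the question, usage would look something like:
client = DBSession.query(Client).first()
client_dict = obj_to_dict(client)  # e.g. {'id': 1, 'name': '...'}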
There is another attribute of the mapper that has the relationships in it, but the code gets dicey.
You don't need to cast an object into a dict for any of the template libraries, but if you decide to persist the data (memcached, session, pickle, etc.) you'll either need to use dicts or write some code to 'merge' the persisted data back into the session.
A quick side note: if you render any of this data through json, you'll either need a custom json renderer that can handle datetime objects, or convert the values in a function.
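A minimal sketch of that second option, using the stdlib json module's default hook (not a full Pyramid renderer):
import datetime
import json

def json_default(value):
    # render dates/datetimes as ISO 8601 strings; fail loudly on anything else
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError('%r is not JSON serializable' % value)

json.dumps(obj_to_dict(client), default=json_default)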
