Cannot properly deserialize a response using pymongo - python

I was using an API that was written in NodeJS, but for several reasons I had to rewrite the code in Python. The thing is that the database is MongoDB, and the response to all the queries (with large results) includes the _id field serialized as a nested object, for example {"$oid": "f54h5b4jrhnf"}.
This ObjectId representation with the nested $oid, instead of just the plain string that Node used to return, is messing with the front end, and I haven't been able to find a way to get just the string rather than this nested object (other than iterating over every single document and extracting the id string) without also changing the way the front end treats the response.
Is there a solution to get a JSON response of the shape [{"_id": "63693f438cdbc3adb5286508", ...}]?
I tried using pymongo and mongoengine; both seem unable to deserialize it in a simple way.

You have several options (more than mentioned below).
MongoDB
In a MongoDB query, you could project/convert all ObjectIds to a string using "$toString".
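For example, a minimal sketch with pymongo's aggregate (assuming MongoDB 4.0+, where "$toString" is available, and an existing collection handle):
cursor = collection.aggregate([
    # overwrite _id with its string form before the documents reach Python
    {"$addFields": {"_id": {"$toString": "$_id"}}}
])
docs = list(cursor)  # every document now carries _id as a plain string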
Python
Iterate, like you mention in your question.
--OR--
You could also define/use a custom pymongo TypeRegistry class with a custom TypeDecoder and use it with a collection's CodecOptions so that every ObjectId read from a collection is automatically decoded as a string.
Here's how I did it with a toy database/collection.
from bson.objectid import ObjectId
from bson.codec_options import TypeDecoder

class myObjectIdDecoder(TypeDecoder):
    bson_type = ObjectId

    def transform_bson(self, value):
        return str(value)

from bson.codec_options import TypeRegistry
type_registry = TypeRegistry([myObjectIdDecoder()])

from bson.codec_options import CodecOptions
codec_options = CodecOptions(type_registry=type_registry)

collection = db.get_collection('geojson', codec_options=codec_options)

# the geojson collection's _id field values have type ObjectId,
# but because of the custom CodecOptions/TypeRegistry/TypeDecoder
# every ObjectId read from the collection is decoded as a string
collection.find_one()["_id"]
# returns '62ae621406926107b33b523c', i.e. just a string

Related

JSON issue with MongoDB ObjectId

As you know, MongoDB documents have at least one ObjectId, named _id. It's not possible to convert a document containing an ObjectId to JSON. Currently I have two solutions to convert such a document to JSON:
del doc['_id']
or create a new document with a string instance of that field.
That only works when I know which field contains an ObjectId. What should I do if I have multiple ObjectIds and don't know which fields they are in?
MongoDB returns a BSON (not a JSON) document, so you actually want to convert a BSON document into a JSON document.
Take a look at this article: https://technobeans.com/2012/09/10/mongodb-convert-bson-to-json/
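For example, a minimal sketch with bson.json_util, which encodes ObjectId (and the other BSON types) wherever they appear, so you don't need to know which fields hold them (collection is assumed to already exist):
from bson import json_util

doc = collection.find_one()
json_str = json_util.dumps(doc)  # ObjectIds are rendered as {"$oid": "..."}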

Why does db.insert(dict) add _id key to the dict object while using pymongo

I am using pymongo in the following way:
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(a)
print a
This prints
{'_id': ObjectId('53ad61aa06998f07cee687c3'), 'key1': 'value1'}
on the console.
I understand that _id is added to the mongo document. But why is it added to my Python dictionary too? I did not intend that. What is the purpose of this? I could be using this dictionary for other purposes too, and the dictionary gets updated as a side effect of inserting it into the collection. If I have to, say, serialize this dictionary into a JSON object, I will get an
ObjectId('53ad610106998f0772adc6cb') is not JSON serializable
error. Shouldn't the insert function keep the value of the dictionary the same while inserting the document into the db?
Like many other database systems out there, PyMongo adds the unique identifier necessary to retrieve the data from the database as soon as it's inserted (what would happen if you inserted two dictionaries with the same content, {'key1': 'value1'}, into the database? How would you distinguish the one you want from the other?)
This is explained in the Pymongo docs:
When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection.
If you want to change this behavior, you could give the object an _id attribute before inserting. In my opinion, this is a bad idea. It would easily lead to collisions and you would lose juicy information that is stored in a "real" ObjectId, such as creation time, which is great for sorting and things like that.
>>> a = {'_id': 'hello', 'key1':'value1'}
>>> collection.insert(a)
'hello'
>>> collection.find_one({'_id': 'hello'})
{u'key1': u'value1', u'_id': u'hello'}
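For instance, that creation time can be read straight off a real ObjectId (a small sketch; the ObjectId value below is just an example from this thread):
from bson.objectid import ObjectId

oid = ObjectId('53ad6d59867b2d0d15746b34')
created_at = oid.generation_time  # timezone-aware datetime embedded in the id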
Or if your problem comes when serializing to Json, you can use the utilities in the BSON module:
>>> a = {'key1':'value1'}
>>> collection.insert(a)
ObjectId('53ad6d59867b2d0d15746b34')
>>> from bson import json_util
>>> json_util.dumps(collection.find_one({'_id': ObjectId('53ad6d59867b2d0d15746b34')}))
'{"key1": "value1", "_id": {"$oid": "53ad6d59867b2d0d15746b34"}}'
(you can verify that this is valid json in pages like jsonlint.com)
_id acts as a primary key for documents; unlike in SQL databases, it is required in MongoDB.
To make _id serializable, you have two options:
Set _id to a JSON-serializable datatype in your documents before inserting them (e.g. int, str), but keep in mind that it must be unique per document.
Use custom BSON serialization encoder/decoder classes:
import json

from bson.json_util import default as bson_default
from bson.json_util import object_hook as bson_object_hook

class BSONJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # let bson.json_util handle BSON types such as ObjectId
        return bson_default(o)

class BSONJSONDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, object_hook=bson_object_hook)
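A quick usage sketch, assuming the classes above and an existing pymongo collection handle named collection:
import json

doc = collection.find_one()
json_str = json.dumps(doc, cls=BSONJSONEncoder)            # ObjectId -> {"$oid": "..."}
round_tripped = json.loads(json_str, cls=BSONJSONDecoder)  # {"$oid": ...} -> ObjectId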
As @BorrajaX already answered, I just want to add some more.
_id is a unique identifier; when a document is inserted into the collection, one is generated automatically. You can either set your own _id or use the one MongoDB creates for you.
The documentation mentions this as well.
For your case, you can simply remove this key with the del keyword: del a["_id"].
or
if you need _id for further operations, you can use dumps from the bson module:
import json
from bson.json_util import loads as bson_loads, dumps as bson_dumps
a["_id"]=json.loads(bson_dumps(a["_id"]))
or
before inserting the document you can add your own custom _id, so you won't need to serialize your dictionary:
a["_id"] = "some_id"
db1.collection1.insert(a)
This behavior can be circumvented by using the copy module. This will pass a copy of the dictionary to pymongo, leaving the original intact. Based on the code snippet in your example, one should modify it like so:
import copy
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(copy.copy(a))
print a
The docs clearly answer your question:
MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, though it contains more data types than JSON.
The value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. The following document contains values of varying types:
var mydoc = {
    _id: ObjectId("5099803df3f4948bd2f98391"),
    name: { first: "Alan", last: "Turing" },
    birth: new Date('Jun 23, 1912'),
    death: new Date('Jun 07, 1954'),
    contribs: [ "Turing machine", "Turing test", "Turingery" ],
    views: NumberLong(1250000)
}
See the MongoDB documentation to learn more about BSON.

pymongo SONManipulator find_one with few values populated

Using Custom Types in PyMongo, I was able to insert records. My custom object has 5 fields. I need a find_one method which searches based on the fields populated in my custom object. Let's say I create a custom object and populate only 2 fields (unique values, though). I need to search MongoDB using this object and the manipulator. How do I achieve this? The example in the above link uses find_one() without any argument.
I tried the query below, but it did not return data:
self.my_collection.find_one({'custom': custom})
#Of course I could specify the complete query. Below fetched the data.
self.my_collection.find_one({'custom.prop1': custom.prop1,'custom.prop2': custom.prop2})
As of now I have the workaround below to achieve this:
def get_one_by_specified_properties(self, custom):
    query_str_map = dict()
    for attr in custom_object_properties_names_as_list():
        if custom.__getattribute__(attr):
            query_str_map.update({'custom.' + attr: custom.__getattribute__(attr)})
    document = self.my_collection.find_one(query_str_map)
    return document['custom'] if document else None
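A hypothetical usage sketch (Custom and repository stand in for the custom type and for whatever object defines the method above):
custom = Custom(prop1='value1', prop2='value2')  # only two of the five fields populated
document = repository.get_one_by_specified_properties(custom)
# generated filter: {'custom.prop1': 'value1', 'custom.prop2': 'value2'}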

Why does SQLAlchemy add \ to " for a perfect JSON string in a PostgreSQL json field?

SQLAlchemy 0.9 added built-in support for the JSON data type of PostgreSQL. But when I defined an object mapper which has a JSON field and set its value to a perfect JSON string:
json = '{"HotCold":"Cold,"Value":"10C"}'
The database gets the data in the form:
"{\"HotCold\":\"Cold\",\"Value":\"10C\"}"
All internal double quotes are backslashed, but if I set JSON from a python dict:
json = {"HotCold": "Cold, "Value": "10C"}
I get the JSON data in the database as:
{"HotCold":"Cold,"Value":"10C"}
Why is that? Do I have to pass the data in dict form to make it compatible with SQLAlchemy JSON support?
The short answer: Yes, you have to.
The JSON type in SQLAlchemy is used to store a Python structure as JSON. It effectively does:
database_value = json.dumps(python_value)
on store, and uses
python_value = json.loads(database_value)
on retrieval.
You stored a string, and that was turned into a JSON value. The fact that the string itself contained JSON was just a coincidence. Don't store JSON strings, store Python values that are JSON-serializable.
A quick demo to illustrate:
>>> print json.dumps({'foo': 'bar'})
{"foo": "bar"}
>>> print json.dumps('This is a "string" with quotes!')
"This is a \"string\" with quotes!"
Note how the second example has the exact same quoting applied.
Use the JSON SQLAlchemy type to store extra structured data on an object; PostgreSQL gives you access to the contents in SQL expressions on the server side, and SQLAlchemy gives you full access to the contents as Python values on the Python side.
Take into account you should always set the whole value anew on an object. Don't mutate a value inside of it and expect that SQLAlchemy detects the change automatically for you; see the PostgreSQL JSON type documentation.
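As a rough illustration of that pattern, here is a minimal sketch (assuming SQLAlchemy 1.4+ and a PostgreSQL database named mydb; the model, table, and column names are made up):
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSON
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Reading(Base):
    __tablename__ = 'reading'
    id = sa.Column(sa.Integer, primary_key=True)
    payload = sa.Column(JSON)  # assign Python values, not pre-serialized JSON strings

engine = sa.create_engine('postgresql:///mydb')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Reading(payload={'HotCold': 'Cold', 'Value': '10C'}))
    session.commit()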
Meh, but I don't want to do three round trips: json.loads() on my string, pass the result to SQLAlchemy, which would then do json.dumps(), and then Postgres would unmarshal it again.
So instead I created a metadata Table that declares the jsonb column as Text. Now I take my JSON strings, SQLAlchemy passes them through untouched, and Postgres stores them as jsonb objects.
import sqlalchemy as sa

metadata = sa.MetaData()
rawlog = sa.Table('rawlog', metadata, sa.Column('document', sa.Text))

engine = sa.create_engine("postgresql:///mydb")
with engine.begin() as conn:
    conn.execute(rawlog.insert().values(document=document))
where document is a string rather than a Python object.
I ran into a similar scenario today:
after inserting a new row with a JSONB field via SQLAlchemy, I checked the PostgreSQL DB:
"jsonb_fld"
"""{\""addr\"": \""66 RIVERSIDE DR\"", \""state\"": \""CA\"", ...
Reviewing the Python code, it set the JSONB field value like so:
row[some_jsonb_field] = json.dumps(some_dict)
After I took out the json.dumps(...) and simply did:
row[some_jsonb_field] = some_dict
everything looked better in the DB: no more extra \ or ".
Once again I realized that Python and SQLAlchemy, in this case, already take care of the minute details, such as json.dumps. Less code, more satisfaction.
I ran into the same problem! It seems that SQLAlchemy does its own json.dumps() internally, so this is what is happening:
>>> x={"a": '1'}
>>> json.dumps(x) [YOUR CODE]
'{"a": "1"}'
>>> json.dumps(json.dumps(x)) [SQLAlchemy applies json.dumps again]
'"{\\"a\\": \\"1\\"}"' [OUTPUT]
Instead, take out the json.dumps() from your code and you'll load the JSON you want.

Serialize an entity key to a string in Python for GAE

In the Java low-level API, there is a way to turn an entity key into a string so you can pass it around to a client via JSON if you want. Is there a way to do this for python?
Depending on whether you use keynames or not, obj.key().name() or obj.key().id() can be used to retrieve the keyname or ID, respectively. Neither of those contains the name of the entity class, so they are not sufficient to retrieve the original object from the datastore. Granted, in most cases you know the entity kind when working with it, so that's not a problem.
A universal solution, working in both cases (keynames or not), is obj.key().id_or_name(). This way you can retrieve the original object as follows:
from google.appengine.ext import db
#...
obj_key = db.Key.from_path('EntityClass', id_or_name)
obj = db.get(obj_key)
If you don't mind passing around a long, cryptic string that also contains some extra data (like the name of your GAE app), you can use the string representation of the key (str(obj.key())) and pass it directly to db.get to retrieve the object.
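A small sketch of that round trip with the old db API (entity is assumed to be an existing db.Model instance):
from google.appengine.ext import db

encoded = str(entity.key())   # long base64-encoded key string
key = db.Key(encoded)         # rebuild the Key from the string
same_entity = db.get(key)     # fetch the original entity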
str(entity.key()) will return a base64-encoded representation of the key.
entity.key().name() or entity.key().id() will return just the name or ID, omitting the kind and the ancestry.
Better:
string_key = entity.key().urlsafe()
and afterwards you can decode the key with
key = ndb.Key(urlsafe=string_key)
You should be able to do:
entity.key().name()
This should return the string representation of the key. See here
Is that what you're looking for?
