JSON issue with MongoDB ObjectId - python

As you know, MongoDB documents have at least one ObjectId field, named _id. A document that contains an ObjectId cannot be converted to JSON directly. Currently I have two solutions to convert such a document to JSON:
del doc['_id']
or create a new document with a string representation of that field.
But this only works when I know which field contains an ObjectId. What should I do if the document has multiple ObjectIds and I don't know which fields they are in?

MongoDB returns a BSON (not a JSON) document, so what you actually want is to convert a BSON document into a JSON document.
Take a look at this article: https://technobeans.com/2012/09/10/mongodb-convert-bson-to-json/
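A minimal sketch of the usual workaround, assuming doc is any document returned by pymongo: bson.json_util ships with the driver and serializes every BSON type (ObjectId, datetime, ...) wherever it appears, so you don't need to know which fields hold ObjectIds.
from bson import json_util

# doc may contain ObjectIds in arbitrary (even nested) fields
json_string = json_util.dumps(doc)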

Related

Cannot deserialize properly a response using pymongo

I was using an API written in NodeJS, but for some reason I had to rewrite the code in Python. The database is MongoDB, and the response to all the queries (with large results) includes a serialized version of the ObjectId with a nested $oid, for example {"$oid": "f54h5b4jrhnf"}.
This ObjectId representation with the nested $oid, instead of just the plain string that Node used to return, is messing with the front end, and I haven't been able to find a way to get just the string rather than this nested object (other than iterating over every single document and extracting the id string) without also changing the way the front end treats the response.
Is there a solution to get a JSON response of the shape [{"_id": "63693f438cdbc3adb5286508", ...}]?
I tried using pymongo and mongoengine; both seem unable to deserialize in a simple way.
You have several options (more than mentioned below).
MongoDB
In a MongoDB query, you could project/convert all ObjectIds to a string using "$toString".
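For instance, a hedged sketch with pymongo (the collection object and pipeline are illustrative); "$toString" is available from MongoDB 4.0 onward:
# overwrite _id with its hex-string form before the documents reach Python
docs = list(collection.aggregate([
    {"$addFields": {"_id": {"$toString": "$_id"}}}
]))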
Python
Iterate, like you mention in your question.
--OR--
You could also define/use a custom pymongo TypeRegistry class with a custom TypeDecoder and use it with a collection's CodecOptions so that every ObjectId read from a collection is automatically decoded as a string.
Here's how I did it with a toy database/collection.
from bson.objectid import ObjectId
from bson.codec_options import TypeDecoder, TypeRegistry, CodecOptions


class myObjectIdDecoder(TypeDecoder):
    bson_type = ObjectId

    def transform_bson(self, value):
        return str(value)


type_registry = TypeRegistry([myObjectIdDecoder()])
codec_options = CodecOptions(type_registry=type_registry)
collection = db.get_collection('geojson', codec_options=codec_options)

# The geojson collection's _id field values have type ObjectId,
# but because of the custom CodecOptions/TypeRegistry/TypeDecoder
# all ObjectIds are decoded as strings for Python.
collection.find_one()["_id"]
# returns '62ae621406926107b33b523c', i.e. just a string
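A small follow-up sketch (assuming the remaining field values in this collection are plain JSON-serializable types): with those codec options in place, documents come back without any ObjectId instances, so the standard json module can serialize them directly.
import json

doc = collection.find_one()
json.dumps(doc)  # _id is already a plain string, so this no longer raises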

Can't delete mongodb document using pymongo

I'm trying to delete one specific document by _id with pymongo and I can't do it. Any ideas?
Thanks.
I have this code:
s = "ISODate('{0}')".format(nom_fitxer_clean)
#i generate the next string.. (ISODate('2018-11-07 00:00:00'))
myquery = { "_id": s }
#query string ({'_id': "ISODate('2018-10-07 00:00:00')"})
mycol.delete_one(myquery)
I do not get any errors or delete the document.
UPDATE:
Document
I think one possible solution could be to replace ISODate with ObjectId in your query, i.e. to query with the actual type stored in _id rather than the literal string "ISODate(...)", which will never match.
Moreover, delete_one deletes only the first document that matches your query, so it is also possible that multiple documents match your query.
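A minimal sketch of both variants, assuming mycol is your pymongo collection; whether _id is an ObjectId or a BSON date depends on your schema, and the literal values below are placeholders.
from datetime import datetime
from bson.objectid import ObjectId

# if _id is an ObjectId, pass an ObjectId instance, not a string
mycol.delete_one({"_id": ObjectId("5be3d2cf1c9d440000a1b2c3")})

# if _id is a BSON date, pass a Python datetime
mycol.delete_one({"_id": datetime(2018, 11, 7, 0, 0, 0)})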

Why does db.insert(dict) add _id key to the dict object while using pymongo

I am using pymongo in the following way:
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(a)
print a
This prints
{'_id': ObjectId('53ad61aa06998f07cee687c3'), 'key1': 'value1'}
on the console.
I understand that _id is added to the MongoDB document. But why is it added to my Python dictionary too? I did not intend that, and I am wondering what the purpose is. I could be using this dictionary for other purposes too, and it gets updated as a side effect of inserting it into the collection. If I have to, say, serialize this dictionary into a JSON object, I will get an
ObjectId('53ad610106998f0772adc6cb') is not JSON serializable
error. Shouldn't the insert function leave the dictionary unchanged while inserting the document into the db?
Like many other database systems out there, PyMongo adds the unique identifier necessary to retrieve the data from the database as soon as it is inserted (what would happen if you inserted two dictionaries with the same content {'key1':'value1'} into the database? How would you distinguish that you want this one and not that one?)
This is explained in the Pymongo docs:
When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection.
If you want to change this behavior, you could give the object an _id key before inserting. In my opinion, this is a bad idea: it would easily lead to collisions, and you would lose the useful information stored in a "real" ObjectId, such as the creation time, which is great for sorting and things like that.
>>> a = {'_id': 'hello', 'key1':'value1'}
>>> collection.insert(a)
'hello'
>>> collection.find_one({'_id': 'hello'})
{u'key1': u'value1', u'_id': u'hello'}
Or, if your problem comes when serializing to JSON, you can use the utilities in the bson module:
>>> a = {'key1':'value1'}
>>> collection.insert(a)
ObjectId('53ad6d59867b2d0d15746b34')
>>> from bson import json_util
>>> json_util.dumps(collection.find_one({'_id': ObjectId('53ad6d59867b2d0d15746b34')}))
'{"key1": "value1", "_id": {"$oid": "53ad6d59867b2d0d15746b34"}}'
(you can verify that this is valid json in pages like jsonlint.com)
_id acts as a primary key for documents; unlike in SQL databases, it is required in MongoDB.
To make _id serializable, you have two options:
set _id to a JSON-serializable datatype in your documents before inserting them (e.g. int, str), keeping in mind that it must be unique per document; or
use custom BSON serialization encoder/decoder classes:
import json
from bson.json_util import default as bson_default
from bson.json_util import object_hook as bson_object_hook


class BSONJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # delegate BSON types such as ObjectId to bson.json_util
        return bson_default(o)


class BSONJSONDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, object_hook=bson_object_hook, **kwargs)
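A short usage sketch (the document literal is illustrative): pass the encoder class to json.dumps and the decoder class to json.loads.
from bson.objectid import ObjectId

doc = {"_id": ObjectId(), "key1": "value1"}
as_json = json.dumps(doc, cls=BSONJSONEncoder)       # ObjectId becomes {"$oid": "..."}
restored = json.loads(as_json, cls=BSONJSONDecoder)  # and back to an ObjectId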
As @BorrajaX already answered, I just want to add some more.
_id is a unique identifier: when a document is inserted into a collection, MongoDB generates one for you if you don't set it yourself. You can either set your own _id or use the one MongoDB creates.
This is what the documentation mentions about it.
For your case, you can simply drop this key with the del keyword: del a["_id"].
or
if you need _id for further operations, you can use dumps from the bson.json_util module:
import json
from bson.json_util import dumps as bson_dumps

a["_id"] = json.loads(bson_dumps(a["_id"]))
or
before inserting the document you can add your own custom _id, and then you won't need to serialize your dictionary:
a["_id"] = "some_id"
db1.collection1.insert(a)
This behavior can be circumvented by using the copy module. It passes a copy of the dictionary to pymongo, leaving the original intact. Based on the code snippet in your example, one would modify it like so:
import copy
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(copy.copy(a))
print a
Clearly the docs answer your question
MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, though it contains more data types than JSON.
The value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. The following document contains values of varying types:
var mydoc = {
    _id: ObjectId("5099803df3f4948bd2f98391"),
    name: { first: "Alan", last: "Turing" },
    birth: new Date('Jun 23, 1912'),
    death: new Date('Jun 07, 1954'),
    contribs: [ "Turing machine", "Turing test", "Turingery" ],
    views: NumberLong(1250000)
}
See the MongoDB documentation to know more about BSON.

Why does SQLAlchemy add \ to " for a perfect JSON string in a PostgreSQL JSON field?

SQLAlchemy 0.9 added built-in support for the JSON data type of PostgreSQL. But when I defined an object mapper that has a JSON field and set its value to a perfect JSON string:
json = '{"HotCold": "Cold", "Value": "10C"}'
The database gets the data in the form:
"{\"HotCold\": \"Cold\", \"Value\": \"10C\"}"
All internal double quotes are backslash-escaped, but if I set the JSON field from a Python dict:
json = {"HotCold": "Cold", "Value": "10C"}
I get the JSON data in the database as:
{"HotCold": "Cold", "Value": "10C"}
Why is that? Do I have to pass the data in dict form to make it compatible with SQLAlchemy JSON support?
The short answer: Yes, you have to.
The JSON type in SQLAlchemy is used to store a Python structure as JSON. It effectively does:
database_value = json.dumps(python_value)
on store, and
python_value = json.loads(database_value)
on retrieval.
You stored a string, and that was turned into a JSON value. The fact that the string itself contained JSON was just a coincidence. Don't store JSON strings, store Python values that are JSON-serializable.
A quick demo to illustrate:
>>> print json.dumps({'foo': 'bar'})
{"foo": "bar"}
>>> print json.dumps('This is a "string" with quotes!')
"This is a \"string\" with quotes!"
Note how the second example has the exact same quoting applied.
Use the JSON SQLAlchemy type to store extra structured data on an object; PostgreSQL gives you access to the contents in SQL expressions on the server side, and SQLAlchemy gives you full access to the contents as Python values on the Python side.
Take into account that you should always set the whole value anew on an object. Don't mutate a value inside of it and expect SQLAlchemy to detect the change automatically for you; see the PostgreSQL JSON type documentation.
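A minimal ORM sketch of that workflow (table and column names are illustrative, and it uses the modern 1.4+ API rather than the 0.9 API from the question): assign a Python dict to the JSON column, and replace the whole value when it changes.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy.dialects.postgresql import JSON

Base = declarative_base()

class Reading(Base):
    __tablename__ = "readings"
    id = sa.Column(sa.Integer, primary_key=True)
    payload = sa.Column(JSON)

engine = sa.create_engine("postgresql:///mydb")
Base.metadata.create_all(engine)

with Session(engine) as session:
    reading = Reading(payload={"HotCold": "Cold", "Value": "10C"})  # a dict, not a JSON string
    session.add(reading)
    session.commit()

    # replace the whole value; in-place mutation would not be detected
    reading.payload = {**reading.payload, "Value": "12C"}
    session.commit()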
Meh, but I didn't want three round trips: json.loads() my string, pass the result to SQLAlchemy, which would then json.dumps() it, and then Postgres would unmarshal it again.
So instead I created a metadata Table that declares the jsonb column as Text. Now I pass my JSON strings through, SQLAlchemy leaves them untouched, and Postgres stores them as jsonb objects.
import sqlalchemy as sa

metadata = sa.MetaData()
rawlog = sa.Table('rawlog', metadata, sa.Column('document', sa.Text))

engine = sa.create_engine("postgresql:///mydb")
with engine.begin() as conn:
    conn.execute(rawlog.insert().values(document=document))
where document is a string rather than a Python object.
I ran into a similar scenario today:
after inserting new row with a JSONB field via SQLAlchemy, I checked PostgreSQL DB:
"jsonb_fld"
"""{\""addr\"": \""66 RIVERSIDE DR\"", \""state\"": \""CA\"", ...
Reviewing Python code, it sets JSONB field value like so:
row[some_jsonb_field] = json.dumps(some_dict)
After I took out the json.dumps(...) and simply did:
row[some_jsonb_field] = some_dict
everything looks better in the DB: no more extra \ or ".
Once again I realized that Python and SQLAlchemy, in this case, already take care of the minute details, such as json.dumps. Less code, more satisfaction.
I ran into the same problem! It seems that SQLAlchemy does its own json.dumps() internally, so this is what is happening:
>>> x={"a": '1'}
>>> json.dumps(x) [YOUR CODE]
'{"a": "1"}'
>>> json.dumps(json.dumps(x)) [SQLAlchemy applies json.dumps again]
'"{\\"a\\": \\"1\\"}"' [OUTPUT]
Instead, take out the json.dumps() from your code and you'll load the JSON you want.

pymongo saving embedded objectIds, InvalidDocumentError

Using the bare pymongo driver to connect Python to MongoDB, why does using an ObjectId instance as the key of an embedded document raise an InvalidDocument error?
I am trying to link documents using ObjectIds and can't seem to understand why I would want to convert them to strings, when the ones created automatically by the driver are ObjectId instances.
item = collection.find({'x':'foo'})
item['otherstuff'] = {pymongo.objectid.ObjectId() : 'data about this link'}
collection.update({'x':'foo'}, item)
bson.errors.InvalidDocument: documents must have only string keys, key was ObjectId('4f0b5d4e764df61c67000000')
In practice the linked ids represent documents that contain questions, and the values in the dictionary keyed as 'otherstuff' here would, for example, represent this individual document's responses to that particular question.
Is there a reason ObjectIds used like this won't encode into BSON and therefore fail? Is it impossible to nest ObjectIds within documents like this for cross-referencing? Have I misunderstood their purpose?
The BSON spec dictates that keys must be strings, so PyMongo is right to reject this as an invalid document (and would be regardless of at what level an ObjectId was used as a key, whether at the top level or in an embedded document). This is necessary, among other reasons, so that the query language can be unambiguous. Imagine you had this document (and that it were a valid BSON document):
{ _id: ...,
  "4f0cbe6d7f40d36b24a5c4d7": true,
  ObjectId("4f0cbe6d7f40d36b24a5c4d7"): false
}
And then you attempted to query with:
db.foo.find({"4f0cbe6d7f40d36b24a5c4d7": false})
Should this return this document? Should that string be auto-boxed into an ObjectId? How would Mongo know when that can be auto-boxed, and how to disambiguate in cases like this document?
A possible alternative solution to your problem is to have an array of embedded documents like:
{ answers: [
    { answer_id: ObjectId("..."), summary: "Good answer to this question" },
    { answer_id: ObjectId("..."), summary: "Bad answer to this question" }
] }
This is valid BSON, and will also be indexable more efficiently. If you add an index on answers, you can search efficiently for exact matches on these subdocuments; if you add an index on answers.answer_id, then you can search efficiently by the ObjectId of the answer you're looking for (and so on).
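A small pymongo sketch of that layout (the collection object and field names are illustrative):
from bson.objectid import ObjectId

answer_id = ObjectId()
collection.insert_one({
    "answers": [
        {"answer_id": answer_id, "summary": "Good answer to this question"},
    ]
})

# index the embedded field, then look up a document by a specific answer's ObjectId
collection.create_index("answers.answer_id")
doc = collection.find_one({"answers.answer_id": answer_id})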
