Just wondering if there is a way to extend the FormatChecker that is passed to the jsonschema validator? I can't seem to find anything in the documentation.
Basically, I am trying to check if a string value is a valid timezone. I'm using pytz for the timezone side. But, I need to check the provided json string property is contained within that list.
The only other way I can think of is to extract the list as an enum field in the schema. But, it's a huge list and seems a pretty clunky way of doing it. Ideally, I'd like something like this:
from pytz import common_timezones
from jsonschema import validate, FormatChecker
timezone_checker = FormatChecker(formats=["timezone"])
timezone_checker.extend(check_timezone)
instance = { "timezone": "Australia/Sydney" }
schema = {
    "properties": {
        "timezone": {"type": "string", "format": "timezone"}
    }
}
validate(instance=instance, schema=schema, format_checker=timezone_checker)
...
def check_timezone(p):
    if not isinstance(p, str):
        return False
    return p in common_timezones
Thanks in advance.
The method to do so is called FormatChecker.checks. It is used as a decorator that registers a function under a format name.
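A minimal sketch of how that could look, assuming a jsonschema release where checks is used as a decorator on a FormatChecker instance. The small KNOWN_TIMEZONES set is a stand-in for pytz.common_timezones, and is_valid is a hypothetical helper added only for illustration.

```python
from jsonschema import FormatChecker, ValidationError, validate

# Stand-in for pytz.common_timezones (assumption, keeps the sketch small).
KNOWN_TIMEZONES = {"Australia/Sydney", "Europe/London", "UTC"}

format_checker = FormatChecker()


@format_checker.checks("timezone")
def check_timezone(value):
    # Format checks conventionally ignore non-strings; "type" handles those.
    if not isinstance(value, str):
        return True
    return value in KNOWN_TIMEZONES


schema = {
    "properties": {
        "timezone": {"type": "string", "format": "timezone"}
    }
}


def is_valid(instance):
    """Hypothetical helper: True if the instance validates against schema."""
    try:
        validate(instance=instance, schema=schema, format_checker=format_checker)
    except ValidationError:
        return False
    return True
```

Note that the format is only enforced when the checker is passed as format_checker; without it, validate ignores "format" keywords it has no checker for.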
I am trying to stream data from a mongoDB to Elasticsearch using both pymongo and the Python client elasticsearch.
I have set a mapping; here is the snippet for the field of interest:
"updated_at": {
    "type": "date",
    "format": "dateOptionalTime"
}
My script grabs each document from the MongoDB using pymongo and tries indexing it into Elasticsearch as
import json
from bson import json_util
from elasticsearch import Elasticsearch
from pymongo import MongoClient

mongo_client = MongoClient('localhost', 27017)
es_client = Elasticsearch(hosts=[{"host": "localhost", "port": 9200}])
db = mongo_client['my_db']
collection = db['my_collection']
for doc in collection.find():
    es_client.index(
        index='index_name',
        doc_type='my_type',
        id=str(doc['_id']),
        body=json.dumps(doc, default=json_util.default)
    )
The problem I have in running it is:
elasticsearch.exceptions.RequestError: TransportError(400, u'MapperParsingException[failed to parse [updated_at]]; nested: ElasticsearchIllegalArgumentException[unknown property [$date]]; ')
I believe the source of the problem is in the fact that pymongo serializes the field updated_at as a datetime.datetime object, as I can see if I print the doc in the for loop:
u'updated_at': datetime.datetime(2014, 8, 31, 17, 18, 13, 17000)
This conflicts with Elasticsearch looking for an object of type date as specified in the mapping.
Any ideas how to solve this?
You're on the right path: your Python datetime needs to be serialized as an ISO 8601-compliant date string. So, you need to use a CustomEncoder in your json.dumps() call. First, declare your CustomEncoder as a subclass of JSONEncoder which will handle the transformation of datetime and time properties, but delegate the rest to pymongo's json_util.default:
import json
from datetime import datetime, time
from bson import json_util

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # Emit datetimes as ISO 8601 strings, which the date mapping accepts
        if isinstance(obj, datetime):
            return obj.strftime('%Y-%m-%dT%H:%M:%S%z')
        if isinstance(obj, time):
            return obj.strftime('%H:%M:%S')
        if hasattr(obj, 'to_json'):
            return obj.to_json()
        # Let pymongo handle the remaining BSON types, such as ObjectId
        return json_util.default(obj)
And then you can use it in your json.dumps call. Pass only cls here: a default argument would replace the encoder's own default method, so the two cannot be combined:
...
body=json.dumps(doc, cls=CustomEncoder)
...
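One detail of the standard json module is worth checking when cls= and default= meet: json.dumps hands default to the encoder's constructor, where it replaces the encoder's own default method. A minimal stdlib-only sketch of that behaviour follows; IsoDatetimeEncoder and fallback are hypothetical names used only for illustration.

```python
import json
from datetime import datetime, timezone


class IsoDatetimeEncoder(json.JSONEncoder):
    def default(self, obj):
        # Serialize datetimes as ISO 8601 strings.
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


def fallback(obj):
    # Hypothetical handler that tags whatever it is asked to encode.
    return {"$fallback": str(obj)}


doc = {"updated_at": datetime(2014, 8, 31, 17, 18, 13, 17000, tzinfo=timezone.utc)}

# Only the encoder class: the datetime becomes an ISO 8601 string.
with_cls_only = json.dumps(doc, cls=IsoDatetimeEncoder)

# Encoder class plus default=: the function wins and the class method is never called.
with_both = json.dumps(doc, cls=IsoDatetimeEncoder, default=fallback)
```

If both behaviours are needed, the fallback can be called from inside the encoder's default method instead of being passed as a keyword argument.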
I guess your problem is that you're using
body=json.dumps(doc, default=json_util.default)
but you should be using
body=doc
Doing that works for me, since the elasticsearch client takes care of serializing the dictionary into a JSON document (assuming doc is a dictionary, which I guess it is).
At least in the version of elasticsearch I'm using (2.x), datetime.datetime is serialized correctly, with no need for a mapping. For example, this works for me:
doc = {"updated_on": datetime.now(timezone.utc)}
res = es.index(index=es_index, doc_type='my_type', id=1, body=doc)
And it is recognized by Kibana as a date.
You can use:
from elasticsearch_dsl.serializer import serializer
serializer.dumps(your_dict)
Replace your_dict with your Document().prepare() or document.to_dict()
I make sure to timestamp documents sent to Elasticsearch using datetime.now(timezone.utc):
from datetime import datetime, timezone
doc = {
    "timestamp": datetime.now(timezone.utc),
    # the rest of your data
}
This solved the problem of the time having a strange drift in Elasticsearch.
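A likely explanation for the drift (an assumption, the answer does not state it) is that a naive datetime.now() carries no UTC offset, so whatever receives the value has to guess its timezone. A small stdlib-only sketch of the difference:

```python
from datetime import datetime, timezone

naive = datetime.now()              # local wall-clock time, no tzinfo attached
aware = datetime.now(timezone.utc)  # UTC time with an explicit offset

# The naive value serializes without any offset information,
# while the aware value ends in "+00:00".
naive_text = naive.isoformat()
aware_text = aware.isoformat()
```

With the explicit offset in the serialized value, the timestamp is unambiguous regardless of the timezone of the machine that produced it.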
I've done some coding in RoR, and in Rails, when I return a JSON object via an API call, it returns as
{ "id" : "1", "name" : "Dan" }.
However, in Python (with Flask and Flask-SQLAlchemy), when I return a JSON object via json.dumps or jsonpickle.encode, it is returned as
"{ \"id\" : \"1\", \"name\": \"Dan\" }" which seems very unwieldy as it can't easily be parsed on the other end (by an iOS app in this case - Obj-C).
What am I missing here, and what should I do to return it as a JSON literal, rather than a JSON string?
This is what my code looks like:
people = models.UserRelationships.query.filter_by(user_id=user_id, active=ACTIVE_RECORD)
friends = people.filter_by(friends=YES)
json_object = jsonpickle.encode(friends.first().as_dict(), unpicklable=False, keys=True)
print(json_object) # this prints here, i.e. { "id" : "1", "name" : "Dan" }
return json_object # this returns "{ \"id\" : \"1\", \"name\": \"Dan\" }" to the browser
What is missing in your understanding here is that when you use the JSON modules in Python, you're not working with a JSON object. JSON is by definition just a string that matches a certain standard.
Let's say you have the string:
friends = '{"name": "Fred", "id": 1}'
If you want to work with this data in python, you will want to load it into a python object:
import json
friends_obj = json.loads(friends)
At this point friends_obj is a python dictionary.
If you want to convert it (or any other python dictionary or list) back into a JSON string, this is where json.dumps comes in handy:
friends_str = json.dumps(friends_obj)
print(friends_str)
{"name": "Fred", "id": 1}
However if we attempt to "dump" the original friends string you'll see you get a different result:
dumped_str = json.dumps(friends)
print(dumped_str)
"{\"name\": \"Fred\", \"id\": 1}"
This is because you're basically attempting to encode an ordinary string as JSON and it is escaping the characters. I hope this helps make sense of things!
Cheers
Looks like you are using Django here, in which case do something like
from django.utils import simplejson as json
...
return HttpResponse(json.dumps(friends.first().as_dict()))
This is almost always a sign that you're double-encoding your data somewhere. For example:
>>> obj = { "id" : "1", "name" : "Dan" }
>>> j = json.dumps(obj)
>>> jj = json.dumps(j)
>>> print(obj)
{'id': '1', 'name': 'Dan'}
>>> print(j)
{"id": "1", "name": "Dan"}
>>> print(jj)
"{\"id\": \"1\", \"name\": \"Dan\"}"
Here, jj is a perfectly valid JSON string representation—but it's not a representation of obj, it's a representation of the string j, which is useless.
Normally you don't do this directly; instead, either you started with a JSON string rather than an object in the first place (e.g., you got it from a client request or from a text file), or you called some function in a library like requests or jsonpickle that implicitly calls json.dumps with an already-encoded string. But either way, it's the same problem, with the same solution: Just don't double-encode.
You should be using flask.jsonify, which will not only encode correctly, but also set the content-type headers accordingly.
people = models.UserRelationships.query.filter_by(user_id=user_id, active=ACTIVE_RECORD)
friends = people.filter_by(friends=YES)
return jsonify(friends.first().as_dict())
I am running an aggregation query against MongoDB from Python. It returns a valid response and I can print it, but I cannot return it.
Error:
TypeError: ObjectId('51948e86c25f4b1d1c0d303c') is not JSON serializable
Print:
{'result': [{'_id': ObjectId('51948e86c25f4b1d1c0d303c'), 'api_calls_with_key': 4, 'api_calls_per_day': 0.375, 'api_calls_total': 6, 'api_calls_without_key': 2}], 'ok': 1.0}
But when I try to return it:
TypeError: ObjectId('51948e86c25f4b1d1c0d303c') is not JSON serializable
It is a RESTful call:
@appv1.route('/v1/analytics')
def get_api_analytics():
    # get handle to collections in MongoDB
    statistics = sldb.statistics
    objectid = ObjectId("51948e86c25f4b1d1c0d303c")
    analytics = statistics.aggregate([
        {'$match': {'owner': objectid}},
        {'$project': {'owner': "$owner",
                      'api_calls_with_key': {'$cond': [{'$eq': ["$apikey", None]}, 0, 1]},
                      'api_calls_without_key': {'$cond': [{'$ne': ["$apikey", None]}, 0, 1]}
                      }},
        {'$group': {'_id': "$owner",
                    'api_calls_with_key': {'$sum': "$api_calls_with_key"},
                    'api_calls_without_key': {'$sum': "$api_calls_without_key"}
                    }},
        {'$project': {'api_calls_with_key': "$api_calls_with_key",
                      'api_calls_without_key': "$api_calls_without_key",
                      'api_calls_total': {'$add': ["$api_calls_with_key", "$api_calls_without_key"]},
                      'api_calls_per_day': {'$divide': [{'$add': ["$api_calls_with_key", "$api_calls_without_key"]}, {'$dayOfMonth': datetime.now()}]},
                      }}
    ])
    print(analytics)
    return analytics
The db is connected, the collection exists, and I get back the expected result, but when I try to return it I get the JSON error. Any idea how to convert the response into JSON? Thanks
Pymongo provides json_util - you can use that one instead to handle BSON types
import json
from bson import json_util

def parse_json(data):
    return json.loads(json_util.dumps(data))
You should define your own JSONEncoder and use it:
import json
from bson import ObjectId

class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, ObjectId):
            return str(o)
        return json.JSONEncoder.default(self, o)

JSONEncoder().encode(analytics)
It's also possible to use it in the following way:
json.dumps(analytics, cls=JSONEncoder)
>>> from bson import Binary, Code
>>> from bson.json_util import dumps
>>> dumps([{'foo': [1, 2]},
... {'bar': {'hello': 'world'}},
... {'code': Code("function x() { return 1; }")},
... {'bin': Binary(b"\x01\x02\x03\x04")}])
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }", "$scope": {}}}, {"bin": {"$binary": "AQIDBA==", "$type": "00"}}]'
Actual example from json_util.
Unlike Flask's jsonify, "dumps" will return a string, so it cannot be used as a 1:1 replacement of Flask's jsonify.
But this question shows that we can serialize using json_util.dumps(), convert back to dict using json.loads() and finally call Flask's jsonify on it.
Example (derived from previous question's answer):
from bson import json_util, ObjectId
import json
#Lets create some dummy document to prove it will work
page = {'foo': ObjectId(), 'bar': [ObjectId(), ObjectId()]}
#Dump loaded BSON to valid JSON string and reload it as dict
page_sanitized = json.loads(json_util.dumps(page))
return page_sanitized
This solution will convert ObjectId and others (ie Binary, Code, etc) to a string equivalent such as "$oid."
JSON output would look like this:
{
"_id": {
"$oid": "abc123"
}
}
Most users who receive the "not JSON serializable" error simply need to specify default=str when using json.dumps. For example:
json.dumps(my_obj, default=str)
This will force a conversion to str, preventing the error. Of course then look at the generated output to confirm that it is what you need.
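A minimal stdlib-only sketch of what default=str does to a few common non-serializable types. The record contents are made up for illustration; with pymongo installed, an ObjectId would be handled the same way, through its str() form.

```python
import json
import uuid
from datetime import datetime, timezone
from decimal import Decimal

# Made-up record holding values that json.dumps cannot encode by itself.
record = {
    "id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "created_at": datetime(2018, 8, 8, 22, 6, 17, tzinfo=timezone.utc),
    "price": Decimal("9.99"),
}

# default=str is called for every object the encoder does not know.
encoded = json.dumps(record, default=str)

# Everything that went through str() comes back as a plain string.
decoded = json.loads(encoded)
```

This is why checking the output matters: the datetime comes back as "2018-08-08 22:06:17+00:00" (with a space, not ISO 8601's "T") and the Decimal comes back as the string "9.99" rather than a number.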
from bson import json_util
import json

@app.route('/')
def index():
    # `collection` is a placeholder for your pymongo collection object
    docs = list(collection.find())
    return json.dumps(docs, indent=4, default=json_util.default)
This is a sample of converting BSON documents into a JSON string. You can try this.
As a quick replacement, you can change {'owner': objectid} to {'owner': str(objectid)}.
But defining your own JSONEncoder is a better solution, it depends on your requirements.
Posting here as I think it may be useful for people using Flask with pymongo. This is my current "best practice" setup for allowing flask to marshall pymongo bson data types.
mongoflask.py
from datetime import datetime, date

import isodate as iso
from bson import ObjectId
from flask.json import JSONEncoder
from werkzeug.routing import BaseConverter


class MongoJSONEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, (datetime, date)):
            return iso.datetime_isoformat(o)
        if isinstance(o, ObjectId):
            return str(o)
        else:
            return super().default(o)


class ObjectIdConverter(BaseConverter):
    def to_python(self, value):
        return ObjectId(value)

    def to_url(self, value):
        return str(value)
app.py
from flask import Flask, jsonify
from .mongoflask import MongoJSONEncoder, ObjectIdConverter

def create_app():
    app = Flask(__name__)
    app.json_encoder = MongoJSONEncoder
    app.url_map.converters['objectid'] = ObjectIdConverter

    # Client sends their string, we interpret it as an ObjectId
    @app.route('/users/<objectid:user_id>')
    def show_user(user_id):
        # setup not shown, pretend this gets us a pymongo db object
        db = get_db()
        # user_id is a bson.ObjectId ready to use with pymongo!
        result = db.users.find_one({'_id': user_id})
        # And jsonify returns normal looking json!
        # {"_id": "5b6b6959828619572d48a9da",
        #  "name": "Will",
        #  "birthday": "1990-03-17T00:00:00Z"}
        return jsonify(result)

    return app
Why do this instead of serving BSON or MongoDB Extended JSON?
I think serving mongo special JSON puts a burden on client applications. Most client apps will not care about using mongo objects in any complex way. If I serve extended JSON, I now have to handle it on both the server side and the client side. ObjectId and Timestamp are easier to work with as strings, and this keeps all the mongo marshalling madness quarantined to the server.
{
    "_id": "5b6b6959828619572d48a9da",
    "created_at": "2018-08-08T22:06:17Z"
}
I think this is less onerous to work with for most applications than:
{
    "_id": {"$oid": "5b6b6959828619572d48a9da"},
    "created_at": {"$date": 1533837843000}
}
For those who need to return the data through jsonify with Flask:
cursor = db.collection.find()
data = []
for doc in cursor:
    doc['_id'] = str(doc['_id'])  # This does the trick!
    data.append(doc)
return jsonify(data)
You could try:
objectid = str(ObjectId("51948e86c25f4b1d1c0d303c"))
In my case I needed something like this:
class JsonEncoder():
    def encode(self, o):
        if '_id' in o:
            o['_id'] = str(o['_id'])
        return o
This is how I've recently fixed the error:
@app.route('/')
def home():
    docs = []
    for doc in db.person.find():
        doc.pop('_id')
        docs.append(doc)
    return jsonify(docs)
I know I'm posting late but thought it would help at least a few folks!
Both of the examples mentioned by tim and defuz (which are top voted) work perfectly fine. However, there is a minute difference which could be significant at times.
The following method adds one extra nested field, which is redundant and may not be ideal in all cases:
Pymongo provides json_util - you can use that one instead to handle BSON types
Output: {
    "_id": {
        "$oid": "abc123"
    }
}
Whereas the JSONEncoder class returns a string, so we additionally need to call json.loads(output), but it produces the flat value we need:
Output: {
    "_id": "abc123"
}
Even though the first method looks simpler, both methods need very minimal effort.
I would like to provide an additional solution that improves the accepted answer. I have previously provided the answers in another thread here.
from flask import Flask
from flask.json import JSONEncoder
from bson import json_util
from . import resources
# define a custom encoder point to the json_util provided by pymongo (or its dependency bson)
class CustomJSONEncoder(JSONEncoder):
    def default(self, obj):
        return json_util.default(obj)

application = Flask(__name__)
application.json_encoder = CustomJSONEncoder

if __name__ == "__main__":
    application.run()
If you will not be needing the _id of the records, I recommend excluding it when querying the DB, which lets you print the returned records directly.
To exclude the _id when querying and then print the data in a loop, you write something like this:
records = mycollection.find(query, {'_id': 0})  # second argument {'_id': 0} excludes _id from the results
for record in records:
    print(record)
If you want to send it as a JSON response, you need to format it in two steps:
Use json_util.dumps() from bson to convert the ObjectId in the BSON response to a JSON-compatible format, i.e. "_id": {"$oid": "123456789"}
The result of json_util.dumps() is a string, so passing it through another JSON encoder adds backslashes and surrounding quotes
To get rid of the backslashes and quotes, parse it back into Python objects with json.loads() from json
from bson import json_util
import json
bson_data = [{'_id': ObjectId('123456789'), 'field': 'somedata'},{'_id': ObjectId('123456781'), 'field': 'someMoredata'}]
json_data_with_backslashes = json_util.dumps(bson_data)
# output will look like this
# "[{\"_id\": {\"$oid\": \"123456789\"}, \"field\": \"somedata\"},{\"_id\": {\"$oid\": \"123456781\"}, \"field\": \"someMoredata\"}]"
json_data = json.loads(json_data_with_backslashes)
# output will look like this
# [{"_id": {"$oid": "123456789"},"field": "somedata"},{"_id": {"$oid": "123456781"},"field": "someMoredata"}]
Flask's jsonify provides security enhancements as described in JSON Security. If a custom encoder is used with Flask, it's better to consider the points discussed in JSON Security.
If you don't want _id in response, you can refactor your code something like this:
jsonResponse = getResponse(mock_data)
del jsonResponse['_id'] # removes '_id' from the final response
return jsonResponse
This will remove the TypeError: ObjectId('') is not JSON serializable error.
from bson.objectid import ObjectId
from core.services.db_connection import DbConnectionService

class DbExecutionService:
    def __init__(self):
        self.db = DbConnectionService()

    def list(self, collection, search):
        session = self.db.create_connection(collection)
        # Convert ObjectId values to strings so each row is JSON serializable
        return list(map(lambda row: {i: str(row[i]) if isinstance(row[i], ObjectId) else row[i] for i in row}, session.find(search)))
SOLUTION for: mongoengine + marshmallow
If you use mongoengine and marshmallow then this solution might be applicable for you.
Basically, I imported the String field from marshmallow and overrode the default Schema id to be String encoded.
from marshmallow import Schema
from marshmallow.fields import String

class FrontendUserSchema(Schema):
    id = String()

    class Meta:
        fields = ("id", "email")
I am somewhat new to python and I am wondering what the best way is to generate json in a loop. I could just mash a bunch of strings together in the loop, but I'm sure there is a better way. Here's some more specifics. I am using app engine in python to create a service that returns json as a response.
So as an example, let's say someone requests a list of user records from the service. After the service queries for the records, it needs to return json for each record it found. Maybe something like this:
{records:
{record: { name:bob, email:blah@blah.com, age:25 } },
{record: { name:steve, email:blah@blahblah.com, age:30 } },
{record: { name:jimmy, email:blah@b.com, age:31 } },
}
Excuse my poorly formatted json. Thanks for your help.
Creating your own JSON is silly. Use json or simplejson for this instead.
>>> json.dumps(dict(foo=42))
'{"foo": 42}'
My question is how do I add to the dictionary dynamically? So foreach record in my list of records, add a record to the dictionary.
You may be looking to create a list of dictionaries.
records = []
record1 = {"name":"Bob", "email":"bob@email.com"}
records.append(record1)
record2 = {"name":"Bob2", "email":"bob2@email.com"}
records.append(record2)
Then in app engine, use the code above to export records as json.
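Put together, building the list in a loop and serializing it once at the end could look like the sketch below. The rows list is a hypothetical stand-in for whatever the datastore query returns, and the field names follow the example in the question.

```python
import json

# Hypothetical rows standing in for the records returned by a query.
rows = [
    ("bob", "bob@example.com", 25),
    ("steve", "steve@example.com", 30),
    ("jimmy", "jimmy@example.com", 31),
]

# Build plain Python structures in the loop...
records = []
for name, email, age in rows:
    records.append({"name": name, "email": email, "age": age})

# ...and let the json module produce the string in one step.
payload = json.dumps({"records": records})
```

Keeping the data as lists and dictionaries until the last moment avoids hand-built strings and the quoting and escaping mistakes that come with them.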
Few steps here.
First import simplejson
from django.utils import simplejson
Then create a function that will return json with the appropriate data header.
def write_json(self, data):
    self.response.headers['Content-Type'] = 'application/json'
    self.response.out.write(simplejson.dumps(data))
Then from within your post or get handler, create a python dictionary with the desired data and pass that into the function you created.
ret = {"records":{
    "record": {"name": "bob", ...}
    ...
}}
write_json(self, ret)