Does boto.dynamodb2 support storing a dict of dicts? - python

Solution: Updating to the brand-new boto 2.35.2 fixed the problem.
How can I store a dict of dicts in DynamoDB using boto?
The straightforward approach that I've been trying doesn't seem to work. Trying to save an item defined this way:
data = {
    'id': '123456',
    'foo': {'hello': 'world'}
}
item = Item(my_table, data=data)
item.save(overwrite=True)
generates this exception:
TypeError: Unsupported type "<type 'dict'>" for value "{'hello': 'world'}"
I've seen conflicting information on the web about whether this is supported. I can't get it to work; I'm using boto 2.35.1.
Here's a complete example that demonstrates the problem:
import boto.dynamodb2
from boto.dynamodb2.fields import HashKey
from boto.dynamodb2.table import Table
from boto.dynamodb2.items import Item

conn = boto.dynamodb2.connect_to_region('us-east-1')
my_table = Table.create('my_table',
                        connection=conn,
                        schema=[
                            HashKey('id')
                        ])
my_table = Table('my_table')
data = {
    'id': '123456',
    'foo': {'hello': 'world'}
}
item = Item(my_table, data=data)
item.save(overwrite=True)

The DynamoDB API now supports map and list objects:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataFormat.html
And it seems that the latest boto version (released yesterday, at the time I answered) added support for this too:
http://boto.readthedocs.org/en/latest/releasenotes/v2.35.2.html
but personally I haven't played with it yet.
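If that is right, upgrading should be all that is needed; here is a minimal sketch, assuming boto >= 2.35.2 serializes nested dicts as the native DynamoDB map type per the release notes above:
# pip install --upgrade "boto>=2.35.2"
import boto.dynamodb2
from boto.dynamodb2.table import Table
from boto.dynamodb2.items import Item

conn = boto.dynamodb2.connect_to_region('us-east-1')
my_table = Table('my_table', connection=conn)

# The nested dict should now be stored as a DynamoDB map ("M") type
data = {
    'id': '123456',
    'foo': {'hello': 'world'}
}
item = Item(my_table, data=data)
item.save(overwrite=True)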


PyMongo Atlas Search not returning anything

I'm trying to do a full text search using Atlas for MongoDB. I'm doing this through the PyMongo driver in Python. I'm using the aggregate pipeline, and doing a $search but it seems to return nothing.
cursor = db.collection.aggregate([
    {"$search": {"text": {"query": "hello", "path": "text_here"}}},
    {"$project": {"file_name": 1}}
])
for x in cursor:
    print(x)
What I'm trying to achieve with this code is to search through a field in the collection called "text_here" for the term "hello", returning all the documents that contain that term and listing them by their "file_name". However, it returns nothing, which is confusing because this is almost identical to the example code on the documentation website. The only thing I can think of right now is that possibly the path isn't correct and it can't access the field I've specified. Also, this code raises no errors; it simply returns nothing, as I've confirmed by looping through the cursor.
I had the same issue. I solved it by also passing the name of the index in the query. For example:
{
    index: "name_of_the_index",
    text: {
        query: 'john doe',
        path: 'name'
    }
}
I followed the tutorials but couldn't get any result back without specifying the "index" name. I wish this was mentioned in the documentation as mandatory.
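Applied to the question's pipeline, that fix would look something like this in PyMongo (a sketch; "name_of_the_index" stands in for whatever your Atlas Search index is actually called):
cursor = db.collection.aggregate([
    {"$search": {
        "index": "name_of_the_index",  # must match the Atlas Search index name
        "text": {"query": "hello", "path": "text_here"}
    }},
    {"$project": {"file_name": 1}}
])
for x in cursor:
    print(x)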
If you are only doing a find and project, you don't need an aggregate query, just a find(). The syntax you want is:
db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
Equivalent using aggregate:
cursor = db.collection.aggregate([
    {'$match': {'$text': {'$search': 'hello'}}},
    {'$project': {'file_name': 1}}])
Worked example:
from pymongo import MongoClient, TEXT

db = MongoClient()['mydatabase']
db.collection.create_index([('text_here', TEXT)])
db.collection.insert_one({"text_here": "hello, is it me you're looking for", "file_name": "foo.bar"})

cursor = db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
for item in cursor:
    print(item)
prints:
{'_id': ObjectId('5fc81ce9a4a46710459de610'), 'file_name': 'foo.bar'}

Flask-SQLAlchemy ORM/GeoAlchemy2 results to a dictionary and ultimately JSON

I am using Flask/SQLAlchemy to create a web app with a map in it, so naturally I'm using a PostGIS database. The geom column requires an ST_Transform and somehow I need to turn this column and all others into JSON. The general structure of the database is:
from app import login, db
from datetime import datetime
from geoalchemy2 import Geometry
from time import time
from flask import current_app
from sqlalchemy import func

class Streets(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    street = db.Column(db.String(50))
    geom = db.Column(Geometry(geometry_type='LINESTRING'))

    def to_dict(self):
        data = {
            'id': self.id,
            'street': self.street,
            '_geom': func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326))
        }
        return data
My API route turns this result into a JSON response:
return jsonify(Streets.query.get_or_404(id).to_dict())
But I keep getting this error: NameError: name 'ST_AsGeoJSON' is not defined
I also tried to create my _geom value like this:
data['_geom'] = db.session.query(func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326)))
The error message is: TypeError: Object of type 'BaseQuery' is not JSON serializable
Finally, I tried an api route like this:
data = Streets.to_dict(
    db.session.query(
        func.ST_AsGeoJSON(
            func.ST_Transform(
                Streets.geom, 4326
            )
        )
    )
    .filter(Streets.id==id))
return jsonify(data)
And I get a different error:
AttributeError: 'BaseQuery' object has no attribute 'id'
If I run this in flask shell it works:
streets = db.session.query(
    Streets.id,
    Streets.street,
    func.ST_AsGeoJSON(func.ST_Transform(Streets.geom, 4326)))
How can I perform ST_Transform and get JSON to my API route?
UPDATE
I found this in the SQLAlchemy documentation that got me some progress: "orm.column_property() can be used to map a SQL expression". So I tried adding this to my class Streets(db.Model):
coords = db.column_property(func.ST_AsGeoJSON(func.ST_Transform(geom, 4326)))
Then I add it to data like this:
def to_dict(self):
    data = {
        'id': self.id,
        'street': self.street,
        'coords': self.coords
    }
    return data
But now I'm double encoding my results, once into GeoJSON and then I jsonify it:
return jsonify(Streets.query.get_or_404(id).to_dict())
So my API inserts backslashes:
{"coords": "{\"type\":\"MultiLineString\",\"coordinates\":[[[-80.8357132798193,35.2260689001034],[-80.8347602582754,35.2252424284259]]]}"}
And using ST_AsText just turns it into text:
{"coords": "MULTILINESTRING((-80.8357132798193 35.2260689001034,-80.8347602582754 35.2252424284259))"}
I think I'm close with this update, but does anyone have a suggestion for getting correct GeoJSON with the JSON of the other fields of my database?
The first error
NameError: name 'ST_AsGeoJSON' is not defined
means that your example code is not what you were actually using; you had forgotten to access it through func. It would not work after fixing that either, since you'd be mixing the SQL world and the Python world: func.ST_AsGeoJSON(...) creates an SQL function expression object that is supposed to be compiled to SQL and sent to the DB in a query, not passed to jsonify().
The second error
TypeError: Object of type 'BaseQuery' is not JSON serializable
should be somewhat obvious.
data['_geom'] = db.session.query(func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326)))
creates a Query, and a too broad query at that, since you've not limited it to fetch data of the current object. The Query object is not JSON serializable.
In
data = Streets.to_dict(db.session.query(...)...)
you pass the Query object as self to Streets.to_dict(), which then tries to access its id attribute in
'id': self.id,
which fails for obvious reasons – namely passing an unrelated object as the instance to a method.
The column_property() approach produces the doubly encoded JSON because SQLAlchemy has no way of knowing that ST_AsGeoJSON returns JSON, so it treats the returned value as plain text. Try decoding it manually in between:
import json

def to_dict(self):
    data = {
        'id': self.id,
        'street': self.street,
        'coords': json.loads(self.coords)
    }
    return data
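Putting the pieces together, here is a minimal sketch of the model, combining the column_property from the question's update with the decoding step above (assuming the same Flask-SQLAlchemy setup):
import json
from geoalchemy2 import Geometry
from sqlalchemy import func

class Streets(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    street = db.Column(db.String(50))
    geom = db.Column(Geometry(geometry_type='LINESTRING'))
    # Computed in SQL when the row is loaded; the DB returns a JSON *string*
    coords = db.column_property(func.ST_AsGeoJSON(func.ST_Transform(geom, 4326)))

    def to_dict(self):
        return {
            'id': self.id,
            'street': self.street,
            # Decode here so jsonify() emits nested JSON instead of an escaped string
            'coords': json.loads(self.coords)
        }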

RavenDB Object properly saved but some empty attributes when querying

I'm currently trying to save some Python objects (websites) via PyRavenDB in a RavenDB database. The problem is that the data is saved properly, but when I test it by querying the results, some of the attributes are returned empty.
The code is simple; I can't find the problem.
The JSON object in the database is the following (verified via the DB web UI).
{
    "htmlCode": "<code>TEST HTML</code>",
    "added": "2017-02-21",
    "uniqueid": "262e4584f3e546afa2c67045a0096b54",
    "url": "www.example.com",
    "myHash": "d41d8cd98f00b204e9800998ecf8427e",
    "lastaccessed": "2017-02-21"
}
When I use this code to query
from pyravendb.store import document_store

store = document_store.documentstore(url="http://somewhere:someport", database="websites")
store.initialize()
with store.open_session() as session:
    query_result = list(session.query().where_equals("www.example.com", url))
    print query_result
    print type(query_result)
    return query_result
It returns this object:
{
    'uniqueid': 'f942e86f965d4709a2d69caca3001f2a',
    'url': '',
    'myHash': 'd41d8cd98f00b204e9800998ecf8427e',
    'htmlCode': '',
    'added': '2017-02-21',
    'lastaccessed': '2017-02-21'
}
As you can see, url and htmlCode are empty. They shouldn't be, since they are properly stored in the DB.
Thanks.
The problem here is that you aren't using where_equals correctly.
The first argument of where_equals is the field name you want to query on, followed by the value (def where_equals(self, field_name, value)).
Just change this line:
query_result = list(session.query().where_equals("www.example.com", url))
to this:
query_result = list(session.query().where_equals("url", "www.example.com"))
This will fix your problem.
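In context, the corrected query would look something like this (a sketch reusing the session setup from the question):
from pyravendb.store import document_store

store = document_store.documentstore(url="http://somewhere:someport", database="websites")
store.initialize()
with store.open_session() as session:
    # field name first, then the value to match against
    query_result = list(session.query().where_equals("url", "www.example.com"))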

TypeError: ObjectId('') is not JSON serializable

My response back from MongoDB, after querying an aggregation function on a document using Python, returns a valid response; I can print it but cannot return it.
Error:
TypeError: ObjectId('51948e86c25f4b1d1c0d303c') is not JSON serializable
Print:
{'result': [{'_id': ObjectId('51948e86c25f4b1d1c0d303c'), 'api_calls_with_key': 4, 'api_calls_per_day': 0.375, 'api_calls_total': 6, 'api_calls_without_key': 2}], 'ok': 1.0}
But when I try to return it:
TypeError: ObjectId('51948e86c25f4b1d1c0d303c') is not JSON serializable
It is a RESTful call:
@appv1.route('/v1/analytics')
def get_api_analytics():
    # get handle to collections in MongoDB
    statistics = sldb.statistics
    objectid = ObjectId("51948e86c25f4b1d1c0d303c")
    analytics = statistics.aggregate([
        {'$match': {'owner': objectid}},
        {'$project': {'owner': "$owner",
                      'api_calls_with_key': {'$cond': [{'$eq': ["$apikey", None]}, 0, 1]},
                      'api_calls_without_key': {'$cond': [{'$ne': ["$apikey", None]}, 0, 1]}
                      }},
        {'$group': {'_id': "$owner",
                    'api_calls_with_key': {'$sum': "$api_calls_with_key"},
                    'api_calls_without_key': {'$sum': "$api_calls_without_key"}
                    }},
        {'$project': {'api_calls_with_key': "$api_calls_with_key",
                      'api_calls_without_key': "$api_calls_without_key",
                      'api_calls_total': {'$add': ["$api_calls_with_key", "$api_calls_without_key"]},
                      'api_calls_per_day': {'$divide': [{'$add': ["$api_calls_with_key", "$api_calls_without_key"]}, {'$dayOfMonth': datetime.now()}]},
                      }}
    ])
    print(analytics)
    return analytics
The db is well connected and the collection is there too, and I get back the valid expected result, but when I try to return it, it gives me a JSON error. Any idea how to convert the response back into JSON? Thanks.
Pymongo provides json_util - you can use that one instead to handle BSON types:
import json
from bson import json_util

def parse_json(data):
    return json.loads(json_util.dumps(data))
You should define your own JSONEncoder and use it:
import json
from bson import ObjectId

class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, ObjectId):
            return str(o)
        return json.JSONEncoder.default(self, o)

JSONEncoder().encode(analytics)
It's also possible to use it in the following way:
json.dumps(analytics, cls=JSONEncoder)
>>> from bson import Binary, Code
>>> from bson.json_util import dumps
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }")},
...        {'bin': Binary(b"\x01\x02\x03\x04")}])
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }", "$scope": {}}}, {"bin": {"$binary": "AQIDBA==", "$type": "00"}}]'
Actual example from json_util.
Unlike Flask's jsonify, "dumps" will return a string, so it cannot be used as a 1:1 replacement of Flask's jsonify.
But this question shows that we can serialize using json_util.dumps(), convert back to dict using json.loads() and finally call Flask's jsonify on it.
Example (derived from previous question's answer):
from bson import json_util, ObjectId
import json

# Let's create some dummy document to prove it will work
page = {'foo': ObjectId(), 'bar': [ObjectId(), ObjectId()]}

# Dump loaded BSON to a valid JSON string and reload it as a dict
page_sanitized = json.loads(json_util.dumps(page))
return page_sanitized
This solution will convert ObjectId and others (i.e. Binary, Code, etc.) to a string equivalent such as "$oid".
JSON output would look like this:
{
    "_id": {
        "$oid": "abc123"
    }
}
Most users who receive the "not JSON serializable" error simply need to specify default=str when using json.dumps. For example:
json.dumps(my_obj, default=str)
This will force a conversion to str, preventing the error. Of course then look at the generated output to confirm that it is what you need.
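For instance, applied to a document like the one in the question (a sketch; str() renders an ObjectId as its hex string):
import json
from bson import ObjectId

doc = {'_id': ObjectId('51948e86c25f4b1d1c0d303c'), 'api_calls_total': 6}
print(json.dumps(doc, default=str))
# {"_id": "51948e86c25f4b1d1c0d303c", "api_calls_total": 6}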
from bson import json_util
import json

@app.route('/')
def index():
    for doc in db["collection_name"].find():
        return json.dumps(doc, indent=4, default=json_util.default)
This is a sample example of converting BSON into a JSON object. You can try it.
As a quick replacement, you can change {'owner': objectid} to {'owner': str(objectid)}.
But defining your own JSONEncoder is a better solution, it depends on your requirements.
Posting here as I think it may be useful for people using Flask with pymongo. This is my current "best practice" setup for allowing Flask to marshal pymongo BSON data types.
mongoflask.py
from datetime import datetime, date

import isodate as iso
from bson import ObjectId
from flask.json import JSONEncoder
from werkzeug.routing import BaseConverter

class MongoJSONEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, (datetime, date)):
            return iso.datetime_isoformat(o)
        if isinstance(o, ObjectId):
            return str(o)
        else:
            return super().default(o)

class ObjectIdConverter(BaseConverter):
    def to_python(self, value):
        return ObjectId(value)

    def to_url(self, value):
        return str(value)
app.py
from flask import Flask, jsonify

from .mongoflask import MongoJSONEncoder, ObjectIdConverter

def create_app():
    app = Flask(__name__)
    app.json_encoder = MongoJSONEncoder
    app.url_map.converters['objectid'] = ObjectIdConverter

    # Client sends their string, we interpret it as an ObjectId
    @app.route('/users/<objectid:user_id>')
    def show_user(user_id):
        # setup not shown, pretend this gets us a pymongo db object
        db = get_db()
        # user_id is a bson.ObjectId ready to use with pymongo!
        result = db.users.find_one({'_id': user_id})
        # And jsonify returns normal looking json!
        # {"_id": "5b6b6959828619572d48a9da",
        #  "name": "Will",
        #  "birthday": "1990-03-17T00:00:00Z"}
        return jsonify(result)

    return app
Why do this instead of serving BSON or mongod extended JSON?
I think serving mongo-special JSON puts a burden on client applications. Most client apps will not care about using mongo objects in any complex way. If I serve extended JSON, now I have to use it server side and client side. ObjectId and Timestamp are easier to work with as strings, and this keeps all the mongo marshalling madness quarantined to the server.
{
    "_id": "5b6b6959828619572d48a9da",
    "created_at": "2018-08-08T22:06:17Z"
}
I think this is less onerous to work with for most applications than:
{
    "_id": {"$oid": "5b6b6959828619572d48a9da"},
    "created_at": {"$date": 1533837843000}
}
For those who need to return the data through jsonify with Flask:
cursor = db.collection.find()
data = []
for doc in cursor:
    doc['_id'] = str(doc['_id'])  # This does the trick!
    data.append(doc)
return jsonify(data)
You could try:
objectid = str(ObjectId("51948e86c25f4b1d1c0d303c"))
In my case I needed something like this:
class JsonEncoder():
    def encode(self, o):
        if '_id' in o:
            o['_id'] = str(o['_id'])
        return o
This is how I've recently fixed the error:
@app.route('/')
def home():
    docs = []
    for doc in db.person.find():
        doc.pop('_id')
        docs.append(doc)
    return jsonify(docs)
I know I'm posting late, but I thought it would help at least a few folks!
Both the examples mentioned by tim and defuz (which are top voted) work perfectly fine. However, there is a minute difference which could be significant at times.
The following method adds one extra field which is redundant and may not be ideal in all cases:
Pymongo provides json_util - you can use that one instead to handle BSON types
Output:
{
    "_id": {
        "$oid": "abc123"
    }
}
Whereas the JSONEncoder class gives the same output as a string, and we additionally need to use json.loads(output) on it. But it leads to:
Output:
{
    "_id": "abc123"
}
Even though the first method looks simple, both methods need very minimal effort.
I would like to provide an additional solution that improves on the accepted answer. I previously provided this answer in another thread here.
from flask import Flask
from flask.json import JSONEncoder
from bson import json_util
from . import resources

# define a custom encoder pointing to the json_util provided by pymongo (or its dependency bson)
class CustomJSONEncoder(JSONEncoder):
    def default(self, obj):
        return json_util.default(obj)

application = Flask(__name__)
application.json_encoder = CustomJSONEncoder

if __name__ == "__main__":
    application.run()
If you will not be needing the _id of the records, I recommend unsetting it when querying the DB, which will enable you to print the returned records directly. For example, to unset the _id when querying and then print the data in a loop, you would write something like this:
records = mycollection.find(query, {'_id': 0})  # the second argument {'_id': 0} unsets the id in the query
for record in records:
    print(record)
If you want to send it as a JSON response, you need to format it in two steps:
1. Use json_util.dumps() from bson to convert the ObjectId in the BSON response to a JSON-compatible format, i.e. "_id": {"$oid": "123456789"}
2. The JSON response obtained from json_util.dumps() will have backslashes and quotes; to remove them, use json.loads() from json
from bson import json_util, ObjectId
import json

bson_data = [{'_id': ObjectId('123456789'), 'field': 'somedata'}, {'_id': ObjectId('123456781'), 'field': 'someMoredata'}]
json_data_with_backslashes = json_util.dumps(bson_data)
# output will look like this:
# "[{\"_id\": {\"$oid\": \"123456789\"}, \"field\": \"somedata\"}, {\"_id\": {\"$oid\": \"123456781\"}, \"field\": \"someMoredata\"}]"
json_data = json.loads(json_data_with_backslashes)
# output will look like this:
# [{"_id": {"$oid": "123456789"}, "field": "somedata"}, {"_id": {"$oid": "123456781"}, "field": "someMoredata"}]
Flask's jsonify provides a security enhancement, as described in JSON Security. If a custom encoder is used with Flask, it's better to consider the points discussed in JSON Security.
If you don't want _id in the response, you can refactor your code something like this:
jsonResponse = getResponse(mock_data)
del jsonResponse['_id'] # removes '_id' from the final response
return jsonResponse
This will remove the TypeError: ObjectId('') is not JSON serializable error.
from bson.objectid import ObjectId
from core.services.db_connection import DbConnectionService

class DbExecutionService:
    def __init__(self):
        self.db = DbConnectionService()

    def list(self, collection, search):
        session = self.db.create_connection(collection)
        # stringify ObjectId values so each row is JSON serializable
        return list(map(lambda row: {i: str(row[i]) if isinstance(row[i], ObjectId) else row[i] for i in row}, session.find(search)))
SOLUTION for: mongoengine + marshmallow
If you use mongoengine and marshmallow, then this solution might be applicable for you.
Basically, I imported the String field from marshmallow and overrode the default Schema id to be String encoded.
from marshmallow import Schema
from marshmallow.fields import String

class FrontendUserSchema(Schema):
    id = String()

    class Meta:
        fields = ("id", "email")
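A hypothetical usage sketch (FrontendUser is an assumed mongoengine Document; with marshmallow 3.x, dump() returns a plain dict):
# 'FrontendUser' is an assumed mongoengine Document with id and email fields
user = FrontendUser.objects.get(email="jane@example.com")
data = FrontendUserSchema().dump(user)
# data is now JSON serializable, e.g. {"id": "5b6b6959828619572d48a9da", "email": "jane@example.com"}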

Removing _id element from Pymongo results

I'm attempting to create a web service using MongoDB and Flask (using the pymongo driver). A query to the database returns documents with the "_id" field included, of course. I don't want to send this to the client, so how do I remove it?
Here's a Flask route:
@app.route('/theobjects')
def index():
    objects = db.collection.find()
    return str(json.dumps({'results': list(objects)},
                          default=json_util.default,
                          indent=4))
This returns:
{
    "results": [
        {
            "whatever": {
                "field1": "value",
                "field2": "value",
            },
            "whatever2": {
                "field3": "value"
            },
            ...
            "_id": {
                "$oid": "..."
            },
            ...
        }
    ]
}
I thought it was a dictionary and I could just delete the element before returning it:
del objects['_id']
But that returns a TypeError:
TypeError: 'Cursor' object does not support item deletion
So it isn't a dictionary, but something I have to iterate over with each result as a dictionary. So I try to do that with this code:
for object in objects:
    del object['_id']
Each object dictionary looks the way I'd like it to now, but the objects cursor is empty. So I try to create a new dictionary and after deleting _id from each, add to a new dictionary that Flask will return:
new_object = {}
for object in objects:
    for key, item in object.items():
        if key == '_id':
            del object['_id']
    new_object.update(object)
This just returns a dictionary with the first-level keys and nothing else.
So this is sort of a standard nested dictionaries problem, but I'm also shocked that MongoDB doesn't have a way to easily deal with this.
The MongoDB documentation explains that you can exclude _id with
{ _id : 0 }
But that does nothing with pymongo. The Pymongo documentation explains that you can list the fields you want returned, but "(“_id” will always be included)". Seriously? Is there no way around this? Is there something simple and stupid that I'm overlooking here?
To exclude the _id field in a find query in pymongo, you can use:
db.collection.find({}, {'_id': False})
The documentation is somewhat misleading on this, as it says the _id field is always included. But you can exclude it as shown above.
The above answer fails if we want specific fields while still ignoring _id. Note that the projection is a separate, second argument to find(). Use the following in such cases:
db.collection.find({}, {'required_column_A': 1, 'required_col_B': 1, '_id': False})
You are calling
del objects['_id']
on the cursor object!
The cursor object is obviously an iterable over the result set and not a single document that you can manipulate.
for obj in objects:
    del obj['_id']
is likely what you want.
So your claim is completely wrong, as the following code shows:
import pymongo

c = pymongo.Connection()
db = c['mydb']
db.foo.remove({})
db.foo.save({'foo': 42})
for row in db.foo.find():
    del row['_id']
    print row

$ bin/python foo.py
{u'foo': 42}
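For reference, the same demonstration with a modern pymongo client and Python 3 (a sketch: MongoClient replaced the deprecated Connection, and a projection can exclude _id server-side so no deletion is needed):
import pymongo

client = pymongo.MongoClient()
db = client['mydb']
db.foo.delete_many({})
db.foo.insert_one({'foo': 42})

# Exclude _id in the projection instead of deleting it client-side
for row in db.foo.find({}, {'_id': False}):
    print(row)  # {'foo': 42}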
