Marshmallow: Dict of nested Schema - python

I'm wondering how to serialize a dict of nested Schema.
Naively, I would expect syntaxes like this to work:
fields.List(Schema)
fields.Dict(Schema)
or maybe
fields.List(fields.Nested(Schema))
fields.Dict(fields.Nested(Schema))
Serializing a list of Schema can be achieved through Nested(Schema, many=True), but I don't know about a dict of Schema.
Assume, for example's sake, that my object is defined like this:
from marshmallow import Schema, fields, pprint
class AlbumSchema(Schema):
    year = fields.Int()
class ArtistSchema(Schema):
    name = fields.Str()
    # What should I write, here?
    # This won't work
    albums = fields.Nested(AlbumSchema(), many=True)
    # If I write this, AlbumSchema is ignored, so this is equivalent to
    albums = fields.Dict(AlbumSchema(), many=True)
    # this, which is not satisfying (AlbumSchema unused)
    albums = fields.Dict()
    # This is not the way either
    albums = fields.Dict(fields.Nested(AlbumSchema))
album_1 = dict(year=1971)
album_2 = dict(year=1970)
bowie = dict(name='David Bowie',
             albums={
                 'Hunky Dory': album_1,
                 'The Man Who Sold the World': album_2
             }
)
schema = ArtistSchema()
result = schema.dump(bowie)
pprint(result.data, indent=2)
I expect my object to be serialized as
{ 'albums': { 'Hunky Dory': {'year': 1971},
              'The Man Who Sold the World': {'year': 1970}},
  'name': 'David Bowie'}
(Question also discussed on GitHub.)

This is not possible right now, but it is a feature request:
https://github.com/marshmallow-code/marshmallow/issues/483
https://github.com/marshmallow-code/marshmallow/issues/496
and it has been worked on already:
https://github.com/marshmallow-code/marshmallow/compare/dev...deckar01:483-structured-dict
2017-12-31: This feature was added to Marshmallow 3.0.0b5 (https://github.com/marshmallow-code/marshmallow/pull/700).

Related

How do I produce nested JSON from database query with joins? Using Python / SQLAlchemy

I have a specific use case, but my question pertains to the best way of doing this in general.
I have three tables
Order - primary key order_id
OrderLine - Linking table with order_id, product_id and quantity. An order has 1 or more order lines
Product - primary key product_id, each order line has one product
In SQLAlchemy / Python, how do I generate nested JSON along the lines of:
{
  "orders": [
    {
      "order_id": 1,
      "some_order_level_detail": "Kansas",
      "order_lines": [
        {
          "product_id": 1,
          "product_name": "Clawhammer",
          "quantity": 5
        },
        ...
      ]
    },
    ...
  ]
}
Potential Ideas
Hack away doing successive queries
The first idea, which I want to get away from if possible, is using list comprehensions and a brute-force approach.
def get_json():
    answer = {
        "orders": [
            {
                "order_id": o.order_id,
                "some_order_level_detail": o.some_order_level_detail,
                "order_lines": [
                    {
                        "product_id": o_line.product_id,
                        "product_name": Product.query.get(o_line.product_id).product_name,
                        "quantity": o_line.quantity
                    }
                    for o_line in OrderLine.query.filter_by(order_id=o.order_id).all()
                ]
            }
            for o in Order.query.all()
        ]
    }
    return answer
This gets hard to maintain mixing the queries with json. Ideally I'd like to do a query first...
Get joined results first, somehow manipulate later
The second idea is to do a join query to join the three tables showing per row in OrderLine the order and product details.
My question to the Pythonistas out there: is there a nice way to convert this to nested JSON?
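A minimal sketch of this second idea, using only the standard library. It assumes the join has already been run and yields one flat row per order line; the rows below are hypothetical stand-ins for that result, and nest_orders is an illustrative helper name, not part of SQLAlchemy.

```python
import json


def nest_orders(rows):
    """Group flat (order, order line, product) rows into nested dicts."""
    orders = {}  # order_id -> order dict; dicts keep insertion order
    for row in rows:
        # Create the order entry the first time its id is seen.
        order = orders.setdefault(
            row["order_id"],
            {
                "order_id": row["order_id"],
                "some_order_level_detail": row["some_order_level_detail"],
                "order_lines": [],
            },
        )
        order["order_lines"].append(
            {
                "product_id": row["product_id"],
                "product_name": row["product_name"],
                "quantity": row["quantity"],
            }
        )
    return {"orders": list(orders.values())}


# Hypothetical rows, shaped like the result of joining the three tables.
rows = [
    {"order_id": 1, "some_order_level_detail": "Kansas",
     "product_id": 1, "product_name": "Clawhammer", "quantity": 5},
    {"order_id": 1, "some_order_level_detail": "Kansas",
     "product_id": 2, "product_name": "Anvil", "quantity": 1},
    {"order_id": 2, "some_order_level_detail": "Ohio",
     "product_id": 1, "product_name": "Clawhammer", "quantity": 2},
]

nested = nest_orders(rows)
print(json.dumps(nested, indent=2))
```

This keeps the query and the JSON shaping separate: a single joined query feeds the function, and the nesting logic can be tested without a database.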
Another way?
This seems like such a common requirement that I'm wondering whether there is a standard, by-the-book method for this sort of thing.
Is there an SQLAlchemy version of this?
Look into marshmallow-sqlalchemy, as it does exactly what you're looking for.
I strongly advise against baking serialization directly into your models. Eventually you will have two services requesting the same data serialized in different ways (for instance, including fewer or more nested relationships for performance). You will then end up with either (1) a lot of bugs that your test suite will miss unless you check literally every field, or (2) more data serialized than you need, which leads to performance issues as the complexity of your application scales.
With marshmallow-sqlalchemy, you'll need to define a schema for each model you'd like to serialize. Yes, it's a bit of extra boilerplate, but believe me - you will be much happier in the end.
We build applications using flask-sqlalchemy and marshmallow-sqlalchemy like this (I also highly recommend factory_boy, so that you can mock your service and write unit tests in place of integration tests that need to touch the database):
# models
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", back_populates="parent")
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship('Parent', back_populates='children',
                          foreign_keys=[parent_id])
# schemas. Don't put these in your models. Avoid tight coupling here
from marshmallow_sqlalchemy import ModelSchema
import marshmallow as ma
class ParentSchema(ModelSchema):
    children = ma.fields.Nested(
        'myapp.schemas.child.Child', exclude=('parent',), many=True)
    class Meta(ModelSchema.Meta):
        model = Parent
        strict = True
        dump_only = ('id',)
class ChildSchema(ModelSchema):
    parent = ma.fields.Nested(
        'myapp.schemas.parent.Parent', exclude=('children',))
    class Meta(ModelSchema.Meta):
        model = Child
        strict = True
        dump_only = ('id',)
# services
class ParentService:
    '''
    This service intended for use exclusively by /api/parent
    '''
    def __init__(self, params, _session=None):
        # your unit tests can pass in _session=MagicMock()
        self.session = _session or db.session
        self.params = params
    def _parents(self) -> typing.List[Parent]:
        return self.session.query(Parent).options(
            joinedload(Parent.children)
        ).all()
    def get(self):
        schema = ParentSchema(only=(
            # highly recommend specifying every field explicitly
            # rather than implicit
            'id',
            'children.id',
        ))
        return schema.dump(self._parents()).data
# views
@app.route('/api/parent')
def get_parents():
    service = ParentService(params=request.get_json())
    return jsonify(data=service.get())
# test factories
class ModelFactory(SQLAlchemyModelFactory):
    class Meta:
        abstract = True
        sqlalchemy_session = db.session
class ParentFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    children = factory.SubFactory('tests.factory.children.ChildFactory')
class ChildFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    parent = factory.SubFactory('tests.factory.parent.ParentFactory')
# tests
from unittest.mock import MagicMock, patch
def test_can_serialize_parents():
    parents = ParentFactory.build_batch(4)
    session = MagicMock()
    service = ParentService(params={}, _session=session)
    assert service.session is session
    with patch.object(service, '_parents') as _parents:
        _parents.return_value = parents
        assert service.get()[0]['id'] == parents[0].id
        assert service.get()[1]['id'] == parents[1].id
        assert service.get()[2]['id'] == parents[2].id
        assert service.get()[3]['id'] == parents[3].id
I would add a .json() method to each model, so that they call each other. It's essentially your "hacked" solution but a bit more readable/maintainable. Your Order model could have:
def json(self):
    return {
        "id": self.id,
        "order_lines": [line.json() for line in self.order_lines]
    }
Your OrderLine model could have:
def json(self):
    return {
        "product_id": self.product_id,
        "product_name": self.product.name,
        "quantity": self.quantity
    }
Your resource at the top level (where you're making the request for orders) could then do:
...
orders = Order.query.all()
return {"orders": [order.json() for order in orders]}
...
This is how I normally structure this JSON requirement.
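The pattern can be tried without a database. The sketch below uses plain dataclasses as hypothetical stand-ins for the SQLAlchemy models, so the class definitions and sample data are assumptions; only the json() methods mirror the approach described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    id: int
    name: str


@dataclass
class OrderLine:
    product: Product
    quantity: int

    def json(self):
        # Each model knows how to serialize itself.
        return {
            "product_id": self.product.id,
            "product_name": self.product.name,
            "quantity": self.quantity,
        }


@dataclass
class Order:
    id: int
    order_lines: List[OrderLine] = field(default_factory=list)

    def json(self):
        # Delegates to the json() method of each related order line.
        return {
            "id": self.id,
            "order_lines": [line.json() for line in self.order_lines],
        }


hammer = Product(id=1, name="Clawhammer")
order = Order(id=1, order_lines=[OrderLine(product=hammer, quantity=5)])
payload = {"orders": [order.json()]}
print(payload)
```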
Check my answer in the thread "Flask Sqlalchmey - Marshmallow Nested Schema fails for joins with filter ( where ) conditions". Using the Marshmallow package, you include something like this in your schema:
name = fields.Nested(Schema, many=True)

Python mongoengine select_related(n) not doing what I expected

I have an object stored in mongo that has a list of reference fields. In a restplus app I need to parse this list of objects and map them into a JSON doc to return for a client.
# Classes I have saved in Mongo
class ThingWithList(Document):
    list_of_objects = ListField(ReferenceField(InfoHolder))
class InfoHolder(Document):
    thing_id = StringField()
    thing_i_care_about = ReferenceField(Info)
class Info(Document):
    name = StringField()
    foo = StringField()
    bar = StringField()
I am finding iterating through the list to be very slow. I guess because I am having to do another database query every time I dereference children of objects in the list.
Simple (but rubbish) method:
info_to_return = []
thing = ThingWithList.get_from_id('thingsId')
for o in thing.list_of_objects:
    info = {
        'id': o.id,
        'name': o.thing_i_care_about.name,
        'foo': o.thing_i_care_about.foo,
        'bar': o.thing_i_care_about.bar
    }
    info_to_return.append(info)
return info_to_return
I thought I would be able to solve this by using select_related which sounds like it should do the dereferencing for me N levels deep so that I only do one big mongo call rather than several per iteration. When I add
thing.select_related(3)
it seems to have no effect. Have I just misunderstood what this function is for? How else could I speed up my query?

Is there a way to check whether a related object is already fetched?

I would like to be able to check if a related object has already been fetched by using either select_related or prefetch_related, so that I can serialize the data accordingly. Here is an example:
class Address(models.Model):
    street = models.CharField(max_length=100)
    zip = models.CharField(max_length=10)
class Person(models.Model):
    name = models.CharField(max_length=20)
    address = models.ForeignKey(Address)
def serialize_address(address):
    return {
        "id": address.id,
        "street": address.street,
        "zip": address.zip
    }
def serialize_person(person):
    result = {
        "id": person.id,
        "name": person.name
    }
    if is_fetched(person.address):
        result["address"] = serialize_address(person.address)
    else:
        result["address"] = None
    return result
######
person_a = Person.objects.select_related("address").get(id=1)
person_b = Person.objects.get(id=2)
serialize_person(person_a) #should be object with id, name and address
serialize_person(person_b) #should be object with only id and name
In this example, the function is_fetched is what I am looking for. I would like to determine whether the person object already has a resolved address, and serialize it only if it does. If it doesn't, no further database query should be executed.
So is there a way to achieve this in Django?
Since Django 2.0 you can easily check all fetched relations with:
obj._state.fields_cache
ModelStateFieldsCacheDescriptor is responsible for storing your cached relations.
>>> Person.objects.first()._state.fields_cache
{}
>>> Person.objects.select_related('address').first()._state.fields_cache
{'address': <Address: Your Address>}
If the address relation has been fetched, then the Person object will have a populated attribute called _address_cache; you can check this.
def is_fetched(obj, relation_name):
    cache_name = '_{}_cache'.format(relation_name)
    return getattr(obj, cache_name, False)
Note you'd need to call this with the object and the name of the relation:
is_fetched(person, 'address')
since doing person.address would trigger the fetch immediately.
Edit: reverse or many-to-many relations can only be fetched by prefetch_related; that populates a single attribute, _prefetched_objects_cache, which is a dict of lists where the key is the name of the related model. E.g. if you do:
addresses = Address.objects.prefetch_related('person_set')
then each item in addresses will have a _prefetched_objects_cache dict containing a "person" key.
Note, both of these are single-underscore attributes which means they are part of the private API; you're free to use them, but Django is also free to change them in future releases.
Per this comment on the ticket linked in the comment by @jaap3 above, the recommended way to do this for Django 3+ (perhaps 2+?) is to use the undocumented is_cached method on the model's field, which comes from this internal mixin:
>>> person1 = Person.objects.first()
>>> Person.address.is_cached(person1)
False
>>> person2 = Person.objects.select_related('address').last()
>>> Person.address.is_cached(person2)
True

load a relationship in a to_json method

I have a fairly basic CRUDMixin
class CRUDMixin(object):
    """ create, read, update and delete methods for SQLAlchemy """
    id = db.Column(db.Integer, primary_key=True)
    @property
    def columns(self):
        return [ c.name for c in self.__table__.columns ]
    def read(self):
        """ return json of this current model """
        return dict([ (c, getattr(self, c)) for c in self.columns ])
    # ...
For something like an Article class which will subclass this, it might have a relationship with another class, like so:
author_id = db.Column(db.Integer, db.ForeignKey('users.id'))
The only real problem is that it will not return any user details in the json. Ideally, the json should look like this:
{
    'id': 1234,
    'title': 'this is an article',
    'body': 'Many words go here. Many shall be unread. Roman Proverb.',
    'author': {
        'id': 14,
        'name': 'Thor',
        'joined': 'October 1st, 1994'
    }
}
As it is right now, it will just give author_id: 14.
Can I detect if a column is a relationship and load it as json as well in this way?
You have to set up the relationship itself by adding something like
author = db.relationship("Author") # I assume that you have an Author model
Then, to turn the result into JSON, there are different ways to handle relations.
Take a look at these two answers:
jsonify a SQLAlchemy result set in Flask
How to serialize SqlAlchemy result to JSON?
You can also take a look at flask-restful, which provides a decorator (marshal_with) to marshal your results cleanly, including nested objects (relations).
http://flask-restful.readthedocs.org/en/latest/fields.html#advanced-nested-field
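As for detecting relationships from inside the mixin, one option is SQLAlchemy's runtime inspection API, which lists a mapper's column attributes and relationships separately. The sketch below is an assumption-laden illustration rather than a drop-in fix: it uses plain SQLAlchemy 1.4+ with an in-memory SQLite database instead of the question's flask-sqlalchemy db object, the User and Article models are hypothetical, and the depth parameter is added here only to stop recursion on circular relationships.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class CRUDMixin:
    id = Column(Integer, primary_key=True)

    def read(self, depth=1):
        """Return column values as a dict, following relationships `depth` levels."""
        mapper = inspect(type(self))
        data = {attr.key: getattr(self, attr.key) for attr in mapper.column_attrs}
        if depth > 0:
            for rel in mapper.relationships:
                related = getattr(self, rel.key)  # may trigger a lazy load
                if related is None:
                    data[rel.key] = None
                elif rel.uselist:
                    data[rel.key] = [item.read(depth - 1) for item in related]
                else:
                    data[rel.key] = related.read(depth - 1)
        return data


class User(CRUDMixin, Base):
    __tablename__ = "users"
    name = Column(String)


class Article(CRUDMixin, Base):
    __tablename__ = "articles"
    title = Column(String)
    author_id = Column(Integer, ForeignKey("users.id"))
    author = relationship("User")


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Article(title="this is an article", author=User(name="Thor")))
    session.commit()
    article_json = session.query(Article).one().read()

print(article_json)
```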

Serializing ReferenceProperty in Appengine Datastore to JSON

I am using the following code to serialize my appengine datastore to JSON
class DictModel(db.Model):
    def to_dict(self):
        return dict([(p, unicode(getattr(self, p))) for p in self.properties()])
class commonWordTweets(DictModel):
    commonWords = db.StringListProperty(required=True)
    venue = db.ReferenceProperty(Venue, required=True, collection_name='commonWords')
class Venue(db.Model):
    id = db.StringProperty(required=True)
    fourSqid = db.StringProperty(required=False)
    name = db.StringProperty(required=True)
    twitter_ID = db.StringProperty(required=True)
This returns the following JSON response
[
  {
    "commonWords": "[u'storehouse', u'guinness', u'badge', u'2011"', u'"new', u'mayor', u'dublin)']",
    "venue": "<__main__.Venue object at 0x1028ad190>"
  }
]
How can I return the actual venue name to appear?
Firstly, although it's not exactly your question, it's strongly recommended to use simplejson to produce json, rather than trying to turn structures into json strings yourself.
To answer your question, the ReferenceProperty just acts as a reference to your Venue object. So you just use its attributes as per normal.
Try something like:
cwt = commonWordTweets() # Replace with code to get the item from your datastore
d = {"commonWords":cwt.commonWords, "venue": cwt.venue.name}
jsonout = simplejson.dumps(d)
