Problem
How do I group fields together when serialising a flat-structured SQLAlchemy object with Marshmallow, without changing the flat data structure in the background?
Example
Suppose a SQLAlchemy model in a Flask app like this:
from app import db  # db is the flask-sqlalchemy instance

class Data(db.Model):
    x = db.Column(db.Float(), nullable=False)
    y = db.Column(db.Float(), nullable=False)
    d = db.Column(db.Float())
How can I serialise this object so that x and y are nested into coordinates, while maintaining a flat data structure in the background (the model)? The output should look something like this:
{
    "coordinates": {
        "x": 10.56,
        "y": 1
    },
    "d": 42.0
}
The problem arises specifically because I use the Data schema with the many=True option. The initialisation is roughly:
schema_data = DataSchema()
schema_datas = DataSchema(many=True)
Solution Candidates
So this is what I've tried so far; none of these attempts seemed to work.
Creating a second Schema
Adding a second schema and modifying the Data schema from before yields:
class CoordinatesSchema(Schema):
    x = fields.Float(required=True)
    y = fields.Float(required=True)

class DataSchema(Schema):
    coordinates = fields.Nested(CoordinatesSchema, required=True)
    d = fields.Float()
Having that in place raises the problem of having to go through every Data item and manually add the coordinates grouping. My data comes from a SQLAlchemy query that returns a list of Data objects, which I would like to dump directly using schema_datas.
Using fields.Dict
Since Marshmallow's fields module offers a dictionary field, I tried that as well.
class DataSchema(Schema):
    coordinates = fields.Dict(keys=fields.String(),
                              values=fields.Float(),
                              required=True,
                              default={
                                  "x": Data.x,
                                  "y": Data.y
                              })
    d = fields.Float()
This doesn't seem to work either, because Marshmallow can't resolve Data.x and Data.y automatically when schema_datas.dump() is called.
Using Self-Nesting
The most logical solution path would be to self-nest. But (from what I understood reading the documentation) self-nesting only refers to nesting one or more other instances within the object. I want to nest the same instance.
class DataSchema(Schema):
    x = fields.Float(required=True, load_only=True)
    y = fields.Float(required=True, load_only=True)
    coordinates = fields.Nested(
        lambda: DataSchema(only=('x', 'y')),
        dump_only=True)
But unfortunately this also didn't work, presumably because the model instance has no coordinates attribute for the Nested field to read from.
Using the @pre_dump Decorator
Inspired by this issue on Marshmallow's GitHub page, I tried to use the @pre_dump decorator to achieve the desired outcome, but failed again.
class CoordinatesSchema(Schema):
    x = fields.Float(required=True)
    y = fields.Float(required=True)

class DataSchema(Schema):
    coordinates = fields.Nested(CoordinatesSchema, required=True)
    d = fields.Float()

    @pre_dump
    def group_coordinates(self, data, many):
        return {
            "coordinates": {
                "x": data.x,
                "y": data.y
            },
            "d": data.d
        }
But I can't figure out how to do it properly...
So my question is: what am I doing wrong, and how can I solve this problem?
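For reference, here is a self-contained sketch of that @pre_dump approach which should produce the desired output (assuming marshmallow 3, where the hook runs once per item even when the schema is constructed with many=True):

from marshmallow import Schema, fields, pre_dump

class CoordinatesSchema(Schema):
    x = fields.Float(required=True)
    y = fields.Float(required=True)

class DataSchema(Schema):
    coordinates = fields.Nested(CoordinatesSchema, required=True)
    d = fields.Float()

    @pre_dump
    def group_coordinates(self, data, **kwargs):
        # regroup the flat model attributes into the nested shape;
        # the Nested field then finds a 'coordinates' key to serialize
        return {"coordinates": {"x": data.x, "y": data.y}, "d": data.d}

schema_datas = DataSchema(many=True)
# schema_datas.dump(Data.query.all())
# -> [{"coordinates": {"x": 10.56, "y": 1.0}, "d": 42.0}, ...]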
Related
I am studying the graphene library (https://github.com/graphql-python/graphene), trying to understand how I can serialise / return nested JSON through graphene and perform the query in the correct way.
The code below follows the simple example from the repository (linked at the end of the question).
import graphene
from graphene.types.resolver import dict_resolver

class User(graphene.ObjectType):
    id = graphene.ID()

    class Meta:
        default_resolver = dict_resolver

class Patron(graphene.ObjectType):
    id = graphene.ID()
    name = graphene.String()
    age = graphene.Int()
    user = graphene.Field(User)  # wrapped in Field; a bare class is not a field

    class Meta:
        default_resolver = dict_resolver

class Query(graphene.ObjectType):
    patron = graphene.Field(Patron)

    @staticmethod
    def resolve_patron(root, info):
        return Patron(**{"id": 1, "name": "Syrus", "age": 27, "user": {"id": 2}})

schema = graphene.Schema(query=Query)
query = """
query something{
patron {
id
}
}
"""
if __name__ == "__main__":
result = schema.execute(query)
print(result.data)
The idea is basically to be able to use a multi-level JSON document to "resolve" with GraphQL. This example is very simple; in the actual use case I have planned there will be several levels in the JSON.
I think that if you use setattr at the lowest level of the JSON and work your way up it works, but I would like to know if someone has already implemented or found a more practical way of doing it.
original example:
https://github.com/graphql-python/graphene/blob/master/examples/simple_example.py
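For illustration, here is a minimal sketch of a setattr-free alternative (my assumption, not part of the linked example): since both types set default_resolver = dict_resolver, the resolver can return a plain nested dict, and each level is then resolved by key lookup.

class Query(graphene.ObjectType):
    patron = graphene.Field(Patron)

    @staticmethod
    def resolve_patron(root, info):
        # with dict_resolver as the default resolver, each field of
        # Patron (and of the nested User) is looked up by dict key
        return {"id": 1, "name": "Syrus", "age": 27, "user": {"id": 2}}

query = """
query something {
  patron {
    id
    user {
      id
    }
  }
}
"""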
I have a specific use case, but my question pertains to the best way of doing this in general.
I have three tables
Order - primary key order_id
OrderLine - Linking table with order_id, product_id and quantity. An order has 1 or more order lines
Product - primary key product_id, each order line has one product
In SQLAlchemy / Python, how do I generate nested JSON along the lines of:
{
    "orders": [
        {
            "order_id": 1,
            "some_order_level_detail": "Kansas",
            "order_lines": [
                {
                    "product_id": 1,
                    "product_name": "Clawhammer",
                    "quantity": 5
                },
                ...
            ]
        },
        ...
    ]
}
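(For concreteness in the snippets below, assume hypothetical flask-sqlalchemy models along these lines; all names are taken or inferred from the question:)

class Order(db.Model):
    order_id = db.Column(db.Integer, primary_key=True)
    some_order_level_detail = db.Column(db.String)
    order_lines = db.relationship('OrderLine', backref='order')

class OrderLine(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order_id = db.Column(db.Integer, db.ForeignKey('order.order_id'))
    product_id = db.Column(db.Integer, db.ForeignKey('product.product_id'))
    quantity = db.Column(db.Integer)
    product = db.relationship('Product')

class Product(db.Model):
    product_id = db.Column(db.Integer, primary_key=True)
    product_name = db.Column(db.String)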
Potential Ideas
Hack away doing successive queries
The first idea, which I want to get away from if possible, is using list comprehensions and a brute-force approach.
def get_json():
    answer = {
        "orders": [
            {
                "order_id": o.order_id,
                "some_order_level_detail": o.some_order_level_detail,
                "order_lines": [
                    {
                        "product_id": o_line.product_id,
                        "product_name": Product.query.get(o_line.product_id).product_name,
                        "quantity": o_line.quantity
                    }
                    for o_line in OrderLine.query.filter_by(order_id=o.order_id).all()
                ]
            }
            for o in Order.query.all()
        ]
    }
    return answer
This gets hard to maintain, mixing the queries with the JSON structure. Ideally I'd like to do a query first...
Get joined results first, somehow manipulate later
The second idea is to do a join query across the three tables, showing per row in OrderLine the order and product details.
My question to the pythonistas out there: is there a nice way to convert this to nested JSON? (A rough sketch follows below.)
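For illustration, a rough sketch of that second idea (assuming flask-sqlalchemy and the hypothetical column names above): run one joined query, then group the rows in Python:

from itertools import groupby

def get_json():
    # one joined query up front; rows come back sorted by order
    rows = (
        db.session.query(Order, OrderLine, Product)
        .join(OrderLine, OrderLine.order_id == Order.order_id)
        .join(Product, Product.product_id == OrderLine.product_id)
        .order_by(Order.order_id)
        .all()
    )
    orders = []
    # groupby relies on the order_by above; the Order entity is
    # identity-mapped, so rows of the same order share one object
    for order, group in groupby(rows, key=lambda row: row[0]):
        orders.append({
            "order_id": order.order_id,
            "some_order_level_detail": order.some_order_level_detail,
            "order_lines": [
                {
                    "product_id": product.product_id,
                    "product_name": product.product_name,
                    "quantity": line.quantity,
                }
                for _, line, product in group
            ],
        })
    return {"orders": orders}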
Another way?
This really seems like such a common requirement that I'm wondering whether there is a book method for this sort of thing.
Is there an SQLAlchemy version of this?
Look into marshmallow-sqlalchemy, as it does exactly what you're looking for.
I strongly advise against baking your serialization directly into your model. You will eventually have two services requesting the same data but serialized in different ways (including fewer or more nested relationships for performance, for instance), and you will end up with either (1) a lot of bugs that your test suite will miss unless you check literally every field, or (2) more data serialized than you need, which will turn into performance issues as the complexity of your application scales.
With marshmallow-sqlalchemy, you'll need to define a schema for each model you'd like to serialize. Yes, it's a bit of extra boilerplate, but believe me - you will be much happier in the end.
We build applications using flask-sqlalchemy and marshmallow-sqlalchemy like this (I also highly recommend factory_boy, so that you can mock your service and write unit tests in place of integration tests that need to touch the database):
# models
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import relationship
# Base is your declarative base

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", back_populates="parent")

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship('Parent', back_populates='children',
                          foreign_keys=[parent_id])

# schemas. Don't put these in your models. Avoid tight coupling here
from marshmallow_sqlalchemy import ModelSchema
import marshmallow as ma

class ParentSchema(ModelSchema):
    children = ma.fields.Nested(
        'myapp.schemas.child.ChildSchema', exclude=('parent',), many=True)

    class Meta(ModelSchema.Meta):
        model = Parent
        strict = True
        dump_only = ('id',)

class ChildSchema(ModelSchema):
    parent = ma.fields.Nested(
        'myapp.schemas.parent.ParentSchema', exclude=('children',))

    class Meta(ModelSchema.Meta):
        model = Child
        strict = True
        dump_only = ('id',)

# services
import typing
from sqlalchemy.orm import joinedload

class ParentService:
    '''
    This service is intended for use exclusively by /api/parent
    '''
    def __init__(self, params, _session=None):
        # your unit tests can pass in _session=MagicMock()
        self.session = _session or db.session
        self.params = params

    def _parents(self) -> typing.List[Parent]:
        return self.session.query(Parent).options(
            joinedload(Parent.children)
        ).all()

    def get(self):
        schema = ParentSchema(many=True, only=(
            # highly recommend specifying every field explicitly
            # rather than implicitly
            'id',
            'children.id',
        ))
        return schema.dump(self._parents()).data

# views
@app.route('/api/parent')
def get_parents():
    service = ParentService(params=request.get_json())
    return jsonify(data=service.get())

# test factories
import factory
from factory.alchemy import SQLAlchemyModelFactory

class ModelFactory(SQLAlchemyModelFactory):
    class Meta:
        abstract = True
        sqlalchemy_session = db.session

class ParentFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    children = factory.SubFactory('tests.factory.children.ChildFactory')

class ChildFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    parent = factory.SubFactory('tests.factory.parent.ParentFactory')

# tests
from unittest.mock import MagicMock, patch

def test_can_serialize_parents():
    parents = ParentFactory.build_batch(4)
    session = MagicMock()
    service = ParentService(params={}, _session=session)
    assert service.session is session
    with patch.object(service, '_parents') as _parents:
        _parents.return_value = parents
        assert service.get()[0]['id'] == parents[0].id
        assert service.get()[1]['id'] == parents[1].id
        assert service.get()[2]['id'] == parents[2].id
        assert service.get()[3]['id'] == parents[3].id
I would add a .json() method to each model, so that they call each other. It's essentially your "hacked" solution but a bit more readable/maintainable. Your Order model could have:
def json(self):
    return {
        "id": self.id,
        "order_lines": [line.json() for line in self.order_lines]
    }
Your OrderLine model could have:
def json(self):
    return {
        "product_id": self.product_id,
        "product_name": self.product.name,
        "quantity": self.quantity
    }
Your resource at the top level (where you're making the request for orders) could then do:
...
orders = Order.query.all()
return {"orders": [order.json() for order in orders]}
...
This is how I normally structure this JSON requirement.
Check my answer in this thread: Flask Sqlalchmey - Marshmallow Nested Schema fails for joins with filter ( where ) conditions. Using the Marshmallow package, you would include something like this in your schema:
name = fields.Nested(Schema, many=True)
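Applied to the order example above, that would look roughly like this (a sketch in marshmallow 2 style to match the answer above, and assuming the models have order_lines and product relationships configured):

from marshmallow import fields
from marshmallow_sqlalchemy import ModelSchema

class ProductSchema(ModelSchema):
    class Meta:
        model = Product

class OrderLineSchema(ModelSchema):
    # pull each line's product into the line
    product = fields.Nested(ProductSchema)

    class Meta:
        model = OrderLine

class OrderSchema(ModelSchema):
    # an order nests its lines; each line nests its product
    order_lines = fields.Nested(OrderLineSchema, many=True)

    class Meta:
        model = Order

orders_json = {"orders": OrderSchema(many=True).dump(Order.query.all()).data}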
I have an object stored in mongo that has a list of reference fields. In a Flask-RESTPlus app I need to parse this list of objects and map them into a JSON document to return to a client.
# Classes I have saved in Mongo
from mongoengine import Document, ListField, ReferenceField, StringField

class Info(Document):
    name = StringField()
    foo = StringField()
    bar = StringField()

class InfoHolder(Document):
    thing_id = StringField()
    thing_i_care_about = ReferenceField(Info)

class ThingWithList(Document):
    list_of_objects = ListField(ReferenceField(InfoHolder))
I am finding iterating through the list to be very slow, I guess because I am doing another database query every time I dereference a child of an object in the list.
Simple (but rubbish) method:
info_to_return = []
thing = ThingWithList.get_from_id('thingsId')
for o in thing.list_of_objects:
    info = {
        'id': o.id,
        'name': o.thing_i_care_about.name,
        'foo': o.thing_i_care_about.foo,
        'bar': o.thing_i_care_about.bar
    }
    info_to_return.append(info)
return info_to_return
I thought I would be able to solve this by using select_related, which sounds like it should do the dereferencing for me N levels deep, so that I only make one big mongo call rather than several per iteration. When I add
thing.select_related(3)
it seems to have no effect. Have I just misunderstood what this function is for? How else could I speed up my query?
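One thing worth checking (an assumption on my part, based on the mongoengine docs): select_related is a QuerySet method, not a Document method, so calling it on an instance that has already been fetched does nothing. Applying it to the query itself might be what was intended:

# dereference referenced documents in bulk, up to 3 levels deep,
# while querying, instead of one extra query per reference later
thing = ThingWithList.objects(id='thingsId').select_related(max_depth=3)[0]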
I'm writing a package that imports audio files, processes them, plots them etc., for research purposes.
At each stage of the pipeline, settings are pulled from a settings module as shown below.
I want to be able to update a global setting like MODEL_NAME and have it update in any dicts containing it too.
settings.py
MODEL_NAME = 'Test1'
DAT_DIR = 'dir1/dir2/'
PROCESSING = {
    "key1": {
        "subkey2": 0,
        "subkey3": 1
    },
    "key2": {
        "subkey3": MODEL_NAME
    }
}
run.py
import settings as s

wavs = import_wavs(s.DAT_DIR)
proc_wavs1 = proc_wavs(wavs, s.PROCESSING)
I would like some of the settings dicts to contain MODEL_NAME, which works fine. The problem arises when I want to change MODEL_NAME at runtime. So if I do:
import settings as s

wavs = import_wavs(s.DAT_DIR)

s.MODEL_NAME = 'test1'
proc_wavs1 = proc_wavs(wavs, s.PROCESSING)

s.MODEL_NAME = 'test2'
proc_wavs2 = proc_wavs(wavs, s.PROCESSING)
But obviously both calls to s.PROCESSING will contain the MODEL_NAME originally assigned in the settings file, because the dict was built once at import time.
What is the best way to have it update?
Possible solutions I've thought of:
Store the variable as a mutable type (e.g. MODEL_NAME = ['Test1']) and mutate it in place, e.g.:

s.MODEL_NAME[0] = "test1"
# do processing things
s.MODEL_NAME[0] = "test2"

(This only helps if PROCESSING stores the list itself rather than reading MODEL_NAME[0] out at import time.)
Define each setting category as a function instead, so it is rerun on each call, e.g.:

MODEL_NAME = 'test1'

def PROCESSING():
    return {
        "key1": {
            "subkey2": 0,
            "subkey3": 1
        },
        "key2": {
            "subkey3": MODEL_NAME
        }
    }
Then:

s.MODEL_NAME = 'test1'
proc_wavs1 = proc_wavs(wavs, s.PROCESSING())

s.MODEL_NAME = 'test2'
proc_wavs2 = proc_wavs(wavs, s.PROCESSING())
I thought this would work great, but then it's very difficult to change any entries of the returned dict at runtime, e.g. if I wanted to update the value of subkey2 and run something else (see the sketch just below for one way around this).
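One tweak that might address that (my own suggestion, not from the original post): let the function accept per-call overrides that are merged into the freshly built dict:

def PROCESSING(**overrides):
    settings = {
        "key1": {
            "subkey2": 0,
            "subkey3": 1
        },
        "key2": {
            "subkey3": MODEL_NAME
        }
    }
    # shallow-merge per-call overrides into the rebuilt dict,
    # e.g. PROCESSING(key1={"subkey2": 5})
    for key, value in overrides.items():
        settings[key].update(value)
    return settings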
Other thoughts: maybe a class with an update method or something? Does anyone have any better ideas?
You want to configure generic and specific settings, structured in dictionaries, for functions that perform wave analysis.
Start by defining a settings class, like:
class Settings:
    data_directory = 'path/to/waves'

    def __init__(self, model):
        self.parameters = {
            "key1": {
                "subkey1": 0,
                "subkey2": 0
            },
            "key2": {
                "subkey1": model
            }
        }

# create a new instance based on model1
s1 = Settings('model1')

# assign values to specific keys
s1.parameters["key1"]["subkey1"] = 3.1415926
s1.parameters["key1"]["subkey2"] = 42

# another instance, based on model2
s2 = Settings('model2')
s2.parameters["key1"]["subkey1"] = 360
s2.parameters["key1"]["subkey2"] = 1.618033989

# load the audio
wavs = openWaves(Settings.data_directory)

# process with the given parameters
results1 = processWaves(wavs, s1)
results2 = processWaves(wavs, s2)
I am using the following code to serialize my App Engine datastore models to JSON:
class DictModel(db.Model):
    def to_dict(self):
        return dict([(p, unicode(getattr(self, p))) for p in self.properties()])

class Venue(db.Model):
    id = db.StringProperty(required=True)
    fourSqid = db.StringProperty(required=False)
    name = db.StringProperty(required=True)
    twitter_ID = db.StringProperty(required=True)

class commonWordTweets(DictModel):
    commonWords = db.StringListProperty(required=True)
    venue = db.ReferenceProperty(Venue, required=True, collection_name='commonWords')
This returns the following JSON response
[
    {
        "commonWords": "[u'storehouse', u'guinness', u'badge', u'2011"', u'"new', u'mayor', u'dublin)']",
        "venue": "<__main__.Venue object at 0x1028ad190>"
    }
]
How can I return the actual venue name instead?
Firstly, although it's not exactly your question, it's strongly recommended to use simplejson to produce JSON, rather than trying to build JSON strings yourself.
To answer your question: the ReferenceProperty just acts as a reference to your Venue object, so you can use its attributes as normal.
Try something like:
import simplejson

cwt = commonWordTweets()  # Replace with code to get the item from your datastore
d = {"commonWords": cwt.commonWords, "venue": cwt.venue.name}
jsonout = simplejson.dumps(d)
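Alternatively, if you prefer to keep the generic to_dict approach, it can be taught to follow references. A rough sketch (assuming, as in this model, that every referenced entity has a name property worth exposing):

class DictModel(db.Model):
    def to_dict(self):
        out = {}
        for p in self.properties():
            value = getattr(self, p)
            if isinstance(value, db.Model):
                # a dereferenced ReferenceProperty: expose its name
                # instead of the object's repr
                value = value.name
            elif isinstance(value, list):
                # e.g. StringListProperty: keep it a real list
                value = [unicode(v) for v in value]
            else:
                value = unicode(value)
            out[p] = value
        return out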