MongoDB schema design for polymorphic objects - Python

I'm new to MongoDB and am trying to design a simple schema for a set of python objects. I'm having a tough time working with the concept of polymorphism.
Below is some pseudo-code. How would you represent this inheritance hierarchy in a MongoDB schema?
class A:
    content = 'video' or 'image' or 'music'
    data = contentData  # videoData, imageData or musicData, depending on content
class videoData:
    length = *
    director = *
    actors = *
class imageData:
    dimensions = *
class musicData:
    genre = *
The problem I'm facing is that the schema of A.data depends on A.content. How can A be represented in a MongoDB schema?

Your documents could look like this:
{ _type: "video",
  data: {
    length: 120,
    director: "Smith",
    actors: ["Jones", "Lee"]
  }
}
So, basically, "data" points to an embedded document holding the fields specific to that document's type.
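On the Python side, reading such a document back could be sketched as follows. This is only a minimal illustration using plain dataclasses, not a MongoDB driver API: the class names, the DATA_CLASSES registry and the load_data helper are all hypothetical, and the document is assumed to have the "_type" / "data" shape shown above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class VideoData:
    length: int
    director: str
    actors: List[str]


@dataclass
class ImageData:
    dimensions: List[int]


@dataclass
class MusicData:
    genre: str


# Hypothetical registry: maps the "_type" discriminator to a Python class.
DATA_CLASSES = {"video": VideoData, "image": ImageData, "music": MusicData}


def load_data(doc: Dict[str, Any]):
    """Build the type-specific object from a {"_type": ..., "data": {...}} document."""
    cls = DATA_CLASSES[doc["_type"]]
    return cls(**doc["data"])


doc = {
    "_type": "video",
    "data": {"length": 120, "director": "Smith", "actors": ["Jones", "Lee"]},
}
video = load_data(doc)
print(type(video).__name__, video.director)
```

The discriminator field keeps all three kinds of document in one collection while still letting the application pick the right class when loading.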

This doesn't particularly answer your question, but you might check out Ming. It does polymorphism for you when it maps the document to the object.
http://merciless.sourceforge.net/tour.html

Related

Trying to use Embedded Documents Fields in MongoDB

I'm following freeCodeCamp's video on MongoDB using mongoengine (as db). I'm trying to use the embedded document list field to add information to my main document. I'm also using a Streamlit web app as my input source.
My classes are:
class Contest(db.Document):
    date_created = db.DateTimeField(default=datetime.today)
    name = db.StringField(required=True)
    format = db.EmbeddedDocumentField(Format)
class Format(db.EmbeddedDocument):
    contest_id = db.ObjectIdField()
    name = db.StringField()
Then I've tried a few different ways to add the format to a specific contest instance.
Try #1
def set_format(active_contest):
    format : Format = None
    name = st.text_input('Name of Format:')
    submit = st.button('Set Format Name')
    if submit == True:
        format.contest_id = active_contest.id
        format.name = name
        active_contest.save()
Setting format to None is the way the freeCodeCamp video shows it, but I get this error: AttributeError: 'NoneType' object has no attribute 'contest_id'.
So I tried switching it to format = Format(). This way it doesn't give me an error, but it also doesn't update the Contest document to include the format information.
I also tried switching active_contest.save() to format.save(), but then I get: AttributeError: 'Format' object has no attribute 'save'
I've also tried the update function instead of save, but I get similar errors every which way.
I'm new to MongoDB and programming in general. Thanks in advance!
First of all, if you want to store Format as an embedded document, the contest_id is not necessary in the Format class. With this approach you will end up with something like this in your MongoDB collection:
{
  "date_created": ISODate(...),
  "name": "...",
  "format": {
    "name": "..."
  }
}
Another approach could be something like:
class Contest(db.Document):
    date_created = db.DateTimeField(default=datetime.today)
    name = db.StringField(required=True)
    format = db.ReferenceField('Format')  # <- Replaced by ReferenceField
class Format(db.Document):  # <- EmbeddedDocument replaced by Document
    name = db.StringField()
In that case each instance of "Format" will be stored in a separate collection. So you will end up with something like this in MongoDB:
Collection Contest:
{
  "date_created": ISODate(...),
  "name": "...",
  "format": ObjectId("...") // <-- here's the relation field
}
Collection Format:
{
  "_id": ObjectId("..."),
  "name": "..."
}
Both approaches share almost the same code:
def set_format(active_contest):  # <-- here's the instance of 'Contest'
    format: Format = Format()  # <-- create a new Format instance
    name = st.text_input('Name of Format:')
    submit = st.button('Set Format Name')
    if submit:
        format.name = name
        # With the ReferenceField approach, call format.save() here first:
        # mongoengine only accepts references to documents that are already saved.
        active_contest.format = format  # <-- assigns the format to contest
        active_contest.save()  # <-- with EmbeddedDocumentField this stores the format too

How to serialize a nested JSON inside a graphene resolver?

I am studying the graphene library (https://github.com/graphql-python/graphene) and trying to understand how I can serialize / return a nested JSON in graphene and query it correctly.
The code below follows the example available in the repository (linked at the end of the question).
import graphene
from graphene.types.resolver import dict_resolver
class User(graphene.ObjectType):
    id = graphene.ID()
    class Meta:
        default_resolver = dict_resolver
class Patron(graphene.ObjectType):
    id = graphene.ID()
    name = graphene.String()
    age = graphene.Int()
    user = User
    class Meta:
        default_resolver = dict_resolver
class Query(graphene.ObjectType):
    patron = graphene.Field(Patron)
    @staticmethod
    def resolve_patron(root, info):
        return Patron(**{"id":1, "name": "Syrus", "age": 27, "user": {"id": 2}})
schema = graphene.Schema(query=Query)
query = """
    query something{
      patron {
        id
      }
    }
"""
if __name__ == "__main__":
    result = schema.execute(query)
    print(result.data)
The idea is basically to be able to use a multi-level JSON to "resolve" with GraphQL. This example is very simple; in the actual use case I plan, the JSON will have several levels.
I think it works if you use setattr at the lowest level of the JSON and work your way up, but I would like to know if someone has already implemented or found a more practical way of doing it.
original example:
https://github.com/graphql-python/graphene/blob/master/examples/simple_example.py
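For reference, the bottom-up setattr idea mentioned in the question could be sketched in plain Python as below. This shows only the dict-to-object conversion and does not touch graphene; to_object is a hypothetical helper name, and whether the resulting objects suit a given graphene resolver is an assumption to verify against the version in use.

```python
from types import SimpleNamespace
from typing import Any


def to_object(value: Any) -> Any:
    """Recursively convert nested dicts (and lists of dicts) into attribute-style objects."""
    if isinstance(value, dict):
        obj = SimpleNamespace()
        for key, item in value.items():
            # Children are converted first, then attached to the parent with setattr.
            setattr(obj, key, to_object(item))
        return obj
    if isinstance(value, list):
        return [to_object(item) for item in value]
    return value


patron = to_object({"id": 1, "name": "Syrus", "age": 27, "user": {"id": 2}})
print(patron.name, patron.user.id)
```

Because the recursion handles each level the same way, the depth of the JSON does not matter.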

Python mongoengine select_related(n) not doing what I expected

I have an object stored in mongo that has a list of reference fields. In a restplus app I need to parse this list of objects and map them into a JSON doc to return for a client.
# Classes I have saved in Mongo
class ThingWithList(Document):
    list_of_objects = ListField(ReferenceField(InfoHolder))
class InfoHolder(Document):
    thing_id = StringField()
    thing_i_care_about = ReferenceField(Info)
class Info(Document):
    name = StringField()
    foo = StringField()
    bar = StringField()
Iterating through the list is very slow, I guess because I have to do another database query every time I dereference children of objects in the list.
Simple (but rubbish) method:
info_to_return = []
thing = ThingWithList.get_from_id('thingsId')
for o in thing.list_of_objects:
    info = {
        'id': o.id,
        'name': o.thing_i_care_about.name,
        'foo': o.thing_i_care_about.foo,
        'bar': o.thing_i_care_about.bar
    }
    info_to_return.append(info)
return info_to_return
I thought I could solve this with select_related, which sounds like it should do the dereferencing for me N levels deep, so that I make one big Mongo call rather than several per iteration. When I add
thing.select_related(3)
it seems to have no effect. Have I just misunderstood what this function is for? How else could I speed up my query?

Include OneToMany relationship in query in peewee (Flask)

I have two models, Storage and Drawer:
class Storage(BaseModel):
    id = PrimaryKeyField()
    name = CharField()
    description = CharField(null=True)
class Drawer(BaseModel):
    id = PrimaryKeyField()
    name = CharField()
    storage = ForeignKeyField(Storage, related_name="drawers")
At the moment I'm producing JSON from a select query:
storages = Storage.select()
As a result I have got a json array, which looks like this:
[{
  description: null,
  id: 1,
  name: "Storage"
},
{
  description: null,
  id: 2,
  name: "Storage 2"
}]
I know that peewee lets me query all drawers of a storage with storage.drawers. But I'm struggling to include, for every storage, a JSON array containing all drawers of that storage. I tried to use a join:
storages = (Storage.select(Storage, Drawer)
            .join(Drawer)
            .where(Drawer.storage == Storage.id)
            .group_by(Storage.id))
But I only retrieve the second storage, which is the one that has drawers, and the array of drawers is not included. Is this even possible with joins? Or do I need to iterate over every storage, retrieve its drawers and append them to the storage?
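This is the classic O(n) query problem for ORMs, often called the N+1 query problem: one query for the storages plus one more per storage for its drawers. The documentation goes into some detail on various ways to approach the problem.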
For this case, you will probably want prefetch(). Instead of O(n) queries, it will execute O(k) queries, one for each table involved (so 2 in your case).
storages = Storage.select().order_by(Storage.name)
drawers = Drawer.select().order_by(Drawer.name)
query = prefetch(storages, drawers)
To serialize this, we'll iterate through the Storage objects returned by prefetch. The associated drawers will have been pre-populated using the Drawer.storage foreign key's related_name + '_prefetch' (drawers_prefetch). In peewee 3.x, where related_name became backref, the prefetched rows are exposed directly on the backref (storage.drawers) instead.
accum = []
for storage in query:
    data = {'name': storage.name, 'description': storage.description}
    data['drawers'] = [{'name': drawer.name}
                       for drawer in storage.drawers_prefetch]
    accum.append(data)
To make this even easier you can use the playhouse.shortcuts.model_to_dict helper:
accum = []
for storage in query:
    accum.append(model_to_dict(storage, backrefs=True, recurse=True))
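To make the "one query per table" idea concrete, here is a rough sketch of the same pattern using only the standard-library sqlite3 module. It is not peewee's implementation; the table layout mirrors the models from the question, and the drawer names 'Left' and 'Right' are made-up sample data.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE storage (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    CREATE TABLE drawer (id INTEGER PRIMARY KEY, name TEXT,
                         storage_id INTEGER REFERENCES storage(id));
    INSERT INTO storage (id, name) VALUES (1, 'Storage'), (2, 'Storage 2');
    INSERT INTO drawer (name, storage_id) VALUES ('Left', 2), ('Right', 2);
""")

# Query 1: all storages. Query 2: all drawers. No per-storage query is issued.
storages = conn.execute(
    "SELECT id, name, description FROM storage ORDER BY name").fetchall()
drawers = conn.execute(
    "SELECT name, storage_id FROM drawer ORDER BY name").fetchall()

# Group the drawers by their foreign key in memory.
drawers_by_storage = defaultdict(list)
for drawer_name, storage_id in drawers:
    drawers_by_storage[storage_id].append({"name": drawer_name})

accum = [
    {"name": name, "description": description,
     "drawers": drawers_by_storage[storage_id]}
    for storage_id, name, description in storages
]
print(accum)
```

Storages without drawers still appear in the result, with an empty drawers list, which is what the inner join in the question was filtering out.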

Serializing ReferenceProperty in Appengine Datastore to JSON

I am using the following code to serialize my App Engine datastore to JSON:
class DictModel(db.Model):
    def to_dict(self):
        return dict([(p, unicode(getattr(self, p))) for p in self.properties()])
class commonWordTweets(DictModel):
    commonWords = db.StringListProperty(required=True)
    venue = db.ReferenceProperty(Venue, required=True, collection_name='commonWords')
class Venue(db.Model):
    id = db.StringProperty(required=True)
    fourSqid = db.StringProperty(required=False)
    name = db.StringProperty(required=True)
    twitter_ID = db.StringProperty(required=True)
This returns the following JSON response:
[
  {
    "commonWords": "[u'storehouse', u'guinness', u'badge', u'2011"', u'"new', u'mayor', u'dublin)']",
    "venue": "<__main__.Venue object at 0x1028ad190>"
  }
]
How can I get the actual venue name to appear?
Firstly, although it's not exactly your question, it's strongly recommended to use simplejson to produce json, rather than trying to turn structures into json strings yourself.
To answer your question, the ReferenceProperty just acts as a reference to your Venue object. So you just use its attributes as per normal.
Try something like:
cwt = commonWordTweets() # Replace with code to get the item from your datastore
d = {"commonWords":cwt.commonWords, "venue": cwt.venue.name}
jsonout = simplejson.dumps(d)
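The same idea can be folded into to_dict itself. The sketch below uses plain Python classes as stand-ins for the datastore models (they are not db.Model subclasses) and the standard-library json module in place of simplejson; the class layout and the venue name are assumptions made purely for illustration.

```python
import json


class Venue:
    """Plain stand-in for the datastore Venue model."""

    def __init__(self, name):
        self.name = name


class CommonWordTweets:
    """Plain stand-in for commonWordTweets; venue plays the role of the reference."""

    def __init__(self, common_words, venue):
        self.common_words = common_words
        self.venue = venue

    def to_dict(self):
        # Keep the list as a list and follow the reference to a plain value,
        # instead of stringifying every property.
        return {"commonWords": self.common_words, "venue": self.venue.name}


cwt = CommonWordTweets(["storehouse", "guinness"], Venue("Guinness Storehouse"))
jsonout = json.dumps([cwt.to_dict()])
print(jsonout)
```

Serializing the list as a real JSON array also avoids the quoted "[u'storehouse', ...]" string seen in the original response.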
