I have an object stored in MongoDB that has a list of reference fields. In a Flask-RESTPlus app I need to iterate over this list of objects and map them into a JSON document to return to a client.
# Classes I have saved in Mongo
from mongoengine import Document, ListField, ReferenceField, StringField

class Info(Document):
    name = StringField()
    foo = StringField()
    bar = StringField()

class InfoHolder(Document):
    thing_id = StringField()
    thing_i_care_about = ReferenceField(Info)

class ThingWithList(Document):
    list_of_objects = ListField(ReferenceField(InfoHolder))
I am finding that iterating through the list is very slow, I guess because another database query is issued every time I dereference a child of an object in the list.
Simple (but rubbish) method:
info_to_return = []
thing = ThingWithList.get_from_id('thingsId')
for o in thing.list_of_objects:
    info = {
        'id': o.id,
        'name': o.thing_i_care_about.name,
        'foo': o.thing_i_care_about.foo,
        'bar': o.thing_i_care_about.bar
    }
    info_to_return.append(info)
return info_to_return
I thought I would be able to solve this by using select_related, which sounds like it should do the dereferencing for me N levels deep, so that I only make one big Mongo call rather than several per iteration. When I add
thing.select_related(3)
it seems to have no effect. Have I just misunderstood what this function is for? How else could I speed up my query?
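One workaround (not part of the original post, just a hedged sketch assuming the schema above): dereference the two levels in bulk yourself, so the whole lookup costs three queries instead of one or two per item. no_dereference() keeps mongoengine from lazily fetching each reference while the ids are collected.

thing = ThingWithList.objects.no_dereference().get(id='thingsId')

# 1) one query for every InfoHolder referenced by the list
holder_ids = [ref.id for ref in thing.list_of_objects]  # DBRefs, not fetched documents
holders = InfoHolder.objects(id__in=holder_ids).no_dereference()

# 2) one query for every Info those holders point to
infos = {i.id: i for i in Info.objects(id__in=[h.thing_i_care_about.id for h in holders])}

# 3) build the response in memory, with no further queries
info_to_return = []
for h in holders:
    info = infos[h.thing_i_care_about.id]
    info_to_return.append({'id': h.id, 'name': info.name, 'foo': info.foo, 'bar': info.bar})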
Related
I'm running into an issue using mongoengine. A raw query that works in Compass isn't working with __raw__ in mongoengine. I'd like to rewrite it using mongoengine's own query methods, but I'd also like to understand why it isn't working with __raw__.
I'm working with an embedded document list field whose embedded documents use inheritance. The query is: "give me all sequences that have a 'TypeA' Assignment".
My schema:
from mongoengine import Document, EmbeddedDocument, EmbeddedDocumentListField, StringField

class Sample(EmbeddedDocument):
    name = StringField()

class Assignment(EmbeddedDocument):
    name = StringField()
    meta = {'allow_inheritance': True}

class TypeA(Assignment):
    pass

class TypeB(Assignment):
    other_field = StringField()

class Sequence(Document):
    seq = StringField(required=True)
    samples = EmbeddedDocumentListField(Sample)
    assignments = EmbeddedDocumentListField(Assignment)
Entering {'assignments._cls': 'TypeA'} in Compass returns results, but in mongoengine I get an empty queryset:
from mongo_objects import Sequence

def get_samples_assigned_as_class(cls: str):
    query_raw = Sequence.objects(__raw__={'assignments._cls': cls})  # raw query, fails
    # query2 = Sequence.objects(assignments___cls=cls)  # First attempt, failed
    # query3 = Sequence.objects.get().assignments.filter(cls=cls)  # Second attempt, also failed; didn't like that it queried everything first
    print(query_raw)  # empty list, iterating does nothing

get_samples_assigned_as_class('TypeA')
"Assignments" is a list because one sequence may have multiples of the same class. An in depth awnser on how to query these lists for categorical information would be ideal, as I'm not sure how to properly go about it. I'm mostly filtering on the inheritence _cls, but eventually I'd like to do nested queries (cls : TypeA, sample : Sample_1)
Thanks
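One thing worth checking (a hedged note, not part of the original post): with allow_inheritance, mongoengine normally stores _cls as the full class path, e.g. 'Assignment.TypeA' rather than just 'TypeA', so a query built from the bare class name can come back empty even though a hand-written query in Compass against data saved another way returns results. A minimal sketch under that assumption, using the match (i.e. $elemMatch) operator on the embedded list:

def get_sequences_assigned_as_class(cls_name: str):
    full_cls = 'Assignment.{}'.format(cls_name)  # path as mongoengine stores it
    # raw form
    raw = Sequence.objects(__raw__={'assignments._cls': full_cls})
    # mongoengine form: match one element of the embedded list
    native = Sequence.objects(assignments__match={'_cls': full_cls})
    return list(native)

# nested query: sequences with a TypeA assignment and a sample named 'Sample_1'
seqs = Sequence.objects(assignments__match={'_cls': 'Assignment.TypeA'},
                        samples__name='Sample_1')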
In Django, can I re-use an existing Q object on multiple models, without writing the same filters twice?
I was thinking about something along the lines of the pseudo-Django code below, but did not find anything relevant in the documentation:
class Author(Model):
    name = TextField()
    company_name = TextField()

class Book(Model):
    author = ForeignKey(Author)
# Create a Q object for the Author model
q_author = Q(company_name="Books & co.")
# Use it to retrieve Book objects
qs = Book.objects.filter(author__matches=q_author)
If that is not possible, can I extend an existing Q object to work on a related field? Pseudo-example:
# q_book == Q(author__company_name="Books & co.")
q_book = q_author.extend("author")
# Use it to retrieve Book objects
qs = Book.objects.filter(q_book)
The only thing I've found that comes close is using a subquery, which is a bit unwieldy:
qs = Book.objects.filter(author__in=Author.objects.filter(q_author))
From what I can tell from your comment, it looks like you're trying to pass a set of common arguments to multiple filters; to do that you can just unpack a dictionary.
The values in the dictionary can still be Q objects if required, just as if they were values you would normally pass to a filter argument:
args = {'author__company_name': "Books & co"}
qs = Book.objects.filter(**args)
args['author__name'] = 'Foo'
qs = Book.objects.filter(**args)
To share this between different models, you'd have to do some dictionary mangling:
author_args = {k[len('author__'):] if k.startswith('author__') else k: v
               for k, v in args.items()}
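Going the other way, closer to the question's q_author.extend("author") idea, here is a rough sketch of a helper that rewrites a Q object's lookups onto a related field. Treat it as an illustration only: prefix_q is a hypothetical name, and it leans on Q.children, which is part of Django's internal tree structure.

from django.db.models import Q

def prefix_q(q, prefix):
    # Return a copy of q with every lookup prefixed, e.g. company_name -> author__company_name
    new_q = Q()
    new_q.connector = q.connector
    new_q.negated = q.negated
    for child in q.children:
        if isinstance(child, Q):
            new_q.children.append(prefix_q(child, prefix))
        else:
            lookup, value = child
            new_q.children.append(('{}__{}'.format(prefix, lookup), value))
    return new_q

q_author = Q(company_name="Books & co.")
q_book = prefix_q(q_author, "author")  # roughly Q(author__company_name="Books & co.")
qs = Book.objects.filter(q_book)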
You can do this:
books = Book.objects.filter(author__company_name="Books & co")
So I'm a Flask/SQLAlchemy newbie, but this seems like it should be pretty simple. Yet for the life of me I can't get it to work, and I can't find any documentation for this anywhere online. I have a somewhat complex query that returns a list of database objects.
items = db.session.query(X, func.count(Y.x_id).label('total')).filter(X.size >= size).outerjoin(Y, X.x_id == Y.x_id).group_by(X.x_id).order_by('total ASC')\
.limit(20).all()
After I get this list of items, I want to loop through it and update a property on each item.
for it in items:
    it.some_property = 'xyz'
    db.session.commit()
However, what's happening is that I'm getting an error:
it.some_property = 'xyz'
AttributeError: 'result' object has no attribute 'some_property'
I'm not crazy. I'm positive that the property does exist on model X, which is subclassed from db.Model. Something about the query is preventing me from accessing the attributes, even though I can clearly see they exist in the debugger. Any help would be appreciated.
class X(db.Model):
    x_id = db.Column(db.Integer, primary_key=True)
    size = db.Column(db.Integer, nullable=False)
    oords = db.relationship('Oords', lazy=True, backref=db.backref('x', lazy='joined'))

    def __init__(self, size):
        self.size = size
Given your example, your result objects do not have the attribute some_property, just like the exception says. (Neither do model X objects, but I hope that's just an error in the example.)
They have the explicitly labelled total as the second column and the model X instance as the first. If you mean to access a property of the X instance, access it from the result row first, either by index or by the implicit label X:
items = db.session.query(X, func.count(Y.x_id).label('total')).\
filter(X.size >= size).\
outerjoin(Y, X.x_id == Y.x_id).\
group_by(X.x_id).\
order_by('total ASC').\
limit(20).\
all()
# Unpack a result object
for x, total in items:
    x.some_property = 'xyz'

# Please commit after *all* the changes.
db.session.commit()
As noted in the other answer you could use bulk operations as well, though your limit(20) will make that a lot more challenging.
You should use the update() function.
Like this:
from sqlalchemy import update
stmt = update(users).where(users.c.id==5).\
values(name='user #5')
Or:
session = self.db.get_session()
session.query(Organisation).filter_by(id_organisation=organisation.id_organisation).\
    update(
        {
            "name": organisation.name,
            "type": organisation.type,
        }, synchronize_session=False)
session.commit()
session.close()
The SQLAlchemy docs: http://docs.sqlalchemy.org/en/latest/core/dml.html
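Tying the two answers together, a hedged sketch of restricting a bulk update() to the 20 rows picked by the original query (model and column names follow the question's schema, and it assumes some_property really is a mapped column on X):

# collect the primary keys of the 20 rows, then update them in one statement
ids = [x.x_id for x, total in items]

db.session.query(X).filter(X.x_id.in_(ids)).update(
    {"some_property": "xyz"}, synchronize_session=False
)
db.session.commit()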
I would like to be able to check if a related object has already been fetched by using either select_related or prefetch_related, so that I can serialize the data accordingly. Here is an example:
class Address(models.Model):
    street = models.CharField(max_length=100)
    zip = models.CharField(max_length=10)

class Person(models.Model):
    name = models.CharField(max_length=20)
    address = models.ForeignKey(Address)

def serialize_address(address):
    return {
        "id": address.id,
        "street": address.street,
        "zip": address.zip
    }
def serialize_person(person):
    result = {
        "id": person.id,
        "name": person.name
    }
    if is_fetched(person.address):
        result["address"] = serialize_address(person.address)
    else:
        result["address"] = None
    return result

######
person_a = Person.objects.select_related("address").get(id=1)
person_b = Person.objects.get(id=2)
serialize_person(person_a)  # should be an object with id, name and address
serialize_person(person_b)  # should be an object with only id and name
In this example, the function is_fetched is what I am looking for. I would like to determine whether the person object already has a resolved address, and only if it does should the address be serialized as well. If it doesn't, no further database query should be executed.
So is there a way to achieve this in Django?
Since Django 2.0 you can easily check all fetched relations with:
obj._state.fields_cache
ModelStateFieldsCacheDescriptor is responsible for storing your cached relations.
>>> Person.objects.first()._state.fields_cache
{}
>>> Person.objects.select_related('address').first()._state.fields_cache
{'address': <Address: Your Address>}
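Building on that, a minimal sketch of the is_fetched() helper the question asks for, assuming Django 2.0+ and accepting that _state.fields_cache is a private API:

def is_fetched(obj, relation_name):
    # fields_cache maps relation names to related objects already loaded in memory
    return relation_name in obj._state.fields_cache

person_a = Person.objects.select_related("address").get(id=1)
person_b = Person.objects.get(id=2)

is_fetched(person_a, "address")  # True
is_fetched(person_b, "address")  # False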
In Django versions before 2.0: if the address relation has been fetched, then the Person object will have a populated attribute called _address_cache; you can check this.
def is_fetched(obj, relation_name):
cache_name = '_{}_cache'.format(relation_name)
return getattr(obj, cache_name, False)
Note you'd need to call this with the object and the name of the relation:
is_fetched(person, 'address')
since doing person.address would trigger the fetch immediately.
Edit: reverse or many-to-many relations can only be fetched by prefetch_related; that populates a single attribute, _prefetched_objects_cache, which is a dict of lists where the key is the name of the related model. E.g. if you do:
addresses = Address.objects.prefetch_related('person_set')
then each item in addresses will have a _prefetched_objects_cache dict containing a 'person' key.
Note, both of these are single-underscore attributes which means they are part of the private API; you're free to use them, but Django is also free to change them in future releases.
Per this comment on the ticket linked in the comment by @jaap3 above, the recommended way to do this for Django 3+ (perhaps 2+?) is to use the undocumented is_cached method on the model's field, which comes from this internal mixin:
>>> person1 = Person.objects.first()
>>> Person.address.is_cached(person1)
False
>>> person2 = Person.objects.select_related('address').last()
>>> Person.address.is_cached(person2)
True
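Folded back into the question's serializer, a hedged sketch using that check could look like this:

def serialize_person(person):
    result = {"id": person.id, "name": person.name}
    # Person.address.is_cached(person) inspects the descriptor's cache without hitting the database
    if Person.address.is_cached(person):
        result["address"] = serialize_address(person.address)
    else:
        result["address"] = None
    return result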
I am using the following code to serialize my App Engine datastore models to JSON:
class DictModel(db.Model):
    def to_dict(self):
        return dict([(p, unicode(getattr(self, p))) for p in self.properties()])

class Venue(db.Model):
    id = db.StringProperty(required=True)
    fourSqid = db.StringProperty(required=False)
    name = db.StringProperty(required=True)
    twitter_ID = db.StringProperty(required=True)

class commonWordTweets(DictModel):
    commonWords = db.StringListProperty(required=True)
    venue = db.ReferenceProperty(Venue, required=True, collection_name='commonWords')
This returns the following JSON response:
[
{
"commonWords": "[u'storehouse', u'guinness', u'badge', u'2011"', u'"new', u'mayor', u'dublin)']",
"venue": "<__main__.Venue object at 0x1028ad190>"
}
]
How can I return the actual venue name to appear?
Firstly, although it's not exactly your question, it's strongly recommended to use simplejson to produce JSON, rather than trying to turn structures into JSON strings yourself.
To answer your question: the ReferenceProperty just acts as a reference to your Venue object, so you just use its attributes as normal.
Try something like:
cwt = commonWordTweets() # Replace with code to get the item from your datastore
d = {"commonWords":cwt.commonWords, "venue": cwt.venue.name}
jsonout = simplejson.dumps(d)
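If you want the whole list serialized this way, here is a hedged sketch of a more general to_dict() that resolves reference properties before dumping to JSON (it assumes simplejson is importable in your runtime and that the referenced Venue's name is the field you want):

import simplejson
from google.appengine.ext import db

class DictModel(db.Model):
    def to_dict(self):
        result = {}
        for name in self.properties():
            value = getattr(self, name)
            if isinstance(value, db.Model):
                # a dereferenced ReferenceProperty: keep just the referenced entity's name
                result[name] = value.name
            else:
                result[name] = value
        return result

# usage: serialize a batch of commonWordTweets entities
tweets = commonWordTweets.all().fetch(20)
jsonout = simplejson.dumps([t.to_dict() for t in tweets])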