Query based on Embedded Document List fields in mongoengine - python

I'm running into an issue using mongoengine. A raw query that works in Compass isn't working with __raw__ in mongoengine. I'd like to rewrite it using mongoengine's methods, but I'd also like to understand why it isn't working with __raw__ either.
I'm using an embedded document list field that has inheritance. The query is: "give me all sequences that have a 'TypeA' Assignment".
My schema:
from mongoengine import (Document, EmbeddedDocument,
                         EmbeddedDocumentListField, StringField)

class Sample(EmbeddedDocument):
    name = StringField()

class Assignment(EmbeddedDocument):
    name = StringField()
    meta = {'allow_inheritance': True}

class TypeA(Assignment):
    pass

class TypeB(Assignment):
    other_field = StringField()

class Sequence(Document):
    seq = StringField(required=True)
    samples = EmbeddedDocumentListField(Sample)
    assignments = EmbeddedDocumentListField(Assignment)
Writing {'assignments._cls': 'TypeA'} into Compass returns a list of documents, but in mongoengine I get an empty result:
from mongo_objects import Sequence

def get_samples_assigned_as_class(cls: str):
    query_raw = Sequence.objects(__raw__={'assignments._cls': cls})  # raw query, fails
    # query2 = Sequence.objects(assignments___cls=cls)  # first attempt, failed
    # query3 = Sequence.objects.get().assignments.filter(cls=cls)  # second attempt, also failed; didn't like that it queried everything first
    print(query_raw)  # empty queryset; iterating yields nothing

get_samples_assigned_as_class('TypeA')
"Assignments" is a list because one sequence may have multiples of the same class. An in depth awnser on how to query these lists for categorical information would be ideal, as I'm not sure how to properly go about it. I'm mostly filtering on the inheritence _cls, but eventually I'd like to do nested queries (cls : TypeA, sample : Sample_1)
Thanks
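One direction worth checking (untested sketch): with allow_inheritance, mongoengine stores the full inheritance path in _cls (e.g. 'Assignment.TypeA') rather than the bare class name, so a raw query for 'TypeA' can come back empty if the documents mongoengine writes hold the dotted path. Querying for the stored value directly, and using $elemMatch for the nested case:

# all sequences with at least one TypeA assignment,
# assuming the dotted path is what is stored
seqs = Sequence.objects(__raw__={'assignments._cls': 'Assignment.TypeA'})

# nested conditions that must hold on the *same* array element
seqs = Sequence.objects(__raw__={
    'assignments': {'$elemMatch': {'_cls': 'Assignment.TypeA',
                                   'name': 'Sample_1'}}
})

Inspecting one raw document (e.g. Sequence.objects.as_pymongo().first()) shows which _cls value is actually stored.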


Django Query to Combine Multiple ArrayFields to one text string

I have an object model where Documents are long text files that can have Attachments and both sets of objects can also have spreadsheet-like Tables. Each table has a rectangular array with text. I want users to be able to search for a keyword across the table contents, but the results will be displayed by the main document (so instead of seeing each table that matches, you'll just see the document that has the most tables that match your query).
Below you can see a test query I'm trying to run that, in an ideal world, would convert all of the table contents (across all attachments) to one long string, which I can then pass to SearchHeadline to build the headline. For some reason, the test query returns the tables as separate objects rather than concatenated into one long string.
I'm using a custom function that mimics the Postgres 13 StringAgg, as I'm on Postgres 10.
Thanks in advance for your help, let me know if I need to provide more information to replicate this.
my models.py:
class Document(AbstractDocument):
    tables = GenericRelation(Table)

class Attachment(AbstractDocument):
    tables_new = GenericRelation(Table)
    main_document = ForeignKey(Document, on_delete=CASCADE, related_name="attachments")

class Table(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.SlugField()
    content_object = GenericForeignKey()
    content = ArrayField(ArrayField(models.TextField(null=True)))
my query:
def myStringAgg(field: str):
    return Func(
        F(field),
        Value(" "),
        Value(""),
        function="array_to_string",
        output_field=models.TextField(),
    )

s = Document.objects.all() \
    .annotate(tt=myStringAgg("attachments__tables__content")) \
    .values_list('tt', flat=True)
# what I get
>>> <DocumentSet ['table1', 'table2']>
# what I want
>>> <DocumentSet ['table1 table2']>
I'm using Django 3.2 and Postgres 10.
To clarify what my full scope is, this what the final query would look like:
qs = Document.objects.filter(
        Q(tables__search_vector=query) |
        Q(attachments__tables__search_vector=query)
    ) \
    .annotate(rank=rank) \
    .order_by("-rank") \
    .annotate(snippet=SearchHeadline(
        myStringAgg("attachments__tables__content"),
        query, max_fragments=5))
You can use the join function to create a string from a list:
s = Document.objects.all() \
    .annotate(tt=myStringAgg("attachments__tables__content")) \
    .values_list('tt', flat=True)
s = " ".join(list(s))

Python mongoengine select_related(n) not doing what I expected

I have an object stored in mongo that has a list of reference fields. In a restplus app I need to parse this list of objects and map them into a JSON doc to return for a client.
# Classes I have saved in Mongo
class Info(Document):
    name = StringField()
    foo = StringField()
    bar = StringField()

class InfoHolder(Document):
    thing_id = StringField()
    thing_i_care_about = ReferenceField(Info)

class ThingWithList(Document):
    list_of_objects = ListField(ReferenceField(InfoHolder))
I am finding that iterating through the list is very slow, I guess because I have to do another database query every time I dereference children of objects in the list.
Simple (but rubbish) method:
info_to_return = []
thing = ThingWithList.get_from_id('thingsId')
for o in thing.list_of_objects:
    info = {
        'id': o.id,
        'name': o.thing_i_care_about.name,
        'foo': o.thing_i_care_about.foo,
        'bar': o.thing_i_care_about.bar,
    }
    info_to_return.append(info)
return info_to_return
I thought I would be able to solve this by using select_related, which sounds like it should do the dereferencing for me N levels deep, so that I only make one big mongo call rather than several per iteration. But when I add
thing.select_related(3)
it seems to have no effect. Have I just misunderstood what this function is for? How else could I speed up my query?
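Note that select_related is a method on mongoengine QuerySets (e.g. ThingWithList.objects.select_related(max_depth=3)), not on an already-loaded document, and even then it resolves references with extra queries. One workaround (untested sketch, assuming the document is looked up by id) is to keep the references as raw DBRefs with no_dereference() and resolve them in two bulk queries instead of one query per item:

# load the parent without auto-dereferencing the reference list
thing = ThingWithList.objects.no_dereference().get(id='thingsId')
holder_ids = [ref.id for ref in thing.list_of_objects]  # DBRefs, no extra queries

# bulk query 1: all InfoHolders, still without dereferencing
holders = list(InfoHolder.objects.no_dereference().filter(id__in=holder_ids))

# bulk query 2: all Infos they point to
info_ids = [h.thing_i_care_about.id for h in holders]
info_by_id = {i.id: i for i in Info.objects(id__in=info_ids)}

info_to_return = [{
    'id': h.id,
    'name': info_by_id[h.thing_i_care_about.id].name,
    'foo': info_by_id[h.thing_i_care_about.id].foo,
    'bar': info_by_id[h.thing_i_care_about.id].bar,
} for h in holders]

That is three queries in total, independent of the list length.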

How to update object returned in query

So I'm a flask/sqlalchemy newbie, but this seems like it should be pretty simple. Yet for the life of me I can't get it to work, and I can't find any documentation for this anywhere online. I have a somewhat complex query that returns me a list of database objects.
items = db.session.query(X, func.count(Y.x_id).label('total')) \
    .filter(X.size >= size) \
    .outerjoin(Y, X.x_id == Y.x_id) \
    .group_by(X.x_id) \
    .order_by('total ASC') \
    .limit(20).all()
after I get this list of items I want to loop through the list and for each item update some property on it.
for it in items:
    it.some_property = 'xyz'
    db.session.commit()
However what's happening is that I'm getting an error
it.some_property = 'xyz'
AttributeError: 'result' object has no attribute 'some_property'
I'm not crazy. I'm positive that the property does exist on model X which is subclassed from db.Model. Something about the query is preventing me from accessing the attributes even though I can clearly see they exist in the debugger. Any help would be appreciated.
class X(db.Model):
    x_id = db.Column(db.Integer, primary_key=True)
    size = db.Column(db.Integer, nullable=False)
    oords = db.relationship('Oords', lazy=True, backref=db.backref('x', lazy='joined'))

    def __init__(self, size):
        self.size = size
Given your example, your result objects do not have the attribute some_property, just like the exception says. (Neither do model X objects, but I hope that's just an error in the example.)
They have the explicitly labeled total as the second column and the model X instance as the first. If you mean to access a property of the X instance, access it first from the result row, either by index or by the implicit label X:
items = db.session.query(X, func.count(Y.x_id).label('total')).\
    filter(X.size >= size).\
    outerjoin(Y, X.x_id == Y.x_id).\
    group_by(X.x_id).\
    order_by('total ASC').\
    limit(20).\
    all()

# Unpack a result object
for x, total in items:
    x.some_property = 'xyz'

# Please commit after *all* the changes.
db.session.commit()
As noted in the other answer you could use bulk operations as well, though your limit(20) will make that a lot more challenging.
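For completeness, a hedged sketch of that bulk route (untested, and assuming some_property really is a mapped column on X): reuse the ids from the limited query above and issue a single UPDATE against them.

ids = [x.x_id for x, total in items]  # the 20 rows fetched above
db.session.query(X).filter(X.x_id.in_(ids)) \
    .update({'some_property': 'xyz'}, synchronize_session=False)
db.session.commit()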
You should use the update function, like this:
from sqlalchemy import update

stmt = update(users).where(users.c.id == 5).\
    values(name='user #5')
Or:
session = self.db.get_session()
session.query(Organisation).filter_by(id_organisation=organisation.id_organisation).\
    update(
        {
            "name": organisation.name,
            "type": organisation.type,
        }, synchronize_session=False)
session.commit()
session.close()
The SQLAlchemy docs: http://docs.sqlalchemy.org/en/latest/core/dml.html

django-tables2 add dynamic columns to table class from hstore

My general question is: can I use the data stored in an HStoreField (Django 1.8.9) to generate columns dynamically for an existing Table class of django-tables2? As an example, say I have a model:
from django.contrib.postgres import fields as pgfields

class GameSession(models.Model):
    user = models.ForeignKey('profile.GamerProfile')
    game = models.ForeignKey('games.Game')
    last_achievement = models.ForeignKey('games.Achievement')
    extra_info = pgfields.HStoreField(null=True, blank=True)
Now, say I have a table defined as:
class GameSessionTable(tables.Table):
    id = tables.LinkColumn(accessor='id', verbose_name='Id',
                           viewname='reporting:session_stats', args=[A('id')],
                           attrs={'a': {'target': '_blank'}})
    started = DateTimeColumn(accessor='startdata.when_started', verbose_name='Started')
    stopped = DateTimeColumn(accessor='stopdata.when_stopped', verbose_name='Stopped')
    game_name = tables.LinkColumn(accessor='game.name', verbose_name='Game name',
                                  viewname='reporting:game_stats', args=[A('mainjob.id')],
                                  attrs={'a': {'target': '_blank'}})

    class Meta(BaseMetaTable):
        model = GameSession
        fields = []
        orderable = False
I want to be able to add columns for each of the keys stored in the extra_info column across all of the GameSessions. I have tried to override the __init__() method of the GameSessionTable class, where I have access to the queryset, then build a set of all the extra_info keys of my GameSession objects and add them to self; however, that doesn't seem to work. Code below:
def __init__(self, data, *args, **kwargs):
    super(GameSessionTable, self).__init__(data, *args, **kwargs)
    if data:
        extra_cols = []
        # just to be sure, check that the model has the extra_info HStore field
        if data.model._meta.get_field('extra_info'):
            extra_cols = list(set(
                item for q in data if q.extra_info for item in q.extra_info.keys()))
        for col in extra_cols:
            self.columns.columns[col] = tables.Column(
                accessor='extra_info.%s' % col,
                verbose_name=col.replace("_", " ").title())
Just a mention, I have had a look at https://spapas.github.io/2015/10/05/django-dynamic-tables-similar-models/#introduction but it's not been much help because the use case there is related to the fields of a model, whereas my situation is slightly different as you can see above.
Just wanted to check, is this even possible or do I have to define an entirely different table for this data, or potentially use an entirely different library altogether like django-reports-builder?
Managed to figure this out to a certain extent. The code I was running above was slightly wrong, so I updated it to run my code before the superclass __init__() runs, and changed where I was adding the columns.
As a result, my __init__() function now looks like this:
def __init__(self, data, *args, **kwargs):
    if data:
        extra_cols = []
        # just to be sure, check that the model has the extra_info HStore field
        if data.model._meta.get_field('extra_info'):
            extra_cols = list(set(
                item for q in data if q.extra_info for item in q.extra_info.keys()))
        for col in extra_cols:
            self.base_columns[col] = tables.Column(
                accessor='extra_info.%s' % col,
                verbose_name=col.replace("_", " ").title())
    super(GameSessionTable, self).__init__(data, *args, **kwargs)
Note that I replaced self.columns.columns (which were BoundColumn instances) with self.base_columns. This allows the superclass to then consider these as well when initializing the Table class.
Might not be the most elegant solution, but it seems to do the trick for me.
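For reference, a minimal usage sketch (hypothetical view code): because base_columns is read during Table.__init__, the hstore-derived columns show up without any further wiring.

sessions = GameSession.objects.all()
table = GameSessionTable(sessions)  # gains one column per distinct extra_info key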

SQLAlchemy: Dynamically loading tables from a list

I am trying to create a program that loads over 100 tables from a database so that I can change all occurrences of a user's user id.
Rather than map all of the tables individually, I decided to use a loop to map each of the tables using an array of objects. This way, the table definitions can be stored in a config file and later updated.
Here is my code so far:
def init_model(engine):
    """Call me before using any of the tables or classes in the model"""
    meta.Session.configure(bind=engine)
    meta.engine = engine

class Table:
    tableID = ''
    primaryKey = ''
    pkType = sa.types.String()

    class mappedClass(object):
        pass

WIW_TBL = Table()
LOCATIONS_TBL = Table()

WIW_TBL.tableID = "wiw_tbl"
WIW_TBL.primaryKey = "PORTAL_USERID"
WIW_TBL.pkType = sa.types.String()

LOCATIONS_TBL.tableID = "locations_tbl"
LOCATIONS_TBL.primaryKey = "LOCATION_CODE"
LOCATIONS_TBL.pkType = sa.types.Integer()

tableList = [WIW_TBL, LOCATIONS_TBL]

for i in tableList:
    i.tableID = sa.Table(i.tableID.upper(), meta.metadata,
                         sa.Column(i.primaryKey, i.pkType, primary_key=True),
                         autoload=True,
                         autoload_with=engine)
    orm.mapper(i.mappedClass, i.tableID)
The error that this code returns is:
sqlalchemy.exc.ArgumentError: Class '<class 'changeofname.model.mappedClass'>' already has a primary mapper defined. Use non_primary=True to create a non primary Mapper. clear_mappers() will remove *all* current mappers from all classes.
I can't use clear_mappers as it wipes all of the classes, and the entity_name scheme doesn't seem to apply here.
It seems that every object wants to use the same class, although they all should have their own instance of it.
Does anyone have any ideas?
Well, in your case it *is* the same class that you are trying to map to different Tables. To solve this, create a class dynamically for each Table:
class Table(object):
    tableID = ''
    primaryKey = ''
    pkType = sa.types.String()

    def __init__(self):
        self.mappedClass = type('TempClass', (object,), {})
But I would prefer a slightly cleaner version:
class Table2(object):
    def __init__(self, table_id, pk_name, pk_type):
        self.tableID = table_id
        self.primaryKey = pk_name
        self.pkType = pk_type
        self.mappedClass = type('Class_' + self.tableID, (object,), {})

# ...
WIW_TBL = Table2("wiw_tbl", "PORTAL_USERID", sa.types.String())
LOCATIONS_TBL = Table2("locations_tbl", "LOCATION_CODE", sa.types.Integer())
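With a distinct class per instance, the original mapping loop should then work unchanged, since each Table is mapped to its own class (a sketch mirroring the question's code):

for i in [WIW_TBL, LOCATIONS_TBL]:
    i.tableID = sa.Table(i.tableID.upper(), meta.metadata,
                         sa.Column(i.primaryKey, i.pkType, primary_key=True),
                         autoload=True,
                         autoload_with=engine)
    orm.mapper(i.mappedClass, i.tableID)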
