How to use order_by in Django queryset? - python

I apply order_by to a queryset and the results are mixed up.
Is there any difference if I apply it to a queryset?
The table has three columns: Name Age Description
Here is the code:
Query = table1.objects.filter(Name__iexact='Nico').order_by('Age').distinct()
ages = Query.values_list('Age')
descr= Query.values_list('Description')
Here only Age is sorted correctly. The match between these Age and Description is distorted. What is wrong here?

What is wrong ? Well, actually, the design itself. "Parallel lists" - two lists (or similar types) matched on position - is an antipattern. It's bulky, harder to use (you have to keep track of indexes and subscript the two lists), and brittle (if anything changes one of the lists for whatever reason, the data are not matched anymore).
If you need to keep both informations matched, keep them in a single list (or here queryset) as tuples (here query.values_list("Age", "Description")) or dicts (here query.values("Age", "Description")). This will make your code simpler and safer.
NB: I of course assume you want to match those two querysets given your mention that "The match between these Age and Description is distorted".

Try to have your order_by on the end after distinct.
Either that or try
Query = table1.objects.filter(Name__iexact='Nico').distinct()
Query = Query.order_by('Age')

Related

SQLAlchemy - Search entire DB model columns for a pattern

I have a database which I need to return any value that matches the query or looks similar.
Using flask-sqlalchemy I can filter manually, but I'm having trouble getting a list of objects using list comprehensions or any other more pythonic method.
I have already tried to create a dict with all model columns (which I have a list ) and the search value passed to the filter_by query
column_dict_and_value = {column_name:'value' for column_name in columns}
a = model.query.filter_by(**column_dict_and_value).all()
...but doesn't even return an existing object, although
a = model.query.filter_by(column_name='value').all()
actually returns the object.
I tried filter(.like('%')) and it sort of works
a = model.query.filter(model.column_name.like('valu%')).all()
returns me a list of all the objects that matches that pattern but for that column, and I want to iterate over all columns. Basically a full blown search since I'm not sure what the user is going to look for and I want to show the object if the query exist on any column or a list of objects if the query is incomplete, like the * in any search.
I tried to use list comprehensions to iterate through the model attributes but I'm not allowed to do that apparently:
a = [model.query.filter(model.column.like('valu%')) for column in columns]
...this complains that column is not an attribute of the model, which makes sense due to the syntax, but I had to try anyway, innit?
I'm a newbie regarding databases and class objects so please be gentle. I tried to look for similar answers but I can't find the one that suits me. the filter_by(one_column, second_column, etc) doesn't look to me very pythonic and if for any reason I change the model i need to change the query too where as creating a dict comprehension of the model seems to me more foolproof.
SOLUTION
Based on calestini proposed answer I wrote this code that seems to do the trick. It also cleans the list because if all() doesn't have any result (since it looks in every field) it returns None and adds up to the list. I'd rather prefer it clean.
Note: All my fields are text. Please check the third proposed solution from calestini if yours differ. Not tested it though.
columns = [
"foo",
"bar",
"foobar"
]
def list_everything(search):
d = {column: search for column in columns}
raw = [
model.query.filter(getattr(model, col).ilike(f"{val}%")).all()
for col, val in d.items()
]
return [item for item in raw if item]
I'll keep optimizing and update this code if I come with a better solution. Thanks a lot
Probably the reason why you are not getting any result from
a = model.query.filter_by(**column_dict_and_value).all()
is because filter_by is testing for direct equality. In your case you are looking for a pattern, so you will need to loop and use filter as opposed to filter_by.
Assuming all your columns are String or Text types, you can try the following:
a = model.query
for col, val in column_dict_and_value.items():
a = a.filter(getattr(model, col).ilike(f'{val}%')) #ilike for case insensitive
a = a.all()
Or vs And
The issue can also be that in your case you could be testing for intersection but expecting union. In other words, you are returning something only if the pattern matches in all columns. If instead you want a result in case any column matches, then you need to tweak the code a bit:
condition = or_(*[getattr(model, col).ilike(f'{val}%') for col, val in column_dict_and_value.items()])
a = model.query.filter(codition).all()
Now if you actually have other data types among the columns, then you can try to first verify if the column type and then pass the same logic:
for col, val in column_dict_and_value.items():
## check if field python equivalent is string
if isinstance(class_.__table__.c[col].type.python_type, str):
...

Django: distinct on a foreign key, then ordering

I have two models, Track and Pair. Each Pair has a track1, track2 and popularity. I'm trying to get an ordered list by popularity (descending) of pairs, with no two pairs having the same track1. Here's what I've tried so far:
lstPairs = Pair.objects.order_by('-popularity','track1__id').distinct('track1__id')[:iNumPairs].values_list('track1__id', 'track2__id', 'popularity')
This gave me the following error:
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
...so I tried this:
lstPairs = Pair.objects.order_by('-popularity','track1__id').distinct('popularity', 'track1__id')[:iNumPairs].values_list('track1__id', 'track2__id', 'popularity')
This gave me entries with duplicate track1__ids. Does anyone know of a way of solving this problem? I'm guessing I'll have to use raw() or something similar but I don't know how I'd approach a problem like this. I'm using PostgreSQL for the DB backend so DISTINCT should be supported.
First, let's clarify: DISTINCT is standard SQL, while DISTINCT ON is a PostgreSQL extension.
The error (DISTINCT ON expressions must match initial ORDER BY expressions) indicates, that you should fix your ORDER BY clause, not the DISTINT ON (if you do that, you'll end up with different results, like you already experienced).
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
This will give you your expected results:
lstPairs = Pair.objects.order_by('track1__id','-popularity').distinct('track1__id')[:iNumPairs].values_list('track1__id', 'track2__id', 'popularity')
In SQL:
SELECT DISTINCT ON (track1__id) track1__id, track2__id, popularity
FROM pairs
ORDER BY track1__id, popularity DESC
But probably in a wrong order.
If you want your original order, you can use a sub-query here:
SELECT *
FROM (
SELECT DISTINCT ON (track1__id) track1__id, track2__id, popularity
FROM pairs
ORDER BY track1__id
-- LIMIT here, if necessary
)
ORDER BY popularity DESC, track1__id
See the documentation on distinct.
First:
On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply.
You dont' specify what is your database backend, if it is not PostrgreSQL you have no chance to make it work.
Second:
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
I think that you should use raw(), or get the entire list of Pairs ordered by popularity and then make the filtering by track1 uniqueness in Python.

Order_by in sqlalchemy with outer join

I have the following sqlalchemy queries:
score = Scores.query.group_by(Scores.email).order_by(Scores.date).subquery()
students = db.session.query(Students, score.c.id).filter_by(archive=0).order_by(Students.exam_date).outerjoin(score, Students.email == score.c.email)
And then I render the things with:
return render_template('students.html', students=students.all())
Now, the issue is that I want the last score for a student to be displayed, as there are many of them corresponding to each user. But the first one seems to be returned. I tried some sortings and order_by on the first query, score, but without success.
How can I affect and pick only one latest result from the "score" to be paired with the corresponding row in "students"?
Thanks!
First of all you want to make sure that the subquery selects only rows for a particular student. You do that with the correlate method:
score = db.session.query(Scores.id).order_by(Scores.date.desc()).correlate(Students)
This alone does nothing, as you do not access the students. The idea of correlate is that if you use Students on your subquery, it will not add it to the FROM list but instead rely on the outer query providing it. Now you will want to refine your query (the join condition if you wish):
score = score.filter(Students.email == Scores.email)
This will produce a subquery that each time only returns the score for a single student. The remaining question is, if each student has to multiple scores. If so, you need to limit it (if there isn't, you don't need the order_by part from above as well):
score = score.limit(1)
Now you have made sure your query returns a single scalar value. Since you are using the query in a select context you will want to turn it into a scalar value:
students = db.session.query(Students, score.as_scalar()).filter_by(archive=0).order_by(Students.exam_date)
The as_scalar method is a way of telling SQLAlchemy that this returns a single row and column. Because otherwise you could not put it into a select.
Note: I am not 100% sure you need that limit if you put as_scalar. Try it out and expirment. If each student has only one score anyway then you don't need to worry at all about any of that stuff.
A little hint on the way: A Query instance is by itself an iterable. That means as long as you don't print it or similar, you can pass around a query just like a list with the only exception that will really only run on access (lazy) and that you can dynamically refine it, e.g. you could slice it: students[:10] would only select the first 10 students (and if you do it on the query it will execute it as a LIMIT instead of slicing a list, where the former is much more efficient).

Django models - how to filter out duplicate values by PK after the fact?

I build a list of Django model objects by making several queries. Then I want to remove any duplicates, (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
In general it's better to combine all your queries into a single query if possible. Ie.
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))
instead of
q1 = Models.object.filter(field1=f1)
q2 = Models.object.filter(field2=f2)
If the first query is returning duplicated Models then use distinct()
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()
If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
That's exactly what I would do if you were locked into your current structure of making several queries. Then a simply dictionary.values() will return your list back.
If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each query in a Q object and use a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.
Django Q objects
You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):
class MyModel(models.Model):
def __hash__(self):
return self.pk
If the order doesn't matter, use a dict.
Remove "duplicates" depends on how you define "duplicated".
If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.
If, on the other hand, you have some "natural key" column (or short set of columns) than you can easily query and remove these.
master = MyModel.objects.get( id=theMasterKey )
dups = MyModel.objects.filter( fld1=master.fld1, fld2=master.fld2 )
dups.all().delete()
If you can identify some shorter set of key fields for duplicate identification, this works pretty well.
Edit
If the model objects haven't been saved to the database yet, you can make a dictionary on a tuple of these keys.
unique = {}
...
key = (anObject.fld1,anObject.fld2)
if key not in unique:
unique[key]= anObject
I use this one:
dict(zip(map(lambda x: x.pk,items),items)).values()

In Django, how does one filter a QuerySet with dynamic field lookups?

Given a class:
from django.db import models
class Person(models.Model):
name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
'{0}__{1}'.format('name', 'startswith'): 'A',
'{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
Additionally to extend on previous answer that made some requests for further code elements I am adding some working code that I am using
in my code with Q. Let's say that I in my request it is possible to have or not filter on fields like:
publisher_id
date_from
date_until
Those fields can appear in query but they may also be missed.
This is how I am building filters based on those fields on an aggregated query that cannot be further filtered after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
filters['publisher_id'] = publisher_id
if date_from:
filters['metric_date__gte'] = date_from
if date_until:
filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. For previous example, if instead of publisher_id you have a list called publisher_ids, than you could use this piece of code:
if publisher_ids:
filters['publisher_id__in'] = publisher_ids
Django.db.models.Q is exactly what you want in a Django way.
This looks much more understandable to me:
kwargs = {
'name__startswith': 'A',
'name__endswith': 'Z',
***(Add more filters here)***
}
Person.objects.filter(**kwargs)
A really complex search forms usually indicates that a simpler model is trying to dig it's way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' an 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.

Categories