How to sort sqlalchemy table with python function? - python

I have a table with a column equation. In this column I store mathematical equations such as x+12-25+z**2, x*z-12, etc. Each row holds a different equation, and I want to sort the table by each equation's output. x and z are Python variables; you can think of them as numpy.array. The variables are updated every 15-30 minutes.
My table looks like this
class Table(Base):
    ....
    equation = Column(String(1024))
I evaluate the equations in Python with my function calculate_equation(string). It takes care of substituting all the variables and doing the math operations, and it returns a single number.
I tried with hybrid_property which looks like this:
@hybrid_property
def equation_value(self):
    return calculate_equation(self.equation)
And sort it with:
session.query(Table).order_by(Table.equation_value).all()
But it throws errors.
Any advice on how to do it? Is this even the correct approach? Should I use a different storage mechanism?
Suggestions are appreciated.

The .order_by() method emits an SQL ORDER BY clause, so the thing you're ordering by needs to be an SQL expression, not an arbitrary Python function. Just tagging the method with @hybrid_property isn't enough; you also need to implement @equation_value.expression as an SQLAlchemy expression.
You have two options:
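1. Implement your calculate_equation function in SQL (or as an SQLAlchemy expression/hybrid property), which will allow you to use it in an ORDER BY clause. From your description, this is probably very difficult.
2. Just query for everything and do the sorting afterwards in memory. Note that the sort key has to pass each row's equation string to calculate_equation, since that function takes a string rather than a Table instance:
    sorted(session.query(Table).all(), key=lambda row: calculate_equation(row.equation))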
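The in-memory option can be sketched without a database at all. In the sketch below everything is a stand-in: Row plays the role of a loaded Table instance, VARIABLES holds made-up values for x and z, and calculate_equation is a placeholder built on eval, since the real function from the question is not shown.

```python
from dataclasses import dataclass

# Hypothetical values for the asker's variables (assumption, not from the question).
VARIABLES = {"x": 2, "z": 3}


def calculate_equation(equation: str) -> float:
    # Placeholder evaluator: eval with builtins disabled and only x and z visible.
    # The real calculate_equation is not shown in the question.
    return eval(equation, {"__builtins__": {}}, VARIABLES)


@dataclass
class Row:
    # Stands in for an ORM instance returned by session.query(Table).all()
    id: int
    equation: str


rows = [Row(1, "x+12-25+z**2"), Row(2, "x*z-12"), Row(3, "x+z")]

# The key extracts the equation string from each row before evaluating it.
ordered = sorted(rows, key=lambda row: calculate_equation(row.equation))
ordered_ids = [row.id for row in ordered]
```

Because the variables change every 15-30 minutes, the sort has to be redone whenever they are updated; nothing about the ordering is stored in the database.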

Related

Wrong result when annotating with foreign key set filter

Model C has a foreign key link to model B. Model B has a foreign key link to model A. That is, a model A instance may have many model B instances, and a model B instance may have many model C instances.
Pseudo-code:
class A:
    ...
class B:
    a = models.ForeignKey(A, ...)
class C:
    b = models.ForeignKey(B, ...)
I would like to retrieve the entire list of elements of model A from the database, annotating each with the count of its B elements whose own count of C elements is not zero (or exceeds any arbitrary number, for that matter).
I tried this:
A.objects.annotate(
    at_least_one_count=Count('b', filter=Q(b__c__isnull=False))
)
To the best of my knowledge, this should be working. However, numbers returned are not correct (in a practical case). It returns a higher number than without the filter, which is obviously not possible:
A.objects.annotate(
    at_least_one_count=Count('b')
)
Edit: when I run the following query individually for every A instance, I get the same numbers, which makes me think there might be something wrong in my code:
A.objects.first().b_set.filter(c__isnull=False).__len__()
Note: I would like to perform this query without SQL. If I have to utilise some more advance Pythonic tools that Django provides, I am happy to do it, as long as I stay Object-Oriented. I am trying to move away from using raw SQL for all database operations, and re-write them all with Django ORM. However, it seems to be overly complicated.
The answer is simple: filtering across b__c adds joins to the query, so each B row is repeated once for every C row it has, and Count('b') counts those repeats. The count therefore has to be made distinct. For a whole queryset that is done by calling .distinct(...), but here the restriction belongs on the aggregate itself, so pass distinct=True to Count:
...=Count('b', filter=Q(b__c__isnull=False), distinct=True)
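To see why the count inflates, here is a minimal sketch in plain sqlite3, outside Django. The tables a, b, c and their rows are made up for illustration, and the SQL is only an approximation of what the ORM would generate for the two annotations, not its literal output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id));
    INSERT INTO a (id) VALUES (1);
    -- One A with two Bs; B 10 has three Cs, B 11 has none.
    INSERT INTO b (id, a_id) VALUES (10, 1), (11, 1);
    INSERT INTO c (id, b_id) VALUES (100, 10), (101, 10), (102, 10);
""")

query = """
    SELECT a.id,
           COUNT(CASE WHEN c.id IS NOT NULL THEN b.id END) AS plain_count,
           COUNT(DISTINCT CASE WHEN c.id IS NOT NULL THEN b.id END) AS distinct_count
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    LEFT JOIN c ON c.b_id = b.id
    GROUP BY a.id
"""
# The join yields B 10 three times (once per C), so the plain count sees it three times.
a_id, plain_count, distinct_count = conn.execute(query).fetchone()
```

Here plain_count is larger than the total number of B rows (two), which is the "higher number than without the filter" symptom from the question, while distinct_count reports the single B that actually has C rows.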

hybrid property with join in sqlalchemy

I have probably not fully grasped the use of @hybrid_property. What I am trying to do is make it easy to access a calculated value based on a column in another table, which requires a join.
So what I have is something like this (which works but is awkward and feels wrong):
class Item():
    :
    @hybrid_property
    def days_ago(self):
        # Can I even write a python version of this ?
        pass
    @days_ago.expression
    def days_ago(cls):
        return func.datediff(func.NOW(), func.MAX(Event.date_started))
This requires the caller to add the join on the Action table whenever the days_ago property is needed. Is hybrid_property even the correct approach to simplifying my queries where I need to get hold of the days_ago value?
One way or another you need to load or access Action rows either via join or via lazy load (note here it's not clear what Event vs. Action is, I'm assuming you have just Item.actions -> Action).
The non-"expression" version of days_ago intends to function against Action objects that are relevant only to the current instance. Normally within a hybrid, this means just iterating through Item.actions and performing the operation in Python against loaded Action objects. Though in this case you're looking for a simple aggregate you could instead opt to run a query, but again it would be local to self so this is like object_session(self).query(func.datediff(...)).select_from(Action).with_parent(self).scalar().
The expression version of the hybrid when formed against another table typically requires that the query in which it is used already have the correct FROM clauses set up, so it would look like session.query(Item).join(Item.actions).filter(Item.days_ago == xyz). This is explained at Join-Dependent Relationship Hybrid.
Your expression here might be better produced as a column_property, if you can afford using a correlated subquery. See that at http://docs.sqlalchemy.org/en/latest/orm/mapping_columns.html#using-column-property-for-column-level-options.
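The instance-level (non-expression) side described above, iterating over already-loaded Action objects in Python, can be sketched with plain classes. This is only an illustration under assumptions: Item and Action here are simple dataclasses rather than mapped classes, the actions list stands in for a loaded Item.actions relationship, and days_since_last_action is a hypothetical helper added so the current time can be passed in explicitly.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Action:
    # Stand-in for a loaded Action row; only the date matters here.
    date_started: datetime


@dataclass
class Item:
    # Stand-in for an ORM Item whose actions relationship is already loaded.
    actions: List[Action] = field(default_factory=list)

    def days_since_last_action(self, now: datetime) -> Optional[int]:
        # Roughly mirrors DATEDIFF(NOW(), MAX(date_started)) for this one instance.
        if not self.actions:
            return None  # no actions, so there is no latest date
        latest = max(action.date_started for action in self.actions)
        return (now - latest).days

    @property
    def days_ago(self) -> Optional[int]:
        # What the Python side of the hybrid would return on an instance.
        return self.days_since_last_action(datetime.now())


item = Item(actions=[Action(datetime(2024, 1, 1)), Action(datetime(2024, 1, 10))])
result = item.days_since_last_action(now=datetime(2024, 1, 15))
```

In a real hybrid the body of days_ago would sit under @hybrid_property, with the SQL version supplied separately through @days_ago.expression.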

Is it possible in SQLAlchemy to filter by a database function or stored procedure?

We're using SQLalchemy in a project with a legacy database. The database has functions/stored procedures. In the past we used raw SQL and we could use these functions as filters in our queries.
I would like to do the same for SQLAlchemy queries if possible. I have read about @hybrid_property, but some of these functions need one or more parameters. For example:
I have a User model that has a JOIN to a bunch of historical records. These historical records for this user, have a date and a debit and credit field, so we can look up the balance of a user at a specific point in time, by doing a SUM(credit) - SUM(debit) up until the given date.
We have a database function for that called dbo.Balance(user_id, date_time). I can use this to check the balance of a user at a given point in time.
I would like to use this as a criterion in a query, to select only users that have a negative balance at a specific date/time.
selection = users.filter(coalesce(Users.status, 0) == 1,
                         coalesce(Users.no_reminders, 0) == 0,
                         dbo.pplBalance(Users.user_id, datetime.datetime.now()) < -0.01).all()
This is of course a non-working example, just to give you the gist of what I'd like to do. The solution looks to be hybrid properties, but as I mentioned above, these only work without parameters (as they are properties, not methods).
Any suggestions on how to implement something like this (if it's even possible) are welcome.
Thanks,
A @hybrid_property isn't by itself a means of producing a particular SQL statement; it is only a helper that can add more query-generation capabilities to an ORM-mapped class.
SQL functions that can be called as plain functions (e.g. without any kind of "EXEC XYZ" type of syntax) can be called using the func construct, meaning the query you have is pretty much ready to go:
import datetime
from sqlalchemy import func
from sqlalchemy.sql.functions import coalesce
selection = users.filter(coalesce(Users.status, 0) == 1,
                         coalesce(Users.no_reminders, 0) == 0,
                         func.dbo.pplBalance(Users.user_id, datetime.datetime.now()) < -0.01).all()

Filtering a django queryset based on computed values

I'm running a bunch of filters on one of my models. Specifically, I'm doing something like this in one of my views:
cities = City.objects.filter(name__icontains=request.GET['name'])
cities = cities.filter(population__gte=request.GET['lowest_population'])
return cities
Now I'd like to add one other, different type of filter. Specifically, I'd like to include only those cities that are a certain distance away from a particular zip code. I already have the relevant function for this, i.e. something like:
distanceFromZipCode(city, zipCode)
# This returns 110 miles, for example
How do I combine django's queryset filtering with this additional filter I'd like to add? I know that if cities were merely a list, I could just use .filter() and pass in the appropriate lambda (e.g. return true if the distance from the relevant zip code is <100).
But I'm dealing with query sets, not simple lists, so how would I do this?
The root of the issue is that you're trying to mix sql filters, which are done within the db, and a python filter, which is done once the records are materialized from the db. You can't do that without taking the items from the database and then filtering on top of that.
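You can't do this via your Python function, but you can do it via GeoDjango:
https://docs.djangoproject.com/en/dev/ref/contrib/gis/db-api/#distance-queries
cities = cities.filter(distance_lt=101)
would get you what you want. (This is shorthand: a real GeoDjango distance lookup names the geometry field and takes a geometry plus a distance, in the form <field>__distance_lt=(geometry, distance).)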
You're trying to mix a Python method with a database query, and that's not possible. Either you write the SQL to perform the distance calculation (fast), or you fetch every row and call your method (slow). Django filters simply translate parameters into a SQL WHERE clause, so if you can't express it in SQL, you probably can't express it in a filter.
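The slow path, fetching rows and then calling the Python function on each one, can be sketched as follows. The City dataclass, the coordinates, and this haversine-based distance_from_zip_code are all assumptions made for illustration; the asker's distanceFromZipCode(city, zipCode) is not shown and may work differently.

```python
import math
from dataclasses import dataclass


@dataclass
class City:
    # Stand-in for a City model instance already fetched from the database.
    name: str
    lat: float
    lon: float


def distance_from_zip_code(city: City, zip_lat: float, zip_lon: float) -> float:
    """Great-circle distance in miles (haversine); a hypothetical stand-in."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(city.lat), math.radians(zip_lat)
    dphi = phi2 - phi1
    dlambda = math.radians(zip_lon - city.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(h))


# Pretend these rows came back from the name/population filters run in the database.
fetched = [City("Near", 40.1, -75.0), City("Far", 45.0, -75.0)]

# The distance check happens in Python, after the rows are materialized.
nearby = [c for c in fetched if distance_from_zip_code(c, 40.0, -75.0) < 100]
nearby_names = [c.name for c in nearby]
```

The result is a plain list, not a queryset, so any further .filter() calls have to happen before this step, while the data is still being narrowed in the database.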
If you are storing the location of the city as geometry you can use a distance spatial filter chained with the rest of your filters:
from django.contrib.gis.measure import D
zipCode = ZipCode.objects.all()[0]
cities = City.objects.filter(point__distance_lte=(zipCode.geom, D(mi=110)))
This assumes you have a ZipCode model with geometry of each zip code and the geometry is stored in a field called 'geom' and your City object has a point field called 'point'.
In my opinion you should use the QuerySet object and define the complex filter method inside a custom manager.
from django.db import models
from django.db.models import Q
class CityManager(models.Manager):
    def get_filtered_cities(self, name=None, lowest_population=None, zip_code=None):
        # Start from an empty Q and AND in each filter that was supplied.
        query = Q()
        if name:
            query &= Q(name__icontains=name)
        if lowest_population:
            query &= Q(population__gte=lowest_population)
        if zip_code:
            pass  # other query object
        # get_query_set() was renamed to get_queryset() in Django 1.6
        return self.get_queryset().filter(query)

Column comparison in Django queries

I have the following model:
class Car(models.Model):
    make = models.CharField(max_length=40)
    mileage_limit = models.IntegerField()
    mileage = models.IntegerField()
I want to select all cars where mileage is less than mileage_limit, so in SQL it would be something like:
select * from car where mileage < mileage_limit;
Using Q object in Django, I know I can compare columns with any value/object, e.g. if I wanted to get cars that have mileage say less than 100,000 it would be something like:
cars = Car.objects.filter(Q(mileage__lt=100000))
Instead of a fixed value I would like to use the column name (in my case it is mileage_limit). So I would like to be able to do something like:
cars = Car.objects.filter(Q(mileage__lt=mileage_limit))
However this results in an error, since it is expecting a value/object, not a column name. Is there a way to compare two columns using Q object? I feel like it would be a very commonly used feature and there should be an easy way to do this, however couldn't find anything about it in the documentation.
Note: this is a simplified example, for which the use of Q object might seem to be unnecessary. However the real model has many more columns, and the real query is more complex, that's why I am using Q. Here in this question I just wanted to figure out specifically how to compare columns using Q.
EDIT
Apparently, after the release of Django 1.1, it will be possible to do the following:
cars = Car.objects.filter(mileage__lt=F('mileage_limit'))
Still not sure if F is supposed to work together with Q like this:
cars = Car.objects.filter(Q(mileage__lt=F('mileage_limit')))
You can't do this right now without custom SQL. The django devs are working on an F() function that would make it possible: #7210 - F() syntax, design feedback required.
Since I had to look this up based on the accepted answer, I wanted to quickly mention that the F() expression has indeed been released and is available for being used in queries.
This is what the Django documentation on F() says about it:
An F() object represents the value of a model field, transformed value of a model field, or annotated column. It makes it possible to refer to model field values and perform database operations using them without actually having to pull them out of the database into Python memory.
Instead, Django uses the F() object to generate an SQL expression that describes the required operation at the database level.
The reference for making queries using F() also gives useful examples.
