I have two querysets -
from django.db.models import Case, IntegerField, Min, Value, When

A = Bids.objects.filter(*args, **kwargs).annotate(
    highest_priority=Case(
        *[
            When(data_source=data_source, then=Value(i))
            for i, data_source in enumerate(data_source_order_list)
        ],
        output_field=IntegerField(),
    )
).order_by("date", "highest_priority")
B = A.values("date").annotate(Min("highest_priority")).order_by("date")
The first query gives me all objects within the selected time range, with the proper data sources and values. Through highest_priority I set which item should be selected. All items carry additional data.
The second query gives me grouped information about the items on every date. In the second query I do not have important values like price etc., so I assume I have to join these two querysets and filter where a.highest_priority = b.highest_priority, because in that case I would get a queryset of objects with only one item per date.
I have tried using distinct - it does not work with .first()/.last(). Annotate gives me dicts from the GROUP BY, and grouping by only date cuts out a lot of important data, but I have to group by date only...
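For what it's worth, on PostgreSQL the whole thing can be expressed with DISTINCT ON; a minimal sketch, assuming queryset A from above (untested):

# A is already ordered by ("date", "highest_priority"), which
# .distinct("date") requires; this keeps the top-priority row per date.
one_per_date = A.distinct("date")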
The tables look like this (screenshots of tables A and B omitted).
How do I join them? Once joined, I could easily filter a.highest_priority against b.highest_priority and get my data with only one database hit. I want to use the ORM - otherwise I could just distinct and dump it into a list - and I do not want to hammer the database by stitching multiple queries together per date.
See if this suggestion works:
SELECT *, (to_char(a.date, 'YYYYMMDD')::integer) * a.highest_priority AS prioritycalc
FROM table_a a
JOIN table_b b ON (to_char(a.date, 'YYYYMMDD')::integer) * a.highest_priority = (to_char(b.date, 'YYYYMMDD')::integer) * b.highest_priority
ORDER BY prioritycalc DESC;
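A backend-agnostic alternative is a correlated subquery that keeps, for each date, only the row with the lowest highest_priority. A hedged sketch reusing queryset A from above (untested; assumes the default pk):

from django.db.models import OuterRef, Subquery

# For each outer row's date, find the pk of the best-priority row.
best_pk_per_date = (
    A.filter(date=OuterRef("date"))
    .order_by("highest_priority")
    .values("pk")[:1]
)
# One row per date, fetched in a single query.
one_per_date = A.filter(pk=Subquery(best_pk_per_date))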
I have to run a SQL query that grabs the values only if two conditions are true. For example, I need to grab all the values where asset = x and id_name = 12345. There are about 10k combinations of asset and id_name that I need to be able to query for using SQL. Usually I would just do the following:
select * from database where id_name IN (12345)
But how do I make this query when two conditions have to be true: id_name has to equal 12345 AND asset has to equal x?
I tried turning the list I need into tuples like this:
new_list = list(scams[['asset', 'id_name']].itertuples(index=False, name=None))
which gives me a list like this:
new_list = [(12345, x), (32342, z), ...]
Any suggestions would be great. Thanks!
Based on my understanding, you need to fetch records based on a combination of two filters, and you have around 10K combinations. Here is a simple SQL-based solution.
Create a new column in the same table, or build a temp table/view with a new column, say "column_new". Populate it with the concatenated value of id_name and asset. You can use a concatenation function appropriate to your database; for example, in SQL Server use CONCAT(column1, column2).
Now you can write your SQL as select * from database where column_new IN ('12345x', '32342z');.
Note: You can also put a "-" or "|" between column 1 and column 2 while concatenating, so that adjacent values cannot collide.
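A hedged Python sketch of this approach, assuming a DB-API cursor and a table named database_table (both names are placeholders) and the "?" paramstyle:

# Build the concatenated keys from the (id_name, asset) tuples.
keys = [f"{id_name}{asset}" for id_name, asset in new_list]  # e.g. "12345x"
placeholders = ", ".join("?" for _ in keys)
cursor.execute(
    f"SELECT * FROM database_table WHERE CONCAT(id_name, asset) IN ({placeholders})",
    keys,
)
rows = cursor.fetchall()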
I want to get all the columns of a table with max(timestamp), grouped by name.
What I have tried so far is:
normal_query ="Select max(timestamp) as time from table"
event_list = normal_query \
.distinct(Table.name)\
.filter_by(**filter_by_query) \
.filter(*queries) \
.group_by(*group_by_fields) \
.order_by('').all()
The query I get:
SELECT DISTINCT ON (schema.table.name) , max(timestamp)....
This query basically returns two columns, name and timestamp.
The query I want, however, is:
SELECT DISTINCT ON (schema.table.name) * from table order by ....
which returns all the columns of that table. That is the expected behavior, and I am able to get all the columns; how could I write this down in Python to get to that statement? Basically, the asterisk is missing.
Can somebody help me?
What you seem to be after is the DISTINCT ON ... ORDER BY idiom in Postgresql for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating, just:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    all()
This will end up selecting rows "grouped" by name, having the greatest timestamp value.
You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.
In case you'd like to order the resulting rows by timestamp in the final result, you can for example use Query.from_self() to turn the query into a subquery and order in the enclosing query:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    from_self().\
    order_by(Table.timestamp.desc()).\
    all()
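As an aside, Query.from_self() is deprecated as of SQLAlchemy 1.4; a hedged sketch of the equivalent with an explicit subquery and aliased (untested):

from sqlalchemy.orm import aliased

# Wrap the DISTINCT ON query in a subquery, then order the outer query.
sub = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    subquery()
table_alias = aliased(Table, sub)
event_list = db.session.query(table_alias).\
    order_by(table_alias.timestamp.desc()).\
    all()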
I have been performing a query to count how many times the value "Australia" occurs in the "country" column of my sqlite3 database table (Users).
australia = db.session.query(Users.country).filter_by(country="Australia").count()
I need to do this in a more dynamic way for any country value that may be within this column.
I have tried the following but unfortunately I only get a count of 0 for all values that are passed in the loop variable (each).
country = list(db.session.query(Users.country))
country_dict = list(set(country))
for each in country_dict:
    print(db.session.query(Users.country).filter_by(country=(str(each))).count())
Any assistance would be greatly appreciated.
The issue is that country is a list of result tuples, not a list of strings. The end result is that the value of str(each) is something along the lines of ('Australia',), which should make it obvious why you are getting counts of 0 as results.
For when you want to extract a list of single column values, see here. When you want distinct results, use DISTINCT in SQL.
But you should not first query distinct countries and then fire a query to count the occurrence of each one. Instead use GROUP BY:
country_counts = db.session.query(Users.country, db.func.count()).\
    group_by(Users.country).\
    all()

for country, count in country_counts:
    print(country, count)
The main thing to note is that SQLAlchemy does not hide the SQL when using the ORM, but works with it.
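For illustration, the GROUP BY query above renders to roughly this SQL (the generated column label is an assumption):

SELECT users.country, count(*) AS count_1
FROM users
GROUP BY users.country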
If you can use the sqlite3 module with direct SQL it is a simple query:
curs = con.execute("SELECT COUNT(*) FROM users WHERE country=?", ("Australia",))
nb = curs.fetchone()[0]
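The GROUP BY variant is just as direct with sqlite3; a minimal sketch, assuming the same connection and users table as above:

# One round trip that counts every country at once.
for country, count in con.execute(
        "SELECT country, COUNT(*) FROM users GROUP BY country"):
    print(country, count)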
To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n-to-n, while the relation between B and Category is n-to-1.)
I need an ORM query to group Relation records by Category and A, then calculate the Sum of Intensity in each (Category, A) pair (seems simple till here), and then I want to annotate the Max of the calculated Sums in each Category.
My code is something like:
A.objects.values('B_id').annotate(AcSum=Sum('Intensity')).annotate(Max('AcSum'))
Which throws the error:
django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate
The django-group-by package fails with the same error.
For further information, please also see this Stack Overflow question.
I am using Django 2 and PostgreSQL.
Is there a way to achieve this using the ORM? If there is not, what would the solution be using a raw SQL expression?
Update
After lots of struggling I found out that what I wrote was indeed an aggregation; however, what I want is to find the maximum AcSum of each A in each Category. So I suppose I have to group the result once more after the AcSum calculation. Based on this insight I found a Stack Overflow question which asks about the same concept (asked 1 year, 2 months ago, without any accepted answer).
Chaining another values('id') onto the queryset functions neither as a group_by nor as a filter for output attributes; it removes AcSum from the result set. Adding AcSum to values() is not an option either, due to the changes it makes to the grouped-by result set.
I think what I am trying to do is re-group the grouped-by query based on the fields inside a column (i.e. id).
Any thoughts?
You can't do an aggregate of an aggregate, Max(Sum()); it's not valid in SQL, whether you're using the ORM or not. Instead, you have to join the table to itself to find the maximum. You can do this using a subquery. The code below looks right to me, but keep in mind I don't have anything to run it on, so it might not be perfect.
from django.db.models import OuterRef, Q, Subquery, Sum

annotation = {
    'AcSum': Sum('intensity')
}
# The basic query is on Relation grouped by A and Category, annotated
# with the Sum of intensity
query = Relation.objects.values('a', 'b__category').annotate(**annotation)
# The subquery is joined to the outerquery on the Category
sub_filter = Q(b__category=OuterRef('b__category'))
# The subquery is grouped by A and Category and annotated with the Sum
# of intensity, which is then ordered descending so that when a LIMIT 1
# is applied, you get the Max.
subquery = Relation.objects.filter(sub_filter).values(
    'a', 'b__category').annotate(**annotation).order_by(
    '-AcSum').values('AcSum')[:1]
query = query.annotate(max_intensity=Subquery(subquery))
This should generate SQL like:
SELECT a_id, category_id,
       (SELECT SUM(U0.intensity) AS AcSum
        FROM Relation U0
        JOIN B U1 ON U0.b_id = U1.id
        WHERE U1.category_id = B.category_id
        GROUP BY U0.a_id, U1.category_id
        ORDER BY SUM(U0.intensity) DESC
        LIMIT 1
       ) AS max_intensity
FROM Relation
JOIN B ON Relation.b_id = B.id
GROUP BY Relation.a_id, B.category_id
It may be more performant to eliminate the join in the Subquery by using a backend-specific feature like array_agg (Postgres) or GroupConcat (MySQL) to collect the Relation.ids that are grouped together in the outer query. But I don't know what backend you're using.
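Since PostgreSQL was mentioned in the question, here is a hedged sketch of that array_agg idea (untested; field names follow the query above):

from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Sum

# Collect the Relation ids of each (A, Category) group alongside the Sum.
query = Relation.objects.values('a', 'b__category').annotate(
    AcSum=Sum('intensity'),
    relation_ids=ArrayAgg('id'),
)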
Something like this should work for you. I couldn't test it myself, so please let me know the result:
Relation.objects.annotate(
    b_category=F('B__Category')
).values(
    'A', 'b_category'
).annotate(
    SumIntensityPerCategory=Sum('Intensity')
).values(
    'A', MaxIntensitySumPerCategory=Max('SumIntensityPerCategory')
)
Given a sqlalchemy.orm.query.Query object, is it possible to count a distinct column on it? I am asking because .count() returns dupes due to the join conditions.
For instance:
from sqlalchemy import func, distinct
channels = db.session.query(Channel).join(ChannelUsers).filter(
    ChannelUsers.user_id == USER_ID,
    Message.channel_id.isnot(None)
).outerjoin(Message)
# this gives us a number with duplicate channels
# and .count() does not take extra parameters to target on column
channels.count()
...
# later on I need to access all these channels via channels.all()
To get a distinct channel count, I can duplicate the filter conditions above and query the distinct column, something like this:
distinct_count = db.session.query(
    func.count(distinct(Channel.id))
).join(ChannelUsers).filter(
    ChannelUsers.user_id == USER_ID,
    Message.channel_id.isnot(None)
).outerjoin(Message)
But that's not ideal, as I need to access some or all channels after getting the distinct count.
Found this while looking for the answer myself. After some more research, I was able to get the expected result using a combination of load_only and distinct in order to count only distinct values of an ID field. Let's say for simplicity that Channel has a unique field named id.
from sqlalchemy.orm import load_only

distinct_count = channels.options(load_only(Channel.id)).distinct().count()
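A hedged usage note: the same base query can then still return the rows themselves, e.g.

# DISTINCT over the selected Channel columns dedupes the joined rows.
distinct_channels = channels.distinct().all()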