When I run
select * from table1 t1 left outer join table2 t2 on t1.id = t2.id;
in the sqlite3 terminal, I get the data back as I want and would expect.
However, when I run this in SQLAlchemy
TableOneModel.query.outerjoin(TableTwoModel,TableOneModel.id == TableTwoModel.id)
I only get table1 information back. I don't even get empty columns from table2. Am I missing something silly?
You're probably using Flask-SQLAlchemy, which provides the query property as a shortcut for selecting model entities. Your query is equivalent to
db.session.query(TableOneModel).\
outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id)
Either explicitly query for both entities:
db.session.query(TableOneModel, TableTwoModel).\
outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id)
or add the entity to your original:
TableOneModel.query.\
outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id).\
add_entity(TableTwoModel)
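To see why the second entity matters, here is a minimal sketch that uses only the standard-library sqlite3 module instead of SQLAlchemy. The table contents and the extra columns (name, note) are made up, and the two SELECT lists only approximate what a single-entity and a two-entity query would emit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table2 VALUES (1, 'match');
""")

join = "FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id ORDER BY t1.id"

# Roughly what querying a single entity selects: table1 columns only,
# even though the join itself is still performed.
only_t1 = conn.execute("SELECT t1.* " + join).fetchall()

# Roughly what querying both entities selects: columns of both tables,
# with NULL (None in Python) where table2 has no matching row.
both = conn.execute("SELECT t1.*, t2.* " + join).fetchall()

print(only_t1)
print(both)
```

The join is the same in both statements; only the list of selected columns differs, which is the part that add_entity changes.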
Related
I have got 2 column properties which use the same query, but just return different columns:
action_time = column_property(
select([Action.created_at]).where((Action.id == id)).order_by(desc(Action.created_at)).limit(1)
)
action_customer = column_property(
select([Action.customer_id]).where((Action.id == id)).order_by(desc(Action.created_at)).limit(1)
)
The SQL query that is produced will have two subqueries, one for each property. So if I add a few more similar properties, the SQL query will end up with N subqueries.
I am wondering whether it is possible to have one LEFT OUTER JOIN that is shared by multiple column_property attributes?
In SQL, I can sum two counts like
SELECT (
(SELECT count(*) FROM a WHERE val=42)
+
(SELECT count(*) FROM b WHERE val=42)
)
How do I perform this query with the Django ORM?
The closest I got is
a.objects.filter(val=42).order_by().values_list('id', flat=True).union(
b.objects.filter(val=42).order_by().values_list('id', flat=True)
).count()
This works fine if the returned count is small, but seems bad if there are a lot of rows that the database must hold in memory just to count them.
Your solution can be simplified only slightly, by using values('pk') instead of values_list('id', flat=True). This changes only the type of the output rows; the SQL generated for both querysets is the same:
SELECT id FROM a WHERE val=42 UNION SELECT id FROM b WHERE val=42
and the .count() method only wraps that in an outer query:
SELECT COUNT(*) FROM (... subquery ...)
The database backend does not necessarily hold all the values in memory. It can also just count them and discard them (not checked).
Similarly, if you run a simple SELECT COUNT(id) FROM a, it doesn't need to collect the ids.
Subqueries of the form SELECT count(*) FROM a WHERE val=42 inside a bigger query are not directly possible, because Django doesn't evaluate aggregations lazily: aggregate() runs the query immediately.
The evaluation can be postponed, e.g. by grouping by some expression that has only one possible value, such as GROUP BY (i >= 0) (or by an outer reference, if that works), but the query plan can be worse.
Another problem is that the ORM cannot build a SELECT without a table. Therefore I use an unimportant row of an unimportant table as the base of the query.
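One detail worth checking before relying on the union approach: a plain UNION removes duplicate rows, so if the same id occurs in both a and b, the count over the union is smaller than the sum of the two counts. The sketch below shows this with the standard-library sqlite3 module and made-up rows; it runs raw SQL rather than Django, so the table contents and the scalar() helper are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, val INTEGER);
    INSERT INTO a VALUES (1, 42), (2, 42), (3, 7);
    INSERT INTO b VALUES (1, 42), (5, 42);
""")

def scalar(sql):
    """Hypothetical helper: run a query and return its single value."""
    return conn.execute(sql).fetchone()[0]

# The SQL from the question: add the two counts.
sum_of_counts = scalar("""
    SELECT (SELECT count(*) FROM a WHERE val = 42)
         + (SELECT count(*) FROM b WHERE val = 42)
""")

# Count over a plain UNION of ids: id 1 is in both tables and is counted once.
union_count = scalar("""
    SELECT count(*) FROM (
        SELECT id FROM a WHERE val = 42
        UNION
        SELECT id FROM b WHERE val = 42
    )
""")

# UNION ALL keeps duplicates, so it matches the sum of the counts.
union_all_count = scalar("""
    SELECT count(*) FROM (
        SELECT id FROM a WHERE val = 42
        UNION ALL
        SELECT id FROM b WHERE val = 42
    )
""")

print(sum_of_counts, union_count, union_all_count)
```

In Django, QuerySet.union() takes an all=True argument that produces UNION ALL, which is probably what is wanted when the two tables can share ids.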
Example:
qs = Unimportant.objects.filter(pk=unimportant_pk).values('id').annotate(
total_a=a.objects.filter(val=42).order_by().values('val')
.annotate(cnt=models.Count('*')).values('cnt'),
total_b=b.objects.filter(val=42).order_by().values('val')
.annotate(cnt=models.Count('*')).values('cnt')
)
It is not nice, but it could easily be parallelized:
SELECT
id,
(SELECT COUNT(*) AS cnt FROM a WHERE val=42 GROUP BY val) AS total_a,
(SELECT COUNT(*) AS cnt FROM b WHERE val=42 GROUP BY val) AS total_b
FROM unimportant WHERE id = unimportant_pk
The Django docs confirm that a simple solution doesn't exist:
Using aggregates within a Subquery expression
...
... This is the only way to perform an aggregation within a Subquery, as using aggregate() attempts to evaluate the queryset (and if there is an OuterRef, this will not be possible to resolve).
Suppose I have a table MyTable where the primary key is ID and a composite unique key is ColA and ColB.
I want to retrieve the ID affected by an UPDATE statement like this:
UPDATE MyTable
SET ColC='Blah'
WHERE ColA='xxx' and ColB='yyy'
Is there any way to do this using sqlite3 in Python 3 in a single statement, without doing another SELECT after a successful UPDATE? I'm aware of the lastrowid attribute on a cursor, but it seems to apply only to INSERTs.
More generally, I'm curious if any SQL engine allows for such functionality.
You asked if it could be done in some other DBMS, so I found this method in MySQL:
UPDATE MyTable as m1
JOIN (SELECT @id := id AS id
FROM MyTable
WHERE ColA = 'xxx' AND ColB = 'yyy') AS m2
ON m1.id = m2.id
SET m1.ColC = 'Blah';
After this you can do SELECT @id to get the ID of the updated row.
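For SQLite itself, versions 3.35.0 and later accept a RETURNING clause on UPDATE, as PostgreSQL does. The sketch below uses the standard-library sqlite3 module with made-up rows. Whether RETURNING is available depends on the SQLite library your Python was built against, so the code checks the version and otherwise falls back to a separate SELECT (the fallback is not a single statement and is only there to keep the example runnable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (
        ID INTEGER PRIMARY KEY,
        ColA TEXT, ColB TEXT, ColC TEXT,
        UNIQUE (ColA, ColB)
    );
    INSERT INTO MyTable (ColA, ColB, ColC)
    VALUES ('xxx', 'yyy', 'old'), ('aaa', 'bbb', 'old');
""")

params = ("Blah", "xxx", "yyy")

if sqlite3.sqlite_version_info >= (3, 35, 0):
    # Single statement: the UPDATE itself yields the IDs of the rows it changed.
    rows = conn.execute(
        "UPDATE MyTable SET ColC = ? WHERE ColA = ? AND ColB = ? RETURNING ID",
        params,
    ).fetchall()
    updated_ids = [row[0] for row in rows]
else:
    # Fallback for older SQLite builds: look up the IDs first, then update.
    rows = conn.execute(
        "SELECT ID FROM MyTable WHERE ColA = ? AND ColB = ?", params[1:]
    ).fetchall()
    updated_ids = [row[0] for row in rows]
    conn.execute(
        "UPDATE MyTable SET ColC = ? WHERE ColA = ? AND ColB = ?", params
    )

conn.commit()
print(updated_ids)
```

Fetching the returned rows before committing matters here, because the statement is only run to completion once its result rows have been read.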
I want to update multiple columns of one table according to multiple columns of another table in SQLAlchemy. I'm using SQLite when testing it, so I can't use the "UPDATE table1 SET col=val WHERE table1.key == table2.key" syntax.
In other words, I'm trying to create this sort of update query:
UPDATE table1
SET
col1 = (SELECT col1 FROM table2 WHERE table2.key == table1.key),
col2 = (SELECT col2 FROM table2 WHERE table2.key == table1.key)
In SQLAlchemy:
select_query1 = select([table2.c.col1]).where(table1.c.key == table2.c.key)
select_query2 = select([table2.c.col2]).where(table1.c.key == table2.c.key)
session.execute(table1.update().values(col1=select_query1, col2=select_query2))
However, I'd like to run the subquery once instead of twice, unless SQLite and MySQL are smart enough not to run it twice themselves.
I don't think you can. Thus, this is not really an answer, but it is far too long for a comment.
You can easily compose your query with 2 columns (I guess you already knew that):
select_query = select([table2.c.col1, table2.c.col2]).where(table1.c.key == table2.c.key)
and afterwards you can use the with_only_columns() method (see the API docs):
In[52]: print(table.update().values(col1 = select_query.with_only_columns([table2.c.col1]), col2 = select_query.with_only_columns([table2.c.col2])))
UPDATE table SET a=(SELECT tweet.id
FROM tweet
WHERE tweet.id IS NOT NULL), b=(SELECT tweet.user_id
FROM tweet
WHERE tweet.id IS NOT NULL)
But as you can see from the update statement, you will effectively be doing two selects. (Sorry, I did not adapt the output completely to your example, but I'm sure you get the idea.)
I'm not sure whether, as you say, MySQL will be smart enough to make it one query only. I guess so. Hope it helps anyway.
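At the raw SQL level, SQLite (3.15.0 and later) does accept a row-value assignment, which sets several columns from one correlated subquery. The sketch below shows it with the standard-library sqlite3 module and made-up rows. It is a different technique from the SQLAlchemy expression above, and whether your SQLAlchemy version can generate this form is not something this example establishes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (key INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'old', 'old'), (2, 'old', 'old');
    INSERT INTO table2 VALUES (1, 'new1', 'new2');
""")

# One correlated subquery feeds both columns at once (row-value syntax).
# As in the two-subquery version, rows of table1 without a match in
# table2 end up with NULL in both columns.
conn.execute("""
    UPDATE table1
    SET (col1, col2) = (
        SELECT table2.col1, table2.col2
        FROM table2
        WHERE table2.key = table1.key
    )
""")
conn.commit()

result = conn.execute(
    "SELECT key, col1, col2 FROM table1 ORDER BY key"
).fetchall()
print(result)
```

If unmatched rows should keep their old values, a WHERE EXISTS (...) condition on the UPDATE would be needed, at the cost of a second lookup.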
I have two tables with a common field. I want to find all the items (user_ids) which are present in the first table but not in the second.
Table1(user_id,...)
Table2(userid,...)
user_id in the first table and userid in the second table hold the same values.
session.query(Table1.user_id).outerjoin(Table2).filter(Table2.user_id == None)
This is untested as I'm still new to SQLAlchemy, but I think it should push you in the right direction:
table2 = session.query(Table2.user_id).subquery()
result = session.query(Table1).filter(Table1.user_id.notin_(table2))
My guess is that this type of approach would result in the following SQL:
SELECT table1.* FROM table1 WHERE table1.user_id NOT IN (SELECT table2.user_id FROM table2)
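Both formulations (the outer join filtered on NULL and the NOT IN subquery) can be compared directly in SQL. The sketch below uses the standard-library sqlite3 module with made-up rows and assumes both tables name the column user_id, as in the SQL above. It also shows one caveat: if the subquery can return NULL, NOT IN matches nothing, while the outer-join form is unaffected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER);
    CREATE TABLE table2 (user_id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2);
""")

NOT_IN_SQL = """
    SELECT table1.user_id FROM table1
    WHERE table1.user_id NOT IN (SELECT table2.user_id FROM table2)
    ORDER BY table1.user_id
"""
LEFT_JOIN_SQL = """
    SELECT table1.user_id FROM table1
    LEFT OUTER JOIN table2 ON table1.user_id = table2.user_id
    WHERE table2.user_id IS NULL
    ORDER BY table1.user_id
"""

def ids(sql):
    """Hypothetical helper: return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]

not_in = ids(NOT_IN_SQL)
left_join = ids(LEFT_JOIN_SQL)

# Caveat: a NULL in the subquery makes every NOT IN comparison unknown,
# so no row passes the filter. The outer-join form still works.
conn.execute("INSERT INTO table2 VALUES (NULL)")
not_in_with_null = ids(NOT_IN_SQL)
left_join_with_null = ids(LEFT_JOIN_SQL)

print(not_in, left_join, not_in_with_null, left_join_with_null)
```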