I want to update multiple columns of one table from multiple columns of another table in SQLAlchemy. I'm testing with SQLite, so I can't use the multi-table `UPDATE table1 SET col = val WHERE table1.key == table2.key` syntax.
In other words, I'm trying to create this sort of update query:
UPDATE table1
SET
col1 = (SELECT col1 FROM table2 WHERE table2.key == table1.key),
col2 = (SELECT col2 FROM table2 WHERE table2.key == table1.key)
In SQLAlchemy:
select_query1 = select([table2.c.col1]).where(table1.c.key == table2.c.key)
select_query2 = select([table2.c.col2]).where(table1.c.key == table2.c.key)
session.execute(table1.update().values(col1=select_query1, col2=select_query2))
However, I'd like to run the subquery only once instead of twice, unless SQLite and MySQL are smart enough not to execute it twice themselves.
I don't think you can. Thus, this is not really an answer, but it is far too long for a comment.
You can easily compose your query with 2 columns (I guess you already knew that):
select_query = select([table2.c.col1, table2.c.col2]).where(table1.c.key == table2.c.key)
and afterwards you can use the method with_only_columns() (see the API docs):
In[52]: print(table.update().values(col1 = select_query.with_only_columns([table2.c.col1]), col2 = select_query.with_only_columns([table2.c.col2])))
UPDATE table SET a=(SELECT tweet.id
FROM tweet
WHERE tweet.id IS NOT NULL), b=(SELECT tweet.user_id
FROM tweet
WHERE tweet.id IS NOT NULL)
But as you can see from the update statement, you will effectively be doing two selects. (Sorry I did not adapt the output completely to your example, but I'm sure you get the idea.)
I'm not sure whether, as you say, MySQL will be smart enough to make it one query only. I guess so. Hope it helps anyway.
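For completeness, here's a minimal self-contained sketch of the composed-select approach against SQLite, assuming SQLAlchemy 1.x; the key and column names come from the question, everything else is illustrative:

from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        create_engine, select)

metadata = MetaData()
table1 = Table("table1", metadata,
               Column("key", Integer, primary_key=True),
               Column("col1", String),
               Column("col2", String))
table2 = Table("table2", metadata,
               Column("key", Integer, primary_key=True),
               Column("col1", String),
               Column("col2", String))

# One shared, correlated select; narrowed per column with with_only_columns().
base = select([table2.c.col1, table2.c.col2]).where(table2.c.key == table1.c.key)
stmt = table1.update().values(
    col1=base.with_only_columns([table2.c.col1]).as_scalar(),
    col2=base.with_only_columns([table2.c.col2]).as_scalar(),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)
engine.execute(stmt)  # still renders two correlated subqueries, one UPDATE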
Related
I want to get all the columns of a table along with max(timestamp), grouped by name.
What I have tried so far is:
normal_query ="Select max(timestamp) as time from table"
event_list = normal_query \
.distinct(Table.name)\
.filter_by(**filter_by_query) \
.filter(*queries) \
.group_by(*group_by_fields) \
.order_by('').all()
The query I get:
SELECT DISTINCT ON (schema.table.name) , max(timestamp)....
This query basically returns just two columns, name and timestamp, whereas the query I want is:
SELECT DISTINCT ON (schema.table.name) * from table order by ....
which returns all the columns in that table. That is the expected behavior, and I am able to get all the columns this way. How could I write it down in Python to get to this statement? Basically, the asterisk is missing.
Can somebody help me?
What you seem to be after is the DISTINCT ON ... ORDER BY idiom in Postgresql for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating, just:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    all()
This will end up selecting rows "grouped" by name, having the greatest timestamp value.
You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.
In case you'd like to order the resulting rows based on timestamp in the final result, you can use for example Query.from_self() to turn the query to a subquery, and order in the enclosing query:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    from_self().\
    order_by(Table.timestamp.desc()).\
    all()
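As a side note, the from_self() wrapping is needed because Postgresql requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions; ordering by timestamp alone is only legal in an enclosing query.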
The situation: I'm loading an entire SQL table into my program. For convenience I'm using pandas to maintain the row data. I am then creating a dataframe of rows I would like to have removed from the SQL table. Unfortunately (and I can't change this) the table does not have any primary keys other than the built-in Oracle ROWID (which isn't a real table column; it's a pseudocolumn), but I can make ROWID part of my dataframe if I need to.
The table has hundreds of thousands of rows, and I'll probably be deleting a few thousand records with each run of the program.
Question:
Using cx_Oracle, what is the best method of deleting multiple rows/records that don't have a primary key? I don't think creating a loop to submit thousands of delete statements is very efficient or Pythonic. I'm also concerned about building a single SQL delete statement keyed off ROWID that contains an IN clause with thousands of items:
WHERE ROWID IN ('eg1', 'eg2', ..., 'eg2345')
Is this concern valid? Any suggestions?
Using ROWID
Since you can use ROWID, that is the ideal way to do it. Depending on the Oracle version, the overall query length limit may well be large enough for a query with that many elements. The real issue is the number of elements in an IN expression list, which Oracle limits to 1000 (ORA-01795).
So you'll either have to break the list of ROWIDs into chunks of at most 1000, or delete a single row at a time, with or without executemany().
>>> len(delrows) # rowids to delete
5000
>>> q = 'DELETE FROM sometable WHERE ROWID IN (' + ', '.join(f"'{row}'" for row in delrows) + ')'
>>> len(q) # length of the query
55037
>>> # let's try with just the first 1000 id's and no extra spaces
... q = 'DELETE FROM sometable WHERE ROWID IN (' + ','.join(f"'{row}'" for row in delrows[:1000]) + ')'
>>> len(q)
10038
You're probably within query-length limits, and can even save some chars with a minimal ',' item separator.
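A sketch of the chunked approach, assuming conn is an open cx_Oracle connection and delrows is a list of ROWID strings (the table name is a placeholder):

def delete_in_chunks(conn, delrows, chunk_size=1000):
    # Oracle caps IN expression lists at 1000 elements (ORA-01795),
    # so issue one DELETE per chunk of at most 1000 ROWIDs.
    cursor = conn.cursor()
    for i in range(0, len(delrows), chunk_size):
        chunk = delrows[i:i + chunk_size]
        q = ("DELETE FROM sometable WHERE ROWID IN ("
             + ",".join(f"'{r}'" for r in chunk) + ")")
        cursor.execute(q)
    conn.commit()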
Without ROWID
Without a primary key or ROWID, the only way to identify each row is to specify all of its columns in the WHERE clause; to delete many rows at a time, the per-row conditions need to be OR'd together:
DELETE FROM sometable
WHERE ( col1 = 'val1'
AND col2 = 'val2'
AND col3 = 'val3' ) -- row 1
OR ( col1 = 'other2'
AND col2 = 'value2'
AND col3 = 'val3' ) -- row 2
OR ( ... ) -- etc
As you can see it's not the nicest query to construct but allows you to do it without ROWIDs.
In both cases, you probably won't be using a fixed parameterised query, since the IN list in the first approach and the OR grouping in the second are variable-length. (You could still parameterise the generated SQL with thousands of bind variables; I'm not sure what the limit is on that.) The executemany() way is definitely easier to write, but for speed the single large queries (either of the above two) will probably outperform executemany() with thousands of items.
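If you do want the second form parameterised anyway, a rough sketch of generating it from a pandas DataFrame df (the frame and its col1..col3 columns are assumptions, not from the question) could be:

groups = []
params = {}
for i, row in enumerate(df.itertuples(index=False)):
    # One parenthesised AND-group of named bind variables per row to delete.
    groups.append(f"(col1 = :c1_{i} AND col2 = :c2_{i} AND col3 = :c3_{i})")
    params[f"c1_{i}"] = row.col1
    params[f"c2_{i}"] = row.col2
    params[f"c3_{i}"] = row.col3
q = "DELETE FROM sometable WHERE " + " OR ".join(groups)
cursor.execute(q, params)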
You can use cursor.executemany() to delete multiple rows at once. Something like the following should work:
dataToDelete = [['eg1'], ['eg2'], ...., ['eg2345']]
cursor.executemany("delete from sometable where rowid = :1", dataToDelete)
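If the ROWIDs live in a pandas DataFrame as described in the question, building that parameter list is a one-liner (assuming a df["ROWID"] column holding the pseudocolumn values):

dataToDelete = [[r] for r in df["ROWID"]]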
There is a set of tables, all of which have a column with a fixed name, say 'ColA'. I'm trying to write a function which will attach a specific WHERE clause for that column to a SQLAlchemy statement (Insert, Select, Update).
Desired result:
I'd like the result to work like this (where the function I need is somehow_get_table):
def attach_where(statement, val):
return statement.where(somehow_get_table(statement).c.ColA == val)
statement = select([table1.c.col1, table1.c.col2])
print statement
>>> SELECT col1, col2 FROM table1
print attach_where(statement, 1)
>>> SELECT col1, col2 FROM table1 WHERE ColA = 1
Possible solutions I've thought of:
Get the list of Column objects from the statement and select one (or all) of the tables to which the column belongs and use that for the where.
Use only the column name (as in text). I'm not sure how to do this, and it might break for some cases (like joins).
I think number 1 is better, but I could be wrong, and I feel like there should be a better way to do this. What is a good pattern for this?
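For what it's worth, a sketch of option 1 under SQLAlchemy 1.x could look like the following; Select exposes its FROM list as .froms, while Insert/Update/Delete carry a single .table (picking the first matching FROM clause, and the error message, are my own choices):

from sqlalchemy.sql import Select

def somehow_get_table(statement):
    if isinstance(statement, Select):
        # Pick the first FROM clause that actually has the column.
        for from_clause in statement.froms:
            if "ColA" in from_clause.c:
                return from_clause
        raise ValueError("statement has no FROM clause with a ColA column")
    return statement.table  # Insert / Update / Delete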
I have two tables with a common field. I want to find all the items (user_ids) which are present in the first table but not in the second.
Table1(user_id,...)
Table2(userid,...)
user_id in the first table and userid in the second table are the same field.
session.query(Table1.user_id).outerjoin(Table2, Table1.user_id == Table2.userid).filter(Table2.userid == None)
This is untested as I'm still new to SQLAlchemy, but I think it should push you in the right direction:
table2 = session.query(Table2.userid).subquery()  # note: the column is userid in Table2
result = session.query(Table1).filter(Table1.user_id.notin_(table2))
My guess is this type of approach would result in the following SQL:
SELECT table1.* FROM table1 WHERE table1.user_id NOT IN (SELECT table2.user_id FROM table2)
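One caveat worth noting with the NOT IN form: if Table2.userid can be NULL, NOT IN against a set containing NULL matches no rows at all, so you may want to filter the NULLs out of the subquery first:

table2 = session.query(Table2.userid).filter(Table2.userid != None).subquery()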
I have a table with 4 columns (1 PK) from which I need to select 30 rows.
Of these rows, two columns (col. A and B) must exist in another table (8 columns, 1 PK, 2 of which are A and B).
The second table is large, contains millions of records, and it's enough for me to know if even a single row exists there containing the values of columns A and B of the 1st table.
I am using the code below:
query = db.Session.query(db.Table_1).\
    filter(
        exists().where(db.Table_2.col_a == db.Table_1.col_a)
                .where(db.Table_2.col_b == db.Table_1.col_b)
    ).limit(30).all()
This query gets me the results I desire, but I'm afraid it might be a bit slow since it does not apply a LIMIT to the exists() subquery, nor does it do a SELECT 1 but a SELECT *.
exists() does not accept a .limit(1)
How can I put a limit on exists() so that it doesn't scan the whole table, hence making this query run faster?
I need n rows from Table_1, whose two columns exist in a record in Table_2.
Thank you
You can do the "select 1" thing using a more explicit form, as mentioned here, that is:
exists([1]).where(...)
However, while I've been a longtime diehard "select 1" kind of guy, I've since learned that the usage of "1" vs. "*" for performance is now a myth.
exists() is also a wrapper around select(), so you can get a limit() by constructing the select() first:
s = select([1]).where(
    table1.c.col_a == table2.c.col_a
).where(
    table1.c.col_b == table2.c.col_b
).limit(1)
s = exists(s)
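Applied to the names from the question, using it might look like this (a sketch reusing db.Session and the 30-row outer limit from your code):

s = exists(
    select([1]).where(db.Table_2.col_a == db.Table_1.col_a)
               .where(db.Table_2.col_b == db.Table_1.col_b)
               .limit(1)
)
query = db.Session.query(db.Table_1).filter(s).limit(30).all()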
# requires: from sqlalchemy import select, and_
query = select([db.Table_1])
query = query.where(
    and_(
        db.Table_2.col_a == db.Table_1.col_a,
        db.Table_2.col_b == db.Table_1.col_b
    )
).limit(30)
result = session.execute(query)
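Note that unlike the exists() version above, this form joins Table_1 against Table_2 directly (an implicit cross join constrained by the WHERE clause), so it can return the same Table_1 row several times if multiple Table_2 rows match; adding .distinct() before the limit would be one way to guard against that.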