flask-migrate: running custom SQL in migrations?

I need to run some custom SQL on my table before applying a migration. In particular, I need to populate a column before declaring it non-nullable:
Clickid.query\
.filter(Clickid.first_seen_at == None)\
.update({'first_seen_at': datetime.utcnow()})
I've read about the op.execute method in the Alembic docs, where we feed raw SQL to the execute method:
op.execute(
account.update().\
where(account.c.name==op.inline_literal('account 1')).\
values({'name':op.inline_literal('account 2')})
)
However, with Flask-SQLAlchemy my query does not return a string of SQL; it runs the SQL itself and returns the number of rows modified:
print(Clickid.query
.filter(Clickid.first_seen_at == None)
.update({'first_seen_at': datetime.utcnow()})
)
>>> 150
How do I make this work properly?
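For what it's worth, one common pattern is to do the backfill inside the migration's upgrade() itself, using a lightweight table construct with op.execute() instead of the Flask-SQLAlchemy model. A minimal sketch, assuming the underlying table is named clickid (the table and column names are guesses based on the model above):

from datetime import datetime
import sqlalchemy as sa
from alembic import op

def upgrade():
    # lightweight table definition with only the columns the UPDATE needs
    clickid = sa.table('clickid', sa.column('first_seen_at', sa.DateTime))

    op.add_column('clickid', sa.Column('first_seen_at', sa.DateTime(), nullable=True))
    # backfill existing rows before tightening the constraint
    op.execute(
        clickid.update()
        .where(clickid.c.first_seen_at.is_(None))
        .values(first_seen_at=datetime.utcnow())
    )
    op.alter_column('clickid', 'first_seen_at', nullable=False)

This keeps the data fix inside the same migration that changes the schema, so it runs in order with flask db upgrade.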


How does a SQLAlchemy Session manage transactions, when executing multiple raw SQL statements at once?

None of the "similar questions" really get at this specific topic, but I am trying to find out how SQLAlchemy's Session handles transactions, when:
Raw SQL text is passed to the execute() method, rather than using any SQLAlchemy model objects, AND
The raw SQL text contains multiple distinct commands.
For instance:
bulk_operation = """
DELETE FROM the_table WHERE id = ...;
INSERT INTO the_table (id, name) VALUES (...);
"""
sql = text(bulk_operation)
session.execute(sql.bindparams(id=foo, name=bar))
The goal here is to restore the original state, if either the DELETE or the INSERT fails for any reason.
But does Session.execute() actually guarantee this, in this context? Is it necessary to include BEGIN and COMMIT commands within the raw SQL text itself, or should this be managed at the Python level with session.commit() or something else? Thanks in advance!
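As a point of comparison, here is a minimal sketch (not taken from the question) of managing the rollback explicitly at the Python level rather than embedding BEGIN/COMMIT in the raw SQL string; the statements and parameter names mirror the example above:

from sqlalchemy import text

delete_stmt = text("DELETE FROM the_table WHERE id = :id")
insert_stmt = text("INSERT INTO the_table (id, name) VALUES (:id, :name)")

try:
    session.execute(delete_stmt, {"id": foo})
    session.execute(insert_stmt, {"id": foo, "name": bar})
    session.commit()      # both statements become permanent together
except Exception:
    session.rollback()    # a failure in either statement undoes both
    raise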

Prevent SQL Injection in BigQuery with Python for table name

I have an Airflow DAG which takes an argument from the user for table.
I then use this value in an SQL statement and execute it in BigQuery. I'm worried about exposing myself to SQL Injection.
Here is the code:
sql = f"""
CREATE OR REPLACE TABLE {PROJECT}.{dataset}.{table} PARTITION BY DATE(start_time) as (
//OTHER CODE
)
"""
client = bigquery.Client()
query_job = client.query(sql)
Both dataset and table get passed through via Airflow, but I'm worried someone could pass through something like random_table; truncate other_tbl; -- as the table argument.
My fear is that the above will create a table called random_table and then truncate an existing table.
Is there a safer way to process these passed through arguments?
I've looked into parameterized queries in BigQuery but these don't work for table names.
You will have to create a table name validator. I think you can validate fairly safely by wrapping your table name string in backticks (`) at the start and at the end. It's not a 100% solution, but it worked for the test scenarios I tried. It should work like this:
# validate should look for ` at the beginning and end of your tablename
table_name = validate(f"`{project}.{dataset}.{table}`")
sql = f"""
CREATE OR REPLACE TABLE {table_name} PARTITION BY DATE(start_time) as (
//OTHER CODE
)
"""
...
Note: I suggest checking the post on Medium about BigQuery SQL injection.
I checked the official documentation about running parameterized queries, and sadly it only covers the parameterization of values, not table names or other string parts of your query.
As a final note, I recommend opening a feature request for BigQuery for this particular scenario.
You should probably look into sanitization/validation of user input in general. This is done before passing the input to the BQ query.
With Python, you could look for malicious strings in the user input - like truncate in your example - or use a regex to reject input that, for instance, contains --. Those are just some quick examples; I recommend doing more research on the topic, and you will also find quite a few related questions on SE.
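As an illustration of that kind of validation, here is a hypothetical allow-list check (not a BigQuery API) that rejects anything but plain identifier characters before the value ever reaches the query string; the character class is deliberately conservative and may reject some otherwise legal names:

import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def validate_identifier(name: str) -> str:
    # rejects input such as "random_table; truncate other_tbl; --"
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name

table_ref = f"`{PROJECT}.{validate_identifier(dataset)}.{validate_identifier(table)}`"
sql = f"CREATE OR REPLACE TABLE {table_ref} PARTITION BY DATE(start_time) AS ( ... )"

Combined with the backtick quoting from the first answer, this limits the blast radius of anything passed through from the DAG.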

How to execute DDL statement in a different schema?

I'd like to execute a DDL statement (for example: create table test(id int, str varchar)) in different DB schemas.
In order to execute this DDL I was going to use the following code:
from sqlalchemy import DDL, text, create_engine
engine = create_engine(...)
ddl_cmd = "create table test(id int, str varchar)"
DDL(ddl_cmd).execute(bind=engine)
How can I specify in which DB schema to execute this DDL statement, not changing the DDL command itself?
I don't understand why such a basic parameter as schema is missing from the DDL().execute() method. I guess I'm missing some important concept, but I couldn't figure it out.
UPD: I've found the "schema_translate_map" execution option, but it didn't work for me - the table is still created in the default schema.
Here are my attempts:
conn = engine.connect().execution_options(schema_translate_map={None: "my_schema"})
Then I tried different variants:
# variant 1
conn.execute(ddl_cmd)
# variant 2
conn.execution_options(schema_translate_map={None: "my_schema"}).execute()
# variant 3
DDL(ddl_cmd).compile(bind=conn).execute()
# variant 4
DDL(ddl_cmd).compile(bind=conn).execution_options(schema_translate_map={None: "my_schema"})
but every time the table is created in the default schema. :(
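For context, a sketch of the workaround I'd expect to work, under the assumption that schema_translate_map only rewrites schema references on Table/MetaData constructs that SQLAlchemy compiles itself, and therefore passes a raw DDL string through verbatim:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@host/db")  # placeholder URL
ddl_cmd = "create table my_schema.test(id int, str varchar)"

with engine.begin() as conn:
    # the schema has to be spelled out in the statement itself
    # (or, on PostgreSQL, selected via search_path)
    conn.execute(text(ddl_cmd))

In other words, the schema ends up inside the DDL text one way or another; the execution option alone doesn't relocate a hand-written CREATE TABLE.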

Update MSSQL table through SQLAlchemy using dataframes

I'm trying to replace some old MSSQL stored procedures with python, in an attempt to take some of the heavy calculations off of the sql server. The part of the procedure I'm having issues replacing is as follows
UPDATE mytable
SET calc_value = tmp.calc_value
FROM dbo.mytable mytable INNER JOIN
#my_temp_table tmp ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
WHERE (mytable.a = some_value)
and (mytable.x = tmp.x)
and (mytable.b = some_other_value)
Up to this point, I've made some queries with SQLAlchemy, stored those data in Dataframes, and done the requisite calculations on them. I don't know now how to put the data back into the server using SQLAlchemy, either with raw SQL or function calls. The dataframe I have on my end would essentially have to work in the place of the temporary table created in MSSQL Server, but I'm not sure how I can do that.
The difficulty is of course that I don't know of a way to join between a dataframe and an MSSQL table, and I'm guessing this wouldn't work, so I'm looking for a workaround.
As the pandas docs suggest here:
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)
dataframe.to_sql('tablename', engine, if_exists='replace')
The engine parameter for MSSQL is basically the connection string; check it here.
The if_exists parameter is a bit tricky, since 'replace' actually drops the table first, then recreates it and inserts all the data at once.
Setting the echo attribute to True shows all background logging and the generated SQL.
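Building on that, one way to replicate the stored procedure is to push the calculated values into a staging table with to_sql and let the server do the join. A rough sketch; "calc_staging" and the bind parameter names are placeholders mirroring the procedure, not names from the question's schema:

from sqlalchemy import create_engine, text

engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)

# the staging table takes the place of the #my_temp_table temp table
dataframe.to_sql("calc_staging", engine, if_exists="replace", index=False)

update_sql = text("""
    UPDATE mytable
    SET calc_value = tmp.calc_value
    FROM dbo.mytable mytable
    INNER JOIN calc_staging tmp
        ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
    WHERE mytable.a = :a_value
      AND mytable.x = tmp.x
      AND mytable.b = :b_value
""")

with engine.begin() as conn:
    conn.execute(update_sql, {"a_value": some_value, "b_value": some_other_value})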

alembic: use subquery for update statement in migration

I'm using alembic to manage my database migrations. In my current migration I need also to populate a column based on a SELECT statement (basically copying a column from a different table).
With plain SQL I can do:
UPDATE foo_table
SET bar_id=
(SELECT bar_table.id FROM bar_table
WHERE bar_table.foo_id = foo_table.id);
However, I can't figure out how to do that with alembic:
execute(
foo_table.update().\
values({
u'bar_id': ???
})
)
I tried to use plain SQLAlchemy expressions for the '???':
select([bar_table.columns['id']],
bar_table.columns[u'foo_id'] == foo_table.columns[u'id'])
But that only generates bad SQL and a ProgrammingError during execution:
'UPDATE foo_table SET ' {}
Actually it works exactly as I described above.
My problem was that the table definition for 'foo_table' in my alembic script did not include the 'bar_id' column, so SQLAlchemy did not include it when generating the SQL...
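For completeness, a minimal sketch of the working version in SQLAlchemy 1.4+ style, with lightweight table definitions that include every column the statement touches (the missing bar_id column was the original problem):

import sqlalchemy as sa
from alembic import op

foo_table = sa.table('foo_table', sa.column('id', sa.Integer), sa.column('bar_id', sa.Integer))
bar_table = sa.table('bar_table', sa.column('id', sa.Integer), sa.column('foo_id', sa.Integer))

op.execute(
    foo_table.update().values(
        bar_id=sa.select(bar_table.c.id)
        .where(bar_table.c.foo_id == foo_table.c.id)
        .scalar_subquery()   # correlated against the table being updated
    )
)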
