How to execute a DDL statement in a different schema? - python

I'd like to execute a DDL statement (for example: create table test(id int, str varchar)) in different DB schemas.
In order to execute this DDL I was going to use the following code:
from sqlalchemy import DDL, text, create_engine
engine = create_engine(...)
ddl_cmd = "create table test(id int, str varchar)"
DDL(ddl_cmd).execute(bind=engine)
How can I specify in which DB schema to execute this DDL statement, not changing the DDL command itself?
I don't understand why such a basic parameter like schema is missing in the DDL().execute() method. I guess I'm missing some important concept, but I couldn't figure it out.
UPD: I've found the "schema_translate_map" execution option, but it didn't work for me - the table is still created in the default schema.
Here are my attempts:
conn = engine.connect().execution_options(schema_translate_map={None: "my_schema"})
Then I tried different variants:
# variant 1
conn.execute(ddl_cmd)
# variant 2
conn.execution_options(schema_translate_map={None: "my_schema"}).execute(ddl_cmd)
# variant 3
DDL(ddl_cmd).compile(bind=conn).execute()
# variant 4
DDL(ddl_cmd).compile(bind=conn).execution_options(schema_translate_map={None: "my_schema"})
but every time the table is created in the default schema. :(
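For what it's worth, schema_translate_map is applied at compile time to constructs that carry a schema attribute, such as Table; a raw SQL string passed to execute() is sent to the database untouched, which is consistent with the attempts above. A sketch of the construct-based route (the connection URL is hypothetical, table and column names taken from the question):
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine("postgresql://user:pass@localhost/db")  # hypothetical URL

metadata = MetaData()
test = Table(
    "test",
    metadata,
    Column("id", Integer),
    Column("str", String),
)  # schema is None here, so {None: "my_schema"} can redirect it

with engine.connect().execution_options(
    schema_translate_map={None: "my_schema"}
) as conn:
    metadata.create_all(bind=conn)
    # on SQLAlchemy 1.4+ "commit as you go" connections, add conn.commit()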

Related

Prevent SQL Injection in BigQuery with Python for table name

I have an Airflow DAG which takes an argument from the user for the table name.
I then use this value in an SQL statement and execute it in BigQuery. I'm worried about exposing myself to SQL injection.
Here is the code:
sql = f"""
CREATE OR REPLACE TABLE {PROJECT}.{dataset}.{table} PARTITION BY DATE(start_time) as (
//OTHER CODE
)
"""
client = bigquery.Client()
query_job = client.query(sql)
Both dataset and table get passed through via Airflow, but I'm worried someone could pass something like random_table; truncate other_tbl; -- as the table argument.
My fear is that the above will create a table called random_table and then truncate an existing table.
Is there a safer way to process these passed-through arguments?
I've looked into parameterized queries in BigQuery but these don't work for table names.
You will have to create a table name validator. I think you can validate safely just by adding backticks (`) at the start and at the end of your table name string. It's not a 100% solution, but it worked in the test scenarios I tried. It should work like this:
# validate should look for ` at the beginning and end of your table name
table_name = validate(f"`{PROJECT}.{dataset}.{table}`")
sql = f"""
CREATE OR REPLACE TABLE {table_name} PARTITION BY DATE(start_time) as (
//OTHER CODE
)
"""
...
Note: I also suggest reading a post on Medium about BigQuery SQL injection.
I checked the official documentation about Running parameterized queries, and sadly it only covers the parameterization of values, not table names or other string parts of your query.
As a final note, I recommend opening a feature request for BigQuery for this particular scenario.
You should probably look into sanitization/validation of user input in general. This is done before passing the input to the BigQuery query.
With Python, you could look for malicious strings in the user input - like truncate in your example - or use a regex to filter input that, for instance, contains --. Those are just some quick examples; I recommend you do more research on the topic, and you will also find quite a few questions about it on Stack Exchange.
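For instance, a minimal allowlist sketch along those lines (validate_identifier is a hypothetical helper, and the pattern is stricter than what BigQuery actually permits):
import re

# Hypothetical helper: allow only letters, digits, and underscores, which
# rejects payloads such as "random_table; truncate other_tbl; --".
IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def validate_identifier(name: str) -> str:
    if not IDENT_RE.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name

table_name = f"`{PROJECT}.{validate_identifier(dataset)}.{validate_identifier(table)}`"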

SQLAlchemy: Dynamically pass schema and table name avoiding SQL Injection

How can I execute an SQL query where the schema and table name are passed in a function? Something like below?
from sqlalchemy import text

def get(engine, schema: str, table: str):
    query = text("select * from :schema.:table")
    result = engine.connect().execute(query, schema=schema, table=table)
    return result
Two things going on here:
Avoiding SQL injection
Dynamically setting a schema with (presumably) PostgreSQL
The first question has a very broad scope; you might want to look at older questions about SQLAlchemy and SQL injection, like this one: SQLAlchemy + SQL Injection
Your second question can be addressed in a number of ways, though I would recommend the following approach from SQLAlchemy's documentation: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#remote-schema-table-introspection-and-postgresql-search-path
PostgreSQL supports a search_path setting which determines the schema used for unqualified names in subsequent operations on the connection.
So your query code might look like:
qry_str = f"SET search_path TO {schema}"
Alternatively, if you use an SQLAlchemy declarative approach, you can use a MetaData object like in this question/answer SQLAlchemy support of Postgres Schemas
You could create a collection of existing table and schema names in your database and check inputs against those values before creating your query:
-- assumes we are connected to the correct *database*
SELECT table_schema, table_name
FROM information_schema.tables;
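Combining the two ideas, here is a sketch (SQLAlchemy 1.4-style parameters; the connection URL is hypothetical) that checks the user's input against information_schema before ever interpolating it:
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/db")  # hypothetical URL

def get(engine, schema: str, table: str):
    with engine.connect() as conn:
        # Allowlist check: bind parameters are safe here because they are
        # compared as values, never interpolated into the statement.
        exists = conn.execute(
            text(
                "SELECT 1 FROM information_schema.tables "
                "WHERE table_schema = :schema AND table_name = :table"
            ),
            {"schema": schema, "table": table},
        ).scalar()
        if not exists:
            raise ValueError(f"unknown table {schema}.{table}")
        # Only now is the (verified) input interpolated.
        conn.execute(text(f"SET search_path TO {schema}"))
        return conn.execute(text(f"SELECT * FROM {table}")).fetchall()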

SQLalchemy not committing changes when setting role

I'm creating tables using a sqlalchemy engine, but even though my create statements execute without error, the tables don't show up in the database when I try to set the role beforehand.
url = 'postgresql://{}:{}@{}:{}/{}'
url = url.format(user, password, host, port, db)
engine = sqlalchemy.create_engine(url)
# works fine
engine.execute("CREATE TABLE testpublic (id int, val text); \n\nINSERT INTO testpublic VALUES (1,'foo'), (2,'bar'), (3,'baz');")
r = engine.execute("select * from testpublic")
r.fetchall() # returns expected tuples
engine.execute("DROP TABLE testpublic;")
# appears to succeed/does NOT throw any error
engine.execute("SET ROLE read_write; CREATE table testpublic (id int, val text);")
# throws error "relation testpublic does not exist"
engine.execute("select * FROM testpublic")
For context, I am on Python 3.6, SQLAlchemy 1.2.17, and Postgres 11.1, and the role "read_write" absolutely exists and has all necessary permissions to create a table in public (I have no problem running the exact sequence above in pgAdmin).
Does anyone know why this is the case and how to fix it?
The issue here is how SQLAlchemy decides whether to issue a commit after each statement.
If a string is passed to engine.execute, SQLAlchemy attempts to determine whether the text is DML or DDL using the following regex, which you can find in the sources:
AUTOCOMMIT_REGEXP = re.compile(
    r"\s*(?:UPDATE|INSERT|CREATE|DELETE|DROP|ALTER)", re.I | re.UNICODE
)
This only matches the keywords at the start of the text, ignoring any leading whitespace. So, while your first attempt works fine, in the second example SQLAlchemy fails to recognize that a commit needs to be issued after the statement executes, because the first word is SET.
Instead, SQLAlchemy issues a rollback, which is why the statement appears to succeed and does not throw any error.
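You can check this against the regex directly:
import re

AUTOCOMMIT_REGEXP = re.compile(
    r"\s*(?:UPDATE|INSERT|CREATE|DELETE|DROP|ALTER)", re.I | re.UNICODE
)

# The CREATE statement triggers an autocommit; the SET-prefixed one does not.
print(bool(AUTOCOMMIT_REGEXP.match("CREATE TABLE testpublic (id int)")))       # True
print(bool(AUTOCOMMIT_REGEXP.match("SET ROLE read_write; CREATE TABLE t;")))   # False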
The simplest solution is to commit manually. Example:
engine.execute("SET ROLE read_write; CREATE table testpublic (id int, val text); COMMIT;")
Or, wrap the SQL in text() and set autocommit=True, as shown in the documentation:
stmt = text('set role read_write; create table testpublic (id int, val text);').execution_options(autocommit=True)
engine.execute(stmt)
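Another option (a sketch with the same effect) is an explicit transaction via engine.begin(), which commits on exit regardless of what the regex matches:
# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as conn:
    conn.execute("SET ROLE read_write;")
    conn.execute("CREATE TABLE testpublic (id int, val text);")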

SQLAlchemy table reflection with Sybase

When I try to reflect all tables in my Sybase DB
metadata = MetaData()
metadata.reflect(bind=engine)
SQLAlchemy runs the following query:
SELECT o.name AS name
FROM sysobjects o JOIN sysusers u ON o.uid = u.uid
WHERE u.name = @schema_name
AND o.type = 'U'
I then try to print the contents of metadata.tables, and this yields nothing.
I've tried creating an individual Table object and using the autoload=True option, but this yields a TableDoesNotExist error.
accounts = Table('Accounts', metadata, autoload=True, autoload_with=engine)
I looked into this query and it seems @schema_name is being set to my username, and none of the tables coming back from sysobjects have their "name" attribute set to my username. They are all set to "dbo", which means the database owner, and thus the query returns nothing and nothing is ever reflected. Is there any way to force SQLAlchemy to use something different as the schema name?
I've found two questions regarding table reflection using the Sybase dialect. Both were asked 6 years ago and seem to indicate that table reflection with Sybase was unsupported. However, SQLAlchemy clearly tries to run a genuine Sybase reflection query as above, so I don't think that is the case anymore.
I've solved this by setting the schema parameter on the MetaData object; I had to set it to "dbo". You can also specify the schema in the reflect() call.
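In code, roughly:
from sqlalchemy import MetaData, Table

# Option 1: set the default schema on the MetaData object.
metadata = MetaData(schema="dbo")
metadata.reflect(bind=engine)

# Option 2: pass the schema to reflect() or to an individual Table.
metadata = MetaData()
metadata.reflect(bind=engine, schema="dbo")
accounts = Table("Accounts", metadata, autoload=True, autoload_with=engine, schema="dbo")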

Issue with the web.py tutorial when using sqlite3

For the record, I have looked into this, but cannot seem to figure out what is wrong.
So I'm doing the tutorial on web.py, and I get to the database part (can do everything above it). I wanted to use sqlite3 for various reasons. Since I couldn't figure out where to type the
sqlite3 test.db
line, I looked into the sqlite3 module and created a database with that. The code for that is:
import sqlite3

conn = sqlite3.connect("test.db")
print("Opened database successfully")
conn.execute('''CREATE TABLE todo
    (id serial primary key,
     title text,
     created timestamp default now(),
     done boolean default 'f');''')
conn.execute("INSERT INTO todo (title) VALUES ('Learn web.py')")
but I get the error
done boolean default 'f');''')
sqlite3.OperationalError: near "(": syntax error
I've tried looking into this, but cannot figure out for the life of me what the issue is.
I haven't had luck with other databases (I'm new to this, so not sure of the subtleties). I wasn't able to just make the SQLite database directly, so it might be a Python thing, but it matches the tester.py I made while following the SQLite-with-Python tutorial...
Thanks if anyone can help me!
The problem causing the error is that you can't use MySQL's now() function here. Try instead:
created default current_timestamp
This works:
conn.execute('''CREATE TABLE todo
    (id serial primary key,
     title text,
     created default current_timestamp,
     done boolean default 'f');''')
You are using SQLite but specifying data types from some other database engine. SQLite accepts only INTEGER, TEXT, REAL, NUMERIC, and NONE. BOOLEAN is most likely being mapped to one of the number types, and therefore DEFAULT 'f' is not valid syntax (although I don't think it would be valid in any version of SQL that does support BOOLEAN as a data type, since they generally use INTEGER for the underlying storage).
Rewrite the CREATE TABLE statement with SQLite datatypes and allowable default values and your code should work fine.
More details on the (somewhat unusual) SQLite type system: http://www.sqlite.org/datatype3.html
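For example, a rewrite along those lines (mapping the boolean to 0/1 is an assumption, since SQLite has no true BOOLEAN type):
import sqlite3

conn = sqlite3.connect("test.db")
conn.execute('''CREATE TABLE todo
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
     title TEXT,
     created TEXT DEFAULT CURRENT_TIMESTAMP,
     done INTEGER DEFAULT 0);''')
conn.execute("INSERT INTO todo (title) VALUES ('Learn web.py')")
conn.commit()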
