I have the following code, using pscyopg2:
sql = 'select %s from %s where utctime > %s and utctime < %s order by utctime asc;'
data = (dataItems, voyage, dateRangeLower, dateRangeUpper)
rows = cur.mogrify(sql, data)
This outputs:
select 'waterTemp, airTemp, utctime' from 'ss2012_t02' where utctime > '2012-05-03T17:01:35+00:00'::timestamptz and utctime < '2012-05-01T17:01:35+00:00'::timestamptz order by utctime asc;
When I execute this, it falls over - this is understandable, as the quotes around the table name are illegal.
Is there a way to legally pass the table name as a parameter, or do I need to do a (explicitly warned against) string concatenation, ie:
voyage = 'ss2012_t02'
sql = 'select %s from ' + voyage + ' where utctime > %s and utctime < %s order by utctime asc;'
Cheers for any insights.
According to the official documentation:
If you need to generate dynamically an SQL query (for instance
choosing dynamically a table name) you can use the facilities
provided by the psycopg2.sql module.
The sql module is new in psycopg2 version 2.7. It has the following syntax:
from psycopg2 import sql
cur.execute(
sql.SQL("insert into {table} values (%s, %s)")
.format(table=sql.Identifier('my_table')),
[10, 20])
More on: https://www.psycopg.org/docs/sql.html#module-usage
[Update 2017-03-24: AsIs should NOT be used to represent table or fields names, the new sql module should be used instead: https://stackoverflow.com/a/42980069/5285608 ]
Also, according to psycopg2 documentation:
Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Per this answer you can do it as so:
import psycopg2
from psycopg2.extensions import AsIs
#Create your connection and cursor...
cursor.execute("SELECT * FROM %(table)s", {"table": AsIs("my_awesome_table")})
The table name cannot be passed as a parameter, but everything else can. Thus, the table name should be hard coded in your app (Don't take inputs or use anything outside of the program as a name). The code you have should work for this.
On the slight chance that you have a legitimate reason to take an outside table name, make sure that you don't allow the user to directly input it. Perhaps an index could be passed to select a table, or the table name could be looked up in some other way. You are right to be wary of doing this, however. This works, because there are relatively few table names around. Find a way to validate the table name, and you should be fine.
It would be possible to do something like this, to see if the table name exists. This is a parameterised version. Just make sure that you do this and verify the output prior to running the SQL code. Part of the idea for this comes from this answer.
SELECT 1 FROM information_schema.tables WHERE table_schema = 'public' and table_name=%s LIMIT 1
This is a workaround I have used in the past
query = "INSERT INTO %s (col_1, col_2) VALUES (%%s, %%s)" % table_name
cur.execute(query, (col_1_var, col_2_var))
Hope it help :)
This is a small addition to #Antoine Dusséaux's answer. If you want to pass two (unquoted) parameters in a SQL query, you can do it as follows: -
query = sql.SQL("select {field} from {table} where {pkey} = %s").format(
field=sql.Identifier('my_name'),
table=sql.Identifier('some_table'),
pkey=sql.Identifier('id'))
As per the documentation,
Usually you should express the template of your query as an SQL
instance with {}-style placeholders and use format() to merge the
variable parts into them, all of which must be Composable subclasses.
You can still have %s-style placeholders in your query and pass values
to execute(): such value placeholders will be untouched by format()
Source: https://www.psycopg.org/docs/sql.html#module-usage
Also, please keep this in mind while writing queries.
I have created a little utility for preprocessing of SQL statements with variable table (...) names:
from string import letters
NAMECHARS = frozenset(set(letters).union('.'))
def replace_names(sql, **kwargs):
"""
Preprocess an SQL statement: securely replace table ... names
before handing the result over to the database adapter,
which will take care of the values.
There will be no quoting of names, because this would make them
case sensitive; instead it is ensured that no dangerous chars
are contained.
>>> replace_names('SELECT * FROM %(table)s WHERE val=%(val)s;',
... table='fozzie')
'SELECT * FROM fozzie WHERE val=%(val)s;'
"""
for v in kwargs.values():
check_name(v)
dic = SmartDict(kwargs)
return sql % dic
def check_name(tablename):
"""
Check the given name for being syntactically valid,
and usable without quoting
"""
if not isinstance(tablename, basestring):
raise TypeError('%r is not a string' % (tablename,))
invalid = set(tablename).difference(NAMECHARS)
if invalid:
raise ValueError('Invalid chars: %s' % (tuple(invalid),))
for s in tablename.split('.'):
if not s:
raise ValueError('Empty segment in %r' % tablename)
class SmartDict(dict):
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
check_name(key)
return key.join(('%(', ')s'))
The SmartDict object returns %(key)s for every unknown key, preserving them for the value handling. The function could check for the absence of any quote characters, since all quoting now should be taken care of ...
If you want to pass the table name as a parameter, you can use this wrapper:
class Literal(str):
def __conform__(self, quote):
return self
#classmethod
def mro(cls):
return (object, )
def getquoted(self):
return str(self)
Usage: cursor.execute("CREATE TABLE %s ...", (Literal(name), ))
You can just use the module format for the table name and then use the regular paramaterization for the execute:
xlist = (column, table)
sql = 'select {0} from {1} where utctime > %s and utctime < %s order by utctime asc;'.format(xlist)
Keep in mind if this is exposed to the end user, you will not be protected from SQL injection unless you write for it.
Surprised no one has mentioned doing this:
sql = 'select {} from {} where utctime > {} and utctime < {} order by utctime asc;'.format(dataItems, voyage, dateRangeLower, dateRangeUpper)
rows = cur.mogrify(sql)
format puts in the string without quotations.
Related
I want to dynamically choose what table to use in a SQL query, but I just keep getting error however I am trying to format this. Also tried %s instead of ?.
Any suggestions?
group_food = (group, food)
group_food_new = (group, food, 1)
with con:
cur = con.cursor()
tmp = cur.execute("SELECT COUNT(Name) FROM (?) WHERE Name=?", group_food)
if tmp == 0:
cur.execute("INSERT INTO ? VALUES(?, ?)", group_food_new)
else:
times_before = cur.execute("SELECT Times FROM ? WHERE Name=?", group_food)
group_food_update = (group, (times_before +1), food)
cur.execute("UPDATE ? SET Times=? WHERE Name=?", group_food_update)
You cannot use SQL parameters to be placeholders in SQL objects; one of the reasons for using a SQL parameters is to escape the value such that the database can never mistake the contents for a database object.
You'll have to interpolate the database objects separately; escape your identifiers by doubling any " double quote parameters and use
cur.execute('SELECT COUNT(Name) FROM "{}" WHERE Name=?'.format(group.replace('"', '""')), (food,))
and
cur.execute('INSERT INTO "{}" VALUES(?, ?)'.format(group.replace('"', '""')), (food, 1))
and
cur.execute('UPDATE "{}" SET Times=? WHERE Name=?'.format(group.replace('"', '""')),
(times_before + 1, food))
The ".." double quotes are there to properly demark an identifier, even if that identifier is also a valid keyword; any existing " characters in the name must be doubled; this also helps de-fuse SQL injection attempts.
However, if your object names are user-sourced, you'll have to do your own (stringent) validation on the object names to prevent SQL injection attacks here. Always validate them against existing objects in that case.
You should really consider using a project like SQLAlchemy to generate your SQL instead; it can take care of validating object names and rigorously protect you from SQL injection risks. It can load your table definitions up front so it'll know what names are legal:
from sqlalchemy import create_engine, func, select, MetaData
engine = create_engine('sqlite:////path/to/database')
meta = MetaData()
meta.reflect(bind=engine)
conn = engine.connect()
group_table = meta.tables[group] # can only find existing tables
count_statement = select([func.count(group_table.c.Name)], group_table.c.Name == food)
count, = conn.execute(count_statement).fetchone()
if count:
# etc.
What are the values of group and food? The guidelines say to make the question as such others can benefit from it, for that to be the case here we need the values of group and food.
It seems you use the Python String formatter instead of SQL parameters for table names even though https://docs.python.org/3/library/sqlite3.html#module-sqlite3 says using the String formatter in unsafe.
# Never do this -- insecure!
symbol = 'RHAT'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
# Do this instead
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
Using %s as a placeholder then putting % outside the string with the variables in Python2 has been Replaced in Python3 with .format() function with the variables as arguments.
I've found building the SQL query up as a text string and then passing this string into the c.execute() function works.
querySelect = "SELECT * FROM " + str(your_table_variable)
queryWhere = " WHERE " + str(variableName) + " = " str(variableValue)
query = querySelect + queryWhere
c.execute(query)
I don't know the security situation around it though (re injection) and I'm sure there are probably better ways of doing this.
I'm looking for a way to implement alternating SQL queries - i.e. a function that allows me to filter entries based on different columns. Take the following example:
el=[["a","b",1],["a","b",3]]
def save_sql(foo):
with sqlite3.connect("fn.db") as db:
cur=db.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS et"
"(var1 VARCHAR, var2 VARCHAR, var3 INT)")
cur.executemany("INSERT INTO et VALUES "
"(?,?,?)", foo)
db.commit()
def load_sql(v1,v2,v3):
with sqlite3.connect("fn.db") as db:
cur=db.cursor()
cur.execute("SELECT * FROM et WHERE var1=? AND var2=? AND var3=?", (v1,v2,v3))
return cur.fetchall()
save_sql(el)
Now if I were to use load_sql("a","b",1), it would work. But assume I want to only query for the first and third column, i.e. load_sql("a",None,1) (the None is just intended as a placeholder) or only the last column load_sql(None,None,5), this wouldn't work.
This could of course be done with if statements checking which variables were supplied in the function call, but in tables with larger amounts of columns, this might get messy.
Is there a good way to do this?
What if load_sql() would accept an arbitrary number of keyword arguments, where keyword argument names would correspond to column names. Something along these lines:
def load_sql(**values):
with sqlite3.connect("fn.db") as db:
cur = db.cursor()
query = "SELECT * FROM et"
conditions = [f"{column_name} = :{column_name}" for column_name in values]
if conditions:
query = query + " WHERE " + " AND ".join(conditions)
cur.execute(query, values)
return cur.fetchall()
Note that here we trust keyword argument names to be valid and existing column names (and string-format them into the query) which may potentially be used as an SQL injection attack vector.
As a side note, I cannot stop but think that this feels like a reinventing-the-wheel step towards an actual ORM. Look into lightweight PonyORM or Peewee abstraction layers between Python and a database.
It will inevitably get messy if you want your SQL statements to remain sanitized/safe, but as long as you control your function signature it can remain reasonably safe, e.g.:
def load_sql(var1, var2, var3):
fields = dict(field for field in locals().items() if field[1] is not None)
query = "SELECT * FROM et"
if fields: # if at least one field is not None:
query += " WHERE " + " AND ".join((k + "=?" for k in fields.keys()))
with sqlite3.connect("fn.db") as db:
cur = db.cursor()
cur.execute(query, fields.values())
return cur.fetchall()
You can replace the function signature with load_sql(**kwargs) and then use kwargs.items() instead of locals.items() so that you can pass arbitrary column names, but that can be very dangerous and is certainly not recommended.
I'm trying to execute a raw sql query and safely pass an order by/asc/desc based on user input. This is the back end for a paginated datagrid. I cannot for the life of me figure out how to do this safely. Parameters get converted to strings so Oracle can't execute the query. I can't find any examples of this anywhere on the internet. What is the best way to safely accomplish this? (I am not using the ORM, must be raw sql).
My workaround is just setting ASC/DESC to a variable that I set. This works fine and is safe. However, how do I bind a column name to the ORDER BY? Is that even possible? I can just whitelist a bunch of columns and do something similar as I do with the ASC/DESC. I was just curious if there's a way to bind it. Thanks.
#default.route('/api/barcodes/<sort_by>/<sort_dir>', methods=['GET'])
#json_enc
def fetch_barcodes(sort_by, sort_dir):
#time.sleep(5)
# Can't use sort_dir as a parameter, so assign to variable to sanitize it
ord_dir = "DESC" if sort_dir.lower() == 'desc' else 'ASC'
records = []
stmt = text("SELECT bb_request_id,bb_barcode,bs_status, "
"TO_CHAR(bb_rec_cre_date, 'MM/DD/YYYY') AS bb_rec_cre_date "
"FROM bars_barcodes,bars_status "
"WHERE bs_status_id = bb_status_id "
"ORDER BY :ord_by :ord_dir ")
stmt = stmt.bindparams(ord_by=sort_by,ord_dir=ord_dir)
rs = db.session.execute(stmt)
records = [dict(zip(rs.keys(), row)) for row in rs]
DatabaseError: (cx_Oracle.DatabaseError) ORA-01036: illegal variable name/number
[SQL: "SELECT bb_request_id,bb_barcode,bs_status, TO_CHAR(bb_rec_cre_date, 'MM/DD/YYYY') AS bb_rec_cre_date FROM bars_barcodes,bars_status WHERE bs_status_id = bb_status_id ORDER BY :ord_by :ord_dir "] [parameters: {'ord_by': u'bb_rec_cre_date', 'ord_dir': 'ASC'}]
UPDATE Solution based on accepted answer:
def fetch_barcodes(sort_by, sort_dir, page, rows_per_page):
ord_dir_func = desc if sort_dir.lower() == 'desc' else asc
query_limit = int(rows_per_page)
query_offset = (int(page) - 1) * query_limit
stmt = select([column('bb_request_id'),
column('bb_barcode'),
column('bs_status'),
func.to_char(column('bb_rec_cre_date'), 'MM/DD/YYYY').label('bb_rec_cre_date')]).\
select_from(table('bars_barcode')).\
select_from(table('bars_status')).\
where(column('bs_status_id') == column('bb_status_id')).\
order_by(ord_dir_func(column(sort_by))).\
limit(query_limit).offset(query_offset)
result = db.session.execute(stmt)
records = [dict(row) for row in result]
response = json_return()
response.addRecords(records)
#response.setTotal(len(records))
response.setTotal(1001)
response.setSuccess(True)
response.addMessage("Records retrieved successfully. Limit: " + str(query_limit) + ", Offset: " + str(query_offset) + " SQL: " + str(stmt))
return response
You could use Core constructs such as table() and column() for this instead of raw SQL strings. That'd make your life easier in this regard:
from sqlalchemy import select, table, column, asc, desc
ord_dir = desc if sort_dir.lower() == 'desc' else asc
stmt = select([column('bb_request_id'),
column('bb_barcode'),
column('bs_status'),
func.to_char(column('bb_rec_cre_date'),
'MM/DD/YYYY').label('bb_rec_cre_date')]).\
select_from(table('bars_barcodes')).\
select_from(table('bars_status')).\
where(column('bs_status_id') == column('bb_status_id')).\
order_by(ord_dir(column(sort_by)))
table() and column() represent the syntactic part of a full blown Table object with Columns and can be used in this fashion for escaping purposes:
The text handled by column() is assumed to be handled like the name of a database column; if the string contains mixed case, special characters, or matches a known reserved word on the target backend, the column expression will render using the quoting behavior determined by the backend.
Still, whitelisting might not be a bad idea.
Note that you don't need to manually zip() the row proxies in order to produce dictionaries. They act as mappings as is, and if you need dict() for serialization reasons or such, just do dict(row).
I want to dynamically choose what table to use in a SQL query, but I just keep getting error however I am trying to format this. Also tried %s instead of ?.
Any suggestions?
group_food = (group, food)
group_food_new = (group, food, 1)
with con:
cur = con.cursor()
tmp = cur.execute("SELECT COUNT(Name) FROM (?) WHERE Name=?", group_food)
if tmp == 0:
cur.execute("INSERT INTO ? VALUES(?, ?)", group_food_new)
else:
times_before = cur.execute("SELECT Times FROM ? WHERE Name=?", group_food)
group_food_update = (group, (times_before +1), food)
cur.execute("UPDATE ? SET Times=? WHERE Name=?", group_food_update)
You cannot use SQL parameters to be placeholders in SQL objects; one of the reasons for using a SQL parameters is to escape the value such that the database can never mistake the contents for a database object.
You'll have to interpolate the database objects separately; escape your identifiers by doubling any " double quote parameters and use
cur.execute('SELECT COUNT(Name) FROM "{}" WHERE Name=?'.format(group.replace('"', '""')), (food,))
and
cur.execute('INSERT INTO "{}" VALUES(?, ?)'.format(group.replace('"', '""')), (food, 1))
and
cur.execute('UPDATE "{}" SET Times=? WHERE Name=?'.format(group.replace('"', '""')),
(times_before + 1, food))
The ".." double quotes are there to properly demark an identifier, even if that identifier is also a valid keyword; any existing " characters in the name must be doubled; this also helps de-fuse SQL injection attempts.
However, if your object names are user-sourced, you'll have to do your own (stringent) validation on the object names to prevent SQL injection attacks here. Always validate them against existing objects in that case.
You should really consider using a project like SQLAlchemy to generate your SQL instead; it can take care of validating object names and rigorously protect you from SQL injection risks. It can load your table definitions up front so it'll know what names are legal:
from sqlalchemy import create_engine, func, select, MetaData
engine = create_engine('sqlite:////path/to/database')
meta = MetaData()
meta.reflect(bind=engine)
conn = engine.connect()
group_table = meta.tables[group] # can only find existing tables
count_statement = select([func.count(group_table.c.Name)], group_table.c.Name == food)
count, = conn.execute(count_statement).fetchone()
if count:
# etc.
What are the values of group and food? The guidelines say to make the question as such others can benefit from it, for that to be the case here we need the values of group and food.
It seems you use the Python String formatter instead of SQL parameters for table names even though https://docs.python.org/3/library/sqlite3.html#module-sqlite3 says using the String formatter in unsafe.
# Never do this -- insecure!
symbol = 'RHAT'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
# Do this instead
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
Using %s as a placeholder then putting % outside the string with the variables in Python2 has been Replaced in Python3 with .format() function with the variables as arguments.
I've found building the SQL query up as a text string and then passing this string into the c.execute() function works.
querySelect = "SELECT * FROM " + str(your_table_variable)
queryWhere = " WHERE " + str(variableName) + " = " str(variableValue)
query = querySelect + queryWhere
c.execute(query)
I don't know the security situation around it though (re injection) and I'm sure there are probably better ways of doing this.
Let's say we have a SQL statement that just needs to be completed with the parameters before getting executed against the DB. For instance:
sql = '''
SELECT id, price, date_out
FROM sold_items
WHERE date_out BETWEEN ? AND ?
'''
database_cursor.execute(sql, (start_date, end_date))
How do I get the string that is parsed and executed?, something like this:
SELECT id, price, date_out
FROM sold_items
WHERE date_out BETWEEN 2010-12-05 AND 2011-12-01
In this simple case it's not very important, but I have other SQL Statements much more complicated, and for debugging purposes I would like to execute them myself in my sqlite manager and check the results.
Thanks in advance
UPDATE. I learned from this web page that since Python 3.3 you can trigger printing of executed SQL with
connection.set_trace_callback(print)
Should you want to revert to silent processing, use
connection.set_trace_callback(None)
You can use another function instead of print.
SQLite never actually substitutes parameters into the SQL query string itself; the parameters' values are read directly when it executes the command.
(Formatting those values only to parse them again into the same values would be useless overhead.)
But if you want to find out how the parameters would be written in SQL, you can use the quote function; something like this:
import re
def log_and_execute(cursor, sql, *args):
s = sql
if len(args) > 0:
# generates SELECT quote(?), quote(?), ...
cursor.execute("SELECT " + ", ".join(["quote(?)" for i in args]), args)
quoted_values = cursor.fetchone()
for quoted_value in quoted_values:
s = s.replace('?', quoted_value, 1)
#s = re.sub(r'(values \(|, | = )\?', r'\g<1>' + quoted_value, s, 1)
print "SQL command: " + s
cursor.execute(sql, args)
(This code will fail if there is a ? that is not a parameter, i.e., inside a literal string. Unless you use the re.sub version, which will only match a ? after 'values (' ', ' or ' = '. The '\g<1>' puts back the text before the ? and using '\g<>' avoids clashes with quoted_values that start with a number.)
I've written a function that just fills in the question marks with the arguments.
It's weird that everyone sends you in the direction of using positional arguments, but no one thought about the need to log or preview or check queries in their totality.
Anyways, the code below assumes
That there are no '?' tokens in the query that do not serve as postional argument tokens. I'm not sure whether that is always the case.
That the value of the argument will be wrapped in quotes. This will not be the case when you use an argument for a table name for example. This usecase is unlikely though.
def compile_query(query, *args):
# test for mismatch in number of '?' tokens and given arguments
number_of_question_marks = query.count('?')
number_of_arguments = len(args)
# When no args are given, an empty tuple is passed
if len(args) == 1 and (not args[0]):
number_of_arguments = 0
if number_of_arguments != number_of_question_marks:
return f"Incorrect number of bindings supplied. The current statement uses {number_of_question_marks}, and there are {number_of_arguments} supplied."
# compile query
for a in args:
query = query.replace('?', "'"+str(a)+"'", 1)
return query
Suggested usage
query = "INSERT INTO users (name, password) VALUES (?, ?)"
# sensitive query, we need to log this for security
query_string = compile_query(query, username, password_hash)
fancy_log_function(query_string)
# execute
cursor.execute(query, username, password_hash)
What about string formatting it?
sql = """
SELECT id, price, date_out
FROM sold_items
WHERE date_out BETWEEN {0} AND {1} """.format(start_date, end_date)
"""
database_cursor.execute(sql)