I'm currently building SQL queries depending on input from the user. An example of how this is done can be seen here:
def generate_conditions(table_name,nameValues):
    sql = u""
    for field in nameValues:
        sql += u" AND {0}.{1}='{2}'".format(table_name,field,nameValues[field])
    return sql

search_query = u"SELECT * FROM Enheter e LEFT OUTER JOIN Handelser h ON e.Id == h.Enhet WHERE 1=1"
if "Enhet" in args:
    search_query += generate_conditions("e",args["Enhet"])
c.execute(search_query)
Since the SQL changes every time I cannot insert the values in the execute call which means that I should escape the strings manually. However, when I search everyone points to execute...
I'm also not that satisfied with how I generate the query, so if someone has any idea for another way that would be great also!
You have two options:
1. Switch to using SQLAlchemy; it'll make generating dynamic SQL a lot more pythonic and ensures proper quoting.
2. Since you cannot use parameters for table and column names, you'll still have to use string formatting to include these in the query. Your values, on the other hand, should always be passed as SQL parameters, if only so the database can prepare the statement.
It's not advisable to just interpolate table and column names taken straight from user input, it's far too easy to inject arbitrary SQL statements that way. Verify the table and column names against a list of such names you accept instead.
So, to build on your example, I'd go in this direction:
tables = {
    'e': ('unit1', 'unit2', ...),  # tablename: tuple of column names
}

def generate_conditions(table_name, nameValues):
    if table_name not in tables:
        raise ValueError('No such table %r' % table_name)
    sql = u""
    params = []
    for field in nameValues:
        if field not in tables[table_name]:
            raise ValueError('No such column %r' % field)
        sql += u" AND {0}.{1}=?".format(table_name, field)
        params.append(nameValues[field])
    return sql, params

search_query = u"SELECT * FROM Enheter e LEFT OUTER JOIN Handelser h ON e.Id == h.Enhet WHERE 1=1"
search_params = []
if "Enhet" in args:
    sql, params = generate_conditions("e",args["Enhet"])
    search_query += sql
    search_params.extend(params)
c.execute(search_query, search_params)
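To see the whitelist approach run end to end, here is a minimal sketch against an in-memory SQLite database. The `Namn` and `Typ` columns, the table layout, and the sample rows are made up for illustration; only the `Enheter` table and its `e` alias come from the example above.

```python
import sqlite3

# Hypothetical whitelist: table alias -> column names accepted from user input.
TABLES = {"e": ("Namn", "Typ")}


def generate_conditions(table_name, name_values):
    """Return (sql_fragment, params); names are whitelisted, values are bound."""
    if table_name not in TABLES:
        raise ValueError("No such table %r" % table_name)
    sql, params = "", []
    for field, value in name_values.items():
        if field not in TABLES[table_name]:
            raise ValueError("No such column %r" % field)
        sql += " AND {0}.{1}=?".format(table_name, field)
        params.append(value)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enheter (Id INTEGER PRIMARY KEY, Namn TEXT, Typ TEXT)")
conn.executemany(
    "INSERT INTO Enheter (Namn, Typ) VALUES (?, ?)",
    [("pump", "A"), ("o'brien", "B")],
)

# The value contains a quote character, but it is bound, not interpolated.
sql, params = generate_conditions("e", {"Namn": "o'brien"})
rows = conn.execute(
    "SELECT e.Namn, e.Typ FROM Enheter e WHERE 1=1" + sql, params
).fetchall()
print(rows)
```

A field name that is not in the whitelist, such as `"Namn=1 OR 1"`, never reaches the SQL string because `generate_conditions` raises `ValueError` first.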
I have the following code, using psycopg2:
sql = 'select %s from %s where utctime > %s and utctime < %s order by utctime asc;'
data = (dataItems, voyage, dateRangeLower, dateRangeUpper)
rows = cur.mogrify(sql, data)
This outputs:
select 'waterTemp, airTemp, utctime' from 'ss2012_t02' where utctime > '2012-05-03T17:01:35+00:00'::timestamptz and utctime < '2012-05-01T17:01:35+00:00'::timestamptz order by utctime asc;
When I execute this, it falls over - this is understandable, as the quotes around the table name are illegal.
Is there a way to legally pass the table name as a parameter, or do I need to do an (explicitly warned against) string concatenation, i.e.:
voyage = 'ss2012_t02'
sql = 'select %s from ' + voyage + ' where utctime > %s and utctime < %s order by utctime asc;'
Cheers for any insights.
According to the official documentation:
If you need to generate dynamically an SQL query (for instance
choosing dynamically a table name) you can use the facilities
provided by the psycopg2.sql module.
The sql module is new in psycopg2 version 2.7. It has the following syntax:
from psycopg2 import sql

cur.execute(
    sql.SQL("insert into {table} values (%s, %s)")
        .format(table=sql.Identifier('my_table')),
    [10, 20])
More on: https://www.psycopg.org/docs/sql.html#module-usage
[Update 2017-03-24: AsIs should NOT be used to represent table or fields names, the new sql module should be used instead: https://stackoverflow.com/a/42980069/5285608 ]
Also, according to psycopg2 documentation:
Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Per this answer you can do it as so:
import psycopg2
from psycopg2.extensions import AsIs
#Create your connection and cursor...
cursor.execute("SELECT * FROM %(table)s", {"table": AsIs("my_awesome_table")})
The table name cannot be passed as a parameter, but everything else can. Thus, the table name should be hard coded in your app (Don't take inputs or use anything outside of the program as a name). The code you have should work for this.
On the slight chance that you have a legitimate reason to take an outside table name, make sure that you don't allow the user to directly input it. Perhaps an index could be passed to select a table, or the table name could be looked up in some other way. You are right to be wary of doing this, however. This works, because there are relatively few table names around. Find a way to validate the table name, and you should be fine.
It would be possible to do something like this, to see if the table name exists. This is a parameterised version. Just make sure that you do this and verify the output prior to running the SQL code. Part of the idea for this comes from this answer.
SELECT 1 FROM information_schema.tables WHERE table_schema = 'public' and table_name=%s LIMIT 1
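The check-then-interpolate pattern described above can be sketched as follows. To keep the example runnable without a PostgreSQL server, it swaps in the standard-library `sqlite3` module and SQLite's `sqlite_master` catalog in place of `information_schema.tables`; the helper names `table_exists` and `select_all` are hypothetical.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if `name` is an existing table; the name is a bound parameter."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ? LIMIT 1",
        (name,),
    ).fetchone()
    return row is not None


def select_all(conn, table_name):
    """Interpolate the table name only after verifying that it exists."""
    if not table_exists(conn, table_name):
        raise ValueError("Unknown table %r" % table_name)
    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + table_name.replace('"', '""') + '"'
    return conn.execute("SELECT * FROM " + quoted).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ss2012_t02 (utctime TEXT, waterTemp REAL)")
conn.execute(
    "INSERT INTO ss2012_t02 VALUES (?, ?)",
    ("2012-05-01T17:01:35+00:00", 11.5),
)
print(select_all(conn, "ss2012_t02"))
```

An existence check only proves that the table is real, not that the caller should be allowed to read it, so a fixed list of permitted names is the stricter variant of the same idea.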
This is a workaround I have used in the past
query = "INSERT INTO %s (col_1, col_2) VALUES (%%s, %%s)" % table_name
cur.execute(query, (col_1_var, col_2_var))
Hope it helps :)
This is a small addition to @Antoine Dusséaux's answer. If you want to pass two (unquoted) parameters in a SQL query, you can do it as follows:
query = sql.SQL("select {field} from {table} where {pkey} = %s").format(
    field=sql.Identifier('my_name'),
    table=sql.Identifier('some_table'),
    pkey=sql.Identifier('id'))
As per the documentation,
Usually you should express the template of your query as an SQL
instance with {}-style placeholders and use format() to merge the
variable parts into them, all of which must be Composable subclasses.
You can still have %s-style placeholders in your query and pass values
to execute(): such value placeholders will be untouched by format()
Source: https://www.psycopg.org/docs/sql.html#module-usage
Also, please keep this in mind while writing queries.
I have created a little utility for preprocessing of SQL statements with variable table (...) names:
# Python 3 port: string.letters became string.ascii_letters, basestring became str
from string import ascii_letters

NAMECHARS = frozenset(set(ascii_letters).union('.'))

def replace_names(sql, **kwargs):
    """
    Preprocess an SQL statement: securely replace table ... names
    before handing the result over to the database adapter,
    which will take care of the values.

    There will be no quoting of names, because this would make them
    case sensitive; instead it is ensured that no dangerous chars
    are contained.

    >>> replace_names('SELECT * FROM %(table)s WHERE val=%(val)s;',
    ...               table='fozzie')
    'SELECT * FROM fozzie WHERE val=%(val)s;'
    """
    for v in kwargs.values():
        check_name(v)
    dic = SmartDict(kwargs)
    return sql % dic

def check_name(tablename):
    """
    Check the given name for being syntactically valid,
    and usable without quoting
    """
    if not isinstance(tablename, str):
        raise TypeError('%r is not a string' % (tablename,))
    invalid = set(tablename).difference(NAMECHARS)
    if invalid:
        raise ValueError('Invalid chars: %s' % (tuple(invalid),))
    for s in tablename.split('.'):
        if not s:
            raise ValueError('Empty segment in %r' % tablename)

class SmartDict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Unknown keys are value placeholders: hand them back unchanged
            check_name(key)
            return key.join(('%(', ')s'))
The SmartDict object returns %(key)s for every unknown key, preserving them for the value handling. The function could check for the absence of any quote characters, since all quoting now should be taken care of ...
If you want to pass the table name as a parameter, you can use this wrapper:
class Literal(str):
    def __conform__(self, quote):
        return self

    @classmethod
    def mro(cls):
        return (object, )

    def getquoted(self):
        return str(self)
Usage: cursor.execute("CREATE TABLE %s ...", (Literal(name), ))
You can just use str.format for the column and table names and then use the regular parameterization for the values in execute:
xlist = (column, table)
# Unpack the tuple so that {0} and {1} each receive one element
sql = 'select {0} from {1} where utctime > %s and utctime < %s order by utctime asc;'.format(*xlist)
Keep in mind if this is exposed to the end user, you will not be protected from SQL injection unless you write for it.
Surprised no one has mentioned doing this:
sql = 'select {} from {} where utctime > {} and utctime < {} order by utctime asc;'.format(dataItems, voyage, dateRangeLower, dateRangeUpper)
rows = cur.mogrify(sql)
format puts in the string without quotations.
I'm looking for a way to implement alternating SQL queries - i.e. a function that allows me to filter entries based on different columns. Take the following example:
import sqlite3

el=[["a","b",1],["a","b",3]]

def save_sql(foo):
    with sqlite3.connect("fn.db") as db:
        cur=db.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS et"
                    "(var1 VARCHAR, var2 VARCHAR, var3 INT)")
        cur.executemany("INSERT INTO et VALUES "
                        "(?,?,?)", foo)
        db.commit()

def load_sql(v1,v2,v3):
    with sqlite3.connect("fn.db") as db:
        cur=db.cursor()
        cur.execute("SELECT * FROM et WHERE var1=? AND var2=? AND var3=?", (v1,v2,v3))
        return cur.fetchall()

save_sql(el)
Now if I were to use load_sql("a","b",1), it would work. But assume I want to only query for the first and third column, i.e. load_sql("a",None,1) (the None is just intended as a placeholder) or only the last column load_sql(None,None,5), this wouldn't work.
This could of course be done with if statements checking which variables were supplied in the function call, but in tables with larger amounts of columns, this might get messy.
Is there a good way to do this?
What if load_sql() would accept an arbitrary number of keyword arguments, where keyword argument names would correspond to column names. Something along these lines:
def load_sql(**values):
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        query = "SELECT * FROM et"
        conditions = [f"{column_name} = :{column_name}" for column_name in values]
        if conditions:
            query = query + " WHERE " + " AND ".join(conditions)
        cur.execute(query, values)
        return cur.fetchall()
Note that here we trust keyword argument names to be valid and existing column names (and string-format them into the query) which may potentially be used as an SQL injection attack vector.
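One way to close that gap is to check the keyword names against a fixed set before formatting them into the query. The sketch below is an illustration under assumptions: `ALLOWED_COLUMNS` and `build_query` are hypothetical names, and `load_sql` takes an open connection as its first argument (rather than opening `fn.db` itself) so that it can run against an in-memory database.

```python
import sqlite3

# Columns of the `et` table that callers may filter on.
ALLOWED_COLUMNS = {"var1", "var2", "var3"}


def build_query(**values):
    """Return (query, params), rejecting any name outside the whitelist."""
    unknown = set(values) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError("Unknown column(s): %s" % ", ".join(sorted(unknown)))
    query = "SELECT * FROM et"
    conditions = [f"{name} = :{name}" for name in values]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query, values


def load_sql(db, **values):
    query, params = build_query(**values)
    return db.execute(query, params).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE et (var1 VARCHAR, var2 VARCHAR, var3 INT)")
db.executemany("INSERT INTO et VALUES (?,?,?)", [("a", "b", 1), ("a", "b", 3)])
print(load_sql(db, var1="a", var3=1))
```

Calling `load_sql(db)` with no filters returns every row, and a name such as `"var1 = 1 OR 1"` passed via `**` raises `ValueError` before any SQL is built.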
As a side note, I cannot help but think that this feels like a reinventing-the-wheel step towards an actual ORM. Look into lightweight abstraction layers between Python and a database, such as PonyORM or Peewee.
It will inevitably get messy if you want your SQL statements to remain sanitized/safe, but as long as you control your function signature it can remain reasonably safe, e.g.:
def load_sql(var1, var2, var3):
    fields = dict(field for field in locals().items() if field[1] is not None)
    query = "SELECT * FROM et"
    if fields:  # if at least one field is not None:
        query += " WHERE " + " AND ".join((k + "=?" for k in fields.keys()))
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        # sqlite3 needs a sequence; a dict view is rejected on Python 3
        cur.execute(query, list(fields.values()))
        return cur.fetchall()
You can replace the function signature with load_sql(**kwargs) and then use kwargs.items() instead of locals().items() so that you can pass arbitrary column names, but that can be very dangerous and is certainly not recommended.
I'm new to Python and SQL, but I need to delete multiple entries in a table on a remote server. I would also prefer to preserve the input structure of a function I was given, because it is used in my colleagues' code.
I came up with a solution that does the job, similar to the one presented below. I deliberately avoided using any sort of executemany() method because (if I am not mistaken) they can be terribly slow.
import sqlalchemy as sa
import urllib

def delete_rows(tablename, colnames, data):
    """
    tablename - name of db table with dbname. like RiskData..factors
    colnames - column names to use as keys in deletion
    data - a list of tuples, a tuple per row, number of elements in each
           tuple must be the same as number of column names
    """
    # Connection details
    engine = sa.create_engine("mssql+pyodbc://some_server")
    connection = engine.connect()
    # Data has to be a list - throw an exception if it is not
    if (not (type(data) is list)):
        raise Exception('Data must be a list')
    # assemble one long query statement
    query = "DELETE " + tablename + " WHERE "
    query_dp = "or (" + " = '{}' and ".join(colnames) + "= '{}') "
    query_tail = ""
    for record_entries in data:
        query_tail += query_dp.format(*record_entries)
    query += query_tail[3:-1]
    connection.execute(query)
    connection.close()
I would like to ask whether this solution is inefficient and will be slow for large amounts of data. If so, what would a more elegant solution be?
Don't know about speed, but as far as elegance goes, don't use string formatting for passing values to SQL queries. Since you're already using SQLAlchemy, you can leverage its query building capabilities:
def delete_rows(tablename, colnames, data):
    """
    tablename - name of db table with dbname. like RiskData..factors
    colnames - column names to use as keys in deletion
    data - a list of tuples, a tuple per row, number of elements in each
           tuple must be the same as number of column names
    """
    # Data has to be a list - throw an exception if it is not
    if not isinstance(data, list):
        raise Exception('Data must be a list')
    # Connection details
    engine = sa.create_engine("mssql+pyodbc://some_server")
    # Create `column()` objects for producing bindparams
    cols = [sa.column(name) for name in colnames]
    # Create a list of predicates, to be joined with OR
    preds = []
    for record_entries in data:
        pred = sa.and_(*[c == e for c, e in zip(cols, record_entries)])
        preds.append(pred)
    # assemble one long query statement
    query = sa.table(tablename).delete().where(sa.or_(*preds))
    with engine.begin() as connection:
        connection.execute(query)
Whether or not executemany() is slow depends on the DB-API driver in use. In case of pyodbc this used to be true, but there's been work to improve it.
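For readers not using SQLAlchemy, the same idea (one statement, an OR of per-row AND predicates, every value bound as a parameter) can be sketched with a plain DB-API driver. This is an illustration using the standard-library `sqlite3` module rather than pyodbc; `build_delete` and the `factors` sample table are hypothetical, and the table and column names are interpolated as-is, so they must come from trusted code.

```python
import sqlite3


def build_delete(tablename, colnames, data):
    """Return (sql, params) deleting every row that matches a tuple in `data`."""
    if not data:
        raise ValueError("data must contain at least one row")
    for row in data:
        if len(row) != len(colnames):
            raise ValueError("row %r does not match columns %r" % (row, colnames))
    # One "(col1 = ? AND col2 = ?)" group per row, joined with OR.
    row_pred = "(" + " AND ".join("{} = ?".format(c) for c in colnames) + ")"
    sql = "DELETE FROM {} WHERE {}".format(
        tablename, " OR ".join([row_pred] * len(data))
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in data for value in row]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE factors (name TEXT, day INTEGER)")
conn.executemany(
    "INSERT INTO factors VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 1)]
)

sql, params = build_delete("factors", ["name", "day"], [("a", 1), ("b", 1)])
conn.execute(sql, params)
remaining = conn.execute("SELECT * FROM factors").fetchall()
print(remaining)
```

Databases cap the number of bound parameters per statement, so a very long `data` list would likely have to be split into chunks and deleted in several statements.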
I'm trying to execute a raw sql query and safely pass an order by/asc/desc based on user input. This is the back end for a paginated datagrid. I cannot for the life of me figure out how to do this safely. Parameters get converted to strings so Oracle can't execute the query. I can't find any examples of this anywhere on the internet. What is the best way to safely accomplish this? (I am not using the ORM, must be raw sql).
My workaround is just setting ASC/DESC to a variable that I set. This works fine and is safe. However, how do I bind a column name to the ORDER BY? Is that even possible? I can just whitelist a bunch of columns and do something similar as I do with the ASC/DESC. I was just curious if there's a way to bind it. Thanks.
@default.route('/api/barcodes/<sort_by>/<sort_dir>', methods=['GET'])
@json_enc
def fetch_barcodes(sort_by, sort_dir):
    #time.sleep(5)
    # Can't use sort_dir as a parameter, so assign to variable to sanitize it
    ord_dir = "DESC" if sort_dir.lower() == 'desc' else 'ASC'
    records = []
    stmt = text("SELECT bb_request_id,bb_barcode,bs_status, "
                "TO_CHAR(bb_rec_cre_date, 'MM/DD/YYYY') AS bb_rec_cre_date "
                "FROM bars_barcodes,bars_status "
                "WHERE bs_status_id = bb_status_id "
                "ORDER BY :ord_by :ord_dir ")
    stmt = stmt.bindparams(ord_by=sort_by,ord_dir=ord_dir)
    rs = db.session.execute(stmt)
    records = [dict(zip(rs.keys(), row)) for row in rs]
DatabaseError: (cx_Oracle.DatabaseError) ORA-01036: illegal variable name/number
[SQL: "SELECT bb_request_id,bb_barcode,bs_status, TO_CHAR(bb_rec_cre_date, 'MM/DD/YYYY') AS bb_rec_cre_date FROM bars_barcodes,bars_status WHERE bs_status_id = bb_status_id ORDER BY :ord_by :ord_dir "] [parameters: {'ord_by': u'bb_rec_cre_date', 'ord_dir': 'ASC'}]
UPDATE Solution based on accepted answer:
def fetch_barcodes(sort_by, sort_dir, page, rows_per_page):
    ord_dir_func = desc if sort_dir.lower() == 'desc' else asc
    query_limit = int(rows_per_page)
    query_offset = (int(page) - 1) * query_limit
    stmt = select([column('bb_request_id'),
                   column('bb_barcode'),
                   column('bs_status'),
                   func.to_char(column('bb_rec_cre_date'), 'MM/DD/YYYY').label('bb_rec_cre_date')]).\
        select_from(table('bars_barcodes')).\
        select_from(table('bars_status')).\
        where(column('bs_status_id') == column('bb_status_id')).\
        order_by(ord_dir_func(column(sort_by))).\
        limit(query_limit).offset(query_offset)
    result = db.session.execute(stmt)
    records = [dict(row) for row in result]
    response = json_return()
    response.addRecords(records)
    #response.setTotal(len(records))
    response.setTotal(1001)
    response.setSuccess(True)
    response.addMessage("Records retrieved successfully. Limit: " + str(query_limit) + ", Offset: " + str(query_offset) + " SQL: " + str(stmt))
    return response
You could use Core constructs such as table() and column() for this instead of raw SQL strings. That'd make your life easier in this regard:
from sqlalchemy import select, table, column, asc, desc, func

ord_dir = desc if sort_dir.lower() == 'desc' else asc
stmt = select([column('bb_request_id'),
               column('bb_barcode'),
               column('bs_status'),
               func.to_char(column('bb_rec_cre_date'),
                            'MM/DD/YYYY').label('bb_rec_cre_date')]).\
    select_from(table('bars_barcodes')).\
    select_from(table('bars_status')).\
    where(column('bs_status_id') == column('bb_status_id')).\
    order_by(ord_dir(column(sort_by)))
table() and column() represent the syntactic part of a full blown Table object with Columns and can be used in this fashion for escaping purposes:
The text handled by column() is assumed to be handled like the name of a database column; if the string contains mixed case, special characters, or matches a known reserved word on the target backend, the column expression will render using the quoting behavior determined by the backend.
Still, whitelisting might not be a bad idea.
Note that you don't need to manually zip() the row proxies in order to produce dictionaries. They act as mappings as is, and if you need dict() for serialization reasons or such, just do dict(row).
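The whitelisting mentioned above could look roughly like the following sketch, which works with raw SQL strings as well. `SORTABLE_COLUMNS` and `order_by_clause` are hypothetical names; the column names are the ones selected in the question's query.

```python
# Columns the datagrid is allowed to sort on.
SORTABLE_COLUMNS = {"bb_request_id", "bb_barcode", "bs_status", "bb_rec_cre_date"}


def order_by_clause(sort_by, sort_dir):
    """Build an ORDER BY clause from user input without binding parameters."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError("Cannot sort by %r" % sort_by)
    # Anything other than 'desc' falls back to ascending order.
    direction = "DESC" if sort_dir.lower() == "desc" else "ASC"
    return "ORDER BY {} {}".format(sort_by, direction)


print(order_by_clause("bb_barcode", "desc"))
```

Because both parts of the clause are picked from fixed values in the code, the user's input only selects among known-safe strings and is never itself placed in the SQL.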