How to create a SQL query with optional parameters in Python Flask?
I use an HTML form to filter data from a SQL table. The form has multiple optional fields. How can I build the SQL query based on whichever fields the user filled in?
For example, if the users fill in 3 fields "name", "amount" and "itemtype", the query is like:
rows = cursor.execute("""SELECT * FROM items WHERE name = ? AND amount=? AND itemtype = ? """, name, amount, itemtype).fetchall()
If they skip "amount", the query is like:
rows = cursor.execute("""SELECT * FROM items WHERE name = ? AND itemtype = ? """, name, itemtype).fetchall()
I prefer to use Python's format string function for this. It's splitting hairs, but format lets you name the placeholders in the string, so it's technically more explicit. However, I would suggest using **kwargs instead of *args, so you don't have to rely on magic.
UPDATE 2019
This was a terrible answer. You never, ever, EVER want to take user-generated data and interpolate it directly into a SQL query. It is imperative that you always sanitize user input to protect against SQL injection. Python defines a database API specification that any database package that is not an ORM like SQLAlchemy should implement. Long story short: you should NEVER, EVER, EVER use str.format(), %, or f-strings to interpolate data into your SQL queries.
The database API specification provides a way to safely interpolate data into queries. Every Python database interface should have a Cursor class that is returned from a Connection object. The Cursor class implements a method named execute. This obviously executes a query, but it also accepts a second argument, usually called args. According to the specification:
Parameters may be provided as sequence or mapping and will be bound to variables in the operation. Variables are specified in a database-specific notation (see the module's paramstyle attribute for details).
By "sequence", it means that args can be a list or tuple, and by "mapping", it means that args can also be a dict. Depending on the package, the way to specify where your data should be interpolated may differ. There are six options for this. Which formatting the package you're using can be found in the paramstyle constant of the package. For instance, PyMySQL(and most implementations of the spec that I've used) uses format and pyformat. A simple example would be:
format
cursor.execute('SELECT * FROM t WHERE a = %s AND b = %s;', (1, 'baz'))
pyformat
cursor.execute('SELECT * FROM t WHERE a = %(foo)s AND b = %(bar)s;', {'foo': 1, 'bar': 'baz'})
Both of these would execute as:
SELECT * FROM t WHERE a = 1 AND b = 'baz';
You should make sure to explore the documentation of the database API package you're using. One extremely helpful thing I came across using psycopg2, a PostgreSQL package, was its extras module. For instance, a common problem when trying to insert data securely comes up when inserting multiple rows at once. psycopg2 has a clever solution to this in its execute_values function. Using execute_values, this code:
execute_values(cursor, "INSERT INTO t (a, b) VALUES %s;", ((1, 'foo'), (2, 'baz')))
... is executed as:
"INSERT INTO t (a, b) VALUES (1, 'foo'), (2, 'baz');"
I don't think the current answer actually addresses "what if I have parameters that are optional?", so I have a similar scenario that I'm solving like this:
def some_func(a, b=None, c=None):
    where_clause = f"id = {str(a)}"
    if b:
        where_clause += f" and lower(b) = lower('{b}')"
    if c:
        where_clause += f" and c = {str(c)}"
    query = f"select * from table where {where_clause}"
Not the most scalable solution but it works if you only have a few optional parameters. You could refactor the clause-builder into its own function to build the string and accept parameters for any transforms that need to be applied (lower, etc).
I also assume there are some ORMs with functionality that solves this, but for a small app this is sufficient for me.
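If you want the same flexibility without interpolating the values themselves (which the warnings earlier in this thread apply to), you can build only the clause text dynamically and keep every value as a bound parameter. A minimal sketch, assuming a DB-API cursor with qmark placeholders (e.g. sqlite3) and the items table from the question:

def filter_items(cursor, name=None, amount=None, itemtype=None):
    # One clause and one parameter per field that was actually filled in.
    clauses, params = [], []
    if name is not None:
        clauses.append("name = ?")
        params.append(name)
    if amount is not None:
        clauses.append("amount = ?")
        params.append(amount)
    if itemtype is not None:
        clauses.append("itemtype = ?")
        params.append(itemtype)
    query = "SELECT * FROM items"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    # Only placeholder text is concatenated; the values stay bound.
    return cursor.execute(query, params).fetchall()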
Related
I have a function that executes many SQL queries with different dates.
What I want is to pass all dates and other query variables as function parameters and then just execute the function. I have figured out how to do this for datetime variables as below. But I also have a query that looks at specific campaign_names in a database and pulls those as strings. I want to be able to pass those strings as function parameters but I haven't figured out the correct syntax for this in the SQL query.
def Camp_eval(start_date, end_1M, camp1, camp2, camp3):
    query1 = f"""SELECT CONTACT_NUMBER, OUTCOME_DATE
                 FROM DATABASE1
                 where OUTCOME_DATE >= (to_date('{start_date}', 'dd/mm/yyyy'))
                 and OUTCOME_DATE < (to_date('{end_1M}', 'dd/mm/yyyy'))"""

    query2 = """SELECT CONTACT_NUMBER
                FROM DATABASE2
                WHERE (CAMP_NAME = {camp1} or
                       CAMP_NAME = {camp2} or
                       CAMP_NAME = {camp3})"""
Camp_eval('01/04/2022','01/05/2022','Camp_2022_04','Camp_2022_05','Camp_2022_06')
The parameters start_date and end_1M work fine with the {} brackets, but the camp variables, which are strings, don't return any results, even though there are rows in the database matching those conditions if I write them directly into the query.
Any help would be appreciated!!
Please, do not use f-strings for creating SQL queries!
Most likely, any library you use for accessing a database already has a way of creating queries safely: see the SQLite docs (check the code examples).
Another example: cur.execute("SELECT * FROM tasks WHERE priority = ?", (priority,)).
Not only is this way safer (it fixes the SQL injection problem mentioned by #d-malan in the comments), it also eliminates the need to care about how data is represented in SQL - the library will automatically cast dates, strings, etc. into whatever they need to be cast into. So your problem can be fixed by using the proper instruments.
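Applied to the campaign query above, that means putting placeholders in the SQL and handing the strings to execute separately. A sketch, assuming a driver that uses qmark placeholders (such as sqlite3) and a cursor object from that driver; drivers with another paramstyle just use a different placeholder token:

query2 = """SELECT CONTACT_NUMBER
            FROM DATABASE2
            WHERE CAMP_NAME IN (?, ?, ?)"""
# The driver quotes and escapes the strings; no f-string needed.
rows = cursor.execute(query2, (camp1, camp2, camp3)).fetchall()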
Background: Not much documentation on MySQLdb Connector
Maybe I'm looking in the wrong places, but there's not much documentation about Python's MySQLdb family of connectors. Perhaps PEP249 is meant to do the job. Oracle's MySQL/Python connector seems to have much better docs, but at the moment I'm working with mysqlclient (the 3.x version of MySQLdb, which wraps around the C connector).
Named Parameters in MySQLdb: working for single values
After much searching, I stumbled upon beautiful syntax for binding named parameters, so long as they are a single value. For instance (made-up query to simplify the case):
query = """
SELECT...
WHERE
name = %(name)s AND
gender = %(gender)s
"""
parameters = {'name': name, 'gender': gender}
cursor.execute(query, parameters)
This properly escapes the parameters. Terrific.
Named Parameters in MySQLdb: how to use iterables?
Now I'd like to use a set, list or tuple to build queries with IN. Something like:
query = """
SELECT...
WHERE
gender = %(gender)s AND
name IN %(nameset)s
"""
I found a similar question here but that query doesn't use named parameters (the placeholder is named, but not the iterable).
What am I missing? Would someone know the magic syntax to make this work?
I see in the MySQLdb code that paramstyle is set to format rather than pyformat, but pyformat does work for single values.
To clarify,
I am not interested in an answer that just builds a string like ('sophie', 'jane', 'chloe') and concatenates it to the query. I need bound parameters to guarantee proper escaping.
I am also not interested in concatenating a join that uses db.escape_string(), although I may end up going that route if nothing else works.
What I'm really after is a clean idiom that binds named iterable parameters, if there is one.
Don't love answering my own question, but it's been a day...
Having looked inside the MySQLdb code, it looks like I won't get my wish. The quoting function will always add one set of quotes too many.
This is where I've ended up (the fallback option I had mentioned):
idset = ('Chloe', 'Zoe', "Noe';drip dautobus")
quoted_ids = [mdb.escape_string(identifier).decode('utf-8') for identifier in idset]
sql_idset = "('" + "', '".join(quoted_ids) + "')"
query = """
SELECT ...
FROM ...
JOIN ...
WHERE
someid = %(someid)s AND
namae IN """ + sql_idset
parameters = {'someid': someid}
cursor.execute(query, parameters)
Only one of the parameters is bound. The set in the IN clause is pre-quoted.
Not my favorite thing to have to do, but at least each value is run through the MySQLdb quoting function in order to quote any potentially harmful stuff.
The decode is there because escape_string prepares a byte string, but the query being built with the bound parameter someid is a string. Maybe there is an easier way. If you find one, let me know.
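For what it's worth, another way to keep everything bound is to generate one named placeholder per element and build the parameter dict to match, since MySQLdb handles single-value pyformat placeholders fine. A sketch with hypothetical table and column names; note that str.format only inserts placeholder names here, never data:

names = ('Chloe', 'Zoe', 'Noe')
name_params = {'name{}'.format(i): n for i, n in enumerate(names)}
placeholders = ', '.join('%({})s'.format(key) for key in name_params)
query = """
    SELECT *
    FROM people
    WHERE
        someid = %(someid)s AND
        name IN ({})
""".format(placeholders)
parameters = dict(name_params, someid=someid)
cursor.execute(query, parameters)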
I don't know how to express it specifically in MySQLdb but it just works out of the box with the MySQL connector library (version: 1.4.4, MySQL 5.7.x) as suggested in this other answer:
cursor.execute('SELECT * FROM test WHERE id in %(l)s', {'l': (1,2,3)})
If MySQLdb messes it up, then I suggest acquiring a direct cursor somehow.
I'm trying to build a relatively complex query and would like to manipulate the where clause of the result directly, without cloning/subquerying the returned query. An example would look like:
session = sessionmaker(bind=engine)()

def generate_complex_query():
    return select(
        columns=[location.c.id.label('id')],
        from_obj=location,
        whereclause=location.c.id > 50
    ).alias('a')
query = generate_complex_query()
# based on this query, I'd like to add additional where conditions, ideally like:
# `query.where(query.c.id<100)`
# but without subquerying the original query
# this is what I found so far, which is quite verbose and it doesn't solve the subquery problem
query = select(
columns=[query.c.id],
from_obj=query,
whereclause=query.c.id<100
)
# Another option I was considering was to map the query to a class:
# class Location(object):pass
# mapper(Location, query)
# session.query(Location).filter(Location.id<100)
# which looks more elegant, but also creates a subquery
result = session.execute(query)
for r in result:
    print r
This is the generated query:
SELECT a.id
FROM (SELECT location.id AS id
FROM location
WHERE location.id > %(id_1)s) AS a
WHERE a.id < %(id_2)s
I would like to obtain:
SELECT location.id AS id
FROM location
WHERE id > %(id_1)s and
id < %(id_2)s
Is there any way to achieve this? The reason is that I think the second query is slightly faster (not by much), and the mapper example (the second example above), which I have in place, messes up the labels (id becomes anon_1_id, or a.id if I name the alias).
Why don't you do it like this:
query = generate_complex_query()
query = query.where(location.c.id < 100)
Essentially you can refine any query like this. Additionally, I suggest reading the SQL Expression Language Tutorial, which is pretty awesome and introduces all the techniques you need. The way you build your select is only one of several ways. Usually, I build my queries more like this: select(column).where(expression).where(next_expression) and so on, as sketched below. The FROM is usually inferred by SQLAlchemy from the context, i.e. you rarely need to specify it.
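For example, the chained style looks roughly like this (a sketch against the location table from the question; in the SQLAlchemy versions current at the time the columns are passed as a list, while 1.4+ would use select(location.c.id)):

from sqlalchemy import select

# Both where() calls are combined with AND into a single WHERE clause;
# the FROM is inferred from the column being selected.
query = select([location.c.id]).where(location.c.id > 50).where(location.c.id < 100)
result = session.execute(query)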
Since you don't have access to the internals of generate_complex_query try this:
query = query.where(query.c.id < 100)
This should work in your case I presume.
Another idea:
query = query.where(text("id < 100"))
This uses SQLAlchemy's text expression. This could work for you, however, and this is important: if you want to introduce variables, read the description of the API linked above, because just using format strings instead of bound parameters will open you up to SQL injection, something that is normally a no-brainer with SQLAlchemy but must be taken care of when working with such literal expressions.
Also note that this works because you label the column as id. If you don't do that and don't know the column name, then this won't work either.
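If you do go the text() route, you can still keep the value out of the string by binding it rather than formatting it in. A sketch; the bindparams() method on text() exists in newer SQLAlchemy versions, older ones take a bindparams= argument instead:

from sqlalchemy import text

# ":max_id" is a bound parameter, so the value is escaped by the driver.
query = query.where(text("id < :max_id").bindparams(max_id=100))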
I am trying to do a simple filter operation on a query in sqlalchemy, like this:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
where
inall is a list of strings
Genotypes is mapped to a table:
class Genotypes(object):
    pass

Genotypes.mapper = mapper(Genotypes, kg_table, properties={'rsid': getattr(kg_table.c, 'rs#')})
This seems pretty straightforward to me, but I get the following error when I execute the above query by doing q.first():
"sqlalchemy.exc.OperationalError: (OperationalError) too many SQL
variables u'SELECT" followed by a list of the 1M items in the inall
list. But they aren't supposed to be SQL variables, just a list whose
membership is the filtering criteria.
Am I doing the filtering incorrectly?
(the db is sqlite)
If the table you are getting your rsids from is available in the same database, I'd use a subquery to pass them into your Genotypes query rather than passing the one million entries around in your Python code.
sq = session.query(RSID_Source).subquery()
q = session.query(Genotypes).filter(Genotypes.rsid.in_(sq))
The issue is that in order to pass that list to SQLite (or any database, really), SQLAlchemy has to pass each entry of your IN clause as a separate variable. The SQL translates roughly to:
-- Not valid SQLite SQL
DECLARE #Param1 TEXT;
SET #Param1 = ?;
DECLARE #Param2 TEXT;
SET #Param2 = ?;
-- snip 999,998 more
SELECT field1, field2, -- etc.
FROM Genotypes G
WHERE G.rsid IN (#Param1, #Param2, /* snip */)
The below workaround worked for me:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
query_as_string = str(q.statement.compile(compile_kwargs={"literal_binds": True}))
session.execute(query_as_string).first()
This basically forces the query to compile as a string before execution, which bypasses the whole variables issue. Some details on this are available in SQLAlchemy's docs here.
BTW, if you're not using SQLite you can make use of the ANY operator to pass the list object as a single parameter (see my answer to this question here).
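For instance, with PostgreSQL and a raw psycopg2 cursor, a Python list is adapted to an array, so the whole list travels as a single bound parameter. A sketch with hypothetical table and column names:

rsids = ['rs123', 'rs456', 'rs789']  # the values you would otherwise feed to in_()
cursor.execute("SELECT * FROM genotypes WHERE rsid = ANY(%s)", (rsids,))
rows = cursor.fetchall()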
What's the best way to make psycopg2 pass parameterized queries to PostgreSQL? I don't want to write my own escaping mechanisms or adapters, and the psycopg2 source code and examples are difficult to read in a web browser.
If I need to switch to something like PyGreSQL or another python pg adapter, that's fine with me. I just want simple parameterization.
psycopg2 follows the rules for DB-API 2.0 (set down in PEP 249). That means you can call the execute method on your cursor object and use the pyformat binding style, and it will do the escaping for you. For example, the following should be safe (and work):
cursor.execute("SELECT * FROM student WHERE last_name = %(lname)s",
{"lname": "Robert'); DROP TABLE students;--"})
From the psycopg documentation
(http://initd.org/psycopg/docs/usage.html)
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
data = ("O'Reilly", )
cur.execute(SQL, data) # Note: no % operator
Here are a few examples you might find helpful
cursor.execute('SELECT * from table where id = %(some_id)s', {'some_id': 1234})
Or you can dynamically build your query based on a dict of field name, value:
columns = ', '.join(my_dict.keys())
placeholders = ', '.join(['%s'] * len(my_dict))
query = 'INSERT INTO some_table ({}) VALUES ({})'.format(columns, placeholders)
cursor.execute(query, list(my_dict.values()))
Note: the fields must be defined in your code, not user input, otherwise you will be susceptible to SQL injection.
I love the official docs about this:
https://www.psycopg.org/psycopg3/docs/basic/params.html
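The placeholder style there is the same %s / %(name)s convention shown above. A minimal sketch, assuming psycopg 3 with a hypothetical connection string and the authors table from the earlier example:

import psycopg

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # The dict value is bound and escaped by the driver.
        cur.execute(
            "SELECT * FROM authors WHERE name = %(name)s",
            {"name": "O'Reilly"},
        )
        print(cur.fetchall())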