I am trying to use pyodbc's cursor.execute() the right way to prevent injection attacks, as suggested here:
what does ? mean in python pyodbc module
My code is as follows:
query = """\
SELECT
?,count(*)
FROM
?
WHERE
?=?
""", ('date', 'myTable', 'date', '2017-05-08')
cursor.execute(query)
And I get an error:
TypeError: The first argument to execute must be a string or unicode query.
For the right answer I'd want to:
Keep the question mark format to avoid SQL injection attacks
Keep the triple quotes format so I can write long SQL queries and not lose code readability.
Is there a way to achieve this? I know I could use string formatting such as """ %s """ % ('table'), but that defeats the purpose of this question.
You have 2 issues:
query is a tuple. The way to execute a parameterized query is as follows:
query = """SELECT ?,count(*)
FROM ?
WHERE ?=? """
args = ('date', 'myTable', 'date', '2017-05-08')
cursor.execute(query, args)
You could unpack query with *. This would expand query into a string and a tuple, which is what execute expects:
cursor.execute(*query) # 'query' here is defined as it is in your example
But that still won't work: query parameters can only stand in for values, not for identifiers. You cannot use them for the column in the SELECT list or the table name in the FROM clause, and you cannot use one for the column name in the WHERE clause either.
You (usually) don't have to worry about SQL injection if the value isn't entered by the user (or if the user can't change it in any way).
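Since placeholders only work for values, a common workaround is to check any table or column name against a fixed allowlist before formatting it into the SQL text, and to keep a `?` placeholder for the value. The sketch below assumes that approach; it uses the standard-library `sqlite3` module as a stand-in for pyodbc (both use the `?` style), and the `count_for_date` helper and allowlists are hypothetical.

```python
import sqlite3

# Hypothetical allowlists: only these identifiers may be formatted into the SQL text.
ALLOWED_TABLES = {"myTable"}
ALLOWED_COLUMNS = {"date"}


def count_for_date(cursor, table, column, value):
    """Count rows where `column` equals `value`; identifiers are checked, the value is bound."""
    if table not in ALLOWED_TABLES or column not in ALLOWED_COLUMNS:
        raise ValueError("unknown table or column")
    # Identifiers are safe to format in only because of the allowlist check above;
    # the value still goes through a ? placeholder so the driver handles quoting.
    query = f"""
        SELECT {column}, count(*)
        FROM {table}
        WHERE {column} = ?
        GROUP BY {column}
    """
    cursor.execute(query, (value,))
    return cursor.fetchone()


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE myTable (date TEXT)")
cur.executemany("INSERT INTO myTable (date) VALUES (?)",
                [("2017-05-08",), ("2017-05-08",), ("2017-05-09",)])
print(count_for_date(cur, "myTable", "date", "2017-05-08"))  # ('2017-05-08', 2)
```

The triple-quoted layout is kept for readability; only the value is parameterized, because that is all the driver can bind.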
Related
I have a function that executes many SQL queries with different dates.
What I want is to pass all dates and other query variables as function parameters and then just execute the function. I have figured out how to do this for datetime variables as below. But I also have a query that looks at specific campaign_names in a database and pulls those as strings. I want to be able to pass those strings as function parameters but I haven't figured out the correct syntax for this in the SQL query.
def Camp_eval(start_date,end_1M,camp1,camp2,camp3):
    query1 = f"""SELECT CONTACT_NUMBER, OUTCOME_DATE
    FROM DATABASE1
    where OUTCOME_DATE >= (to_date('{start_date}', 'dd/mm/yyyy'))
    and OUTCOME_DATE < (to_date('{end_1M}', 'dd/mm/yyyy'))"""
    query2 = """SELECT CONTACT_NUMBER
    FROM DATABASE2
    WHERE (CAMP_NAME = {camp1} or
    CAMP_NAME = {camp2} or
    CAMP_NAME = {camp3})"""
Camp_eval('01/04/2022','01/05/2022','Camp_2022_04','Camp_2022_05','Camp_2022_06')
The parameters start_date and end_1M work fine with the {} brackets but the camp variables, which are strings don't return any results even though there are results in the database with those conditions if I were to write them directly in the query.
Any help would be appreciated!!
Please, do not use f-strings for creating SQL queries!
Most likely, any library you use for accessing a database already has a way of creating parameterized queries; see, for example, the SQLite docs (check the code examples).
Another example: cur.execute("SELECT * FROM tasks WHERE priority = ?", (priority,)).
Not only is this way safer (it fixes the SQL injection problem mentioned by @d-malan in the comments), but it also removes the need to care about how data is represented in SQL: the library will automatically convert dates, strings, etc. into whatever form the database needs. Therefore, your problem can be fixed by using the proper tools.
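A minimal sketch of the function rewritten with placeholders, assuming a SQLite database so it can run anywhere; the asker's database appears to be Oracle (it uses `to_date`), whose Python drivers use named `:name` placeholders instead of `?`. The helper `to_iso`, the in-memory tables, and the sample rows are hypothetical, and `*camps` generalizes the three `camp` arguments.

```python
import sqlite3
from datetime import datetime


def to_iso(d):
    # Convert 'dd/mm/yyyy' strings to ISO 'yyyy-mm-dd' so they compare correctly as text.
    return datetime.strptime(d, "%d/%m/%Y").date().isoformat()


def camp_eval(conn, start_date, end_1m, *camps):
    if not camps:
        raise ValueError("at least one campaign name is required")
    cur = conn.cursor()
    # Dates are bound as parameters: no quotes and no string formatting in the SQL.
    cur.execute(
        """SELECT CONTACT_NUMBER, OUTCOME_DATE
           FROM DATABASE1
           WHERE OUTCOME_DATE >= ? AND OUTCOME_DATE < ?""",
        (to_iso(start_date), to_iso(end_1m)),
    )
    outcomes = cur.fetchall()
    # One ? per campaign name; the driver quotes each string value itself.
    placeholders = ", ".join("?" * len(camps))
    cur.execute(
        f"SELECT CONTACT_NUMBER FROM DATABASE2 WHERE CAMP_NAME IN ({placeholders})",
        camps,
    )
    contacts = cur.fetchall()
    return outcomes, contacts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATABASE1 (CONTACT_NUMBER INTEGER, OUTCOME_DATE TEXT)")
conn.execute("CREATE TABLE DATABASE2 (CONTACT_NUMBER INTEGER, CAMP_NAME TEXT)")
conn.executemany("INSERT INTO DATABASE1 VALUES (?, ?)", [(1, "2022-04-15"), (2, "2022-05-02")])
conn.executemany("INSERT INTO DATABASE2 VALUES (?, ?)", [(1, "Camp_2022_04"), (2, "Camp_2022_07")])
outcomes, contacts = camp_eval(conn, "01/04/2022", "01/05/2022",
                               "Camp_2022_04", "Camp_2022_05", "Camp_2022_06")
print(outcomes)  # [(1, '2022-04-15')]
print(contacts)  # [(1,)]
```

Only the placeholder string is built dynamically, and it contains nothing but `?` marks, so the campaign names never touch the SQL text.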
I am new to PyMySQL and just tried to execute a query:
c.execute('''INSERT INTO mysql_test1 (
data,
duration,
audio,
comments
) VALUES (
?,
?,
?,
?
);
''', [
comments_var,
duration_var,
audio_var,
comments_var
]
);
However, it threw the following error:
TypeError: not all arguments converted during string formatting
I noticed that something must be wrong with my variables and read up on how to properly deal with them in PyMySQL, expecting methods for parameter substitution, but to my surprise I could not find anything. Instead, every thread I found used string operations (e.g. here, here, here and here, the last with a comment claiming that string operations would be standard with PyMySQL).
This is interesting to me because I have previously only dealt with SQLite, where the DB-API documentation explicitly warns against using string operations with variables:
SQL operations usually need to use values from Python variables. However, beware of using Python’s string operations to assemble queries, as they are vulnerable to SQL injection attacks.
The documentation exemplifies this with the following code snippet:
# Never do this -- insecure!
symbol = 'RHAT'
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Instead, use the DB-API’s parameter substitution.
When reading the PyMySQL docs, I could not find any mention of such dangers. It only confirmed my previous findings:
If args is a list or tuple, %s can be used as a placeholder in the query. If args is a dict, %(name)s can be used as a placeholder in the query.
Why is using string operations in sqlite3 considered vulnerable to SQL injection attacks and at the same time not questioned in pymysql?
It's a pity that the designers of pymysql chose to use %s as the parameter placeholder. It confuses many developers because that's the same as the %s used in string-formatting functions. But it's not doing the same thing in pymysql.
It's not just doing a simple string substitution. Pymysql will apply escaping to the values before interpolating them into the SQL query. This prevents special characters from changing the syntax of the SQL query.
In fact, you can get into trouble with pymysql too. The following is unsafe:
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Because it interpolates the variable symbol into the string before passing it as an argument to execute(). The only argument is then a finished SQL string with the variable(s) formatted into it.
Whereas this is safe:
cur.execute("SELECT * FROM stocks WHERE symbol = %s", (symbol,))
Because it passes a tuple containing the symbol variable as the second argument. The code in execute() applies escaping to each element of that sequence and interpolates the resulting values into the SQL query string. Note that the %s is not delimited by single quotes; execute() takes care of that.
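The difference between the two calls can be seen with a hostile value. This sketch uses the standard-library `sqlite3` module, so its placeholder is `?` rather than PyMySQL's `%s`, and SQLite binds the value natively instead of escaping it client-side as PyMySQL does; the point being illustrated, that `%` formatting lets the value rewrite the query, is the same. The table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
cur.executemany("INSERT INTO stocks VALUES (?, ?)", [("RHAT", 35.14), ("IBM", 45.0)])

symbol = "x' OR '1'='1"  # hostile input

# Unsafe: Python's % builds the final SQL text, so the quote in `symbol` closes the literal
# and the query becomes ... WHERE symbol = 'x' OR '1'='1', which matches every row.
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
unsafe_rows = cur.fetchall()

# Safe: the value travels separately from the SQL text and is treated purely as data.
cur.execute("SELECT * FROM stocks WHERE symbol = ?", (symbol,))
safe_rows = cur.fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```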
I am trying to run a SQL script in Python where I am passing a variable in the where clause as below:
cursor.execute(f"""select * from table where type = variable_value""")
In the above query, variable_value has the value that I am trying to use in the where clause. I am however getting an error psycopg2.errors.UndefinedColumn: column "variable_value" does not exist in table
As per psycopg2 documentation the execute function takes variables as an extra parameter.
cursor.execute("""select * from table where type = %(value)s """, {"value": variable_value})
More examples are in the psycopg2 user manual.
Also please read carefully the section about SQL injection - the gist is, you should not quote parameters in your query, the execute function will take care of that to prevent the injection of harmful SQL.
Also to explain the error you are getting - the query you're sending is comparing two identifiers (type and variable_value). The table does not contain variable_value column, hence the error.
I believe you intended to use string interpolation to construct the query, but you forgot the {}. It would work like this:
cursor.execute(f"""select * from table where type = '{variable_value}'""")
⚠️ But because of the SQL injection risk mentioned above, this is not recommended!
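The error and its fix can be reproduced without a PostgreSQL server. This sketch uses the standard-library `sqlite3` module in place of psycopg2; SQLite's named placeholder `:value` plays the role of psycopg2's `%(value)s`, and the `items` table is hypothetical (`table` itself is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (type TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("fruit",), ("veg",)])

variable_value = "fruit"

# Without quotes or a placeholder, variable_value is parsed as a column name.
try:
    conn.execute("SELECT * FROM items WHERE type = variable_value")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # no such column: variable_value

# A named placeholder with a dict, analogous to psycopg2's %(value)s style.
rows = conn.execute("SELECT * FROM items WHERE type = :value",
                    {"value": variable_value}).fetchall()
print(rows)  # [('fruit',)]
```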
I am aware that queries in Python can be parameterized using either ? or %s in execute() (see here or here).
However, I have a long query that uses a constant variable defined at the beginning of the query:
Set @my_const = 'xyz';
select @my_const;
-- Query that uses @my_const 40 times
select ... coalesce(field1, @my_const), case(.. then @my_const)...
I would like to modify the MySQL query as little as possible, so that instead of changing it to
pd.read_sql("select ... coalesce(field1, %s), case(.. then %s)...", db, params=[my_const, my_const, my_const, ..])
I could write something along the lines of the original query. However, when I try the following, I get TypeError: 'NoneType' object is not iterable
query_str = "Set @null_val = \'\'; "\
" select @null_val"
erpur_df = pd.read_sql(query_str, con = db)
Any idea how to use the variable defined in the original MySQL query?
The reason
query_str = "Set @null_val = \'\'; "\
" select @null_val"
erpur_df = pd.read_sql(query_str, con = db)
throws that exception is because all you are doing is setting @null_val to '' and then selecting that '' - what exactly would you have expected that to give you? EDIT: read_sql only seems to execute one query at a time, and since the first query returns no rows, it raises that exception.
If you split them into two calls to read_sql, the second call will in fact return the value of @null_val. Because of this behaviour, read_sql is clearly not a good way to do this. I strongly suggest you use one of my suggestions below.
Why do you want to set the variable in the SQL using '@' anyway?
You could try using the .format style of string formatting.
Like so:
query_str = "select ... coalesce(field1, '{c}'), case(.. then '{c}')...".format(c=my_const)
pd.read_sql(query_str, con=db)
Just remember that if you do it this way and your my_const is a user input then you will need to sanitize it manually to prevent SQL injection.
Another possibility is using a dict of params like so:
query_str = "select ... coalesce(field1, %(my_const)s), case(.. then %(my_const)s)..."
pd.read_sql(query_str, con=db, params={'my_const': const_value})
However this is dependent on which database driver you use.
From the pandas.read_sql docs:
Check your database driver documentation for which of the five syntax styles, described in PEP 249’s paramstyle, is supported. Eg. for psycopg2, uses %(name)s so use params={‘name’ : ‘value’}
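The useful property of named placeholders here is that one entry in the params dict feeds every occurrence of the name, so a constant used 40 times is passed only once. This sketch assumes SQLite via the standard-library `sqlite3` module, whose named style is `:name` rather than `%(name)s`; the table `t` and its rows are hypothetical. Since `pandas.read_sql` hands `params` to the driver, the same dict should work there with the driver's own placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(None, "open"), ("a", None)])

query = """
    SELECT coalesce(field1, :my_const),
           CASE WHEN status IS NULL THEN :my_const ELSE status END
    FROM t
    ORDER BY rowid
"""
# A single dict entry is bound to every :my_const in the query.
rows = conn.execute(query, {"my_const": "xyz"}).fetchall()
print(rows)  # [('xyz', 'open'), ('a', 'xyz')]
```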
What's the best way to make psycopg2 pass parameterized queries to PostgreSQL? I don't want to write my own escaping mechanisms or adapters, and the psycopg2 source code and examples are difficult to read in a web browser.
If I need to switch to something like PyGreSQL or another python pg adapter, that's fine with me. I just want simple parameterization.
psycopg2 follows the rules for DB-API 2.0 (set down in PEP 249). That means you can call the execute method on your cursor object and use the pyformat binding style, and it will do the escaping for you. For example, the following should be safe (and work):
cursor.execute("SELECT * FROM student WHERE last_name = %(lname)s",
{"lname": "Robert'); DROP TABLE students;--"})
From the psycopg documentation (http://initd.org/psycopg/docs/usage.html):
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
data = ("O'Reilly", )
cur.execute(SQL, data) # Note: no % operator
Here are a few examples you might find helpful
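cursor.execute('SELECT * from some_table where id = %(some_id)s', {'some_id': 1234})  # psycopg2 placeholders are always %s, even for numbers
Or you can dynamically build your query based on a dict mapping field names to values:
cols = ', '.join(my_dict)  # field names: must come from your code, not user input
placeholders = ', '.join(['%s'] * len(my_dict))  # one %s per value
query = 'INSERT INTO some_table ({}) VALUES ({})'.format(cols, placeholders)
cursor.execute(query, list(my_dict.values()))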
Note: the fields must be defined in your code, not user input, otherwise you will be susceptible to SQL injection.
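One way to enforce that note in code is to reject any key that is not in a fixed set of known columns before building the statement. The sketch below assumes that allowlist approach; it uses the standard-library `sqlite3` module, so the placeholders are `?` instead of psycopg2's `%s`, and `insert_row`, `ALLOWED_COLUMNS`, and the table layout are hypothetical. With psycopg2 specifically, the `psycopg2.sql` module (`sql.Identifier`) can quote column names for you.

```python
import sqlite3

# Hypothetical: the only columns this code is willing to write to.
ALLOWED_COLUMNS = {"name", "email", "age"}


def insert_row(conn, my_dict):
    if not my_dict:
        raise ValueError("nothing to insert")
    unknown = set(my_dict) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    cols = list(my_dict)  # fixes one ordering for both names and values
    query = "INSERT INTO some_table ({}) VALUES ({})".format(
        ", ".join(cols), ", ".join("?" * len(cols)))
    # Column names are trusted (checked above); values are always bound as parameters.
    conn.execute(query, [my_dict[c] for c in cols])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (name TEXT, email TEXT, age INTEGER)")
insert_row(conn, {"name": "O'Reilly", "age": 42})
print(conn.execute("SELECT name, email, age FROM some_table").fetchall())
# [("O'Reilly", None, 42)]
```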
I love the official docs about this:
https://www.psycopg.org/psycopg3/docs/basic/params.html