Why is PyMySQL not vulnerable to SQL injection attacks? - python

I am new to PyMySQL and just tried to execute a query:
c.execute('''INSERT INTO mysql_test1 (
data,
duration,
audio,
comments
) VALUES (
?,
?,
?,
?
);
''', [
comments_var,
duration_var,
audio_var,
comments_var
]
);
However, it threw the following error:
TypeError: not all arguments converted during string formatting
I concluded that something was wrong with how I pass my variables and read up on how to handle them properly in PyMySQL, expecting to find a method for parameter substitution, but to my surprise I could not find anything. Instead, every thread I found used string operations (e.g. here, here, here and here, the last with a comment claiming that string operations are standard with PyMySQL).
This is interesting to me because I have previously only dealt with SQLite, where the DB-API documentation explicitly warns against using string operations with variables:
SQL operations usually need to use values from Python variables. However, beware of using Python’s string operations to assemble queries, as they are vulnerable to SQL injection attacks.
The documentation exemplifies this with the following code snippet:
# Never do this -- insecure!
symbol = 'RHAT'
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Instead, use the DB-API’s parameter substitution.
When reading the PyMySQL docs, I could not find any mention of such dangers. It only confirmed my previous findings:
If args is a list or tuple, %s can be used as a placeholder in the query. If args is a dict, %(name)s can be used as a placeholder in the query.
Why is using string operations in sqlite3 considered vulnerable to SQL injection attacks and at the same time not questioned in pymysql?

It's a pity that the designers of pymysql chose to use %s as the parameter placeholder. It confuses many developers because that's the same as the %s used in string-formatting functions. But it's not doing the same thing in pymysql.
It's not just doing a simple string substitution. Pymysql will apply escaping to the values before interpolating them into the SQL query. This prevents special characters from changing the syntax of the SQL query.
In fact, you can get into trouble with pymysql too. The following is unsafe:
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Because it interpolates the variable symbol into the string before passing it as an argument to execute(). The only argument is then a finished SQL string with the variable(s) formatted into it.
Whereas this is safe:
cur.execute("SELECT * FROM stocks WHERE symbol = %s", (symbol,))
Because it passes a tuple containing the symbol variable as a second argument. The code in the execute() function applies escaping to each element of that tuple and interpolates the resulting values into the SQL query string. Note that the %s is not delimited by single quotes. The code of execute() takes care of that.
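To make the difference concrete, here is a simplified sketch of what a client-side driver does with the second argument. The quote_literal and render helpers are hypothetical stand-ins written for this illustration, not PyMySQL's actual code, and the escaping rules are deliberately minimal.

```python
def quote_literal(value):
    """Hypothetical helper: render a Python value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        # Numbers are rendered without quotes.
        return str(value)
    # Escape backslashes first, then single quotes, so the backslashes
    # added for the quotes are not escaped a second time.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def render(query, args):
    """Hypothetical stand-in for the interpolation step inside execute()."""
    return query % tuple(quote_literal(arg) for arg in args)


symbol = "x' OR '1'='1"

# Plain string formatting: the quote in the value ends the literal early.
unsafe = "SELECT * FROM stocks WHERE symbol = '%s'" % symbol

# Escaping before interpolation: the value stays inside one literal.
safe = render("SELECT * FROM stocks WHERE symbol = %s", (symbol,))

print(unsafe)
print(safe)
```

The first query ends up with an extra OR condition in its syntax, while the second keeps the whole input inside a single quoted string.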

Related

Psycopg2 - Passing variable in the where clause

I am trying to run a SQL script in Python where I am passing a variable in the where clause as below:
cursor.execute(f"""select * from table where type = variable_value""")
In the above query, variable_value has the value that I am trying to use in the where clause. I am however getting an error psycopg2.errors.UndefinedColumn: column "variable_value" does not exist in table
As per psycopg2 documentation the execute function takes variables as an extra parameter.
cursor.execute("""select * from table where type = %(value)s """, {"value": variable_value})
More examples are in the psycopg2 user manual.
Also please read carefully the section about SQL injection - the gist is, you should not quote parameters in your query, the execute function will take care of that to prevent the injection of harmful SQL.
Also to explain the error you are getting - the query you're sending is comparing two identifiers (type and variable_value). The table does not contain variable_value column, hence the error.
I believe you intended to use string interpolation to construct the query but forgot the {}. It would work like this:
cursor.execute(f"""select * from table where type = '{variable_value}'""")
⚠️ However, because of the SQL injection risk mentioned above, this is not recommended!
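The same identifier-versus-value behaviour can be reproduced without a PostgreSQL server. The sketch below uses the standard library's sqlite3 module as a stand-in, so the placeholder is written :value rather than psycopg2's %(value)s, and the items table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, type TEXT)")
conn.execute("INSERT INTO items VALUES ('hammer', 'tool')")

variable_value = "tool"

# A bare name inside the SQL text is parsed as a column identifier,
# so the database looks for a column called variable_value.
try:
    conn.execute("SELECT * FROM items WHERE type = variable_value")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Binding the Python variable as a parameter sends it as data instead.
rows = conn.execute(
    "SELECT * FROM items WHERE type = :value", {"value": variable_value}
).fetchall()

print(error)
print(rows)
```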

How to insert a serialized object in a postgresql DB with a pickle?

As the title suggests, I want to insert an xgboost object into my db. I'm using psycopg2 and postgresql.
I pickled the xgboost model with dumps in order to insert the serialized version.
query = "INSERT INTO reporting_ml.model (model) VALUES (%(model)s)"
cursor_dev.execute(query % {"model": pickle.dumps(model)})
That's what I get:
syntax error at or near "\"
LINE 2: ...
\x04\x00\x00\x00\x13\x00\x00\x00\x14\x00\x00\x00\'\x00\x00\x...
Some people asked to explain #Tammem Sa's solution.
Looking at Psycopg's official documentation:
Warning Never, never, NEVER use Python string concatenation (+) or
string parameters interpolation (%) to pass variables to a SQL query
string. Not even at gunpoint.
We see that this is a formatting issue.
They specifically state:
Because of the difference, sometime subtle, between the data types
representations, a naïve approach to query strings composition, such
as using Python strings concatenation, is a recipe for terrible
problems:
Followed by an example:
>>> SQL = "INSERT INTO authors (name) VALUES ('%s');" # NEVER DO THIS
>>> data = ("O'Reilly", )
>>> cur.execute(SQL % data) # THIS WILL FAIL MISERABLY
ProgrammingError: syntax error at or near "Reilly"
LINE 1: INSERT INTO authors (name) VALUES ('O'Reilly')
                                              ^
Per the documentation, the correct (and only) way to pass variables in a SQL command is to use the second argument of the execute() method:
>>> SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
>>> data = ("O'Reilly", )
>>> cur.execute(SQL, data) # Note: no % operator
My additional two cents:
For tensors or vectors, the preferable data type for the column is bytea.
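As a rough illustration of passing pickled bytes as a bound parameter, here is a sketch that uses sqlite3 from the standard library in place of psycopg2, with a BLOB column standing in for bytea and a plain dict standing in for the xgboost model. With psycopg2 the placeholder would be %(model)s and the dict would go in as the second argument of execute().

```python
import pickle
import sqlite3

# Stand-in for the trained model; any picklable object behaves the same way.
model = {"weights": [0.1, 0.2], "name": "demo"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, model BLOB)")

# The pickled bytes are passed as a parameter, never formatted into the SQL.
payload = pickle.dumps(model)
conn.execute("INSERT INTO model (model) VALUES (?)", (payload,))

# Reading the column back returns the same bytes, which unpickle cleanly.
stored = conn.execute("SELECT model FROM model").fetchone()[0]
restored = pickle.loads(stored)
print(restored == model)
```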

pyodbc execute SQL code

I am trying to use pyodbc cursor execute the right way to prevent injection attacks, as suggested here:
what does ? mean in python pyodbc module
My code is as follows:
query = """\
SELECT
?,count(*)
FROM
?
WHERE
?=?
""", ('date', 'myTable', 'date', '2017-05-08')
cursor.execute(query)
And I get an error:
TypeError: The first argument to execute must be a string or unicode query.
For the right answer I'd want to:
Keep the question mark format to avoid SQL injection attacks
Keep the triple quotes format so I can write long SQL queries and not lose code readability.
Is there a way to achieve this? I know I could use """ %s """ %('table') format type but that defeats the purpose of this question.
You have 2 issues:
query is a tuple. The way to execute a parameterized query is as follows:
query = """SELECT ?,count(*)
FROM ?
WHERE ?=? """
args = ('date', 'myTable', 'date', '2017-05-08')
cursor.execute(query, args)
You could pass query with *. This would expand query to a string and a tuple which is what execute expects:
cursor.execute(*query) # 'query' here is defined as it is in your example
But that won't work. Parameters in a parameterized query can only stand in for values, not for identifiers: you cannot use them for the table name in the FROM clause, nor for column names in the SELECT list or the WHERE clause.
You (usually) don't have to worry about SQL injection if the value isn't entered by the user (or if the user can't change it in any way).
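One common way to combine both points is to check the identifiers against a fixed list defined in code and keep ? for the value. The sketch below uses sqlite3, which shares pyodbc's ? placeholder style, so it runs without an ODBC driver. The ALLOWED_TABLES and ALLOWED_COLUMNS sets and the count_by_column helper are assumptions made for this example, and the GROUP BY clause is added so the query is valid on stricter databases.

```python
import sqlite3

# Identifiers that the code itself considers valid (hypothetical lists).
ALLOWED_TABLES = {"myTable"}
ALLOWED_COLUMNS = {"date"}


def count_by_column(conn, table, column, value):
    """Count rows where `column` equals `value`, for vetted identifiers only."""
    if table not in ALLOWED_TABLES or column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected identifier")
    # Identifiers are formatted in only after the check above;
    # the value still travels as a bound parameter.
    query = f"""
        SELECT {column}, count(*)
        FROM {table}
        WHERE {column} = ?
        GROUP BY {column}
    """
    return conn.execute(query, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (date TEXT)")
conn.executemany(
    "INSERT INTO myTable (date) VALUES (?)",
    [("2017-05-08",), ("2017-05-08",), ("2017-05-09",)],
)

result = count_by_column(conn, "myTable", "date", "2017-05-08")
print(result)
```

This keeps the triple-quoted layout from the question while making sure nothing user-controlled reaches the identifier positions.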

Python + Sqlite 3. How to construct queries?

I'm trying to create a python script that constructs valid sqlite queries. I want to avoid SQL Injection, so I cannot use '%s'. I've found how to execute queries, cursor.execute('sql ?', (param,)), but I want to know how to get the parsed sql param. It's not a problem if I have to execute the query first in order to obtain the last query executed.
If you're trying to transmit changes to the database to another computer, why do they have to be expressed as SQL strings? Why not pickle the query string and the parameters as a tuple, and have the other machine also use SQLite parameterization to query its database?
If you're not after just parameter substitution, but full construction of the SQL, you have to do that using string operations on your end. The ? replacement always just stands for a value. Internally, the SQL string is compiled to SQLite's own bytecode (you can find out what it generates with EXPLAIN thesql) and ? replacements are done by just storing the value at the correct place in the value stack; varying the query structurally would require different bytecode, so just replacing a value wouldn't be enough.
Yes, this does mean you have to be ultra-careful. If you don't want to allow updates, try opening the DB connection in read-only mode.
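For the read-only suggestion, here is a minimal sketch of how it could look with the sqlite3 module: the database is opened through a file: URI with mode=ro, which requires uri=True. The temporary file and the stocks table are set up here only so the example is self-contained.

```python
import os
import pathlib
import sqlite3
import tempfile

# Create a small database file to open read-only afterwards.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE stocks (symbol TEXT)")
rw.execute("INSERT INTO stocks VALUES ('RHAT')")
rw.commit()
rw.close()

# mode=ro in the URI asks SQLite to open the file read-only.
uri = pathlib.Path(path).as_uri() + "?mode=ro"
ro = sqlite3.connect(uri, uri=True)

# Reads work as usual.
rows = ro.execute("SELECT symbol FROM stocks").fetchall()

# Writes are rejected by SQLite itself.
try:
    ro.execute("DELETE FROM stocks")
    write_error = None
except sqlite3.OperationalError as exc:
    write_error = str(exc)
ro.close()

print(rows)
print(write_error)
```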
Use the DB-API’s parameter substitution. Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method.
# Never do this -- insecure!
symbol = 'hello'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
# Do this instead
t = (symbol,)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print(c.fetchone())
More reference is in the manual.
I want to know how to get the parsed 'sql param'.
It's all open source so you have full access to the code doing the parsing / sanitization. Why not just read this code and find out how it works, and whether there's some (possibly undocumented) implementation that you can reuse?

Parameterized queries with psycopg2 / Python DB-API and PostgreSQL

What's the best way to make psycopg2 pass parameterized queries to PostgreSQL? I don't want to write my own escaping mechanisms or adapters and the psycopg2 source code and examples are difficult to read in a web browser.
If I need to switch to something like PyGreSQL or another python pg adapter, that's fine with me. I just want simple parameterization.
psycopg2 follows the rules for DB-API 2.0 (set down in PEP-249). That means you can call the execute method of your cursor object and use the pyformat binding style, and it will do the escaping for you. For example, the following should be safe (and work):
cursor.execute("SELECT * FROM student WHERE last_name = %(lname)s",
{"lname": "Robert'); DROP TABLE students;--"})
From the psycopg documentation
(http://initd.org/psycopg/docs/usage.html)
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
data = ("O'Reilly", )
cur.execute(SQL, data) # Note: no % operator
Here are a few examples you might find helpful
cursor.execute('SELECT * from table where id = %(some_id)s', {'some_id': 1234})  # psycopg2 placeholders are always %s, even for numbers
Or you can dynamically build your query based on a dict of field name, value:
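from psycopg2.extensions import AsIs
columns = AsIs(', '.join(my_dict.keys()))  # column names are inserted as-is, so they must come from your code
query = 'INSERT INTO some_table (%s) VALUES %s'
cursor.execute(query, (columns, tuple(my_dict.values())))  # the tuple is adapted to (v1, v2, ...)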
Note: the fields must be defined in your code, not user input, otherwise you will be susceptible to SQL injection.
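The note about field names can be enforced in code. The following sketch builds the INSERT from a dict while rejecting any key that is not in a fixed list. It uses sqlite3 and ? placeholders so it runs without a PostgreSQL server. The ALLOWED_FIELDS tuple and the insert_row helper are hypothetical names chosen for this example.

```python
import sqlite3

# Field names the code is willing to put into the SQL text (hypothetical).
ALLOWED_FIELDS = ("name", "email")


def insert_row(conn, my_dict):
    """Insert one row built from a dict of field name -> value."""
    unknown = set(my_dict) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    fields = list(my_dict)
    columns = ", ".join(fields)
    # One placeholder per value; the values themselves are bound, not formatted.
    placeholders = ", ".join("?" for _ in fields)
    query = f"INSERT INTO some_table ({columns}) VALUES ({placeholders})"
    conn.execute(query, [my_dict[field] for field in fields])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (name TEXT, email TEXT)")
insert_row(conn, {"name": "O'Reilly", "email": "o@example.com"})
rows = conn.execute("SELECT name, email FROM some_table").fetchall()
print(rows)
```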
I love the official docs about this:
https://www.psycopg.org/psycopg3/docs/basic/params.html
