To distinguish variable name from DB field name in Python

Code
import psycopg2
import psycopg2.extras

def store_values_to_pg94(file_size, connection):
    cursor = connection.cursor()
    cursor.execute(
        INSERT INTO measurements
        (file_size)
        VALUES (file_size)
where the column in PostgreSQL 9.4 is named file_size, but the Python variable is also named file_size.
How can you avoid this kind of conflict?

The code from your example does not compile, i.e. it is not valid Python. cursor.execute() accepts a string and an optional set of arguments that are expanded into the string according to the database module's paramstyle.
In fact, when using the DB-API, your problem does not even surface, because SQL syntax and Python syntax are completely separate.
If you are using psycopg2 to access PostgreSQL, paramstyle is pyformat, hence:
cursor.execute(
    'INSERT INTO measurements (file_size) VALUES (%(file_size)s);',
    {'file_size': file_size}  # pass variable file_size as named parameter file_size
)
As you can see, you specify the SQL query and leave placeholders for the values that you want to pass in from python code.
If paramstyle == 'pyformat', it is also possible to use positional parameters:
cursor.execute(
    'INSERT INTO measurements (file_size) VALUES (%s);',
    (file_size,)  # pass variable file_size as positional parameter
)
Note that different database access modules implement different paramstyles and need different parameter placeholders (or a suitable wrapper).
All parameter substitution implementations are expected to properly escape their arguments according to the rules of the database backend, which is a nice and important feature as you don't have to manually escape your data.
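For instance, sqlite3 in the standard library uses the qmark paramstyle, while psycopg2 uses pyformat, yet the safe pattern is identical. A minimal, runnable sketch using only the standard library (the table and value are made up for illustration):
import sqlite3

print(sqlite3.paramstyle)  # 'qmark' -> placeholders are written as ?

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE measurements (file_size INTEGER)')
# The value travels separately and is escaped by the driver;
# it is never interpolated into the SQL string by hand.
cur.execute('INSERT INTO measurements (file_size) VALUES (?)', (1024,))
conn.commit()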

Related

How to avoid SQL Injection in Python for Upsert Query to SQL Server?

I have a SQL query I'm executing that I'm passing variables into. Currently I'm passing the parameter values in as f-strings, but this makes the query vulnerable to SQL injection. I know there is a method using a stored procedure and restricting permissions on the user executing the query. But is there a way to avoid the stored procedure route and instead modify this function to be secure against SQL injection?
I have the below query created to execute within a Python app.
def sql_gen(tv, kv, join_kv, col_inst, val_inst, val_upd):
    sqlstmt = f"""
    IF NOT EXISTS (
        SELECT *
        FROM {tv}
        WHERE {kv} = {join_kv}
    )
        INSERT {tv} (
            {col_inst}
        )
        VALUES (
            {val_inst}
        )
    ELSE
        UPDATE {tv}
        SET {val_upd}
        WHERE {kv} = {join_kv};
    """
    engine = create_engine(f"mssql+pymssql://{username}:{password}@{server}/{database}")
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.execute(sqlstmt)
    connection.commit()
    cursor.close()
Fortunately, most database connectors support query parameters: you pass the variable separately instead of formatting it into the query string yourself, precisely because of the risks you mentioned.
You can read more on this here: https://realpython.com/prevent-python-sql-injection/#understanding-python-sql-injection
Example:
# Vulnerable
cursor.execute("SELECT admin FROM users WHERE username = '" + username + "'")

# Safe
cursor.execute("SELECT admin FROM users WHERE username = %s", (username,))
As Amanzer correctly mentions in their reply, Python has mechanisms to pass value parameters safely.
However, there are other elements in your query (table names and column names) that cannot be passed as parameters (bind variables), because database drivers generally do not support binding identifiers.
If these come from an untrusted source (or may in the future), you should be sure to validate them. This is good coding practice even when you are confident they are safe today.
There are some options to do this safely:
- You should limit your tables and columns based on positive validation: make sure the only values allowed are the ones that are authorized.
- If that's not possible (because these are user-created?):
  - Make sure table or column names are limited to a "safe" set of characters (alphanumeric, dashes, underscores...).
  - Enquote the table/column names by adding double quotes around the objects. If you do this, you need to be careful to validate that there are no quotes in the name, and error out or escape the quotes. You also need to be aware that adding quotes makes the name case sensitive. A sketch of this approach follows below.
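A minimal sketch of the validate-and-quote approach, assuming it feeds the sql_gen() call from the question; the helper names quote_ident and build_update, the allowlist, and the id column are all made up for illustration (%s is pymssql's placeholder for values):
import re

ALLOWED_TABLES = {'measurements', 'files'}  # hypothetical allowlist

def quote_ident(name):
    # Positive validation: restrict identifiers to a "safe" character set.
    if not re.fullmatch(r'[A-Za-z0-9_-]+', name):
        raise ValueError(f'illegal identifier: {name!r}')
    return f'"{name}"'  # note: quoting makes the name case sensitive

def build_update(tv, kv):
    if tv not in ALLOWED_TABLES:
        raise ValueError(f'table not allowed: {tv!r}')
    # Identifiers are validated and quoted here; the *values* still travel
    # separately as bind parameters.
    return f'UPDATE {quote_ident(tv)} SET {quote_ident(kv)} = %s WHERE id = %s'

# usage, with a cursor from the question's connection:
# cursor.execute(build_update('measurements', 'file_size'), (new_value, row_id))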

Why is PyMySQL not vulnerable to SQL injection attacks?

I am new to PyMySQL and just tried to execute a query:
c.execute('''INSERT INTO mysql_test1 (
        data,
        duration,
        audio,
        comments
    ) VALUES (
        ?,
        ?,
        ?,
        ?
    );
    ''', [
        comments_var,
        duration_var,
        audio_var,
        comments_var
    ])
However, it threw the following error:
TypeError: not all arguments converted during string formatting
I noticed that something must be wrong with my variables and read up on how to properly deal with them in PyMySQL, expecting methods for parameter substitution, but to my surprise I could not find anything. Instead, every thread I found used string operations (e.g. here, here, here and here, with a comment claiming that string operations are standard with PyMySQL).
This is interesting to me because I have previously only dealt with SQLite, where the DB-API documentation explicitly warns against using string operations with variables:
SQL operations usually need to use values from Python variables. However, beware of using Python’s string operations to assemble queries, as they are vulnerable to SQL injection attacks.
The documentation exemplifies this with the following code snippet:
# Never do this -- insecure!
symbol = 'RHAT'
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Instead, use the DB-API’s parameter substitution.
When reading the PyMySQL docs, I could not find any mention of such dangers. It only confirmed my previous findings:
If args is a list or tuple, %s can be used as a placeholder in the query. If args is a dict, %(name)s can be used as a placeholder in the query.
Why is using string operations in sqlite3 considered vulnerable to SQL injection attacks and at the same time not questioned in pymysql?
It's a pity that the designers of pymysql chose to use %s as the parameter placeholder. It confuses many developers because that's the same as the %s used in string-formatting functions. But it's not doing the same thing in pymysql.
It's not just doing a simple string substitution. Pymysql will apply escaping to the values before interpolating them into the SQL query. This prevents special characters from changing the syntax of the SQL query.
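To see that escaping in action, here is a tiny sketch using one of PyMySQL's internal helpers; escape_string is not part of the public DB-API, so this is for illustration only:
from pymysql.converters import escape_string

value = "RHAT'; DROP TABLE stocks;--"
print(escape_string(value))
# RHAT\'; DROP TABLE stocks;--  -> the quote can no longer terminate the SQL literal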
In fact, you can get into trouble with pymysql too. The following is unsafe:
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Because it interpolates the variable symbol into the string before passing it as an argument to execute(). The only argument is then a finished SQL string with the variable(s) formatted into it.
Whereas this is safe:
cur.execute("SELECT * FROM stocks WHERE symbol = %s", (symbol,))
Because it passes the list consisting of the symbol variable as a second argument. The code in the execute() function applies escaping to each element in the list, and interpolates the resulting value into the SQL query string. Note the %s is not delimited by single-quotes. The code of execute() takes care of that.
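Applied to the query from the question, the fix is to swap the ? markers for %s. (The question also passes comments_var twice; presumably the first one was meant to be a data variable, so data_var below is an assumption:)
c.execute('''INSERT INTO mysql_test1 (data, duration, audio, comments)
             VALUES (%s, %s, %s, %s)''',
          (data_var, duration_var, audio_var, comments_var))  # data_var assumed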

sqlite3.OperationalError: near "index": syntax error

I am trying to connect to a database with Python using the sqlite3 module and I get an error - sqlite3.OperationalError: near "index": syntax error.
I searched for solutions to this but did not find one. I am new to sqlite3.
def insert_into_db(url, title, description, keywords):
    con = sqlite3.connect('index.db')
    c = con.cursor()
    create = r'''CREATE TABLE IF NOT EXISTS index (id INTEGER NOT NULL AUTO_INCREMENT,url VARCHAR,description TEXT,keywords TEXT);INSERT INTO index(url, title, description, keywords)VALUES('{}','{}','{}','{}');'''.format(url, title, description, keywords)
    c.execute(create)
    con.commit()
    con.close()
help me to get rid of this error :(
INDEX is a keyword in SQLite3. Thus, it'll be parsed as a keyword. There are several ways around this, though.
According to the documentation, you could use backticks or quote marks to specify it as a table name. For example,
CREATE TABLE IF NOT EXISTS `index` ...
or
CREATE TABLE IF NOT EXISTS "index" ...
may work.
You can pass arguments to your SQL statement through the execute() command. Thus,
create = r'''CREATE TABLE ... VALUES(?,?,?,?);''' # use ? for placeholders
c.execute(create, (url, title, description, keywords)) # pass args as tuple
This is more secure compared to formatting your arguments directly with Python.
Note also that SQLite's syntax for autoinc is AUTOINCREMENT without the underscore and they require the field to also be an INTEGER PRIMARY KEY.
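Putting both fixes together, a sketch of what the corrected function might look like; the table is renamed pages here to sidestep the reserved word, a title column is added to match the INSERT, and the statements are split because sqlite3's execute() runs only one statement per call:
import sqlite3

def insert_into_db(url, title, description, keywords):
    con = sqlite3.connect('index.db')
    c = con.cursor()
    # "pages" avoids the reserved word INDEX; AUTOINCREMENT requires
    # INTEGER PRIMARY KEY in SQLite.
    c.execute('''CREATE TABLE IF NOT EXISTS pages (
                     id INTEGER PRIMARY KEY AUTOINCREMENT,
                     url TEXT, title TEXT, description TEXT, keywords TEXT)''')
    # Placeholders keep the values out of the SQL string entirely.
    c.execute('INSERT INTO pages (url, title, description, keywords) VALUES (?, ?, ?, ?)',
              (url, title, description, keywords))
    con.commit()
    con.close()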
You cannot name a table index. INDEX is a reserved keyword.
The documentation states:
The SQL standard specifies a large number of keywords which may not be used as the names of tables, indices, columns, databases, user-defined functions, collations, virtual table modules, or any other named object.

psycopg2 difference between AsIs and sql module

To choose dynamically a table name in a query I used to use AsIs() from psycopg2.extensions ( http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.AsIs ), with the following syntax:
cur.execute("SELECT * FROM %s WHERE id = %s;", (AsIs('table_name'), id))
However, the documentation now recommends to use the new psycopg2.sql module available in version 2.7 ( http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql ) with the following syntax:
from psycopg2 import sql
cur.execute(
    sql.SQL("SELECT * FROM {} WHERE id = %s;")
       .format(sql.Identifier('table_name')),
    (id,))
What's the difference between those two options besides the fact that objects exposed by the sql module can be passed directly to execute()?
AsIs is... as it is. It won't perform any escaping of the table name if it contains characters that need quoting. The objects in the sql module, by contrast, know what an identifier is.
More subtly, AsIs is meant for parameter values only: if it currently works for other parts of the query, that is mostly an implementation accident, and the behaviour may change in the future. Query values should not be used to represent variable parts of the query, such as table or field names.
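A sketch of the difference, assuming a hostile table name; the execute() calls are commented out since they need a live connection, and the SQL shown in comments is the intent, not verbatim driver output:
from psycopg2 import sql
from psycopg2.extensions import AsIs

bad = 'users; DROP TABLE users;--'

# AsIs injects the text verbatim -- no escaping at all:
#   SELECT * FROM users; DROP TABLE users;-- WHERE id = 1;
# cur.execute("SELECT * FROM %s WHERE id = %s;", (AsIs(bad), 1))

# sql.Identifier quotes it as a single (strange) identifier instead:
#   SELECT * FROM "users; DROP TABLE users;--" WHERE id = 1;
query = sql.SQL("SELECT * FROM {} WHERE id = %s;").format(sql.Identifier(bad))
# cur.execute(query, (1,))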

Parameterized queries with psycopg2 / Python DB-API and PostgreSQL

What's the best way to make psycopg2 pass parameterized queries to PostgreSQL? I don't want to write my own escaping mechanisms or adapters, and the psycopg2 source code and examples are difficult to read in a web browser.
If I need to switch to something like PyGreSQL or another python pg adapter, that's fine with me. I just want simple parameterization.
psycopg2 follows the rules for DB-API 2.0 (set down in PEP 249). That means you can call the execute() method on your cursor object and use the pyformat binding style, and it will do the escaping for you. For example, the following should be safe (and work):
cursor.execute("SELECT * FROM student WHERE last_name = %(lname)s",
               {"lname": "Robert'); DROP TABLE students;--"})
From the psycopg documentation
(http://initd.org/psycopg/docs/usage.html)
Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
data = ("O'Reilly", )
cur.execute(SQL, data) # Note: no % operator
Here are a few examples you might find helpful (note that psycopg2 only supports the s format code, so it is %(some_id)s, never %(some_id)d):
cursor.execute('SELECT * from table where id = %(some_id)s', {'some_id': 1234})
Or you can dynamically build your query from a dict of field names and values. The column names are interpolated into the string (they cannot be bound as parameters), while the values are passed separately:
fields = list(my_dict.keys())
query = 'INSERT INTO some_table (%s) VALUES (%s)' % (
    ', '.join(fields),                  # column names go into the SQL string
    ', '.join(['%s'] * len(fields)))    # one placeholder per value
cursor.execute(query, list(my_dict.values()))
Note: the fields must be defined in your code, not user input, otherwise you will be susceptible to SQL injection.
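Since psycopg2 2.7, the same thing can be written with the sql module, which also quotes the identifiers; a sketch, with some_table and my_dict carried over from the example above:
from psycopg2 import sql

query = sql.SQL('INSERT INTO {} ({}) VALUES ({})').format(
    sql.Identifier('some_table'),
    sql.SQL(', ').join(map(sql.Identifier, my_dict.keys())),
    sql.SQL(', ').join(sql.Placeholder() * len(my_dict)))
cursor.execute(query, list(my_dict.values()))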
I love the official docs about this:
https://www.psycopg.org/psycopg3/docs/basic/params.html
