How do I fix this error on passing a string parameter into pd.read_sql_query? Using Python 2.7.
table = 'mytable1'
query = (
"SELECT * "
"FROM ? "
)
df = pd.read_sql_query(sql=query, con=conn, params=(table))
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM ? ': near "?": syntax error
I have tried replacing ? with % and %s but it returns the same error.
The following equality example works as expected:
query = (
"SELECT * "
"FROM mytable1 "
"WHERE name = ? "
)
df = pd.read_sql_query(sql=query, con=conn, params=('cat',))
Note that the comma in params appears to be required, otherwise this error is returned: Incorrect number of bindings supplied. The current statement uses 1, and there are 7 supplied.
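The trailing comma comes from Python's tuple syntax, not from pandas. `('cat')` is just the string `'cat'`, and the sqlite3 driver treats a string as a sequence of one-character bindings. sqlite3 is assumed here because the error messages quoted in this question are SQLite's. A minimal check using only the standard library:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

single = ('cat')     # parentheses only group: this is the str 'cat'
as_tuple = ('cat',)  # the trailing comma is what makes a 1-tuple

# With a real tuple, the single '?' placeholder receives exactly one value.
row = conn.execute("SELECT ?", as_tuple).fetchone()
print(type(single).__name__, type(as_tuple).__name__, row)

# With the bare string, every character counts as a separate binding,
# so a one-placeholder statement gets three values and sqlite3 rejects it.
try:
    conn.execute("SELECT ?", single)
except sqlite3.Error as exc:
    print("error:", exc)
```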
I have also tried params=(table,) in my problem but no luck. I know that alternatively I can do this with "FROM '{t}' ").format(t=table), but I would like to understand how to use Pandas' built-in parameters option.
The problem is that params can only stand in for values, not for SQL keywords (FROM, SELECT, etc.) or for table or column names.
You can't specify the table that way; you have to use string substitution.
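query = (
    "SELECT * "
    f"FROM {table} "  # the f prefix must be on the literal that contains {table}
)
# Python 2.7 has no f-strings; there use: query = "SELECT * FROM {table} ".format(table=table)
df = pd.read_sql_query(sql=query, con=conn)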
However, be very, very, very careful. Doing this rather than using params opens you up to a very big family of vulnerabilities: SQL injection.
Don't do it if you get the table names from external sources.
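If the table name might come from outside, one way to keep string substitution safe is to check the name against the tables that actually exist before formatting it into the query. Below is a minimal sketch, assuming `conn` is a sqlite3 connection (the "near "?": syntax error" message in the question is SQLite's). The helper name `build_select_all` is hypothetical, and a hard-coded set of allowed names would work just as well as the catalog lookup.

```python
import sqlite3

def build_select_all(conn, table):
    """Return a SELECT * query for `table`, but only if that table exists."""
    # Look the name up in SQLite's catalog. Here the table name is a *value*
    # in a WHERE clause, so it can be passed as an ordinary parameter.
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise ValueError("unknown table: %r" % (table,))
    # Only now is string formatting used, with a name read back from the
    # database; double quotes (with embedded quotes doubled) mark an identifier.
    return 'SELECT * FROM "{}"'.format(row[0].replace('"', '""'))

# Usage with an in-memory database standing in for the question's `conn`:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable1 (name TEXT)")
conn.execute("INSERT INTO mytable1 VALUES ('cat')")
query = build_select_all(conn, "mytable1")
print(query)
# With pandas installed: df = pd.read_sql_query(sql=query, con=conn)
```

The lookup itself stays parameterized, so a hostile name such as `"mytable1; DROP TABLE mytable1"` simply fails the check instead of reaching the SQL text.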
(Oh, and the likely reason for this limitation is that, besides being absolutely necessary for security reasons, parameterized queries allow the database engine to examine the query, look at indices, statistics and all that, and draw up an "execution plan". Successive calls can then reuse that execution plan and just substitute in the new values. That could not be done if the query changed in terms of which tables and columns were being accessed.)
Related
I am trying to run a SQL script in Python where I am passing a variable in the where clause as below:
cursor.execute(f"""select * from table where type = variable_value""")
In the above query, variable_value has the value that I am trying to use in the where clause. I am however getting an error psycopg2.errors.UndefinedColumn: column "variable_value" does not exist in table
According to the psycopg2 documentation, the execute function takes the variables as an extra parameter:
cursor.execute("""select * from table where type = %(value)s """, {"value": variable_value})
There are more examples in the psycopg2 user manual.
Also, please read the section about SQL injection carefully. The gist is that you should not quote parameters in your query; the execute function takes care of that to prevent the injection of harmful SQL.
Also to explain the error you are getting - the query you're sending is comparing two identifiers (type and variable_value). The table does not contain variable_value column, hence the error.
I believe you intended to use string interpolation to construct the query, but you forgot the {}. It would work like this:
cursor.execute(f"""select * from table where type = '{variable_value}'""")
⚠️ But because of the SQL injection risk mentioned above, this is not a recommended way!
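To see concretely why the interpolated version is risky, the sketch below runs both styles against a hostile value. It uses the standard-library sqlite3 module as a stand-in for psycopg2 so it runs without a server; sqlite3's placeholder is `?` rather than psycopg2's `%s` / `%(value)s`, but the principle is the same. The table name `items` is hypothetical (`table` itself is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (type TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",)])

# A hostile value: it closes the quote and appends an always-true condition.
variable_value = "x' OR '1'='1"

# String interpolation: the value becomes part of the SQL text itself.
unsafe = conn.execute(
    "SELECT type FROM items WHERE type = '{}'".format(variable_value)
).fetchall()

# Parameter binding: the driver passes the value separately from the SQL.
safe = conn.execute(
    "SELECT type FROM items WHERE type = ?", (variable_value,)
).fetchall()

print(unsafe)  # every row comes back
print(safe)    # empty: no row has that literal type
```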
I'm working on a bit of Python code to run a query against a Redshift (Postgres) SQL database, and I'm running into an issue where I can't strip the surrounding single quotes from a variable I'm passing to the query. I'm trying to drop a number of tables from a list. These are the basics of my code:
def func(table_list):
    drop_query = 'drop table if exists %s'  # loaded from file
    table_name = table_list[0]  # table_name = 'my_db.my_table'
    con = psycopg2.connect(dbname=DB, host=HOST, port=PORT, user=USER, password=PASS)
    cur = con.cursor()
    cur.execute(drop_query, (table_name, ))  # this line is giving me trouble
    # cleanup statements for the connection

table_list = ['my_db.my_table']
When func() gets called, I get the following error:
syntax error at or near "'my_db.my_table'"
LINE 1: drop table if exists 'my_db.my_table...
^
Is there a way I can remove the surrounding single quotes from my list item?
For the time being, I've done it (what I think is) the wrong way and used string concatenation, but I know this is basically begging for SQL injection.
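This is not how psycopg2 works. The %s placeholder in execute() is not plain string substitution: psycopg2 inserts the value as a quoted, escaped SQL literal so that it cannot inject SQL, which is why your table name arrives as 'my_db.my_table', in quotes. A table name is an identifier, not a value, so it can't be passed this way.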
You need to modify the query before it gets to the execute statement.
drop_query = 'drop table if exists {}'.format(table_name)
I warn you, however: do not allow these table names to be created by outside sources, or you risk SQL injection.
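If the list of tables may ever be influenced from outside, one defensive option is to format only names that appear in a hard-coded allow-list. The sketch below is an assumption-laden illustration, not psycopg2 API: `DROPPABLE`, `_IDENT` and `build_drop_query` are hypothetical names, and the identifier pattern covers only plain unquoted names. If your psycopg2 is 2.8 or newer, `sql.Identifier('my_db', 'my_table')` is the driver-side alternative, since it quotes schema and table as separate parts.

```python
import re

# Hypothetical allow-list: the only tables this code may ever drop.
DROPPABLE = {"my_db.my_table", "my_db.my_staging_table"}

# A plain unquoted identifier: a letter or underscore, then letters, digits, _ or $.
# \Z (not $) so a trailing newline cannot slip through.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*\Z")

def build_drop_query(qualified_name):
    """Return a DROP TABLE statement for an allow-listed schema.table name."""
    if qualified_name not in DROPPABLE:
        raise ValueError("refusing to drop %r" % (qualified_name,))
    # Extra check: every dot-separated part must be a plain identifier.
    if not all(_IDENT.match(part) for part in qualified_name.split(".")):
        raise ValueError("not a plain identifier: %r" % (qualified_name,))
    return "drop table if exists {}".format(qualified_name)

print(build_drop_query("my_db.my_table"))
# Inside func(): cur.execute(build_drop_query(table_name))
```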
However, newer versions of psycopg2 (2.7 and later) support something similar through the psycopg2.sql module:
http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql
from psycopg2 import sql
cur.execute(
    sql.SQL("insert into {} values (%s, %s)").format(sql.Identifier('my_table')),
    [10, 20],
)
I am aware that queries in Python can be parameterized using either ? or %s in execute query here or here
However, I have a long query that uses a constant variable defined at the beginning of the query:
Set #my_const = 'xyz';
select #my_const;
-- Query that use #my_const 40 times
select ... coalesce(field1, #my_const), case(.. then #my_const)...
I would like to make as few modifications as possible to the original MySQL query, so that instead of rewriting it as
pd.read_sql(select ... coalesce(field1, %s), case(.. then %s)... , [my_const, my_const, my_const, ..]
I could write something along the lines of the initial query. When I try the following, however, I get TypeError: 'NoneType' object is not iterable
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
Any idea how to use the original variable defined in the MySQL query?
The reason
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
throws that exception is that all you are doing is setting #null_val to '' and then selecting that '': what exactly would you have expected that to give you? EDIT: read_sql only seems to execute one statement at a time, and since the first statement returns no rows, it raises that exception.
If you split them into two calls to read_sql, the second call will in fact return the value of #null_val. Because of this behaviour, read_sql is clearly not a good way to do this; I strongly suggest you use one of the suggestions below.
Why do you want to set the variable in the SQL using '#' anyway?
You could try using the .format style of string formatting.
Like so:
query_str = "select ... coalesce(field1, '{c}'), case(.. then '{c}')...".format(c=my_const)
pd.read_sql(query_str, con=db)
Just remember that if you do it this way and my_const is user input, you will need to sanitize it manually to prevent SQL injection.
Another possibility is using a dict of params like so:
query_str = "select ... coalesce(field1, %(my_const)s), case(.. then %(my_const)s)..."
pd.read_sql(query_str, con=db, params={'my_const': my_const})
However, this depends on which database driver you use.
From the pandas.read_sql docs:
Check your database driver documentation for which of the five syntax
styles, described in PEP 249’s paramstyle, is supported. Eg. for
psycopg2, uses %(name)s so use params={‘name’ : ‘value’}
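The key point of the dict approach is that one named parameter can appear many times in the query while the dict supplies its value once. The sketch below shows that with the standard-library sqlite3 driver as a stand-in, since it needs no server: its advertised paramstyle is `qmark`, but it also accepts the `named` style (`:name`). With a pyformat driver the same query would use `%(my_const)s`, as in the answer above. The table `t` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, flag INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(None, 1), ("abc", 0)])

# The same :my_const name is used twice; the dict provides it only once.
query = (
    "SELECT coalesce(field1, :my_const), "
    "CASE WHEN flag = 1 THEN :my_const ELSE field1 END "
    "FROM t ORDER BY rowid"
)
rows = conn.execute(query, {"my_const": "xyz"}).fetchall()
print(sqlite3.paramstyle, rows)
# With pandas: pd.read_sql(query, con=conn, params={"my_const": "xyz"})
```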
I have two queries in SQL which are the following:
q1 = select date_hour from table
And, the second query is:
q2 = select date(date_hour) from table
The only difference between these queries is the string date_hour versus date(date_hour). So I tried parameterizing my query in the following manner:
q1 = select %s from table
cur.execute(q1,'date')
cur.execute(q1,'date(date_hour)')
However, this throws an error which is:
not all arguments converted during string formatting
Why am I getting this error? How can I fix it?
Change the comma in cur.execute to %
Change this:
q1 = "select %s from table"
cur.execute(q1,'date')
cur.execute(q1,'date(date_hour)')
to:
q1 = "select %s from table"
cur.execute(q1 % 'date')
cur.execute(q1 % 'date(date_hour)')
It's unclear which SQL library you're using, but assuming it follows the Python DB API:
SQL parameters are typically used for values, not column names (although dynamic column names are possible using stored procedures).
It seems you're confusing string formatting in Python with SQL parameterized queries.
While %s can be used to format a string (see formatting strings), this is not the way to set SQL parameters.
See this response on how to use SQL parameters in Python.
By the way, I can't see anything wrong with this simple code:
cursor = cnx.cursor()
query="select date_hour from table"
cursor.execute(query)
query="select date(date_hour) from table"
cursor.execute(query)
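To illustrate the point about parameters and column names: binding a column name as a parameter does not fail, it silently selects the string itself. The sketch below uses the standard-library sqlite3 module as a stand-in driver; the table `events`, the `SELECTABLE` map and `select_column` are hypothetical. Choosing between a fixed set of hard-coded expressions keeps both queries in one place without letting arbitrary text into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date_hour TEXT)")
conn.execute("INSERT INTO events VALUES ('2020-01-02 13:00:00')")

# Binding a column name as a parameter just selects that string literal.
literal = conn.execute("SELECT ? FROM events", ("date_hour",)).fetchall()

# Instead, pick one of a few hard-coded expressions by key.
SELECTABLE = {"raw": "date_hour", "day": "date(date_hour)"}

def select_column(kind):
    # An unknown key raises KeyError, so no caller-supplied text reaches the SQL.
    return conn.execute(
        "SELECT {} FROM events".format(SELECTABLE[kind])
    ).fetchall()

print(literal)
print(select_column("raw"), select_column("day"))
```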
Change your code to something like this:
q1 = "select %s from table"
cur.execute(q1,['date'])
cur.execute(q1,['date(date_hour)'])
Check this
What's the best way to make psycopg2 pass parameterized queries to PostgreSQL? I don't want to write my own escaping mechanisms or adapters, and the psycopg2 source code and examples are difficult to read in a web browser.
If I need to switch to something like PyGreSQL or another python pg adapter, that's fine with me. I just want simple parameterization.
psycopg2 follows the rules for DB-API 2.0 (set down in PEP 249). That means you can call the execute method on your cursor object and use the pyformat binding style, and it will do the escaping for you. For example, the following should be safe (and work):
cursor.execute("SELECT * FROM student WHERE last_name = %(lname)s",
{"lname": "Robert'); DROP TABLE students;--"})
From the psycopg documentation
(http://initd.org/psycopg/docs/usage.html)
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
data = ("O'Reilly", )
cur.execute(SQL, data) # Note: no % operator
Here are a few examples you might find helpful
cursor.execute('SELECT * from table where id = %(some_id)s', {'some_id': 1234})  # always %s / %(name)s, even for numbers
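Or you can dynamically build your query based on a dict of field name, value (field names go into the SQL text, values stay parameters):
cols = list(my_dict)
query = 'INSERT INTO some_table ({}) VALUES ({})'.format(', '.join(cols), ', '.join(['%s'] * len(cols)))
cursor.execute(query, [my_dict[c] for c in cols])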
Note: the fields must be defined in your code, not user input, otherwise you will be susceptible to SQL injection.
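One way to enforce that note in code is to check every dict key against the table's real columns before building the statement. The sketch below uses the standard-library sqlite3 module as a stand-in driver (placeholder `?` instead of `%s`); `insert_dict` is a hypothetical helper, and the table name is assumed to be trusted and hard-coded. With psycopg2 itself, `psycopg2.sql.Identifier` and `psycopg2.sql.Placeholder` are the documented way to compose such a query.

```python
import sqlite3

def insert_dict(conn, table, row):
    """Insert one dict as a row; every key must be an existing column of `table`.

    `table` itself is assumed to be a trusted, hard-coded name.
    """
    # PRAGMA table_info yields one tuple per column; index 1 is the column name.
    columns = {info[1] for info in conn.execute('PRAGMA table_info("%s")' % table)}
    unknown = set(row) - columns
    if unknown:
        raise ValueError("unknown columns: %s" % sorted(unknown))
    names = sorted(row)  # one fixed order for both the names and the values
    query = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table,
        ", ".join('"%s"' % name for name in names),
        ", ".join(["?"] * len(names)),
    )
    # Values (including quotes like O'Reilly) are still bound, never formatted.
    conn.execute(query, [row[name] for name in names])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (name TEXT, age INTEGER)")
insert_dict(conn, "some_table", {"name": "O'Reilly", "age": 42})
print(conn.execute("SELECT name, age FROM some_table").fetchall())
```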
I love the official docs about this:
https://www.psycopg.org/psycopg3/docs/basic/params.html