I'm fairly new to Python but have a project I am working on, so please excuse any naivety on my part.
I am writing some SQL statements in Python 2.7 (our libraries haven't been upgraded to 3 yet) but I am stuck on the best-practice procedure for them. We are using Sybase. Initially I was using
query = "UPDATE DB..TABLE SET version = '{}' WHERE name = '{}'".format(app.version, app.name)
cursor.execute(query)
But after further reading I realised that this is open to injection. So I then looked at doing the following:
query = "UPDATE DB..TABLE SET version = '%s' WHERE name = '%s'" % (app.version, app.name)
cursor.execute(query)
But that got me to thinking: is this not the same thing?
The parameters are also variables set by argparse, which means I have to put quotes around the %s, otherwise it throws an invalid column name error. This is frustrating because, in other queries, I also want to pass NULL (None in Python) by default when additional flags aren't set; with the quotes it obviously inserts the string "NULL" instead.
For this particular example the two variables are set from a file read by ConfigParser, but I think it's the same situation for argparse variables, e.g.
[SECTION]
application=name
version=1.0
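Roughly, the values are read like this (a rough sketch; the file name is made up for this example):
import ConfigParser

config = ConfigParser.ConfigParser()
config.read("app.cfg")  # hypothetical file name
app_name = config.get("SECTION", "application")
app_version = config.get("SECTION", "version")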
I'm not quite sure how best to tackle this issue, and yes, I know "PYTHON 3 IS BETTER, UPGRADE TO IT"; as I said at the start, the libraries are in the process of being ported.
If you need any additional info then please advise and I will give you the best I can.
UPDATE:
Using the following param-style strings I found in some Sybase docs, it can work, but it does not pass None as NULL and throws errors. I'm starting to think this is a limitation of the sybase module in Python.
cursor.execute("SELECT * FROM DB..table where app_name=#app_name", {"#app_name": app_name})
or
params = {"#appname": app.name. "#appver": app.version}
sql = "INSERT INTO DB..table (app_name, app_version) VALUES (#appname, #appversion)
cursor.execute(sql, params)
There is an issue, though: if you have a global dict of params and feed it to a query, then if any of them are None you get a lot of errors again about them being None, EVEN if those specific params aren't used in the query. I think I may be stuck writing if statements for the various options here across multiple inserts to bypass this None/NULL issue.
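One workaround I'm considering instead of scattering if statements everywhere, a rough and untested sketch that filters the shared params down to just the @names that appear in the statement:
def used_params(sql, params):
    # Keep only the @name entries that actually appear in this statement,
    # so unused None values in a shared dict can't trigger errors.
    return {name: value for name, value in params.items() if name in sql}

cursor.execute(sql, used_params(sql, all_params))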
OK, I have resolved this issue now.
I had to update the sybase module in order to get it to work with None -> NULL.
As posted in the updated question, the below is how I was running the queries.
cursor.execute("SELECT * FROM DB..table where app_name=#app_name", {"#app_name": app_name})
or
params = {"#appname": app.name. "#appver": app.version}
sql = "INSERT INTO DB..table (app_name, app_version) VALUES (#appname, #appversion)
cursor.execute(sql, params)
But that got me to thinking: is this not the same thing?
Yes, those are effectively the same. They are both wide open to an injection attack.
Instead, do it this way:
query = "UPDATE DB..TABLE SET version = %s WHERE name = %s"
cursor.execute(query, [app.version, app.name])
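This also takes care of the NULL issue from the question: because the driver does the quoting, the placeholders need no surrounding quotes, and a Python None is transmitted as SQL NULL. A minimal sketch, assuming a %s-paramstyle DB-API driver:
# None is sent as SQL NULL when bound as a parameter; note no quotes around %s.
query = "UPDATE DB..TABLE SET version = %s WHERE name = %s"
cursor.execute(query, [None, app.name])  # sets version to NULL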
Related
I'm writing a program to extract a lot of data from another source and record it in a Postgres database. I need a function that takes in the destination table and a dictionary with variable fields to be added and then inserts it as appropriate. It seems like it should be simple enough, but I'm running into problems generating the insert query. The examples I've found online are either partial, outdated, or simply don't work when I modify them for my data.
Here's a simple version I've put together to work it out. I've tried a lot of variations of this, so it's probably not as clean as it should be at this point. It feels like there's something really simple that I'm just missing, but if so I'm just not seeing it.
def insert_record():
    table = "test"
    record = {"name": "Jack", "id": 1}
    fields = record.keys()
    values = ", ".join(str(n) for n in record.values())
    query = sql.SQL("INSERT INTO {} ({}) VALUES ({});".format(
        sql.Identifier(table),
        sql.SQL(",").join(map(sql.Identifier, fields)),
        sql.SQL(",").join(sql.Placeholder() * len(fields))
    ))
    cursor = connection.cursor()
    print(query.as_string(connection))
    try:
        cursor.execute(query, (values,))
        connection.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    cursor.close()
This returns the error:
syntax error at or near "'test'"
LINE 1: INSERT INTO Identifier('test') (Composed([Identifier('name')...
It looks like it's not actually formatting the query for whatever reason, since the as_string function also returns the unformatted:
"INSERT INTO Identifier('test') (Composed([Identifier('name'), SQL(','), Identifier('id')])) VALUES (Composed([Placeholder(''), SQL(','), Placeholder('')]));"
Any suggestions on how to fix this, or better ways to handle dynamic queries in general?
edit: Here's my import statement
import psycopg2
from psycopg2 import extras, Error, sql
You are calling .format() on a plain Python string; you need to call .format() on the sql.SQL() object instead:
query = sql.SQL("INSERT INTO {} ({}) VALUES ({});").format(
sql.Identifier(table),
sql.SQL(",").join(map(sql.Identifier, fields)),
sql.SQL(",").join(sql.Placeholder() * len(fields))
)
Ref: https://www.psycopg.org/docs/sql.html?highlight=literal#module-usage
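Putting the fix together, here is a minimal sketch of the whole function. Note it also passes the values as a list to execute() instead of joining them into one string; connection is assumed to be an open psycopg2 connection:
import psycopg2
from psycopg2 import sql

def insert_record(connection, table, record):
    # Identifiers are composed safely with the sql module; the values go
    # through the second argument of execute() so psycopg2 escapes them.
    query = sql.SQL("INSERT INTO {} ({}) VALUES ({});").format(
        sql.Identifier(table),
        sql.SQL(", ").join(map(sql.Identifier, record.keys())),
        sql.SQL(", ").join(sql.Placeholder() * len(record)),
    )
    with connection.cursor() as cursor:
        cursor.execute(query, list(record.values()))
    connection.commit()

insert_record(connection, "test", {"name": "Jack", "id": 1})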
I am aware that queries in Python can be parameterized using either ? or %s in an execute() call.
However, I have a long query that uses a constant variable defined at the beginning of the query:
SET @my_const = 'xyz';
SELECT @my_const;
-- Query that uses @my_const 40 times
SELECT ... coalesce(field1, @my_const), case(.. then @my_const)...
I would like to modify the MySQL query as little as possible, so that instead of rewriting it as
pd.read_sql("select ... coalesce(field1, %s), case(.. then %s)...", con=db, params=[my_const, my_const, my_const, ..])
I could write something along the lines of the initial query. Upon trying the following, however, I get a TypeError: 'NoneType' object is not iterable:
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
Any idea how to use a variable defined in the original MySQL query?
The reason
query_str = "Set #null_val = \'\'; "\
" select #null_val"
erpur_df = pd.read_sql(query_str, con = db)
throws that exception is because all you are doing is setting null_value to '' and then selecting that '' - what exactly would you have expected that to give you? EDIT read_sql only seems to execute one query at a time, and as the first query returns no rows it results in that exception.
If you split them into two calls to read_sql, the second call will in fact return the value of your @null_val. Due to this behaviour, read_sql is clearly not a good way to do this; I strongly suggest you use one of my suggestions below.
Why do you want to set the variable in the SQL using '@' anyway?
You could try using the .format style of string formatting.
Like so:
query_str = "select ... coalesce(field1, {c}), case(.. then {c})...".format(c=my_const)
pd.read_sql(query_str, con=db)
Just remember that if you do it this way and your my_const is a user input then you will need to sanitize it manually to prevent SQL injection.
Another possibility is using a dict of params like so:
query_str = "select ... coalesce(field1, %(my_const)s, case(.. then %(my_const)s)..."
pd.read_sql(query_str, params={'my_const': const_value})
However this is dependent on which database driver you use.
From the pandas.read_sql docs:
Check your database driver documentation for which of the five syntax
styles, described in PEP 249's paramstyle, is supported. E.g. for
psycopg2, uses %(name)s so use params={'name': 'value'}
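For example, with a pyformat-style MySQL driver such as pymysql (the driver, table, and field names here are assumptions), a named parameter can appear any number of times in the query while being supplied only once:
import pandas as pd
import pymysql

db = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")
# %(my_const)s can be referenced repeatedly but is supplied a single time.
query_str = "SELECT COALESCE(field1, %(my_const)s) AS field1 FROM my_table"
erpur_df = pd.read_sql(query_str, con=db, params={"my_const": "xyz"})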
I'm using a Python script that I have been using many times before to load CSV data into MySQL tables.
I modified the script for a very simple insert but it fails and I can't see why.
I've gone through the MySQL documentation of the Python connector, compared my syntax and I went through all the related articles on Stackoverflow but I can't find the reason. I've also checked the quotes I'm using as that is a common error.
Perhaps someone can help:
if row[0]:
    s = row[0]
    d = s[s.rfind('/')+1:len(s)-4]
    cursor.execute("INSERT INTO `tab` (`did`) VALUES (%s)",(d))
I've checked print(d) and d is populated correctly.
The error I'm getting is
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near '%s)' at line 1
If anyone can spot the (probably very silly) error, please help. Thanks.
The problem is that in
cursor.execute("INSERT INTO `tab` (`did`) VALUES (%s)",(d))
the (d) passed as params is a string with parentheses around it, not a tuple.
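You can see this in the interpreter; only the trailing comma makes a tuple:
d = "1234"
print(type((d)))   # str: parentheses alone are just grouping
print(type((d,)))  # tuple: the trailing comma makes a one-element tuple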
Here's how a mysql-connector cursor checks its params:
if params is not None:
    if isinstance(params, dict):
        for key, value in self._process_params_dict(params).items():
            stmt = stmt.replace(key, value)
    elif isinstance(params, (list, tuple)):
        psub = _ParamSubstitutor(self._process_params(params))
        stmt = RE_PY_PARAM.sub(psub, stmt)
        if psub.remaining != 0:
            raise errors.ProgrammingError(
                "Not all parameters were used in the SQL statement")
So in your case, although params is not None, it is not of a type accepted as params either, and parameter substitution does not take place.
The fix then is simply to pass a tuple to cursor.execute() (a list works too):
cursor.execute("INSERT INTO `tab` (`did`) VALUES (%s)", (d,))
I think your string formatting is wrong. It should probably be:
cursor.execute("INSERT INTO `tab` (`did`) VALUES (?)",d)
But you should check in the docs for your database library. I'm pretty sure the problem is with the placeholder in the query.
I was looking at the question and decided to try using the bind variables. I use
sql = 'insert into abc2 (intfield, textfield) values (%s, %s)'
a = time.time()
for i in range(10000):
    # just a wrapper around cursor.execute
    db.executeUpdateCommand(sql, (i, 'test'))
db.commit()
and
sql = 'insert into abc2 (intfield, textfield) values (%(x)s, %(y)s)'
for i in range(10000):
    db.executeUpdateCommand(sql, {'x': i, 'y': 'test'})
db.commit()
Looking at the time taken for the two versions above, it seems there isn't much difference. In fact, the second one takes longer. Can someone correct me if I've made a mistake somewhere? I'm using psycopg2 here.
The queries are equivalent in PostgreSQL.
"Bind" is Oracle lingo. When you use it, the query plan is saved so that the next execution is a little faster. PREPARE does the same thing in Postgres.
http://www.postgresql.org/docs/current/static/sql-prepare.html
psycopg2 supports an internal 'bind' (not PREPARE) via cursor.executemany() and cursor.execute().
(But don't call it bind to pg people. Call it prepare or they may not know what you mean:)
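If you want true server-side preparation today, you can issue PREPARE yourself through psycopg2. A rough sketch, assuming an open connection conn and the abc2 table from the question:
cur = conn.cursor()
cur.execute("PREPARE ins (int, text) AS "
            "INSERT INTO abc2 (intfield, textfield) VALUES ($1, $2)")
for i in range(10000):
    # EXECUTE reuses the plan saved by PREPARE for this session
    cur.execute("EXECUTE ins (%s, %s)", (i, 'test'))
conn.commit()
cur.execute("DEALLOCATE ins")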
IMPORTANT UPDATE:
I've looked into the source of all the Python libraries for connecting to PostgreSQL in the FreeBSD ports, and I can say that only py-postgresql does real prepared statements! But it is Python 3+ only.
py-pg_queue is also a fun little library implementing the official DB protocol (Python 2.4+).
You've missed the answer to that question: use prepared statements as much as possible. "Bound variables" are the better form of this; let's see:
sql_q = 'insert into abc (intfield, textfield) values (?, ?)' # common form
sql_b = 'insert into abc2 (intfield, textfield) values (:x , :y)' # should have driver and db support
so your test should be this:
sql = 'insert into abc2 (intfield, textfield) values (:x, :y)'
for i in range(10000):
    cur.execute(sql, {'x': i, 'y': 'test'})
or this:
def _data(n):
    for i in range(n):
        yield (i, 'test')

sql = 'insert into abc2 (intfield, textfield) values (?, ?)'
cur.executemany(sql, _data(10000))
and so on.
UPDATE:
I've just found an interesting recipe for transparently replacing SQL queries with prepared statements, while still using the %(name)s style.
As far as I know, psycopg2 has never supported server-side parameter binding ("bind variables" in Oracle parlance). Current versions of PostgreSQL do support it at the protocol level using prepared statements, but only a few connector libraries make use of it. The Postgres wiki notes this here. Here are some connectors that you might want to try: (I haven't used these myself.)
pg8000
python-pgsql
py-postgresql
As long as you're using DB-API calls, you probably ought to consider cursor.executemany() instead of repeatedly calling cursor.execute().
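For instance, the 10,000-row loop from the question could collapse into a single executemany() call; a sketch assuming an open DB-API connection conn:
cur = conn.cursor()
cur.executemany("insert into abc2 (intfield, textfield) values (%s, %s)",
                [(i, 'test') for i in range(10000)])
conn.commit()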
Also, binding parameters to their query in the server (instead of in the connector) is not always going to be faster in PostgreSQL. Note this FAQ entry.
What's the best way to make psycopg2 pass parameterized queries to PostgreSQL? I don't want to write my own escaping mechanisms or adapters, and the psycopg2 source code and examples are difficult to read in a web browser.
If I need to switch to something like PyGreSQL or another python pg adapter, that's fine with me. I just want simple parameterization.
psycopg2 follows the rules for DB-API 2.0 (set down in PEP 249). That means you can call the execute method on your cursor object using the pyformat binding style, and it will do the escaping for you. For example, the following should be safe (and work):
cursor.execute("SELECT * FROM student WHERE last_name = %(lname)s",
{"lname": "Robert'); DROP TABLE students;--"})
From the psycopg documentation
(http://initd.org/psycopg/docs/usage.html)
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
SQL = "INSERT INTO authors (name) VALUES (%s);" # Note: no quotes
data = ("O'Reilly", )
cur.execute(SQL, data) # Note: no % operator
Here are a few examples you might find helpful
cursor.execute('SELECT * from table where id = %(some_id)s', {'some_id': 1234})
Or you can dynamically build your query based on a dict of field name, value:
fields = ', '.join(my_dict.keys())
placeholders = ', '.join(['%s'] * len(my_dict))
query = 'INSERT INTO some_table ({}) VALUES ({})'.format(fields, placeholders)
cursor.execute(query, list(my_dict.values()))
Note: the fields must be defined in your code, not user input, otherwise you will be susceptible to SQL injection.
I love the official docs about this:
https://www.psycopg.org/psycopg3/docs/basic/params.html
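For completeness, a minimal sketch in the style of those psycopg 3 docs (the connection string is an assumption):
import psycopg

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # Values are passed separately; psycopg handles quoting and types.
        cur.execute("INSERT INTO authors (name) VALUES (%s)", ("O'Reilly",))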