Parameterizing 'SELECT IN (...)' queries - python

I want to use MySQLdb to create a parameterized query such as:
serials = ['0123456', '0123457']
c.execute('''select * from table where key in %s''', (serials,))
But what ends up being sent to the DBMS is:
select * from table where key in ("'0123456'", "'0123457'")
Is it possible to create a parameterized query like this? Or do I have to loop myself and build up a result set?
Note: executemany(...) won't work for this - it'll only return the last result:
>>> c.executemany('''select * from table where key in (%s)''',
[ (x,) for x in serials ] )
2L
>>> c.fetchall()
((1, '0123457', 'faketestdata'),)
Final solution adapted from Gareth's clever answer:
# Assume check above for case where len(serials) == 0
query = '''select * from table where key in ({0})'''.format(
','.join(["%s"] * len(serials)))
c.execute(query, tuple(serials)) # tuple() for case where len == 1

You want something like this, I think (MySQLdb uses %s as its parameter marker, not ?):
query = 'select * from table where key in (%s)' % ','.join(['%s'] * len(serials))
c.execute(query, serials)
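The placeholder-building step can be wrapped in a small helper. The sketch below is only an illustration: build_in_query is a hypothetical name, not part of MySQLdb, and it assumes a driver that uses the %s (format) paramstyle, as MySQLdb does. It rejects an empty list because IN () is not valid SQL.

```python
def build_in_query(base_sql, values):
    """Return (sql, params) with one %s placeholder per value.

    base_sql must contain a single '{}' where the placeholder list goes.
    Hypothetical helper for illustration; not part of MySQLdb.
    """
    values = tuple(values)
    if not values:
        raise ValueError("IN () needs at least one value")
    placeholders = ",".join(["%s"] * len(values))
    return base_sql.format(placeholders), values


serials = ["0123456", "0123457"]
query, params = build_in_query("select * from table where key in ({})", serials)
print(query)
# c.execute(query, params)  # c is assumed to be an open MySQLdb cursor
```

Only the placeholders are formatted into the SQL text; the values themselves still travel separately to execute(), so the driver does the quoting.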

Related

psycopg2 - insert into variable columns using extras.batch_execution

I am inserting a pandas dataframe into postgres using psycopg2.
Below code:
...
import psycopg2.extras as extras
tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(column_list))
query = "INSERT INTO %s(%s) VALUES (%%s,%%s,%%s,%%s,%%s)" % (table , cols)
extras.execute_batch(cursor, query, tuples, page_size = 100)
...
This works!
Here I convert df into a list of tuples, and I think each %%s takes its value at runtime when extras.execute_batch is executed.
The problem is that I need to hardcode %%s once per column.
In this example there are 5 columns, hence %%s,%%s,%%s,%%s,%%s.
Is there a way to make this variable?
Here is what I tried:
...
tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(column_list))
vals_frame = len(column_list) * """%%s,"""
vals_frame = vals_frame[:-1]
print('vals_frame: ',vals_frame)
query = query = "INSERT INTO %s(%s) VALUES("+vals_frame+")" % (table , cols)
extras.execute_batch(cursor, query, tuples, page_size = 100)
...
This prints:
vals_frame: '%%s,%%s,%%s,%%s,%%s'
which is what I want, but I get the error below when the query is built:
TypeError: not all arguments converted during string formatting
How do I get past this?
I have tried:
vals_frame = len(column_list) * """\%\%s,"""
vals_frame = len(column_list) * """\\%%s,"""
but this does not seem to work. Can someone help?
The problem is the location of the %. Because of operator precedence, % binds tighter than +. So:
query = "INSERT INTO %s(%s) VALUES("+vals_frame+")" % (table , cols)
The % operator here applies to the string ")". Here are some alternatives to consider:
query = "INSERT INTO %s(%s) VALUES(" % (table, cols) +vals_frame+")"
query = ("INSERT INTO %s(%s) VALUES("+vals_frame+")") % (table , cols)
query = "INSERT INTO %s(%s) VALUES(%s)" % (table, cols, vals_frame)
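Alternatively, avoid the problem by using f-strings:
query = f"INSERT INTO {table}({cols}) VALUES({vals_frame});"
Note that only the fully parenthesized form passes vals_frame through % formatting, which turns each %%s into %s. In every other form, including the f-string, vals_frame is inserted verbatim, so it should be built from single %s placeholders.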
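A short sketch of both points: the precedence trap, and a query built for any number of columns. build_insert_query is a hypothetical helper, and the table and column names are assumed to come from trusted code, since they are interpolated as plain text. The placeholder list is passed to % as an argument, so it uses single %s markers.

```python
# % binds tighter than +, so only the last literal, ")", gets formatted.
try:
    "INSERT INTO %s(%s) VALUES(" + "%%s,%%s" + ")" % ("t", "a,b")
except TypeError as exc:
    precedence_error = str(exc)


def build_insert_query(table, columns):
    """Build an INSERT with one %s placeholder per column.

    Hypothetical helper; table and columns must be trusted identifiers.
    """
    cols = ",".join(columns)
    placeholders = ",".join(["%s"] * len(columns))
    return "INSERT INTO %s(%s) VALUES (%s)" % (table, cols, placeholders)


query = build_insert_query("my_table", ["a", "b", "c"])
print(query)
# extras.execute_batch(cursor, query, tuples, page_size=100)  # as in the question
```

If the identifiers might come from outside the program, psycopg2's sql module is the safer route for composing them; that is outside what this sketch covers.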

Correct way to "select * from tbl where field in ?" and the placeholder is a list without string interpolation

I have a query of this form using pysqlite:
query = "select * from tbl where field1 in ?"
variables = ['Aa', 'Bb']
In a query, I'd like this to work:
with conn.cursor() as db:
    res = db.execute(query, (variables,)).fetchall()
That is, I want it interpreted the same way as this statement typed into the SQLite command line:
select * from tbl where field1 in ("Aa", "Bb");
But this fails with:
pysqlite3.dbapi2.InterfaceError: Error binding parameter 0 - probably unsupported type.
I understand I can just string.join([mylist]), but this is unsafe. How can I use placeholder parameters and a list in sqlite with python?
Update
To differentiate this from similar questions on Stack Overflow: those seem to want %s string interpolation, which I am trying to avoid.
Question: WHERE field IN ? and the placeholder is a list without string interpolation
Values are a list of int
values = (42, 43, 44)
Prepare your Query with the number of bindings
bindings = '?,'*len(values)
QUERY = "SELECT * FROM t1 WHERE id IN ({});".format(bindings[:-1])
print("{}".format(QUERY))
Output:
SELECT * FROM t1 WHERE id IN (?,?,?);
Execute the Query
cur.execute(QUERY, values)
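For reference, the same steps as one runnable sketch against an in-memory database, using the standard library sqlite3 module rather than pysqlite3. The table contents are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?, ?)",
                [(41, "a"), (42, "b"), (43, "c"), (45, "d")])

values = (42, 43, 44)
# One ? per value; only the markers are formatted into the SQL text.
bindings = ",".join("?" * len(values))
query = "SELECT id, name FROM t1 WHERE id IN ({}) ORDER BY id".format(bindings)
rows = cur.execute(query, values).fetchall()
print(rows)
conn.close()
```

The values never become part of the SQL string, which is what keeps this safe compared with joining the list into the query directly.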

Variable Substitution in SQLite using Python

I am trying to figure out how to combine two SQLite queries. Both of them work fine independently but when I put them together with AND they do not work. I think the problem is that I do not know how to pass the variables properly.
First query that works:
var1 = 10
mylist = ['A', 'B', 'C', 'AB', 'AC']
c.execute("SELECT * FROM my_table WHERE column1=(?) ORDER BY RANDOM() LIMIT 1", (mylist[2],))
This line also works:
params = [5,0,1]
query = ("SELECT * FROM my_table WHERE column2 NOT IN (%s)" % ','.join('?' * len(params)))
c.execute(query, params)
I have been trying to combine these two statements without success:
query = ("SELECT * FROM my_table WHERE column2 NOT IN (%s) AND column1=(?)" % ','.join('?' * len(params)))
c.execute(query, params, mylist[2])
In case anyone finds this helpful, my final solution looked like this:
query = ("SELECT * FROM my_table WHERE column1 = (?) AND column2 NOT IN (%s) ORDER BY RANDOM() LIMIT 1" % ','.join('?' * len(params)))
c.execute(query, [mylist[2]] + params)
The second parameter of execute() must be a single sequence containing all the SQL parameters.
So you have to construct one list with the values from both original lists, in the same order as the ? placeholders appear in the query:
c.execute(query, params + [mylist[2]])
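A small self-contained sketch of the combined query, run against an in-memory database. The table rows are invented, and ORDER BY RANDOM() LIMIT 1 is replaced with a fixed ordering so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE my_table (column1 TEXT, column2 INTEGER)")
c.executemany("INSERT INTO my_table VALUES (?, ?)",
              [("C", 0), ("C", 7), ("A", 7), ("C", 5), ("C", 9)])

mylist = ["A", "B", "C", "AB", "AC"]
params = [5, 0, 1]
placeholders = ",".join("?" * len(params))
query = ("SELECT column1, column2 FROM my_table "
         "WHERE column2 NOT IN (%s) AND column1 = ? "
         "ORDER BY column2" % placeholders)
# The NOT IN markers come first in the SQL, so params goes first here too.
rows = c.execute(query, params + [mylist[2]]).fetchall()
print(rows)
conn.close()
```

Swapping the two conditions in the SQL means swapping the order of the concatenation as well, which is why the asker's final version uses [mylist[2]] + params.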

Increase Query Speed in Sqlite

I am very new to Python and SQLite. I am trying to write a script that reads data from a table (rawdata), performs some calculations, and stores the results in a new table. I count the number of races that a player has won before a given date at a particular track position and calculate the percentage. There are 15 track positions in total. Overall the script is very slow. Any suggestions to improve its speed? I have already used the PRAGMA parameters.
Below is the script.
for item in result:
    l1 = str(item[0])
    l2 = item[1]
    l3 = int(item[2])
    winpost = []
    key = l1.split("|")
    dt = l2
    ###Denominator--------------
    cursor.execute(
        "SELECT rowid FROM rawdata WHERE Track = ? AND Date< ? AND Distance = ? AND Surface =? AND OfficialFinish=1",
        (key[2], dt, str(key[4]), str(key[5]),))
    result_den1 = cursor.fetchall()
    cursor.execute(
        "SELECT rowid FROM rawdata WHERE Track = ? AND RaceSN<= ? AND Date= ? AND Distance = ? AND Surface =? AND OfficialFinish=1",
        (key[2], int(key[3]), dt, str(key[4]), str(key[5]),))
    result_den2 = cursor.fetchall()
    totalmat = len(result_den1) + len(result_den2)
    if totalmat > 0:
        for i in range(1, 16):
            cursor.execute(
                "SELECT rowid FROM rawdata WHERE Track = ? AND Date< ? AND PolPosition = ? AND Distance = ? AND Surface =? AND OfficialFinish=1",
                (key[2], dt, i, str(key[4]), str(key[5]),))
            result_num1 = cursor.fetchall()
            cursor.execute(
                "SELECT rowid FROM rawdata WHERE Track = ? AND RaceSN<= ? AND Date= ? AND PolPosition = ? AND Distance = ? AND Surface =? AND OfficialFinish=1",
                (key[2], int(key[3]), dt, i, str(key[4]), str(key[5]),))
            result_num2 = cursor.fetchall()
            winpost.append(len(result_num1) + len(result_num2))
        winpost = [float(x) / totalmat for x in winpost]
        rank = rankmin(winpost)
        franks = list(rank)
        franks.insert(0, int(key[3]))
        franks.insert(0, dt)
        franks.insert(0, l1)
        table1.append(franks)
        franks = []
cursor.executemany("INSERT INTO posttable VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", table1)
Sending and retrieving an SQL query is "expensive" in terms of time. The easiest way to speed things up would be to use SQL functions to reduce the number of queries.
For example, the first two queries could be reduced to a single call using COUNT(), UNION, and Aliases.
SELECT COUNT(*)
FROM
( SELECT rowid FROM rawdata where ...
UNION
SELECT rowid FROM rawdata where ...
) totalmatch
In this case we take the two original queries (with your conditions in place of the "...") combine them with a UNION statement, give that union the alias "totalmatch", and count all the rows in it.
The same thing can be done with the second set of queries. Instead of cycling 15 times over 2 queries (resulting in 30 calls to the SQL engine) you can replace them with one query by also using GROUP BY.
SELECT PolPosition, COUNT(PolPosition)
FROM
( SELECT PolPosition FROM rawdata WHERE ...
UNION ALL
SELECT PolPosition FROM rawdata WHERE ...
) totalmatch
GROUP BY PolPosition
In this case we take the same two conditions as before, select PolPosition instead of rowid, and group the result by PolPosition, using COUNT to display how many rows are in each group. UNION ALL is needed here because a plain UNION would discard duplicate PolPosition values before they are counted.
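A runnable sketch of the grouped query, using sqlite3 with an in-memory table. The schema is a reduced, assumed version of rawdata (Distance and Surface are left out to keep it short) and the rows are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE rawdata (Track TEXT, Date TEXT, RaceSN INTEGER, "
            "PolPosition INTEGER, OfficialFinish INTEGER)")
cur.executemany("INSERT INTO rawdata VALUES (?, ?, ?, ?, ?)", [
    ("X", "2020-01-01", 1, 1, 1),   # earlier date: counted
    ("X", "2020-01-01", 2, 1, 1),   # earlier date: counted
    ("X", "2020-01-02", 1, 3, 1),   # same date, earlier race: counted
    ("X", "2020-01-02", 5, 3, 1),   # same date, later race: skipped
    ("X", "2020-01-01", 3, 2, 2),   # not a winner: skipped
    ("Y", "2020-01-01", 1, 1, 1),   # other track: skipped
])

track, dt, race_sn = "X", "2020-01-02", 2
query = """
SELECT PolPosition, COUNT(*)
FROM (
    SELECT PolPosition FROM rawdata
    WHERE Track = ? AND Date < ? AND OfficialFinish = 1
    UNION ALL
    SELECT PolPosition FROM rawdata
    WHERE Track = ? AND RaceSN <= ? AND Date = ? AND OfficialFinish = 1
) totalmatch
GROUP BY PolPosition
"""
# One round trip returns the win count for every position that has wins.
wins = dict(cur.execute(query, (track, dt, track, race_sn, dt)).fetchall())
total = sum(wins.values())
# Positions without wins are absent from the result, so default them to 0.
winpost = [wins.get(pos, 0) / total for pos in range(1, 16)] if total else []
conn.close()
```

The sum of the grouped counts also gives the denominator, so the separate total query is not needed in this variant.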
W3Schools is a great resource for how these functions work:
http://www.w3schools.com/sql/default.asp

Using a Python loop to create SQL databases from lists [duplicate]

if count == 1:
    cursor.execute("SELECT * FROM PacketManager WHERE ? = ?", filters[0], parameters[0])
    all_rows = cursor.fetchall()
elif count == 2:
    cursor.execute("SELECT * FROM PacketManager WHERE ? = ? AND ? = ?", filters[0], parameters[0], filters[1], parameters[1])
    all_rows = cursor.fetchall()
elif count == 3 :
    cursor.execute("SELECT * FROM PacketManager WHERE ? = ? AND ? = ? AND ? = ?", filters[0], parameters[0], filters[1], parameters[1], filters[2], parameters[2])
    all_rows = cursor.fetchall()
This is a code snippet from my program. What I plan to do is pass the column name and the parameter into the query.
The filters array contains the column names and the parameters array contains the parameters. count is the number of filters set by the user. The filters and parameters arrays are already populated and have no problems. I just need to pass them to the query for it to execute. This gives me the error "TypeError: function takes at most 2 arguments".
You cannot use SQL parameters to interpolate column names. You'll have to use classic string formatting for those parts. That's the point of SQL parameters; they quote values so they cannot possibly be interpreted as SQL statements or object names.
The following, using string formatting for the column name works, but be 100% certain that the filters[0] value doesn't come from user input:
cursor.execute("SELECT * FROM PacketManager WHERE {} = ?".format(filters[0]), (parameters[0],))
You probably want to validate the column name against a set of permissible column names, to ensure no injection can take place.
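One way to sketch that validation, assuming a fixed set of permitted columns. ALLOWED_COLUMNS, build_filter_query, and the column names src, dst, and protocol are hypothetical, chosen only for the demonstration. The helper also handles any number of filters, so the if/elif chain is not needed.

```python
import sqlite3

ALLOWED_COLUMNS = {"src", "dst", "protocol"}  # hypothetical column names


def build_filter_query(filters):
    """Build a SELECT whose WHERE clause uses only whitelisted columns."""
    if not filters:
        raise ValueError("at least one filter is required")
    for name in filters:
        if name not in ALLOWED_COLUMNS:
            raise ValueError("unknown column: %r" % (name,))
    where = " AND ".join("{} = ?".format(name) for name in filters)
    return "SELECT * FROM PacketManager WHERE " + where


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE PacketManager (src TEXT, dst TEXT, protocol TEXT)")
cursor.executemany("INSERT INTO PacketManager VALUES (?, ?, ?)",
                   [("10.0.0.1", "10.0.0.2", "tcp"),
                    ("10.0.0.1", "10.0.0.3", "udp")])

filters = ["src", "protocol"]
parameters = ["10.0.0.1", "udp"]
query = build_filter_query(filters)
# All values go in one sequence as the second argument to execute().
all_rows = cursor.execute(query, parameters).fetchall()
print(all_rows)
conn.close()
```

Passing the values as a single sequence also avoids the "function takes at most 2 arguments" error from the question.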
You can only set parameters using ?, not table or column names.
You could build a dict with predefined queries.
queries = {
"foo": "SELECT * FROM PacketManager WHERE foo = ?",
"bar": "SELECT * FROM PacketManager WHERE bar = ?",
"foo_bar": "SELECT * FROM PacketManager WHERE foo = ? AND bar = ?",
}
# count == 1
cursor.execute(queries[filters[0]], (parameters[0],))
# count == 2
cursor.execute(queries[filters[0] + "_" + filters[1]], (parameters[0], parameters[1]))
This approach protects you from SQL injection through the filters values, because an unknown name raises a KeyError instead of reaching the SQL text.
