Adding list elements to WHERE clause - python

I want to add another condition to this WHERE clause:
stmt = 'SELECT account_id FROM asmithe.data_hash WHERE percent < {};'.format(threshold)
I have the variable juris which is a list. The value of account_id and juris are related in that when an account_id is created, it contains the substring of a juris.
I want to add to the query the condition that it needs to match any one of the juris elements. Normally I would just add ...AND account_id LIKE '{}%'".format(juris) but this doesn't work because juris is a list.
How do I add all elements of a list to the WHERE clause?

Use Regex with operator ~:
juris = ['2','7','8','3']
'select * from tbl where id ~ \'^({})\''.format('|'.join(juris))
which leads to this query:
select * from tbl where id ~ '^(2|7|8|3)'
This returns the rows whose id starts with any of 2, 7, 8 or 3.
If you want the id to start with 2783, use:
select * from tbl where id ~ '^2783'
and if the id should contain any of 2, 7, 8 or 3:
select * from tbl where id ~ '.*(2|7|8|3).*'

Stop using string formatting with SQL. Right now. Understand?
OK now. There's a construct, ANY in SQL, that lets you take an operator and apply it to an array. psycopg2 supports passing a Python list as an SQL ARRAY[]. So in this case you can just
curs.execute('SELECT account_id FROM asmithe.data_hash WHERE account_id LIKE ANY (%s)', (thelist,))
Note here that %s is the psycopg2 query-parameter placeholder. It's not actually a format specifier. The second argument is a tuple, the query parameters. The first (and only) parameter is the list.
There's also ALL, which works like ANY but is true only if all the matches are true, not just if one or more is true.
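A minimal sketch of how this might be applied to the original question, assuming juris holds prefixes and the match is meant to be on account_id (both assumptions). The helper name build_prefix_query is hypothetical, and LIKE wildcards inside the prefixes are not escaped here. It only builds the query text and the parameters; the execute call stays in a comment because it needs a live PostgreSQL connection.

```python
def build_prefix_query(threshold, juris):
    """Return (sql, params) in the form psycopg2's cursor.execute() expects."""
    # Turn each prefix into a LIKE pattern: '2' -> '2%'
    patterns = [prefix + "%" for prefix in juris]
    # %s is the driver's placeholder here, not Python string formatting
    sql = ("SELECT account_id FROM asmithe.data_hash "
           "WHERE percent < %s AND account_id LIKE ANY (%s)")
    # The list is passed as ONE parameter; psycopg2 adapts it to an SQL array
    return sql, (threshold, patterns)


sql, params = build_prefix_query(0.5, ["2", "7", "8", "3"])
# With a real connection: curs.execute(sql, params)
```

The values never touch the query string, so quoting is left entirely to the driver.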

I am hoping juris is a list of strings? If so, this might help:
myquery = ("SELECT account_id FROM asmithe.data_hash "
           "WHERE percent in (%s)" % ",".join(map(str,juris)))
See these links:
python list in sql query as parameter
How to select item matching Only IN List in sql server
String formatting operations

Related

Dynamic mySQL Statement

I am trying to make a dynamic mySQL update statement. My update fails if certain characters are in the string.
import mysql.connector as sql
import MySQLdb
#Values are taken from a wxGrid.
key_id = str("'") + str(self.GetCellValue(event.GetRow(),1)) + str("'") #Cell column with unique ID
target_col = str(self.GetColLabelValue(event.GetCol())) #Column being updated
key_col = str(self.GetColLabelValue(1)) #Unique ID column
nVal = str("'")+self.GetCellValue(event.GetRow(),event.GetCol()) + str("'") #Updated value
sql_update = f"""Update {tbl} set {target_col} = {nVal} where {key_col} = {key_id}"""
self.cursor.execute(sql_update)
My Key column always contains Email addresses or integers. So if key_id = test@email.com, the update is successful, but if key_id = t'est@email.com, it fails. How do I get around this?
You can fix this by using query parameters. Stop concatenating strings into your SQL query. Use placeholders and then pass the values in a separate list argument to execute().
sql_update = f"""Update {tbl} set {target_col} = %s where {key_col} = %s"""
self.cursor.execute(sql_update, (nVal, key_id,))
Query parameters only work where you would use a literal value in your query, like a quoted string literal or a numeric literal.
You can't use query parameters for identifiers like the table name or column names. But I hope your identifiers are less likely to contain ' characters!
Likewise you cannot use query parameters for expressions or SQL keywords or lists of values e.g. for an IN() predicate. One query parameter = one scalar value.
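A small runnable sketch of the same idea, using the standard-library sqlite3 module so it works without a MySQL server (an assumption for illustration; sqlite3 uses ? as its placeholder where the MySQL drivers use %s). The contacts table, the ALLOWED_COLUMNS whitelist and the update_cell helper are hypothetical names, not part of the original code.

```python
import sqlite3

# Hypothetical whitelist: identifiers cannot be bound as parameters,
# so they are checked against a known set before being interpolated.
ALLOWED_COLUMNS = {"email", "name"}


def update_cell(conn, target_col, key_col, new_value, key_value):
    if target_col not in ALLOWED_COLUMNS or key_col not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column name")
    sql = f"UPDATE contacts SET {target_col} = ? WHERE {key_col} = ?"
    # Values go in the second argument; no manual quoting is needed
    cur = conn.execute(sql, (new_value, key_value))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT, name TEXT)")
conn.execute("INSERT INTO contacts VALUES (?, ?)", ("t'est@email.com", "old"))

# Both the key and the new value contain a single quote
updated = update_cell(conn, "name", "email", "O'Brien", "t'est@email.com")
row = conn.execute("SELECT name FROM contacts").fetchone()
```

Note that the values are passed without the surrounding quote characters that the question's code added by hand.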
See also:
MySQL parameterized queries
https://dev.mysql.com/doc/connector-python/en/connector-python-example-cursor-transaction.html
Literally any other Python SQL tutorial.
Use execute function instead.
Not recommended solution: A workaround for the single-quote literal is to escape it just before the query: key_id.replace("'", "\\'"). You might have to do that for each special character like %, _, and [.

SQL Conditional Count for Each Entry Instead of All

I have the following query that is attempting to return authors and their article counts:
SELECT (
SELECT COUNT(*)
FROM aldryn_newsblog_article
WHERE
aldryn_newsblog_article.author_id IN (1,2) AND
aldryn_newsblog_article.app_config_id = 1 AND
aldryn_newsblog_article.is_published IS TRUE AND
aldryn_newsblog_article.publishing_date <= now()
) as article_count, aldryn_people_person.*
FROM aldryn_people_person
However, it is currently returning the same number for each author because it counts all articles for authors with IDs of 1 and 2.
How should the query be modified, so it returns proper article counts for each author?
On a separate note, how can one turn the (1,2) into a list that can be spliced into the query dynamically? That is, suppose I have a Python list of author IDs, for which I would like to look up article counts. How could I pass that information to the SQL?
As commented, for a subquery to work you need to correlate it to the outer query, usually by a unique identifier (assumed here to be author_id). The IN (1, 2) filter then moves to the WHERE clause of the outer query. Also, use table aliases to distinguish the subquery from the outer query.
SELECT main.*
, (SELECT COUNT(*)
FROM aldryn_newsblog_article AS sub
WHERE
sub.author_id = main.author_id AND
sub.app_config_id = 1 AND
sub.is_published IS TRUE AND
sub.publishing_date <= now()
) AS article_count
FROM aldryn_people_person AS main
WHERE main.author_id IN (1, 2)
Alternatively, for a more efficient query, have the main query JOIN to an aggregate subquery, which calculates the counts once instead of re-running the subquery for every row of the outer query.
SELECT main.*
, sub.article_count
FROM aldryn_people_person AS main
INNER JOIN
(SELECT author_id
, COUNT(*) AS article_count
FROM aldryn_newsblog_article AS sub
WHERE
sub.app_config_id = 1 AND
sub.is_published IS TRUE AND
sub.publishing_date <= now()
GROUP BY author_id
) AS sub
ON sub.author_id = main.author_id
AND main.author_id IN (1, 2)
Re your separate note, there are many SO questions like this one that ask for a dynamic list in the IN operator. The usual approach is to build a prepared statement with a dynamic number of parameter placeholders, either ? or %s depending on the Python DB-API driver (e.g., psycopg2, pymysql, pyodbc), and then pass the values in the second argument of cursor.execute(). Do note your database's limit on the number of such values.
# BUILD PARAM PLACEHOLDERS (ONE PER ID)
qmarks = ", ".join(['?' for _ in list_of_author_ids])
# INTERPOLATE WITH F-STRING (PYTHON 3.6+)
sql = f'''SELECT ...
FROM ....
INNER JOIN ....
AND main.author_id IN ({qmarks})'''
# BIND PARAMS
cursor.execute(sql, list_of_author_ids)
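For a version that can be run end to end, here is a rough sketch against an in-memory SQLite database (an assumption made so the example needs no server; the simplified article table and its sample rows are made up for illustration). It combines the per-author GROUP BY count with a dynamically sized IN list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for aldryn_newsblog_article
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, author_id INTEGER)")
conn.executemany("INSERT INTO article (author_id) VALUES (?)",
                 [(1,), (1,), (2,), (3,)])

author_ids = [1, 2]
# One placeholder per id; only placeholders are interpolated, never values
placeholders = ", ".join("?" for _ in author_ids)
sql = f"""SELECT author_id, COUNT(*) AS article_count
          FROM article
          WHERE author_id IN ({placeholders})
          GROUP BY author_id
          ORDER BY author_id"""
# The ids themselves are bound as parameters
counts = conn.execute(sql, author_ids).fetchall()
```

An empty list would produce IN (), which many databases reject, so that case is worth handling separately before building the query.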
The way I normally handle these sorts of aggregates is first design a query that gets a list of author names and articles, then create a column to serve as the article count. At the lowest level this looks silly, because every article is 1. Then I wrap that in a subquery and sum from it.
SELECT sub.author, articleCount = sum(sub.rowCount)
FROM (
select distinct
author = x.author_id
, article = x.articleTitle
, rowCount = 1
from aldryn_newsblog_article x
where x.author_id in (1,2) and x.is_published = true --whatever other conditions you need here
) sub
GROUP BY sub.author
As far as the (1,2) being replaced with something more dynamic, the way I've seen it done before is to use CHARINDEX to parse a comma separated string in the where clause so you would have something like
DECLARE @passedFilter VARCHAR(50) = ',1,2,'
SELECT * FROM aldryn_newsblog_article WHERE CHARINDEX(',' + CAST(author_id AS VARCHAR) + ',', @passedFilter, 0) > 0
What this does is take your list of ids (note the leading and trailing commas) and let the query do a pattern match on it against the key value. I've read that this doesn't give the absolute best performance, but sometimes that isn't the biggest concern. We used this a lot in passing filters from a web app to SQL Server reports. Another method would be to declare a table variable / temp table, populate it with the authors you want to filter for, then join the subquery from the first bit of my answer to that table.

Insert NULL into a nullable row, using $$ in postgres

I was working on securing my application:
q = "insert into foo (name, descr, some_fk_int) values ($${}$$, $${}$$, $${}$$);".format(
name.replace("$$",""),
description.replace("$$",""),
str(fkToAnotherTable).replace("$$","") if fkToAnotherTable is not None else 'NULL'
)
cur.execute(q)
I noticed that this tries to insert the string NULL into an integer column. For the sake of keeping things uniform, I was processing all strings this way, and converting all ints to strings and then doing the same (just covering bases).
It was working when it was just:
.... ($$Tony$$,$$Bar$$, NULL)
but now that it is:
.....($$Tony$$, $$Bar$$, $$NULL$$)
it fails. I was thinking that there was a way to implement this in a uniform fashion. I was doing this because when doing lookups on indexes, I noticed that it understood ints as strings
select * from foo where id = $$id$$
So I was blanketing the app wherever user input was going to be put into the database. I could remove it from ints, but I thought this would be smart in case someone tried to pass a string into an int to see "what would happen".
I was thinking that the query would fail if the ID was an int and you did:
select * from foo where id = $$hello$$
but I was thinking of more interesting cases where someone would try SQL injection
select * from foo where id = $$$$ or 1=1$$$$
where the injected string was: $$ or 1=1$$. I was thinking that I could resolve it by casting to a string and then replacing key characters. But in the process of testing inserts and selects, I noticed the NULL case and was trying to figure out how to get around it.
Maybe I should customise the query to do something like:
q = """insert into foo (name, description, fk) values
( $${}$$, $${}$$, {} );""".format(
    name.replace("$$",""),
    description.replace("$$",""),
    "$${}$$".format(str(FK).replace("$$","")) if FK is not None else "NULL"
)
instead which only adds $$ if it is NOT None? Then it would work.
TLDR: Is there a proper way to pass 'NULL' into an int field and have it understood as null?
Edit: It seems like people think I should follow the parameterized query definitions in psycopg2, via the second argument of the cursor's execute function.
I was not sure if it would solve SQL injection, but I was thinking of doing something as follows:
q = "select * from foo where name in ({}) and user_id = %s".format(
",".join(["%s" for d in my_list])
)
# creates: select * from foo where name in (%s, %s, %s, %s, %s) and user_id = %s;
args = [d for d in my_list] + [user_id]
# creates [1,2,3,4,5,7493]
cursor.execute(q, args)
as the website allows the second argument to be a sequence, list, or dict.
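A short sketch of how parameter binding treats the NULL case, shown with the standard-library sqlite3 module so it runs without a server (an assumption for illustration; psycopg2 uses %s where sqlite3 uses ?, and it likewise sends Python's None as SQL NULL). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT, descr TEXT, some_fk_int INTEGER)")

rows = [
    ("Tony", "Bar", None),          # None is bound as SQL NULL, not the string 'NULL'
    ("Ann", "$$ or 1=1$$", 7),      # stored verbatim; it is never parsed as SQL
]
conn.executemany(
    "INSERT INTO foo (name, descr, some_fk_int) VALUES (?, ?, ?)", rows
)

null_names = [r[0] for r in
              conn.execute("SELECT name FROM foo WHERE some_fk_int IS NULL")]
stored_descr = conn.execute(
    "SELECT descr FROM foo WHERE name = ?", ("Ann",)
).fetchone()[0]
```

With bound parameters there is no need for $$ quoting or for stripping characters from the input, and the same code path handles both integers and missing values.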

Unpack tuple within SQL statement

I am using Python and PyMySQL. I want to fetch a number of items from a MySQL database according to their ids:
items_ids = tuple([3, 2])
sql = f"SELECT * FROM items WHERE item_id IN {items_ids};"
I am using the formatted string literals (f" ", https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498) to evaluate the tuple inside the SQL statement.
However, I want to get back the items in the order specified by the tuple, so first the item with item_id = 3 and then the item with item_id = 2. To accomplish this I have to use the ORDER BY FIELD clause (see also here: Ordering by the order of values in a SQL IN() clause).
But if I write something like this:
items_ids = tuple([3, 2])
sql = f"SELECT * FROM items WHERE item_id IN {items_ids} ORDER BY FIELD{(item_id,) + items_ids};"
then item_id in the ORDER BY FIELD clause is considered as an undeclared python variable
and if I write something like this:
items_ids = tuple([3, 2])
sql = f"SELECT * FROM items WHERE item_id IN {items_ids} ORDER BY FIELD{('item_id',) + items_ids};"
then item_id in the ORDER BY FIELD clause is considered as a string and not as a SQL variable and in this case ORDER BY FIELD does not do anything.
How can I evaluate the tuple (item_id,) + items_ids in the SQL statement by maintaining item_id as a SQL variable in the ORDER BY FIELD clause?
Obviously I can sort the items after they have returned from the database according to items_ids and without bothering so much with MySQL but I was just wondering how to do this.
Please don't use f-strings, or any string formatting, for passing values to SQL queries. That's the road to SQL injection. Now you may be thinking: "it's a tuple of integers, what bad could happen?" First of all a single element Python tuple's string representation is not valid SQL. Secondly, someone may follow the example with user controllable data other than tuples of ints (so having these bad examples online perpetuates the habit). Also the reason why you have to resort to your "cunning" solution is using the wrong tools for the job.
The correct way to pass values to SQL queries is to use placeholders. In case of pymysql the placeholder is – a bit confusingly – %s. Don't mix it with manual %-formatting. In case of having to pass a variable amount of values to a query you do have to resort to some string building, but you build the placeholders, not the values:
item_ids = (3, 2)
item_placeholders = ', '.join(['%s'] * len(item_ids))
sql = f"""SELECT * FROM items
WHERE item_id IN ({item_placeholders})
ORDER BY FIELD(item_id, {item_placeholders})"""
# Produces:
#
# SELECT * FROM items
# WHERE item_id IN (%s, %s)
# ORDER BY FIELD(item_id, %s, %s)
with conn.cursor() as cur:
    # Build the argument tuple
    cur.execute(sql, (*item_ids, *item_ids))
    res = cur.fetchall()
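The same construction can be wrapped in a small helper, sketched here under the assumption that the driver uses %s placeholders as pymysql does. The function name build_ordered_lookup is hypothetical. It only builds the statement and the argument tuple, so it can be checked without a database, and it shows that the single-element case needs no special handling.

```python
def build_ordered_lookup(item_ids):
    """Return (sql, params) selecting items in the order given by item_ids."""
    item_ids = tuple(item_ids)
    if not item_ids:
        # IN () is not valid in MySQL, so refuse to build such a query
        raise ValueError("item_ids must not be empty")
    # One %s per id; these are driver placeholders, not %-formatting
    placeholders = ", ".join(["%s"] * len(item_ids))
    sql = (f"SELECT * FROM items WHERE item_id IN ({placeholders}) "
           f"ORDER BY FIELD(item_id, {placeholders})")
    # The ids appear twice in the statement, so they are passed twice
    return sql, item_ids + item_ids


sql, params = build_ordered_lookup([3])
# With a real connection: cur.execute(sql, params)
```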
Another, simpler way to work around the single-element tuple problem is to keep the values in a list, check its length, and build the parenthesised string by hand when there is only one element, rather than formatting a one-element tuple into the query:
e.g.:
if (len(get_version_list[1])==1):
    port_id=str(port_id[0])
    port_id = '(' + "'" + port_id + "'" + ')'
else:
    port_id=tuple(port_id)
pd.read_sql(sql=get_version_str.format(port_id,src_cd), con=conn)
With the code above you no longer get the (item_id,) error in the SQL. :)
A solution with .format() is the following:
items_ids = tuple([3, 2])
items_placeholders = ', '.join(['{}'] * len(items_ids))
sql = "SELECT * FROM items WHERE item_id IN {} ORDER BY FIELD(item_id, {});".format(items_ids, items_placeholders).format(*items_ids)
# with `.format(items_ids, items_placeholders)` you get this: SELECT * FROM items WHERE item_id IN (3, 2) ORDER BY FIELD(item_id, {}, {});
# and then with `.format(*items_ids)` you get this: SELECT * FROM items WHERE item_id IN (3, 2) ORDER BY FIELD(item_id, 3, 2);
A rather tricky solution with f-strings is the following:
sql1 = f"SELECT * FROM items WHERE item_id IN {items_ids} ORDER BY FIELD(item_id, "
sql2 = f"{items_ids};"
sql = sql1 + sql2[1:]
# SELECT * FROM items WHERE item_id IN (3, 2) ORDER BY FIELD(item_id, 3, 2);
But as @IIija mentions, this is open to SQL injection, and IN {items_ids} cannot accommodate one-element tuples.
Additionally, using f-strings to unpack tuples in strings is perhaps more difficult than using .format() as others have mentioned before (Formatted string literals in Python 3.6 with tuples), since you cannot use * to unpack a tuple within an f-string. However, perhaps you may come up with a solution for this (maybe using an iterator?) to produce this
sql = f"SELECT * FROM items WHERE item_id IN ({t[0]}, {t[1]}) ORDER BY FIELD(item_id, {t[0]}, {t[1]});"
even though I do not have the solution for this in my mind right now. You are welcome to post a solution of this kind if you have it in your mind.

MySQLdb insert into database using lists for field names and values python

I have these two lists:
list1=['a','b','c']
list2=['1','2','3']
I am trying to insert these into a database with field names like so:
a | b | c | d | e
I am currently trying putting these lists as strings and then simply adding in the execute, e.g. cur.execute(insert,(strList1,strList2)) where strList1 and strList2 are just strings of list1 and list2 formed using:
strList1=''
for thing in list1:
    strList1+=thing+','
strList1=strList1[:-1]
My current SQL statement is:
insert="""insert into tbl_name(%s) values(%s)"""
cur.execute(insert,(strList1,strList2))
I also have a follow up question: how could I ensure that say column a needed to be a primary key that on a duplicate entry it would update the other fields if they were blank?
Do not build queries with Python's % string formatting, as this is a security risk: the value is simply inserted into the query string, meaning it can be a whole separate query altogether.
Instead, put a placeholder where you want the value to be ("?" in sqlite3; MySQLdb uses %s as its placeholder) and add a second argument to execute in the form of a tuple, like so
curs.execute("SELECT foo FROM bar WHERE foobar = ?",(some_value,))
Or in a slightly longer example
curs.execute("UPDATE foo SET bar = ? WHERE foobar = ?",(first_value,second_value))
Edit:
Hopefully I understood what you want correctly this time. Sadly, you cannot use "?" for table names, so you are stuck with % string formatting for those. I made a quick little test script.
import sqlite3
list1=['foo','bar','foobar'] #List of tables
list2=['First_value','second_value','Third_value'] #List of values
db_conn = sqlite3.connect("test.db") #I used sqlite to test it quickly
db_curs = db_conn.cursor()
for table in list1: #Create all the tables in the db
    query = "CREATE TABLE IF NOT EXISTS %s(foo text, bar text,foobar text)" % table
    db_curs.execute(query)
db_conn.commit()
for table in list1: #Insert all the values into all the tables
    query = "INSERT INTO %s VALUES (?,?,?)" % table
    db_curs.execute(query,tuple(list2))
db_conn.commit()
for table in list1: #Print all the values out to see if it worked
    db_curs.execute("SELECT * FROM %s" % table)
    fetchall = db_curs.fetchall()
    for entry in fetchall:
        print(entry[0], entry[1], entry[2])
One thing you could do on those lists to make things easier...
list1=['a','b','c']
print(",".join(list1))
#a,b,c
Your insert looks good. Seems like a batch insert would be the only other option.
This is the way prepared statements work (simplified):
* The statement is sent to the database with a list of parameters;
* The statement is retrieved from the statement cache or if not present, prepared and added to the statement cache;
* Parameters are applied;
* Statement is executed.
Statements have to be complete except that the parameters are replaced by %s (or ? or :parm depending on the language used). Parameters are your final numerical/string/date/etc values only. So identifiers (such as table and column names) or other parts of the statement cannot be replaced.
In your case that means:
insert="""insert into tbl_name(%s) values(%s)"""
Should become something like:
insert="""insert into tbl_name(a,b,c) values(%s,%s,%s)"""
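A rough sketch of how the two lists from the question could be turned into such a statement: the column names are checked against a fixed set and interpolated, while the values are bound as parameters. It uses the standard-library sqlite3 module so it runs as-is (an assumption; MySQLdb would take %s instead of ?). KNOWN_COLUMNS and build_insert are hypothetical names.

```python
import sqlite3

# Hypothetical list of columns that tbl_name is known to have
KNOWN_COLUMNS = ("a", "b", "c", "d", "e")


def build_insert(columns, values):
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    unknown = [c for c in columns if c not in KNOWN_COLUMNS]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    col_sql = ", ".join(columns)                    # identifiers: validated, interpolated
    placeholders = ", ".join("?" for _ in values)   # values: one placeholder each
    sql = f"INSERT INTO tbl_name ({col_sql}) VALUES ({placeholders})"
    return sql, tuple(values)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_name (a TEXT PRIMARY KEY, b TEXT, c TEXT, d TEXT, e TEXT)"
)
sql, params = build_insert(["a", "b", "c"], ["1", "2", "3"])
conn.execute(sql, params)
stored = conn.execute("SELECT a, b, c, d FROM tbl_name").fetchone()
```

Columns left out of the list (d and e here) simply receive their default, which is NULL in this table.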
To use parameters, you must provide a "%s" (or %d, whatever) for each item. You can use two lists/tuples as follows:
insert="""insert into tbl_name (%s,%s,%s) values (%s, %s, %s);"""
strList1=('a','b','c')
strList2=(1,2,3)
curs.execute(insert % (strList1 + strList2))
*I'm using Python 3; (strList1, strList2) doesn't work for me, but you might have slight differences.
