I am using Python and PyMySQL. I want to fetch a number of items from a MySQL database according to their ids:
items_ids = tuple([3, 2])
sql = f"SELECT * FROM items WHERE item_id IN {items_ids};"
I am using the formatted string literals (f" ", https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498) to evaluate the tuple inside the SQL statement.
However, I want to get the items back in the order specified by the tuple: first the item with item_id = 3 and then the item with item_id = 2. To accomplish this I have to use the ORDER BY FIELD clause (see also here: Ordering by the order of values in a SQL IN() clause).
But if I write something like this:
items_ids = tuple([3, 2])
sql = f"SELECT * FROM items WHERE item_id IN {items_ids} ORDER BY FIELD{(item_id,) + items_ids};"
then item_id in the ORDER BY FIELD clause is considered as an undeclared python variable
and if I write something like this:
items_ids = tuple([3, 2])
sql = f"SELECT * FROM items WHERE item_id IN {items_ids} ORDER BY FIELD{('item_id',) + items_ids};"
then item_id in the ORDER BY FIELD clause is considered as a string and not as a SQL variable and in this case ORDER BY FIELD does not do anything.
How can I evaluate the tuple (item_id,) + items_ids in the SQL statement by maintaining item_id as a SQL variable in the ORDER BY FIELD clause?
Obviously I can sort the items after they have returned from the database according to items_ids and without bothering so much with MySQL but I was just wondering how to do this.
Please don't use f-strings, or any string formatting, for passing values to SQL queries. That's the road to SQL injection. Now you may be thinking: "it's a tuple of integers, what bad could happen?" First of all a single element Python tuple's string representation is not valid SQL. Secondly, someone may follow the example with user controllable data other than tuples of ints (so having these bad examples online perpetuates the habit). Also the reason why you have to resort to your "cunning" solution is using the wrong tools for the job.
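To make the single-element case concrete, here is a small illustration of what the f-string from the question produces. The variable names are only for demonstration:

```python
# Formatting a one-element tuple into SQL leaves a trailing comma behind.
single = tuple([3])
broken_sql = f"SELECT * FROM items WHERE item_id IN {single};"
print(broken_sql)  # SELECT * FROM items WHERE item_id IN (3,);

# An empty tuple is no better: MySQL also rejects an empty "IN ()" list.
empty_sql = f"SELECT * FROM items WHERE item_id IN {tuple()};"
print(empty_sql)  # SELECT * FROM items WHERE item_id IN ();
```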
The correct way to pass values to SQL queries is to use placeholders. In case of pymysql the placeholder is – a bit confusingly – %s. Don't mix it with manual %-formatting. In case of having to pass a variable amount of values to a query you do have to resort to some string building, but you build the placeholders, not the values:
item_ids = (3, 2)
item_placeholders = ', '.join(['%s'] * len(item_ids))
sql = f"""SELECT * FROM items
WHERE item_id IN ({item_placeholders})
ORDER BY FIELD(item_id, {item_placeholders})"""
# Produces:
#
# SELECT * FROM items
# WHERE item_id IN (%s, %s)
# ORDER BY FIELD(item_id, %s, %s)
with conn.cursor() as cur:
    # Build the argument tuple
    cur.execute(sql, (*item_ids, *item_ids))
    res = cur.fetchall()
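The same idea can be wrapped in a small function. This is only a sketch: build_ordered_in_query is a hypothetical helper name, the table and column names are the ones from the question, and the commented-out part assumes an open PyMySQL connection called conn.

```python
def build_ordered_in_query(item_ids):
    """Return (sql, params) selecting items by id, in the order given."""
    item_ids = tuple(item_ids)
    if not item_ids:
        # An empty "IN ()" list is not valid in MySQL, so refuse early.
        raise ValueError("item_ids must not be empty")
    placeholders = ", ".join(["%s"] * len(item_ids))
    sql = (
        "SELECT * FROM items "
        f"WHERE item_id IN ({placeholders}) "
        f"ORDER BY FIELD(item_id, {placeholders})"
    )
    # The ids are needed twice: once for IN (...), once for FIELD(...).
    return sql, item_ids + item_ids


sql, params = build_ordered_in_query([3, 2])
print(sql)
print(params)  # (3, 2, 3, 2)

# With a PyMySQL connection `conn` (assumed to exist):
# with conn.cursor() as cur:
#     cur.execute(sql, params)
#     rows = cur.fetchall()
```

Only the placeholders are formatted into the statement, so a one-element list works too and the values never touch the SQL text.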
Another, simpler way to work around the single-element tuple problem is to keep the values in a list, check its length, and build the parenthesised string by hand when there is only one element, rather than passing a one-element tuple to the query:
e.g.:
if (len(get_version_list[1])==1):
    port_id=str(port_id[0])
    port_id = '(' + "'" + port_id + "'" + ')'
else:
    port_id=tuple(port_id)
pd.read_sql(sql=get_version_str.format(port_id,src_cd), con=conn)
With the code above you no longer get the (item_id,) error in the SQL :)
A solution with .format() is the following:
items_ids = tuple([3, 2])
items_placeholders = ', '.join(['{}'] * len(items_ids))
sql = "SELECT * FROM items WHERE item_id IN {} ORDER BY FIELD(item_id, {});".format(items_ids, items_placeholders).format(*items_ids)
# with `.format(items_ids, items_placeholders)` you get this: SELECT * FROM items WHERE item_id IN (3, 2) ORDER BY FIELD(item_id, {}, {});
# and then with `.format(*items_ids)` you get this: SELECT * FROM items WHERE item_id IN (3, 2) ORDER BY FIELD(item_id, 3, 2);
A rather tricky solution with f-strings is the following:
sql1 = f"SELECT * FROM items WHERE item_id IN {items_ids} ORDER BY FIELD(item_id, "
sql2 = f"{items_ids};"
sql = sql1 + sql2[1:]
# SELECT * FROM items WHERE item_id IN (3, 2) ORDER BY FIELD(item_id, 3, 2);
But as #IIija mentions, this approach is open to SQL injection, and IN {items_ids} cannot accommodate one-element tuples as such.
Additionally, using f-strings to unpack tuples in strings is perhaps more difficult than using .format(), as others have mentioned before (Formatted string literals in Python 3.6 with tuples), since you cannot use * to unpack a tuple within an f-string. However, perhaps you can come up with a solution (using an iterator?) that produces this:
sql = f"SELECT * FROM items WHERE item_id IN ({t[0]}, {t[1]}) ORDER BY FIELD(item_id, {t[0]}, {t[1]});"
even though I do not have the solution for this in my mind right now. You are welcome to post a solution of this kind if you have it in your mind.
Related
I'm looking for a way to implement alternating SQL queries - i.e. a function that allows me to filter entries based on different columns. Take the following example:
import sqlite3
el=[["a","b",1],["a","b",3]]
def save_sql(foo):
    with sqlite3.connect("fn.db") as db:
        cur=db.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS et"
                    "(var1 VARCHAR, var2 VARCHAR, var3 INT)")
        cur.executemany("INSERT INTO et VALUES "
                        "(?,?,?)", foo)
        db.commit()
def load_sql(v1,v2,v3):
    with sqlite3.connect("fn.db") as db:
        cur=db.cursor()
        cur.execute("SELECT * FROM et WHERE var1=? AND var2=? AND var3=?", (v1,v2,v3))
        return cur.fetchall()
save_sql(el)
Now if I were to use load_sql("a","b",1), it would work. But assume I want to only query for the first and third column, i.e. load_sql("a",None,1) (the None is just intended as a placeholder) or only the last column load_sql(None,None,5), this wouldn't work.
This could of course be done with if statements checking which variables were supplied in the function call, but in tables with larger amounts of columns, this might get messy.
Is there a good way to do this?
What if load_sql() would accept an arbitrary number of keyword arguments, where keyword argument names would correspond to column names. Something along these lines:
def load_sql(**values):
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        query = "SELECT * FROM et"
        conditions = [f"{column_name} = :{column_name}" for column_name in values]
        if conditions:
            query = query + " WHERE " + " AND ".join(conditions)
        cur.execute(query, values)
        return cur.fetchall()
Note that here we trust keyword argument names to be valid and existing column names (and string-format them into the query) which may potentially be used as an SQL injection attack vector.
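One way to narrow that attack surface is to check the keyword names against a fixed set of known columns before formatting them into the query. The sketch below assumes the column set is known in advance (ALLOWED_COLUMNS is a hypothetical name) and uses an in-memory database instead of fn.db so it can run on its own:

```python
import sqlite3

# Assumption: the set of filterable columns is known in advance.
ALLOWED_COLUMNS = {"var1", "var2", "var3"}


def load_sql(db, **values):
    """Filter table `et` by any subset of the allowed columns."""
    unknown = set(values) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    query = "SELECT * FROM et"
    # Only validated column names are formatted in; values stay as parameters.
    conditions = [f"{name} = :{name}" for name in values]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return db.execute(query, values).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE et (var1 VARCHAR, var2 VARCHAR, var3 INT)")
db.executemany("INSERT INTO et VALUES (?, ?, ?)", [("a", "b", 1), ("a", "b", 3)])

print(load_sql(db, var1="a", var3=1))  # [('a', 'b', 1)]
print(len(load_sql(db)))               # 2
```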
As a side note, I cannot stop but think that this feels like a reinventing-the-wheel step towards an actual ORM. Look into lightweight PonyORM or Peewee abstraction layers between Python and a database.
It will inevitably get messy if you want your SQL statements to remain sanitized/safe, but as long as you control your function signature it can remain reasonably safe, e.g.:
def load_sql(var1, var2, var3):
    fields = dict(field for field in locals().items() if field[1] is not None)
    query = "SELECT * FROM et"
    if fields:  # if at least one field is not None:
        query += " WHERE " + " AND ".join((k + "=?" for k in fields.keys()))
    with sqlite3.connect("fn.db") as db:
        cur = db.cursor()
        # sqlite3 needs a real sequence here, not a dict view
        cur.execute(query, list(fields.values()))
        return cur.fetchall()
You can replace the function signature with load_sql(**kwargs) and then use kwargs.items() instead of locals().items() so that you can pass arbitrary column names, but that can be very dangerous and is certainly not recommended.
The overarching question here is how to get a multirow REPLACE INTO statement that works with None in the format "REPLACE INTO ... VALUES (...), (...)".
From https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-executemany.html we have this example where executemany(stmt, params) for INSERT statements ONLY forms the multiple row format:
INSERT INTO employees (first_name, hire_date) VALUES ('Jane', '2005-02-12'), ('Joe', '2006-05-23'), ('John', '2010-10-03')
But for all other statement types, it creates one query per tuple in params. For performance reasons, we want to bundle a REPLACE INTO in this multirow format.
The field list looks like this:
child_item_count, name, type_default, parent_id, version, agent_id, folder_id
and some of them are permitted to be NULL.
Originally, I tried to just build a statement string with all of the tuples, comma-separated, appended to the operational part of the query. Given list_of_tuples looks like [(None,'a string',8,'UUID',190L,'UUID','UUID'),...]:
insert_query = "REPLACE INTO %s ( %s ) VALUES {} " % (table, column_names)
values = ', '.join(map(str, list_of_tuples))
sql = insert_query.format(values)
db_cursor.execute(sql)
but I got:
Error writing to database: OperationalError(1054, "Unknown column 'None' in 'field list'")
I've also tried just shipping the list to execute() as in db_cursor.execute(insert_query, list_of_tuples) and that doesn't work, either. That results in "TypeError('not all arguments converted during string formatting',)"
Warning: Your code contains the possibility of SQL Injection.
The issue is pretty simple:
The map(a, b) function will run the a(el) for each element in b.
In your case, it will get every tuple on the list and convert it to a string, therefore a given tuple (None, 4, 'b') will turn into (None, 4, 'b') - and None is not a valid keyword on MySQL.
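A short illustration of that conversion, using the tuple from the paragraph above:

```python
rows = [(None, 4, "b")]
# str() renders the Python repr of each tuple, so None stays "None".
rendered = ", ".join(map(str, rows))
print(rendered)  # (None, 4, 'b')
```

MySQL then reads the bare word None as a column name, which is where the "Unknown column 'None'" error comes from.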
The best way to fix this is to rely on the execute command to convert the values correctly, making it sql injection free:
import itertools
columns_count = 10 # Change this value according to your columns count
column_values = '(' + ', '.join(['%s'] * columns_count) + ')'
values = ', '.join([column_values]*len(list_of_tuples))
# (...)
insert_query = insert_query.format(values)
db_cursor.execute(insert_query, list(itertools.chain.from_iterable(list_of_tuples)))
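As a self-contained sketch of the same approach: build_multirow_replace is a hypothetical helper, and the folders table with its three columns is made up for the demonstration. It runs against sqlite3, which also accepts REPLACE INTO but uses "?" placeholders; with a MySQL driver the default "%s" would be kept.

```python
import sqlite3
from itertools import chain


def build_multirow_replace(table, columns, rows, placeholder="%s"):
    """Return (sql, params) for one REPLACE INTO ... VALUES (...), (...) statement.

    `table` and `columns` are formatted into the SQL text, so they must come
    from trusted code, never from user input.
    """
    row_group = "(" + ", ".join([placeholder] * len(columns)) + ")"
    sql = "REPLACE INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_group] * len(rows))
    )
    # Flatten the list of tuples into one flat parameter list.
    return sql, list(chain.from_iterable(rows))


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE folders (folder_id TEXT PRIMARY KEY, name TEXT, child_item_count INT)"
)
rows = [("f1", "a string", None), ("f2", "other", 8)]
sql, params = build_multirow_replace(
    "folders", ["folder_id", "name", "child_item_count"], rows, placeholder="?"
)
db.execute(sql, params)
stored = db.execute("SELECT * FROM folders ORDER BY folder_id").fetchall()
print(stored)  # [('f1', 'a string', None), ('f2', 'other', 8)]
```

Because None travels as a parameter, the driver turns it into NULL and no quoting has to be done by hand.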
Although there is a second option (below), it would make your code vulnerable to SQL injection, so don't use it.
It simply converts the values directly, making the necessary adjustments (in this specific scenario, just changing None to NULL):
values = ', '.join('('+', '.join(map(lambda e: str(e) if e is not None else "NULL", t))+')' for t in list_of_tuples)
I want to add another condition to this WHERE clause:
stmt = 'SELECT account_id FROM asmithe.data_hash WHERE percent < {};'.format(threshold)
I have the variable juris which is a list. The value of account_id and juris are related in that when an account_id is created, it contains the substring of a juris.
I want to add to the query the condition that it needs to match any one of the juris elements. Normally I would just add ...AND account_id LIKE '{}%'".format(juris) but this doesn't work because juris is a list.
How do I add all elements of a list to the WHERE clause?
Use Regex with operator ~:
juris = ['2','7','8','3']
'select * from tbl where id ~ \'^({})\''.format('|'.join(juris))
which leads to this query:
select * from tbl where id ~ '^(2|7|8|3)'
This returns the rows whose id starts with any of 2, 7, 8 or 3. Here is a fiddle for it.
If you want the id to start with 2783, use:
select * from tbl where id ~ '^2783'
and if the id should contain any of 2, 7, 8 or 3:
select * from tbl where id ~ '.*(2|7|8|3).*'
Stop using string formatting with SQL. Right now. Understand?
OK now. There's a construct, ANY in SQL, that lets you take an operator and apply it to an array. psycopg2 supports passing a Python list as an SQL ARRAY[]. So in this case you can just
curs.execute('SELECT account_id FROM asmithe.data_hash WHERE percent LIKE ANY (%s)', (thelist,))
Note here that %s is the psycopg2 query-parameter placeholder. It's not actually a format specifier. The second argument is a tuple, the query parameters. The first (and only) parameter is the list.
There's also ALL, which works like ANY but is true only if all the matches are true, not just if one or more is true.
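Applied to the question's prefix match, a sketch could look like the following. prefix_patterns and accounts_below are hypothetical helper names, curs is assumed to be an open psycopg2 cursor, and escaping the LIKE wildcards with a backslash relies on PostgreSQL's default LIKE escape character.

```python
def prefix_patterns(prefixes):
    """Turn each prefix into a LIKE pattern, escaping LIKE wildcards."""
    patterns = []
    for prefix in prefixes:
        escaped = (
            prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        patterns.append(escaped + "%")
    return patterns


def accounts_below(curs, threshold, juris):
    """Hypothetical helper: `curs` is assumed to be an open psycopg2 cursor."""
    curs.execute(
        "SELECT account_id FROM asmithe.data_hash "
        "WHERE percent < %s AND account_id LIKE ANY (%s)",
        # psycopg2 adapts the Python list to an SQL array.
        (threshold, prefix_patterns(juris)),
    )
    return curs.fetchall()


print(prefix_patterns(["2", "7", "8", "3"]))  # ['2%', '7%', '8%', '3%']
```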
I am hoping juris is a list of strings? If so, this might help:
myquery = ("SELECT accountid FROM asmithe.data_hash "
"WHERE percent in (%s)" % ",".join(map(str,juris)))
See these links:
python list in sql query as parameter
How to select item matching Only IN List in sql server
String formatting operations
I would like to use a dictionary to insert values into a table, how would I do this?
import sqlite3
db = sqlite3.connect('local.db')
cur = db.cursor()
cur.execute('DROP TABLE IF EXISTS Media')
cur.execute('''CREATE TABLE IF NOT EXISTS Media(
id INTEGER PRIMARY KEY, title TEXT,
type TEXT, genre TEXT,
onchapter INTEGER, chapters INTEGER,
status TEXT
)''')
values = {'title':'jack', 'type':None, 'genre':'Action', 'onchapter':None,'chapters':6,'status':'Ongoing'}
#What would I Replace x with to allow a
#dictionary to connect to the values?
cur.execute('INSERT INTO Media VALUES (NULL, x)', values)
cur.execute('SELECT * FROM Media')
media = cur.fetchone()
print(media)
If you're trying to use a dict to specify both the column names and the values, you can't do that, at least not directly.
That's really inherent in SQL. If you don't specify the list of column names, you have to specify the values in CREATE TABLE order, which you can't rely on a dict for, because a dict has no guaranteed order (before Python 3.7). If you really wanted to, of course, you could use a collections.OrderedDict, make sure it's in the right order, and then just pass values.values(). But at that point, why not just have a list (or tuple) in the first place? If you're absolutely sure you've got all the values, in the right order, and you want to refer to them by order rather than by name, what you have is a list, not a dict.
And there's no way to bind column names (or table names, etc.) in SQL, just values.
You can, of course, generate the SQL statement dynamically. For example:
columns = ', '.join(values.keys())
placeholders = ', '.join('?' * len(values))
sql = 'INSERT INTO Media ({}) VALUES ({})'.format(columns, placeholders)
values = [int(x) if isinstance(x, bool) else x for x in values.values()]
cur.execute(sql, values)
However, this is almost always a bad idea. This really isn't much better than generating and execing dynamic Python code. And you've just lost all of the benefits of using placeholders in the first place—primarily protection from SQL injection attacks, but also less important things like faster compilation, better caching, etc. within the DB engine.
It's probably better to step back and look at this problem from a higher level. For example, maybe you didn't really want a static list of properties, but rather a name-value MediaProperties table? Or, alternatively, maybe you want some kind of document-based storage (whether that's a high-powered nosql system, or just a bunch of JSON or YAML objects stored in a shelve)?
An alternative using named placeholders:
columns = ', '.join(my_dict.keys())
placeholders = ':'+', :'.join(my_dict.keys())
query = 'INSERT INTO my_table (%s) VALUES (%s)' % (columns, placeholders)
print(query)
cur.execute(query, my_dict)
con.commit()
There is a solution for using dictionaries. First, the SQL statement
INSERT INTO Media VALUES (NULL, 'x');
would not work, as it assumes you are referring to all columns, in the order they are defined in the CREATE TABLE statement, as abarnert stated. (See SQLite INSERT.)
Once you have fixed it by specifying the columns, you can use named placeholders to insert data. The advantage of this is that it safely escapes key characters, so you do not have to worry. From the Python sqlite-documentation:
values = {
    'title':'jack', 'type':None, 'genre':'Action',
    'onchapter':None,'chapters':6,'status':'Ongoing'
}
# id is omitted so that SQLite assigns it; every named placeholder has a key in values
cur.execute(
    '''INSERT INTO Media (title, type, genre, onchapter, chapters, status)
    VALUES (:title, :type, :genre, :onchapter, :chapters, :status);''',
    values
)
You could use named parameters:
cur.execute('INSERT INTO Media VALUES (NULL, :title, :type, :genre, :onchapter, :chapters, :status)', values)
This still depends on the column order in the INSERT statement (those : are only used as keys in the values dict) but it at least gets away from having to order the values on the python side, plus you can have other things in values that are ignored here; if you're pulling what's in the dict apart to store it in multiple tables, that can be useful.
If you still want to avoid duplicating the names, you could extract them from an sqlite3.Row result object, or from cur.description, after doing a dummy query; it may be saner to keep them around in python form near wherever you do your CREATE TABLE.
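The cur.description idea could be sketched as follows. This is an illustration rather than a definitive implementation: it uses an in-memory database instead of local.db, and table_columns and columns are names introduced here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute(
    """CREATE TABLE Media(
        id INTEGER PRIMARY KEY, title TEXT,
        type TEXT, genre TEXT,
        onchapter INTEGER, chapters INTEGER,
        status TEXT
    )"""
)

values = {'title': 'jack', 'type': None, 'genre': 'Action',
          'onchapter': None, 'chapters': 6, 'status': 'Ongoing'}

# A dummy query that returns no rows still populates cur.description.
cur.execute("SELECT * FROM Media LIMIT 0")
table_columns = [col[0] for col in cur.description]

# Keep only keys that are real columns, in table order.
columns = [name for name in table_columns if name in values]
sql = "INSERT INTO Media ({}) VALUES ({})".format(
    ", ".join(columns), ", ".join(":" + name for name in columns)
)
cur.execute(sql, values)
db.commit()

cur.execute("SELECT * FROM Media")
row = cur.fetchone()
print(row)  # (1, 'jack', None, 'Action', None, 6, 'Ongoing')
```

Since the column names come from the table itself, a stray key in the dict never reaches the SQL text.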
Here's a more generic way with the benefit of escaping:
# One way. If keys can be corrupted don't use.
sql = 'INSERT INTO demo ({}) VALUES ({})'.format(
    ','.join(my_dict.keys()),
    ','.join(['?']*len(my_dict)))
# Another, better way. Hardcoded w/ your keys.
sql = 'INSERT INTO demo ({}) VALUES ({})'.format(
    ','.join(my_keys),
    ','.join(['?']*len(my_dict)))
cur.execute(sql, tuple(my_dict.values()))
key_lst = ('status', 'title', 'chapters', 'onchapter', 'genre', 'type')
cur.execute('INSERT INTO Media (status,title,chapters,onchapter,genre,type) VALUES ' +
            '(?,?,?,?,?,?);', tuple(values[k] for k in key_lst))
Do your escaping right.
You probably also need a commit call in there someplace.
Super late to this, but figured I would add my own answer. Not an expert, but something I found that works.
There are issues with preserving order when using a dictionary, which other users have stated, but you could do the following:
# We're going to use a list of dictionaries, since that's what I'm having to use in my problem
input_list = [{'a' : 1 , 'b' : 2 , 'c' : 3} , {'a' : 14 , 'b' : '' , 'c' : 43}]
for i in input_list:
    # I recommend putting this inside a function, this way if this
    # Evaluates to None at the end of the loop, you can exit without doing an insert
    if i :
        input_dict = i
    else:
        input_dict = None
        continue
# I am noting here that in my case, I know all columns will exist.
# If you're not sure, you'll have to get all possible columns first.
keylist = list(input_dict.keys())
vallist = list(input_dict.values())
query = 'INSERT INTO example (' +','.join( ['[' + i + ']' for i in keylist]) + ') VALUES (' + ','.join(['?' for i in vallist]) + ')'
items_to_insert = list(tuple(x.get(i , '') for i in keylist) for x in input_list)
# Making sure to preserve insert order.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.executemany(query , items_to_insert)
conn.commit()
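A compact, runnable variant of the same idea is sketched below. It assumes the column list is fixed in code rather than read from the data, creates the example table itself, and lets missing keys become None (stored as NULL) instead of an empty string:

```python
import sqlite3

input_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 14, 'c': 43}]

# Assumption: the column list is fixed in code, not taken from the data.
columns = ['a', 'b', 'c']
query = 'INSERT INTO example ({}) VALUES ({})'.format(
    ', '.join(columns), ', '.join('?' for _ in columns)
)
# Missing keys become None, which sqlite3 stores as NULL; empty dicts are skipped.
rows = [tuple(d.get(col) for col in columns) for d in input_list if d]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE example (a, b, c)')
conn.executemany(query, rows)
conn.commit()
stored = conn.execute('SELECT * FROM example').fetchall()
print(stored)  # [(1, 2, 3), (14, None, 43)]
```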
dictionary = {'id':123, 'name': 'Abc', 'address':'xyz'}
query = "insert into table_name " + str(tuple(dictionary.keys())) + " values" + str(tuple(dictionary.values())) + ";"
cursor.execute(query)
query becomes
insert into table_name ('id', 'name', 'address') values(123, 'Abc', 'xyz');
I was having a similar problem, so I created a string first and then passed that string to the execute command. It does take longer to execute, but the mapping was perfect for me. Just a workaround:
create_string = "INSERT INTO datapath_rtg( Sr_no"
for key in record_tab:
    create_string = create_string+ " ," + str(key)
create_string = create_string+ ") VALUES("+ str(Sr_no)
for key in record_tab:
    create_string = create_string+ " ," + str(record_tab[key])
create_string = create_string + ")"
cursor.execute(create_string)
By doing the above I ensured that if my dict (record_tab) doesn't contain a particular field, the script won't throw an error and the mapping is still correct, which is why I used a dictionary in the first place.
I was having a similar problem and ended up with something not entirely unlike the following (Note - this is the OP's code with bits changed so that it works in the way they requested)-
import sqlite3
db = sqlite3.connect('local.db')
cur = db.cursor()
cur.execute('DROP TABLE IF EXISTS Media')
cur.execute('''CREATE TABLE IF NOT EXISTS Media(
id INTEGER PRIMARY KEY, title TEXT,
type TEXT, genre TEXT,
onchapter INTEGER, chapters INTEGER,
status TEXT
)''')
values = {'title':'jack', 'type':None, 'genre':'Action', 'onchapter':None,'chapters':6,'status':'Ongoing'}
#What would I Replace x with to allow a
#dictionary to connect to the values?
#cur.execute('INSERT INTO Media VALUES (NULL, x)'), values)
# Added code.
cur.execute('SELECT * FROM Media')
colnames = cur.description
list = [row[0] for row in cur.description]
new_list = [values[i] for i in list if i in values.keys()]
sql = "INSERT INTO Media VALUES ( NULL, "
qmarks = ', '.join('?' * len(values))
sql += qmarks + ")"
cur.execute(sql, new_list)
#db.commit() #<-Might be important.
cur.execute('SELECT * FROM Media')
media = cur.fetchone()
print (media)
I have a large SQLite database with a mix of text and lots of other columns var1 ... var 50. Most of these are numeric, though some are text based.
I am trying to extract data from the database, process it in python and write it back - I need to do this for all rows in the db.
So far, the below sort of works:
# get row using select and process
fields = (','.join(keys)) # "var1, var2, var3 ... var50"
results = ','.join([results[key] for key in keys]) # "value for var1, ... value for var50"
cur.execute('INSERT OR REPLACE INTO results (id, %s) VALUES (%s, %s);' %(fields, id, results))
This however, nulls the columns that I don't explicitly add back. I can fix this by re-writing the code, but this feels quite messy, as I would have to surround with quotes using string concatenation and rewrite data that was there to begin with (i.e. the columns I didn't change).
Apparently the way to run updates on rows is something like this:
update table set var1 = 4, var2 = 5, var3="some text" where id = 666;
Presumably the way for me would be to run map , and add the = signs somehow (not sure how), but how would I quote all of the results appropriately (Since I would have to quote the text fields, and they might contain quotes within them too .. )?
I'm a bit confused. Any pointers would be very helpful.
Thanks!
As others have stressed, use parametrized arguments. Here is an example of how you might construct the SQL statement when it has a variable number of keys:
sql=('UPDATE results SET '
     + ', '.join(key+' = ?' for key in keys)
     + ' WHERE id = ?')
args = [results[key] for key in keys] + [id]
cur.execute(sql,args)
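A self-contained sketch of this approach, using an in-memory table with three made-up columns in place of the real results table. The names changes and row_id are introduced here for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, var1, var2, var3)")
conn.execute("INSERT INTO results VALUES (666, 1, 2, 'old text')")

# Only the columns being changed; var3 is left alone.
# Assumption: the keys come from trusted code, since they are formatted into the SQL.
changes = {"var1": 4, "var2": 5}
row_id = 666

sql = (
    "UPDATE results SET "
    + ", ".join(key + " = ?" for key in changes)
    + " WHERE id = ?"
)
conn.execute(sql, list(changes.values()) + [row_id])
conn.commit()

updated = conn.execute("SELECT * FROM results WHERE id = ?", (row_id,)).fetchone()
print(updated)  # (666, 4, 5, 'old text')
```

Unlike INSERT OR REPLACE, the UPDATE leaves every column that is not listed untouched, and text values containing quotes need no manual escaping.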
Use parameter substitution. It's more robust (and safer I think) than string formatting.
So you could do something like
query = 'UPDATE TABLE SET ' + ', '.join(str(f) + '=?' for f in fields) + ';'
Or alternatively
query = 'UPDATE TABLE SET %s;' % (', '.join(str(f) + '=?' for f in fields))
Or using new style formatting:
query = 'UPDATE TABLE SET {0};'.format(', '.join(str(f) + '=?' for f in fields))
So the complete program would look something like this:
vals = {'var1': 'foo', 'var2': 3, 'var24':999}
fields = list(vals.keys())
results = list(vals.values())
query = 'UPDATE TABLE SET {0};'.format(', '.join(str(f) + '=?' for f in fields))
conn.execute(query, results)
And that should work - and I presume do what you want it to.
You don't have to care about things like quotations etc, and in fact you shouldn't. If you do it like this, it's not only more convenient but also takes care of security issues known as sql injections:
sql = "update table set var1=%s, var2=%s, var3=%s where id=666"
cursor.execute(sql, (4, 5, "some text"))
The key point here is that the SQL and the values in the second statement aren't separated by a "%", but by a ",": this is not string manipulation; instead you pass two arguments to the execute function, the actual SQL and the values. Each %s is replaced by a value from the value tuple. The database driver then knows how to take care of the individual types of the values.
The insert statement can be rewritten the same way, although I'm not sure and currently can't test whether you can also replace field names that way (the first %s in your insert-sql statement).
So to come back to your overall problem, you can loop over your values and dynamically add ", var%d=%%s" % i for your i-th variable while adding the actual value to a list at the same time.