I'm generating random data to fill a database (without knowing what the database looks like before runtime). I can fill it if it has no constraints, but when it has them I can't differentiate between values that pass the check and values that don't.
Let's see an example. Table definition:
CREATE TABLE test (
    id INT,
    age INT CONSTRAINT adult CHECK (age > 18),
    PRIMARY KEY (id)
);
The data of that table that I have at runtime is:
Table and column names
Column types
Column UNIQUE and NOT NULL flags
Column constraint definition as a string
Foreign keys
I can get more data from the PostgreSQL internal tables, preferably from the information schema.
I want to check the constraint before making an insert with that data. It's fine for me to check it using the database, or to check it in code.
Here is a short snippet; the goal is to detect that the check is false before executing the insert query:
# Data you have access to:
t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}
# You can access more data,
# but you have to query the database to do so.

id_value = 1
# I want to check values HERE
age_value = 17
# I want to check values HERE

values = (id_value, age_value)
# I could want to check HERE

query = "INSERT INTO test (id, age) VALUES (%s, %s);"
db_cursor.execute(query, values)
db_cursor.close()
Because of how data is generated in my application, handling the error thrown while/after executing the insert query is not an option; it would increase the cost of generating random data dramatically.
EDIT to explain why try: is not an option:
If I wait for the exception, the problematic element that provokes the error would already be in multiple queries.
Let's see in the previous example how this could happen. I generate a random data pool to pick from and generate tuples of insert values:
age_pool = (7, 19, 23, 48)
id_pool = (0, 2, 3, ..., 99)  # not really random, for easier understanding
Now suppose I generate 100 insert queries and 25% of them have a 7 in them (an age < 18). From a single value I have 25 invalid queries that will be executed against the database (a costly operation, by the way) only to fail hopelessly. After that I would have to generate more random data, in this case 25 more insert queries, which could have the same problem if I generate an 8, for example.
On the other hand, if I check just after generating the element, I know whether it's a valid value, and from one single valid element I get multiple valid combinations of values.
You could use eval():
def constraint_check(constraints, keys, values):
    vals = dict(zip(keys, values))
    for k, v in constraints.items():
        if v and not eval(v.replace(k, str(vals[k]))):
            return False
    return True
t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}

id_value = 1
age_value = 17
values = (id_value, age_value)

if constraint_check(col_constraints, ('id', 'age'), values):
    query = "INSERT INTO test (id, age) VALUES (%s, %s);"
    db_cursor.execute(query, values)
However, this will work well only for very simple constraints. A Postgres check expression may include constructs specific to Postgres and unknown to Python. For example, the app fails with this obviously valid constraint:
create table test(
    id int primary key,
    age int check(age between 18 and 60));
I do not think you can implement the complete Postgres expression parser in Python in an easy way, nor that doing so would be worth the effort for the intended result.
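Since the question allows checking against the database, one workaround is to let Postgres itself evaluate the check expression against the candidate value, without touching the table. Below is a minimal sketch, assuming psycopg2 and that the column name appears verbatim in the constraint string; constraint_check_db is a hypothetical helper, not part of any library:
def constraint_check_db(db_cursor, constraint_expr, col_name, value):
    """Ask Postgres to evaluate a check expression for a single candidate value."""
    if not constraint_expr:
        return True
    # Replace the column name with a placeholder; the expression itself comes
    # from the database catalog, so it is not arbitrary user input.
    expr = constraint_expr.replace(col_name, "%s")
    db_cursor.execute("SELECT " + expr, (value,) * expr.count("%s"))
    return db_cursor.fetchone()[0]

# Usage with the data from the question:
if constraint_check_db(db_cursor, col_constraints['age'], 'age', age_value):
    db_cursor.execute(query, values)
This keeps the expression semantics entirely in Postgres (BETWEEN, type casts, etc.) at the cost of one lightweight SELECT per candidate value, which is still far cheaper than a failed INSERT.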
It's not clear why a try...except clause is not desired. You test for the precise exception and keep going.
How about:
problem_inserts = []
try:
    db_cursor.execute(query, values)
    db_cursor.close()
except <your exception here>:
    problem_inserts.append(query)
In this snippet, you keep a list of all queries that didn't go through properly. I don't know what else you can do. I don't think you want to change the data to make it fit into the table.
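As a possible concretisation (an assumption on my part, since the question targets Postgres): with psycopg2, a CHECK violation raises psycopg2.errors.CheckViolation, a subclass of psycopg2.IntegrityError, and the failed statement aborts the current transaction, so a rollback is needed before continuing:
import psycopg2

problem_inserts = []
try:
    db_cursor.execute(query, values)
except psycopg2.IntegrityError:
    # The failed INSERT leaves the transaction in an aborted state.
    db_cursor.connection.rollback()
    problem_inserts.append((query, values))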
I have two tables, BloodBank(id, name, phone, address) and BloodStock(id, a_pos, b_pos, a_neg, b_neg, bloodbank_id). I want to fetch all the columns from both tables where a variable column name (say bloodgroup, holding values like a_pos or a_neg) has a value greater than 0. How can I write the ORM query for this?
The SQL query to get the required results is written like this:
sql="select * from public.bloodbank_bloodbank as bb, public.bloodbank_bloodstock as bs where bs."+blood+">0 and bb.id=bs.bloodbank_id order by bs."+blood+" desc;"
cursor = connection.cursor()
cursor.execute(sql)
bloodbanks = cursor.fetchall()
You could be more specific in your question, but I believe you have a variable called blood which contains the string name of the column, and that the columns a_pos, b_pos, etc. are numeric.
You can use a dictionary to create keyword arguments from strings:
filter_dict = {'bloodstock__' + blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_dict)
This will get you Bloodbank objects that have a related bloodstock with a greater than zero value in the bloodgroup represented by the blood variable.
Note that the way I have written this, you don't get the bloodstock columns selected, and you may get duplicate bloodbanks. If you want to eliminate duplicate bloodbanks you can add .distinct() to your query. The bloodstocks are available for each bloodbank instance using .bloodstock_set.all().
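A small usage sketch putting those pieces together (model and field names are assumed from the question; bloodstock_set is Django's default reverse accessor):
filter_dict = {'bloodstock__' + blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_dict).distinct()  # drop duplicate banks

for bank in bloodbanks:
    stocks = bank.bloodstock_set.all()  # related Bloodstock rows for this bank
    print(bank.name, [getattr(stock, blood) for stock in stocks])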
The ORM will generate SQL using a join. Alternatively, you can do an EXISTS in the where clause and no join.
from django.db.models import Exists, OuterRef

filter_dict = {blood + '__gt': 0}
exists = Exists(Bloodstock.objects.filter(
    bloodbank_id=OuterRef('id'),
    **filter_dict
))
bloodbanks = Bloodbank.objects.filter(exists)
There will be no need for a .distinct() in this case.
So, I'm trying to use sqlite3 and there seems to be a problem when I run a SELECT query. I'm not too familiar with it, so I was wondering where the problem is:
def show_items():
    var = cursor.execute("SELECT Cost FROM Items WHERE ID = A01")
    for row in cursor.fetchall():
        print(row)
When I run this (hopefully asking for a cost value where the ID = A01), I get the error:
sqlite3.OperationalError: no such column: A01
But I wasn't asking it to look in a column A01; I was asking it to look in the column 'Cost'?
If you're looking for a string value in a column, you have to wrap it in single quotes ('), otherwise it will be interpreted as a column name:
var = cursor.execute("SELECT Cost FROM Items WHERE ID = 'A01'")
Update 2021-02-10:
Since this Q&A gets so much attention, I think it's worth editing to let you readers know about prepared statements.
Not only will they help you avoid SQL injections, they might in some cases even speed up your queries, and you will no longer have to worry about those single quotes around strings, as the DB library will take care of it for you.
Let's assume we have the query above, and our value A01 is stored in a variable value.
You could write:
var = cursor.execute("SELECT Cost FROM Items WHERE ID = '{}'".format( value ))
And as a prepared statement it will look like this:
var = cursor.execute("SELECT Cost FROM Items WHERE ID = ?", (value,))
Notice that the cursor.execute() method accepts a second parameter, which must be a sequence (a tuple or a list). Since we have only a single value, you might miss the , in (value,), which effectively turns the single value into a tuple.
If you want to use a list instead of a tuple the query would look like this:
var = cursor.execute("SELECT Cost FROM Items WHERE ID = ?", [value])
When working with multiple values, just make sure the number of ? and the number of values in your sequence match up:
cursor.execute("SELECT * FROM students WHERE ID=? AND name=? AND age=?", (123, "Steve", 17))
You could also use named-style parameters, where instead of a tuple or list, you use a dictionary as parameter:
d = { "name": "Steve", "age": 17, "id": 123 }
cursor.execute("SELECT * FROM students WHERE ID = :id AND name = :name AND age = :age", d)
If you want to delete data in sqlite:
data_list = ['1', 'apple']
cnt.execute("DELETE FROM to_do_data WHERE task=?", (data_list[1],))
In Python 2.7, I have a dictionary with features' IDs as keys.
There are thousands of features.
Each feature has a single value, but this value is a tuple containing 6 parameters for the feature (for example: size, color, etc.).
On the other hand, I have a PostgreSQL table in a database where these features' parameters must be saved.
The features' IDs are already set in the table (as well as other information about these features).
The IDs are unique (they are random, thus not serial, but unique numbers).
There are 6 empty columns with names: "param1", "param2", "param3", ..., "param6".
I already have a tuple containing these names:
columns = ("param1", "param2", "param3", ..., "param6")
The code I have doesn't work for saving these parameters in their respective columns for each feature:
# "view" is the dictionary with features's ID as keys()
# and their 6 params stored in values().
values = [view[i] for i in view.keys()]
columns = ("param1","param2","param3","param4","param5","param6")
conn = psycopg2.connect("dbname=mydb user=username password=password")
curs = conn.cursor()
curs.execute("DROP TABLE IF EXISTS mytable;")
curs.execute("CREATE TABLE IF NOT EXISTS mytable (LIKE originaltable including defaults including constraints including indexes);")
curs.execute("INSERT INTO mytable SELECT * from originaltable;")
insertstatmnt = 'INSERT INTO mytable (%s) values %s'
alterstatement = ('ALTER TABLE mytable '+
'ADD COLUMN param1 text,'+
'ADD COLUMN param2 text,'+
'ADD COLUMN param3 real,'+
'ADD COLUMN param4 text,'+
'ADD COLUMN param5 text,'+
'ADD COLUMN param6 text;'
)
curs.execute(alterstatement) # It's working up to this point.
curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values))) # The problem seems to be here.
conn.commit() # Making change to DB !
curs.close()
conn.close()
Here's the error I have:
curs.execute(insert_statement, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values)))
ProgrammingError: INSERT has more expressions than target columns
I must be missing something.
How to do that properly?
When using '%s' to build the statement the way I think you want, you just need to change a couple of things.
Ignoring curs.execute(), this statement is by no means wrong, but it does not produce what you are looking for. Using my own version, this is what I got with that statement. I also ignored psycopg2.extensions.AsIs() because it is just an adapter conforming to the ISQLQuote protocol, useful for objects whose string representation is already a valid SQL representation.
>>> values = [i for i in range(0, 5)]  # since I don't know the keys, I just made up values
>>> insertstatmnt, (','.join(columns), tuple(values))
('INSERT INTO mytable (%s) values %s', ('param1,param2,param3,param4,param5,param6', (0, 1, 2, 3, 4)))
As you can see, what you entered returns a tuple with the values.
>>> insertstatmnt % (','.join(columns), tuple(values))
'INSERT INTO mytable (param1,param2,param3,param4,param5,param6) values (0, 1, 2, 3, 4)'
Whereas this returns a string that is more likely to be readable by SQL. The values obviously do not match your specified ones. I believe the problem you have lies within creating your string.
Reference for psycopg2: http://initd.org/psycopg/docs/extensions.html
As I took the syntax of the psycopg2 command from this thread:
Insert Python Dictionary using Psycopg2
and as my values dictionary doesn't exactly follow the same structure as the mentioned example (I also have 1 key as ID, like in that example, but mine has only 1 corresponding value, a tuple containing my 6 parameters, thus "nested 1 level deeper" instead of directly 6 values corresponding to the keys), I need to loop through all features and execute one SQL statement per feature:
for i in tuple(values):
    curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(', '.join(columns)), i))
This is working.
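As a possible optimization (a sketch on my side, not part of the accepted approach): psycopg2's extras module provides execute_values(), which sends all rows in a single statement instead of one execute() per feature. This assumes the 6-parameter tuples in values line up with the columns tuple:
from psycopg2 import extras

insert_sql = 'INSERT INTO mytable ({}) VALUES %s'.format(', '.join(columns))
# execute_values expands the single %s into one row of placeholders per tuple.
extras.execute_values(curs, insert_sql, values)
conn.commit()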
I'd like to append to an existing table, using pandas df.to_sql() function.
I set if_exists='append', but my table has primary keys.
I'd like to do the equivalent of insert ignore when trying to append to the existing table, so I would avoid a duplicate entry error.
Is this possible with pandas, or do I need to write an explicit query?
There is unfortunately no option to specify "INSERT IGNORE". This is how I got around that limitation to insert only the rows that were not duplicates (the dataframe name is df):
from sqlalchemy.exc import IntegrityError  # assuming a SQLAlchemy engine is used

for i in range(len(df)):
    try:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
    except IntegrityError:
        pass  # or any other action
You can do this with the method parameter of to_sql:
from sqlalchemy.dialects.mysql import insert

def insert_on_duplicate(table, conn, keys, data_iter):
    insert_stmt = insert(table.table).values(list(data_iter))
    on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(insert_stmt.inserted)
    conn.execute(on_duplicate_key_stmt)
df.to_sql('trades', dbConnection, if_exists='append', chunksize=4096, method=insert_on_duplicate)
for older versions of sqlalchemy, you need to pass a dict to on_duplicate_key_update. i.e., on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(dict(insert_stmt.inserted))
Please note that if_exists='append' relates to the existence of the table and what to do in case the table does not exist.
if_exists does not relate to the content of the table.
see the doc here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
Pandas has no option for it currently, but here is the Github issue. If you need this feature too, just upvote it.
The for-loop method above slows things down significantly. There's a method parameter you can pass to pandas.DataFrame.to_sql to customize the insert statement:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql
The code below should work for Postgres and do nothing if there's a conflict with the primary key "unique_code". Change the insert dialect for your db.
def insert_do_nothing_on_conflicts(sqltable, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    sqltable : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    from sqlalchemy.dialects.postgresql import insert
    from sqlalchemy import table, column

    columns = []
    for c in keys:
        columns.append(column(c))

    if sqltable.schema:
        table_name = '{}.{}'.format(sqltable.schema, sqltable.name)
    else:
        table_name = sqltable.name

    mytable = table(table_name, *columns)
    insert_stmt = insert(mytable).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing(index_elements=['unique_code'])
    conn.execute(do_nothing_stmt)
df.to_sql('mytable', con=sql_engine, if_exists='append', method=insert_do_nothing_on_conflicts)
Pandas doesn't support editing the actual SQL syntax of the .to_sql method, so you might be out of luck. There's some experimental programmatic workarounds (say, read the Dataframe to a SQLAlchemy object with CALCHIPAN and use SQLAlchemy for the transaction), but you may be better served by writing your DataFrame to a CSV and loading it with an explicit MySQL function.
CALCHIPAN repo: https://bitbucket.org/zzzeek/calchipan/
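For the CSV route, a rough sketch might look like the following (my own illustration, not from the answer above; it assumes a DB-API connection conn, a target MySQL table named trades, and that LOCAL INFILE is enabled on both client and server; the IGNORE keyword skips rows that would violate the primary key):
# Dump the DataFrame without index or header, then bulk-load it, ignoring duplicates.
df.to_csv("trades.csv", index=False, header=False)

cur = conn.cursor()
cur.execute(
    "LOAD DATA LOCAL INFILE 'trades.csv' "
    "IGNORE INTO TABLE trades "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
)
conn.commit()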
I had trouble where I was still getting the IntegrityError.
Strange, but I just took the above and worked it backwards:
for i, row in df.iterrows():
    sql = "SELECT * FROM `Table_Name` WHERE `key` = '{}'".format(row.Key)
    found = pd.read_sql(sql, con=Engine)
    if len(found) == 0:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
In my case, I was trying to insert new data into an empty table, but some of the rows were duplicated, almost the same issue as here. I "might" have thought about fetching the existing data and merging it with the new data, but this is not optimal and may work only for small data, not huge tables.
As pandas does not provide any kind of handling for this situation right now, I was looking for a suitable workaround, so I made my own; I'm not sure whether it will work for you. I decided to control my data first instead of relying on luck: I remove duplicates before calling .to_sql, so if any error happens, I know more about my data and what is going on:
import pandas as pd

def write_to_table(table_name, data):
    # Sort by price, so we keep only the lowest after dropping duplicates
    data.sort(key=lambda row: row['price'])
    df = pd.DataFrame(data)
    df.drop_duplicates(subset=['id_key'], keep='first', inplace=True)
    df.to_sql(table_name, engine, index=False, if_exists='append', schema='public')
So in my case, I wanted to keep the row with the lowest price (by the way, I was passing a list of dicts as data), and for that I did the sorting first. It isn't strictly necessary, but it's an example of what I mean by controlling the data I want to keep.
I hope this will help someone in almost the same situation as mine.
When you use SQL Server you'll get a SQL error when you enter a duplicate value into a table that has a primary key constraint. You can fix it by creating (or altering) the table with IGNORE_DUP_KEY = ON:
CREATE TABLE [dbo].[DeleteMe](
    [id] [uniqueidentifier] NOT NULL,
    [Value] [varchar](max) NULL,
    CONSTRAINT [PK_DeleteMe]
        PRIMARY KEY ([id] ASC)
        WITH (IGNORE_DUP_KEY = ON));  -- <-- add this option
Taken from https://dba.stackexchange.com/a/111771.
Now your df.to_sql() should work again.
The solutions by Jayen and Huy Tran helped me a lot, but they didn't work straight out of the box. The problem I faced with Jayen's code is that it requires the DataFrame columns to be exactly those of the database table. This was not true in my case, as there were some DataFrame columns that I won't write to the database.
I modified the solution so that it considers the column names.
from sqlalchemy.dialects.mysql import insert
import itertools

def insertWithConflicts(sqltable, conn, keys, data_iter):
    """
    Execute SQL statement inserting data, whilst taking care of conflicts
    Used to handle duplicate key errors during database population

    This is my modification of the code snippet
    from https://stackoverflow.com/questions/30337394/pandas-to-sql-fails-on-duplicate-primary-key
    The help page from https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.values
    proved useful.

    Parameters
    ----------
    sqltable : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted. It is a zip object.
        Its length is equal to the chunksize passed in df.to_sql()
    """
    vals = [dict(zip(z[0], z[1])) for z in zip(itertools.cycle([keys]), data_iter)]
    insertStmt = insert(sqltable.table).values(vals)
    doNothingStmt = insertStmt.on_duplicate_key_update(dict(insertStmt.inserted))
    conn.execute(doNothingStmt)
I faced the same issue and I adopted the solution provided by #Huy Tran for a while, until my tables started to have schemas.
I had to improve his answer a bit and this is the final result:
from sqlalchemy import table, column
from sqlalchemy.dialects.postgresql import insert  # assuming Postgres, as in the answer this builds on

def do_nothing_on_conflicts(sql_table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    sql_table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    columns = []
    for c in keys:
        columns.append(column(c))

    if sql_table.schema:
        my_table = table(sql_table.name, *columns, schema=sql_table.schema)
        # table_name = '{}.{}'.format(sql_table.schema, sql_table.name)
    else:
        my_table = table(sql_table.name, *columns)
        # table_name = sql_table.name
        # my_table = table(table_name, *columns)

    insert_stmt = insert(my_table).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing()
    conn.execute(do_nothing_stmt)
How to use it:
history.to_sql('history', schema=schema, con=engine, method=do_nothing_on_conflicts)
The idea is the same as #Nfern's but uses a recursive function to divide the df in half in each iteration, skipping the row/rows causing the integrity violation.
def insert(df):
    try:
        # inserting into backup table
        df.to_sql("table", con=engine, if_exists='append', index=False, schema='schema')
    except:
        rows = df.shape[0]
        if rows > 1:
            df1 = df.iloc[:int(rows/2), :]
            df2 = df.iloc[int(rows/2):, :]
            insert(df1)
            insert(df2)
        else:
            print(f"{df} not inserted. Integrity violation, duplicate primary key/s")
UPDATE
After passing execute() a list of rows as per Nathan's suggestion, below, the code executes further but still gets stuck on the execute function. The error message reads:
query = query % db.literal(args)
TypeError: not all arguments converted during string formatting
So it still isn't working. Does anybody know why there is a type error now?
END UPDATE
I have a large mailing list in .xls format. I am using Python with xlrd to retrieve the name and email from the xls file into two lists. Now I want to put each name and email into a MySQL database. I'm using MySQLdb for this part. Obviously I don't want to do an insert statement for every list item.
Here's what I have so far.
from xlrd import open_workbook, cellname
import MySQLdb

dbname = 'h4h'
host = 'localhost'
pwd = 'P#ssw0rd'
user = 'root'

book = open_workbook('h4hlist.xls')
sheet = book.sheet_by_index(0)
mailing_list = {}
name_list = []
email_list = []

for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    if name and email:
        mailing_list[name] = email

for n, e in sorted(mailing_list.iteritems()):
    name_list.append(n)
    email_list.append(e)

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""",
               (name_list, email_list))
The problem occurs when the cursor executes. This is the error: _mysql_exceptions.OperationalError: (1241, 'Operand should contain 1 column(s)'). I tried putting my query into a var initially, but then it just barfed up a message about passing a tuple to execute().
What am I doing wrong? Is this even possible?
The list is huge and I definitely can't afford to put the insert into a loop. I looked at using LOAD DATA INFILE, but I really don't understand how to format the file or the query and my eyes bleed when I have to read MySQL docs. I know I could probably use some online xls to mysql converter, but this is a learning exercise for me as well. Is there a better way?
You need to give executemany() a list of rows. You don't need to break the name and email out into separate lists; just create one list of tuples with both values in it.
rows = []
for row in range(sheet.nrows):
    # name is in the 0th col. email is the 4th col.
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    rows.append((name, email))

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.executemany("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""", rows)
Update: as #JonClements mentions, it should be executemany() not execute().
To fix TypeError: not all arguments converted during string formatting - you need to use the cursor.executemany(...) method, as this accepts an iterable of tuples (more than one row), while cursor.execute(...) expects the parameter to be a single row value.
After the command is executed, you need to ensure that the transaction is committed to make the changes active in the database by using db.commit().
If you are interested in high-performance of the code, this answer may be better.
Compared to the executemany method, the execute below will be much faster:
INSERT INTO mailing_list (name,email) VALUES ('Jim','jim@yahoo.com'),('Lucy','Lucy@gmail.com')
You can easily modify the answer from #Nathan Villaescusa and get the new code.
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""".format(",".join(str(i) for i in rows))
Here is my own test result:
executemany: 10000 runs takes 220 seconds.
execute: 10000 runs takes 12 seconds.
The speed difference is about 15 times.
Taking up the idea of #PengjuZhao, it should work to simply add one single placeholder for all values to be passed. The difference to #PengjuZhao's answer is that the values are passed as a second parameter to the execute() function, which should be safe against injection attacks because this is only evaluated during runtime (in contrast to ".format()").
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""", ",".join(str(i) for i in rows))
Only if this does not work properly, try the approach below.
####
#PengjuZhao's answer shows that executemany() either has a strong Python overhead or uses multiple execute() statements where this is not needed; otherwise executemany() would not be so much slower than a single execute() statement.
Here is a function that puts NathanVillaescusa's and #PengjuZhao's answers in a single execute() approach.
The solution builds a dynamic number of placeholders to be added to the sql statement. It is a manually built execute() statement with multiple placeholders of "%s", which likely outperforms the executemany() statement.
For example, at 2 columns, inserting 100 rows:
execute(): 200 times "%s" (dependent on the number of rows)
executemany(): just 2 times "%s" (independent of the number of rows).
There is a chance that this solution has the high speed of #PengjuZhao's answer without risking injection attacks.
Prepare parameters of the function:
You will store your values in 1-dimensional numpy arrays arr_name and arr_email, which are then converted into a single list of interleaved values, row by row. Alternatively, use the approach of #NathanVillaescusa.
from itertools import chain

# Interleave the two arrays row by row: name1, email1, name2, email2, ...
listAllValues = list(chain.from_iterable(zip(arr_name, arr_email)))

column_names = 'name, email'
table_name = 'mailing_list'
Get sql query with placeholders:
The numRows = int(len(listAllValues) / numColumns) line simply avoids having to pass the number of rows explicitly. If you insert 6 values in listAllValues with 2 columns, this obviously makes 6/2 = 3 rows.
def getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues):
    numColumns = len(column_names.split(","))
    numRows = int(len(listAllValues) / numColumns)
    placeholdersPerRow = "(" + ', '.join(['%s'] * numColumns) + ")"
    placeholders = ', '.join([placeholdersPerRow] * numRows)
    sqlInsertMultipleRowsInSqlTable = "insert into `{table}` ({columns}) values {values};".format(
        table=table_name, columns=column_names, values=placeholders)
    return sqlInsertMultipleRowsInSqlTable
strSqlQuery = getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues)
Execute strSqlQuery
Final step:
db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute(strSqlQuery, listAllValues)
This solution hopefully avoids the risk of injection attacks present in #PengjuZhao's answer, since it fills the sql statement only with placeholders instead of values. The values are passed separately in listAllValues at this point, where strSqlQuery contains only placeholders:
cursor.execute(strSqlQuery, listAllValues)
The execute() statement gets the sql statement with %s placeholders and the list of values as two separate parameters, as is done in #NathanVillaescusa's answer. I am still not sure whether this avoids injection attacks. It is my understanding that injection attacks can only occur if the values are put directly into the sql statement, so please comment if I am wrong.