psycopg2 formatting for bulk insertion

I'm inserting rows this way, data being a dictionary of several fieldname: fieldvalue items:
from psycopg2.extensions import AsIs
from psycopg2.extras import Json

def add_row(self, data, table):  # data is a dictionary
    columns = data.keys()
    values = []
    for column in columns:
        if isinstance(data[column], list):  # checking for json values
            values.append(Json(data[column]))
        elif isinstance(data[column], dict):
            values.append(Json(data[column]))
        else:
            values.append(data[column])
    insert_statement = 'insert into %s ' % table + '(%s) values %s'
    self.cur.execute(insert_statement, (AsIs(','.join(columns)), tuple(values)))
    self.conn.commit()
    print("added %s" % table)
But now I'd like to insert rows in bulk to improve performance and reduce I/O. The problem is that I couldn't find the right way to do it. The following function (data being a list of the dictionaries described above) throws:
psycopg2.ProgrammingError: syntax error at or near "["
LINE 1: ...,category_id,initial_quantity,base_price) VALUES ([u'Entrega...
def add_row_bulk(self, data, table):  # data is a list of dictionaries
    columns = data[0].keys()
    value_rows = []
    for e in data:
        columns = e.keys()
        values = []
        for column in columns:
            if isinstance(e[column], list):  # checking for json values
                values.append(Json(e[column]))
            elif isinstance(e[column], dict):
                values.append(Json(e[column]))
            else:
                values.append(e[column])
        value_rows.append(AsIs(values))
    cols = (AsIs(','.join(columns)))
    query = self.cur.mogrify("INSERT INTO item (%s) VALUES %s", (cols, tuple(value_rows)))
    self.cur.execute(query)
    self.conn.commit()
    print("added %s" % table)

You have a couple problems with your SQL generating code.
First off, AsIs(values) will not mogrify into a value row, like you seem to be hoping. Testing it, it seems to be equivalent to AsIs(str(values)). That's the output you're seeing in your thrown error.
What worked in your single-row example was passing a plain tuple of values, which psycopg2 adapts into a parenthesized row. Add tuple(values) to value_rows, not AsIs(values).
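Following that advice, a small helper can turn the list of dictionaries into one plain tuple per row. This is a minimal sketch, assuming every dictionary has the same keys; `rows_to_tuples` and its `adapt` argument are hypothetical names (in the question, `adapt` would be the function that wraps lists and dicts in psycopg2's `Json`). Taking the column order once from the first row keeps every tuple aligned with the column list, instead of re-reading `e.keys()` for each row.

```python
def rows_to_tuples(data, adapt=lambda value: value):
    """Convert a list of dicts into (columns, list_of_tuples).

    All dicts are assumed to share the same keys. The column order is taken
    once from the first row, so every tuple lines up with `columns`.
    `adapt` is applied to each value (e.g. to wrap JSON-like values).
    """
    columns = list(data[0].keys())
    value_rows = [tuple(adapt(row[col]) for col in columns) for row in data]
    return columns, value_rows


# The second dict lists its keys in a different order.
data = [
    {"title": "a", "price": 1},
    {"price": 2, "title": "b"},
]
columns, value_rows = rows_to_tuples(data)
print(columns)     # ['title', 'price']
print(value_rows)  # [('a', 1), ('b', 2)]
```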
Secondly, to specify the values for inserting a number of rows in one insert statement, you need SQL syntax similar to the following:
... VALUES (1, 'x'), (2, 'y'), (3, 'z')
Note that the list of value lists doesn't have ( ) around it. There's nothing (that I'm aware of) that's magically going to mogrify into a list like that. Certainly a tuple won't.
So you need to do something like:
self.cur.mogrify('INSERT INTO item (%s) VALUES %s,%s,%s,%s',
(cols, value_row1, value_row2, value_row3, value_row4))
which means you need to do a little more work to generate the two arguments to mogrify, because the number of rows isn't known in advance. To generate the first argument, you can do something like:
'INSERT INTO item (%s) VALUES ' + ','.join(['%s'] * len(value_rows))
And the second argument needs to be a sequence with the first value cols, and the rest the contents of value_rows. One way to get that:
[cols] + value_rows
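Putting the two pieces together, here is a sketch of a helper that returns both arguments for `mogrify`. `build_bulk_args` is a hypothetical name, the table name `item` comes from the question, and `cols` stands for the `AsIs(','.join(columns))` object; a plain string is used below only so the result can be inspected without a database.

```python
def build_bulk_args(cols, value_rows):
    """Return (query_template, params) for a single multi-row INSERT.

    cols: the object used for the column list (AsIs(...) in the question).
    value_rows: list of plain tuples, one per row; psycopg2 adapts each
    tuple to a parenthesized record such as (1, 'x').
    """
    if not value_rows:
        raise ValueError("need at least one row to insert")
    query = 'INSERT INTO item (%s) VALUES ' + ','.join(['%s'] * len(value_rows))
    params = [cols] + list(value_rows)
    return query, params


query, params = build_bulk_args("a,b", [(1, 'x'), (2, 'y'), (3, 'z')])
print(query)   # INSERT INTO item (%s) VALUES %s,%s,%s
print(params)  # ['a,b', (1, 'x'), (2, 'y'), (3, 'z')]
```

With psycopg2 the pair would then be used as `self.cur.execute(self.cur.mogrify(query, params))`, or passed to `self.cur.execute(query, params)` directly.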

Related

Write dictionary with tuple containing parameters as unique value for features into PostgreSQL table

In Python 2.7, I have a dictionary with features' IDs as keys.
There are thousands of features.
Each feature has a single value, but this value is a tuple containing 6 parameters for the feature (for example size, color, etc.).
I also have a PostgreSQL table in a database where these feature parameters must be saved.
The features' IDs are already set in the table (as well as other information about these features).
The IDs are unique (they are random, thus not serial, but unique numbers).
There are 6 empty columns named "param1", "param2", "param3", ..., "param6".
I already have a tuple containing these names:
columns = ("param1", "param2", "param3", ..., "param6")
The code I have doesn't work for saving these parameters in their respective columns for each feature:
# "view" is the dictionary with features's ID as keys()
# and their 6 params stored in values().
values = [view[i] for i in view.keys()]
columns = ("param1","param2","param3","param4","param5","param6")
conn = psycopg2.connect("dbname=mydb user=username password=password")
curs = conn.cursor()
curs.execute("DROP TABLE IF EXISTS mytable;")
curs.execute("CREATE TABLE IF NOT EXISTS mytable (LIKE originaltable including defaults including constraints including indexes);")
curs.execute("INSERT INTO mytable SELECT * from originaltable;")
insertstatmnt = 'INSERT INTO mytable (%s) values %s'
alterstatement = ('ALTER TABLE mytable '+
'ADD COLUMN param1 text,'+
'ADD COLUMN param2 text,'+
'ADD COLUMN param3 real,'+
'ADD COLUMN param4 text,'+
'ADD COLUMN param5 text,'+
'ADD COLUMN param6 text;'
)
curs.execute(alterstatement) # It's working up to this point.
curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values))) # The problem seems to be here.
conn.commit() # Making change to DB !
curs.close()
conn.close()
Here's the error I have:
curs.execute(insert_statement, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values)))
ProgrammingError: INSERT has more expressions than target columns
I must be missing something.
How do I do that properly?

When using '%s' to build the statement you want, you just need to change a couple of things.
Ignoring c.execute(), this statement is by no means wrong, but it does not return what you are looking for. Using my own version, this is what I got with that statement. I also ignored psycopg2.extensions.AsIs(), because it is just an adapter conforming to the ISQLQuote protocol, useful for objects whose string representation is already valid as an SQL representation.
>>> values = [i for i in range(0, 5)]  # since I don't know your data, I just made up values
>>> insertstatmnt, (','.join(columns), tuple(values))
('INSERT INTO mytable (%s) values %s', ('param1,param2,param3,param4,param5,param6', (0, 1, 2, 3, 4)))
As you can see, what you entered returns a tuple with the values.
>>> insertstatmnt % (','.join(columns), tuple(values))
'INSERT INTO mytable (param1,param2,param3,param4,param5,param6) values (0, 1, 2, 3, 4)'
Whereas this returns a single string, which is what SQL expects. The values here are made up, so they obviously do not match yours. I believe the problem lies in how you build your string.
Reference for psycopg2: http://initd.org/psycopg/docs/extensions.html

I took the syntax of the psycopg2 command from this thread:
Insert Python Dictionary using Psycopg2
My values dictionary doesn't have quite the same structure as the one in that example. I also have one key per ID, but each key maps to a single value, a tuple containing the 6 parameters (so the data is nested one level deeper), instead of mapping directly to 6 values. So I need to loop through all features and execute one SQL statement per feature:
[curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(', '.join(columns)), i)) for i in tuple(values)]
This works.
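The same loop can be written as a single `executemany` call, which avoids building a throwaway list only for its side effects. Below is a runnable sketch that uses the standard-library `sqlite3` module in place of psycopg2, so no server is needed (note that sqlite3 uses `?` placeholders where psycopg2 uses `%s`); the table contents and feature IDs are made up for illustration.

```python
import sqlite3

columns = ("param1", "param2", "param3", "param4", "param5", "param6")
# Hypothetical data: feature ID -> tuple of 6 parameters.
view = {
    101: ("big", "red", 1.5, "a", "b", "c"),
    205: ("small", "blue", 0.2, "d", "e", "f"),
}

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (%s)" % ", ".join(columns))

# One INSERT per feature: each 6-tuple fills the 6 target columns.
sql = "INSERT INTO mytable (%s) VALUES (%s)" % (
    ", ".join(columns), ", ".join("?" * len(columns)))
cur.executemany(sql, list(view.values()))
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM mytable").fetchone()[0])  # 2
```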

MySQL: how to write an INSERT statement without specifying column names or question marks?

I have a list of tuples which I'm inserting into a table.
Each tuple has 50 values. How do I insert without having to specify the column names and how many ? placeholders there are?
col1 is an auto-increment column, so my insert statement starts at col2 and ends at col51.
Current code:
l = [(1,2,3,.....),(2,4,6,.....),(4,6,7,.....)...]
for tup in l:
    cur.execute(
        """insert into TABLENAME(col2,col3,col4.........col50,col51) VALUES(?,?,?,.............)
        """, tup)
What I want:
insert into TABLENAME(col*) VALUES(*)

MySQL's syntax for INSERT is documented here: http://dev.mysql.com/doc/refman/5.7/en/insert.html
There is no wildcard syntax like you show. The closest thing is to omit the column names:
INSERT INTO MyTable VALUES (...);
But I don't recommend doing that. It works only if you are certain you're going to specify a value for every column in the table (even the auto-increment column), and your values are guaranteed to be in the same order as the columns of the table.
You should learn to use code to build the SQL query based on arrays of values in your application. Here's a Python example of how I do it. Suppose you have a dict of column: value pairs called data_values.
placeholders = ['%s'] * len(data_values)
sql_template = """
  INSERT INTO MyTable ({columns}) VALUES ({placeholders})
"""
sql = sql_template.format(
    columns=','.join(data_values.keys()),
    placeholders=','.join(placeholders)
)
cur = db.cursor()
cur.execute(sql, list(data_values.values()))
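A self-contained version of the same idea, runnable with the standard-library `sqlite3` module (swapped in so no MySQL server is needed; sqlite3 expects `?` placeholders, so the placeholder is a parameter here). `build_insert` is a hypothetical helper name. Only the values go through query parameters; the table and column names are pasted into the SQL text, so they must come from your own code, never from user input.

```python
import sqlite3


def build_insert(table, data_values, placeholder="%s"):
    """Build an INSERT for one row given as a {column: value} dict.

    Returns (sql, params). Column names are interpolated into the SQL, so
    they must be trusted; values are passed separately as parameters.
    """
    columns = list(data_values.keys())
    sql = "INSERT INTO {table} ({columns}) VALUES ({placeholders})".format(
        table=table,
        columns=",".join(columns),
        placeholders=",".join([placeholder] * len(columns)),
    )
    params = [data_values[col] for col in columns]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY AUTOINCREMENT, a, b)")
sql, params = build_insert("MyTable", {"a": 1, "b": "x"}, placeholder="?")
print(sql)  # INSERT INTO MyTable (a,b) VALUES (?,?)
conn.execute(sql, params)
print(conn.execute("SELECT id, a, b FROM MyTable").fetchall())  # [(1, 1, 'x')]
```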

Example code to put before your code:
cols = "("
for x in range(2, 52):
    cols = cols + "col" + str(x) + ","
cols = cols[:-1] + ")"
Inside your loop:
for tup in l:
    cur.execute("insert into TABLENAME " + cols + " VALUES {0}".format(tup))
This is off the top of my head, with no error checking.
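The column list and the placeholders can both be generated, and the tuple passed as query parameters instead of being formatted into the SQL string. A sketch under the question's assumptions (columns `col2` to `col51`, table `TABLENAME`); `make_insert` is a hypothetical helper, and the check below uses `sqlite3` with `?` placeholders and only three columns so it runs anywhere (MySQL drivers such as MySQLdb expect `%s` instead).

```python
import sqlite3


def make_insert(table, first_col, last_col, placeholder="%s"):
    """Return an INSERT statement for columns col<first_col>..col<last_col>."""
    names = ["col%d" % i for i in range(first_col, last_col + 1)]
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ",".join(names), ",".join([placeholder] * len(names)))


# For the question, make_insert("TABLENAME", 2, 51) has 50 columns and 50 %s.
sql = make_insert("TABLENAME", 2, 4, placeholder="?")
print(sql)  # INSERT INTO TABLENAME (col2,col3,col4) VALUES (?,?,?)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLENAME "
             "(col1 INTEGER PRIMARY KEY AUTOINCREMENT, col2, col3, col4)")
l = [(1, 2, 3), (2, 4, 6), (4, 6, 7)]
conn.executemany(sql, l)  # col1 is filled in automatically
print(conn.execute("SELECT col1, col2 FROM TABLENAME ORDER BY col1").fetchall())
# [(1, 1), (2, 2), (3, 4)]
```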

How do I insert a time window of data into a table that partially overlaps with existing rows with a unique constraint?

Here's a code snippet:
# Create the insert strings
column_str = """data_vendor_id, symbol_id, price_date, created_date,
last_updated_date, open_price, high_price, low_price,
close_price, volume, adj_close_price"""
insert_str = ("%s, " * 11)[:-2]
final_str = "INSERT INTO daily_price (%s) VALUES (%s)" % \
    (column_str, insert_str)
When I call this now I get an IntegrityError, which makes sense. Ideally it would let the fresh rows insert and fail gracefully on the redundant rows. Unfortunately, my try/except block doesn't let the legitimate rows through and makes the entire query fail:
for i, t in enumerate(tickers):
    print(
        "Adding data for %s: %s out of %s" %
        (t[1], i+1, lentickers)
    )
    yf_data = price_retrieval.get_daily_historic_data_yahoo(t[1], start_date.timetuple())
    try:
        price_retrieval.insert_daily_data_into_db('1', t[0], yf_data)
    except IntegrityError:
        continue
Is there a Python or MySQL solution to make this insertion more fault tolerant?

You're looking for either INSERT IGNORE or REPLACE, depending on how you want the duplicate data handled.
INSERT IGNORE
If you want to keep the old data and discard the new duplicate data, you want to use INSERT IGNORE. This will turn the unique key violations into warnings, and all the non-violating rows will be processed as normal. Without the IGNORE keyword, any unique violation will abort the entire INSERT batch.
final_str = "INSERT IGNORE INTO daily_price (%s) VALUES (%s)" % \
    (column_str, insert_str)
Insert documentation
REPLACE
If you want to overwrite the old data with the new duplicate data, you want to use the REPLACE statement instead of INSERT. REPLACE is a MySQL-specific extension to the SQL standard. It will insert non-existing rows, and if it encounters a duplicate row, it will first delete the old row and then insert the new row.
final_str = "REPLACE INTO daily_price (%s) VALUES (%s)" % \
    (column_str, insert_str)
Replace documentation
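SQLite has close equivalents, `INSERT OR IGNORE` and `REPLACE` (an alias for `INSERT OR REPLACE`), which makes it easy to try the two behaviours without a MySQL server. This sketch uses the standard-library `sqlite3` module as a stand-in; the table is a cut-down, hypothetical version of `daily_price` with a unique key on (symbol_id, price_date). Against MySQL itself you would use the statements shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE daily_price (
    symbol_id INTEGER, price_date TEXT, close_price REAL,
    UNIQUE (symbol_id, price_date))""")
conn.execute("INSERT INTO daily_price VALUES (1, '2016-01-04', 10.0)")

new_rows = [(1, '2016-01-04', 11.0),   # duplicate key
            (1, '2016-01-05', 12.0)]   # fresh row

# Keep the old row, skip the duplicate, insert the fresh one.
conn.executemany("INSERT OR IGNORE INTO daily_price VALUES (?, ?, ?)", new_rows)
ignored = conn.execute(
    "SELECT close_price FROM daily_price ORDER BY price_date").fetchall()
print(ignored)   # [(10.0,), (12.0,)]

# Overwrite the old row with the new duplicate data.
conn.executemany("REPLACE INTO daily_price VALUES (?, ?, ?)", new_rows)
replaced = conn.execute(
    "SELECT close_price FROM daily_price ORDER BY price_date").fetchall()
print(replaced)  # [(11.0,), (12.0,)]
```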

My understanding is that price_retrieval.insert_daily_data_into_db inserts many rows but stops at the first failed row? If so, you would need to wrap each individual record insertion in a try rather than wrapping the whole group.
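A sketch of that per-row approach, with `sqlite3` standing in for the MySQL driver (`insert_rows` is a hypothetical helper, not part of `price_retrieval`). Each row gets its own `execute`, so a duplicate skips only that one row. This relies on the database rolling back just the failed statement, which SQLite does here (and which MySQL's InnoDB does for duplicate-key errors).

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert rows one by one, skipping any that violate a unique key.

    Returns (inserted, skipped) counts.
    """
    inserted = skipped = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO daily_price VALUES (?, ?, ?)", row)
            inserted += 1
        except sqlite3.IntegrityError:
            skipped += 1  # duplicate: leave the existing row alone
    conn.commit()
    return inserted, skipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_price (symbol_id INTEGER, price_date TEXT, "
             "close_price REAL, UNIQUE (symbol_id, price_date))")
conn.execute("INSERT INTO daily_price VALUES (1, '2016-01-04', 10.0)")
result = insert_rows(conn, [(1, '2016-01-04', 11.0), (1, '2016-01-05', 12.0)])
print(result)  # (1, 1)
```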

Python sqlite: adding rows to an SQL database where the row is given as a dictionary

I'm new to using sqlite and I'm trying to write a class in Python which will handle all logging for a program that I'm writing.
The class receives a dictionary whose keys are the names of the columns of the database and whose values are the entries for that row. The dictionary may not have entries for all columns, so I want to log only the entries that exist and set a default value for the columns that have no entry. The function I have at the moment in my class is something like this:
def AddRow(self, Row, Header):
    keys = ''
    values = ''
    for key in Header:
        if keys and values:
            keys += ','
            values += ','
        try:
            keys += string(key)
            values += string(Row[key])
        except:
            keys += string(key)
            values += '0.0'
    print keys
    print values
    self.c.execute("INSERT INTO {tn} ({k}) VALUES ({v})".format(tn=self.Table_Name, k=keys, v=values))
Firstly, it fails with the error "sqlite3.OperationalError: near ")": syntax error".
Secondly, the way I'm approaching this seems very clumsy. Is there a better or easier way than building a string first before executing?

Do not generate the SQL from values. Use SQL parameters instead:
def AddRow(self, Row, Header):
    columns = ', '.join(Header)
    params = ', '.join([':{}'.format(k) for k in Header])
    sql = "INSERT INTO {} ({}) VALUES ({})".format(self.Table_Name, columns, params)
    Row = dict(dict.fromkeys(Header, 0.0), **Row)
    self.c.execute(sql, Row)
The sqlite3 adapter supports two styles of SQL parameters: positional (?) and named (:name). Using named parameters lets you use a dictionary as the parameter source.
The above method generates the column names and named parameters for each of the columns, then makes sure that there are values for all columns by generating a new dictionary with all keys set to 0.0, then overriding any keys with those found in Row.
By using named parameters you get several benefits:
Protection against SQL injection attacks
Automatic quoting appropriate to the value
Re-use of already parsed SQL queries
A quick demo session to illustrate what the above produces:
>>> Row = {'foo': 42, 'bar': 81}
>>> Header = ['foo', 'bar', 'baz']
>>> columns = ', '.join(Header)
>>> params = ', '.join([':{}'.format(k) for k in Header])
>>> "INSERT INTO {} ({}) VALUES ({})".format('demo_tablename', columns, params)
'INSERT INTO demo_tablename (foo, bar, baz) VALUES (:foo, :bar, :baz)'
>>> dict(dict.fromkeys(Header, 0.0), **Row)
{'bar': 81, 'foo': 42, 'baz': 0.0}
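An end-to-end sketch of that method against an in-memory database. The `Logger` class, its table name and its columns are hypothetical scaffolding around the `AddRow` method above; the table name is interpolated into the SQL, so it must be trusted.

```python
import sqlite3


class Logger:
    def __init__(self, table_name, header):
        self.Table_Name = table_name
        self.conn = sqlite3.connect(":memory:")
        self.c = self.conn.cursor()
        self.c.execute("CREATE TABLE {} ({})".format(table_name, ", ".join(header)))

    def AddRow(self, Row, Header):
        columns = ', '.join(Header)
        params = ', '.join([':{}'.format(k) for k in Header])
        sql = "INSERT INTO {} ({}) VALUES ({})".format(self.Table_Name, columns, params)
        # Default every column to 0.0, then override with the supplied values.
        Row = dict(dict.fromkeys(Header, 0.0), **Row)
        self.c.execute(sql, Row)
        self.conn.commit()


Header = ['foo', 'bar', 'baz']
log = Logger('demo_tablename', Header)
log.AddRow({'foo': 42, 'bar': 81}, Header)
rows = log.c.execute("SELECT foo, bar, baz FROM demo_tablename").fetchall()
print(rows)  # [(42, 81, 0.0)]
```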

psycopg2: insert multiple rows with one query

I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I want a simpler way.

I built a program that inserts multiple rows into a server located in another city.
I found that this method was about 10 times faster than executemany. In my case, tup is a tuple containing about 2000 rows. It took about 10 seconds with this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)

New execute_values method in Psycopg 2.7:
import psycopg2.extras
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values(
    cursor, insert_query, data, template=None, page_size=100
)
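`page_size` bounds how many rows go into each generated statement; when there are more rows than that, `execute_values` runs several statements. The pure-Python sketch below only illustrates that splitting (it is not psycopg2's own code, and `pages` is a hypothetical name), so you can see how many statements a given `page_size` implies.

```python
def pages(rows, page_size=100):
    """Yield successive slices of at most page_size rows."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


data = [(i, 'x') for i in range(250)]
sizes = [len(page) for page in pages(data, page_size=100)]
print(sizes)  # [100, 100, 50], i.e. three INSERT statements
```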
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the values syntax of the insert clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg adapts a Python tuple to a Postgresql record.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')

Update with psycopg2 2.7:
The classic executemany() is about 60 times slower than @ant32's implementation (called "folded"), as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values():
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.
@ant32's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instances.
So in Python 3 you may need to modify @ant32's code by adding .decode('utf-8'):
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b'' or b"") only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
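A quick check of the type mismatch described above, with plain byte strings standing in for what `mogrify` returns under Python 3 (no database is needed). Joining bytes with a `str` separator fails, while either of the two fixes works.

```python
parts = [b"(1,2)", b"(3,4)"]  # stand-ins for cur.mogrify(...) results on Python 3

try:
    ",".join(parts)  # str separator, bytes items
except TypeError as exc:
    print("TypeError:", exc)

joined_bytes = b",".join(parts)                          # fix 1: stay in bytes
joined_str = ",".join(p.decode("utf-8") for p in parts)  # fix 2: decode to str
print(joined_bytes)  # b'(1,2),(3,4)'
print(joined_str)    # (1,2),(3,4)
```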

cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands of rows or more. It will also be more memory efficient than building a giant query string: an iterator only ever holds one input record in memory at a time, whereas if you build the query string you will at some point run out of memory, either in your Python process or in Postgres.
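The gist itself is not reproduced here, so the following is a hypothetical reimplementation of an IteratorFile-style class (an assumption, not the gist's code): a read-only, file-like object that appends a newline to each string the iterator yields and serves the text through `read()` and `readline()`, which is what `copy_from` reads from. As the security discussion further down points out, values containing tabs, newlines or backslashes would still need escaping for COPY's text format.

```python
import io


class IteratorFile(io.TextIOBase):
    """Hypothetical file-like wrapper over an iterator of lines.

    Each yielded string is treated as one line; a newline is appended.
    """

    def __init__(self, lines):
        self._lines = iter(lines)
        self._buf = ""

    def readable(self):
        return True

    def _fill(self, n):
        # Pull lines until the buffer holds at least n characters
        # (or until the iterator is exhausted when n < 0).
        while n < 0 or len(self._buf) < n:
            try:
                self._buf += next(self._lines) + "\n"
            except StopIteration:
                break

    def read(self, n=-1):
        if n is None:
            n = -1
        self._fill(n)
        if n < 0:
            data, self._buf = self._buf, ""
        else:
            data, self._buf = self._buf[:n], self._buf[n:]
        return data

    def readline(self, limit=-1):
        # Buffer until a full line (or the end of the data) is available.
        while "\n" not in self._buf:
            before = len(self._buf)
            self._fill(before + 1)
            if len(self._buf) == before:
                break
        end = self._buf.find("\n") + 1 or len(self._buf)
        if limit is not None and limit >= 0:
            end = min(end, limit)
        line, self._buf = self._buf[:end], self._buf[end:]
        return line


args = [(1, 2), (3, 4), (5, 6)]
f = IteratorFile("{}\t{}".format(x[0], x[1]) for x in args)
print(repr(f.read()))  # '1\t2\n3\t4\n5\t6\n'
```

With a live connection the object would be passed as in the answer: `cursor.copy_from(f, 'table_name', columns=('a', 'b'))`.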

A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitely looks better.

All of these techniques are called "Extended Inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).
Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:
valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns.
sqlrows = []
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
    # row == [1, 'a', 'yolo', ... ]
    sqlrows += row
    if ( len(sqlrows)//len(valueSQL) ) % rowsPerInsert == 0:
        # sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
        insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*rowsPerInsert)
        cur.execute(insertSQL, sqlrows)
        con.commit()
        sqlrows = []
if sqlrows:
    # insert the leftover rows (fewer than rowsPerInsert)
    insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*(len(sqlrows)//len(valueSQL)))
    cur.execute(insertSQL, sqlrows)
    con.commit()
But it should be noted that if you can use copy_from(), you should use copy_from ;)
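A runnable sketch of the same batching idea, using `sqlite3` (with `?` placeholders) as a stand-in so it runs without a server; `insert_in_batches` is a hypothetical helper. The values still travel as query parameters; only the placeholder groups are generated. Integer division (`//`) keeps the row count an integer, and the final partial batch is sized by the number of rows left, not the number of values.

```python
import sqlite3


def insert_in_batches(conn, table, n_cols, rows, rows_per_insert=3):
    """Insert rows using multi-row VALUES statements of up to rows_per_insert rows."""
    group = "(" + ",".join(["?"] * n_cols) + ")"

    def flush(values):
        n_rows = len(values) // n_cols  # rows in this batch, not values
        sql = "INSERT INTO %s VALUES %s" % (table, ",".join([group] * n_rows))
        conn.execute(sql, values)

    flat = []  # flattened values of the current batch
    for row in rows:
        flat.extend(row)
        if len(flat) // n_cols == rows_per_insert:
            flush(flat)
            flat = []
    if flat:  # leftover rows that did not fill a whole batch
        flush(flat)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twitter (id INTEGER, letter TEXT, word TEXT)")
data = [(i, chr(97 + i), "w%d" % i) for i in range(7)]
insert_in_batches(conn, "twitter", 3, data, rows_per_insert=3)
print(conn.execute("SELECT COUNT(*) FROM twitter").fetchone()[0])  # 7
```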

Security vulnerabilities
As of 2022-11-16, the answers by @Clodoaldo Neto (for Psycopg 2.6), @Joseph Sheedy, @J.J, @Bart Jonk, @kevo Njoki, @TKoutny and @Nihal Sharma contain SQL injection vulnerabilities and should not be used.
The fastest proposal so far (copy_from) should not be used either, because it is difficult to escape the data correctly. This is easily apparent when trying to insert characters like ', ", \n, \ or \t.
The author of psycopg2 also recommends against copy_from:
copy_from() and copy_to() are really just ancient and incomplete methods
The fastest method
The fastest method is cursor.copy_expert, which can insert data straight from CSV files.
with open("mydata.csv") as f:
    cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", f)
copy_expert is also the fastest method when generating the CSV file on-the-fly. For reference, see the following CSVFile class, which takes care to limit memory usage.
import io, csv
class CSVFile(io.TextIOBase):
    # Create a CSV file from rows. Can only be read once.
    def __init__(self, rows, size=8192):
        self.row_iter = iter(rows)
        self.buf = io.StringIO()
        self.available = 0
        self.size = size
    def read(self, n):
        # Buffer new CSV rows until enough data is available
        buf = self.buf
        writer = csv.writer(buf)
        while self.available < n:
            try:
                row_length = writer.writerow(next(self.row_iter))
                self.available += row_length
                self.size = max(self.size, row_length)
            except StopIteration:
                break
        # Read requested amount of data from buffer
        write_pos = buf.tell()
        read_pos = write_pos - self.available
        buf.seek(read_pos)
        data = buf.read(n)
        self.available -= len(data)
        # Shrink buffer if it grew very large
        if read_pos > 2 * self.size:
            remaining = buf.read()
            buf.seek(0)
            buf.write(remaining)
            buf.truncate()
        else:
            buf.seek(write_pos)
        return data
This class can then be used like:
rows = [(1, "a", "b"), (2, "c", "d")]
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", CSVFile(rows))
If all your data fits into memory, you can also generate the entire CSV data directly without the CSVFile class, but if you do not know how much data you are going to insert in the future, you probably should not do that.
f = io.StringIO()
writer = csv.writer(f)
for row in rows:
writer.writerow(row)
f.seek(0)
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", f)
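To see why the CSV route sidesteps the escaping problem mentioned above, here is a quick standard-library check (no database involved): `csv.writer` quotes fields containing commas, quotes or newlines, and `csv.reader` recovers them unchanged. The assumption is that the server parses the same conventions, which PostgreSQL's `COPY ... WITH csv` defaults (double quote as quote character, doubled to escape) do.

```python
import csv
import io

rows = [(1, 'say "hi"', 'tab\there'),
        (2, "it's, a\nmulti-line value", 'back\\slash')]

buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
print(buf.getvalue())  # the CSV text that copy_expert would send

buf.seek(0)
parsed = list(csv.reader(buf))
print(parsed)  # every field comes back as the original text
```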
Benchmark results
914 milliseconds - many calls to cursor.execute
846 milliseconds - cursor.executemany
362 milliseconds - psycopg2.extras.execute_batch
346 milliseconds - execute_batch with page_size=1000
265 milliseconds - execute_batch with prepared statement
161 milliseconds - psycopg2.extras.execute_values
127 milliseconds - cursor.execute with string-concatenated values
39 milliseconds - copy_expert generating the entire CSV file at once
32 milliseconds - copy_expert with CSVFile

I've been using ant32's answer above for several years. However, I've found that it throws an error in Python 3 because mogrify returns a byte string.
Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible.
args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_str)

executemany accepts a list of tuples
https://www.postgresqltutorial.com/postgresql-python/insert/
import psycopg2
# list of tuples
vendor_list = [(value1,)]
# insert multiple vendors into the vendors table
sql = "INSERT INTO vendors(vendor_name) VALUES(%s)"
conn = None
try:
    # read database configuration (config() is a helper from the tutorial)
    params = config()
    # connect to the PostgreSQL database
    conn = psycopg2.connect(**params)
    # create a new cursor
    cur = conn.cursor()
    # execute the INSERT statement
    cur.executemany(sql, vendor_list)
    # commit the changes to the database
    conn.commit()
    # close communication with the database
    cur.close()
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    if conn is not None:
        conn.close()

The cursor.copy_from solution provided by @Joseph Sheedy (https://stackoverflow.com/users/958118/joseph-sheedy) above (https://stackoverflow.com/a/30721460/11100064) is indeed lightning fast.
However, the example he gives is not generically usable for a record with any number of fields, and it took me a while to figure out how to use it correctly.
The IteratorFile needs to be instantiated with tab-separated fields like this (records is a list of dicts where each dict is a record):
f = IteratorFile("{0}\t{1}\t{2}\t{3}\t{4}".format(r["id"],
                                                  r["type"],
                                                  r["item"],
                                                  r["month"],
                                                  r["revenue"]) for r in records)
To generalise for an arbitrary number of fields, we first create a line string with the correct number of tabs and field placeholders, "{}\t{}\t{}....\t{}", and then use .format() to fill in the field values with *list(r.values()) for each r in records:
line = "\t".join(["{}"] * len(records[0]))
f = IteratorFile(line.format(*list(r.values())) for r in records)
The complete function is in a gist.
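A sketch of that generalisation which takes the column order from an explicit list instead of relying on every dict's `.values()` order (the version above assumes all dicts list their keys in the same order). `to_copy_lines` is a hypothetical name, and the field values are assumed to contain no tabs, newlines or backslashes, since `copy_from`'s text format would need them escaped.

```python
def to_copy_lines(records, columns):
    """Yield one tab-separated line per record dict, in `columns` order."""
    line = "\t".join(["{}"] * len(columns))
    for r in records:
        yield line.format(*(r[col] for col in columns))


records = [
    {"id": 1, "type": "a", "item": "x", "month": 1, "revenue": 9.5},
    {"revenue": 3.0, "id": 2, "type": "b", "item": "y", "month": 2},
]
columns = ["id", "type", "item", "month", "revenue"]
lines = list(to_copy_lines(records, columns))
print(lines)  # ['1\ta\tx\t1\t9.5', '2\tb\ty\t2\t3.0']
```

The generator can then be wrapped as before, e.g. `cursor.copy_from(IteratorFile(to_copy_lines(records, columns)), 'table_name', columns=columns)`.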

execute_batch has been added to psycopg2 since this question was posted.
It is faster than execute_values.

Another nice and efficient approach is to pass the rows for insertion as a single argument: an array of JSON objects.
For example, you pass the argument:
[{"id": 18, "score": 1}, {"id": 19, "score": 5}]
It is an array, which may contain any number of objects.
Then your SQL looks like:
INSERT INTO links (parent_id, child_id, score)
SELECT 123, (r->>'id')::int, (r->>'score')::int
FROM unnest($1::json[]) as r
Note: your PostgreSQL version must be new enough to support json.

If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:
rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)
if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)

Based on @ant32's answer:
import psycopg2
def myInsertManyTuples(connection, table, tuple_of_tuples):
    cursor = connection.cursor()
    try:
        insert_len = len(tuple_of_tuples[0])
        insert_template = "("
        for i in range(insert_len):
            insert_template += "%s,"
        insert_template = insert_template[:-1] + ")"
        args_str = ",".join(
            cursor.mogrify(insert_template, x).decode("utf-8")
            for x in tuple_of_tuples
        )
        cursor.execute("INSERT INTO " + table + " VALUES " + args_str)
        connection.commit()
    except psycopg2.Error as e:
        print(f"psycopg2.Error in myInsertMany = {e}")
        connection.rollback()

If you want to insert multiple rows with one insert statement (assuming you are not using the ORM), the easiest way so far for me is to use a list of dictionaries. Here is an example:
t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
     {'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
     {'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]
conn.execute("""insert into campaign_dates
    (id, start_date, end_date, campaignid)
    values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);""",
    t)
As you can see, only one query is executed:
INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT

psycopg2 2.9.3
data = "(1, 2), (3, 4), (5, 6)"
query = "INSERT INTO t (a, b) VALUES {0}".format(data)
cursor.execute(query)
or
data = [(1, 2), (3, 4), (5, 6)]
data = ",".join(map(str, data))
query = "INSERT INTO t (a, b) VALUES {0}".format(data)
cursor.execute(query)

The solution I'm using can insert around 8000 records in 1 millisecond:
import datetime
curtime = datetime.datetime.now()
postData = dict()
postData["title"] = "This is Title Text"
postData["body"] = "This a Body Text it Can be Long Text"
postData['created_at'] = curtime.isoformat()
postData['updated_at'] = curtime.isoformat()
data = []
for x in range(8000):
    data.append(postData)
vals = []
for d in data:
    vals.append(tuple(d.values()))  # Here we extract the values from each dict
flds = ",".join(map(str, data[0]))
tableFlds = ",".join(map(str, vals))
sqlStr = f"INSERT INTO posts ({flds}) VALUES {tableFlds}"
db.execute(sqlStr)
connection.commit()
rowsAffected = db.rowcount
print(f'{rowsAffected} Rows Affected')

Finally, SQLAlchemy 1.2 added a new implementation that uses psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True, like:
engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    use_batch_mode=True)
http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109
Then anyone who has to use SQLAlchemy won't need to bother trying different combinations of SQLAlchemy, psycopg2 and raw SQL together.

Using aiopg, the snippet below works perfectly fine:
# items = [10, 11, 12, 13]
# gid = 1
tup = [(gid, pid) for pid in items]
args_str = ",".join([str(s) for s in tup])
# insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
yield from cur.execute("INSERT INTO group VALUES " + args_str)
