psycopg2 - Inserting multiple rows that have multiple columns faster - python
I'm trying to insert multiple rows into my database, and I currently don't know of a way to insert them all at once, or of any other method that would save time (inserting sequentially takes about 30 seconds for around 300 rows).
My 'rows' are tuples in a list of tuples (converted into a tuple of tuples), e.g. [(col0, col1, col2), (col0, col1, col2), (.., .., ..), ..]
    def commit(self, tuples):
        cursor = self.conn.cursor()
        for tup in tuples:
            try:
                sql = """insert into "SSENSE_Output"
                         ("productID", "brand", "categoryID", "productName", "price",
                          "sizeInfo", "SKU", "URL", "dateInserted", "dateUpdated")
                         values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
                cursor.execute(sql, tup)
                self.conn.commit()
            except psycopg2.IntegrityError:
                # Duplicate productID: roll back and retry as an upsert
                self.conn.rollback()
                sql = 'insert into "SSENSE_Output" ' \
                      '("productID", "brand", "categoryID", "productName", "price", ' \
                      '"sizeInfo", "SKU", "URL", "dateInserted", "dateUpdated") ' \
                      'values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s) ' \
                      'on conflict ("productID") do update set "dateUpdated" = EXCLUDED."dateUpdated"'
                cursor.execute(sql, tup)
                self.conn.commit()
            except Exception as e:
                print(e)
I have also tried committing after the for loop is done, but it still takes the same amount of time. Is there any way to make these inserts significantly faster?
In Postgres you can use a multi-row VALUES list, like:

    INSERT INTO films (code, title, did, date_prod, kind) VALUES
        ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
        ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
Because of your per-record exception handling, you should resolve the duplicates before generating this query (or handle them in the statement itself), since the whole multi-row query will fail if any single row raises an integrity error.
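A minimal sketch of how this can be done from psycopg2 using psycopg2.extras.execute_values, which expands a single %s placeholder into a multi-row VALUES list and sends the whole batch in one statement. The table and column names are taken from the question, and the ON CONFLICT clause reuses the question's upsert so duplicates cannot fail the batch; the connection setup is assumed:

    import psycopg2
    from psycopg2 import extras

    def bulk_insert(conn, rows):
        # rows is the list of 10-element tuples from the question
        sql = """
            insert into "SSENSE_Output"
                ("productID", "brand", "categoryID", "productName", "price",
                 "sizeInfo", "SKU", "URL", "dateInserted", "dateUpdated")
            values %s
            on conflict ("productID") do update
                set "dateUpdated" = EXCLUDED."dateUpdated"
        """
        with conn.cursor() as cursor:
            # One round trip per page_size rows instead of one per row
            extras.execute_values(cursor, sql, rows, page_size=500)
        conn.commit()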
Building one large INSERT statement instead of many small ones will considerably improve the execution time; you should take a look here. It is for MySQL, but I think a similar approach applies to PostgreSQL.
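This is not the answerer's original code, but a hedged sketch of that idea with psycopg2: cursor.mogrify renders each row with proper quoting and escaping, and the rendered rows are joined into one statement (names reused from the question):

    cursor = conn.cursor()
    # Render each row to a quoted "(...)" fragment, then join them
    args = ",".join(
        cursor.mogrify("(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", tup).decode("utf-8")
        for tup in rows
    )
    cursor.execute(
        'insert into "SSENSE_Output" ("productID", "brand", "categoryID", '
        '"productName", "price", "sizeInfo", "SKU", "URL", '
        '"dateInserted", "dateUpdated") values ' + args
    )
    conn.commit()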
Related
Python - Multiline SQL with List - Not all parameters were used in the SQL statement
I'm usually a C# dev, in an attempt to get some Python experience. At this current time I have a list of values I want to import into MySQL; as there will be quite a few lines, I want to go with a single big import rather than thousands of small ones. Looking at the docs on pynative, I can see the example being provided as:

    mySql_insert_query = """INSERT INTO Laptop (Id, Name, Price, Purchase_date)
                            VALUES (%s, %s, %s, %s)"""

    records_to_insert = [(4, 'HP Pavilion Power', 1999, '2019-01-11'),
                         (5, 'MSI WS75 9TL-496', 5799, '2019-02-27'),
                         (6, 'Microsoft Surface', 2330, '2019-07-23')]

    cursor = connection.cursor()
    cursor.executemany(mySql_insert_query, records_to_insert)
    connection.commit()

As my records to insert will be data pulled from an API, I had the "brilliant" idea of building the list in the most basic and probably worst method possible: by creating the list with pre-formatted strings.

    val = []
    sql = """INSERT INTO equipment
             (ItemStaticID, ItemID, ItemType, Element, Tier, Level, Hp, Atk, Hit, Def, Spd)
             VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
    for x in range(len(test)):
        print(test[x].id, test[x].itemId, test[x].itemSubType, test[x].elementalType,
              test[x].grade, test[x].level, test[x].hP, test[x].aTK, test[x].hIT,
              test[x].dEF, test[x].sPD)
        value = "({},'{}','{}','{}','{}','{}','{}','{}','{}','{}','{}')".format(
            test[x].id, test[x].itemId, test[x].itemSubType, test[x].elementalType,
            test[x].grade, test[x].level, test[x].hP, test[x].aTK, test[x].hIT,
            test[x].dEF, test[x].sPD)
        print(value)
        val.append(value)
    insertcur = mydb.cursor()
    insertcur.executemany(sql, val)

When running this code I get "Not all parameters were used in the SQL statement". I verified that the SQL query was accurate and that the fields listed matched: 11 column names and 11 params. So I'm a bit lost as to why it believes that not all params are being used in the SQL statement.
The below is the content of my list "val":

    ["(10340000,'d3252f40-5cbb-4a7c-a140-dcc2badac580','BELT','NORMAL','4','1','0','0','555','0','1683')",
     "(10340000,'e8e16a7b-4271-4cd4-8700-4dfb17a9525f','BELT','NORMAL','4','2','0','0','534','0','1848')",
     "(10341000,'66e70f12-f13b-4572-a222-cd61740d78b8','BELT','FIRE','4','0','0','0','503','0','1921')",
     "(10130000,'7cc2027b-110e-4ba0-9d95-ce563eca4be9','WEAPON','NORMAL','3','0','0','99','0','0','0')",
     "(10130000,'88440779-0794-431e-a726-3cb63af1dae1','WEAPON','NORMAL','3','0','0','99','0','0','0')",
     "(10130000,'f513afc3-9035-40b4-a0bb-6d4a74b88c7d','WEAPON','NORMAL','3','0','0','100','0','0','88')",
     "(10130000,'e22a771e-3fd7-4ce4-93c4-254913e89551','WEAPON','NORMAL','3','0','0','100','0','0','93')",
     "(10230000,'bf97a302-41e1-4b21-b7ff-3d7786f96ec3','ARMOR','NORMAL','3','0','1707','0','0','33','0')",
     "(10233000,'69f4811f-f45b-4227-99eb-2f45f0705c2a','ARMOR','LAND','3','0','11254','0','0','0','0')",
     "(10431000,'0813b427-de85-4fd6-82a1-fe8b8708ad45','NECKLACE','FIRE','3','0','0','0','677','30','0')",
     "(10531000,'48fc6562-44b5-49ba-8ad2-837d0e981a70','RING','FIRE','3','0','0','65','0','90','0')",
     "(10120000,'e5ebcdb9-ae59-4fc2-adb8-50e3b673c26d','WEAPON','NORMAL','2','0','0','39','0','0','0')",
     "(10120000,'22194497-91d1-405d-8a46-9c25b36c7ffd','WEAPON','NORMAL','2','1','0','42','0','0','37')",
     "(10124000,'487e9efc-a5f6-4944-924a-38ff8bcfb014','WEAPON','WIND','2','0','0','421','0','0','0')",
     "(10124000,'d015eced-58ec-48c4-bdbd-4f790c9ccbe4','WEAPON','WIND','2','0','0','453','0','0','572')",
     "(10124000,'1c624b33-889d-42f1-ae6b-2ddda234c481','WEAPON','WIND','2','1','0','462','0','0','0')",
     "(10220000,'225f94e7-2342-4a50-a1b7-07ec789457f1','ARMOR','NORMAL','2','1','737','0','0','13','0')",
     "(10220000,'64bd3ce5-572a-4b83-b83c-5f8f7f728f9e','ARMOR','NORMAL','2','1','727','0','0','13','0')",
     "(10224000,'206d6a67-3d1e-4b22-ad94-1ee19bdb4e00','ARMOR','WIND','2','3','9597','0','0','67','0')",
     "(10224000,'0ddc84d4-b65d-4b6d-b332-b3de72defce2','ARMOR','WIND','2','3','8972','0','0','88','0')",
     "(10321000,'3d0f39cb-0684-42a4-a9b0-ec2466679d18','BELT','FIRE','2','0','0','0','0','0','176')",
     "(10423000,'4ebb7192-15d4-4e9d-999c-be4a2e7ac85a','NECKLACE','LAND','2','3','0','0','1833','63','0')",
     "(10523000,'d1260941-ba0f-4a2a-b452-b3c3a6d6ec81','RING','LAND','2','0','0','0','384','172','0')",
     "(10523000,'fdef3e38-88ba-4970-923e-1d46ea35d1c5','RING','LAND','2','3','0','0','295','282','0')",
     "(10523000,'e3657fe9-481e-45ea-9086-065a79e3bc06','RING','LAND','2','0','0','0','376','173','0')",
     "(10523000,'68ab4ca0-683f-43a8-a2c6-358788e4e5e8','RING','LAND','2','2','0','0','408','195','0')",
     "(10523000,'be3921cc-fc3a-48bb-8542-c385d83d99cb','RING','LAND','2','0','0','0','0','172','0')",
     "(10523000,'422bb5ab-bef6-42a8-8562-a572d07c7970','RING','LAND','2','0','0','0','0','175','0')",
     "(10523000,'708df542-4622-4849-899c-846938f98ef2','RING','LAND','2','0','0','0','392','173','0')",
     "(10111000,'54896e7e-9799-4c81-a704-124522ab9b91','WEAPON','FIRE','1','3','0','23','0','0','0')",
     "(10210000,'09f995cd-6a8c-47ae-8c46-18f799322c9a','ARMOR','NORMAL','1','2','264','0','0','5','0')",
     "(10212000,'4effa256-6fe1-4051-866d-0f77d4294004','ARMOR','WATER','1','0','374','0','0','0','0')",
     "(10312000,'326908bb-ba10-4be9-abf2-4499d413bbb0','BELT','WATER','1','2','0','0','23','0','110')",
     "(10412000,'390c39f3-756b-488a-ae3f-eb03021453af','NECKLACE','WATER','1','3','0','0','156','8','0')",
     "(10512000,'7f8f0faf-da5c-435c-a27b-14ccbf94dd08','RING','WATER','1','3','159','0','0','20','0')",
     "(10514000,'08049c6d-5923-49d6-ac5f-fb569fc69ed3','RING','WIND','1','2','0','0','0','224','232')",
     "(10514000,'8dce0420-4d09-4ab6-8d73-86343155f8b9','RING','WIND','1','3','0','0','0','230','433')"]
How do I handle a KeyError exception in python without exiting the dictionary?
Basically I have some JSON data that I want to put in a MySQL DB, and to do this I'm trying to pass the contents of a dictionary to a cursor.execute method. My code is as follows:

    for p in d['aircraft']:
        with connection.cursor() as cursor:
            print(p['hex'])
            sql = "INSERT INTO `aircraft` (`hex`, `squawk`, `flight`, `lat`, `lon`, `nucp`, `seen_pos`, " \
                  "`altitude`, `vert_rate`, `track`, `speed`, `messages`, `seen`, `rssi`) " \
                  "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
            cursor.execute(sql, (p['hex'], p['squawk'], p['flight'], p['lat'], p['lon'],
                                 p['nucp'], p['seen_pos'], p['altitude'], p['vert_rate'],
                                 p['track'], p['speed'], p['messages'], p['seen'], p['rssi']))
            print('entered')
            connection.commit()

The issue is that any value in the dictionary can be missing at any time, and I need to find out how to handle this. I've tried putting the code in a try/except block and passing whenever a KeyError is raised, but that means a record is completely skipped when it has a missing value. I've also tried writing a load of if blocks to append a string with the value of the dictionary key, but this was pretty useless. I need to find a way to put a dictionary in my DB even if it contains null values.
You can use the dict.get() method, or construct a defaultdict that returns None for missing keys:

    import collections

    keys = ['hex', 'squawk', 'flight', 'lat', 'lon', 'nucp', 'seen_pos',
            'altitude', 'vert_rate', 'track', 'speed', 'messages', 'seen', 'rssi']

    for p in d['aircraft']:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `aircraft` (`hex`, `squawk`, `flight`, `lat`, `lon`, `nucp`, `seen_pos`, " \
                  "`altitude`, `vert_rate`, `track`, `speed`, `messages`, `seen`, `rssi`) " \
                  "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
            # Could also use a defaultdict
            cursor.execute(sql, tuple(p.get(key) for key in keys))
            print('entered')
            connection.commit()

For more examples using dict.get(), see this answer: https://stackoverflow.com/a/11041421/1718575 This assumes that SQL will do the right thing when None is provided. If you want to use the string 'NULL' instead, you can supply it as the second argument to dict.get().
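For completeness, a small sketch of the defaultdict variant mentioned in the comment above; this is not part of the original answer, and it reuses p, keys, sql, and cursor from that snippet:

    import collections

    # Wrap each incoming dict so missing keys yield None instead of KeyError
    row = collections.defaultdict(lambda: None, p)
    params = tuple(row[key] for key in keys)
    cursor.execute(sql, params)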
IndexError: tuple index out of range connecting Python to PostgreSQL
I know this question has been asked a number of times, but I am stuck here unable to proceed further. I am executing a for loop in Python to load data into a fact table:

    for index, row in df.iterrows():
        # get songid and artistid from song and artist tables
        cur.execute(song_select, (row.song, row.artist, row.length))
        results = cur.fetchone()

        if results:
            song_id, artist_id = results
        else:
            song_id, artist_id = None, None

        # insert songplay record
        songplay_data = (pd.to_datetime(row.ts, unit='ms'), row.userId, row.level, song_id,
                         artist_id, row.sessionId, row.location, row.userAgent)
        cur.execute(songplay_table_insert, songplay_data)
        conn.commit()

and getting the error:

    <ipython-input-22-b8b0e27022de> in <module>()
         13
         14 songplay_data = (pd.to_datetime(row.ts, unit='ms'),row.userId,row.level,song_id,artist_id,row.sessionId,row.location,row.userAgent)
         15 cur.execute(songplay_table_insert, songplay_data)
         16 conn.commit()

    IndexError: tuple index out of range

The insert statement I am trying to execute is:

    songplay_table_insert = ("""INSERT INTO songplays
        (songplay_id, start_time, user_id, level, song_id, artist_id,
         session_id, location, user_agent)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""")

I am really stuck; any help appreciated.
You have one too many %s markers. VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s) has 9 markers, while songplay_data has 8 elements. When the driver tries to fill the last marker, it looks for a 9th element, i.e. songplay_data[8], and that raises the IndexError. You will also need to remove songplay_id from the column list to make the INSERT statement valid; the database should be generating that primary key for you if you don't have a value to provide. If it isn't, we should take a look at your table definition.
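A sketch of the corrected statement, assuming songplay_id is a serial/identity column the database fills in on its own:

    songplay_table_insert = ("""INSERT INTO songplays
        (start_time, user_id, level, song_id, artist_id,
         session_id, location, user_agent)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""")
    # 8 markers, matching the 8 elements of songplay_data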
Inserting mysql data from one table to another with python
I'm trying to insert data that's already in one MySQL table into another, using Python. The column names are the same in each table, and objkey is the distinguishing piece of data I have for the item, which I'd like to use to tell MySQL which rows to look at.

    import MySQLdb

    db = MySQLdb.connect(host='', user='', passwd='', db='')
    cursor = db.cursor
    sql = "INSERT INTO newtable (%s, %s, %s, %s) SELECT %s, %s, %s, %s FROM oldtable WHERE %s;" \
          % ((name, desig, data, num), name, desig, data, num, obj = repr(objkey))
    cursor.execute(sql)
    db.commit()
    db.close()

It says I have a syntax error, but I'm not sure where, since I'm pretty sure there should be parentheses around the field names the first time but not the second. Anyone know what I'm doing wrong?
I'm not exactly sure what you are trying to do with the obj = repr(objkey) part, but Python thinks you are assigning a variable there, not writing SQL syntax (if that is indeed your desire here), and that is your syntax error. This line:

    sql = "INSERT INTO newtable (%s, %s, %s, %s) SELECT %s, %s, %s, %s FROM oldtable WHERE %s;" \
          % ((name, desig, data, num), name, desig, data, num, obj = repr(objkey))

should probably be changed to something like:

    sql = "INSERT INTO newtable (%s, %s, %s, %s) SELECT %s, %s, %s, %s FROM oldtable WHERE obj=%s;" \
          % ((name, desig, data, num), name, desig, data, num, repr(objkey))

But even then, you would need objkey defined somewhere as a Python variable. This answer may be way off, but you need to define what you expect obj = repr(objkey) to achieve in order to get more accurate answers.
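As a side note, not from the original answer: a hedged sketch of the usual safer pattern. Identifiers (table and column names) cannot be bound as driver parameters, so write them literally, and bind only the value; the literal column names below are assumed from the question's variable names:

    import MySQLdb

    db = MySQLdb.connect(host='', user='', passwd='', db='')
    cursor = db.cursor()  # note the call: db.cursor alone is just the method

    # Identifiers written literally; only the value goes through binding,
    # which also handles quoting/escaping (no repr() needed)
    sql = ("INSERT INTO newtable (name, desig, data, num) "
           "SELECT name, desig, data, num FROM oldtable WHERE objkey = %s")
    cursor.execute(sql, (objkey,))
    db.commit()
    db.close()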
Get last pk without commit?
Is there a way to get the last PK inserted without doing a COMMIT? Here is what I'm currently doing:

    self.cursor.execute('INSERT IGNORE INTO main_catalog VALUES (%s, %s, %s, %s, %s)',
                        (item[0], item[1], item[2], False, 'GOOGLE'))
    self.conn.commit()
    self.cursor2.execute('SELECT LAST_INSERT_ID()')
    last_pk = cursor.fetchone()[0]

How would I do that without the self.conn.commit()?
You should be able to use cursor.lastrowid, even within a transaction (i.e. without having called, or before calling, conn.commit). See http://dev.mysql.com/doc/connector-python/en/connector-python-example-cursor-transaction.html
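A minimal sketch of that, adapted to the snippet above (names reused from the question):

    self.cursor.execute('INSERT IGNORE INTO main_catalog VALUES (%s, %s, %s, %s, %s)',
                        (item[0], item[1], item[2], False, 'GOOGLE'))
    # lastrowid is available on the cursor that ran the INSERT,
    # before any commit; no second cursor or extra query needed
    last_pk = self.cursor.lastrowid
    self.conn.commit()  # commit later, once you're done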