MySQLdb and Python KeyError Handling

I am attempting to add multiple values to a MySQL table; here's the code:
try:
    cursor.execute("INSERT INTO companies_and_charges_tmp (etags, company_id, created, delivered, satisfied, status, description, persons_entitled) VALUES ('%s, %s, %s, %s, %s, %s, %s, %s')" % (item['etag'], ch_no, item['created_on'], item['delivered_on'], item['satisfied_on'], item['status'], item['particulars'][0]['description'], item['persons_entitled'][0]['name']))
except KeyError:
    pass
The problem is that this code is in a loop, and at times one of the values being inserted will be missing, which results in a KeyError cancelling the entire insertion.
How do I get past the KeyError, so that when a KeyError occurs for one of the items being inserted, the others are still added to the table and the missing one is simply left as NULL?

You can use the dict.get() method, which returns None if a key is not found in the dictionary. The MySQL driver then converts None to NULL during the query parameterization step:
# handling description and name separately
try:
    description = item['particulars'][0]['description']
except KeyError:
    description = None

# TODO: violates DRY - extract into a reusable method?
try:
    name = item['persons_entitled'][0]['name']
except KeyError:
    name = None

cursor.execute("""
    INSERT INTO
        companies_and_charges_tmp
        (etags, company_id, created, delivered, satisfied, status, description, persons_entitled)
    VALUES
        (%s, %s, %s, %s, %s, %s, %s, %s)""",
    (item.get('etag'), ch_no, item.get('created_on'), item.get('delivered_on'), item.get('satisfied_on'), item.get('status'), description, name))
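To address the DRY concern noted in the comment above, the repeated try/except could be factored into a small helper. This is a minimal sketch; the safe_get name and signature are hypothetical and not part of the original answer:

def safe_get(container, *keys):
    # Follow a chain of keys/indexes, returning None if any step is missing.
    try:
        for key in keys:
            container = container[key]
        return container
    except (KeyError, IndexError, TypeError):
        return None

description = safe_get(item, 'particulars', 0, 'description')
name = safe_get(item, 'persons_entitled', 0, 'name')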

Related

How do I handle a KeyError exception in python without exiting the dictionary?

Basically I have some JSON data that I want to put in a MySQL db, and to do this I'm trying to pass the contents of a dictionary to a cursor.execute method. My code is as follows:
for p in d['aircraft']:
    with connection.cursor() as cursor:
        print(p['hex'])
        sql = "INSERT INTO `aircraft` (`hex`, `squawk`, `flight`, `lat`, `lon`, `nucp`, `seen_pos`, " \
              "`altitude`, `vert_rate`, `track`, `speed`, `messages`, `seen`, `rssi`) " \
              "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
        cursor.execute(sql, (p['hex'], p['squawk'], p['flight'], p['lat'], p['lon'], p['nucp'], p['seen_pos'], p['altitude'], p['vert_rate'], p['track'], p['speed'], p['messages'], p['seen'], p['rssi']))
        print('entered')
        connection.commit()
The issue is that any value in the dictionary can be null at any time, and I need to find out how to handle this. I've tried putting the code in a try/except block and 'pass'-ing whenever a KeyError exception is raised, but this means a record is completely skipped when it has a null value. I've also tried writing a load of if blocks to append a string with the value of the dictionary key, but this was pretty useless.
I need to find a way to put a dictionary in my db even if it contains null values.
You can use the dict.get() method, or construct a defaultdict that returns None for missing keys:
import collections

keys = ['hex', 'squawk', 'flight', 'lat', 'lon', 'nucp', 'seen_pos',
        'altitude', 'vert_rate', 'track', 'speed', 'messages', 'seen',
        'rssi']

for p in d['aircraft']:
    with connection.cursor() as cursor:
        sql = "INSERT INTO `aircraft` (`hex`, `squawk`, `flight`, `lat`, `lon`, `nucp`, `seen_pos`, " \
              "`altitude`, `vert_rate`, `track`, `speed`, `messages`, `seen`, `rssi`) " \
              "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
        # Could also use a defaultdict
        cursor.execute(sql, tuple(p.get(key) for key in keys))
        print('entered')
        connection.commit()
For more examples using dict.get(), see this answer: https://stackoverflow.com/a/11041421/1718575
This assumes that SQL will do the right thing when None is provided. If you want to use a string 'NULL', you can supply that as the second argument to dict.get().
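For the defaultdict alternative mentioned above, a minimal sketch (reusing the same p, keys and sql from the answer) could look like this:

import collections

# Wrap each record so that missing keys yield None instead of raising KeyError
record = collections.defaultdict(lambda: None, p)
cursor.execute(sql, tuple(record[key] for key in keys))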

IndexError: tuple index out of range connecting Python to PostgreSQL

I know this question has been asked a number of times, but I am stuck here, unable to proceed further. I am executing a for loop in Python to load data into a fact table.
I am executing the below code
for index, row in df.iterrows():
    # get songid and artistid from song and artist tables
    cur.execute(song_select, (row.song, row.artist, row.length))
    results = cur.fetchone()

    if results:
        song_id, artist_id = results
    else:
        song_id, artist_id = None, None

    # insert songplay record
    songplay_data = (pd.to_datetime(row.ts, unit='ms'), row.userId, row.level, song_id, artist_id, row.sessionId, row.location, row.userAgent)
    cur.execute(songplay_table_insert, songplay_data)
    conn.commit()
and getting the error
<ipython-input-22-b8b0e27022de> in <module>()
13
14 songplay_data = (pd.to_datetime(row.ts, unit='ms'),row.userId,row.level,song_id,artist_id,row.sessionId,row.location,row.userAgent)
15 cur.execute(songplay_table_insert, songplay_data)
16 conn.commit()
IndexError: tuple index out of range
The insert statement I am using is
songplay_table_insert = ("""INSERT INTO songplays (songplay_id, start_time,
    user_id, level, song_id, artist_id, session_id, location, user_agent)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""")
I am really stuck, any help appreciated.
You have one too many %s markers.
VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s)
has 9 markers, while
songplay_data = (pd.to_datetime(row.ts, unit='ms'),row.userId,row.level,song_id,artist_id,row.sessionId,row.location,row.userAgent)
has 8 elements. When it tries to evaluate the last marker, it looks for the 9th element, i.e. songplay_data[8], and that raises the error.
You will also need to remove songplay_id from the SQL to make the INSERT statement valid. The database should be generating the primary key for you if you don't have a value to provide; if not, take a look at your table definition.
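A sketch of the corrected statement, assuming songplay_id is generated by the database (e.g. a SERIAL/IDENTITY column), so that the eight placeholders match the eight values in songplay_data:

songplay_table_insert = ("""INSERT INTO songplays (start_time,
    user_id, level, song_id, artist_id, session_id, location, user_agent)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""")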

psycopg2 - Inserting multiple rows that have multiple columns faster

I'm trying to insert multiple rows into my database, and currently I do not know a way to insert them all at the same time or any other method that would save time (inserting them sequentially takes about 30 s for around 300 rows).
My 'rows' are tuples in a list of tuples (converted into a tuple of tuples), e.g. [(col0, col1, col2), (col0, col1, col2), (.., .., ..), ..]
def commit(self, tuple):
    cursor = self.conn.cursor()
    for tup in tuple:
        try:
            sql = """insert into "SSENSE_Output" ("productID", "brand", "categoryID", "productName", "price", "sizeInfo", "SKU", "URL", "dateInserted", "dateUpdated")
                     values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
            cursor.execute(sql, tup)
            self.conn.commit()
        except psycopg2.IntegrityError:
            self.conn.rollback()
            sql = 'insert into "SSENSE_Output" ' \
                  '("productID", "brand", "categoryID", "productName", "price", "sizeInfo", "SKU", "URL", "dateInserted", "dateUpdated") ' \
                  'values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s) on conflict ("productID") do update set "dateUpdated" = EXCLUDED."dateUpdated"'
            cursor.execute(sql, tup)
            self.conn.commit()
        except Exception as e:
            print(e)
I have also tried committing after the for loop is done, but it still results in the same amount of time. Are there any ways to make this insert significantly faster?
In Postgres you can use a multi-row VALUES format like:
INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
Because your exception handling is per record, it is better to resolve the duplicates before generating this query, as the whole query might fail when an integrity error occurs.
Building one large INSERT statement instead of many small ones will considerably improve the execution time; you should take a look here. It is for MySQL, but I think a similar approach applies to PostgreSQL.
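As a concrete sketch, psycopg2's execute_values helper builds one multi-row INSERT for you, and combining it with ON CONFLICT removes the need for per-row IntegrityError handling. The commit_many name is hypothetical; the table and columns are taken from the question:

from psycopg2.extras import execute_values

def commit_many(conn, rows):
    # rows is a list of 10-element tuples matching the column order below
    sql = """
        insert into "SSENSE_Output"
            ("productID", "brand", "categoryID", "productName", "price",
             "sizeInfo", "SKU", "URL", "dateInserted", "dateUpdated")
        values %s
        on conflict ("productID") do update set "dateUpdated" = EXCLUDED."dateUpdated"
    """
    with conn.cursor() as cursor:
        # execute_values expands the single %s into a VALUES list covering all rows
        execute_values(cursor, sql, rows)
    conn.commit()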

Get last pk without commit?

Is there a way to get the last PK inserted without doing a COMMIT? Here is what I'm currently doing:
self.cursor.execute('INSERT IGNORE INTO main_catalog VALUES (%s, %s, %s, %s, %s)', (item[0], item[1], item[2], False, 'GOOGLE'))
self.conn.commit()
self.cursor2.execute('SELECT LAST_INSERT_ID()')
last_pk = self.cursor2.fetchone()[0]
How would I do that without the self.conn.commit() ?
You should be able to use cursor.lastrowid, even within a transaction (i.e. before calling conn.commit()). See http://dev.mysql.com/doc/connector-python/en/connector-python-example-cursor-transaction.html
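Applied to the snippet above, a minimal sketch would read the id from the same cursor that ran the INSERT, with no second query and no commit needed first:

self.cursor.execute('INSERT IGNORE INTO main_catalog VALUES (%s, %s, %s, %s, %s)',
                    (item[0], item[1], item[2], False, 'GOOGLE'))
last_pk = self.cursor.lastrowid  # available before self.conn.commit()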

Match MySQL data type from Python for load data

I have a Python script to load data into a MySQL table. The problem that I am running into is the following:
Warning: Incorrect integer value: 'user_id' for column 'user_id' at row 1
+'VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',row)
Warning: Invalid TIMESTAMP value in column 'edit_time' at row 1
+'VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',row)
and this is just one of many warning lines. I understand the error; my guess is that the row is still in string format, or the conversion didn't happen properly. I was thinking of casting each element in the row after reading it from the tsv file.
example: int(row[0])
but I am not sure how to cast the relevant element correctly to match the MySQL timestamp type.
# load training data
def readtsv(file_name, con):
    cur = con.cursor()
    with open('file.tsv', 'rb') as f:
        for row in csv.reader(f, delimiter='\t'):
            cur.execute('INSERT INTO mytable(user_id, article_id, revision_id, namespace, edit_time, md5, reverted, reverted_user_id, reverted_revision_id, delta, cur_size) '
                        + 'VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)
    try:
        con.commit()
    except Exception, e:
        print 'unable to commit'
        print e
Have you checked if your timestamp value is the correct one?
http://dev.mysql.com/doc/refman/5.1/en/datetime.html
If it is, MySQL will not raise the error and no warning will appear.
As a hint, simply print the insert statement string and try to run it via phpMyAdmin or another database interface tool.
For MySQL, a TIMESTAMP value can be written in the format 'YYYYMMDDHHMMSS'; today, 04 Dec 2012 21:06:00, would be written as '20121204210600', with quotes!
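As a sketch of the casting step, assuming the tsv stores the edit time as something like 'DD/MM/YYYY HH:MM:SS' (the actual input format is not shown in the question), each row could be converted before calling cur.execute:

from datetime import datetime

# hypothetical input format; adjust to whatever the tsv actually contains
edit_time = datetime.strptime(row[4], '%d/%m/%Y %H:%M:%S')
row[4] = edit_time.strftime('%Y%m%d%H%M%S')  # 'YYYYMMDDHHMMSS', as MySQL expects
row[0] = int(row[0])                          # user_id as an integer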
