More elegant way to deal with multiple KeyError Exceptions - python

I have the following function, which reads a dict and assigns some of its values to local variables, which are then returned as a tuple.
The problem is that some of the desired keys may not exist in the dictionary.
So far I have this code. It does what I want, but I wonder if there is a more elegant way to do it.
def getNetwork(self, search):
    data = self.get('ip', search)
    handle = data['handle']
    name = data['name']
    try:
        country = data['country']
    except KeyError:
        country = ''
    try:
        type = data['type']
    except KeyError:
        type = ''
    try:
        start_addr = data['startAddress']
    except KeyError:
        start_addr = ''
    try:
        end_addr = data['endAddress']
    except KeyError:
        end_addr = ''
    try:
        parent_handle = data['parentHandle']
    except KeyError:
        parent_handle = ''
    return (handle, name, country, type, start_addr, end_addr, parent_handle)
I'm a bit worried by the numerous try/except blocks, but if I put all the assignments inside a single try/except, it would stop assigning values as soon as the first missing dict key raises an error.

Just use dict.get. Each use of:
try:
    country = data['country']
except KeyError:
    country = ''
can be equivalently replaced with:
country = data.get('country', '')
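When many optional keys are involved, the same idea can be driven by a sequence of key names. A minimal sketch, assuming data is a plain dict shaped like the one in the question; the extract_network name and the OPTIONAL_KEYS constant are made up for illustration:

```python
# Hypothetical helper: the names here are illustrative, not from the original code.
OPTIONAL_KEYS = ('country', 'type', 'startAddress', 'endAddress', 'parentHandle')


def extract_network(data):
    """Return the network fields as a tuple, using '' for missing optional keys."""
    # Required keys: a missing one still raises KeyError, as in the question.
    required = (data['handle'], data['name'])
    # Optional keys: dict.get returns the default instead of raising.
    optional = tuple(data.get(key, '') for key in OPTIONAL_KEYS)
    return required + optional


sample = {'handle': 'NET-1', 'name': 'EXAMPLE', 'country': 'FR'}
print(extract_network(sample))
```

Keeping the key names in a tuple fixes the order of the returned values, which matters because the caller unpacks them by position.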

You could instead iterate through the keys and try each one, appending the value to a list on success and an empty string on failure. Use a tuple or list of keys rather than a set, so that the order of the results is predictable:
ret = []
for key in ('country', 'type', 'startAddress', 'endAddress', 'parentHandle'):
    try:
        ret.append(data[key])
    except KeyError:
        ret.append('')
Then at the end of the function return a tuple:
return tuple(ret)
if that is necessary.

Thanks ShadowRanger, with your answer I ended up with the following code, which is indeed more comfortable to read:
def getNetwork(self, search):
    data = self.get('ip', search)
    handle = data.get('handle', '')
    name = data.get('name', '')
    country = data.get('country', '')
    type = data.get('type', '')
    # the dict keys are camelCase, as in the original function
    start_addr = data.get('startAddress', '')
    end_addr = data.get('endAddress', '')
    parent_handle = data.get('parentHandle', '')
    return (handle, name, country, type, start_addr, end_addr, parent_handle)

Related

List index out of range error for loop on Json

I'm trying to request a JSON object, run through it with a for loop, take out the data I need, and save it to a model in Django.
I only want the first two runners' attributes (runner_1_name and runner_2_name), but in my JSON object the number of runners varies inside each list, so I keep getting a "list index out of range" error. I have tried to use try and except, but when I try to save to the model it says my variables are referenced before assignment. What's the best way of ignoring the "list index out of range" error, or of fixing the list so the indexes are correct? I also want the code to run really fast, as I will be using this function as a background task that polls every two seconds.
#shared_task()
def mb_get_events():
    mb = APIClient('username', 'pass')
    tennis_events = mb.market_data.get_events()
    for data in tennis_events:
        id = data['id']
        event_name = data['name']
        sport_id = data['sport-id']
        start_time = data['start']
        is_ip = data['in-running-flag']
        par = data['event-participants']
        event_id = par[0]['event-id']
        cat_id = data['meta-tags'][0]['id']
        cat_name = data['meta-tags'][0]['name']
        cat_type = data['meta-tags'][0]['type']
        url_name = data['meta-tags'][0]['type']
        try:
            runner_1_name = data['markets'][0]['runners'][0]['name']
        except IndexError:
            pass
        try:
            runner_2_name = data['markets'][0]['runners'][1]['name']
        except IndexError:
            pass
        run1_par_id = data['markets'][0]['runners'][0]['id']
        run2_par_id = data['markets'][0]['runners'][1]['id']
        run1_back_odds = data['markets'][0]['runners'][0]['prices'][0]['odds']
        run2_back_odds = data['markets'][0]['runners'][1]['prices'][0]['odds']
        run1_lay_odds = data['markets'][0]['runners'][0]['prices'][3]['odds']
        run2_lay_odds = data['markets'][0]['runners'][1]['prices'][3]['odds']
        te, created = MBEvent.objects.update_or_create(id=id)
        te.id = id
        te.event_name = event_name
        te.sport_id = sport_id
        te.start_time = start_time
        te.is_ip = is_ip
        te.event_id = event_id
        te.runner_1_name = runner_1_name
        te.runner_2_name = runner_2_name
        te.run1_back_odds = run1_back_odds
        te.run2_back_odds = run2_back_odds
        te.run1_lay_odds = run1_lay_odds
        te.run2_lay_odds = run2_lay_odds
        te.run1_par_id = run1_par_id
        te.run2_par_id = run2_par_id
        te.cat_id = cat_id
        te.cat_name = cat_name
        te.cat_type = cat_type
        te.url_name = url_name
        te.save()
Quick Fix:
try:
    runner_1_name = data['markets'][0]['runners'][0]['name']
except IndexError:
    runner_1_name = ''  # don't just pass here
try:
    runner_2_name = data['markets'][0]['runners'][1]['name']
except IndexError:
    runner_2_name = ''
You get "variable referenced before assignment" because the except block just passes, so if the try fails, runner_1_name or runner_2_name is never defined. When you later try to use those variables you get an error because they were never assigned. So in the except block, set the value to a blank string or some other string such as 'Runner does not exist'.
Now if you want to totally avoid try/except and IndexError you can use if statements to check the length of markets and runners. Something like this:
runner_1_name = ''
runner_2_name = ''
# Make sure markets exists in data, is not empty, and runners exists in the first market
if 'markets' in data and len(data['markets']) > 0 and 'runners' in data['markets'][0]:
    runners = data['markets'][0]['runners']
    # get runner 1
    if len(runners) > 0 and 'name' in runners[0]:
        runner_1_name = runners[0]['name']
    else:
        runner_1_name = 'Runner 1 does not exist'
    # get runner 2
    if len(runners) > 1 and 'name' in runners[1]:
        runner_2_name = runners[1]['name']
    else:
        runner_2_name = 'Runner 2 does not exist'
As you can see, this gets long, and it's not the recommended way to do things.
You should just assume the data is all right, try to get the names, and use try/except to catch any errors, as in my first code snippet.
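If only the first two runner names are needed, slicing is another option: a slice past the end of a list returns fewer items instead of raising IndexError. A rough sketch, assuming data is one event dict shaped like the question's JSON; first_two_runner_names is a hypothetical helper name:

```python
def first_two_runner_names(data, default=''):
    """Return (runner_1_name, runner_2_name), padding with `default`."""
    markets = data.get('markets') or []
    runners = (markets[0].get('runners') or []) if markets else []
    # Slicing never raises IndexError, even on an empty list.
    names = [runner.get('name', default) for runner in runners[:2]]
    # Pad so the result always has exactly two entries.
    names += [default] * (2 - len(names))
    return tuple(names)


event = {'markets': [{'runners': [{'name': 'Player A'}]}]}
runner_1_name, runner_2_name = first_two_runner_names(event)
print(runner_1_name, runner_2_name)
```

Both names are always assigned, so the "referenced before assignment" error cannot occur.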
I had the same issue with a list of comments that can be empty or hold an unknown number of comments.
My solution is to initialize a counter at 0 and run a while loop on a boolean.
In the loop I try to get comments[count]; if that raises IndexError, I set the boolean to False to stop the loop.
count = 0
condition_continue = True
while condition_continue:
    try:
        detailsCommentDict = comments[count]
        ....
    except IndexError:
        # no comment at all, or no more comments
        condition_continue = False
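When every comment has to be visited anyway, iterating over the list directly also covers the empty case without a counter or a flag: the loop body simply never runs. A minimal sketch, assuming comments is a list of dicts; the 'text' key and the collect_comment_texts name are placeholders, not taken from the answer above:

```python
def collect_comment_texts(comments):
    """Return the 'text' of every comment; an empty list yields an empty result."""
    texts = []
    for details in comments:
        # 'text' is a placeholder key; .get avoids a KeyError if it is absent.
        texts.append(details.get('text', ''))
    return texts


print(collect_comment_texts([]))
print(collect_comment_texts([{'text': 'first'}, {}]))
```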

Check that a key from json output exists

I keep getting the following error when trying to parse some json:
Traceback (most recent call last):
File "/Users/batch/projects/kl-api/api/helpers.py", line 37, in collect_youtube_data
keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
KeyError: 'brandingSettings'
How do I make sure that I check my JSON output for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value. Code below:
try:
    channel_id = channel_id_response_data['items'][0]['id']
    channel_info_url = YOUTUBE_URL + '/channels/?key=' + YOUTUBE_API_KEY + '&id=' + channel_id + '&part=snippet,contentDetails,statistics,brandingSettings'
    print('Querying:', channel_info_url)
    channel_info_response = requests.get(channel_info_url)
    channel_info_response_data = json.loads(channel_info_response.content)
    no_of_videos = int(channel_info_response_data['items'][0]['statistics']['videoCount'])
    no_of_subscribers = int(channel_info_response_data['items'][0]['statistics']['subscriberCount'])
    no_of_views = int(channel_info_response_data['items'][0]['statistics']['viewCount'])
    avg_views = round(no_of_views / no_of_videos, 0)
    photo = channel_info_response_data['items'][0]['snippet']['thumbnails']['high']['url']
    description = channel_info_response_data['items'][0]['snippet']['description']
    start_date = channel_info_response_data['items'][0]['snippet']['publishedAt']
    title = channel_info_response_data['items'][0]['snippet']['title']
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except Exception as e:
    raise Exception(e)
You can either wrap each assignment in something like
try:
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = "default value"
or check for each key first with the in operator (dict.has_key() exists only in Python 2). IMHO the first solution is preferable in your case.
Suppose you have a dict. You have two options for handling a key that does not exist:
1) get the key with a default value, like
d = {}
val = d.get('k', 10)
val will be 10 since there is no key named k
2) try-except
d = {}
try:
    val = d['k']
except KeyError:
    val = 10
This way is far more flexible since you can do anything in the except block, even ignore the error with a pass statement if you really don't care about it.
"How do I make sure that I check my JSON output"
At this point your "JSON output" is just a plain native Python dict.
"for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value"
Now that you know you have a dict, browsing the official documentation for dict methods should answer the question:
https://docs.python.org/3/library/stdtypes.html#dict.get
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
so the general case is:
var = data.get(key, default)
Now if you have deeply nested dicts/lists where any key or index could be missing, catching KeyErrors and IndexErrors can be simpler:
try:
    var = data[key1][index1][key2][index2][keyN]
except (KeyError, IndexError):
    var = default
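If that pattern is needed in many places, it can be wrapped in a small helper. The sketch below is one possible shape, not a standard library function; the name dig and its signature are invented for illustration. It also catches TypeError, on the assumption that an intermediate level may be None:

```python
def dig(data, *path, default=None):
    """Follow `path` (dict keys and list indexes) through nested containers."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list too short;
            # TypeError: this level is None or otherwise not subscriptable.
            return default
    return current


response = {'items': [{'snippet': {'title': 'Demo'}}]}
title = dig(response, 'items', 0, 'snippet', 'title', default='')
keywords = dig(response, 'items', 0, 'brandingSettings', 'channel', 'keywords', default='')
print(repr(title), repr(keywords))
```

The trade-off is that a typo in a key name silently yields the default, so this suits optional fields better than required ones.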
As a side note: your code snippet is filled with repeated channel_info_response_data['items'][0]['statistics'] and channel_info_response_data['items'][0]['snippet'] expressions. Using intermediate variables will make your code more readable, easier to maintain, AND a bit faster too:
# always set a timeout if you don't want the program to hang forever
channel_info_response = requests.get(channel_info_url, timeout=30)
# always check the response status - having a response doesn't
# mean you got what you expected. Here we use the `raise_for_status()`
# shortcut, which raises an exception for a 4xx or 5xx status.
channel_info_response.raise_for_status()
# requests knows how to deal with json:
channel_info_response_data = channel_info_response.json()
# we assume that the response MUST have `['items'][0]`,
# and that this item MUST have "statistics" and "snippet"
item = channel_info_response_data['items'][0]
stats = item["statistics"]
snippet = item["snippet"]
no_of_videos = int(stats.get('videoCount', 0))
no_of_subscribers = int(stats.get('subscriberCount', 0))
no_of_views = int(stats.get('viewCount', 0))
# guard against a zero video count, which the default above makes possible
avg_views = round(no_of_views / no_of_videos, 0) if no_of_videos else 0
try:
    photo = snippet['thumbnails']['high']['url']
except KeyError:
    photo = None
description = snippet.get('description', "")
start_date = snippet.get('publishedAt', None)
title = snippet.get('title', "")
try:
    keywords = item['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = ""
You may also want to learn about string formatting (concatenating strings is quite error-prone and barely readable), and how to pass arguments to requests.get().
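As a rough illustration of that last point, the query string can be built from a dict rather than by concatenation. This sketch uses the standard library's urllib.parse.urlencode so that it runs without network access; build_channel_url and the placeholder values are made up for the example. With requests, the same dict can be passed as the params argument of requests.get() instead.

```python
from urllib.parse import urlencode


def build_channel_url(base_url, api_key, channel_id):
    """Build the channel-info URL from its parts (hypothetical helper)."""
    query = urlencode({
        'key': api_key,
        'id': channel_id,
        'part': 'snippet,contentDetails,statistics,brandingSettings',
    })
    # An f-string keeps the URL layout visible at a glance.
    return f'{base_url}/channels/?{query}'


# Placeholder values, not real credentials or endpoints.
url = build_channel_url('https://example.invalid/v3', 'MY_KEY', 'abc123')
print(url)
```

urlencode also percent-encodes special characters, which manual concatenation does not.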

Field is present in output, but getting KeyError

Below is my output. As you can see, children is clearly a (dictionary) field in my response.
This code works perfectly, but it keeps any nested fields (lists or dictionaries) as is:
user = ""
password = getattr(config, 'password')
url = ''
req = requests.post(url=url, auth=(user, password))
print('Authentication successful!\n')
ans = req.json()
# Transform resultList into a pandas DataFrame
solr_df = pd.DataFrame.from_dict(json_normalize(ans['resultList']), orient='columns')
I instead would like to normalize the "children" field, so I did the following instead of the last row above:
solr_df = pd.DataFrame()
for record in ans['resultList']:
    df = pd.DataFrame(record['children'])
    df['contactId'] = record['contactId']
    solr_df = solr_df.append(df)
However, I am getting a KeyError: 'children'.
Can anyone suggest what I am doing wrong?
One of your records is probably missing the 'children' key so catch that exception and continue processing the rest of the output.
solr_df = pd.DataFrame()
for record in ans['resultList']:
    try:
        df = pd.DataFrame(record['children'])
        df['contactId'] = record['contactId']
        solr_df = solr_df.append(df)
    except KeyError as e:
        print("Record {} triggered {}".format(record, e))
Since the message is KeyError: 'children', the only plausible reason for the error is that the children key is missing in one of the dicts. You can avoid the exception by using a try/except block, or can pass in a default value for the key, like:
solr_df = pd.DataFrame()
for record in ans['resultList']:
    df = pd.DataFrame(record.get('children', {}))
    df['contactId'] = record.get('contactId')
    solr_df = solr_df.append(df)

Proper way of handling JSON Parsing TypeError when element does not exist

The code gets me what I want in the end (which is to create a list of dictionaries of the fields I want from a very large JSON dataset, so that I can create a dataframe for additional data processing).
However, I have to construct a very large try/except block to get this done. I was wondering if there is a clearer or cleverer way of doing this.
The problem I'm having is that details['element'] sometimes doesn't exist or has no value, which raises a NoneType error: the child details['element']['value'] cannot be read because its parent is None.
So I have a very large try/except block to set the variable to '' if that happens.
I tried to send details['element'] to a function that would return the value for the variable, but it looks like I can't do that, because Python evaluates the expression (and hits the NoneType error) before the function is called.
Any thoughts?
rawJson = json.loads(data.decode('utf-8'))
issues = rawJson['issues']
print('Parsing data...')
for ticket in issues:
    details = ticket['fields']
    try:
        key = ticket['key']
    except TypeError:
        key = ''
    try:
        issueType = details['issuetype']['name']
    except TypeError:
        issueType = ''
    try:
        description = details['description']
    except TypeError:
        description = ''
    try:
        status = details['status']['name']
    except TypeError:
        status = ''
    try:
        creator = details['creator']['displayName']
    except TypeError:
        creator = ''
    try:
        assignee = details['assignee']['displayName']
    except TypeError:
        assignee = ''
    try:
        lob = details['customfield_10060']['value']
    except TypeError:
        lob = ''
.... There is a long list of this
You can use the get method, which accepts a default value, to simplify this code:
d = {'a': 1, 'c': 2}
value = d.get('a', 0)  # value = 1 here because d['a'] exists
value = d.get('b', 0)  # value = 0 here because d['b'] does not exist
So you can write:
for ticket in issues:
    details = ticket['fields']
    key = ticket.get('key', '')
    description = details.get('description', '')
    # `or {}` also covers a field that is present but None
    issueType = (details.get('issuetype') or {}).get('name', '')
    ...
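One caveat for data like this: a field can be present with a null value, so details['assignee'] is None rather than missing, and a plain .get('assignee', {}) would still return None. A small sketch of a helper that treats both cases the same way; nested_field is a hypothetical name, and the sample dict only mimics the shape described in the question:

```python
def nested_field(details, outer, inner, default=''):
    """Return details[outer][inner], or `default` if either level is missing or None."""
    # `or {}` turns both a missing key and an explicit None (JSON null) into {}.
    inner_dict = details.get(outer) or {}
    value = inner_dict.get(inner)
    return default if value is None else value


details = {'status': {'name': 'Open'}, 'assignee': None}
status = nested_field(details, 'status', 'name')
assignee = nested_field(details, 'assignee', 'displayName')
creator = nested_field(details, 'creator', 'displayName')
print(repr(status), repr(assignee), repr(creator))
```

Because the helper receives the key names rather than the already-evaluated expression, the lookup happens inside the function, which avoids the problem described in the question.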

Shorter try except block for many vars

I am parsing some content out of different urls. Not all the urls have the same structure and thus the code fails for some urls, so the code I came up with is this (simplified version):
meta_dict = {}
try:
    meta_dict['date_published'] = html.find('date').text
except:
    meta_dict['date_published'] = ''
try:
    meta_dict['headline'] = html.find('headline').text
except:
    meta_dict['headline'] = ''
try:
    meta_dict['description'] = html.find('description').text
except:
    meta_dict['description'] = ''
return meta_dict
This is a simplified block; the real code tries to get more than 50 variables, and writing a try/except block for every one of them feels too repetitive and makes the code ugly.
I know I could make a function for it and return '' if it fails, but I want to know if there is another way to handle this case.
l = [('date_published', 'date'), ('headline', 'headline'), ('description', 'description')]
for dict_val, html_val in l:
    try:
        meta_dict[dict_val] = html.find(html_val).text
    except:
        meta_dict[dict_val] = ''
If the list of variables you are checking is constant, you can put their names into a list and then just iterate over that list.
vars = ['date_published', 'headline', 'description']  # ... and so on
for var in vars:
    try:
        meta_dict[var] = html.find(var).text
    except:
        meta_dict[var] = ''
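The helper-function route mentioned in the question can be kept short as well, and an explicit None check avoids the bare except, which would also hide unrelated errors. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for whatever parser produced html (the question does not say which); FIELDS and extract_meta are illustrative names:

```python
import xml.etree.ElementTree as ET

# Maps output key -> tag name to look up (illustrative values).
FIELDS = {'date_published': 'date', 'headline': 'headline', 'description': 'description'}


def extract_meta(root, fields=FIELDS, default=''):
    """Return a dict with the text of each tag, or `default` when the tag is absent."""
    meta = {}
    for key, tag in fields.items():
        element = root.find(tag)
        # find() returns None when nothing matches, so check before reading .text.
        if element is not None and element.text is not None:
            meta[key] = element.text
        else:
            meta[key] = default
    return meta


root = ET.fromstring('<page><date>2020-01-01</date><headline>Hello</headline></page>')
meta_dict = extract_meta(root)
print(meta_dict)
```

Adding a fifty-first field then means adding one entry to the mapping rather than another try/except block.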
