Python: Handle Missing Object keys in mapping and continue instructions

I'm fairly new to Python, so please bear with me.
I have a function that takes two parameters, an output object and an API response. I need to assign some values from the API response to the output object:
import logging
def map_data(output, response):
    try:
        output['car']['name'] = response['name']
        output['car']['color'] = response['color']
        output['car']['date'] = response['date']
        # other mapping
        # ...
        # other mapping
    except KeyError as e:
        logging.error("Key Missing in api Response: %s", str(e))
        pass
    return output
Sometimes the API response is missing some of the keys I use to build my output object, so I catch KeyError to handle that case.
My question: when the 'color' key is missing from the API response, how can I catch the exception and continue with the next line, output['car']['date'] = response['date'], and the rest of the instructions?
I tried the pass statement, but it didn't have any effect.
PS: I know I can check for the key with:
if response.get('color') is not None:
    output['car']['color'] = response['color']
and then assign the values, but since I have about 30 values to map, is there another way to implement this? Thank you.

A few immediate ideas
(FYI: I'm not going to explain everything in detail. The Python docs have more info and examples, and reading them will teach you more than I could explain here.)
Google 'python handling dict missing keys' for a million methods/ideas/approaches. It's a common use case!
Convert your response dict to a defaultdict. That way a default value (e.g. None, '', 'N/A', whatever you like) is returned when the key has no actual value.
In this case you could do away with the try, and every line would be executed.
from collections import defaultdict
resp = defaultdict(lambda: 'NA', response)
output['car']['date'] = resp['date']  # will have value 'NA' if 'date' isn't in response
Use the in operator in combination with a conditional expression (which always needs an else branch):
output['car']['color'] = response['color'] if 'color' in response else 'NA'
output['car']['date'] = response['date'] if 'date' in response else 'NA'
Again you can do away with the try block, and every line will execute.
Use the dictionary get method, which lets you specify a default for when the key is missing:
output['car']['color'] = response.get('color', 'no color specified')
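To see how the three approaches above behave side by side, here is a minimal sketch. The sample response dict and its values are made up for illustration. One difference worth noticing: looking up a missing key on a defaultdict also stores the default in it, while get and the in check leave the dict untouched.

```python
from collections import defaultdict

# Hypothetical API response with 'color' missing.
response = {"name": "Civic", "date": "2020-01-01"}

# 1. defaultdict: a missing key yields the default AND gets inserted.
resp = defaultdict(lambda: "NA", response)
color_from_defaultdict = resp["color"]
inserted = "color" in resp  # the lookup above added the key to resp

# 2. Conditional expression with the in operator.
color_from_in = response["color"] if "color" in response else "NA"

# 3. dict.get with a default; the original dict is not modified.
color_from_get = response.get("color", "NA")
untouched = "color" not in response

print(color_from_defaultdict, color_from_in, color_from_get, inserted, untouched)
```

If the inserted keys would be a problem (for example when the dict is serialized later), get is the safer of the three.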

You can create a utility function that gets the value from the response and returns an empty string if the key is not found. See the example below:
import logging
def get_value_from_response_or_null(response, key):
    try:
        value = response[key]
        return value
    except KeyError as e:
        logging.error("Key Missing in api Response: %s", str(e))
        return ""
def map_data(output, response):
    output['car']['name'] = get_value_from_response_or_null(response, 'name')
    output['car']['color'] = get_value_from_response_or_null(response, 'color')
    output['car']['date'] = get_value_from_response_or_null(response, 'date')
    # other mapping
    # other mapping
    return output
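With about 30 fields, the per-key lookup can also be driven from a list, so that each key is tried on its own and a missing one does not stop the rest. A minimal sketch, assuming every field lands under output['car'] and that the key names match on both sides; CAR_FIELDS and map_fields are hypothetical names, not part of the original code:

```python
import logging

# Hypothetical list of keys to copy; extend it with the remaining fields.
CAR_FIELDS = ["name", "color", "date"]

def map_fields(output, response, fields=CAR_FIELDS):
    """Copy each field separately so one missing key does not stop the rest."""
    car = output.setdefault("car", {})
    for key in fields:
        try:
            car[key] = response[key]
        except KeyError:
            # Log the missing key and move on to the next field.
            logging.error("Key missing in API response: %s", key)
    return output

result = map_fields({"car": {}}, {"name": "Civic", "date": "2020-01-01"})
print(result)
```

Unlike the empty-string default, this version leaves a missing field out of the output entirely; which behaviour is preferable depends on what consumes the output.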

Related

How to pass value in URL in django-restframework

I have been following the tutorial below, and it only shows the value in Postman:
https://dzone.com/articles/create-a-simple-api-using-django-rest-framework-in
Code:
@api_view(["POST"])
def IdealWeight(heightdata):
    try:
        height=json.loads(heightdata.body)
        weight=str(height*10)
        return JsonResponse("Ideal weight should be:"+weight+" kg",safe=False)
    except ValueError as e:
        return Response(e.args[0],status.HTTP_400_BAD_REQUEST)
I want to use the GET method instead of POST and pass the height value in the URL,
for example: http://127.0.0.1:8000/IdealWeight/height=20
and the result should be returned as JSON.

You can get the value from the URL using query_params.
Try this:
@api_view(["GET"])
def IdealWeight(request):
    try:
        height = int(request.query_params.get('height'))
        weight = height * 10
        return JsonResponse({"Weight should be": "{}".format(weight)}, status=status.HTTP_200_OK)
    except (TypeError, ValueError) as e:  # TypeError: 'height' is missing; ValueError: it is not an integer
        return Response(e.args[0], status.HTTP_400_BAD_REQUEST)
Open the URL below to get the result:
http://127.0.0.1:8000/IdealWeight/?height=20
Make sure the path ends with / in your urls.py, since you're using the GET method:
path('IdealWeight/',views.IdealWeight,name='IdealWeight'),

You can use query parameters to pass the height in the URL.
http://127.0.0.1:8000/IdealWeight/?height=20
Query parameters, also called the query string, are optional key-value pairs that appear to the right of the ? in a URL.
key1=value1&key2=value2&key3=value3...
Within each pair, the key and value are separated by an equals sign (=).
The pairs are separated from each other by an ampersand (&).
@api_view(["GET"])
def IdealWeight(request):
    try:
        height=int(request.query_params.get('height', 0))
        weight=str(height*10)
        return JsonResponse("Ideal weight should be:"+weight+" kg",safe=False)
    except ValueError as e:
        return Response(e.args[0],status.HTTP_400_BAD_REQUEST)
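The key=value&key=value format described above can be explored outside Django with the standard library. A small sketch using urllib.parse; the URL is the one from this answer with an extra, made-up unit parameter added for illustration:

```python
from urllib.parse import urlsplit, parse_qs

url = "http://127.0.0.1:8000/IdealWeight/?height=20&unit=cm"

# Everything to the right of '?' is the query string.
query_string = urlsplit(url).query

# parse_qs returns a dict mapping each key to a list of values,
# because a key may appear more than once in a query string.
params = parse_qs(query_string)

# Fall back to "0" when 'height' is absent, then take the first value.
height = int(params.get("height", ["0"])[0])

print(query_string)
print(params)
print(height * 10)
```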

Allowing empty dates with Marshmallow

I am trying to get data from a web page. The page contains release information, but some values may be unset; e.g. the test from/to dates might be empty strings.
Now I am trying to deserialize the scraped data to insert it into a database, and I am having trouble handling empty dates.
from marshmallow import fields, Schema, ValidationError
class TestSchema(Schema):
    training_necessary = fields.Function(deserialize=lambda x: True if x == 'Yes' else False)
    test_from = fields.Date()
    test_to = fields.Date()
data = dict(training_necessary='Yes', test_from='', test_to='')
try:
    validated = TestSchema().load(data)
except ValidationError as err:
    print(f"{err}")
Result:
{'test_to': ['Not a valid date.'], 'test_from': ['Not a valid date.']}
I already tried several combinations of allow_none=True or default='', but none of them got me through. So how can I allow empty dates? Setting a default like 1970-01-01 won't help in this case.
Any hints?
Regards, Thomas
+++ EDIT: SOLUTION +++
Here's the working code I ended up with after Jérôme's helpful tip:
from marshmallow import fields, Schema, ValidationError, pre_load
class TestSchema(Schema):
    training_necessary = fields.Function(deserialize=lambda x: True if x == 'Yes' else False)
    test_from = fields.Date(allow_none=True)
    test_to = fields.Date(allow_none=True)
    @pre_load(pass_many=False)
    def string_to_none(self, data, many, **kwargs):
        turn_to_none = lambda x: None if x == '' else x
        for k, v in data.items():
            data[k] = turn_to_none(v)
        return data
data = dict(training_necessary='Yes', test_from='', test_to='')
try:
    validated = TestSchema().load(data)
except ValidationError as err:
    print(f"{err}")

I would pass no value at all.
data = dict(training_necessary='Yes')
Or I'd make the date fields allow_none and I'd pass None, not an empty string.
data = dict(training_necessary='Yes', test_from=None, test_to=None)
If the issue is that your input contains empty strings, I'd say this is a client issue, but you can add a pre_load method to delete empty strings from the input before deserializing. This is more or less equivalent to modifying the values you scrape from the page before feeding them to marshmallow.
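The cleanup step mentioned above could look like the sketch below. It is plain Python with no marshmallow dependency; drop_empty_strings is a hypothetical helper name, and it could be called from a pre_load hook or directly on the scraped dict before passing it to load():

```python
def drop_empty_strings(data):
    """Return a copy of data without the keys whose value is an empty string."""
    return {key: value for key, value in data.items() if value != ""}

scraped = dict(training_necessary="Yes", test_from="", test_to="")
cleaned = drop_empty_strings(scraped)
print(cleaned)
```

Because the empty keys are removed rather than set to None, the date fields would not need allow_none=True with this variant.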

Django - checking if instance exists results in internal server error 500

I am trying to check if I have an entry in my database using this code:
def device_update(request):
    json_data = json.loads(request.body)
    email = json_data['email']
    imei = json_data['imei']
    sdk_version = json_data['sdk_version']
    date = json_data['updateDate']
    rule = json_data['ruleName']
    group_name = json_data['group']
    if Group.objects.filter(group=group_name).exists():
        print("group does exists")
    else:
        print("group doesn't exists")
    return HttpResponse("Successful")
However, when the code reaches the if statement to check if the group exists, it returns error 500.
I tried to check with two groups one that exists and another one that doesn't, in both cases I got error 500.
How can I fix this and why is this happening?

The logic for checking if a Group exists, i.e. the line:
if Group.objects.filter(group=group_name).exists()
is not throwing the error here. It is likely that json_data is missing one of the keys you expect it to have, for example, 'group'.
I'd recommend using the get method that dictionaries have. This provides default values when the specified key is not present in the dictionary. You should also have error handling for when the request body is not in valid JSON format.
Here's an example:
def device_update(request):
    try:
        json_data = json.loads(request.body)
    except json.JSONDecodeError:
        return HttpResponse('Request body must be in valid JSON format')
    email = json_data.get('email', '')
    imei = json_data.get('imei', '')
    sdk_version = json_data.get('sdk_version', '')
    date = json_data.get('updateDate', '')
    rule = json_data.get('ruleName', '')
    group_name = json_data.get('group', '')
    if Group.objects.filter(group=group_name).exists():
        print("group does exists")
    else:
        print("group doesn't exists")
    return HttpResponse("Successful")
I set the defaults to the empty string '', but you may want to change that.

Your view doesn't have any error handling. Looking at it quickly, at least two things could go wrong. The request body might not be valid json, and if it is valid json, it might not contain the required keys.
def device_update(request):
    try:
        json_data = json.loads(request.body)
    except ValueError:
        return HttpResponse("Invalid json")
    try:
        email = json_data['email']
        imei = json_data['imei']
        sdk_version = json_data['sdk_version']
        date = json_data['updateDate']
        rule = json_data['ruleName']
        group_name = json_data['group']
    except KeyError as e:
        # exceptions are not subscriptable in Python 3; the missing key is in e.args
        return HttpResponse("Missing Key %s" % e.args[0])
    ...
Writing your own validation for a single view like this is OK. As it gets more complicated, you might want to look at Django REST framework. It has serializers, which will help you manage validation.
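Both checks can be folded into one helper that never raises. This is a framework-free sketch; parse_required and REQUIRED_KEYS are hypothetical names, and in a view the returned error message would go into the HttpResponse:

```python
import json

# Keys the view in the question reads from the request body.
REQUIRED_KEYS = ("email", "imei", "sdk_version", "updateDate", "ruleName", "group")

def parse_required(body, required=REQUIRED_KEYS):
    """Return (data, None) on success or (None, error_message) on failure."""
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None, "Invalid json"
    if not isinstance(data, dict):
        return None, "Expected a JSON object"
    missing = [key for key in required if key not in data]
    if missing:
        return None, "Missing keys: " + ", ".join(missing)
    return data, None

print(parse_required(b"not json"))
print(parse_required(b'{"email": "a@b.c"}'))
```

Reporting all missing keys at once, rather than only the first, saves the client a round trip per key.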

Alasdair/Keselme, it looks like your view is correct.
Try adding ipdb to your code to debug it; then you can print request.body and see what is coming in with the request.

Check that a key from json output exists

I keep getting the following error when trying to parse some json:
Traceback (most recent call last):
  File "/Users/batch/projects/kl-api/api/helpers.py", line 37, in collect_youtube_data
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
KeyError: 'brandingSettings'
How do I make sure that I check my JSON output for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value. Code below:
try:
    channel_id = channel_id_response_data['items'][0]['id']
    channel_info_url = YOUTUBE_URL + '/channels/?key=' + YOUTUBE_API_KEY + '&id=' + channel_id + '&part=snippet,contentDetails,statistics,brandingSettings'
    print('Querying:', channel_info_url)
    channel_info_response = requests.get(channel_info_url)
    channel_info_response_data = json.loads(channel_info_response.content)
    no_of_videos = int(channel_info_response_data['items'][0]['statistics']['videoCount'])
    no_of_subscribers = int(channel_info_response_data['items'][0]['statistics']['subscriberCount'])
    no_of_views = int(channel_info_response_data['items'][0]['statistics']['viewCount'])
    avg_views = round(no_of_views / no_of_videos, 0)
    photo = channel_info_response_data['items'][0]['snippet']['thumbnails']['high']['url']
    description = channel_info_response_data['items'][0]['snippet']['description']
    start_date = channel_info_response_data['items'][0]['snippet']['publishedAt']
    title = channel_info_response_data['items'][0]['snippet']['title']
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except Exception as e:
    raise Exception(e)

You can either wrap each assignment in something like
try:
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except KeyError as ignore:
    keywords = "default value"
or check for the key with the in operator before reading it (.has_key(...) exists only in Python 2). IMHO, in your case the first solution is preferable.

Suppose you have a dict. You have two options for handling a missing key:
1) Get the key with a default value, like
d = {}
val = d.get('k', 10)
val will be 10, since there is no key named k.
2) try-except
d = {}
try:
    val = d['k']
except KeyError:
    val = 10
This way is far more flexible, since you can do anything in the except block, even ignore the error with a pass statement if you really don't care about it.

"How do I make sure that I check my JSON output"
At this point your "JSON output" is just a plain native Python dict.
"for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value"
Now that you know you have a dict, browsing the official documentation for dict methods should answer the question:
https://docs.python.org/3/library/stdtypes.html#dict.get
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
so the general case is:
var = data.get(key, default)
Now if you have deeply nested dicts/lists where any key or index could be missing, catching KeyErrors and IndexErrors can be simpler:
try:
    var = data[key1][index1][key2][index2][keyN]
except (KeyError, IndexError):
    var = default
As a side note: your code snippet is filled with repeated channel_info_response_data['items'][0]['statistics'] and channel_info_response_data['items'][0]['snippet'] expressions. Using intermediate variables will make your code more readable, easier to maintain, AND a bit faster too:
# always set a timeout if you don't want the program to hang forever
channel_info_response = requests.get(channel_info_url, timeout=30)
# always check the response status - having a response doesn't
# mean you got what you expected. Here we use the `raise_for_status()`
# shortcut, which raises an exception for 4xx and 5xx status codes.
channel_info_response.raise_for_status()
# requests knows how to deal with json:
channel_info_response_data = channel_info_response.json()
# we assume that the response MUST have `['items'][0]`,
# and that this item MUST have "statistics" and "snippet"
item = channel_info_response_data['items'][0]
stats = item["statistics"]
snippet = item["snippet"]
no_of_videos = int(stats.get('videoCount', 0))
no_of_subscribers = int(stats.get('subscriberCount', 0))
no_of_views = int(stats.get('viewCount', 0))
# guard against dividing by zero when videoCount is missing or 0
avg_views = round(no_of_views / no_of_videos, 0) if no_of_videos else 0
try:
    photo = snippet['thumbnails']['high']['url']
except KeyError:
    photo = None
description = snippet.get('description', "")
start_date = snippet.get('publishedAt', None)
title = snippet.get('title', "")
try:
    keywords = item['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = ""
You may also want to learn about string formatting (concatenating strings is quite error-prone and barely readable) and about how to pass arguments to requests.get().
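The try/except pattern for nested lookups can also be wrapped in a small helper so that each field becomes one line. A minimal sketch; deep_get is a hypothetical name and the sample data is made up to resemble the response in the question:

```python
def deep_get(data, path, default=None):
    """Follow a sequence of keys/indexes, returning default if any step fails."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # TypeError covers indexing into None or another non-container.
            return default
    return current

sample = {"items": [{"snippet": {"title": "My channel"}}]}

title = deep_get(sample, ["items", 0, "snippet", "title"], "")
keywords = deep_get(sample, ["items", 0, "brandingSettings", "channel", "keywords"], "")
print(repr(title), repr(keywords))
```

The trade-off is that the helper hides which step was missing; when that matters, an explicit try/except around the specific lookup is clearer.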

Getting wrong result from JSON - Python 3

I'm working on a small project that retrieves information about books from the Google Books API using Python 3. For this I make a call to the API, read out the variables, and store them in a list. For a search like "linkedin" this works perfectly. However, when I enter "Google", it reads the second title from the JSON input. How can this happen?
Please find my code below (Google_Results is the class I use to initialize the variables):
import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)
    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])
            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                                  language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return
gr = []
Book_Search("Linkedin")
I am a beginner to Python, so any help would be appreciated!

It does so because there is no publisher entry in volumeInfo of the first entry, thus it raises a KeyError and your except captures it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default entry if an entry is missing.
Also, there are a few conceptual problems with your function: it relies on a global gr, which is bad design; it shadows the built-in dict type; and its bare except captures all exceptions, including the KeyboardInterrupt raised by Ctrl+C (SIGINT) while the loop body is running. I'd suggest converting it to something a bit more sane:
def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested in is in volumeInfo
            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list) and isbns:
                for i, t in enumerate(("isbn10", "isbn13")):
                    if len(isbns) > i and isinstance(isbns[i], dict):
                        result_dict[t] = isbns[i].get("identifier", None)
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # join the names; the original [2:-2] slice only stripped the
                # brackets and quotes from the list's string form
                result_dict["author"] = ", ".join(map(str, authors))
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                result_dict["genre"] = ", ".join(map(str, genres))
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results
Then you can easily retrieve as much info as possible for a term:
gr = book_search("Google")
And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
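One way to make the entries optional is a dataclass whose fields all default to None. This is only a sketch: the original Google_Results class is not shown, so the field names below are assumed from the keys used in book_search.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Google_Results:
    # Every field is optional so that partial results can still be constructed.
    isbn13: Optional[str] = None
    isbn10: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None
    publisher: Optional[str] = None
    published_date: Optional[str] = None
    description: Optional[str] = None
    pages: Optional[int] = None
    genre: Optional[str] = None
    language: Optional[str] = None
    image_link: Optional[str] = None

# Only some keys are present, as with an incomplete API entry.
partial = Google_Results(**{"title": "Some book", "language": "en"})
print(partial.title, partial.publisher)
```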

Following @Coldspeed's recommendation, it became clear that missing information in the JSON response caused the exception. Since I only had a pass statement in the except block, the entire result was skipped. I will have to adapt the try/except statements so that errors are handled properly.
Thanks for the help, guys!
