I try to get data from a webpage. This page contains several release information, but allow values not to be set. I.e. the date for testing from/to might be an empty string.
Now I try to deserialize all my data sucked from the page to insert it to a database and face problems handling empty dates.
from marshmallow import fields, Schema, ValidationError
class TestSchema(Schema):
training_necessary = fields.Function(deserialize=lambda x: True if x == 'Yes' else False)
test_from = fields.Date()
test_to = fields.Date()
data = dict(training_necessary='Yes', test_from='', test_to='')
try:
validated = TestSchema().load(data)
except ValidationError as err:
print(f"{err}")
Result:
{'test_to': ['Not a valid date.'], 'test_from': ['Not a valid date.']}
I already tried several combinations of allow_none=True or default='' but none of them helped my to get through. So, how to manage to allow empty dates? Setting a default to somewhat like 1970-01-01 won't help in that case.
Any hints?
Regards, Thomas
+++ EDIT: SOLUTION +++
Here's the working code I ended up after Jérômes helpful tipp:
from marshmallow import fields, Schema, ValidationError, pre_load
class TestSchema(Schema):
training_necessary = fields.Function(deserialize=lambda x: True if x == 'Yes' else False)
test_from = fields.Date(allow_none=True)
test_to = fields.Date(allow_none=True)
#pre_load(pass_many=False)
def string_to_none(self, data, many, **kwargs):
turn_to_none = lambda x: None if x == '' else x
for k, v in data.items():
data[k] = turn_to_none(v)
return data
data = dict(training_necessary='Yes', test_from='', test_to='')
try:
validated = TestSchema().load(data)
except ValidationError as err:
print(f"{err}")
I would pass no value at all.
data = dict(training_necessary='Yes')
Or I'd make the date fields allow_none and I'd pass None, not an empty string.
data = dict(training_necessary='Yes', test_from=None, test_to=None)
If the issue is that your input contains empty strings, I'd say this is a client issue, but you can add a pre_load method to delete empty strings from the input before deserializing. This is more or less equivalent to modifying the values you scrape from the page before feeding them to marshmallow.
Related
I'm fairly new to Python so bear with me please.
I have a function that takes two parameters, an api response and an output object, i need to assign some values from the api response to the output object:
def map_data(output, response):
try:
output['car']['name'] = response['name']
output['car']['color'] = response['color']
output['car']['date'] = response['date']
#other mapping
.
.
.
.
#other mapping
except KeyError as e:
logging.error("Key Missing in api Response: %s", str(e))
pass
return output
Now sometimes, the api response is missing some keys i'm using to generate my output object, so i used the KeyError exception to handle this case.
Now my question is, in a case where the 'color' key is missing from the api response, how can i catch the exception and continue to the line after it output['car']['date'] = response['date'] and the rest of the instructions.
i tried the pass instruction but it didn't have any affect.
Ps: i know i can check the existence of the key using:
if response.get('color') is not None:
output['car']['color'] = response['color']
and then assign the values but seeing that i have about 30 values i need to map, is there any other way i can implement ? Thank you
A few immediate ideas
(FYI - I'm not going to explain everything in detail - you can check out the python docs for more info, examples etc - that will help you learn more, rather than trying to explain everything here)
Google 'python handling dict missing keys' for a million methods/ideas/approaches - it's a common use case!
Convert your response dict to a defaultdict. In that case you can have a default value returned (eg None, '', 'N/A' ...whatever you like) if there is no actual value returned.
In this case you could do away with the try and every line would be executed.
from collections import defaultdict
resp=defaultdict(lambda: 'NA', response)
output['car']['date'] = response['date'] # will have value 'NA' if 'date' isnt in response
Use the in syntax, perhaps in combination with a ternary else
output['car']['color'] = response['color'] if 'color' in response
output['car']['date'] = response['date'] if 'date' in response else 'NA'
Again you can do away with the try block and every line will execute.
Use the dictionary get function, which allows you to specify a default if there is no value for that key:
output['car']['color'] = response.get('car', 'no car specified')
You can create a utility function that gets the value from the response and if the value is not found, it returns an empty string. See example below:
def get_value_from_response_or_null(response, key):
try:
value = response[key]
return value
except KeyError as e:
logging.error("Key Missing in api Response: %s", str(e))
return ""
def map_data(output, response):
output['car']['name'] = get_value_from_response_or_null(response, 'name')
output['car']['color'] = get_value_from_response_or_null(response, 'color')
output['car']['date'] = get_value_from_response_or_null(response, 'date')
# other mapping
# other mapping
return output
I would like to change the way the relationships are displayed in the Flask-Admin Index view of a Model. I do have two models connected through a many-to-many relationship which get displayed in the admin index view as well. Unfortunately, the relationships are just separated using a comma, which thus the user might lose the overview quickly. Ideally, I would like to convert the relationships entries into a simple list (e.g. like with li in HTML).
Is there an easy way of achieving this?
Thanks a lot!
Instead of the comma, you can use a <br> tag as the separator.
see column_type_formatters.
Define default formatter and update a list type.
def lineby_list_formatter(view, values):
html = u'<br/> '.join(str(v) for v in values)
return Markup(html)
MY_DEFAULT_FORMATTERS = dict(typefmt.BASE_FORMATTERS)
MY_DEFAULT_FORMATTERS.update({
list: lineby_list_formatter
})
class EventView(ModelView):
...
column_type_formatters = MY_DEFAULT_FORMATTERS
Ok... I figured it out myself: You can manipulate the way the data is rendered with overwriting the function _get_list_value(). see code below
def _get_list_value(self, context, model, name, column_formatters,
column_type_formatters):
"""
Returns the value to be displayed.
:param context:
:py:class:`jinja2.runtime.Context` if available
:param model:
Model instance
:param name:
Field name
:param column_formatters:
column_formatters to be used.
:param column_type_formatters:
column_type_formatters to be used.
"""
column_fmt = column_formatters.get(name)
if column_fmt is not None:
value = column_fmt(self, context, model, name)
else:
value = self._get_field_value(model, name)
choices_map = self._column_choices_map.get(name, {})
if choices_map:
return choices_map.get(value) or value
type_fmt = None
for typeobj, formatter in column_type_formatters.items():
if isinstance(value, typeobj):
type_fmt = formatter
break
if type_fmt is not None:
value = type_fmt(self, value)
### overwritten here
if name == 'items':
html_string = '<ul>'
for item in value.split(','):
html_string += '<li> {} </li>'.format(item)
html_string += '</ul>'
value = Markup(html_string)
return value
I am trying to check if I have an entry in my database using this code:
def device_update(request):
json_data = json.loads(request.body)
email = json_data['email']
imei = json_data['imei']
sdk_version = json_data['sdk_version']
date = json_data['updateDate']
rule = json_data['ruleName']
group_name = json_data['group']
if Group.objects.filter(group=group_name).exists():
print("group does exists")
else:
print("group doesn't exists")
return HttpResponse("Successful")
However, when the code reaches the if statement to check if the group exists, it returns error 500.
I tried to check with two groups one that exists and another one that doesn't, in both cases I got error 500.
How can I fix this and why is this happening?
The logic for checking if a Group exists, i.e. the line:
if Group.objects.filter(group=group_name).exists()
is not throwing the error here. It is likely that json_data is missing one of the keys you expect it to have, for example, 'group'.
I'd recommend using the get method that dictionaries have. This provides default values when the specified key is not present in the dictionary. You should also have error handling for when the request body is not in valid JSON format.
Here's an example:
def device_update(request):
try:
json_data = json.loads(request.body)
except json.JSONDecodeError:
return HttpResponse('Request body must be in valid JSON format')
email = json_data.get('email', '')
imei = json_data.get('imei', '')
sdk_version = json_data.get('sdk_version', '')
date = json_data.get('updateDate', '')
rule = json_data.get('ruleName', '')
group_name = json_data.get('group', '')
if Group.objects.filter(group=group_name).exists():
print("group does exists")
else:
print("group doesn't exists")
return HttpResponse("Successful")
I set the defaults to the empty string '', but you may want to change that.
Your view doesn't have any error handling. Looking at it quickly, at least two things could go wrong. The request body might not be valid json, and if it is valid json, it might not contain the required keys.
def device_update(request):
try:
json_data = json.loads(request.body)
except ValueError:
return HttpResponse("Invalid json")
try:
email = json_data['email']
imei = json_data['imei']
sdk_version = json_data['sdk_version']
date = json_data['updateDate']
rule = json_data['ruleName']
group_name = json_data['group']
except KeyError as e:
return HttpResponse("Missing Key %s" % e[0])
...
Writing your own validation for a single view like this is ok. As it gets more complicated, you might want to look at django rest framework. It has serializers which will help you manage validation.
Alasdair/Keselme, looks that your view is correct.
Try to put the ipdb into your code in order to debug your code, and than you can print the request.data and see what is comming in the request.
Im working on a small project of retrieving information about books from the Google Books API using Python 3. For this i make a call to the API, read out the variables and store those in a list. For a search like "linkedin" this works perfectly. However when i enter "Google", it reads the second title from the JSON input. How can this happen?
Please find my code below (Google_Results is the class I use to initialize the variables):
import requests
def Book_Search(search_term):
parms = {"q": search_term, "maxResults": 3}
r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
print(r.url)
results = r.json()
i = 0
for result in results["items"]:
try:
isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
title = str(result["volumeInfo"]["title"])
author = str(result["volumeInfo"]["authors"])[2:-2]
publisher = str(result["volumeInfo"]["publisher"])
published_date = str(result["volumeInfo"]["publishedDate"])
description = str(result["volumeInfo"]["description"])
pages = str(result["volumeInfo"]["pageCount"])
genre = str(result["volumeInfo"]["categories"])[2:-2]
language = str(result["volumeInfo"]["language"])
image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])
dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
language, image_link)
gr.append(dict)
print(gr[i].title)
i += 1
except:
pass
return
gr = []
Book_Search("Linkedin")
I am a beginner to Python, so any help would be appreciated!
It does so because there is no publisher entry in volumeInfo of the first entry, thus it raises a KeyError and your except captures it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default entry if an entry is missing.
Also, there are a few conceptual problems with your function - it relies on a global gr which is bad design, it shadows the built-in dict type and it captures all exceptions guaranteeing that you cannot exit your code even with a SIGINT... I'd suggest you to convert it to something a bit more sane:
def book_search(search_term, max_results=3):
results = [] # a list to store the results
parms = {"q": search_term, "maxResults": max_results}
r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
try: # just in case the server doesn't return valid JSON
for result in r.json().get("items", []):
if "volumeInfo" not in result: # invalid entry - missing volumeInfo
continue
result_dict = {} # a dictionary to store our discovered fields
result = result["volumeInfo"] # all the data we're interested is in volumeInfo
isbns = result.get("industryIdentifiers", None) # capture ISBNs
if isinstance(isbns, list) and isbns:
for i, t in enumerate(("isbn10", "isbn13")):
if len(isbns) > i and isinstance(isbns[i], dict):
result_dict[t] = isbns[i].get("identifier", None)
result_dict["title"] = result.get("title", None)
authors = result.get("authors", None) # capture authors
if isinstance(authors, list) and len(authors) > 2: # you're slicing from 2
result_dict["author"] = str(authors[2:-2])
result_dict["publisher"] = result.get("publisher", None)
result_dict["published_date"] = result.get("publishedDate", None)
result_dict["description"] = result.get("description", None)
result_dict["pages"] = result.get("pageCount", None)
genres = result.get("authors", None) # capture genres
if isinstance(genres, list) and len(genres) > 2: # since you're slicing from 2
result_dict["genre"] = str(genres[2:-2])
result_dict["language"] = result.get("language", None)
result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
# make sure Google_Results accepts keyword arguments like title, author...
# and make them optional as they might not be in the returned result
gr = Google_Results(**result_dict)
results.append(gr) # add it to the results list
except ValueError:
return None # invalid response returned, you may raise an error instead
return results # return the results
Then you can easily retrieve as much info as possible for a term:
gr = book_search("Google")
And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
Following #Coldspeed's recommendation it became clear that missing information in the JSON file caused the exception to run. Since I only had a "pass" statement there it skipped the entire result. Therefore I will have to adapt the "Try and Except" statements so errors do get handled properly.
Thanks for the help guys!
I'm writing a python scraper code for OpenData and I have one question about : how to check if all values aren't filled in site and if it is null change value to null.
My scraper is here.
Currently I'm working on it to optimalize.
My variables now look like:
evcisloval = soup.find_all('td')[3].text.strip()
prinalezival = soup.find_all('td')[5].text.strip()
popisfaplnenia = soup.find_all('td')[7].text.replace('\"', '')
hodnotafaplnenia = soup.find_all('td')[9].text[:-1].replace(",", ".").replace(" ", "")
datumdfa = soup.find_all('td')[11].text
datumzfa = soup.find_all('td')[13].text
formazaplatenia = soup.find_all('td')[15].text
obchmenonazov = soup.find_all('td')[17].text
sidlofirmy = soup.find_all('td')[19].text
pravnaforma = soup.find_all('td')[21].text
sudregistracie = soup.find_all('td')[23].text
ico = soup.find_all('td')[25].text
dic = soup.find_all('td')[27].text
cislouctu = soup.find_all('td')[29].text
And Output :
scraperwiki.sqlite.save(unique_keys=["invoice_id"],
data={ "invoice_id":number,
"invoice_price":hodnotafaplnenia,
"evidence_no":evcisloval,
"paired_with":prinalezival,
"invoice_desc":popisfaplnenia,
"date_received":datumdfa,
"date_payment":datumzfa,
"pay_form":formazaplatenia,
"trade_name":obchmenonazov,
"trade_form":pravnaforma,
"company_location":sidlofirmy,
"court":sudregistracie,
"ico":ico,
"dic":dic,
"accout_no":cislouctu,
"invoice_attachment":urlfa,
"invoice_url":url})
I googled it but without success.
First, write a configuration dict of your variables in the form:
conf = {'evidence_no': (3, str.strip),
'trade_form': (21, None),
...}
i.e. key is the output key, value is a tuple of id from soup.find_all('td') and of an optional function that has to be applied to the result, None otherwise. You don't need those Slavic variable names that may confuse other SO members.
Then iterate over conf and fill the data dict.
Also, run soup.find_all('td') before the loop.
tds = soup.find_all('td')
data = {}
for name, (num, func) in conf.iteritems():
text = tds[num].text
# replace text with None or "NULL" or whatever if needed
...
if func is None:
data[name] = text
else:
data[name] = func(text)
This will remove a lot of duplicated code. Easier to maintain.
Also, I am not sure the strings "NULL" are the best way to write missing data. Doesn't sqlite support Python's real None objects?
Just read your attached link, and it seems what you want is
evcisloval = soup.find_all('td')[3].text.strip() or "NULL"
But be careful. You should only do this with strings. If the part before or is either empty or False or None, or 0, they will all be replaced with "NULL"