I have a dictionary of values. This is for a company name.
It has 3 keys:
{'html_attributions': [],
'result' : {'Address': '123 Street', 'website' :'123street.com'
'status': 'Ok' }
I have a dataframe of many dictionaries. I want to loop through each row's dictionary and get the necessary information I want.
Currently I am writing for loops to retrieve these information. Is there a more efficient way to retrieve these information?
addresses = []
for i in range(len(testing)):
try:
addresses.append(testing['Results_dict'][i]['result']['Address'])
except:
addresses.append('No info')
What I have works perfectly fine. However I would like something that would be more efficient. Perhaps using the get() method? but I don't know how I can call to get the inside of 'result'.
Try this:
def get_address(r):
try:
return r['result']['Address']
except Exception:
return 'No info'
addresses = df['Results_dict'].map(get_address)
This guards against cases where Result_dict is None, not a dict, or any key along the path way does not exist.
This is a way faster solution if the data is big:
addresses = list(map(lambda x: x.get('result').get('Address', 'No info'), testing['Results_dict']))
Here is how I deal with nested dict keys:
Example:
def keys_exists(element, *keys):
if not isinstance(element, dict):
raise AttributeError('keys_exists() expects dict as first argument.')
if len(keys) == 0:
raise AttributeError('keys_exists() expects at least two arguments,
one given.')
_element = element
for key in keys:
try:
_element = _element[key]
except KeyError:
return False
return True
For data :
{'html_attributions': [],
'result' : {'Address': '123 Street', 'website' :'123street.com'
'status': 'Ok' }
if you want to check result exists or not use above function like this
`print 'result (exists/Not): {}'.format(keys_exists(data,"result"))`
To check address exist inside result Try this
`print 'result > Address (exists/not): {}'.format(keys_exists(data, "result", "Address"))`
It will return output in True/False
Related
I'm fairly new to Python so bear with me please.
I have a function that takes two parameters, an api response and an output object, i need to assign some values from the api response to the output object:
def map_data(output, response):
try:
output['car']['name'] = response['name']
output['car']['color'] = response['color']
output['car']['date'] = response['date']
#other mapping
.
.
.
.
#other mapping
except KeyError as e:
logging.error("Key Missing in api Response: %s", str(e))
pass
return output
Now sometimes, the api response is missing some keys i'm using to generate my output object, so i used the KeyError exception to handle this case.
Now my question is, in a case where the 'color' key is missing from the api response, how can i catch the exception and continue to the line after it output['car']['date'] = response['date'] and the rest of the instructions.
i tried the pass instruction but it didn't have any affect.
Ps: i know i can check the existence of the key using:
if response.get('color') is not None:
output['car']['color'] = response['color']
and then assign the values but seeing that i have about 30 values i need to map, is there any other way i can implement ? Thank you
A few immediate ideas
(FYI - I'm not going to explain everything in detail - you can check out the python docs for more info, examples etc - that will help you learn more, rather than trying to explain everything here)
Google 'python handling dict missing keys' for a million methods/ideas/approaches - it's a common use case!
Convert your response dict to a defaultdict. In that case you can have a default value returned (eg None, '', 'N/A' ...whatever you like) if there is no actual value returned.
In this case you could do away with the try and every line would be executed.
from collections import defaultdict
resp=defaultdict(lambda: 'NA', response)
output['car']['date'] = response['date'] # will have value 'NA' if 'date' isnt in response
Use the in syntax, perhaps in combination with a ternary else
output['car']['color'] = response['color'] if 'color' in response
output['car']['date'] = response['date'] if 'date' in response else 'NA'
Again you can do away with the try block and every line will execute.
Use the dictionary get function, which allows you to specify a default if there is no value for that key:
output['car']['color'] = response.get('car', 'no car specified')
You can create a utility function that gets the value from the response and if the value is not found, it returns an empty string. See example below:
def get_value_from_response_or_null(response, key):
try:
value = response[key]
return value
except KeyError as e:
logging.error("Key Missing in api Response: %s", str(e))
return ""
def map_data(output, response):
output['car']['name'] = get_value_from_response_or_null(response, 'name')
output['car']['color'] = get_value_from_response_or_null(response, 'color')
output['car']['date'] = get_value_from_response_or_null(response, 'date')
# other mapping
# other mapping
return output
I have a JSON file where each object looks like the following example:
[
{
"timestamp": 1569177699,
"attachments": [
],
"data": [
{
"post": "\u00f0\u009f\u0096\u00a4\u00f0\u009f\u0092\u0099"
},
{
"update_timestamp": 1569177699
}
],
"title": "firstName LastName"
}
]
I want to check if, there is the key post, nested within the key data. I wrote this, but it doesn't work:
posts = json.loads(open(file).read())
for post in posts:
if 'data' in post:
if 'post' in post['data']
print post['data']['post']
Here is my solution. I see from your sample data that post["data"] is a list, so the program should iterate over it:
posts = json.loads(open(file).read())
for post in posts:
if 'data' in post:
#THIS IS THE NEW LINE to iterate list
for d in post["data"]:
if 'post' in d:
print d['post']
Try:
posts = json.loads(open(file).read())
for data in posts:
for key, value in data.items():
if key == 'data':
for item in value:
if 'post' in item:
print(key, item['post'])
Try this answer this works!
Elegant way to check if a nested key exists in a python dict
def keys_exists(element, *keys):
'''
Check if *keys (nested) exists in `element` (dict).
'''
if not isinstance(element, dict):
raise AttributeError('keys_exists() expects dict as first argument.')
if len(keys) == 0:
raise AttributeError('keys_exists() expects at least two arguments, one given.')
_element = element
for key in keys:
try:
_element = _element[key]
except KeyError:
return False
return True
You could do it generically by adapting my answer to the question How to find a particular json value by key?.
It's generic in the sense that it doesn't care much about the details of how the JSON data is structured, it just checks every dictionary it finds inside it.
import json
def find_values(id, json_file):
results = []
def _decode_dict(a_dict):
try:
results.append(a_dict[id])
except KeyError:
pass
return a_dict
json.load(json_file, object_hook=_decode_dict) # Return value ignored.
return len(results) > 0 # If there are any results, id was found.
with open('find_key_test.json', 'r') as json_file:
print(find_values('post', json_file)) # -> True
please try the following:
posts = json.loads(open(file).read())
for post in posts:
if 'data' in post:
for data in post['data']:
if 'post' in data:
print(data['post'])
I'm looping through a list of web pages with Scrapy. Some of the pages that I scrape are in error. i want to keep track of the various error types so I have set up my function to first check if a series of error conditions ( which I have placed in a dictionary are true and if none are proceed with normal page scraping:
def parse_detail_page(self, response):
error_value = False
output = ""
error_cases = {
"' pageis not found' in response.body" : 'invalid',
"'has been transferred' in response.body" : 'transferred',
}
for key, value in error_cases.iteritems():
if bool(key):
error_value = True
output = value
if error_value:
for field in J1_Item.fields:
if field == 'case':
item[field] = id
else:
item[field] = output
else:
item['case'] = id
........................
However I see that despite even in cases with none of the error cases being valid, the 'invalid' option is being selected. What am I doing wrong?
Your conditions (something in response.body) are not evaluated. Instead, you evaluate the truth value of a nonempty string, which is True.
This might work:
def parse_detail_page(self, response):
error_value = False
output = ""
error_cases = {
"pageis not found" : 'invalid',
"has been transferred" : 'transferred',
}
for key, value in error_cases.iteritems():
if key in response.body:
error_value = True
output = value
break
.................
(Must it be "pageis not found" or "page is not found"?)
bool(key) will convert key from a string to a bool.
What it won't do is actually evaluate the condition. You could use eval() for that, but I'd recommend instead storing a list of functions (each returning an object or throwing an exception) rather than your current dict-with-string-keys-that-are-actually-Python-code.
I'm not sure why you are evaluating bool(key) like you are. Let's look at your error_cases. You have two keys, and two values. "' pageis not found' in response.body" will be your key the first time, and "'has been transferred' in response.body" will be the key in the second round in your for loop. Neither of those will be false when you check bool(key), because key has a value other than False or 0.
>>> a = "' pageis not found' in response.body"
>>> bool(a)
True
You need to have a different evaluator other than bool(key) there or you will always have an error.
Your conditions are strings, so they are not be evaluated.
You could evaluate your strings using eval(key) function, that is quite unsafe.
With the help of the operator module, there is no need to evaluate unsafe strings (as long as your conditions stay quite simple).
error['operator'] holds reference to the 'contains' function, which can be used as a replacement for 'in'.
from operator import contains
class ...:
def parse_detail_page(self, response):
error_value = False
output = ""
error_cases = [
{'search': ' pageis not found', 'operator': contains, 'output': 'invalid' },
{'search': 'has been transferred', 'operator': contains, 'output': 'invalid' },
]
for error in error_cases:
if error['operator'](error['search'], response.body):
error_value = True
output = error['output']
print output
if error_value:
for field in J1_Item.fields:
if field == 'case':
item[field] = id
else:
item[field] = output
else:
item['case'] = id
...
I want to get some data from XML and return it in the same variable, so i do like this before return:
for element in response:
document = parseString(element)
try:
element = {
'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
}
except:
element = False
return response
But the response still contains raw xml data instead of parsed results... Basically I want that element = ... value returned to response value.
Edit: Improving the answer according delnan and jonrsharpe.
The problem with your code is that, when you loop
for element in response
in each iteration element will be pointing to a new created object couse the lines:
element = {
'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
}
and the object it was pointing before still the same. You can try this.
for index,element in enumerate(response):
document = parseString(element)
try:
response[index]= {
'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
}
except:
response[index]= False
return response
Assuming that response is a list of strings and you want to replace those strings with the parsed elements dict..., that's easy. Since lists are mutable containers, you can replace the elements as you go. No need to return the list... the one you pass in is changed.
def convert_response_to_elements(response):
for index, element_str in enumerate(response):
document = parseString(element_str)
try:
element = {
'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
}
except:
element = False
response[index] = element
Since response is a list, I would use pythons list comprehension. This will create new list without modifying old one.
new_response = [modify_element(element) for element in response]
Later if you want to remove elements that equal to False you can use filter function:
without_false = filter(lambda element: bool(element), new_response)
I'm looking for ways to write functions like get_profile(js) but without all the ugly try/excepts.
Each assignment is in a try/except because occasionally the json field doesn't exist. I'd be happy with an elegant solution which defaulted everything to None even though I'm setting some defaults to [] and such, if doing so would make the overall code much nicer.
def get_profile(js):
""" given a json object, return a dict of a subset of the data.
what are some cleaner/terser ways to implement this?
There will be many other get_foo(js), get_bar(js) functions which
need to do the same general type of thing.
"""
d = {}
try:
d['links'] = js['entry']['gd$feedLink']
except:
d['links'] = []
try:
d['statisitcs'] = js['entry']['yt$statistics']
except:
d['statistics'] = {}
try:
d['published'] = js['entry']['published']['$t']
except:
d['published'] = ''
try:
d['updated'] = js['entry']['updated']['$t']
except:
d['updated'] = ''
try:
d['age'] = js['entry']['yt$age']['$t']
except:
d['age'] = 0
try:
d['name'] = js['entry']['author'][0]['name']['$t']
except:
d['name'] = ''
return d
Replace each of your try catch blocks with chained calls to the dictionary get(key [,default]) method. All calls to get before the last call in the chain should have a default value of {} (empty dictionary) so that the later calls can be called on a valid object, Only the last call in the chain should have the default value for the key that you are trying to look up.
See the python documentation for dictionairies http://docs.python.org/library/stdtypes.html#mapping-types-dict
For example:
d['links'] = js.get('entry', {}).get('gd$feedLink', [])
d['published'] = js.get('entry', {}).get('published',{}).get('$t', '')
Use get(key[, default]) method of dictionaries
Code generate this boilerplate code and save yourself even more trouble.
Try something like...
import time
def get_profile(js):
def cas(prev, el):
if hasattr(prev, "get") and prev:
return prev.get(el, prev)
return prev
def getget(default, *elements):
return reduce(cas, elements[1:], js.get(elements[0], default))
d = {}
d['links'] = getget([], 'entry', 'gd$feedLink')
d['statistics'] = getget({}, 'entry', 'yt$statistics')
d['published'] = getget('', 'entry', 'published', '$t')
d['updated'] = getget('', 'entry', 'updated', '$t')
d['age'] = getget(0, 'entry', 'yt$age', '$t')
d['name'] = getget('', 'entry', 'author', 0, 'name' '$t')
return d
print get_profile({
'entry':{
'gd$feedLink':range(4),
'yt$statistics':{'foo':1, 'bar':2},
'published':{
"$t":time.strftime("%x %X"),
},
'updated':{
"$t":time.strftime("%x %X"),
},
'yt$age':{
"$t":"infinity years",
},
'author':{0:{'name':{'$t':"I am a cow"}}},
}
})
It's kind of a leap of faith for me to assume that you've got a dictionary with a key of 0 instead of a list but... You get the idea.
You need to familiarise yourself with dictionary methods Check here for how to handle what you're asking.
Two possible solutions come to mind, without knowing more about how your data is structured:
if k in js['entry']:
something = js['entry'][k]
(though this solution wouldn't really get rid of your redundancy problem, it is more concise than a ton of try/excepts)
or
js['entry'].get(k, []) # or (k, None) depending on what you want to do
A much shorter version is just something like...
for k,v in js['entry']:
d[k] = v
But again, more would have to be said about your data.