Parse JSON data with varying parent keys using Python

Here is my working example
jsonData = {
    "3": {
        "map_id": "1",
        "marker_id": "3",
        "title": "Your title here",
        "address": "456 Example Ave",
        "desc": "Description",
        "pic": "",
        "icon": "",
        "linkd": "",
        "lat": "3.14",
        "lng": "-22.98",
        "anim": "0",
        "retina": "0",
        "category": "1",
        "infoopen": "0",
        "other_data": ["0"]
    },
    "4": {
        "map_id": "1",
        "marker_id": "4",
        "title": "Title of Place",
        "address": "123 Main St, City, State",
        "desc": "insert description",
        "pic": "",
        "icon": "",
        "linkd": "",
        "lat": "1.23",
        "lng": "-4.56",
        "anim": "0",
        "retina": "0",
        "category": "0",
        "infoopen": "0",
        "other_data": ["0"]
    }
}
I am having such a hard time getting the title and address keys. Here is what I have tried:
for each in testJson:
    print(each["title"])
and I get the following error: TypeError: string indices must be integers. I don't understand why this isn't working.
I have tried so many variations to get the key data, but I just can't get it to work. I can't really change the raw JSON either because my real JSON data is a huge file. I have looked on stackoverflow for a similarly formatted JSON example (e.g., here) but have come up short. I assume there is something wrong with the way my JSON is formatted, because I have parsed JSON before with the above code without any problems.

You're getting that error because you're looping over the keys, which don't have title and address properties. Those properties live in the inner dictionaries, which are the values of the outer dictionary.
Here is how you can iterate over the dict.values() instead:
for value in jsonData.values():
    print(value["title"], value["address"])
Which will give you the title and address:
Your title here 456 Example Ave
Title of Place 123 Main St, City, State
If you want to find out which key you're iterating over, you can loop over the (key, value) tuples from dict.items():
for key, value in jsonData.items():
    print(f"key = {key}, title = {value['title']}, address = {value['address']}")
Which will show the key with the address and title:
key = 3, title = Your title here, address = 456 Example Ave
key = 4, title = Title of Place, address = 123 Main St, City, State

for… in loops iterate over the keys of dictionaries, not their values. If you want to iterate over their values, you can either use .values():
for value in someDict.values():
…or you can iterate over items(), which will give you the key and the value as a tuple:
for key, value in someDict.items():
The reason you are getting that error is that when you try to get title out of each, each is actually the key, i.e. "3" or "4". Python will let you get individual characters out of a string with an integer index, such as someString[0] for the first character, but it doesn't make sense to index a string with someString['title'].
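To make this concrete, here is a tiny illustrative snippet (the key "3" is taken from the example data above; the last line deliberately reproduces the error):
each = "3"               # this is what `each` actually is: a key, i.e. a string
print(each[0])           # "3" -- indexing a string with an integer works
print(each["title"])     # TypeError: string indices must be integers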


Export Json to CSV with missing key in a dict via get()

I am very new to Python and need to use it to get the work below done.
I am using Python to pull values out of key/value pairs from a JSON response I got from an API call; however, some of them have a value while others might not. An example of the JSON response is below:
"attributes": [
{
"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS",
"alias": "Annual Leave"
},
{
"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS",
"alias": "Other Non-Prod Hours"
},
{
"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS",
"alias": "Non-Prod Hours"
},
{
"key": "EMP_COMMON_PRIMARY_JOB",
"alias": "Primary Job",
"rawValue": "RN",
"value": "RN"
},
{
"key": "TIMECARD_TRANS_APPLY_DATE",
"alias": "Apply Date",
"rawValue": "2022-05-19",
"value": "19/05/2022"
},
The above is one of the children under a nested structure. As you can see, there is no value for "Annual Leave" here, but other children might have a valid value for it.
I am exporting this info into a CSV, with "alias" as the column name and "value" as the row value.
So I am using the Python code below to extract the value for each key and put it into the CSV under the specified column:
AL=item['attributes'][0]['value']
Date=item['attributes'][4]['value']
spamwriter.writerow([AL,'2','3',date,'5'])
However, it raised an error:
File "jsoncsv.py", line 47, in <module>
al=item['attributes'][0]['value']
KeyError: 'value'
I think I understand the error: there is no value for this particular "Annual Leave" key.
But how do I say: if there is no value for this key, then set value = 0 and put 0 in the CSV under the "Annual Leave" column, then move on to the next attribute ("Other Non-Prod Hours", which also has no value in this case but might have one for other children)?
I found get(), but I'm not sure how to code it. I was trying the code below:
value=it.get('value')
if len(value)>0:
    AL=item['attributes'][0]value
    Date=item['attributes'][4]value
    spamwriter.writerow([AL,'2','3',date,'5'])
But the result is a syntax error.
Could any Python expert please help?
Much appreciated.
WB
As user @richarddodson pointed out, you can do this:
al = item['attributes'][0].get('value', 0)
Instead of 0, you may want to consider using None, which is a clear indication that there is no value, avoiding confusion with the case where value actually is 0, i.e.:
al = item['attributes'][0].get('value', None)
Which is the same as:
al = item['attributes'][0].get('value')
As .get() returns None if there is no value to get.
A more explicit way to do the same would be:
al = item['attributes'][0]['value'] if 'value' in item['attributes'][0] else None
But the solution using .get() is simpler and faster and thus probably preferable.
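Putting it together, here is a minimal sketch of the CSV-writing loop; it assumes `items` is the list of parsed records from the API response, keeps the question's spamwriter name and placeholder columns, and writes to a hypothetical output.csv:
import csv

with open('output.csv', 'w', newline='') as f:
    spamwriter = csv.writer(f)
    for item in items:  # `items` is assumed to hold the parsed API records
        al = item['attributes'][0].get('value', 0)          # 0 when 'value' is missing
        apply_date = item['attributes'][4].get('value', 0)  # same fallback for the date
        spamwriter.writerow([al, '2', '3', apply_date, '5'])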

Python: Iterate JSON and remove items with specific criteria

I am trying to filter out data from an API JSON response with Python and I am getting odd results. I would be glad if somebody could guide me on how to deal with the situation.
The main idea is to remove the irrelevant data from the JSON and keep only the data associated with particular people, whose names I hold in a list.
Here is a snip of the JSON file:
{
    "result": [
        {
            "number": "Number1",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-30 11:51:24",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "John Doe",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number2",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-10 11:07:13",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Tyrell Greenley",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number3",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-20 10:23:35",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Delmar Vachon",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number4",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-30 11:51:24",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Samual Isham",
                "link": "https://some_link.com"
            }
        }
    ]
}
Here is the Python code:
import json

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)
# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data
cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is more or less related to the .remove() method, but I can't find any other suitable way to delete the particular items I don't need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about the people you are removing, you could simply try:
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while at the same time iterating through it. Check out this blog post, which describes this behavior.
A safer way to do this is to iterate over a copy of the data and delete from the original:
import copy

def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in copy.deepcopy(data):  # deepcopy handles nested dicts
        # Still call data.remove() in here
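One way the snippet above might be completed, as a minimal sketch that keeps the question's function name and variables:
import copy

def clear_data(data, users):
    """Keep only the records whose assignee is in `users`."""
    for elem in copy.deepcopy(data):   # iterate over a copy...
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem)          # ...so removing from the original list is safe
    return data

cd = clear_data(input_data['result'], users_test)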

How to search for values in Redis with redis for python

I'm currently trying to search for entries inside my Redis that match a specific value through an HTTP API the same way you'd do with a regular DB (eg: http://localhost:8000/api?name=John&age=20) with the redis python library (https://pypi.org/project/redis/).
The code I have so far gets the whole hash, parses each entry's JSON, and adds it to a list:
import json
import redis
import os
r = redis.StrictRedis(host=os.environ['redis_url'], port=os.environ['redis_port'], password=os.environ['redis_pass'], ssl=True)
result = r.hgetall('Directory')
dic_list = []
for key in result.keys():
    dic_list.append(json.loads(result[key].decode('utf-8')))
return dic_list
I know that I can get the value of a specific key with
r.hget('Directory', 'key_I_want')
However, inside each key there is a whole JSON document full of information. For example, these are key/value pairs inside the Directory hash:
"1": {
"name": "James",
"age": "22",
"favorite_color":"Green"
},
"2":{
"name":"John",
"age": "20",
"favorite_color": "red"
},
"3":{
"name":"Jim",
"age": "30",
"favorite_color": "yellow"
}
So I know
r.hget('Directory', '1')
would return
{
    "name": "James",
    "age": "22",
    "favorite_color": "Green"
}
But what I really want is to find every JSON entry that has specific values, not just get the value of each key inside the hash. Is there any way to actually do that?
Based on your question, you are probably looking for a value within result[key]. Assuming that value is equal to val, try:
for key in result.keys():
    entry = json.loads(result[key].decode('utf-8'))
    if val in entry.values():
        dic_list.append(entry)
For example, with
val = "James"
you will get every entry that has "James" among its values.
You can mix it up a little and change it however you want.
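As a minimal sketch, the same idea can be wrapped in a helper that matches on one specific field; the function name search_directory and the field/value arguments are illustrative, not part of the redis library:
import json

def search_directory(r, field, value, hash_name='Directory'):
    """Return every entry in the hash whose JSON payload has entry[field] == value."""
    matches = []
    for raw in r.hgetall(hash_name).values():
        entry = json.loads(raw.decode('utf-8'))
        if entry.get(field) == value:
            matches.append(entry)
    return matches

# e.g. search_directory(r, 'name', 'John') or search_directory(r, 'favorite_color', 'red')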

Counting Items in Python from a JSON file

I'm trying to search a data file, for example Yelp.json, which has businesses in LA, Boston, and DC.
I wrote this:
# Python 2
import json
# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)
# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
But I get an answer of "Portsmouth 1", which means it is not providing all the cities and might not even be providing all the counts. My goal for this section is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses." Each payload is a grouping of info about a single business, and "locality" is just the city. So I want it to find how many unique cities there are and then how many businesses are in each city. For example, one payload could be a Starbucks in LA, another payload could be a Starbucks in DC, and another could be a Chipotle in LA.
Example of the JSON file (JSONlite.com says it's valid):
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
"latitude": "56.945972",
"locality": "Stonehaven",
"_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "The Lodge, Dunottar",
"email": "dunnottarcastle#btconnect.com",
"existence_ml": 0.5694238217658721,
"domain_aggregate": "",
"name": "Dunnottar Castle",
"search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
"admin_region": "Scotland",
"existence": 1,
"category_labels": [
["Landmarks", "Buildings and Structures"]
],
"post_town": "Stonehaven",
"region": "Kincardineshire",
"review_count": "719",
"geocode_level": "within_50m",
"tel": "01569 762173",
"placerank": 65,
"longitude": "-2.197123",
"placerank_ml": 37.27916073464469,
"fax": "01330 860325",
"category_ids_text_search": "",
"website": "http://www.dunnottarcastle.co.uk",
"status": "1",
"geocode_confidence": "20",
"postcode": "AB39 2TL",
"category_ids": [108],
"country": "gb",
"_geocode_quality": "4",
"uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
"latitude": "56.237480",
"locality": "Inveraray",
"_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "Cherry Park",
"email": "enquiries#inveraray-castle.com",
"longitude": "-5.073578",
"domain_aggregate": "",
"name": "Inveraray Castle",
"admin_region": "Scotland",
"search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
"existence": 1,
"category_labels": [
["Social", "Food and Dining", "Restaurants"]
],
"region": "Argyll",
"review_count": "532",
"geocode_level": "within_50m",
"tel": "01499 302203",
"placerank": 67,
"post_town": "Inveraray",
"placerank_ml": 41.19978087352266,
"fax": "01499 302421",
"category_ids_text_search": "",
"website": "http://www.inveraray-castle.com",
"status": "1",
"geocode_confidence": "20",
"postcode": "PA32 8XE",
"category_ids": [347],
"country": "gb",
"_geocode_quality": "4",
"existence_ml": 0.7914881102847783,
"uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},
If your "json" semantics is something like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it is a valid json string, but after you json.loads the string, it will be evaluated as
{"payload":{ CONTENT_LAST }}
And that is why you end up with one city and one business count.
You can verify this behaviour on this online json parser http://json.parser.online.fr/ by checking JS eval field.
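You can also see the same behaviour directly in Python; this is a small illustrative example with made-up localities:
import json

# json.loads keeps only the last value for a duplicated key
print json.loads('{"payload": {"locality": "LA"}, "payload": {"locality": "DC"}}')
# {u'payload': {u'locality': u'DC'}}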
In this case, one way to preprocess your JSON string is to get rid of the repeated "payload" keys and wrap the content dictionaries directly in a list. You will then have a JSON string in the following format:
[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
Assume your JSON string is now a list of payload dictionaries and that you have loaded it with data = json.loads(json_str).
As you iterate through the payloads, build a lookup table along the way.
This handles duplicate cities automatically, since businesses in the same city end up in the same list.
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)
Then later on, you can easily present the solution by
for city, business_list in city_business_map.items():
    print city, len(business_list)
If you want to count only unique businesses in each city, initialize the values to sets instead of lists.
If that is overkill, then instead of initializing to a list or set, just associate a counter with each key.
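For the counter variant, here is a minimal sketch using collections.Counter, assuming data is the preprocessed list of payload dicts described above:
from collections import Counter

city_counts = Counter(payload['locality'] for payload in data)
for city, count in city_counts.items():
    print "%s: %d businesses" % (city, count)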

Why am I unable to print JSON attributes in Python? What am I doing wrong?

I have the following code:
jobs = {"24": {"wage": "empty", "phone": "empty", "title": "sapfh", "description": "sod", "time": "twelve"}, "20": {"wage": "987g", "phone": "iudg", "time": "twelve", "description": "fdgsdfg", "title": "sfgji"}, "21": {"wage": "987g", "phone": "iudg", "title": "sfgji", "description": "fdgsdfg", "time": "twelve"}, "22": {"wage": "987g", "phone": "iudg", "time": "twelve", "description": "fdgsdfg", "title": "sfgji"}, "23": {"wage": "987g", "phone": "iudg", "title": "sfgji", "description": "fdgsdfg", "time": "twelve"}, "24": {"wage": "empty", "phone": "empty", "time": "twelve", "description": "sod", "title": "sapfh"}}
for job in jobs:
    print job["title"]
But it won't print out the title each time. I just get TypeError: string indices must be integers, not str, but if I put 0 instead of "title" it just outputs the first character of each key (so all 2s).
When you iterate over a dictionary, you iterate over its keys. This means that your current code is iterating through the keys of jobs (which are strings).
You should use dict.values instead:
for val in jobs.values():
    print val["title"]
Now, the code is iterating through the values of jobs, which are the dictionaries.
If you want to have the keys and the values, you can use dict.items:
for key, val in jobs.items():
    print key, val["title"]
When you use for/in to iterate over a dictionary, it iterates over the dictionary's keys. As such, the iteration variable (job in this case) will contain each of the dictionary's keys in turn: in this case, it'll contain "24", "20", "21", etc.
You want to iterate over the dictionary values (each of the job dictionaries). You can then retrieve the title property of each. To do so, loop like this instead:
for job in jobs.values():
    print job["title"]
If you want both the keys and values, you can use iteritems as follows:
for job_key, job in jobs.iteritems():
    print "Job key: ", job_key
    print "Job title: ", job["title"]  # or jobs[job_key]["title"]
Note also that jobs is a Python dictionary literal, not JSON (there's actually no JSON involved at all). It also contains two "24" keys; Python allows duplicate keys in a dict literal, but the later value silently overwrites the earlier one, so only a single "24" entry survives.
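A quick illustrative check of that last point:
d = {"24": {"title": "first"}, "24": {"title": "second"}}
print len(d)            # 1 -- only one "24" key survives
print d["24"]["title"]  # second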
