Json organization - python

I use JSON in one of my projects. For example, I have the following JSON structure:
{
    "address": {
        "streetAddress": {
            "aptnumber": "21",
            "building_number": "2nd",
            "street": "Wall Street"
        },
        "city": "New York"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        }
    ]
}
Now I have a bunch of modules that use this structure, and each expects to see certain fields in the JSON it receives. For the example above, I have two files: address_manager and phone_number_manager. Each is passed the relevant part of the data, so address_manager expects a dict that has the keys 'streetAddress' and 'city'.
My question is: is it possible to set up a constant structure so that every time I change the name of a field in my JSON structure (e.g. I want to change 'streetAddress' to 'address'), I don't have to make the change in several places?
My naive approach is to have a bunch of constants (e.g.
ADDRESS = "address"
ADDRESS_STREET_ADDRESS = "streetAddress"
..etc..
) so that if I want to change the name of one of the fields in my JSON structure, I only have to make the change in one place. However, this seems unwieldy, because my constant names would become terribly long once I reach the third or fourth layer of the JSON structure (e.g. ADDRESS_STREETADDRESS_APTNUMBER, ADDRESS_STREETADDRESS_BUILDINGNUMBER).
I am doing this in Python, but any generic answer would be OK.
Thanks.

Like Cameron Sparr suggested in a comment, don't have your constant names include all levels of your JSON structure. If you have the same data in multiple places, it will actually be better if you reuse the same constant. For example, suppose your JSON has a phone number included in the address:
{
"address": {
"streetAddress": {
"aptnumber" : "21",
"building_number" : "2nd",
"street" : "Wall Street"
},
"city":"New York",
"phoneNumber":
[
{
"type":"home",
"number":"212 555-1234"
}
]
},
"phoneNumber":
[
{
"type":"home",
"number":"212 555-1234"
}
]
}
Why not have a single constant PHONES = 'phoneNumber' that you use in both places? Your constants will have shorter names, and it is more logically coherent. You would end up using it like this (assuming JSON is stored in person):
person[ADDRESS][PHONES][x] # Phone numbers associated with that address
person[PHONES][x] # Phone numbers associated with the person
Instead of
person[ADDRESS][ADDRESS_PHONES][x]
person[PHONE_NUMBERS][x]
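As a rough illustration of the idea above, the sketch below keeps one constant per JSON key, independent of nesting depth, and reuses PHONES at both levels. The constant names and the phone_numbers helper are made up for this example; they are not part of the original code.

```python
# Hypothetical constants: one name per JSON key, regardless of where it appears.
ADDRESS = "address"
STREET_ADDRESS = "streetAddress"
APT_NUMBER = "aptnumber"
PHONES = "phoneNumber"
NUMBER = "number"

person = {
    "address": {
        "streetAddress": {"aptnumber": "21", "street": "Wall Street"},
        "city": "New York",
        "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
    },
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}


def phone_numbers(record):
    """Return the numbers stored under PHONES in a person or an address dict."""
    return [phone[NUMBER] for phone in record.get(PHONES, [])]


person_numbers = phone_numbers(person)            # phones of the person
address_numbers = phone_numbers(person[ADDRESS])  # phones of the address
apt = person[ADDRESS][STREET_ADDRESS][APT_NUMBER]
```

Renaming a field then means changing a single string, for example STREET_ADDRESS = "address", without touching the code that uses the constant.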

You can write a script that, when you change a constant, applies the same rename to all of your JSON files.
Example:
import json
# (old key, new key)
CHANGE = ('street', 'streetAddress')
with open('file.json') as jfile:
    json_data = json.load(jfile)
# Move the value to the new key and remove the old one (top-level keys only)
json_data[CHANGE[1]] = json_data.pop(CHANGE[0])
with open('file.json', 'w') as jfile:
    json.dump(json_data, jfile)
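Since the keys in the question are nested, a rename usually has to reach below the top level. Here is a minimal sketch of a recursive variant; rename_key is a hypothetical helper written for this example, and it renames every occurrence of the key, at any depth.

```python
def rename_key(node, old, new):
    """Return a copy of `node` with every dict key `old` renamed to `new`."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): rename_key(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_key(element, old, new) for element in node]
    return node  # strings, numbers, None, ... are returned unchanged


person = {
    "address": {
        "streetAddress": {"aptnumber": "21", "street": "Wall Street"},
        "city": "New York",
    },
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

renamed = rename_key(person, "streetAddress", "street_address")
```

The function builds a new structure instead of modifying the input, so the original data stays available if the rename needs to be checked or undone.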

Export Json to CSV with missing key in a dict via get()

I am very new to Python and need to use it to get the following done.
I am using Python to pull values out of key/value pairs in the JSON response from an API call. Some of the entries have a value and some do not. An example of the JSON response is below:
"attributes": [
{
"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS",
"alias": "Annual Leave"
},
{
"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS",
"alias": "Other Non-Prod Hours"
},
{
"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS",
"alias": "Non-Prod Hours"
},
{
"key": "EMP_COMMON_PRIMARY_JOB",
"alias": "Primary Job",
"rawValue": "RN",
"value": "RN"
},
{
"key": "TIMECARD_TRANS_APPLY_DATE",
"alias": "Apply Date",
"rawValue": "2022-05-19",
"value": "19/05/2022"
},
The above is one child of a larger nested structure. As you can see, this one has no value for "Annual Leave", but other children might have a valid value for it.
I am exporting this information to CSV, with "alias" as the column name and "value" as the row value, as in the CSV sample below:
[image: sample CSV]
So I am using the Python code below to extract the value for each key and write it to the CSV under the specified column.
AL=item['attributes'][0]['value']
Date=item['attributes'][4]['value']
spamwriter.writerow([AL,'2','3',date,'5'])
However, it raised an error:
File "jsoncsv.py", line 47, in <module>
al=item['attributes'][0]['value']
KeyError: 'value'
I think I understand the error: there is no value for this particular "Annual Leave" key.
But how do I say something like: if there is no value for this key, then use 0, put 0 in the CSV under the "Annual Leave" column, and move on to the next one ("Other Non-Prod Hours", which also has no value in this case but might have one for other children)?
I found get(), but I am not sure how to use it. I tried the code below:
value=it.get('value')
if len(value)>0:
AL=item['attributes'][0]value
Date=item['attributes'][4]value
spamwriter.writerow([AL,'2','3',date,'5'])
But the result is a syntax error.
Could any Python expert please help?
Much appreciated.
WB
As user #richarddodson pointed out, you can do this:
al = item['attributes'][0].get('value', 0)
Instead of 0, you may want to consider using None, which is a clear indication that there is no value, avoiding confusion with the case where value actually is 0, i.e.:
al = item['attributes'][0].get('value', None)
Which is the same as:
al = item['attributes'][0].get('value')
This works because .get() returns None when the key is missing.
A more explicit way to do the same would be:
al = None if 'value' not in item['attributes'][0] else item['attributes'][0]['value']
But the solution using .get() is simpler and faster and thus probably preferable.
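To tie this back to the CSV export, here is a small sketch that looks attributes up by their "alias" rather than by list position and falls back to 0 when "value" is missing. The shortened sample data, the column list, and the in-memory buffer (standing in for the real output file) are assumptions made for this example.

```python
import csv
import io

# Shortened sample based on the response in the question.
item = {
    "attributes": [
        {"key": "TK_GENIE_ACTUAL_TOTAL_HOURS_EXCLUDE_CORRECTIONS", "alias": "Annual Leave"},
        {"key": "EMP_COMMON_PRIMARY_JOB", "alias": "Primary Job", "rawValue": "RN", "value": "RN"},
        {"key": "TIMECARD_TRANS_APPLY_DATE", "alias": "Apply Date",
         "rawValue": "2022-05-19", "value": "19/05/2022"},
    ]
}

columns = ["Annual Leave", "Primary Job", "Apply Date"]

# Map alias -> value, using 0 when an attribute has no "value" key.
values_by_alias = {attr["alias"]: attr.get("value", 0) for attr in item["attributes"]}

buffer = io.StringIO()  # stands in for an open CSV file
writer = csv.writer(buffer)
writer.writerow(columns)
# A second .get() covers aliases that are absent from the response altogether.
writer.writerow([values_by_alias.get(alias, 0) for alias in columns])
csv_text = buffer.getvalue()
```

Looking values up by alias avoids depending on the order of the attributes list, which the API may not guarantee.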

Navigate dict based on its structure

I have Python code that interacts with multiple APIs. All of the APIs return some JSON, but each has a different structure. Let's say I'm looking for people's names in all these JSON responses:
json_a = {
"people": [
{"name": "John"},
{"name": "Peter"}
]
}
json_b = {
"humans": {
"names": ["Adam", "Martin"]
}
}
As you can see above, the dictionaries parsed from the JSON have arbitrary structures. I'd like to define something that will serve as a "blueprint" for navigating each one, something like this:
all_jsons = {
"json_a": {
"url": "http://endpoint",
"json_structure": "people -> list -> name"
},
"json_b": {
"url": "http://someotherendpoint",
"json_structure": "humans -> names -> list"
}
}
That way, if I'm working with json_a, I just look at all_jsons["json_a"]["json_structure"] and I have the information on how to navigate this exact JSON. What would be the best way to achieve this?
Why not define concrete retrieval functions for each API:
def retrieve_a(data):
    return [d["name"] for d in data["people"]]
def retrieve_b(data):
    return data["humans"]["names"]
and store them for each endpoint:
all_jsons = {
"json_a": {
"url": "http://endpoint",
"retrieve": retrieve_a
},
"json_b": {
"url": "http://someotherendpoint",
"retrieve": retrieve_b
}
}
I have found this approach more workable than trying to express code-logic by configuration. Then you can easily collect names:
for dct in all_jsons.values():
    data = ... # requests.get(dct["url"]).json() # or similar
    names = dct["retrieve"](data)
To get the value for a key that may not exist in a dictionary, use dict.get(key). To tell whether a value is a list, a dict, or something else, use type(val). Combining the two should solve your problem.
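One way to combine the two ideas is a small walker that follows a list of keys and fans out over any list it meets along the way. This is only a sketch: the walk function and the list-of-keys "blueprint" format are assumptions for this example, and it uses isinstance rather than type(val) for the type check.

```python
def walk(node, path):
    """Collect every value reached by following the keys in `path` from `node`."""
    if isinstance(node, list):
        # Lists do not consume a key: apply the same path to each element.
        results = []
        for element in node:
            results.extend(walk(element, path))
        return results
    if not path:
        return [node]  # path exhausted: this is a wanted value
    if isinstance(node, dict):
        child = node.get(path[0])
        if child is None:
            return []  # key missing at this level
        return walk(child, path[1:])
    return []  # a scalar, but keys are still left to follow


json_a = {"people": [{"name": "John"}, {"name": "Peter"}]}
json_b = {"humans": {"names": ["Adam", "Martin"]}}

names_a = walk(json_a, ["people", "name"])
names_b = walk(json_b, ["humans", "names"])
```

With this, the "json_structure" entry for each endpoint could be a plain list of keys instead of a string that has to be parsed.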

Python: String replacement with JSON dictionary

I need to create a script in Python, for replacement of strings in a json file, based on a json dictionary. The file has information about patents and it looks like this:
{
"US-8163793-B2": {
"publication_date": "20120424",
"priority_date": "20090420",
"family_id": "42261969",
"country_code": "US",
"ipc_code": "C07D417/14",
"cpc_code": "C07D471/04",
"assignee_name": "Hoffman-La Roche Inc.",
"title": "Proline derivatives",
"abstract": "The invention relates to a compound of formula (I) wherein A, R 1 -R 6 are as defined in the description and in the claims. The compound of formula (I) can be used as a medicament."
}
}
Initially I used software that identifies, per entity type (e.g. COMPANY), all the words that are written differently but refer to the same thing. For example, the company "BMW" can be called "BMW Ag" as well as "BMW Group". The resulting dictionary has a structure like this (only partially shown, otherwise it would be very long):
{
"RESP_META" : {
,"RESP_WARNINGS" : null
,"RESP_PAYLOAD":
{
"BIOCHEM": [
{
"hitID": "D011392",
"name": "L-Proline",
"frag_vector_array": [
"16#{!Proline!} derivatives"
],
...,
"sectionMeta": {
"8": "$.US-8163793-B2.title|"
}
},
{
(next hit...)
},
...
]
}
Taking into consideration that the "sectionMeta" key gives me the patent ID and the field (e.g. abstract, title or assignee_name), I would like to use this information to find the patent in which the replacement should take place. Then, based on the "frag_vector_array" key, I want to find the word to be replaced, which is always between {! and !} (for example {!Proline!}), and replace that word with "name" (for example L-Proline).
I've tried something to replace the company names, but I think I'm going the wrong way. Here is the code I started:
import json
patents = json.load(open("testset_patents.json"))
companies = json.load(open("termite_output.json"))
print(companies)
companies = companies['RESP_PAYLOAD']
# loop through companies data
for company in companies.values():
    company_list = company["COMPANY"]
    for comp in company_list:
        comp_name = comp["name"]
        # update patents "name" in "assignee_name"
        for patent in patents.values():
            patent['assignee_name'] = comp_name
print(patents)
# save output in new file
with open('company_replacement.json', 'w') as fp:
    json.dump(patents, fp)
Well any and all help is welcome.
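The replacement described above could be sketched roughly as follows. This assumes that each "sectionMeta" value has the form "$.<patent id>.<field>|" and that the words to replace are exactly those marked with {! !} in "frag_vector_array". Both assumptions are inferred from the single hit shown, and apply_hit is a hypothetical helper, not part of any library.

```python
import re

patents = {
    "US-8163793-B2": {
        "assignee_name": "Hoffman-La Roche Inc.",
        "title": "Proline derivatives",
    }
}
hit = {
    "hitID": "D011392",
    "name": "L-Proline",
    "frag_vector_array": ["16#{!Proline!} derivatives"],
    "sectionMeta": {"8": "$.US-8163793-B2.title|"},
}

# Captures the text between "{!" and "!}", ignoring surrounding spaces.
MARKED = re.compile(r"\{!\s*(.+?)\s*!\}")


def apply_hit(patents, hit):
    """Replace each marked word with hit["name"] in the fields the hit points to."""
    words = set()
    for fragment in hit["frag_vector_array"]:
        words.update(MARKED.findall(fragment))
    for location in hit["sectionMeta"].values():
        for path in location.split("|"):
            if not path.startswith("$."):
                continue  # skips the empty piece after the trailing "|"
            patent_id, field = path[2:].rsplit(".", 1)
            text = patents.get(patent_id, {}).get(field)
            if text is None:
                continue  # unknown patent or field
            for word in words:
                pattern = r"\b" + re.escape(word) + r"\b"
                # A function replacement inserts the name literally.
                text = re.sub(pattern, lambda match: hit["name"], text)
            patents[patent_id][field] = text


apply_hit(patents, hit)
new_title = patents["US-8163793-B2"]["title"]
```

Note that running apply_hit twice on the same data would replace "Proline" inside "L-Proline" again, so the sketch should be applied once per hit to the original text.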

Accessing nested objects with python

I receive a response from Foursquare in the form of JSON. I have tried to access certain parts of the object but have had no success. How would I access, say, the address of the object? Here is the code I have tried:
url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(client_id=foursquare_client_id,
client_secret=foursquare_client_secret,
v='20170801', ll=''+lat+','+long+'',
query=mealType, limit=100)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
msg = '{} {}'.format("Restaurant Address: ",
data['response']['groups'][0]['items'][0]['venue']['location']['address'])
print(msg)
Here is an example of json response:
"items": [
{
"reasons": {
"count": 0,
"items": [
{
"summary": "This spot is popular",
"type": "general",
"reasonName": "globalInteractionReason"
}
]
},
"venue": {
"id": "412d2800f964a520df0c1fe3",
"name": "Central Park",
"contact": {
"phone": "2123106600",
"formattedPhone": "(212) 310-6600",
"twitter": "centralparknyc",
"instagram": "centralparknyc",
"facebook": "37965424481",
"facebookUsername": "centralparknyc",
"facebookName": "Central Park"
},
"location": {
"address": "59th St to 110th St",
"crossStreet": "5th Ave to Central Park West",
"lat": 40.78408342593807,
"lng": -73.96485328674316,
"labeledLatLngs": [
{
"label": "display",
"lat": 40.78408342593807,
"lng": -73.96485328674316
}
],
the full response can be found here
Like so
addrs=data['items'][2]['location']['address']
Your code (at least as far as loading and accessing the object) looks correct to me. I loaded the JSON from a file (since I don't have your Foursquare ID) and it worked fine. You are correctly using object/dictionary keys and array positions to navigate to what you want. However, you misspelled "address" in the line where you drill down to the data. Adding the missing 'a' made it work. I'm also correcting the typo in the URL you posted.
I answered this assuming that the example JSON you linked to is what is stored in data. If that isn't the case, a relatively easy way to see exactly what Python has stored in data is to import pprint and use it like so: pprint.pprint(data).
You could also start an interactive python shell by running the program with the -i switch and examine the variable yourself.
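For a large response, limiting the depth of the pretty-printed output can make the overall shape easier to see. The snippet below is a small sketch using a cut-down stand-in for data, not the real Foursquare response.

```python
import pprint

# Cut-down stand-in for the parsed response.
data = {
    "items": [
        {
            "reasons": {"count": 0},
            "venue": {
                "name": "Central Park",
                "location": {"address": "59th St to 110th St"},
            },
        }
    ]
}

pprint.pprint(data)                      # full structure
summary = pprint.pformat(data, depth=2)  # deeper levels are shown as "..."
print(summary)
```

Increasing depth step by step shows which keys exist at each level before you write the full access chain.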
data["items"][2]["location"]["address"]
This will access the address for you.
You can reach any level of nesting by using an integer index for an array and a string key for a dict.
In your case, items is an array:
#items[int index]
items[0]
Now items[0] is a dictionary, so we access it by string key:
items[0]['location']
That is again an object, so we use a string key:
items[0]['location']['address']
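Here is the same step-by-step drilling as a runnable sketch. It follows the excerpt shown in the question, where "location" is nested under "venue", so the chain includes that extra key; the full response may nest things differently. The address_or_none helper is a hypothetical addition for handling entries that lack one of the keys.

```python
# Cut-down version of the excerpt from the question.
data = {
    "items": [
        {
            "reasons": {"count": 0},
            "venue": {
                "name": "Central Park",
                "location": {
                    "address": "59th St to 110th St",
                    "crossStreet": "5th Ave to Central Park West",
                },
            },
        }
    ]
}

items = data["items"]                   # a list, so use an integer index
first = items[0]                        # a dict, so use a string key
location = first["venue"]["location"]   # nested dicts, string keys again
address = location["address"]


def address_or_none(data, index):
    """Return the address of the item at `index`, or None if any step is missing."""
    try:
        return data["items"][index]["venue"]["location"]["address"]
    except (KeyError, IndexError, TypeError):
        return None
```

Catching KeyError and IndexError keeps one malformed or missing item from stopping a loop over many venues.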

Error while parsing json from IBM watson using python

I am trying to parse out a JSON download using python and here is the download that I have:
{
"document_tone":{
"tone_categories":[
{
"tones":[
{
"score":0.044115,
"tone_id":"anger",
"tone_name":"Anger"
},
{
"score":0.005631,
"tone_id":"disgust",
"tone_name":"Disgust"
},
{
"score":0.013157,
"tone_id":"fear",
"tone_name":"Fear"
},
{
"score":1.0,
"tone_id":"joy",
"tone_name":"Joy"
},
{
"score":0.058781,
"tone_id":"sadness",
"tone_name":"Sadness"
}
],
"category_id":"emotion_tone",
"category_name":"Emotion Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"analytical",
"tone_name":"Analytical"
},
{
"score":0.0,
"tone_id":"confident",
"tone_name":"Confident"
},
{
"score":0.0,
"tone_id":"tentative",
"tone_name":"Tentative"
}
],
"category_id":"language_tone",
"category_name":"Language Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"openness_big5",
"tone_name":"Openness"
},
{
"score":0.571,
"tone_id":"conscientiousness_big5",
"tone_name":"Conscientiousness"
},
{
"score":0.936,
"tone_id":"extraversion_big5",
"tone_name":"Extraversion"
},
{
"score":0.978,
"tone_id":"agreeableness_big5",
"tone_name":"Agreeableness"
},
{
"score":0.975,
"tone_id":"emotional_range_big5",
"tone_name":"Emotional Range"
}
],
"category_id":"social_tone",
"category_name":"Social Tone"
}
]
}
}
I am trying to parse out 'tone_name' and 'score' from the above file and I am using following code:
import urllib
import json
url = urllib.urlopen('https://watson-api-explorer.mybluemix.net/tone-analyzer/api/v3/tone?version=2016-05-19&text=I%20am%20happy')
data = json.load(url)
for item in data['document_tone']:
    print item["tone_name"]
I keep running into an error saying that tone_name is not defined.
As jonrsharpe said in a comment:
data['document_tone'] is a dictionary, but 'tone_name' is a key in dictionaries much further down the structure.
You need to access the dictionary that tone_name is in. If I am understanding the JSON correctly, tone_name is a key within tones, within tone_categories, within document_tone. You would then want to change your code to go to that level, like so:
for item in data['document_tone']['tone_categories']:
    # item is an anonymous dictionary
    for thing in item['tones']:
        print(thing['tone_name'])
More than one for loop is needed because of the mix of lists and dictionaries in the file. 'tone_categories' is a list of dictionaries, so the outer loop visits each of them. The inner loop then iterates through the list 'tones', which is in each one and is full of more dictionaries. Those dictionaries are the ones that contain 'tone_name', so it prints the value of 'tone_name'.
If this does not work, let me know. I was unable to test it since I could not get the rest of the code to work on my computer.
You are walking the structure incorrectly. The root node has a single document_tone key, the value of which has only the tone_categories key. Each of the categories has a list of tones and its name. Here is how you would print it out (adjust as needed):
for cat in data['document_tone']['tone_categories']:
    print('Category:', cat['category_name'])
    for tone in cat['tones']:
        print('-', tone['tone_name'])
The result of this is:
Category: Emotion Tone
- Anger
- Disgust
- Fear
- Joy
- Sadness
Category: Language Tone
- Analytical
- Confident
- Tentative
Category: Social Tone
- Openness
- Conscientiousness
- Extraversion
- Agreeableness
- Emotional Range
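Since the question asks for both 'tone_name' and 'score', the same nested walk can collect the pairs into a list. This is a sketch on a shortened copy of the response; the variable names are made up for this example.

```python
# Shortened copy of the response from the question.
data = {
    "document_tone": {
        "tone_categories": [
            {
                "category_name": "Emotion Tone",
                "tones": [
                    {"score": 0.044115, "tone_id": "anger", "tone_name": "Anger"},
                    {"score": 1.0, "tone_id": "joy", "tone_name": "Joy"},
                ],
            },
            {
                "category_name": "Language Tone",
                "tones": [
                    {"score": 0.0, "tone_id": "analytical", "tone_name": "Analytical"},
                ],
            },
        ]
    }
}

# One (category, tone name, score) tuple per tone, in document order.
scores = [
    (cat["category_name"], tone["tone_name"], tone["score"])
    for cat in data["document_tone"]["tone_categories"]
    for tone in cat["tones"]
]

# The tone with the highest score across all categories.
strongest = max(scores, key=lambda row: row[2])

for category, name, score in scores:
    print(category, name, score)
```

Collecting the rows first makes it easy to sort, filter, or export them afterwards instead of only printing.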
