JSON file modification with Python - python

I'm trying to write a Python script to read data from a JSON file, do some calculations with it and then write output to a new JSON file. But I can't seem to automate the JSON reading process. I get this error. Could you please help me with this issue?
Thank you very much
print([a[0]][b[1]][c[1]])
TypeError: list indices must be integers or slices, not str
test.json
{
"male": {
"jack": {
"id": "001",
"telephone": "+31 2225 345",
"address": "10 Street, Aukland",
"balance": "1500"
},
"john": {
"id": "002",
"telephone": "+31 6542 365",
"address": "Main street, Hanota",
"balance": "2500"
}
},
"female": {
"kay": {
"id": "00",
"telephone": "+31 6542 365",
"address": "Main street, Kiro",
"balance": "500"
}
}
}
test.py
with open("q.json") as datafile:
    data = json.load(datafile)
a = ['male', 'female']
b = ['jack', 'john', 'kay']
c = ['id', 'telephone', 'address', 'balance']
print([a[1]][b[1]][c[1]])

If I understand you correctly, you want to print data from the parsed JSON, not from your intermediary lists. The expression [a[0]] builds a new one-element list, and indexing that list with a string is what raises the TypeError.
So:
print(data['male']) # will print the entire male subsection
print(data['male']['jack']) # will print the entire jack record
print(data['male']['jack']['telephone']) # will print jack's telephone
But to relate that to your intermediary lists too:
print(data[a[0]]) # will print the entire male subsection
print(data[a[0]][b[0]]) # will print the entire jack record
print(data[a[0]][b[0]][c[1]]) # will print jack's telephone
This assumes that the entries in a, b and c match the keys in the JSON exactly, because keys are case-sensitive:
a = ['male', 'female'] # must match the case used in test.json
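A minimal, self-contained sketch of the same lookups, assuming the structure of test.json shown in the question. The JSON is inlined as a string (an assumption made only so the snippet runs without a file), and the names raw, john_phone and balances are illustrative.

```python
import json

# Inlined, trimmed copy of test.json, only so the snippet runs on its own.
raw = """
{
  "male": {
    "jack": {"id": "001", "telephone": "+31 2225 345", "balance": "1500"},
    "john": {"id": "002", "telephone": "+31 6542 365", "balance": "2500"}
  },
  "female": {
    "kay": {"id": "00", "telephone": "+31 6542 365", "balance": "500"}
  }
}
"""
data = json.loads(raw)

# Index the parsed data, not the lists of key names.
john_phone = data["male"]["john"]["telephone"]
print(john_phone)

# Walk every record without hard-coding the group or person names.
balances = {}
for group, people in data.items():
    for name, record in people.items():
        balances[name] = int(record["balance"])  # balances are stored as strings
print(balances)
```

Looping over data.items() avoids keeping separate lists of key names in sync with the file.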

I don't know how you access data in your code, because you only write hard-coded values into a, b and c. In addition, you could print out your test values via: print(a[1], b[1], c[1]).

Related

Split JSON python string to pass to function

How can I split my JSON so that each movie is sent separately to the finalScore function, and then append the results to my list at the end?
Sorry that the explanation and code are long; I was unsure how to describe my problem without showing what I have done.
This is my current code:
import datetime, json
def jsonData(data):
    return json.loads(data)
def findContentInformation(content):
    contentData = convert_content_data(content)
    for info in contentData:
        contentBaseScore = info['popularityScore']
        contentBrand = info['brand']
        contentType = info['contentType']
        contentName = info['title']
        contentInformation = [contentBaseScore, contentBrand, contentType, contentName]
    return contentInformation
#Calculator the overall rating for the film
def getRating(content_data, userBrandRate, userTypeRate):
    contentScore = {}
    #RATING
    rating = 0
    # Collecting information from the content to be tested
    contentInfo = findContentInformation(content_data) # The content being tested
    popularityScore += contentInfo[0] #Find base score and add this to popScore
    #getBrandRating = str((userBrandPreference[contentInfo[1]])) # Get brand preference
    # Check if brand preference rating is a valid rating type
    if brandRating in Ratings:
        popularityScore += Ratings[brandRating] # Get the ratings score & update popScore
    else:
        print("Unrecognized rating value found in this search")
    user_content_type_preference = convert_type_preferences(content_type_preferences)
    typeRating = getTypeRating(user_content_type_preference, contentInfo) # Get the type rating
    # Check if type rating is a valid rating
    if typeRating in Ratings:
        popularityScore += Ratings[typeRating] # Update the popScore based on the rating score
    else:
        print("Unrecognized rating value found in this search")
    contentScore[contentInfo[3]] = popularityScore
    popularityScore = 0
    return contentScore
result = getRating(content_data)
My output with only one movie (not sure how to use all the movies in the JSON)
JSON string:
content_data = """[{ "title": "Spider-Man", "brand": "Marvel",
"Rating": 98, "contentIs": "movie" }]"""
Output:
[{'Spider-Man': 128}]
To me it feels like you're making things unnecessarily complex. For example, you have three functions (convert_content_data, convert_preferences and convert_type_preferences) that all do the same thing: they each take one JSON-encoded string and parse it. Turning those three functions into one would still be one too many, because I don't think a bare call to json.loads is a good candidate for a separate function in the first place.
You do quite a bit of conversion also - from a JSON encoded string to a dictionary. You do that multiple times. Why not convert all your JSON once at the start of the program? Doing so will let you work with dictionaries for the rest of your tasks. Once you have a list of dictionaries, you can think of each dictionary as being one "movie-object", since that is what each dictionary represents. The brand- and content type JSON strings can also be converted once at the start of the program (instead of multiple times throughout the entire program).
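As a rough sketch of "parse once, then work with dictionaries": the JSON string below is a hypothetical two-movie stand-in for the question's data, and base_score is an illustrative helper, not a function from the question.

```python
import json

# Hypothetical inline data standing in for the JSON strings in the question.
content_data = """[
  {"title": "Spider-Man", "brand": "Marvel", "popularityScore": 98, "contentType": "movie"},
  {"title": "Float", "brand": "Pixar", "popularityScore": 87, "contentType": "short"}
]"""

# Parse once, up front; everything after this works with plain dicts.
movies = json.loads(content_data)

def base_score(movie):
    # Receives an already-parsed dict, so no JSON handling is needed here.
    return movie["popularityScore"]

# Each element of the list is one "movie-object"; loop over them directly.
scores = {movie["title"]: base_score(movie) for movie in movies}
print(scores)
```

Because the list is parsed a single time, handling "all the movies" is just a loop over movies.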
EDIT - I've updated my example code.
First, I think you should put your movie data in a separate JSON file, so that you're not polluting your source code with a huge string literal. Let's name it movies.json:
[
{
"title": "Spider-Man",
"brand": "Marvel",
"availability": ["CA","FR","US"],
"availableDate": "2019-11-12T05:00:00.000Z",
"isKidsContent": true,
"popularityScore": 98,
"contentType": "movie"
},
{
"title": "Float",
"brand": "Pixar",
"availability": ["US"],
"availableDate": "2019-11-12T05:00:00.000Z",
"isKidsContent": true,
"popularityScore": 87,
"contentType": "short"
},
{
"title": "Avatar",
"brand": "21st Century Fox",
"availability": ["US","CA","FR","ES","DE"],
"availableDate": "2019-11-12T05:00:00.000Z",
"isKidsContent": false,
"popularityScore": 99,
"contentType": "movie"
},
{
"title": "Chapter 1: The Mandalorian",
"brand": "Star Wars", "availability": ["US","CA"],
"availableDate": "2019-11-02T23:00:00.000Z",
"isKidsContent": false,
"popularityScore": 92,
"contentType": "series"
},
{
"title": "Marvel Studios Avengers: Endgame",
"brand": "Marvel",
"availability": ["CA","FR","ES","DE","US"],
"availableDate": "2019-11-11T23:00:00.000Z",
"isKidsContent": false,
"popularityScore": 87,
"contentType": "movie"
},
{
"title": "Disney Mickey Mouse Clubhouse: Mickey Goes Fishing",
"brand": "Disney",
"availability": ["US"],
"availableDate": "2019-09-11T22:00:00.000Z",
"isKidsContent": true,
"popularityScore": 75,
"contentType": "series"
},
{
"title": "Disney High School Musical: The Musical: The Series: Act Two",
"brand": "Disney",
"availability": ["US","FR","ES"],
"availableDate": "2020-01-10T08:00:00.000Z",
"isKidsContent": false,
"popularityScore": 97,
"contentType": "series"
}
]
Then, I would also create a JSON file for all of your users. This is where you would store the user preferences. Let's name it users.json:
[
{
"name": "Bob",
"preferences": {
"brand": {
"Star Wars": "love",
"Disney": "like",
"Marvel": "dislike",
"Pixar": "dislike"
},
"contentType": {
"movie": "like",
"series": "like",
"short": "dislike"
}
}
},
{
"name": "Joe",
"preferences": {
"brand": {
"Star Wars": "dislike",
"Disney": "dislike",
"Marvel": "dislike",
"Pixar": "dislike"
},
"contentType": {
"movie": "like",
"series": "like",
"short": "dislike"
}
}
}
]
This users.json file has two users named Bob and Joe, with different preferences.
Then, the code:
def evaluate_score(user, movie):
    """
    Evaluates and returns the score a user would assign to
    a given movie based on the user's brand- and content-type preferences.
    """
    ratings = {
        "dislike": -20,
        "indifferent": 0,
        "like": 10,
        "adore": 30,
        "love": 50
    }
    brand_score = ratings.get(user["preferences"]["brand"].get(movie["brand"])) or 0
    content_type_score = ratings.get(user["preferences"]["contentType"].get(movie["contentType"])) or 0
    return movie["popularityScore"] + brand_score + content_type_score
def get_all_scores(user, movies):
    for movie in movies:
        yield {
            "title": movie["title"],
            "score": evaluate_score(user, movie)
        }
def main():
    import json
    from operator import itemgetter
    with open("movies.json", "r") as file:
        movies = json.load(file)
    with open("users.json", "r") as file:
        users = json.load(file)
    for user in users:
        print(user["name"].center(16, "-"))
        for movie in sorted(get_all_scores(user, movies), key=itemgetter("score"), reverse=True):
            print("{}: {}".format(movie["title"], movie["score"]))
        print()
    return 0
if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
------Bob-------
Chapter 1: The Mandalorian: 152
Disney High School Musical: The Musical: The Series: Act Two: 117
Avatar: 109
Disney Mickey Mouse Clubhouse: Mickey Goes Fishing: 95
Spider-Man: 88
Marvel Studios Avengers: Endgame: 77
Float: 47
------Joe-------
Avatar: 109
Spider-Man: 88
Disney High School Musical: The Musical: The Series: Act Two: 87
Chapter 1: The Mandalorian: 82
Marvel Studios Avengers: Endgame: 77
Disney Mickey Mouse Clubhouse: Mickey Goes Fishing: 65
Float: 47
We've got two functions and one generator:
evaluate_score (which I called get_movie_score before) takes a user dictionary and a movie dictionary, and returns the score (an integer) that this user would assign to the given movie, based on that user's preferences.
get_all_scores is a generator that takes a user and a list of movie dictionaries. It computes the score of every movie for that user and yields dictionaries, each containing two key-value pairs: the movie title and the final score assigned by that user. This generator is useful later in the main function, when we want to print the final result in descending order.
main is the main entry point of the entire script. It first opens and parses our two JSON files and then, for every user, prints a summary of that user's scores for all movies, sorted in descending order by score.

Convert multiple string stored in a variable into a single list in python

I hope everyone is doing well.
I need a little help: I want to collect all the strings held in a variable and store them in a single list in Python.
For example -
I have a JSON file from which I am reading ids, and all the ids end up in a variable called id, as shown below when I run print(id):
17298626-991c-e490-bae6-47079c6e2202
17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b
172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5
172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd
17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0
1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4
172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d
1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566
17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23
17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a
I want all of these values to be stored in a single list.
Can someone please help me out with this?
Below is the code I am using:
import json
with open('requests.json') as f:
    data = json.load(f)
print(type(data))
for i in data:
    if 'traceId' in i:
        id = i['traceId']
        newid = id.split()
        #print(type(newid))
        print(newid)
And below is what my JSON file looks like:
[
{
"id": "376287298-hjd8-jfjb-khkf-6479280283e9",
"submittedTime": 1591692502558,
"traceId": "17298626-991c-e490-bae6-47079c6e2202",
"userName": "ABC",
"onlyChanged": true,
"description": "Not Required",
"startTime": 1591694487929,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "16b22a09-a840-f4d9-f42a-64fd73fece57",
"name": "XYZ"
},
"applicationProcess": {
"id": "dihihdosfj9279278yrie8ue",
"name": "Deploy",
"version": 12
},
"environment": {
"id": "fkjdshkjdshglkjdshgldshldsh03r937837",
"name": "DEV"
},
"snapshot": {
"id": "djnglkfdglki98478yhgjh48yr844h",
"name": "DEV_snapshot"
}
},
{
"id": "17298495-f060-3e9d-7097-1f86d5160789",
"submittedTime": 1591692844597,
"traceId": "17298496-19bd-2f89-7b5f-881921abc632",
"userName": "UYT",
"onlyChanged": true,
"startTime": 1591692845543,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "osfodsho883793hgjbv98r3098w",
"name": "QA"
},
"applicationProcess": {
"id": "owjfoew028r2uoieroiehojehfoef",
"name": "EDC",
"version": 5
},
"environment": {
"id": "16cf69c5-4194-e557-707d-0663afdbceba",
"name": "DTESTU"
}
}
]
This is where I am trying to get the traceId from.
You could use the simple split method, like the following:
ids = '''17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632 17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0 172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322 17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029 1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c 17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860 1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb 172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1 17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e 17298825-7449-2fcb-378e-13671cb4688a'''
l = ids.split(" ")
print(l)
This gives the following result. I assumed that the separator is a single space; adjust it as needed:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632', '17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0', '172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322', '17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029', '1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c', '17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860', '1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb', '172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1', '17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e', '17298825-7449-2fcb-378e-13671cb4688a']
Edit
You get a list of lists because each iteration reads only one id and splits it into its own list. What you need to do is create an empty list before the loop and append each id to it, in the following way:
l = []
for i in data:
    if 'traceId' in i:
        id = i['traceId']
        l.append(id)
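The same loop can also be written as a list comprehension. This is a small sketch that assumes data is the parsed list from requests.json; the three inline records are a trimmed, hypothetical stand-in so the snippet runs on its own.

```python
import json

# Trimmed stand-in for requests.json; the last record has no traceId on purpose.
data = json.loads("""[
  {"id": "a", "traceId": "17298626-991c-e490-bae6-47079c6e2202"},
  {"id": "b", "traceId": "17298496-19bd-2f89-7b5f-881921abc632"},
  {"id": "c"}
]""")

# Keep the traceId of every record that has one.
trace_ids = [record["traceId"] for record in data if "traceId" in record]
print(trace_ids)
```

Using a name such as record instead of id also avoids shadowing Python's built-in id function.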
You can append each id to a list, like so:
# list declaration (before the loop)
l1 = []
# this must be inside your loop
l1.append(id)
I'm assuming you get the id as a str type value. Using id.split() will return a list of all ids in one single Python list, as each id is separated by space here in your example.
id = """17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a"""
id_list = id.split()
print(id_list)
Output:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632',
'17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0',
'172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322',
'17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029',
'1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c',
'17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860',
'1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb',
'172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1',
'17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e',
'17298825-7449-2fcb-378e-13671cb4688a']
With no arguments, split() splits on any run of whitespace (spaces, tabs or newlines) and drops empty strings. You can pass the sep argument to use any other separator if needed.
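To make the difference between the default and an explicit separator concrete, here is a short sketch; the sample strings are made up for illustration.

```python
text = "a  b\tc\nd"  # two spaces, then a tab, then a newline

# No argument: any run of whitespace counts as one separator.
default_parts = text.split()
print(default_parts)   # ['a', 'b', 'c', 'd']

# Explicit " ": every single space separates, so empty strings can appear,
# and tabs/newlines are left untouched.
space_parts = text.split(" ")
print(space_parts)     # ['a', '', 'b\tc\nd']

# Any other separator works the same way.
comma_parts = "x,y,,z".split(",")
print(comma_parts)     # ['x', 'y', '', 'z']
```

For ids printed one per line or separated by spaces, the argument-free form is usually the safer choice.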

Accessing nested objects with python

I have a response that I receive from foursquare in the form of json. I have tried to access the certain parts of the object but have had no success. How would I access say the address of the object? Here is my code that I have tried.
url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(client_id=foursquare_client_id,
              client_secret=foursquare_client_secret,
              v='20170801', ll=''+lat+','+long+'',
              query=mealType, limit=100)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
msg = '{} {}'.format("Restaurant Address: ",
                     data['response']['groups'][0]['items'][0]['venue']['location']['address'])
print(msg)
Here is an example of json response:
"items": [
{
"reasons": {
"count": 0,
"items": [
{
"summary": "This spot is popular",
"type": "general",
"reasonName": "globalInteractionReason"
}
]
},
"venue": {
"id": "412d2800f964a520df0c1fe3",
"name": "Central Park",
"contact": {
"phone": "2123106600",
"formattedPhone": "(212) 310-6600",
"twitter": "centralparknyc",
"instagram": "centralparknyc",
"facebook": "37965424481",
"facebookUsername": "centralparknyc",
"facebookName": "Central Park"
},
"location": {
"address": "59th St to 110th St",
"crossStreet": "5th Ave to Central Park West",
"lat": 40.78408342593807,
"lng": -73.96485328674316,
"labeledLatLngs": [
{
"label": "display",
"lat": 40.78408342593807,
"lng": -73.96485328674316
}
],
the full response can be found here
Like so
addrs=data['items'][2]['location']['address']
Your code (at least as far as loading and accessing the object) looks correct to me. I loaded the JSON from a file (since I don't have your Foursquare id) and it worked fine. You are correctly using object/dictionary keys and array positions to navigate to what you want. However, you misspelled "address" in the line where you drill down to the data. Adding the missing 'a' made it work. I'm also correcting the typo in the URL you posted.
I answered this assuming that the example JSON you linked to is what is stored in data. If that isn't the case, a relatively easy way to see exactly what Python has stored in data is to import pprint and use it like so: pprint.pprint(data).
You could also start an interactive Python shell by running the program with the -i switch and examine the variable yourself.
items[2]["venue"]["location"]["address"]
This accesses the address of the third item, where items is the "items" list shown in the sample response (data['response']['groups'][0]['items'] in your code).
You can go to any level of nesting by using an integer index for an array and a string key for a dict.
In your case items is an array:
# items[int index]
items[0]
Now items[0] is a dictionary, so we access it by string key. In the sample, the location is stored under "venue":
items[0]['venue']['location']
That is again an object, so we use a string key:
items[0]['venue']['location']['address']
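When a step in such a chain might be missing, a small helper can walk the path and fall back to a default instead of raising. This is only a sketch: dig is a hypothetical helper (not part of requests, json or the Foursquare API), and the data dictionary is a trimmed stand-in for the response in the question.

```python
def dig(obj, *path, default=None):
    """Follow a path of dict keys and list indexes; return default if a step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

# Trimmed-down stand-in for the Foursquare response in the question.
data = {
    "response": {
        "groups": [
            {"items": [
                {"venue": {"name": "Central Park",
                           "location": {"address": "59th St to 110th St"}}}
            ]}
        ]
    }
}

address = dig(data, "response", "groups", 0, "items", 0, "venue", "location", "address")
missing = dig(data, "response", "groups", 0, "items", 5, "venue", default="n/a")
print(address)  # 59th St to 110th St
print(missing)  # n/a
```

String steps index dictionaries and integer steps index lists, exactly as in the chained lookups above.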

Counting Items in Python from a JSON file

I'm trying to search a data file, for example Yelp.json. It has businesses in it in LA, Boston, DC.
I wrote this:
# Python 2
# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)
# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
But I get an answer of "Portsmouth 1", which means it is not providing all the cities and might not even be provided all the counts. My goal for this section is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses." Each payload is a grouping of info about a single business and "locality" is just the city. So I want it to find how many unique cities there are and then how many businesses in each city. So one payload could be Starbucks in la, another payload could be Starbucks in dc, another could be Chipotle in la.
Example of JSON file (JSONlite.com says its valid):
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
"latitude": "56.945972",
"locality": "Stonehaven",
"_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "The Lodge, Dunottar",
"email": "dunnottarcastle#btconnect.com",
"existence_ml": 0.5694238217658721,
"domain_aggregate": "",
"name": "Dunnottar Castle",
"search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
"admin_region": "Scotland",
"existence": 1,
"category_labels": [
["Landmarks", "Buildings and Structures"]
],
"post_town": "Stonehaven",
"region": "Kincardineshire",
"review_count": "719",
"geocode_level": "within_50m",
"tel": "01569 762173",
"placerank": 65,
"longitude": "-2.197123",
"placerank_ml": 37.27916073464469,
"fax": "01330 860325",
"category_ids_text_search": "",
"website": "http://www.dunnottarcastle.co.uk",
"status": "1",
"geocode_confidence": "20",
"postcode": "AB39 2TL",
"category_ids": [108],
"country": "gb",
"_geocode_quality": "4",
"uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
"latitude": "56.237480",
"locality": "Inveraray",
"_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "Cherry Park",
"email": "enquiries#inveraray-castle.com",
"longitude": "-5.073578",
"domain_aggregate": "",
"name": "Inveraray Castle",
"admin_region": "Scotland",
"search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
"existence": 1,
"category_labels": [
["Social", "Food and Dining", "Restaurants"]
],
"region": "Argyll",
"review_count": "532",
"geocode_level": "within_50m",
"tel": "01499 302203",
"placerank": 67,
"post_town": "Inveraray",
"placerank_ml": 41.19978087352266,
"fax": "01499 302421",
"category_ids_text_search": "",
"website": "http://www.inveraray-castle.com",
"status": "1",
"geocode_confidence": "20",
"postcode": "PA32 8XE",
"category_ids": [347],
"country": "gb",
"_geocode_quality": "4",
"existence_ml": 0.7914881102847783,
"uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},
If your "json" semantics is something like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it is a valid json string, but after you json.loads the string, it will be evaluated as
{"payload":{ CONTENT_LAST }}
And that is why you end up with one city and one business count.
You can verify this behaviour on this online json parser http://json.parser.online.fr/ by checking JS eval field.
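The same behaviour can be checked directly in Python. The sketch below uses a made-up two-payload string, and payload_list is a hypothetical hook name; object_pairs_hook itself is a documented parameter of json.loads.

```python
import json

raw = '{"payload": {"locality": "Stonehaven"}, "payload": {"locality": "Inveraray"}}'

# Default behaviour: the last value for a repeated key wins.
as_dict = json.loads(raw)
print(as_dict)  # {'payload': {'locality': 'Inveraray'}}

def payload_list(pairs):
    """Keep every value when an object repeats a key; otherwise build a normal dict."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        return [value for _, value in pairs]
    return dict(pairs)

# The hook sees all (key, value) pairs of each object, duplicates included.
payloads = json.loads(raw, object_pairs_hook=payload_list)
print(payloads)  # [{'locality': 'Stonehaven'}, {'locality': 'Inveraray'}]
```

The hook is one way to recover every payload without editing the file by hand, at the cost of returning a list where the outer object used to be.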
In this case, one way to preprocess your JSON string is to get rid of the dummy "payload" key and put the content dictionaries directly in a list. You will then have a JSON string in the following format.
[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
Assume your JSON string is now a list of payload dictionaries, and you have loaded it with data = json.loads(json_str).
As you iterate through the payloads, build a lookup table along the way.
This handles duplicated cities for you automatically, since businesses in the same city are appended to the same list.
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)
Then later on, you can easily present the solution by
for city, business_list in city_business_map.items():
    print city, len(business_list)
If you want to count the unique businesses in each city, initialize each value to a set instead of a list.
If that is overkill, skip the list or set and just associate a counter with each key.
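The counter-per-key variant can be sketched with collections.Counter from the standard library. The data list here is hypothetical and assumes the file has already been reshaped into a list of payload dictionaries; the print call is written so that it behaves the same on Python 2 and Python 3.

```python
from collections import Counter

# Hypothetical, already-parsed list of payload dictionaries.
data = [
    {"name": "Starbucks", "locality": "LA"},
    {"name": "Starbucks", "locality": "DC"},
    {"name": "Chipotle", "locality": "LA"},
]

# Count how many payloads mention each city.
city_counts = Counter(payload["locality"] for payload in data)

# most_common() yields (city, count) pairs, largest count first.
for city, count in city_counts.most_common():
    print("{}: {} businesses".format(city, count))
```

Counter is a dict subclass, so city_counts["LA"] reads a single city's total directly.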

Json organization

I use JSON for one of my projects. For example, I have the following JSON structure.
{
"address":{
"streetAddress": {
"aptnumber" : "21",
"building_number" : "2nd",
"street" : "Wall Street"
},
"city":"New York"
},
"phoneNumber":
[
{
"type":"home",
"number":"212 555-1234"
}
]
}
Now I have a bunch of modules using this structure, and it expects to see certain fields in the received json. For the example above, I have two files: address_manager and phone_number_manager. Each will be passed the relevant information. So address_manager will expect a dict that has keys 'streetAddress' and 'city'.
My question is: Is it possible to set up a constant structure so that every time I change the name of a field in my JSON structure (e.g. I want to change 'streetAddress' to 'address'), I don't have to make change in several places?
My naive approach is to have a bunch of constants (e.g.
ADDRESS = "address"
ADDRESS_STREET_ADDRESS = "streetAddress"
..etc..
) and so if I want to change the name of one of my fields in JSON structure, I just have to make change in one place. However, this seems to be very inefficient because my constant naming would be terribly long once I reach the third or fourth layer of the JSON structure (e.g. ADDRESS_STREETADDRESS_APTNUMBER, ADDRESS_STREETADDRESS_BUILDINGNUMBER)
I am doing this in python, but any generic answer would be OK.
Thanks.
Like Cameron Sparr suggested in a comment, don't have your constant names include all levels of your JSON structure. If you have the same data in multiple places, it will actually be better if you reuse the same constant. For example, suppose your JSON has a phone number included in the address:
{
"address": {
"streetAddress": {
"aptnumber" : "21",
"building_number" : "2nd",
"street" : "Wall Street"
},
"city":"New York",
"phoneNumber":
[
{
"type":"home",
"number":"212 555-1234"
}
]
},
"phoneNumber":
[
{
"type":"home",
"number":"212 555-1234"
}
]
}
Why not have a single constant PHONES = 'phoneNumber' that you use in both places? Your constants will have shorter names, and it is more logically coherent. You would end up using it like this (assuming JSON is stored in person):
person[ADDRESS][PHONES][x] # Phone numbers associated with that address
person[PHONES][x] # Phone numbers associated with the person
Instead of
person[ADDRESS][ADDRESS_PHONES][x]
person[PHONE_NUMBERS][x]
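A small sketch of how such shared constants might look in a module. The person dictionary is a shortened version of the JSON above, and phone_numbers is a hypothetical helper added only for illustration.

```python
# Field names live in one place; renaming a JSON key means editing only this block.
ADDRESS = "address"
CITY = "city"
PHONES = "phoneNumber"
NUMBER = "number"

# Hypothetical record shaped like the JSON above (already parsed).
person = {
    "address": {
        "city": "New York",
        "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
    },
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

def phone_numbers(record):
    """Return the number strings stored directly under a record (hypothetical helper)."""
    return [entry[NUMBER] for entry in record.get(PHONES, [])]

person_phones = phone_numbers(person)            # phones of the person
address_phones = phone_numbers(person[ADDRESS])  # phones of that address
city = person[ADDRESS][CITY]
```

The same PHONES constant serves both levels, so the constant names stay short however deep the structure gets.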
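You can write a script that, when you change a constant, renames the corresponding key in all of your JSON files.
Example:
import json
CHANGE = ('street', 'streetAddress')  # (old key, new key)
with open('file.json') as jfile:
    json_data = json.load(jfile)
# Move the value to the new key and drop the old one (top-level keys only).
json_data[CHANGE[1]] = json_data.pop(CHANGE[0])
# Write the renamed structure back to the file.
with open('file.json', 'w') as jfile:
    json.dump(json_data, jfile, indent=4)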
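The example above only touches keys at the top level of the file. For nested structures, a recursive rename is one possible extension; rename_key is a hypothetical helper and the sample dictionary is made up for illustration.

```python
def rename_key(node, old, new):
    """Return a copy of node with every dict key `old` renamed to `new`, at any depth."""
    if isinstance(node, dict):
        return {(new if key == old else key): rename_key(value, old, new)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node  # strings, numbers, booleans and None are returned unchanged

# Made-up sample with the old key at two different depths.
sample = {
    "address": {
        "street": {"aptnumber": "21"},
        "city": "New York",
    },
    "phoneNumber": [{"type": "home", "street": "n/a"}],
}

renamed = rename_key(sample, "street", "streetAddress")
print(renamed)
```

The function builds a new structure rather than mutating its input, so the original data stays available for comparison.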
