Split JSON python string to pass to function - python

How can I split my JSON containing all the movies, pass each movie separately to the finalScore function, and then append the results to my list at the end?
Sorry the explanation and code are long; I wasn't sure how to describe the problem without showing what I have done so far.
This is my current code:
import datetime, json

def jsonData(data):
    return json.loads(data)

def findContentInformation(content):
    contentData = convert_content_data(content)
    for info in contentData:
        contentBaseScore = info['popularityScore']
        contentBrand = info['brand']
        contentType = info['contentType']
        contentName = info['title']
        contentInformation = [contentBaseScore, contentBrand, contentType, contentName]
    return contentInformation

# Calculate the overall rating for the film
def getRating(content_data, userBrandRate, userTypeRate):
    contentScore = {}
    # RATING
    rating = 0
    # Collecting information from the content to be tested
    contentInfo = findContentInformation(content_data)  # The content being tested
    popularityScore += contentInfo[0]  # Find base score and add this to popScore
    #getBrandRating = str((userBrandPreference[contentInfo[1]]))  # Get brand preference
    # Check if brand preference rating is a valid rating type
    if brandRating in Ratings:
        popularityScore += Ratings[brandRating]  # Get the ratings score & update popScore
    else:
        print("Unrecognized rating value found in this search")
    user_content_type_preference = convert_type_preferences(content_type_preferences)
    typeRating = getTypeRating(user_content_type_preference, contentInfo)  # Get the type rating
    # Check if type rating is a valid rating
    if typeRating in Ratings:
        popularityScore += Ratings[typeRating]  # Update the popScore based on the rating score
    else:
        print("Unrecognized rating value found in this search")
    contentScore[contentInfo[3]] = popularityScore
    popularityScore = 0
    return contentScore

result = getRating(content_data)
My output with only one movie (I'm not sure how to use all the movies in the JSON):
JSON string:
content_data = """[{ "title": "Spider-Man", "brand": "Marvel",
"Rating": 98, "contentIs": "movie" }]"""
Output:
[{'Spider-Man': 128}]

To me it feels like you're making things unnecessarily complex. For example, you have three functions (convert_content_data, convert_preferences and convert_type_preferences) that all do the same thing: they take one JSON-encoded string and parse it. Turning those three functions into one would still be one function too many, because I don't think a single json.loads call is a good candidate for an entirely separate function in the first place.
You also do quite a bit of conversion from JSON-encoded strings to dictionaries, and you do it multiple times. Why not convert all your JSON once at the start of the program? Doing so lets you work with dictionaries for the rest of your tasks. Once you have a list of dictionaries, you can think of each dictionary as one "movie-object", since that is what each dictionary represents. The brand and content-type JSON strings can also be converted once at the start of the program, instead of multiple times throughout it.
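A minimal sketch of that idea, assuming content_data is the JSON string from your question; the names brand_preferences and content_type_preferences are placeholders for however you store those two strings:

import json

movies = json.loads(content_data)                    # list of movie dicts
brand_prefs = json.loads(brand_preferences)          # e.g. {"Marvel": "like", ...}
type_prefs = json.loads(content_type_preferences)    # e.g. {"movie": "like", ...}

for movie in movies:                                 # each dict is one "movie-object"
    print(movie["title"], movie["popularityScore"])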
EDIT - I've updated my example code.
First, I think you should put your movie data in a separate JSON file, so that you're not polluting your source code with a huge string literal. Let's name it movies.json:
[
    {
        "title": "Spider-Man",
        "brand": "Marvel",
        "availability": ["CA","FR","US"],
        "availableDate": "2019-11-12T05:00:00.000Z",
        "isKidsContent": true,
        "popularityScore": 98,
        "contentType": "movie"
    },
    {
        "title": "Float",
        "brand": "Pixar",
        "availability": ["US"],
        "availableDate": "2019-11-12T05:00:00.000Z",
        "isKidsContent": true,
        "popularityScore": 87,
        "contentType": "short"
    },
    {
        "title": "Avatar",
        "brand": "21st Century Fox",
        "availability": ["US","CA","FR","ES","DE"],
        "availableDate": "2019-11-12T05:00:00.000Z",
        "isKidsContent": false,
        "popularityScore": 99,
        "contentType": "movie"
    },
    {
        "title": "Chapter 1: The Mandalorian",
        "brand": "Star Wars",
        "availability": ["US","CA"],
        "availableDate": "2019-11-02T23:00:00.000Z",
        "isKidsContent": false,
        "popularityScore": 92,
        "contentType": "series"
    },
    {
        "title": "Marvel Studios Avengers: Endgame",
        "brand": "Marvel",
        "availability": ["CA","FR","ES","DE","US"],
        "availableDate": "2019-11-11T23:00:00.000Z",
        "isKidsContent": false,
        "popularityScore": 87,
        "contentType": "movie"
    },
    {
        "title": "Disney Mickey Mouse Clubhouse: Mickey Goes Fishing",
        "brand": "Disney",
        "availability": ["US"],
        "availableDate": "2019-09-11T22:00:00.000Z",
        "isKidsContent": true,
        "popularityScore": 75,
        "contentType": "series"
    },
    {
        "title": "Disney High School Musical: The Musical: The Series: Act Two",
        "brand": "Disney",
        "availability": ["US","FR","ES"],
        "availableDate": "2020-01-10T08:00:00.000Z",
        "isKidsContent": false,
        "popularityScore": 97,
        "contentType": "series"
    }
]
Then, I would also create a JSON file for all of your users. This is where you would store the user preferences. Let's name it users.json:
[
    {
        "name": "Bob",
        "preferences": {
            "brand": {
                "Star Wars": "love",
                "Disney": "like",
                "Marvel": "dislike",
                "Pixar": "dislike"
            },
            "contentType": {
                "movie": "like",
                "series": "like",
                "short": "dislike"
            }
        }
    },
    {
        "name": "Joe",
        "preferences": {
            "brand": {
                "Star Wars": "dislike",
                "Disney": "dislike",
                "Marvel": "dislike",
                "Pixar": "dislike"
            },
            "contentType": {
                "movie": "like",
                "series": "like",
                "short": "dislike"
            }
        }
    }
]
This users.json file has two users named Bob and Joe, with different preferences.
Then, the code:
def evaluate_score(user, movie):
    """
    Evaluates and returns the score a user would assign to
    a given movie based on the user's brand- and content-type preferences.
    """
    ratings = {
        "dislike": -20,
        "indifferent": 0,
        "like": 10,
        "adore": 30,
        "love": 50
    }
    brand_score = ratings.get(user["preferences"]["brand"].get(movie["brand"])) or 0
    content_type_score = ratings.get(user["preferences"]["contentType"].get(movie["contentType"])) or 0
    return movie["popularityScore"] + brand_score + content_type_score

def get_all_scores(user, movies):
    for movie in movies:
        yield {
            "title": movie["title"],
            "score": evaluate_score(user, movie)
        }

def main():
    import json
    from operator import itemgetter

    with open("movies.json", "r") as file:
        movies = json.load(file)

    with open("users.json", "r") as file:
        users = json.load(file)

    for user in users:
        print(user["name"].center(16, "-"))
        for movie in sorted(get_all_scores(user, movies), key=itemgetter("score"), reverse=True):
            print("{}: {}".format(movie["title"], movie["score"]))
        print()

    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
------Bob-------
Chapter 1: The Mandalorian: 152
Disney High School Musical: The Musical: The Series: Act Two: 117
Avatar: 109
Disney Mickey Mouse Clubhouse: Mickey Goes Fishing: 95
Spider-Man: 88
Marvel Studios Avengers: Endgame: 77
Float: 47
------Joe-------
Avatar: 109
Spider-Man: 88
Disney High School Musical: The Musical: The Series: Act Two: 87
Chapter 1: The Mandalorian: 82
Marvel Studios Avengers: Endgame: 77
Disney Mickey Mouse Clubhouse: Mickey Goes Fishing: 65
Float: 47
We've got two functions and one generator:
evaluate_score (which I called get_movie_score before) takes a user dictionary and a movie dictionary, and returns the score (an integer) which this user would assign to the given movie, based on that user's preferences.
get_all_scores is a generator that takes a user and a list of movie dictionaries. It gets the scores for all movies according to that user, and yields dictionaries, where each dictionary contains two key-value pairs: the movie title, and the final score assigned by that user. This generator will be useful later in the main function, when we want to print the final result in descending order.
main is the main entry point of the entire script. It first opens and parses our two JSON files, and then, for every user, prints a sorted summary (in descending order based on score) of that user's scores for all movies.
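One detail worth spelling out: the ratings.get(...) or 0 chain in evaluate_score means an unknown brand, content type or rating word simply contributes nothing to the score instead of raising a KeyError. A tiny standalone illustration of that fallback (not part of the code above):

ratings = {"dislike": -20, "like": 10, "love": 50}
brand_preferences = {"Marvel": "like"}

print(ratings.get(brand_preferences.get("Marvel")) or 0)  # 10
print(ratings.get(brand_preferences.get("Pixar")) or 0)   # Pixar not rated -> None -> 0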

Related

Python: Iterate JSON and remove items with specific criteria

I am trying to filter out data from an API JSON response with Python and I get weird results. I would be glad if somebody could guide me on how to deal with this situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snip of the JSON file:
{
    "result": [
        {
            "number": "Number1",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-30 11:51:24",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "John Doe",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number2",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-10 11:07:13",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Tyrell Greenley",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number3",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-20 10:23:35",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Delmar Vachon",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number4",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-30 11:51:24",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Samual Isham",
                "link": "https://some_link.com"
            }
        }
    ]
}
Here is the Python code:
import json

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']

# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)

# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data

cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is more or less related to the .remove() method; however, I can't find any other suitable solution to delete the particular items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about people you are removing you could simply try
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked for by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while at the same time iterating through it. Check out this blog post which describes this behavior.
A safer way to do this is to make a copy of your data to iterate over, and to delete from the original list:
import copy

def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in copy.deepcopy(data):  # deepcopy handles nested dicts
        # Still call data.remove() on the original list in here
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem)
    return data

How to extract objects from nested lists from a Json file with Python?

I have a response that I receive from Lobbyview in the form of JSON. I tried to put it in a data frame to access only some variables, but with no success. How can I access only some variables, such as the id and the committees, in a format exportable to .dta? Here is the code I have tried.
import requests, json
query = {"naics": "424430"}
results = requests.post('https://www.lobbyview.org/public/api/reports',
                        data=json.dumps(query))
print(results.json())
import pandas as pd
b = pd.DataFrame(results.json())
_id = data["_id"]
committee = data["_source"]["specific_issues"][0]["bills_by_algo"][0]["committees"]
An observation of the json looks like this:
{
    "_score": 4.421936,
    "_type": "object",
    "_id": "5EZUMbQp3hGKH8Uq2Vxuke",
    "_source": {
        "issue_codes": ["CPT"],
        "received": 1214320148,
        "client_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
        "amount": 240000,
        "client": {
            "legal_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
            "name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
            "naics": null,
            "gvkey": null,
            "ticker": "Unlisted",
            "id": null,
            "bvdid": "US131283992L"
        },
        "specific_issues": [
            {
                "text": "H.R. 34, H.R. 1908, H.R. 2336, H.R. 3093 S. 522, S. 681, S. 1145, S. 1745",
                "bills_by_algo": [
                    {
                        "titles": ["To amend title 35, United States Code, to provide for patent reform.", "Patent Reform Act of 2007", "Patent Reform Act of 2007", "Patent Reform Act of 2007"],
                        "top_terms": ["Commerce", "Administrative fees"],
                        "sponsor": {
                            "firstname": "Howard",
                            "district": 28,
                            "title": "rep",
                            "id": 400025
                        },
                        "committees": ["House Judiciary"],
                        "introduced": 1176868800,
                        "type": "HR",
                        "id": "110_HR1908"
                    },
                    {
                        "titles": ["To amend title 35, United States Code, relating to the funding of the United States Patent and Trademark Office."],
                        "top_terms": ["Commerce", "Administrative fees"],
                        "sponsor": {
                            "firstname": "Howard",
                            "district": 28,
                            "title": "rep",
                            "id": 400025
                        },
                        "committees": ["House Judiciary"],
                        "introduced": 1179288000,
                        "type": "HR",
                        "id": "110_HR2336"
                    }
                ],
                "gov_entities": ["U.S. House of Representatives", "Patent and Trademark Office (USPTO)", "U.S. Senate", "UNDETERMINED", "U.S. Trade Representative (USTR)"],
                "lobbyists": ["Valente, Thomas Silvio", "Wamsley, Herbert C"],
                "year": 2007,
                "issue": "CPT",
                "id": "S4nijtRn9Q5NACAmbqFjvZ"
            }
        ],
        "year": 2007,
        "is_latest_amendment": true,
        "type": "MID-YEAR AMENDMENT",
        "id": "1466CDCD-BA3D-41CE-B7A1-F9566573611A",
        "alternate_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION"
    },
    "_index": "collapsed"
}
Since the data that you specified is nested pretty deeply in the JSON response, you have to loop through it and save the parts you need to a list temporarily. To understand the response data better, I would advise you to use some tool to look into the JSON structure, like this online JSON-Viewer. Not every entry in the JSON contains the necessary data, therefore I try to catch the error with a try/except. To make sure that the id and committees are matched correctly, I chose to add them as small dicts to the list. This list can then be read into Pandas with ease. Saving to .dta requires you to convert the lists inside the committees column to strings; alternatively, you might want to save as .csv for a more generally usable format.
import requests, json
import pandas as pd

query = {"naics": "424430"}
results = requests.post(
    "https://www.lobbyview.org/public/api/reports", data=json.dumps(query)
)
json_response = results.json()["result"]

# to save the JSON response
# with open("data.json", "w") as outfile:
#     json.dump(results.json()["result"], outfile)

resulting_data = []
# loop through the response
for data in json_response:
    # try to find entries with specific issues, bills_by_algo and committees
    try:
        # loop through the special issues
        for special_issue in data["specific_issues"]:
            _id = special_issue["id"]
            # loop through the bills_by_algo's
            for x in special_issue["bills_by_algo"]:
                # append the id and committees in a dict
                resulting_data.append({"id": _id, "committees": x["committees"]})
    except KeyError as e:
        print(e, "not found in entry.")
        continue

# create a DataFrame
df = pd.DataFrame(resulting_data)
# export of list objects in the column is not supported by .dta, therefore we convert
# to strings with ";" as delimiter
df["committees"] = ["; ".join(map(str, l)) for l in df["committees"]]
print(df)
df.to_stata("result.dta")
Results in
id committees
0 D8BxG5664FFb8AVc6KTphJ House Judiciary
1 D8BxG5664FFb8AVc6KTphJ Senate Judiciary
2 8XQE5wu3mU7qvVPDpUWaGP House Agriculture
3 8XQE5wu3mU7qvVPDpUWaGP Senate Agriculture, Nutrition, and Forestry
4 kzZRLAHdMK4YCUQtQAdCPY House Agriculture
.. ... ...
406 ZxXooeLGVAKec9W2i32hL5 House Agriculture
407 ZxXooeLGVAKec9W2i32hL5 Senate Agriculture, Nutrition, and Forestry; H...
408 ZxXooeLGVAKec9W2i32hL5 House Appropriations; Senate Appropriations
409 ahmmafKLfRP8wZay9o8GRf House Agriculture
410 ahmmafKLfRP8wZay9o8GRf Senate Agriculture, Nutrition, and Forestry
[411 rows x 2 columns]
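If the .dta restriction ever gets in the way, the same DataFrame can be written to CSV instead, as mentioned above (the file name result.csv is arbitrary):

# CSV has no problem with the semicolon-joined committee strings
df.to_csv("result.csv", index=False)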

How to remove empty { } from JSON file using python

I have done my research but I couldn't find any answers that worked.
I have the following JSON file:
{
    "Cars": [
        {
            "Manufacturer": "Audi",
            "model": "R8",
            "price": 50000,
            "a": {
                "n": "1",
                "street": "ABC Street",
                "city": "London",
                "postcode": "TW1 1AA"
            }
        },
        {
            "Manufacturer": "Ford",
            "model": "Fiesta",
            "price": 10000,
            "a": {
                "n": 2,
                "street": "DEF street",
                "town": "London",
                "PostCode": "TW2 2AB"
            }
        },
        {
            "Manufacturer": "VW",
            "model": "Polo",
            "price": 5000,
            "a": {
                "n": "3",
                "Street": "GHI Street",
                "town": "London",
                "postcode": "TW3 3CD"
            }
        }
    ]
}
In my python file, to remove the JSON elements, I am using the following:
deletecar = int(input("Enter price of car to delete: "))

for item in data["Cars"]:
    if deletecar == item["price"]:
        item.pop("Manufacturer")
        item.pop("model")
        item.pop("price")
        item.pop("a")
        with open("testjson.json", 'w') as f:
            json.dump(data, f)
When I run this, if I delete the first car in the JSON file, I find this:
{"Cars": [{}, {"Manufacturer": "Ford", ...
If I now run my program again, but I try to search for cars, the program won't work due to these empty braces.
So how can I remove them using Python?
Thanks in advance.
You need to remove the item itself, which means you need two steps:
- find the index at which the item you want to remove is located
- remove the item from the list (with del)
You don't need to "empty" the dict, as that's not what you're looking for (see the sketch below).
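A minimal sketch of that two-step approach, assuming data and deletecar as in your question:

deletecar = int(input("Enter price of car to delete: "))

# step 1: find the index of the first matching car
index_to_remove = None
for i, item in enumerate(data["Cars"]):
    if item["price"] == deletecar:
        index_to_remove = i
        break

# step 2: remove that item from the list
if index_to_remove is not None:
    del data["Cars"][index_to_remove]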
Alternatively, you could create a brand new list without the offending item using a list comprehension or a filter call e.g.
deletecar = int(input("Enter price of car to delete: "))

data['Cars'] = [
    item for item in data['Cars']
    if item['price'] != deletecar
]

with open("testjson.json", 'w') as f:
    json.dump(data, f)
(note: this "removes" all items which match, rather than just the first as your code does).
Also you probably want to save after you're done processing, not during processing.
Since it's a list, you can find the index values of the items that match your price input, then remove those elements from the 'Cars' list:
deletecar = int(input("Enter price of car to delete: "))

# Get the index values of where the item is located
index_to_delete = []
for item in data["Cars"]:
    if deletecar == item["price"]:
        index_to_delete.append(data["Cars"].index(item))

# Since the index values will change as you delete them,
# you will have to remove them in reverse order (in case there's more than 1
# item being removed)
for i in reversed(index_to_delete):
    del data["Cars"][i]

# write to file
with open("testjson.json", 'w') as f:
    json.dump(data, f)

Counting Items in Python from a JSON file

I'm trying to search a data file, for example Yelp.json. It has businesses in LA, Boston, and DC.
I wrote this:
# Python 2
import json

# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)

# return every unique locality along with how often it occurs
locality = []
unique_locality = []

# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])

# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
But I get an answer of "Portsmouth 1", which means it is not providing all the cities and might not even be providing all the counts. My goal for this section is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses." Each payload is a grouping of info about a single business, and "locality" is just the city. So I want it to find how many unique cities there are and then how many businesses are in each city. For example, one payload could be Starbucks in LA, another payload could be Starbucks in DC, and another could be Chipotle in LA.
Example of JSON file (JSONlite.com says it's valid):
"payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
    "latitude": "56.945972",
    "locality": "Stonehaven",
    "_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "The Lodge, Dunottar",
    "email": "dunnottarcastle#btconnect.com",
    "existence_ml": 0.5694238217658721,
    "domain_aggregate": "",
    "name": "Dunnottar Castle",
    "search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
    "admin_region": "Scotland",
    "existence": 1,
    "category_labels": [
        ["Landmarks", "Buildings and Structures"]
    ],
    "post_town": "Stonehaven",
    "region": "Kincardineshire",
    "review_count": "719",
    "geocode_level": "within_50m",
    "tel": "01569 762173",
    "placerank": 65,
    "longitude": "-2.197123",
    "placerank_ml": 37.27916073464469,
    "fax": "01330 860325",
    "category_ids_text_search": "",
    "website": "http://www.dunnottarcastle.co.uk",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "AB39 2TL",
    "category_ids": [108],
    "country": "gb",
    "_geocode_quality": "4",
    "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
    "latitude": "56.237480",
    "locality": "Inveraray",
    "_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "Cherry Park",
    "email": "enquiries#inveraray-castle.com",
    "longitude": "-5.073578",
    "domain_aggregate": "",
    "name": "Inveraray Castle",
    "admin_region": "Scotland",
    "search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
    "existence": 1,
    "category_labels": [
        ["Social", "Food and Dining", "Restaurants"]
    ],
    "region": "Argyll",
    "review_count": "532",
    "geocode_level": "within_50m",
    "tel": "01499 302203",
    "placerank": 67,
    "post_town": "Inveraray",
    "placerank_ml": 41.19978087352266,
    "fax": "01499 302421",
    "category_ids_text_search": "",
    "website": "http://www.inveraray-castle.com",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "PA32 8XE",
    "category_ids": [347],
    "country": "gb",
    "_geocode_quality": "4",
    "existence_ml": 0.7914881102847783,
    "uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},
If your "json" semantics is something like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it is a valid json string, but after you json.loads the string, it will be evaluated as
{"payload":{ CONTENT_LAST }}
And that is why you end up with one city and one business count.
You can verify this behaviour on this online json parser http://json.parser.online.fr/ by checking JS eval field.
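You can also see the behaviour directly in Python, since json.loads keeps only the last value for a repeated key:

import json

s = '{"payload": {"locality": "A"}, "payload": {"locality": "B"}}'
print(json.loads(s))  # {'payload': {'locality': 'B'}}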
In this case, one way to preprocess your JSON string is to get rid of the dummy "payload" keys and put the content dictionaries directly in a list, so that the string has the following format:
[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
Assume your JSON string is now a list of payload dictionaries, and you have loaded it with data = json.loads(json_str).
As you iterate through the payloads, build a lookup table along the way.
This handles duplicate cities for you automatically, since businesses in the same city are hashed to the same list.
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)
Then later on, you can easily present the solution by
for city, business_list in city_business_map.items():
    print city, len(business_list)
If you want to count only the unique businesses in each city, initialize the value to a set instead of a list.
If that is overkill, then instead of initializing to a list or set, just associate a counter with each key.
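If a plain per-city count is all you need, collections.Counter gets you there in a couple of lines (a sketch assuming data is the parsed list of payload dicts):

from collections import Counter

city_counts = Counter(payload["locality"] for payload in data)
for city, count in city_counts.items():
    print("{}: {}".format(city, count))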

Parsing multi-dimensional JSON array to Python

I'm in over my head, trying to parse JSON for the first time and dealing with a multi-dimensional array.
{
    "secret": "[Hidden]",
    "minutes": 20,
    "link": "http:\/\/www.1.com",
    "bookmark_collection": {
        "free_link": {
            "name": "#free_link#",
            "bookmarks": [
                {
                    "name": "1",
                    "link": "http:\/\/www.1.com"
                },
                {
                    "name": "2",
                    "link": "http:\/\/2.dk"
                },
                {
                    "name": "3",
                    "link": "http:\/\/www.3.in"
                }
            ]
        },
        "boarding_pass": {
            "name": "Boarding Pass",
            "bookmarks": [
                {
                    "name": "1",
                    "link": "http:\/\/www.1.com\/"
                },
                {
                    "name": "2",
                    "link": "http:\/\/www.2.com\/"
                },
                {
                    "name": "3",
                    "link": "http:\/\/www.3.hk"
                }
            ]
        },
        "sublinks": {
            "name": "sublinks",
            "link": [
                "http:\/\/www.1.com",
                "http:\/\/www.2.com",
                "http:\/\/www.3.com"
            ]
        }
    }
}
This is divided into 3 parts: the static data on the first level (secret, minutes, link), which I need to get as separate strings.
Then I need a dictionary per "bookmark collection"; these don't have fixed names, so I need the name of each collection and the links/names of each of its bookmarks.
Then there are the separate sublinks, which are always the same, where I need all the links in a separate dictionary.
I'm reading about parsing JSON, but most of the material I find covers a simple array put into one dictionary.
Does anyone have any good techniques to do this?
After you parse the JSON, you will end up with a Python dict. So, suppose the above JSON is in a string named input_data:
import json

# This converts from JSON to a python dict
parsed_input = json.loads(input_data)

# Now, all of your static variables are referenceable as keys:
secret = parsed_input['secret']
minutes = parsed_input['minutes']
link = parsed_input['link']

# Plus, you can get your bookmark collection as:
bookmark_collection = parsed_input['bookmark_collection']

# Print a list of names of the bookmark collections...
print bookmark_collection.keys()  # Note this contains sublinks, so remove it if needed

# Get the name of the Boarding Pass bookmark:
print bookmark_collection['boarding_pass']['name']

# Print out a list of all bookmark links as:
# Boarding Pass
# * 1: http://www.1.com/
# * 2: http://www.2.com/
# ...
for bookmark_definition in bookmark_collection.values():
    # Skip sublinks...
    if bookmark_definition['name'] == 'sublinks':
        continue
    print bookmark_definition['name']
    for bookmark in bookmark_definition['bookmarks']:
        print " * %(name)s: %(link)s" % bookmark

# Get the sublink definition:
sublinks = parsed_input['bookmark_collection']['sublinks']
# .. and print them
print sublinks['name']
for link in sublinks['link']:
    print ' *', link
Hmm, doesn't json.loads do the trick?
For example, if your data is in a file,
import json
text = open('/tmp/mydata.json').read()
d = json.loads(text)
# first level fields
print d['minutes'] # or 'secret' or 'link'
# the names of each of bookmark_collections's items
print d['bookmark_collection'].keys()
# the sublinks section, as a dict
print d['bookmark_collection']['sublinks']
The output of this code (given your sample input above) is:
20
[u'sublinks', u'free_link', u'boarding_pass']
{u'link': [u'http://www.1.com', u'http://www.2.com', u'http://www.3.com'], u'name': u'sublinks'}
Which, I think, gets you what you need?
