Python: Iterate JSON and remove items with specific criteria - python

I am trying to filter out data from API JSON response with Python and I get weird results. I would be glad if somebody can guide me how to deal with the situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snip of the JSON file:
{
"result": [
{
"number": "Number1",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "John Doe",
"link": "https://some_link.com"
}
},
{
"number": "Number2",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-10 11:07:13",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Tyrell Greenley",
"link": "https://some_link.com"
}
},
{
"number": "Number3",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-20 10:23:35",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Delmar Vachon",
"link": "https://some_link.com"
}
},
{
"number": "Number4",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Samual Isham",
"link": "https://some_link.com"
}
}
]
}
Here is the Python code:
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
# Load JSON file
with open('extract.json', 'r') as input_file:
input_data = json.load(input_file)
# Create a function to clear the data
def clear_data(data, users):
"""Filter out the data and leave only records for the names in the users_test list"""
for elem in data:
print(elem['assigned_to']['display_value'] not in users)
if elem['assigned_to']['display_value'] not in users:
print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
data.remove(elem)
else:
print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
return data
cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is more or less related to the .remove() method however I don't find any other suitable solution to delete these particular items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.

If you don't need to log info about people you are removing you could simply try
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
print(user)
for value in data['result']:
if user == value['assigned_to']['display_value']:
solution.append(value)
print(solution)
for more efficient code, as asked by #NomadMonad
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))

You are modifying a dictionary while at the same time iterating through it. Check out this blog post which describes this behavior.
A safer way to do this is to make a copy of your dictionary to iterate over, and to delete from your original dictionary:
import copy
def clear_data(data, users):
"""Filter out the data and leave only records for the names in the users_test list"""
for elem in copy.deepcopy(data): # deepcopy handles nested dicts
# Still call data.remove() in here

Related

Improperly formatted json? [duplicate]

This question already has answers here:
Python list of dictionaries search
(24 answers)
Closed last month.
First, I am new to Python and working with JSON.
I am trying to extract just one value from an API request response, and I am having a difficult time parsing out the data I need.
I have done a lot of searching on how to do this, but most all the examples use a string or file that is formatted is much more basic than what I am getting.
I understand the key - value pair concept but I am unsure how to reference the key-value I want. I think it has something to do with the response having multiple objects having the same kay names. Or maybe the first line "Bookmark" is making things goofy.
The value I want is for the model name in the response example below.
That's all I need from this. Any help would be greatly appreciated.
{
"Bookmark": "<B><P><p>SerNum</p><p>Item</p></P><D><f>false</f><f>false</f></D><F><v>1101666</v><v>ADDMASTER IJ7102-23E</v></F><L><v>123456</v><v>Model Name</v></L></B>",
"Items": [
[
{
"Name": "SerNum",
"Value": "123456"
},
{
"Name": "Item",
"Value": "Model Name"
},
{
"Name": "_ItemId",
"Value": "PBT=[unit] unt.DT=[2021-07-28 08:20:33.513] unt.ID=[eae2621d-3e9f-4515-9763-55e67f65fae6]"
}
]
],
"Message": "Success",
"MessageCode": 0
}
If you want to find value of dictionary with key 'Name' and value 'Item' you can do:
import json
with open('your_data.json', 'r') as f_in:
data = json.load(f_in)
model_name = next((i['Value'] for lst in data['Items'] for i in lst if i['Name'] == 'Item'), 'Model name not found.')
print(model_name)
Prints:
Model Name
Note: if the dictionary is not found string 'Model name not found.' is returned
First, load the JSON into a python dict:
import json
x = '''{
"Bookmark": "<B><P><p>SerNum</p><p>Item</p></P><D><f>false</f><f>false</f></D><F><v>1101666</v><v>ADDMASTER IJ7102-23E</v></F><L><v>123456</v><v>Model Name</v></L></B>",
"Items": [
[
{
"Name": "SerNum",
"Value": "123456"
},
{
"Name": "Item",
"Value": "Model Name"
},
{
"Name": "_ItemId",
"Value": "PBT=[unit] unt.DT=[2021-07-28 08:20:33.513] unt.ID=[eae2621d-3e9f-4515-9763-55e67f65fae6]"
}
]
],
"Message": "Success",
"MessageCode": 0
}'''
# parse x:
y = json.loads(x)
# The result is a Python dictionary.
Now if you want the value 'Model Name', you would do:
print(y['Items'][0][1]['Value'])

How to parse nested JSON object?

I am working on a new project in HubSpot that returns nested JSON like the sample below. I am trying to access the associated contacts id, but am struggling to reference it correctly (the id I am looking for is the value '201' in the example below). I've put together this script, but this script only returns the entire associations portion of the JSON and I only want the id. How do I reference the id correctly?
Here is the output from the script:
{'contacts': {'paging': None, 'results': [{'id': '201', 'type': 'ticket_to_contact'}]}}
And here is the script I put together:
import hubspot
from pprint import pprint
client = hubspot.Client.create(api_key="API_KEY")
try:
api_response = client.crm.tickets.basic_api.get_page(limit=2, associations=["contacts"], archived=False)
for x in range(2):
pprint(api_response.results[x].associations)
except ApiException as e:
print("Exception when calling basic_api->get_page: %s\n" % e)
Here is what the full JSON looks like ('contacts' property shortened for readability):
{
"results": [
{
"id": "34018123",
"properties": {
"content": "Hi xxxxx,\r\n\r\nCan you clarify on how the blocking of script happens? Is it because of any CSP (or) the script will decide run time for every URL’s getting triggered from browser?\r\n\r\nRegards,\r\nLogan",
"createdate": "2019-07-03T04:20:12.366Z",
"hs_lastmodifieddate": "2020-12-09T01:16:12.974Z",
"hs_object_id": "34018123",
"hs_pipeline": "0",
"hs_pipeline_stage": "4",
"hs_ticket_category": null,
"hs_ticket_priority": null,
"subject": "RE: call followup"
},
"createdAt": "2019-07-03T04:20:12.366Z",
"updatedAt": "2020-12-09T01:16:12.974Z",
"archived": false
},
{
"id": "34018892",
"properties": {
"content": "Hi Guys,\r\n\r\nI see that we were placed back on the staging and then removed again.",
"createdate": "2019-07-03T07:59:10.606Z",
"hs_lastmodifieddate": "2021-12-17T09:04:46.316Z",
"hs_object_id": "34018892",
"hs_pipeline": "0",
"hs_pipeline_stage": "3",
"hs_ticket_category": null,
"hs_ticket_priority": null,
"subject": "Re: Issue due to server"
},
"createdAt": "2019-07-03T07:59:10.606Z",
"updatedAt": "2021-12-17T09:04:46.316Z",
"archived": false,
"associations": {
"contacts": {
"results": [
{
"id": "201",
"type": "ticket_to_contact"
}
]
}
}
}
],
"paging": {
"next": {
"after": "35406270",
"link": "https://api.hubapi.com/crm/v3/objects/tickets?associations=contacts&archived=false&hs_static_app=developer-docs-ui&limit=2&after=35406270&hs_static_app_version=1.3488"
}
}
}
You can do api_response.results[x].associations["contacts"]["results"][0]["id"].
Sorted this out, posting in case anyone else is struggling with the response from the HubSpot v3 Api. The response schema for this call is:
Response schema type: Object
String results[].id
Object results[].properties
String results[].createdAt
String results[].updatedAt
Boolean results[].archived
String results[].archivedAt
Object results[].associations
Object paging
Object paging.next
String paging.next.after
String paging.next.linkResponse schema type: Object
String results[].id
Object results[].properties
String results[].createdAt
String results[].updatedAt
Boolean results[].archived
String results[].archivedAt
Object results[].associations
Object paging
Object paging.next
String paging.next.after
String paging.next.link
So to access the id of the contact associated with the ticket, you need to reference it using this notation:
api_response.results[1].associations["contacts"].results[0].id
notes:
results[x] - reference the result in the index
associations["contacts"] -
associations is a dictionary object, you can access the contacts item
by it's name
associations["contacts"].results is a list - reference
by the index []
id - is a string
In my case type was ModelProperty or CollectionResponseProperty couldn't reach dict anyhow.
For the record this got me to go through the results.
for result in list(api_response.results):
ID = result.id

Convert multiple string stored in a variable into a single list in python

I hope everyone is doing well.
I need a little help where I need to get all the strings from a variable and need to store into a single list in python.
For example -
I have json file from where I am getting ids and all the ids are getting stored into a variable called id as below when I run print(id)
17298626-991c-e490-bae6-47079c6e2202
17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b
172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5
172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd
17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0
1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4
172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d
1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566
17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23
17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a
I want these all values to be stored into a single list.
Can some please help me out with this.
Below is the code I am using:
import json
with open('requests.json') as f:
data = json.load(f)
print(type(data))
for i in data:
if 'traceId' in i:
id = i['traceId']
newid = id.split()
#print(type(newid))
print(newid)
And below is my json file looks like:
[
{
"id": "376287298-hjd8-jfjb-khkf-6479280283e9",
"submittedTime": 1591692502558,
"traceId": "17298626-991c-e490-bae6-47079c6e2202",
"userName": "ABC",
"onlyChanged": true,
"description": "Not Required",
"startTime": 1591694487929,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "16b22a09-a840-f4d9-f42a-64fd73fece57",
"name": "XYZ"
},
"applicationProcess": {
"id": "dihihdosfj9279278yrie8ue",
"name": "Deploy",
"version": 12
},
"environment": {
"id": "fkjdshkjdshglkjdshgldshldsh03r937837",
"name": "DEV"
},
"snapshot": {
"id": "djnglkfdglki98478yhgjh48yr844h",
"name": "DEV_snapshot"
},
},
{
"id": "17298495-f060-3e9d-7097-1f86d5160789",
"submittedTime": 1591692844597,
"traceId": "17298496-19bd-2f89-7b5f-881921abc632",
"userName": "UYT,
"onlyChanged": true,
"startTime": 1591692845543,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "osfodsho883793hgjbv98r3098w",
"name": "QA"
},
"applicationProcess": {
"id": "owjfoew028r2uoieroiehojehfoef",
"name": "EDC",
"version": 5
},
"environment": {
"id": "16cf69c5-4194-e557-707d-0663afdbceba",
"name": "DTESTU"
},
}
]
From where I am trying to get the traceId.
you could use simple split method like the follwing:
ids = '''17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632 17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0 172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322 17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029 1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c 17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860 1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb 172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1 17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e 17298825-7449-2fcb-378e-13671cb4688a'''
l = ids.split(" ")
print(l)
This will give the following result, I assumed that the separator needed is simple space you can adjust properly:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632', '17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0', '172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322', '17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029', '1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c', '17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860', '1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb', '172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1', '17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e', '17298825-7449-2fcb-378e-13671cb4688a']
Edit
You get list of lists because each iteration you read only 1 id, so what you need to do is to initiate an empty list and append each id to it in the following way:
l = []
for i in data
if 'traceId' in i:
id = i['traceId']
l.append(id)
you can append the ids variable to the list such as,
#list declaration
l1=[]
#this must be in your loop
l1.append(ids)
I'm assuming you get the id as a str type value. Using id.split() will return a list of all ids in one single Python list, as each id is separated by space here in your example.
id = """17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a"""
id_list = id.split()
print(id_list)
Output:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632',
'17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0',
'172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322',
'17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029',
'1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c',
'17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860',
'1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb',
'172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1',
'17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e',
'17298825-7449-2fcb-378e-13671cb4688a']
split() splits by default with space as a separator. You can use the sep argument to use any other separator if needed.

Formatting JSON output

I have a JSON file with key value pair data. My JSON file looks like this.
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally",
"helpfullness": "3.3",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=111119",
"reviews": [
{
"attendance": "N/A",
"class": "CHEM 1A",
"textbook_use": "It's a must have",
"review_text": "Tests were incredibly difficult (averages in the 40s) and lectures were essentially useless. I attended both lectures every day and still was unable to grasp most concepts on the midterms. Scope out a good GSI to get help and ride the curve."
},
{
"attendance": "N/A",
"class": "CHEMISTRY1A",
"textbook_use": "Essential to passing",
"review_text": "Saykally really isn't as bad as everyone made him out to be. If you go to his lectures he spends about half the time blowing things up, but if you actually read the texts before his lectures and pay attention to what he's writing/saying, you'd do okay. He posts practice tests that were representative of actual tests and curves the class nicely!"
}]
{
{
"first_name": "Laura",
"last_name": "Stoker",
"helpfullness": "4.1",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=536606",
"reviews": [
{
"attendance": "N/A",
"class": "PS3",
"textbook_use": "You need it sometimes",
"review_text": "Stoker is by far the best professor. If you put in the effort, take good notes, and ask questions, you will be fine in the class. As far as her lecture, she does go a bit fast, but her lecture is in the form of an outline. As long as you take good notes, you will have everything you need for exams. She is funny and super nice if you speak with her"
},
{
"attendance": "Mandatory",
"class": "164A",
"textbook_use": "Barely cracked it open",
"review_text": "AMAZING professor. She has a good way of keeping lectures interesting. Yes, she can be a little everywhere and really quick with her lecture, but the GSI's are useful to make sure you understand the material. Oh, and did I mention she's hilarious!"
}]
}]
So I'm trying to do multiple things.
I'm trying to get the most mentioned ['class'] key under reviews. Then get the class name and the times it was mentioned.
Then I'd like to output my format in this manner. Also under professor array. It's just the info of professors for instance for CHEM 1A, CHEMISTRY1A - It's Richard Saykally.
{
courses:[
{
"course_name" : # class name
"course_mentioned_times" : # The amount of times the class was mentioned
professors:[ #The professor array should have professor that teaches this class which is in my shown json file
{
'first_name' : 'professor name'
'last_name' : 'professor last name'
}
}
So I'd like to sort my json file key-value where I have max to minimum. So far all I've been able to figure out isd
if __name__ == "__main__":
open_json = open('result.json')
load_as_json = json.load(open_json)['professors']
outer_arr = []
outer_dict = {}
for items in load_as_json:
output_dictionary = {}
all_classes = items['reviews']
for classes in all_classes:
arr_info = []
output_dictionary['class'] = classes['class']
output_dictionary['first_name'] = items['first_name']
output_dictionary['last_name'] = items['last_name']
#output_dictionary['department'] = items['department']
output_dictionary['reviews'] = classes['review_text']
with open('output_info.json','wb') as outfile:
json.dump(output_dictionary,outfile,indent=4)
I think this program does what you want:
import json
with open('result.json') as open_json:
load_as_json = json.load(open_json)
courses = {}
for professor in load_as_json['professors']:
for review in professor['reviews']:
course = courses.setdefault(review['class'], {})
course.setdefault('course_name', review['class'])
course.setdefault('course_mentioned_times', 0)
course['course_mentioned_times'] += 1
course.setdefault('professors', [])
prof_name = {
'first_name': professor['first_name'],
'last_name': professor['last_name'],
}
if prof_name not in course['professors']:
course['professors'].append(prof_name)
courses = {
'courses': sorted(courses.values(),
key=lambda x: x['course_mentioned_times'],
reverse=True)
}
with open('output_info.json', 'w') as outfile:
json.dump(courses, outfile, indent=4)
Result, using the example input in the question:
{
"courses": [
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "PS3",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "164A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEM 1A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEMISTRY1A",
"course_mentioned_times": 1
}
]
}

Parsing muilti dimensional Json array to Python

I'm in over my head, trying to parse JSON for my first time and dealing with a multi dimensional array.
{
"secret": "[Hidden]",
"minutes": 20,
"link": "http:\/\/www.1.com",
"bookmark_collection": {
"free_link": {
"name": "#free_link#",
"bookmarks": [
{
"name": "1",
"link": "http:\/\/www.1.com"
},
{
"name": "2",
"link": "http:\/\/2.dk"
},
{
"name": "3",
"link": "http:\/\/www.3.in"
}
]
},
"boarding_pass": {
"name": "Boarding Pass",
"bookmarks": [
{
"name": "1",
"link": "http:\/\/www.1.com\/"
},
{
"name": "2",
"link": "http:\/\/www.2.com\/"
},
{
"name": "3",
"link": "http:\/\/www.3.hk"
}
]
},
"sublinks": {
"name": "sublinks",
"link": [
"http:\/\/www.1.com",
"http:\/\/www.2.com",
"http:\/\/www.3.com"
]
}
}
}
This is divided into 3 parts, the static data on my first dimension (secret, minutes, link) Which i need to get as seperate strings.
Then I need a dictionary per "bookmark collection" which does not have fixed names, so I need the name of them and the links/names of each bookmark.
Then there is the seperate sublinks which is always the same, where I need all the links in a seperate dictionary.
I'm reading about parsing JSON but most of the stuff I find is a simple array put into 1 dictionary.
Does anyone have any good techniques to do this ?
After you parse the JSON, you will end up with a Python dict. So, suppose the above JSON is in a string named input_data:
import json
# This converts from JSON to a python dict
parsed_input = json.loads(input_data)
# Now, all of your static variables are referenceable as keys:
secret = parsed_input['secret']
minutes = parsed_input['minutes']
link = parsed_input['link']
# Plus, you can get your bookmark collection as:
bookmark_collection = parsed_input['bookmark_collection']
# Print a list of names of the bookmark collections...
print bookmark_collection.keys() # Note this contains sublinks, so remove it if needed
# Get the name of the Boarding Pass bookmark:
print bookmark_collection['boarding_pass']['name']
# Print out a list of all bookmark links as:
# Boarding Pass
# * 1: http://www.1.com/
# * 2: http://www.2.com/
# ...
for bookmark_definition in bookmark_collection.values():
# Skip sublinks...
if bookmark_definition['name'] == 'sublinks':
continue
print bookmark_definition['name']
for bookmark in bookmark_definition['bookmarks']:
print " * %(name)s: %(link)s" % bookmark
# Get the sublink definition:
sublinks = parsed_input['bookmark_collection']['sublinks']
# .. and print them
print sublinks['name']
for link in sublinks['link']:
print ' *', link
Hmm, doesn't json.loads do the trick?
For example, if your data is in a file,
import json
text = open('/tmp/mydata.json').read()
d = json.loads(text)
# first level fields
print d['minutes'] # or 'secret' or 'link'
# the names of each of bookmark_collections's items
print d['bookmark_collection'].keys()
# the sublinks section, as a dict
print d['bookmark_collection']['sublinks']
The output of this code (given your sample input above) is:
20
[u'sublinks', u'free_link', u'boarding_pass']
{u'link': [u'http://www.1.com', u'http://www.2.com', u'http://www.3.com'], u'name': u'sublinks'}
Which, I think, gets you what you need?

Categories