Formatting JSON output - python

I have a JSON file with key value pair data. My JSON file looks like this.
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally",
"helpfullness": "3.3",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=111119",
"reviews": [
{
"attendance": "N/A",
"class": "CHEM 1A",
"textbook_use": "It's a must have",
"review_text": "Tests were incredibly difficult (averages in the 40s) and lectures were essentially useless. I attended both lectures every day and still was unable to grasp most concepts on the midterms. Scope out a good GSI to get help and ride the curve."
},
{
"attendance": "N/A",
"class": "CHEMISTRY1A",
"textbook_use": "Essential to passing",
"review_text": "Saykally really isn't as bad as everyone made him out to be. If you go to his lectures he spends about half the time blowing things up, but if you actually read the texts before his lectures and pay attention to what he's writing/saying, you'd do okay. He posts practice tests that were representative of actual tests and curves the class nicely!"
}]
},
{
"first_name": "Laura",
"last_name": "Stoker",
"helpfullness": "4.1",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=536606",
"reviews": [
{
"attendance": "N/A",
"class": "PS3",
"textbook_use": "You need it sometimes",
"review_text": "Stoker is by far the best professor. If you put in the effort, take good notes, and ask questions, you will be fine in the class. As far as her lecture, she does go a bit fast, but her lecture is in the form of an outline. As long as you take good notes, you will have everything you need for exams. She is funny and super nice if you speak with her"
},
{
"attendance": "Mandatory",
"class": "164A",
"textbook_use": "Barely cracked it open",
"review_text": "AMAZING professor. She has a good way of keeping lectures interesting. Yes, she can be a little everywhere and really quick with her lecture, but the GSI's are useful to make sure you understand the material. Oh, and did I mention she's hilarious!"
}]
}]
}
I'm trying to do several things.
First, I want to find the most-mentioned 'class' value under reviews, and get both the class name and the number of times it was mentioned.
Then I'd like to write the output in the format below. The professors array should hold only the professors who teach that class; for instance, for CHEM 1A and CHEMISTRY1A it's Richard Saykally.
{
courses:[
{
"course_name" : # class name
"course_mentioned_times" : # The amount of times the class was mentioned
professors:[ #The professor array should have professor that teaches this class which is in my shown json file
{
'first_name' : 'professor name'
'last_name' : 'professor last name'
}
}
I'd also like the output sorted by mention count, from maximum to minimum. So far, all I've been able to figure out is:
import json
if __name__ == "__main__":
    open_json = open('result.json')
    load_as_json = json.load(open_json)['professors']
    outer_arr = []
    outer_dict = {}
    for items in load_as_json:
        output_dictionary = {}
        all_classes = items['reviews']
        for classes in all_classes:
            arr_info = []
            output_dictionary['class'] = classes['class']
            output_dictionary['first_name'] = items['first_name']
            output_dictionary['last_name'] = items['last_name']
            #output_dictionary['department'] = items['department']
            output_dictionary['reviews'] = classes['review_text']
            with open('output_info.json','wb') as outfile:
                json.dump(output_dictionary,outfile,indent=4)

I think this program does what you want:
import json
with open('result.json') as open_json:
    load_as_json = json.load(open_json)
# Build one entry per distinct class name, counting mentions as we go
courses = {}
for professor in load_as_json['professors']:
    for review in professor['reviews']:
        course = courses.setdefault(review['class'], {})
        course.setdefault('course_name', review['class'])
        course.setdefault('course_mentioned_times', 0)
        course['course_mentioned_times'] += 1
        course.setdefault('professors', [])
        prof_name = {
            'first_name': professor['first_name'],
            'last_name': professor['last_name'],
        }
        if prof_name not in course['professors']:
            course['professors'].append(prof_name)
# Sort the entries from most to least mentioned
courses = {
    'courses': sorted(courses.values(),
                      key=lambda x: x['course_mentioned_times'],
                      reverse=True)
}
with open('output_info.json', 'w') as outfile:
    json.dump(courses, outfile, indent=4)
Result, using the example input in the question:
{
"courses": [
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "PS3",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "164A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEM 1A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEMISTRY1A",
"course_mentioned_times": 1
}
]
}
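As an alternative sketch of the same counting-and-sorting idea, collections.Counter can track the mentions and most_common() can do the sort. The sample data below is hypothetical: it is trimmed from the question's file, with one extra PS3 review added so the ordering is visible, and the helper name summarize_courses is made up for illustration.

```python
from collections import Counter

# Hypothetical sample shaped like the question's result.json (review text omitted,
# one extra "PS3" review added so that one course has more than one mention).
data = {
    "professors": [
        {"first_name": "Richard", "last_name": "Saykally",
         "reviews": [{"class": "CHEM 1A"}, {"class": "CHEMISTRY1A"}]},
        {"first_name": "Laura", "last_name": "Stoker",
         "reviews": [{"class": "PS3"}, {"class": "164A"}, {"class": "PS3"}]},
    ]
}


def summarize_courses(data):
    """Count class mentions and list the professors reviewed for each class."""
    mentions = Counter()
    teachers = {}
    for professor in data["professors"]:
        name = {"first_name": professor["first_name"],
                "last_name": professor["last_name"]}
        for review in professor["reviews"]:
            course = review["class"]
            mentions[course] += 1
            profs = teachers.setdefault(course, [])
            if name not in profs:  # avoid listing the same professor twice
                profs.append(name)
    # most_common() yields (course, count) pairs from most to least mentioned
    return {"courses": [
        {"course_name": course,
         "course_mentioned_times": count,
         "professors": teachers[course]}
        for course, count in mentions.most_common()
    ]}


summary = summarize_courses(data)
print(summary["courses"][0]["course_name"])
```

Note that class names are compared as exact strings, so "CHEM 1A" and "CHEMISTRY1A" are still counted as two different courses, just as in the result above.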

Related

Python - How to retrieve element from json

Aloha,
My Python routine retrieves a JSON file from a site, checks it, downloads a second JSON file based on the first response, and eventually downloads a zip.
The first JSON file gives information about the documents.
Here's an example:
[
{
"id": "d9789918772f935b2d686f523d066a7b",
"originalName": "130010259_AC2_R44_20200101",
"type": "SUP",
"status": "document.deleted",
"legalStatus": "APPROVED",
"name": "130010259_SUP_R44_AC2",
"grid": {
"name": "R44",
"title": "GRAND EST"
},
"bbox": [
3.4212881,
47.6171589,
8.1598899,
50.1338684
],
"documentSource": "UPLOAD",
"uploadDate": "2020-06-25T14:56:27+02:00",
"updateDate": "2021-01-19T14:33:35+01:00",
"fileIdentifier": "SUP-AC2-R44-130010259-20200101",
"legalControlStatus": 101
},
{
"id": "6a9013bdde6acfa632861aeb1a02942b",
"originalName": "130010259_AC2_R44_20210101",
"type": "SUP",
"status": "document.production",
"legalStatus": "APPROVED",
"name": "130010259_SUP_R44_AC2",
"grid": {
"name": "R44",
"title": "GRAND EST"
},
"bbox": [
3.4212881,
47.6171589,
8.1598899,
50.1338684
],
"documentSource": "UPLOAD",
"uploadDate": "2021-01-18T16:37:01+01:00",
"updateDate": "2021-01-19T14:33:29+01:00",
"fileIdentifier": "SUP-AC2-R44-130010259-20210101",
"legalControlStatus": 101
},
{
"id": "efd51feaf35b12248966cb82f603e403",
"originalName": "130010259_PM2_R44_20210101",
"type": "SUP",
"status": "document.production",
"legalStatus": "APPROVED",
"name": "130010259_SUP_R44_PM2",
"grid": {
"name": "R44",
"title": "GRAND EST"
},
"bbox": [
3.6535762,
47.665021,
7.9509455,
49.907347
],
"documentSource": "UPLOAD",
"uploadDate": "2021-01-28T09:52:31+01:00",
"updateDate": "2021-01-28T18:53:34+01:00",
"fileIdentifier": "SUP-PM2-R44-130010259-20210101",
"legalControlStatus": 101
},
{
"id": "2e1b6104fdc09c84077d54fd9e74a7a7",
"originalName": "444619258_I4_R44_20210211",
"type": "SUP",
"status": "document.pre_production",
"legalStatus": "APPROVED",
"name": "444619258_SUP_R44_I4",
"grid": {
"name": "R44",
"title": "GRAND EST"
},
"bbox": [
2.8698336,
47.3373246,
8.0881368,
50.3796449
],
"documentSource": "UPLOAD",
"uploadDate": "2021-04-19T10:20:20+02:00",
"updateDate": "2021-04-19T14:46:21+02:00",
"fileIdentifier": "SUP-I4-R44-444619258-20210211",
"legalControlStatus": 100
}
]
What I'm trying to do is retrieve the "id" values from this JSON file (e.g. "id": "2e1b6104fdc09c84077d54fd9e74a7a7").
I've tried:
import json
from jsonpath_rw import jsonpath, parse
import jsonpath_rw_ext as jp
with open('C:/temp/gpu/SUP/20210419/SUPGE.json') as f:
    d = json.load(f)
data = json.dumps(d)
print("oriName: {}".format( jp.match1("$.id[*]",data) ) )
It doesn't work. In fact, I'm not sure how jsonpath-rw is intended to work. Thankfully there was this blog post, but I'm still stuck.
Does anyone have a clue?
With the id, I'll be able to download another JSON file, which contains an archiveUrl for fetching the zip file.
Thanks in advance.
import json
with open('SUPGE.json') as f:
    d = json.load(f)
# d is a list of dicts, so loop over it and read each "id"
for i in d:
    print(i.get('id'))
This prints only the ids:
d9789918772f935b2d686f523d066a7b
6a9013bdde6acfa632861aeb1a02942b
efd51feaf35b12248966cb82f603e403
2e1b6104fdc09c84077d54fd9e74a7a7
Ok.
Here's what I've done.
import json
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
# not sure it's the best way to load json from url, but it works fine
# and I could test most of code if needed.
def getResponse(url):
    jsonData = None  # stays None if the server does not answer with 200
    operUrl = urllib.request.urlopen(url)
    if(operUrl.getcode()==200):
        data = operUrl.read()
        jsonData = json.loads(data)
    else:
        print("Error received", operUrl.getcode())
    return jsonData
# Here I get the json from the url.
# In the final script that part will be a parameter,
# because I have a lot of territories to check.
d = getResponse('https://www.geoportail-urbanisme.gouv.fr/api/document?documentFamily=SUP&grid=R44&legalStatus=APPROVED')
for i in d:
    if i['status'] == 'document.production' :
        print('id of the doc in production:',i.get('id'))
        # here we use the id to fetch the whole document.
        # Same server, same API but different url
        _URL = 'https://www.geoportail-urbanisme.gouv.fr/api/document/' + i.get('id')+'/details'
        d2 = getResponse(_URL)
        print('archive',d2['archiveUrl'])
        urllib.request.urlretrieve(d2['archiveUrl'], 'c:/temp/gpu/SUP/'+d2['metadata']+'.zip' )
        # I used wget in the past and loved the progress bar.
        # Maybe I'd switch to wget because of it.
# Works fine.
Thanks for your answer. I'm delighted to see that you can do amazing things with only the json library. Just normal stuff, but amazing.
Feel free to comment if you think I've missed something.
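The filtering step in the script above can also be tried without any network access. The following is a minimal sketch, assuming a trimmed-down copy of the document list from the question; the helper names production_ids and details_url are hypothetical, and the URL pattern is the one used in the script.

```python
# Hypothetical, trimmed-down copy of the document list shown in the question.
documents = [
    {"id": "d9789918772f935b2d686f523d066a7b", "status": "document.deleted"},
    {"id": "6a9013bdde6acfa632861aeb1a02942b", "status": "document.production"},
    {"id": "efd51feaf35b12248966cb82f603e403", "status": "document.production"},
    {"id": "2e1b6104fdc09c84077d54fd9e74a7a7", "status": "document.pre_production"},
]

# Base URL taken from the script above.
BASE_URL = "https://www.geoportail-urbanisme.gouv.fr/api/document/"


def production_ids(docs):
    """Return the ids of the documents whose status is 'document.production'."""
    return [doc["id"] for doc in docs if doc.get("status") == "document.production"]


def details_url(doc_id):
    """Build the details URL for one document id (no request is made here)."""
    return BASE_URL + doc_id + "/details"


ids = production_ids(documents)
urls = [details_url(doc_id) for doc_id in ids]
for url in urls:
    print(url)
```

Keeping the filtering and URL building in small functions makes them easy to check before the download step is added back.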

Why does updating a dictionary remove the rest of the dictionaries from my nested array?

I have a json file with players structured as so
[
{
"Player_Name": "Rory McIlroy",
"Tournament": [
{
"Name": "Arnold Palmer Invitational presented by Mastercard",
"Points": "68.10",
"Salary": "12200.00"
},
{
"Name": "World Golf Championships-Mexico Championship",
"Points": "103.30",
"Salary": "12200.00"
},
{
"Name": "The Genesis Invitational",
"Points": "88.60",
"Salary": "12200.00"
},
{
"Name": "Farmers Insurance Open",
"Points": "107.30",
"Salary": "12200.00"
},
{
"Name": "World Golf Championships-HSBC Champions",
"Points": "138.70",
"Salary": "12400.00"
},
{
"Name": "The ZOZO Championship",
"Points": "103.40",
"Salary": "12300.00"
}
]
}]
When I run this code
import json
import numpy as np
import pandas as pd
from itertools import groupby
# using json open the player objects file and set it equal to data
with open('Active_PGA_Player_Objects.json') as json_file:
    data = json.load(json_file)
with open('Players_DK.json') as json_file:
    Players_DK = json.load(json_file)
results = []
for k,g in groupby(sorted(data, key=lambda x:x['Player_Name']), lambda x:x['Player_Name']):
    results.append({'Player_Name':k, 'Tournament':[i['Tournament'][0] for i in g]})
for obj in results:
    for x in Players_DK:
        if obj['Player_Name'] == x['Name']:
            obj['Average'] = x['AvgPointsPerGame']
i = 0
points_results = []
while i < len(results):
    j = 0
    while j < len(results[i]['Tournament']):
        difference = (int(float(results[i]['Tournament'][j]['Points'])) - (results[i]['Average']))
        points_results.append(round(difference,2))
        j += 1
    i += 1
with open('PGA_Player_Objects_w_Average.json', 'w') as my_file:
    json.dump(results, my_file)
My list comes back like this:
[{
"Player_Name": "Rory McIlroy",
"Tournament": [
{
"Name": "Arnold Palmer Invitational presented by Mastercard",
"Points": "68.10",
"Salary": "12200.00"
}
],
"Average": 96.19
}]
Can someone explain to me why when I update the specific dictionary it deletes all but the first value from the nested Tournament list? My goal here is to add each players average to their corresponding dictionary so that I can take each average and subtract it from each score. When I try to do this though I'm only able to perform it on the one value left in the list.
For what it's worth, I'd go back and think carefully about what each line is actually doing. You're also making things harder for yourself by naming variables obj or x. Calculating the average can be done like this:
for player in data: # data is poorly named, try players or players_data
    player['Average'] = sum(float(tourny['Points']) for tourny in player['Tournament']) / len(player['Tournament'])
    for tourny in player['Tournament']:
        tourny['Difference'] = float(tourny['Points']) - float(player['Average'])
leaving you with:
{'Player_Name': 'Rory McIlroy',
'Tournament': [{
'Name': 'Arnold Palmer Invitational presented by Mastercard',
'Points': '68.10',
'Salary': '12200.00',
'Difference': -33.46666666666667},
{
'Name': 'World Golf Championships-Mexico Championship',
'Points': '103.30',
'Salary': '12200.00',
'Difference': 1.7333333333333343}, # .....etc
],
'Average': 101.566666666666666
}
When you use names in your code that describe what they're representing, a huge number of optimizations become immediately obvious. Give it a go!
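As for why only one tournament survives in the question's output: the comprehension [i['Tournament'][0] for i in g] takes element [0] of each record's Tournament list, so a player who appears in a single record ends up with a single tournament. The sketch below reproduces this on a hypothetical two-tournament sample (trimmed from the question's data) and shows one way to keep every tournament by flattening the whole list instead; the helper by_name is made up for illustration.

```python
from itertools import groupby

# Hypothetical two-tournament sample in the question's format.
data = [
    {"Player_Name": "Rory McIlroy",
     "Tournament": [
         {"Name": "The Genesis Invitational", "Points": "88.60"},
         {"Name": "Farmers Insurance Open", "Points": "107.30"},
     ]},
]


def by_name(record):
    """Sort/group key: the player's name."""
    return record["Player_Name"]


# As in the question: only element [0] of each record's list is kept.
first_only = [
    {"Player_Name": name,
     "Tournament": [rec["Tournament"][0] for rec in group]}
    for name, group in groupby(sorted(data, key=by_name), key=by_name)
]

# Keeping every tournament: flatten each record's whole list instead.
all_kept = [
    {"Player_Name": name,
     "Tournament": [t for rec in group for t in rec["Tournament"]]}
    for name, group in groupby(sorted(data, key=by_name), key=by_name)
]

print(len(first_only[0]["Tournament"]))
print(len(all_kept[0]["Tournament"]))
```

If each player appears in exactly one record, as in the sample file, the groupby step can be dropped entirely and the loaded list used directly.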

Python: Iterate JSON and remove items with specific criteria

I am trying to filter data out of an API JSON response with Python, and I get weird results. I would be glad if somebody could guide me on how to deal with the situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snip of the JSON file:
{
"result": [
{
"number": "Number1",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "John Doe",
"link": "https://some_link.com"
}
},
{
"number": "Number2",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-10 11:07:13",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Tyrell Greenley",
"link": "https://some_link.com"
}
},
{
"number": "Number3",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-20 10:23:35",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Delmar Vachon",
"link": "https://some_link.com"
}
},
{
"number": "Number4",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Samual Isham",
"link": "https://some_link.com"
}
}
]
}
Here is the Python code:
import json
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)
# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data
cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is related to the .remove() method; however, I can't find any other suitable way to delete the particular items that I don't need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about the people you are removing, you could simply try:
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while iterating over it at the same time. Check out this blog post, which describes this behavior.
A safer way to do this is to iterate over a copy of the list and delete from the original:
import copy
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in copy.deepcopy(data): # deepcopy handles nested dicts
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem) # remove from the original list, not from the copy
    return data
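To make the skipping concrete, here is a small sketch that reproduces the question's behavior on hypothetical records containing only the assigned_to field, and then filters by building a new list instead. The helper names make_records and name_of are made up for illustration.

```python
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido',
              'Rosendo Dewey', 'Samual Isham']


def make_records():
    """Hypothetical records: only the field used for filtering is included."""
    names = ['John Doe', 'Tyrell Greenley', 'Delmar Vachon', 'Samual Isham']
    return [{'assigned_to': {'display_value': n}} for n in names]


def name_of(record):
    return record['assigned_to']['display_value']


# Removing while iterating: after each removal the remaining items shift left,
# so the loop's internal index steps over the item that moved into the gap.
records = make_records()
visited = []
for elem in records:
    visited.append(name_of(elem))
    if name_of(elem) not in users_test:
        records.remove(elem)

# Building a new list never mutates the list being iterated.
kept = [r for r in make_records() if name_of(r) in users_test]

print(visited)
print([name_of(r) for r in records])
print([name_of(r) for r in kept])
```

The loop visits only two of the four records, which matches the output in the question, and it leaves one unwanted record behind; the list comprehension keeps exactly the records whose name is in users_test.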

flask-ask slot is always being mapped to None

My slot for a custom intent is always being recognised as None.
I have an intents schema which looks like this:
{
"interactionModel": {
"languageModel": {
"invocationName": "name_of_app",
"intents": [
{
"name": "AMAZON.CancelIntent",
"samples": []
},
{
"name": "AMAZON.HelpIntent",
"samples": []
},
{
"name": "AMAZON.StopIntent",
"samples": []
},
{
"name": "EventsIntent",
"slots": [
{
"name": "eventCity",
"type": "AMAZON.GB_CITY"
}
],
"samples": [
"whats on in {eventCity}",
"whats going on in {eventCity} ",
"tell me what events are in {eventCity}"
]
}
],
"types": []
}
}
}
My code is in python, using the flask-ask framework. My main entrypoint looks something like this:
@ask.launch
def start_skill():
    welcome_message = 'Welcome to name_of_app, what do you want?'
    return question(welcome_message)
@ask.intent('EventsIntent', mapping={'city': 'eventCity'})
def weather(city):
    return statement('you have selected {}'.format(city))
Even this simple example, however, does not work when the input is:
"whats on in London?"
I have tried lowercase input, with and without punctuation, in the testing panel of the Amazon developer console, but the response is always:
"you have selected None"
indicating that None is passed as 'eventCity'. Am I passing this slot incorrectly in either the intent schema or the code?
Thanks
Hey buddy, here is the solution to your problem:
@ask.launch
def start_skill():
    welcome_message = 'Welcome to name_of_app, what do you want?'
    return question(welcome_message)
@ask.intent('EventsIntent', convert={'eventCity': str})
def weather(eventCity):
    return statement('you have selected {}'.format(eventCity))
I was facing a similar issue, and this resolved it.

Grab element from json dump

I'm using the following Python code to connect to a JSON-RPC server and nick some song information. However, I can't work out how to get the current title into a variable to print elsewhere. Here is the code:
TracksInfo = []
for song in playingSongs:
    data = { "id":1,
             "method":"slim.request",
             "params":[ "",
                        ["songinfo",0,100, "track_id:%s" % song, "tags:GPASIediqtymkovrfijnCYXRTIuwxN"]
                      ]
           }
    params = json.dumps(data, sort_keys=True, indent=4)
    conn.request("POST", "/jsonrpc.js", params)
    httpResponse = conn.getresponse()
    data = httpResponse.read()
    responce = json.loads(data)
    print(json.dumps(responce, sort_keys=True, indent=4))
    TrackInfo = responce['result']["songinfo_loop"][0]
    TracksInfo.append(TrackInfo)
This brings me back the data in JSON format, and the json.dumps print shows:
pi@raspberrypi ~/pithon $ sudo python tom3.py
{
"id": 1,
"method": "slim.request",
"params": [
"",
[
"songinfo",
"0",
100,
"track_id:-140501481178464",
"tags:GPASIediqtymkovrfijnCYXRTIuwxN"
]
],
"result": {
"songinfo_loop": [
{
"id": "-140501481178464"
},
{
"title": "Witchcraft"
},
{
"artist": "Pendulum"
},
{
"duration": "253"
},
{
"tracknum": "1"
},
{
"type": "Ogg Vorbis (Spotify)"
},
{
"bitrate": "320k VBR"
},
{
"coverart": "0"
},
{
"url": "spotify:track:2A7ZZ1tjaluKYMlT3ItSfN"
},
{
"remote": 1
}
]
}
}
What I'm trying to get is result.songinfoloop.title (but I tried that!)
The songinfo_loop structure is.. peculiar. It is a list of dictionaries each with just one key.
Loop through it until you have one with a title:
TrackInfo = next(d['title'] for d in responce['result']["songinfo_loop"] if 'title' in d)
TracksInfo.append(TrackInfo)
A better option would be to 'collapse' all those dictionaries into one:
from functools import reduce  # built in on Python 2, must be imported on Python 3
songinfo = reduce(lambda d, p: d.update(p) or d,
                  responce['result']["songinfo_loop"], {})
TracksInfo.append(songinfo['title'])
songinfo_loop is a list, not a dict. That means you need to index it by position, or loop through it and find the dict that has a "title" key.
positional:
responce["result"]["songinfo_loop"][1]["title"]
loop:
for info in responce["result"]["songinfo_loop"]:
    if "title" in info:
        print(info["title"])
        break
else:
    # the else branch of a for loop runs only if the loop never hit break
    print("no song title found")
Really, it seems like you would want to have the songinfo_loop be a dict, not a list. But if you need to leave it as a list, this is how you would pull the title.
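The result is really a standard Python dict, but songinfo_loop inside it is a list of dicts, so you have to index into the list before looking up the key:
responce["result"]["songinfo_loop"][1]["title"]
which should work as long as the title stays in the second position.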
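The "collapse" idea from the answers can also be written as a plain loop, which some readers may find easier to follow than reduce. This is a minimal sketch, assuming a response trimmed from the output shown in the question; the helper name collapse is hypothetical.

```python
# Hypothetical response, trimmed from the output shown in the question.
responce = {
    "result": {
        "songinfo_loop": [
            {"id": "-140501481178464"},
            {"title": "Witchcraft"},
            {"artist": "Pendulum"},
            {"duration": "253"},
        ]
    }
}


def collapse(pairs):
    """Merge a list of single-key dicts into one dict (later keys win)."""
    merged = {}
    for pair in pairs:
        merged.update(pair)
    return merged


songinfo = collapse(responce["result"]["songinfo_loop"])
# .get() avoids a KeyError when the server sends no title at all
title = songinfo.get("title", "no song title found")
print(title)
```

Once the list is merged, every tag is available by name (songinfo["artist"], songinfo["duration"], and so on) regardless of the order in which the server returns them.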
