merge dicts that have the same value for specific key - python

I need to combine dictionaries that have the same value for the key "tag".
Like from this:
[
[
{
"tag": "#2C00L02RU",
"stamina": 233
},
{
"tag": "#8YG8RJV90",
"stamina": 20
},
{
"tag": "#LQV2JCPR",
"stamina": 154
},
{
"tag": "#9JQLPGLJJ",
"stamina": 134
}
],
[
{
"tag": "#2C00L02RU",
"health": 200
},
{
"tag": "#8YG8RJV90",
"health": 40
},
{
"tag": "#LQV2JCPR",
"health": 100
},
{
"tag": "#9JQLPGLJJ",
"health": 240
}
],
[
{
"tag": "#LQV2JCPR",
"fame": 1
},
{
"tag": "#8YG8RJV90",
"fame": 2
},
{
"tag": "#9JQLPGLJJ",
"fame": 3
},
{
"tag": "#2C00L02RU",
"fame": 4
}
],
[
{
"tag": "#LQV2JCPR",
"moves": 6
},
{
"tag": "#8YG8RJV90",
"moves": 0
},
{
"tag": "#9JQLPGLJJ",
"moves": 8
},
{
"tag": "#2C00L02RU",
"moves": 4
}
]
]
to this:
[
{
"tag": "#2C00L02RU",
"stamina": 233,
"health": 200,
"fame": 4,
"moves": 4
},
{
"tag": "#8YG8RJV90",
"stamina": 20,
"health": 40,
"fame": 2,
"moves": 0
},
{
"tag": "#LQV2JCPR",
"stamina": 154,
"health": 100,
"fame": 1,
"moves": 6
},
{
"tag": "#9JQLPGLJJ",
"stamina": 134,
"health": 240,
"fame": 3,
"moves": 8
}
]
I've already tried iterating through countless loops, but only got failures.
I won't show any of my attempts here because they didn't even come close to the expected result.
If you need any other information, just let me know.

If lst is the list from your question, you can do:
out = {}
for l in lst:
    for d in l:
        out.setdefault(d["tag"], {}).update(d)
print(list(out.values()))
Prints:
[
{"tag": "#2C00L02RU", "stamina": 233, "health": 200, "fame": 4, "moves": 4},
{"tag": "#8YG8RJV90", "stamina": 20, "health": 40, "fame": 2, "moves": 0},
{"tag": "#LQV2JCPR", "stamina": 154, "health": 100, "fame": 1, "moves": 6},
{"tag": "#9JQLPGLJJ", "stamina": 134, "health": 240, "fame": 3, "moves": 8},
]
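The same idea can be wrapped in a small reusable function. This is only a sketch: the name merge_by_key and the shortened sample data are made up for illustration, and it assumes every dict contains the grouping key.

```python
from collections import defaultdict

def merge_by_key(groups, key="tag"):
    """Merge dicts from several lists that share the same value for `key`."""
    merged = defaultdict(dict)
    for group in groups:
        for record in group:
            # Later records overwrite earlier ones if they share a field name.
            merged[record[key]].update(record)
    return list(merged.values())

# Shortened sample in the same shape as the list from the question.
lst = [
    [{"tag": "#2C00L02RU", "stamina": 233}, {"tag": "#8YG8RJV90", "stamina": 20}],
    [{"tag": "#8YG8RJV90", "health": 40}, {"tag": "#2C00L02RU", "health": 200}],
]
result = merge_by_key(lst)
print(result)
```

The result keeps the order in which each tag first appears, because Python dicts preserve insertion order.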

Related

How to split complex JSON file into multiple files by Python

I am currently trying to split a JSON file.
The structure of the JSON file is like this:
{
"id": 2131424,
"file": "video_2131424_1938263.mp4",
"metadata": {
"width": 3840,
"height": 2160,
"duration": 312.83,
"fps": 30,
"frames": 9385,
"created": "Sun Jan 17 17:48:52 2021"
},
"frames": [
{
"number": 207,
"image": "frame_207.jpg",
"annotations": [
{
"label": {
"x": 730,
"y": 130,
"width": 62,
"height": 152
},
"category": {
"code": "child",
"attributes": [
{
"code": "global_id",
"value": "7148"
}
]
}
},
{
"label": {
"x": 815,
"y": 81,
"width": 106,
"height": 197
},
"category": {
"code": "person",
"attributes": []
}
}
]
},
{
"number": 221,
"image": "frame_221.jpg",
"annotations": [
{
"label": {
"x": 730,
"y": 130,
"width": 64,
"height": 160
},
"category": {
"code": "child",
"attributes": [
{
"code": "global_id",
"value": "7148"
}
]
}
},
{
"label": {
"x": 819,
"y": 82,
"width": 106,
"height": 200
},
"category": {
"code": "person",
"attributes": []
}
}
]
},
{
"number": 236,
"image": "frame_236.jpg",
"annotations": [
{
"label": {
"x": 731,
"y": 135,
"width": 74,
"height": 160
},
"category": {
"code": "child",
"attributes": [
{
"code": "global_id",
"value": "7148"
}
]
}
},
{
"label": {
"x": 821,
"y": 83,
"width": 106,
"height": 206
},
"category": {
"code": "person",
"attributes": []
}
}
]
},
I have to extract [x, y, width, height] from each label.
I tried some code like this:
file = json.load(open('annotation_2131424.json'))
file['frames'][i]['annotations'][j]['label']['x']
But I cannot split the JSON.
I tried it like this, but I cannot get it to run...
I hope I've understood your question right. To get x, y, width, height from each label (dct is your dictionary from the question):
out = [
    [
        [
            a["label"]["x"],
            a["label"]["y"],
            a["label"]["width"],
            a["label"]["height"],
        ]
        for a in frame["annotations"]
    ]
    for frame in dct["frames"]
]
print(out)
Prints:
[
[[730, 130, 62, 152], [815, 81, 106, 197]],
[[730, 130, 64, 160], [819, 82, 106, 200]],
[[731, 135, 74, 160], [821, 83, 106, 206]],
]
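If the goal is really to split the data into several files, one possible approach is to write the boxes of each frame to their own JSON file. The sketch below is an assumption about what "split" means here: the function name split_frames and the frame_<number>.json naming scheme are invented for the example, and the sample dict is a trimmed-down version of the data in the question.

```python
import json
import tempfile
from pathlib import Path

def split_frames(dct, out_dir):
    """Write one JSON file per frame holding its [x, y, width, height] boxes."""
    out_dir = Path(out_dir)
    written = []
    for frame in dct["frames"]:
        boxes = [
            [a["label"][k] for k in ("x", "y", "width", "height")]
            for a in frame["annotations"]
        ]
        # Hypothetical naming scheme: frame_<number>.json
        path = out_dir / f"frame_{frame['number']}.json"
        path.write_text(json.dumps(boxes))
        written.append(path.name)
    return written

# Trimmed-down sample with the same shape as the data in the question.
dct = {
    "frames": [
        {"number": 207, "annotations": [
            {"label": {"x": 730, "y": 130, "width": 62, "height": 152}},
            {"label": {"x": 815, "y": 81, "width": 106, "height": 197}},
        ]},
        {"number": 221, "annotations": [
            {"label": {"x": 730, "y": 130, "width": 64, "height": 160}},
        ]},
    ]
}

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    names = split_frames(dct, tmp)
    first = json.loads((Path(tmp) / names[0]).read_text())
print(names, first)
```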

fetching multiple values and keys from dict

movies={
'actors':{'prabhas':{'knownAs':'Darling', 'awards':{'nandi':1, 'cinemaa':1, 'siima':1},'remuneration':100, 'hits':{'industry':2, 'super':3,'flops':8}, 'age':41, 'height':6.1, 'mStatus':'single','sRate':'35%'},
'pavan':{'knownAs':'Power Star', 'awards':{'nandi':2, 'cinemaa':2, 'siima':5}, 'hits':{'industry':2, 'super':7,'flops':16}, 'age':48, 'height':5.9, 'mStatus':'married','sRate':'37%','remuneration':50},
},
'actress':{
'tamanna':{'knownAs':'Milky Beauty', 'awards':{'nandi':0, 'cinemaa':1, 'siima':1}, 'remuneration':10, 'hits':{'industry':1, 'super':7,'flops':11}, 'age':28, 'height':5.9, 'mStatus':'single', 'sRate':'40%'},
'rashmika':{'knownAs':'Butter Milky Beauty', 'awards':{'nandi':0, 'cinemaa':0, 'siima':2}, 'remuneration':12,'hits':{'industry':0, 'super':4,'flops':2}, 'age':36, 'height':5.9, 'mStatus':'single', 'sRate':'30%'},
1. What is the total number of Nandi Awards won by actors?
2. What is the success rate of Prince?
3. What is the name of Prince?
You can answer the first question with this:
import jmespath
movies={
"actors": {
"prabhas": {
"knownAs": "Darling",
"awards": {
"nandi": 1,
"cinemaa": 1,
"siima": 1
},
"remuneration": 100,
"hits": {
"industry": 2,
"super": 3,
"flops": 8
},
"age": 41,
"height": 6.1,
"mStatus": "single",
"sRate": "35%"
},
"pavan": {
"knownAs": "Power Star",
"awards": {
"nandi": 2,
"cinemaa": 2,
"siima": 5
},
"hits": {
"industry": 2,
"super": 7,
"flops": 16
},
"age": 48,
"height": 5.9,
"mStatus": "married",
"sRate": "37%",
"remuneration": 50
}
},
"actress": {
"tamanna": {
"knownAs": "Milky Beauty",
"awards": {
"nandi": 0,
"cinemaa": 1,
"siima": 1
},
"remuneration": 10,
"hits": {
"industry": 1,
"super": 7,
"flops": 11
},
"age": 28,
"height": 5.9,
"mStatus": "single",
"sRate": "40%"
},
"rashmika": {
"knownAs": "Butter Milky Beauty",
"awards": {
"nandi": 0,
"cinemaa": 0,
"siima": 2
},
"remuneration": 12,
"hits": {
"industry": 0,
"super": 4,
"flops": 2
},
"age": 36,
"height": 5.9,
"mStatus": "single",
"sRate": "30%"
}
}
}
total_nandies_by_actors = sum(jmespath.search('[]',jmespath.search('actors.*.*.nandi',movies)))
But there is no Prince in the data you've provided.
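For comparison, the first question can probably be answered without any extra library. The sketch below uses a shortened copy of the movies dict that keeps only the "awards" entries, and it assumes every actor has an "awards" dict with a "nandi" key.

```python
# Shortened copy of the data from the question: only the fields needed here.
movies = {
    "actors": {
        "prabhas": {"awards": {"nandi": 1, "cinemaa": 1, "siima": 1}},
        "pavan": {"awards": {"nandi": 2, "cinemaa": 2, "siima": 5}},
    },
    "actress": {
        "tamanna": {"awards": {"nandi": 0, "cinemaa": 1, "siima": 1}},
        "rashmika": {"awards": {"nandi": 0, "cinemaa": 0, "siima": 2}},
    },
}

# Walk over every person under "actors" and add up their Nandi awards.
total_nandi_by_actors = sum(
    person["awards"]["nandi"] for person in movies["actors"].values()
)
print(total_nandi_by_actors)
```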

Extracting data from JSON File to CSV

I have a big JSON file with a very complex structure
you can look on it here: https://drive.google.com/file/d/1tBVJ2xYSCpTTUGPJegvAz2ZXbeN0bteX/view?usp=sharing
It contains more than 7 million lines, and I want to extract only the "text" field.
I have written Python code to extract all the values of the "text" key in the whole file, but it extracted only 12 values, while when I open the JSON file in Visual Studio I can see more than 19000 values!
you can see the code here:
import json
import csv
with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
    data = json.load(file)
fname = "outputText8.csv"
with open(fname, "w") as file:
    csv_file = csv.writer(file,lineterminator='\n')
    csv_file.writerow(["text"])
    for item in data[i]["turns"]:
        csv_file.writerow([item['text']])
Please take a look at the JSON file; it is very large and has a complex structure, so I cannot paste all of it here because it would not be understandable.
This is a part of the JSON file:
[
{
"user_id": "U22HTHYNP",
"turns": [
{
"text": "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
"labels": {
"acts": [
{
"args": [
{
"val": "book",
"key": "intent"
}
],
"name": "inform"
},
{
"args": [
{
"val": "Atlantis",
"key": "dst_city"
},
{
"val": "Caprica",
"key": "or_city"
},
{
"val": "Saturday, August 13, 2016",
"key": "str_date"
},
{
"val": "8",
"key": "n_adults"
},
{
"val": "1700",
"key": "budget"
}
],
"name": "inform"
}
],
"acts_without_refs": [
{
"args": [
{
"val": "book",
"key": "intent"
}
],
"name": "inform"
},
{
"args": [
{
"val": "Atlantis",
"key": "dst_city"
},
{
"val": "Caprica",
"key": "or_city"
},
{
"val": "Saturday, August 13, 2016",
"key": "str_date"
},
{
"val": "8",
"key": "n_adults"
},
{
"val": "1700",
"key": "budget"
}
],
"name": "inform"
}
],
"active_frame": 1,
"frames": [
{
"info": {
"intent": [
{
"val": "book",
"negated": false
}
],
"budget": [
{
"val": "1700.0",
"negated": false
}
],
"dst_city": [
{
"val": "Atlantis",
"negated": false
}
],
"or_city": [
{
"val": "Caprica",
"negated": false
}
],
"str_date": [
{
"val": "august 13",
"negated": false
}
],
"n_adults": [
{
"val": "8",
"negated": false
}
]
},
"frame_id": 1,
"requests": [],
"frame_parent_id": null,
"binary_questions": [],
"compare_requests": []
}
]
},
"author": "user",
"timestamp": 1471272019730.0
},
{
"db": {
"result": [
[
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 51,
"month": 8
},
"departure": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 9
},
"price": 2118.81,
"hotel": {
"gst_rating": 7.15,
"vicinity": [],
"name": "Scarlet Palms Resort",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_PARKING",
"FREE_WIFI"
],
"dst_city": "Goiania",
"category": "3.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 2,
"min": 37
},
"arrival": {
"hour": 12,
"year": 2016,
"day": 10,
"min": 37,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 10,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 2,
"min": 37
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 4,
"min": 37,
"month": 8
},
"departure": {
"hour": 22,
"year": 2016,
"day": 3,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 7
},
"price": 2369.83,
"hotel": {
"gst_rating": 0,
"vicinity": [],
"name": "Sunway Hostel",
"country": "Argentina",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI"
],
"dst_city": "Rosario",
"category": "2.0 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 0,
"month": 8
}
},
"seat": "BUSINESS",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 51,
"month": 8
},
"departure": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 9
},
"price": 2375.72,
"hotel": {
"gst_rating": 7.15,
"vicinity": [],
"name": "Scarlet Palms Resort",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_PARKING",
"FREE_WIFI"
],
"dst_city": "Goiania",
"category": "3.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 1,
"min": 30
},
"arrival": {
"hour": 11,
"year": 2016,
"day": 1,
"min": 30,
"month": 9
},
"departure": {
"hour": 10,
"year": 2016,
"day": 1,
"min": 0,
"month": 9
}
},
"seat": "BUSINESS",
"leaving": {
"duration": {
"hours": 1,
"min": 30
},
"arrival": {
"hour": 18,
"year": 2016,
"day": 19,
"min": 30,
"month": 8
},
"departure": {
"hour": 17,
"year": 2016,
"day": 19,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 13
},
"price": 2492.95,
"hotel": {
"gst_rating": 0,
"vicinity": [],
"name": "Hotel Mundo",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI",
"FREE_PARKING"
],
"dst_city": "Manaus",
"category": "2.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 31,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 31,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 19,
"year": 2016,
"day": 27,
"min": 51,
"month": 8
},
"departure": {
"hour": 19,
"year": 2016,
"day": 27,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 4
},
"price": 2538.0,
"hotel": {
"gst_rating": 8.22,
"vicinity": [],
"name": "The Glee",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI"
],
"dst_city": "Recife",
"category": "4.0 star hotel"
}
}
],
[],
[],
[],
[],
[],
[]
],
"search": [
{
"ORIGIN_CITY": "Porto Alegre",
"PRICE_MIN": "2000",
"NUM_ADULTS": "2",
"timestamp": 1471271949.995,
"PRICE_MAX": "3000",
"ARE_DATES_FLEXIBLE": "true",
"NUM_CHILDREN": "5",
"START_TIME": "1470110400000",
"MAX_DURATION": 2592000000.0,
"DESTINATION_CITY": "Brazil",
"RESULT_LIMIT": "10",
"END_TIME": "1472616000000"
},
{
"ORIGIN_CITY": "Atlantis",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272148.124,
"PRICE_MAX": "1700",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "NaN",
"END_TIME": "NaN"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MAX": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272189.07,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MAX": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272205.436,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272278.72,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272454.542,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1471060800000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272466.008,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1471060800000",
"END_TIME": "1472011200000"
}
]
},
How could it be modified to extract all the "text" values from the JSON file to a CSV file?
This is a potential solution using pandas:
import pandas as pd
# Importing data
dj = pd.read_json("frames2.json")
dtext = dj[["user_id","turns"]]
# Saving text records in a list
list_ = []
for record in dtext["turns"].values:
    for r in record:
        list_.append(r["text"])
# Exporting the csv
out = pd.Series(list_,name="text")
out.to_csv("text.csv")
This writes the collected "text" values to text.csv.
Try:
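import json
import csv

with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
    data = json.load(file)

fname = "outputText8.csv"
with open(fname, "w", newline="") as file:
    csv_file = csv.writer(file, lineterminator='\n')
    csv_file.writerow(["text"])
    # Assumes, as in the excerpt, that the top level is a list of
    # dialogues and that each dialogue has a "turns" list.
    for dialogue in data:
        for turn in dialogue["turns"]:
            if "text" in turn:
                csv_file.writerow([turn["text"]])
Now it is up to you which of the fields you want to save; if you use a debugger you can inspect the keys and values.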
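If "text" keys can also appear deeper in the structure than the turns list, a recursive walk may be easier than hard-coding the path. The following is a sketch under that assumption: iter_key_values is a hypothetical helper, the sample data is a tiny stand-in for the real file, and an in-memory buffer replaces the output file so the example runs on its own.

```python
import csv
import io

def iter_key_values(node, key="text"):
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: nested containers may hold further matches.
            yield from iter_key_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_key_values(item, key)

# Tiny stand-in for the real file; the field names mirror the excerpt.
data = [
    {"user_id": "U1", "turns": [
        {"text": "first message", "labels": {"acts": []}},
        {"db": {"result": []}, "text": "second message"},
    ]},
    {"user_id": "U2", "turns": [{"text": "third message"}]},
]

texts = list(iter_key_values(data))

buffer = io.StringIO()  # stands in for open("outputText8.csv", "w", newline="")
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["text"])
writer.writerows([t] for t in texts)
csv_output = buffer.getvalue()
print(csv_output)
```

With the real file, data would come from json.load and the buffer would be replaced by the opened CSV file.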

Synonym analyzer not working in elastic search with python

I have a scenario as depicted below in Python code.
In this I am trying to explicitly define new york and ny as synonyms. Unfortunately it is not working. Can you please guide me, as I am new to Elasticsearch?
I am also using a custom analyzer.
I also have the file synonyms.txt containing the text:
ny,newyork,nyork
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()
keywords = ['thousand eyes', 'facebook', 'superdoc', 'quora', 'your story', 'Surgery', 'lending club', 'ad roll',
'the honest company', 'Draft kings', 'newyork']
count = 1
doc_setting = {
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": "true"
}
}
}
}, "mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "string",
"index_analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
}
validate=es.index(index='test', doc_type='your_type', body=doc_setting)
print(validate)
for keyword in keywords:
    doc = {
        'id': count,
        'keyword': keyword
    }
    res = es.index(index="test", doc_type='your_type', id=count, body=doc)
    print(res['result'])
    count = count + 1
#res11 = es.get(index="test", doc_type='your_type', id=1)
#print(res11['_source'])
es.indices.refresh(index="test")
question = "I saw news on ny news channel of lending club on facebook, your story and quora"
print("Question asked: %s" % question)
res = es.search(index="test", doc_type='your_type', body={
    "query": {"match": {"keyword": question}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print(hit["_source"])
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym" : {
"type" : "synonym",
"lenient": true,
"synonyms" : ["ny,newyork,nyork"]
}
}
}
}, "mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "text",
"analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
}
Then Analyze using
POST /test_index/_analyze
{
"analyzer" : "my_analyzer_shingle",
"text" : "I saw news on ny news channel of lending club on facebook, your story and quorat"
}
The tokens I get are
{
"tokens": [
{
"token": "i",
"start_offset": 0,
"end_offset": 1,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "saw",
"start_offset": 2,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "news",
"start_offset": 6,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "on",
"start_offset": 11,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "ny",
"start_offset": 14,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "newyork",
"start_offset": 14,
"end_offset": 16,
"type": "SYNONYM",
"position": 4
},
{
"token": "nyork",
"start_offset": 14,
"end_offset": 16,
"type": "SYNONYM",
"position": 4
},
{
"token": "news",
"start_offset": 17,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 5
},
{
"token": "channel",
"start_offset": 22,
"end_offset": 29,
"type": "<ALPHANUM>",
"position": 6
},
{
"token": "of",
"start_offset": 30,
"end_offset": 32,
"type": "<ALPHANUM>",
"position": 7
},
{
"token": "lending",
"start_offset": 33,
"end_offset": 40,
"type": "<ALPHANUM>",
"position": 8
},
{
"token": "club",
"start_offset": 41,
"end_offset": 45,
"type": "<ALPHANUM>",
"position": 9
},
{
"token": "on",
"start_offset": 46,
"end_offset": 48,
"type": "<ALPHANUM>",
"position": 10
},
{
"token": "facebook",
"start_offset": 49,
"end_offset": 57,
"type": "<ALPHANUM>",
"position": 11
},
{
"token": "your",
"start_offset": 59,
"end_offset": 63,
"type": "<ALPHANUM>",
"position": 12
},
{
"token": "story",
"start_offset": 64,
"end_offset": 69,
"type": "<ALPHANUM>",
"position": 13
},
{
"token": "and",
"start_offset": 70,
"end_offset": 73,
"type": "<ALPHANUM>",
"position": 14
},
{
"token": "quorat",
"start_offset": 74,
"end_offset": 80,
"type": "<ALPHANUM>",
"position": 15
}
]
}
and the search produces
POST /test_index/_search
{
"query" : {
"match" : { "keyword" : "I saw news on ny news channel of lending club on facebook, your story and quora" }
}
}
{
"took": 36,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.6858001,
"hits": [
{
"_index": "test_index",
"_type": "your_type",
"_id": "4",
"_score": 1.6858001,
"_source": {
"keyword": "newyork"
}
},
{
"_index": "test_index",
"_type": "your_type",
"_id": "2",
"_score": 1.1727304,
"_source": {
"keyword": "facebook"
}
},
{
"_index": "test_index",
"_type": "your_type",
"_id": "5",
"_score": 0.6931472,
"_source": {
"keyword": "quora"
}
}
]
}
}

JSON in Python: How do I get specific parts of an array?

I'm trying to get a specific value in Python of a JSON object. Before I could use something like:
data['data']['data2']
to get a certain value that is associated with data2, but this is a little different. My JSON file is now more complex and looks like this:
{
"data": {
"playerStatSummaries": {
"playerStatSummarySet": [
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "Unranked3x3",
"rating": 400,
"wins": 5
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked6x6",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 68
},
{
"statType": "TOTAL_ASSISTS",
"value": 116
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 1854
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 22
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 359
}
]
},
"leaves": 0,
"losses": 5,
"maxRating": 1505,
"modifyDate": "/Date(1357261303440)/",
"playerStatSummaryType": "RankedSolo5x5",
"rating": 1505,
"wins": 9
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 369
},
{
"statType": "TOTAL_ASSISTS",
"value": 535
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 9917
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 78
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 1050
}
]
},
"leaves": 0,
"losses": 35,
"maxRating": 1266,
"modifyDate": "/Date(1323496849000)/",
"playerStatSummaryType": "RankedTeam5x5",
"rating": 1266,
"wins": 39
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 29
},
{
"statType": "TOTAL_ASSISTS",
"value": 17
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 176
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 8
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 12
}
]
},
"leaves": 0,
"losses": 0,
"maxRating": 1200,
"modifyDate": "/Date(1326521499000)/",
"playerStatSummaryType": "CoopVsAI",
"rating": 1200,
"wins": 2
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 150
},
{
"statType": "TOTAL_ASSISTS",
"value": 184
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 3549
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 24
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 224
}
]
},
"leaves": 0,
"losses": 17,
"maxRating": 0,
"modifyDate": "/Date(1350098520000)/",
"playerStatSummaryType": "RankedTeam3x3",
"rating": 1308,
"wins": 22
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 15
},
{
"statType": "TOTAL_ASSISTS",
"value": 185
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 250
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 4
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 15
}
]
},
"leaves": 0,
"losses": 3,
"maxRating": 1365,
"modifyDate": "/Date(1321778545000)/",
"playerStatSummaryType": "RankedPremade5x5",
"rating": 1365,
"wins": 8
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 672
},
{
"statType": "AVERAGE_CHAMPIONS_KILLED",
"value": 9
},
{
"statType": "MAX_COMBAT_PLAYER_SCORE",
"value": 889
},
{
"statType": "AVERAGE_OBJECTIVE_PLAYER_SCORE",
"value": 771
},
{
"statType": "MAX_TEAM_OBJECTIVE",
"value": 2
},
{
"statType": "MAX_NODE_CAPTURE",
"value": 14
},
{
"statType": "MAX_OBJECTIVE_PLAYER_SCORE",
"value": 1424
},
{
"statType": "MAX_TOTAL_PLAYER_SCORE",
"value": 1950
},
{
"statType": "AVERAGE_NUM_DEATHS",
"value": 10
},
{
"statType": "TOTAL_DECAYER",
"value": 105
},
{
"statType": "TOTAL_ASSISTS",
"value": 931
},
{
"statType": "AVERAGE_NODE_NEUTRALIZE",
"value": 6
},
{
"statType": "AVERAGE_NODE_CAPTURE_ASSIST",
"value": 2
},
{
"statType": "MAX_NODE_CAPTURE_ASSIST",
"value": 5
},
{
"statType": "MAX_ASSISTS",
"value": 25
},
{
"statType": "AVERAGE_NODE_NEUTRALIZE_ASSIST",
"value": 1
},
{
"statType": "AVERAGE_TOTAL_PLAYER_SCORE",
"value": 1182
},
{
"statType": "MAX_NODE_NEUTRALIZE_ASSIST",
"value": 3
},
{
"statType": "AVERAGE_COMBAT_PLAYER_SCORE",
"value": 413
},
{
"statType": "AVERAGE_NODE_CAPTURE",
"value": 8
},
{
"statType": "MAX_CHAMPIONS_KILLED",
"value": 20
},
{
"statType": "TOTAL_NODE_NEUTRALIZE",
"value": 391
},
{
"statType": "AVERAGE_TEAM_OBJECTIVE",
"value": 1
},
{
"statType": "AVERAGE_ASSISTS",
"value": 11
},
{
"statType": "TOTAL_NODE_CAPTURE",
"value": 447
},
{
"statType": "MAX_NODE_NEUTRALIZE",
"value": 11
},
{
"statType": "MAX_NUM_DEATHS",
"value": 16
}
]
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "OdinUnranked",
"rating": 400,
"wins": 43
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked2x2",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked1x1",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked3x3",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 10269
},
{
"statType": "TOTAL_DECAYER",
"value": 0
},
{
"statType": "TOTAL_ASSISTS",
"value": 15722
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 262793
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 1954
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 43898
},
{
"statType": "TOTAL_DEATHS_PER_SESSION",
"value": 1513
}
]
},
"leaves": 1,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "Unranked",
"rating": 400,
"wins": 1691
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked5x5",
"rating": 400,
"wins": 0
}
]
},
"previousFirstWinOfDay": "/Date(1357489166306)/",
"userId": 55060
},
"success": true
}
As you can see this is really long. My question is: how would I grab only specific values from a certain playerStatSummarySet entry? Let's say I only wanted to grab the rating value from the entry with the playerStatSummaryType value of RankedSolo5x5. How would I do that?
Here's what I have going so far to get the data from the JSON file.
with open('data.txt', 'r') as f:
    data = json.load(f)
If you have to work with complex JSON objects, I suggest you take a look at jsonpath, which offers an XPath-like language for JSON objects.
An example:
import jsonpath
import json
with open('/test.json', 'r') as f:
    data = json.load(f)
# "@" refers to the current node inside a JSONPath filter expression
path = "$.[?(@.playerStatSummaryType == 'RankedSolo5x5')].rating"
jsonpath.jsonpath(data,path)
out:
[1505]
Use a list comprehension:
import json
with open('data.txt', 'r') as f:
    data = json.load(f)
# Collect the rating of every matching summary and take the first one
rating = [summary["rating"] for summary
          in data["data"]["playerStatSummaries"]["playerStatSummarySet"]
          if summary["playerStatSummaryType"] == "RankedSolo5x5"][0]
You can still do it, but you have to access the data structure properly. What json.load() is returning is a JSON object which is the same as a Python dictionary. This obj has a key named 'data' in it that is associated with another object-dictionary, etc down until you get to the 'playerStatSummaries' object which has a data member keyed with 'playerStatSummarySet' that is actually a Python list rather than another object-dictionary.
Here's an example of how to search through that list of summary sets and find a specific entry -- remembering that since this data item is a list rather than a dictionary object, you have to step through each of the entries in it to find the one you're looking for rather than just looking up its name.
import json
with open('data.txt', 'r') as f:
    jsonObj = json.load(f)
targetSummaryType = 'RankedSolo5x5'
for summarySet in jsonObj['data']['playerStatSummaries']['playerStatSummarySet']:
    if summarySet['playerStatSummaryType'] == targetSummaryType:
        print('max rating for {}: {}'.format(targetSummaryType,
                                             summarySet['maxRating']))
        break  # if you only expect there to be one
Output:
max rating for RankedSolo5x5: 1505
To figure out what was needed I found it useful to initially pprint() the whole jsonObj which made the structure very easy to see.
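The search loop can also be packaged as a small helper built on next(). This is only a sketch: find_summary is a made-up name, the sample dict is a cut-down version of the JSON in the question, and it returns None when no entry matches instead of raising an error.

```python
def find_summary(data, summary_type):
    """Return the first summary dict of the given type, or None if absent."""
    summaries = data["data"]["playerStatSummaries"]["playerStatSummarySet"]
    return next(
        (s for s in summaries if s["playerStatSummaryType"] == summary_type),
        None,
    )

# Cut-down sample with the same nesting as the JSON in the question.
data = {"data": {"playerStatSummaries": {"playerStatSummarySet": [
    {"playerStatSummaryType": "Unranked3x3", "rating": 400, "wins": 5},
    {"playerStatSummaryType": "RankedSolo5x5", "rating": 1505, "wins": 9},
]}}}

solo = find_summary(data, "RankedSolo5x5")
rating = solo["rating"] if solo else None
missing = find_summary(data, "NoSuchType")
print(rating)
```

Returning the whole entry rather than a single field means the same helper can be reused for wins, losses, or any other value in that summary.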
