How to split complex JSON file into multiple files by Python

How to split complex JSON file into multiple files by Python - python

I am currently splitting Json file.
The structure of JSON file is like this :
{
"id": 2131424,
"file": "video_2131424_1938263.mp4",
"metadata": {
"width": 3840,
"height": 2160,
"duration": 312.83,
"fps": 30,
"frames": 9385,
"created": "Sun Jan 17 17:48:52 2021"
},
"frames": [
{
"number": 207,
"image": "frame_207.jpg",
"annotations": [
{
"label": {
"x": 730,
"y": 130,
"width": 62,
"height": 152
},
"category": {
"code": "child",
"attributes": [
{
"code": "global_id",
"value": "7148"
}
]
}
},
{
"label": {
"x": 815,
"y": 81,
"width": 106,
"height": 197
},
"category": {
"code": "person",
"attributes": []
}
}
]
},
{
"number": 221,
"image": "frame_221.jpg",
"annotations": [
{
"label": {
"x": 730,
"y": 130,
"width": 64,
"height": 160
},
"category": {
"code": "child",
"attributes": [
{
"code": "global_id",
"value": "7148"
}
]
}
},
{
"label": {
"x": 819,
"y": 82,
"width": 106,
"height": 200
},
"category": {
"code": "person",
"attributes": []
}
}
]
},
{
"number": 236,
"image": "frame_236.jpg",
"annotations": [
{
"label": {
"x": 731,
"y": 135,
"width": 74,
"height": 160
},
"category": {
"code": "child",
"attributes": [
{
"code": "global_id",
"value": "7148"
}
]
}
},
{
"label": {
"x": 821,
"y": 83,
"width": 106,
"height": 206
},
"category": {
"code": "person",
"attributes": []
}
}
]
},
I have to extract [x, y, width, height] from each label.
I tried some code like this:
file = json.load(open('annotation_2131424.json'))
file['frames'][i]['annotations'][j]['label']['x']
But I cannot split JSON.
I tried like this but I cannot run...

I hope I've understood your question right. To get x, y, width, height from each label (dct is your dictionary from the question):
out = [
[
[
a["label"]["x"],
a["label"]["y"],
a["label"]["width"],
a["label"]["height"],
]
for a in frame["annotations"]
]
for frame in dct["frames"]
]
print(out)
Prints:
[
[[730, 130, 62, 152], [815, 81, 106, 197]],
[[730, 130, 64, 160], [819, 82, 106, 200]],
[[731, 135, 74, 160], [821, 83, 106, 206]],
]

Related

merge dicts that have the same value for specific key

I need to combine dictionaries that have the same value for the key "tag".
Like from this:
[
[
{
"tag": "#2C00L02RU",
"stamina": 233
},
{
"tag": "#8YG8RJV90",
"stamina": 20
},
{
"tag": "#LQV2JCPR",
"stamina": 154
},
{
"tag": "#9JQLPGLJJ",
"stamina": 134
}
],
[
{
"tag": "#2C00L02RU",
"health": 200
},
{
"tag": "#8YG8RJV90",
"health": 40
},
{
"tag": "#LQV2JCPR",
"health": 100
},
{
"tag": "#9JQLPGLJJ",
"health": 240
}
],
[
{
"tag": "#LQV2JCPR",
"fame": 1
},
{
"tag": "#8YG8RJV90",
"fame": 2
},
{
"tag": "#9JQLPGLJJ",
"fame": 3
},
{
"tag": "#2C00L02RU",
"fame": 4
}
],
[
{
"tag": "#LQV2JCPR",
"moves": 6
},
{
"tag": "#8YG8RJV90",
"moves": 0
},
{
"tag": "#9JQLPGLJJ",
"moves": 8
},
{
"tag": "#2C00L02RU",
"moves": 4
}
]
]
to this:
[
{
"tag": "#2C00L02RU",
"stamina": 233,
"health": 200,
"fame": 4,
"moves": 4
},
{
"tag": "#8YG8RJV90",
"stamina": 20,
"health": 40,
"fame": 2,
"moves": 2
},
{
"tag": "#LQV2JCPR",
"stamina": 154,
"health": 100,
"fame": 1,
"moves": 6
},
{
"tag": "#9JQLPGLJJ",
"stamina": 134,
"health": 240,
"fame": 3,
"moves": 8
}
]
I've already tried iterating through countless loops, but only got failures.
I won't show any of my attempts here because they didn't even come close to the expected result.
If you need any other information, just let me know.

If lst is list from your question, you can do:
out = {}
for l in lst:
for d in l:
out.setdefault(d["tag"], {}).update(d)
print(list(out.values()))
Prints:
[
{"tag": "#2C00L02RU", "stamina": 233, "health": 200, "fame": 4, "moves": 4},
{"tag": "#8YG8RJV90", "stamina": 20, "health": 40, "fame": 2, "moves": 0},
{"tag": "#LQV2JCPR", "stamina": 154, "health": 100, "fame": 1, "moves": 6},
{"tag": "#9JQLPGLJJ", "stamina": 134, "health": 240, "fame": 3, "moves": 8},
]

Why is the individual dictionary being printed in output in the following code not the same as the one being added?

So, I have a JSON file, with some elements in it. The end goal of the code is to generate 3 elements for each element in the JSON file, with some modified properties, but not all.
The way I went about this is to run a for loop and have 3 different parts of code block generating the required elements, and adding it to an empty list. But for some reason, I the elements I am generating aren't the ones being added to the final list.
Specifically, the "label" sub element gets messed up. Can someone please explain why's that happening?
Here's the code:
from math import sqrt
import json
import gc
with open("your-path here\\inclined-conveyors.json") as f:
conv_json = json.load(f)
#print("conv_json['conv'] object is: \n\n\n", conv_json['conv'])
modified_conv = list()
for item in conv_json["conv"]:
new_x_coordinate_delta = (item['height']) / (2*sqrt(3))
new_y_coordinate_delta = item['height'] / 6
addendum1 = item.copy()
addendum1['name'] = addendum1['name'][:6] + "A"
addendum1['displayName'] = addendum1['name']
new_x_coordinate = addendum1['x'] + new_x_coordinate_delta
new_y_coordinate = addendum1['y'] - new_y_coordinate_delta
addendum1['x'] = round(new_x_coordinate, 6)
addendum1['y'] = round(new_y_coordinate, 6)
addendum1['label']['x'] = round(new_x_coordinate, 6)
addendum1['label']['y'] = round(new_y_coordinate, 6)
addendum1['height'] = addendum1['height'] / 3
print("\n addendum1 is: ", addendum1)
empty_list = []
empty_list.append(addendum1)
modified_conv.extend(empty_list)
# modified_conv.append(addendum)
del addendum1
del empty_list
gc.collect()
# print("added item from x == 1")
# print("\n modified conv is: ", modified_conv)
addendum2 = item.copy()
addendum2['name'] = addendum2['name'][:6] + "B"
addendum2['displayName'] = addendum2['name']
addendum2['label']['x'] = addendum2['x']
addendum2['label']['y'] = addendum2['y']
addendum2['height'] = addendum2['height'] / 3
print("\n addendum2 is: ", addendum2)
empty_list = []
empty_list.append(addendum2)
modified_conv.extend(empty_list)
# modified_conv.append(addendum)
# modified_conv = modified_conv + addendum
del addendum2
del empty_list
gc.collect()
# print("added item from x == 2")
# print("\n modified conv is: ", modified_conv)
addendum3 = item.copy()
addendum3['name'] = addendum3['name'][:6] + "C"
addendum3['displayName'] = addendum3['name']
#addendum3['displayName'] = ""
new_x_coordinate = addendum3['x'] - new_x_coordinate_delta
new_y_coordinate = addendum3['y'] + new_y_coordinate_delta
addendum3['x'] = round(new_x_coordinate, 6)
addendum3['y'] = round(new_y_coordinate, 6)
addendum3['label']['x'] = round(new_x_coordinate, 6)
addendum3['label']['y'] = round(new_y_coordinate, 6)
addendum3['height'] = addendum3['height'] / 3
print("\n addendum3 is: ", addendum3)
empty_list = []
empty_list.append(addendum3)
modified_conv.extend(empty_list)
# modified_conv.append(addendum)
# modified_conv = modified_conv + addendum
del addendum3
del empty_list
gc.collect()
# print("added item from x == 3")
# print("\n modified conv is: ", modified_conv)
# modified_conv["conv"].append(item)
print("\n\n\n modified conv is: \n\n\n", modified_conv)
print("\n\n\ number of items in json is: ", len(conv_json['conv']))
print("\n\n number of items in modified conv is: ", len(modified_conv))
Here is a sample JSON:
{
"conv": [
{
"className": "ConveyorStraight",
"name": "P42700B",
"displayName": "P42700",
"type": "Straight",
"x": 1511,
"y": 2891.5,
"label": {
"x": 1511,
"y": 2891.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42600B",
"displayName": "P42600",
"type": "Straight",
"x": 1621,
"y": 2891.5,
"label": {
"x": 1621,
"y": 2891.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42500B",
"displayName": "P42500",
"type": "Straight",
"x": 1731,
"y": 2891.5,
"label": {
"x": 1731,
"y": 2891.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42400B",
"displayName": "P42400",
"type": "Straight",
"x": 1861,
"y": 2892.5,
"label": {
"x": 1861,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42300B",
"displayName": "P42300",
"type": "Straight",
"x": 1971,
"y": 2892.5,
"label": {
"x": 1971,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42200B",
"displayName": "P42200",
"type": "Straight",
"x": 2081,
"y": 2892.5,
"label": {
"x": 2081,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42100B",
"displayName": "P42100",
"type": "Straight",
"x": 2211,
"y": 2892.5,
"label": {
"x": 2211,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P42000B",
"displayName": "P42000",
"type": "Straight",
"x": 2321,
"y": 2892.5,
"label": {
"x": 2321,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41900B",
"displayName": "P41900",
"type": "Straight",
"x": 2431,
"y": 2892.5,
"label": {
"x": 2431,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41800B",
"displayName": "P41800",
"type": "Straight",
"x": 2561,
"y": 2892.5,
"label": {
"x": 2561,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41700B",
"displayName": "P41700",
"type": "Straight",
"x": 2671,
"y": 2892.5,
"label": {
"x": 2671,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41600B",
"displayName": "P41600",
"type": "Straight",
"x": 2781,
"y": 2892.5,
"label": {
"x": 2781,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41500B",
"displayName": "P41500",
"type": "Straight",
"x": 2911,
"y": 2892.5,
"label": {
"x": 2911,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41400B",
"displayName": "P41400",
"type": "Straight",
"x": 3021,
"y": 2892.5,
"label": {
"x": 3021,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41300B",
"displayName": "P41300",
"type": "Straight",
"x": 3131,
"y": 2892.5,
"label": {
"x": 3131,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41200B",
"displayName": "P41200",
"type": "Straight",
"x": 3261,
"y": 2892.5,
"label": {
"x": 3261,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
},
{
"className": "ConveyorStraight",
"name": "P41100B",
"displayName": "P41100",
"type": "Straight",
"x": 3371,
"y": 2892.5,
"label": {
"x": 3371,
"y": 2892.5,
"size": 20,
"rotation": 0
},
"height": 360,
"width": 50,
"rotation": 60,
"searchable": true,
"controlled": true
}
]
}

Extracting data from JSON File to CSV

I have a big JSON file with a very complex structure
you can look on it here: https://drive.google.com/file/d/1tBVJ2xYSCpTTUGPJegvAz2ZXbeN0bteX/view?usp=sharing
it contains more than 7 millions lines, and I want to extract only the "text" field
I have written a python code, to extra all the values of the "text" key or field in the whole file, and it extracted only 12 values! while when I open the JSON file on the Visualstudio, I have more than 19000 values!!
you can see the code here:
import json
import csv
with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
data = json.load(file)
fname = "outputText8.csv"
with open(fname, "w") as file:
csv_file = csv.writer(file,lineterminator='\n')
csv_file.writerow(["text"])
for item in data[i]["turns"]:
csv_file.writerow([item['text']])
please take a look on the JSON file as it is very large one and with a complex structure, so I an not paste it here to see because it would be not understandable
also this is a part of the son file:
[
{
"user_id": "U22HTHYNP",
"turns": [
{
"text": "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
"labels": {
"acts": [
{
"args": [
{
"val": "book",
"key": "intent"
}
],
"name": "inform"
},
{
"args": [
{
"val": "Atlantis",
"key": "dst_city"
},
{
"val": "Caprica",
"key": "or_city"
},
{
"val": "Saturday, August 13, 2016",
"key": "str_date"
},
{
"val": "8",
"key": "n_adults"
},
{
"val": "1700",
"key": "budget"
}
],
"name": "inform"
}
],
"acts_without_refs": [
{
"args": [
{
"val": "book",
"key": "intent"
}
],
"name": "inform"
},
{
"args": [
{
"val": "Atlantis",
"key": "dst_city"
},
{
"val": "Caprica",
"key": "or_city"
},
{
"val": "Saturday, August 13, 2016",
"key": "str_date"
},
{
"val": "8",
"key": "n_adults"
},
{
"val": "1700",
"key": "budget"
}
],
"name": "inform"
}
],
"active_frame": 1,
"frames": [
{
"info": {
"intent": [
{
"val": "book",
"negated": false
}
],
"budget": [
{
"val": "1700.0",
"negated": false
}
],
"dst_city": [
{
"val": "Atlantis",
"negated": false
}
],
"or_city": [
{
"val": "Caprica",
"negated": false
}
],
"str_date": [
{
"val": "august 13",
"negated": false
}
],
"n_adults": [
{
"val": "8",
"negated": false
}
]
},
"frame_id": 1,
"requests": [],
"frame_parent_id": null,
"binary_questions": [],
"compare_requests": []
}
]
},
"author": "user",
"timestamp": 1471272019730.0
},
{
"db": {
"result": [
[
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 51,
"month": 8
},
"departure": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 9
},
"price": 2118.81,
"hotel": {
"gst_rating": 7.15,
"vicinity": [],
"name": "Scarlet Palms Resort",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_PARKING",
"FREE_WIFI"
],
"dst_city": "Goiania",
"category": "3.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 2,
"min": 37
},
"arrival": {
"hour": 12,
"year": 2016,
"day": 10,
"min": 37,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 10,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 2,
"min": 37
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 4,
"min": 37,
"month": 8
},
"departure": {
"hour": 22,
"year": 2016,
"day": 3,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 7
},
"price": 2369.83,
"hotel": {
"gst_rating": 0,
"vicinity": [],
"name": "Sunway Hostel",
"country": "Argentina",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI"
],
"dst_city": "Rosario",
"category": "2.0 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 0,
"month": 8
}
},
"seat": "BUSINESS",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 51,
"month": 8
},
"departure": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 9
},
"price": 2375.72,
"hotel": {
"gst_rating": 7.15,
"vicinity": [],
"name": "Scarlet Palms Resort",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_PARKING",
"FREE_WIFI"
],
"dst_city": "Goiania",
"category": "3.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 1,
"min": 30
},
"arrival": {
"hour": 11,
"year": 2016,
"day": 1,
"min": 30,
"month": 9
},
"departure": {
"hour": 10,
"year": 2016,
"day": 1,
"min": 0,
"month": 9
}
},
"seat": "BUSINESS",
"leaving": {
"duration": {
"hours": 1,
"min": 30
},
"arrival": {
"hour": 18,
"year": 2016,
"day": 19,
"min": 30,
"month": 8
},
"departure": {
"hour": 17,
"year": 2016,
"day": 19,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 13
},
"price": 2492.95,
"hotel": {
"gst_rating": 0,
"vicinity": [],
"name": "Hotel Mundo",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI",
"FREE_PARKING"
],
"dst_city": "Manaus",
"category": "2.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 31,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 31,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 19,
"year": 2016,
"day": 27,
"min": 51,
"month": 8
},
"departure": {
"hour": 19,
"year": 2016,
"day": 27,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 4
},
"price": 2538.0,
"hotel": {
"gst_rating": 8.22,
"vicinity": [],
"name": "The Glee",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI"
],
"dst_city": "Recife",
"category": "4.0 star hotel"
}
}
],
[],
[],
[],
[],
[],
[]
],
"search": [
{
"ORIGIN_CITY": "Porto Alegre",
"PRICE_MIN": "2000",
"NUM_ADULTS": "2",
"timestamp": 1471271949.995,
"PRICE_MAX": "3000",
"ARE_DATES_FLEXIBLE": "true",
"NUM_CHILDREN": "5",
"START_TIME": "1470110400000",
"MAX_DURATION": 2592000000.0,
"DESTINATION_CITY": "Brazil",
"RESULT_LIMIT": "10",
"END_TIME": "1472616000000"
},
{
"ORIGIN_CITY": "Atlantis",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272148.124,
"PRICE_MAX": "1700",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "NaN",
"END_TIME": "NaN"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MAX": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272189.07,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MAX": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272205.436,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272278.72,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272454.542,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1471060800000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272466.008,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1471060800000",
"END_TIME": "1472011200000"
}
]
},
How it could be modified to extract all the "text" values from the JSON file to a CSV file?

This is a potential solution using pandas:
import pandas as pd
#importing data
dj = pd.read_json("frames2.json")
dtext = dj[["user_id","turns"]]
#Saving text records in a list
list_ = []
for record in dtext["turns"].values:
for r in record:
list_.append(r["text"])
#Exporting the csv
out = pd.Series(list_,name="text")
out.to_csv("text.csv")
It gives the following output.

Try:
import json
import csv
with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
data = json.load(file)
fname = "outputText8.csv"
with open(fname, "w") as file:
csv_file = csv.writer(file,lineterminator='\n')
csv_file.writerow(["text"])
for keys,values in data.items():
now it up to you which of the fields you want to save, if you user a debugger you can see the values and Keys

Synonym analyzer not working in elastic search with python

I have a scenario as depicted below in python code .
In this I am trying to explicitly define new york and ny as synonyms. But unfortunately it is not working. Can you please guide me as I am new to elastic search.
Also I am using custom analyzer.
I also have the file synonyms.txt having text:
ny,newyork,nyork
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()
keywords = ['thousand eyes', 'facebook', 'superdoc', 'quora', 'your story', 'Surgery', 'lending club', 'ad roll',
'the honest company', 'Draft kings', 'newyork']
count = 1
doc_setting = {
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": "true"
}
}
}
}, "mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "string",
"index_analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
}
validate=es.index(index='test', doc_type='your_type', body=doc_setting)
print(validate)
for keyword in keywords:
doc = {
'id': count,
'keyword': keyword
}
res = es.index(index="test", doc_type='your_type', id=count, body=doc)
print(res['result'])
count = count + 1
#res11 = es.get(index="test", doc_type='your_type', id=1)
#print(res11['_source'])
es.indices.refresh(index="test")
question = "I saw news on ny news channel of lending club on facebook, your story and quora"
print("Question asked: %s" % question)
res = es.search(index="test",`enter code here` doc_type='your_type', body={
"query": {"match": {"keyword": question}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
print(hit["_source"])

PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym" : {
"type" : "synonym",
"lenient": true,
"synonyms" : ["ny,newyork,nyork"]
}
}
}
}, "mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "text",
"analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
}
Then Analyze using
POST /test_index/_analyze
{
"analyzer" : "my_analyzer_shingle",
"text" : "I saw news on ny news channel of lending club on facebook, your story and quorat"
}
The tokens I get are
{
"tokens": [
{
"token": "i",
"start_offset": 0,
"end_offset": 1,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "saw",
"start_offset": 2,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "news",
"start_offset": 6,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "on",
"start_offset": 11,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "ny",
"start_offset": 14,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "newyork",
"start_offset": 14,
"end_offset": 16,
"type": "SYNONYM",
"position": 4
},
{
"token": "nyork",
"start_offset": 14,
"end_offset": 16,
"type": "SYNONYM",
"position": 4
},
{
"token": "news",
"start_offset": 17,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 5
},
{
"token": "channel",
"start_offset": 22,
"end_offset": 29,
"type": "<ALPHANUM>",
"position": 6
},
{
"token": "of",
"start_offset": 30,
"end_offset": 32,
"type": "<ALPHANUM>",
"position": 7
},
{
"token": "lending",
"start_offset": 33,
"end_offset": 40,
"type": "<ALPHANUM>",
"position": 8
},
{
"token": "club",
"start_offset": 41,
"end_offset": 45,
"type": "<ALPHANUM>",
"position": 9
},
{
"token": "on",
"start_offset": 46,
"end_offset": 48,
"type": "<ALPHANUM>",
"position": 10
},
{
"token": "facebook",
"start_offset": 49,
"end_offset": 57,
"type": "<ALPHANUM>",
"position": 11
},
{
"token": "your",
"start_offset": 59,
"end_offset": 63,
"type": "<ALPHANUM>",
"position": 12
},
{
"token": "story",
"start_offset": 64,
"end_offset": 69,
"type": "<ALPHANUM>",
"position": 13
},
{
"token": "and",
"start_offset": 70,
"end_offset": 73,
"type": "<ALPHANUM>",
"position": 14
},
{
"token": "quorat",
"start_offset": 74,
"end_offset": 80,
"type": "<ALPHANUM>",
"position": 15
}
]
}
and the search produces
POST /test_index/_search
{
"query" : {
"match" : { "keyword" : "I saw news on ny news channel of lending club on facebook, your story and quora" }
}
}
{
"took": 36,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.6858001,
"hits": [
{
"_index": "test_index",
"_type": "your_type",
"_id": "4",
"_score": 1.6858001,
"_source": {
"keyword": "newyork"
}
},
{
"_index": "test_index",
"_type": "your_type",
"_id": "2",
"_score": 1.1727304,
"_source": {
"keyword": "facebook"
}
},
{
"_index": "test_index",
"_type": "your_type",
"_id": "5",
"_score": 0.6931472,
"_source": {
"keyword": "quora"
}
}
]
}
}

JSON in Python: How do I get specific parts of an array?

I'm trying to get a specific value in Python of a JSON object. Before I could use something like:
data['data']['data2']
to get a certain value that is associated with data2 but this is a little different, my JSON file is now more complex and is this
{
"data": {
"playerStatSummaries": {
"playerStatSummarySet": [
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "Unranked3x3",
"rating": 400,
"wins": 5
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked6x6",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 68
},
{
"statType": "TOTAL_ASSISTS",
"value": 116
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 1854
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 22
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 359
}
]
},
"leaves": 0,
"losses": 5,
"maxRating": 1505,
"modifyDate": "/Date(1357261303440)/",
"playerStatSummaryType": "RankedSolo5x5",
"rating": 1505,
"wins": 9
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 369
},
{
"statType": "TOTAL_ASSISTS",
"value": 535
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 9917
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 78
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 1050
}
]
},
"leaves": 0,
"losses": 35,
"maxRating": 1266,
"modifyDate": "/Date(1323496849000)/",
"playerStatSummaryType": "RankedTeam5x5",
"rating": 1266,
"wins": 39
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 29
},
{
"statType": "TOTAL_ASSISTS",
"value": 17
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 176
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 8
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 12
}
]
},
"leaves": 0,
"losses": 0,
"maxRating": 1200,
"modifyDate": "/Date(1326521499000)/",
"playerStatSummaryType": "CoopVsAI",
"rating": 1200,
"wins": 2
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 150
},
{
"statType": "TOTAL_ASSISTS",
"value": 184
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 3549
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 24
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 224
}
]
},
"leaves": 0,
"losses": 17,
"maxRating": 0,
"modifyDate": "/Date(1350098520000)/",
"playerStatSummaryType": "RankedTeam3x3",
"rating": 1308,
"wins": 22
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 15
},
{
"statType": "TOTAL_ASSISTS",
"value": 185
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 250
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 4
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 15
}
]
},
"leaves": 0,
"losses": 3,
"maxRating": 1365,
"modifyDate": "/Date(1321778545000)/",
"playerStatSummaryType": "RankedPremade5x5",
"rating": 1365,
"wins": 8
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 672
},
{
"statType": "AVERAGE_CHAMPIONS_KILLED",
"value": 9
},
{
"statType": "MAX_COMBAT_PLAYER_SCORE",
"value": 889
},
{
"statType": "AVERAGE_OBJECTIVE_PLAYER_SCORE",
"value": 771
},
{
"statType": "MAX_TEAM_OBJECTIVE",
"value": 2
},
{
"statType": "MAX_NODE_CAPTURE",
"value": 14
},
{
"statType": "MAX_OBJECTIVE_PLAYER_SCORE",
"value": 1424
},
{
"statType": "MAX_TOTAL_PLAYER_SCORE",
"value": 1950
},
{
"statType": "AVERAGE_NUM_DEATHS",
"value": 10
},
{
"statType": "TOTAL_DECAYER",
"value": 105
},
{
"statType": "TOTAL_ASSISTS",
"value": 931
},
{
"statType": "AVERAGE_NODE_NEUTRALIZE",
"value": 6
},
{
"statType": "AVERAGE_NODE_CAPTURE_ASSIST",
"value": 2
},
{
"statType": "MAX_NODE_CAPTURE_ASSIST",
"value": 5
},
{
"statType": "MAX_ASSISTS",
"value": 25
},
{
"statType": "AVERAGE_NODE_NEUTRALIZE_ASSIST",
"value": 1
},
{
"statType": "AVERAGE_TOTAL_PLAYER_SCORE",
"value": 1182
},
{
"statType": "MAX_NODE_NEUTRALIZE_ASSIST",
"value": 3
},
{
"statType": "AVERAGE_COMBAT_PLAYER_SCORE",
"value": 413
},
{
"statType": "AVERAGE_NODE_CAPTURE",
"value": 8
},
{
"statType": "MAX_CHAMPIONS_KILLED",
"value": 20
},
{
"statType": "TOTAL_NODE_NEUTRALIZE",
"value": 391
},
{
"statType": "AVERAGE_TEAM_OBJECTIVE",
"value": 1
},
{
"statType": "AVERAGE_ASSISTS",
"value": 11
},
{
"statType": "TOTAL_NODE_CAPTURE",
"value": 447
},
{
"statType": "MAX_NODE_NEUTRALIZE",
"value": 11
},
{
"statType": "MAX_NUM_DEATHS",
"value": 16
}
]
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "OdinUnranked",
"rating": 400,
"wins": 43
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked2x2",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked1x1",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked3x3",
"rating": 400,
"wins": 0
},
{
"aggregatedStats": {
"stats": [
{
"statType": "TOTAL_CHAMPION_KILLS",
"value": 10269
},
{
"statType": "TOTAL_DECAYER",
"value": 0
},
{
"statType": "TOTAL_ASSISTS",
"value": 15722
},
{
"statType": "TOTAL_MINION_KILLS",
"value": 262793
},
{
"statType": "TOTAL_TURRETS_KILLED",
"value": 1954
},
{
"statType": "TOTAL_NEUTRAL_MINIONS_KILLED",
"value": 43898
},
{
"statType": "TOTAL_DEATHS_PER_SESSION",
"value": 1513
}
]
},
"leaves": 1,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "Unranked",
"rating": 400,
"wins": 1691
},
{
"aggregatedStats": {
"stats": []
},
"leaves": 0,
"losses": 0,
"maxRating": 0,
"modifyDate": "/Date(1357567398182)/",
"playerStatSummaryType": "AramUnranked5x5",
"rating": 400,
"wins": 0
}
]
},
"previousFirstWinOfDay": "/Date(1357489166306)/",
"userId": 55060
},
"success": true
}
As you can see this is really long, my question is, how would I grab only specific values from a certain playerStatSummarySet set? Like let's say I only wanted to grab the rating value from the set with the playerStatSummaryType value of RankedSolo5x5 how would I do that?
Here's what I have going so far to get the data from the JSON file.
with open('data.txt', 'r') as f:
data = json.load(f)

if you have to work with complex json objects, I suggest you take a look at jsonpath that offers xpath like language for json objects.
An example:
import jsonpath
import json
with open('/test.json', 'r') as f:
data = json.load(f)
path = "$.[?(#.playerStatSummaryType == 'RankedSolo5x5')].rating"
jsonpath.jsonpath(data,path)
out:
[1505]

Use a list comprehension
with open('data.txt', 'r') as f:
data = json.load(f)
rating = [summary["rating"] for summary
in data["data"]["playerStatSummaries"]["playerStatSummarySet"]
if summary["playerStatSummaryType"] == "RankedSolo5x5"][0]

You can still do it, but you have to access the data structure properly. What json.load() is returning is a JSON object which is the same as a Python dictionary. This obj has a key named 'data' in it that is associated with another object-dictionary, etc down until you get to the 'playerStatSummaries' object which has a data member keyed with 'playerStatSummarySet' that is actually a Python list rather than another object-dictionary.
Here's an example of how to search through that list of summary sets and find a specific entry -- remembering that since this data item is a list rather than dictionary object you have step through each of the entries in it to find the one you're looking for rather than just looking-up its name.
import json
with open('data.txt', 'r') as f:
jsonObj = json.load(f)
targetSummaryType = 'RankedSolo5x5'
for summarySet in jsonObj['data']['playerStatSummaries']['playerStatSummarySet']:
if summarySet['playerStatSummaryType'] == targetSummaryType:
print 'max rating for {}: {}'.format(targetSummaryType,
summarySet['maxRating'])
break # if you only expect there to be one
Output:
max rating for RankedSolo5x5: 1505
To figure out what was needed I found it useful to initially pprint() the whole jsonObj which made the structure very easy to see.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to split complex JSON file into multiple files by Python - python

Related

merge dicts that have the same value for specific key

Why is the individual dictionary being printed in output in the following code not the same as the one being added?

Extracting data from JSON File to CSV

Synonym analyzer not working in elastic search with python

JSON in Python: How do I get specific parts of an array?

Categories

Resources