I am using PyMongo with Flask and I would like to know how to optimize a query, as I am filtering within a large collection (8793 documents) with large documents.
This is one of the document structures of the collections:
As you can see, it has 4 properties (simulationID, simulationPartID, timePass and status, which stores many arrays). This collection has a size of 824.4MB. The average size of the documents is 96.0KB.
Basically, I'm trying to find the documents that have simulationPartID 7 (1256 documents), pick from the status property the array whose index matches the nodeID value (which I receive as a parameter), and take the fourth or fifth element of that array (depending on the case parameter), together with the timePass.
def node_history(nodeID, case):
    coordinates = []
    node_data = db['node_data']
    db.node_data.create_index([('simulationPartID', 1), ('simulationID', 1)])
    if case == 'Temperature':
        for document in node_data.find({"simulationPartID": 7}):
            coordinates.append([document['timePass'], document['status'][int(nodeID)-1][3]])
    elif case == 'Stress':
        for document in node_data.find({"simulationPartID": 7}):
            coordinates.append([document['timePass'], document['status'][int(nodeID)-1][4]])
    else:
        pass
    coordinates.sort()
    return json.dumps(coordinates, default=json_util.default)
As I mentioned, the collection is very large, and the query takes about 30 to 60 seconds to run, depending on the machine. I want it to run as quickly as possible because I want my application to be as interactive as possible. As you can see, I already tried to create an index on both the simulationID and simulationPartID properties.
I have never worked with large collections before, so I'm not familiar with indexing. I don't even know if I did it properly in my code. So, I would like to know if there is a way to optimize my query using a different indexing approach, or in any other possible way, and make it faster.
Data samples:
{
"_id": {
"$oid": "5f83f54d45104462898aba67"
},
"simulationID": "001",
"simulationPartID": 7,
"timePass": 0,
"status": [
[
1,
1.34022987724954e-40,
0.00220799725502729,
20,
114.911392211914
],
[
2,
0.00217749993316829,
0.00220799725502729,
20,
-2.0458550453186
],
[
3,
0.0020274999551475,
0.00235799723304808,
20,
-1.33439755439758
],
[
4,
3.36311631437956e-44,
0.00235799723304808,
20,
148.233413696289
],
[
5,
1.02169119449431e-38,
0.000149997213156894,
20,
-25633.59765625
],
]
},
{
"_id": {
"$oid": "5f83f54d45104462898aba68"
},
"simulationID": "001",
"simulationPartID": 7,
"timePass": 1,
"status": [
[
1,
1.34022987724954e-40,
0.00220799725502729,
20,
114.911392211914
],
[
2,
0.00217749993316829,
0.00220799725502729,
20,
-2.0458550453186
],
[
3,
0.0020274999551475,
0.00235799723304808,
20,
-1.33439755439758
],
[
4,
3.36311631437956e-44,
0.00235799723304808,
20,
148.233413696289
],
[
5,
1.02169119449431e-38,
0.000149997213156894,
20,
-25633.59765625
],
]
},
{
"_id": {
"$oid": "5f83f54d45104462898aba69"
},
"simulationID": "001",
"simulationPartID": 7,
"timePass": 2,
"status": [
[
1,
1.34022987724954e-40,
0.00220799725502729,
20,
114.911392211914
],
[
2,
0.00217749993316829,
0.00220799725502729,
20,
-2.0458550453186
],
[
3,
0.0020274999551475,
0.00235799723304808,
20,
-1.33439755439758
],
[
4,
3.36311631437956e-44,
0.00235799723304808,
20,
148.233413696289
],
[
5,
1.02169119449431e-38,
0.000149997213156894,
20,
-25633.59765625
],
]
}
Thank you!
Do you create the index on every query? An index only needs to be created once, for example when you deploy the application.
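As a rough sketch of moving the index creation out of the request handler: the helper name ensure_indexes is hypothetical, and the (simulationPartID, timePass) key choice is only an assumption based on the query shown in the question. The one PyMongo call used is Collection.create_index.

```python
def ensure_indexes(collection):
    """Create the index the query relies on; meant to run once at startup."""
    # Assumption: the query filters on simulationPartID and the result is
    # ordered by timePass, so a compound index on both is a plausible choice.
    return collection.create_index([("simulationPartID", 1), ("timePass", 1)])
```

It would be called once when the Flask app starts, for example ensure_indexes(db['node_data']), instead of inside node_history.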
Your find returns the full document, which is not needed. You can limit the result with $slice:
db.node_data.find({"simulationPartID": 7}, {"timePass": 1, "status": { '$slice': [ 3, 1 ] } } )
This should return the data much faster because it returns only the values you need.
If you want to select a sub-element from the array, you can use an aggregation pipeline like this one:
db.collection.aggregate([
   { $match: { timePass: 2 } },
   { $set: { status: { $arrayElemAt: [ "$status", 4 ] } } },
   { $set: { status: { $arrayElemAt: [ "$status", 3 ] } } },
])
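Adapted to the question's node_history function, the same idea might look like the sketch below. The function name build_node_history_pipeline and the output field name "value" are hypothetical; the stages only use $match, $project, $arrayElemAt and $sort. The pipeline is built as plain Python data so it can be inspected without a database.

```python
def build_node_history_pipeline(node_id, case, part_id=7):
    """Build an aggregation pipeline that returns only timePass and one value."""
    # Position of the wanted value inside each node's array (from the question).
    value_positions = {"Temperature": 3, "Stress": 4}
    if case not in value_positions:
        raise ValueError("unknown case: %r" % (case,))
    # Pick the node's array first, then the single value inside it.
    node = {"$arrayElemAt": ["$status", int(node_id) - 1]}
    value = {"$arrayElemAt": [node, value_positions[case]]}
    return [
        {"$match": {"simulationPartID": part_id}},
        {"$project": {"_id": 0, "timePass": 1, "value": value}},
        {"$sort": {"timePass": 1}},
    ]
```

The result could then be consumed with something like [[d["timePass"], d["value"]] for d in db.node_data.aggregate(pipeline)], so the server sends back two small fields per document instead of the whole status array.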
I'm trying to create my first choropleth map with folium.
I have a pandas dataframe which I have successfully used to feed a folium circle map, so I'm pretty confident I don't have an issue with the data content. The below function generated my map just fine.
def plot_cases(cases_date):
    folium_map = folium.Map(location=[40.738, -73.98],
                            zoom_start=1,
                            tiles="CartoDB dark_matter",
                            width='100%')
    # add a circle marker for each row in data
    for index, row in cases_date.iterrows():
        # generate the popup message that is shown on click.
        popup_text = "Cases: {}<br> Country: {}<br> Province/State: {}"
        popup_text = popup_text.format(row["Cases"], row["Country/Region"], row["Province/State"])
        # radius of circle
        radius = row["Cases"]/1000
        # choose the color of the marker
        if row["Cases"] > 0:
            color = "#E37222"  # tangerine
        else:
            color = "#0A8A9F"  # teal
        # add circle on map
        folium.CircleMarker(location=(row["Lat"],
                                      row["Long"]),
                            radius=radius,
                            color=color,
                            popup=popup_text,
                            fill=True).add_to(folium_map)
    return folium_map
Now, when I try and generate my choropleth with the same pandas df, the only new concept I introduce is grabbing a world map geoJSON file from a git repo:
# get geoJSON
req = requests.get('https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json')
geofile = req.json()
# instantiate map and enter choropleth params
m = folium.Map(location=[40.738, -73.98], zoom_start=1, tiles="CartoDB dark_matter", width='100%')
choropleth = folium.Choropleth(
    geo_data=geofile,
    name='choropleth',
    data=choro_data,
    columns=['Country/Region', 'Cases'],
    key_on='feature.properties.name',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Cases',
    highlight=True,
    line_color='black'
).add_to(m)
folium.LayerControl(collapsed=True).add_to(m)
m
The above choropleth code generates the following traceback:
TypeError Traceback (most recent call last)
<ipython-input-15-27efeb8f5173> in <module>
12 legend_name='Cases',
13 highlight=True,
---> 14 line_color='black'
15 ).add_to(m)
16
~/cases/map/lib/python3.6/site-packages/folium/features.py in __init__(self, geo_data, data, columns, key_on, bins, fill_color, nan_fill_color, fill_opacity, nan_fill_opacity, line_color, line_weight, line_opacity, name, legend_name, overlay, control, show, topojson, smooth_factor, highlight, **kwargs)
1076 if color_data is not None and key_on is not None:
1077 real_values = np.array(list(color_data.values()))
-> 1078 real_values = real_values[~np.isnan(real_values)]
1079 _, bin_edges = np.histogram(real_values, bins=bins)
1080
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
hpaulj suggested there is a problem with the json. You can view the json at the source, but I also pulled out a sample from print(geofile):
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"name": "Afghanistan"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
61.210817,
35.650072
],
[
62.230651,
35.270664
],
[
62.984662,
35.404041
],
[
63.193538,
35.857166
],
[
63.982896,
36.007957
],
[
64.546479,
36.312073
],
[
64.746105,
37.111818
],
[
65.588948,
37.305217
],
[
65.745631,
37.661164
],
[
66.217385,
37.39379
],
[
66.518607,
37.362784
],
[
67.075782,
37.356144
],
[
67.83,
37.144994
],
[
68.135562,
37.023115
],
[
68.859446,
37.344336
],
[
69.196273,
37.151144
],
[
69.518785,
37.608997
],
[
70.116578,
37.588223
],
[
70.270574,
37.735165
],
[
70.376304,
38.138396
],
[
70.806821,
38.486282
],
[
71.348131,
38.258905
],
[
71.239404,
37.953265
],
[
71.541918,
37.905774
],
[
71.448693,
37.065645
],
[
71.844638,
36.738171
],
[
72.193041,
36.948288
],
[
72.63689,
37.047558
],
[
73.260056,
37.495257
],
[
73.948696,
37.421566
],
[
74.980002,
37.41999
],
[
75.158028,
37.133031
],
[
74.575893,
37.020841
],
[
74.067552,
36.836176
],
[
72.920025,
36.720007
],
[
71.846292,
36.509942
],
[
71.262348,
36.074388
],
[
71.498768,
35.650563
],
[
71.613076,
35.153203
],
[
71.115019,
34.733126
],
[
71.156773,
34.348911
],
[
70.881803,
33.988856
],
[
69.930543,
34.02012
],
[
70.323594,
33.358533
],
[
69.687147,
33.105499
],
[
69.262522,
32.501944
],
[
69.317764,
31.901412
],
[
68.926677,
31.620189
],
[
68.556932,
31.71331
],
[
67.792689,
31.58293
],
[
67.683394,
31.303154
],
[
66.938891,
31.304911
],
[
66.381458,
30.738899
],
[
66.346473,
29.887943
],
[
65.046862,
29.472181
],
[
64.350419,
29.560031
],
[
64.148002,
29.340819
],
[
63.550261,
29.468331
],
[
62.549857,
29.318572
],
[
60.874248,
29.829239
],
[
61.781222,
30.73585
],
[
61.699314,
31.379506
],
[
60.941945,
31.548075
],
[
60.863655,
32.18292
],
[
60.536078,
32.981269
],
[
60.9637,
33.528832
],
[
60.52843,
33.676446
],
[
60.803193,
34.404102
],
[
61.210817,
35.650072
]
]
]
},
"id": "AFG"
}
]
}
OK, it looks as if the JSON was a red herring. Code further up in my script was feeding the wrong dataframe to data=choro_data, so folium was effectively receiving malformed data and throwing the error.
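One quick sanity check that might catch this kind of problem before the data reaches folium.Choropleth is to look for values that are not numbers. This is only a sketch: the helper name find_non_numeric is hypothetical, and it assumes the value column (here 'Cases') is supposed to hold plain numbers.

```python
from numbers import Real


def find_non_numeric(values):
    """Return (position, value) pairs for entries that are not real numbers."""
    return [
        (position, value)
        for position, value in enumerate(values)
        # bool is technically a number in Python, but it is almost certainly
        # not a valid case count, so it is reported as well.
        if isinstance(value, bool) or not isinstance(value, Real)
    ]


print(find_non_numeric([120, 3.5, "45", None]))
# [(2, '45'), (3, None)]
```

With a dataframe it could be called as find_non_numeric(choro_data['Cases'].tolist()); an empty list means every value is numeric.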
This question was edited. Please see the edit on the bottom first.
This question is going to be a bit long so I'm sorry in advance. Please consider two different types of data:
Data A:
{
"files": [
{
"name": "abc",
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
}
]
}
Data B:
{
"files": [
{
"methods": {
"invalid": [
"func2",
"func8"
],
"valid": [
"func4",
"func1",
"func3"
]
},
"classes": [
{
"invalid": [
"class1",
"class2"
],
"valid": [
"class8",
"class5"
],
"name": "class1"
}
],
"name": "abc"
}
]
}
I'm trying to merge each file (A files with A and B files with B). A previous question helped me figure out how to do it, but I got stuck again.
As I said in the previous question, there is a rule for merging the files. I'll explain again:
Consider two dictionaries A1 and A2. I want to merge invalid of A1 with A2 and valid of A1 with A2. The merge should be easy enough, but the problem is that the data of invalid and valid depend on each other.
The rule of that dependency: if item x is valid in A1 and invalid in A2, then it's valid in the merged report.
The only way to be invalid is to be in the invalid list of both A1 and A2 (or invalid in one of them while not existing in the other).
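The rule can be written down compactly with sets. This is a minimal sketch for a single pair of valid/invalid lists; the function name merge_valid_invalid and the sorted output are choices made for the illustration, not part of the original code.

```python
def merge_valid_invalid(first, second):
    """Merge two {'valid': [...], 'invalid': [...]} blocks."""
    # Valid in at least one block means valid in the result.
    valid = set(first["valid"]) | set(second["valid"])
    # Everything else that was reported as invalid stays invalid.
    invalid = (set(first["invalid"]) | set(second["invalid"])) - valid
    return {"valid": sorted(valid), "invalid": sorted(invalid)}


a1 = {"valid": ["func4", "func1"], "invalid": ["func3"]}
a2 = {"valid": ["func4", "func1", "func3"], "invalid": ["func2", "func8"]}
print(merge_valid_invalid(a1, a2))
# {'valid': ['func1', 'func3', 'func4'], 'invalid': ['func2', 'func8']}
```

Here func3 is invalid in a1 but valid in a2, so it ends up valid, while func2 and func8 appear only as invalid and stay invalid.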
In order to merge the A files I wrote the following code:
def merge_A_files(self, src_report):
    for current_file in src_report["files"]:
        filename_index = next((index for (index, d) in enumerate(self.A_report["files"]) if d["name"] == current_file["name"]), None)
        if filename_index == None:
            new_block = {}
            new_block['valid'] = current_file['valid']
            new_block['invalid'] = current_file['invalid']
            new_block['name'] = current_file['name']
            self.A_report["files"].append(new_block)
        else:
            block_to_merge = self.A_report["files"][filename_index]
            merged_block = {'valid': [], 'invalid': []}
            merged_block['valid'] = list(set(block_to_merge['valid'] + current_file['valid']))
            merged_block['invalid'] = list({i for l in [block_to_merge['invalid'], current_file['invalid']]
                                            for i in l if i not in merged_block['valid']})
            merged_block['name'] = current_file['name']
            self.A_report["files"][filename_index] = merged_block
For merging B files I wrote:
def _merge_functional_files(self, src_report):
    for current_file in src_report["files"]:
        filename_index = next((index for (index, d) in enumerate(self.B_report["files"]) if d["name"] == current_file["name"]), None)
        if filename_index == None:
            new_block = {'methods': {}, 'classes': []}
            new_block['methods']['valid'] = current_file['methods']['valid']
            new_block['methods']['invalid'] = current_file['methods']['invalid']
            new_block['classes'] += [{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']]
            new_block['name'] = current_file['name']
            self.B_report["files"].append(new_block)
        else:
            block_to_merge = self.B_report["files"][filename_index]
            merged_block = {'methods': {}, 'classes': []}
            for current_class in block_to_merge["classes"]:
                current_classname = current_class.get("name")
                class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
                if class_index == None:
                    merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
                else:
                    class_block_to_merge = merged_block["classes"][class_index]
                    class_merged_block = {'valid': [], 'invalid': []}
                    class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
                    class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
                                                          for i in l if i not in class_merged_block['valid']})
                    class_merged_block['name'] = current_classname
                    merged_block["classes"][filename_index] = class_merged_block
            merged_block['methods']['valid'] = list(set(block_to_merge['methods']['valid'] + current_file['methods']['valid']))
            merged_block['methods']['invalid'] = list({i for l in [block_to_merge['methods']['invalid'], current_file['methods']['invalid']]
                                                       for i in l if i not in merged_block['methods']['valid']})
            merged_block['name'] = current_file['name']
            self.B_report["files"][filename_index] = merged_block
It looks like the code for A is valid and works as expected. But I have a problem with B, especially with merging classes. The example I have a problem with:
First file:
{
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1"
],
"invalid": [
"func3"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class1",
"class2"
],
"invalid": [
"class3",
"class5"
]
}
]
}
]
}
Second file:
{
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class8",
"class5"
],
"invalid": [
"class1",
"class2"
]
}
]
}
]
}
I get:
{
"files": [
{
"methods": {
"invalid": [
"func2",
"func8"
],
"valid": [
"func3",
"func1",
"func4"
]
},
"classes": [
{
"invalid": [
"class5",
"class3"
],
"valid": [
"class2",
"class1"
],
"name": "class1"
}
],
"name": "some_file1"
}
]
}
But it's wrong because for example class5 should be valid.
So my questions are:
I would love to have another set of eyes to check my code and help me find out the reason for this issue.
Those two methods got so complicated that it's hard to debug it. I would love to see an alternative, less complicated way to achieve it. Maybe some generic solution?
Edit: My first explanation was too complicated. I'll try to explain what I'm trying to achieve. For those of you who read the topic (appreciate it!), please forget about data type A (for simplicity). Consider data type B (which was shown at the start). I'm trying to merge a bunch of B files. As I understand it, the algorithm for that is:
Iterate over files.
Check if the file is already located in the merged dictionary.
    If no, we should add the file block to the files array.
    If yes:
        Merge methods dictionary.
        Merge classes array.
To merge methods: a method is invalid only if it's invalid in both of the blocks. Otherwise, it's valid.
To merge classes: It's getting more complicated because it's an array. I want to follow the same rule that I used for methods, but first I need to find the index of each block in the array.
The main problem is with merging classes. Can you please suggest a simple way to merge B type files?
It would be great if you could provide an expected output for the example you're showing. Based on my understanding, what you're trying to achieve is:
You're given multiple JSON files, each of which contains a "files" entry, which is a list of dictionaries with the structure:
{
"name": "file_name",
"methods": {
"invalid": ["list", "of", "names"],
"valid": ["list", "of", "names"]
},
"classes": [
{
"name": "class_name",
"invalid": ["list", "of", "names"],
"valid": ["list", "of", "names"]
}
]
}
You wish to merge structures from multiple files, so that file entries with the same "name" are merged together, according to the following rule:
For each name inside "methods": it goes into "valid" if it is in the "valid" array in at least one file entry; otherwise it goes into "invalid".
Classes with the same "name" are also merged together, and names inside the "valid" and "invalid" arrays are merged according to the above rule.
The following analysis of your code assumes my understanding as stated above. Let's look at the code snippet for merging classes:
block_to_merge = self.B_report["files"][filename_index]
merged_block = {'methods': {}, 'classes': []}
for current_class in block_to_merge["classes"]:
    current_classname = current_class.get("name")
    class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
    if class_index == None:
        merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
    else:
        class_block_to_merge = merged_block["classes"][class_index]
        class_merged_block = {'valid': [], 'invalid': []}
        class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
        class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
                                              for i in l if i not in class_merged_block['valid']})
        class_merged_block['name'] = current_classname
        merged_block["classes"][filename_index] = class_merged_block
The code is logically incorrect because:
You're iterating over each class dictionary from block_to_merge["classes"], which is the previous merged block.
The new merged block (merged_block) is initialized to an empty dictionary.
In the case where class_index is None, the class dictionary in merged_block is set to the class dictionary in the previous merged block.
If you think about it, class_index will always be None, because current_class is enumerated from block_to_merge["classes"], which is already merged. Thus, what gets written into the merged_block is only the "classes" entries from the first file entry for a file. In your example, you can verify that the "classes" entry is exactly the same as that in the first file.
That said, your overall idea of how to merge the files is correct, but implementation-wise it could be a lot simpler (and more efficient). I'll first point out the non-optimal parts of your code, and then provide a simpler solution.
You're directly storing the data in its output form, however, it's not a form that is efficient for your task. It's perfectly fine to store them in a form that is efficient, and then apply post-processing to transform it into the output form. For instance:
You're using next to find an existing entry in the list with the same "name", but this could take linear time. Instead, you can store these in a dictionary, with "name" as keys.
You're also storing valid & invalid names as a list. While merging, it's converted into a set and then back into a list. This results in a large number of redundant copies. Instead, you can just store them as sets.
You have some duplicate routines that could have been extracted into functions, but instead you rewrote them wherever needed. This violates the DRY principle and increases your chances of introducing bugs.
A revised version of the code is as follows:
class Merger:
    def __init__(self):
        # A structure optimized for efficiency:
        # dict (file_name) -> {
        #   "methods": {
        #     "valid": set(names),
        #     "invalid": set(names),
        #   }
        #   "classes": dict (class_name) -> {
        #     "valid": set(names),
        #     "invalid": set(names),
        #   }
        # }
        self.file_dict = {}
    def _create_entry(self, new_entry):
        return {
            "valid": set(new_entry["valid"]),
            "invalid": set(new_entry["invalid"]),
        }
    def _merge_entry(self, merged_entry, new_entry):
        merged_entry["valid"].update(new_entry["valid"])
        merged_entry["invalid"].difference_update(new_entry["valid"])
        for name in new_entry["invalid"]:
            if name not in merged_entry["valid"]:
                merged_entry["invalid"].add(name)
    def merge_file(self, src_report):
        # Method called to merge one file.
        for current_file in src_report["files"]:
            file_name = current_file["name"]
            # Merge methods.
            if file_name not in self.file_dict:
                self.file_dict[file_name] = {
                    "methods": self._create_entry(current_file["methods"]),
                    "classes": {},
                }
            else:
                self._merge_entry(self.file_dict[file_name]["methods"], current_file["methods"])
            # Merge classes.
            file_class_entry = self.file_dict[file_name]["classes"]
            for class_entry in current_file["classes"]:
                class_name = class_entry["name"]
                if class_name not in file_class_entry:
                    file_class_entry[class_name] = self._create_entry(class_entry)
                else:
                    self._merge_entry(file_class_entry[class_name], class_entry)
    def post_process(self):
        # Method called after all files are merged, and returns the data in its output form.
        return [
            {
                "name": file_name,
                "methods": {
                    "valid": list(file_entry["methods"]["valid"]),
                    "invalid": list(file_entry["methods"]["invalid"]),
                },
                "classes": [
                    {
                        "name": class_name,
                        "valid": list(class_entry["valid"]),
                        "invalid": list(class_entry["invalid"]),
                    }
                    for class_name, class_entry in file_entry["classes"].items()
                ],
            }
            for file_name, file_entry in self.file_dict.items()
        ]
We can test the implementation by:
def main():
    a = {
        "files": [
            {
                "name": "some_file1",
                "methods": {
                    "valid": ["func4", "func1"],
                    "invalid": ["func3"]
                },
                "classes": [
                    {
                        "name": "class1",
                        "valid": ["class1", "class2"],
                        "invalid": ["class3", "class5"]
                    }
                ]
            }
        ]
    }
    b = {
        "files": [
            {
                "name": "some_file1",
                "methods": {
                    "valid": ["func4", "func1", "func3"],
                    "invalid": ["func2", "func8"]
                },
                "classes": [
                    {
                        "name": "class1",
                        "valid": ["class8", "class5"],
                        "invalid": ["class1", "class2"]
                    }
                ]
            }
        ]
    }
    import pprint
    merge = Merger()
    merge.merge_file(a)
    merge.merge_file(b)
    output = merge.post_process()
    pprint.pprint(output)
if __name__ == '__main__':
    main()
The output is:
[{'classes': [{'invalid': ['class3'],
               'name': 'class1',
               'valid': ['class2', 'class5', 'class8', 'class1']}],
  'methods': {'invalid': ['func2', 'func8'],
              'valid': ['func1', 'func4', 'func3']},
  'name': 'some_file1'}]
I'm facing some difficulties while trying to aggregate JSON attributes.
Basically, what I'm trying to do is to groupBy the objects in 'InputTable' array by two attributes 'To' and 'TemplateName'. The JSON template looks as follows:
x = {
"InputTable" :
[
{
"ServerName":"ServerOne",
"To":"David",
"CC":"Oren",
"TemplateName":"LinuxVMOne",
},
{
"ServerName":"ServerTwo",
"To":"David",
"CC":"",
"TemplateName":"LinuxVMOne",
},
{
"ServerName":"ServerThree",
"To":"David",
"CC":"",
"TemplateName":"LinuxVMTwo",
},
{
"ServerName":"ServerFour",
"To":"Sam",
"CC":"Samer",
"TemplateName":"LinuxVMOne",
}
]
}
Expected results would look something like this, list of lists with grouped objects:
[
[
{
"ServerName":"ServerOne",
"To":"David",
"CC":"Oren",
"TemplateName":"LinuxVMOne"
},
{
"ServerName":"ServerTwo",
"To":"David",
"CC":"",
"TemplateName":"LinuxVMOne",
},
],
[
{
"ServerName":"ServerThree",
"To":"David",
"CC":"",
"TemplateName":"LinuxVMTwo",
},
],
[
{
"ServerName":"ServerFour",
"To":"Sam",
"CC":"Samer",
"TemplateName":"LinuxVMOne",
}
]
]
Is it possible to do it without using pandas?
Thank you.
This code works, but I think it could be made cleaner:
y = []
for i in x["InputTable"]:
    for j in y:
        # Compare against the first item of each existing group.
        if j[0]["To"] == i["To"] and j[0]["TemplateName"] == i["TemplateName"]:
            j.append(i)
            break
    else:
        # No existing group matched, so start a new one.
        y.append([i])
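A possibly cleaner alternative, still without pandas, is to collect the groups in a dictionary keyed on the two attributes. This is a minimal sketch: the function name group_by_keys is hypothetical, and the sample rows are a shortened version of the question's data. It relies on dictionaries keeping insertion order (Python 3.7+).

```python
def group_by_keys(rows, keys=("To", "TemplateName")):
    """Group dictionaries that share the same values for the given keys."""
    groups = {}
    for row in rows:
        # The tuple of key values identifies the group this row belongs to.
        group_key = tuple(row[key] for key in keys)
        groups.setdefault(group_key, []).append(row)
    return list(groups.values())


rows = [
    {"ServerName": "ServerOne", "To": "David", "TemplateName": "LinuxVMOne"},
    {"ServerName": "ServerTwo", "To": "David", "TemplateName": "LinuxVMOne"},
    {"ServerName": "ServerThree", "To": "David", "TemplateName": "LinuxVMTwo"},
    {"ServerName": "ServerFour", "To": "Sam", "TemplateName": "LinuxVMOne"},
]
grouped = group_by_keys(rows)
print([[row["ServerName"] for row in group] for group in grouped])
# [['ServerOne', 'ServerTwo'], ['ServerThree'], ['ServerFour']]
```

With the question's data it would be called as group_by_keys(x["InputTable"]). Each row is looked up once, so the cost does not grow with the number of groups.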
I have the following dict. I would like to loop through its keys and values,
i.e. for the items in ice/cold, print the values.
[
{
"ice/cold": [
"vanilla",
"hotchoc",
"mango",
"banana"
]
},
{
"fire/hot": [
"barbecue",
"hotsalsa",
"sriracha",
"kirikiri"
]
},
{
"friendly/mild": [
"ketchup",
"mustard",
"ranch",
"dipster"
]
}
]
Tried this:
data='*above set*'
for key in data.items():
    print(value)
but gives me error
AttributeError: 'list' object has no attribute 'items'
The data structure you have is a bit strange. You don't have a single dict, you have a list of dicts, each with a single key which itself contains a list. You could do this:
for item in data:
    for key, value in item.items():
        print(value)
but a better way would be to change the structure so you only have a single dict:
{
"ice/cold": [
"vanilla",
"hotchoc",
"mango",
"banana"
],
"fire/hot": [
"barbecue",
"hotsalsa",
"sriracha",
"kirikiri"
],
"friendly/mild": [
"ketchup",
"mustard",
"ranch",
"dipster"
]
}
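If the list of single-key dicts cannot be changed at the source, it can be flattened into one dict in a single step. This is a small sketch using a shortened version of the data; the variable name merged is arbitrary, and if the same key appeared in several items, the last one would win.

```python
data = [
    {"ice/cold": ["vanilla", "hotchoc"]},
    {"fire/hot": ["barbecue", "hotsalsa"]},
]

# Flatten the list of single-key dicts into one dict.
merged = {key: values for item in data for key, values in item.items()}

for flavour in merged["ice/cold"]:
    print(flavour)
# vanilla
# hotchoc
```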
Here data is actually a list, not a dictionary,
and every element of the list is a dictionary. So just loop through all elements of the list and check whether each key is the one you want.
Here is the code:
data= [
{
"ice/cold": [
"vanilla",
"hotchoc",
"mango",
"banana"
]
},
{
"fire/hot": [
"barbecue",
"hotsalsa",
"sriracha",
"kirikiri"
]
},
{
"friendly/mild": [
"ketchup",
"mustard",
"ranch",
"dipster"
]
}
]
for items in data:
    for key, value in items.items():
        if key == "ice/cold":
            print(value)