Python dictionary comprehension filtering - python

I have a list of dictionaries, for instance :
movies = [
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
},
{
"name": "The Choice",
"imdb": 6.2,
"category": "Romance"
},
{
"name": "Colonia",
"imdb": 7.4,
"category": "Romance"
},
{
"name": "Love",
"imdb": 6.0,
"category": "Romance"
},
{
"name": "Bride Wars",
"imdb": 5.4,
"category": "Romance"
},
{
"name": "AlphaJet",
"imdb": 3.2,
"category": "War"
},
{
"name": "Ringing Crime",
"imdb": 4.0,
"category": "Crime"
}
]
I want to keep only the movies with an IMDB score above 5.5.
I tried this code:
[ { k:v for (k,v) in i.items() if i.get("imdb") > 5.5 } for i in movies]
and the output:
[{'name': 'The Help', 'imdb': 8.0, 'category': 'Drama'},
{'name': 'The Choice', 'imdb': 6.2, 'category': 'Romance'},
{'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'},
{'name': 'Love', 'imdb': 6.0, 'category': 'Romance'},
{},
{},
{}]
When the IMDB score is 5.5 or lower, it returns an empty dictionary instead of dropping the entry. Any ideas? Thank you!

A dictionary comprehension is not necessary to filter a list of dictionaries.
You can just use a list comprehension with a condition based on a dictionary value:
res = [d for d in movies if d['imdb'] > 5.5]
The way your code is written, the dictionary comprehension produces an empty dictionary in cases where i['imdb'] <= 5.5.
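To see the difference concretely, here is a small sketch on a shortened copy of the movie list (the two-entry `sample` list is made up for illustration). A condition inside the dictionary comprehension only filters key-value pairs, while a condition on the list comprehension filters whole dictionaries. The `.get("imdb", 0)` default is an assumption for entries that might lack a score; the data in the question always has one.

```python
sample = [
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
]

# Condition inside the dict comprehension: every movie still yields a dict,
# it is just empty when the score is too low.
inner = [{k: v for k, v in d.items() if d["imdb"] > 5.5} for d in sample]

# Condition on the list comprehension: low-scoring movies are dropped entirely.
# The default of 0 is an assumption for entries without an "imdb" key.
outer = [d for d in sample if d.get("imdb", 0) > 5.5]

print(inner)
print(outer)
```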

An alternative to a list comprehension is the built-in filter function. It takes a function and an iterable and returns a "filter object" that yields only the items for which the function returns a truthy value.
In this case, it would be:
list(filter(lambda x:x["imdb"]>5.5, movies))
I wrapped the call in list() to convert the filter object into a list you can use. The filter builtin is documented under "Built-in Functions" in the official Python documentation.
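One thing worth knowing about the filter object: it is an iterator, so it can be consumed only once. The sketch below uses a shortened, made-up `movies` list to show that a second pass over the same filter object yields nothing, which is one reason to wrap it in list() if you need the result more than once.

```python
movies = [
    {"name": "The Help", "imdb": 8.0},
    {"name": "Bride Wars", "imdb": 5.4},
    {"name": "Colonia", "imdb": 7.4},
]

good = filter(lambda m: m["imdb"] > 5.5, movies)

first_pass = [m["name"] for m in good]   # consumes the iterator
second_pass = [m["name"] for m in good]  # nothing left to yield

print(first_pass)
print(second_pass)
```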

Other answers have already provided better alternative ways of doing this but let's look at the way you were going about it and look at what was going on.
If I delete some things from your code, I get:
[{} for i in movies]
Looking at just that should make it clear why a dictionary is created for each movie. You do have an if clause inside the dictionary comprehension, but because it is inside, it only decides which key-value pairs go into the dictionary, not whether the dictionary is created.
To do this the way you were going about it, you'd essentially need to check twice, which makes the inner check redundant:
[
{ k:v for (k,v) in i.items() if i.get("imdb") > 5.5 } for i in movies if i.get("imdb") > 5.5
]
which can be simplified to just
[
{ k:v for (k,v) in i.items()} for i in movies if i.get("imdb") > 5.5
]
and now, since we aren't changing the item, just:
[
i for i in movies if i.get("imdb") > 5.5
]

If you are happy to use a 3rd party library, Pandas accepts a list of dictionaries via the pd.DataFrame constructor:
import pandas as pd
df = pd.DataFrame(movies)
res = df[df['imdb'] > 5.5].to_dict('records')
Result:
[{'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
{'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'},
{'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
{'category': 'Romance', 'imdb': 6.0, 'name': 'Love'}]

Related

Properly form JSON with df.to_json with dataframe containing nested json

I have the following situation:
id items
3b68b7b2-f42c-418b-aa88-02450d66b616 [{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, {quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}, {quantity=10.0, item_id=8dbd22f2-cc13-4776-b58c-4d6fe0f3463e, description=abc def}]
where one of my columns has a nested JSON list inside of it.
I wish to output the data of this dataframe as proper JSON, including the nested list.
So, for example, calling df.to_json(orient='records', indent=4) on the above dataframe yields:
[
{
"id": "3b68b7b2-f42c-418b-aa88-02450d66b616",
"items": "[{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, {quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}, {quantity=10.0, item_id=8dbd22f2-cc13-4776-b58c-4d6fe0f3463e, description=abc def}]"
}
]
whereas I want:
[
{
"id": "3b68b7b2-f42c-418b-aa88-02450d66b616",
"items": [
{
"quantity": 3.0,
"item_id": "210defdb-de69-4d03-bddd-7db626cd501b",
"description": "Abc"
},
{
"quantity": 1.0,
"item_id": "ff457660-5f30-4432-a5af-564a9dee0029",
"description": "xyz . 23"
},
{
"quantity": 10.0,
"item_id": "8dbd22f2-cc13-4776-b58c-4d6fe0f3463e",
"description": "abc def"
}
]
}
]
Is this possible using df.to_json()? I have tried to use regex to parse the resulting string, but due to the data contained therein, it is unfortunately extremely difficult to "jsonify" the fields I want.
You don't have a list but a string, and this string is not valid json, so you need a bit of pre-processing.
Assuming a non-nested structure, you can use:
import json
out = (df.assign(items=df['items'].str.replace(r'(\w+)=([^,}]+)', r'"\1": "\2"', regex=True).apply(json.loads))
.to_dict(orient='records')
)
Output:
[{'id': '3b68b7b2-f42c-418b-aa88-02450d66b616',
'items': [{'description': 'Abc',
'item_id': '210defdb-de69-4d03-bddd-7db626cd501b',
'quantity': '3.0'},
{'description': 'xyz . 23',
'item_id': 'ff457660-5f30-4432-a5af-564a9dee0029',
'quantity': '1.0'},
{'description': 'abc def',
'item_id': '8dbd22f2-cc13-4776-b58c-4d6fe0f3463e',
'quantity': '10.0'}]}]
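Note that after this substitution every value is a string, including quantity. If you want numbers back, one option is a second pass after json.loads. The sketch below reproduces the same substitution with the standard re module on a shortened copy of the string, without pandas; the choice to convert only the quantity field is an assumption based on the desired output in the question. A description containing a comma or a closing brace would still break the pattern.

```python
import json
import re

raw = (
    "[{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, "
    "{quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}]"
)

# Same substitution as above: turn key=value into "key": "value".
as_json = re.sub(r'(\w+)=([^,}]+)', r'"\1": "\2"', raw)
items = json.loads(as_json)

# Every value is now a string; convert the numeric field back to float.
# (Assumption: quantity is the only numeric field.)
for item in items:
    item["quantity"] = float(item["quantity"])

print(json.dumps(items, indent=4))
```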

Python Dictionary error list indices must be integers or slices, not str

I have the data below and am trying to get a list of movies with an IMDB score greater than 5, but I keep getting the error 'list indices must be integers or slices, not str'.
movies = [{
"name": "Usual Suspects",
"imdb": 7.0,
"category": "Thriller"
},
{
"name": "Hitman",
"imdb": 6.3,
"category": "Action"
},
{
"name": "Dark Knight",
"imdb": 9.0,
"category": "Adventure"
},
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
}]
Any help would be appreciated
Classic way:
# lst = []
for movie in movies:
    if movie['imdb'] > 5:
        print(movie['name'])
        # lst.append(movie['name'])
Output:
Usual Suspects
Hitman
Dark Knight
The Help
With list comprehension:
print ([ movie['name'] for movie in movies if movie['imdb'] > 5 ])
Output:
['Usual Suspects', 'Hitman', 'Dark Knight', 'The Help']
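You have a list of dictionaries, so iterate over the list and index each dictionary. Try the following code:
movies_with_good_rating = []
for movie in movies:
    # compare the score directly; int() would truncate e.g. 5.4 to 5
    if movie.get('imdb') > 5:
        movies_with_good_rating.append(movie.get('name'))
print(movies_with_good_rating)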
You have a list of dicts, and the output you need is a list of the movies with an IMDB score greater than 5. Try the code below.
movies_list = []
for dictionary in movies:
    if dictionary["imdb"] > 5:
        movies_list.append(dictionary["name"])
print(movies_list)
Output:
['Usual Suspects', 'Hitman', 'Dark Knight', 'The Help']
This is the classic index-based loop:
for i in range(0, len(movies)):
    if movies[i]["imdb"] > 5:
        print(movies[i]['name'])
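The question does not show the line that fails, so the following is only a guess at the cause: indexing the list itself with a string key, as if it were a single dictionary, raises exactly this error. The sketch uses a shortened copy of the data.

```python
movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
]

try:
    movies["imdb"]  # indexing the list itself with a string key
except TypeError as exc:
    message = str(exc)
    print(message)

# Index the list with an integer first, then the dictionary with a key.
first_score = movies[0]["imdb"]
print(first_score)
```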

Python averaging list of lists of nested dicts

I have a list with this structure:
data = [[
{
"id": 713,
"prediction": 4.8,
"confidence": [
{"percentile": "75", "lower": 4.8, "upper": 5.7}
],
},
{
"id": 714,
"prediction": 4.93,
"confidence": [
{"percentile": "75", "lower": 4.9, "upper": 5.7}
],
},
],
[
{
"id": 713,
"prediction": 5.8,
"confidence": [
{"percentile": "75", "lower": 4.2, "upper": 6.7}
],
},
{
"id": 714,
"prediction": 2.93,
"confidence": [
{"percentile": "75", "lower": 1.9, "upper": 3.7}
],
},
]]
So here we have a list containing two lists, but there could be more. Each list consists of predictions, each with an id and its confidence intervals in another list with a dict.
What I need is to merge these lists so I have one dict per id with the average of the numeric values.
I have tried searching but have not found an answer that matches this nested structure.
The expected output would look like this:
merged_data = [
{
"id": 713,
"prediction": 5.3,
"confidence": [
{"percentile": "75", "lower": 4.5, "upper": 6.2}
],
},
{
"id": 714,
"prediction": 3.93,
"confidence": [
{"percentile": "75", "lower": 3.4, "upper": 4.7}
],
},
]
def merge_items(items):
    result = {}
    if len(items):
        result['id'] = items[0]['id']
        result['prediction'] = round(sum([item['prediction'] for item in items]) / len(items), 2)
        result['confidence'] = []
        result['confidence'].append({
            'percentile': items[0]['confidence'][0]['percentile'],
            'lower': round(sum(item['confidence'][0]['lower'] for item in items) / len(items), 2),
            'upper': round(sum(item['confidence'][0]['upper'] for item in items) / len(items), 2),
        })
    return result
result = []
ids = list(set([el['id'] for item in data for el in item]))
for id in ids:
    to_merge = [sub_item for item in data for sub_item in item if sub_item['id'] == id]
    result.append(merge_items(to_merge))
print(result)
dicc = {}
for e in data:  # `data` is the outer list from the question
    for d in e:
        if d["id"] not in dicc:
            dicc[d["id"]] = {"prediction": [], "lower": [], "upper": []}
        dicc[d["id"]]["prediction"].append(d["prediction"])
        dicc[d["id"]]["lower"].append(d["confidence"][0]["lower"])
        dicc[d["id"]]["upper"].append(d["confidence"][0]["upper"])
for k in dicc:
    dicc[k]["average_prediction"] = sum(dicc[k]["prediction"])/len(dicc[k]["prediction"])
    dicc[k]["average_lower"] = sum(dicc[k]["lower"])/len(dicc[k]["lower"])
    dicc[k]["average_upper"] = sum(dicc[k]["upper"])/len(dicc[k]["upper"])
print(dicc)
{713: {'prediction': [4.8, 5.8], 'lower': [4.8, 4.2], 'upper': [5.7, 6.7], 'average_prediction': 5.3, 'average_lower': 4.5, 'average_upper': 6.2}, 714: {'prediction': [4.936893921359024, 2.936893921359024], 'lower': [4.9, 1.9], 'upper': [5.7, 3.7], 'average_prediction': 3.936893921359024, 'average_lower': 3.4000000000000004, 'average_upper': 4.7}}
You really have three parts to this question.
How do you unpack the lists and group by the ids in preparation for some kind of aggregation? You have lots of options, but a pretty classic one is to make a lookup table and append any new values:
groups = {}
# `data` is the outer list in your nested structure
for d in (d for L in data for d in L):
    L = groups.get(d['id'], [])
    L.append(d)
    groups[d['id']] = L
How do you aggregate those dictionaries so that you have an average of all the numeric values? There are lots of approaches with varying numeric stability. I'll start with an easy one that recursively walks a partial result set and a new entry.
Note that this assumes an incredibly consistent object structure (like you have shown). If you sometimes have missing keys, mismatched lengths, or other discrepancies you'll have to think long and hard about the exact details of what you want to happen when those structures are merged -- there isn't a one-size fits all solution.
def walk(avgs, new, n):
    """
    Most of this algorithm is just walking the object structure.
    We keep any keys, lists, etc the same and only average the
    numeric elements.
    """
    if isinstance(avgs, dict):
        return {k: walk(avgs[k], new[k], n) for k in avgs}
    if isinstance(avgs, list):
        return [walk(x, y, n) for x, y in zip(avgs, new)]
    # Only floats are averaged. Integer fields (such as the id) are not
    # floats and pass through unchanged; widen the check to (int, float)
    # if integer values should be averaged too.
    if isinstance(avgs, float):
        # This is the only place that averaging actually happens.
        # At the risk of some accumulated errors, this directly
        # computes the total of the last n+1 items and divides
        # by n+1.
        return (avgs*n + new) / (n + 1.)
    return avgs
def merge(L):
    if not L:
        # never happens using the above grouping code
        return None
    d = L[0]
    for n, new in enumerate(L[1:], 1):
        d = walk(d, new, n)
    return d
averaged = {k: merge(v) for k, v in groups.items()}
You probably only want certain keys like the prediction to be averaged. You can do the filtering beforehand on the grouped objects or afterward (it's probably more efficient to do it beforehand):
# before
groups = {
# any transformation you'd like to apply to the dictionaries
k:[{s:d[s] for s in ('prediction', 'confidence')} for d in L] for k,L in groups.items()
}
# after
averaged = {
# basically the same code, except there's only one object per key
k:{s:d[s] for s in ('prediction', 'confidence')} for k,d in averaged.items()
}
For a note on efficiency, I created a bunch of intermediate lists, but those aren't really necessary. Instead of grouping then aggregating you can absolutely apply a rolling update algorithm and save some memory.
averaged = {}
# `data` is the outer list in your nested structure
for d in (d for L in data for d in L):
    key = d['id']
    d = {s: d[s] for s in ('prediction', 'confidence')}  # any desired transforms
    if key not in averaged:
        averaged[key] = (d, 1)
    else:
        agg, n = averaged[key]  # running aggregate and count so far
        averaged[key] = (walk(agg, d, n), n+1)
averaged = {k: v[0] for k, v in averaged.items()}
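The rolling update relies on the incremental mean: given the mean of the first n values and one new value, the mean of n + 1 values is (mean*n + new)/(n + 1). A minimal sketch on plain numbers, where `update_mean` and the sample `values` are hypothetical names used only for illustration:

```python
def update_mean(mean, new, n):
    """Return the mean of n + 1 values, given the mean of the first n and one new value."""
    return (mean * n + new) / (n + 1)

values = [4.8, 5.8, 6.5]

# Start from the first value and fold in the rest one at a time.
mean = values[0]
for n, new in enumerate(values[1:], 1):
    mean = update_mean(mean, new, n)

print(mean)
```

Only the current mean and the count are kept per key, which is where the memory saving comes from.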
We still don't have the output formatted quite like you want (we have a dictionary, and you want a list where the keys are included in the objects). That's a pretty easy problem to solve though:
def inline_key(d, key):
    # not a pure function, but we're lazy, and the original
    # values are never used
    d['id'] = key
    return d
final_result = [inline_key(d, k) for k, d in averaged.items()]
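If the structure really is as fixed as in the question (one confidence entry per item, the same percentile everywhere), a more compact sketch is possible with collections.defaultdict and statistics.fmean. It assumes exactly that fixed shape, and rounding to two decimals is an assumption made to match the expected output in the question.

```python
from collections import defaultdict
from statistics import fmean

data = [
    [
        {"id": 713, "prediction": 4.8,
         "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 714, "prediction": 4.93,
         "confidence": [{"percentile": "75", "lower": 4.9, "upper": 5.7}]},
    ],
    [
        {"id": 713, "prediction": 5.8,
         "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
        {"id": 714, "prediction": 2.93,
         "confidence": [{"percentile": "75", "lower": 1.9, "upper": 3.7}]},
    ],
]

# Group every entry by its id, preserving first-seen order.
groups = defaultdict(list)
for sublist in data:
    for entry in sublist:
        groups[entry["id"]].append(entry)

merged_data = []
for key, entries in groups.items():
    first_conf = entries[0]["confidence"][0]
    merged_data.append({
        "id": key,
        "prediction": round(fmean(e["prediction"] for e in entries), 2),
        "confidence": [{
            # Assumption: the percentile is identical across entries.
            "percentile": first_conf["percentile"],
            "lower": round(fmean(e["confidence"][0]["lower"] for e in entries), 2),
            "upper": round(fmean(e["confidence"][0]["upper"] for e in entries), 2),
        }],
    })

print(merged_data)
```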
Try this :
from copy import deepcopy
input = [[
{
"id": 713,
"prediction": 4.8,
"confidence": [
{"percentile": "75", "lower": 4.8, "upper": 5.7}
],
},
{
"id": 714,
"prediction": 4.936893921359024,
"confidence": [
{"percentile": "75", "lower": 4.9, "upper": 5.7}
],
},
],
[
{
"id": 713,
"prediction": 5.8,
"confidence": [
{"percentile": "75", "lower": 4.2, "upper": 6.7}
],
},
{
"id": 714,
"prediction": 2.936893921359024,
"confidence": [
{"percentile": "75", "lower": 1.9, "upper": 3.7}
],
},
]]
final_dict_list = []
processed_id = []
for item in input:
    for dict_ele in item:
        if dict_ele["id"] in processed_id:
            # id seen before: add this entry's values to the running totals
            for final_item in final_dict_list:
                if final_item['id'] == dict_ele["id"]:
                    final_item["prediction"] += dict_ele["prediction"]
                    final_item["confidence"][0]["lower"] += dict_ele["confidence"][0]["lower"]
                    final_item["confidence"][0]["upper"] += dict_ele["confidence"][0]["upper"]
        else:
            # first time we see this id: start from a copy of the entry
            final_dict = deepcopy(dict_ele)
            final_dict_list.append(final_dict)
            processed_id.append(dict_ele["id"])
# assumes every id appears exactly once in each inner list
number_of_items = len(input)
for item in final_dict_list:
    item["prediction"] /= number_of_items
    item["confidence"][0]["lower"] /= number_of_items
    item["confidence"][0]["upper"] /= number_of_items
print(final_dict_list)
OUTPUT :
[
{'confidence': [{'upper': 6.2, 'lower': 4.5, 'percentile': '75'}], 'id': 713, 'prediction': 5.3},
{'confidence': [{'upper': 4.7, 'lower': 3.4000000000000004, 'percentile': '75'}], 'id': 714, 'prediction': 3.936893921359024}]
One remark: this would have been much easier if the data had been structured a little differently.

sum list object value with same code in python

I am a beginner in Python and have run into a problem. I have a list of objects like this:
[
{
'balance':-32399.0,
'code':u'1011',
'name':u'Stock Valuation Account'
},
{
'balance':-143503.34,
'code':u'1011',
'name':u'Stock Interim Account (Received)'
},
{
'balance':117924.2499995,
'code':u'1011',
'name':u'Stock Interim Account (Delivered)'
},
{
'balance':-3500000.0,
'code':u'1101',
'name':u'Cash'
},
{
'balance':-50000.0,
'code':u'1101',
'name':u'Other Cash'
},
]
I need to sum it based on the code, so the result will be.
[
{
'balance':6819,91,
'code':u'1011',
},
{
'balance':-3550000.0,
'code':u'1101',
},
]
I have searched Stack Overflow but still haven't found what I need.
Any help?
You can do this with groupby and sum inside a comprehension.
As noted in the comments, groupby only groups consecutive items, so the list needs to be sorted by the same key first.
In addition, because the items are dictionaries, you can use operator.itemgetter instead of lambdas as the key for sorted and groupby (operator.attrgetter would look up an attribute named code, which these dictionaries do not have).
l = [
{
'balance':-32399.0,
'code':u'1011',
'name':u'Stock Valuation Account'
},
...
]
from itertools import groupby
import operator
selector_func = operator.itemgetter("code")
l = sorted(l, key=selector_func)
result = [{"code" : code, "balance" : sum(x["balance"] for x in values)} for code, values in groupby(l, selector_func)]
print(result)
Result:
[{'code': '1011', 'balance': -57978.0900005}, {'code': '1101', 'balance': -3550000.0}]
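The sorting step matters because groupby only merges consecutive items with equal keys. A small sketch with made-up `rows` shows what happens with and without it:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"code": "1011", "balance": 10.0},
    {"code": "1101", "balance": 5.0},
    {"code": "1011", "balance": 2.5},
]
by_code = itemgetter("code")

# Unsorted: groupby starts a new group every time the key changes.
unsorted_keys = [code for code, _ in groupby(rows, by_code)]

# Sorted first: each code forms exactly one group.
totals = [
    {"code": code, "balance": sum(r["balance"] for r in group)}
    for code, group in groupby(sorted(rows, key=by_code), by_code)
]

print(unsorted_keys)
print(totals)
```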
Here is a one-liner without any imports:
a = ...
result = [{'balance' : sum([i['balance'] for i in a if i['code']==j]), 'code' : j} for j in set([k['code'] for k in a])]
OUTPUT :
[{'balance': -3550000.0, 'code': '1101'}, {'balance': -57978.0900005, 'code': '1011'}]
data = [
{
'balance':-32399.0,
'code':u'1011',
'name':u'Stock Valuation Account'
},
...
]
d = {}
for entry in data:
d[entry['code']] = d.get(entry['code'],0) + entry['balance']
print([{'balance':b,'code':c} for c,b in d.items()])
Would print:
[{'balance': -57978.0900005, 'code': '1011'}, {'balance': -3550000.0, 'code': '1101'}]
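Since these are account balances, floating-point rounding may matter. A hedged variant of the same dictionary-accumulation idea uses decimal.Decimal with collections.defaultdict. It assumes the balances are available as strings (in the question they are floats, which would already carry binary rounding), so treat it as a sketch rather than a drop-in replacement.

```python
from collections import defaultdict
from decimal import Decimal

# Assumption: balances arrive as strings so Decimal can represent them exactly.
data = [
    {"balance": "-32399.0", "code": "1011"},
    {"balance": "-143503.34", "code": "1011"},
    {"balance": "117924.2499995", "code": "1011"},
    {"balance": "-3500000.0", "code": "1101"},
    {"balance": "-50000.0", "code": "1101"},
]

# Decimal() with no argument is Decimal('0'), so new codes start at zero.
totals = defaultdict(Decimal)
for entry in data:
    totals[entry["code"]] += Decimal(entry["balance"])

result = [{"balance": b, "code": c} for c, b in totals.items()]
print(result)
```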

How do I use simplejson to decode JSON responses to python objects?

JSON serialization Python using simpleJSON
How do I create an object so that we can optimize the serialization of the object
I'm using simpleJSON
1,2 are fixed variables
3 is a fixed dict of category and score
4 is an array of dicts, each with a fixed number of keys (4); the length of the array is specified at run-time.
The process needs to be as fast as possible, so I'm not sure about the best solution.
{
"always-include": true,
"geo": null,
"category-score" : [
{
"Arts-Entertainment": 0.72,
"Business": 0.03,
"Computers-Internet": 0.08,
"Gaming": 0.02,
"Health": 0.02,
}
],
"discovered-entities" : [
{
'relevance': '0.410652',
'count': '2',
'type': 'TelevisionStation',
'text': 'Fox News'
},
{
'relevance': '0.396494',
'count': '2',
'type': 'Organization',
'text': 'NBA'
}
]
],
}
Um...
import simplejson as json
result_object = json.loads(input_json_string)
?
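To make that concrete, here is a minimal sketch using the standard-library json module, which exposes the same loads and dumps functions as simplejson. The `input_json_string` below is a shortened, cleaned-up version of the sample: the structure as posted has trailing commas and single-quoted strings, which a JSON parser would reject. Whether compact separators help with speed is not something this sketch measures; they only shrink the output.

```python
import json  # simplejson offers the same loads/dumps functions

# Shortened and cleaned-up copy of the sample (double quotes, no trailing commas).
input_json_string = """
{
    "always-include": true,
    "geo": null,
    "category-score": [
        {"Arts-Entertainment": 0.72, "Business": 0.03}
    ],
    "discovered-entities": [
        {"relevance": "0.410652", "count": "2",
         "type": "TelevisionStation", "text": "Fox News"}
    ]
}
"""

# JSON true/null become Python True/None; objects become dicts, arrays become lists.
result_object = json.loads(input_json_string)
print(result_object["discovered-entities"][0]["text"])

# Compact separators drop the whitespace between tokens when serializing back.
compact = json.dumps(result_object, separators=(",", ":"))
print(compact)
```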
