Find unique values in python list of dictionaries

I have a list with the following sample key value pairs:
results : {'ROI_0':[{'obj_id':1,'obj_name':'truck', 'obj_confi':93.55, 'bb_box': ['x','y','w','h']},
{'obj_id':2,'obj_name':'truck', 'obj_confi':91.35, 'bb_box': ['x','y','w','h']},
{'obj_id':3,'obj_name':'truck', 'obj_confi':92.65, 'bb_box': ['x','y','w','h']},
{'obj_id':4,'obj_name':'car', 'obj_confi':90.31, 'bb_box': ['x','y','w','h']},
{'obj_id':5,'obj_name':'car', 'obj_confi':90.31, 'bb_box': ['x','y','w','h']}
]}
I need to obtain another list which looks like the one below:
aggreg_results : {'ROI_0':[{'obj_occurances': 3, 'obj_name': 'truck', 'obj_ids':[1,2,3]},
{'obj_occurances': 2, 'obj_name': 'car', 'obj_ids':[4,5]}
]}
I'm unable to figure out the logic for this.

My answer is a bit long, but I think it's easier to understand.
First, I create a dict keyed by obj_name, then count the occurrences and store the obj_ids for each name:
results = {'ROI_0':[{'obj_id':1,'obj_name':'truck', 'obj_confi':93.55, 'bb_box': ['x','y','w','h']},
{'obj_id':2,'obj_name':'truck', 'obj_confi':91.35, 'bb_box': ['x','y','w','h']},
{'obj_id':3,'obj_name':'truck', 'obj_confi':92.65, 'bb_box': ['x','y','w','h']},
{'obj_id':4,'obj_name':'car', 'obj_confi':90.31, 'bb_box': ['x','y','w','h']},
{'obj_id':5,'obj_name':'car', 'obj_confi':90.31, 'bb_box': ['x','y','w','h']}
]}
unique_items = dict()
for obj in results['ROI_0']:
    if obj['obj_name'] not in unique_items:
        unique_items[obj['obj_name']] = {
            'obj_occurances': 0,
            'obj_ids': []
        }
    unique_items[obj['obj_name']]['obj_occurances'] += 1
    unique_items[obj['obj_name']]['obj_ids'].append(obj['obj_id'])
Now I have a dict like this in unique_items:
{
'truck': {'obj_occurances': 3, 'obj_ids': [1, 2, 3]},
'car': {'obj_occurances': 2, 'obj_ids': [4, 5]}
}
Then I convert it into a dict in the format you want:
aggreg_results = dict()
aggreg_results['ROI_0'] = list()
for k, v in unique_items.items():
    aggreg_results['ROI_0'].append({
        'obj_occurances': v['obj_occurances'],
        'obj_name': k,
        'obj_ids': v['obj_ids']
    })
Finally, I get aggreg_results as you expected:
{'ROI_0': [
{'obj_occurances': 3, 'obj_name': 'truck', 'obj_ids': [1, 2, 3]},
{'obj_occurances': 2, 'obj_name': 'car', 'obj_ids': [4, 5]}
]
}
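For comparison, the same grouping can be sketched more compactly with collections.defaultdict. This is only a sketch under assumptions: the helper name aggregate_by_name is hypothetical, every entry is assumed to have obj_name and obj_id keys, and Python 3.7+ is assumed so that dicts keep insertion order.

```python
from collections import defaultdict

results = {'ROI_0': [
    {'obj_id': 1, 'obj_name': 'truck', 'obj_confi': 93.55, 'bb_box': ['x', 'y', 'w', 'h']},
    {'obj_id': 2, 'obj_name': 'truck', 'obj_confi': 91.35, 'bb_box': ['x', 'y', 'w', 'h']},
    {'obj_id': 3, 'obj_name': 'truck', 'obj_confi': 92.65, 'bb_box': ['x', 'y', 'w', 'h']},
    {'obj_id': 4, 'obj_name': 'car', 'obj_confi': 90.31, 'bb_box': ['x', 'y', 'w', 'h']},
    {'obj_id': 5, 'obj_name': 'car', 'obj_confi': 90.31, 'bb_box': ['x', 'y', 'w', 'h']},
]}


def aggregate_by_name(objects):
    """Hypothetical helper: group obj_ids by obj_name and count them."""
    ids_by_name = defaultdict(list)
    for obj in objects:
        ids_by_name[obj['obj_name']].append(obj['obj_id'])
    # The occurrence count is simply the number of ids collected per name.
    return [
        {'obj_occurances': len(ids), 'obj_name': name, 'obj_ids': ids}
        for name, ids in ids_by_name.items()
    ]


# Apply the helper to every ROI key, not only 'ROI_0'.
aggreg_results = {roi: aggregate_by_name(objs) for roi, objs in results.items()}
print(aggreg_results)
```

Looping over results.items() means additional ROI keys would be handled without further changes.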
I hope it helps!

Related

How to get unique list of values from the list of dictionaries

I have a python list of dictionaries like this
test_list = [{'id': 0, 'A':True, 'B':123},
{'id':76, 'A':True, 'B':73},
{'id':5, 'A':False, 'B':223},
{'id':5, 'A':False, 'B':223},
{'id':85, 'A':True, 'B':4},
{'id':81, 'A':False, 'B':76},
{'id':76, 'A':True, 'B':73}]
I want to make this list unique.
Simply using set(test_list) gives TypeError: unhashable type: 'dict'.
My answer below works fine for me, but I am looking for a better and shorter one:
unique_ids = list(set([x['id'] for x in test_list]))
d = {}
for item in test_list:
    d[item['id']] = item
new_d = []
for x in unique_ids:
    new_d.append(d[x])

Create a new dictionary with keys as ids and the values as the entire dictionary and access only the values:
new_d = list({v["id"]: v for v in test_list}.values())
>>> new_d
[{'id': 0, 'A': True, 'B': 123},
{'id': 76, 'A': True, 'B': 73},
{'id': 5, 'A': False, 'B': 223},
{'id': 85, 'A': True, 'B': 4},
{'id': 81, 'A': False, 'B': 76}]
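Note that the comprehension above treats two dicts as duplicates whenever their id matches, and keeps the last dict seen for each id. If you would rather compare the whole content of each dict, one possible sketch is below. It assumes all values are hashable (true for the sample data) and the names seen and unique_dicts are only illustrative.

```python
test_list = [{'id': 0, 'A': True, 'B': 123},
             {'id': 76, 'A': True, 'B': 73},
             {'id': 5, 'A': False, 'B': 223},
             {'id': 5, 'A': False, 'B': 223},
             {'id': 85, 'A': True, 'B': 4},
             {'id': 81, 'A': False, 'B': 76},
             {'id': 76, 'A': True, 'B': 73}]

seen = set()
unique_dicts = []
for d in test_list:
    # A frozenset of (key, value) pairs is hashable and ignores key order.
    fingerprint = frozenset(d.items())
    if fingerprint not in seen:
        seen.add(fingerprint)
        unique_dicts.append(d)  # keeps the first occurrence, in original order

print(unique_dicts)
```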

What's the correct way to create a nested list/dictionary in python, similar to a json like structure?

I'm trying to create a nested dictionary/list JSON-like structure in Python, but I'm not sure my solution is optimal. Essentially, I'm building a list of lists for teams, which will then house their players, player ids, etc.
Below is my code. Ideally, if I wanted to find the players on team Alpha, I'd type
team_info['Alpha']['players']
however i have to refer to the location of 'players' in order to pull it, for example:
team_info['Alpha'][0]['players']
Example
players_list = [['A','B','C'],['D','E','F'],['G','H','I']]
id_list = [[1,2,3],[4,5,6],[7,8,9]]
teams = ['Alpha', 'Bravo', 'Charlie']
team_info = {}
for a,b in enumerate(teams):
    players = {}
    ids = {}
    players['players']=players_list[a]
    ids['ids']=id_list[a]
    team_info[b]=[players,ids]
This doesn't work
team_info['Alpha']['players']
I have to reference by position.
team_info['Alpha'][0]['players']
Is there a better way to set this up?

You do not need two dicts; you just need one with two keys:
for a,b in enumerate(teams):
    players_info={}
    players_info['players']=players_list[a]
    players_info['ids']=id_list[a]
    team_info[b]=players_info
Output of team_info['Alpha']['players']: ['A', 'B', 'C']
The final dict will look like
{'Alpha': {'players': ['A', 'B', 'C'], 'ids': [1, 2, 3]},
'Bravo': {'players': ['D', 'E', 'F'], 'ids': [4, 5, 6]},
'Charlie': {'players': ['G', 'H', 'I'], 'ids': [7, 8, 9]}}

player_lists = [
    ["A", "B", "C"],
    ["D", "E", "F"],
    ["G", "H", "I"]
]
id_lists = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
teams = [
    "Alpha",
    "Bravo",
    "Charlie"
]
team_info = {}
for team, player_list, id_list in zip(teams, player_lists, id_lists):
    team_info[team] = {
        "players": player_list,
        "ids": id_list
    }
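The zip-based loop can also be folded into a single dict comprehension. This is just a sketch of the same idea, assuming the three lists are the same length (zip silently stops at the shortest one).

```python
player_lists = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"]]
id_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
teams = ["Alpha", "Bravo", "Charlie"]

# One inner dict per team, with both keys available by name.
team_info = {
    team: {"players": players, "ids": ids}
    for team, players, ids in zip(teams, player_lists, id_lists)
}

print(team_info["Alpha"]["players"])  # ['A', 'B', 'C']
```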

I'm not exactly sure what you are trying to accomplish, but I guess you would get further by using actual nested dictionaries - take a look at this tutorial on them.
Instantiation:
nested_dict = {
    'dictA': {
        'key_1': 'value_1',
        'key_2': 'value_2'
    },
    'dictB': {
        'key_a': 'value_b',
        'key_n': 'value_n'
    }
}
Access:
value = nested_dict['dictA']['key_1']  # 'value_1'

How to remove a json string from list in python without losing data

My question is similar to another question on SO, How to remove a json string from list.
The solution to that question solved part of my problem, but mine is a little different.
My lists are:
list1 = [{"ID": 1, "data": "12"},{"ID": 2, "data": "13"}]
list2 = [{"ID": 1, "col": "5"},{"ID": 1, "col": "8"},{"ID": 2,"col": "2"}]
I did the following to modify the final list:
from itertools import chain
per_id = {}
for info in chain(list1, list2):
    per_id.setdefault(info['ID'], {}).update(info)
output = list(per_id.values())
The expected output was:
output = [{"ID": 1,"data": "12", "col": "5"},{"ID": 1,"data": "12", "col": "8"},{"ID": 2,"data": "13","col": "2"}]
But the output I got is:
output = [{"ID": 1,"data": "12", "col": "8"},{"ID": 2,"data": "13","col": "2"}]
Is there a way to rectify this problem?

You get the second version because more than one entry has an "ID" value of 1, so each update overwrites the previous one. If you use a defaultdict with a list, you can append instead.
This example is taken directly from the collections.defaultdict documentation:
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
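Applied to the lists from the question, the defaultdict idea might look like the sketch below. It assumes list1 has one entry per ID and list2 may have several; the name rows_by_id is only illustrative.

```python
from collections import defaultdict

list1 = [{"ID": 1, "data": "12"}, {"ID": 2, "data": "13"}]
list2 = [{"ID": 1, "col": "5"}, {"ID": 1, "col": "8"}, {"ID": 2, "col": "2"}]

# Group every list2 row under its ID instead of overwriting earlier rows.
rows_by_id = defaultdict(list)
for row in list2:
    rows_by_id[row["ID"]].append(row)

# Emit one merged dict per (list1 row, matching list2 row) pair.
output = [
    {**base, **extra}
    for base in list1
    for extra in rows_by_id.get(base["ID"], [])
]
print(output)
```

Using rows_by_id.get(...) rather than indexing avoids creating empty entries for IDs that appear only in list1; such IDs simply produce no output rows in this sketch.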

Try itertools.combinations:
from itertools import chain, combinations
list1 = [{"ID": 1, "data": "12"},{"ID": 2, "data": "13"}]
list2 = [{"ID": 1, "col": "5"},{"ID": 1, "col": "8"},{"ID": 2,"col": "2"}]
data = []
for i, j in combinations(chain(list1, list2), 2):
    if i['ID'] == j['ID']:
        d = {**i, **j}  # merge the two dicts (Python 3.5+)
        # Keep only merges with three keys, i.e. skip pairs drawn from within list1 or within list2.
        if len(d) == 3:
            data.append(d)
print(data)
Output:
[{'ID': 1, 'data': '12', 'col': '5'}, {'ID': 1, 'data': '12', 'col': '8'}, {'ID': 2, 'data': '13', 'col': '2'}]

The previous answer from Serge Ballesta works if you include a simple check to avoid repeating values (I would comment on it, but I don't have enough reputation).
result = []  # start with an empty list
for elt1 in list1:
    for elt2 in list2:
        if elt1['ID'] == elt2['ID']:
            for k in elt2.keys():
                if k != "ID":
                    eltr = elt1.copy()  # take a copy to avoid changing original lists
                    eltr.update(elt2)
                    result.append(eltr)
result
Output:
[{'data': '12', 'ID': 1, 'col': '5'}, {'data': '12', 'ID': 1, 'col': '8'}, {'data': '13', 'ID': 2, 'col': '2'}]

If you have one list that contains some attributes for an ID and another one that contains other attributes, chaining is probably not the best idea.
Here you could simply iterate over both lists separately and update a copy of each dict from the first list with the matching dict from the second. Example in the Python console:
>>> result = []  # start with an empty list
>>> for elt1 in list1:
...     for elt2 in list2:
...         if elt1['ID'] == elt2['ID']:
...             eltr = elt1.copy()  # take a copy to avoid changing original lists
...             eltr.update(elt2)
...             result.append(eltr)
...
>>> result
[{'data': '12', 'ID': 1, 'col': '5'}, {'data': '12', 'ID': 1, 'col': '8'}, {'data': '13', 'ID': 2, 'col': '2'}]
as expected...

Remove elements from list based on a field of elements

I have a list of tuples where each tuple has two items; first item is a dictionary and second one is a string.
all_values = [
({'x1': 1, 'y1': 2}, 'str1'),
({'x2': 1, 'y2': 2}, 'str2'),
({'x3': 1, 'y3': 2}, 'str3'),
({'x4': 1, 'y4': 2}, 'str1'),
]
I want to remove duplicate data from list based on the second item of the tuple. I wrote this code, but I want to improve it:
flag = False
items = []
for index, item in enumerate(all_values):
    for j in range(0, index):
        if all_values[j][1] == all_values[index][1]:
            flag = True
    if not flag:
        items.append(item)
    flag = False
And get this:
items = [
({'x1': 1, 'y1': 2}, 'str1'),
({'x2': 1, 'y2': 2}, 'str2'),
({'x3': 1, 'y3': 2}, 'str3')
]
Any help?
BTW I tried to remove duplicate data using list(set(all_values)) but I got error unhashable type: dict.

Use another list ('strings') to collect the second string items of the tuples. Thus, you will have a clear way to check if a current list item is a duplicate.
In the code below I added one duplicate list item (with the 'str2' value) for demonstration purposes.
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3')
]
strings = []
result = []
for value in all_values:
    if value[1] not in strings:
        strings.append(value[1])
        result.append(value)
The new non-duplicated list will be in 'result'.
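If the list is large, membership tests against a list get slow because each one scans the whole list. A possible variant, sketched here on the assumption that the second tuple item is hashable (strings are), tracks the seen values in a set instead. The name seen is only illustrative.

```python
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
]

seen = set()   # second items encountered so far
result = []
for value in all_values:
    if value[1] not in seen:   # average O(1) lookup in a set
        seen.add(value[1])
        result.append(value)   # first occurrence wins, order preserved

print(result)
```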

You could use the following code:
items = []
for item in all_values:
    if next((i for i in items if i[1] == item[1]), None) is None:
        items.append(item)

items = []
[items.append(item) for item in all_values if item[1] not in [x[1] for x in items]]
print(items)

If you're not concerned with the ordering, use a dict:
formattedValues = {}
# Iterate with reversed() if you want the first duplicate to be kept
# Iterate without reversed() if you want the last duplicate to be kept
for v in reversed(all_values):
    formattedValues[v[1]] = v
If ordering is a concern, use OrderedDict:
from collections import OrderedDict
formattedValues = OrderedDict()
for v in reversed(all_values):
    formattedValues[v[1]] = v
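To make the effect of reversed() concrete, here is a small sketch using the data from the question. It assumes Python 3.7+, where a plain dict also keeps insertion order; the names keep_first and keep_last are only illustrative.

```python
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x4': 1, 'y4': 2}, 'str1'),
]

# Iterating in reverse: the earliest tuple is assigned last, so it wins.
keep_first = {}
for v in reversed(all_values):
    keep_first[v[1]] = v

# Iterating forward: the latest tuple is assigned last, so it wins.
keep_last = {}
for v in all_values:
    keep_last[v[1]] = v

print(keep_first['str1'])   # ({'x1': 1, 'y1': 2}, 'str1')
print(keep_last['str1'])    # ({'x4': 1, 'y4': 2}, 'str1')
print(list(keep_first))     # ['str1', 'str3', 'str2']
```

Note that the keys of keep_first follow the reversed iteration order, so the result is not in the original list order even though the first duplicate is the one kept.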

The object-oriented approach is not shorter, but it's more intuitive as well as readable/maintainable (IMHO).
Start by creating an object that mimics the tuple and provides additional __hash__() and __eq__() methods, which a set uses later to check the uniqueness of the objects.
The method __repr__() is declared for debugging purposes:
class Tup(object):
    def __init__(self, t):
        self.t = t
    def __eq__(self, other):
        return self.t[1] == other.t[1]
    def __hash__(self):
        return hash(self.t[1])
    def __repr__(self):
        return str(self.t)
# now you can declare:
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x3': 1, 'y3': 2}, 'str3')
]
# create your objects and put them in a list
all_vals = [Tup(x) for x in all_values]
print(all_vals)  # [({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3'), ({'x3': 1, 'y3': 2}, 'str3')]
# and use the built-in set for uniqueness
print(set(all_vals))  # three unique items, e.g. {({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3')} (set order is not guaranteed)
An alternative shorter version for those who feel that size matters ;)
res = []
for a in all_values:
    if a[1] not in map(lambda x: x[1], res):
        res.append(a)
print(res)

The itertools recipes include a function, unique_everseen, which returns an iterator over the unique items of the iterable you pass in, judged by the key function you pass. If you want a list, pass the result to list(); if there is a lot of data, it is better to just iterate over the result to save memory.
from itertools import filterfalse  # named ifilterfalse in Python 2
from operator import itemgetter
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3')]
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
print(list(unique_everseen(all_values, itemgetter(1))))
Output:
[({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3')]

Group/Count list of dictionaries based on value

I've got a list of Tokens which looks something like:
[{
Value: "Blah",
StartOffset: 0,
EndOffset: 4
}, ... ]
What I want to do is get a count of how many times each value occurs in the list of tokens.
In VB.Net I'd do something like...
Tokens = Tokens.
GroupBy(Function(x) x.Value).
Select(Function(g) New With {
.Value = g.Key,
.Count = g.Count})
What's the equivalent in Python?

IIUC, you can use collections.Counter:
>>> from collections import Counter
>>> tokens = [{"Value": "Blah", "SO": 0}, {"Value": "zoom", "SO": 5}, {"Value": "Blah", "SO": 2}, {"Value": "Blah", "SO": 3}]
>>> Counter(tok['Value'] for tok in tokens)
Counter({'Blah': 3, 'zoom': 1})
if you only need a count. If you want them grouped by the value, you could use itertools.groupby and something like:
>>> from itertools import groupby
>>> def keyfn(x):
...     return x['Value']
...
>>> [(k, list(g)) for k,g in groupby(sorted(tokens, key=keyfn), keyfn)]
[('Blah', [{'SO': 0, 'Value': 'Blah'}, {'SO': 2, 'Value': 'Blah'}, {'SO': 3, 'Value': 'Blah'}]), ('zoom', [{'SO': 5, 'Value': 'zoom'}])]
although it's a little trickier because groupby requires the grouped terms to be contiguous, and so you have to sort by the key first.
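If sorting first is unwanted, a defaultdict can do the grouping in a single pass. This is a sketch rather than a drop-in replacement for groupby: the names groups and counts are illustrative, and the tokens are the sample ones used above.

```python
from collections import defaultdict

tokens = [{"Value": "Blah", "SO": 0}, {"Value": "zoom", "SO": 5},
          {"Value": "Blah", "SO": 2}, {"Value": "Blah", "SO": 3}]

# Collect tokens under their Value; no sorting or contiguity is required.
groups = defaultdict(list)
for tok in tokens:
    groups[tok["Value"]].append(tok)

# The per-value count falls out of the group sizes.
counts = {value: len(toks) for value, toks in groups.items()}
print(counts)  # {'Blah': 3, 'zoom': 1}
```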

Let's assume this is your Python list, containing dictionaries:
my_list = [{'Value': 'Blah',
'StartOffset': 0,
'EndOffset': 4},
{'Value': 'oqwij',
'StartOffset': 13,
'EndOffset': 98},
{'Value': 'Blah',
'StartOffset': 6,
'EndOffset': 18}]
A one-liner:
len([i for i in my_list if i['Value'] == 'Blah']) # returns 2

import collections
# example token list
tokens = [{'Value':'Blah', 'Start':0}, {'Value':'BlahBlah'}]
count = collections.Counter([d['Value'] for d in tokens])
print(count)
shows
Counter({'Blah': 1, 'BlahBlah': 1})

token = [{
    'Value': "Blah",
    'StartOffset': 0,
    'EndOffset': 4
},  # ... more tokens
]
value_counter = {}
for t in token:
    v = t['Value']
    if v not in value_counter:
        value_counter[v] = 0
    value_counter[v] += 1
print(value_counter)

Another way is to convert the data to a pandas DataFrame and then aggregate it, like this:
import pandas as pd
df = pd.DataFrame(data)
df.groupby('key')['value'].count()
df.groupby('key')['value'].sum()
