keep duplicates by key in a list of dictionaries


I have a list of dictionaries, and I would like to obtain those that have the same value in a key:
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
I want to keep those items that have the same 'name', so, I would like to obtain something like:
duplicates: [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
}, {
'id': 7,
'name': 'John'
}
]
I'm trying (not successfully):
duplicates = [item for item in my_list_of_dicts if len(my_list_of_dicts.get('name', None)) > 1]
I can see what is wrong with this code, but I can't work out the correct expression.

Another concise way using collections.Counter:
from collections import Counter
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
c = Counter(x['name'] for x in my_list_of_dicts)
duplicates = [x for x in my_list_of_dicts if c[x['name']] > 1]

You could use the following list comprehension:
>>> [d for d in my_list_of_dicts if len([e for e in my_list_of_dicts if e['name'] == d['name']]) > 1]
[{'id': 3, 'name': 'John'},
{'id': 5, 'name': 'Peter'},
{'id': 2, 'name': 'Peter'},
{'id': 7, 'name': 'John'}]
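The comprehension above rescans the whole list for every element, so its cost grows quadratically with the list length. As a rough sketch (the helper name keep_duplicates is made up for illustration, not taken from any library), the same result can be obtained by grouping on the key in a single pass first:

```python
from collections import defaultdict


def keep_duplicates(items, key):
    """Return the items whose value under `key` occurs more than once.

    Hypothetical helper for illustration; the original order is preserved.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    # Keep only the items that belong to a group with more than one member
    return [item for item in items if len(groups[item[key]]) > 1]


my_list_of_dicts = [
    {'id': 3, 'name': 'John'},
    {'id': 5, 'name': 'Peter'},
    {'id': 2, 'name': 'Peter'},
    {'id': 6, 'name': 'Mariah'},
    {'id': 7, 'name': 'John'},
    {'id': 1, 'name': 'Louis'},
]

duplicates = keep_duplicates(my_list_of_dicts, 'name')
print(duplicates)
```

This assumes every dictionary actually contains the key; if some may not, item.get(key) could be used instead.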

my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
import pandas as pd
df = pd.DataFrame(my_list_of_dicts)
# keep=False flags every row whose name occurs more than once
df[df['name'].duplicated(keep=False)].to_json(orient='records')

An attempt similar to @cucuru's. Hopefully it helps; the comments explain what I did differently.
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
# Create a list of names
names = [person.get('name') for person in my_list_of_dicts]
# Add item to list if the name occurs more than once in names
duplicates = [item for item in my_list_of_dicts if names.count(item.get('name')) > 1]
print(duplicates)
produces
[{'id': 3, 'name': 'John'}, {'id': 5, 'name': 'Peter'}, {'id': 2, 'name': 'Peter'}, {'id': 7, 'name': 'John'}]

Related

Python: Change a JSON value

Let's say I have the following JSON file named output.
{'fields': [{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'}],
'type': 'struct'}
If the type key has the value datetimeoffset, I would like to change it to dateTime, and if it has the value Int32, I would like to change it to integer. I have several such values to replace.
The expected output is
{'fields': [{'name': 2, 'type': 'integer'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'dateTime'}],
'type': 'struct'}
Can anyone help with this in Python?

You can try this out:
substitute = {"Int32": "integer", "datetimeoffset": "dateTime"}
x = {'fields': [
{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'}
],'type': 'struct'}
for i in range(len(x['fields'])):
    # Replace the type only when a substitute is defined for it
    if x['fields'][i]["type"] in substitute:
        x['fields'][i]['type'] = substitute[x['fields'][i]['type']]
print(x)

You can use the following code. Include in equivalences dict the values you want to replace:
json = {
'fields': [
{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'},
],
'type': 'struct'
}
equivalences = {"datetimeoffset": "dateTime", "Int32": "integer"}
# Replace values based on the equivalences dict
for i, data in enumerate(json["fields"]):
    if data["type"] in equivalences.keys():
        json["fields"][i]["type"] = equivalences[data["type"]]
print(json)
The output is:
{
"fields": [
{
"name": 2,
"type": "integer"
},
{
"name": 12,
"type": "string"
},
{
"name": 9,
"type": "dateTime"
}
],
"type": "struct"
}

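A simple but ugly way is to round-trip through a JSON string. Note that this replaces every occurrence of the substrings, including any that appear in keys or in unrelated values:
import json
json_ = {'fields': [{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'}], 'type': 'struct'}
result = json.loads(json.dumps(json_).replace("datetimeoffset", "dateTime").replace("Int32", "integer"))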
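If the structure may be nested more deeply than in the question, a recursive walk is one option. The following is only a sketch: the function name replace_types is invented here, and it assumes that only string values stored under a 'type' key should be remapped. It returns a new structure and leaves the input untouched:

```python
def replace_types(node, mapping):
    """Return a copy of `node` with string values under 'type' keys remapped.

    Hypothetical helper: values not found in `mapping` are kept as they are.
    """
    if isinstance(node, dict):
        return {
            k: (mapping.get(v, v) if k == 'type' and isinstance(v, str)
                else replace_types(v, mapping))
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [replace_types(item, mapping) for item in node]
    return node  # numbers, strings, None, etc. pass through unchanged


data = {
    'fields': [
        {'name': 2, 'type': 'Int32'},
        {'name': 12, 'type': 'string'},
        {'name': 9, 'type': 'datetimeoffset'},
    ],
    'type': 'struct',
}
substitute = {"Int32": "integer", "datetimeoffset": "dateTime"}

result = replace_types(data, substitute)
print(result)
```

Unlike the string-replacement trick, this touches only values under the 'type' key, so a field that happens to be named Int32 would not be altered.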

python sort list by custom criteria

I have the following data, read from a CSV file:
venues =[{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]
I get the unique keys with:
b= list({k for d in venues for k in d.keys()})
which returns them in arbitrary order:
['name', 'capacity', 'id']
I would like to sort the unique keys in the following order:
sorted_keys = ['id','name','capacity']
How can I achieve this?

In Python, tuples are compared element-wise, so a key function that builds a tuple from each dictionary should do the trick.
>>> sorted(venues, key=lambda row: (row['id'], row['name'], row['capacity']))
To be slightly more concise, you could use operator.itemgetter.
>>> from operator import itemgetter
>>> sorted(venues, key=itemgetter('id','name','capacity'))

You can use the list's sort() method and its key parameter to apply specific criteria when sorting your list:
venues =[{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]
venues.sort(key=lambda x: x["capacity"])
print(venues)
Output (in this case the list is sorted by capacity):
[{'capacity': 700, 'id': 1, 'name': 'AMD'}, {'capacity': 2000, 'id': 2, 'name': 'Honda'}, {'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}, {'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'}]
You can also sort by several keys at once, in order of priority:
venues =[{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]
venues.sort(key=lambda x: (x["id"], x["name"], x["capacity"]))
print(venues)

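To get that particular order, you could use the length of each key name as the sort key. This works only because 'id', 'name' and 'capacity' happen to have increasing lengths:
b = sorted(b, key=len)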
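If the desired order is known in advance, a less fragile option is to rank each key by its position in that order. This is a sketch only; the names preferred_order and rank are introduced here for illustration, and keys missing from the preferred list are assumed to belong at the end, sorted alphabetically:

```python
venues = [
    {'capacity': 700, 'id': 1, 'name': 'AMD'},
    {'capacity': 2000, 'id': 2, 'name': 'Honda'},
    {'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
    {'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'},
]

# Hypothetical explicit ordering supplied by the caller
preferred_order = ['id', 'name', 'capacity']
rank = {key: position for position, key in enumerate(preferred_order)}

unique_keys = {k for d in venues for k in d}

# Known keys sort by their rank; unknown keys go last, alphabetically
sorted_keys = sorted(
    unique_keys,
    key=lambda k: (rank.get(k, len(preferred_order)), k),
)
print(sorted_keys)  # ['id', 'name', 'capacity']
```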

How to extract group count from dictionary?

I need to count the groups of items that share the same 'id' and 'name'.
Input:
myd = {
"Items": [
{
"id": 1,
"name": "ABC",
"value": 666
},
{
"id": 1,
"name": "ABC",
"value": 89
},
{
"id": 2,
"name": "DEF",
"value": 111
},
{
"id": 3,
"name": "GHI",
"value": 111
}
]
}
Expected output:
The count of {'id':1, 'name': 'ABC' } is 2
The count of {'id':2, 'name': 'DEF' } is 1
The count of {'id':3, 'name': 'GHI' } is 1
The total length can be obtained with len(myd), and for a single key with len(myd['id']).
How can I get the count for each combination of id and name?

You can use collections.OrderedDict and set both 'id' and 'name' as tuple keys. In this way, the OrderedDict automatically groups the dictionaries with same 'id' and 'name' values in order:
myd = {'Items': [
{'id':1, 'name': 'ABC', 'value': 666},
{'id':1, 'name': 'ABC', 'value': 89},
{'id':2, 'name': 'DEF', 'value': 111 },
{'id':3, 'name': 'GHI', 'value': 111 }]
}
from collections import OrderedDict
od = OrderedDict()
for d in myd['Items']:
    # Collect values in a list; a set would undercount items that share a value
    od.setdefault((d['id'], d['name']), []).append(d['value'])
for ks, v in od.items():
    print("The count of {{'id': {}, 'name': {}}} is {}".format(ks[0], ks[1], len(v)))
Output:
The count of {'id': 1, 'name': ABC} is 2
The count of {'id': 2, 'name': DEF} is 1
The count of {'id': 3, 'name': GHI} is 1

This is a good candidate for groupby and itemgetter usage:
from itertools import groupby
from operator import itemgetter
myd = {'Items': [
{'id': 1, 'name': 'ABC', 'value': 666},
{'id': 1, 'name': 'ABC', 'value': 89},
{'id': 2, 'name': 'DEF', 'value': 111},
{'id': 3, 'name': 'GHI', 'value': 111}]
}
grouper = itemgetter('id', 'name')
for i, v in groupby(sorted(myd['Items'], key=grouper), key=grouper):
    print(f"the count for {dict(id=i[0], name=i[1])} is {len(list(v))}")
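Since only the counts are needed, collections.Counter keyed on an (id, name) tuple is another possibility that avoids sorting. A minimal sketch, using the same input as above:

```python
from collections import Counter

myd = {'Items': [
    {'id': 1, 'name': 'ABC', 'value': 666},
    {'id': 1, 'name': 'ABC', 'value': 89},
    {'id': 2, 'name': 'DEF', 'value': 111},
    {'id': 3, 'name': 'GHI', 'value': 111},
]}

# Count how many items share each (id, name) combination
counts = Counter((item['id'], item['name']) for item in myd['Items'])

for (item_id, name), n in counts.items():
    print(f"The count of {{'id': {item_id}, 'name': {name!r}}} is {n}")
```

Counter keeps keys in first-seen order, so the groups are reported in the order they appear in the input.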

jmespath search nested array issue

I need to search all the dicts in a nested array, like the one below, by key using jmespath:
my_list = [[{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}], [{'age': 3, 'name': 'kobe'}]]
I get an empty list from this jmespath search: jmespath.search("[][?name=='kobe']", my_list)
How can I get the result [{'age': 1, 'name': 'kobe'}, {'age': 3, 'name': 'kobe'}] with a jmespath search?

Use the following JMESPath query:
[]|[?name=='kobe']
on input:
[[{"age": 1, "name": "kobe"}, {"age": 2, "name": "james"}], [{"age": 3, "name": "kobe"}]]
to get output:
[
{
"age": 1,
"name": "kobe"
},
{
"age": 3,
"name": "kobe"
}
]

The problem is that you are mixing different types, which is why you don't get the expected result.
What you should do is this:
jmespath.search("[].to_array(@)[?name=='kobe'][]", my_list)
Here is a breakdown using the Python console:
>>> my_list
[[{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}], [{'age': 3, 'name': 'kobe'}]]
>>> jmespath.search("[]", my_list)
[{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}, {'age': 3, 'name': 'kobe'}]
>>> jmespath.search("[].to_array(@)", my_list)
[[{'age': 1, 'name': 'kobe'}], [{'age': 2, 'name': 'james'}], [{'age': 3, 'name': 'kobe'}]]
>>> jmespath.search("[].to_array(@)[]", my_list)
[{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}, {'age': 3, 'name': 'kobe'}]
>>> jmespath.search("[].to_array(@)[?name=='kobe']", my_list)
[[{'age': 1, 'name': 'kobe'}], [], [{'age': 3, 'name': 'kobe'}]]
>>> jmespath.search("[].to_array(@)[?name=='kobe'][]", my_list)
[{'age': 1, 'name': 'kobe'}, {'age': 3, 'name': 'kobe'}]
You can find more explanation with examples in this guide: https://www.doaws.pl/blog/2021-12-05-how-to-master-aws-cli-in-15-minutes/how-to-master-aws-cli-in-15-minutes

Use the code below:
my_list = [[{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}], [{'age': 3,
'name': 'kobe'}]]
for l in my_list:
    for dictionary in l:
        # Match "kobe" against any value in the dictionary
        Value_List = dictionary.values()
        if "kobe" in Value_List:
            print(dictionary)
Output:
{'age': 1, 'name': 'kobe'}
{'age': 3, 'name': 'kobe'}
Or:
my_list = [[{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}],
[{'age': 3, 'name': 'kobe'}]]
Match_List = []
for l in my_list:
    for dictionary in l:
        # Match only on the 'name' key
        if dictionary["name"] == "kobe":
            Match_List.append(dictionary)
print(Match_List)
Output:
[{'age': 1, 'name': 'kobe'}, {'age': 3, 'name': 'kobe'}]
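The nested loops above can also be written more compactly by flattening one level with itertools.chain.from_iterable and filtering in a comprehension. This is just a sketch of the same idea in plain Python; it uses .get() on the assumption that some dictionaries might lack a 'name' key:

```python
from itertools import chain

my_list = [
    [{'age': 1, 'name': 'kobe'}, {'age': 2, 'name': 'james'}],
    [{'age': 3, 'name': 'kobe'}],
]

# Flatten the outer list by one level, then keep the dicts whose name matches
matches = [d for d in chain.from_iterable(my_list) if d.get('name') == 'kobe']
print(matches)
```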

How can I create aggregate expressions of this list of dicts?

I have a list of dictionaries that expresses periods+days for a class in a student information system. Here's the data I'd like to aggregate:
[
{
'period': {
'name': '1',
'sort_order': 1
},
'day': {
'name': 'A',
'sort_order': 1
}
},
{
'period': {
'name': '1',
'sort_order': 1
},
'day': {
'name': 'B',
'sort_order': 2
}
},
{
'period': {
'name': '1',
'sort_order': 1
},
'day': {
'name': 'C',
'sort_order': 1
}
},
{
'period': {
'name': '3',
'sort_order': 3
},
'day': {
'name': 'A',
'sort_order': 1
}
},
{
'period': {
'name': '3',
'sort_order': 3
},
'day': {
'name': 'B',
'sort_order': 2
}
},
{
'period': {
'name': '3',
'sort_order': 3
},
'day': {
'name': 'C',
'sort_order': 2
}
},
{
'period': {
'name': '4',
'sort_order': 4
},
'day': {
'name': 'D',
'sort_order': 3
}
}
]
The aggregated string I'd like the above to reduce to is 1,3(A-C) 4(D). Notice that objects that aren't "adjacent" (determined by the object's sort_order) to each other are delimited by , and "adjacent" records are delimited by a -.
EDIT
Let me try to elaborate on the aggregation process. Each "class meeting" object contains a period and day. There are usually ~5 periods per day, and the days alternate cyclically between A,B,C,D, etc. So if I have a class that occurs 1st period on an A day, we might express that as 1(A). If a class occurs on 1st and 2nd period on an A day, the raw form of that might be 1(A),2(A), but it can be shortened to 1-2(A).
Some classes might not be in "adjacent" periods or days. A class might occur on 1st period and 3rd period on an A day, so its short form would be 1,3(A). However, if that class were on 1st, 2nd, and 3rd period on an A day, it could be written as 1-3(A). This also applies to days, so if a class occurs on 1st,2nd, and 3rd period, on A,B, and C day, then we could write it 1-3(A-C).
Finally, if a class occurs on 1st,2nd, and 3rd period and on A,B, and C day, but also on 4th period on D day, its short form would be 1-3(A-C) 4(D).
What I've tried
The first step that occurs to me to perform is to "group" the meeting objects into related sub-lists with the following function:
def _to_related_lists(list):
    """Given a list of section meeting dicts, return a list of lists, where each sub-list is list of
    related section meetings, either related by period or day"""
    related_list = []
    sub_list = []
    related_values = set()
    for index, section_meeting_object in enumerate(list):
        # starting with empty values list
        if not related_values:
            related_values.add(section_meeting_object['period']['name'])
            related_values.add(section_meeting_object['day']['name'])
            sub_list.append(section_meeting_object)
        elif section_meeting_object['period']['name'] in related_values or section_meeting_object['day']['name'] in related_values:
            related_values.add(section_meeting_object['period']['name'])
            related_values.add(section_meeting_object['day']['name'])
            sub_list.append(section_meeting_object)
        else:
            # no related values found in current section_meeting_object
            related_list.append(sub_list)
            sub_list = []
            related_values = set()
            related_values.add(section_meeting_object['period']['name'])
            related_values.add(section_meeting_object['day']['name'])
            sub_list.append(section_meeting_object)
    related_list.append(sub_list)
    return related_list
Which returns:
[
[{
'period': {
'sort_order': 1,
'name': '1'
},
'day': {
'sort_order': 1,
'name': 'A'
}
}, {
'period': {
'sort_order': 1,
'name': '1'
},
'day': {
'sort_order': 2,
'name': 'B'
}
}, {
'period': {
'sort_order': 2,
'name': '2'
},
'day': {
'sort_order': 1,
'name': 'A'
}
}, {
'period': {
'sort_order': 2,
'name': '2'
},
'day': {
'sort_order': 2,
'name': 'B'
}
}],
[{
'period': {
'sort_order': 4,
'name': '4'
},
'day': {
'sort_order': 3,
'name': 'C'
}
}]
]
If the entire string 1-3(A-C) 4(D) is the aggregate expression I'd like in the end, let's call 1-3(A-C) and 4(D) "sub-expressions". Each related sub-list would be a "sub-expression", so I was thinking I'd somehow iterate through every sub-list and create the sub-expression, but I'm not exactly sure how to do that.

First, let us define your list as d_list.
d_list = [
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'A'}},
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'C'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 1, 'name': 'A'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'C'}},
{'period': {'sort_order': 4, 'name': '4'}, 'day': {'sort_order': 3, 'name': 'D'}},
]
Note that I use Python's built-in string module to establish that B lies between A and C. You could then do the following:
import string
# Step 1: collect the day names for each period
agg0 = {}
for d in d_list:
    name = d['period']['name']
    if name not in agg0:
        agg0[name] = []
    day = d['day']
    agg0[name].append(day['name'])
# Step 2: keep only the first and last day (alphabetically) of each period
agg1 = {}
for k,v in agg0.items():
    pos_in_alph = [string.ascii_lowercase.index(el.lower()) for el in v]
    allowed_indexes = [max(pos_in_alph),min(pos_in_alph)]
    agg1[k] = [el for el in v if string.ascii_lowercase.index(el.lower()) in allowed_indexes]
# Step 3: group the periods that share the same day range
agg = {}
for k,v in agg1.items():
    w = tuple(v)
    if w not in agg:
        agg[w] = {'ks':[],'gr':len(agg0[k])>2}
    agg[w]['ks'].append(k)
# Step 4: build the aggregate string
str_ = ''
for k,v in sorted(agg.items(), key=lambda item:item[0], reverse=False):
    str_ += ' {pnames}({dnames})'.format(pnames=('-' if v['gr'] else ',').join(sorted(v['ks'])),
                                          dnames='-'.join(k))
print(str_.strip())
which outputs 1-3(A-C) 4(D)
Following @NathanJones's comment, note that if d_list were defined as
d_list = [
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'A'}},
##{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'C'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 1, 'name': 'A'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'C'}},
{'period': {'sort_order': 4, 'name': '4'}, 'day': {'sort_order': 3, 'name': 'D'}},
]
The code above would print 1,3(A-C) 4(D)
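The adjacency rule described in the question (consecutive sort_order values collapse into a range joined by -, while gaps are joined by ,) can be sketched for a single dimension as follows. The function name collapse_runs and its (sort_order, name) input format are assumptions made for this illustration. It covers only one dimension (periods or days) and is not a full solution to the grouping problem:

```python
def collapse_runs(entries):
    """Collapse (sort_order, name) pairs into a string such as '1-3' or '1,3'.

    Hypothetical helper: consecutive sort_order values form one run.
    """
    runs = []
    for order, name in sorted(set(entries)):
        if runs and order == runs[-1][-1][0] + 1:
            runs[-1].append((order, name))   # extends the current run
        else:
            runs.append([(order, name)])     # starts a new run
    parts = []
    for run in runs:
        first, last = run[0][1], run[-1][1]
        parts.append(first if len(run) == 1 else f"{first}-{last}")
    return ",".join(parts)


print(collapse_runs([(1, '1'), (2, '2'), (3, '3')]))  # 1-3
print(collapse_runs([(1, '1'), (3, '3')]))            # 1,3
print(collapse_runs([(1, 'A'), (2, 'B'), (3, 'C')]))  # A-C
```

Applying it once to the periods and once to the days of each related sub-list would give the two halves of a sub-expression such as 1-3(A-C).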
