Pythonic way to group items in a list [duplicate]

This question already has an answer here:
Group list of dictionaries to list of list of dictionaries with same property value
(1 answer)
Closed 8 years ago.
Consider a list of dicts:
items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 2, 'b': 7, 'c': 9},
    {'a': 3, 'b': 8, 'c': 2}
]
Is there a pythonic way to extract and group these items by their 'a' field, such that:
result = {
    1: [{'b': 9, 'c': 8}, {'b': 5, 'c': 4}],
    2: [{'b': 3, 'c': 1}, {'b': 7, 'c': 9}],
    3: [{'b': 8, 'c': 2}]
}
References to any similar Pythonic constructs are appreciated.

Use itertools.groupby:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> {k: list(g) for k, g in groupby(items, itemgetter('a'))}
{1: [{'a': 1, 'c': 8, 'b': 9},
{'a': 1, 'c': 4, 'b': 5}],
2: [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 9, 'b': 7}],
3: [{'a': 3, 'c': 2, 'b': 8}]}
If the items are not in sorted order, you can either sort them first and then use groupby, or use collections.OrderedDict (if order matters) or collections.defaultdict to group them in O(N) time:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for item in items:
...     d.setdefault(item['a'], []).append(item)
...
>>> dict(d.items())
{1: [{'a': 1, 'c': 8, 'b': 9},
{'a': 1, 'c': 4, 'b': 5}],
2: [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 9, 'b': 7}],
3: [{'a': 3, 'c': 2, 'b': 8}]}
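The answer mentions collections.defaultdict but only shows the OrderedDict/setdefault version. A minimal sketch of the defaultdict variant, assuming Python 3.7+ (where plain dicts preserve insertion order, so the groups come out in first-seen order without needing OrderedDict):

```python
from collections import defaultdict

items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 2, 'b': 7, 'c': 9},
    {'a': 3, 'b': 8, 'c': 2},
]

# A missing key starts out as an empty list, so one pass groups everything;
# the input does not need to be sorted.
groups = defaultdict(list)
for item in items:
    groups[item['a']].append(item)

result = dict(groups)  # plain dict; keys appear in first-seen order (3.7+)
print(result)
```

Compared with setdefault, defaultdict avoids building a throwaway empty list on every iteration.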
Update:
I see that you only want the keys that were not used for grouping to be returned. For that you'll need something like this (written for Python 3; in Python 2, use d.viewkeys() instead of d.keys()):
>>> group_keys = {'a'}
>>> {k: [{key: d[key] for key in d.keys() - group_keys} for d in g]
...  for k, g in groupby(items, itemgetter(*group_keys))}
{1: [{'c': 8, 'b': 9},
{'c': 4, 'b': 5}],
2: [{'c': 1, 'b': 3},
{'c': 9, 'b': 7}],
3: [{'c': 2, 'b': 8}]}
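The groupby version above still requires the input to be sorted by 'a'. A sketch that drops the grouping key and tolerates unsorted input by combining a defaultdict with a dict comprehension; group_without_key is a hypothetical helper name, not a library function:

```python
from collections import defaultdict

def group_without_key(records, key):
    """Group dicts by record[key] and drop that key from each grouped dict.

    Uses a dict of lists instead of groupby, so unsorted input is fine.
    """
    grouped = defaultdict(list)
    for record in records:
        # Copy every field except the one used for grouping
        rest = {k: v for k, v in record.items() if k != key}
        grouped[record[key]].append(rest)
    return dict(grouped)

# Deliberately not sorted by 'a'
items = [
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 3, 'b': 8, 'c': 2},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 7, 'c': 9},
]
result = group_without_key(items, 'a')
print(result)
```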

Note: This code assumes the data is already sorted. If it is not, we have to sort it manually.
from itertools import groupby
print({key: list(grp) for key, grp in groupby(items, key=lambda x: x["a"])})
Output
{1: [{'a': 1, 'b': 9, 'c': 8}, {'a': 1, 'b': 5, 'c': 4}],
2: [{'a': 2, 'b': 3, 'c': 1}, {'a': 2, 'b': 7, 'c': 9}],
3: [{'a': 3, 'b': 8, 'c': 2}]}
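To see why the sortedness note matters: groupby only merges consecutive items with equal keys, so on unsorted input a dict comprehension silently keeps only the last run for a repeated key. A small demonstration with made-up data:

```python
from itertools import groupby
from operator import itemgetter

unsorted_items = [
    {'a': 1, 'b': 9},
    {'a': 2, 'b': 3},
    {'a': 1, 'b': 5},  # key 1 appears again after a different key
]
by_a = itemgetter('a')

# groupby yields one group per run of equal keys: here 1, 2, 1
runs = [(k, list(g)) for k, g in groupby(unsorted_items, key=by_a)]
print(runs)

# The second run for key 1 overwrites the first one in the dict
lossy = {k: list(g) for k, g in groupby(unsorted_items, key=by_a)}
print(lossy)  # {1: [{'a': 1, 'b': 5}], 2: [{'a': 2, 'b': 3}]}

# Sorting by the same key first makes equal keys adjacent
fixed = {k: list(g) for k, g in groupby(sorted(unsorted_items, key=by_a), key=by_a)}
print(fixed)  # {1: [{'a': 1, 'b': 9}, {'a': 1, 'b': 5}], 2: [{'a': 2, 'b': 3}]}
```

Because sorted is stable, items with the same key keep their original relative order.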
To get the result in exactly the format you asked for:
from itertools import groupby
from operator import itemgetter
a_getter, getter, keys = itemgetter("a"), itemgetter("b", "c"), ("b", "c")
def recon_dicts(item):
    # Rebuild a dict that holds only the "b" and "c" fields
    return dict(zip(keys, getter(item)))
# list() is needed in Python 3, where map() returns a lazy iterator
{key: list(map(recon_dicts, grp)) for key, grp in groupby(items, key=a_getter)}
Output
{1: [{'c': 8, 'b': 9}, {'c': 4, 'b': 5}],
2: [{'c': 1, 'b': 3}, {'c': 9, 'b': 7}],
3: [{'c': 2, 'b': 8}]}
If the data is not already sorted, you can either use the defaultdict approach from the other answer, or sort by 'a' first with sorted, like this:
{key: list(map(recon_dicts, grp))
 for key, grp in groupby(sorted(items, key=a_getter), key=a_getter)}
References:
operator.itemgetter
itertools.groupby
zip, map, dict, sorted

Related

How to efficiently calculate prefix sum of frequencies of characters in a string?

Say, I have a string
s = 'AAABBBCAB'
How can I efficiently calculate the prefix sum of frequencies of each character in the string, i.e.:
psum = [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, {'A': 4, 'B': 4, 'C': 1}]

You can do it in one line using itertools.accumulate and collections.Counter:
from collections import Counter
from itertools import accumulate
s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))
This gives you a list of Counter objects. Now, to get the frequencies for any substring of s, you can simply subtract two counters; the cost depends only on the number of distinct characters, not on the length of the substring. For example:
>>> psum[6] - psum[1] # get frequencies for s[2:7]
Counter({'B': 3, 'A': 1, 'C': 1})
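Because psum[i] counts s[:i + 1], the counts for s[start:stop] are psum[stop - 1] - psum[start - 1], and start == 0 needs a special case. A minimal sketch wrapping that index arithmetic; substring_counts is a hypothetical helper name:

```python
from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))  # psum[i] counts s[:i + 1]

def substring_counts(psum, start, stop):
    """Character counts for s[start:stop], using the prefix counters."""
    if stop <= start:
        return Counter()
    total = psum[stop - 1]          # counts of s[:stop]
    if start == 0:
        return +total               # unary plus returns a copy without zero counts
    return total - psum[start - 1]  # subtract counts of s[:start]

print(substring_counts(psum, 2, 7))  # Counter({'B': 3, 'A': 1, 'C': 1})
print(substring_counts(psum, 0, 3))  # Counter({'A': 3})
```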

This is an option:
from collections import Counter
c = Counter()
s = 'AAABBBCAB'
psum = []
for char in s:
    c.update(char)
    psum.append(dict(c))
# [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2},
#  {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
#  {'A': 4, 'B': 4, 'C': 1}]
I use collections.Counter to keep a running count and append a copy of it (as a dict) to the list psum. This way I iterate over the string s only once.
If you prefer to have collections.Counter objects in your result, you could change the last line to
psum.append(c.copy())
in order to get
[Counter({'A': 1}), Counter({'A': 2}), ...
Counter({'A': 4, 'B': 4, 'C': 1})]
The same result (as Counter objects) could also be achieved with this (using accumulate was first proposed in Eugene Yarmash's answer; I just avoid map in favour of a generator expression):
from itertools import accumulate
from collections import Counter
s = "AAABBBCAB"
psum = list(accumulate(Counter(char) for char in s))
Just for completeness (as there is no 'pure dict' answer here yet): if you do not want to use Counter or defaultdict, you could use this as well:
c = {}
s = 'AAABBBCAB'
psum = []
for char in s:
    c[char] = c.get(char, 0) + 1
    psum.append(c.copy())
Note, though, that defaultdict is usually faster than dict.get(key, default).
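Whether defaultdict actually beats dict.get here depends on the Python version and the input; in this loop the per-step copy of the dictionary probably dominates anyway. A rough timeit sketch to check on your own machine (with_get and with_defaultdict are hypothetical names, and no particular timings are implied):

```python
import timeit
from collections import defaultdict

s = 'AAABBBCAB' * 1000  # longer input so differences are measurable

def with_get(s):
    c, out = {}, []
    for char in s:
        c[char] = c.get(char, 0) + 1
        out.append(c.copy())
    return out

def with_defaultdict(s):
    c, out = defaultdict(int), []
    for char in s:
        c[char] += 1
        out.append(dict(c))
    return out

# Both variants must produce the same prefix counts before timing them
assert with_get(s) == with_defaultdict(s)
for fn in (with_get, with_defaultdict):
    seconds = timeit.timeit(lambda: fn(s), number=10)
    print(f"{fn.__name__}: {seconds:.3f} s")
```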

You actually don't even need a Counter for this; a defaultdict would suffice!
from collections import defaultdict
c = defaultdict(int)
s = 'AAABBBCAB'
psum = []
# Iterate through the characters
for char in s:
    # Update the count for this character
    c[char] += 1
    # Append a copy of the updated dictionary to the output list
    psum.append(dict(c))
print(psum)
The output looks like
[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1},
{'A': 3, 'B': 2}, {'A': 3, 'B': 3},
{'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
{'A': 4, 'B': 4, 'C': 1}]

The simplest approach is to use the Counter object from collections, although it recounts every prefix and is therefore O(n²) in the length of the string:
from collections import Counter
s = 'AAABBBCAB'
[dict(Counter(s[:i])) for i in range(1, len(s) + 1)]
Yields:
[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2},
 {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
 {'A': 4, 'B': 4, 'C': 1}]

In Python 3.8+ you can use a list comprehension with an assignment expression (aka "the walrus operator"):
>>> from collections import Counter
>>> s = 'AAABBBCAB'
>>> c = Counter()
>>> [c := c + Counter(x) for x in s]
[Counter({'A': 1}), Counter({'A': 2}), Counter({'A': 3}), Counter({'A': 3, 'B': 1}), Counter({'A': 3, 'B': 2}), Counter({'A': 3, 'B': 3}), Counter({'A': 3, 'B': 3, 'C': 1}), Counter({'A': 4, 'B': 3, 'C': 1}), Counter({'A': 4, 'B': 4, 'C': 1})]

Convert dict to list of dict for each combinations

I have a dict that looks like this:
my_dict = {
    "a": [1, 2, 3],
    "b": [10],
    "c": [4, 5],
    "d": [11]
}
And I would like to obtain a list containing all combinations, keeping keys and values, like this:
result = [
{"a":1, "b":10, "c":4, "d":11},
{"a":1, "b":10, "c":5, "d":11},
{"a":2, "b":10, "c":4, "d":11},
{"a":2, "b":10, "c":5, "d":11},
{"a":3, "b":10, "c":4, "d":11},
{"a":3, "b":10, "c":5, "d":11}
]
Does anyone have a solution for this?
Is there an existing solution, or how should I proceed to do it myself?
Thank you.

A task for itertools.product:
>>> from itertools import product
>>> for dict_items in product(*[product([k],v) for k, v in my_dict.items()]):
...     print(dict(dict_items))
{'a': 1, 'b': 10, 'c': 4, 'd': 11}
{'a': 1, 'b': 10, 'c': 5, 'd': 11}
{'a': 2, 'b': 10, 'c': 4, 'd': 11}
{'a': 2, 'b': 10, 'c': 5, 'd': 11}
{'a': 3, 'b': 10, 'c': 4, 'd': 11}
{'a': 3, 'b': 10, 'c': 5, 'd': 11}
Small explanation:
The inner product(...) will expand the dict to a list such as [[(k1, v11), (k1, v12), ...], [(k2, v21), (k2, v22), ...], ...].
The outer product(...) will reassemble the items lists by choosing one tuple from each list.
dict(...) will create a dictionary from a sequence of (k1, v#), (k2, v#), ... tuples.
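The explanation above can be checked step by step by printing the intermediate structures (same my_dict as in the question):

```python
from itertools import product

my_dict = {"a": [1, 2, 3], "b": [10], "c": [4, 5], "d": [11]}

# Inner step: pair each value with its key, one list per key
pairs_per_key = [list(product([k], v)) for k, v in my_dict.items()]
print(pairs_per_key)
# [[('a', 1), ('a', 2), ('a', 3)], [('b', 10)], [('c', 4), ('c', 5)], [('d', 11)]]

# Outer step: choose one (key, value) pair from each list
first = next(product(*pairs_per_key))
print(first)        # (('a', 1), ('b', 10), ('c', 4), ('d', 11))
print(dict(first))  # {'a': 1, 'b': 10, 'c': 4, 'd': 11}
```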

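Try:
import itertools
def permute(d):
    # Despite its name, this builds the Cartesian product of the value lists
    k = d.keys()
    perms = itertools.product(*d.values())
    return [dict(zip(k, v)) for v in perms]
Example usage:
>>> from pprint import pprint
>>> d = {'a': [1, 2, 3], 'b': [10], 'c': [4, 5], 'd': [11]}
>>> pprint(permute(d))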
[{'a': 1, 'b': 10, 'c': 4, 'd': 11},
{'a': 1, 'b': 10, 'c': 5, 'd': 11},
{'a': 2, 'b': 10, 'c': 4, 'd': 11},
{'a': 2, 'b': 10, 'c': 5, 'd': 11},
{'a': 3, 'b': 10, 'c': 4, 'd': 11},
{'a': 3, 'b': 10, 'c': 5, 'd': 11}]

Assuming that you are only interested in my_dict having 4 keys, it is simple enough to use nested for loops:
my_dict = {
"a": [1, 2, 3],
"b": [10],
"c": [4, 5],
"d": [11]
}
result = []
for a_val in my_dict['a']:
    for b_val in my_dict['b']:
        for c_val in my_dict['c']:
            for d_val in my_dict['d']:
                result.append({'a': a_val, 'b': b_val, 'c': c_val, 'd': d_val})
print(result)
This gives the expected result.
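The nested loops are tied to exactly four keys. One way to generalize them without itertools is recursion, with each recursion level standing in for one loop; combinations_recursive is a hypothetical name, and this is a sketch rather than a replacement for itertools.product:

```python
def combinations_recursive(d):
    """Nested loops generalized to any number of keys via recursion."""
    keys = list(d)
    result = []

    def build(index, current):
        if index == len(keys):        # a value has been chosen for every key
            result.append(dict(current))
            return
        key = keys[index]
        for value in d[key]:          # plays the role of one nested for loop
            current[key] = value
            build(index + 1, current)

    build(0, {})
    return result

my_dict = {"a": [1, 2, 3], "b": [10], "c": [4, 5], "d": [11]}
print(combinations_recursive(my_dict))
```

Like itertools.product, an empty value list yields no combinations, and an empty dict yields a single empty dict.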

You can use:
from itertools import product
allNames = sorted(my_dict)
values = list(product(*(my_dict[name] for name in allNames)))
# Zip with allNames rather than a hard-coded key list so any keys work
d = [dict(zip(allNames, i)) for i in values]
Output:
[{'a': 1, 'c': 4, 'b': 10, 'd': 11},
{'a': 1, 'c': 5, 'b': 10, 'd': 11},
{'a': 2, 'c': 4, 'b': 10, 'd': 11},
{'a': 2, 'c': 5, 'b': 10, 'd': 11},
{'a': 3, 'c': 4, 'b': 10, 'd': 11},
{'a': 3, 'c': 5, 'b': 10, 'd': 11}]

itertools.product produces the Cartesian product of several iterables.
dict.values() supplies the lists to combine.
For each combination, zip the dict's keys (iterating over a dict yields its keys) with the combination.
Use a list comprehension to collect the results:
from itertools import product
from pprint import pprint
my_dict = {
"a":[1, 2, 3],
"b":[10],
"c":[4, 5],
"d":[11]
}
result = [dict(zip(my_dict,i)) for i in product(*my_dict.values())]
pprint(result)
Output:
[{'a': 1, 'b': 10, 'c': 4, 'd': 11},
{'a': 1, 'b': 10, 'c': 5, 'd': 11},
{'a': 2, 'b': 10, 'c': 4, 'd': 11},
{'a': 2, 'b': 10, 'c': 5, 'd': 11},
{'a': 3, 'b': 10, 'c': 4, 'd': 11},
{'a': 3, 'b': 10, 'c': 5, 'd': 11}]

Python using lambda sort list or dicts by multiple keys

Here is my list of dicts:
l = [{'a': 2, 'c': 1, 'b': 3},
     {'a': 2, 'c': 3, 'b': 1},
     {'a': 1, 'c': 2, 'b': 3},
     {'a': 1, 'c': 3, 'b': 2},
     {'a': 2, 'c': 5, 'b': 3}]
Now I want to sort the list by keys and orders provided by the user, for instance:
keys = ['a', 'c', 'b']
orders = [1, -1, 1]
I tried using a lambda with the sort() method, but it failed in a weird way:
>>> l.sort(key=lambda x: (order * x[key] for (key, order) in zip(keys, orders)))
>>> l
[{'a': 2, 'c': 5, 'b': 3},
{'a': 1, 'c': 3, 'b': 2},
{'a': 1, 'c': 2, 'b': 3},
{'a': 2, 'c': 3, 'b': 1},
{'a': 2, 'c': 1, 'b': 3}]
Anyone know how to solve this?

You were almost there; your lambda returns a generator expression. In Python 2, generators happen to be compared by their memory address, and in Python 3 comparing them raises TypeError: '<' not supported between instances of 'generator' and 'generator'.
Use a list comprehension instead (lists are compared element by element):
l.sort(key=lambda x: [order * x[key] for (key, order) in zip(keys, orders)])
Demo:
>>> l = [{'a': 1, 'c': 2, 'b': 3},
... {'a': 1, 'c': 3, 'b': 2},
... {'a': 2, 'c': 1, 'b': 3},
... {'a': 2, 'c': 5, 'b': 3},
... {'a': 2, 'c': 3, 'b': 1}]
>>> keys = ['a', 'c', 'b']
>>> orders = [1, -1, 1]
>>> l.sort(key=lambda x: [order * x[key] for (key, order) in zip(keys, orders)])
>>> from pprint import pprint
>>> pprint(l)
[{'a': 1, 'b': 2, 'c': 3},
{'a': 1, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 5},
{'a': 2, 'b': 1, 'c': 3},
{'a': 2, 'b': 3, 'c': 1}]
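Multiplying by order only works for numeric values; it fails for strings, for example. An alternative that relies on list.sort being stable (which the Python docs guarantee, including with reverse=True) is to sort once per key, from the least significant key to the most significant one. A sketch, with multi_key_sort as a hypothetical helper name:

```python
from operator import itemgetter

def multi_key_sort(records, keys, orders):
    """Sort dicts by several keys, each ascending (1) or descending (-1).

    Sorting by the least significant key first and the most significant key
    last gives the combined order, because each stable pass keeps the order
    established by the previous passes for equal keys.
    """
    result = list(records)
    for key, order in reversed(list(zip(keys, orders))):
        result.sort(key=itemgetter(key), reverse=(order == -1))
    return result

l = [{'a': 1, 'c': 2, 'b': 3},
     {'a': 1, 'c': 3, 'b': 2},
     {'a': 2, 'c': 1, 'b': 3},
     {'a': 2, 'c': 5, 'b': 3},
     {'a': 2, 'c': 3, 'b': 1}]
print(multi_key_sort(l, ['a', 'c', 'b'], [1, -1, 1]))

# Works with strings too, where negation would fail: age ascending, name descending
people = [{'name': 'bob', 'age': 30},
          {'name': 'alice', 'age': 30},
          {'name': 'carol', 'age': 25}]
print([p['name'] for p in multi_key_sort(people, ['age', 'name'], [1, -1])])
# ['carol', 'bob', 'alice']
```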

Filter Dictionary keys of multilevel dictionary

I have the following dict structure:
{12345: {2006: [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
I want to be able to filter based on the values of the keys 'a' or 'b'.
For example, if 'a' is 1, my filtered dict would look like:
{12345: {2006: [{'a': 1, 'b': 2}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
I have the following for loop, which gets me down to the inner dicts I want, but I am not sure how to put them back into a dict of the same structure.
d = {12345: {2006: [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
d_filter = {}
for item_code in d.keys():
    for year in d[item_code]:
        for item_dict in d[item_code][year]:
            if item_dict['a'] == 1:
                print(item_dict)  # how to put this back in d_filter?
producing:
{'a': 1, 'b': 2}
{'a': 1, 'b': 5}
{'a': 1, 'b': 9}
{'a': 1, 'b': 12}
I am guessing there is a better way to filter that I cannot find, or something with a dictionary comprehension that my small mind cannot grasp.
Any help would be appreciated.

Here's a dictionary comprehension that does just that; dct is your initial dictionary:
filtered = {k: {ky: [item for item in vl if item['a'] == 1] for ky, vl in v.items()}
            for k, v in dct.items()}
print(filtered)
# {12345: {2006: [{'a': 1, 'b': 2}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
You can change the inner filter (i.e. item['a'] == 1) to the dict key and/or value of your choice.
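To make the condition swappable, the same comprehension can take the test as a function argument. A minimal sketch assuming the {item_code: {year: [dict, ...]}} shape from the question; filter_nested is a hypothetical name, and lists that end up empty are kept rather than removed:

```python
def filter_nested(data, predicate):
    """Keep only the innermost dicts for which predicate(item) is true."""
    return {
        item_code: {
            year: [item for item in items if predicate(item)]
            for year, items in years.items()
        }
        for item_code, years in data.items()
    }

dct = {12345: {2006: [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 1, 'b': 5}]},
       12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}

print(filter_nested(dct, lambda item: item['a'] == 1))
print(filter_nested(dct, lambda item: item['b'] > 5))
# {12345: {2006: []}, 12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
```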

You could do something like this:
filtered = {
    item_code: {
        year: [item for item in items if item['a'] == 1]
        for year, items in years.items()
    }
    for item_code, years in d.items()
}
Which results in:
{12345: {2006: [{'a': 1, 'b': 2}, {'a': 1, 'b': 5}]},
12346: {2007: [{'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}

Combining all combinations of two lists into a dict of special form

I have two lists:
var_a = [1,2,3,4]
var_b = [6,7]
I want to have a list of dicts as follows:
result = [{'a':1,'b':6},{'a':1,'b':7},{'a':2,'b':6},{'a':2,'b':7},....]
I think the result should be clear.

[{k: v for k, v in zip('ab', comb)} for comb in itertools.product([1, 2, 3, 4], [6, 7])]
>>> import itertools
>>> [{k: v for k, v in zip('ab', comb)} for comb in itertools.product([1, 2, 3, 4], [6, 7])]
[{'a': 1, 'b': 6}, {'a': 1, 'b': 7}, {'a': 2, 'b': 6}, {'a': 2, 'b': 7}, {'a': 3, 'b': 6}, {'a': 3, 'b': 7}, {'a': 4, 'b': 6}, {'a': 4, 'b': 7}]

from itertools import product
a = [1,2,3,4]
b = [6,7]
[dict(zip(('a','b'), (i,j))) for i,j in product(a,b)]
yields
[{'a': 1, 'b': 6},
{'a': 1, 'b': 7},
{'a': 2, 'b': 6},
{'a': 2, 'b': 7},
{'a': 3, 'b': 6},
{'a': 3, 'b': 7},
{'a': 4, 'b': 6},
{'a': 4, 'b': 7}]

If the variable names are given to you as strings, you could use:
>>> a = [1,2,3,4]
>>> b = [6,7]
>>> from itertools import product
>>> nameTup = ('a', 'b')
>>> [dict(zip(nameTup, elem)) for elem in product(a, b)]
[{'a': 1, 'b': 6}, {'a': 1, 'b': 7}, {'a': 2, 'b': 6}, {'a': 2, 'b': 7}, {'a': 3, 'b': 6}, {'a': 3, 'b': 7}, {'a': 4, 'b': 6}, {'a': 4, 'b': 7}]
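If the number of lists varies, keyword arguments can carry the names (keyword-argument order is preserved since Python 3.6). A sketch with a hypothetical named_product helper:

```python
from itertools import product

def named_product(**lists):
    """One dict per combination, keyed by the keyword argument names."""
    names = list(lists)
    return [dict(zip(names, combo)) for combo in product(*lists.values())]

print(named_product(a=[1, 2, 3, 4], b=[6, 7]))
print(named_product(x=[1], y=[2], z=[3, 4]))
# [{'x': 1, 'y': 2, 'z': 3}, {'x': 1, 'y': 2, 'z': 4}]
```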