Dictionary value sorting - python

I have a dictionary with tuples as keys (containing a string and an int) and floats as values. An example:
first = {}
first['monkey', 1] = 130.0
first['dog', 2] = 123.0
first['cat', 3] = 130.0
first['cat', 4] = 130.0
first['mouse', 6] = 100.0
Now I need to make a new dictionary whose keys are the second element (the int) of the original dictionary's keys. Each value in the new dictionary should be the position that key would take if the entries were sorted by value. On top of this, there are two exceptions:
If two entries have equal values but different strings in the key, the one with the lowest int in the key should be placed higher.
If two entries have equal values and the same string but different ints in the key, they should be placed equally in the new dict and all get the same value.
So, the new dictionary should be as the following:
second[1] = 3
second[2] = 2
second[3] = 4
second[4] = 4
second[6] = 1
I know that it's bad form to ask someone else to solve my problem without showing my own code for it.
But I simply don't know how to approach the problem. I would be glad if you could explain how you would solve this problem, or even give me pseudocode for the algorithm.

import itertools as IT
first = {
    ('monkey', 1): 130.0,
    ('dog', 2): 123.0,
    ('cat', 3): 130.0,
    ('cat', 4): 130.0,
    ('mouse', 6): 100.0
}
counter = 0
ordered = sorted(first, key=lambda k: (first[k], k[1], k[0]))
second = {}
for key, group in IT.groupby(ordered, first.__getitem__):
    # group = list(group)
    # print(key, group)
    # (100.0, [('mouse', 6)])
    # (123.0, [('dog', 2)])
    # (130.0, [('monkey', 1), ('cat', 3), ('cat', 4)])
    previous = None
    for name, num in group:
        if name != previous:
            counter += 1
        second[num] = counter
        previous = name
print(second)
yields
{1: 3, 2: 2, 3: 4, 4: 4, 6: 1}
Explanation:
The first step is to order the (name, num) keys of first according to the associated values. However, in case of ties, the num is used. If there is still a tie, the name is used to break the tie.
In [96]: ordered = sorted(first, key = lambda k: (first[k], k[1], k[0]))
In [97]: ordered
Out[97]: [('mouse', 6), ('dog', 2), ('monkey', 1), ('cat', 3), ('cat', 4)]
Next, we need to group the items in ordered since there are special rules when the value first[k] is the same. The grouping can be achieved using itertools.groupby:
In [99]: for key, group in IT.groupby(ordered, first.__getitem__):
   ....:     print(key, list(group))
   ....:
   ....:
(100.0, [('mouse', 6)])
(123.0, [('dog', 2)])
(130.0, [('monkey', 1), ('cat', 3), ('cat', 4)])
itertools.groupby is collecting the items in ordered into bunches according to the value of the key, first.__getitem__(item). For example,
In [100]: first.__getitem__(('monkey', 1))
Out[100]: 130.0
In [101]: first.__getitem__(('cat', 3))
Out[101]: 130.0
first.__getitem__(item) is just a fancy way of writing first[item]. The reason why I use first.__getitem__ is because itertools.groupby expects a function for its second argument, and first.__getitem__ is the function that fits the bill.
Finally, we iterate through each group. Basically, we want to do this:
for name, num in group:
    counter += 1
    second[num] = counter
except that, when the names are equal, we do not want to advance the counter. So to check if the names are equal, it helps to store the previous name:
previous = None
for name, num in group:
    if name != previous:
        counter += 1
    ...
    previous = name
Warning: Note that rkd91's code and my code produce different answers for
first = {
('monkey',1): 130.0,
('dog',2): 123.0,
('cat', 3): 129.0,
('cat', 4): 130.0,
('mouse', 6): 100.0
}
probably due to different interpretations of the specifications. I'll leave it to you to decide which is yielding the desired output.
#rdk91's code yields
{1: 4, 2: 2, 3: 5, 4: 3, 6: 1}
my code yields
{1: 4, 2: 2, 3: 3, 4: 5, 6: 1}

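1) Get a list of key-value tuples using first_list = list(first.items()) (in Python 2, first.items() already returns a list).
2) Create a custom comparator function that will sort the list according to your criteria.
3) Sort the list using first_list.sort(key=functools.cmp_to_key(comparator)) (in Python 2, first_list.sort(comparator) also works).
4) Build your new dictionary from the sorted list.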
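The four steps above can be sketched roughly as follows. This is only one reading of the question's rules: the comparator orders by value and then by the int in the key, and entries that share both value and name get the same rank. The names compare, rank and previous are illustrative, not taken from the answer.

```python
from functools import cmp_to_key

first = {
    ('monkey', 1): 130.0,
    ('dog', 2): 123.0,
    ('cat', 3): 130.0,
    ('cat', 4): 130.0,
    ('mouse', 6): 100.0,
}

def compare(a, b):
    """Order items by value, then by the int in the key (both ascending)."""
    (_name_a, num_a), value_a = a
    (_name_b, num_b), value_b = b
    if value_a != value_b:
        return -1 if value_a < value_b else 1
    return num_a - num_b

# Steps 1-3: take the items and sort them with the comparator.
items = sorted(first.items(), key=cmp_to_key(compare))

# Step 4: build the new dict; same value and same name share a rank.
second = {}
rank = 0
previous = None  # (value, name) of the previous item
for (name, num), value in items:
    if (value, name) != previous:
        rank += 1
    second[num] = rank
    previous = (value, name)

print(second)
```

A plain key function such as the tuple used in the answer above is usually simpler; cmp_to_key is mainly useful when the ordering is easier to express as a pairwise comparison.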

rob@rivertam:~/Programming$ cat sorter.py
first = {}
first['monkey', 1] = 130.0
first['dog', 2] = 123.0
first['cat', 3] = 130.0
first['cat', 4] = 130.0
first['mouse', 6] = 100.0
# Get the keys of first, sorted by the value (ascending order), and then by the string in the key (descending order) if two keys have the same value
s = sorted(first, key=lambda x: x[0])
s.reverse()
s = sorted(s, key=lambda x: first[x])
# Loop through these, and create a new dict where the key is the integer in the old key, and the value is the position in the sorted order.
last_val = None
last = (None, None)
index = 0
new_dict = {}
for item in s:
    if not ((first[item] == last_val) and (item[1] != last[1]) and item[0] == last[0]):
        # When we have the same value, the same string but a different integer from the last key, consider it to be the same position in the sorted order.
        index += 1
    new_dict[item[1]] = index
    last_val = first[item]
    last = item
print(new_dict)
rob@rivertam:~/Programming$ python sorter.py
{1: 3, 2: 2, 3: 4, 4: 4, 6: 1}

Related

Python: Removing list duplicates based on first 2 inner list values

Question:
I have a list in the following format:
x = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
The algorithm:
Combine all inner lists with the same starting 2 values, the third value doesn't have to be the same to combine them
e.g. "hello",0,5 is combined with "hello",0,8
But not combined with "hello",1,1
The 3rd value becomes the average of the third values: sum(all 3rd vals) / len(all 3rd vals)
Note: by all 3rd vals I am referring to the 3rd value of each inner list of duplicates
e.g. "hello",0,5 and "hello",0,8 becomes hello,0,6.5
Desired output: (Order of list doesn't matter)
x = [["hello",0,6.5], ["hi",0,6], ["hello",1,1]]
Question:
How can I implement this algorithm in Python?
Ideally it would be efficient as this will be used on very large lists.
If anything is unclear let me know and I will explain.
Edit: I have tried to change the list to a set to remove duplicates, however this doesn't account for the third variable in the inner lists and therefore doesn't work.
Solution Performance:
Thanks to everyone who has provided a solution to this problem! Here
are the results based on a speed test of all the functions:
Update using running sum and count
I figured out how to improve my previous code (see original below). You can keep running totals and counts, then compute the averages at the end, which avoids recording all the individual numbers.
from collections import defaultdict
class RunningAverage:
    def __init__(self):
        self.total = 0
        self.count = 0
    def add(self, value):
        self.total += value
        self.count += 1
    def calculate(self):
        return self.total / self.count
def func(lst):
    thirds = defaultdict(RunningAverage)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].add(sub[2])
    lst_out = [[*k, v.calculate()] for k, v in thirds.items()]
    return lst_out
print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
Original answer
This probably won't be very efficient since it has to accumulate all the values to average them. I think you could get around that by having a running average with a weighting factored in, but I'm not quite sure how to do that.
from collections import defaultdict
def avg(nums):
    return sum(nums) / len(nums)
def func(lst):
    thirds = defaultdict(list)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].append(sub[2])
    lst_out = [[*k, avg(v)] for k, v in thirds.items()]
    return lst_out
print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
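The weighted running average mentioned in the original answer can be sketched with the usual incremental-mean update, where each new value moves the mean by (value - mean) / n. This is an illustration only; the function name running_mean is hypothetical and not part of the answer.

```python
def running_mean(values):
    """Return the mean of values without storing them or a separate total."""
    mean = 0.0
    for n, value in enumerate(values, start=1):
        # After n values, the new value carries a weight of 1/n.
        mean += (value - mean) / n
    return mean

print(running_mean([5, 8]))  # the two "hello", 0 values from the question
```

Keeping a total and a count, as in the updated answer, gives the same result and is arguably easier to read; the incremental form mainly helps when only the current mean is stored.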
You can try using groupby.
m = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
from itertools import groupby
m.sort(key=lambda x:x[0]+str(x[1]))
for i,j in groupby(m, lambda x:x[0]+str(x[1])):
    ss=0
    c=0.0
    for k in j:
        ss+=k[2]
        c+=1.0
    print([k[0], k[1], ss/c])
This should be O(N), someone correct me if I'm wrong:
def my_algorithm(input_list):
    """
    :param input_list: list of lists in format [string, int, int]
    :return: list
    """
    # Dict in format (string, int): [int, count_int]
    # So our list is in this format, example:
    # [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
    # so for our dict we will make keys a tuple of the first 2 values of each sublist (since that needs to be unique)
    # while values are a list of third element from our sublist + counter (which counts every time we have a duplicate
    # key, so we can divide it and get average).
    my_dict = {}
    for element in input_list:
        # key is a tuple of the first 2 values of each sublist
        key = (element[0], element[1])
        if key not in my_dict:
            # If the key does not exist, add it.
            # Value is in form of third element from our sublist + counter. Since this is first value set counter to 1
            my_dict[key] = [element[2], 1]
        else:
            # If key does exist then increment our value and increment counter by 1
            my_dict[key][0] += element[2]
            my_dict[key][1] += 1
    # we have a dict so we will need to convert it to list (and on the way calculate averages)
    return _convert_my_dict_to_list(my_dict)
def _convert_my_dict_to_list(my_dict):
    """
    :param my_dict: dict, key is in form of tuple (string, int) and values are in form of list [int, int_counter]
    :return: list
    """
    my_list = []
    for key, value in my_dict.items():
        sublist = [key[0], key[1], value[0]/value[1]]
        my_list.append(sublist)
    return my_list
my_algorithm(x)
This will return:
[['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
While your expected return is:
[["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]
If you really need ints then you can modify _convert_my_dict_to_list function.
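One possible way to make that modification is sketched below: whole-number averages are converted back to int and everything else stays a float. The function name convert_to_list and the is_integer check are assumptions for illustration, not the answer author's code.

```python
def convert_to_list(my_dict):
    """Turn {(string, int): [total, count]} into [[string, int, average], ...]."""
    my_list = []
    for (name, number), (total, count) in my_dict.items():
        average = total / count
        # Assumption: report whole-number averages as ints, e.g. 6.0 -> 6.
        if float(average).is_integer():
            average = int(average)
        my_list.append([name, number, average])
    return my_list

totals = {("hello", 0): [13, 2], ("hi", 0): [6, 1], ("hello", 1): [1, 1]}
print(convert_to_list(totals))
```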
Here's my variation on this theme: a groupby sans the expensive sort. I also changed the problem to make the input and output a list of tuples as these are fixed-size records:
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]
dictionary = defaultdict(complex)
for key, group in groupby(data, itemgetter(slice(2))):
    total = sum(value for (string, number, value) in group)
    dictionary[key] += total + 1j
array = [(*key, value.real / value.imag) for key, value in dictionary.items()]
print(array)
OUTPUT
> python3 test.py
[('hello', 0, 6.5), ('hi', 0, 6.0), ('hello', 1, 1.0)]
>
Thanks to @wjandrea for the itemgetter replacement for lambda. (And yes, I am using complex numbers in passing for the average to track the total and count.)

Python Shell Not Returning Anything

This is my code below, which I believe should be working. When I call it, the Python shell prints nothing and just another Restart line pops up above. How can I fix this?
Instructions for this function problem are the following:
Description: Write a function called animal_locator that takes in a dictionary
containing zoo locations as keys and their values being a list of tuples with the
specific animal and the population of that specific animal at that zoo. You should
return a dictionary containing the animals as keys and their values being a tuple
with their first element being an ordered list of all the zoo locations based on
how many animals are at each location (greatest to least) and the second element
being an integer of the total population of that specific animal.
You do not have to take in account case sensitivity.
def animal_locator(places):
    newdict = {}
    for city in places:
        numtup = len(places[city])
        num = 0
        while num < numtup:
            if places[city][num][0] not in newdict:
                newlist = []
                newtup = (places[city][num][1], city)
                newlist.append(newtup)
                for city1 in places:
                    if city1 != city:
                        for tup in places[city1]:
                            if tup[0] == places[city][num][0]:
                                tupnew = (tup[1], city1)
                                newlist.append(tupnew)
                newlist.sort(reverse=True)
                count = 0
                newlist2 = []
                for tup in newlist:
                    newlist2.append(tup[1])
                    count += tup[0]
                newtup = (newlist2, count)
                newdict[places[city][num][0]] = newtup
        num += 1
    return newdict
zoo_location1 = {'San Diego': [('lion', 4), ('tiger', 2), ('bear', 8)], 'Bronx': [('lion', 20), ('snake', 5), ('tiger', 1)], 'Atlanta': [('lion', 3), ('snake', 2), ('bee', 4500)], 'Orlando': [('bee', 234), ('tiger', 123)]}
animal_dict1 = animal_locator(zoo_location1)
print(animal_dict1)
I found out that my num += 1 line needed to be indented by one more tab so that it runs inside the while loop; without that the loop never ended. After the change it ran normally.
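For comparison, here is a shorter sketch of the same task that avoids the manual counter altogether, so there is no loop variable to forget to advance. It assumes the same input shape as zoo_location1; the helper name sightings is illustrative.

```python
from collections import defaultdict

def animal_locator(places):
    # Collect (population, city) pairs for every animal.
    sightings = defaultdict(list)
    for city, animals in places.items():
        for animal, population in animals:
            sightings[animal].append((population, city))

    result = {}
    for animal, pairs in sightings.items():
        pairs.sort(reverse=True)  # greatest population first
        cities = [city for _population, city in pairs]
        total = sum(population for population, _city in pairs)
        result[animal] = (cities, total)
    return result

zoo_location1 = {
    'San Diego': [('lion', 4), ('tiger', 2), ('bear', 8)],
    'Bronx': [('lion', 20), ('snake', 5), ('tiger', 1)],
    'Atlanta': [('lion', 3), ('snake', 2), ('bee', 4500)],
    'Orlando': [('bee', 234), ('tiger', 123)],
}
print(animal_locator(zoo_location1))
```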

OrderedDict Changing Order after Double Iterator Loop

I set up an OrderedDict and perform dictionary comprehensions with different grammars, which I have simplified to a function dictcomp(fn, dictionary, key_or_value):
x = OrderedDict(self._Median_Colors)
x = self.dictcomp(hex2color, x, 'v')
x = self.dictcomp(rgb_to_hsv, x, 'v_tuple')
At this point I am able to sort the dictionary:
x = self.dictcomp(self.sort_by_hue, x, 'v')
Everything seems to check out so far:
print x
Now I need to rename keys, so I will create a new ordered dictionary:
color_indexes = list(xrange(0, len(x.keys())))
print color_indexes
newkeys = [self.rename(color_index) for color_index in color_indexes]
print x.values()
vi = iter(x.values())
x = OrderedDict.fromkeys(newkeys);
I had no idea how to fill in the old values immediately, so I did this:
ki = iter(x.keys())
for k, v in zip(ki, vi):
    #print "k:", k
    print v
    x[k] = tuple(v)
Checks out fine:
print x.items()
Here comes trouble:
x = self.dictcomp(hsv_to_rgb, x, 'v_tuple')
print x.items()
where dictcomp does this:
dictionary = {k: fn(*v) for k, v in dictionary.items()}
where fn=hsv_to_rgb, dictionary=x
Now, I have:
[('Blue', (0.9764705882352941, 0.5529411764705883, 0.0)), ....
instead of the expected:
[('Red', (0.4745098039215686, 0.7372549019607844, 0.23137254901960794)), ....
The keys are the same, but the values have changed. I am guessing that the insertion order was somehow affected. How did this happen and how can I keep the order of keys in the dictionary?
The problem is because of
for i, j in zip([4, 5, 6], [1, 2, 3]):
    print i
    print j
Results in the column:
4 1 5 2 6 3
It turns out that zip acts as a zipper if using two iterators.
The fix is to get the keyword-value as an iterable tuple:
for i in zip([4, 5, 6], [1, 2, 3]):
    print i
Returns
(4, 1)
(5, 2)
(6, 3)
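Separately from the zip behaviour above, one likely cause of the reordering is that a dict comprehension builds a plain dict, and plain dicts do not guarantee insertion order before Python 3.7 (the print statements and xrange in the question suggest Python 2). A minimal sketch, assuming dictcomp can be changed to return an OrderedDict; dictcomp_ordered, add_pair and the sample data are hypothetical:

```python
from collections import OrderedDict

def dictcomp_ordered(fn, dictionary):
    """Apply fn to each value tuple while keeping the key order."""
    return OrderedDict((k, fn(*v)) for k, v in dictionary.items())

def add_pair(a, b):
    # Stand-in for a conversion such as hsv_to_rgb.
    return (a + b,)

x = OrderedDict([('Red', (1, 2)), ('Blue', (3, 4)), ('Green', (5, 6))])
y = dictcomp_ordered(add_pair, x)
print(list(y.items()))
```

Because the OrderedDict is built from a sequence of pairs, the keys keep the order in which x yields them.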

python group by and ordered a list by another list

I wonder if there is a more Pythonic way to group a list and order it by the order of another list.
lstNeedOrder has a few pairs in random order. I want the output to follow the order in lst. The result should have all pairs containing a's first, followed by all b's and c's.
The entries in lstNeedOrder only ever have the format a/c or c/a.
input:
lstNeedOrder = ['a/b','c/b','f/d','a/e','c/d','a/c']
lst = ['a','b','c']
output:
res = ['a/b','a/c','a/e','c/b','c/d','f/d']
update
lst = ['a','b','c'] is not the actual data; it just makes the logic easier to understand. The actual data are more complex string pairs.
Using sorted with a custom key function:
>>> lstNeedOrder = ['a/b','c/d','f/d','a/e','c/d','a/c']
>>> lst = ['a','b','c']
>>> order = {ch: i for i, ch in enumerate(lst)} # {'a': 0, 'b': 1, 'c': 2}
>>> def sort_key(x):
...     # 'a/b' -> (0, 1), 'c/d' -> (2, 3), ...
...     a, b = x.split('/')
...     return order.get(a, len(lst)), order.get(b, len(lst))
...
>>> sorted(lstNeedOrder, key=sort_key)
['a/b', 'a/c', 'a/e', 'c/d', 'c/d', 'f/d']
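The session above uses a slightly different input from the question ('c/d' appears twice and 'c/b' is missing). As a check, the same key function can be run as a script on the question's exact data; this is just a restatement of the approach above, with unknown letters sorted after the known ones.

```python
lstNeedOrder = ['a/b', 'c/b', 'f/d', 'a/e', 'c/d', 'a/c']
lst = ['a', 'b', 'c']

# Map each known item to its position in lst: {'a': 0, 'b': 1, 'c': 2}
order = {ch: i for i, ch in enumerate(lst)}

def sort_key(pair):
    left, right = pair.split('/')
    # Items missing from lst get len(lst), so they sort last.
    return order.get(left, len(lst)), order.get(right, len(lst))

res = sorted(lstNeedOrder, key=sort_key)
print(res)
```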

How do I use itertools.groupby()?

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:
Take a list - in this case, the children of an objectified lxml element
Divide it into groups based on some criteria
Then later iterate over each of these groups separately.
I've reviewed the documentation, but I've had trouble trying to apply them beyond a simple list of numbers.
So, how do I use itertools.groupby()? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.
IMPORTANT NOTE: You have to sort your data first (by the same key function) if you want all items with the same key to end up in a single group.
The part I didn't get is that in the example construction
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.
Here's an example of that, using clearer variable names:
from itertools import groupby
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print("")
This will give you the output:
A bear is a animal.
A duck is a animal.
A cactus is a plant.
A speed boat is a vehicle.
A school bus is a vehicle.
In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.
The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.
Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.
In the above for statement, groupby returns three (key, group iterator) pairs - once for each unique key. You can use the returned iterator to iterate over each individual item in that group.
Here's a slightly different example with the same data, using a list comprehension:
for key, group in groupby(things, lambda x: x[0]):
    listOfThings = " and ".join([thing[1] for thing in group])
    print(key + "s: " + listOfThings + ".")
This will give you the output:
animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.
itertools.groupby is a tool for grouping items.
From the docs, we glean further what it might do:
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
groupby objects yield key-group pairs where the group is an iterator.
Features
A. Group consecutive items together
B. Group all occurrences of an item, given a sorted iterable
C. Specify how to group items with a key function *
Comparisons
# Define a printer for comparing outputs
>>> import itertools as it
>>> def print_groupby(iterable, keyfunc=None):
...     for k, g in it.groupby(iterable, keyfunc):
...         print("key: '{}'--> group: {}".format(k, list(g)))
# Feature A: group consecutive occurrences
>>> print_groupby("BCAACACAADBBB")
key: 'B'--> group: ['B']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'D'--> group: ['D']
key: 'B'--> group: ['B', 'B', 'B']
# Feature B: group all occurrences
>>> print_groupby(sorted("BCAACACAADBBB"))
key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
key: 'B'--> group: ['B', 'B', 'B', 'B']
key: 'C'--> group: ['C', 'C', 'C']
key: 'D'--> group: ['D']
# Feature C: group by a key function
>>> # islower = lambda s: s.islower() # equivalent
>>> def islower(s):
...     """Return True if a string is lowercase, else False."""
...     return s.islower()
>>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']
Uses
Anagrams (see notebook)
Binning
Group odd and even numbers
Group a list by values
Remove duplicate elements
Find indices of repeated elements in an array
Split an array into n-sized chunks
Find corresponding elements between two lists
Compression algorithm (see notebook)/Run Length Encoding
Grouping letters by length, key function (see notebook)
Consecutive values over a threshold (see notebook)
Find ranges of numbers in a list or continuous items (see docs)
Find all related longest sequences
Take consecutive sequences that meet a condition (see related post)
Note: Several of the latter examples derive from Víctor Terrón's PyCon (talk) (Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code written in C.
* A function where all items are passed through and compared, influencing the result. Other objects with key functions include sorted(), max() and min().
Response
# OP: Yes, you can use `groupby`, e.g.
[do_something(list(g)) for _, g in groupby(lxml_elements, criteria_func)]
The example on the Python docs is quite straightforward:
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes and then groupby() groups the data.
You must be careful to sort the data by the criteria before you call groupby, or it won't work as expected. groupby just iterates through the data and starts a new group whenever the key changes.
A neato trick with groupby is to run length encoding in one line:
[(c,len(list(cgen))) for c,cgen in groupby(some_string)]
will give you a list of 2-tuples where the first element is the char and the 2nd is the number of repetitions.
Edit: Note that this is what separates itertools.groupby from the SQL GROUP BY semantics: itertools doesn't (and in general can't) sort the iterator in advance, so groups with the same "key" aren't merged.
Another example:
for key, igroup in itertools.groupby(range(12), lambda x: x // 5):
    print(key, list(igroup))
results in
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11]
Note that igroup is an iterator (a sub-iterator as the documentation calls it).
This is useful for chunking a generator:
def chunker(items, chunk_size):
    '''Group items in chunks of chunk_size'''
    for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
        yield (g[1] for g in group)
with open('file.txt') as fobj:
    for chunk in chunker(fobj, 100):  # a chunk size of 100 is an arbitrary example value
        process(chunk)
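To see the chunking idea run without a file, here is a small self-contained sketch on an in-memory range. The chunk size of 3 and the variable names are arbitrary choices for illustration. Each chunk is a lazy generator, so it has to be consumed before moving on to the next one.

```python
import itertools

def chunker(items, chunk_size):
    """Yield lazy chunks of at most chunk_size items."""
    indexed = enumerate(items)
    for _key, group in itertools.groupby(indexed, lambda x: x[0] // chunk_size):
        # Drop the index added by enumerate and keep only the item.
        yield (item for _index, item in group)

# Materialise each chunk immediately, before groupby advances.
chunks = [list(chunk) for chunk in chunker(range(7), 3)]
print(chunks)
```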
Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.
xx = range(10)
yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
for group in itertools.groupby(iter(xx), lambda x: yy[x]):
    print(group[0], list(group[1]))
Produces:
0 [0, 1, 2]
1 [3, 4, 5]
0 [6, 7, 8, 9]
WARNING:
The syntax list(groupby(...)) won't work the way that you intend. The group iterators share the underlying iterable with the groupby object, so advancing to the next group exhausts the previous one. As a result, using
for x in list(groupby(range(10))):
    print(list(x[1]))
will produce:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[9]
Instead of list(groupby(...)), try [(k, list(g)) for k,g in groupby(...)], or if you use that syntax often,
def groupbylist(*args, **kwargs):
    return [(k, list(g)) for k, g in groupby(*args, **kwargs)]
and get access to the groupby functionality while avoiding those pesky (for small data) iterators altogether.
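A short usage sketch of that helper, with made-up inputs, shows that every group survives as a list because it is materialised while groupby is still positioned on it:

```python
from itertools import groupby

def groupbylist(*args, **kwargs):
    return [(k, list(g)) for k, g in groupby(*args, **kwargs)]

# Consecutive runs of equal characters.
runs = groupbylist("AAABBA")
print(runs)

# With a key function: integers grouped by n // 3.
thirds = groupbylist(range(6), key=lambda n: n // 3)
print(thirds)
```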
I would like to give another example where groupby without sorting does not give the expected result. It is adapted from the example by James Sulak.
from itertools import groupby
things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print(" ")
output is
A bear is a vehicle.
A duck is a animal.
A cactus is a animal.
A speed boat is a vehicle.
A school bus is a vehicle.
There are two groups with vehicle, whereas one might expect only one group.
@CaptSolo, I tried your example, but it didn't work.
from itertools import groupby
[(c,len(list(cs))) for c,cs in groupby('Pedro Manoel')]
Output:
[('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]
As you can see, there are two o's and two e's, but they got into separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:
name = list('Pedro Manoel')
name.sort()
[(c,len(list(cs))) for c,cs in groupby(name)]
Output:
[(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]
Just remember: if the list is not sorted, groupby will not merge equal items that are not adjacent!
Sorting and groupby
from itertools import groupby
val = [{'name': 'satyajit', 'address': 'btm', 'pin': 560076},
{'name': 'Mukul', 'address': 'Silk board', 'pin': 560078},
{'name': 'Preetam', 'address': 'btm', 'pin': 560076}]
for pin, list_data in groupby(sorted(val, key=lambda k: k['pin']), lambda x: x['pin']):
    print(pin)
    for rec in list_data:
        print(rec)
Output:
560076
{'name': 'satyajit', 'pin': 560076, 'address': 'btm'}
{'name': 'Preetam', 'pin': 560076, 'address': 'btm'}
560078
{'name': 'Mukul', 'pin': 560078, 'address': 'Silk board'}
Sadly I don’t think it’s advisable to use itertools.groupby(). It’s just too hard to use safely, and it’s only a handful of lines to write something that works as expected.
from collections import defaultdict
def my_group_by(iterable, keyfunc):
    """Because itertools.groupby is tricky to use
    The stdlib method requires sorting in advance, and returns iterators not
    lists, and those iterators get consumed as you try to use them, throwing
    everything off if you try to look at something more than once.
    """
    ret = defaultdict(list)
    for k in iterable:
        ret[keyfunc(k)].append(k)
    return dict(ret)
Use it like this:
def first_letter(x):
    return x[0]
my_group_by('four score and seven years ago'.split(), first_letter)
to get
{'f': ['four'], 's': ['score', 'seven'], 'a': ['and', 'ago'], 'y': ['years']}
How do I use Python's itertools.groupby()?
You can use groupby to group things to iterate over. You give groupby an iterable and an optional key function/callable by which to check the items as they come out of the iterable, and it returns an iterator that gives a two-tuple of the result of the key callable and the actual items in another iterable. From the help:
groupby(iterable[, keyfunc]) -> create an iterator which returns
(key, sub-iterator) grouped by each value of key(value).
Here's an example of groupby using a coroutine to group by a count, it uses a key callable (in this case, coroutine.send) to just spit out the count for however many iterations and a grouped sub-iterator of elements:
import itertools
def grouper(iterable, n):
    def coroutine(n):
        yield  # queue up coroutine
        for i in itertools.count():
            for j in range(n):
                yield i
    groups = coroutine(n)
    next(groups)  # queue up coroutine
    for c, objs in itertools.groupby(iterable, groups.send):
        yield c, list(objs)
    # or instead of materializing a list of objs, just:
    # return itertools.groupby(iterable, groups.send)
list(grouper(range(10), 3))
prints
[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]
This basic implementation helped me understand this function. Hope it helps others as well:
from itertools import groupby
arr = [(1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "F")]
for k,g in groupby(arr, lambda x: x[0]):
    print("--", k, "--")
    for tup in g:
        print(tup[1])  # tup[0] == k
-- 1 --
A
B
C
-- 2 --
D
E
-- 3 --
F
One useful example that I came across may be helpful:
from itertools import groupby
#user input
myinput = input()
#creating empty list to store output
myoutput = []
for k,g in groupby(myinput):
    myoutput.append((len(list(g)),int(k)))
print(*myoutput)
Sample input: 14445221
Sample output: (1, 1) (3, 4) (1, 5) (2, 2) (1, 1)
from random import randint
from itertools import groupby
l = [randint(1, 3) for _ in range(20)]
d = {}
for k, g in groupby(l, lambda x: x):
    if not d.get(k, None):
        d[k] = list(g)
    else:
        d[k] = d[k] + list(g)
The code above shows how groupby can be used to group a list based on the lambda function/key supplied. The only problem is that the output is not merged; this can be easily resolved using a dictionary.
Example:
l = [2, 1, 2, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 3, 1, 2, 1, 3, 2, 3]
after applying groupby the result will be:
for k, g in groupby(l, lambda x:x):
    print(k, list(g))
2 [2]
1 [1]
2 [2]
3 [3]
1 [1]
3 [3]
2 [2]
1 [1]
3 [3, 3]
1 [1]
3 [3]
2 [2]
3 [3]
1 [1]
2 [2]
1 [1]
3 [3]
2 [2]
3 [3]
Once a dictionary is used as shown above following result is derived which can be easily iterated over:
{2: [2, 2, 2, 2, 2, 2], 1: [1, 1, 1, 1, 1, 1], 3: [3, 3, 3, 3, 3, 3, 3, 3]}
The key thing to recognize with itertools.groupby is that items are only grouped together as long as they're sequential in the iterable. This is why sorting works, because basically you're rearranging the collection so that all of the items which satisfy callback(item) now appear in the sorted collection sequentially.
That being said, you don't need to sort the list, you just need a collection of key-value pairs, where the value can grow in accordance to each group iterable yielded by groupby. i.e. a dict of lists.
>>> things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
>>> coll = {}
>>> for k, g in itertools.groupby(things, lambda x: x[0]):
...     coll.setdefault(k, []).extend(i for _, i in g)
...
>>> coll
{'vehicle': ['bear', 'speed boat', 'school bus'], 'animal': ['duck', 'cactus']}
