How do I use itertools.groupby()?

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:
Take a list - in this case, the children of an objectified lxml element
Divide it into groups based on some criteria
Then later iterate over each of these groups separately.
I've reviewed the documentation, but I've had trouble applying it beyond a simple list of numbers.
So, how do I use itertools.groupby()? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.

IMPORTANT NOTE: You may have to sort your data first, using the same key function, because groupby() only groups consecutive items.
The part I didn't get is that in the example construction
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.
Here's an example of that, using clearer variable names:
from itertools import groupby
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print("")
This will give you the output:
A bear is a animal.
A duck is a animal.
A cactus is a plant.
A speed boat is a vehicle.
A school bus is a vehicle.
In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.
The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.
Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.
In the above for statement, groupby yields three (key, group iterator) pairs, one for each unique key. You can use the returned iterator to iterate over each individual item in that group.
Here's a slightly different example with the same data, using a list comprehension:
for key, group in groupby(things, lambda x: x[0]):
    listOfThings = " and ".join([thing[1] for thing in group])
    print(key + "s: " + listOfThings + ".")
This will give you the output:
animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.
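If the input is not already ordered by the grouping key, one common approach is to sort with the same key function before grouping. The following is a minimal sketch of that idea; the shuffled list and the names kind and grouped are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Same data as above, but deliberately out of order (illustrative input).
things = [("vehicle", "speed boat"), ("animal", "bear"), ("plant", "cactus"),
          ("animal", "duck"), ("vehicle", "school bus")]

kind = itemgetter(0)  # key function: the first item of each tuple

# Sort and group with the SAME key so that equal keys end up adjacent.
grouped = {
    key: [name for _, name in group]
    for key, group in groupby(sorted(things, key=kind), key=kind)
}
print(grouped)
```

Because sorted() is stable, items keep their original relative order inside each group.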

itertools.groupby is a tool for grouping items.
From the docs, we glean further what it might do:
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
groupby objects yield key-group pairs where the group is an iterator.
Features
A. Group consecutive items together
B. Group all occurrences of an item, given a sorted iterable
C. Specify how to group items with a key function *
Comparisons
# Define a printer for comparing outputs
>>> import itertools as it
>>> def print_groupby(iterable, keyfunc=None):
...     for k, g in it.groupby(iterable, keyfunc):
...         print("key: '{}'--> group: {}".format(k, list(g)))
# Feature A: group consecutive occurrences
>>> print_groupby("BCAACACAADBBB")
key: 'B'--> group: ['B']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'D'--> group: ['D']
key: 'B'--> group: ['B', 'B', 'B']
# Feature B: group all occurrences
>>> print_groupby(sorted("BCAACACAADBBB"))
key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
key: 'B'--> group: ['B', 'B', 'B', 'B']
key: 'C'--> group: ['C', 'C', 'C']
key: 'D'--> group: ['D']
# Feature C: group by a key function
>>> # islower = lambda s: s.islower() # equivalent
>>> def islower(s):
...     """Return True if a string is lowercase, else False."""
...     return s.islower()
>>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']
Uses
Anagrams (see notebook)
Binning
Group odd and even numbers
Group a list by values
Remove duplicate elements
Find indices of repeated elements in an array
Split an array into n-sized chunks
Find corresponding elements between two lists
Compression algorithm (see notebook)/Run Length Encoding
Grouping letters by length, key function (see notebook)
Consecutive values over a threshold (see notebook)
Find ranges of numbers in a list or continuous items (see docs)
Find all related longest sequences
Take consecutive sequences that meet a condition (see related post)
Note: Several of the latter examples derive from Víctor Terrón's PyCon talk (in Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code written in C.
* A function where all items are passed through and compared, influencing the result. Other objects with key functions include sorted(), max() and min().
Response
# OP: Yes, you can use `groupby`, e.g.
[do_something(list(g)) for _, g in groupby(lxml_elements, criteria_func)]
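As a rough illustration of the one-liner above, the sketch below groups the children of an XML element by tag. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption made only so the example is self-contained), and the sample document and the tag_of key function are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical sample document; the question's real data is an lxml element.
root = ET.fromstring("<root><a>1</a><a>2</a><b>3</b><a>4</a></root>")

def tag_of(element):
    """Grouping criterion: the element's tag name."""
    return element.tag

# Iterating an Element yields its children; sort them so equal tags are adjacent.
children = sorted(root, key=tag_of)

by_tag = {
    tag: [child.text for child in group]
    for tag, group in groupby(children, key=tag_of)
}
print(by_tag)
```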

The example on the Python docs is quite straightforward:
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes and then groupby() groups the data.
You must be careful to sort the data by the same criteria before you call groupby, or items with equal keys that are not adjacent will end up in separate groups. The groupby function just iterates through the data and starts a new group whenever the key changes.
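To make the "new group whenever the key changes" behaviour concrete, here is a simplified, eager re-implementation. It is only a sketch for intuition, not how itertools.groupby is actually written (the real one is lazy and implemented in C); simple_groupby is a made-up name.

```python
def simple_groupby(iterable, key=None):
    """Eagerly collect runs of consecutive items that share a key."""
    if key is None:
        key = lambda item: item  # identity key, like groupby's default
    groups = []
    sentinel = object()          # marks "no key seen yet"
    current_key = sentinel
    for item in iterable:
        item_key = key(item)
        # Start a new group on the first item or whenever the key changes.
        if current_key is sentinel or item_key != current_key:
            groups.append((item_key, []))
            current_key = item_key
        groups[-1][1].append(item)
    return groups

print(simple_groupby("AAABBA"))
```

Note how the trailing "A" forms its own group because it is not adjacent to the first run of "A"s.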

A neat trick with groupby is run-length encoding in one line:
[(c,len(list(cgen))) for c,cgen in groupby(some_string)]
will give you a list of 2-tuples where the first element is the char and the 2nd is the number of repetitions.
Edit: Note that this is what separates itertools.groupby from the SQL GROUP BY semantics: itertools doesn't (and in general can't) sort the iterator in advance, so groups with the same "key" aren't merged.
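A small sketch of the run-length encoding idea, with a matching decoder to show that the encoding is reversible. The function names rle_encode and rle_decode are made up for this example.

```python
from itertools import groupby

def rle_encode(text):
    """Return (char, run_length) pairs for consecutive runs in text."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original string from (char, run_length) pairs."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("aaabccdddd")
print(encoded)
print(rle_decode(encoded))
```

No sorting is wanted here: the encoding relies on groupby treating each consecutive run separately.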

Another example:
for key, igroup in itertools.groupby(range(12), lambda x: x // 5):
    print(key, list(igroup))
results in
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11]
Note that igroup is an iterator (a sub-iterator as the documentation calls it).
This is useful for chunking a generator:
def chunker(items, chunk_size):
    '''Group items in chunks of chunk_size'''
    for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
        yield (g[1] for g in group)
with open('file.txt') as fobj:
    for chunk in chunker(fobj, 100):  # 100 lines per chunk is an arbitrary example value
        process(chunk)  # process() is a placeholder for your own function
Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.
xx = range(10)
yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
for group in itertools.groupby(iter(xx), lambda x: yy[x]):
    print(group[0], list(group[1]))
Produces:
0 [0, 1, 2]
1 [3, 4, 5]
0 [6, 7, 8, 9]

WARNING:
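The syntax list(groupby(...)) won't work the way that you intend. Each group iterator shares the underlying iterator with the groupby object, so advancing groupby to the next group leaves the previous group empty. For example, using
for x in list(groupby(range(10))):
    print(list(x[1]))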
will produce:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[9]
Instead of list(groupby(...)), try [(k, list(g)) for k,g in groupby(...)], or if you use that syntax often,
def groupbylist(*args, **kwargs):
    return [(k, list(g)) for k, g in groupby(*args, **kwargs)]
and get access to the groupby functionality while avoiding those pesky (for small data) iterators altogether.
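The difference between the two approaches can be checked directly. This is a small sketch with made-up variable names; it only inspects the earlier groups, since what the final stale group contains has differed between Python versions.

```python
from itertools import groupby

data = [1, 1, 2, 2, 3]

# Collecting the (key, group) pairs first advances groupby past every group,
# so the group iterators gathered along the way are already stale.
stale_pairs = list(groupby(data))
stale_groups = [list(group) for _, group in stale_pairs]

# Materializing each group while groupby is still positioned on it works.
eager_pairs = [(key, list(group)) for key, group in groupby(data)]

print(stale_groups[:2])  # the earlier groups come back empty
print(eager_pairs)
```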

I would like to give another example where groupby without sorting does not give the expected result. Adapted from the example by James Sulak:
from itertools import groupby
things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print(" ")
output is
A bear is a vehicle.
A duck is a animal.
A cactus is a animal.
A speed boat is a vehicle.
A school bus is a vehicle.
There are two groups with the key vehicle, whereas one might expect only one group.

@CaptSolo, I tried your example, but it didn't work.
from itertools import groupby
[(c,len(list(cs))) for c,cs in groupby('Pedro Manoel')]
Output:
[('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]
As you can see, there are two o's and two e's, but they got into separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:
name = list('Pedro Manoel')
name.sort()
[(c,len(list(cs))) for c,cs in groupby(name)]
Output:
[(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]
Just remember: if the list is not sorted, groupby will not merge equal items that are not adjacent!

Sorting and groupby
from itertools import groupby
val = [{'name': 'satyajit', 'address': 'btm', 'pin': 560076},
{'name': 'Mukul', 'address': 'Silk board', 'pin': 560078},
{'name': 'Preetam', 'address': 'btm', 'pin': 560076}]
for pin, list_data in groupby(sorted(val, key=lambda k: k['pin']), lambda x: x['pin']):
    print(pin)
    for rec in list_data:
        print(rec)
Output:
560076
{'name': 'satyajit', 'address': 'btm', 'pin': 560076}
{'name': 'Preetam', 'address': 'btm', 'pin': 560076}
560078
{'name': 'Mukul', 'address': 'Silk board', 'pin': 560078}

Sadly I don’t think it’s advisable to use itertools.groupby(). It’s just too hard to use safely, and it’s only a handful of lines to write something that works as expected.
from collections import defaultdict
def my_group_by(iterable, keyfunc):
    """Because itertools.groupby is tricky to use
    The stdlib method requires sorting in advance, and returns iterators not
    lists, and those iterators get consumed as you try to use them, throwing
    everything off if you try to look at something more than once.
    """
    ret = defaultdict(list)
    for k in iterable:
        ret[keyfunc(k)].append(k)
    return dict(ret)
Use it like this:
def first_letter(x):
    return x[0]
my_group_by('four score and seven years ago'.split(), first_letter)
to get
{'f': ['four'], 's': ['score', 'seven'], 'a': ['and', 'ago'], 'y': ['years']}

How do I use Python's itertools.groupby()?
You can use groupby to group things to iterate over. You give groupby an iterable and an optional key function/callable by which to check the items as they come out of the iterable, and it returns an iterator that yields two-tuples of the key callable's result and a sub-iterator over the matching items. From the help:
groupby(iterable[, keyfunc]) -> create an iterator which returns
(key, sub-iterator) grouped by each value of key(value).
Here's an example of groupby using a coroutine to group by a count. It uses a key callable (in this case, coroutine.send) that emits the same count for n consecutive items, and yields each count together with the grouped elements:
import itertools
def grouper(iterable, n):
    def coroutine(n):
        yield  # queue up coroutine
        for i in itertools.count():
            for j in range(n):
                yield i
    groups = coroutine(n)
    next(groups)  # queue up coroutine
    for c, objs in itertools.groupby(iterable, groups.send):
        yield c, list(objs)
    # or instead of materializing a list of objs, just:
    # return itertools.groupby(iterable, groups.send)
list(grouper(range(10), 3))
returns
[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]

This basic implementation helped me understand this function. Hope it helps others as well:
from itertools import groupby
arr = [(1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "F")]
for k, g in groupby(arr, lambda x: x[0]):
    print("--", k, "--")
    for tup in g:
        print(tup[1])  # tup[0] == k
-- 1 --
A
B
C
-- 2 --
D
E
-- 3 --
F

One useful example that I came across may be helpful:
from itertools import groupby
#user input
myinput = input()
#creating empty list to store output
myoutput = []
for k, g in groupby(myinput):
    myoutput.append((len(list(g)), int(k)))
print(*myoutput)
Sample input: 14445221
Sample output: (1, 1) (3, 4) (1, 5) (2, 2) (1, 1)

from random import randint
from itertools import groupby
l = [randint(1, 3) for _ in range(20)]
d = {}
for k, g in groupby(l, lambda x: x):
    if not d.get(k, None):
        d[k] = list(g)
    else:
        d[k] = d[k] + list(g)
The code above shows how groupby can be used to group a list based on the lambda function/key supplied. The only problem is that the output is not merged; this can be easily resolved using a dictionary.
Example:
l = [2, 1, 2, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 3, 1, 2, 1, 3, 2, 3]
after applying groupby the result will be:
for k, g in groupby(l, lambda x: x):
    print(k, list(g))
2 [2]
1 [1]
2 [2]
3 [3]
1 [1]
3 [3]
2 [2]
1 [1]
3 [3, 3]
1 [1]
3 [3]
2 [2]
3 [3]
1 [1]
2 [2]
1 [1]
3 [3]
2 [2]
3 [3]
Once a dictionary is used as shown above, the following result is derived, which can be easily iterated over:
{2: [2, 2, 2, 2, 2, 2], 1: [1, 1, 1, 1, 1, 1], 3: [3, 3, 3, 3, 3, 3, 3, 3]}

The key thing to recognize with itertools.groupby is that items are only grouped together as long as they're sequential in the iterable. This is why sorting works, because basically you're rearranging the collection so that all of the items which satisfy callback(item) now appear in the sorted collection sequentially.
That being said, you don't need to sort the list, you just need a collection of key-value pairs, where the value can grow in accordance to each group iterable yielded by groupby. i.e. a dict of lists.
>>> import itertools
>>> things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
>>> coll = {}
>>> for k, g in itertools.groupby(things, lambda x: x[0]):
...     coll.setdefault(k, []).extend(i for _, i in g)
...
>>> coll
{'vehicle': ['bear', 'speed boat', 'school bus'], 'animal': ['duck', 'cactus']}

Related

How does one find the number of subsets for a particular row, in a 2D list with python? Can collections' Counter function be used?

Please excuse the title, it is hard to express the problem correctly without showing an example.
I have a very large 2D array with rows of varying sizes, for example:
big2DArray =
[["a","g","r"],
["a","r"],
["p","q"],
["a", "r"]]
I need to return a dictionary, it has to look something like this:
{('a','g','r'): 1, ('a', 'r'): 3, ('p', 'q'):1}
The ('a', 'r') tuple is found to have a value of 3, since it occurs twice as itself and once as a subset (less than or equal) to the tuple ('a', 'g', 'r').
Normally I would use something like this:
dictCounts = Counter(map(tuple, big2DArray))
Which, for big2DArray, would give:
{('a','g','r'): 1, ('a', 'r'): 2, ('p', 'q'):1}
My question is this: can collections.Counter be used so that it gives the counts for the subsets as well, as explained above? If not, is there any comparably efficient method to return my desired dictionary output for subsets?
Thanks so much!
Edit 1: Just for further clarity! I do not want to return all subsets, such as {('a','g'): 1, ('a','r'):3}, and so on. I only want to return the counts for the unique rows in the 2D array. So in this case the counts for: ('a','g','r'), ('a','r'), ('p','q').
Edit 2: The row ["a","r"] should be treated as equivalent to ["r", "a"], and so should the tuples ('a','r') and ('r','a')

You can use set.issubset with collections.Counter here.
Demo:
from collections import Counter
big2DArray = [["a","g","r"],
["a","r"],
["p","q"],
["a", "r"],
["r", "a"]]
counts = Counter(map(lambda x: tuple(sorted(x)), big2DArray))
count_lst = list(counts)
for i, k1 in enumerate(count_lst):
    rest = count_lst[:i] + count_lst[i+1:]
    for k2 in rest:
        if set(k1).issubset(k2):
            counts[k1] += 1
print(counts)
Output:
Counter({('a', 'r'): 4, ('a', 'g', 'r'): 1, ('p', 'q'): 1})
In the above code, in order to make sure ["r", "a"] and ["a","r"] are equivalent, you can sort them beforehand, and add them as tuples to Counter().
The other more efficient way would be to use frozenset, as shown in the other answer.

Here is one solution. It uses defaultdict instead of Counter. The dictionary keys are frozensets. If you need ordered tuple dictionary keys, see @RoadRunner's solution.
from itertools import combinations, chain
from collections import defaultdict
big2DArray = [["a","g","r"],
["a","r"],
["p","q"],
["a", "r"]]
arr_new = [[set(i) for k in range(2, len(j)+1)
            for i in combinations(j, k)] for j in big2DArray]
full_list = set(map(frozenset, big2DArray))
counter = defaultdict(int)
for i in range(len(big2DArray)):
    for j in full_list:
        if j in arr_new[i]:
            counter[frozenset(j)] += 1
# defaultdict(int,
# {frozenset({'a', 'r'}): 3,
# frozenset({'a', 'g', 'r'}): 1,
# frozenset({'p', 'q'}): 1})
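A more compact variant of the frozenset idea is sketched below, under the assumption that each row should count once for every distinct row that it contains as a subset. It first counts exact rows with Counter and then compares only the distinct rows; the names rows, exact and subset_counts are illustrative.

```python
from collections import Counter

rows = [["a", "g", "r"],
        ["a", "r"],
        ["p", "q"],
        ["r", "a"]]  # order inside a row is ignored

# Count identical rows once, using frozenset so ['a', 'r'] == ['r', 'a'].
exact = Counter(frozenset(row) for row in rows)

# For each distinct row, add up the counts of every row that contains it
# (frozenset <= frozenset is the subset test, and a set contains itself).
subset_counts = {
    key: sum(count for other, count in exact.items() if key <= other)
    for key in exact
}
print(subset_counts)
```

This is quadratic in the number of distinct rows rather than in the total number of rows, which may help when many rows repeat.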

Arrange elements with same count in alphabetical order

Python's collections Counter.most_common(n) method returns the top n elements with their counts. However, if the counts for two elements are the same, how can I return the result sorted in alphabetical order?
For example, for a string like BBBAAACCD and n = 2 (the "2 most common" counts), I want the result to be:
[('A', 3), ('B', 3), ('C', 2)]
and NOT:
[('B', 3), ('A', 3), ('C', 2)]
Notice that although A and B have the same frequency, A comes before B in the resultant list since it comes before B in alphabetical order.
[('A', 3), ('B', 3), ('C', 2)]
How can I achieve that?

Although this question is already a bit old, I'd like to suggest a very simple solution to the problem, which just involves sorting the input of Counter() before creating the Counter object itself. If you then call most_common(n), you will get the top n entries, with entries of equal count in alphabetical order.
from collections import Counter
char_counter = Counter(sorted('ccccbbbbdaef'))
for char in char_counter.most_common(3):
    print(*char)
resulting in the output:
b 4
c 4
a 1

There are two issues here:
Include duplicates when considering top n most common values excluding duplicates.
For any duplicates, order alphabetically.
None of the solutions thus far address the first issue. You can use a heap queue with the itertools unique_everseen recipe (also available in 3rd party libraries such as toolz.unique) to calculate the nth largest count.
Then use sorted with a custom key.
from collections import Counter
from heapq import nlargest
from toolz import unique
x = 'BBBAAACCD'
c = Counter(x)
n = 2
nth_largest = nlargest(n, unique(c.values()))[-1]
def sort_key(x):
    return -x[1], x[0]
gen = ((k, v) for k, v in c.items() if v >= nth_largest)
res = sorted(gen, key=sort_key)
[('A', 3), ('B', 3), ('C', 2)]

I would first sort your output array in alphabetical order and then sort again by most occurrences, which will keep the alphabetical order among equal counts because Python's sort is stable:
from collections import Counter
alphabetic_sorted = sorted(Counter('BBBAAACCD').most_common(), key=lambda tup: tup[0])
final_sorted = sorted(alphabetic_sorted, key=lambda tup: tup[1], reverse=True)
print(final_sorted[:3])
Output:
[('A', 3), ('B', 3), ('C', 2)]

I would go for:
sorted(Counter('AAABBBCCD').most_common(), key=lambda t: (-t[1], t[0]))
This sorts count descending (as they are already, which should be more performant) and then sorts by name ascending in each equal count group
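The composite sort key can be wrapped in a small helper. This is a sketch only; the function name most_common_alphabetical is made up, and n here means "number of entries to return", which is one possible reading of the question.

```python
from collections import Counter

def most_common_alphabetical(text, n):
    """Top n (char, count) pairs: highest count first, ties in alphabetical order."""
    ranked = sorted(Counter(text).items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]

print(most_common_alphabetical("BBBAAACCD", 3))
```

Negating the count lets a single ascending sort handle both criteria, so no reverse=True is needed.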

This is one of the problems I got in an interview exam and failed to solve. I came home, slept for a while, and the solution came to mind.
from collections import Counter
def bags(items):  # renamed from "list" to avoid shadowing the built-in
    cnt = Counter(items)
    print(cnt)
    order = sorted(cnt.most_common(2), key=lambda i: (i[1], i[0]), reverse=True)
    print(order)
    return order[0][0]
print(bags(['a','b','c','a','b']))

s = "BBBAAACCD"
p = [(i,s.count(i)) for i in sorted(set(s))]
This works if you are okay with not using Counter.

from collections import Counter
s = 'qqweertyuiopasdfghjklzxcvbnm'
s_list = list(s)
elements = Counter(s_list).most_common()
print(elements)
alphabet_sort = sorted(elements, key=lambda x: x[0])
print(alphabet_sort)
num_sort = sorted(alphabet_sort, key=lambda x: x[1], reverse=True)
print(num_sort)
If you need a slice:
print(num_sort[:3])

from collections import Counter
print(sorted(Counter('AAABBBCCD').most_common(3)))
This question seems to be a duplicate of "How to sort Counter by value? - python".

python group by and ordered a list by another list

I wonder if there is a more Pythonic way to group and order a list by the order of another list.
lstNeedOrder has several pairs in random order. I want the output to follow the order in lst. The result should have all pairs containing a first, followed by the b's and c's.
Each entry in lstNeedOrder only ever has the form a/c or c/a.
input:
lstNeedOrder = ['a/b','c/b','f/d','a/e','c/d','a/c']
lst = ['a','b','c']
output:
res = ['a/b','a/c','a/e','c/b','c/d','f/d']
update
The lst = ['a','b','c'] is not the actual data; it just makes the logic easy to understand. The actual data are more complex string pairs.

Using sorted with a custom key function:
>>> lstNeedOrder = ['a/b','c/d','f/d','a/e','c/d','a/c']
>>> lst = ['a','b','c']
>>> order = {ch: i for i, ch in enumerate(lst)} # {'a': 0, 'b': 1, 'c': 2}
>>> def sort_key(x):
...     # 'a/b' -> (0, 1), 'c/d' -> (2, 3), ...
...     a, b = x.split('/')
...     return order.get(a, len(lst)), order.get(b, len(lst))
...
>>> sorted(lstNeedOrder, key=sort_key)
['a/b', 'a/c', 'a/e', 'c/d', 'c/d', 'f/d']
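Once the pairs are sorted, itertools.groupby can also split them into one group per left-hand element. The sketch below uses the input from the question; the tie-breaking on the raw strings is an assumption added so that unknown letters such as 'e' and 'f' still sort deterministically, and the names rank, left_of and groups are made up.

```python
from itertools import groupby

pairs = ['a/b', 'c/b', 'f/d', 'a/e', 'c/d', 'a/c']
priority = ['a', 'b', 'c']
rank = {ch: i for i, ch in enumerate(priority)}

def left_of(pair):
    """Grouping key: the part before the slash."""
    return pair.split('/')[0]

def sort_key(pair):
    left, right = pair.split('/')
    # Known letters sort by their position in priority; unknown ones go last.
    # The raw strings break ties (an assumption, not stated in the question).
    return rank.get(left, len(priority)), left, right

ordered = sorted(pairs, key=sort_key)
groups = {left: list(group) for left, group in groupby(ordered, key=left_of)}
print(ordered)
print(groups)
```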

return the top n most frequently occurring chars and their respective counts in python

How can I return the top n most frequently occurring chars and their respective counts in Python? For example, 'aaaaaabbbbcccc', 2 should return [('a', 6), ('b', 4)].
I tried this
def top_chars(input, n):
    list1=list(input)
    list3=[]
    list2=[]
    list4=[]
    set1=set(list1)
    list2=list(set1)
    def count(item):
        count=0
        for x in input:
            if x in input:
                count+=item.count(x)
        list3.append(count)
        return count
    list2.sort(key=count)
    list3.sort()
    list4=list(zip(list2,list3))
    list4.reverse()
    list4.sort(key=lambda list4: ((list4[1]),(list4[0])), reverse=True)
    return list4[0:n]
    pass
but it doesn't work for the input ("aabc",2)
The output it should give is
[('a', 2), ('b', 1)]
but the output I get is
[('a', 2), ('c', 1)]

Use collections.Counter(); it has a most_common() method that does just that:
>>> from collections import Counter
>>> counts = Counter('aaaaaabbbbcccc')
>>> counts.most_common(2)
[('a', 6), ('c', 4)]
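Note that for both the above input and aabc, b and c have the same count, and both are valid top contenders. Your code sorts by count and then key in reverse, so c is sorted before b. In the Counter output above, c also happened to come first; since Python 3.7, most_common() returns elements with equal counts in the order first encountered, so b would come first there.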
If instead of sorting in reverse, you used the negative count as the sort key, you'd sort b before c again:
list4.sort(key=lambda v: (-v[1], v[0]))
Note that Counter.most_common() does not actually use sorting when you are asking for fewer items than there are keys in the counter; it uses a heapq-based algorithm instead to get only the top N items.
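If a deterministic alphabetical tie-break is wanted without sorting every entry, the negative-count key can be combined with a heap-based selection. This is a minimal sketch, not how Counter.most_common() itself is implemented; it reuses the function name top_chars from the question.

```python
import heapq
from collections import Counter

def top_chars(text, n):
    """Top n (char, count) pairs; ties on count are broken alphabetically."""
    counts = Counter(text)
    # nsmallest on (-count, char) yields the highest counts first and,
    # among equal counts, the alphabetically smallest character first.
    return heapq.nsmallest(n, counts.items(), key=lambda pair: (-pair[1], pair[0]))

print(top_chars('aabc', 2))
print(top_chars('aaaaaabbbbcccc', 2))
```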

A little harder, but also works:
text = "abbbaaaa"
counts = {}  # named counts to avoid shadowing the built-in dict
for lines in text:
    for char in lines:
        counts[char] = counts.get(char, 0) + 1
print(counts)

text = "abbbaaaa"
counts = {}
for char in text:
    counts[char] = counts.get(char, 0) + 1
print(counts)

Dictionary value sorting

I have a dictionary with tuples as keys (containing a string and an int) and floats as values. An example:
first = {}
first['monkey', 1] = 130.0
first['dog', 2] = 123.0
first['cat', 3] = 130.0
first['cat', 4] = 130.0
first['mouse', 6] = 100.0
Now, I need to make a new dictionary, which has the original dictionary key's second element as its key. The new dictionary's value should be the place the entry would take if the entries were sorted by value. In addition, there are two exceptions:
If two entries have values that are equal, but have different strings in the key, the one with the lowest int in the key should be placed higher.
If two entries have values that are equal, but have different ints in the key, they should be placed equal in the new dict and all get the same value.
So, the new dictionary should be as the following:
second[1] = 3
second[2] = 2
second[3] = 4
second[4] = 4
second[6] = 1
I know that it's poor form to ask someone else to solve my problem without showing my own code, but I simply don't know how to approach the problem. I would be glad if you could explain how you would solve it, or even give me pseudocode for the algorithm.

import itertools as IT
first = {
('monkey',1): 130.0,
('dog',2): 123.0,
('cat', 3): 130.0,
('cat', 4): 130.0,
('mouse', 6): 100.0
}
counter = 0
ordered = sorted(first, key = lambda k: (first[k], k[1], k[0]))
second = {}
for key, group in IT.groupby(ordered, first.__getitem__):
    # group = list(group)
    # print(key, group)
    # (100.0, [('mouse', 6)])
    # (123.0, [('dog', 2)])
    # (130.0, [('monkey', 1), ('cat', 3), ('cat', 4)])
    previous = None
    for name, num in group:
        if name != previous:
            counter += 1
        second[num] = counter
        previous = name
print(second)
yields
{1: 3, 2: 2, 3: 4, 4: 4, 6: 1}
Explanation:
The first step is to order the (name, num) keys of first according to the associated values. However, in case of ties, the num is used. If there is still a tie, the name is used to break the tie.
In [96]: ordered = sorted(first, key = lambda k: (first[k], k[1], k[0]))
In [97]: ordered
Out[97]: [('mouse', 6), ('dog', 2), ('monkey', 1), ('cat', 3), ('cat', 4)]
Next, we need to group the items in ordered since there are special rules when the value first[k] is the same. The grouping can be achieved using itertools.groupby:
In [99]: for key, group in IT.groupby(ordered, first.__getitem__):
   ....:     print(key, list(group))
   ....:
   ....:
(100.0, [('mouse', 6)])
(123.0, [('dog', 2)])
(130.0, [('monkey', 1), ('cat', 3), ('cat', 4)])
itertools.groupby is collecting the items in ordered into bunches according to the value of the key, first.__getitem__(item). For example,
In [100]: first.__getitem__(('monkey', 1))
Out[100]: 130.0
In [101]: first.__getitem__(('cat', 3))
Out[101]: 130.0
first.__getitem__(item) is just a fancy way of writing first[item]. The reason why I use first.__getitem__ is because itertools.groupby expects a function for its second argument, and first.__getitem__ is the function that fits the bill.
Finally, we iterate through each group. Basically, we want to do this:
for name, num in group:
    counter += 1
    second[num] = counter
except that, when the names are equal, we do not want to advance the counter. So to check if the names are equal, it helps to store the previous name:
previous = None
for name, num in group:
    if name != previous:
        counter += 1
    ...
    previous = name
Warning: Note that rkd91's code and my code produce different answers for
first = {
('monkey',1): 130.0,
('dog',2): 123.0,
('cat', 3): 129.0,
('cat', 4): 130.0,
('mouse', 6): 100.0
}
probably due to different interpretations of the specifications. I'll leave it to you to decide which is yielding the desired output.
#rdk91's code yields
{1: 4, 2: 2, 3: 5, 4: 3, 6: 1}
my code yields
{1: 4, 2: 2, 3: 3, 4: 5, 6: 1}

1) Get a list of key-value tuples using first_list = list(first.items())
2) Create a custom comparator function that will sort the list according to your criteria.
3) Sort the list using first_list.sort(key=functools.cmp_to_key(comparator)) (in Python 2, first_list.sort(comparator))
4) Build your new dictionary from the sorted list.
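The four steps above can be sketched as follows. The comparator here orders by value and then by the integer in the key, which is only one possible reading of the criteria; it deliberately does not implement the "equal placement" rule from the question, and the names comparator and positions are illustrative.

```python
from functools import cmp_to_key

first = {
    ('monkey', 1): 130.0,
    ('dog', 2): 123.0,
    ('cat', 3): 130.0,
    ('cat', 4): 130.0,
    ('mouse', 6): 100.0,
}

def comparator(item_a, item_b):
    """Old-style comparator: negative, zero, or positive like cmp()."""
    (_name_a, num_a), value_a = item_a
    (_name_b, num_b), value_b = item_b
    if value_a != value_b:
        return -1 if value_a < value_b else 1
    return (num_a > num_b) - (num_a < num_b)  # tie: lower integer first

# Step 1: list of ((name, num), value) tuples.
first_list = list(first.items())
# Steps 2 and 3: sort with the comparator wrapped as a key function.
first_list.sort(key=cmp_to_key(comparator))
# Step 4: map each integer to its 1-based position in the sorted list.
positions = {
    num: place
    for place, ((_name, num), _value) in enumerate(first_list, start=1)
}
print(positions)
```

In new code a plain key function such as lambda item: (item[1], item[0][1]) would usually be simpler; cmp_to_key is shown only because the steps mention a comparator.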

rob@rivertam:~/Programming$ cat sorter.py
first = {}
first['monkey', 1] = 130.0
first['dog', 2] = 123.0
first['cat', 3] = 130.0
first['cat', 4] = 130.0
first['mouse', 6] = 100.0
# Get the keys of first, sorted by the value (ascending order), and then by the string in the key (descending order) if two keys have the same value
s = sorted(first, key=lambda x: x[0])
s.reverse()
s = sorted(s, key=lambda x: first[x])
# Loop through these, and create a new dict where the key is the integer in the old key, and the value is the position in the sorted order.
last_val = None
last = (None, None)
index = 0
new_dict = {}
for item in s:
    if not ((first[item] == last_val) and (item[1] != last[1]) and item[0] == last[0]):
        # Advance the position unless this key has the same value and the same string, but a different integer, as the last key.
        index += 1
    new_dict[item[1]] = index
    last_val = first[item]
    last = item
print(new_dict)
rob@rivertam:~/Programming$ python sorter.py
{6: 1, 2: 2, 1: 3, 4: 4, 3: 4}
