Splitting a string into consecutive counts? - python

For example, if the given string is this:
"aaabbbbccdaeeee"
I want to say something like:
3 a, 4 b, 2 c, 1 d, 1 a, 4 e
It is easy enough to do in Python with a brute force loop, but I am wondering if there is a more Pythonic / cleaner one-liner type of approach.
My brute force:
while source!="":
leading = source[0]
c=0
while source!="" and source[0]==leading:
c+=1
source=source[1:]
print(c, leading)

Use a Counter for a count of each distinct letter in the string regardless of position:
>>> s="aaabbbbccdaeeee"
>>> from collections import Counter
>>> Counter(s)
Counter({'a': 4, 'b': 4, 'e': 4, 'c': 2, 'd': 1})
You can use groupby if the position in the string has meaning:
from itertools import groupby
li=[]
for k, l in groupby(s):
li.append((k, len(list(l))))
print li
Prints:
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
Which can be reduce to a list comprehension:
[(k,len(list(l))) for k, l in groupby(s)]
You can even use a regex:
>>> [(m.group(0)[0], len(m.group(0))) for m in re.finditer(r'((\w)\2*)', s)]
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]

There are a number of different ways to solve the problem. #dawg has already posted the best solution, but if for some reason you aren't allowed to use Counter() (maybe a job interview or school assignment) then you can actually solve the problem in a few ways.
from collections import Counter, defaultdict
def counter_counts(s):
""" Preferred method using Counter()
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
return Counter(s)
def default_counts(s):
""" Alternative solution using defaultdict
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = defaultdict(int) # each key is initalized to 0
for char in s:
counts[char] += 1 # increment the count of each character by 1
return counts
def vanilla_counts_1(s):
""" Alternative solution using a vanilla dicitonary
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = {}
for char in s:
# we have to manually check that each value is in the dictionary before attempting to increment it
if char in counts:
counts[char] += 1
else:
counts[char] = 1
return counts
def vanilla_counts_2(s):
""" Alternative solution using a vanilla dicitonary
This version uses the .get() method to increment instead of checking if a key already exists
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = {}
for char in s:
# the second argument in .get() is the default value if we dont find the key
counts[char] = counts.get(char, 0) + 1
return counts
And just for fun lets take a look at how each method performs.
For s = "aaabbbbccdaeeee" and 10,000 runs:
Counter: 0.0330204963684082s
defaultdict: 0.01565241813659668s
vanilla 1: 0.01562952995300293s
vanilla 2: 0.015581130981445312s
(actually rather surprising results)
Now let's test what happens if we set our string to the entire plaintext version of the book of Genesis and 1,000 runs:
Counter: 8.500739336013794s
defaultdict: 14.721554040908813s
vanilla 1: 18.089043855667114s
vanilla 2: 27.01840090751648s
Looks like the overhead of creating the Counter() object becomes much less important!
(These weren't very scientific tests, but it was a bit of fun).

Related

How to count elements on each position in lists

I have a lot of lists like:
SI821lzc1n4
MCap1kr01lv
All of them have the same length. I need to count how many times each symbol appears on each position. Example:
abcd
a5c1
b51d
Here it'll be a5cd
One way is to use zip to associate characters in the same position. We can then send all of the characters from each position to a Counter, then use Counter.most_common to get the most common character
from collections import Counter
l = ['abcd', 'a5c1', 'b51d']
print(''.join([Counter(z).most_common(1)[0][0] for z in zip(*l)]))
# a5cd
from statistics import mode
[mode([x[i] for x in y]) for i in xrange(len(y[0]))]
where y is your list.
Python 3.4 and up
You could use combination of zip and Counter
a = ("abcd")
b = ("a5c1")
c = ("b51d")
from collections import Counter
zippedList = list(zip(a,b,c))
print("zipped: {}".format(zippedList))
final = ""
for x in zippedList:
countLetters = Counter(x)
print(countLetters)
final += countLetters.most_common(3)[0][0]
print("output: {}".format(final))
output:
zipped: [('a', 'a', 'b'), ('b', '5', '5'), ('c', 'c', '1'), ('d', '1', 'd')]
Counter({'a': 2, 'b': 1})
Counter({'5': 2, 'b': 1})
Counter({'c': 2, '1': 1})
Counter({'d': 2, '1': 1})
output: a5cd
This all depends on where your list is. Is your list coming from another file or is it an actual array? At the end of the day, the best way to do this simply is going to be to use a dictionary and a for loop.
new_dict = {}
for i in range(len(line)):
if i in new_dict:
new_dict[i].append(line[i])
else:
new_dict[i] = [line[i]]
Then after that I'm assuming that you'd like to output the four most common element appearances. For that I'd recommend importing statistics and using the mode method...
from statistics import mode
new_line = ""
for key in new_dict:
x = mode(new_dict[key])
new_line = new_line + x
However, your question is quite vague, please elaborate more next time.
P.s. I'm a newbie so all you experienced programmers plz don't hate :)
I would use a combination of defaultdict, enumerate, and Counter:
>>> from collections import Counter, defaultdict
>>> data = '''abcd
a5c1
b51d
'''
>>> poscount = defaultdict(Counter)
>>> for line in data.split():
for i, character in enumerate(line):
poscount[i][character] += 1
>>> ''.join([poscount[i].most_common(1)[0][0] for i in sorted(poscount)])
'a5cd'
Here's how it works:
The defaultdict() creates new entries when it sees a new key.
The enumerate() function returns both the character and its position in the line.
The Counter counts the occurences of individual characters
Combining the three makes a defaultdict whose keys are the column positions and whose values are character counters. That gives you one character counter per column.
The most_common() method returns the highest frequency (character, count) pair for that counter.
The [0][0] extracts the character from the list of (character, count) tuples.
The str.join() method combines the results back together.

How does one find the number of subsets for a particular row, in a 2D list with python? Can collections' Counter function be used?

Please excuse the title, it is hard to express the problem correctly without showing an example.
I have a very large 2D array with rows of varying sizes, for example:
big2DArray =
[["a","g","r"],
["a","r"],
["p","q"],
["a", "r"]]
I need to return a dictionary, it has to look something like this:
{('a','g','r'): 1, ('a', 'r'): 3, ('p', 'q'):1}
The ('a', 'r') tuple is found to have a value of 3, since it occurs twice as itself and once as a subset (less than or equal) to the tuple ('a', 'g', 'r').
Normally I would use something like this:
dictCounts = Counter(map(tuple, big2DArray))
Which, for big2Darray, would give:
{('a','g','r'): 1, ('a', 'r'): 2, ('p', 'q'):1}
My question is this, can Collections' Counter function be used so that it gives the counts for the subsets as well, like explained above? If not, is there any comparably efficient method to return my desired dictionary output for subsets?
Thanks so much!
Edit 1: Just for further clarity! I do not want to return all subsets, such as {('a','g'): 1, ('a','r'):3}, and so on. I only want to return the counts for the unique rows in the 2D array. So in this case the counts for: ('a','g','r'), ('a','r'), ('p','q').
Edit 2: The row ["a","r"] should be treated as equivalent to ["r", "a"], and so should the tuples ('a','r') and ('r','a')
You can use set.issubset with collections.Counter here.
Demo:
from collections import Counter
big2DArray = [["a","g","r"],
["a","r"],
["p","q"],
["a", "r"],
["r", "a"]]
counts = Counter(map(lambda x: tuple(sorted(x)), big2DArray))
count_lst = list(counts)
for i, k1 in enumerate(count_lst):
rest = count_lst[:i] + count_lst[i+1:]
for k2 in rest:
if set(k1).issubset(k2):
counts[k1] += 1
print(counts)
Output:
Counter({('a', 'r'): 4, ('a', 'g', 'r'): 1, ('p', 'q'): 1})
In the above code, in order to make sure ["r", "a"] and ["a","r"] are equivalent, you can sort them beforehand, and add them as tuples to Counter().
The other more efficient way would be to use frozenset, as shown in the other answer.
Here is one solution. It uses defaultdict instead of Counter. The dictionary keys are frozensets. If you need ordered tuple dictionary keys, see #RoadRunner's solution.
from itertools import combinations, chain
from collections import defaultdict
big2DArray = [["a","g","r"],
["a","r"],
["p","q"],
["a", "r"]]
arr_new = [[set(i) for k in range(2, len(j)+1) \
for i in combinations(j, k)] for j in big2DArray]
full_list = set(map(frozenset, big2DArray))
counter = defaultdict(int)
for i in range(len(big2DArray)):
for j in full_list:
if j in arr_new[i]:
counter[frozenset(j)] += 1
# defaultdict(int,
# {frozenset({'a', 'r'}): 3,
# frozenset({'a', 'g', 'r'}): 1,
# frozenset({'p', 'q'}): 1})

return the top n most frequently occurring chars and their respective counts in python

how to return the top n most frequently occurring chars and their respective counts # e.g 'aaaaaabbbbcccc', 2 should return [('a', 6), ('b', 4)] in python
I tried this
def top_chars(input, n):
list1=list(input)
list3=[]
list2=[]
list4=[]
set1=set(list1)
list2=list(set1)
def count(item):
count=0
for x in input:
if x in input:
count+=item.count(x)
list3.append(count)
return count
list2.sort(key=count)
list3.sort()
list4=list(zip(list2,list3))
list4.reverse()
list4.sort(key=lambda list4: ((list4[1]),(list4[0])), reverse=True)
return list4[0:n]
pass
but it doesn't work for the input ("aabc",2)
The output it should give is
[('a', 2), ('b', 1)]
but the output I get is
[('a', 2), ('c', 1)]
Use collections.Counter(); it has a most_common() method that does just that:
>>> from collections import Counter
>>> counts = Counter('aaaaaabbbbcccc')
>>> counts.most_common(2)
[('a', 6), ('c', 4)]
Note that for both the above input and in aabc both b and c have the same count, and both can be valid top contenders. Because both you and Counter sort by count then key in reverse, c is sorted before b.
If instead of sorting in reverse, you used the negative count as the sort key, you'd sort b before c again:
list4.sort(key=lambda v: (-v[1], v[0))
Not that Counter.most_common() actually uses sorting when your are asking for fewer items than there are keys in the counter; it uses a heapq-based algorithm instead to only get the top N items.
A little harder, but also works:
text = "abbbaaaa"
dict = {}
for lines in text:
for char in lines:
dict[char] = dict.get(char, 0) + 1
print dict
Text="abbbaaaa"
dict={ }
For lines in text:
For chae in lines:
dict[char]=dict.get(char,0)+1
Print dict

How to find the most frequent character? [duplicate]

This question already has answers here:
Finding the most frequent character in a string
(18 answers)
Closed 9 years ago.
I am looking to write a program that lets a user enter a string and then shows the character that appears the most often in their inputted string. I originally wrote a program that counted the number of times a specific letter appeared (the letter T), but was looking to convert that. It may take a whole new code, though.
My former program is here:
# This program counts the number of times
# the letter T (uppercase or lowercase)
# appears in a string.
def main():
# Create a variable to use to hold the count.
# The variable must start with 0.
count = 0
# Get a string from the user.
my_string = input('Enter a sentence: ')
# Count the Ts.
for ch in my_string:
if ch == 'T' or ch == 't':
count += 1
# Print the result.
print('The letter T appears', count, 'times.')
# Call the main function.
main()
Similar to this code, but I am not familiar with a lot of what is used there. I am thinking the version of Python I use is older, but I may not be correct.
You could use collections.Counter from Py's std lib.
Take a look here.
Also, a small example:
>>> from collections import Counter
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
EDIT
One liner!
count = [(i, string.count(i)) for i in set(string)]
Or if you want to write your own function (that's interesting!) you could start with something like this:
string = "abracadabra"
def Counter(string):
count = {}
for each in string:
if each in count:
count[each] += 1
else:
count[each] = 1
return count # Or 'return [(k, v) for k, v in count]'
Counter(string)
out: {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
>>> s = 'abracadabra'
>>> max([(i, s.count(i)) for i in set(s.lower)], key=lambda x: x[1])
('a', 5)
This will not show ties though. This will:
>>> s = 'abracadabrarrr'
>>> lcounts = sorted([(i, s.count(i)) for i in set(s)], key=lambda x: x[1])
>>> count = max(lcounts, key=lambda x: x[1])[1]
>>> filter(lambda x: x[1] == count, lcounts)
[('a', 5), ('r', 5)]

How do I use itertools.groupby()?

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:
Take a list - in this case, the children of an objectified lxml element
Divide it into groups based on some criteria
Then later iterate over each of these groups separately.
I've reviewed the documentation, but I've had trouble trying to apply them beyond a simple list of numbers.
So, how do I use of itertools.groupby()? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.
IMPORTANT NOTE: You have to sort your data first.
The part I didn't get is that in the example construction
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
groups.append(list(g)) # Store group iterator as a list
uniquekeys.append(k)
k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.
Here's an example of that, using clearer variable names:
from itertools import groupby
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
for thing in group:
print("A %s is a %s." % (thing[1], key))
print("")
This will give you the output:
A bear is a animal.
A duck is a animal.
A cactus is a plant.
A speed boat is a vehicle.
A school bus is a vehicle.
In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.
The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.
Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.
In the above for statement, groupby returns three (key, group iterator) pairs - once for each unique key. You can use the returned iterator to iterate over each individual item in that group.
Here's a slightly different example with the same data, using a list comprehension:
for key, group in groupby(things, lambda x: x[0]):
listOfThings = " and ".join([thing[1] for thing in group])
print(key + "s: " + listOfThings + ".")
This will give you the output:
animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.
itertools.groupby is a tool for grouping items.
From the docs, we glean further what it might do:
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
groupby objects yield key-group pairs where the group is a generator.
Features
A. Group consecutive items together
B. Group all occurrences of an item, given a sorted iterable
C. Specify how to group items with a key function *
Comparisons
# Define a printer for comparing outputs
>>> def print_groupby(iterable, keyfunc=None):
... for k, g in it.groupby(iterable, keyfunc):
... print("key: '{}'--> group: {}".format(k, list(g)))
# Feature A: group consecutive occurrences
>>> print_groupby("BCAACACAADBBB")
key: 'B'--> group: ['B']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'D'--> group: ['D']
key: 'B'--> group: ['B', 'B', 'B']
# Feature B: group all occurrences
>>> print_groupby(sorted("BCAACACAADBBB"))
key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
key: 'B'--> group: ['B', 'B', 'B', 'B']
key: 'C'--> group: ['C', 'C', 'C']
key: 'D'--> group: ['D']
# Feature C: group by a key function
>>> # islower = lambda s: s.islower() # equivalent
>>> def islower(s):
... """Return True if a string is lowercase, else False."""
... return s.islower()
>>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']
Uses
Anagrams (see notebook)
Binning
Group odd and even numbers
Group a list by values
Remove duplicate elements
Find indices of repeated elements in an array
Split an array into n-sized chunks
Find corresponding elements between two lists
Compression algorithm (see notebook)/Run Length Encoding
Grouping letters by length, key function (see notebook)
Consecutive values over a threshold (see notebook)
Find ranges of numbers in a list or continuous items (see docs)
Find all related longest sequences
Take consecutive sequences that meet a condition (see related post)
Note: Several of the latter examples derive from Víctor Terrón's PyCon (talk) (Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code written in C.
* A function where all items are passed through and compared, influencing the result. Other objects with key functions include sorted(), max() and min().
Response
# OP: Yes, you can use `groupby`, e.g.
[do_something(list(g)) for _, g in groupby(lxml_elements, criteria_func)]
The example on the Python docs is quite straightforward:
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
groups.append(list(g)) # Store group iterator as a list
uniquekeys.append(k)
So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes and then groupby() groups the data.
You must be careful to sort the data by the criteria before you call groupby or it won't work. groupby method actually just iterates through a list and whenever the key changes it creates a new group.
A neato trick with groupby is to run length encoding in one line:
[(c,len(list(cgen))) for c,cgen in groupby(some_string)]
will give you a list of 2-tuples where the first element is the char and the 2nd is the number of repetitions.
Edit: Note that this is what separates itertools.groupby from the SQL GROUP BY semantics: itertools doesn't (and in general can't) sort the iterator in advance, so groups with the same "key" aren't merged.
Another example:
for key, igroup in itertools.groupby(xrange(12), lambda x: x // 5):
print key, list(igroup)
results in
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11]
Note that igroup is an iterator (a sub-iterator as the documentation calls it).
This is useful for chunking a generator:
def chunker(items, chunk_size):
'''Group items in chunks of chunk_size'''
for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
yield (g[1] for g in group)
with open('file.txt') as fobj:
for chunk in chunker(fobj):
process(chunk)
Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.
xx = range(10)
yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
for group in itertools.groupby(iter(xx), lambda x: yy[x]):
print group[0], list(group[1])
Produces:
0 [0, 1, 2]
1 [3, 4, 5]
0 [6, 7, 8, 9]
WARNING:
The syntax list(groupby(...)) won't work the way that you intend. It seems to destroy the internal iterator objects, so using
for x in list(groupby(range(10))):
print(list(x[1]))
will produce:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[9]
Instead, of list(groupby(...)), try [(k, list(g)) for k,g in groupby(...)], or if you use that syntax often,
def groupbylist(*args, **kwargs):
return [(k, list(g)) for k, g in groupby(*args, **kwargs)]
and get access to the groupby functionality while avoiding those pesky (for small data) iterators all together.
I would like to give another example where groupby without sort is not working. Adapted from example by James Sulak
from itertools import groupby
things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
for thing in group:
print "A %s is a %s." % (thing[1], key)
print " "
output is
A bear is a vehicle.
A duck is a animal.
A cactus is a animal.
A speed boat is a vehicle.
A school bus is a vehicle.
there are two groups with vehicule, whereas one could expect only one group
#CaptSolo, I tried your example, but it didn't work.
from itertools import groupby
[(c,len(list(cs))) for c,cs in groupby('Pedro Manoel')]
Output:
[('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]
As you can see, there are two o's and two e's, but they got into separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:
name = list('Pedro Manoel')
name.sort()
[(c,len(list(cs))) for c,cs in groupby(name)]
Output:
[(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]
Just remembering, if the list is not sorted, the groupby function will not work!
Sorting and groupby
from itertools import groupby
val = [{'name': 'satyajit', 'address': 'btm', 'pin': 560076},
{'name': 'Mukul', 'address': 'Silk board', 'pin': 560078},
{'name': 'Preetam', 'address': 'btm', 'pin': 560076}]
for pin, list_data in groupby(sorted(val, key=lambda k: k['pin']),lambda x: x['pin']):
... print pin
... for rec in list_data:
... print rec
...
o/p:
560076
{'name': 'satyajit', 'pin': 560076, 'address': 'btm'}
{'name': 'Preetam', 'pin': 560076, 'address': 'btm'}
560078
{'name': 'Mukul', 'pin': 560078, 'address': 'Silk board'}
Sadly I don’t think it’s advisable to use itertools.groupby(). It’s just too hard to use safely, and it’s only a handful of lines to write something that works as expected.
def my_group_by(iterable, keyfunc):
"""Because itertools.groupby is tricky to use
The stdlib method requires sorting in advance, and returns iterators not
lists, and those iterators get consumed as you try to use them, throwing
everything off if you try to look at something more than once.
"""
ret = defaultdict(list)
for k in iterable:
ret[keyfunc(k)].append(k)
return dict(ret)
Use it like this:
def first_letter(x):
return x[0]
my_group_by('four score and seven years ago'.split(), first_letter)
to get
{'f': ['four'], 's': ['score', 'seven'], 'a': ['and', 'ago'], 'y': ['years']}
How do I use Python's itertools.groupby()?
You can use groupby to group things to iterate over. You give groupby an iterable, and a optional key function/callable by which to check the items as they come out of the iterable, and it returns an iterator that gives a two-tuple of the result of the key callable and the actual items in another iterable. From the help:
groupby(iterable[, keyfunc]) -> create an iterator which returns
(key, sub-iterator) grouped by each value of key(value).
Here's an example of groupby using a coroutine to group by a count, it uses a key callable (in this case, coroutine.send) to just spit out the count for however many iterations and a grouped sub-iterator of elements:
import itertools
def grouper(iterable, n):
def coroutine(n):
yield # queue up coroutine
for i in itertools.count():
for j in range(n):
yield i
groups = coroutine(n)
next(groups) # queue up coroutine
for c, objs in itertools.groupby(iterable, groups.send):
yield c, list(objs)
# or instead of materializing a list of objs, just:
# return itertools.groupby(iterable, groups.send)
list(grouper(range(10), 3))
prints
[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]
This basic implementation helped me understand this function. Hope it helps others as well:
arr = [(1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "F")]
for k,g in groupby(arr, lambda x: x[0]):
print("--", k, "--")
for tup in g:
print(tup[1]) # tup[0] == k
-- 1 --
A
B
C
-- 2 --
D
E
-- 3 --
F
One useful example that I came across may be helpful:
from itertools import groupby
#user input
myinput = input()
#creating empty list to store output
myoutput = []
for k,g in groupby(myinput):
myoutput.append((len(list(g)),int(k)))
print(*myoutput)
Sample input: 14445221
Sample output: (1,1) (3,4) (1,5) (2,2) (1,1)
from random import randint
from itertools import groupby
l = [randint(1, 3) for _ in range(20)]
d = {}
for k, g in groupby(l, lambda x: x):
if not d.get(k, None):
d[k] = list(g)
else:
d[k] = d[k] + list(g)
the code above shows how groupby can be used to group a list based on the lambda function/key supplied. The only problem is that the output is not merged, this can be easily resolved using a dictionary.
Example:
l = [2, 1, 2, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 3, 1, 2, 1, 3, 2, 3]
after applying groupby the result will be:
for k, g in groupby(l, lambda x:x):
print(k, list(g))
2 [2]
1 [1]
2 [2]
3 [3]
1 [1]
3 [3]
2 [2]
1 [1]
3 [3, 3]
1 [1]
3 [3]
2 [2]
3 [3]
1 [1]
2 [2]
1 [1]
3 [3]
2 [2]
3 [3]
Once a dictionary is used as shown above following result is derived which can be easily iterated over:
{2: [2, 2, 2, 2, 2, 2], 1: [1, 1, 1, 1, 1, 1], 3: [3, 3, 3, 3, 3, 3, 3, 3]}
The key thing to recognize with itertools.groupby is that items are only grouped together as long as they're sequential in the iterable. This is why sorting works, because basically you're rearranging the collection so that all of the items which satisfy callback(item) now appear in the sorted collection sequentially.
That being said, you don't need to sort the list, you just need a collection of key-value pairs, where the value can grow in accordance to each group iterable yielded by groupby. i.e. a dict of lists.
>>> things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
>>> coll = {}
>>> for k, g in itertools.groupby(things, lambda x: x[0]):
... coll.setdefault(k, []).extend(i for _, i in g)
...
{'vehicle': ['bear', 'speed boat', 'school bus'], 'animal': ['duck', 'cactus']}

Categories