find all the elements with equal counts with Counter().most_common - python

I am using Counter().most_common from collections. My input(e.g counter_list) has many equal elements but if i use Counter(mylist).most_common(1) instead of getting all the results i get only the first
mylist=['gene0.txt','gene1.txt','gene1.txt','gene2.txt','gene2.txt','gene3.txt','gene3.txt']
C = Counter(mylist).most_common(1)
I am getting this
[('gene1.txt',2)]
Instead of that
[('gene1.txt',2),('gene2.txt',2),('gene3.txt',2)]

One solution is using itertools.takewhile:
mylist=['gene0.txt','gene1.txt','gene1.txt','gene2.txt','gene2.txt','gene3.txt','gene3.txt']
from collections import Counter
from itertools import takewhile
c = Counter(mylist)
print(list(takewhile(lambda val: val[1] == c.most_common(1)[0][1], c.most_common())))
Prints:
[('gene1.txt', 2), ('gene2.txt', 2), ('gene3.txt', 2)]
Or without itertools.takewhile, using list comprehension:
print([item for item in c.most_common() if item[1] == c.most_common(1)[0][1]])
But this will iterate over all items in most_common().
EDIT (Some debug info for clarity):
for val in c.most_common():
print('val[0]={} val[1]={} c.most_common(1)[0][1]={}'.format(val[0], val[1], c.most_common(1)[0][1]))
Prints:
val[0]=gene1.txt val[1]=2 c.most_common(1)[0][1]=2
val[0]=gene2.txt val[1]=2 c.most_common(1)[0][1]=2
val[0]=gene3.txt val[1]=2 c.most_common(1)[0][1]=2
val[0]=gene0.txt val[1]=1 c.most_common(1)[0][1]=2

Related

Python - itertools.groupby 2

Just having trouble with itertools.groupby. Given a list of dictionaries,
my_list= [
"AD01", "AD01AA", "AD01AB", "AD01AC", "AD01AD","AD02", "AD02AA", "AD02AB", "AD02AC"]
from this list, I expected to create a dictionary, where the key is the shortest name and the values ​​are the longest names
example
[
{"Legacy" : "AD01", "rphy" : ["AD01AA", "AD01AB", "AD01AC", "AD01AD"]},
{"Legacy" : "AD02", "rphy" : ["AD02AA", "AD02AB", "AD02AC"]},
]
could you help me please
You can use itertools.groupby, with some nexts:
from itertools import groupby
my_list= ["AD01", "AD01AA", "AD01AB", "AD01AC", "AD01AD","AD02", "AD02AA", "AD02AB", "AD02AC"]
groups = groupby(my_list, len)
output = [{'Legacy': next(g), 'rphy': list(next(groups)[1])} for _, g in groups]
print(output)
# [{'Legacy': 'AD01', 'rphy': ['AD01AA', 'AD01AB', 'AD01AC', 'AD01AD']},
# {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]
This is not robust to reordering of the input list.
Also, if there is some "gap" in the input, e.g., if "AD01" does not have corresponding 'rphy' entries, then it will throw a StopIteration error as you have found out. In that case you can use a more conventional approach:
from itertools import groupby
my_list= ["AD01", "AD02", "AD02AA", "AD02AB", "AD02AC"]
output = []
for item in my_list:
if len(item) == 4:
dct = {'Legacy': item, 'rphy': []}
output.append(dct)
else:
dct['rphy'].append(item)
print(output)
# [{'Legacy': 'AD01', 'rphy': []}, {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]
One approach would be: (see the note at the end of the answer)
from itertools import groupby
from pprint import pprint
my_list = [
"AD01",
"AD01AA",
"AD01AB",
"AD01AC",
"AD01AD",
"AD02",
"AD02AA",
"AD02AB",
"AD02AC",
]
res = []
for _, g in groupby(my_list, len):
lst = list(g)
if len(lst) == 1:
res.append({"Legacy": lst[0], "rphy": []})
else:
res[-1]["rphy"].append(lst)
pprint(res)
output:
[{'Legacy': 'AD01', 'rphy': [['AD01AA', 'AD01AB', 'AD01AC', 'AD01AD']]},
{'Legacy': 'AD02', 'rphy': [['AD02AA', 'AD02AB', 'AD02AC']]}]
This assumes that your data always starts with your desired key(the name which has the smallest name compare to the next values).
Basically in every iteration you check then length of the created list from groupby. If it is 1, this mean it's your key, if not, it will add the next items to the dictionary.
Note: This code would break if there aren't at least 2 names with the length larger than the keys between two keys.

how to separate cam1,2,3,4,5,6 first images from the list

lst = ['Cam218-10-03_16-05-21-54.jpg',
'Cam318-10-03_17-04-21-54.jpg',
'Cam418-10-03_16-04-21-54.jpg',
'Cam218-10-02_16-05-21-54.jpg',
'Cam318-10-02_17-04-21-54.jpg',
'Cam418-10-02_16-04-21-54.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg',
'Cam118-10-02_16-04-09-33.jpg',
'Cam218-10-02_16-04-09-33.jpg',
'Cam318-10-02_16-04-09-33.jpg',
'Cam418-10-02_16-04-09-33.jpg',
'Cam518-10-02_16-04-09-33.jpg',
'Cam618-10-02_16-04-09-33.jpg',
'Cam118-10-02_16-04-11-53.jpg',
'Cam218-10-02_16-04-11-53.jpg',
'Cam318-10-02_16-04-11-53.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam118-10-02_16-04-08-31.jpg',
'Cam518-10-02_16-04-11-53.jpg',
'Cam118-10-02_16-04-11-53.jpg']
From this list I want the output:
['Cam118-10-02_16-04-08-31.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg']
by using Python. Could anybody help me?
With itertools.groupby - O(n*log(n))
>>> from itertools import groupby
>>> [next(g) for _, g in groupby(sorted(lst), key=lambda cam: cam.partition('-')[0])]
['Cam118-10-02_16-04-08-31.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg']
With keeping track of duplicates manually (output not sorted, but potentially useful to other readers) - O(n)
>>> seen = set()
>>> result = []
>>>
>>> for cam in lst:
...: model, *_ = cam.partition('-')
...: if model not in seen:
...: result.append(cam)
...: seen.add(model)
...:
>>> result
['Cam218-10-03_16-05-21-54.jpg',
'Cam318-10-03_17-04-21-54.jpg',
'Cam418-10-03_16-04-21-54.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg',
'Cam118-10-02_16-04-09-33.jpg']
you can make if condition to check for the occurrence of the photo tag after sorting the list
list.sort()
i = 1
for item in list:
if(item[3]==str(i)):
i=i+1
print(item)
continue
the result is
Cam118-10-02_16-04-08-31.jpg
Cam218-10-02_16-04-08-31.jpg
Cam318-10-02_16-04-08-30.jpg
Cam418-10-02_16-04-08-30.jpg
Cam518-10-02_16-04-08-35.jpg
Cam618-10-02_16-04-08-36.jpg
if you want to get the first occurrence of item with no regards to its order ascendingly, removing list.sort() shall resolve that.

Python group by adjacent items in a list with same attributes

Please see the simplified example:
A=[(721,'a'),(765,'a'),(421,'a'),(422,'a'),(106,'b'),(784,'a'),(201,'a'),(206,'b'),(207,'b')]
I want group adjacent tuples with attribute 'a', every two pair wise and leave tuples with 'b' alone.
So the desired tuple would looks like:
A=[[(721,'a'),(765,'a')],
[(421,'a'),(422,'a')],
[(106,'b')],
[(784,'a'),(201,'a')],
[(206,'b')],[(207,'b')]]
What I can do is to build two separated lists contains tuples with a and b.
Then pair tuples in a, and add back. But it seems not very efficient. Any faster and simple solutions?
Assuming a items are always in pairs, a simple approach would be as follows.
Look at the first item - if it's an a, use it and the next item as a pair. Otherwise, just use the single item. Then 'jump' forward by 1 or 2, as appropriate:
A=[(721,'a'),(765,'a'),(421,'a'),(422,'a'),(106,'b'),(784,'a'),(201,'a'),(206,'b'),(207,'b')]
result = []
count = 0
while count <= len(A)-1:
if A[count][1] == 'a':
result.append([A[count], A[count+1]])
count += 2
else:
result.append([A[count]])
count += 1
print(result)
You can use itertools.groupby:
import itertools
A=[(721,'a'),(765,'a'),(421,'a'),(422,'a'),(106,'b'),(784,'a'),(201,'a'),(206,'b'),(207,'b')]
def split(s):
return [s[i:i+2] for i in range(0, len(s), 2)]
new_data = [i if isinstance(i, list) else [i] for i in list(itertools.chain(*[split(list(b)) if a == 'a' else list(b) for a, b in itertools.groupby(A, key=lambda x:x[-1])]))
Output:
[[(721, 'a'), (765, 'a')], [(421, 'a'), (422, 'a')], [(106, 'b')], [(784, 'a'), (201, 'a')], [(206, 'b')], [(207, 'b')]]
No need to use two lists. Edit: If the 'a' are not assumed to come always as pairs/adjacent
A = [(721,'a'),(765,'a'),(421,'a'),(422,'a'),(106,'b'),(784,'a'),(201,'a'),(206,'b'),(207,'b')]
new_list = []
i = 0
while i < len(A):
if i == len(A)-1:
new_list.append([A[i]])
i+=1
elif (A[i][1]==A[i+1][1]=='a') :
new_list.append([A[i], A[i+1]])
i += 2
else:
new_list.append([A[i]])
i += 1
print(new_list)

Arrange elements with same count in alphabetical order

Python Collection Counter.most_common(n) method returns the top n elements with their counts. However, if the counts for two elements is the same, how can I return the result sorted by alphabetical order?
For example: for a string like: BBBAAACCD, for the "2-most common" elements, I want the result to be for specified n = 2:
[('A', 3), ('B', 3), ('C', 2)]
and NOT:
[('B', 3), ('A', 3), ('C', 2)]
Notice that although A and B have the same frequency, A comes before B in the resultant list since it comes before B in alphabetical order.
[('A', 3), ('B', 3), ('C', 2)]
How can I achieve that?
Although this question is already a bit old i'd like to suggest a very simple solution to the problem which just involves sorting the input of Counter() before creating the Counter object itself. If you then call most_common(n) you will get the top n entries sorted in alphabetical order.
from collections import Counter
char_counter = Counter(sorted('ccccbbbbdaef'))
for char in char_counter.most_common(3):
print(*char)
resulting in the output:
b 4
c 4
a 1
There are two issues here:
Include duplicates when considering top n most common values excluding duplicates.
For any duplicates, order alphabetically.
None of the solutions thus far address the first issue. You can use a heap queue with the itertools unique_everseen recipe (also available in 3rd party libraries such as toolz.unique) to calculate the nth largest count.
Then use sorted with a custom key.
from collections import Counter
from heapq import nlargest
from toolz import unique
x = 'BBBAAACCD'
c = Counter(x)
n = 2
nth_largest = nlargest(n, unique(c.values()))[-1]
def sort_key(x):
return -x[1], x[0]
gen = ((k, v) for k, v in c.items() if v >= nth_largest)
res = sorted(gen, key=sort_key)
[('A', 3), ('B', 3), ('C', 2)]
I would first sort your output array in alphabetical order and than sort again by most occurrences which will keep the alphabetical order:
from collections import Counter
alphabetic_sorted = sorted(Counter('BBBAAACCD').most_common(), key=lambda tup: tup[0])
final_sorted = sorted(alphabetic_sorted, key=lambda tup: tup[1], reverse=True)
print(final_sorted[:3])
Output:
[('A', 3), ('B', 3), ('C', 2)]
I would go for:
sorted(Counter('AAABBBCCD').most_common(), key=lambda t: (-t[1], t[0]))
This sorts count descending (as they are already, which should be more performant) and then sorts by name ascending in each equal count group
This is one of the problems I got in the interview exam and failed to do it. Came home slept for a while and solution came in my mind.
from collections import Counter
def bags(list):
cnt = Counter(list)
print(cnt)
order = sorted(cnt.most_common(2), key=lambda i:( i[1],i[0]), reverse=True)
print(order)
return order[0][0]
print(bags(['a','b','c','a','b']))
s = "BBBAAACCD"
p = [(i,s.count(i)) for i in sorted(set(s))]
**If you are okay with not using the Counter.
from collections import Counter
s = 'qqweertyuiopasdfghjklzxcvbnm'
s_list = list(s)
elements = Counter(s_list).most_common()
print(elements)
alphabet_sort = sorted(elements, key=lambda x: x[0])
print(alphabet_sort)
num_sort = sorted(alphabet_sort, key=lambda x: x[1], reverse=True)
print(num_sort)
if you need to get slice:
print(num_sort[:3])
from collections import Counter
print(sorted(Counter('AAABBBCCD').most_common(3)))
This question seems to be a duplicate
How to sort Counter by value? - python

Finding index of values in a list dynamically

I am having two lists as follows:
list_1
['A-1','A-1','A-1','A-2','A-2','A-3']
list_2
['iPad','iPod','iPhone','Windows','X-box','Kindle']
I would like to split the list_2 based on the index values in list_1. For instance,
list_a1
['iPad','iPod','iPhone']
list_a2
['Windows','X-box']
list_a3
['Kindle']
I know index method, but it needs the value to be matched to be passed along with. In this case, I would like to dynamically find the indexes of the values in list_1 with the same value. Is this possible? Any tips/hints would be deeply appreciated.
Thanks.
There are a few ways to do this.
I'd do it by using zip and groupby.
First:
>>> list(zip(list_1, list_2))
[('A-1', 'iPad'),
('A-1', 'iPod'),
('A-1', 'iPhone'),
('A-2', 'Windows'),
('A-2', 'X-box'),
('A-3', 'Kindle')]
Now:
>>> import itertools, operator
>>> [(key, list(group)) for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('A-1', [('A-1', 'iPad'), ('A-1', 'iPod'), ('A-1', 'iPhone')]),
('A-2', [('A-2', 'Windows'), ('A-2', 'X-box')]),
('A-3', [('A-3', 'Kindle')])]
So, you just want each group, ignoring the key, and you only want the second element of each element in the group. You can get the second element of each group with another comprehension, or just by unzipping:
>>> [list(zip(*group))[1] for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('iPad', 'iPod', 'iPhone'), ('Windows', 'X-box'), ('Kindle',)]
I would personally find this more readable as a sequence of separate iterator transformations than as one long expression. Taken to the extreme:
>>> ziplists = zip(list_1, list_2)
>>> pairs = itertools.groupby(ziplists, operator.itemgetter(0))
>>> groups = (group for key, group in pairs)
>>> values = (zip(*group)[1] for group in groups)
>>> [list(value) for value in values]
… but a happy medium of maybe 2 or 3 lines is usually better than either extreme.
Usually I'm the one rushing to a groupby solution ;^) but here I'll go the other way and manually insert into an OrderedDict:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
from collections import OrderedDict
d = OrderedDict()
for code, product in zip(list_1, list_2):
d.setdefault(code, []).append(product)
produces a d looking like
>>> d
OrderedDict([('A-1', ['iPad', 'iPod', 'iPhone']),
('A-2', ['Windows', 'X-box']), ('A-3', ['Kindle'])])
with easy access:
>>> d["A-2"]
['Windows', 'X-box']
and we can get the list-of-lists in list_1 order using .values():
>>> d.values()
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
If you've noticed that no one is telling you how to make a bunch of independent lists with names like list_a1 and so on-- that's because that's a bad idea. You want to keep the data together in something which you can (at a minimum) iterate over easily, and both dictionaries and list of lists qualify.
Maybe something like this?
#!/usr/local/cpython-3.3/bin/python
import pprint
import collections
def main():
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
result = collections.defaultdict(list)
for list_1_element, list_2_element in zip(list_1, list_2):
result[list_1_element].append(list_2_element)
pprint.pprint(result)
main()
Using itertools.izip_longest and itertools.groupby:
>>> from itertools import groupby, izip_longest
>>> inds = [next(g)[0] for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
First group items of list_1 and find the starting index of each group:
>>> inds
[0, 3, 5]
Now use slicing and izip_longest as we need pairs list_2[0:3], list_2[3:5], list_2[5:]:
>>> [list_2[x:y] for x, y in izip_longest(inds, inds[1:])]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
To get a list of dicts you can something like:
>>> inds = [next(g) for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
>>> {k: list_2[ind1: ind2[0]] for (ind1, k), ind2 in
zip_longest(inds, inds[1:], fillvalue=[None])}
{'A-1': ['iPad', 'iPod', 'iPhone'], 'A-3': ['Kindle'], 'A-2': ['Windows', 'X-box']}
You could do this if you want simple code, it's not pretty, but gets the job done.
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
list_1a = []
list_1b = []
list_1c = []
place = 0
for i in list_1[::1]:
if list_1[place] == 'A-1':
list_1a.append(list_2[place])
elif list_1[place] == 'A-2':
list_1b.append(list_2[place])
else:
list_1c.append(list_2[place])
place += 1

Categories