Sum of all counts in a collections.Counter - python

What is the best way of establishing the sum of all counts in a collections.Counter object?
I've tried:
sum(Counter([1,2,3,4,5,1,2,1,6]))
but this gives 21 instead of 9?

The code you have adds up the keys (i.e. the unique values in the list: 1+2+3+4+5+6=21).
To add up the counts, use:
In [4]: sum(Counter([1,2,3,4,5,1,2,1,6]).values())
Out[4]: 9
This idiom is mentioned in the documentation, under "Common patterns".
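To see why the original call returns 21, note that iterating over a Counter yields its keys, exactly as with a plain dict. A minimal sketch (the variable names are only illustrative):

```python
from collections import Counter

c = Counter([1, 2, 3, 4, 5, 1, 2, 1, 6])

# Iterating a Counter yields its keys, just like a plain dict.
keys = sorted(c)                  # the six distinct values

# sum(c) therefore adds the keys: 1+2+3+4+5+6.
sum_of_keys = sum(c)

# sum(c.values()) adds the counts: 3+2+1+1+1+1.
sum_of_counts = sum(c.values())

print(keys, sum_of_keys, sum_of_counts)
# [1, 2, 3, 4, 5, 6] 21 9
```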

Sum the values:
sum(some_counter.values())
Demo:
>>> from collections import Counter
>>> c = Counter([1,2,3,4,5,1,2,1,6])
>>> sum(c.values())
9

Starting in Python 3.10, Counter has a total() method that returns the sum of the counts:
from collections import Counter
Counter([1,2,3,4,5,1,2,1,6]).total()
# 9
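If the code also has to run on interpreters older than 3.10, one option is a small wrapper that falls back to summing the values. This is only a sketch; counter_total is a hypothetical helper name, not part of the standard library:

```python
from collections import Counter


def counter_total(counter):
    """Hypothetical helper: sum of all counts on any Python 3 version."""
    total = getattr(counter, "total", None)
    if callable(total):
        # Counter.total() exists on Python 3.10 and later.
        return total()
    # Older versions: add up the counts by hand.
    return sum(counter.values())


print(counter_total(Counter([1, 2, 3, 4, 5, 1, 2, 1, 6])))
# 9
```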

sum(Counter([1,2,3,4,5,1,2,1,6]).values())

Related

Python: Count Values in multi-value Dictionary

How can I count the occurrences of a specific value in a multi-value dictionary?
For example, if I have the keys A and B with different sets of numbers as values, I want to get the count of each number across all of the dictionary's keys.
I've tried this code, but I get 0 instead of 2.
dic = {'A':{0,1,2},'B':{1,2}}
print(sum(value == 1 for value in dic.values()))
Counter is a good option for this, especially if you want more than a single result:
from collections import Counter
from itertools import chain
count = Counter(chain(*(dic.values())))
In the REPL:
>>> count
Counter({1: 2, 2: 2, 0: 1})
>>> count.get(1)
2
Counter tallies each item in an iterable. chain treats the dictionary's sets as one long sequence, gluing them together. Feeding that straight to Counter counts how many times each item occurs.
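The attempt in the question returns 0 because it compares each whole set to 1, which is never true. A short sketch contrasting that with a membership test and with the Counter approach (chain.from_iterable is used here in place of chain(*...); both flatten the sets the same way):

```python
from collections import Counter
from itertools import chain

dic = {'A': {0, 1, 2}, 'B': {1, 2}}

# Comparing a whole set to 1 is always False, so the sum is 0.
whole_set_matches = sum(value == 1 for value in dic.values())

# A membership test per set answers the question for one number.
sets_containing_1 = sum(1 in value for value in dic.values())

# Flatten all sets into one stream and tally every number at once.
count = Counter(chain.from_iterable(dic.values()))

print(whole_set_matches, sets_containing_1, count[1])
# 0 2 2
```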

How to find the two strings that occurs most in a list?

I'm trying to get the counts of the two most frequent elements in an array. For example, in the list ['aa','bb','cc','dd','bb','bb','cc','ff'] the count of the most frequent element should be 3 (the number of times 'bb' appears in the array) and the count of the second most frequent should be 2 (the number of times 'cc' appears in the array).
I tried this:
max = 0
snd_max = 0
for i in x:
    aux = x.count(i)
    if aux > max:
        snd_max = max
        max = aux
print(max, snd_max)
But is there an easier way?
You can use collections.Counter:
from collections import Counter
x = ['aa','bb','cc','dd','bb','bb','cc','ff']
counter = Counter(x)
print(counter.most_common(2))
[('bb', 3), ('cc', 2)]
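Since the question asks only for the two counts, the pairs returned by most_common can be unpacked. A minimal sketch (top_two and top_two_counts are illustrative names):

```python
from collections import Counter

x = ['aa', 'bb', 'cc', 'dd', 'bb', 'bb', 'cc', 'ff']

# most_common(2) returns (element, count) pairs, highest count first.
top_two = Counter(x).most_common(2)

# Keep only the counts, dropping the elements themselves.
top_two_counts = [count for _, count in top_two]

print(top_two_counts)
# [3, 2]
```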
Try this:
l = ['aa','bb','cc','dd','bb','bb','cc','ff']
b = list(dict.fromkeys(l))
a = [(l.count(x), x) for x in b]
a.sort(reverse=True)
a = a[:2]
print(a)
I use max(); it's simple, though it returns only the single most frequent element.
lst = ['aa','bb','cc','dd','bb','bb','cc','ff']
print(max(set(lst), key=lst.count))
You could use pandas value_counts():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
Put the list into a Series, then call value_counts().
That gives you a Series with each element and how many times it appears, sorted with the most common on top.

python list of sets find symmetric difference in all elements

Consider this list of sets
my_input_list = [
    {1, 2, 3, 4, 5},
    {2, 3, 7, 4, 5},
    set(),
    {1, 2, 3, 4, 5, 6},
    set(),
]
I want to get the only exclusive elements 6 and 7 as the response, list or set. Set preferred.
I tried
print reduce(set.symmetric_difference,my_input_list) but that gives
{2,3,4,5,6,7}
I also tried sorting the list by length: smallest first raises an error because of the two empty sets, and largest first gives the same result as unsorted.
Any help or ideas please?
Thanks :)
Looks like the most straightforward solution is to count everything and return the elements that only appear once.
This solution uses chain.from_iterable (to flatten your sets) + Counter (to count things). Finally, use a set comprehension to filter elements with count == 1.
from itertools import chain
from collections import Counter
c = Counter(chain.from_iterable(my_input_list))
print({k for k in c if c[k] == 1})
{6, 7}
A quick note: the empty literal {} denotes an empty dict, not a set. For the latter, use set().
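As for why the reduce attempt returns {2, 3, 4, 5, 6, 7}: folding symmetric_difference over several sets keeps every element that occurs in an odd number of them, not only those that occur once. A small sketch that checks this against the question's data (written for Python 3, where reduce lives in functools):

```python
from collections import Counter
from functools import reduce
from itertools import chain

my_input_list = [
    {1, 2, 3, 4, 5},
    {2, 3, 7, 4, 5},
    set(),
    {1, 2, 3, 4, 5, 6},
    set(),
]

# Folding symmetric_difference toggles membership once per occurrence.
folded = reduce(set.symmetric_difference, my_input_list)

counts = Counter(chain.from_iterable(my_input_list))

# Elements seen an odd number of times: this is what the fold computes.
odd = {k for k, v in counts.items() if v % 2 == 1}

# Elements seen exactly once: this is what the question asks for.
once = {k for k, v in counts.items() if v == 1}

print(folded == odd, once)
# True {6, 7}
```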
You could use itertools.chain and collections.Counter:
from itertools import chain
from collections import Counter
r = {k for k,v in Counter(chain.from_iterable(my_input_list)).items() if v==1}

Grouping tuples using libraries in Python

Using Python, is there an easier way to count the values as shown below without writing a bunch of loops? Perhaps using a library such as itertools with groupby?
#original tuple array
[("A","field1"),("A","field1"),("B","field1")]
#output array
[("A","field1", 2), ("B", "field1",1)]
You can use a Counter dict to group, adding the count at the end
l = [("A","field1"),("A","field1"),("B","field1")]
from collections import Counter
print([k+(v,) for k,v in Counter(l).items()])
If you want the output ordered by the first time you encounter a tuple, you can use an OrderedDict to do the counting:
from collections import OrderedDict
d = OrderedDict()
for t in l:
    d.setdefault(t, 0)
    d[t] += 1
print([k+(v,) for k,v in d.items()])
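On Python 3.7 and later, Counter is a dict subclass that remembers insertion order, so the plain Counter version should already give first-seen order there. A short sketch with a reordered input (the list below is made up for illustration):

```python
from collections import Counter

# Hypothetical input where "B" is encountered before "A".
l = [("B", "field1"), ("A", "field1"), ("B", "field1")]

# On Python 3.7+, items() follows the order in which keys were first seen.
ordered = [k + (v,) for k, v in Counter(l).items()]

print(ordered)
# [('B', 'field1', 2), ('A', 'field1', 1)]
```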
I guess you just want to count the number of occurrences of each tuple, right? If so, you can use Counter.
You can use itertools.groupby. Omit the key parameter, then it will just group equal elements (assuming those are consecutive), then add the number of elements in the group.
>>> import itertools
>>> lst = [("A","field1"),("A","field1"),("B","field1")]
>>> [(k + (len(list(g)),)) for k, g in itertools.groupby(lst)]
[('A', 'field1', 2), ('B', 'field1', 1)]
If the elements are not consecutive, this will not work, and in any case the solution using collections.Counter seems to be a better fit for the problem.
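A sketch of what goes wrong with non-consecutive elements, and how sorting first repairs the groupby approach (the input list is a made-up variation of the question's data):

```python
from collections import Counter
from itertools import groupby

# Hypothetical input: the two ("A", "field1") tuples are not adjacent.
lst = [("A", "field1"), ("B", "field1"), ("A", "field1")]

# groupby only merges adjacent equal items, so "A" shows up twice.
unsorted_groups = [k + (len(list(g)),) for k, g in groupby(lst)]

# Sorting first makes equal tuples adjacent.
sorted_groups = [k + (len(list(g)),) for k, g in groupby(sorted(lst))]

# Counter needs no sorting at all.
counter_groups = [k + (v,) for k, v in Counter(lst).items()]

print(unsorted_groups)
# [('A', 'field1', 1), ('B', 'field1', 1), ('A', 'field1', 1)]
print(sorted_groups)
# [('A', 'field1', 2), ('B', 'field1', 1)]
```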

Sum of values across all nested dictionaries in python

I have a dictionary of Counters, e.g:
from collections import Counter, defaultdict
numbers = defaultdict(Counter)
numbers['a']['first'] = 1
numbers['a']['second'] = 2
numbers['b']['first'] = 3
I want to get the sum: 1+2+3 = 6
What would be the simplest / idiomatic way to do this in python 3?
Use a nested comprehension:
sum(x for counter in numbers.values() for x in counter.values())
Or sum first the counters (starting with an empty one), and then their values:
sum(sum(numbers.values(), Counter()).values())
Or first each counter's values, and then the intermediate results:
sum(sum(c.values()) for c in numbers.values())
Or use chain:
from itertools import chain
sum(chain.from_iterable(d.values() for d in numbers.values()))
I prefer the first way.
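A quick check that the four variants agree on the question's data, plus one caveat: adding Counters with + discards non-positive results, so the "sum the counters first" variant can differ from the others once negative counts are involved. The with_negative dictionary below is a made-up example:

```python
from collections import Counter, defaultdict
from itertools import chain

numbers = defaultdict(Counter)
numbers['a']['first'] = 1
numbers['a']['second'] = 2
numbers['b']['first'] = 3

nested = sum(x for counter in numbers.values() for x in counter.values())
merged = sum(sum(numbers.values(), Counter()).values())
per_counter = sum(sum(c.values()) for c in numbers.values())
chained = sum(chain.from_iterable(c.values() for c in numbers.values()))

print(nested, merged, per_counter, chained)
# 6 6 6 6

# Caveat: Counter + Counter keeps only positive counts.
with_negative = {'a': Counter(first=-1), 'b': Counter(first=3)}
direct = sum(sum(c.values()) for c in with_negative.values())
via_add = sum(sum(with_negative.values(), Counter()).values())

print(direct, via_add)
# 2 3
```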
sum(sum(c.values()) for c in numbers.values())
from itertools import chain
sum(chain.from_iterable(d.values() for d in numbers.values()))
# outputs: 6
For performance in Python 2.x, use .itervalues(), which avoids building intermediate lists (this applies to all solutions here).
sum(chain.from_iterable(d.itervalues() for d in numbers.itervalues()))
