Calculate average of list values grouped by second list - python

I didn't know how to better express myself in the title. Basically what I have is two lists:
a = ['A','B','A','C','D','C','A',...]
b = [2,4,8,3,5,2,1,...]
a and b have the same length, b represents a value related to the letter in a .
Now I would like to calculate the Average value in b for each letter in a. So at the end I would have:
a = ['A','B','C','D',...]
b = [3.67, 4, 2.5, 5,...]
Is there a standard implementation for this in python?

You can first perform a group by. We can do this for instance with a defaultdict:
from collections import defaultdict
col = defaultdict(list)
for ai,bi in zip(a,b):
col[ai].append(bi)
Now the dictionary col will look like:
>>> col
defaultdict(<class 'list'>, {'C': [3, 2], 'B': [4], 'D': [5], 'A': [2, 8, 1]})
and now we can calculate the average of all elements in the dictionary for instance like:
>>> {key:sum(vals)/len(vals) for key,vals in col.items()}
{'C': 2.5, 'B': 4.0, 'D': 5.0, 'A': 3.6666666666666665}
You can also convert it to two tuples by using zip:
a,b = zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()])
resulting in:
>>> a,b = zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()])
>>> a
('C', 'B', 'D', 'A')
>>> b
(2.5, 4.0, 5.0, 3.6666666666666665)
If you want to generate lists instead, you can convert them to lists:
a,b = map(list,zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()]))
This results in:
>>> a,b = map(list,zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()]))
>>> a
['C', 'B', 'D', 'A']
>>> b
[2.5, 4.0, 5.0, 3.6666666666666665]

I believe a cleaner way to do this would be to simply use a pandas groupby:
import pandas as pd
data = pd.DataFrame(b,index=a)
a,b = (list(data.groupby(data.index)[0].mean().index),list(data.groupby(data.index)[0].mean()))

You can use numpy as follows:
>>> import numpy as np
>>> array_a = np.array(a)
>>> array_b = np.array(b)
>>> avrg_of_a = np.average(array_b[array_a == 'A'])
>>> avrg_of_a
3.6666666666666665
>>> avrg_of_b = np.average(array_b[array_a == 'B'])
4.0
You can generate a list use list comprehensions [np.average(array_b[array_a == item]) for item in np.unique(array_a)]

Related

Process a list of lists, finding all lists that have matching last values?

Given a list of lists
lol = [[0,a], [0,b],
[1,b], [1,c],
[2,d], [2,e],
[2,g], [2,b],
[3,e], [3,f]]
I would like to extract all sublists that have the same last element (lol[n][1]) and end up with something like below:
[0,b]
[1.b]
[2,b]
[2,e]
[3,e]
I know that given two lists we can use an intersection, what is the right way to go about a problem like this, other than incrementing the index value in a for each loop?
1. Using collections.defaultdict
You can use defaultdict to the first group up your items with more than one occurrence, then, iterate over the dict.items to get what you need.
from collections import defaultdict
lol = [[0,'a'], [0,'b'],
[1,'b'], [1,'c'],
[2,'d'], [2,'e'],
[2,'g'], [2,'b'],
[3,'e'], [3,'f']]
d = defaultdict(list)
for v,k in lol:
d[k].append(v)
# d looks like -
# defaultdict(list,
# {'a': [0],
# 'b': [0, 1, 2],
# 'c': [1],
# 'd': [2],
# 'e': [2, 3],
# 'g': [2],
# 'f': [3]})
result = [[v,k] for k,vs in d.items() for v in vs if len(vs)>1]
print(result)
[[0, 'b'], [1, 'b'], [2, 'b'], [2, 'e'], [3, 'e']]
2. Using pandas.duplicated
Here is how you can do this with Pandas -
Convert to pandas dataframe
For key column, find the duplicates and keep all of them
Convert to list of records while ignoring index
import pandas as pd
df = pd.DataFrame(lol, columns=['val','key'])
dups = df[df['key'].duplicated(keep=False)]
result = list(dups.to_records(index=False))
print(result)
[(0, 'b'), (1, 'b'), (2, 'e'), (2, 'b'), (3, 'e')]
3. Using numpy.unique
You can solve this in a vectorized manner using numpy -
Convert to numpy matrix arr
Find unique elements u and their counts c
Filter list of unique elements that occur more than once dup
Use broadcasting to compare the second column of the array and take any over axis=0 to get a boolean which is True for duplicated rows
Filter the arr based on this boolean
import numpy as np
arr = np.array(lol)
u, c = np.unique(arr[:,1], return_counts=True)
dup = u[c > 1]
result = arr[(arr[:,1]==dup[:,None]).any(0)]
result
array([['0', 'b'],
['1', 'b'],
['2', 'e'],
['2', 'b'],
['3', 'e']], dtype='<U21')

split an table(or list) based on column python

I have a array
arr = [['a', 'b', 'a'], [1, 2, 3]
I need this to be spliited based on the first array values i.e based on 'a' or 'b'. So Expected output is
arr_out_a = [1, 3]
arr_out_b = [2]
How do I do it?
Please help me correct the question,if the way I'm using words like list and array might create confusion
Use collections.defaultdict():
In [82]: arr = [['a', 'b', 'a'], [1, 2, 3]]
In [83]: from collections import defaultdict
In [84]: d = defaultdict(list)
In [85]: for i, j in zip(*arr):
....: d[i].append(j)
....:
In [86]: d
Out[86]: defaultdict(<class 'list'>, {'b': [2], 'a': [1, 3]})
Basically just append them conditionally to predefined empty lists:
arr_out_a = []
arr_out_b = []
for char, num in zip(*arr):
if char == 'a':
arr_out_a.append(num)
else:
arr_out_b.append(num)
or if you don't like the zip:
arr_out_a = []
arr_out_b = []
for idx in range(len(arr[0])):
if arr[0][idx] == 'a':
arr_out_a.append(arr[1][idx])
else:
arr_out_b.append(arr[1][idx])

Find the same values in array [duplicate]

Given a single item, how do I count occurrences of it in a list, in Python?
A related but different problem is counting occurrences of each different element in a collection, getting a dictionary or list as a histogram result instead of a single integer. For that problem, see Using a dictionary to count the items in a list.
If you only want a single item's count, use the count method:
>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3
Important: this is very slow if you are counting multiple different items
Each count call goes over the entire list of n elements. Calling count in a loop n times means n * n total checks, which can be catastrophic for performance.
If you want to count multiple items, use Counter, which only does n total checks.
Use Counter if you are using Python 2.7 or 3.x and you want the number of occurrences for each element:
>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})
Counting the occurrences of one item in a list
For counting the occurrences of just one list item you can use count()
>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2
Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.
Counting all items with count()
To count the occurrences of items in l one can simply use a list comprehension and the count() method
[[x,l.count(x)] for x in set(l)]
(or similarly with a dictionary dict((x,l.count(x)) for x in set(l)))
Example:
>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}
Counting all items with Counter()
Alternatively, there's the faster Counter class from the collections library
Counter(l)
Example:
>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})
How much faster is Counter?
I checked how much faster Counter is for tallying lists. I tried both methods out with a few values of n and it appears that Counter is faster by a constant factor of approximately 2.
Here is the script I used:
from __future__ import print_function
import timeit
t1=timeit.Timer('Counter(l)', \
'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
)
t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
)
print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count(): ", t2.repeat(repeat=3,number=10000)
And the output:
Counter(): [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count(): [7.779430688009597, 7.962715800967999, 8.420845870045014]
Another way to get the number of occurrences of each item, in a dictionary:
dict((i, a.count(i)) for i in a)
Given an item, how can I count its occurrences in a list in Python?
Here's an example list:
>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']
list.count
There's the list.count method
>>> l.count('b')
4
This works fine for any list. Tuples have this method as well:
>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6
collections.Counter
And then there's collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.
Usage:
>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4
Counters are based on Python dictionaries, their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.
Further usage of collections.Counter
You can add or subtract with iterables from your counter:
>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4
And you can do multi-set operations with the counter as well:
>>> c2 = Counter(list('aabbxyz'))
>>> c - c2 # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2 # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2 # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2 # set intersection
Counter({'a': 2, 'b': 2})
Silly answer, sum
There are good builtin answers, but this example is slightly instructive. Here we sum all the occurences where the character, c, is equal to 'b':
>>> sum(c == 'b' for c in l)
4
Not great for this use-case, but if you need to have a count of iterables where the case is True it works perfectly fine to sum the boolean results, since True is equivalent to 1.
Why not pandas?
Another answer suggests:
Why not use pandas?
Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.
There are builtin solutions for this use-case in the list object itself as well as in the standard library.
If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.
I've compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).
Counting one item
For large enough arrays, it turns out that
numpy.sum(numpy.array(a) == 1)
is slightly faster than the other solutions.
Counting all items
As established before,
numpy.bincount(a)
is what you want.
Code to reproduce the plots:
from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot
def counter(a):
return Counter(a)
def count(a):
return dict((i, a.count(i)) for i in set(a))
def bincount(a):
return numpy.bincount(a)
def pandas_value_counts(a):
return pandas.Series(a).value_counts()
def occur_dict(a):
d = {}
for i in a:
if i in d:
d[i] = d[i]+1
else:
d[i] = 1
return d
def count_unsorted_list_items(items):
counts = defaultdict(int)
for item in items:
counts[item] += 1
return dict(counts)
def operator_countof(a):
return dict((i, operator.countOf(a, i)) for i in set(a))
perfplot.show(
setup=lambda n: list(numpy.random.randint(0, 100, n)),
n_range=[2**k for k in range(20)],
kernels=[
counter, count, bincount, pandas_value_counts, occur_dict,
count_unsorted_list_items, operator_countof
],
equality_check=None,
logx=True,
logy=True,
)
from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot
def counter(a):
return Counter(a)
def count(a):
return dict((i, a.count(i)) for i in set(a))
def bincount(a):
return numpy.bincount(a)
def pandas_value_counts(a):
return pandas.Series(a).value_counts()
def occur_dict(a):
d = {}
for i in a:
if i in d:
d[i] = d[i] + 1
else:
d[i] = 1
return d
def count_unsorted_list_items(items):
counts = defaultdict(int)
for item in items:
counts[item] += 1
return dict(counts)
def operator_countof(a):
return dict((i, operator.countOf(a, i)) for i in set(a))
b = perfplot.bench(
setup=lambda n: list(numpy.random.randint(0, 100, n)),
n_range=[2 ** k for k in range(20)],
kernels=[
counter,
count,
bincount,
pandas_value_counts,
occur_dict,
count_unsorted_list_items,
operator_countof,
],
equality_check=None,
)
b.save("out.png")
b.show()
list.count(x) returns the number of times x appears in a list
see:
http://docs.python.org/tutorial/datastructures.html#more-on-lists
If you want to count all values at once you can do it very fast using numpy arrays and bincount as follows
import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)
which gives
>>> array([0, 3, 1, 1, 2])
Why not using Pandas?
import pandas as pd
my_list = ['a', 'b', 'c', 'd', 'a', 'd', 'a']
# converting the list to a Series and counting the values
my_count = pd.Series(my_list).value_counts()
my_count
Output:
a 3
d 2
b 1
c 1
dtype: int64
If you are looking for a count of a particular element, say a, try:
my_count['a']
Output:
3
If you can use pandas, then value_counts is there for rescue.
>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1 3
4 2
3 1
2 1
dtype: int64
It automatically sorts the result based on frequency as well.
If you want the result to be in a list of list, do as below
>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]
I had this problem today and rolled my own solution before I thought to check SO. This:
dict((i,a.count(i)) for i in a)
is really, really slow for large lists. My solution
def occurDict(items):
d = {}
for i in items:
if i in d:
d[i] = d[i]+1
else:
d[i] = 1
return d
is actually a bit faster than the Counter solution, at least for Python 2.7.
Count of all elements with itertools.groupby()
Antoher possiblity for getting the count of all elements in the list could be by means of itertools.groupby().
With "duplicate" counts
from itertools import groupby
L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c'] # Input list
counts = [(i, len(list(c))) for i,c in groupby(L)] # Create value-count pairs as list of tuples
print(counts)
Returns
[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]
Notice how it combined the first three a's as the first group, while other groups of a are present further down the list. This happens because the input list L was not sorted. This can be a benefit sometimes if the groups should in fact be separate.
With unique counts
If unique group counts are desired, just sort the input list:
counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
print(counts)
Returns
[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]
Note: For creating unique counts, many of the other answers provide easier and more readable code compared to the groupby solution. But it is shown here to draw a parallel to the duplicate count example.
Although it is very old question, since i didn't find a one liner, i made one.
# original numbers in list
l = [1, 2, 2, 3, 3, 3, 4]
# empty dictionary to hold pair of number and its count
d = {}
# loop through all elements and store count
[ d.update( {i:d.get(i, 0)+1} ) for i in l ]
print(d)
# {1: 1, 2: 2, 3: 3, 4: 1}
# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from collections import defaultdict
def count_unsorted_list_items(items):
"""
:param items: iterable of hashable items to count
:type items: iterable
:returns: dict of counts like Py2.7 Counter
:rtype: dict
"""
counts = defaultdict(int)
for item in items:
counts[item] += 1
return dict(counts)
# Python >= 2.2 (generators)
def count_sorted_list_items(items):
"""
:param items: sorted iterable of items to count
:type items: sorted iterable
:returns: generator of (item, count) tuples
:rtype: generator
"""
if not items:
return
elif len(items) == 1:
yield (items[0], 1)
return
prev_item = items[0]
count = 1
for item in items[1:]:
if prev_item == item:
count += 1
else:
yield (prev_item, count)
count = 1
prev_item = item
yield (item, count)
return
import unittest
class TestListCounters(unittest.TestCase):
def test_count_unsorted_list_items(self):
D = (
([], []),
([2], [(2,1)]),
([2,2], [(2,2)]),
([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
)
for inp, exp_outp in D:
counts = count_unsorted_list_items(inp)
print inp, exp_outp, counts
self.assertEqual(counts, dict( exp_outp ))
inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )
def test_count_sorted_list_items(self):
D = (
([], []),
([2], [(2,1)]),
([2,2], [(2,2)]),
([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
)
for inp, exp_outp in D:
counts = list( count_sorted_list_items(inp) )
print inp, exp_outp, counts
self.assertEqual(counts, exp_outp)
inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
self.assertEqual(exp_outp, list( count_sorted_list_items(inp) ))
# ... [(2,2), (4,1), (2,1)]
Below are the three solutions:
Fastest is using a for loop and storing it in a Dict.
import time
from collections import Counter
def countElement(a):
g = {}
for i in a:
if i in g:
g[i] +=1
else:
g[i] =1
return g
z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]
#Solution 1 - Faster
st = time.monotonic()
for i in range(1000000):
b = countElement(z)
et = time.monotonic()
print(b)
print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))
#Solution 2 - Fast
st = time.monotonic()
for i in range(1000000):
a = Counter(z)
et = time.monotonic()
print (a)
print('Using collections.Counter - Duration: {}'.format(et - st))
#Solution 3 - Slow
st = time.monotonic()
for i in range(1000000):
g = dict([(i, z.count(i)) for i in set(z)])
et = time.monotonic()
print(g)
print('Using list comprehension - Duration: {}'.format(et - st))
Result
#Solution 1 - Faster
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
Simple for loop and storing it in dict - Duration: 12.032000000000153
#Solution 2 - Fast
Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
Using collections.Counter - Duration: 15.889999999999418
#Solution 3 - Slow
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
Using list comprehension - Duration: 33.0
It was suggested to use numpy's bincount, however it works only for 1d arrays with non-negative integers. Also, the resulting array might be confusing (it contains the occurrences of the integers from min to max of the original list, and sets to 0 the missing integers).
A better way to do it with numpy is to use the unique function with the attribute return_counts set to True. It returns a tuple with an array of the unique values and an array of the occurrences of each unique value.
# a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True) # array([0, 1, 2, 3]), array([2, 3, 1, 2]
and then we can pair them as
dict(zip(a_uniq, counts)) # {0: 2, 1: 3, 2: 1, 3: 2}
It also works with other data types and "2d lists", e.g.
>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}
To count the number of diverse elements having a common type:
li = ['A0','c5','A8','A2','A5','c2','A3','A9']
print sum(1 for el in li if el[0]=='A' and el[1] in '01234')
gives
3 , not 6
You can also use countOf method of a built-in module operator.
>>> import operator
>>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
3
I would use filter(), take Lukasz's example:
>>> lst = [1, 2, 3, 4, 1, 4, 1]
>>> len(filter(lambda x: x==1, lst))
3
use %timeit to see which operation is more efficient. np.array counting operations should be faster.
from collections import Counter
mylist = [1,7,7,7,3,9,9,9,7,9,10,0]
types_counts=Counter(mylist)
print(types_counts)
May not be the most efficient, requires an extra pass to remove duplicates.
Functional implementation :
arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x : (x , list(arr).count(x)) , arr)))
returns :
{('c', 1), ('b', 3), ('a', 2)}
or return as dict :
print(dict(map(lambda x : (x , list(arr).count(x)) , arr)))
returns :
{'b': 3, 'c': 1, 'a': 2}
Given a list X
import numpy as np
X = [1, -1, 1, -1, 1]
The dictionary which shows i: frequency(i) for elements of this list is:
{i:X.count(i) for i in np.unique(X)}
Output:
{-1: 2, 1: 3}
Alternatively, you can also implement the counter by yourself. This is the way I do:
item_list = ['me', 'me', 'you', 'you', 'you', 'they']
occ_dict = {}
for item in item_list:
if item not in occ_dict:
occ_dict[item] = 1
else:
occ_dict[item] +=1
print(occ_dict)
Output: {'me': 2, 'you': 3, 'they': 1}
mot = ["compte", "france", "zied"]
lst = ["compte", "france", "france", "france", "france"]
dict((x, lst.count(x)) for x in set(mot))
this gives
{'compte': 1, 'france': 4, 'zied': 0}
sum([1 for elem in <yourlist> if elem==<your_value>])
This will return the amount of occurences of your_value
test = [409.1, 479.0, 340.0, 282.4, 406.0, 300.0, 374.0, 253.3, 195.1, 269.0, 329.3, 250.7, 250.7, 345.3, 379.3, 275.0, 215.2, 300.0]
for i in test:
print('{} numbers {}'.format(i, test.count(i)))
import pandas as pd
test = [409.1, 479.0, 340.0, 282.4, 406.0, 300.0, 374.0, 253.3, 195.1, 269.0, 329.3, 250.7, 250.7, 345.3, 379.3, 275.0, 215.2, 300.0]
#turning the list into a temporary dataframe
test = pd.DataFrame(test)
#using the very convenient value_counts() function
df_counts = test.value_counts()
df_counts
then you can use df_counts.index and df_counts.values to get the data.
x = ['Jess', 'Jack', 'Mary', 'Sophia', 'Karen',
'Addison', 'Joseph','Jack', 'Jack', 'Eric', 'Ilona', 'Jason']
the_item = input('Enter the item that you wish to find : ')
how_many_times = 0
for occurrence in x:
if occurrence == the_item :
how_many_times += 1
print('The occurrence of', the_item, 'in', x,'is',how_many_times)
Created a list of names wherein the name 'Jack' is repeated.
In order to check its Occurrence, I ran a for loop in the list named x.
Upon each iteration, if the loop variable attains the value same that of received from the user and stored in the variable the_item, the variable how_many_times gets incremented by 1.
After attaining some value...We print how_many_times which stores the value of the occurance of the word 'jack'
def countfrequncyinarray(arr1):
r=len(arr1)
return {i:arr1.count(i) for i in range(1,r+1)}
arr1=[4,4,4,4]
a=countfrequncyinarray(arr1)
print(a)

In Python, I have a table with columns and lines; how do I count how many repetitions there are? [duplicate]

Given a single item, how do I count occurrences of it in a list, in Python?
A related but different problem is counting occurrences of each different element in a collection, getting a dictionary or list as a histogram result instead of a single integer. For that problem, see Using a dictionary to count the items in a list.
If you only want a single item's count, use the count method:
>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3
Important: this is very slow if you are counting multiple different items
Each count call goes over the entire list of n elements. Calling count in a loop n times means n * n total checks, which can be catastrophic for performance.
If you want to count multiple items, use Counter, which only does n total checks.
Use Counter if you are using Python 2.7 or 3.x and you want the number of occurrences for each element:
>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})
Counting the occurrences of one item in a list
For counting the occurrences of just one list item you can use count()
>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2
Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.
Counting all items with count()
To count the occurrences of items in l one can simply use a list comprehension and the count() method
[[x,l.count(x)] for x in set(l)]
(or similarly with a dictionary dict((x,l.count(x)) for x in set(l)))
Example:
>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}
Counting all items with Counter()
Alternatively, there's the faster Counter class from the collections library
Counter(l)
Example:
>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})
How much faster is Counter?
I checked how much faster Counter is for tallying lists. I tried both methods out with a few values of n and it appears that Counter is faster by a constant factor of approximately 2.
Here is the script I used:
from __future__ import print_function
import timeit
t1=timeit.Timer('Counter(l)', \
'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
)
t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
)
print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count(): ", t2.repeat(repeat=3,number=10000)
And the output:
Counter(): [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count(): [7.779430688009597, 7.962715800967999, 8.420845870045014]
Another way to get the number of occurrences of each item, in a dictionary:
dict((i, a.count(i)) for i in a)
Given an item, how can I count its occurrences in a list in Python?
Here's an example list:
>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']
list.count
There's the list.count method
>>> l.count('b')
4
This works fine for any list. Tuples have this method as well:
>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6
collections.Counter
And then there's collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.
Usage:
>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4
Counters are based on Python dictionaries, their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.
Further usage of collections.Counter
You can add or subtract with iterables from your counter:
>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4
And you can do multi-set operations with the counter as well:
>>> c2 = Counter(list('aabbxyz'))
>>> c - c2 # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2 # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2 # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2 # set intersection
Counter({'a': 2, 'b': 2})
Silly answer, sum
There are good builtin answers, but this example is slightly instructive. Here we sum all the occurences where the character, c, is equal to 'b':
>>> sum(c == 'b' for c in l)
4
Not great for this use-case, but if you need to have a count of iterables where the case is True it works perfectly fine to sum the boolean results, since True is equivalent to 1.
Why not pandas?
Another answer suggests:
Why not use pandas?
Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.
There are builtin solutions for this use-case in the list object itself as well as in the standard library.
If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.
I've compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).
Counting one item
For large enough arrays, it turns out that
numpy.sum(numpy.array(a) == 1)
is slightly faster than the other solutions.
Counting all items
As established before,
numpy.bincount(a)
is what you want.
Code to reproduce the plots:
from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot
def counter(a):
return Counter(a)
def count(a):
return dict((i, a.count(i)) for i in set(a))
def bincount(a):
return numpy.bincount(a)
def pandas_value_counts(a):
return pandas.Series(a).value_counts()
def occur_dict(a):
d = {}
for i in a:
if i in d:
d[i] = d[i]+1
else:
d[i] = 1
return d
def count_unsorted_list_items(items):
counts = defaultdict(int)
for item in items:
counts[item] += 1
return dict(counts)
def operator_countof(a):
return dict((i, operator.countOf(a, i)) for i in set(a))
perfplot.show(
setup=lambda n: list(numpy.random.randint(0, 100, n)),
n_range=[2**k for k in range(20)],
kernels=[
counter, count, bincount, pandas_value_counts, occur_dict,
count_unsorted_list_items, operator_countof
],
equality_check=None,
logx=True,
logy=True,
)
from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot
def counter(a):
return Counter(a)
def count(a):
return dict((i, a.count(i)) for i in set(a))
def bincount(a):
return numpy.bincount(a)
def pandas_value_counts(a):
return pandas.Series(a).value_counts()
def occur_dict(a):
d = {}
for i in a:
if i in d:
d[i] = d[i] + 1
else:
d[i] = 1
return d
def count_unsorted_list_items(items):
counts = defaultdict(int)
for item in items:
counts[item] += 1
return dict(counts)
def operator_countof(a):
return dict((i, operator.countOf(a, i)) for i in set(a))
b = perfplot.bench(
setup=lambda n: list(numpy.random.randint(0, 100, n)),
n_range=[2 ** k for k in range(20)],
kernels=[
counter,
count,
bincount,
pandas_value_counts,
occur_dict,
count_unsorted_list_items,
operator_countof,
],
equality_check=None,
)
b.save("out.png")
b.show()
list.count(x) returns the number of times x appears in a list
see:
http://docs.python.org/tutorial/datastructures.html#more-on-lists
If you want to count all values at once you can do it very fast using numpy arrays and bincount as follows
import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)
which gives
>>> array([0, 3, 1, 1, 2])
Why not using Pandas?
import pandas as pd
my_list = ['a', 'b', 'c', 'd', 'a', 'd', 'a']
# converting the list to a Series and counting the values
my_count = pd.Series(my_list).value_counts()
my_count
Output:
a 3
d 2
b 1
c 1
dtype: int64
If you are looking for a count of a particular element, say a, try:
my_count['a']
Output:
3
If you can use pandas, then value_counts is there for rescue.
>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1 3
4 2
3 1
2 1
dtype: int64
It automatically sorts the result based on frequency as well.
If you want the result to be in a list of list, do as below
>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]
I had this problem today and rolled my own solution before I thought to check SO. This:
dict((i,a.count(i)) for i in a)
is really, really slow for large lists. My solution
def occurDict(items):
d = {}
for i in items:
if i in d:
d[i] = d[i]+1
else:
d[i] = 1
return d
is actually a bit faster than the Counter solution, at least for Python 2.7.
Count of all elements with itertools.groupby()
Antoher possiblity for getting the count of all elements in the list could be by means of itertools.groupby().
With "duplicate" counts
from itertools import groupby
L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c'] # Input list
counts = [(i, len(list(c))) for i,c in groupby(L)] # Create value-count pairs as list of tuples
print(counts)
Returns
[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]
Notice how it combined the first three a's as the first group, while other groups of a are present further down the list. This happens because the input list L was not sorted. This can be a benefit sometimes if the groups should in fact be separate.
With unique counts
If unique group counts are desired, just sort the input list:
counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
print(counts)
Returns
[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]
Note: For creating unique counts, many of the other answers provide easier and more readable code compared to the groupby solution. But it is shown here to draw a parallel to the duplicate count example.
Although it is very old question, since i didn't find a one liner, i made one.
# original numbers in list
l = [1, 2, 2, 3, 3, 3, 4]
# empty dictionary to hold pair of number and its count
d = {}
# loop through all elements and store count
[ d.update( {i:d.get(i, 0)+1} ) for i in l ]
print(d)
# {1: 1, 2: 2, 3: 3, 4: 1}
# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from collections import defaultdict
def count_unsorted_list_items(items):
"""
:param items: iterable of hashable items to count
:type items: iterable
:returns: dict of counts like Py2.7 Counter
:rtype: dict
"""
counts = defaultdict(int)
for item in items:
counts[item] += 1
return dict(counts)
# Python >= 2.2 (generators)
def count_sorted_list_items(items):
"""
:param items: sorted iterable of items to count
:type items: sorted iterable
:returns: generator of (item, count) tuples
:rtype: generator
"""
if not items:
return
elif len(items) == 1:
yield (items[0], 1)
return
prev_item = items[0]
count = 1
for item in items[1:]:
if prev_item == item:
count += 1
else:
yield (prev_item, count)
count = 1
prev_item = item
yield (item, count)
return
import unittest
class TestListCounters(unittest.TestCase):
def test_count_unsorted_list_items(self):
D = (
([], []),
([2], [(2,1)]),
([2,2], [(2,2)]),
([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
)
for inp, exp_outp in D:
counts = count_unsorted_list_items(inp)
print inp, exp_outp, counts
self.assertEqual(counts, dict( exp_outp ))
inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )
def test_count_sorted_list_items(self):
D = (
([], []),
([2], [(2,1)]),
([2,2], [(2,2)]),
([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
)
for inp, exp_outp in D:
counts = list( count_sorted_list_items(inp) )
print inp, exp_outp, counts
self.assertEqual(counts, exp_outp)
inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
self.assertEqual(exp_outp, list( count_sorted_list_items(inp) ))
# ... [(2,2), (4,1), (2,1)]
Below are the three solutions:
Fastest is using a for loop and storing it in a Dict.
import time
from collections import Counter
def countElement(a):
g = {}
for i in a:
if i in g:
g[i] +=1
else:
g[i] =1
return g
z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]
#Solution 1 - Faster
st = time.monotonic()
for i in range(1000000):
b = countElement(z)
et = time.monotonic()
print(b)
print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))
#Solution 2 - Fast
st = time.monotonic()
for i in range(1000000):
a = Counter(z)
et = time.monotonic()
print (a)
print('Using collections.Counter - Duration: {}'.format(et - st))
#Solution 3 - Slow
st = time.monotonic()
for i in range(1000000):
g = dict([(i, z.count(i)) for i in set(z)])
et = time.monotonic()
print(g)
print('Using list comprehension - Duration: {}'.format(et - st))
Result
#Solution 1 - Faster
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
Simple for loop and storing it in dict - Duration: 12.032000000000153
#Solution 2 - Fast
Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
Using collections.Counter - Duration: 15.889999999999418
#Solution 3 - Slow
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
Using list comprehension - Duration: 33.0
It was suggested to use numpy's bincount, however it works only for 1d arrays with non-negative integers. Also, the resulting array might be confusing (it contains the occurrences of the integers from min to max of the original list, and sets to 0 the missing integers).
A better way to do it with numpy is to use the unique function with the attribute return_counts set to True. It returns a tuple with an array of the unique values and an array of the occurrences of each unique value.
# a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True) # array([0, 1, 2, 3]), array([2, 3, 1, 2]
and then we can pair them as
dict(zip(a_uniq, counts)) # {0: 2, 1: 3, 2: 1, 3: 2}
It also works with other data types and "2d lists", e.g.
>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}
To count the number of diverse elements having a common type:
li = ['A0','c5','A8','A2','A5','c2','A3','A9']
print sum(1 for el in li if el[0]=='A' and el[1] in '01234')
gives
3 , not 6
You can also use countOf method of a built-in module operator.
>>> import operator
>>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
3
I would use filter(), take Lukasz's example:
>>> lst = [1, 2, 3, 4, 1, 4, 1]
>>> len(filter(lambda x: x==1, lst))
3
use %timeit to see which operation is more efficient. np.array counting operations should be faster.
from collections import Counter
mylist = [1,7,7,7,3,9,9,9,7,9,10,0]
types_counts=Counter(mylist)
print(types_counts)
May not be the most efficient, requires an extra pass to remove duplicates.
Functional implementation :
arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x : (x , list(arr).count(x)) , arr)))
returns :
{('c', 1), ('b', 3), ('a', 2)}
or return as dict :
print(dict(map(lambda x : (x , list(arr).count(x)) , arr)))
returns :
{'b': 3, 'c': 1, 'a': 2}
Given a list X
import numpy as np
X = [1, -1, 1, -1, 1]
The dictionary which shows i: frequency(i) for elements of this list is:
{i:X.count(i) for i in np.unique(X)}
Output:
{-1: 2, 1: 3}
Alternatively, you can also implement the counter by yourself. This is the way I do:
item_list = ['me', 'me', 'you', 'you', 'you', 'they']
occ_dict = {}
for item in item_list:
if item not in occ_dict:
occ_dict[item] = 1
else:
occ_dict[item] +=1
print(occ_dict)
Output: {'me': 2, 'you': 3, 'they': 1}
mot = ["compte", "france", "zied"]
lst = ["compte", "france", "france", "france", "france"]
dict((x, lst.count(x)) for x in set(mot))
this gives
{'compte': 1, 'france': 4, 'zied': 0}
sum([1 for elem in <yourlist> if elem==<your_value>])
This will return the amount of occurences of your_value
test = [409.1, 479.0, 340.0, 282.4, 406.0, 300.0, 374.0, 253.3, 195.1, 269.0, 329.3, 250.7, 250.7, 345.3, 379.3, 275.0, 215.2, 300.0]
for i in test:
print('{} numbers {}'.format(i, test.count(i)))
import pandas as pd
test = [409.1, 479.0, 340.0, 282.4, 406.0, 300.0, 374.0, 253.3, 195.1, 269.0, 329.3, 250.7, 250.7, 345.3, 379.3, 275.0, 215.2, 300.0]
#turning the list into a temporary dataframe
test = pd.DataFrame(test)
#using the very convenient value_counts() function
df_counts = test.value_counts()
df_counts
then you can use df_counts.index and df_counts.values to get the data.
x = ['Jess', 'Jack', 'Mary', 'Sophia', 'Karen',
'Addison', 'Joseph','Jack', 'Jack', 'Eric', 'Ilona', 'Jason']
the_item = input('Enter the item that you wish to find : ')
how_many_times = 0
for occurrence in x:
if occurrence == the_item :
how_many_times += 1
print('The occurrence of', the_item, 'in', x,'is',how_many_times)
Created a list of names wherein the name 'Jack' is repeated.
In order to check its Occurrence, I ran a for loop in the list named x.
Upon each iteration, if the loop variable attains the value same that of received from the user and stored in the variable the_item, the variable how_many_times gets incremented by 1.
After attaining some value...We print how_many_times which stores the value of the occurance of the word 'jack'
def countfrequncyinarray(arr1):
r=len(arr1)
return {i:arr1.count(i) for i in range(1,r+1)}
arr1=[4,4,4,4]
a=countfrequncyinarray(arr1)
print(a)

Getting items from a list of lists if the list contains any keywords?

In my Python code, there are two objects, (x, y).
x is a numpy array from a separate function containing an x, y, and z coordinate. Each x, y and z coordinate corresponds to an object in list 'y'.
and 'y' would be a list of letters between a - j in random order.
There are be multiple instances of each letter i.e.: a b b c d a a f b d e e f e c a so on. For every value of 'x', there is a corresponding 'y' letter. Each line is different.
I want to get the x that corresponds a list of chosen letters, say a, c, and f.
How can I do this? I've tried looking into slices and indices but I'm not sure where to begin.
Trying to grab an item from array x, that corresponds to the same line in list y, if that makes any sense?
You wanted the values corresponding to 'a', 'c', and 'f':
>>> x = [ 1, 2, 3, 4, 5, 6 ]
>>> y = 'cgadfh'
>>> d = dict(zip(y, x))
>>> d['a']
3
>>> [d[char] for char in 'acf']
[3, 1, 5]
'a' is the third character in y and 3 is the third number in x, so d['a'] returns 3.
Incidentally, this approach works the same whether y is a string or a list:
>>> x = [ 1, 2, 3, 4, 5, 6 ]
>>> y = ['c', 'g', 'a', 'd', 'f', 'h']
>>> d = dict(zip(y, x))
>>> [d[char] for char in 'acf']
[3, 1, 5]
You can use collections.defaultdict and enumerate function to achieve this
X = [ 1, 2, 3, 4, 5, 6]
Y = ["a", "f", "c", "a", "c", "f"]
from collections import defaultdict
result = defaultdict(list)
for idx, y in enumerate(Y):
result[y].append(X[idx])
print result
Output
defaultdict(<type 'list'>, {'a': [1, 4], 'c': [3, 5], 'f': [2, 6]})
If X is just an indeces started from 1 then you can do the following:
exp = ['a', 'c', 'f']
output = [idx+1 for idx, ch in enumerate(Y) if ch in exp]
Otherwise you can try zip or izip or izip_longest:
import string
import random
a = range(15)
b = random.sample(string.lowercase, 15)
exp = random.sample(b, 3)
output = [k for k, v in zip(a, b) if v in exp]

Categories