What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
return max(set(lst), key=lst.count)
Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]
What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError on Python
<=3.7, and on 3.8 onwards it will return the first one encountered.
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item
This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using scipy stat module and lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
most_freq_val = 5
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()
# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins. Count is negative so
# largest count will have lowest value and sort first.
lst_final = []
# Get an iterator for our new list...
itr = iter(lst)
# ...and pop the first tuple off. Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup
# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1
# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)
lst_final.sort()
answer = lst_final[0][2]
return answer
print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
Simple one line solution
moc= max([(lst.count(chr),chr) for chr in set(lst)])
It will return most frequent element with its frequency.
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1
# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item
print mostPopularItem
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())
4
print(all_ans[max_key])
1
#This will return the list sorted by frequency:
def orderByFrequency(list):
listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
if (len(list) <= 1):
return list
list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
max = 0
maxitem = None
for x in set(l):
count = l.count(x)
if count > max:
max = count
maxitem = x
return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
counter = 0
num = List[0]
for i in List:
curr_frequency = List.count(i)
if(curr_frequency> counter):
counter = curr_frequency
num = i
return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi this is a very simple solution, with linear time complexity
L = ['goose', 'duck', 'duck']
def most_common(L):
current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i
return max_repeated
print(most_common(L))
"duck"
Where number, is the element in the list that repeats most of the time
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count) *# which number most* frequently
max_repeat = numbers.count(max_repeat_num) *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")
def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result
mostCommonElement(["a","b","a","c"]) -> "a"
The most common element should be the one which is appearing more than N/2 times in the array where N being the len(array). The below technique will do it in O(n) time complexity, with just consuming O(1) auxiliary space.
from collections import Counter
def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1
def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)
def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)
Related
What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
return max(set(lst), key=lst.count)
Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]
What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError on Python
<=3.7, and on 3.8 onwards it will return the first one encountered.
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item
This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using scipy stat module and lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
most_freq_val = 5
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()
# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins. Count is negative so
# largest count will have lowest value and sort first.
lst_final = []
# Get an iterator for our new list...
itr = iter(lst)
# ...and pop the first tuple off. Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup
# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1
# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)
lst_final.sort()
answer = lst_final[0][2]
return answer
print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
Simple one line solution
moc= max([(lst.count(chr),chr) for chr in set(lst)])
It will return most frequent element with its frequency.
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1
# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item
print mostPopularItem
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())
4
print(all_ans[max_key])
1
#This will return the list sorted by frequency:
def orderByFrequency(list):
listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
if (len(list) <= 1):
return list
list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
max = 0
maxitem = None
for x in set(l):
count = l.count(x)
if count > max:
max = count
maxitem = x
return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
counter = 0
num = List[0]
for i in List:
curr_frequency = List.count(i)
if(curr_frequency> counter):
counter = curr_frequency
num = i
return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi this is a very simple solution, with linear time complexity
L = ['goose', 'duck', 'duck']
def most_common(L):
current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i
return max_repeated
print(most_common(L))
"duck"
Where number, is the element in the list that repeats most of the time
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count) *# which number most* frequently
max_repeat = numbers.count(max_repeat_num) *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")
def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result
mostCommonElement(["a","b","a","c"]) -> "a"
The most common element should be the one which is appearing more than N/2 times in the array where N being the len(array). The below technique will do it in O(n) time complexity, with just consuming O(1) auxiliary space.
from collections import Counter
def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1
def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)
def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)
I have a sorted list that looks like this:
sortedlist = ['0','0','0','1','1,'1,'2',2','3']
I also have a count variable:
count = '1'
*note: sometimes count can be an integar greater that the max value in the list. For example count = '4'
What I want to do is to find the first occurrence of the count in the list and print the index. If the value is greater than the max value in the list, then assign a string. Here is what I have tried:
maxvalue = max(sortedlist)
for i in sortedlist:
if int(count) < int(sortedlist[int(i)]):
indexval = i
break
OutputFile.write(''+str(indexval)+'\n')
if int(count) > int(maxvalue):
indexval = "over"
OutputFile.write(''+str(indexval)+'\n')
I thought the break would end the for loop, but I'm only getting results from the last if statement. Am I doing something incorrectly?
Your logic is wrong, you have a so called sorted list of strings which unless you compared as integer would not be sorted correctly, you should use integers from the get-go and bisect_left to find index:
from bisect import bisect_left
sortedlist = sorted(map(int, ['0', '0', '0', '1', '1', '1', '2', '2', '3']))
count = 0
def get_val(lst, cn):
if lst[-1] < cn:
return "whatever"
return bisect_left(lst, cn, hi=len(lst) - 1)
If the value falls between two as per your requirement, you will get the first index of the higher value, if you get an exact match you will get that index:
In [13]: lst = [0,0,2,2]
In [14]: get_val(lst, 1)
Out[14]: 2
In [15]: lst = [0,0,1,1,2,2,2,3]
In [16]: get_val(lst, 2)
Out[16]: 4
In [17]: get_val(lst, 9)
Out[17]: 'whatever'
As there are some over-complicated solutions here it's worth posting how straightforwardly this can be done:
def get_index(a, L):
for i, b in enumerate(L):
if b >= a:
return i
return "over"
get_index('1', ['0','0','2','2','3'])
>>> 2
get_index('1', ['0','0','0','1','2','3'])
>>> 3
get_index('4', ['0','0','0','1','2','3'])
>>> 'over'
But, use bisect.
You could use a function (using EAFP principle) to find the first occurrence that is equal to or greater than the count:
In [239]: l = ['0','0','0','1','1','1','2','2','3']
In [240]: def get_index(count, sorted_list):
...: try:
...: return next(x[0] for x in enumerate(l) if int(x[1]) >= int(count))
...: except StopIteration:
...: return "over"
...:
In [241]: get_index('3', l)
Out[241]: 8
In [242]: get_index('7', l)
Out[242]: 'over'
As your list is already sorted, so the maximum value will be the last element of your list i.e maxval = sortedlist[-1] . secondly there is an error in your for loop. for i in sortedlist: This gives you each element in the list . To get index do a for loop on range len(sortedlist) Here i is the element in the list. You should break after writing to the file. Below is the fixed code:
maxvalue = sortedlist[-1]
if int(count) > int(maxvalue):
indexval = "over"
OutputFile.write(''+str(indexval)+'\n')
else:
for i in xrange(len(sortedlist)):
if int(count) <= int(sortedlist[int(i)]):
indexval = i
OutputFile.write(''+str(indexval)+'\n')
break
First of all:
for i in range(1, 100):
if i >= 3:
break
destroyTheInterwebz()
print i
Will never execute that last function. It will onmy peint 1 and 2. Because break immediately leaves the loop; it does not wait for the current iteration to finish.
In my opinion, the code would be nicer if you used a function indexOf and return instead of break.
Last but not least: the data structures here are pretty expensive. You may want to use integers instead of strings, and numpy arrays. You could then use the very fast numpy.searchsorted function.
Using itertools.dropwhile():
from itertools import dropwhile
sortedlist = [0, 0, 0, 1, 1, 1, 2, 2, 3]
def getindex(count):
index = len(sortedlist) - len(list(dropwhile(lambda x: x < count, sortedlist)))
return "some_string" if index >= len(sortedlist) else index
The test:
print(getindex(5))
> some_string
and:
print(getindex(3))
> 8
Explanation
dropwhile() drops the list until the first occurrence, when item < count returns False. By subrtracting the (number of) items after that from the length of the original list, we have the index.
"an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element."
This question already has answers here:
return the top n most frequently occurring chars and their respective counts in python
(3 answers)
Closed 6 years ago.
Can anyone help me with the Python code to return the most frequently occurring characters and their respective counts in a given string?
For example, "aaaaabbbbccc", 2 should return [('a', 5), ('b', 4)]
In case of a tie, the character which comes later in alphabet comes first
in the ordering. For example:
"cdbba",2 -> [('b',2), ('d',1)]
'cdbba',3 -> [('b',2), ('d',1), ('c',1)]
This could be a start for you:
from collections import Counter
import string
s = "aaaaabbbbccc"
counter = Counter(s)
for c in string.ascii_letters:
if counter[c] > 0:
print (c, counter[c])
You can adjust the code, and instead of printing, save the results in a list, sort it, and print top 2 results
First of all - What have you tried so far?
Please, read this before asking!
To solve that, first create a dictionary of frequencies, after that convert it to an array of tuples, and lastly sort.
def freq(st, n):
# Create the dictionary
dct = {}
for c in st:
dct[c] = dct.get(c, 0) + 1
# Convert to array of tuples
ar = [(key, val) for key, val in dct.iteritems()]
# Sort the array:
ar.sort(key=lambda x: x[1])
return ar[:n]
This will not solve your question completely - you still need to figure out how to break the ties. Here is one way: create another dictionary of first occurrences (just a dict with character as the key, and index as value), and before returning check iterate again while identifying the ties, and resolving them.
def freq(word, n):
l1 = list(word)
l3 = []
l2 = []
l4 = []
s = set(l1)
l2 = list(s)
def count(p):
c = 0
for a in word:
if a in word:
c = c + p.count(a)
l3.append(c)
return c
l2.sort(key=count)
l3.sort()
l4 = list(zip(l2, l3))
l4.reverse()
l4.sort(key=lambda l4: ((l4[1]), (l4[0])), reverse=True)
return l4[0:n]
pass
def freq(word, n):
res=[]
a = []
b = []
c=0
if type(word) != str or type(n) != int:
c = 1
raise TypeError
if n <= 0:
c = 1
raise ValueError
for x in word:
a.append(x)
b.append(word.count(x))
c = zip(a, b)
c = list(set(c))
c.sort(key=lambda t: t[0], reverse=True)
c.sort(key=lambda t: t[1], reverse=True)
count=0
for x in range(len(c)):
count+=1
if n>count:
n=count
for x in range(n):
res.append(c[x],)
return res
Counter built-in class provides a function called most_common.
Return a list of the n most common elements and their counts from the
most common to the least. If n is omitted or None, most_common()
returns all elements in the counter. Elements with equal counts are
ordered arbitrarily:
After getting characters and their frequency, we need to sort them first for his frequency and if there is a tie, we select the character with the highest ascii code. Then, we just pick up the N most common elements slicing by any number.
from collections import Counter
s = "cdbba"
sorted_counter = sorted(Counter(s).most_common(),
key = lambda x: (-x[1], -ord(x[0])))
slice_counter = sorted_counter[:2] # here you select n most common
>[('b',2), ('d',1)]
slice_counter = sorted_counter[:3] # here you select n most common
>[('b',2), ('d',1), ('c',1)]
I want to test if a list contains consecutive integers and no repetition of numbers.
For example, if I have
l = [1, 3, 5, 2, 4, 6]
It should return True.
How should I check if the list contains up to n consecutive numbers without modifying the original list?
I thought about copying the list and removing each number that appears in the original list and if the list is empty then it will return True.
Is there a better way to do this?
For the whole list, it should just be as simple as
sorted(l) == list(range(min(l), max(l)+1))
This preserves the original list, but making a copy (and then sorting) may be expensive if your list is particularly long.
Note that in Python 2 you could simply use the below because range returned a list object. In 3.x and higher the function has been changed to return a range object, so an explicit conversion to list is needed before comparing to sorted(l)
sorted(l) == range(min(l), max(l)+1))
To check if n entries are consecutive and non-repeating, it gets a little more complicated:
def check(n, l):
subs = [l[i:i+n] for i in range(len(l)) if len(l[i:i+n]) == n]
return any([(sorted(sub) in range(min(l), max(l)+1)) for sub in subs])
The first code removes duplicates but keeps order:
from itertools import groupby, count
l = [1,2,4,5,2,1,5,6,5,3,5,5]
def remove_duplicates(values):
output = []
seen = set()
for value in values:
if value not in seen:
output.append(value)
seen.add(value)
return output
l = remove_duplicates(l) # output = [1, 2, 4, 5, 6, 3]
The next set is to identify which ones are in order, taken from here:
def as_range(iterable):
l = list(iterable)
if len(l) > 1:
return '{0}-{1}'.format(l[0], l[-1])
else:
return '{0}'.format(l[0])
l = ','.join(as_range(g) for _, g in groupby(l, key=lambda n, c=count(): n-next(c)))
l outputs as: 1-2,4-6,3
You can customize the functions depending on your output.
We can use known mathematics formula for checking consecutiveness,
Assuming min number always start from 1
sum of consecutive n numbers 1...n = n * (n+1) /2
def check_is_consecutive(l):
maximum = max(l)
if sum(l) == maximum * (maximum+1) /2 :
return True
return False
Once you verify that the list has no duplicates, just compute the sum of the integers between min(l) and max(l):
def check(l):
total = 0
minimum = float('+inf')
maximum = float('-inf')
seen = set()
for n in l:
if n in seen:
return False
seen.add(n)
if n < minimum:
minimum = n
if n > maximum:
maximum = n
total += n
if 2 * total != maximum * (maximum + 1) - minimum * (minimum - 1):
return False
return True
import numpy as np
import pandas as pd
(sum(np.diff(sorted(l)) == 1) >= n) & (all(pd.Series(l).value_counts() == 1))
We test both conditions, first by finding the iterative difference of the sorted list np.diff(sorted(l)) we can test if there are n consecutive integers. Lastly, we test if the value_counts() are all 1, indicating no repeats.
I split your query into two parts part A "list contains up to n consecutive numbers" this is the first line if len(l) != len(set(l)):
And part b, splits the list into possible shorter lists and checks if they are consecutive.
def example (l, n):
if len(l) != len(set(l)): # part a
return False
for i in range(0, len(l)-n+1): # part b
if l[i:i+3] == sorted(l[i:i+3]):
return True
return False
l = [1, 3, 5, 2, 4, 6]
print example(l, 3)
def solution(A):
counter = [0]*len(A)
limit = len(A)
for element in A:
if not 1 <= element <= limit:
return False
else:
if counter[element-1] != 0:
return False
else:
counter[element-1] = 1
return True
The input to this function is your list.This function returns False if the numbers are repeated.
The below code works even if the list does not start with 1.
def check_is_consecutive(l):
"""
sorts the list and
checks if the elements in the list are consecutive
This function does not handle any exceptions.
returns true if the list contains consecutive numbers, else False
"""
l = list(filter(None,l))
l = sorted(l)
if len(l) > 1:
maximum = l[-1]
minimum = l[0] - 1
if minimum == 0:
if sum(l) == (maximum * (maximum+1) /2):
return True
else:
return False
else:
if sum(l) == (maximum * (maximum+1) /2) - (minimum * (minimum+1) /2) :
return True
else:
return False
else:
return True
1.
l.sort()
2.
for i in range(0,len(l)-1)))
print(all((l[i+1]-l[i]==1)
list must be sorted!
lst = [9,10,11,12,13,14,15,16]
final = True if len( [ True for x in lst[:-1] for y in lst[1:] if x + 1 == y ] ) == len(lst[1:]) else False
i don't know how efficient this is but it should do the trick.
With sorting
In Python 3, I use this simple solution:
def check(lst):
lst = sorted(lst)
if lst:
return lst == list(range(lst[0], lst[-1] + 1))
else:
return True
Note that, after sorting the list, its minimum and maximum come for free as the first (lst[0]) and the last (lst[-1]) elements.
I'm returning True in case the argument is empty, but this decision is arbitrary. Choose whatever fits best your use case.
In this solution, we first sort the argument and then compare it with another list that we know that is consecutive and has no repetitions.
Without sorting
In one of the answers, the OP commented asking if it would be possible to do the same without sorting the list. This is interesting, and this is my solution:
def check(lst):
if lst:
r = range(min(lst), max(lst) + 1) # *r* is our reference
return (
len(lst) == len(r)
and all(map(lst.__contains__, r))
# alternative: all(x in lst for x in r)
# test if every element of the reference *r* is in *lst*
)
else:
return True
In this solution, we build a reference range r that is a consecutive (and thus non-repeating) sequence of ints. With this, our test is simple: first we check that lst has the correct number of elements (not more, which would indicate repetitions, nor less, which indicates gaps) by comparing it with the reference. Then we check that every element in our reference is also in lst (this is what all(map(lst.__contains__, r)) is doing: it iterates over r and tests if all of its elements are in lts).
l = [1, 3, 5, 2, 4, 6]
from itertools import chain
def check_if_consecutive_and_no_duplicates(my_list=None):
return all(
list(
chain.from_iterable(
[
[a + 1 in sorted(my_list) for a in sorted(my_list)[:-1]],
[sorted(my_list)[-2] + 1 in my_list],
[len(my_list) == len(set(my_list))],
]
)
)
)
Add 1 to any number in the list except for the last number(6) and check if the result is in the list. For the last number (6) which is the greatest one, pick the number before it(5) and add 1 and check if the result(6) is in the list.
Here is a really short easy solution without having to use any imports:
range = range(10)
L = [1,3,5,2,4,6]
L = sorted(L, key = lambda L:L)
range[(L[0]):(len(L)+L[0])] == L
>>True
This works for numerical lists of any length and detects duplicates.
Basically, you are creating a range your list could potentially be in, editing that range to match your list's criteria (length, starting value) and making a snapshot comparison. I came up with this for a card game I am coding where I need to detect straights/runs in a hand and it seems to work pretty well.
In my program I need to put a while function which sums this list until a particular number is found:
[5,8,1,999,7,5]
The output is supposed to be 14, because it sums 5+8+1 and stops when it finds 999.
My idea looks like:
def mentre(llista):
while llista != 999:
solution = sum(llista)
return solution
Use the iter-function:
>>> l = [5,8,1,999,7,5]
>>> sum(iter(iter(l).next, 999))
14
iter calls the first argument, until the second argument is found. So all numbers are summed up, till 999 is found.
Since you mention using a while loop, you could try a generator-based approach using itertools.takewhile:
>>> from itertools import takewhile
>>> l = [5,8,1,999,7,5]
>>> sum(takewhile(lambda a: a != 999, l))
14
The generator consumes from the list l as long as the predicate (a != 999) is true, and these values are summed. The predicate can be anything you like here (like a normal while loop), e.g. you could sum the list while the values are less than 500.
An example of explicitly using a while loop would be as follows:
def sum_until_found(lst, num):
index = 0
res = 0
if num in lst:
while index < lst.index(num):
res += lst[index]
index += 1
else:
return "The number is not in the list!"
return res
Another possible way is:
def sum_until_found(lst, num):
index = 0
res = 0
found = False
if num in lst:
while not found:
res += lst[index]
index += 1
if lst[index] == num:
found = True
else:
return "The number is not in the list!"
return res
There's many ways of doing this without using a while loop, one of which is using recursion:
def sum_until_found_3(lst, num, res=0):
if num in lst:
if lst[0] == num:
return res
else:
return sum_until_found_3(lst[1:], num, res + lst[0])
else:
return "The number is not in the list!"
Finally, an even simpler solution:
def sum_until_found(lst, num):
if num in lst:
return sum(lst[:lst.index(num)])
else:
return "The number is not in the list!"
Use index and slice
def mentre(llista):
solution = sum(lista[:lista.index(999)])
return solution
Demo
>>> lista = [5,8,1,999,7,5]
>>> sum(lista[:lista.index(999)])
14