What are some ways to get the most common value in a list?
l = [1,2,2]
So far I'm doing:
Counter(l).most_common()[0][0]
But I was wondering if there was a list method or something 'simpler' to do this?
That's pretty much as good as it gets, although I'd suggest using .most_common(1), which will be more efficient* than .most_common(). Use it like so:
(value, count), = Counter(sequence).most_common(1)
*Source from collections.Counter:
if n is None:
    return sorted(self.items(), key=_itemgetter(1), reverse=True)
return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
You can use max with list.count, but it's not as efficient as your current solution:
>>> l = [1, 2, 2]
>>> max(set(l), key=l.count)
2
This is almost equivalent to what @JonClement's solution does:
>>> from collections import Counter
>>> l = [1,2,2]
>>> c = Counter(l)
>>> max(c, key=c.get)
2
As heapq.nlargest will run
if n == 1:
    it = iter(iterable)
    head = list(islice(it, 1))
    if not head:
        return []
    if key is None:
        return [max(chain(head, it))]
    return [max(chain(head, it), key=key)]
in this specific case where n=1, this performs the same as the max approach above, apart from wrapping the result in a single-element list of one tuple.
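For what it's worth, a quick REPL-style sanity check (my own snippet, not from either answer) that the two spellings agree:
>>> from collections import Counter
>>> l = [1, 2, 2]
>>> c = Counter(l)
>>> max(c, key=c.get) == c.most_common(1)[0][0] == 2
True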
If I have a list of integers and a function getErrorType(int) that returns some enum type, what's a Pythonic way to get a dictionary where the key is the enum type and the value is the count of how many values in the list returned that error type?
Example:
arr = [1, 2, 3]
getErrorType(1) returns EXCEPTION
getErrorType(2) returns MALFORMED_DATA
getErrorType(3) returns EXCEPTION
I want to be able to get: {EXCEPTION: 2, MALFORMED_DATA: 1}
I would use a dict comprehension:
d = {getErrorType(a): a for a in arr}
EDIT:
OK, to get the counts like the OP said, I would do something like this:
d = {x: [getErrorType(a) for a in arr].count(x) for x in set([getErrorType(a) for a in arr])}
Though this may be too hard to read to be Pythonic.
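The same idea reads better (and avoids re-mapping the list for every key) if the mapped list is hoisted out and computed once; a small sketch:
# Compute the mapped list once, then count each distinct error type.
mapped = [getErrorType(a) for a in arr]
d = {x: mapped.count(x) for x in set(mapped)}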
data = {}
for a in arr:
    error_type = getErrorType(a)
    if error_type in data:
        data[error_type] += 1
    else:
        data[error_type] = 1
I don't think there is an efficient way to keep a count using a dict comprehension. I would probably just use an iterative approach with a normal dictionary or a defaultdict:
from collections import defaultdict

d = defaultdict(int)
for num in arr:
    d[getErrorType(num)] += 1
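Note that collections.Counter packages up exactly this pattern; a one-line sketch, assuming getErrorType and arr as in the question:
from collections import Counter

# Counter accepts any iterable, so the loop collapses into a generator
# expression; the result is a dict subclass mapping error type -> count.
d = Counter(getErrorType(num) for num in arr)
# e.g. Counter({EXCEPTION: 2, MALFORMED_DATA: 1})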
I made a generic function just to simulate passing the whole arr into your function; then you can use .count on the new list of results to form a dictionary:
def getErrorType(a):
    return ['Ex' if i % 2 else 'Mal' for i in a]

arr = [1, 2, 3]
lista = getErrorType(arr)
dicta = {i: lista.count(i) for i in lista}
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 helping.py
{'Ex': 2, 'Mal': 1}
I do not agree with looping through every item with += 1 to create a dictionary; that doesn't seem efficient. I stand by this.
Combining some of the solutions above:
from collections import defaultdict

d = defaultdict(int)
for num in set(arr):
    d[getErrorType(num)] += arr.count(num)
My professor gave me an exercise where I write a function that returns a list with the duplicates of the old list removed.
This is the code, but I don't know how to write the method without using .remove():
def distinct(lst):
    lstnew = []
    c = range(len(lst))
    for i in range(len(lst)):
        if i in range(len(lst)) != c:
            lstnew += [i]
            c += 1
    return lstnew

print distinct([1,3,1,2,6])
print distinct(['a','ab','a','ab'])
I forgot to mention an important thing: I must preserve the order of the output list.
[UPDATE]
After reading Jai Srivastav's answer, I wrote this:
def distinct(lst):
    lstnew = []
    for element in lst:
        if element not in lstnew:
            lstnew = lstnew + [element]
    return lstnew
And it works perfectly.
def distinct(lst):
    dlst = []
    for val in lst:
        if val not in dlst:
            dlst.append(val)
    return dlst
Is this considered cheating?
>>> distinct = lambda lst: list(set(lst))
>>> distinct([1,3,1,2,6])
[1, 2, 3, 6]
>>> distinct(['a','ab','a','ab'])
['a', 'ab']
If order isn't important, you can convert it to a set, then back to a list:
def distinct(lst):
    return list(set(lst))
If you need to eliminate duplicates AND preserve order you can do this:
def distinct(lst):
    seen = set()
    for item in lst:
        if item not in seen:
            yield item
            seen.add(item)
a = [1,3,1,2,6]
print(list(distinct(a)))
[1, 3, 2, 6]
b = ['a','ab','a','ab']
print(list(distinct(b)))
['a', 'ab']
See a demo here: https://ideone.com/a2khCg
There are excellent solutions here that I have already applied, but my professor said we must not use list methods. Has anyone else got any more thoughts?
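One more thought, as a sketch: dict.fromkeys and the list constructor are not list methods, and dicts preserve insertion order as of Python 3.7, so this deduplicates while keeping order:
def distinct(lst):
    # dict.fromkeys keeps the first occurrence of each key, in order
    return list(dict.fromkeys(lst))

print(distinct([1, 3, 1, 2, 6]))         # [1, 3, 2, 6]
print(distinct(['a', 'ab', 'a', 'ab']))  # ['a', 'ab']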
I need to create a function that returns the second smallest unique number, which means if
list1 = [5,4,3,2,2,1], I need to return 3, because 2 is not unique.
I've tried:
def second(list1):
    result = sorted(list1)[1]
    return result
and
def second(list1):
    result = list(set(list1))
    return result
but they both return 2.
EDIT1:
Thanks guys! I got it working using this final code:
def second(list1):
    b = [i for i in list1 if list1.count(i) == 1]
    b.sort()
    result = sorted(b)[1]
    return result
EDIT 2:
Okay guys... really confused. My prof just told me that if list1 = [1,1,2,3,4], it should return 2, because 2 is still the second smallest number, and if list1 = [1,2,2,3,4], it should return 3.
The code in EDIT1 won't work if list1 = [1,1,2,3,4].
I think I need to do something like: if the duplicated number is in position list1[0], then remove all duplicates and return the second number; otherwise, just use the code in EDIT1.
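A minimal sketch of that plan (my own code, assuming the list always holds at least two distinct values):
def second(list1):
    s = sorted(list1)
    if s[0] == s[1]:
        # The smallest value is duplicated: collapse duplicates
        # and take the second distinct value.
        return sorted(set(s))[1]
    # Otherwise fall back to the EDIT1 approach: keep only unique values.
    uniques = sorted(i for i in s if s.count(i) == 1)
    return uniques[1]

print(second([1, 1, 2, 3, 4]))     # 2
print(second([1, 2, 2, 3, 4]))     # 3
print(second([5, 4, 3, 2, 2, 1]))  # 3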
Without using anything fancy, why not just get a list of uniques, sort it, and get the second list item?
a = [5,4,3,2,2,1] #second smallest is 3
b = [i for i in a if a.count(i) == 1]
b.sort()
>>>b[1]
3
a = [5,4,4,3,3,2,2,1] #second smallest is 5
b = [i for i in a if a.count(i) == 1]
b.sort()
>>> b[1]
5
Obviously you should test that your list has at least two unique numbers in it. In other words, make sure b has a length of at least 2.
Remove non-unique elements: use sort/itertools.groupby or collections.Counter.
Use min, which is O(n), to determine the minimum, instead of sort, which is O(n log n). (In any case, if you are using groupby, the data is already sorted.) I missed the fact that the OP wanted the second minimum, so sorting is still the better option here.
Sample Code
Using Counter
>>> sorted(k for k, v in Counter(list1).items() if v == 1)[1]
3
Using Itertools
>>> sorted(k for k, g in groupby(sorted(list1)) if len(list(g)) == 1)[1]
3
Here's a fancier approach that doesn't use count (which means it should have significantly better performance on large datasets).
from collections import defaultdict

def getUnique(data):
    dd = defaultdict(lambda: 0)
    for value in data:
        dd[value] += 1
    result = [key for key in dd.keys() if dd[key] == 1]
    result.sort()
    return result
a = [5,4,3,2,2,1]
b = getUnique(a)
print(b)
# [1, 3, 4, 5]
print(b[1])
# 3
Okay guys! I got working code, thanks to all your help getting me to think along the right track. This code works:
def second(list1):
    if len(list1) != len(set(list1)):
        result = sorted(list1)[2]
        return result
    elif len(list1) == len(set(list1)):
        result = sorted(list1)[1]
        return result
Okay, using set() on the list is not going to help here. It doesn't purge the duplicated elements entirely; it only collapses them. What I mean is:
l1 = [5,4,3,2,2,1]
print set(l1)
prints
set([1, 2, 3, 4, 5])
Here the duplicated elements aren't removed outright; the list is merely made unique. In your example you want to remove every element that has a duplicate.
Try something like this:
l1 = [5,4,3,2,2,1]
newlist = []
for i in l1:
    if l1.count(i) == 1:
        newlist.append(i)
print newlist
In this example, this prints
[5, 4, 3, 1]
Then you can use heapq to get the second smallest number in your list, like this:
print heapq.nsmallest(2, newlist)[-1]
(This requires import heapq.) The above snippet prints 3 for you.
This should do the trick. Cheers!
I know that we can use a set in Python to find whether there are any duplicates in a list. I was just wondering whether we can find a duplicate in a list without using set.
Say my list is
a=['1545','1254','1545']
then how do I find a duplicate?
a=['1545','1254','1545']
from collections import Counter
print [item for item, count in Counter(a).items() if count != 1]
Output
['1545']
This solution runs in O(n), which will be a huge advantage if the list has a lot of elements.
If you just want to find if the list has duplicates, you can simply do
a=['1545','1254','1545']
from collections import Counter
print any(count != 1 for count in Counter(a).values())
As @gnibbler suggested, this would be practically the fastest solution:
from collections import defaultdict

def has_dup(a):
    result = defaultdict(int)
    for item in a:
        result[item] += 1
        if result[item] > 1:
            return True
    else:
        # for/else: reached only when the loop completes without returning
        return False

a = ['1545','1254','1545']
print has_dup(a)
>>> lis = []
>>> a=['1545','1254','1545']
>>> for i in a:
...     if i not in lis:
...         lis.append(i)
...
>>> lis
['1545', '1254']
>>> set(a)
set(['1254', '1545'])
use list.count:
In [309]: a=['1545','1254','1545']
...: a.count('1545')>1
Out[309]: True
Using list.count:
>>> a = ['1545','1254','1545']
>>> any(a.count(x) > 1 for x in a) # To check whether there's any duplicate
True
>>> # To retrieve any single element that is duplicated
>>> next((x for x in a if a.count(x) > 1), None)
'1545'
# To get duplicate elements (used set literal!)
>>> {x for x in a if a.count(x) > 1}
set(['1545'])
Sort the list and check that the next value is not equal to the last one:
a.sort()
last_x = None
for x in a:
    if x == last_x:
        print "duplicate: %s" % x
        break  # existence of duplicates is enough
    last_x = x
This should be O(n log n), which is slower for big n than the Counter solution (but Counter uses a dict under the hood, which is not too dissimilar from a set really).
An alternative is to insert the elements one at a time and keep the list sorted; see the bisect module. It makes your inserts slower but your check for duplicates fast.
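A minimal sketch of that bisect idea (the add_unique helper is my own, not part of the module):
import bisect

def add_unique(sorted_list, x):
    # O(log n) binary search for the insertion point.
    i = bisect.bisect_left(sorted_list, x)
    if i < len(sorted_list) and sorted_list[i] == x:
        return False  # duplicate found
    sorted_list.insert(i, x)  # O(n) insert keeps the list sorted
    return True

a = ['1545', '1254', '1545']
seen = []
print(any(not add_unique(seen, x) for x in a))  # True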
If this is homework, your teacher is probably asking for the hideously inefficient .count() style answer.
In practice using a dict is your next best bet if set is disallowed.
>>> a = ['1545','1254','1545']
>>> D = {}
>>> for i in a:
...     if i in D:
...         print "duplicate", i
...         break
...     D[i] = i
... else:
...     print "no duplicate"
...
duplicate 1545
Here is a version using groupby which is still much better than the .count() method:
>>> from itertools import groupby
>>> a = ['1545','1254','1545']
>>> next(k for k, g in groupby(sorted(a)) if sum(1 for i in g) > 1)
'1545'
Thanks, all, for working on this problem. I also got to learn a lot from the different answers. This is how I answered it:
a = ['1545','1254','1545']
d = []
duplicates = False
for i in a:
    if i not in d:
        d.append(i)
if len(d) < len(a):
    duplicates = True
else:
    duplicates = False
print(duplicates)
What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?
My current approach using a Counter is:
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for values in counter.itervalues():
...     if values > 1:
...         # do something
Can I do better?
Not the most efficient, but straightforward and concise:
if len(x) > len(set(x)):
    pass  # do something
Probably won't make much of a difference for short lists.
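If you want to check the difference on your own data, a tiny timeit sketch (the size and repeat count here are arbitrary):
import timeit

x = list(range(1000))  # all unique
print(timeit.timeit(lambda: len(x) > len(set(x)), number=10_000))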
Here is a two-liner that will also do early exit:
>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False
If the elements of x aren't hashable, then you'll have to resort to using a list for seen:
>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
An early-exit solution could be
def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False
        s.add(x)
    return True
However, for small inputs, or if early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
For speed:
import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
How about adding all the entries to a set and checking its length?
len(set(x)) == len(x)
As an alternative to a set, you can use a dict:
len({}.fromkeys(x)) == len(x)
Another approach entirely, using sorted and groupby:
from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))
It requires a sort, but exits on the first repeated value.
Here is a recursive O(N²) version for fun:
def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True
Here is a recursive early-exit function:
def distinct(L):
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    else:
        return distinct(T)
It's fast enough for me, avoids weird (slow) conversions, and takes a functional-style approach.
All the answers above are good, but I prefer the all_unique example from 30 seconds of Python. It uses set() on the given list to remove duplicates and compares its length with the length of the list:
def all_unique(lst):
    return len(lst) == len(set(lst))
It returns True if all the values in a flat list are unique, False otherwise.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x) # True
all_unique(y) # False
I've compared the suggested solutions with perfplot and found that
len(lst) == len(set(lst))
is indeed the fastest solution. If there are early duplicates in the list, there are some constant-time solutions which are to be preferred.
Code to reproduce the plot:
import perfplot
import numpy as np
import pandas as pd

def len_set(lst):
    return len(lst) == len(set(lst))

def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)

def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)

def numpy_unique(lst):
    return np.unique(lst).size == len(lst)

def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True

def pandas_is_unique(lst):
    return pd.Series(lst).is_unique

def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)

b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)
b.save("out.png")
b.show()
How about this?
from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1] == 1
If you have the data-processing library pandas in your dependencies, there's an already-implemented solution which gives the boolean you want:
import pandas as pd
pd.Series(lst).is_unique
You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
and do len(x) > len(f5(x)). This will be fast and is also order preserving.
Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark
Using a similar approach in a Pandas dataframe to test if the contents of a column contains unique values:
if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")
For me, this is instantaneous on an int variable in a dataframe containing over a million rows.
This does not fully fit the question, but if you google the task I had, this question is ranked first, and it might be of interest to users, as it is an extension of the question. If you want to check for each list element whether it is unique or not, you can do the following:
import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e, i) for i, e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one
    isunique = [[sort[0][1], sort[0][0] != sort[1][0]]] + \
               [[s[1], (s[0] != sort[i-1][0]) and (s[0] != sort[i+1][0])]
                for [i, s] in enumerate(sort) if (i > 0) and (i < len(sort)-1)] + \
               [[sort[-1][1], sort[-1][0] != sort[-2][0]]]
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]

def get_unique_using_count(mylist):
    return [mylist.count(item) == 1 for item in mylist]

mylist = list(np.random.randint(0, 10, 10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0, 1000, 1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)
For short lists, get_unique_using_count, as suggested in some answers, is fast. But if your list is longer than about 100 elements, the count function takes quite long. Thus the approach shown in get_unique is much faster, although it looks more complicated.
If the list is sorted anyway, you can use:
not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))
Pretty efficient, but not worth sorting for this purpose alone.
For beginners:
def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
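A slightly tighter sketch of the same idea, comparing each unordered pair only once by starting the inner loop after the outer index:
def all_different(s):
    # Still O(n^2), but roughly half the comparisons of the version above.
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return False
    return True

print(all_different([1, 2, 3]))  # True
print(all_different([1, 2, 1]))  # False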