I want to check if there is no identical entries in a list of list. If there are no identical matches, then return True, otherwise False.
For example:
[[1],[1,2],[1,2,3]] # False
[[1,2,3],[10,20,30]] # True
I am thinking of combine all of the entries into one list,
for example: change [[1,2,3][4,5,6]] into [1,2,3,4,5,6] and then check
Thanks for editing the question and helping me!
>>> def flat_unique(list_of_lists):
... flat = [element for sublist in list_of_lists for element in sublist]
... return len(flat) == len(set(flat))
...
>>> flat_unique([[1],[1,2],[1,2,3]])
False
>>> flat_unique([[1,2,3],[10,20,30]])
True
We can use itertools.chain.from_iterable and set built-in function.
import itertools
def check_iden(data):
return len(list(itertools.chain.from_iterable(data))) == len(set(itertools.chain.from_iterable(data)))
data1 = [[1],[1,2],[1,2,3]]
data2 = [[1,2,3],[10,20,30]]
print check_iden(data1)
print check_iden(data2)
Returns
False
True
You could use sets which have intersection methods to find which elements are common
Place all elements of each sublist into a separate list. If that separate list has any duplicates (call set() to find out), then return False. Otherwise return True.
def identical(x):
newX = []
for i in x:
for j in i:
newX.append(j)
if len(newX) == len(set(newX)): # if newX has any duplicates, the len(set(newX)) will be less than len(newX)
return True
return False
I think you can flat the list and count the element in it, then compare it with set()
import itertools
a = [[1],[1,2],[1,2,3]]
b = [[1,2,3],[10,20,30]]
def check(l):
merged = list(itertools.chain.from_iterable(l))
if len(set(merged)) < len(merged):
return False
else:
return True
print check(a) # False
print check(b) # True
Depending on your data you might not want to look at all the elements, here is a solution that returns False as soon as you hit a first duplicate.
def all_unique(my_lists):
my_set = set()
for sub_list in my_lists:
for ele in sub_list:
if ele in my_set:
return False
my_set.add(ele)
else:
return True
Result:
In [12]: all_unique([[1,2,3],[10,20,30]])
Out[12]: True
In [13]: all_unique([[1],[1,2],[1,2,3]])
Out[13]: False
Using this method will make the boolean variable "same" turn to True if there is a number in your list that occurs more than once as the .count() function returns you how many time a said number was found in the list.
li = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
same = False
for nbr in li:
if li.count(nbr) > 1:
same = True
Related
How would I check if the first digits in each element in a list are the same?
for i in range(0,len(lst)-1):
if lst[i] == lst[i+1]:
return True
I know that this checks for if the number before is equal to the next number in the list, but I just want to focus on the first digit.
You can use math.log10 and floor division to calculate the first digit. Then use all with a generator expression and zip to test adjacent elements sequentially:
from math import log10
def get_first(x):
return x // 10**int(log10(x))
L = [12341, 1765, 1342534, 176845, 1]
res = all(get_first(i) == get_first(j) for i, j in zip(L, L[1:])) # True
For an explanation of how this construct works, see this related answer. You can apply the same logic via a regular for loop:
def check_first(L):
for i, j in zip(L, L[1:]):
if get_first(i) != get_first(j):
return False
return True
res = check_first(L) # True
Use all() as a generator for the first character(s) of your numbers:
>>> l = [1, 10, 123]
>>> all(str(x)[0] == str(l[0])[0] for x in l)
True
The list comprehension
>>> [str(x)[0] for x in l]
creates a list
['1', '1', '1']
which sounds as if this should be enough. But all processes boolean values, and the boolean value of a string is always True, except when the string is empty. That means that it would also consider ['1','2','3'] to be True. You need to add a comparison against a constant value -- I picked the first item from the original list:
>>> [str(x)[0] == str(l[0])[0] for x in l]
[True, True, True]
whereas a list such as [1,20,333] would show
['1', '2', '3']
and
[True, False, False]
You can adjust it for a larger numbers of digits as well:
>>> all(str(x)[:2] == str(l[0])[:2] for x in l)
False
>>> l = [12,123,1234]
>>> all(str(x)[:2] == str(l[0])[:2] for x in l)
True
You could do something like this:
lst = [12, 13, 14]
def all_equals(l):
return len(set(e[0] for e in map(str, l))) == 1
print all_equals(lst)
Output
True
Explanation
The function map(str, l) converts all elements in the list to string then with (e[0] for e in map(str, l)) get the first digit of all the elements using a generator expression. Finally feed the generator into the set function this will remove all duplicates, finally you have to check if the length of the set is 1, meaning that all elements were duplicates.
For a boolean predicate on a list like this, you want a solution that returns False as soon as a conflict is found -- solutions that convert the entire list just to find the first and second item didn't match aren't good algorithms. Here's one approach:
def all_same_first(a):
return not a or all(map(lambda b, c=str(a[0])[0]: str(b)[0] == c, a[1:]))
Although at first glance this might appear to violate what I said above, the map function is lazy and so only hands the all function what it needs as it needs it, so as soon as some element doesn't match the first (initial-digit-wise) the boolean result is returned and the rest of the list isn't processed.
Going back to your original code:
this checks for if the number before is equal to the next
number in the list
for i in range(0,len(lst)-1):
if lst[i] == lst[i+1]:
return True
This doesn't work, as you claim. To work properly, it would need to do:
for i in range(0, len(lst) - 1):
if lst[i] != lst[i + 1]:
return False
return True
Do you see the difference?
I got 2 lists:
alist = ['A','B','C','D']
anotherList = ['A','C','B','D']
would like to write a function which returns True if both lists contain the exact same elements, and are same length. I'm kinda new on this stuff, so I got this, which I'm pretty sure it's terrible, and I'm trying to find a more efficient way. Thanks!
def smyFunction(aList,anotherList):
n = 0
for element in aList:
if element in anotherList:
n = n+1
if n == len(aList):
return True
else:
return False
The two ways that come to mind are:
1) Use collections.Counter
>>> from collections import Counter
>>> Counter(alist) == Counter(anotherList)
True
2) Compare the sorted lists
>>> sorted(alist) == sorted(anotherList)
True
Sort the lists with sorted and then compare them with ==:
>>> alist = ['A','B','C','D']
>>> anotherList = ['A','C','B','D']
>>> def smyFunction(aList,anotherList):
... return sorted(aList) == sorted(anotherList)
...
>>> smyFunction(alist, anotherList)
True
>>>
You need to sort them first in case the elements are out of order:
>>> alist = ['A','B','C','D']
>>> anotherList = ['D','A','C','B']
>>> alist == anotherList
False
>>> sorted(alist) == sorted(anotherList)
True
>>>
Actually, it would probably be better to test the length of the lists first and then use sorted:
return len(alist) == len(anotherList) and sorted(alist) == sorted(anotherList)
That way, we can avoid the sorting operations if the lengths of the list are different to begin with (using len on a list has O(1) (constant) complexity, so it is very cheap).
If there aren't duplicates, use a set, it doesn't have an order:
set(alist) == set(anotherList)
try like this:
def check(a,b):
return sorted(a) == sorted(b)
I would like to search for numbers in existing list. If is one of this numbers repeated then set variable's value to true and break for loop.
list = [3, 5, 3] //numbers in list
So if the function gets two same numbers then break for - in this case there is 3 repeated.
How to do that?
First, don't name your list list. That is a Python built-in, and using it as a variable name can give undesired side effects. Let's call it L instead.
You can solve your problem by comparing the list to a set version of itself.
Edit: You want true when there is a repeat, not the other way around. Code edited.
def testlist(L):
return sorted(set(L)) != sorted(L)
You could look into sets. You loop through your list, and either add the number to a support set, or break out the loop.
>>> l = [3, 5, 3]
>>> s = set()
>>> s
set([])
>>> for x in l:
... if x not in s:
... s.add(x)
... else:
... break
You could also take a step further and make a function out of this code, returning the first duplicated number you find (or None if the list doesn't contain duplicates):
def get_first_duplicate(l):
s = set()
for x in l:
if x not in s:
s.add(x)
else:
return x
get_first_duplicate([3, 5, 3])
# returns 3
Otherwise, if you want to get a boolean answer to the question "does this list contain duplicates?", you can return it instead of the duplicate element:
def has_duplicates(l):
s = set()
for x in l:
if x not in s:
s.add(x)
else:
return true
return false
get_first_duplicate([3, 5, 3])
# returns True
senderle pointed out:
there's an idiom that people sometimes use to compress this logic into a couple of lines. I don't necessarily recommend it, but it's worth knowing:
s = set(); has_dupe = any(x in s or s.add(x) for x in l)
you can use collections.Counter() and any():
>>> lis=[3,5,3]
>>> c=Counter(lis)
>>> any(x>1 for x in c.values()) # True means yes some value is repeated
True
>>> lis=range(10)
>>> c=Counter(lis)
>>> any(x>1 for x in c.values()) # False means all values only appeared once
False
or use sets and match lengths:
In [5]: lis=[3,3,5]
In [6]: not (len(lis)==len(set(lis)))
Out[6]: True
In [7]: lis=range(10)
In [8]: not (len(lis)==len(set(lis)))
Out[8]: False
You should never give the name list to a variable - list is a type in Python, and you can give yourself all kinds of problems masking built-in names like that. Give it a descriptive name, like numbers.
That said ... you could use a set to keep track of which numbers you've already seen:
def first_double(seq):
"""Return the first item in seq that appears twice."""
found = set()
for item in seq:
if item in found:
return item
# return will terminate the function, so no need for 'break'.
else:
found.add(item)
numbers = [3, 5, 3]
number = first_double(numbers)
without additional memory:
any(l.count(x) > 1 for x in l)
Is there a way to find if a list contains duplicates. For example:
list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]
list1.*method* = False # no duplicates
list2.*method* = True # contains duplicates
If you convert the list to a set temporarily, that will eliminate the duplicates in the set. You can then compare the lengths of the list and set.
In code, it would look like this:
list1 = [...]
tmpSet = set(list1)
haveDuplicates = len(list1) != len(tmpSet)
Convert the list to a set to remove duplicates. Compare the lengths of the original list and the set to see if any duplicates existed.
>>> list1 = [1,2,3,4,5]
>>> list2 = [1,1,2,3,4,5]
>>> len(list1) == len(set(list1))
True # no duplicates
>>> len(list2) == len(set(list2))
False # duplicates
Check if the length of the original list is larger than the length of the unique "set" of elements in the list. If so, there must have been duplicates
list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]
if len(list1) != len(set(list1)):
#duplicates
The set() approach only works for hashable objects, so for completness, you could do it with just plain iteration:
import itertools
def has_duplicates(iterable):
"""
>>> has_duplicates([1,2,3])
False
>>> has_duplicates([1, 2, 1])
True
>>> has_duplicates([[1,1], [3,2], [4,3]])
False
>>> has_duplicates([[1,1], [3,2], [4,3], [4,3]])
True
"""
return any(x == y for x, y in itertools.combinations(iterable, 2))
What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?
My current approach using a Counter is:
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for values in counter.itervalues():
if values > 1:
# do something
Can I do better?
Not the most efficient, but straight forward and concise:
if len(x) > len(set(x)):
pass # do something
Probably won't make much of a difference for short lists.
Here is a two-liner that will also do early exit:
>>> def allUnique(x):
... seen = set()
... return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False
If the elements of x aren't hashable, then you'll have to resort to using a list for seen:
>>> def allUnique(x):
... seen = list()
... return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
An early-exit solution could be
def unique_values(g):
s = set()
for x in g:
if x in s: return False
s.add(x)
return True
however for small cases or if early-exiting is not the common case then I would expect len(x) != len(set(x)) being the fastest method.
for speed:
import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
How about adding all the entries to a set and checking its length?
len(set(x)) == len(x)
Alternative to a set, you can use a dict.
len({}.fromkeys(x)) == len(x)
Another approach entirely, using sorted and groupby:
from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))
It requires a sort, but exits on the first repeated value.
Here is a recursive O(N2) version for fun:
def is_unique(lst):
if len(lst) > 1:
return is_unique(s[1:]) and (s[0] not in s[1:])
return True
Here is a recursive early-exit function:
def distinct(L):
if len(L) == 2:
return L[0] != L[1]
H = L[0]
T = L[1:]
if (H in T):
return False
else:
return distinct(T)
It's fast enough for me without using weird(slow) conversions while
having a functional-style approach.
All answer above are good but I prefer to use all_unique example from 30 seconds of python
You need to use set() on the given list to remove duplicates, compare its length with the length of the list.
def all_unique(lst):
return len(lst) == len(set(lst))
It returns True if all the values in a flat list are unique, False otherwise.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x) # True
all_unique(y) # False
I've compared the suggested solutions with perfplot and found that
len(lst) == len(set(lst))
is indeed the fastest solution. If there are early duplicates in the list, there are some constant-time solutions which are to be preferred.
Code to reproduce the plot:
import perfplot
import numpy as np
import pandas as pd
def len_set(lst):
return len(lst) == len(set(lst))
def set_add(lst):
seen = set()
return not any(i in seen or seen.add(i) for i in lst)
def list_append(lst):
seen = list()
return not any(i in seen or seen.append(i) for i in lst)
def numpy_unique(lst):
return np.unique(lst).size == len(lst)
def set_add_early_exit(lst):
s = set()
for item in lst:
if item in s:
return False
s.add(item)
return True
def pandas_is_unique(lst):
return pd.Series(lst).is_unique
def sort_diff(lst):
return not np.any(np.diff(np.sort(lst)) == 0)
b = perfplot.bench(
setup=lambda n: list(np.arange(n)),
title="All items unique",
# setup=lambda n: [0] * n,
# title="All items equal",
kernels=[
len_set,
set_add,
list_append,
numpy_unique,
set_add_early_exit,
pandas_is_unique,
sort_diff,
],
n_range=[2**k for k in range(18)],
xlabel="len(lst)",
)
b.save("out.png")
b.show()
How about this
def is_unique(lst):
if not lst:
return True
else:
return Counter(lst).most_common(1)[0][1]==1
If and only if you have the data processing library pandas in your dependencies, there's an already implemented solution which gives the boolean you want :
import pandas as pd
pd.Series(lst).is_unique
You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:
def f5(seq, idfun=None):
# order preserving
if idfun is None:
def idfun(x): return x
seen = {}
result = []
for item in seq:
marker = idfun(item)
# in old Python versions:
# if seen.has_key(marker)
# but in new ones:
if marker in seen: continue
seen[marker] = 1
result.append(item)
return result
and do len(x) > len(f5(x)). This will be fast and is also order preserving.
Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark
Using a similar approach in a Pandas dataframe to test if the contents of a column contains unique values:
if tempDF['var1'].size == tempDF['var1'].unique().size:
print("Unique")
else:
print("Not unique")
For me, this is instantaneous on an int variable in a dateframe containing over a million rows.
It does not fully fit the question but if you google the task I had you get this question ranked first and it might be of interest to the users as it is an extension of the quesiton. If you want to investigate for each list element if it is unique or not you can do the following:
import timeit
import numpy as np
def get_unique(mylist):
# sort the list and keep the index
sort = sorted((e,i) for i,e in enumerate(mylist))
# check for each element if it is similar to the previous or next one
isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
[[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])]
for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
[[sort[-1][1],sort[-1][0]!=sort[-2][0]]]
# sort indices and booleans and return only the boolean
return [a[1] for a in sorted(isunique)]
def get_unique_using_count(mylist):
return [mylist.count(item)==1 for item in mylist]
mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)
mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)
for short lists the get_unique_using_count as suggested in some answers is fast. But if your list is already longer than 100 elements the count function takes quite long. Thus the approach shown in the get_unique function is much faster although it looks more complicated.
If the list is sorted anyway, you can use:
not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))
Pretty efficient, but not worth sorting for this purpose though.
For begginers:
def AllDifferent(s):
for i in range(len(s)):
for i2 in range(len(s)):
if i != i2:
if s[i] == s[i2]:
return False
return True