Python methods to find duplicates - python

Is there a way to find out whether a list contains duplicates? For example:
list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]
list1.*method* = False # no duplicates
list2.*method* = True # contains duplicates

If you convert the list to a set temporarily, that will eliminate the duplicates in the set. You can then compare the lengths of the list and set.
In code, it would look like this:
list1 = [...]
tmpSet = set(list1)
haveDuplicates = len(list1) != len(tmpSet)

Convert the list to a set to remove duplicates. Compare the lengths of the original list and the set to see if any duplicates existed.
>>> list1 = [1,2,3,4,5]
>>> list2 = [1,1,2,3,4,5]
>>> len(list1) == len(set(list1))
True # no duplicates
>>> len(list2) == len(set(list2))
False # duplicates

Check whether the original list is longer than the set of its unique elements. If so, there must have been duplicates.
list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]
if len(list1) != len(set(list1)):
    print("duplicates")  # not reached for list1; list2 would trigger it
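If you also need to know which values repeat, not just whether any do, a minimal sketch along these lines could work. The helper name find_duplicates is made up for illustration, and it assumes the items are hashable.

```python
from collections import Counter

def find_duplicates(items):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(items)  # hypothetical helper; requires hashable items
    return [value for value, n in counts.items() if n > 1]

print(find_duplicates([1, 2, 3, 4, 5]))     # []
print(find_duplicates([1, 1, 2, 3, 4, 5]))  # [1]
```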

The set() approach only works for hashable objects, so for completeness, you could do it with plain iteration over every pair (which is quadratic in the length of the input):
import itertools
def has_duplicates(iterable):
    """
    >>> has_duplicates([1,2,3])
    False
    >>> has_duplicates([1, 2, 1])
    True
    >>> has_duplicates([[1,1], [3,2], [4,3]])
    False
    >>> has_duplicates([[1,1], [3,2], [4,3], [4,3]])
    True
    """
    return any(x == y for x, y in itertools.combinations(iterable, 2))
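The two ideas can also be combined: try the fast set comparison first and only fall back to pairwise comparison when an element turns out to be unhashable. This is just a sketch, and the name has_duplicates_any is an invented one.

```python
from itertools import combinations

def has_duplicates_any(items):
    """Hypothetical wrapper: fast set check first, pairwise fallback."""
    items = list(items)  # materialise so the input can be scanned twice
    try:
        return len(set(items)) != len(items)
    except TypeError:  # raised when an element is unhashable, e.g. a list
        return any(x == y for x, y in combinations(items, 2))

print(has_duplicates_any([1, 2, 1]))                 # True
print(has_duplicates_any([[1, 1], [3, 2], [4, 3]]))  # False
```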

Related

How to concatenate string values in two lists without creating duplicates?

I have two lists that I am trying to merge into one. A partial match between the values decides which values get merged.
list1 = ["15:09.123", "15:09.234", "15:09.522", "15:09.621", "15:10.123", "15:11.123", "15:12.123", "15:12.987"]
list2 = ["15:09", "15:09", "15:10", "15:14"]
final = []
for each in list2:
    for each1 in list1:
        if each in each1:
            eachtemp = each1.split(".")[1]
            final.append(each+"."+eachtemp)
            list1.remove(each1)
print(final)
This produces the output:
['15:09.123', '15:09.522', '15:09.234', '15:10.123']
But the output I want is:
['15:09.123', '15:09.234', '15:10.123', '15:14.000']
It should not contain "15:09.522" or "15:09.621", even though those elements of list1 also partially match. The final list should contain every element of list2, extended with three more digits taken from list1, and it should be the same length as list2.
I might suggest using an iterator over list1 and consuming elements from it as you iterate over list2 (renaming these ms_list and s_list to make it easier to keep track of which is which):
>>> ms_list = ["15:09.123", "15:09.234", "15:09.522", "15:09.621", "15:10.123", "15:11.123", "15:12.123", "15:12.987"]
>>> s_list = ["15:09", "15:09", "15:10", "15:14"]
>>> final = []
>>> ms_iter = iter(ms_list)
>>> for s in s_list:
...     default = f"{s}.000"
...     ms = next(ms_iter, default)
...     while not ms.startswith(s) and ms < s:
...         ms = next(ms_iter, default)
...     final.append(ms if ms.startswith(s) else default)
...
>>> final
['15:09.123', '15:09.234', '15:10.123', '15:14.000']
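The same loop can be wrapped in a function for reuse. This is only a sketch: the name attach_milliseconds is made up, and it relies on the assumption (true of the sample data) that both lists are sorted in ascending order, since the iterator only ever moves forward.

```python
def attach_milliseconds(s_list, ms_list):
    """Pair each entry of s_list with the next unused match from ms_list."""
    final = []
    ms_iter = iter(ms_list)
    for s in s_list:
        default = f"{s}.000"  # used when no candidate is left
        ms = next(ms_iter, default)
        # Skip candidates that sort before the current prefix.
        while not ms.startswith(s) and ms < s:
            ms = next(ms_iter, default)
        final.append(ms if ms.startswith(s) else default)
    return final

ms_list = ["15:09.123", "15:09.234", "15:09.522", "15:09.621",
           "15:10.123", "15:11.123", "15:12.123", "15:12.987"]
s_list = ["15:09", "15:09", "15:10", "15:14"]
print(attach_milliseconds(s_list, ms_list))
# ['15:09.123', '15:09.234', '15:10.123', '15:14.000']
```

Because the function returns a new list and never calls remove(), the input lists are left untouched.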

Easier way to check if an item from one list of tuples doesn't exist in another list of tuples in python

I have two lists of tuples, say,
list1 = [('item1',),('item2',),('item3',), ('item4',)] # Contains just one item per tuple
list2 = [('item1', 'd',),('item2', 'a',),('item3', 'f',)] # Contains multiple items per tuple
Expected output: 'item4' # Item that doesn't exist in list2
As the example shows, I want to find which items in the tuples of list1 do not appear as the first element of any tuple in list2. What is the easiest way to do this without running two for loops?
Assuming your tuple structure is exactly as shown above, this would work:
tuple(set(x[0] for x in list1) - set(x[0] for x in list2))
or, per @don't talk just code, better as set comprehensions:
tuple({x[0] for x in list1} - {x[0] for x in list2})
result:
('item4',)
This gives you {'item4'}:
next(zip(*list1)) - dict(list2).keys()
The next(zip(*list1)) gives you the tuple ('item1', 'item2', 'item3', 'item4').
The dict(list2).keys() gives you dict_keys(['item1', 'item2', 'item3']), which happily offers you set operations like that set difference.
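Spelled out step by step, the one-liner might look like the sketch below. The variable names are invented for illustration, and dict(list2) only works under the assumption that every tuple in list2 has exactly two items, as in the question.

```python
list1 = [('item1',), ('item2',), ('item3',), ('item4',)]
list2 = [('item1', 'd'), ('item2', 'a'), ('item3', 'f')]

first_items = next(zip(*list1))  # ('item1', 'item2', 'item3', 'item4')
known_keys = dict(list2).keys()  # view over 'item1', 'item2', 'item3'

# The keys view supports set operators, even with a plain tuple on the left.
missing = first_items - known_keys
print(missing)  # {'item4'}
```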
This is the only way I can think of doing it; not sure if it helps, though. I removed the trailing commas from the items in list1 because I don't see why they are there, and they affect the code.
list1 = [('item1'),('item2'),('item3'), ('item4')] # Without the trailing commas these are plain strings
list2 = [('item1', 'd',),('item2', 'a',),('item3', 'f',)] # Contains multiple items per tuple
not_in_tuple = []
OutputTuple = [a for a, b in list2]  # first element of each tuple in list2
for i in list1:
    if i not in OutputTuple:
        not_in_tuple.append(i)
for i in not_in_tuple:
    print(i)
You don't really have a choice but to loop over both lists. One efficient way is to first construct a set of the first elements of list2:
items = {e[0] for e in list2}
list3 = list(filter(lambda x:x[0] not in items, list1))
Output:
>>> list3
[('item4',)]
Try set.difference:
>>> set(next(zip(*list1))).difference(dict(list2))
{'item4'}
>>>
Or even better (note that ^ is the symmetric difference, so it would also report keys found only in list2):
>>> set(list1) ^ {x[:1] for x in list2}
{('item4',)}
>>>
That is a difference operation for sets:
set1 = set(j[0] for j in list1)
set2 = set(j[0] for j in list2)
result = set1.difference(set2)
output:
{'item4'}
for i in list1:
    a = i[0]
    for j in list2:
        b = j[0]
        if a == b:
            break
    else:
        print(a)

check identical for list in list(python)

I want to check that there are no identical entries in a list of lists. If there are no identical matches, return True, otherwise False.
For example:
[[1],[1,2],[1,2,3]] # False
[[1,2,3],[10,20,30]] # True
I am thinking of combining all of the entries into one list,
for example changing [[1,2,3],[4,5,6]] into [1,2,3,4,5,6], and then checking.
Thanks for editing the question and helping me!
>>> def flat_unique(list_of_lists):
...     flat = [element for sublist in list_of_lists for element in sublist]
...     return len(flat) == len(set(flat))
...
>>> flat_unique([[1],[1,2],[1,2,3]])
False
>>> flat_unique([[1,2,3],[10,20,30]])
True
We can use itertools.chain.from_iterable and the built-in set function.
import itertools
def check_iden(data):
    flat = list(itertools.chain.from_iterable(data))  # flatten once
    return len(flat) == len(set(flat))
data1 = [[1],[1,2],[1,2,3]]
data2 = [[1,2,3],[10,20,30]]
print(check_iden(data1))
print(check_iden(data2))
Returns
False
True
You could use sets, which have intersection methods, to find which elements are common.
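One way the intersection idea might look in practice is sketched below. The function name common_elements is invented, the elements are assumed to be hashable, and a repeat inside a single sublist would not be reported because each sublist is turned into a set first.

```python
from itertools import combinations

def common_elements(list_of_lists):
    """Hypothetical helper: elements shared by at least two sublists."""
    shared = set()
    for a, b in combinations(list_of_lists, 2):
        shared |= set(a) & set(b)  # intersection of this pair of sublists
    return shared

print(sorted(common_elements([[1], [1, 2], [1, 2, 3]])))  # [1, 2]
print(common_elements([[1, 2, 3], [10, 20, 30]]))         # set()
```

An empty result corresponds to the True case in the question, so `not common_elements(data)` could serve as the check.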
Place all elements of each sublist into a separate list. If that separate list has any duplicates (call set() to find out), then return False. Otherwise return True.
def identical(x):
    newX = []
    for i in x:
        for j in i:
            newX.append(j)
    if len(newX) == len(set(newX)): # if newX has any duplicates, the len(set(newX)) will be less than len(newX)
        return True
    return False
I think you can flatten the list, count its elements, and compare that count with the size of set():
import itertools
a = [[1],[1,2],[1,2,3]]
b = [[1,2,3],[10,20,30]]
def check(l):
    merged = list(itertools.chain.from_iterable(l))
    return len(set(merged)) == len(merged)  # equal lengths mean no element repeats
print(check(a)) # False
print(check(b)) # True
Depending on your data, you might not want to look at all the elements. Here is a solution that returns False as soon as it hits the first duplicate.
def all_unique(my_lists):
    my_set = set()
    for sub_list in my_lists:
        for ele in sub_list:
            if ele in my_set:
                return False
            my_set.add(ele)
    return True
Result:
In [12]: all_unique([[1,2,3],[10,20,30]])
Out[12]: True
In [13]: all_unique([[1],[1,2],[1,2,3]])
Out[13]: False
With this method, the boolean variable "same" becomes True if any number occurs more than once in the list, because .count() returns how many times a given value was found in the list.
li = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
same = False
for nbr in li:
    if li.count(nbr) > 1:
        same = True

Efficiently test if two lists have the same elements and length

I got 2 lists:
alist = ['A','B','C','D']
anotherList = ['A','C','B','D']
I would like to write a function that returns True if both lists contain exactly the same elements and are the same length. I'm kinda new to this stuff, so I came up with this, which I'm pretty sure is terrible, and I'm trying to find a more efficient way. Thanks!
def smyFunction(aList,anotherList):
    n = 0
    for element in aList:
        if element in anotherList:
            n = n+1
    if n == len(aList):
        return True
    else:
        return False
The two ways that come to mind are:
1) Use collections.Counter
>>> from collections import Counter
>>> Counter(alist) == Counter(anotherList)
True
2) Compare the sorted lists
>>> sorted(alist) == sorted(anotherList)
True
Sort the lists with sorted and then compare them with ==:
>>> alist = ['A','B','C','D']
>>> anotherList = ['A','C','B','D']
>>> def smyFunction(aList,anotherList):
...     return sorted(aList) == sorted(anotherList)
...
>>> smyFunction(alist, anotherList)
True
>>>
You need to sort them first in case the elements are out of order:
>>> alist = ['A','B','C','D']
>>> anotherList = ['D','A','C','B']
>>> alist == anotherList
False
>>> sorted(alist) == sorted(anotherList)
True
>>>
Actually, it would probably be better to test the length of the lists first and then use sorted:
return len(alist) == len(anotherList) and sorted(alist) == sorted(anotherList)
That way, we can avoid the sorting operations if the lengths of the list are different to begin with (using len on a list has O(1) (constant) complexity, so it is very cheap).
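Put into a complete function, the length-first check might look like this sketch. The name same_elements is made up, and sorted() assumes the elements can be compared with one another.

```python
def same_elements(first, second):
    """Hypothetical helper: True if both lists hold the same items, ignoring order."""
    # len() is O(1) on lists, so mismatched lengths are rejected before any sorting.
    return len(first) == len(second) and sorted(first) == sorted(second)

print(same_elements(['A', 'B', 'C', 'D'], ['A', 'C', 'B', 'D']))  # True
print(same_elements(['A', 'B', 'C'], ['A', 'B', 'C', 'D']))       # False
```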
If there aren't any duplicates, use a set, which has no order:
set(alist) == set(anotherList)
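The "no duplicates" condition matters. As a small illustration with made-up sample lists, two lists of the same length can compare equal as sets even though their elements differ in how often they occur:

```python
from collections import Counter

a = ['A', 'A', 'B']
b = ['A', 'B', 'B']

print(set(a) == set(b))          # True: sets discard the repeat counts
print(Counter(a) == Counter(b))  # False: Counter keeps them
print(sorted(a) == sorted(b))    # False
```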
Try it like this:
def check(a,b):
    return sorted(a) == sorted(b)

Get string of certain length out of a list

What I am after is like this:
list1 = ["well", "455", "antifederalist", "mooooooo"]
Something that pulls "455" from the list because of the number of characters.
You can use next() with a generator:
>>> list1 = ["well", "455", "antifederalist", "mooooooo"]
>>>
>>> next(s for s in list1 if len(s) == 3)
'455'
next() also lets you specify a "default" value to be returned if the list doesn't contain any string of length 3. For instance, to return None in such a case:
>>> list1 = ["well", "antifederalist", "mooooooo"]
>>>
>>> print(next((s for s in list1 if len(s) == 3), None))
None
(I used an explicit print because Nones don't print by default in interactive mode.)
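If this lookup is needed in several places, it could be wrapped in a small helper. The sketch below uses an invented name, first_of_length, and simply forwards the default to next().

```python
def first_of_length(strings, length, default=None):
    """Hypothetical helper: first string with exactly `length` characters."""
    return next((s for s in strings if len(s) == length), default)

print(first_of_length(["well", "455", "antifederalist", "mooooooo"], 3))  # 455
print(first_of_length(["well", "antifederalist", "mooooooo"], 3))         # None
```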
If you want all strings of length 3, you can easily turn the approach above into a list comprehension:
>>> [s for s in list1 if len(s) == 3]
['455']
list(filter(lambda s: len(s) == 3, list1))  # list() is needed on Python 3, where filter returns an iterator
And if you're looking to pull out all items of at least some length (num_chars here):
list2 = [string for string in list1 if len(string) >= num_chars]
