check for duplicates in a python list

check for duplicates in a python list - python

I've seen a lot of variations of this question from things as simple as remove duplicates to finding and listing duplicates. Even trying to take bits and pieces of these examples does not get me my result.
My question is how am I able to check if my list has a duplicate entry? Even better, does my list have a non-zero duplicate?
I've had a few ideas -
#empty list
myList = [None] * 9
#all the elements in this list are None
#fill part of the list with some values
myList[0] = 1
myList[3] = 2
myList[4] = 2
myList[5] = 4
myList[7] = 3
#coming from C, I attempt to use a nested for loop
j = 0
k = 0
for j in range(len(myList)):
for k in range(len(myList)):
if myList[j] == myList[k]:
print "found a duplicate!"
return
If this worked, it would find the duplicate (None) in the list. Is there a way to ignore the None or 0 case? I do not care if two elements are 0.
Another solution I thought of was turn the list into a set and compare the lengths of the set and list to determine if there is a duplicate but when running set(myList) it not only removes duplicates, it orders it as well. I could have separate copies, but it seems redundant.

Try changing the actual comparison line to this:
if myList[j] == myList[k] and not myList[j] in [None, 0]:

I'm not certain if you are trying to ascertain whether or a duplicate exists, or identify the items that are duplicated (if any). Here is a Counter-based solution for the latter:
# Python 2.7
from collections import Counter
#
# Rest of your code
#
counter = Counter(myList)
dupes = [key for (key, value) in counter.iteritems() if value > 1 and key]
print dupes
The Counter object will automatically count occurances for each item in your iterable list. The list comprehension that builds dupes essentially filters out all items appearing only once, and also upon items whose boolean evaluation are False (this would filter out both 0 and None).
If your purpose is only to identify that duplication has taken place (without enumerating which items were duplicated), you could use the same method and test dupes:
if dupes: print "Something in the list is duplicated"

If you simply want to check if it contains duplicates. Once the function finds an element that occurs more than once, it returns as a duplicate.
my_list = [1, 2, 2, 3, 4]
def check_list(arg):
for i in arg:
if arg.count(i) > 1:
return 'Duplicate'
print check_list(my_list) == 'Duplicate' # prints True

To remove dups and keep order ignoring 0 and None, if you have other falsey values that you want to keep you will need to specify is not None and not 0:
print [ele for ind, ele in enumerate(lst[:-1]) if ele not in lst[:ind] or not ele]
If you just want the first dup:
for ind, ele in enumerate(lst[:-1]):
if ele in lst[ind+1:] and ele:
print(ele)
break
Or store seen in a set:
seen = set()
for ele in lst:
if ele in seen:
print(ele)
break
if ele:
seen.add(ele)

You can use collections.defaultdict and specify a condition, such as non-zero / Truthy, and specify a threshold. If the count for a particular value exceeds the threshold, the function will return that value. If no such value exists, the function returns False.
from collections import defaultdict
def check_duplicates(it, condition, thresh):
dd = defaultdict(int)
for value in it:
dd[value] += 1
if condition(value) and dd[value] > thresh:
return value
return False
L = [1, None, None, 2, 2, 4, None, 3, None]
res = check_duplicates(L, condition=bool, thresh=1) # 2
Note in the above example the function bool will not consider 0 or None for threshold breaches. You could also use, for example, lambda x: x != 1 to exclude values equal to 1.

In my opinion, this is the simplest solution I could come up with. this should work with any list. The only downside is that it does not count the number of duplicates, but instead just returns True or False
for k, j in mylist:
return k == j

Here's a bit of code that will show you how to remove None and 0 from the sets.
l1 = [0, 1, 1, 2, 4, 7, None, None]
l2 = set(l1)
l2.remove(None)
l2.remove(0)

Related

Remove duplicates without using list mutation

I am trying to remove adjacent duplicates from a list without using list mutations like del or remove. Below is the code I tried:
def remove_dups(L):
L = [x for x in range(0,len(L)) if L[x] != L[x-1]]
return L
print(remove_dups([1,2,2,3,3,3,4,5,1,1,1]))
This outputs:
[1, 3, 6, 7, 8]
Can anyone explain me how this output occurred? I want to understand the flow but I wasn't able to do it even with debugging in VS code.
Input:
[1,2,2,3,3,3,4,5,1,1,1]
Expected output:
[1,2,3,4,5,1]

I'll replace the variables to make this more readable
def remove_dups(L):
L = [x for x in range(0,len(L)) if L[x] != L[x-1]]
becomes:
def remove_dups(lst):
return [index for index in range(len(lst)) if lst[index] != lst[index-1]]
You can see, instead of looping over the items of the list it is instead looping over the indices of the array comparing the value at one index lst[index] to the value at the previous index lst[index-1] and only migrating/copying the value if they don't match
The two main issues are:
the first index it is compared to is -1 which is the last item of the list (compared to the first)
this is actually returning the indices of the non-duplicated items.
To make this work, I'd use the enumerate function which returns the item and it's index as follows:
def remove_dups(lst):
return [item for index, item in enumerate(lst[:-1]) if item != lst[index+1]] + [lst[-1]]
Here what I'm doing is looping through all of the items except for the last one [:-1] and checking if the item matches the next item, only adding it if it doesn't
Finally, because the last value isn't read we append it to the output + [lst[-1]].

This is a job for itertools.groupby:
from itertools import groupby
def remove_dups(L):
return [k for k,g in groupby(L)]
L2 = remove_dups([1,2,2,3,3,3,4,5,1,1,1])
Output: [1, 2, 3, 4, 5, 1]

Identify a single difference in a python list

I would have to get some help concerning a part of my code.
I have some python list, example:
list1 = (1,1,1,1,1,1,5,1,1,1)
list2 = (6,7,4,4,4,1,6,7,6)
list3 = (8,8,8,8,9)
I would like, for each list, know if there is a single value that is different compare to every other values if and only if all of these other values are the same. For example, in the list1, it would identify "5" as a different value, in list2 it would identify nothing as there are more than 2 different values and in list3 it would identify "9"
What i already did is :
for i in list1:
if list1(i)==len(list1)-1
print("One value identified")
The problem is that i get "One value identified" as much time as "1" is present in my list ...
But what i would like to have is an output like that :
The most represented value equal to len(list1)-1 (Here "1")
The value that is present only once (Here "5")
The position in the list where the "5"

You could use something like that:
def odd_one_out(lst):
s = set(lst)
if len(s)!=2: # see comment (1)
return False
else:
return any(lst.count(x)==1 for x in s) # see comment (2)
which for the examples you provided, yields:
print(odd_one_out(list1)) # True
print(odd_one_out(list2)) # False
print(odd_one_out(list3)) # True
To explain the code I would use the first example list you provided [1,1,1,1,1,1,5,1,1,1].
(1) converting to set removes all the duplicate values from your list thus leaving you with {1, 5} (in no specific order). If the length of this set is anything other than 2 your list does not fulfill your requirements so False is returned
(2) Assuming the set does have a length of 2, what we need to check next is that at least one of the values it contains appear only once in the original list. That is what this any does.

You can use the built-in Counter from High-performance container datatypes :
from collections import Counter
def is_single_diff(iterable):
c = Counter(iterable)
non_single_items = list(filter(lambda x: c[x] > 1, c))
return len(non_single_items) == 1
Tests
list1 = (1,1,1,1,1,1,5,1,1,1)
list2 = (6,7,4,4,4,1,6,7,6)
list3 = (8,8,8,8,9)
In: is_single_diff(list1)
Out: True
In: is_single_diff(list2)
Out: False
In: is_single_diff(list3)
Out: True

Use numpy unique, it will give you all the information you need.
myarray = np.array([1,1,1,1,1,1,5,1,1,1])
vals_unique,vals_counts = np.unique(myarray,return_counts=True)

You can first check for the most common value. After that, go through the list to see if there is a different value, and keep track of it.
If you later find another value that isn't the same as the most common one, the list does not have a single difference.
list1 = [1,1,1,1,1,1,5,1,1,1]
def single_difference(lst):
most_common = max(set(lst), key=lst.count)
diff_idx = None
diff_val = None
for idx, i in enumerate(lst):
if i != most_common:
if diff_val is not None:
return "No unique single difference"
diff_idx = idx
diff_val = i
return (most_common, diff_val, diff_idx)
print(single_difference(list1))

searching a list for duplicate values in python 2.7

list1 = ["green","red","yellow","purple","Green","blue","blue"]
so I got a list, I want to loop through the list and see if the colour has not been mentioned more than once. If it has it doesn't get append to the new list.
so u should be left with
list2 = ["red","yellow","purple"]
So i've tried this
list1 = ["green","red","yellow","purple","Green","blue","blue","yellow"]
list2 =[]
num = 0
for i in list1:
if (list1[num]).lower == (list1[i]).lower:
num +=1
else:
list2.append(i)
num +=1
but i keep getting an error

Use a Counter: https://docs.python.org/2/library/collections.html#collections.Counter.
from collections import Counter
list1 = ["green","red","yellow","purple","green","blue","blue","yellow"]
list1_counter = Counter([x.lower() for x in list1])
list2 = [x for x in list1 if list1_counter[x.lower()] == 1]
Note that your example is wrong because yellow is present twice.

The first problem is that for i in list1: iterates over the elements in the list, not indices, so with i you have an element in your hands.
Next there is num which is an index, but you seem to increment it rather the wrong way.
I would suggest that you use the following code:
for i in range(len(list1)):
unique = True
for j in range(len(list1)):
if i != j and list1[i].lower() == list1[j].lower():
unique = False
break
if unique:
list2.append(list1[i])
How does this work: here i and j are indices, you iterate over the list and with i you iterate over the indices of element you potentially want to add, now you do a test: you check if somewhere in the list you see another element that is equal. If so, you set unique to False and do it for the next element, otherwise you add.
You can also use a for-else construct like #YevhenKuzmovych says:
for i in range(len(list1)):
for j in range(len(list1)):
if i != j and list1[i].lower() == list1[j].lower():
break
else:
list2.append(list1[i])
You can make the code more elegant by using an any:
for i in range(len(list1)):
if not any(i != j and list1[i].lower() == list1[j].lower() for j in range(len(list1))):
list2.append(list1[i])
Now this is more elegant but still not very efficient. For more efficiency, you can use a Counter:
from collections import Counter
ctr = Counter(x.lower() for x in list1)
Once you have constructed the counter, you look up the amount of times you have seen the element and if it is less than 2, you add it to the list:
from collections import Counter
ctr = Counter(x.lower() for x in list1)
for element in list1:
if ctr[element.lower()] < 2:
list2.append(element)
Finally you can now even use list comprehension to make it very elegantly:
from collections import Counter
ctr = Counter(x.lower() for x in list1)
list2 = [element for element in list1 if ctr[element.lower()] < 2]

Here's another solution:
list2 = []
for i, element in enumerate(list1):
if element.lower() not in [e.lower() for e in list1[:i] + list1[i + 1:]]:
list2.append(element)

By combining the built-ins, you can keep only unique items and preserve order of appearance, with just a single empty class definition and a one-liner:
from future_builtins import map # Only on Python 2 to get generator based map
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
pass
list1 = ["green","red","yellow","purple","Green","blue","blue"]
# On Python 2, use .iteritems() to avoid temporary list
list2 = [x for x, cnt in OrderedCounter(map(str.lower, list1)).items() if cnt == 1]
# Result: ['red', 'yellow', 'purple']
You use the Counter feature to count an iterable, while inheriting from OrderedDict preserves key order. All you have to do is filter the results to check and strip the counts, reducing your code complexity significantly.
It also reduces the work to one pass of the original list, with a second pass that does work proportional to the number of unique items in the list, rather than having to perform two passes of the original list (important if duplicates are common and the list is huge).

Here's yet another solution...
Firstly, before I get started, I think there is a typo... You have "yellow" listed twice and you specified you'd have yellow in the end. So, to that, I will provide two scripts, one that allows the duplicates and one that doesn't.
Original:
list1 = ["green","red","yellow","purple","Green","blue","blue","yellow"]
list2 =[]
num = 0
for i in list1:
if (list1[num]).lower == (list1[i]).lower:
num +=1
else:
list2.append(i)
num +=1
Modified (does allow duplicates):
colorlist_1 = ["green","red","yellow","purple","Green","blue","blue","yellow"]
colorlist_2 = []
seek = set()
i = 0
numberOfColors = len(colorlist_1)
while i < numberOfColors:
if numberOfColors[i].lower() not in seek:
seek.add(numberOfColors[i].lower())
colorlist_2.append(numberOfColors[i].lower())
i+=1
print(colorlist_2)
# prints ["green","red","yellow","purple","blue"]
Modified (does not allow duplicates):
EDIT
Willem Van Onsem's answer is totally applicable and already thorough.

How to find second smallest UNIQUE number in a list?

I need to create a function that returns the second smallest unique number, which means if
list1 = [5,4,3,2,2,1], I need to return 3, because 2 is not unique.
I've tried:
def second(list1):
result = sorted(list1)[1]
return result
and
def second(list1):
result = list(set((list1)))
return result
but they all return 2.
EDIT1:
Thanks guys! I got it working using this final code:
def second(list1):
b = [i for i in list1 if list1.count(i) == 1]
b.sort()
result = sorted(b)[1]
return result
EDIT 2:
Okay guys... really confused. My Prof just told me that if list1 = [1,1,2,3,4], it should return 2 because 2 is still the second smallest number, and if list1 = [1,2,2,3,4], it should return 3.
Code in eidt1 wont work if list1 = [1,1,2,3,4].
I think I need to do something like:
if duplicate number in position list1[0], then remove all duplicates and return second number.
Else if duplicate number postion not in list1[0], then just use the code in EDIT1.

Without using anything fancy, why not just get a list of uniques, sort it, and get the second list item?
a = [5,4,3,2,2,1] #second smallest is 3
b = [i for i in a if a.count(i) == 1]
b.sort()
>>>b[1]
3
a = [5,4,4,3,3,2,2,1] #second smallest is 5
b = [i for i in a if a.count(i) == 1]
b.sort()
>>> b[1]
5
Obviously you should test that your list has at least two unique numbers in it. In other words, make sure b has a length of at least 2.

Remove non unique elements - use sort/itertools.groupby or collections.Counter
Use min - O(n) to determine the minimum instead of sort - O(nlongn). (In any case if you are using groupby the data is already sorted) I missed the fact that OP wanted the second minimum, so sorting is still a better option here
Sample Code
Using Counter
>>> sorted(k for k, v in Counter(list1).items() if v == 1)[1]
1
Using Itertools
>>> sorted(k for k, g in groupby(sorted(list1)) if len(list(g)) == 1)[1]
3

Here's a fancier approach that doesn't use count (which means it should have significantly better performance on large datasets).
from collections import defaultdict
def getUnique(data):
dd = defaultdict(lambda: 0)
for value in data:
dd[value] += 1
result = [key for key in dd.keys() if dd[key] == 1]
result.sort()
return result
a = [5,4,3,2,2,1]
b = getUnique(a)
print(b)
# [1, 3, 4, 5]
print(b[1])
# 3

Okay guys! I got the working code thanks to all your help and helping me to think on the right track. This code works:
`def second(list1):
if len(list1)!= len(set(list1)):
result = sorted(list1)[2]
return result
elif len(list1) == len(set(list1)):
result = sorted(list1)[1]
return result`

Okay, here usage of set() on a list is not going to help. It doesn't purge the duplicated elements. What I mean is :
l1=[5,4,3,2,2,1]
print set(l1)
Prints
[0, 1, 2, 3, 4, 5]
Here, you're not removing the duplicated elements, but the list gets unique
In your example you want to remove all duplicated elements.
Try something like this.
l1=[5,4,3,2,2,1]
newlist=[]
for i in l1:
if l1.count(i)==1:
newlist.append(i)
print newlist
This in this example prints
[5, 4, 3, 1]
then you can use heapq to get your second largest number in your list, like this
print heapq.nsmallest(2, newlist)[-1]
Imports : import heapq, The above snippet prints 3 for you.
This should to the trick. Cheers!

How to get item's position in a list?

I am iterating over a list and I want to print out the index of the item if it meets a certain condition. How would I do this?
Example:
testlist = [1,2,3,5,3,1,2,1,6]
for item in testlist:
if item == 1:
print position

Hmmm. There was an answer with a list comprehension here, but it's disappeared.
Here:
[i for i,x in enumerate(testlist) if x == 1]
Example:
>>> testlist
[1, 2, 3, 5, 3, 1, 2, 1, 6]
>>> [i for i,x in enumerate(testlist) if x == 1]
[0, 5, 7]
Update:
Okay, you want a generator expression, we'll have a generator expression. Here's the list comprehension again, in a for loop:
>>> for i in [i for i,x in enumerate(testlist) if x == 1]:
... print i
...
0
5
7
Now we'll construct a generator...
>>> (i for i,x in enumerate(testlist) if x == 1)
<generator object at 0x6b508>
>>> for i in (i for i,x in enumerate(testlist) if x == 1):
... print i
...
0
5
7
and niftily enough, we can assign that to a variable, and use it from there...
>>> gen = (i for i,x in enumerate(testlist) if x == 1)
>>> for i in gen: print i
...
0
5
7
And to think I used to write FORTRAN.

What about the following?
print testlist.index(element)
If you are not sure whether the element to look for is actually in the list, you can add a preliminary check, like
if element in testlist:
print testlist.index(element)
or
print(testlist.index(element) if element in testlist else None)
or the "pythonic way", which I don't like so much because code is less clear, but sometimes is more efficient,
try:
print testlist.index(element)
except ValueError:
pass

Use enumerate:
testlist = [1,2,3,5,3,1,2,1,6]
for position, item in enumerate(testlist):
if item == 1:
print position

for i in xrange(len(testlist)):
if testlist[i] == 1:
print i
xrange instead of range as requested (see comments).

Here is another way to do this:
try:
id = testlist.index('1')
print testlist[id]
except ValueError:
print "Not Found"

Try the below:
testlist = [1,2,3,5,3,1,2,1,6]
position=0
for i in testlist:
if i == 1:
print(position)
position=position+1

[x for x in range(len(testlist)) if testlist[x]==1]

If your list got large enough and you only expected to find the value in a sparse number of indices, consider that this code could execute much faster because you don't have to iterate every value in the list.
lookingFor = 1
i = 0
index = 0
try:
while i < len(testlist):
index = testlist.index(lookingFor,i)
i = index + 1
print index
except ValueError: #testlist.index() cannot find lookingFor
pass
If you expect to find the value a lot you should probably just append "index" to a list and print the list at the end to save time per iteration.

I think that it might be useful to use the curselection() method from thte Tkinter library:
from Tkinter import *
listbox.curselection()
This method works on Tkinter listbox widgets, so you'll need to construct one of them instead of a list.
This will return a position like this:
('0',) (although later versions of Tkinter may return a list of ints instead)
Which is for the first position and the number will change according to the item position.
For more information, see this page:
http://effbot.org/tkinterbook/listbox.htm
Greetings.

Why complicate things?
testlist = [1,2,3,5,3,1,2,1,6]
for position, item in enumerate(testlist):
if item == 1:
print position

Just to illustrate complete example along with the input_list which has searies1 (example: input_list[0]) in which you want to do a lookup of series2 (example: input_list[1]) and get indexes of series2 if it exists in series1.
Note: Your certain condition will go in lambda expression if conditions are simple
input_list = [[1,2,3,4,5,6,7],[1,3,7]]
series1 = input_list[0]
series2 = input_list[1]
idx_list = list(map(lambda item: series1.index(item) if item in series1 else None, series2))
print(idx_list)
output:
[0, 2, 6]

l = list(map(int,input().split(",")))
num = int(input())
for i in range(len(l)):
if l[i] == num:
print(i)
Explanation:
Taken a list of integer "l" (separated by commas) in line 1.
Taken a integer "num" in line 2.
Used for loop in line 3 to traverse inside the list and checking if numbers(of the list) meets the given number(num) then it will print the index of the number inside the list.

testlist = [1,2,3,5,3,1,2,1,6]
num = 1
for item in range(len(testlist)):
if testlist[item] == num:
print(item)

testlist = [1,2,3,5,3,1,2,1,6]
for id, value in enumerate(testlist):
if id == 1:
print testlist[id]
I guess that it's exacly what you want. ;-)
'id' will be always the index of the values on the list.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

check for duplicates in a python list - python

Try changing the actual comparison line to this: if myList[j] == myList[k] and not myList[j] in [None, 0]:

If you simply want to check if it contains duplicates. Once the function finds an element that occurs more than once, it returns as a duplicate. my_list = [1, 2, 2, 3, 4] def check_list(arg): for i in arg: if arg.count(i) > 1: return 'Duplicate' print check_list(my_list) == 'Duplicate' # prints True

In my opinion, this is the simplest solution I could come up with. this should work with any list. The only downside is that it does not count the number of duplicates, but instead just returns True or False for k, j in mylist: return k == j

Here's a bit of code that will show you how to remove None and 0 from the sets. l1 = [0, 1, 1, 2, 4, 7, None, None] l2 = set(l1) l2.remove(None) l2.remove(0)

Related

Remove duplicates without using list mutation

Identify a single difference in a python list

searching a list for duplicate values in python 2.7

How to find second smallest UNIQUE number in a list?

How to get item's position in a list?

Categories

Resources