How to find two identical items in a Python list

I want to check whether a list in a Python program contains the same value twice.
list_one = [a, b, c, d, e, f, g]
Here I want to check whether two different items have the same value, for instance a == d.

Iterate through the list and count how many times each value appears. If list_one.count(i) is greater than 1, the current value has duplicates.
for i in list_one:
    if list_one.count(i) > 1:
        print("Duplicate:", i)
If you want to compare two values directly, you can use their indices. For example, in the case of a == d:
if list_one[0] == list_one[3]:
    print("a and d are equal.")
You can also use a function like the following to find every index at which a given element occurs:
def getIndexPositions(listOfElements, element):  # pass it the list and the element to look up
    ''' Returns the indexes of all occurrences of the given element in
    the list listOfElements '''
    indexPosList = []
    indexPos = 0
    while True:
        try:
            # Search for item in list from indexPos to the end of list
            indexPos = listOfElements.index(element, indexPos)
            # Add the index position in list
            indexPosList.append(indexPos)
            indexPos += 1
        except ValueError:
            # No further occurrence was found
            break
    return indexPosList
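The same positions can also be collected without the try/except loop. A minimal sketch using enumerate(); the name index_positions is made up here for illustration:

```python
def index_positions(values, target):
    """Return every index at which `target` occurs in `values`."""
    return [i for i, value in enumerate(values) if value == target]

print(index_positions(['a', 'b', 'a', 'c', 'a'], 'a'))  # [0, 2, 4]
```

If the returned list has more than one entry, the element is duplicated.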

This will check a list for duplicate values and produce a new list containing any values that were duplicated in the first list:
items = ['a', 'b', 'c', 'b', 'd', 'e', 'f', 'g', 'a', 'c']
items.sort()
dups = [
    item1
    for item1, item2 in zip(items[:-1], items[1:])
    if item1 == item2
]
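Note that the sort-and-compare approach reorders items in place and lists a value once for every extra copy (three equal values give two entries). A sketch of an alternative that reports each duplicated value once, assuming the values are hashable; the helper name find_duplicates is hypothetical:

```python
from collections import Counter

def find_duplicates(values):
    """Return each value that occurs more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]

items = ['a', 'b', 'c', 'b', 'd', 'e', 'f', 'g', 'a', 'c']
print(find_duplicates(items))  # ['a', 'b', 'c']
```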

Related

Find the duplicated data from two different lists

I would like to find the data that appears in both of two different lists. The result should show me which items are duplicated.
Here are my lists:
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['a', 'b', 'c', 'd', 'e']
The function
def list_contains(List1, List2):
    # Iterate in the 1st list
    for m in List1:
        # Iterate in the 2nd list
        for n in List2:
            # if there is a match
            if m == n:
                return m
    return m
list_contains(List1,List2)
The result I get is 'a', but the result I expected is 'a', 'e'.
set(List1).intersection(set(List2))
The issue with your code:
A return statement exits the function, so the function returns as soon as it finds the first match. Since you need all the matches, not just the first, collect them in a list and return that list after the loops rather than returning at the first match.
Fix:
def list_contains(List1, List2):
    result = []
    # Iterate in the 1st list
    for m in List1:
        # Iterate in the 2nd list
        for n in List2:
            # if there is a match
            if m == n:
                # append keeps each match as a single element,
                # even for strings longer than one character
                result.append(m)
    return result
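The set intersection shown earlier does not keep the order of List1, and the nested loops compare every pair of items. A sketch that keeps the order while doing one set lookup per item, assuming the items are hashable; common_items is a hypothetical name:

```python
def common_items(first, second):
    """Return the items of `first` that also occur in `second`, in order."""
    lookup = set(second)  # hash lookups replace the inner loop
    return [item for item in first if item in lookup]

List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['a', 'b', 'c', 'd', 'e']
print(common_items(List1, List2))  # ['a', 'e']
```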

Return True and stop at the first item in the list that matches, return False if no item matches

I wrote a function to check a string based on the following conditions:
Check if any word in the string is in list 1.
If a word is in list 1, get its position to see if the word next to it in
the string is in list 2.
If one word is in list 1 and the next is in list 2, return True.
Else return False.
The tricky part is that those lists are really long. So to be efficient, the first occurrence in list 2 should return True and move on to the next string.
Below is the code that I have come up with, but I doubt it is working as intended. Efficiency is really the key here as well. I tried creating a full combination of list 1 and list 2 and then looping through it, but looping over 100 million times per string seems a bit insane.
1st Code:
def double_loop(words):
    all_words = words.split()
    indices = [n for n,x in enumerate(all_words) if x in list1]
    if len(indices) != 0:
        for pos in indices[:-1]:
            if all_words[pos+1] in list2:
                return True
                # the purpose here is not to continue looking through the whole list of indices, but to end as soon as list2 contains an item 1 position after.
                break
    else:
        return False
I am unsure whether the code above works the way I described. In comparison, the
2nd Code:
def double_loop(words):
    all_words = words.split()
    indices = [n for n,x in enumerate(all_words) if x in list1]
    if len(indices) != 0:
        indices2 = [i for i in indices[:-1] if all_words[i+1] in list2]
        if len(indices2) != 0:
            return True
        else:
            return False
    else:
        return False
has the same run time based on my test data.
I guess the clearer question is: will my first code actually run only until it finds the "first" element that meets its criteria and then stop? Or does it still run through all the elements like the second code?
If I understand your question correctly then you will achieve the fastest lookup time with a precomputed index. This index is the set of all items of list 1, where the following item is in list 2.
# Build index, can be reused
index = set(item1 for i, item1 in enumerate(list1[:-1]) if list1[i+1] in list2)
def double_loop(words):
    return any(word in index for word in words.split())
Index lookups will be done in constant time, no matter how long list1 and list2 are. Index building will take longer when list1 and list2 get longer. Note that making list2 a set might speed up index building when list1 is large.
You could iterate the main list only once by combining your criteria in a single list comprehension:
list1 = ['h', 'e', 'l', 'l', 'o']
list2 = ['a', 'h', 'e', 'p', 'l', 'o']
set_list2 = set(list2)
check = [item for x, item in enumerate(list1) if item in set_list2 and list2[x+1] == item]
If you want the function to short-circuit:
def check(list1, list2):
    set_list2 = set(list2)
    for x, item in enumerate(list1):
        if item in set_list2 and list2[x+1] == item:
            return True
    return False
a = check(list1, list2)
First, your list1 and list2 should not be lists but sets, because a set lookup is a hash lookup while a list lookup is a linear search.
And you don't need a nested loop.
def single_loop(words):
    # set1 and set2 are list1 and list2 converted to sets
    all_words = words.split()
    for w1, w2 in ((all_words[i],all_words[i+1]) for i in range(len(all_words)-1)):
        if w1 in set1 and w2 in set2:
            return True
    else:
        # for/else: reached only when the loop ends without returning
        return False
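The pairwise check can also be written with zip() and any(), which stops at the first matching pair. This is only a sketch: the function name has_adjacent_pair and the sample sets are assumptions, not data from the question.

```python
def has_adjacent_pair(text, first_words, second_words):
    """Return True as soon as a word from first_words is directly
    followed by a word from second_words."""
    words = text.split()
    # zip(words, words[1:]) yields each word together with its successor
    return any(w1 in first_words and w2 in second_words
               for w1, w2 in zip(words, words[1:]))

set1 = {"new", "old"}    # hypothetical stand-in for list 1
set2 = {"york", "town"}  # hypothetical stand-in for list 2
print(has_adjacent_pair("i love new york", set1, set2))  # True
print(has_adjacent_pair("york is new", set1, set2))      # False
```

Because any() consumes a generator, no further pairs are examined once one pair matches.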

How do I test if a list slice matches a set of values in any order?

For example, if I have a list like [a,b,c,d] and need to check whether the elements at indexes 1 and 2 are b and c in either order, the result should be true. In [a,b,c,d], index 1 holds 'b' and index 2 holds 'c', so it returns true. If the elements at those indexes are not a combination of b and c, the result is false. So indexes 1 and 2 must hold one 'b' and one 'c'; they cannot be (b and b) or (c and c). I'm not sure how to compare this. Any hints?
[a,b,c,d] is True because the elements at indexes 1 and 2 are a combination of b and c.
A general solution to this is to compare sets. So you could do:
s = set(vals[1:3])
if s == set(["b", "c"]):
    ...
This will create a set from vals[1] and vals[2], and check if that set contains exactly the elements "b" and "c".
You can use a Counter for this, assuming the types you're interested in are hashable.
from collections import Counter
def check_counts(l, compare_to):
    return Counter(l) == Counter(compare_to)
l = ['a', 'b', 'c', 'd']
print(check_counts(l[1:3], ['c', 'b']))
This is similar to a solution using set, but it also handles cases where you care about how many times each element occurs.
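A small sketch of where the two comparisons differ, using a made-up list in which 'b' appears twice inside the slice:

```python
from collections import Counter

values = ['a', 'b', 'c', 'b']  # example data, not from the question
expected = ['b', 'c']

window = values[1:4]  # ['b', 'c', 'b']
print(set(window) == set(expected))          # True: duplicates collapse in a set
print(Counter(window) == Counter(expected))  # False: 'b' occurs twice in window
```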
Given a list of items and a slice to match:
items = ['a', 'b', 'c', 'd']
match = ['b', 'c']
all you need is:
matches = items[1:3] in (match, match[::-1])
Note: match[::-1] returns a reversed version of match.
Alternative set-based version:
match = set(('b', 'c'))
# Version 1
matches = match.intersection(items[1:3]) == match
# Version 2
matches = not match.difference(items[1:3])

Python removing duplicates in a list

I want to remove duplicates from a Python list in a way that allows me to alter another corresponding list in the same way. In the example below, original is the list I want to de-duplicate. Each element in key corresponds to the element at the same index in original:
original = [a,a,a,3,4,5,b,2,b]
key = [h,f,g,5,e,6,u,z,t]
So I want to remove duplicates in original such that whatever element I delete from original I delete the corresponding element (of the same index) in key. Results I want:
deduplicated_original = [a,3,4,5,b,2]
deduplicated_key = [h,5,e,6,u,z]
I can get deduplicated_original using list(set(original)); however, I cannot get the corresponding deduplicated_key.
You can use a set to keep track of duplicates and enumerate() to iterate over indexes/values of the original list:
seen = set()
lst = []
for i, v in enumerate(original):
    if v not in seen:
        lst.append(key[i])
        seen.add(v)
print(lst)
A maybe less elegant approach, and one that is harder to follow, uses list reversal and index slicing.
The inner list comprehension walks the input list org backwards and asks whether there is an earlier matching element; if so, it records the index of that duplicate:
[len(org) - 1 - i
 for i, e in enumerate(org[::-1]) if e in org[:-i-1]]
Then the outer list comprehension uses .pop() to modify org and ky as a side effect.
The nested list comprehension 'dups', a 'one-liner' (with line breaks):
org = ['a','a','a',3,4,5,'b',2,'b']
ky = ['h','f','g',5,'e',6,'u','z','t']
dups = [(org.pop(di), ky.pop(di))
        for di in [len(org) - 1 - i
                   for i, e in enumerate(org[::-1]) if e in org[:-i-1]]]
org, ky, dups
Out[208]:
(['a', 3, 4, 5, 'b', 2],
 ['h', 5, 'e', 6, 'u', 'z'],
 [('b', 't'), ('a', 'g'), ('a', 'f')])
Of course, you don't actually have to assign the list comprehension result to anything to get the side effect of modifying the lists.
You can manually get all the indices of duplicates like this:
indices = []
existing = set()
for i, item in enumerate(original):
    if item in existing:
        indices.append(i)
    else:
        existing.add(item)
and then remove those indices from your key list, in reverse order, because deleting an item shifts the indices of the items after it:
for i in reversed(indices):
    del key[i]
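A sketch that builds both deduplicated lists in one pass without modifying the inputs, assuming the values in original are hashable and both lists have the same length; dedupe_parallel is a hypothetical name:

```python
def dedupe_parallel(original, key):
    """Drop repeated values from `original` and the matching entries of `key`."""
    seen = set()
    kept_original, kept_key = [], []
    for value, label in zip(original, key):
        if value not in seen:
            seen.add(value)
            kept_original.append(value)
            kept_key.append(label)
    return kept_original, kept_key

original = ['a', 'a', 'a', 3, 4, 5, 'b', 2, 'b']
key = ['h', 'f', 'g', 5, 'e', 6, 'u', 'z', 't']
print(dedupe_parallel(original, key))
# (['a', 3, 4, 5, 'b', 2], ['h', 5, 'e', 6, 'u', 'z'])
```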

Combining nested lists and removing duplicates

Hoping to get some help on a newbie question. I want to sort through a bunch of sublists and if any items inside of the sublists appear twice, remove them, and merge the values in their respective lists into a new sublist.
As an example:
my_currentlist = [[A,B],[C,D],[B,E]]
my_desiredlist = [[A,E],[C,D]]
Any ideas?
My failed attempt (sorry, new to python):
for i in range(len(items)):
    j=i+1
    for j in range(len(items)):
        if items[i][0]==items[j][0]:
            items.remove(items[i])
            items.remove(items[j])
            items.append([items[i][1], items[j][1]])
        elif items[i][0]==items[j][1]:
            items.remove(items[i])
            items.remove(items[j])
            items.append([items[i][1], items[j][0]])
        elif items[i][1]==items[j][0]:
            items.remove(items[i])
            items.remove(items[j])
            items.append([items[i][0], items[j][1]])
        elif items[i][1]==items[j][1]:
            items.remove(items[i])
            items.remove(items[j])
            items.append([items[i][0], items[j][0]])
You need a double loop: the first takes one element (sublist) from the list, and the second checks whether either of its items appears in the remaining sublists.
This should do the trick:
>>> my_currentlist = [["A","B"],["C","D"],["B","E"]]
>>> my_desiredlist = []
>>> while my_currentlist:
        a,b = my_currentlist.pop(0)
        for i,temp in enumerate(my_currentlist):
            if a in temp:
                temp.remove(a)
                a = temp[0]
                my_currentlist.pop(i)
                break
            elif b in temp:
                temp.remove(b)
                b = temp[0]
                my_currentlist.pop(i)
                break
        my_desiredlist.append( [a,b] )

>>> my_desiredlist
[['A', 'E'], ['C', 'D']]
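The same loop can be wrapped in a function that leaves its input untouched. This is a sketch under the same assumptions as the session above: every sublist has exactly two items, and each sublist is merged with at most one partner. The name merge_pairs is hypothetical.

```python
def merge_pairs(pairs):
    """Merge two-item sublists that share an item, dropping the shared item."""
    remaining = [list(pair) for pair in pairs]  # copy so the input is not modified
    merged = []
    while remaining:
        a, b = remaining.pop(0)
        for i, other in enumerate(remaining):
            if a in other:
                other.remove(a)
                a = other[0]   # replace the shared item with the partner's other item
            elif b in other:
                other.remove(b)
                b = other[0]
            else:
                continue       # no shared item, try the next sublist
            remaining.pop(i)   # the partner has been consumed
            break
        merged.append([a, b])
    return merged

print(merge_pairs([["A", "B"], ["C", "D"], ["B", "E"]]))
# [['A', 'E'], ['C', 'D']]
```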
