Find the duplicated data from two different lists [duplicate] - python

This question already has answers here:
How can I compare two lists in python and return matches [duplicate]
(21 answers)
Closed 1 year ago.
I would like to find the duplicated data from two different lists. The result should show me which data is duplicated.
Here are my lists:
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['a', 'b', 'c', 'd', 'e']
The function:
def list_contains(List1, List2):
    # Iterate in the 1st list
    for m in List1:
        # Iterate in the 2nd list
        for n in List2:
            # if there is a match
            if m == n:
                return m
    return m
list_contains(List1, List2)
The result I get is 'a', but the result I expected is 'a', 'e'.

Use a set intersection:
set(List1).intersection(set(List2))
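A set intersection returns a set, which has no guaranteed order. The following minimal sketch, assuming you want the matches in the order they appear in List1, shows both the set version and an order-preserving list comprehension that still uses a set for fast membership tests (the names common_set, lookup and common_list are illustrative).

```python
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['a', 'b', 'c', 'd', 'e']

# Set intersection: fast, but the result is an unordered set
common_set = set(List1) & set(List2)

# Order-preserving variant: keep List1's order, test membership against a set
lookup = set(List2)
common_list = [item for item in List1 if item in lookup]

print(sorted(common_set))  # ['a', 'e']
print(common_list)         # ['a', 'e']
```

Note that the list comprehension keeps any repeats that occur in List1, while the set version reports each common value once.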
Issue with your code:
A return statement exits the function, so your loop stops and returns at the first match. Since you need all the matches, not just the first one, collect them in a list and return that list after the loops finish instead of returning at the first match.
Fix:
def list_contains(List1, List2):
    result = []
    # Iterate in the 1st list
    for m in List1:
        # Iterate in the 2nd list
        for n in List2:
            # if there is a match, record it instead of returning
            if m == n:
                result.append(m)
    return result

Related

How to find two same items in a python list

I want to be able to check whether a list in a Python program contains the same value twice.
list_one = [a, b, c, d, e, f, g]
Here I want to check whether two different items have the same value (for instance, a == d).
Iterate through your list and count how many times each value appears. If the count is more than 1, you know the current value is duplicated (it will be printed once per occurrence):
for i in list_one:
    if list_one.count(i) > 1:
        print("Duplicate:", i)
If you want to compare two values directly, you can use their indices. For example, in the case of a == d:
if list_one[0] == list_one[3]:
    print("a and d are equal.")
You can also use a function like the following to find all the indexes at which a given element occurs:
def getIndexPositions(listOfElements, element):  # pass it the list and the element to look for
    ''' Returns the indexes of all occurrences of the given element in
    the list listOfElements '''
    indexPosList = []
    indexPos = 0
    while True:
        try:
            # Search for the item in the list from indexPos to the end of the list
            indexPos = listOfElements.index(element, indexPos)
            # Add the index position to the result list
            indexPosList.append(indexPos)
            indexPos += 1
        except ValueError:
            # index() raises ValueError when there are no more occurrences
            break
    return indexPosList
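The same result can be obtained in a single pass with enumerate, which yields (index, value) pairs. This is a minimal sketch; index_positions is a hypothetical name, not a standard library function.

```python
def index_positions(items, element):
    """Return every index at which element occurs in items."""
    # enumerate yields (index, value) pairs in one pass over the list
    return [i for i, value in enumerate(items) if value == element]

letters = ['a', 'b', 'c', 'b', 'd', 'b']
print(index_positions(letters, 'b'))  # [1, 3, 5]
```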
This will check a list for duplicate values and produce a new list containing any values that were duplicated in the first list:
items = ['a', 'b', 'c', 'b', 'd', 'e', 'f', 'g', 'a', 'c']
items.sort()
# After sorting, equal values are adjacent, so compare each item with its successor
dups = [
    item1
    for item1, item2 in zip(items[:-1], items[1:])
    if item1 == item2
]
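The sort-and-zip approach reports a value that appears three times twice. A minimal alternative sketch using collections.Counter counts every value once and keeps those seen more than once, without sorting; in Python 3.7+ the result follows the order of first appearance.

```python
from collections import Counter

items = ['a', 'b', 'c', 'b', 'd', 'e', 'f', 'g', 'a', 'c']

# Count every value once, then keep the ones seen more than once
counts = Counter(items)
dups = [item for item, n in counts.items() if n > 1]
print(dups)  # ['a', 'b', 'c']
```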

How do you count the number of same entities in a list? [duplicate]

This question already has answers here:
How do I count the occurrences of a list item?
(29 answers)
Closed 4 years ago.
For example, given
MyList=[a,a,a,c,c,a,d,d,d,b]
it should return
[4,2,3,1]
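from collections import Counter
MyList = ['a','a','a','c','c','a','d','d','d','b']
list(Counter(MyList).values())  # Counts the frequency of each element
[4, 2, 3, 1]
list(Counter(MyList).keys())  # The corresponding elements
['a', 'c', 'd', 'b']
(In Python 3.7+ the counts follow the order in which elements are first seen; older versions may list them in a different order.)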
Or just do it with a dictionary:
counter_dict = {}
for elem in MyList:
    if elem in counter_dict:
        counter_dict[elem] += 1
    else:
        counter_dict[elem] = 1
At the end you have a dictionary whose keys are the elements of the list and whose values are their numbers of appearances.
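The if/else can be collapsed with dict.get, which returns a default (here 0) when the key is missing. A minimal sketch, assuming Python 3.7+ so that the dictionary keeps first-seen order and the values come out as [4, 2, 3, 1]:

```python
MyList = ['a', 'a', 'a', 'c', 'c', 'a', 'd', 'd', 'd', 'b']

counter_dict = {}
for elem in MyList:
    # get() returns 0 the first time an element is seen
    counter_dict[elem] = counter_dict.get(elem, 0) + 1

print(counter_dict)                 # {'a': 4, 'c': 2, 'd': 3, 'b': 1}
print(list(counter_dict.values()))  # [4, 2, 3, 1]
```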
If you need only a list:
# Your list
l1 = [1,1,2,3,1,2]
# A set gives you each element of l1 without duplicates
l2 = set(l1)
# Let's see what the set looks like
#print(l2)
# Create a new empty list
l3 = []
# Use a for loop to count each element of the set and store the count in l3
for i in l2:
    k = l1.count(i)
    l3.append(k)
# Check what l3 looks like now
print(l3)
Output:
[3, 2, 1]
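Iterating over a set gives no guaranteed order, so the counts are not guaranteed to line up with the order of the original list in general. A minimal sketch, assuming you want the counts in first-seen order: dict.fromkeys removes duplicates while keeping insertion order (Python 3.7+). It still calls count() once per distinct element, so it is quadratic in the worst case.

```python
l1 = [1, 1, 2, 3, 1, 2]

# dict.fromkeys removes duplicates but, unlike set(), keeps first-seen order
unique_in_order = list(dict.fromkeys(l1))
l3 = [l1.count(x) for x in unique_in_order]

print(unique_in_order)  # [1, 2, 3]
print(l3)               # [3, 2, 1]
```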
First, you should quote your list elements so that they are strings:
my_list = ['a','a','a','c','c','a','d','d','d','b']
Without quotes, a, b, c and d are undefined names (unless you defined them elsewhere), so you'll get a NameError.
Then use a Python dictionary, as follows:
result = {}
for each in my_list:
    result.setdefault(each, 0)
    result[each] += 1
print(list(result.values()))
Then this would be the output:
[4,2,3,1]
A first try would be doing this:
occurrencies = {n:a.count(n) for n in set(a)}
This returns a dictionary which has the element as key, and its occurrences as value. I use set to avoid counting elements more than once.
This is not a one-pass approach: each count() call scans the whole list, once per distinct element, so the time complexity is quadratic in the worst case and this could be really slow.
Here is a way to get the same result in one pass, with linear time complexity:
def count_occurrencies(inputArray):
    result = {}
    for element in inputArray:
        if element not in result:
            result[element] = 1
        else:
            result[element] += 1
    return result
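The same one-pass counting can also be written with collections.defaultdict, where missing keys start at int(), that is 0, so no membership test is needed. A minimal sketch; count_with_defaultdict is a hypothetical name, and the list of counts follows first-seen order in Python 3.7+.

```python
from collections import defaultdict

def count_with_defaultdict(input_array):
    # Missing keys are created with int() == 0 on first access
    result = defaultdict(int)
    for element in input_array:
        result[element] += 1
    return dict(result)

counts = count_with_defaultdict(['a', 'a', 'a', 'c', 'c', 'a', 'd', 'd', 'd', 'b'])
print(counts)                # {'a': 4, 'c': 2, 'd': 3, 'b': 1}
print(list(counts.values())) # [4, 2, 3, 1]
```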

Return True and stop if the first item in a list matches, return False if no item matches

I wrote a function to check a string based on the following conditions:
Check if any word in the string is in list 1.
If a word is in list 1, use its position to check whether the word next to it in the string is in list 2.
If in list 1 and in list 2, return True.
Else return False.
The tricky part is that those lists are really long. So, to be efficient, the first match in list 2 should return True and move on to the next string.
Below is the code that I have cooked up, but I doubt it is working as intended. Efficiency is really the key here as well. I tried creating a full combination of list 1 and list 2 and then looping through it, but looping over 100 million times per string seems a bit insane.
1st Code:
def double_loop(words):
    all_words = words.split()
    indices = [n for n, x in enumerate(all_words) if x in list1]
    if len(indices) != 0:
        for pos in indices[:-1]:
            if all_words[pos+1] in list2:
                return True
                # the purpose here is not to continue looking through the whole list of indices,
                # but to end as soon as list2 contains the item one position after.
                break
    else:
        return False
I am unsure whether the code above works according to my logic. In comparison, the
2nd Code:
def double_loop(words):
    all_words = words.split()
    indices = [n for n, x in enumerate(all_words) if x in list1]
    if len(indices) != 0:
        indices2 = [i for i in indices[:-1] if all_words[i+1] in list2]
        if len(indices2) != 0:
            return True
        else:
            return False
    else:
        return False
has the same run time on my test data.
I guess the clearer question is: will my first code actually stop as soon as it finds the "first" element that meets its criteria? Or does it still run through all the elements like the second code?
If I understand your question correctly, then you will achieve the fastest lookup time with a precomputed index. This index is the set of all items of list 1 where the following item is in list 2.
# Build index, can be reused
index = set(item1 for i, item1 in enumerate(list1[:-1]) if list1[i+1] in list2)
def double_loop(words):
    return any(word in index for word in words.split())
Index lookups will be done in constant time, no matter how long list1 and list2 are. Index building will take longer when list1 and list2 get longer. Note that making list2 a set might speed up index building when list1 is large.
You could iterate the main list only once by combining your criteria in a single list comprehension:
list1 = ['h', 'e', 'l', 'l', 'o']
list2 = ['a', 'h', 'e', 'p', 'l', 'o']
set_list2 = set(list2)
check = [item for x, item in enumerate(list1) if item in set_list2 and list2[x+1] == item]
If you want the function to short-circuit:
def check(list1, list2):
    set_list2 = set(list2)
    for x, item in enumerate(list1):
        if item in set_list2 and list2[x+1] == item:
            return True
    return False
a = check(list1, list2)
First, your list1 and list2 should not be lists but sets, because a set lookup is a hash lookup (constant time on average) while a list lookup is a linear search.
And you don't need a nested loop.
set1 = set(list1)  # build these once and reuse them for every string
set2 = set(list2)
def single_loop(words):
    all_words = words.split()
    for w1, w2 in ((all_words[i], all_words[i+1]) for i in range(len(all_words)-1)):
        if w1 in set1 and w2 in set2:
            return True
    return False
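The same single-loop check can be expressed with zip, which pairs each word with its successor, and any(), which stops consuming the generator at the first True, so it short-circuits exactly like the explicit loop. A minimal sketch; has_adjacent_pair is a hypothetical name and the contents of set1 and set2 are made-up sample data, not from the question.

```python
def has_adjacent_pair(words, set1, set2):
    """True if some word in set1 is immediately followed by a word in set2."""
    all_words = words.split()
    # zip pairs each word with its successor; any() stops at the first match
    return any(w1 in set1 and w2 in set2
               for w1, w2 in zip(all_words, all_words[1:]))

set1 = {'big', 'small'}  # hypothetical contents of list 1
set2 = {'dog', 'cat'}    # hypothetical contents of list 2
print(has_adjacent_pair('a big dog barked', set1, set2))   # True
print(has_adjacent_pair('a dog that is big', set1, set2))  # False
```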

How to split a list based on whether the elements were next to each other in the list they came from?

I'm going through Problem 3 of the MIT-led Python course, and I have an admittedly long, drawn-out script that feels like it's getting close. I need to print the longest substring of s in which the letters occur in alphabetical order. I'm able to pull out any characters that are in alphabetical order with regard to the character next to them. What I need to see is:
Input: 'aezcbobobegghakl'
Needed output: 'beggh'
My output: ['a', 'e', 'b', 'b', 'b', 'e', 'g', 'g', 'a', 'k']
My code:
s = 'aezcbobobegghakl'
a = 'abcdefghijklmnopqrstuvwxyz'
len_a = len(a)
len_s = len(s)
number_list = []
letter_list = []
for i in range(len(s)):
    n = 0
    letter = s[i+n]
    if letter in a:
        number_list.append(a.index(letter))
        n += 1
print(number_list)
for i in number_list:
    letter_list.append(a[i])
print(letter_list)
index_list = []
for i in range(len(letter_list)):
    index_list.append(i)
print(index_list)
first_check = []
for i in range(len(letter_list)-1):
    while number_list[i] <= number_list[i+1]:
        print(letter_list[i])
        first_check.append(letter_list[i])
        break
print(first_check)
I know after looking that there are much shorter and completely different ways to solve the problem, but for the sake of my understanding, is it even possible to finish this code to get the output I'm looking for? Or is this just a lost cause rabbit hole I've dug?
I would build a generator to output all the runs of characters such that l[i] >= l[i-1], and then find the longest of those runs. Something like:
def runs(l):
    it = iter(l)
    try:
        run = [next(it)]
    except StopIteration:
        return
    for i in it:
        if i >= run[-1]:
            run.append(i)
        else:
            yield run
            run = [i]
    yield run
def longest_increasing(l):
    return ''.join(max(runs(l), key=len))
Edit: Notes on your code
for i in range(len(s)):
    n = 0
    letter = s[i+n]
    if letter in a:
        number_list.append(a.index(letter))
        n += 1
is getting the "number value" for each letter. You can use the ord function to simplify this:
number_list = [ord(c) - 97 for c in s if c.islower()]
You never use index_list, and you shouldn't need it. Look into the enumerate function.
first_check = []
for i in range(len(letter_list)-1):
    while number_list[i] <= number_list[i+1]:
        print(letter_list[i])
        first_check.append(letter_list[i])
        break
This part doesn't make a ton of sense. You break out of the while loop every time, so it's basically an if. You have no way of keeping track of more than one run, and no mechanism for comparing runs of characters against one another. I think you might be trying to do something like
max_run = []
for i in range(len(letter_list)):
    run = []
    for j in range(i, len(letter_list)):
        run.append(letter_list[j])
        # Stop at the end of the list or where the alphabetical order breaks
        if j + 1 == len(letter_list) or letter_list[j] > letter_list[j+1]:
            break
    if len(run) > len(max_run):
        max_run = run
(Disclaimer: this is meant to be illustrative.) The above can be improved in a lot of ways. Note that it loops over the last character as many as len(s) times, making it an O(n**2) solution. Also, I'm not sure why you need number_list, as strings can be compared directly.
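To avoid the O(n**2) rescanning, a single pass can track only where the current run starts and the best run seen so far. This is a minimal sketch under the assumption that, on ties, the earliest longest run should win; longest_alphabetical_substring is a hypothetical name.

```python
def longest_alphabetical_substring(s):
    """Return the longest substring whose letters are in non-decreasing order."""
    if not s:
        return ''
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(s)):
        # A decrease ends the current run; a new run starts at i
        if s[i] < s[i - 1]:
            start = i
        # Strict '>' keeps the earliest run when lengths tie
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]

print(longest_alphabetical_substring('aezcbobobegghakl'))  # beggh
```

Each character is visited once, so this runs in linear time and uses constant extra space apart from the returned slice.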
What about a simple recursive approach:
data = 'ezcbobobegghakl'
words = list(data)
string_s = list(map(chr, range(97, 123)))
final_ = []
def ok(list_1, list_2):
    # Records each alphabetical run of list_1 in final_
    if not list_1:
        return 0
    else:
        first = list_1[0]
        chunks = list_2[list_2.index(first):]
        track = []
        for j, i in enumerate(list_1):
            if i in chunks:
                track.append(i)
                chunks = list_2[list_2.index(i):]
            else:
                final_.append(track)
                return ok(list_1[j:], list_2)
        final_.append(track)
ok(words, string_s)
print(max(final_, key=lambda x: len(x)))
output:
['b', 'e', 'g', 'g', 'h']
You can find a list of all substrings of the input string, and then find all the substrings that are sorted alphabetically. To determine whether a substring is sorted alphabetically, sort it by position in the alphabet and then check whether the result equals the original substring:
from string import ascii_lowercase as l
s = 'aezcbobobegghakl'
substrings = set(filter(lambda x: x, [s[i:b] for i in range(len(s)) for b in range(len(s) + 1)]))
final_substring = max([i for i in substrings if i == ''.join(sorted(list(i), key=lambda x: l.index(x)))], key=len)
Output:
'beggh'
This is one way of getting the job done:
s = 'aezcbobobegghakl'
l = list(s)
run = []
allrun = []
element = 'a'
for e in l:
    if e >= element:
        run.append(e)
        element = e
    else:
        allrun.append(run)
        run = [e]
        element = e
allrun.append(run)  # don't forget the last run
lengths = [len(e) for e in allrun]
result = ''.join(allrun[lengths.index(max(lengths))])
"run" is basically an uninterrupted run: it keeps growing as you add elements at least as big as the previous one ("b" is bigger than "a", just string comparison), and it resets otherwise.
"allrun" contains all the "run"s, which looks like this:
[['a', 'e', 'z'], ['c'], ['b', 'o'], ['b', 'o'], ['b', 'e', 'g', 'g', 'h'], ['a', 'k', 'l']]
"result" finally picks the longest "run" in "allrun" and merges it into one string.
Regarding your code:
It is very inefficient; I would not proceed with it and would adopt one of the posted solutions instead.
Your number_list can be written as a one-liner: [a.index(_) for _ in s].
Your letter_list is actually just list(s), and you are using a loop for that!
Your index_list, what does it even do? It is equivalent to list(range(len(letter_list))), so what are you aiming for with the append in the loop?
Finally, the way you write loops reminds me of MATLAB. You can iterate directly over the elements of a list; there is no need to iterate over indices and fetch the corresponding element.

Manipulation of list - Extract integers to new list [duplicate]

This question already has answers here:
How to remove items from a list while iterating?
(25 answers)
Closed 5 years ago.
I have a list which consists of strings and integers. I have to pop only the integers and put them in a separate list. My code:
list1=['a','b','c','d','e','f','g','h',1,2,3]
list=[]
x=0
for i in list1:
    if isinstance(i,int):
        list.append(i)
        list1.pop(x)
    x += 1
print(list1)
print(list)
Output of the above code:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 2]
[1, 3]
My question is: Why are all the integers not removed by my code? What is wrong with my code?
You iterate and manipulate the same list at the same time:
for i in list1:  # iteration
    if isinstance(i,int):
        list.append(i)
        list1.pop(x)  # manipulate
    x += 1
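This usually does not work: the for loop keeps an internal index (a cursor) into the list. If you remove an item in the meantime, the items after it shift one position to the left, so the loop skips the element that moved into the removed item's place.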
It is better to simply use a declarative and Pythonic approach, like for instance the following:
list_int = [i for i in list1 if isinstance(i,int)]
list1 = [i for i in list1 if not isinstance(i,int)]
Furthermore, you should not name a variable list, since that shadows the built-in list class.
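If you do want to remove items in a loop, a common workaround is to iterate over a copy (list1[:]) while mutating the original, so the loop's cursor is unaffected. A minimal sketch; the name ints is illustrative and replaces list to avoid shadowing the built-in. Because remove() searches the list each time, this is quadratic, so the two list comprehensions are usually the better choice.

```python
list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 1, 2, 3]
ints = []  # avoids shadowing the built-in name list

# list1[:] is a shallow copy, so removing from list1 does not disturb the loop
for item in list1[:]:
    if isinstance(item, int):
        ints.append(item)
        list1.remove(item)

print(list1)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print(ints)   # [1, 2, 3]
```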
