Matching two lists in python - python

I have two lists with different lengths.
list1=['T','P','M','M','A','R','k','M','G','C']
list2=['T','P','M','M','A','R','k','S','G','C','N']
By comparing list1 and list2: The results must be:
new_list1=['T','P','M','M','A','R','k','mis','M', 'G','C','mis']
new_list2=['T','P','M','M','A','R','k','S', 'mis','G','C','N']
The method is to match the elements of the two lists position by position, duplicates included. Where the elements at the same position do not match, a 'mis' placeholder should be inserted. For example, list1 contains three copies of M while list2 contains only two, so the result must mark M as missing from list2 at that position. Likewise, the character S is missing from list1, so it must also be marked as missing there.
Can anyone help me?

Assuming "mis" is a special value:
from itertools import zip_longest
def create_matchs(alst, blst, mis="mis"):
    for a, b in zip_longest(alst, blst, fillvalue=mis):
        if a == b or mis in (a, b):
            yield a, b
        else:
            yield mis, b
            yield a, mis
list1 = ['T','P','M','M','A','R','k','M','G','C']
list2 = ['T','P','M','M','A','R','k','S','G','C','N']
new_list1, new_list2 = zip(*create_matchs(list1, list2))
print(new_list1)
print(new_list2)

You can also try this. It's simple:
list1 = ['T','P','M','M','A','R','k','M','G','C']
list2 = ['T','P','M','M','A','R','k','S','G','C','N']
# pad the shorter list with 'mis' so both lists have the same length
if len(list1) > len(list2):
    diff = len(list1) - len(list2)
    for i in range(0, diff):
        list2.append('mis')
else:
    diff = len(list2) - len(list1)
    for i in range(0, diff):
        list1.append('mis')
new_list1 = []
new_list2 = []
for i in zip(list1, list2):
    if i[0] == i[1]:
        new_list1.append(i[0])
        new_list2.append(i[1])
    elif i[0] == 'mis' or i[1] == 'mis':
        new_list1.append(i[0])
        new_list2.append(i[1])
    else:
        # mismatch: emit each element opposite a 'mis' placeholder
        new_list1.append(i[0])
        new_list2.append('mis')
        new_list1.append('mis')
        new_list2.append(i[1])
print(new_list1)
print(new_list2)
Output:
['T', 'P', 'M', 'M', 'A', 'R', 'k', 'M', 'mis', 'G', 'C', 'mis']
['T', 'P', 'M', 'M', 'A', 'R', 'k', 'mis', 'S', 'G', 'C', 'N']
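Both answers above compare the lists strictly position by position, so an extra element in the middle of one list would mark everything after it as a mismatch. As a different technique, not the one used in the answers, the standard library's difflib.SequenceMatcher can align the common runs first. The following is only a sketch under that assumption; the helper name align is made up for illustration.

```python
from difflib import SequenceMatcher

def align(a, b, mis="mis"):
    """Pad two lists with `mis` so that their common runs line up."""
    out_a, out_b = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        seg_a, seg_b = a[i1:i2], b[j1:j2]
        if tag == "equal":
            out_a += seg_a
            out_b += seg_b
        else:
            # "replace", "delete" or "insert": elements of b are placed
            # opposite placeholders first, then elements of a.
            out_a += [mis] * len(seg_b) + seg_a
            out_b += seg_b + [mis] * len(seg_a)
    return out_a, out_b

list1 = ['T', 'P', 'M', 'M', 'A', 'R', 'k', 'M', 'G', 'C']
list2 = ['T', 'P', 'M', 'M', 'A', 'R', 'k', 'S', 'G', 'C', 'N']
new_list1, new_list2 = align(list1, list2)
print(new_list1)
print(new_list2)
```

For the lists in the question this yields the layout the asker requested. It also copes with shifted input such as align(list('abcd'), list('xabcd')), where only the leading 'x' is paired with a placeholder.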

Related

I need to return list 1 index and list 2 index, when list 1 matches with list 2

I need to return the index in list 1 and list 2, when list 1 matches with list 2
I have a code like:
def list_contains(List1, List2):
    id1=0
    id2=0
    # Iterate in the 1st list
    for idx, m in enumerate(List1):
        # Iterate in the 2nd list
        for idx2, n in enumerate(List2):
            # if there is a match
            if m != n:
                None
            else:
                id1 = idx
                id2 = idx2
    return (id1,id2)
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['s','a', 'b', 'c', 'd', 'e']
print(list_contains(List1, List2))
What should I change here so that I get all the matching indexes?
Try adding the matching indexes to a list. Create an empty list, add every matching pair of indexes to it, and finally return the list.
def list_contains(List1, List2):
    res = []
    # Iterate in the 1st list
    for idx, m in enumerate(List1):
        # Iterate in the 2nd list
        for idx2, n in enumerate(List2):
            # if there is a match
            if m == n:
                res.append((idx, idx2))
    return res
Or you can use yield:
def list_contains(List1, List2):
    # Iterate in the 1st list
    for idx, m in enumerate(List1):
        # Iterate in the 2nd list
        for idx2, n in enumerate(List2):
            # if there is a match
            if m == n:
                yield (idx, idx2)
Then convert the result to a list:
print(list(list_contains(List1, List2)))
An efficient solution is to use a dictionary. Build a dictionary from List2 in which each list element is a key and its index is the value. This works only if List2 has no duplicates.
def list_contains(List1, List2):
    d = {x: idx for idx, x in enumerate(List2)}
    return [(idx, d[y]) for idx, y in enumerate(List1) if y in d]
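If List2 may contain duplicates, the dictionary approach above can be extended by storing a list of positions per element instead of a single index. A minimal sketch, assuming that every pair of matching positions is wanted; the function name all_matching_indexes is hypothetical.

```python
from collections import defaultdict

def all_matching_indexes(list1, list2):
    # Map each element of list2 to every position where it occurs.
    positions = defaultdict(list)
    for idx2, item in enumerate(list2):
        positions[item].append(idx2)
    # .get() avoids creating empty entries for elements missing from list2.
    return [(idx1, idx2)
            for idx1, item in enumerate(list1)
            for idx2 in positions.get(item, ())]

List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['s', 'a', 'b', 'c', 'd', 'e']
print(all_matching_indexes(List1, List2))
print(all_matching_indexes(['a', 'b'], ['a', 'x', 'a']))
```

The first call prints [(0, 1), (1, 5)], the same result as the nested loops; the second prints [(0, 0), (0, 2)] because 'a' occurs twice in the second list.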
Basically this:
def list_contains(List1, List2):
    id1, id2 = [], []
    for each in List1:
        if each in List2:
            id1.append(List1.index(each))
            id2.append(List2.index(each))
    return (id1, id2)
Oh OK, this then:
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['s','a', 'b', 'c', 'd', 'e']
matches = []
for each in List1:
    if each in List2:
        # store both positions as a tuple; index() only finds the first occurrence
        matches.append((List1.index(each), List2.index(each)))
print(matches)
This will give you the indexes in each list:
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['s', 'a', 'b', 'c', 'd', 'e']
List3 = []
for num1, l1 in enumerate(List1):
    for num2, l2 in enumerate(List2):
        if l1 == l2:
            List3.append((num1, num2))
print(List3)
the output will be:
[(0, 1), (1, 5)]
You are very close to getting all the matches. Rather than collecting them externally, I would suggest doing this:
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['s','a', 'b', 'c', 'd', 'e']
List3 = []
for x in List2:
    for y in range(len(List1)):
        if x == List1[y]:
            List3.append(y)
print(List3)
This simply loops through each value of the second list and compares it to all the elements of the first list. If they match, it stores the index in the first list that matched. The result is printed at the end.
I have assumed, as per your example, that you want the index based on the first list.
If you want to get both indexes, you can modify it like this:
List1 = ['a', 'e', 'i', 'o', 'u']
List2 = ['s','a', 'b', 'c', 'd', 'e']
List3 = []
for x in range(len(List2)):
    for y in range(len(List1)):
        if List2[x] == List1[y]:
            # index in List2 first, then index in List1
            List3.append(str(x) + ',' + str(y))
print(List3)
This produces comma-separated strings such as '1,0'. If you do not want that, append the tuple (x, y) instead of concatenating with ','.
You can try this approach to yield the index whenever the values at the same position in List1 and List2 match:
def list_contains(List1, List2):
    for index, val in enumerate(zip(List1, List2)):
        if val[0] == val[1]:
            yield index
yield gives you a generator object. Iterate over the generator object like any other iterator to get the values:
matching_index = list_contains(List1, List2)
for index_values in matching_index:
    print(index_values)

extract sequences from python list

I have a list in python which looks like this:
['x','x','x','x','P','x','x','N','P','N','x','x','x','N','P','x','x','x','x','x','x','N','x','x','P','N','x','x','x'....]
I need to process the list in some way such that I return individual sequences of P and N. In the above case I need to return:
[['P'],['N','P','N'],['N','P'],['N'],['P','N'].....]
I have looked at itertools but have not found anything that can do this. I have a lot of lists to process in this way so efficiency is also important.
You can do it using itertools.groupby:
from itertools import groupby
data = ['x','x','x','x','P','x','x','N','P','N','x','x','x','N',
        'P','x','x','x','x','x','x','N','x','x','P','N','x','x','x']
out = list(list(g) for k, g in groupby(data, lambda item: item in {'N', 'P'}) if k)
print(out)
# [['P'], ['N', 'P', 'N'], ['N', 'P'], ['N'], ['P', 'N']]
We group according to item in {'N', 'P'}, and keep only the groups for which this is True.
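To make the grouping step concrete, here is a small sketch on a shortened input (the short list and the names runs and kept are made up for illustration). groupby emits one (key, group) pair per run of consecutive items that share the same key, so the True runs are exactly the N/P sequences.

```python
from itertools import groupby

data = ['x', 'P', 'x', 'x', 'N', 'P', 'x']

# Each run of consecutive items with the same key becomes one group.
runs = [(key, list(group))
        for key, group in groupby(data, lambda item: item in {'N', 'P'})]
print(runs)
# [(False, ['x']), (True, ['P']), (False, ['x', 'x']), (True, ['N', 'P']), (False, ['x'])]

# Keeping only the groups whose key is True gives the wanted sequences.
kept = [group for key, group in runs if key]
print(kept)
# [['P'], ['N', 'P']]
```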
Alternatively, with a plain loop:
def get_desired_value(_list):
    main_list = []
    new_list = []
    for val in _list:
        if val in ('N', 'P'):
            new_list.append(val)
        elif new_list:
            # any other value ends the current run
            main_list.append(new_list)
            new_list = []
    if new_list:
        # the input ended in the middle of a run
        main_list.append(new_list)
    return main_list
print(get_desired_value(data))
# [['P'], ['N', 'P', 'N'], ['N', 'P'], ['N'], ['P', 'N']]

How to remove an item from a pre-existing list using a for loop and through the use of append?

Essentially what I'm trying to do is remove values within a list when a condition has been met.
The conditions are based on the position of the value in the list, which is what remove_position stands for in the function definition. The code below is what I have so far, and I'm using Python 3.6.0. The list in this case comes from an external Python file, which imports the function from another Python file, and is:
str_list5 = ['w', 'i', 'n', 'g']
new_list = list_function.remove_value(str_list5, 2)
print(new_list)
new_list = list_function.remove_value(str_list5, -1)
print(new_list)
new_list = list_function.remove_value(str_list5, 10)
print(new_list)
What I'm trying to do is use the remove_position value above in an arithmetic function that will delete the item that corresponds to the function's result.
def remove_value(my_list, remove_position):
    newlist = []
    count = 0
    for item in my_list:
        if remove_position < count:
            newlist.remove(item)
        if remove_position > count:
            newlist.remove(item)
    return newlist
The output that I'm looking for is this:
remove_value Test
['w', 'i', 'g']
['i', 'n', 'g']
['w', 'i', 'n']
You can do something like this:
my_list = ['w','i','n','g']
def remove_value(my_list, position):
    if position >= len(my_list):
        return my_list[:-1]
    elif position < 0:
        return my_list[1:]
    else:
        return my_list[:position] + my_list[position+1:]
# Test
print(remove_value(my_list, 2))
print(remove_value(my_list, -1))
print(remove_value(my_list, 10))
Output:
['w', 'i', 'g']
['i', 'n', 'g']
['w', 'i', 'n']
First of all: list objects are mutable, so when you change one inside a function, the original changes too. If you use the list.remove method in remove_value, your str_list5 loses the element as well.
I advise you to create a new object instead of mutating the old one.
We can write this:
def remove_value(my_list, remove_position):
    result = []
    remove_position = min(max(remove_position, 0), len(my_list) - 1)
    for position, element in enumerate(my_list):
        if position == remove_position:
            continue
        result.append(element)
    return result
or using list comprehension
def remove_value(my_list, remove_position):
    remove_position = min(max(remove_position, 0), len(my_list) - 1)
    return [element
            for position, element in enumerate(my_list)
            if position != remove_position]
or following #ChihebNexus suggestion using list slices
def remove_value(my_list, remove_position):
    remove_position = min(max(remove_position, 0), len(my_list) - 1)
    return my_list[0: remove_position] + my_list[remove_position + 1:]
So which one to choose? The last version looks more elegant to me.
Tests
str_list5 = ['w', 'i', 'n', 'g']
new_list = remove_value(str_list5, 2)
print(new_list)
new_list = remove_value(str_list5, -1)
print(new_list)
new_list = remove_value(str_list5, 10)
print(new_list)
gives us
['w', 'i', 'g']
['i', 'n', 'g']
['w', 'i', 'n']
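The point about mutability made in this answer can be seen in a short sketch. The two helper names below are hypothetical and exist only to contrast the two behaviours: one mutates the list it receives, the other builds a new list with slices.

```python
def remove_in_place(items, position):
    # pop() changes the very list object the caller passed in.
    items.pop(position)
    return items

def remove_copy(items, position):
    # Slicing builds a new list; the caller's list stays untouched.
    return items[:position] + items[position + 1:]

original = ['w', 'i', 'n', 'g']

copied = remove_copy(original, 2)
print(copied)    # ['w', 'i', 'g']
print(original)  # ['w', 'i', 'n', 'g'], unchanged

same_object = remove_in_place(original, 2)
print(same_object is original)  # True
print(original)                 # ['w', 'i', 'g'], the caller's list was modified
```

With the in-place version, the repeated test calls from the question would each operate on an already shortened list, which is why returning a new list is the safer choice here.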
Try this:
def remove_value(my_list, remove_position):
    my_list.pop(remove_position)
    return my_list
str_list5 = ['w','i','n','g']
new_list = remove_value(str_list5, 2)
print(new_list)
A list comprehension is a good choice:
new_list = [x for x in old_list if your_condition]
Specifically, you may want this:
def remove_value(my_list, remove_position):
    return [my_list[i] for i in range(len(my_list)) if i != remove_position]

Complicated array join in Python

I have an array like this:
pairs = [['a', 'b', 'c', 'd'], ['g', 'h', 'j', 'k', 'l', 'm']]
I'd like to join them and create a new array so that on the output I'd have something like this:
['a_b', 'b_c', 'c_d']
['g_h', 'h_j', 'j_k', 'k_l', 'l_m']
I'm struggling with an algorithm and can't come up with something. How can I do that?
[['{}_{}'.format(*x) for x in zip(p, p[1:])] for p in pairs]
the for p in pairs part iterates over each of your input lists.
zip(p, p[1:]) returns pairs of each item with the next item
'{}_{}'.format(*x) gets the string you requested
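The three steps above can be seen one at a time in the following sketch, which uses a single inner list (the variable names adjacent and joined are illustrative only).

```python
p = ['a', 'b', 'c', 'd']

# zip stops at the shorter input, so the last item has no partner.
adjacent = list(zip(p, p[1:]))
print(adjacent)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]

# Format each pair; '{}_{}'.format(*x) unpacks the tuple into the two slots.
joined = ['{}_{}'.format(*x) for x in adjacent]
print(joined)    # ['a_b', 'b_c', 'c_d']
```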
joinedIt = []
for i in range(0, len(pairs)):
    joinedIt.append([])  # one output list per input list
    for j in range(0, len(pairs[i])-1):
        joinedIt[i].append(pairs[i][j] + '_' + pairs[i][j+1])
>>> pairs = [['a', 'b', 'c', 'd'], ['g', 'h', 'j', 'k', 'l', 'm']]
>>> [[s[i] + '_' + s[i+1] for i in range(len(s)-1)] for s in pairs]
[['a_b', 'b_c', 'c_d'], ['g_h', 'h_j', 'j_k', 'k_l', 'l_m']]
You could do that by using for loops with ranges:
def parse(items):
    # handles one inner list; call it once per list, e.g. [parse(p) for p in pairs]
    new_list = []
    for i in range(len(items) - 1):
        new_list.append(items[i] + "_" + items[i+1])
    return new_list
zipped = (zip(x,x[1:]) for x in pairs)
[["{}_{}".format(ele[0], ele[1]) for ele in x ]for x in zipped]

Eliminating duplicated elements in a list

I was trying exercise 10.15 in the book Think Python and wrote the following code:
def turn_str_to_list(string):
    res = []
    for letter in string:
        res.append(letter)
    return res
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t
line=raw_input('>>>')
t=turn_str_to_list(line)
print t
print sort_and_unique(t)
I used a double 'for' structure to eliminate any duplicated elements in a sorted list.
However, when I ran it, I kept getting wrong outputs.
If I input 'committee', the output is ['c', 'e', 'i', 'm', 'o', 't', 't'], which is wrong because it still contains a double 't'.
I tried different inputs: sometimes the program fails to pick up duplicated letters in the middle of the list, and it never picks up the ones at the end.
What was I missing? Thanks guys.
Your program isn't removing all the duplicate letters because the use of del t[j] in the nested for-loops causes it to skip letters.
I added some prints to help illustrate this:
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        print "i: %d" % i
        print t
        for j in range (i+1, len(t)-1, 1):
            print "\t%d %s len(t):%d" % (j, t[j], len(t))
            if t[i]==t[j]:
                print "\tdeleting %c" % t[j]
                del t[j]
    return t
Output:
>>>committee
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
i: 0
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
1 e len(t):9
2 e len(t):9
3 i len(t):9
4 m len(t):9
5 m len(t):9
6 o len(t):9
7 t len(t):9
i: 1
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
2 e len(t):9
deleting e
3 m len(t):8
4 m len(t):8
5 o len(t):8
6 t len(t):8
7 t len(t):8
i: 2
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
3 m len(t):8
4 m len(t):8
5 o len(t):8
6 t len(t):8
i: 3
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
4 m len(t):8
deleting m
5 t len(t):7
6 t len(t):7
i: 4
['c', 'e', 'i', 'm', 'o', 't', 't']
5 t len(t):7
i: 5
['c', 'e', 'i', 'm', 'o', 't', 't']
i: 6
['c', 'e', 'i', 'm', 'o', 't', 't']
['c', 'e', 'i', 'm', 'o', 't', 't']
Whenever del t[j] is called, the list becomes one element shorter, but the inner for-loop keeps advancing j.
For example:
i=1, j=2, t = ['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
It sees that t[1] == t[2] (both 'e') so it removes t[2].
Now t = ['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
However, the code continues with i=1, j=3, which compares 'e' to 'm' and skips over 'i'.
Lastly, it is not catching the last two 't's because by the time i=5, len(t) is 7, so the inner for-loop's range is range(6,6,1), which is empty, and the loop body is never executed.
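One way to avoid both problems described above is to advance the index only when nothing was deleted and to re-check the length on every step. The following is a minimal sketch in Python 3 syntax, not the asker's or the book's solution; the name sort_and_unique_in_place is made up for illustration.

```python
def sort_and_unique_in_place(t):
    t.sort()
    i = 0
    # len(t) is re-evaluated on every pass, so the shrinking list is respected.
    while i < len(t) - 1:
        if t[i] == t[i + 1]:
            # Stay at i: the next neighbour has just slid into position i + 1.
            del t[i + 1]
        else:
            i += 1
    return t

print(sort_and_unique_in_place(list('committee')))
# ['c', 'e', 'i', 'm', 'o', 't']
```

Because the list is sorted, equal letters are always neighbours, so comparing each element with the one right after it is enough.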
In Python you can make use of the built-in data structures and functions such as set() and list().
Your turn_str_to_list() can be replaced with list(). Maybe you know this but wanted to write it on your own.
Using list() and set() (note that a set is unordered, so wrap the result in sorted() if you need sorted output):
line=raw_input('>>>')
print list(set(line))
Your sort_and_unique() has O(n^2) complexity. One way to make it cleaner:
def sort_and_unique2(t):
    t.sort()
    res = []
    for i in t:
        if i not in res:
            res.append(i)
    return res
This is still O(n^2), since the lookup (i not in res) takes linear time, but the code looks a bit cleaner. Deleting from a list is O(n), so it is better to append to a new list, since append is O(1). See the complexities of the list operations here: https://wiki.python.org/moin/TimeComplexity
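Because the list is sorted first, duplicates are always adjacent, so the linear membership test can be replaced by a comparison with the last element kept. A minimal sketch of that idea (the name sort_and_unique_linear is made up for illustration; Python 3 syntax):

```python
def sort_and_unique_linear(t):
    res = []
    for item in sorted(t):
        # In sorted order a duplicate can only equal the last element kept.
        if not res or item != res[-1]:
            res.append(item)
    return res

print(sort_and_unique_linear(list('committee')))
# ['c', 'e', 'i', 'm', 'o', 't']
```

After the O(n log n) sort, the scan itself is a single pass with O(1) work per element, and the input list is left unmodified because sorted() returns a new list.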
You can try the following code snippet:
s = "committee"
res = sorted(set(s))
Solution explained:
>>> word = "committee"
Turn string to list of characters:
>>> clst = list(word)
>>> clst
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
Use set to get only unique items:
>>> unq_clst = set(clst)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}
It turns out (thanks Blckknght) that the list step is not necessary, and we can do it this way:
>>> unq_clst = set(word)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}
Both set and list take an iterable as their parameter, and iterating over a string yields its characters one by one.
Sort it:
>>> sorted(unq_clst)
['c', 'e', 'i', 'm', 'o', 't']
One line version:
>>> sorted(set("COMMITTEE"))
['C', 'E', 'I', 'M', 'O', 'T']
Here you go:
In [1]: word = 'committee'
In [3]: word_ = set(word)
In [4]: word_
Out[4]: {'c', 'e', 'i', 'm', 'o', 't'}
The standard way to get the unique elements in Python is to use a set. The set constructor accepts any iterable. A string is an iterable of characters, so it qualifies.
If you have further problems, do leave a comment.
So you want an explanation of what is wrong in your code. Here you are:
Before we dive into coding, make test case(s)
It makes our coding faster if we have test cases at hand from the very beginning.
For testing I will make a small utility function:
def textinout(text):
    return "".join(sort_and_unique(list(text)))
This allows a quick test like:
>>> textinout("committee")
"ceimot"
and another helper function for readable error traces:
def checkit(textin, expected):
    msg = "For input '{textin}' we expect '{expected}', got '{result}'"
    result = textinout(textin)
    assert result == expected, msg.format(textin=textin, expected=expected, result=result)
And make the test case function:
def testit():
    checkit("abcd", 'abcd')
    checkit("aabbccdd", 'abcd')
    checkit("a", 'a')
    checkit("ddccbbaa", 'abcd')
    checkit("ddcbaa", 'abcd')
    checkit("committee", 'ceimot')
Let us make the first test with the existing function:
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t
Now we can test it:
testit()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-11-291a15d81032> in <module>()
----> 1 testit()
<ipython-input-4-d8ad9abb3338> in testit()
1 def testit():
2 checkit("abcd", 'abcd')
----> 3 checkit("aabbccdd", 'abcd')
4 checkit("a", 'a')
5 checkit("ddccbbaa", 'abcd')
<ipython-input-10-620ac3b14f51> in checkit(textin, expected)
2 msg = "For input '{textin}' we expect '{expected}', got '{result}'"
3 result = textinout(textin)
----> 4 assert result == expected, msg.format(textin=textin, expected=expected, result=result)
AssertionError: For input 'aabbccdd' we expect 'abcd', got 'abcdd'
Reading the last line of the error trace, we know what is wrong.
General comments to your code
Accessing list members via index
In most cases this is not efficient and it makes the code hard to read.
Instead of:
lst = ["a", "b", "c"]
for i in range(len(lst)):
    itm = lst[i]
    # do something with the itm
You should use:
lst = ["a", "b", "c"]
for itm in lst:
    # do something with the itm
    print itm
If you need to access a subset of a list, use slicing.
Instead of:
for i in range (0, len(lst)-2, 1):
    itm = lst[i]
Use:
for itm in lst[:-2]:
    # do something with the itm
    print itm
If you really need to know the position of the processed item for inner loops, use enumerate.
Instead of:
lst = ["a", "b", "c", "d", "e"]
for i in range(0, len(lst)):
    for j in range (i+1, len(lst)-1, 1):
        itm_i = lst[i]
        itm_j = lst[j]
        # do something
Use enumerate, which turns each list item into a tuple (index, item):
lst = ["a", "b", "c", "d", "e"]
for i, itm_i in enumerate(lst):
    for itm_j in lst[i+1:-1]:
        print itm_i, itm_j
        # do something
Manipulating a list which is processed
You are looping over a list and suddenly delete an item from it. Modifying a list while iterating over it is generally best avoided. If you have to do it, think twice and take care, for example by iterating backward so that you never modify the part that is about to be processed in a later iteration.
As an alternative to deleting an item from the list being iterated, you can record your findings (such as duplicated items) in another list and use it after you are out of the loop.
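The backward-iteration idea mentioned above can be sketched as follows. This is only an illustration in Python 3 syntax with a made-up function name; deleting at index i never disturbs the indexes below i, which are the ones still to be visited.

```python
def drop_adjacent_duplicates(lst):
    lst.sort()
    # Walk from the last index down to 1, comparing each item with its
    # left-hand neighbour; a deletion only shifts items at higher indexes.
    for i in range(len(lst) - 1, 0, -1):
        if lst[i] == lst[i - 1]:
            del lst[i]
    return lst

print(drop_adjacent_duplicates(list('committee')))
# ['c', 'e', 'i', 'm', 'o', 't']
```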
How your code could be rewritten
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-2]):
        for j, itm_j in enumerate(lst[i+1: -1]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    # now we are out of loop and can modify the lst
    # note, we loop over one list and modify another, this is safe
    for itm in to_remove:
        lst.remove(itm)
    return lst
Reading the code, the problem becomes apparent: you never touch the last item in the sorted list. That is why "t" is not removed, as it is alphabetically the last item after sorting.
So your code could be corrected this way:
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-1]):
        for j, itm_j in enumerate(lst[i+1:]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst
From now on the code passes the tests, which you can confirm by calling testit(). (The tests only cover letters repeated at most twice; with three or more copies of a letter the nested loops record too many removals, which the zip version further down avoids.)
>>> testit()
Silent test output is what we were dreaming about.
Having the test function makes further code modification easy, as it is quick to check whether things still work as expected.
Anyway, the code can be shortened by getting the tuples (itm_i, itm_j) using zip:
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for itm_i, itm_j in zip(lst[:-1], lst[1:]):
        if itm_i == itm_j:
            to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
or using list comprehension:
def sort_and_unique (lst):
    lst.sort()
    to_remove = [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]
    for itm in to_remove:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
As the list comprehension (using []) builds its complete result before any of the values are used, we can remove another line:
def sort_and_unique (lst):
    lst.sort()
    for itm in [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
Note that so far the code still reflects your original algorithm; only two bugs were removed:
- the list being iterated over is no longer modified
- the last item of the list is now taken into account
