multiple iteration for finding shared items in two lists - python

I have the following two lists:
list1 = [(('diritti', 'umani'), 'diritto uomo'), (('sgomberi', 'forzati'), 'sgombero forza'), (('x', 'x'), 'x x'), ...] ## list of tuples, each tuple contains term and lemma of term
list2 = ['diritto uomo', 'sgombero forza'] ### a small list of lemmas of terms
The task is to extract from list1 the terms whose lemmas are present in list2. Note that one element in list2 can share its lemma with more than one term in list1, so for every item in list2 I need to find all of its matching items in list1. I tried this code:
result = []
for item in list2:
    for x in list1:
        for i, ii in x:
            if item.split()[0] in ii or item.split()[1] in ii:
                result.append(i)
This code takes a long time to do the task. Can someone suggest another way to do this? Thanks.

If you just want to match equal lemmas, you don't need to split the words and check membership; you can simply use the == operator within a list comprehension:
>>> [item for item, lemm in list1 for w in list2 if w == lemm]
[('diritti', 'umani'), ('sgomberi', 'forzati')]
Splitting the lemmas and checking membership within list1's lemmas, as your code does, won't give you any result.
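Since the question is about running time, here is a minimal sketch of the same exact-match idea using a set, assuming the lemmas in list2 are hashable strings as in the example. The names lemmas and result are illustrative. Membership tests on a set take roughly constant time, so list1 is scanned only once instead of once per item in list2.

```python
list1 = [(('diritti', 'umani'), 'diritto uomo'),
         (('sgomberi', 'forzati'), 'sgombero forza'),
         (('x', 'x'), 'x x')]
list2 = ['diritto uomo', 'sgombero forza']

# Build the lookup set once; each "lemma in lemmas" test is then cheap.
lemmas = set(list2)

# Keep every term whose lemma appears in list2 (order of list1 is preserved).
result = [term for term, lemma in list1 if lemma in lemmas]
print(result)  # [('diritti', 'umani'), ('sgomberi', 'forzati')]
```

If several terms in list1 share one lemma, all of them are kept, which matches the requirement stated in the question.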

Related

Return matching strings from two lists based on last 4 characters

list1 = ['1_Maths','2_Chemistry','10.1_Geography','12_History']
list2 = ['1_Maths', '2_Physics', '3_Chemistry','11.1_Geography','13_History']
I want to produce two outputs from list1 and list2 based on last 4 characters.
lists = [itm for itm in list1 if itm in list2]
The above only prints 1_Maths. Cannot think of a way to produce all the matching subjects.
last_4char = [sub[-4:] for sub in list1]
This could be an idea, but I'm not sure how to use it to produce the exact results from list1 and list2.
Output
print(new_list1)  # ['1_Maths','2_Chemistry','10.1_Geography','12_History']
print(new_list2)  # ['1_Maths', '3_Chemistry','11.1_Geography','13_History']
Try this:
def common_4_last(list1, list2):
    # Keep the items of each list whose last 4 characters also end an item of the other list
    return [[i for i in list1 if i[-4:] in {k[-4:] for k in list2}],
            [i for i in list2 if i[-4:] in {k[-4:] for k in list1}]]
This returns a list with 2 elements: one list of the items from list1 and a second list of the items from list2 that meet the criterion of sharing their last 4 characters.
You can run the function on any pair of lists.
For example, for your given list1 and list2 the result will be:
common_4_last(list1, list2)
[['1_Maths', '2_Chemistry', '10.1_Geography', '12_History'], ['1_Maths', '3_Chemistry', '11.1_Geography', '13_History']]
If you want only the first list, you can get it with
common_4_last(list1, list2)[0], and likewise with index [1] for the second list.
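As a variation, the suffix sets can be computed once up front rather than inside each comprehension. The sketch below is only an illustration: the function name common_suffix and the parameter n are hypothetical, not part of the answer above.

```python
def common_suffix(list1, list2, n=4):
    """Return the items of each list whose last n characters also end an item of the other list."""
    # Compute each suffix set a single time.
    tails1 = {s[-n:] for s in list1}
    tails2 = {s[-n:] for s in list2}
    keep1 = [s for s in list1 if s[-n:] in tails2]
    keep2 = [s for s in list2 if s[-n:] in tails1]
    return keep1, keep2

list1 = ['1_Maths', '2_Chemistry', '10.1_Geography', '12_History']
list2 = ['1_Maths', '2_Physics', '3_Chemistry', '11.1_Geography', '13_History']

# Tuple unpacking gives the two result lists their own names.
new_list1, new_list2 = common_suffix(list1, list2)
print(new_list1)
print(new_list2)
```

Returning a tuple and unpacking it avoids indexing with [0] and [1] at the call site.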

How to make a list of certain items from a different list

For example: list1 = [1t, 1r, 2t, 2r, 3t, 3r...., nt, nr]. How do I make a list list_t that has all t items from list1? I tried using the following for loop:
for i in list1[0:]:
    list_t =[i.t]
But this only assigns the first item to list_t.
If the items you want recur at a fixed step in your list, then:
list1 = ['1t', '1r', '2t', '2r', '3t', '3r']
# list[start:stop:step]
l2 = list1[0::2]
print(l2)
will solve your problem.
However, if what you mean is that you have a list of strings and you need to extract the strings that contain t, then you can just test whether t is in each element, like so:
l2 = list()
for i in list1:
    if 't' in i:
        l2.append(i)
print(l2)
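The same filter can be written as a list comprehension. This is a small sketch assuming the items are strings such as '1t' and '1r'; it uses str.endswith so that only a trailing t counts, which is slightly stricter than the 't' in i test above.

```python
list1 = ['1t', '1r', '2t', '2r', '3t', '3r']

# Keep only the strings that end with the letter 't'.
list_t = [item for item in list1 if item.endswith('t')]
print(list_t)  # ['1t', '2t', '3t']
```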

Python trouble with matching tuples

For reference this is my code:
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01')]
matching_app = set()  # collects the list2 tuples that share a value with list1
for i in list1:
    for j in list2:
        for k in j:
            print (k)
            if k.upper() in i:
                matching_app.add(j)
for i in matching_app:
    print (i)
When I run it, it does not match. This list can contain two or three variables and I need it to add it to the matching_app set if ANY value from list2 = ANY value from list1. It does not work unless the tuples are of equal length.
Any direction to how to resolve this logic error will be appreciated.
You can solve this in a few different ways. Here are two approaches:
Looping:
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01')]
matches = []
for i in list1[0]:
    if i in list2[0]:
        matches.append(i)
print(matches)
#['STCMGMTUNIX01']
List comprehension with a set:
merged = list(list1[0] + list2[0])
matches2 = set([i for i in merged if merged.count(i) > 1])
print(matches2)
#{'STCMGMTUNIX01'}
I'm not clear on what you want to do. You have two lists, each containing exactly one tuple. There also seems to be a comma missing in the first tuple.
To find the items of one list in another list, you can do:
list1 = ['10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01']
list2 = ['0.0.0.0', 'STCMGMTUNIX01']
for item in list2:
    if item.upper() in list1: # Check if item is in list
        print(item, 'found in', list1)
Works the same way with tuples.
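For the "ANY value from list2 equals ANY value from list1" requirement with tuples of different lengths, one possible sketch uses set.isdisjoint. The second tuple in list2 is a made-up entry added only to show a non-match, and the upper-casing mirrors the k.upper() call in the question.

```python
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
# The second tuple is hypothetical; it is here to demonstrate a non-matching entry.
list2 = [('0.0.0.0', 'STCMGMTUNIX01'), ('1.1.1.1', 'otherhost')]

matching_app = set()
for a in list1:
    # Normalise case once per tuple from list1.
    a_values = {value.upper() for value in a}
    for b in list2:
        # isdisjoint is False as soon as one value is shared, whatever the tuple lengths.
        if not a_values.isdisjoint(value.upper() for value in b):
            matching_app.add(b)

print(matching_app)  # {('0.0.0.0', 'STCMGMTUNIX01')}
```

Because the comparison is between sets of values, the tuples do not need to have the same number of elements.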

Find the nearest value of a list element

I have two lists which are:
>>> list1 = ['gain','archive','win','success']
>>> list2 = ['i','win','game','i','am','success','cool']
I also found the values common to both lists by comparing them:
>>> result= set(list1) & set(list2)
Output is
set(['win', 'success'])
Now I want to find the element that follows each item of the result in list2. Here it would be: 'game' and 'cool'.
How can I do this (using python 2.7)?
Given that you have the intersection words
result = { 'win', 'success' }
You could find the next words in list2 like this:
next_words = [list2[list2.index(word)+1] for word in result]
index gets you the index of the given element in the list. You can add 1 to it to get the next element.
If your element is at the end of the list, it will raise an IndexError, because there is no "next" element to get.
You can use the index function and add 1. Be careful though: if your common element is the last one in your list, it will raise an error.
list1 = ['gain','archive','win','success']
list2 = ['i','win','game','i','am','success','cool']
result= set(list1) & set(list2)
list3 = [list2[list2.index(e)+1] for e in result]
Edit: for the case where your last element is a common element:
result= set(list1) & set(list2)
list4 = []
for e in result:
    try:
        list4.append(list2[list2.index(e)+1])
    except IndexError:
        pass
Output: ['game', 'cool']
You could do a pairwise iteration over your list2 and do the "intersection" manually:
list1 = ['gain','archive','win','success']
list2 = ['i','win','game','i','am','success','cool']
set1 = set(list1)
result = []
for item, nextitem in zip(list2, list2[1:]): # pairwise iteration
    if item in set1:
        result.append(nextitem) # append the next item if the current item is in the intersection
print(result) # ['game', 'cool']
This does the trick for the next element in list2:
next_result = [list2[list2.index(el)+1] for el in result if list2.index(el)+1<len(list2)]
You could use list2.index, but that does a full search just to recover an index, artificially increasing the complexity from O(n) to O(n*n).
Just keep track of the index of each word. There are several ways to do that.
Create your own function that searches for common words and returns them as the indexes of those words in list2. This is probably the least pythonic option but the fastest.
Create a dictionary from the words of list2 to their indexes; then, after computing the set intersection, look up each word in the dict to find its index and add one. You need to build a full dictionary the size of list2, which might be expensive (but still better than O(n*n)).
Create a dictionary from the words of list2 to their next word, or None if there isn't one, and look up each common word in the dict to find its next word. You need to build a full dictionary the size of list2, which might be expensive.
If you know how to use itertools, you could build an iterator on list2 that yields the index and the word, filter the result to the words that are in list1, then pick only the indexes.
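The "dictionary from each word to its next word" option might look like the sketch below. It assumes that, when a word occurs several times in list2 (like 'i'), the word following its first occurrence is the one wanted; the names next_word and followers are illustrative.

```python
list1 = ['gain', 'archive', 'win', 'success']
list2 = ['i', 'win', 'game', 'i', 'am', 'success', 'cool']

# Map each word to the word that follows its first occurrence in list2.
next_word = {}
for word, following in zip(list2, list2[1:]):
    next_word.setdefault(word, following)

common = set(list1) & set(list2)

# .get returns None for a common word that appears only at the very end of list2.
followers = {word: next_word.get(word) for word in common}
print(followers)
```

Building the dictionary is a single pass over list2, and each lookup afterwards takes roughly constant time, so no repeated list2.index searches are needed.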

Only using items from one list once in nested list comprehension

I'm trying to use list comprehension to generate a new list that consists of a letter taken from a list1 directly followed (after a colon) by the words from list2 that start with that particular letter. I managed to code this using nested for loops as following:
list1=["A","B"]
list2=["Apple","Banana","Balloon","Boxer","Crayons","Elephant"]
newlist=[]
for i in list1:
newlist.append(i+":")
for j in list2:
if j[0]==i:
newlist[-1]+=j+","
resulting in the intended result: ['A:Apple,', 'B:Banana,Balloon,Boxer,']
Trying the same using list comprehension, I came up with the following:
list1=["A","B"]
list2=["Apple","Banana","Balloon","Boxer","Crayons","Elephant"]
newlist=[i+":"+j+"," for i in list1 for j in list2 if i==j[0]]
resulting in: ['A:Apple,', 'B:Banana,', 'B:Balloon,', 'B:Boxer,']
In which each time a word with that starting letter is found, a new item is created in newlist, while my intention is to have one item per letter.
Is there a way to edit the list comprehension code in order to obtain the same result as using the nested for loops?
All you need to do is remove the second for loop and replace it with a ','.join(matching_words) call where you currently use j in the string concatenation:
newlist = ['{}:{}'.format(l, ','.join([w for w in list2 if w[0] == l])) for l in list1]
This isn't very efficient; you loop over all the words in list2 for each letter. To do this efficiently, you would be better off preprocessing the lists into a dictionary:
list2_map = {}
for word in list2:
    list2_map.setdefault(word[0], []).append(word)
newlist = ['{}:{}'.format(l, ','.join(list2_map.get(l, []))) for l in list1]
The first loop builds a dictionary mapping initial letter to a list of words, so that you can directly use those lists instead of using a nested list comprehension.
Demo:
>>> list1 = ['A', 'B']
>>> list2 = ['Apple', 'Banana', 'Balloon', 'Boxer', 'Crayons', 'Elephant']
>>> list2_map = {}
>>> for word in list2:
...     list2_map.setdefault(word[0], []).append(word)
...
>>> ['{}:{}'.format(l, ','.join(list2_map.get(l, []))) for l in list1]
['A:Apple', 'B:Banana,Balloon,Boxer']
The above algorithm loops twice through all of list2 and once through list1, making this an O(N) linear algorithm (adding a single word to list2 or a single letter to list1 increases the running time by a constant amount). Your version loops over list2 once for every letter in list1, making it an O(NM) algorithm: every added letter costs another full pass over list2, and every added word costs one more step per letter.
To put that into numbers, if you expanded list1 to cover all 26 ASCII uppercase letters and expanded list2 to contain 1000 words, your approach (scanning all of list2 for words with a given letter) would make 26000 steps. My version, including pre-building the map, takes only 2026 steps. With list2 containing 1 million words, your version has to make 26 million steps, mine 2 million and 26.
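The same preprocessing step can also be sketched with collections.defaultdict from the standard library instead of dict.setdefault. The name by_initial is illustrative, and the result is the same as in the demo above.

```python
from collections import defaultdict

list1 = ['A', 'B']
list2 = ['Apple', 'Banana', 'Balloon', 'Boxer', 'Crayons', 'Elephant']

# Group the words by their first letter in a single pass over list2.
by_initial = defaultdict(list)
for word in list2:
    by_initial[word[0]].append(word)

# .get avoids creating empty entries for letters that have no words.
newlist = ['{}:{}'.format(letter, ','.join(by_initial.get(letter, [])))
           for letter in list1]
print(newlist)  # ['A:Apple', 'B:Banana,Balloon,Boxer']
```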
list1=["A","B"]
list2=["Apple","Banana","Balloon","Boxer","Crayons","Elephant"]
res = [l1 + ':' + ','.join(l2 for l2 in list2 if l2.startswith(l1)) for l1 in list1]
print(res)
# ['A:Apple', 'B:Banana,Balloon,Boxer']
But this is complicated to read, so I would advise using nested loops. You can write a generator for more readability (if you think this version is more readable):
def f(list1, list2):
    for l1 in list1:
        val = ','.join(l2 for l2 in list2 if l2.startswith(l1))
        yield l1 + ':' + val
print(list(f(list1, list2)))
# ['A:Apple', 'B:Banana,Balloon,Boxer']
