Cross-matching two lists - python

I have two lists, and I am trying to see whether there are any matches between substrings of the elements in the first list and the elements of the second list.
["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
["2311","233412","2231"]
If any substring of an element matches an element of the second list (for example, "Po2311tato" matches "2311"), I want to put that element in a new list, so that every element of the first list that matches ends up in the new list. So the new list would be ["Po2311tato","Pin2231eap","Orange2231edg"]

You can use the syntax 'substring' in string to do this:
a = ["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
b = ["2311","233412","2231"]
def has_substring(word):
    # True if any string from b occurs inside word
    for substring in b:
        if substring in word:
            return True
    return False
print(list(filter(has_substring, a)))
Hope this helps!

This can be a little more concise than jobby's answer by using a list comprehension:
>>> list1 = ["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
>>> list2 = ["2311","233412","2231"]
>>> list3 = [string for string in list1 if any(substring in string for substring in list2)]
>>> list3
['Po2311tato', 'Pin2231eap', 'Orange2231edg']
Whether or not this is clearer / more elegant than jobby's version is a matter of taste!

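import re
list1 = ["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
list2 = ["2311","233412","2231"]
matchlist = []
for str1 in list1:
    for str2 in list2:
        # re.escape makes str2 match literally rather than as a regex pattern
        if re.search(re.escape(str2), str1):
            matchlist.append(str1)
            break  # one match is enough; move on to the next str1
print(matchlist)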
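When the second list is long, one alternative (not taken from the answers above) is to compile all the substrings into a single regular-expression alternation. Each substring is escaped with re.escape so that characters such as '.' or '+' are matched literally. Here is a minimal sketch, assuming the substrings are plain text rather than patterns:

```python
import re

list1 = ["Po2311tato", "Pin2231eap", "Orange2231edg", "add22131dfes"]
list2 = ["2311", "233412", "2231"]

# Build one pattern like "2311|233412|2231"; re.escape makes each substring literal
pattern = re.compile("|".join(re.escape(s) for s in list2))

# Keep every word in which at least one of the substrings occurs
matchlist = [word for word in list1 if pattern.search(word)]
print(matchlist)  # ['Po2311tato', 'Pin2231eap', 'Orange2231edg']
```

If list2 is empty, the joined pattern is the empty string and matches every word, so handle that case separately.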

Related

Del part of list in python by for loop

I am trying to remove some words from a string of words:
list1 = "abc dfc kmc jhh jkl"
My goal is to remove the words from 'dfc' to 'jhh'. I am new to Python, so I tried some index-based approaches from C#, but they don't work here.
I am trying this:
index=0
for x in list1:
    if x=='dfc':
        currentindex=index
        for y in list1[currentindex:]:
            if y!='jhh':
                break
            del list1[currentindex]
            currentindex=index
    elif x=='jhh':
        break
Instead of a long for loop, a simple slice in Python does the trick:
words = ['abc', 'dfc', 'kmc', 'jhh', 'jkl']
del words[1:4]
print(words)
Indexes start at 0, so you want to delete indexes 1 to 3. The slice ends at 4 because a slice stops just before its end index, so the last item deleted is at index 3. This is much easier than a loop.
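The slice above hard-codes the positions 1 and 4. Here is a small sketch that finds those positions with list.index instead. It assumes both marker words are present and that 'dfc' comes before 'jhh'; otherwise list.index raises ValueError:

```python
words = "abc dfc kmc jhh jkl".split()

# Position of the first "dfc", then the first "jhh" at or after it
start = words.index("dfc")
end = words.index("jhh", start)

# end + 1 so that the slice also removes "jhh" itself
del words[start:end + 1]
print(words)  # ['abc', 'jkl']
```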
Here is your output:
['abc', 'jkl']
>>> a = "abc dfc kmc jhh jkl"
>>> print(a.split("dfc")[0] + a.split("jhh")[1])
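abc  jkl
(Note the double space: the first part keeps the space before "dfc", and the second part keeps the space after "jhh".)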
You can wrap the same idea in a lambda:
cut = lambda s, start, end: s.split(start)[0] + s.split(end)[1]
print(cut(a, "dfc", "jhh"))
First, split the string into words:
list1 = "abc dfc kmc jhh jkl"
words = list1.split(" ")
Next, iterate through the words until you find a match:
start_match = "dfc"
start_index = 0
end_match = "jhh"
end_index = 0
for i in range(len(words)):
    if words[i] == start_match:
        start_index = i
    if words[i] == end_match:
        end_index = i
        break
print(' '.join(words[:start_index] + words[end_index+1:]))
Note: if there are multiple matches, this deletes the fewest words, because it uses the last start_match before the first end_match.
list1 = "abc dfc kmc jhh jkl".split() turns list1 into the following list:
['abc', 'dfc', 'kmc', 'jhh', 'jkl']
Now if you want to remove a list element you can try either
list1.remove(item) #removes first occurrence of 'item' in list1
Or
list1.pop(index) #removes item at 'index' in list1
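Here is a short illustration of the difference. The second "kmc" is added to the list only for this demonstration. remove() deletes by value and only the first match, while pop() deletes by position and returns the removed item:

```python
list1 = "abc dfc kmc jhh kmc jkl".split()

list1.remove("kmc")      # removes only the first "kmc"
print(list1)             # ['abc', 'dfc', 'jhh', 'kmc', 'jkl']

removed = list1.pop(1)   # removes the item at index 1 and returns it
print(removed)           # dfc
print(list1)             # ['abc', 'jhh', 'kmc', 'jkl']
```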
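Create a list of words by splitting the string:
list1= "abc dfc kmc jhh jkl".split()
Then iterate over the list, using a flag variable to indicate whether the current element falls inside the range to delete. Build a new list rather than calling remove() on the list you are iterating over, because removing items during iteration makes the loop skip elements:
flag = False
kept = []
for x in list1:
    if x == 'dfc':
        flag = True
    if not flag:
        kept.append(x)
    if x == 'jhh':
        flag = False
list1 = kept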
There are several problems with what you have tried, especially:
list1 is a string, not a list
when you write list1[i], you get the character at index i (not a word)
in your for loop, you try to modify the string you iterate over, which is a very bad idea (and because strings are immutable, del list1[i] raises a TypeError).
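The points above can be checked directly. The snippet below only demonstrates how strings behave; it is not a fix:

```python
list1 = "abc dfc kmc jhh jkl"

# Iterating over a string yields single characters, not words
first_items = [x for x in list1][:4]
print(first_items)  # ['a', 'b', 'c', ' ']

# Strings are immutable, so deleting an index fails
error_raised = False
try:
    del list1[0]
except TypeError as exc:
    error_raised = True
    print("TypeError:", exc)

# Splitting first gives a real list of words that can be modified
words = list1.split()
print(words[1])  # dfc
```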
Here is my one-line suggestion using re.sub(), which simply replaces the part of the string that matches the given regex pattern. It may be sufficient for your purpose:
import re
list1= "abc dfc kmc jhh jkl"
list1 = re.sub(r'dfc .* jhh ', "", list1)
print(list1)
Note: I kept the identifier list1 even if it is a string.
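One caveat with this pattern: .* is greedy. If the string contained several dfc ... jhh ranges, a single match would run from the first dfc to the last jhh. The sketch below uses a non-greedy .*? with \b word boundaries instead, assuming each range should be removed separately. The second test string is made up for illustration:

```python
import re

pattern = r"\bdfc\b.*?\bjhh\b\s*"  # non-greedy: stop at the nearest "jhh"

r1 = re.sub(pattern, "", "abc dfc kmc jhh jkl")
r2 = re.sub(pattern, "", "a dfc x jhh b dfc y jhh c")
# The greedy pattern from the answer removes everything between the outer markers
r3 = re.sub(r"dfc .* jhh ", "", "a dfc x jhh b dfc y jhh c")

print(r1)  # abc jkl
print(r2)  # a b c
print(r3)  # a c
```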
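You can also use str.replace(), though on its own it removes only "dfc"; every other word in the range would need its own replace() call:
test = list1.replace("dfc", "")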

Matching characters in strings

I have 2 lists of strings:
list1 = ['GERMANY','FRANCE','SPAIN','PORTUGAL','UK']
list2 = ['ERMANY','FRANCE','SPAN','PORTUGAL','K']
I want to obtain a list containing only those strings from list2 that are one character shorter than the corresponding string in list1, i.e.:
final_list = ['ERMANY','SPAN','K']
What's the best way to do it? Using regular expressions?
Thanks
You can try this:
list1 = ['GERMANY','FRANCE','SPAIN','PORTUGAL','UK']
list2 = ['ERMANY','FRANCE','SPAN','PORTUGAL','K']
new = [a for a, b in zip(list2, list1) if len(a) < len(b)]

Reduce list based off of element substrings

I'm looking for the most efficient way to reduce a given list based off of substrings already in the list.
For example
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
would be reduced to:
mylist = ['abcd','qrs']
because 'abcd' and 'qrs' are each the shortest substring (prefix) of the other elements in that list. I was able to do this in about 30 lines of code, but I suspect there is a crafty one-liner out there.
This seems to work (though it is probably not very efficient):
def reduce_prefixes(strings):
    sorted_strings = sorted(strings)
    return [element
            for index, element in enumerate(sorted_strings)
            if all(not previous.startswith(element) and
                   not element.startswith(previous)
                   for previous in sorted_strings[:index])]
Tests:
>>> reduce_prefixes(['abcd', 'abcde', 'abcdef',
...                  'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>> reduce_prefixes(['abcd', 'abcde', 'abcdef',
...                  'qrs', 'qrst', 'qrstu',
...                  'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']
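In sorted order, all strings that start with a given prefix form a contiguous run beginning with that prefix. So each string only needs to be compared with the most recently kept one. The sketch below is an alternative not taken from the answers here, and the name reduce_prefixes_sorted is made up. Like reduce_prefixes, it assumes only prefixes count. The sort dominates, giving O(n log n) instead of the quadratic all-pairs check:

```python
def reduce_prefixes_sorted(strings):
    kept = []
    for s in sorted(strings):
        # Drop s if the most recently kept string is a prefix of it
        if kept and s.startswith(kept[-1]):
            continue
        kept.append(s)
    return kept

print(reduce_prefixes_sorted(['abcd', 'abcde', 'abcdef', 'qrs', 'qrst', 'qrstu']))
# ['abcd', 'qrs']
print(reduce_prefixes_sorted(['abcd', 'abcde', 'abcdef', 'qrs', 'qrst', 'qrstu',
                              'gabcd', 'gab', 'ab']))
# ['ab', 'gab', 'qrs']
```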
Probably not the most efficient, but at least short:
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
outlist = []
for l in mylist:
    if any(o.startswith(l) for o in outlist):
        # l is a prefix of some elements in outlist, so it replaces them
        outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
    if not any(l.startswith(o) for o in outlist):
        # l has no prefix in outlist yet, so it becomes a prefix candidate
        outlist.append(l)
print(outlist)
One solution is to iterate over all the strings, group them by their next character, and apply the same function recursively to each group.
def reduce_substrings(strings):
    return list(_reduce_substrings(map(iter, strings)))
def _reduce_substrings(strings):
    # A dictionary of characters to a list of strings that begin with that character
    nexts = {}
    for string in strings:
        try:
            nexts.setdefault(next(string), []).append(string)
        except StopIteration:
            # Reached the end of this string. It is the only shortest substring.
            yield ''
            return
    for next_char, next_strings in nexts.items():
        for next_substrings in _reduce_substrings(next_strings):
            yield next_char + next_substrings
This groups the strings in a dictionary keyed by their next character, then recursively finds the shortest prefix within each group. As soon as one string in a group runs out of characters, that string is the shortest one for the group.
Of course, because of the recursive nature of this function, a one-liner wouldn't be possible as efficiently.
Try this one (note that it hard-codes the expected results 'abcd' and 'qrs', so it only works for this particular list):
import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
    if re.match("^abcd$",i):
        new_list.append(i)
    elif re.match("^qrs$",i):
        new_list.append(i)
print(new_list)
#['abcd', 'qrs']

How can I extract words before some string?

I have several strings like this:
mylist = ['pearsapple','grapevinesapple','sinkandapple'...]
I want to parse the parts before apple and then append to a new list:
new = ['pears','grapevines','sinkand']
Is there a way other than finding starting points of 'apple' in each string and then appending before the starting point?
You can use slicing in combination with the index method of strings (index raises ValueError if 'apple' is missing):
>>> [x[:x.index('apple')] for x in mylist]
['pears', 'grapevines', 'sinkand']
You could also use a regular expression
>>> import re
>>> [re.match('(.*?)apple', x).group(1) for x in mylist]
['pears', 'grapevines', 'sinkand']
I don't see why though.
If every string ends with the fixed word "apple" (5 characters), you can simply slice it off:
second_list = [item[:-5] for item in mylist]
If some elements in the list don't contain 'apple' at the end of the string, this regex leaves the string untouched:
>>> import re
>>> mylist = ['pearsapple','grapevinesapple','sinkandapple', 'test', 'grappled']
>>> [re.sub('apple$', '', word) for word in mylist]
['pears', 'grapevines', 'sinkand', 'test', 'grappled']
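On Python 3.9 and later, str.removesuffix does the same job as this regex without needing re. It strips "apple" only when the string ends with it and otherwise returns the string unchanged:

```python
mylist = ['pearsapple', 'grapevinesapple', 'sinkandapple', 'test', 'grappled']

# str.removesuffix requires Python 3.9+
new = [word.removesuffix('apple') for word in mylist]
print(new)  # ['pears', 'grapevines', 'sinkand', 'test', 'grappled']
```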
You can also use string split with a list comprehension:
new = [x.split('apple')[0] for x in mylist]
['pears', 'grapevines', 'sinkand']
One way to do it is to iterate through every string in the list, use the split() string method, and append the part before "apple" to a new list. Assigning to word alone would not change anything:
new = []
for word in mylist:
    new.append(word.split("apple")[0])

Use list comprehension to print out a list with words of length 4

I am trying to write a list comprehension that uses List1 to create a list of words of length 4.
List1 = ['jacob','batman','mozarella']
wordList = [words for i in range(1)]
print(wordList)
This prints out wordList, but with words longer than 4 characters.
I am looking for this program to print out instead:
['jaco','batm','moza']
which are the same words in List1 but with length 4
I tried this and it didn't work
wordList = [[len(4)] words for i in range(1)]
Any thoughts?
You could use this list comprehension:
>>> List1 = ['jacob','batman','mozarella']
>>> [i[:4] for i in List1]
['jaco', 'batm', 'moza']
Ref:
i[:4] is a slice containing the first 4 characters of the string
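A slice never raises an error when the string is shorter than the requested length; it simply returns the whole string. The re.match and ''.join variants listed below, by contrast, raise an error on words shorter than four characters. Here is a quick check (the word 'bob' is added only for illustration):

```python
List1 = ['jacob', 'batman', 'mozarella', 'bob']

# Slicing past the end of a string just stops at the end
result = [i[:4] for i in List1]
print(result)  # ['jaco', 'batm', 'moza', 'bob']
```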
Other ways to do it (each has its own disadvantages):
import re
[re.sub(r'(?<=^.{4}).*', '', i) for i in List1]
[re.match(r'.{4}', i).group() for i in List1]
[''.join(i[j] for j in range(4)) for i in List1]
[i.replace(i[4:], '') for i in List1]  # fails for strings like 'moinmoin' or 'bongbong'
Credit - Avinash Raj
The len() function returns the length of a string, so a list comprehension with len() gives you the length of every item rather than the shortened words.
e.g.
>>> List1 = ['jacob','batman','mozarella']
>>> [len(i) for i in List1]
[5, 6, 9]
>>>
Use slicing to get a substring of the string.
e.g.
>>> a = "abcdef"
>>> a[:4]
'abcd'
>>> [i[:4] for i in List1]
['jaco', 'batm', 'moza']
Python beginner
Define List1.
Define empty List2
Use for loop to iterate every item from the List1
Use the list append() method to add each sliced item (i[:4]) to List2.
Use print to see result.
sample code:
>>> List1 = ['jacob','batman','mozarella']
>>> List2 = []
>>> for i in List1:
...     List2.append(i[:4])
...
>>> print(List2)
['jaco', 'batm', 'moza']
>>>
One more way, using the map function (in Python 3, map returns an iterator, so wrap it in list()):
List1 = ['jacob','batman','mozarella']
List2 = list(map(lambda x: x[:4], List1))
