Find substring in string without white spaces - python

Let's say I have the following list ['house', 'John', 'garden'] and a string 'MynameisJohn'. Is there a way in Python to check if any of the words in the list are part of the string even when there are no white spaces? The final goal would be a function which returns the words that are part of the string, and maybe something that describes where in the string each word starts. So something like this:
def function(list, string):
    ...  # should return [(word, position in the string)]
I tried some things, but essentially nothing works because I don't know how to deal with the missing white spaces. The only method I could think of is checking whether any sequence in the string corresponds to one of the words. The problem is that I don't know how to implement something like that, and it doesn't seem very efficient.
I found a question here on StackOverflow which deals with kind of the same problem, but since I have a concrete list to compare the string to, I shouldn't run into the same problem, right?

An IDLE example:
>>> find = ['house', 'John', 'garden']
>>> s = 'MynameisJohn'
>>> results = [item for item in find if item in s]
>>> print( results )
['John']
Explanation:
[item for item in find if item in s] is a list comprehension.
For every item in the list named find, it checks if item in s. This evaluates to True if item occurs anywhere in s as a substring. If True, then that item will be in the results list.

To find the position of one string inside another, you can use the str.index() method.
This function() accepts a list and a string and yields each matching word together with the position of its first occurrence in the string:
def function(lst, s):
    for i in lst:
        if i not in s:
            continue
        yield i, s.index(i)
lst = ['house', 'John', 'garden']
s = 'MynameisJohn'
for word, position in function(lst, s):
    print(word, position)
Output:
John 8
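Note that str.index() reports only the first occurrence of a word. If a word may appear more than once, a sketch along the following lines collects every start position. The name find_all and the longer sample string are made up for this illustration.

```python
def find_all(words, text):
    """Return (word, position) pairs for every occurrence of each word in text."""
    matches = []
    for word in words:
        if not word:
            continue  # an empty string would "match" at every position
        start = text.find(word)
        while start != -1:
            matches.append((word, start))
            # search again one character further on, so overlapping hits are found too
            start = text.find(word, start + 1)
    return matches


print(find_all(['house', 'John', 'garden'], 'MynameisJohnJohn'))
# [('John', 8), ('John', 12)]
```

str.find() is used here instead of str.index() because it returns -1 rather than raising ValueError when nothing is found, which makes the loop condition simpler.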

Related

How to replace a character within a string in a list?

I have a list whose elements are strings. Each item contains unwanted characters that I want to remove. For example, I have list = ["string1.", "string2."] and the unwanted character is ".", so I don't want that character in any element of the list. My desired list should look like list = ["string1", "string2"]. Any help? I have to remove several special characters, so the code must be reusable.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
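However, I get an error in the if condition: "string index out of range". Any help? Thanks.
Strings in Python are immutable, so hola[i].translate(...) builds a new, shorter string each time it runs. The inner range(0, len(hola[i]), 1) was computed from the original length, so once characters have been removed, j eventually points past the end of the shortened string, which raises the error. Build the new strings in one pass instead: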
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
You can define a function that removes an unwanted character:
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then call it like this to get the result:
print(remove_unwanted(hola, "."))
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here is a more complex replacement that you may find useful, e.g., removing everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
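Since the question mentions several special characters, another option is str.translate() with a deletion table, which removes any number of characters in one pass. This is only a sketch; the helper name remove_chars is made up for this example.

```python
def remove_chars(strings, unwanted):
    """Return a new list with every character in `unwanted` deleted from each string."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', unwanted)
    return [s.translate(table) for s in strings]


print(remove_chars(["string1.", "string2."], "."))
# ['string1', 'string2']
print(remove_chars(["holamundoh", "holah", "holish"], "h"))
# ['olamundo', 'ola', 'olis']
```

Passing a longer string such as ".,;!" as unwanted deletes all of those characters at once, so the same function can be reused for each clean-up.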

Remove all words from a string that exist in a list

I need to write a function that goes through a string and checks whether each word exists in a list (remove). If the word is in the list, it should be removed; if not, it should be left alone.
I wrote this:
def remove_make(x):
    a = x.split()
    for word in a:
        if word in remove: # True
            a = a.remove(word)
        else:
            pass
        return a
But it returns the string with the (Remove) word still in there. Any idea how I can achieve this?
A more terse way of doing this would be to form a regex alternation based on the list of words to remove, and then do a single regex substitution:
import re
inp = "one two three four"
remove = ['two', 'four']
regex = r'\s*(?:' + r'|'.join(remove) + r')\s*'
out = re.sub(regex, ' ', inp).strip()
print(out) # prints 'one three'
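One caveat with a plain alternation is that it also matches inside longer words ('two' inside 'network') and treats characters such as '.' in a word as regex syntax. A hedged variant that escapes each word and anchors on word boundaries might look like this; the function name remove_words_regex is hypothetical.

```python
import re


def remove_words_regex(text, words):
    """Remove whole-word occurrences of `words` from `text`."""
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax;
    # \b restricts matches to whole words, so 'two' does not match inside 'network'.
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    without_words = re.sub(pattern, '', text)
    # Collapse the runs of whitespace left behind by the removed words.
    return ' '.join(without_words.split())


print(remove_words_regex("one two three four network", ['two', 'four']))
# one three network
```

The \b anchors assume the words start and end with word characters (letters, digits or underscore); words that begin or end with punctuation would need a different boundary test.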
You can try something simpler:
import re
remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
And the result would be:
' is walking with , wishing good luck to .'
The important part is the last line:
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
What it does:
You are converting the string to a list of words with re.split(r'(\W+)', string), preserving all the whitespace and punctuation as list items.
You are creating another list with a list comprehension, keeping only the items which are not in remove_list.
You are converting the resulting list back to a string with str.join().
The BNF notation for list comprehensions and a little bit more information on them may be found in the Python documentation.
PS: Of course, you may make this a little bit more readable if you break the one-liner into pieces, assign the result of re.split(r'(\W+)', string) to a variable, and decouple the join and the comprehension.
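For illustration, the one-liner broken into named steps could look like the sketch below. The variable names tokens, kept and result are chosen for this example only.

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# Split on runs of non-word characters; the capturing group keeps the separators.
tokens = re.split(r'(\W+)', string)
# Keep every token (word or separator) that is not in the removal list.
kept = [token for token in tokens if token not in remove_list]
# Glue the surviving tokens back together without adding anything.
result = ''.join(kept)

print(repr(result))
# ' is walking with , wishing good luck to .'
```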
You can create a new list without the words you want to remove and then use the join() method to concatenate all the words in that list. Try:
def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    return ' '.join(final_list)
list.remove(x) returns None and modifies the list in place by removing the first occurrence of x (it raises ValueError if x is not in the list). When you do
a = a.remove(word)
you are effectively storing None in a. This would raise an exception in the next iteration when a.remove(word) is called again (None.remove(word) is invalid), but you don't see that either, because you return immediately after the conditional. That is also wrong: the return belongs after the loop has finished, outside its scope. This is how your function should look (without modifying a list while iterating over it):
remove_words = ["abc", ...] # your list of words to be removed
def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp:
        if word in remove_words: # True
            a.remove(word)
    # no need of 'else' also, 'return' outside the loop's scope
    return " ".join(a)
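To see why the copy (temp = a[:]) matters, here is a small sketch comparing removal during iteration with removal while iterating over a copy. Both function names and the sample input are made up for this demonstration.

```python
def remove_in_place(words, unwanted):
    """Buggy on purpose: removes from the very list it is iterating over."""
    for word in words:
        if word in unwanted:
            words.remove(word)  # shifts later items left, so the next one is skipped
    return words


def remove_with_copy(words, unwanted):
    """Iterates over a copy, so removals cannot shift the items still to be visited."""
    for word in words[:]:
        if word in unwanted:
            words.remove(word)
    return words


print(remove_in_place("a b b c".split(), {"b"}))   # ['a', 'b', 'c']
print(remove_with_copy("a b b c".split(), {"b"}))  # ['a', 'c']
```

In the first version the second "b" slides into the slot the loop has just visited, so it is never examined and survives.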

Check if the characters in a string are contained in any of the word of a list

I need to find a way to check if given characters are contained in any of the words of a very long list.
I suppose you could do it by checking every index of each word in the list, a bit like so:
for i in list:
    if i[0] in ('a', 'b'):
        found_words.append(i)
    if i[1] in ('a', 'b'):
        found_words.append(i)
But this is neither very stylish nor very efficient.
Thanks for your help
A more understandable way of doing this is the following:
character='e'
for i in list:
    if character in i:
        found_words.append(i)
If you want to match characters in the words of a list, you can use regular expressions.
import re
pattern = "[abcde]"
for i in lst:
    if re.search(pattern, i): # a match object is truthy, None is falsy
        found_words.append(i)
Set pattern to the characters you want to check for, e.g. "[abcde]", which matches "a", "b", "c", "d", or "e" anywhere in a word, or "[abcde][pqrst]", which matches any combination such as "ap", "at", "eq", etc. Note that re.match would only test the beginning of each word, whereas re.search scans the whole word. Keeping the pattern in a variable makes it far easier to change.
You could do the following:
check = set('ab').intersection # the letters to check against
lst = [...] # the words, do not shadow the built-in 'list'
found_words = [w for w in lst if check(w)]
or shorter:
found_words = list(filter(check, lst))
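The intersection test keeps a word if it contains at least one of the letters. If the requirement were instead that a word contains all of them, set.issubset could be used. The following sketch shows both side by side; the word list is invented for the example.

```python
letters = set('ab')
lst = ['cab', 'bed', 'dog', 'apple']  # sample words, made up for this example

# The intersection is non-empty when a word shares at least one letter with `letters`.
contains_any = [w for w in lst if letters.intersection(w)]
# issubset is True only when every letter in `letters` occurs in the word.
contains_all = [w for w in lst if letters.issubset(w)]

print(contains_any)  # ['cab', 'bed', 'apple']
print(contains_all)  # ['cab']
```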

Changing lists with a given string

On Python 3 I am trying to write a function find(string_list, search) that takes a list of strings string_list and a single string search as parameters and returns a list of all those strings in string_list that contain the given search string.
So print(find(['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore'], 'he'))
would print:
['she', 'shells', 'the']
Here's what I tried so far:
def find(string_list, search):
    letters = set(search)
    for word in string_list:
        if letters & set(word):
            return word
    return (object in string_list) in search
running print(find(['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore'], 'he'))
What I expected = [she, shells, the]
what I got = [she]
You can do this with:
def find(string_list, search):
    return [s for s in string_list if search in s]
The main problem with your code example is that you can only return from a function once, at which point the function stops executing. This is why your function returns only one value.
If you wish to return multiple values you must return a container object like a list or a set. Here's how your code might look if you use a list:
def find(string_list, search):
    letters = set(search)
    result = [] # create an empty list
    for word in string_list:
        if letters & set(word):
            # append the word to the end of the list
            result.append(word)
    return result
The if test here is actually not doing quite what your problem statement called for. Since a set is an unordered collection, the & operation can test only if the two sets have any elements in common, not that they appear in the same order as the input. For example:
>>> letters = set("hello")
>>> word = set("olleh")
>>> word & letters
{'h', 'e', 'l', 'o'}
As you can see, the operator returns a set whose elements are those common to the two sets (the order in which they are displayed may vary). Since a set is truthy if it contains any elements at all, this is actually testing whether any of the letters in the search string appear in a given item, not whether they all appear together in the given order.
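To see the difference concretely, the following sketch runs both tests on the list from the question. The names by_letters and by_substring are chosen for this illustration.

```python
words = ['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore']
search = 'he'

# Shares at least one letter with 'he' (what the set test checks).
by_letters = [w for w in words if set(search) & set(w)]
# Contains 'he' as a contiguous substring (what the question asks for).
by_substring = [w for w in words if search in w]

print(by_letters)    # ['she', 'sells', 'sea', 'shells', 'the', 'sea-shore']
print(by_substring)  # ['she', 'shells', 'the']
```

Only 'on' is rejected by the set test, because every other word happens to contain an 'h' or an 'e' somewhere.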
A better approach is to test the strings directly using the in operator, which (when applied to strings) tests if one string is a substring of another, in sequence:
def find(string_list, search):
    result = []
    for word in string_list:
        if search in word:
            result.append(word)
    return result
Since this pattern of iterating over every item in a list and doing a test on it is so common, Python provides a shorter way to write this called a list comprehension, which allows you to do this whole thing in one expression:
def find(string_list, search):
    return [word for word in string_list if search in word]
This executes just as the prior example but is more concise.

Reversal of a list of strings using a recursive function

I am trying to reverse a list using a recursive function. Unfortunately, I am fairly new to recursion. Is this possible? This is my code so far:
def stringRev (word):
    worLen = len(word)
    if worLen == 1:
        return word
    return (word[-1]) + stringRev(word[:-1])
listWord = ["hey", "there", "jim"]
print(stringRev(listWord))
Your problem is that (word[-1]) is a string, not a list. So you are trying to add/concatenate a string and a list. I changed that expression to [word[-1]] to create a list.
>>> def stringRev (word):
...     worLen = len(word)
...     if worLen == 1:
...         return word
...     return [word[-1]] + stringRev(word[:-1])
...
>>> listWord = ["hey", "there", "jim"]
>>> print(stringRev(listWord))
['jim', 'there', 'hey']
PS. It would be helpful if you included the error you received when running your code: TypeError: Can't convert 'list' object to str implicitly
To reverse the order of the elements of the list, change:
return (word[-1]) + stringRev(word[:-1])
to
return [word[-1]] + stringRev(word[:-1])
(note the square brackets).
The problem is that you are trying to concatenate a string (word[-1]) with a list (word[:-1]).
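As a sketch, the corrected list version can also be given an empty-list base case, so it does not fail on []. The name list_rev is made up for this example and is not from the question.

```python
def list_rev(items):
    """Recursively return a reversed copy of a list."""
    if not items:  # empty list: nothing left to reverse
        return []
    # last element first, then the reverse of everything before it
    return [items[-1]] + list_rev(items[:-1])


print(list_rev(["hey", "there", "jim"]))  # ['jim', 'there', 'hey']
print(list_rev([]))                       # []
```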
The problem is that your function is expecting a single word, yet you're calling it with a list of words.
If you call it as follows, you'll see that it works just fine:
for word in ["hey", "there", "jim"]:
    print(stringRev(word))
Or, if you wish to store the reversed strings in a list:
l = [stringRev(w) for w in ["hey", "there", "jim"]]
The one corner case where your function would fail is the empty string. I don't know whether that's a valid input, so it could be a non-issue (but trivial to fix nonetheless).
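One possible fix for the empty-string case is to treat every string of length 0 or 1 as its own reverse. This is a sketch of that idea; string_rev is a renamed variant written for this example, not the asker's exact function.

```python
def string_rev(word):
    """Recursively reverse a string; strings of length 0 or 1 are their own reverse."""
    if len(word) <= 1:
        return word
    # last character first, then the reverse of the rest
    return word[-1] + string_rev(word[:-1])


print(string_rev("hey"))     # yeh
print(repr(string_rev("")))  # ''
print([string_rev(w) for w in ["hey", "there", "jim"]])
# ['yeh', 'ereht', 'mij']
```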
If you just want it done with Python's built-ins:
reversed(listWord)
assuming listWord is a list or a tuple
http://docs.python.org/2/library/functions.html#reversed
And to get a list:
list(reversed(listWord))
But if you want to write the algorithm yourself, I guess reversed is not your friend!
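For comparison, here is a short sketch of two non-recursive ways to get a reversed list, using the list from the question. The variable names are chosen for this example.

```python
list_word = ["hey", "there", "jim"]

# reversed() returns a lazy iterator, so wrap it in list() to get a list back.
via_reversed = list(reversed(list_word))
# Slicing with a step of -1 builds a reversed copy directly.
via_slice = list_word[::-1]

print(via_reversed)  # ['jim', 'there', 'hey']
print(via_slice)     # ['jim', 'there', 'hey']
print(list_word)     # ['hey', 'there', 'jim'] (the original is unchanged)
```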
