On Python 3 I am trying to write a function find(string_list, search) that takes a list of strings string_list and a single string search as parameters and returns a list of all those strings in string_list that contain the given search string.
So print(find(['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore'], 'he'))
would print:
['she', 'shells', 'the']
Here's what I tried so far:
def find(string_list, search):
    letters = set(search)
    for word in string_list:
        if letters & set(word):
            return word
    return (object in string_list) in search
running print(find(['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore'], 'he'))
What I expected = [she, shells, the]
what I got = [she]
You can do this with:
def find(string_list, search):
    return [s for s in string_list if search in s]
The main problem with your code example is that you can only return from a function once, at which point the function stops executing. This is why your function returns only one value.
If you wish to return multiple values you must return a container object like a list or a set. Here's how your code might look if you use a list:
def find(string_list, search):
    letters = set(search)
    result = []  # create an empty list
    for word in string_list:
        if letters & set(word):
            # append the word to the end of the list
            result.append(word)
    return result
The if test here is actually not doing quite what your problem statement called for. Since a set is an unordered collection, the & operation can test only if the two sets have any elements in common, not that they appear in the same order as the input. For example:
>>> letters = set("hello")
>>> word = set("olleh")
>>> word & letters
{'h', 'e', 'l', 'o'}
As you can see, the operator returns a set whose elements are those common to both sets. Since a set is truthy if it contains any elements at all, this actually tests whether any of the letters in the search string appear in a given item, not whether they all appear together in the given order.
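To see the difference concretely, here is a small sketch that runs both tests on the words from the question. The helper names shares_a_letter and contains are made up for this illustration:

```python
def shares_a_letter(word, search):
    # True as soon as the two strings have at least one character in common
    return bool(set(search) & set(word))


def contains(word, search):
    # True only if search occurs in word as a contiguous substring
    return search in word


words = ['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore']
by_letters = [w for w in words if shares_a_letter(w, 'he')]
by_substring = [w for w in words if contains(w, 'he')]

print(by_letters)    # ['she', 'sells', 'sea', 'shells', 'the', 'sea-shore']
print(by_substring)  # ['she', 'shells', 'the']
```

The set test lets through every word that contains an 'h' or an 'e', which is why only the substring test matches the expected output.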
A better approach is to test the strings directly using the in operator, which (when applied to strings) tests if one string is a substring of another, in sequence:
def find(string_list, search):
    result = []
    for word in string_list:
        if search in word:
            result.append(word)
    return result
Since this pattern of iterating over every item in a list and doing a test on it is so common, Python provides a shorter way to write this called a list comprehension, which allows you to do this whole thing in one expression:
def find(string_list, search):
    return [word for word in string_list if search in word]
This executes just as the prior example but is more concise.
I need to write a function that goes through a string and checks whether each word exists in a list (the remove list). If the word is in that list, it should be removed; otherwise it should be left alone.
I wrote this:
def remove_make(x):
    a = x.split()
    for word in a:
        if word in remove:  # True
            a = a.remove(word)
        else:
            pass
        return a
But it returns back the string with the (Remove) word still in there. Any idea how I can achieve this?
A more terse way of doing this would be to form a regex alternation based on the list of words to remove, and then do a single regex substitution:
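import re
inp = "one two three four"
remove = ['two', 'four']
# escape each word so that any regex metacharacters in it are matched literally
regex = r'\s*(?:' + '|'.join(map(re.escape, remove)) + r')\s*'
out = re.sub(regex, ' ', inp).strip()
print(out) # prints 'one three'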
You can try something more simple:
import re
remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
And the result would be:
' is walking with , wishing good luck to .'
The important part is the last line:
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
What it does:
You are converting the string to a list of words with re.split(r'(\W+)', string), preserving all the whitespace and punctuation as list items.
You are creating another list with a list comprehension, keeping only the items that are not in remove_list.
You are converting the resulting list back to a string with str.join().
The BNF notation for list comprehensions and a little bit more information on them may be found here
PS: Of course, you can make this a little more readable if you break the one-liner into pieces: assign the result of re.split(r'(\W+)', string) to a variable and decouple the join from the comprehension.
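As a sketch of what that decoupled version might look like (the variable names tokens, kept and result are only chosen for this illustration):

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# Step 1: split into words and separators; the capturing group keeps the separators
tokens = re.split(r'(\W+)', string)

# Step 2: keep every token that is not one of the words to remove
kept = [token for token in tokens if token not in remove_list]

# Step 3: glue the remaining tokens back together
result = ''.join(kept)
print(result)  # ' is walking with , wishing good luck to .'
```

Note that the separators around a removed word are left untouched, which is why the result still contains the spaces and the comma that surrounded the removed words.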
You can create a new list without the words you want to remove and then use join() function to concatenate all the words in that list. Try
def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    return ' '.join(final_list)
list.remove(x) returns None and modifies the list in place by removing the first occurrence of x (it raises ValueError if x is not in the list). When you do
a = a.remove(word)
you effectively store None in a, which would raise an exception on the next iteration when a.remove(word) is called again (None has no remove method). You don't see that error either, because the function returns immediately after the conditional, during the first iteration. That is wrong: the return belongs after the loop has finished, outside its body. This is how your function should look (iterating over a copy, so that the list is not modified while it is being iterated over):
remove_words = ["abc", ...] # your list of words to be removed
def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp:
        if word in remove_words:  # True
            a.remove(word)
        # no need of 'else' also, 'return' outside the loop's scope
    return " ".join(a)
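To illustrate why the copy matters, here is a minimal sketch of what can go wrong when a list is modified while it is being iterated over. The sample lists are invented for this demonstration:

```python
# Removing from the list that is being iterated over
words = ['a', 'b', 'b', 'c']
for word in words:
    if word == 'b':
        # the remaining items shift left, so the loop skips the next one
        words.remove(word)
print(words)  # ['a', 'b', 'c'] : one 'b' was skipped

# Iterating over a copy while removing from the original
safe = ['a', 'b', 'b', 'c']
for word in safe[:]:
    if word == 'b':
        safe.remove(word)
print(safe)   # ['a', 'c']
```

The first loop never looks at the second 'b', because after the removal it has moved into the position the loop has already passed.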
I'm working with a string and a dictionary in Python, trying to loop through the string in order to create a list of the words which appear both in the string and amongst the keys of the dictionary. What I have currently is:
## dictionary will be called "dict" below
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth += w
Instead of returning a list of words split by whitespace, however, this code returns a list of every character in the sentence.
The same thing occurs even when I attempt to create a list of split words before looping, as seen below:
sentence = "is this is even really a sentence"
wordsinboth = []
sent = sentence.split()
for w in sent:
    if w in dict:
        wordsinboth += w
I guess I'm not able to specify "if w in dict" and still split by whitespace? Any suggestions on how to fix this?
Use append instead of +=:
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth.append(w)
The += operator doesn't work as you'd expect:
a = []
myString = "hello"
a.append(myString)
print(a) # ['hello']
b = []
b += myString
print(b) # ['h', 'e', 'l', 'l', 'o']
If you're interested in why this happens, the following questions are a good read:
Why does += behave unexpectedly on lists?
What is the difference between Python's list methods append and extend?
Also, note that using list comprehensions might result in a more elegant solution to your problem:
wordsinboth = [word for word in sentence.split() if word in dict]
You can use += on a list, but the right-hand side must be a list (or another iterable) rather than a single value, because += extends the list with each element of the right-hand side. In your case, each string w is treated as a sequence of its characters (e.g. 'if' => ['i', 'f']). To work around that, wrap the value in a list by putting [] around it:
for w in sentence.split():
    if w in dict:
        wordsinboth += [w]
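Put side by side, the three spellings behave as follows. This is only a small sketch with a made-up list name; for a list, += behaves like extend(), which adds each element of the right-hand iterable:

```python
words = []

words += "if"        # like words.extend("if"): adds 'i' and 'f' separately
words += ["if"]      # extends with a one-item list: adds the string 'if'
words.append("if")   # adds the string 'if' as a single item

print(words)  # ['i', 'f', 'if', 'if']
```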
Use a list comprehension; it is the shortest and most elegant way in your case:
wordsinboth = [word for word in sentence.split() if word in dict]
The problem in your loop is that you have to use append to add a new item to wordsinboth instead of the += operator. Also keep in mind that this can produce duplicates; if you need unique items, you can build a set instead, which gives you each word only once.
Like this:
wordsinboth = {word for word in sentence.split() if word in dict}
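One caveat with a set is that it does not keep the words in sentence order. If order matters, a possible alternative is dict.fromkeys, which keeps the first occurrence of each key in insertion order (Python 3.7+). In this sketch, lookup is a hypothetical stand-in for the question's dictionary:

```python
# hypothetical stand-in for the dictionary from the question
lookup = {'is': 1, 'a': 2, 'sentence': 3}
sentence = "is this is even really a sentence"

# dict.fromkeys drops repeated words but preserves first-seen order
unique_in_order = list(dict.fromkeys(w for w in sentence.split() if w in lookup))
print(unique_in_order)  # ['is', 'a', 'sentence']
```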
I'm trying to extract numbers that are mixed in sentences. I am doing this by splitting the sentence into elements of a list, and then I will iterate through each character of each element to find the numbers. For example:
String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []
for e in LP:
    for i in e:
        if i in ('123456789'):
            result += i
This gives me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the post "List comprehension on a nested list?", I understood that the right code should be:
[i for e in LP for i in e if i in ('123456789') ]
My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.
My original incorrect code, which reversed the order:
[i for i in e for e in LP if i in ('123456789') ]
The result I get from that is:
['3', '3', '3', '3']
Could anyone explain the process that leads to this result please?
Just reverse the same process you found in the other post. Nest the loops in the same order:
for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)
The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
If we presume that e was set to '3a' (the last element in LP, left over from your code that ran the full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and on each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a', and the inner loop runs 4 times again, but now the if test fails.
However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.
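Under that assumption, the result can be reproduced in a fresh session. The sketch below sets e by hand to the value a completed for e in LP loop would leave behind. That is a guess about the asker's environment, not something the question confirms:

```python
LP = "is2 Thi1s T4est 3a".split()

# Assumption: an earlier `for e in LP:` loop ran to completion and left e == '3a'
e = LP[-1]

# The first iterable (e) is evaluated once, before the inner loop rebinds e
wrong = [i for i in e for e in LP if i in '123456789']
print(wrong)  # ['3', '3', '3', '3']

# Loops in the same order as the nested for statements
right = [i for e in LP for i in e if i in '123456789']
print(right)  # ['2', '1', '4', '3']
```

Without the line that sets e, the first comprehension would raise a NameError in a fresh session instead of producing a list.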
I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:
digits = [char for char in String if char.isdigit()]
or even a regular expression:
import re
digits = re.findall(r'\d', String)
and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:
# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.
# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']
# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):
We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)
import re
test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()
words.sort(key=lambda s: int(re.search(r'\d+', s).group()))
print(words) # ['Thi1s', 'is2', '3a', 'T4est']
To remove the numbers:
words = [re.sub(r'\d', '', w) for w in words]
Final output is:
['This', 'is', 'a', 'Test']
Let's say I have the following list ['house', 'John', 'garden'] and a string 'MynameisJohn'. Is there a way in Python to check if any of the words in the list are part of the string even when there are no white spaces? The goal would finally be to have a function which returns the words which are part of the string and maybe something that describes where in the string the words start. So something like this:
def function(list, string):
    returns [(word, position in the string)]
I tried some things but essentially nothing works because I don't know how to deal with the missing white spaces... The only method I could think of is checking if any sequence in the string corresponds to one of the words, the problem is I don't know how to implement something like that and it doesn't seem to be very efficient.
I found a question here on StackOverflow which deals with kind of the same problem, but since I have a concrete list to compare the string to, I shouldn't run into the same problem, right?
An IDLE example:
>>> find = ['house', 'John', 'garden']
>>> s = 'MynameisJohn'
>>> results = [item for item in find if item in s]
>>> print( results )
['John']
Explanation:
[item for item in find if item in s] is a list comprehension.
For every item in the list named find, it checks if item in s. This will return True if item is any substring in s. If True, then that item will be in the results list.
For finding position of some string in other string, you can use str.index() method.
This function() accepts a list and a string and yields each matching word together with its position in the string:
def function(lst, s):
    for i in lst:
        if i not in s:
            continue
        yield i, s.index(i)
lst = ['house', 'John', 'garden']
s = 'MynameisJohn'
for word, position in function(lst, s):
    print(word, position)
Output:
John 8
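Note that str.index() reports only the first occurrence of each word. If every occurrence is needed, one possible sketch uses re.finditer. The function name find_all and the sample string with a repeated word are invented for this illustration:

```python
import re


def find_all(words, text):
    """Yield (word, start) for every non-overlapping occurrence of each word."""
    for word in words:
        # re.escape makes sure the word is matched literally
        for match in re.finditer(re.escape(word), text):
            yield word, match.start()


hits = list(find_all(['house', 'John', 'garden'], 'MynameisJohnJohn'))
print(hits)  # [('John', 8), ('John', 12)]
```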
I have a text file and two lists of strings.
The first list is the keyword list
k = ['hi', 'bob']
The second list is the words I want to replace the keywords with
r = ['ok', 'bye']
I want to take the text file as input, where when k appears, it's replaced with r, thus, "hi, how are you bob" would be changed to "ok, how are you bye"
Let's say you have already parsed your sentence:
sentence = ['hi', 'how', 'are', 'you', 'bob']
What you want to do is to check whether each word in this sentence is present in k. If yes, replace it by the corresponding element in r; else, use the actual word. In other words:
if word in k:
    word_index = k.index(word)
    new_word = r[word_index]
else:
    new_word = word
This can be written in a more concise way:
new_word = r[k.index(word)] if word in k else word
Using list comprehensions, here's how you go about processing the whole sentence:
new_sentence = [r[k.index(word)] if word in k else word for word in sentence]
new_sentence is now equal to ['ok', 'how', 'are', 'you', 'bye'] (which is what you want).
Note that in the code above we perform two equivalent search operations: word in k and k.index(word). This is inefficient. These two operations can be reduced to one by catching exceptions from the index method:
def get_new_word(word, k, r):
    try:
        word_index = k.index(word)  # list.index raises ValueError if word is absent
        return r[word_index]
    except ValueError:
        return word
new_sentence = [get_new_word(word, k, r) for word in sentence]
Now, you should also note that searching for word in k is a search with O(n) complexity (where n is the number of keywords). Thus the complexity of this algorithm is O(n·m) (where m is the sentence length). You can reduce this complexity to O(m) by using a more appropriate data structure, as suggested by the other comments. This is left as an exercise :-p
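For readers who want to compare notes on that exercise, one possible sketch uses a dictionary, whose lookups take constant time on average. The name replacements is only chosen for this illustration:

```python
k = ['hi', 'bob']
r = ['ok', 'bye']
sentence = ['hi', 'how', 'are', 'you', 'bob']

# Build the keyword -> replacement mapping once
replacements = dict(zip(k, r))

# dict.get returns the word itself when it is not a keyword
new_sentence = [replacements.get(word, word) for word in sentence]
print(new_sentence)  # ['ok', 'how', 'are', 'you', 'bye']
```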
I'll assume you've got the "reading string from file" part covered, so about that "replacing multiple strings" part: First, as suggested by Martijn, you can create a dictionary, mapping keys to replacements, using dict and zip.
>>> k = ["hi", "bob"]
>>> r = ["ok", "bye"]
>>> d = dict(zip(k, r))
Now, one way to replace all those keys at once would be to use a regular expression, being a disjunction of all those keys, i.e. "hi|bob" in your example, and using re.sub with a replacement function, looking up the respective key in that dictionary.
>>> import re
>>> re.sub('|'.join(k), lambda m: d[m.group()], "hi, how are you bob")
'ok, how are you bye'
Alternatively, you can just use a loop to replace each key-replacement pair one after the other:
s = "hi, how are you bob"
for (x, y) in zip(k, r):
    s = s.replace(x, y)
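One thing to watch for with plain substring replacement is that it also rewrites keywords that occur inside longer words. If only whole words should be replaced, a possible variation of the regex approach adds \b word boundaries. The sample text below is made up to show the difference:

```python
import re

k = ["hi", "bob"]
r = ["ok", "bye"]
d = dict(zip(k, r))

text = "hi, this is bob"

# Plain str.replace also rewrites the "hi" inside "this"
naive = text
for old, new in zip(k, r):
    naive = naive.replace(old, new)
print(naive)    # 'ok, toks is bye'

# \b restricts matches to whole words; re.escape keeps the keys literal
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, k)) + r')\b')
careful = pattern.sub(lambda m: d[m.group()], text)
print(careful)  # 'ok, this is bye'
```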