How to check if any word of a string matches another string - Python

I'm trying to find a function in Python that can help me find word matches between two different strings.
For example, given these two strings:
"I am playing basketball everyday"
"basketball is the worst game ever"
I want the function to return True if "basketball" is found in both strings.

You can find the words that two phrases have in common:
common_words = set(phrase1.split()).intersection(phrase2.split())
You can check if a word is in both phrases by simply checking if it is in the common_words set (example: if word in common_words: ...).
You can also check how many elements this set has. If len(common_words) == 0 then phrase1 and phrase2 contain no common words.
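If you want a single function that returns True or False, a minimal sketch could wrap the set idea above. It lowercases both phrases and strips surrounding punctuation so that "Basketball," and "basketball" count as the same word. The name `share_a_word` and that normalization are assumptions for illustration, not part of the answer above.

```python
import string

def share_a_word(phrase1, phrase2):
    """Return True if the two phrases have at least one word in common."""
    def words(phrase):
        # Lowercase each word and trim leading/trailing punctuation like "," or "!".
        return {w.strip(string.punctuation).lower() for w in phrase.split()}
    common_words = words(phrase1) & words(phrase2)
    common_words.discard("")  # a pure-punctuation token such as "!!" becomes ""
    return bool(common_words)

print(share_a_word("I am playing basketball everyday",
                   "basketball is the worst game ever"))  # True
```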

l = ["I am playing basketball everyday", "basketball is the worst game ever"]
for x in l:
    print(x)
    if "basketball" in x.lower():
        print(True)

str1 = "I am playing basketball everyday"
str2 = "basketball is the worst game ever"
if "basketball" in str1 and "basketball" in str2:
    print("basketball is in both strings!")
See: Python - Check If Word Is In A String
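Note that `"basketball" in str1` is a substring test, so it would also be true for "basketballs". If only whole words should count (an assumption about what you want), one option is a regular expression with `\b` word boundaries; `word_in_both` is a hypothetical helper name.

```python
import re

def word_in_both(word, str1, str2):
    """Return True if `word` appears as a whole word in both strings (case-insensitive)."""
    # re.escape guards against special characters in `word`; \b marks word boundaries.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return bool(pattern.search(str1)) and bool(pattern.search(str2))

str1 = "I am playing basketball everyday"
str2 = "basketball is the worst game ever"
print(word_in_both("basketball", str1, str2))  # True
print(word_in_both("ball", str1, str2))        # False: "ball" only occurs inside "basketball"
```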

Related

Using 'not in' with an if statement inside a for loop in python

I have two lists. The first list contains a few sentences as strings; the second contains a few words as strings. I want to iterate through the list of words and, if none of the words appear in a sentence from the first list, add 1 to a counter. Here is the code I wrote:
sentences = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
]
words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
counter = 0
for sentence in sentences:
    for word in words:
        if word not in sentence:
            counter += 1
print(counter)
Instead of printing 3, it prints 52.
I understand what it's doing: it increments the counter for every word that isn't in a sentence, so each sentence is counted once per missing word instead of once overall.
But I can't figure out how to make it do what I want. Any help would be greatly appreciated!
for sentence in sentences:
    counter += 1
    for word in words:
        if word in sentence:
            counter -= 1
            break
This does the trick: the counter is incremented once for every sentence, and if any word is found in that sentence, the increment is undone and the inner loop stops.
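Python's `for ... else` expresses the same idea without the increment-then-undo step: the `else` block runs only when the inner loop finishes without hitting `break`. A minimal sketch using the `sentences` and `words` lists from the question (repeated so the snippet runs on its own):

```python
sentences = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
]
words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']

counter = 0
for sentence in sentences:
    for word in words:
        if word in sentence:
            break  # a word was found, so the else block is skipped
    else:
        counter += 1  # the inner loop never hit break: no word matched
print(counter)  # 3
```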
You can use the built-in all() (or any()) function to check whether none of the words are in the sentence:
out = 0
for sentence in sentences:
    out += all(word not in sentence for word in words)
print(out)
Prints:
3
Or as a one-liner:
out = sum(all(word not in sentence for word in words) for sentence in sentences)
print(out)
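Both versions use `word not in sentence`, which is a substring test: 'good' would also match 'goodbye'. If whole words should count (an assumption about the requirement), one possible approach is to tokenize each sentence with `re.findall(r"\w+", ...)` and use `set.isdisjoint`. For the sample data it still gives 3, because '#happy' and 'excited.' tokenize to 'happy' and 'excited'. The function name `count_without_keywords` is hypothetical.

```python
import re

sentences = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
]
words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']

def count_without_keywords(sentences, words):
    """Count sentences that share no whole word (case-insensitive) with `words`."""
    keywords = {w.lower() for w in words}
    # isdisjoint is True when no token of the sentence is a keyword; True sums as 1.
    return sum(
        keywords.isdisjoint(re.findall(r"\w+", sentence.lower()))
        for sentence in sentences
    )

print(count_without_keywords(sentences, words))  # 3
```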

How to remove one or more instances of a certain word in a string and return a string (Codewars dubstep)

I attempted the Codewars dubstep challenge in Python.
My code below works and passes the kata tests. However, it took me a long time and I ended up using a brute-force approach (I'm a newbie),
basically replacing and stripping the string until it worked.
Any suggestions, with comments, on how my code could be improved?
TASK SUMMARY:
Let's assume that a song consists of some number of words (that don't contain WUB). To make the dubstep remix of this song, Polycarpus inserts a certain number of words "WUB" before the first word of the song (the number may be zero), after the last word (the number may be zero), and between words (at least one between any pair of neighbouring words), and then the boy glues together all the words, including "WUB", in one string and plays the song at the club.
For example, a song with words "I AM X" can transform into a dubstep remix as "WUBWUBIWUBAMWUBWUBX" and cannot transform into "WUBWUBIAMWUBX".
song_decoder("WUBWEWUBAREWUBWUBTHEWUBCHAMPIONSWUBMYWUBFRIENDWUB")
# => WE ARE THE CHAMPIONS MY FRIEND
song_decoder("AWUBBWUBC")  # => "A B C" (WUB should be replaced by 1 space)
song_decoder("AWUBWUBWUBBWUBWUBWUBC")  # => "A B C" (multiple WUBs should be replaced by only 1 space)
song_decoder("WUBAWUBBWUBCWUB")  # => "A B C" (leading or trailing spaces should be removed)
Thanks in advance! (I'm also new to Stack Overflow.)
MY CODE:
def song_decoder(song):
    new_song = song.replace("WUB", " ")
    new_song2 = new_song.strip()
    new_song3 = new_song2.replace("   ", " ")
    new_song4 = new_song3.replace("  ", " ")
    return new_song4
I'm not sure it's an improvement, but I would use split() and join():
text = 'WUBWEWUBAREWUBWUBTHEWUBCHAMPIONSWUBMYWUBFRIENDWUB'
text = text.replace("WUB", " ")
print(text)
words = text.split()
print(words)
text = " ".join(words)
print(text)
Result
 WE ARE  THE CHAMPIONS MY FRIEND 
['WE', 'ARE', 'THE', 'CHAMPIONS', 'MY', 'FRIEND']
WE ARE THE CHAMPIONS MY FRIEND
EDIT:
A slightly different version. I split using "WUB", but that creates empty elements wherever two WUBs are adjacent (and at the start and end), so they need to be removed:
text = 'WUBWEWUBAREWUBWUBTHEWUBCHAMPIONSWUBMYWUBFRIENDWUB'
words = text.split("WUB")
print(words)
words = [x for x in words if x] # remove empty elements
#words = list(filter(None, words)) # remove empty elements
print(words)
text = " ".join(words)
print(text)
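Wrapped into the function the kata expects, the split/join idea fits on one line. The sketch below checks it against the example cases from the question; the `re.sub` variant (`song_decoder_re`, a hypothetical name) is an alternative added for comparison, not part of the answer above.

```python
import re

def song_decoder(song):
    # Turn every "WUB" into a space, then split() on runs of whitespace
    # (which also drops leading/trailing ones) and rejoin with single spaces.
    return " ".join(song.replace("WUB", " ").split())

def song_decoder_re(song):
    # Alternative: collapse each run of one or more "WUB"s into a single space,
    # then strip the spaces left at either end.
    return re.sub(r"(?:WUB)+", " ", song).strip()

for decoder in (song_decoder, song_decoder_re):
    print(decoder("WUBWEWUBAREWUBWUBTHEWUBCHAMPIONSWUBMYWUBFRIENDWUB"))
    # WE ARE THE CHAMPIONS MY FRIEND
```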

Replace any words in a string that match an entry in a list with a single tag (Python)

I have a list of sentences (~100k sentences total) and a list of "infrequent words" (~20k entries). I would like to go through each sentence and replace any word that matches an entry in infrequent_words with the tag "unk".
For a small example, if
infrequent_words = ['dog','cat']
sentence = 'My dog likes to chase after cars'
then after applying the transformation it should be
sentence = 'My unk likes to chase after cars'
I am having trouble finding an efficient way to do this. This function below (applied to each sentence) works, but it is very slow and I know there must be something better. Any suggestions?
def replace_infrequent_words(text, infrequent_words):
    for word in infrequent_words:
        text = text.replace(word, 'unk')
    return text
Thank you!
infrequent_words = {'dog', 'cat'}
sentence = 'My dog likes to chase after cars'
def replace_infrequent_words(text, infrequent_words):
    words = text.split()
    for i in range(len(words)):
        if words[i] in infrequent_words:
            words[i] = 'unk'
    return ' '.join(words)
print(replace_infrequent_words(sentence, infrequent_words))
Two things that should improve performance:
Use a set instead of a list for infrequent_words, so each membership check is a hash lookup instead of a scan through ~20k entries.
Split the text into a list of words once, so you don't rescan the entire string for every one of the ~20k replacements.
This doesn't account for grammar and punctuation (for example, "dog," won't be replaced), but it should be a significant performance improvement over what you posted.
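To also handle punctuation while keeping the set lookup, one option is `re.sub` with a function as the replacement: each run of word characters is looked up in the set and replaced only on an exact match. This also avoids a problem with the original `str.replace` loop, which turns "category" into "unkegory". It is a sketch assuming case-sensitive matching and that `\w+` is a good enough definition of a word for your data; the pattern is compiled once so it can be reused across ~100k sentences, and the `tag` parameter is a hypothetical addition.

```python
import re

WORD_RE = re.compile(r"\w+")

def replace_infrequent_words(text, infrequent_words, tag="unk"):
    """Replace each whole word that is in the `infrequent_words` set with `tag`."""
    # The lambda receives each match object and returns the text to substitute.
    return WORD_RE.sub(
        lambda m: tag if m.group(0) in infrequent_words else m.group(0),
        text,
    )

infrequent_words = {'dog', 'cat'}
print(replace_infrequent_words('My dog likes to chase after cars', infrequent_words))
# My unk likes to chase after cars
print(replace_infrequent_words('The cat, not the dog.', infrequent_words))
# The unk, not the unk.
print(replace_infrequent_words('A category of hotdogs', infrequent_words))
# A category of hotdogs
```

Unlike the split/join version, this keeps the original spacing and punctuation intact.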

Searching for keywords in Python

If I ask a question in Python and the answer is "chicken", I want to output something related to chicken; if the answer is "beef", I want to output something related to beef, and so on, depending on the answer provided.
How could I structure this? Should I have multiple lists with keywords and related answers? (I'm a newbie.)
I would use a dict of lists:
import random
similar_words = {
    'chicken': ['poultry', 'wings'],
    'beef': ['cow', 'ground-beef', 'roast-beef', 'steak'],
}
word = input("Enter a word: ").strip()
if word in similar_words:
    print(random.choice(similar_words[word]))
else:
    print("Not found!")
See the Python manual on Data Structures for more information. Note that I'm also using random.choice() to select a random item from the matching list.
Here's the output of it running:
$ python words.py
Enter a word: chicken
poultry
$ python words.py
Enter a word: beef
cow
$
Edit: You were asking in the comments how you could do this if the words were contained inside a whole sentence. Here's one example:
words = input("Enter a word or sentence: ").strip().split()
for word in words:
    if word.lower() in similar_words:
        print(random.choice(similar_words[word.lower()]))
    else:
        print("Not found!")
Here, we're using split() to split the sentence into a list of words. Then we loop through each word, and see if (the lowercase version of) the word exists in our dict, and do the same thing as we did above with a single word.
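If the sentence contains punctuation ("I'd like some Chicken!"), `word.lower()` gives "chicken!", which is not a key in the dict. A minimal sketch, assuming you strip surrounding punctuation with `string.punctuation` and collect every matched keyword instead of printing immediately; `find_related` is a hypothetical name, and `random.choice()` can still be applied to each returned list as in the answer above.

```python
import string

similar_words = {
    'chicken': ['poultry', 'wings'],
    'beef': ['cow', 'ground-beef', 'roast-beef', 'steak'],
}

def find_related(sentence, similar_words):
    """Return {keyword: related_words} for every keyword found in `sentence`."""
    found = {}
    for word in sentence.split():
        # Drop punctuation at either end ("Chicken!" -> "chicken"), then lowercase.
        key = word.strip(string.punctuation).lower()
        if key in similar_words:
            found[key] = similar_words[key]
    return found

print(find_related("I'd like some Chicken!", similar_words))
# {'chicken': ['poultry', 'wings']}
```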

String splitting problem with multiword expressions

I have a series of strings like:
'i would like a blood orange'
I also have a list of strings like:
["blood orange", "loan shark"]
Operating on the string, I want the following list:
["i", "would", "like", "a", "blood orange"]
What is the best way to get the above list? I've been using re throughout my code, but I'm stumped with this issue.
This is a fairly straightforward generator implementation: split the string into words, group together words which form phrases, and yield the results.
(There may be a cleaner way to handle skip, but for some reason I'm drawing a blank.)
def split_with_phrases(sentence, phrase_list):
    words = sentence.split(" ")
    phrases = set(tuple(s.split(" ")) for s in phrase_list)
    max_phrase_length = max(len(p) for p in phrases)
    # Find a phrase within words starting at the specified index. Return the
    # phrase as a tuple, or None if no phrase starts at that index.
    def find_phrase(start_idx):
        # Iterate backwards, so we'll always find longer phrases before shorter ones.
        # Otherwise, if we have a phrase set like "hello world" and "hello world two",
        # we'll never match the longer phrase because we'll always match the shorter
        # one first.
        for phrase_length in range(max_phrase_length, 0, -1):
            test_word = tuple(words[start_idx:start_idx + phrase_length])
            if test_word in phrases:
                return test_word
        return None
    skip = 0
    for idx in range(len(words)):
        if skip:
            # This word was returned as part of a previous phrase; skip it.
            skip -= 1
            continue
        phrase = find_phrase(idx)
        if phrase is not None:
            # The current word is part of the phrase, so only the remaining
            # len(phrase) - 1 words need to be skipped.
            skip = len(phrase) - 1
            yield " ".join(phrase)
            continue
        yield words[idx]
print(list(split_with_phrases('i would like a blood orange',
                              ["blood orange", "loan shark"])))
Ah, this is crude and ugly, but it seems to work. You may want to clean it up and optimize it, but some of the ideas here might help.
list_to_split = ['i would like a blood orange', 'i would like a blood orange ttt blood orange']
input_list = ["blood orange", "loan shark"]
for item in input_list:
    for str_lst in list_to_split:
        if item in str_lst:
            tmp = str_lst.split(item)
            lst = []
            for itm in tmp:
                if itm != '':
                    lst.append(itm)
                    lst.append(item)
            print(lst)
output:
['i would like a ', 'blood orange']
['i would like a ', 'blood orange', ' ttt ', 'blood orange']
One quick-and-dirty, completely unoptimized approach is to replace each compound in the string with a version that uses a different separator (preferably a character that doesn't occur anywhere else in your target string or compound words), then split, then turn the separator back into a space. A more efficient approach would be to iterate through the string only once, matching the compound words where appropriate, but you may have to watch out for nested or overlapping compounds, depending on your list.
#!/usr/bin/python
import re
my_string = "i would like a blood orange"
compounds = ["blood orange", "loan shark"]
# Join the words of each compound with "&" so the split below keeps them together.
for compound in compounds:
    my_string = my_string.replace(compound, compound.replace(" ", "&"))
my_segs = re.split(r"\s+", my_string)
# Turn the "&" separators back into spaces.
my_segs = [seg.replace("&", " ") for seg in my_segs]
print(my_segs)
Edit: Glenn Maynard's solution is better.
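Since you're already using `re`, another single-pass option is a regular expression that tries the phrases first and falls back to any run of non-whitespace. Python's `re` tries alternatives left to right, so the phrases are sorted longest first; `re.escape` protects special characters, and `\b` keeps "blood orange" from matching inside "blood oranges". This is a sketch assuming phrases are made of word characters separated by single spaces, as in your list; `split_with_phrases_re` is a hypothetical name.

```python
import re

def split_with_phrases_re(sentence, phrase_list):
    """Split on whitespace, but keep any phrase from phrase_list as one item."""
    if not phrase_list:
        return sentence.split()  # an empty alternation would match empty strings
    # Longest first, so "hello world two" is tried before "hello world".
    phrases = sorted(phrase_list, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b|\S+"
    )
    return pattern.findall(sentence)

print(split_with_phrases_re('i would like a blood orange',
                            ["blood orange", "loan shark"]))
# ['i', 'would', 'like', 'a', 'blood orange']
```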
