Erasing a list of phrases from a list of texts in Python

I am trying to erase specific words found in a list. Let's say that I have the following example:
a= ['you are here','you are there','where are you','what is that']
b = ['you','what is']
The desired output should be the following:
['are here', 'are there', 'where are', 'that']
I created the following code for that task:
import re

def _find_word_and_remove(w, strings):
    """
    w: (string)
    strings: (string)
    """
    temp = re.sub(r'\b({0})\b'.format(w), '', strings).strip()  # removes word from string
    return re.sub(r"\s{1,}", " ", temp)  # removes double spaces

def find_words_and_remove(words, strings):
    """
    words: (list)
    strings: (list)
    """
    if len(words) == 1:
        return [_find_word_and_remove(words[0], word_a) for word_a in strings]
    else:
        temp = [_find_word_and_remove(words[0], word_a) for word_a in strings]
        return find_words_and_remove(words[1:], temp)

find_words_and_remove(b, a)
# ['are here', 'are there', 'where are', 'that']
It seems that I am overcomplicating things by using recursion for this task. Is there a simpler and more readable way to do it?

You can use a list comprehension:
def find_words_and_remove(words, strings):
    return [" ".join(word for word in string.split() if word not in words) for string in strings]
That will only work when b contains single words, but because of your edit and comment, I now know that you really do need _find_word_and_remove(). Your recursive approach isn't too bad, but if you don't want recursion, do this:
def find_words_and_remove(words, strings):
    strings_copy = strings[:]
    for word in words:
        # remove the current word from every string in the copy
        for i, string in enumerate(strings_copy):
            strings_copy[i] = _find_word_and_remove(word, string)
    return strings_copy
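The recursion in the question is effectively a left fold over the list of words: each step takes the current list of strings and removes one more word or phrase from all of them. As a sketch, functools.reduce can express that directly. The helper below is a copy of the question's _find_word_and_remove() so that the block runs on its own:

```python
import functools
import re


def _find_word_and_remove(w, strings):
    # Same helper as in the question: drop whole-word matches of w, then tidy spaces.
    temp = re.sub(r'\b({0})\b'.format(w), '', strings).strip()
    return re.sub(r"\s+", " ", temp)


def find_words_and_remove(words, strings):
    # Fold over the words: each step removes one word/phrase from every string.
    return functools.reduce(
        lambda acc, w: [_find_word_and_remove(w, s) for s in acc],
        words,
        strings,
    )


a = ['you are here', 'you are there', 'where are you', 'what is that']
b = ['you', 'what is']
print(find_words_and_remove(b, a))
# ['are here', 'are there', 'where are', 'that']
```

Because the original list is passed as the initial value, an empty words list simply returns the input unchanged.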

The simple way is to use a regex:
import re
a= ['you are here','you are there','where are you','what is that']
b = ['you','what is']
Here you go:
def find_words_and_remove(b, a):
    return [re.sub("|".join(b), "", x).strip() if len(re.sub("|".join(b), "", x).strip().split(" ")) < len(x.split(' ')) else x for x in a]
find_words_and_remove(b, a)
# ['are here', 'are there', 'where are', 'that']
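The pattern built with "|".join(b) has no word boundaries and does not escape the phrases, so "you" would also be removed from inside "young", and a phrase containing a regex metacharacter such as "." or "?" would be misread. A more defensive single-pattern sketch, assuming only whole-word matches should be removed: escape each phrase, try longer phrases first, wrap the alternation in \b, and collapse the leftover spaces.

```python
import re


def find_words_and_remove(words, strings):
    # Longest phrases first, so "what is" wins over a shorter entry like "what".
    alternatives = sorted(words, key=len, reverse=True)
    # re.escape protects phrases containing regex metacharacters;
    # \b...\b restricts matches to whole words.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    # Remove the phrases, then collapse runs of whitespace and trim the ends.
    return [" ".join(pattern.sub("", s).split()) for s in strings]


a = ['you are here', 'you are there', 'where are you', 'what is that']
b = ['you', 'what is']
print(find_words_and_remove(b, a))
# ['are here', 'are there', 'where are', 'that']
```

Compiling one alternation means each string is scanned once, instead of once per word as in the recursive version.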

Related

split string into sentences every time there is punctuation, keeping the punctuation?

I would like to split a string into separate sentences in a list.
Example:
string = "Hey! How are you today? I am fine."
The output should be:
["Hey!", "How are you today?", "I am fine."]
You can use the built-in regular expression module, re.
import re
string = "Hey! How are you today? I am fine."
output = re.findall(r".*?[.!?]", string)
print(output)  # ['Hey!', ' How are you today?', ' I am fine.']
Update:
You may use the re.split() method instead, but used like this it does not return the characters you split on.
import re
string = "Hey! How are you today? I am fine."
output = re.split(r"[!?]", string)
print(output)  # ['Hey', ' How are you today', ' I am fine.']
If this works for you, you can use replace() and split().
string = "Hey! How are you today? I am fine."
output = string.replace("!", "?").split("?")
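re.split() can in fact keep the separators: when the pattern contains a capturing group, the captured text is returned as extra list items. A sketch (the function name split_keep_punct is just for illustration) that uses this and then glues each chunk back to the punctuation that ended it:

```python
import re


def split_keep_punct(text):
    # The capturing group makes re.split() return the separators as well:
    # ['Hey', '!', ' How are you today', '?', ' I am fine', '.', '']
    parts = re.split(r"([.!?])", text)
    # Pair each chunk with the separator that follows it, then trim spaces.
    # zip() stops at the shorter slice, so the trailing empty chunk is dropped.
    return [(chunk + sep).strip() for chunk, sep in zip(parts[0::2], parts[1::2])]


string = "Hey! How are you today? I am fine."
print(split_keep_punct(string))
# ['Hey!', 'How are you today?', 'I am fine.']
```

Note that any trailing text without closing punctuation is dropped by this pairing.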
You can try:
>>> a = 'Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
I found it here.
You can use the re.split() method:
import re
string = "Hey! How are you today? I am fine."
yourlist = re.split(r"[!?]", string)
You don't need regex for this. Just create your own generator:
def split_punc(text):
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    # Alternatively, you can use:
    # from string import punctuation
    j = 0
    for i, x in enumerate(text):
        if x in punctuation:
            yield text[j:i+1]
            j = i + 1
    # a generator's return value is not yielded, so yield any trailing text explicitly
    if j < len(text):
        yield text[j:]
Usage:
list(split_punc(string))
# ['Hey!', ' How are you today?', ' I am fine.']
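The outputs above keep a leading space on each sentence, while the question asks for ['Hey!', 'How are you today?', 'I am fine.']. Assuming sentences are separated by whitespace after ".", "!" or "?", one sketch is to split on that whitespace itself, using a zero-width lookbehind so the punctuation stays attached:

```python
import re


def split_sentences(text):
    # Split on runs of whitespace that immediately follow ., ! or ?.
    # The lookbehind (?<=...) is zero-width, so the punctuation is kept.
    return re.split(r"(?<=[.!?])\s+", text.strip())


string = "Hey! How are you today? I am fine."
print(split_sentences(string))
# ['Hey!', 'How are you today?', 'I am fine.']
```

This simple rule will also split after abbreviations such as "e.g." when they are followed by a space.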

split string by using regex in python

What is the best way to split a string like
text = "hello there how are you"
in Python, so that I end up with a list like this:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall(r'((\S+\W*){' + str(2) + '})', text)
for a in liste:
    print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?
Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses
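To see why the lookahead matters: a plain pattern consumes both words, so the next search starts after the second word and matches cannot overlap. A capture inside a zero-width lookahead consumes nothing, so every word can start a new pair. In this small comparison, the (?<!\S) lookbehind is a variation on the answer's (?:^|\W) prefix that only lets a pair start at the beginning of a word:

```python
import re

text = "hello there how are you"

# Consuming pattern: each match eats two words, so matches never overlap.
print(re.findall(r"\S+\s+\S+", text))
# ['hello there', 'how are']

# Zero-width lookahead: nothing is consumed, so every word can start a pair.
# (?<!\S) makes sure a pair only starts at the beginning of a word.
print(re.findall(r"(?<!\S)(?=(\S+\s+\S+))", text))
# ['hello there', 'there how', 'how are', 'are you']
```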
If regex isn't required, you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First, split the string on the space character. The result is a list in which each element is a word of the sentence. Create an empty list to hold the result. Loop over the list of words, appending each two-word combination, separated by a space, to the output list. Reading the word after the last one raises an IndexError; just catch it and continue, since you don't seem to want that lone word in your result anyway.
I don't think you actually need regex for this.
I understand you want a list in which each element contains two words, the second of which is also the first word of the following element. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ")
for i in range(len(liste) - 1):
    liste[i] = liste[i] + " " + liste[i + 1]
# remove the last element, which would otherwise be a lone word
liste.pop(-1)
I don't know whether you are required to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i - 1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']
An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
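The zip() approach generalizes to any window size, which seems to be what the str(2) in the question's pattern was aiming for. A sketch with a hypothetical helper, ngrams(text, n), that zips n staggered slices of the word list:

```python
def ngrams(text, n=2):
    words = text.split()
    # words[0:], words[1:], ..., words[n-1:] are zipped together;
    # zip() stops at the shortest slice, so only full windows are produced.
    return [" ".join(group) for group in zip(*(words[i:] for i in range(n)))]


text = "hello there how are you"
print(ngrams(text, 2))  # ['hello there', 'there how', 'how are', 'are you']
print(ngrams(text, 3))  # ['hello there how', 'there how are', 'how are you']
```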
Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']

Write a for loop to remove punctuation

I've been tasked with writing a for loop to remove some punctuation in a list of strings, storing the answers in a new list. I know how to do this with one string, but not in a loop.
For example: phrases = ['hi there!', 'thanks!'] etc.
import string
new_phrases = []
for i in phrases:
    if i not in string.punctuation:
Then I get a bit stuck at this point. Do I append? I've tried yield and return, but realised those are for functions.
You can either update your current list in place or append the new values to another list. Updating in place is better here because it needs no second list (constant extra space), while appending to a new list takes O(n) extra space.
phrases = ['hi there!', 'thanks!']
for i, el in enumerate(phrases):
    new_el = el.replace("!", "")
    phrases[i] = new_el
print(phrases)
This prints: ['hi there', 'thanks']
Give this a go:
import re
new_phrases = []
for word in phrases:
    new_phrases.append(re.sub(r'[^\w\s]', '', word))
This uses the re module to replace every character that is neither a word character nor whitespace with an empty string, which effectively removes the punctuation.
You can use the re module and a list comprehension to do it in a single line:
import re
import string
phrases = ['hi there!', 'thanks!']
new_phrases = [re.sub('[{}]'.format(re.escape(string.punctuation)), '', i) for i in phrases]
new_phrases
#['hi there', 'thanks']
If a phrase contains any punctuation, replace it with "" and append the result to new_phrases:
import string
new_phrases = []
phrases = ['hi there!', 'thanks!']
for i in phrases:
    for pun in string.punctuation:
        if pun in i:
            i = i.replace(pun, "")
    new_phrases.append(i)
print(new_phrases)
OUTPUT
['hi there', 'thanks']
Following your way of thinking, I'd do it like this:
import string
new_phrases = []
for word in phrases:  # for each phrase
    w = word
    for punct in string.punctuation:  # for each punctuation character
        w = w.replace(punct, '')  # replace the punctuation character with nothing (remove it)
    new_phrases.append(w)  # add the new "punctuationless" text to your output
I suggest using the str.translate() method on each string of your input list, which fits this task well. Iterating over the input list with a list comprehension gives short, easily readable code:
import string
phrases = ['hi there!', 'thanks!']
translationRule = str.maketrans({k:"" for k in string.punctuation})
new_phrases = [phrase.translate(translationRule) for phrase in phrases]
print(new_phrases)
# ['hi there', 'thanks']
Or to only allow spaces and letters:
phrases=[''.join(x for x in i if x.isalpha() or x==' ') for i in phrases]
Now:
print(phrases)
Is:
['hi there', 'thanks']
You should use a list comprehension, where process() stands for whatever you already do to a single string:
new_list = [process(string) for string in phrases]
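For example, if the single-string step is removing punctuation, a hypothetical process() could build its translation table with str.maketrans() and its third argument, which lists the characters to delete:

```python
import string

# Translation table that deletes every character in string.punctuation.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def process(text):
    # Hypothetical single-string step: strip all ASCII punctuation.
    return text.translate(_DELETE_PUNCT)


phrases = ['hi there!', 'thanks!']
new_list = [process(phrase) for phrase in phrases]
print(new_list)  # ['hi there', 'thanks']
```

The loop variable is named phrase rather than string here so that it does not shadow the string module.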

Python how to strip a string from a string based on items in a list

I have a list as shown below:
exclude = ["please", "hi", "team"]
I have a string as follows:
text = "Hi team, please help me out."
I want my string to look as:
text = ", help me out."
effectively stripping out any word that might appear in the list exclude
I tried the below:
if any(e in text.lower()) for e in exclude:
    print text.lower().strip(e)
But the above if statement returns a boolean value and hence I get the below error:
NameError: name 'e' is not defined
How do I get this done?
Something like this?
>>> from string import punctuation
>>> ' '.join(x for x in (word.strip(punctuation) for word in text.split())
...          if x.lower() not in exclude)
'help me out'
If you want to keep the trailing/leading punctuation with the words that are not present in exclude:
>>> ' '.join(word for word in text.split()
...          if word.strip(punctuation).lower() not in exclude)
'help me out.'
The first one is equivalent to:
>>> out = []
>>> for word in text.split():
...     word = word.strip(punctuation)
...     if word.lower() not in exclude:
...         out.append(word)
...
>>> ' '.join(out)
'help me out'
You can use this (remember that it is case-sensitive):
for word in exclude:
    text = text.replace(word, "")
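Because str.replace() is case-sensitive and also matches inside longer words, it leaves "Hi" in place and would mangle a word like "this", which contains "hi". A sketch that produces exactly the ", help me out." asked for in the question, assuming whole-word, case-insensitive matching is what you want:

```python
import re

exclude = ["please", "hi", "team"]
text = "Hi team, please help me out."

# Whole-word, case-insensitive match of any excluded word.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, exclude)) + r")\b", re.IGNORECASE)

# Remove the words, then collapse the leftover runs of spaces and trim the ends.
result = re.sub(r"\s+", " ", pattern.sub("", text)).strip()
print(repr(result))  # ', help me out.'
```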
This replaces with spaces everything that is not a word character or belongs to the stopwords list, and then splits the result into the words you want to keep. Finally, the list is joined back into a string with single spaces between the words. Note: this is case-sensitive.
' '.join(re.sub(r'\W|' + '|'.join(stopwords), ' ', sentence).split())
Example usage:
>>> import re
>>> stopwords=['please','hi','team']
>>> sentence='hi team, please help me out.'
>>> ' '.join(re.sub(r'\W|' + '|'.join(stopwords), ' ', sentence).split())
'help me out'
Using simple methods:
import re
exclude = ["please", "hi", "team"]
text = "Hi team, please help me out."
l = []
te = re.findall(r"\w*", text)
for a in te:
    # skip empty matches and any excluded word (case-insensitive)
    if a and a.upper() not in (name.upper() for name in exclude):
        l.append(a)
print(" ".join(l))
Hope it helps
If you are not worried about punctuation:
>>> import re
>>> text = "Hi team, please help me out."
>>> text = re.findall(r"\w+", text)
>>> text
['Hi', 'team', 'please', 'help', 'me', 'out']
>>> " ".join(x for x in text if x.lower() not in exclude)
'help me out'
In the above code, re.findall finds all the words and puts them in a list.
\w matches word characters: letters, digits and the underscore (in Python 3 this includes Unicode letters and digits)
+ means one or more occurrences

Swapping characters within a list of strings in Python

Given a list of strings such as: ['Math is cool', 'eggs and bacon']
How would one swap words from one list item to the other to turn them into something like
['bacon is cool', 'eggs and Math']
I would post code if I had any but I really have no idea where to start with this. Thanks.
I'm using Python 3
Start by creating your lists.
text1 = 'Math is cool'
text2 = 'eggs and bacon'
mylist = []
mylist.append(text1.split())
mylist.append(text2.split())
print(mylist)
Output:
[['Math', 'is', 'cool'], ['eggs', 'and', 'bacon']]
Now that you have the lists, play with them. Use append() to add texts that the user enters, etc.
I think that you can see where to go from here.
You haven't given much information about your overall purpose, so the answer below addresses only the specific example in your question, to help you get started:
phrases = ['Math is cool', 'eggs and bacon']
list0 = phrases[0].split(' ')
list1 = phrases[1].split(' ')
newList = [list1[-1] + ' ' + ' '.join(list0[1:]), ' '.join(list1[:-1]) + ' ' + list0[0]]
import random

def swap_words(s1, s2):
    s1 = s1.split()  # split string into words
    s2 = s2.split()
    w1 = random.randrange(len(s1))  # decide which words to swap
    w2 = random.randrange(len(s2))
    s1[w1], s2[w2] = s2[w2], s1[w1]  # do swap
    return " ".join(s1), " ".join(s2)
Then swap_words('Math is cool', 'eggs and bacon') returns random results such as
('Math and cool', 'eggs is bacon')
('bacon is cool', 'eggs and Math')
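If you want a specific swap rather than a random one, such as the first word of one string with the last word of the other as in the question's example, here is a sketch with a hypothetical helper that takes explicit word positions (negative indices count from the end, as with any Python list):

```python
def swap_words_at(s1, i, s2, j):
    # Split both sentences into word lists.
    w1, w2 = s1.split(), s2.split()
    # Swap word i of the first sentence with word j of the second.
    w1[i], w2[j] = w2[j], w1[i]
    return " ".join(w1), " ".join(w2)


phrases = ['Math is cool', 'eggs and bacon']
first, second = swap_words_at(phrases[0], 0, phrases[1], -1)
print([first, second])  # ['bacon is cool', 'eggs and Math']
```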
