Replace word only if it stands alone [duplicate] - python

This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 3 years ago.
I have a function with which I want to anonymize texts by replacing the name of a person by 'visitor'.
To do so, I have written the following function:
def replaceName(text, name):
    newText = text.replace(name, 'visitor')
    return str(newText)
And I apply it using:
all_transcripts['msgText'] = all_transcripts.apply(lambda x: replaceName(x['msgText'], x['nameGuest']), axis=1)
However, this also replaces the name when it is part of another word, so I want to replace only the instances where the name stands by itself. I tried searching for " "+name+" ", but that fails when the name is at the beginning or end of a sentence.
I have also looked at "Python regular expression match whole word". However, the answers there explain how to find such words, not how to replace them. I am having trouble doing both.
Who can help me with this?

import re
text = "Mark this isMark example Mark."
print(re.sub(r"\bMark\b", "visitor", text))
output:
visitor this isMark example visitor.
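The same idea can be wrapped in a function that takes the name as a parameter, as in the question. This is a minimal sketch, not part of the original answer: the function name `replace_name` is illustrative, and `re.escape` is added on the assumption that a name might contain regex metacharacters such as a dot.

```python
import re

def replace_name(text, name):
    # Escape the name so characters such as "." are matched literally,
    # then anchor it with \b so it only matches as a standalone word.
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.sub(pattern, "visitor", text)

result = replace_name("Mark this isMark example Mark.", "Mark")
print(result)  # visitor this isMark example visitor.
```

A function like this could be passed to the `apply` call from the question in place of `replaceName`.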

Related

Replacing a list of words with a certain word in python [duplicate]

This question already has answers here:
How to replace multiple substrings of a string?
(28 answers)
Closed 2 years ago.
Say I have a paragraph and I want to find and replace certain words in it with one particular word.
I'm trying to do this with a for loop, after defining my word list.
Here's my code:
script = """ In this sense, netting can represent , which gives Howie return on Zachary."""
ROE = ["In", "this"] #the word list I'm defining (the list of words I want it replaced)
for ROE in script:
    script.replace(ROE, "ROE")
#desired output = ROE ROE sense, netting can represent , which gives Howie return on Zachary.
It doesn't really work, can someone help me fix it?
You have several problems:
You're not looping over the list of words to replace, you're looping over the characters in script.
You're not assigning the result of replace anywhere. It's not an in-place operation, since strings are immutable.
You're reassigning the ROE variable.
for word in ROE:
    script = script.replace(word, 'ROE')
Note that replace() doesn't know anything about word boundaries. Your code will convert Inside to ROEside. If you want better, you can use regular expressions and wrap the words in \b boundaries. A regular expression would also allow you to perform all the replacements at once.
import re
regex = re.compile(r'\b(?:' + '|'.join(re.escape(word) for word in ROE) + r')\b')
script = regex.sub('ROE', script)
This creates a regular expression \b(?:In|this)\b, which matches either word.
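To see the word-boundary behaviour in action, here is a small self-contained sketch of the same pattern. The sample sentence and the variable names `words`, `sample`, and `replaced` are made up for illustration.

```python
import re

words = ["In", "this"]  # hypothetical word list, mirroring the question
# Build \b(?:In|this)\b; re.escape guards against metacharacters in the words.
regex = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")

sample = "In this sense, Inside thistle stays intact."
replaced = regex.sub("ROE", sample)
print(replaced)  # ROE ROE sense, Inside thistle stays intact.
```

Only the standalone words are replaced. "Inside" and "thistle" are left alone because no word boundary follows "In" or "this" inside them.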
Python's string type, str, is immutable. This means that to change a string you have to create a new string containing the changes, and then assign the result to a variable.
Of course, you can assign the result to the same variable the original string was assigned to, which may have had the last reference to the old string, causing it to get cleaned up. But for a brief moment, there will always be a new copy of the string.
For example:
s = 'Hello'
s += ' world!'
print(s)
This seems to add ' world!' onto the existing s containing 'Hello', but it really just creates a new string 'Hello world!' and assigns that to s, replacing the old one.
In your case, this explains why you can't just call .replace() on a string and expect it to change. Instead, that method returns the new string you want and you can assign it to a variable:
script = """ In this sense, netting can represent , which gives Howie return on Zachary."""
roe = ["In", "this"]
for word_to_replace in roe:
    script = script.replace(word_to_replace, 'ROE')
(note that there were some other issues as well, but the above should work)
I found a solution that is relatively easy (here a is the string to modify):
stopwords = ['In', 'this', 'to']
for i in stopwords:
    a = a.replace(i, 'ROE')
and I was helped by this link: Removing list of words from a string
script = """ In this sense, netting can represent , which gives Howie return on Zachary."""
ROE = ["In", "this"] #the word list I'm defining (the list of words I want it replaced)
for word in ROE:
    script = script.replace(word, "ROE")
print(script)
Output:
ROE ROE sense, netting can represent , which gives Howie return on Zachary.
i.e. identical with your desired one.

Remove certain characters from a string python [duplicate]

This question already has answers here:
Remove specific characters from a string in Python
(26 answers)
Closed 2 years ago.
Is there a function in Python that does something like this:
input:
text = "s.om/e br%0oken tex!t".remove(".","/","%","0","!")
print(text)
output:
some broken text
The only thing I know of that can kind of do this is .replace("x", ""), and that takes way too long when there are lots of different characters to get rid of. Thanks in advance.
Use the regex module re to replace them all. The [] denotes a character class, which matches any single character listed inside it:
import re
text = re.sub("[./%0!]", "", "s.om/e br%0oken tex!t")
Python's re module provides regular expressions. Its sub function substitutes every match in a string, so you can try this:
from re import sub
text = sub("[./%0!]", "", "The string")
print(text)
Regex details: [./%0!] is a character class matching any one of . / % 0 !. Every occurrence found in the string is replaced with an empty string, and the result is then printed.
You might use str.maketrans combined with .translate; example:
t = str.maketrans("","","./%0!")
text = "s.om/e br%0oken tex!t"
cleantext = text.translate(t)
print(cleantext) # some broken text
maketrans accepts up to three arguments: the n-th character of the first is replaced with the n-th character of the second, and every character present in the third is deleted. In this case we only want to delete, so the first and second arguments are empty strings.
Alternatively you might use comprehension as follows:
text = "s.om/e br%0oken tex!t"
cleantext = ''.join(i for i in text if i not in "./%0!")
print(cleantext) # some broken text
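The `maketrans` arguments described above can also map and delete in a single pass. A minimal sketch with made-up input, where the names `table` and `mapped` are illustrative:

```python
# Map "a" -> "x" and "b" -> "y" (first two arguments); delete "c" (third argument).
table = str.maketrans("ab", "xy", "c")
mapped = "abcabc".translate(table)
print(mapped)  # xyxy
```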

Use a variable name in re.sub [duplicate]

This question already has answers here:
How to use a variable inside a regular expression?
(12 answers)
Closed 3 years ago.
I have a function with which I'm using regular expressions to replace words in sentences.
The function that I have looks as follows:
def replaceName(text, name):
    newText = re.sub(r"\bname\b", "visitor", text)
    return str(newText)
To illustrate:
text = "The sun is shining"
name = "sun"
print(re.sub(r"\bsun\b", "visitor", "The sun is shining"))
>>> "The visitor is shining"
However:
replaceName(text,name)
>>> "The sun is shining"
I think this doesn't work because I'm using the name of a string (name in this case) rather than the string itself. Who knows what I can do so this function works?
I have considered:
Using variable for re.sub; however, although the title is similar, it's a different question.
Python use variable in re.sub; however, this is just about date and time.
You can use string formatting here:
def replaceName(text, name):
    newText = re.sub(r"\b{}\b".format(name), "visitor", text)
    return str(newText)
Otherwise in your case re.sub is just looking for the exact match "\bname\b".
text = "The sun is shining"
name = "sun"
replaceName(text,name)
# 'The visitor is shining'
Or, for Python 3.6 and later, you can use f-strings as #wiktor has pointed out in the comments:
def replaceName(text, name):
    newText = re.sub(rf"\b{name}\b", "visitor", text)
    return str(newText)
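One caveat when interpolating a variable into a pattern: any regex metacharacters in the name are interpreted as regex syntax. The following sketch assumes names might contain such characters and wraps the name in `re.escape`. This is an addition to the answer above, and the name `replace_name` and the sample strings are illustrative.

```python
import re

def replace_name(text, name):
    # re.escape makes characters such as "." or "+" match literally.
    return re.sub(rf"\b{re.escape(name)}\b", "visitor", text)

print(replace_name("The sun is shining", "sun"))  # The visitor is shining

# Without re.escape, the "." in "J.R" would also match the "X" in "JXR".
escaped = replace_name("Ask J.R about it, not JXR", "J.R")
print(escaped)  # Ask visitor about it, not JXR
```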

Use two lists instead of .replace() [duplicate]

This question already has answers here:
Capitalize a string
(9 answers)
Closed 4 years ago.
If you've got a string containing for example "how are you?", I would do stringname.replace("how", "How") to have the first word written in Capital H.
So far so good.
The problem now is, I'm writing this script which at some point accesses the Open Weather Map, and the words are driving me crazy.
Until now I just did a
weather2 = self.weather.replace("partly", "Partly")
weather3 = weather2.replace("cloudy", "Cloudy")
weather4 = weather3.replace("foggy", "Foggy")
weather5 = weather4.replace("sunny", "Sunny")
weather6 = weather5.replace("rain", "Rain") #and so on
But I can't have 20 .replace().
So I was thinking, and here comes my question:
Could I create two lists, one containing the OWM originals, and the other list with the Words it shall be replaced with, and do something like
for name in names
do something
To capitalize the first letter of every word, use mystring.title(). It returns a new string, so keep the result:
names = [name.title() for name in names]
https://www.geeksforgeeks.org/title-in-python/
If you want to capitalize the string you can use .capitalize()
self.weather= self.weather.capitalize()
or if you want to use a list/dictionary solution:
dictionary = {'cloudy': 'Cloudy', 'foggy': 'Foggy', 'sunny': 'Sunny', 'rain': 'Rain'}
if self.weather in dictionary:
    self.weather = dictionary[self.weather]
Use the .capitalize() function. The .title() function behaves similarly, but it capitalizes the first letter of every word when the string contains more than one word. Both return a new string, so keep the result:
names = [name.capitalize() for name in names]
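The two-list approach the question asks about can be sketched with `zip`. The list contents and the sample `weather` string below are assumptions for illustration, and `weather` stands in for `self.weather`.

```python
# Hypothetical lists: OWM originals and their replacements, paired by position.
originals = ["partly", "cloudy", "foggy", "sunny", "rain"]
replacements = ["Partly", "Cloudy", "Foggy", "Sunny", "Rain"]

weather = "partly cloudy with rain"  # stand-in for self.weather
for old, new in zip(originals, replacements):
    # str.replace returns a new string, so reassign on every pass.
    weather = weather.replace(old, new)
print(weather)  # Partly Cloudy with Rain
```

Unlike `.title()`, this only capitalizes the listed words, so "with" stays lowercase here.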

python regex - extract value from string [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I would like to know how I can use regular expressions to get everything after the last comma in a string. In the example below, I would like to get the value "CA 0.810" into a variable:
prue ="VA=-0.850,0.800;CA=-0.863,0.800;SP=-0.860,0.810;MO=-0.860,0.810;SUN=MO -0.850,CA 0.810"
So far, I have the below code:
test = re.findall('([0-9]+)$',prue)
print test
However, I only get below output:
['810']
Could you please advise how I can get "CA 0.810" into the test variable?
You can do this using the split method. From the docs, it will:
Return a list of the words in the string, using sep as the delimiter string.
So if you can take your string:
prue = "VA=-0.850,0.800;CA=-0.863,0.800;SP=-0.860,0.810;MO=-0.860,0.810;SUN=MO -0.850,CA 0.810"
you can do :
prue.split(",")
which will return a list of the strings split by the commas:
['VA=-0.850', '0.800;CA=-0.863', '0.800;SP=-0.860', '0.810;MO=-0.860', '0.810;SUN=MO -0.850', 'CA 0.810']
So if you just want the last item ('CA 0.810') in a variable named test, you can take the last element of the list by indexing with -1:
test = prue.split(",")[-1]
test is now: 'CA 0.810'
Hope this helps!
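Since the question asked for a regular expression, here is a hedged sketch of a regex equivalent, alongside `rsplit`, which splits only once from the right. The pattern `[^,]+$` and the variable names `match` and `last` are illustrative and not from the answer above.

```python
import re

prue = "VA=-0.850,0.800;CA=-0.863,0.800;SP=-0.860,0.810;MO=-0.860,0.810;SUN=MO -0.850,CA 0.810"

# [^,]+$ matches the run of non-comma characters at the end of the string.
match = re.search(r"[^,]+$", prue)
test = match.group() if match else None
print(test)  # CA 0.810

# rsplit with maxsplit=1 splits only at the last comma.
last = prue.rsplit(",", 1)[-1]
print(last)  # CA 0.810
```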
