Apply multiple regular expressions to a text at the same time in Python - python

Assume I have a text containing '.' and ',', and sometimes either one is followed by spaces. I need a regular expression that replaces ['.' and spaces] or [',' and spaces] with a single space throughout the text. I currently have the following:
text = re.sub('[.]+[ ]+', " ", text)
text = re.sub('[,]+[ ]+', " ", text)
Here I am applying several patterns to the string in separate passes. Is there an efficient way to do this in one pass? Also, the output is stored in the same variable. Is that efficient, or is a copy created in this case? Kindly let me know.
Thanks.

You are already using character sets, so put both . and , into one:
text = re.sub('[.,]+ +', " ", text)

Since you only need to replace a . or , followed by a space, you could use
text = re.sub(r'[.,]\s', " ", text)
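For comparison, here is a minimal sketch of the two-pass and one-pass versions side by side. The sample text and the name PUNCT_THEN_SPACE are made up for illustration. On the question about copies: re.sub never modifies its input, because Python strings are immutable; it returns a string, and assigning that back to text simply rebinds the name.

```python
import re

sample = "Hello, world.  This is... a test, okay."

# Two passes, as in the question
two_pass = re.sub('[.]+[ ]+', " ", sample)
two_pass = re.sub('[,]+[ ]+', " ", two_pass)

# One pass with both characters in a single set; compiling once can help
# when the same pattern is applied to many strings
PUNCT_THEN_SPACE = re.compile(r'[.,]+ +')
one_pass = PUNCT_THEN_SPACE.sub(" ", sample)

print(one_pass)
print(one_pass == two_pass)
```

The two versions agree on this sample, but they are not identical in general: a mixed run such as "., " is matched as a whole by the combined set, while the two separate patterns only match the ", " part.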

Related

how to specify the exact words, like Legend and not derivates like Legendary? [duplicate]

I need a way to find an exact word or phrase in a string.
Everything I have read online only shows how to search for a substring, so
98787This is correct
still comes back as true in an if statement.
This is what I have so far.
if 'This is correct' in text:
    print("correct")
This matches with any combination of characters before This is correct. For example, fkrjThis is correct, 4123This is correct and lolThis is correct all come back as true in the if statement, whereas I want it to be true only for an exact match of This is correct.
You can use the word-boundaries of regular expressions. Example:
import re
s = '98787This is correct'
for words in ['This is correct', 'This', 'is', 'correct']:
    if re.search(r'\b' + words + r'\b', s):
        print('{0} found'.format(words))
That yields:
is found
correct found
For an exact match, replace the \b assertions with ^ and $ to restrict the match to the beginning and end of the line.
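As a sketch of that exact-match variant: anchoring with ^ and $ can be written directly, and re.fullmatch (available since Python 3.4) does nearly the same thing. The helper names is_exact_anchored and is_exact_full are only illustrative.

```python
import re

PHRASE = 'This is correct'

def is_exact_anchored(text, phrase=PHRASE):
    # re.escape guards against regex metacharacters in the phrase
    return re.search('^' + re.escape(phrase) + '$', text) is not None

def is_exact_full(text, phrase=PHRASE):
    # fullmatch succeeds only if the whole string matches the pattern
    return re.fullmatch(re.escape(phrase), text) is not None

print(is_exact_anchored('This is correct'))       # True
print(is_exact_anchored('98787This is correct'))  # False
print(is_exact_full('This is correct'))           # True
```

One difference to be aware of: $ also matches just before a trailing newline, so the anchored version accepts 'This is correct\n' while re.fullmatch does not.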
Use the comparison operator == instead of in then:
if text == 'This is correct':
    print("Correct")
This will check to see if the whole string is just 'This is correct'. If it isn't, it will be False.
Actually, you should look for 'This is correct' string surrounded by word boundaries.
So
import re
if re.search(r'\bThis is correct\b', text):
    print('correct')
should work for you.
I suspect that you are looking for the startswith() method. It checks whether a string starts with the given characters:
"abcde".startswith("abc") -> True
"abcde".startswith("bcd") -> False
There is also the endswith() function, for checking at the other end.
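Applied to the strings from the question, that might look like the sketch below (the variable names are illustrative). Note that this is a prefix or suffix test, not a whole-word test, so "This is correctly" also starts with "This is correct".

```python
phrase = "This is correct"

clean_start = "This is correct, more or less".startswith(phrase)
junk_start = "98787This is correct".startswith(phrase)
junk_end = "98787This is correct".endswith(phrase)

# Both methods also accept a tuple of alternatives
any_prefix = "lolThis is correct".startswith(("fkrj", "4123", "lol"))

print(clean_start, junk_start, junk_end, any_prefix)
```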
You can make a few changes.
elif 'This is correct' in text[:len('This is correct')]:
or
elif ' This is correct ' in ' '+text+' ':
Both work. The latter is more flexible.
It could be a complicated problem if we want to solve it without using regular expression. But I came up with a little trick.
First we need to pad the original string with whitespaces.
After that we can search the text, which is also padded with whitespaces.
Example code here:
incorrect_str = "98787This is correct"
correct_str = "This is a great day and This is correct"
# Padding with whitespaces
new_incorrect_str = " " + incorrect_str + " "
new_correct_str = " " + correct_str + " "
if " This is correct " in new_correct_str:
    print("correct")
else:
    print("incorrect")
Break up the string into a list of strings with .split() then use the in operator.
This is much simpler than using regular expressions.
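A minimal sketch of the split() idea follows. For a single word, the in operator on the split list is enough; for a multi-word phrase such as This is correct, the word list has to be compared slice by slice. The helper name contains_words is hypothetical.

```python
def contains_words(text, phrase):
    """Return True if the words of phrase appear consecutively in text."""
    words = text.split()
    target = phrase.split()
    n = len(target)
    # Slide a window of n words over the text and compare word lists
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print("is" in "98787This is correct".split())
print(contains_words("98787This is correct", "This is correct"))
print(contains_words("This is a great day and This is correct", "This is correct"))
```

Punctuation stays attached to the words after split(), so "This is correct." would not match unless the punctuation is stripped first.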
Below is a solution without regular expressions. The program searches for an exact word, in this case 'CASINO', and prints the matching sentences.
words_list = ["The Learn Python Challenge Casino.",
              "They bought a car while at the casino",
              "Casinoville"]
search_string = 'CASINO'

def text_manipulation(words_list, search_string):
    search_result = []
    for sentence in words_list:
        # Strip punctuation, then compare each word case-insensitively
        words = sentence.replace('.', '').replace(',', '').split(' ')
        if any(w.upper() == search_string for w in words):
            search_result.append(sentence)
    print(search_result)

text_manipulation(words_list, search_string)
This will print the results - ['The Learn Python Challenge Casino.', 'They bought a car while at the casino']

how to remove substring with and without space

I have to remove the first instance of the word "hard" from a given string, but I am not sure how to remove it both with and without surrounding spaces:
For example:
string1 = "it is a hard rock" needs to become "it is a rock"
string2 = "play hard" needs to become "play"
However, when I use
string1 = string1.replace('hard' + ' ', '', 1)
it will not work on string2 as hard comes at the end without spaces. Any way to deal with this?
Lastly if we have string3
string3 = "play hard to be hard" becomes "play to be hard"
We want only the first occurrence to be replaced.
Maybe a simple
.replace(" hard", "").replace("hard ", "")
already works?
If not, I would suggest using a regular expression. But then you would have to give us a few more examples that need to be covered.
Seems like a job for some regular expression:
import re
' '.join(filter(bool, re.split(r' *\bhard\b *', 'it is a hard rock', maxsplit=1)))
The ' *' on either side eats up spaces around the word, \b guarantees that only full words match, filter(bool, ...) removes empty strings between consecutive separators (if any), and finally ' '.join reinstates a single space.
Use str.partition:
# using an if block
" ".join(s.strip() for s in thestring.partition("hard") if s != "hard").strip()
# or with slice notation
" ".join(s.strip() for s in thestring.partition("hard")[::2]).strip()

String substitution by regular expression while excluding quoted strings

I searched a bit but couldn't find any questions addressing my problem. Sorry if my question is repetitive. I'm trying to edit Python code, for example to add spaces around all -/+/= operators that don't have whitespace on either side.
string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
I would use '([^\s])(=|\+|-)([^\s])' to find such operators. The problem is that I want to exclude matches inside the quoted string. Is there any way to do this with a regular expression substitution?
The output I'm trying to get is:
edited_string = 'new_str = str + "this is a quoted string-having some operators+=- within the code."'
This example is just to help to understand the issue. I'm looking for an answer working on general cases.
You can do it in two steps: first add a space before the operators that are not preceded by one, then add a space after the operators that are not followed by one:
import re
string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
new_string = re.sub(r"(?<!\s)(\+|\=)[^\+=-]", r" \g<0>", string)
new_string = re.sub(r"(\+|\=)(?=[^\s|=|-])", r"\g<0> ", new_string)
print(new_string)
# new_str = str + "this is a quoted string-having some operators+=- within the code."
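For the general case, one common technique is to let the pattern match quoted strings first and pass them through unchanged, so that operators are only rewritten outside quotes. The sketch below assumes simple single-line string literals with backslash escapes; the names TOKEN and space_operators are hypothetical, and it does not try to handle triple-quoted strings, comments, or unary minus.

```python
import re

# Group 1: a complete double- or single-quoted string (kept verbatim).
# Group 2: a run of +, - or = characters, with any surrounding whitespace.
TOKEN = re.compile(r'''("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|\s*([+\-=]+)\s*''')

def space_operators(code):
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)          # quoted string: leave untouched
        return ' ' + match.group(2) + ' '  # operator: exactly one space each side
    return TOKEN.sub(repl, code)

string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
print(space_operators(string))
```

Because the scan runs left to right and the quoted-string alternative consumes the whole literal, nothing inside the quotes is ever seen by the operator alternative.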

replace multiple words - python

There can be an input "some word".
I want to replace this input with "<strong>some</strong> <strong>word</strong>" in some other text which contains this input
I am trying with this code:
input = "some word".split()
pattern = re.compile('(%s)' % input, re.IGNORECASE)
result = pattern.sub(r'<strong>\1</strong>',text)
but it fails, and I know why. What I am wondering is how to pass all the elements of the list input to compile() so that (%s) can catch each of them.
I appreciate any help.
The right approach, since you're already splitting the list, is to surround each item of the list directly (never using a regex at all):
sterm = "some word".split()
result = " ".join("<strong>%s</strong>" % w for w in sterm)
In case you're wondering, the pattern you were looking for was:
pattern = re.compile('(%s)' % '|'.join(sterm), re.IGNORECASE)
This works on your string because the regular expression would become
(some|word)
which means "matches some or matches word".
However, this is not a good approach as it does not work for all strings. For example, consider cases where one word contains another, such as
a banana and an apple
which becomes:
<strong>a</strong> <strong>banana</strong> <strong>a</strong>nd <strong>a</strong>n <strong>a</strong>pple
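One way around the substring problem is to wrap the alternation in \b word boundaries and escape each term. This is a sketch only; the function name highlight is made up for illustration.

```python
import re

def highlight(terms, text):
    # re.escape protects any regex metacharacters in the search terms;
    # \b on both sides restricts matches to whole words
    alternation = '|'.join(re.escape(term) for term in terms)
    pattern = re.compile(r'\b(%s)\b' % alternation, re.IGNORECASE)
    return pattern.sub(r'<strong>\1</strong>', text)

print(highlight(['a', 'banana'], 'a banana and an apple'))
```

With the boundaries in place, the a inside and, an and apple is no longer wrapped.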
It looks like you're wanting to search for multiple words - this word or that word. Which means you need to separate your searches by |, like the script below:
import re
text = "some word many other words"
input = '|'.join('some word'.split())
pattern = re.compile('(%s)' % input, flags=0)
print(pattern.sub(r'<strong>\1</strong>', text))
I'm not completely sure I know what you're asking, but if you want to pass all the elements of input as parameters in a function call, you can use *input instead of input; the * unpacks the list into its elements. Note, however, that re.compile() expects a single pattern string, so as an alternative, couldn't you just try joining the list with | and adding ( at the beginning and ) at the end?
Alternatively, you can use the join operator with a list comprehension to create the intended result.
text = "some word many other words".split()
result = ' '.join(['<strong>'+i+'</strong>' for i in text])

Replace the single quote (') character from a string

I need to strip the character "'" from a string in python. How do I do this?
I know there is a simple answer. What I am really looking for is how to write ' in my code, in the way that \n stands for a newline.
As for how to represent a single apostrophe as a string in Python, you can simply surround it with double quotes ("'") or you can escape it inside single quotes ('\'').
To remove apostrophes from a string, a simple approach is to just replace the apostrophe character with an empty string:
>>> "didn't".replace("'", "")
'didnt'
Here are a few ways of removing a single ' from a string in python.
str.replace
replace is usually used to return a string with all the instances of the substring replaced.
"A single ' char".replace("'","")
str.translate
In Python 2
To remove characters, pass None as the first argument to the function and all the characters to be removed as the second.
"A single ' char".translate(None,"'")
In Python 3
You will have to use str.maketrans
"A single ' char".translate(str.maketrans({"'":None}))
re.sub
Regular Expressions using re are even more powerful (but slow) and can be used to replace characters that match a particular regex rather than a substring.
re.sub("'","","A single ' char")
Other Ways
There are a few other ways that can be used but are not at all recommended. (Just to learn new ways). Here we have the given string as a variable string.
Using list comprehension
''.join([c for c in string if c != "'"])
Using generator Expression
''.join(c for c in string if c != "'")
One final method (again not recommended, since list.remove deletes only the first occurrence and raises ValueError if there is none):
Using a list call along with remove and join.
x = list(string)
x.remove("'")
''.join(x)
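As a quick check under Python 3, the main approaches above can be run side by side. The dictionary name results is illustrative, and the three-argument form of str.maketrans is used here as an alternative to the dict form shown earlier.

```python
import re

s = "A single ' char"

results = {
    'replace': s.replace("'", ""),
    # The third argument of str.maketrans lists characters to delete
    'translate': s.translate(str.maketrans('', '', "'")),
    're.sub': re.sub("'", "", s),
    'join': ''.join(c for c in s if c != "'"),
}

for name, value in results.items():
    print(name, repr(value))
```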
Do you mean like this?
>>> mystring = "This isn't the right place to have \"'\" (single quotes)"
>>> mystring
'This isn\'t the right place to have "\'" (single quotes)'
>>> newstring = mystring.replace("'", "")
>>> newstring
'This isnt the right place to have "" (single quotes)'
You can escape the apostrophe with a \ character as well:
mystring.replace('\'', '')
I ran into this problem on Codewars, where title() capitalizes the letter after an apostrophe, so I came up with a temporary workaround:
pred = "aren't"
pred = pred.replace("'", "99o")
pred = pred.title()
pred = pred.replace("99O", "'")
print(pred)
You can use another character combination, such as 123456k, but the last character should be a letter.
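A placeholder-free alternative is a regex-based title-casing helper, along the lines of the workaround described in the Python documentation for str.title(). The sketch below assumes ASCII letters only; the function name titlecase is a local helper, not a built-in.

```python
import re

def titlecase(s):
    # Treat a word plus an optional apostrophe suffix ("aren't", "bill's")
    # as one unit and capitalize only its first letter
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)

print("aren't".title())     # Aren'T
print(titlecase("aren't"))  # Aren't
```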
