I have a string such as "Hey people #Greetings how are we? #Awesome", and every time there is a hashtag I need to replace that word with another string.
The following code works when there is only one hashtag. The problem is that re.sub replaces all instances at once, so every hashtag ends up overwritten with the replacement built for the last one.
match = re.findall(tagRE, content)
print(match)
for matches in match:
    print(matches)
    newCode = "The result is: " + matches + " is it correct?"
    match = re.sub(tagRE, newCode, content)
What should I be doing instead to replace just the current match? Is there a way of using re.finditer to replace the current match or another way?
Peter's method would work. You could also pass the matched text itself as the pattern, so that only that specific hashtag is replaced (wrap it in re.escape if it may contain regex metacharacters). Like so:
newCode = "whatever" + matches + "whatever"
content = re.sub(matches, newCode, content)
I ran some sample code and this was the output.
import re
content = "This is a #wonderful experiment. It's #awesome!"
matches = re.findall(r'#\w+', content)
print(matches)
for match in matches:
    newCode = match[1:]  # drop the leading '#'
    print(content)
    content = re.sub(match, newCode, content)
    print(content)
#['#wonderful', '#awesome']
#This is a #wonderful experiment. It's #awesome!
#This is a wonderful experiment. It's #awesome!
#This is a wonderful experiment. It's #awesome!
#This is a wonderful experiment. It's awesome!
You can try it like this:
In [1]: import re
In [2]: s = "Hey people #Greetings how are we? #Awesome"
In [3]: re.sub(r'(?:^|\s)(\#\w+)', ' replace_with_new_string', s)
Out[3]: 'Hey people replace_with_new_string how are we? replace_with_new_string'
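On the question of replacing "just the current match": re.sub also accepts a function as the replacement. It is called once per match with the match object, so each hashtag can get its own replacement text. A minimal sketch, assuming the tag pattern is #\w+ (the question never shows tagRE) and using a hypothetical helper named expand_tag:

```python
import re

# Assumed pattern; the question does not show what tagRE actually is.
tag_re = re.compile(r'#\w+')

def expand_tag(m):
    # m is the match object for the hashtag currently being replaced.
    return "The result is: " + m.group(0) + " is it correct?"

content = "Hey people #Greetings how are we? #Awesome"
new_content = tag_re.sub(expand_tag, content)
print(new_content)
```

Because the function sees one match at a time, no loop over findall results is needed.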
I'm trying to remove multiple whitespace characters in a string. I've read about regular expressions in Python and tried to make one match all the whitespace in the string, but without success. The function returns an empty msg:
CODE
import re
def correct(string):
    msg = ""
    fmatch = re.match(r'\s', string, re.I|re.L)
    if fmatch:
        msg = fmatch.group
    return msg
print(correct("This is very funny and cool.Indeed!"))
To accomplish this task, you can instead replace consecutive whitespaces with a single space character, for example, using re.sub.
Example:
import re
def correct(string):
    # collapse every run of whitespace into a single space
    fmatch = re.sub(r'\s+', ' ', string)
    return fmatch
print(correct("This is very funny and cool.Indeed!"))
The output will be:
This is very funny and cool.Indeed!
re.match matches only at the beginning of the string. You need to use re.search instead.
Maybe this code helps you? It splits on runs of one or more spaces and joins the pieces with a single space:
import re
def correct(string):
    return " ".join(re.split(' +', string))
A one-liner with no imports (it replaces each pair of spaces with one, so longer runs need repeated calls):
ss= "This is very funny and cool.Indeed!"
ss.replace("  ", " ")
#ss.replace(" ", " "*2)
#'This is very funny and cool.Indeed!'
Or, as the question states:
ss= "This is very funny and cool.Indeed!"
ss.replace(" ", "")
#'Thisisveryfunnyandcool.Indeed!'
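If the goal is to collapse every run of whitespace (spaces, tabs, newlines) into one space, str.split() with no argument followed by join is another import-free option. A small sketch; the sample string with repeated spaces is an assumption for illustration:

```python
# Assumed sample text containing runs of two and three spaces.
ss = "This  is   very funny  and cool.Indeed!"

# split() with no argument splits on any run of whitespace and drops empty pieces.
collapsed = " ".join(ss.split())
print(collapsed)
```

Note that this also strips leading and trailing whitespace, which may or may not be wanted.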
I am writing a script that introduces misspellings into a sentence. I am using Python's re module to replace the original word with the misspelling. The script looks like this:
# replacing original word by error
pattern = re.compile(r'%s' % original_word)
replace_by = r'\1' + err
modified_sentence = re.sub(pattern, replace_by, sentence, count=1)
The problem is that this also replaces original_word when it is part of another word. For example,
if I had
original_word = 'in'
err = 'il'
sentence = 'eating food in'
it would replace the occurrence of 'in' in eating like:
> 'eatilg food in'
I was checking in the re documentation but it doesn't give any example on how to include regex options, for example:
If my pattern is:
regex_pattern = '\b%s\b' % original_word
this would solve the problem as \b represents 'word boundary'. But it doesn't seem to work.
I tried to work around it by doing:
pattern = re.compile(r'([^\w])%s' % original_word)
but that does not work. For example :
original_word = 'to'
err = 'vo'
sentence = 'I will go tomorrow to the'
it replaces it to:
> I will go vomorrow to the
Thank you, any help appreciated
Python's re module supports word boundaries via \b. You were close and just need to put it all together: use a raw string for the pattern and escape the word. The following script gives you the output you want:
import re
original_word = 'to'
err = 'vo'
sentence = 'I will go tomorrow to the'
pattern = re.compile(r'\b%s\b' % re.escape(original_word))
modified_sentence = re.sub(pattern, err, sentence, count=1)
print(modified_sentence)
Output --> I will go tomorrow vo the
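A likely reason the '\b%s\b' attempt in the question did not work: without the r prefix, Python turns \b into a backspace character before re ever sees the pattern. A short sketch of the difference, reusing the question's first example (the variable names plain and raw are illustrative):

```python
import re

original_word = 'in'
sentence = 'eating food in'

# Without r'', '\b' is the backspace character (\x08), not a word boundary.
plain = '\b%s\b' % original_word
# With r'', the two characters \ and b reach re, which reads them as a word boundary.
raw = r'\b%s\b' % re.escape(original_word)

print(re.search(plain, sentence))   # None: the sentence contains no backspaces
found = re.search(raw, sentence)
print(found.start())                # 12, the standalone 'in'
modified = re.sub(raw, 'il', sentence, count=1)
print(modified)                     # eating food il
```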
I have many fill-in-the-blank sentences in strings,
e.g. "6d) We took no [pains] to hide it ."
How can I efficiently parse this string (in Python) to be
"We took no to hide it"?
I also would like to be able to store the word in brackets (e.g. "pains") in a list for use later. I think the regex module could be better than Python string operations like split().
This will give you all the words inside the brackets.
import re
s="6d) We took no [pains] to hide it ."
matches = re.findall(r'\[(.*?)\]', s)
Then you can run this to remove all bracketed words (re.sub returns a new string, so keep the result):
s = re.sub(r'\[(.*?)\]', '', s)
Just for fun, here is a way to do the gathering and the substitution in one pass:
matches = []
def subber(m):
    # remember the bracketed word, then replace the whole match with nothing
    matches.append(m.group(1))
    return ""
new_text = re.sub(r"\[(.*?)\]", subber, s)
print(new_text)
print(matches)
import re
s = 'this is [test] string'
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(m.group(1))
Output
test
For your example you could use this regex:
(.*\))(.+)\[(.+)\](.+)
You will get four groups that you can use to build your resulting string, and you can save the third group for later use:
6d)
We took no
pains
to hide it .
I used .+ here because I don't know whether your strings always look like your example. You can change the .+ to an alphanumeric class or something more specific to your case.
import re
s = '6d) We took no [pains] to hide it .'
m = re.search(r"(.*\))(.+)\[(.+)\](.+)", s)
print(m.group(2) + m.group(4)) # "We took no to hide it ."
print(m.group(3)) # pains
import re
m = re.search(r".*\) (.*)\[.*\] (.*)", "6d) We took no [pains] to hide it .")
if m:
    g = m.groups()
    print(g[0] + g[1])
Output:
We took no to hide it .
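Putting the pieces from the answers above together, here is a sketch of a helper that returns both the cleaned sentence and the list of bracketed words. The function name split_blanks and the assumption that every label looks like "6d)" are mine, not from the question:

```python
import re

def split_blanks(s):
    # Collect the words inside square brackets for later use.
    words = re.findall(r'\[(.*?)\]', s)
    # Drop a leading label such as "6d)" (assumed format: word characters, then ')').
    text = re.sub(r'^\s*\w+\)\s*', '', s)
    # Remove the bracketed words, then collapse the leftover whitespace.
    text = re.sub(r'\[.*?\]', '', text)
    text = ' '.join(text.split())
    # Strip the trailing " ." so the result matches the form asked for.
    text = re.sub(r'\s*\.$', '', text)
    return text, words

text, words = split_blanks("6d) We took no [pains] to hide it .")
print(text)
print(words)
```

For many sentences, calling this in a loop and extending one list with words keeps all the blanks in order.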
This is one of those things where I'm sure I'm missing something simple, but... In the sample program below, I'm trying to use Python's RE library to parse the string "line" to get the floating-point number just before the percent sign, i.e. "90.31". But the code always prints "no match".
I've tried a couple other regular expressions as well, all with the same result. What am I missing?
#!/usr/bin/python
import re
line = ' 0 repaired, 90.31% done'
pct_re = re.compile(' (\d+\.\d+)% done$')
#pct_re = re.compile(', (.+)% done$')
#pct_re = re.compile(' (\d+.*)% done$')
match = pct_re.match(line)
if match: print 'got match, pct=' + match.group(1)
else: print 'no match'
match only matches from the beginning of the string. Your code works fine if you do pct_re.search(line) instead.
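A quick sketch of the difference, using the pattern and line from the question (only the r prefix is added):

```python
import re

line = ' 0 repaired, 90.31% done'
pct_re = re.compile(r' (\d+\.\d+)% done$')

# match() only tries the pattern at position 0, where the text starts ' 0 repaired'.
print(pct_re.match(line))    # None

# search() scans forward until the pattern fits somewhere in the string.
m = pct_re.search(line)
print(m.group(1))            # 90.31
```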
You can use re.findall instead:
>>> import re
>>> line = ' 0 repaired, 90.31% done'
>>>
>>> pattern = re.compile(r"\d+[.]\d+(?=%)")
>>> re.findall(pattern, line)
['90.31']
re.match only matches at the start of the string, so you would need a regex that covers the complete string.
Try this if you really want to use match (the leading .*? is non-greedy so that it does not swallow the first digits of the number):
re.match(r'.*?(\d+\.\d+)% done$', line)
r'...' is a "raw" string ignoring some escape sequences, which is a good practice to use with regexp in python. – kratenko (see comment below)
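One thing to watch when prefixing a pattern with .* to make re.match work: a greedy .* takes as much as it can, so the capture group may only get the tail of the number. A short sketch comparing the greedy and non-greedy forms on the question's line:

```python
import re

line = ' 0 repaired, 90.31% done'

# Greedy: .* backtracks only as far as it must, leaving a single digit before the dot.
greedy = re.match(r'.*(\d+\.\d+)% done$', line)
# Non-greedy: .*? grows from the left, so the group gets the whole number.
lazy = re.match(r'.*?(\d+\.\d+)% done$', line)

print(greedy.group(1))  # 0.31
print(lazy.group(1))    # 90.31
```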
I need a way to remove all whitespace from a string, except when that whitespace is between quotes.
result = re.sub('".*?"', "", content)
This matches anything between quotes, but I need it to skip those matches and match the whitespace outside them instead.
I don't think you're going to be able to do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.
import re
def stripwhite(text):
    lst = text.split('"')
    for i, item in enumerate(lst):
        # even-indexed items lie outside the quotes
        if not i % 2:
            lst[i] = re.sub(r"\s+", "", item)
    return '"'.join(lst)
print(stripwhite('This is a string with some "text in quotes."'))
Here is a one-liner version, based on @kindall's idea, that does not use regex at all. First split on ", then split() every other item and re-join the pieces, which takes care of the whitespace:
stripWS = lambda txt: '"'.join(it if i % 2 else ''.join(it.split())
                               for i, it in enumerate(txt.split('"')))
Usage example:
>>> stripWS('This is a string with some "text in quotes."')
'Thisisastringwithsome"text in quotes."'
You can use shlex.split for a quotation-aware split, and join the result using " ".join. E.g.
import shlex
print(" ".join(shlex.split('Hello "world this is" a test')))
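It may help to see what shlex.split actually returns here. The sketch below suggests it fits tokenizing better than the exact task in the question, since the quote characters are dropped and joining with " " keeps the whitespace outside the quotes:

```python
import shlex

parts = shlex.split('Hello "world this is" a test')
print(parts)      # the quoted phrase stays together as one item, without its quotes

joined = " ".join(parts)
print(joined)     # outside words are still separated by spaces
```

shlex.split also raises ValueError when a quote is left unclosed, which can serve as a check for malformed input.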
Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Here's the small regex:
"[^"]*"|(\s+)
The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.
Here is working code:
import re
subject = 'Remove Spaces Here "But Not Here" Thank You'
regex = re.compile(r'"[^"]*"|(\s+)')
def myreplacement(m):
    if m.group(1):
        return ""
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)
Here is a somewhat longer version that also checks for a quote without a pair. It only deals with one style of start and end delimiter (adaptable to others, for example start, end = '()'):
start, end = '"', '"'
for test in ('Hello "world this is" atest',
             'This is a string with some " text inside in quotes."',
             'This is without quote.',
             'This is sentence with bad "quote'):
    result = ''
    while start in test:
        clean, _, test = test.partition(start)
        clean = clean.replace(' ', '') + start
        inside, tag, test = test.partition(end)
        if not tag:
            raise SyntaxError('Missing end quote %s' % end)
        else:
            clean += inside + tag  # whitespace inside the quotes is kept
        result += clean
    result += test.replace(' ', '')
    print(result)