Python regular expression question - sub string but not prepended with - python

:)
I'm trying to substitute foo with bar, but only if it's not preceded by a particular character, e.g. /. So...
foobar should change to barbar, but /foobar should not.
I've tried adding [^/] at the beginning of my regex, but that doesn't work if foo is at the beginning of the string.
I hate regular expressions! :P

Use a negative lookbehind assertion.
>>> re.search('(?<!/)foo', 'foo')
<_sre.SRE_Match object at 0x7f44891518b8>
>>> re.search('(?<!/)foo', '/foo')
>>> re.search('(?<!/)foo', 'barfoo')
<_sre.SRE_Match object at 0x7f4489151850>
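Since the question is about substitution rather than searching, here is a minimal sketch of the same lookbehind used with re.sub. The variable name pattern and the third test string are only illustrative.

```python
import re

# (?<!/) asserts that the character just before "foo" is not a slash.
# It consumes nothing, so it also succeeds at the very start of the string.
pattern = re.compile(r"(?<!/)foo")

print(pattern.sub("bar", "foobar"))         # barbar
print(pattern.sub("bar", "/foobar"))        # /foobar
print(pattern.sub("bar", "foo /foo xfoo"))  # bar /foo xbar
```

Unlike [^/]foo, the lookbehind does not need a character to be present before foo, which is why the start-of-string case works.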

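Try using
\bfoo\b
\b is a word boundary: it matches at the start or end of the string and between a word character and a non-word character such as whitespace. Note that / is also a non-word character, so this pattern still matches the foo in /foo, and it does not match the foo in foobar.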
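To see how \b behaves on the inputs from the question, a small check along these lines may help (the name word is just for illustration):

```python
import re

word = re.compile(r"\bfoo\b")

# "/" is not a word character, so a boundary exists between "/" and "f".
print(bool(word.search("/foo")))     # True
# "foobar" has no boundary after "foo", so the whole-word pattern fails.
print(bool(word.search("foobar")))   # False
# Surrounded by whitespace, "foo" is matched as a whole word.
print(bool(word.search("a foo b")))  # True
```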

Related

Is there a Python re.sub that starts from the end?

Is there a version of Python's re.sub() that acts like str.rfind and begins searching from the last match occurrence?
I want to do a regex substitution on the last match in a string, but there doesn't seem to be an out-of-box solution in the stdlib.
If you mean it literally, no. That's not how the regex engine works.
You can either reverse the string and apply re.sub(pattern, sub, string, count=1) with a reversed pattern, as one of the comments suggested,
or construct a regex that matches only the last occurrence, like below:
>>> import re
>>> s = "hello hello hello hello hello world"
>>> re.sub(r"hello(?!.*hello.*$)", "hay", s)
'hello hello hello hello hay world'
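The reversal approach mentioned above could look roughly like this. It is only a sketch for a literal pattern: reversing a general regex (groups, quantifiers, lookarounds) is not this simple.

```python
import re

s = "hello hello hello hello hello world"

# Reverse the text, replace the first match of the reversed pattern,
# then reverse the result back. For a literal such as "hello" the
# reversed pattern is simply "olleh"; the replacement is reversed too.
reversed_result = re.sub("olleh", "hay"[::-1], s[::-1], count=1)
result = reversed_result[::-1]
print(result)  # hello hello hello hello hay world
```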
You could use re.sub in the ordinary way but start the regexp with a (.*) to match as much of the string as possible, and then in the replacement you could use \1 to include unchanged the part that the .* matched.
>>> re.sub("(.*)a", r"\1A", "bananas")
'bananAs'
(Note here the r to ensure that the \ is passed verbatim to re.sub and not treated as starting an escape sequence.)
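If you need this often, another option is to find the last match explicitly and splice the replacement in. The helper below is a hypothetical sketch (sub_last is not part of the re module), and it replaces the last non-overlapping match found by a left-to-right scan, which may differ from the greedy (.*) trick for some patterns.

```python
import re

def sub_last(pattern, repl, string):
    """Replace only the last non-overlapping match of pattern in string."""
    last = None
    for last in re.finditer(pattern, string):
        pass  # keep iterating; `last` ends up as the final match
    if last is None:
        return string  # no match, nothing to replace
    # Splice the expanded replacement in place of the last match.
    return string[:last.start()] + last.expand(repl) + string[last.end():]

print(sub_last("a", "A", "bananas"))                  # bananAs
print(sub_last("hello", "hay", "hello hello world"))  # hello hay world
```

Using match.expand means backreferences such as \1 in the replacement still work.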

How can Python's regular expressions work with patterns that have escaped special characters?

Is there a way to get Python's regular expressions to work with patterns that have escaped special characters? As far as my limited understanding can tell, the following example should work, but the pattern fails to match.
import re
string = r'This a string with ^g\.$s' # A string to search
pattern = r'^g\.$s' # The pattern to use
string = re.escape(string) # Escape special characters
pattern = re.escape(pattern)
print(re.search(pattern, string)) # This prints "None"
Note:
Yes, this question has been asked elsewhere (like here). But as you can see, I'm already implementing the solution described in the answers and it's still not working.
Why on earth are you applying re.escape to the string?! You want to find the "special" characters in that! If you just apply it to the pattern, you'll get a match:
>>> import re
>>> string = r'This a string with ^g\.$s'
>>> pattern = r'^g\.$s'
>>> re.search(re.escape(pattern), re.escape(string)) # nope
>>> re.search(re.escape(pattern), string) # yep
<_sre.SRE_Match object at 0x025089F8>
For bonus points, notice that you just need to re.escape the pattern one more time than the string:
>>> re.search(re.escape(re.escape(pattern)), re.escape(string))
<_sre.SRE_Match object at 0x025D8DE8>
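Put together as a script, the working version might look like this. The names text and literal are my own, chosen to keep the two roles apart.

```python
import re

text = r'This a string with ^g\.$s'  # the text to search, left untouched
literal = r'^g\.$s'                  # characters to find verbatim

# Escape only the pattern so ^, \, . and $ lose their regex meaning.
match = re.search(re.escape(literal), text)
print(match.group(0))                # ^g\.$s
print(literal in text)               # True; a plain substring test agrees
```

When the pattern is purely literal, as here, the in operator or str.find does the same job without any regex.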

How does re.search match raw strings?

re.search(r'c\.t', 'c.t abc') matches successfully to c.t. But the pattern being matched is c\.t, how is c.t matching to c\.t? What happened to the backslash?
Inside a regular expression, the dot character has a special meaning, which is that it can match any character at all other than a newline (unless the re.S/re.DOTALL flag is used). In this case, the backslash has the effect of escaping the dot from its special meaning and letting the regular expression engine interpret it as literally matching only a dot (and no other character). Consider if the backslash is not there:
>>> re.search(r'c.t', 'c.t abc')
<_sre.SRE_Match object at 0x7fe7378d8370>
The original string you provided as input still matches. But now the following will also match:
>>> re.search(r'c.t', 'I saw a cat')
<_sre.SRE_Match object at 0x7fe7378d83d8>
This is because the a in cat qualifies as a non-newline character, which is what . matches when it is not escaped with a backslash. If we add the backslash back in, it no longer matches:
>>> print(re.search(r'c\.t', 'I saw a cat'))
None
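As for what happened to the backslash: the raw string only stops Python from interpreting it, so it reaches the regex engine intact and is consumed there as the escape. A short sketch to check that, plus the newline behaviour mentioned above:

```python
import re

pattern = r'c\.t'
print(len(pattern))        # 4: c, backslash, dot, t
print(pattern == 'c\\.t')  # True: same string written without the r prefix

# The engine reads backslash + dot as "a literal dot".
print(re.search(pattern, 'c.t abc').group(0))            # c.t

# An unescaped dot skips newlines unless re.DOTALL is given.
print(re.search(r'c.t', 'c\nt'))                         # None
print(re.search(r'c.t', 'c\nt', re.DOTALL) is not None)  # True
```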
More on Python's implementation of regular expressions here:
Python 2.7.x: https://docs.python.org/2/library/re.html
Python 3.4.x: https://docs.python.org/3/library/re.html
Edited to reflect @cdarke's excellent point about newlines.

url dispatcher detect all and nothing with regexp

I am trying to create a URL dispatcher pattern which detects either a single word of [a-zA-Z] characters or nothing at all (an empty path).
I tried something like this, but only the word case works; the empty case does not.
url(r'(?P<search_word>[a-zA-Z].*?)/?$', 'website.views.index_view', name='website_index'),
What am I missing?
I think you want something like this instead (note the lack of a dot after [a-zA-Z]):
r'^(?P<search_word>[a-zA-Z]*)/?$'
In your original regex, .*? will allow for any character(s) (even spaces, for example). Also, [a-zA-Z] will only match a single character unless it is followed by an * or a +.
Here is an example of my regex using the re module:
>>> import re
>>> re.match(r'^(?P<search_word>[a-zA-Z]*)/?$', 'testString/')
<_sre.SRE_Match object at 0x02BF4F20> # matches 'testString/'
>>> re.match(r'^(?P<search_word>[a-zA-Z]*)/?$', 'test-String/') # does not match 'test-String/' because of the hyphen
>>> re.match(r'^(?P<search_word>[a-zA-Z]*)/?$', '') # also matches empty string ''
<_sre.SRE_Match object at 0x02BF44A0>
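To see what the named group would capture for each kind of path, here is a small sketch using only the re module. The sample paths and the name url_pattern are assumptions for illustration, and nothing here depends on the URL dispatcher itself.

```python
import re

url_pattern = re.compile(r'^(?P<search_word>[a-zA-Z]*)/?$')

# Show the captured search_word for each sample path, or None on no match.
for path in ['testString/', 'testString', '', 'test-String/']:
    m = url_pattern.match(path)
    captured = repr(m.group('search_word')) if m else None
    print(repr(path), '->', captured)
```

For the empty path the group is an empty string rather than missing, so a view receiving it would presumably need to treat '' as "no search word".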

Python re.match doesn't match the same regexp

I'm facing a weird problem; I hope nobody has asked this question before.
I need to match two regexps containing "(" and ")".
Here is the kind of tests I made to see why it's not working:
>>> import re
>>> re.match("a","a")
<_sre.SRE_Match object at 0xb7467218>
>>> re.match(re.escape("a"),re.escape("a"))
<_sre.SRE_Match object at 0xb7467410>
>>> re.escape("a(b)")
'a\\(b\\)'
>>> re.match(re.escape("a(b)"),re.escape("a(b)"))
=> No match
Can someone explain to me why the regexp doesn't match itself?
Thanks a lot
You've escaped the special characters in the pattern, so your regex matches the string "a(b)", not the string 'a\(b\)', which is the result of re.escape('a(b)').
The first argument is the pattern; the second is the actual string you are matching against. You shouldn't escape the string itself. Remember, re.escape escapes the special characters in a regexp.
>>> help(re.match)
Help on function match in module re:
match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.
>>> re.match(re.escape('a(b)'), 'a(b)')
<_sre.SRE_Match object at 0x10119ad30>
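It may also help to see what the unescaped pattern means. In this sketch (the variable names are mine), the parentheses in "a(b)" form a capture group, so the pattern matches the text "ab" and not the text "a(b)".

```python
import re

raw_pattern = "a(b)"               # parentheses act as a capture group
escaped = re.escape(raw_pattern)   # parentheses are now literal characters

print(re.match(raw_pattern, "ab").group(1))   # b
print(re.match(raw_pattern, "a(b)"))          # None
print(re.match(escaped, "a(b)").group(0))     # a(b)
```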
