url dispatcher detect all and nothing with regexp - python

I am trying to create a URL dispatcher pattern which detects a single word made of [a-zA-Z] characters, or nothing at all.
I tried something like this, but only the word case works; the empty case does not.
url(r'(?P<search_word>[a-zA-Z].*?)/?$', 'website.views.index_view', name='website_index'),
What am I missing?

I think you want something like this instead (note the lack of a dot after [a-zA-Z]):
ur'^(?P<search_word>[a-zA-Z]*)/?$'
In your original regex, .*? allows any characters at all (even spaces, for example). Also, [a-zA-Z] matches exactly one character unless it is followed by * or +, so your pattern requires at least one letter and can never match an empty path.
Here is an example of my regex using the re module:
>>> import re
>>> re.match(ur'^(?P<search_word>[a-zA-Z]*)/?$', 'testString/')
<_sre.SRE_Match object at 0x02BF4F20> # matches 'testString/'
>>> re.match(ur'^(?P<search_word>[a-zA-Z]*)/?$', 'test-String/') # does not match 'test-String/' because of the hyphen
>>> re.match(ur'^(?P<search_word>[a-zA-Z]*)/?$', '') # also matches empty string ''
<_sre.SRE_Match object at 0x02BF44A0>
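The ur'' prefix used above is only accepted by Python 2. As a minimal sketch for Python 3, assuming a plain raw string is acceptable for the pattern, the same check could look like this (the helper name search_word is made up for illustration):

```python
import re

# Same pattern as in the answer, written as a plain raw string
# (the ur'' prefix is Python 2 only).
PATTERN = re.compile(r'^(?P<search_word>[a-zA-Z]*)/?$')

def search_word(path):
    """Hypothetical helper: return the captured word, '' for an empty
    path, or None when the path does not match at all."""
    m = PATTERN.match(path)
    return m.group('search_word') if m else None

print(repr(search_word('testString/')))   # 'testString'
print(repr(search_word('')))              # ''
print(repr(search_word('test-String/')))  # None
```

The empty path yields an empty string rather than None, so a view can tell "no search word" apart from "invalid URL".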

Related

Regex to extract part of the url

I'm trying to extract part of a url using regex. Ideally I'd like to do this in one line and have it work for both url types.
I'm trying the following, but I'm not sure how to handle the second url. I am trying to extract the 4FHP from both.
>>> import re
>>>
>>> a="/url_redirect/4FHP"
>>> b="/url/4FHP/asdfasdfas/"
>>>
>>> re.search('^\/(url_redirect|url)\/(.*)', a).group(2)
'4FHP'
>>> re.search('^\/(url_redirect|url)\/(.*)', b).group(2)
'4FHP/asdfasdfas/'
The following code will extract 4FHP from either string. Notice that I changed .* (match a sequence of any non-newline characters) to [^/]* (match a sequence of any non-/ characters).
re.search('^\/(url_redirect|url)\/([^/]*)', b).group(2)
Your problem is that . also matches / and the * operator is greedy, so .* grabs everything up to the end of the string, which is why you get '4FHP/asdfasdfas/' in your second example.
You need to stop matching when you see another /. The easiest way is to use a character class that specifically excludes it, e.g. [^/].
You can also use a non-capturing group, (?:<regex>), so that only the group you are interested in is returned:
re.search('^\/(?:url_redirect|url)\/([^/]*)', b).group(1)
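To check both url types in one place, here is a small sketch built on the non-capturing version above. The function name extract_code is hypothetical, and returning None for an unrecognised path is an assumption about what the caller wants:

```python
import re

# Non-capturing alternation for the prefix, then a single capture
# group that stops at the next '/' (or at the end of the string).
URL_CODE = re.compile(r'^/(?:url_redirect|url)/([^/]*)')

def extract_code(path):
    """Hypothetical helper: return the segment after the prefix,
    or None if the path has neither prefix."""
    m = URL_CODE.search(path)
    return m.group(1) if m else None

print(extract_code("/url_redirect/4FHP"))     # 4FHP
print(extract_code("/url/4FHP/asdfasdfas/"))  # 4FHP
```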

How can Python's regular expressions work with patterns that have escaped special characters?

Is there a way to get Python's regular expressions to work with patterns that have escaped special characters? As far as my limited understanding can tell, the following example should work, but the pattern fails to match.
import re
string = r'This a string with ^g\.$s' # A string to search
pattern = r'^g\.$s' # The pattern to use
string = re.escape(string) # Escape special characters
pattern = re.escape(pattern)
print(re.search(pattern, string)) # This prints "None"
Note:
Yes, this question has been asked elsewhere (like here). But as you can see, I'm already implementing the solution described in the answers and it's still not working.
Why on earth are you applying re.escape to the string?! You want to find the "special" characters in that! If you just apply it to the pattern, you'll get a match:
>>> import re
>>> string = r'This a string with ^g\.$s'
>>> pattern = r'^g\.$s'
>>> re.search(re.escape(pattern), re.escape(string)) # nope
>>> re.search(re.escape(pattern), string) # yep
<_sre.SRE_Match object at 0x025089F8>
For bonus points, notice that you just need to re.escape the pattern one more time than the string:
>>> re.search(re.escape(re.escape(pattern)), re.escape(string))
<_sre.SRE_Match object at 0x025D8DE8>
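The three cases can be put side by side in a short script. This is only a sketch of the behaviour described above; the variable names text and literal are chosen for illustration:

```python
import re

text = r'This a string with ^g\.$s'   # the string to search
literal = r'^g\.$s'                   # the text we want to find verbatim

# Escape only the pattern: every special character is matched literally.
m = re.search(re.escape(literal), text)
print(m.group(0) == literal)          # True

# Escaping the searched string as well adds backslashes to it,
# so the literal text is no longer present in it.
print(re.search(re.escape(literal), re.escape(text)))   # None

# Escaping the pattern one more time than the string matches again.
print(re.search(re.escape(re.escape(literal)), re.escape(text)) is not None)  # True
```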

How does re.search match raw strings?

re.search(r'c\.t', 'c.t abc') successfully matches c.t. But the pattern is c\.t, so how does c.t match c\.t? What happened to the backslash?
Inside a regular expression, the dot character has a special meaning, which is that it can match any character at all other than a newline (unless the re.S/re.DOTALL flag is used). In this case, the backslash has the effect of escaping the dot from its special meaning and letting the regular expression engine interpret it as literally matching only a dot (and no other character). Consider if the backslash is not there:
>>> re.search(r'c.t', 'c.t abc')
<_sre.SRE_Match object at 0x7fe7378d8370>
The original string you provided as input still matches. But now the following will also match:
>>> re.search(r'c.t', 'I saw a cat')
<_sre.SRE_Match object at 0x7fe7378d83d8>
This is because the a in cat qualifies as any non-newline character, which is what . matches when it is not escaped with a backslash. If we add the backslash back in, it no longer matches:
>>> print(re.search(r'c\.t', 'I saw a cat'))
None
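The newline exception and the role of the raw string can be seen in a short sketch. The test strings here are made up for illustration:

```python
import re

# The raw string keeps the backslash: the pattern is 4 characters long,
# and it is the regex engine (not Python) that interprets "\." as a literal dot.
print(len(r'c\.t'))                                      # 4

# By default an unescaped '.' does not match a newline...
print(re.search(r'c.t', 'c\nt'))                         # None

# ...but it does when the re.DOTALL (re.S) flag is given.
print(re.search(r'c.t', 'c\nt', re.DOTALL) is not None)  # True
```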
More on Python's implementation of regular expressions here:
Python 2.7.x: https://docs.python.org/2/library/re.html
Python 3.4.x: https://docs.python.org/3/library/re.html
Edited to reflect @cdarke's excellent point about newlines.

How to find a specific character in a string and put it at the end of the string

I have this string:
'Is?"they'
I want to find the question mark (?) in the string, and put it at the end of the string. The output should look like this:
'Is"they?'
I am using the following regular expression in Python 2.7, and I don't know why it is not working.
import re
regs = re.sub('(\w*)(\?)(\w*)', '\\1\\3\\2', 'Is?"they')
print regs
Is?"they # this is the output of my regex.
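Your regex does match, but only the Is? part: " is not in the \w character class, so the third group captures nothing and the substitution leaves the string unchanged. You would need to let the third group accept the quote as well, for example:
regs = re.sub('(\w*)(\?)(["\w]*)', '\\1\\3\\2', 'Is?"they')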
As shown here, " is not captured by \w. Hence, it would probably be best to just use a .:
>>> import re
>>> re.sub("(.*)(\?)(.*)", r'\1\3\2', 'Is?"they')
'Is"they?'
>>>
. matches any character in a regex (except newlines, by default).
Also, you'll notice that I used a raw-string for the second argument of re.sub. Doing so is cleaner than having all those backslashes.
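Inspecting the captured groups shows where the two patterns differ. This is a small sketch written with Python 3 syntax; the variable names are chosen for illustration:

```python
import re

s = 'Is?"they'

# With \w* the third group stops immediately, because '"' is not a word character.
word_groups = re.search(r'(\w*)(\?)(\w*)', s).groups()
print(word_groups)   # ('Is', '?', '')

# With .* the third group takes the rest of the string, quote included.
dot_groups = re.search(r'(.*)(\?)(.*)', s).groups()
print(dot_groups)    # ('Is', '?', '"they')

# Reordering the groups then moves the question mark to the end.
print(re.sub(r'(.*)(\?)(.*)', r'\1\3\2', s))   # Is"they?
```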

Python re.match doesn't match the same regexp

I'm facing a weird problem; I hope nobody has asked this question before.
I need to match two regexps containing "(" and ")".
Here is the kind of tests I made to see why it's not working:
>>> import re
>>> re.match("a","a")
<_sre.SRE_Match object at 0xb7467218>
>>> re.match(re.escape("a"),re.escape("a"))
<_sre.SRE_Match object at 0xb7467410>
>>> re.escape("a(b)")
'a\\(b\\)'
>>> re.match(re.escape("a(b)"),re.escape("a(b)"))
=> No match
Can someone explain to me why the regexp doesn't match itself?
Thanks a lot.
You've escaped the special characters, so your regex will match the string "a(b)", not the string 'a\(b\)', which is the result of re.escape('a(b)').
The first argument is the pattern; the second is the actual string you are matching against. You shouldn't escape the string itself. Remember, re.escape escapes special characters so that they are matched literally in a regexp.
>>> help(re.match)
Help on function match in module re:
match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning
a match object, or None if no match was found.
>>> re.match(re.escape('a(b)'), 'a(b)')
<_sre.SRE_Match object at 0x10119ad30>
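As a rough illustration of the difference, the following sketch compares the unescaped and escaped patterns against a few inputs (the variable names are made up):

```python
import re

raw = 'a(b)'
escaped = re.escape(raw)          # 'a\\(b\\)': parentheses are now literal

# Unescaped, the parentheses form a capture group, so the pattern
# matches 'ab' and not the text 'a(b)'.
print(re.match(raw, 'ab').group(1))    # b
print(re.match(raw, raw))              # None

# Escaped pattern against the plain string: matches.
print(re.match(escaped, raw) is not None)   # True

# Escaped pattern against the escaped string: the string now contains
# backslashes that the pattern does not expect, so there is no match.
print(re.match(escaped, re.escape(raw)))    # None
```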
