What does this Django regular expression mean? `?P` - python

I have the following regular expression (regex) in my urls.py and I'd like to know what it means. Specifically the (?P<category_slug> portion of the regex.
r'^category/(?P<category_slug>[-\w]+)/$'

In Django, named capturing groups are passed to your view as keyword arguments.
Unnamed capturing groups (plain parentheses) are passed to your view as positional arguments.
The ?P<name> syntax marks a named capturing group, as opposed to an unnamed capturing group.
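A rough sketch of that dispatch, using only the standard re module rather than Django itself. The resolve helper, the two view functions and the archive pattern are hypothetical names made up for illustration; Django's real resolver does more (for one thing, it also passes the request object as the first argument).

```python
import re

def resolve(pattern, path, view):
    """Hypothetical mini-resolver: call view with the groups captured from path."""
    match = re.search(pattern, path)
    if match is None:
        return None
    kwargs = match.groupdict()
    if kwargs:
        # Named groups become keyword arguments.
        return view(**kwargs)
    # With no named groups, unnamed groups become positional arguments.
    return view(*match.groups())

def show_category(category_slug):
    return "category: " + category_slug

def show_archive(year, month):
    return "archive: " + year + "/" + month

print(resolve(r'^category/(?P<category_slug>[-\w]+)/$', 'category/foo-bar/', show_category))
print(resolve(r'^archive/(\d{4})/(\d{2})/$', 'archive/2009/03/', show_archive))
```

The first call reaches show_category through a keyword argument, the second reaches show_archive through positional arguments.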
http://docs.python.org/library/re.html
(?P<name>...) Similar to regular parentheses, but the substring
matched by the group is accessible within the rest of the regular
expression via the symbolic group name name. Group names must be valid
Python identifiers, and each group name must be defined only once
within a regular expression. A symbolic group is also a numbered
group, just as if the group were not named. So the group named id in
the example below can also be referenced as the numbered group 1.

(?P<name>regex) - Round brackets group the regex between them. They capture the text matched by the regex inside them that can be referenced by the name between the sharp brackets. The name may consist of letters and digits.
Copied from: http://www.regular-expressions.info/refext.html

(?P<category_slug>) creates a match group named category_slug.
The regex itself matches a string starting with category/ and then a mix of alphanumeric characters, the dash - and the underscore _, followed by a trailing slash.
Example URLs accepted by the regex:
category/foo/
category/foo_bar-baz/
category/12345/
category/q1e2_asdf/
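If you want to check this yourself, something like the following should do. It runs the pattern over the URLs above plus a few paths I made up that should be rejected; it uses plain re, not Django.

```python
import re

pattern = re.compile(r'^category/(?P<category_slug>[-\w]+)/$')

candidates = [
    'category/foo/',
    'category/foo_bar-baz/',
    'category/12345/',
    'category/q1e2_asdf/',
    'category/foo',        # no trailing slash
    'category/foo bar/',   # a space is not in [-\w]
    'category//',          # the slug needs at least one character
]

results = {}
for path in candidates:
    match = pattern.match(path)
    # Store the captured slug, or None when the path does not match.
    results[path] = match.group('category_slug') if match else None
    print(repr(path), '->', results[path])
```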

In URL patterns, use this pattern to capture a string:
(?P<username2>[-\w]+)
and this one to capture an integer value:
(?P<user_id>[0-9]+)
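One thing worth keeping in mind with these two patterns: a regex capture is always a string, so the digits-only pattern restricts what can match but does not convert anything. A small sketch with plain re; the user/ prefix and the sample paths are assumptions for illustration.

```python
import re

match = re.search(r'^user/(?P<user_id>[0-9]+)/$', 'user/42/')
user_id = match.group('user_id')
print(type(user_id).__name__, user_id)   # the capture is a str, not an int
print(int(user_id) + 1)                  # convert explicitly before doing arithmetic

# [0-9]+ rejects anything that is not purely digits ...
digits_only = re.search(r'^user/(?P<user_id>[0-9]+)/$', 'user/abc/')
print(digits_only)

# ... while [-\w]+ accepts letters, digits, underscores and dashes.
any_slug = re.search(r'^user/(?P<username2>[-\w]+)/$', 'user/abc/')
print(any_slug.group('username2'))
```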

(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.
Copied from the Python 3 re module documentation.

Related

`(?P<name>...) ` and `\g<quote>` in re module

When reading the documentation of Python's re module, (?P<name>...) has always confused me.
From the answers to the Stack Overflow question Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?, my understanding is that the P is essentially an arbitrary marker, much like foo, bar or zoo.
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.
Named groups can be referenced in three contexts. If the pattern is (?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either single or double quotes):
I often make mistakes when using \g<quote> in the repl argument of re.sub.
I try to be Pythonic and to explain the underlying reasons to others, so I would like to know:
why is it g rather than p in \g<quote>, and why not G in (?P<name>...)?
Either choice would be consistent rather than chaotic.
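For reference, here is a small sketch of the three contexts with the quote pattern from the documentation excerpt. The sample text is made up.

```python
import re

pattern = re.compile(r'''(?P<quote>['"]).*?(?P=quote)''')
text = '''say "hello" or 'bye' now'''

# 1. Inside the same pattern: (?P=quote) matches whatever the group matched.
first = pattern.search(text)
print(first.group(0))

# 2. On a match object: look the group up by name (or by number).
print(first.group('quote'), first.group(1))

# 3. In the repl argument of re.sub: \g<quote> (or \1).
masked = pattern.sub(r'\g<quote>...\g<quote>', text)
print(masked)
```

So the P form only appears inside the pattern itself, while \g<name> only appears in replacement strings.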

Extracting version number at the end of a filename with regular expression

I have a list of filenames, some of which end with a version number just before the extension. I'm trying to extract the version number using a single regular expression:
filename.doc --> NULL
filename.1.0.doc --> 1.0
filename.2.0.pdf --> 2.0
filename.3.0.docx --> 3.0
So far, I found that the following regex extracts it along with the extension:
[0-9]+\.[0-9]+\.(docx|pdf|rtf|doc|docm)$
But I'd rather not have the extension. So what I'm looking for is the [0-9]+\.[0-9]+ just before the last occurrence of a dot in the string, but I can't find how to do that.
Thanks for your help!
what I'm searching is for the [0-9]+\.[0-9]+ just before the last occurrence of a dot in the string
You may use
r'[0-9]+\.[0-9]+(?=\.[^.]*$)'
Details
[0-9]+\.[0-9]+ - 1+ digits, . and 1+ digits
(?=\.[^.]*$) - a positive lookahead that requires ., then 0+ chars other than . and the end of the string immediately to the right of the current location.
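A quick way to try the lookahead pattern against the filenames from the question (a sketch; None stands in for the NULL case):

```python
import re

version_re = re.compile(r'[0-9]+\.[0-9]+(?=\.[^.]*$)')

names = ['filename.doc', 'filename.1.0.doc', 'filename.2.0.pdf', 'filename.3.0.docx']
versions = {}
for name in names:
    match = version_re.search(name)
    # The lookahead is not part of the match, so group(0) is the version alone.
    versions[name] = match.group(0) if match else None
    print(name, '-->', versions[name])
```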
Python regexes have named groups:
A more significant feature is named groups: instead of referring to them by numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions: (?P<name>...). name is, obviously, the name of the group. Named groups behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways:
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
So in your case you can modify your regex as:
(?P<version>[0-9]+\.[0-9]+)\.(docx|pdf|rtf|doc|docm)$
and use:
found.group('version')
to select version from the found regex match.
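Put together, that could look roughly like this. The extract_version helper is a name I made up; it returns None when the filename has no version.

```python
import re

version_re = re.compile(r'(?P<version>[0-9]+\.[0-9]+)\.(docx|pdf|rtf|doc|docm)$')

def extract_version(filename):
    """Return the version string before the extension, or None if there is none."""
    found = version_re.search(filename)
    return found.group('version') if found else None

for name in ['filename.doc', 'filename.1.0.doc', 'filename.3.0.docx']:
    print(name, '-->', extract_version(name))
```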
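Try this:
import re

try:
    # Take the first number-like token; [0] raises IndexError when there is none.
    version = [float(s) for s in re.findall(r'-?\d+\.?\d*', 'filename.1.0.doc')][0]
    print(version)
except IndexError:
    pass
Here, if the filename contains a number, it is stored in the variable version; otherwise the except branch simply passes. Note that this picks up the first number anywhere in the name, so a digit in the base name (for example report2.doc) would be reported as a version.
This should work! :)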

Named non-capturing group in python?

Is it possible to have a named non-capturing group in Python? For example, I want to match strings in this pattern (including the quotes):
"a=b"
'bird=angel'
I can do the following:
import re
s = '"bird=angel"'
myre = re.compile(r'(?P<quote>[\'"])(\w+)=(\w+)(?P=quote)')
m = myre.search(s)
m.groups()
# ('"', 'bird', 'angel')
The result captures the quote group, which is not desirable here.
No, named groups are always capturing groups. From the documentation of the re module:
Extensions usually do not create a new group; (?P<name>...) is the
only exception to this rule.
And regarding the named group extension:
Similar to regular parentheses, but the substring matched by the group
is accessible within the rest of the regular expression via the
symbolic group name name
Where regular parentheses means (...), in contrast with (?:...).
You do need a capturing group in order to match the same quote: there is no other mechanism in re that allows you to do this, short of explicitly distinguishing the two quotes:
myre = re.compile('"{0}"' "|'{0}'".format(r'(\w+)=(\w+)'))
(which has the downside of giving you four groups, two for each style of quotes).
Note that one does not need to give a name to the quotes, though:
myre = re.compile(r'([\'"])(\w+)=(\w+)\1')
works as well.
In conclusion, you are better off using groups()[1:] in order to get only what you need, if at all possible.
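A short sketch of both ways to ignore the quote group. The key and value group names in the second variant are my own choice, not something from the question.

```python
import re

s = '"bird=angel"'

# Option 1: keep the quote group but slice it off the result.
myre = re.compile(r'(?P<quote>[\'"])(\w+)=(\w+)(?P=quote)')
m = myre.search(s)
print(m.groups()[1:])

# Option 2: name the groups you care about and ask for them explicitly.
named = re.compile(r'([\'"])(?P<key>\w+)=(?P<value>\w+)\1')
m2 = named.search(s)
print(m2.group('key', 'value'))

# Either way, mismatched quotes are rejected by the backreference.
print(named.search('"bird=angel\''))
```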

Parentheses in regular expression pattern when splitting a string

I would like to know the reason behind the following behaviour:
>>> re.compile("(b)").split("abc")[1]
'b'
>>> re.compile("b").split("abc")[1]
'c'
It seems that when I add parentheses around the splitting pattern, re adds the matched text to the split array. But why? Is it something consistent, or simply an isolated feature of regular expressions?
It's a feature of re.split, according to the documentation:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
In general, parentheses denote capture groups and are used to extract certain parts of a string. Read more about capture groups.
In any regular expression, parentheses denote a capture group. Capture groups are typically used to extract values from the matched string (in conjunction with re.match or re.search). For details, refer to the official documentation (search for (...)).
re.split inserts the matched groups between the split values:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
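To see the difference side by side, including a non-capturing group (?:...) for when you need parentheses but do not want the separator in the result. The arithmetic string in the last line is just a made-up example of where keeping separators is handy.

```python
import re

plain = re.split(r'b', 'abc')
captured = re.split(r'(b)', 'abc')
non_capturing = re.split(r'(?:b)', 'abc')
tokens = re.split(r'\s*([+-])\s*', '1 + 2 - 3')

print(plain)          # separator dropped
print(captured)       # separator kept, because it is captured
print(non_capturing)  # grouped but not captured, so dropped again
print(tokens)         # operators kept between the operands
```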

What does this URL pattern mean in Django?

This is my code:
(r'^q/(?P<terminal_id>[^/]+)/(?P<cmd_type>[^/]+)/?$', 'send_query_cmd'),
The view is:
def send_query_cmd(request, terminal_id, cmd_type):
What does the ?P mean?
I don't understand what this URL pattern means.
Thanks.
(?P<id>REGEXP) is the syntax for python regular expression named group capturing.
http://docs.python.org/library/re.html ->> scroll down to (?P...
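As for what the P stands for: Python. The Python Regular Expression HOWTO explains that a P right after the question mark marks an extension that is specific to Python.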
Anyways, these same regular expressions are what the django URL resolver uses to match a URL to a view, along with capturing named groups as arguments to your view function.
http://docs.djangoproject.com/en/dev/topics/http/urls/#captured-parameters
The simplest example is this:
(r'^view/(?P<post_number>\d+)/$', 'foofunc'),
# we're capturing a very simple regular expression \d+ (one or more digits) as post_number
# to be passed on to foofunc
def foofunc(request, post_number):
    print(post_number)
# visiting /view/3/ would print 3.
It comes from Python regular expression syntax. The (?P<name>...) syntax is a named group. This means that the matched text is available using the given name or, in Django, as a named parameter in your view function. If you just use brackets without the ?P<name>, then it's an unnamed group and is available using an integer, which is the position of the group in the pattern.
Your URL regex means the following...
^ - match the start of the string
q/ - match a q followed by a slash
(?P<terminal_id>[^/]+) - match at least one character that isn't a slash, give it the name terminal_id
/ - match a slash
(?P<cmd_type>[^/]+) - match at least one character that isn't a slash, give it the name cmd_type
/? - optionally match a slash
$ - match the end of the string
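The same breakdown can be written as a commented pattern with re.VERBOSE and tried outside Django. This is only a sketch: the view below is a stand-in that returns its arguments, and the sample paths (T100, status) are invented.

```python
import re

url_re = re.compile(r'''
    ^                           # start of the string
    q/                          # a literal "q/"
    (?P<terminal_id>[^/]+)      # one or more non-slash characters, named terminal_id
    /                           # a slash
    (?P<cmd_type>[^/]+)         # one or more non-slash characters, named cmd_type
    /?                          # an optional trailing slash
    $                           # end of the string
''', re.VERBOSE)

def send_query_cmd(request, terminal_id, cmd_type):
    # Stand-in for the real view: just report what it was called with.
    return (terminal_id, cmd_type)

match = url_re.match('q/T100/status/')
# The named groups arrive as keyword arguments, as they would from the URL resolver.
result = send_query_cmd(None, **match.groupdict())
print(result)

print(url_re.match('q/T100/status') is not None)   # trailing slash is optional
print(url_re.match('q/T100/') is not None)         # cmd_type is missing
```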
