This is my code:
(r'^q/(?P<terminal_id>[^/]+)/(?P<cmd_type>[^/]+)/?$', 'send_query_cmd'),
The view is:
def send_query_cmd(request, terminal_id, cmd_type):
What does ?P mean?
I don't know what this URL pattern means.
Thanks
(?P<id>REGEXP) is the syntax for python regular expression named group capturing.
http://docs.python.org/library/re.html ->> scroll down to (?P...
As for what the P stands for.. parameter? python? Origin sounds fun.
Anyway, these same regular expressions are what the Django URL resolver uses to match a URL to a view, passing the captured named groups as arguments to your view function.
http://docs.djangoproject.com/en/dev/topics/http/urls/#captured-parameters
The simplest example is this:
(r'^view/(?P<post_number>\d+)/$', 'foofunc'),
# we're capturing a very simple regular expression \d+ (one or more digits) as post_number
# to be passed on to foofunc
def foofunc(request, post_number):
    print(post_number)
# visiting /view/3/ would print 3.
It comes from Python regular expression syntax. The (?P<name>...) syntax is a named group. This means that the matched text is available using the given name or, in Django, as a named parameter in your view function. If you just use parentheses without the ?P<name>, it's an unnamed group and is available using an integer, which is the order in which the group was captured.
Your URL regex means the following...
^ - match the start of the string
q/ - match a q followed by a slash
(?P<terminal_id>[^/]+) - match at least one character that isn't a slash, give it the name terminal_id
/ - match a slash
(?P<cmd_type>[^/]+) - match at least one character that isn't a slash, give it the name cmd_type
/? - optionally match a slash
$ - match the end of the string
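The breakdown above can be checked with plain Python `re`, outside Django. This is only a sketch: the path `q/T1000/status/` is made up for illustration, and the leading slash is left off because the pattern itself starts at `q/`.

```python
import re

# The URL pattern from the question, compiled with the standard re module.
pattern = re.compile(r'^q/(?P<terminal_id>[^/]+)/(?P<cmd_type>[^/]+)/?$')

# 'q/T1000/status/' is a hypothetical path used only for this example.
match = pattern.match('q/T1000/status/')
captured = match.groupdict()
print(captured)        # {'terminal_id': 'T1000', 'cmd_type': 'status'}
print(match.group(1))  # named groups are still numbered: 'T1000'

# An empty segment fails because [^/]+ needs at least one character.
no_match = pattern.match('q//status/')
print(no_match)        # None
```

The dictionary from groupdict() has the same keys as the parameter names of send_query_cmd, which is why the names in the regex and in the view signature have to agree.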
I have a list of filenames, some of which end with a version number at the end. I'm trying to extract the version number using a single regular expression:
filename.doc --> NULL
filename.1.0.doc --> 1.0
filename.2.0.pdf --> 2.0
filename.3.0.docx --> 3.0
So far, I found that the following regex extracts it along with the extension:
[0-9]+\.[0-9]+\.(docx|pdf|rtf|doc|docm)$
But I'd rather not have the extension. So what I'm searching is for the [0-9]+\.[0-9]+ just before the last occurrence of a dot in the string, but I can't find how to do that.
Thanks for your help!
what I'm searching is for the [0-9]+\.[0-9]+ just before the last occurrence of a dot in the string
You may use
r'[0-9]+\.[0-9]+(?=\.[^.]*$)'
See the regex demo.
Details
[0-9]+\.[0-9]+ - 1+ digits, . and 1+ digits
(?=\.[^.]*$) - a positive lookahead that requires ., then 0+ chars other than . and the end of the string immediately to the right of the current location.
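As a rough check of the lookahead pattern, here it is run against the four filenames from the question. The helper name extract_version is my own and not part of the answer.

```python
import re

# Digits, a dot, digits, but only when followed by the final ".extension".
version_re = re.compile(r'[0-9]+\.[0-9]+(?=\.[^.]*$)')

def extract_version(filename):
    """Return the version string before the extension, or None if absent."""
    m = version_re.search(filename)
    return m.group(0) if m else None

filenames = ['filename.doc', 'filename.1.0.doc',
             'filename.2.0.pdf', 'filename.3.0.docx']
results = [extract_version(name) for name in filenames]
print(results)  # [None, '1.0', '2.0', '3.0']
```

Because the lookahead does not consume anything, group(0) contains only the version and never the extension.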
Python regexes have named groups:
A more significant feature is named groups: instead of referring to them by numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions: (?P<name>...). name is, obviously, the name of the group. Named groups behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways:
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
So in your case you can modify your regex as:
(?P<version>[0-9]+\.[0-9]+)\.(docx|pdf|rtf|doc|docm)$
and use:
found.group('version')
to select version from the found regex match.
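A minimal sketch of that named-group variant, assuming the filenames from the question. The variable names are mine; found mirrors the name used above.

```python
import re

# Named group for the version, plain numbered group for the extension.
named_re = re.compile(r'(?P<version>[0-9]+\.[0-9]+)\.(docx|pdf|rtf|doc|docm)$')

found = named_re.search('filename.1.0.doc')
version = found.group('version')
extension = found.group(2)  # the unnamed group is still reachable by number
print(version, extension)   # 1.0 doc

# With no version in the name, search() returns None, so check before .group().
missing = named_re.search('filename.doc')
print(missing)              # None
```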
Try this:
import re
try:
    version = [float(s) for s in re.findall(r'-?\d+\.?\d*', 'filename.1.0.doc')][0]
    print(version)
except IndexError:
    # no number was found in the filename
    pass
Here, if the filename contains a number, it is stored in the variable version; otherwise the IndexError is caught and nothing happens.
This should work! :)
Hello, I am a newbie currently trying to learn regex by experimenting with various patterns. I tried to create a regex pattern for this URL but failed. It's a pagination link from Amazon.
http://www.amazon.in/s/lp_6563520031_pg_2?rh=n%3A5866078031%2Cn%3A%215866079031%2Cn%3A6563520031&page=2s&ie=UTF8&qid=1446802571
Or
http://www.amazon.in/Tena-Wet-Wipe-Pulls-White/dp/B001O1G242/ref=sr_1_46?s=industrial&ie=UTF8&qid=1446802608&sr=1-46
I just want to check the URL for these two things:
whether the URL has a dp directory or a product directory
whether the URL has a page query string parameter containing any digit
I tried to create the regex pattern but failed. I want that if the first thing is not there the regex pattern should match the second (or vice versa).
Here's the regex pattern I made:
.*\/(dp|product)\/ | .*page
Here is my regex101 link: https://regex101.com/r/zD2gP5/1#python
Since you just want to check if a string contains some pattern, you can use
\/(?:dp|product)\/|[&?]page=
See regex demo
In Python, just check with re.search:
import re
p = re.compile(r'/(?:dp|product)/|[&?]page=')
test_str = "http://w...content-available-to-author-only...n.in/s/lp_6563520031_pg_2?rh=n%3A5866078031%2Cn%3A%215866079031%2Cn%3A6563520031&page=2s&ie=UTF8&qid=14468025716"
if p.search(test_str):
    print("Found!")
Also, in Python regex patterns, there is no need to escape / slashes.
The regex matches two alternative subpatterns (\/(?:dp|product)\/ and [&?]page=):
/ - a forward slash
(?:dp|product) - either dp or product (without storing the capture inside the capture buffer since it is a non-capturing group)
/ - a slash
| - or...
[&?] - either a & or ? (we check the start of a query string parameter)
page= - literal sequence of symbols page=.
\/(dp|product)\/|page=(?=[^&]*\d)[^&]+
This would be my idea. Please test it and let me know if you have any questions about it.
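One way to test this second pattern is to run it over the two URLs from the question plus a counter-example. The example.com URL is invented for the test, and the slashes are left unescaped since Python does not need them escaped.

```python
import re

# Either a /dp/ or /product/ segment, or a page= value containing a digit.
check_re = re.compile(r'/(dp|product)/|page=(?=[^&]*\d)[^&]+')

listing_url = ('http://www.amazon.in/s/lp_6563520031_pg_2?rh=n%3A5866078031'
               '%2Cn%3A%215866079031%2Cn%3A6563520031&page=2s&ie=UTF8&qid=1446802571')
product_url = ('http://www.amazon.in/Tena-Wet-Wipe-Pulls-White/dp/B001O1G242/'
               'ref=sr_1_46?s=industrial&ie=UTF8&qid=1446802608&sr=1-46')
other_url = 'http://example.com/list?page=next'  # hypothetical: page= without a digit

listing_hit = check_re.search(listing_url).group(0)
product_hit = check_re.search(product_url).group(0)
other_hit = check_re.search(other_url)

print(listing_hit)  # page=2s
print(product_hit)  # /dp/
print(other_hit)    # None
```

The lookahead (?=[^&]*\d) is what enforces the "any digit" requirement: it scans the parameter value up to the next & and fails if no digit is present.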
I have some confusion regarding the pattern matching in the following expression. I tried to look up online but couldn't find an understandable solution:
imgurUrlPattern = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')
What exactly are the parentheses doing ? I understood up until the first asterisk , but I can't figure out what is happening after that.
Regular expressions can be represented as graphs to understand their operation. A parallel connection between nodes indicates that something is optional, a serial connection indicates that it is mandatory, and a loop indicates repetition over the same node.
(http://i.imgur.com/(.*))(\?.*)?
So this starts with an imgur URL http://i.imgur.com/(.*) (mandatory) followed by any characters until an optional '?' is encountered, and then any characters after the '?'. Notice that the '?' has been escaped so that it loses its special meaning. The pink highlights indicate the capture groups.
(http://i.imgur.com/(.*))(\?.*)?
The first capturing group (http://i.imgur.com/(.*)) means that the string should start with http://i.imgur.com/ followed by any number of characters (.*) (this is a poor regex, you shouldn't do it this way). (.*) is also the second capturing group.
The third capturing group (\?.*) means that this part of the string must start with ? and then contain any number of any characters, as above.
The last ? means that the last capturing group is optional.
EDIT:
These groups can then be used as:
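import re
p = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')
m = p.match('http://i.imgur.com/abc.jpg')  # example URL chosen for illustration
m.group(0)  # the whole match: 'http://i.imgur.com/abc.jpg'
m.group(2)  # the text captured by (.*): 'abc.jpg'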
To improve the regex, you must limit the engine to the characters you need, like:
(http://i.imgur.com/([A-Za-z0-9\-]+))(\?[^/]+)?
[A-Za-z0-9\-]+ limits the match to alphanumeric characters and hyphens
[^/] excludes /
The (.*) means any character repeated any number of times; the (\?.*)? matches the query string of a URL, for example (an imgur search for "cat"):
http://imgur.com/search?q=cat
http://imgur.com/search is matched by the (http://i.imgur.com/(.*)) (the search is specifically matched by the (.*)) section of the regex. The ?q=cat is matched by the (\?.*)? of the regex. In the regex, the ? at the end means optional, so there may or may not be a query string. There is no query string in the URL http://www.imgur.com. The parentheses are used for grouping. We want to group (http://i.imgur.com/(.*)) as one thing because it matches the URL, and there is another group within it that matches the page you are requesting (this is (.*)). We want to group (\?.*)? because it matches the query string.
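To see which text each group actually captures, it helps to run the pattern. One thing worth noting: because (.*) is greedy and the pattern has no end anchor, the second group swallows the query string as well and the third group stays empty. The lazy, anchored variant below is my own assumption about how the intended split could be obtained, and the URL is made up.

```python
import re

url = 'http://i.imgur.com/abc.jpg?1'  # hypothetical URL for illustration

# Pattern from the question: greedy (.*) takes everything, including "?1".
greedy = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')
greedy_groups = greedy.match(url).groups()
print(greedy_groups)
# ('http://i.imgur.com/abc.jpg?1', 'abc.jpg?1', None)

# Assumed variant: lazy (.*?) plus $ leaves the query string for group 3.
lazy = re.compile(r'(http://i\.imgur\.com/(.*?))(\?.*)?$')
lazy_groups = lazy.match(url).groups()
print(lazy_groups)
# ('http://i.imgur.com/abc.jpg', 'abc.jpg', '?1')
```

Groups are numbered by the position of their opening parenthesis, which is why the outer URL group is 1 and the nested (.*) is 2.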
I want a regular expression to grab URLs that do not contain a specific word in their domain name, regardless of whether that word appears in the query string or in subdirectories of the domain. It also doesn't matter how the URL starts, for example with http, ftp, https, or none of them. I found this expression, ^((?!foo).)*$, but I don't know how I should change it to fit these conditions.
These are accepted URLs for the word "foo":
whatever.whatever.whatever/foo/pic
whatever.whatever.whatever?sdfd="foo"
and these are not accepted:
whatever.whateverfoo.whatever
whatever.foowhatever.whatever
whatever.foo.whatever.whatever
whatever.whatever.foo.whatever
Try this (explanation):
^(?:(?!foo).)*?[\/\?]
What this means is basically:
match anything not containing foo
until a slash or question mark is encountered
The precise syntax may vary depending on your programming language/editor. The explanation link shows the PHP example. The regex elements I've used are pretty common, so it should work for you. If not, let me know.
This regex can only be matched against a single URL at a time. So if you are trying this in regex101, don't enter all URLs at once.
Update: Example in Java (now using turner instead of foo):
Pattern p = Pattern.compile("^(?:(?!turner).)*?[\\/\\?].*");
System.out.println(p.matcher(
"i.cdn.turner.com/cnn/.e/img/3.0/1px.gif").matches());
System.out.println(p.matcher(
"www.facebook.com/plugins/like.php?href=http%3A%2F%2F"
+ "www.facebook.com%2Fturnerkjljl").matches());
Output:
false
true
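The same pattern can be tried in Python, which uses the same syntax for this construct. The function name domain_is_clean and the `/pic` path on the rejected example are my own additions, since the pattern needs a slash or question mark to match at all.

```python
import re

def domain_is_clean(url, word='foo'):
    """True if `word` does not occur before the first / or ? in `url`."""
    # (?:(?!word).)*? consumes characters only while `word` does not start there.
    pattern = r'^(?:(?!' + re.escape(word) + r').)*?[/?]'
    return re.search(pattern, url) is not None

checks = [
    domain_is_clean('whatever.whatever.whatever/foo/pic'),
    domain_is_clean('whatever.whatever.whatever?sdfd="foo"'),
    domain_is_clean('whatever.foo.whatever.whatever/pic'),  # path added for the test
    domain_is_clean('i.cdn.turner.com/cnn/.e/img/3.0/1px.gif', word='turner'),
    domain_is_clean('whatever.whatever.whatever'),          # no / or ? at all
]
print(checks)  # [True, True, False, False, False]
```

The last case shows a limitation of this approach: a bare domain with no path or query string is rejected even though it does not contain the word.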
Here is your regex in java
"^[^/?]+(?<!foo)"
Explanation: starting from the beginning, [^/?]+ matches characters that are neither / nor ?. The negative lookbehind (?<!foo) then requires that the text immediately before the stopping position is not foo. Note that the lookbehind only inspects the three characters just before that position, so on its own it does not rule out foo appearing earlier in the domain. This is in Java; the regex syntax will vary from language to language.
In grep (Unix or a shell script), you have to negate the following regex match:
"^[^/?]+foo"
Here's a regex that will match the cases that you want to reject
(?:.+://){0,1}(?<subdomain>[^.]+\.){0,1}(?<domain>[^.]*whatever[^.]*\.)(?<top>[^.]+).*
(?: ) is a non-capturing group
(?<groupName> ) is a named group (useful for testing, in regexhero you can see what is being captured by the group)
{0,1} means 0 or 1
. means any character except new line
[^.] means any character except "."
* means 0 or more
+ means 1 or more, for example, .+ means 1 or many "any characters"
\. escapes the special character .
See http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
you can try it here: http://regexhero.net/tester/
I have the following regular expression (regex) in my urls.py and I'd like to know what it means. Specifically the (?P<category_slug> portion of the regex.
r'^category/(?P<category_slug>[-\w]+)/$'
In Django, named capturing groups are passed to your view as keyword arguments.
Unnamed capturing groups (just parentheses) are passed to your view as positional arguments.
The ?P<name> marks a named capturing group, as opposed to an unnamed capturing group.
http://docs.python.org/library/re.html
(?P<name>...) Similar to regular parentheses, but the substring
matched by the group is accessible within the rest of the regular
expression via the symbolic group name name. Group names must be valid
Python identifiers, and each group name must be defined only once
within a regular expression. A symbolic group is also a numbered
group, just as if the group were not named. So the group named id in
the example below can also be referenced as the numbered group 1.
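The keyword-versus-positional distinction can be imitated with plain `re`. This is only an illustration of the idea, not Django's actual resolver code; show_category is a stand-in view and request is passed as None.

```python
import re

def show_category(request, category_slug):
    # Stand-in for a Django view; `request` is just a placeholder here.
    return 'category=' + category_slug

path = 'category/foo_bar-baz/'
named = re.match(r'^category/(?P<category_slug>[-\w]+)/$', path)
unnamed = re.match(r'^category/([-\w]+)/$', path)

# Named groups map onto keyword arguments by name...
by_keyword = show_category(None, **named.groupdict())
# ...while unnamed groups can only be passed by position.
by_position = show_category(None, *unnamed.groups())

print(by_keyword)   # category=foo_bar-baz
print(by_position)  # category=foo_bar-baz
```

With the named form, the group name must match the view's parameter name; with the unnamed form, only the order matters.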
(?P<name>regex) - Round brackets group the regex between them. They capture the text matched by the regex inside them that can be referenced by the name between the sharp brackets. The name may consist of letters and digits.
Copy paste from: http://www.regular-expressions.info/refext.html
(?P<category_slug>) creates a match group named category_slug.
The regex itself matches a string starting with category/ and then a mix of alphanumeric characters, the dash - and the underscore _, followed by a trailing slash.
Example URLs accepted by the regex:
category/foo/
category/foo_bar-baz/
category/12345/
category/q1e2_asdf/
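These examples can be verified directly with Python's `re`. The rejected paths are ones I made up to show what falls outside the pattern.

```python
import re

category_re = re.compile(r'^category/(?P<category_slug>[-\w]+)/$')

accepted = ['category/foo/', 'category/foo_bar-baz/',
            'category/12345/', 'category/q1e2_asdf/']
# Hypothetical counter-examples: missing trailing slash, a space, an empty slug.
rejected = ['category/foo', 'category/foo bar/', 'category//']

slugs = [category_re.match(p).group('category_slug') for p in accepted]
rejected_results = [category_re.match(p) for p in rejected]

print(slugs)             # ['foo', 'foo_bar-baz', '12345', 'q1e2_asdf']
print(rejected_results)  # [None, None, None]
```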
In URL pattern matching,
use this pattern to pass a string:
(?P<username2>[-\w]+)
and this one for an integer value:
(?P<user_id>[0-9]+)
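One detail worth keeping in mind with the integer pattern: a regex group always captures text, so the value still has to be converted. A small sketch, where the user/.../.../ route combining both groups is hypothetical:

```python
import re

# Hypothetical route that uses both patterns from above.
profile_re = re.compile(r'^user/(?P<username2>[-\w]+)/(?P<user_id>[0-9]+)/$')

m = profile_re.match('user/jane-doe/42/')
username = m.group('username2')
raw_id = m.group('user_id')  # captured as the string '42'
user_id = int(raw_id)        # convert explicitly before using it as a number

print(username, user_id + 1)  # jane-doe 43
```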
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.
copy paste from Python3Regex