Using Python Regular Expression in Django - python

I have a web address:
http://www.example.com/org/companyA
I want to pass companyA to a view using a regular expression.
This is what I have:
(r'^org/?P<company_name>\w+/$',"orgman.views.orgman")
and it doesn't match.
Ideally, all URLs that look like example.com/org/X would pass X to the view.
Thanks in advance!

You need to wrap the group name in parentheses. The syntax for named groups is (?P<name>regex), not ?P<name>regex. Also, if you don't want to require a trailing slash, you should make it optional.
It's easy to test regular expression matching with the Python interpreter, for example:
>>> import re
>>> re.match(r'^org/?P<company_name>\w+/$', 'org/companyA')
>>> re.match(r'^org/(?P<company_name>\w+)/?$', 'org/companyA')
<_sre.SRE_Match object at 0x10049c378>
>>> re.match(r'^org/(?P<company_name>\w+)/?$', 'org/companyA').groupdict()
{'company_name': 'companyA'}

Your regex doesn't define a named group, because the ?P<company_name> part must be wrapped in parentheses. It should probably look like
r'^org/(?P<company_name>\w+)/$'

It should look more like r'^org/(?P<company_name>\w+)'
>>> r = re.compile(r'^org/(?P<company_name>\w+)')
>>> r.match('org/companyA').groups()
('companyA',)
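The three patterns suggested above differ in how they treat a trailing slash and any extra path segments. Here is a minimal sketch comparing them with only the re module (Python 3 syntax). The sample paths and the helper name match_table are illustrative assumptions, not part of the question:

```python
import re

# The three patterns suggested in the answers above.
PATTERNS = {
    "optional_slash": r'^org/(?P<company_name>\w+)/?$',
    "required_slash": r'^org/(?P<company_name>\w+)/$',
    "unanchored_end": r'^org/(?P<company_name>\w+)',
}

# Illustrative paths, written without a leading slash as in the answers.
PATHS = ['org/companyA', 'org/companyA/', 'org/companyA/extra']


def match_table():
    """Return {pattern_name: {path: captured company name or None}}."""
    table = {}
    for name, pattern in PATTERNS.items():
        row = {}
        for path in PATHS:
            m = re.match(pattern, path)
            row[path] = m.group('company_name') if m else None
        table[name] = row
    return table


if __name__ == '__main__':
    for name, row in match_table().items():
        print(name, row)
```

Only the pattern without a trailing $ also accepts org/companyA/extra, which may or may not be what you want.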

Related

In Python, is there a way to capture the following YYYYMMDD-N from a URL

I am looking for a way to capture the following with either a regular expression or a built-in function in Python.
From /url-path/YYYYMMDD-N/url-path-cont I only need YYYYMMDD-N. Sometimes the -N is present and sometimes it is not. I have tried various methods, but so far all my attempts either stop at YYYYMMDD or capture part of /url-path-cont.
I would like to capture only the YYYYMMDD-N with the -N as optional whenever present.
There are probably better ways of doing this, but as long as the path always has the same number of / characters, you could use the split method:
url_path = "/url-path/YYYYMMDD-N/url-path-cont"
date_only = url_path.split("/")[2]
print(date_only)
Here is a regular expression that will extract the date from a string.
>>> import re
>>> url = "url-path/YYYYMMDD-N/url-path-cont"
>>> re.compile(r"/(\w+-?\w?)/").search(url).group(1)
'YYYYMMDD-N'
>>>
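If the placeholder stands for real digits, a stricter pattern avoids matching other path segments by accident. This is a sketch assuming YYYYMMDD is exactly eight digits and N is one or more digits, which the question does not state explicitly. The function name extract_date and the sample paths are made up for illustration:

```python
import re

# Assumption: the date is exactly eight digits, optionally followed by -N
# where N is one or more digits, and the token fills a whole path segment.
DATE_SEGMENT = re.compile(r'/(\d{8}(?:-\d+)?)(?=/|$)')


def extract_date(url_path):
    """Return the YYYYMMDD or YYYYMMDD-N segment, or None if absent."""
    m = DATE_SEGMENT.search(url_path)
    return m.group(1) if m else None


if __name__ == '__main__':
    print(extract_date('/url-path/20140305-2/url-path-cont'))
    print(extract_date('/url-path/20140305/url-path-cont'))
```

The lookahead (?=/|$) requires the segment to end right after the date, so the match cannot spill into the rest of the path.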

How do I access an object, knowing only its memory address and not a variable that refers to it?

I am trying to debug a script and there is a certain regular expression that I want to check. (the regex is passed into my program as a parameter from another script, so I cannot manually just look up the regex.) During the debugging process I notice that the regex in question is loaded into memory, so I see something like
<_sre.SRE_Pattern object at 0x000000000DB6E5D0>
I want to access that regex object so I can see exactly what it is doing, but I cannot seem to find a way to access memory in python. Does anyone know how to do this?
It looks like you're trying to print the regex object directly, along these lines:
>>> import re
>>> p = re.compile('my regex')
>>> print p
<_sre.SRE_Pattern object at 0x02274380>
You can simply refer to the pattern as p.pattern:
>>> p.pattern
'my regex'
Docs: https://docs.python.org/2/library/re.html
You can access the pattern the regular expression was compiled with from the pattern attribute:
>>> import re
>>> r = re.compile(r'^(.*)$')
>>> r
<_sre.SRE_Pattern object at 0x7fa384ce5918>
>>> r.pattern
'^(.*)$'
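Besides pattern, a compiled regex exposes a few more read-only attributes that can help when debugging. Here is a short sketch in Python 3. The sample regex is a placeholder rather than the one from the question, and the helper name describe is hypothetical:

```python
import re

# Placeholder pattern standing in for the regex passed in from elsewhere.
compiled = re.compile(r'^(?P<key>\w+)=(?P<value>.*)$', re.IGNORECASE)


def describe(regex):
    """Collect the introspection attributes of a compiled pattern."""
    return {
        'pattern': regex.pattern,                          # source string
        'ignorecase': bool(regex.flags & re.IGNORECASE),   # was the flag set?
        'groups': regex.groups,                            # number of capturing groups
        'groupindex': dict(regex.groupindex),              # group name -> group number
    }


if __name__ == '__main__':
    print(describe(compiled))
```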

Regex match string beginning with ?code=

I'm using python and django to match urls for my site. I need to match a url that looks like this:
/company/code/?code=34k3593d39k
The part after ?code= is any combination of letters and numbers, and any length.
I've tried this so far:
r'^company/code/(.+)/$'
r'^company/code/(\w+)/$'
r'^company/code/(\D+)/$'
r'^company/code/(.*)/$'
But so far none are catching the expression. Any ideas? Thanks
code=34k3593d39k is a GET parameter, so you don't need to define a pattern for it in the URL pattern. You can access it with request.GET.get('code') in the view. The pattern should be just:
r'^company/code/$'
Usage, accessing GET parameter:
def my_view(request):
    # The query string is parsed into request.GET, not matched by the URL pattern
    code = request.GET.get('code')
    print(code)
Check the documentation:
The URLconf searches against the requested URL, as a normal Python
string. This does not include GET or POST parameters, or the domain
name.
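To see why none of the attempted patterns can match, it may help to split the URL the way the quoted documentation describes. This is a minimal sketch using Python 3's urllib.parse (the Python 2 module is urlparse). It only mimics what the URLconf sees and does not use Django itself:

```python
import re
from urllib.parse import urlsplit, parse_qs

url = '/company/code/?code=34k3593d39k'
parts = urlsplit(url)

# Django tests URL patterns against the path without its leading slash;
# the query string is never part of that match.
path_for_urlconf = parts.path.lstrip('/')

# Roughly the data that Django exposes through request.GET.
query = parse_qs(parts.query)

matches_simple = re.match(r'^company/code/$', path_for_urlconf) is not None
matches_attempt = re.match(r'^company/code/(\w+)/$', path_for_urlconf) is not None

if __name__ == '__main__':
    print(path_for_urlconf)
    print(query)
    print(matches_simple, matches_attempt)
```

The attempted patterns all require extra characters and another slash after company/code/, but the path ends right there, so only the plain pattern matches.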
The first pattern will work if you move the last / to just after the ^:
>>> import re
>>> re.match(r'^/company/code/(.+)$', '/company/code/?code=34k3593d39k')
<_sre.SRE_Match object at 0x0209C4A0>
>>> re.match(r'^/company/code/(.+)$', '/company/code/?code=34k3593d39k').groups()
('?code=34k3593d39k',)
>>>
Note too that the ^ is unnecessary because re.match matches from the start of the string:
>>> re.match(r'/company/code/(.+)$', '/company/code/?code=34k3593d39k').groups()
('?code=34k3593d39k',)
>>>

python regex on variable

Please help with my regex problem
Here is my string
source="http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"
source_resource="pf_rd_m=ATVPDKIKX0DER"
The source_resource appears in source and may be followed by & or by . (for example).
So far,
regex = re.compile("pf_rd_m=ATVPDKIKX0DER+[&.]")
regex.findall(source)
[u'pf_rd_m=ATVPDKIKX0DER&']
I have hard-coded the text here. Instead of hard-coding it, how can I use the source_resource variable, followed by & or ., to find the match?
If the goal is to extract the pf_rd_m value (which it apparently is, as you are using regex.findall), then I'm not sure a regex is the easiest solution here:
>>> import urlparse
>>> qs = urlparse.urlparse(source).query
>>> urlparse.parse_qs(qs)
{'pf_rd_m': ['ATVPDKIKX0DER'], 'pf_rd_i': ['3421']}
>>> urlparse.parse_qs(qs)['pf_rd_m']
['ATVPDKIKX0DER']
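In Python 3 the urlparse module was folded into urllib.parse. Here is a sketch of the same lookup under that assumption, using the source string from the question:

```python
from urllib.parse import urlparse, parse_qs

source = "http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"

# parse_qs maps each name to a list, because a name may repeat in a query.
params = parse_qs(urlparse(source).query)
pf_rd_m = params['pf_rd_m'][0]

if __name__ == '__main__':
    print(pf_rd_m)
```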
Inside a character class the . is already literal, so escaping it is optional but harmless:
pattern = re.compile(source_resource + r'[&\.]')
You can just build the string for the regular expression like a normal string, utilizing all string-formatting options available in Python:
import re
source_and = "http://rads.stackoverflow.com/amzn/click/B0030DI8NA/pf_rd_m=ATVPDKIKX0DER&"
source_dot = "http://rads.stackoverflow.com/amzn/click/B0030DI8NA/pf_rd_m=ATVPDKIKX0DER."
source_resource = "pf_rd_m=ATVPDKIKX0DER"
# Build the pattern from the variable plus a character class for & or .
regex_string = source_resource + r"[&\.]"
regex = re.compile(regex_string)
print(regex.findall(source_and))
print(regex.findall(source_dot))
This prints:
['pf_rd_m=ATVPDKIKX0DER&']
['pf_rd_m=ATVPDKIKX0DER.']
I hope this is what you mean.
Note that I modified your regular expression. The . is a special symbol, so I escaped it (inside a character class like [&.] the escape is optional but harmless). The + is special too: in your pattern it only repeated the preceding R, so I removed it, assuming the string occurs just once.
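When the variable itself may contain characters that are special in a regex, re.escape makes them literal before the pattern is built. This is a sketch building on the approach above, where the helper name find_resource is hypothetical:

```python
import re


def find_resource(source, source_resource):
    """Find source_resource followed by '&' or '.' in source."""
    # re.escape turns any metacharacters in the variable into literals.
    pattern = re.compile(re.escape(source_resource) + r'[&.]')
    return pattern.findall(source)


if __name__ == '__main__':
    source = "http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"
    print(find_resource(source, "pf_rd_m=ATVPDKIKX0DER"))
```

Without re.escape, a value such as a.b would also match axb, because the unescaped . matches any character.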

Parse URL with a regex in Python

I want to get the query name and values to be displayed from a URL.
For example, url='http://host:port_num/file/path/file1.html?query1=value1&query2=value2'
From this, parse the query names and its values and to print it.
Don't use a regex! Use urlparse.
>>> import urlparse
>>> urlparse.parse_qs(urlparse.urlparse(url).query)
{'query2': ['value2'], 'query1': ['value1']}
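To print each query name with its value, as the question asks, parse_qsl returns (name, value) pairs in their original order. Here is a sketch in Python 3, where urllib.parse replaces urlparse. The URL is a stand-in with a numeric port, not the exact one from the question:

```python
from urllib.parse import urlsplit, parse_qsl

# Stand-in URL; the host and port are placeholders.
url = 'http://www.example.com:8080/file/path/file1.html?query1=value1&query2=value2'

# A list of (name, value) tuples, in the order they appear in the URL.
pairs = parse_qsl(urlsplit(url).query)

if __name__ == '__main__':
    for name, value in pairs:
        print(name, '=', value)
```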
I agree that it's best not to use a regular expression and better to use urlparse, but here is my regular expression.
Modules like urlparse were developed specifically to handle URLs and are much more reliable than a regular expression, so make use of them if you can.
>>> import re
>>> x = 'http://www.example.com:8080/abcd/dir/file1.html?query1=value1&query2=value2'
>>> query_pattern = r'(query\d+)=(\w+)'
>>> # query_pattern = r'(\w+)=(\w+)' is a more general pattern
>>> re.findall(query_pattern, x)
[('query1', 'value1'), ('query2', 'value2')]
