I am currently going through various Django tutorials to understand how URL mapping works. I came across an example like this.
This is in my urls.py:
url(r'admin_page_edit$',"adminApp.views.showClientDetails",name="admin_page_edit"),
This is in the HTML page currently being displayed to the user:
<a href="{% url "admin_page_edit" %}?uname=SomeVal&par2=value" >
Here is the URL the browser shows when the link above is clicked (no problem there):
http://127.0.0.1:8000/admin_page_edit?uname=SomeVal&par2=value
The request lands in the corresponding view:
adminApp.views.showClientDetails
Now here is the problem: this all seems to work, but I am confused as to why it works, since the browser URL is
http://127.0.0.1:8000/admin_page_edit?uname=SomeVal&par2=value
which does not match the regex in the URL pattern
admin_page_edit$
(The regex above matches only if the string ends with admin_page_edit.) But the URL does not end with admin_page_edit; instead it is
http://127.0.0.1:8000/admin_page_edit?uname=SomeVal&par2=value
so it ends with par2=value.
My question is: why does this reach the corresponding view when the URL regex does not match?
Query strings (the part after ?) are not processed by the Django URL resolver: it matches your patterns against the path only, so admin_page_edit$ is tested against admin_page_edit, and the $ anchor is satisfied. Why? Because query strings don't have to be processed for routing. You can append just about any query string to any URL.
For example, https://www.facebook.com/?request=pleasedonotwork works all the same. Unless redirects (or some logging) depend on the query string, you can consider the query part of a URL as passive.
The query parameters can be accessed in your Django views via the request.GET QueryDict.
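To make this concrete, here is a stdlib-only sketch (not Django's actual resolver code) of what the pattern gets to see: the query string is split off first, and the pattern is tested against the path without its leading slash. `resolve_sketch` is a hypothetical helper, and `parse_qs` stands in for `request.GET`.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Pattern from the question's urls.py.
PATTERN = re.compile(r"admin_page_edit$")


def resolve_sketch(url):
    """Hypothetical imitation of URL resolution: only the path is matched."""
    parts = urlsplit(url)
    # The query string lives in parts.query, so it never reaches the pattern.
    path = parts.path.lstrip("/")
    matched = PATTERN.search(path) is not None
    # In a real view these values would come from request.GET.
    params = parse_qs(parts.query)
    return matched, path, params


matched, path, params = resolve_sketch(
    "http://127.0.0.1:8000/admin_page_edit?uname=SomeVal&par2=value"
)
print(matched, path, params)
# True admin_page_edit {'uname': ['SomeVal'], 'par2': ['value']}
```

Inside showClientDetails, request.GET.get("uname") would then give "SomeVal".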
I am trying to build a website that renders some books and their pages. I want to make it possible to access a page like this:
path('/<str:book_pk>-<int:book_page>/', views.TestClass.as_view(), name='book-test')
I want users to access it simply, with something like mysite.com/5-12/, which takes them to book 5 at page 12. The problem is that when I access this page from the website itself, via an href, the real path becomes:
mysite.com/%2F5-13/
If I type mysite.com/5-13/ into the browser, I get a 404 Page Not Found, because the real path is mysite.com/%2F5-13/. That part is obvious, but my question is:
How can I keep my initial path and make the page accessible via mysite.com/5-13/? For some reason, Django's URL patterns add an extra %2F at the beginning. Can somebody explain why, and how to solve this issue?
I really appreciate your time and effort. Thank you so much!
You don't need to include / at the beginning of the route; simply use:
path('<str:book_pk>-<int:book_page>/', views.TestClass.as_view(), name='book-test')
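Django already prepends / when it reverses a URL, so a route that starts with / produces a path beginning with //. Django then escapes the second slash as %2F (the percent-encoding of /) so the link is not read as a scheme-relative URL. On the incoming side, the leading / of the requested path is stripped before matching, which is why typing mysite.com/5-13/ gives a 404 with the original route.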
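A simplified, hypothetical imitation of both sides (not Django's code): `reverse_sketch` prepends the script prefix / and escapes a leading // the way Django's reverse() does (in recent versions via `django.utils.http.escape_leading_slashes`), and `matches_sketch` strips the leading / before matching. The routes use str.format placeholders, and the regexes are rough equivalents of the str and int converters.

```python
import re


def reverse_sketch(route, **kwargs):
    """Hypothetical, simplified reverse(): fill the route and prepend "/"."""
    url = "/" + route.format(**kwargs)
    # Two leading slashes would read as a scheme-relative URL ("//host/..."),
    # so the second slash is percent-encoded as %2F.
    if url.startswith("//"):
        url = "/%2F" + url[2:]
    return url


def matches_sketch(route_regex, request_path):
    """Hypothetical resolver check: the leading "/" is stripped before matching."""
    return re.fullmatch(route_regex, request_path[1:]) is not None


with_slash = "/{book_pk}-{book_page}/"
without_slash = "{book_pk}-{book_page}/"

print(reverse_sketch(with_slash, book_pk="5", book_page=13))     # /%2F5-13/
print(reverse_sketch(without_slash, book_pk="5", book_page=13))  # /5-13/

# str -> [^/]+, int -> [0-9]+
print(matches_sketch(r"/[^/]+-[0-9]+/", "/5-13/"))  # False
print(matches_sketch(r"[^/]+-[0-9]+/", "/5-13/"))   # True
```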
I have a Django template filter where I'm trying to check the request URL to see whether it includes an anchor tag or not.
For example, I want to check if the request URL is:
/mypath/mypage/my-slug/ or /mypath/mypage/my-slug/#myanchor
If it is the latter, I want to get the anchor tag ID.
Currently, the anchor tag seems to be stripped from the request information: request.path, request.build_absolute_uri() and so on. I can only get /mypath/mypage/my-slug/.
The URL pattern for the page in question is:
re_path(r'^(?P<product_slug>[\w-]*)_(?P<pk>\d+)/$', self.detail_view.as_view(), name='detail')
I can see the regex looks for the line end after the last slash, which suggests the anchor is being excluded, but I can easily retrieve GET params from the URL and I expected this to be straightforward too. It feels like I am missing something pretty obvious.
The fragment (the part after #, which you are calling the anchor) is not sent to the server; it is interpreted by the browser:
Is the anchor part of a URL being sent to a web server?
So if you need the anchor ID on the server, you will have to read it in the browser with JavaScript (window.location.hash) and send it yourself, e.g. using AJAX, to a view you write to receive the data.
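A stdlib sketch of why the view never sees it: an HTTP client requests only the path and query, while `urlsplit` keeps the fragment separately. `forward_fragment` is a hypothetical illustration of the workaround's data flow (in the browser you would do this in JavaScript), copying the fragment into a query parameter with the made-up name `anchor` so the view could read it from `request.GET`.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def request_target(url):
    """Path plus query: what goes on the HTTP request line (no fragment)."""
    parts = urlsplit(url)
    return urlunsplit(("", "", parts.path, parts.query, ""))


def forward_fragment(url, param="anchor"):
    """Hypothetical workaround: move the fragment into a query parameter."""
    parts = urlsplit(url)
    extra = urlencode({param: parts.fragment})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


url = "http://example.com/mypath/mypage/my-slug/#myanchor"
print(urlsplit(url).fragment)  # myanchor
print(request_target(url))     # /mypath/mypage/my-slug/
print(forward_fragment(url))   # http://example.com/mypath/mypage/my-slug/?anchor=myanchor
```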
Is there a way of grabbing a subset of the URL in Django views.py?
I have tried request.build_absolute_uri() but that captures slightly more than I want.
E.g.:
If the URL was http://127.0.0.1:8000/jobs/new I would like to get http://127.0.0.1:8000/
Instead I end up getting the entire URL, with the page names.
I am wondering whether Django has a rootURL function or something similar.
There are a few approaches:
Using request.get_host():
<a href="http://{{ request.get_host }}/">Go back home</a>
Using href='/':
<a href="/">Go back home</a>
Or using the root URL defined in your URLconf:
url(r'^mah_root/$', 'someapp.views.mah_view', name='mah_view'),
Then in your template:
<a href="{% url 'mah_view' %}">Go back home</a>
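If you need the full root URL inside views.py, as asked, Django's request.build_absolute_uri('/') joins / with the current scheme and host, which gives http://127.0.0.1:8000/ for the example request. The stdlib sketch below shows the same idea on a plain URL string; `root_url` is a hypothetical helper, not a Django function.

```python
from urllib.parse import urlsplit, urlunsplit


def root_url(absolute_url):
    """Keep scheme and host (with port); replace path, query and fragment with "/"."""
    parts = urlsplit(absolute_url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


print(root_url("http://127.0.0.1:8000/jobs/new"))  # http://127.0.0.1:8000/
```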
How can I prevent Scrapy from crawling a website endlessly when only the URL changes (in particular the session ID or something similar) while the content behind the URLs stays the same?
Is there a way to detect that?
I've read Avoid Duplicate URL Crawling, Scrapy - how to identify already scraped urls, and how to filter duplicate requests based on url in scrapy, but sadly they are not enough to solve my problem.
There are a couple of ways to do this, both related to the questions you've linked to.
With one, you decide what URL parameters make a page unique, and tell your custom duplicate request filter to ignore the other portions of the URL. This is similar to the answer at https://stackoverflow.com/a/13605919 .
Example:
url: http://www.example.org/path/getArticle.do?art=42&sessionId=99&referrerArticle=88
important bits: protocol, host, path, query parameter "art"
implementation:
from urllib.parse import urlparse, urlunparse

def url_fingerprint(self, url):
    pr = urlparse(url)
    # Keep only the "art" parameter; sessionId, referrerArticle, etc. are dropped.
    # Build a new list rather than removing items from the one being iterated.
    queryparts = [prt for prt in pr.query.split('&') if prt.split('=')[0] == 'art']
    return urlunparse(pr._replace(query='&'.join(queryparts)))
The other way is to determine what bit of information on the page makes it unique, and use either the IgnoreVisitedItems middleware (as per https://stackoverflow.com/a/4201553) or a dictionary/set in your spider's code. If you go the dictionary/set route, have your spider extract that bit of information from the page and check the dictionary/set to see whether you've seen that page before; if so, you can stop parsing and return.
What bit of information you'll need to extract depends on your target site. It could be the title of the article, an OpenGraph og:url meta tag, etc.
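A minimal sketch of the dictionary/set route, assuming the target pages carry an og:url meta tag. `OgUrlParser`, `page_key` and `SeenPages` are hypothetical names, and the parsing uses the stdlib html.parser rather than Scrapy selectors. In a spider callback you would compute the key from the response body and return early when is_new() is False.

```python
from html.parser import HTMLParser


class OgUrlParser(HTMLParser):
    """Collects the content of <meta property="og:url" ...>, if present."""

    def __init__(self):
        super().__init__()
        self.og_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:url":
            self.og_url = attrs.get("content")


def page_key(html):
    """Hypothetical: the value that identifies a page regardless of its URL."""
    parser = OgUrlParser()
    parser.feed(html)
    return parser.og_url


class SeenPages:
    """Hypothetical helper: remembers which page keys have been processed."""

    def __init__(self):
        self._seen = set()

    def is_new(self, key):
        if key is None:
            return True  # nothing identifying found; parse the page anyway
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


# Same article reached through two different session URLs.
page_a = '<head><meta property="og:url" content="http://www.example.org/article/42"></head>'
page_b = '<head><meta property="og:url" content="http://www.example.org/article/42" /></head>'
seen = SeenPages()
print(seen.is_new(page_key(page_a)))  # True
print(seen.is_new(page_key(page_b)))  # False
```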
My application lists the IP addresses of some game servers.
I want to add a simple search engine that takes a regular expression. For example, I would type ^200. to list only the IP addresses beginning with 200.
The form would redirect me to the results page by sending a GET request like this:
/servers/search/^200./page/1/range/30/
This is the line I'm using in urls.py:
url(r'^servers/search/(?P<search>[a-zA-Z0-9.]+)/page/(?P<page>\d+)/range/(?P<count>\d+)/$', gameservers.views.index)
But it doesn't work the way I expected: no results are shown. I intentionally introduced a syntax error to see the local variables, and then realized that the search variable's value is the following:
^200./page/1/range/30/
How can I fix this? I've thought about moving the search parameter to the end of the URL, but it would be very interesting to see whether there is a way to stop the value at the next /.
Your regex doesn't match at all: you are not accepting the ^ character. But even if it were, there's no way the full URL could be captured in the search variable, because then the rest of the URL wouldn't match.
However, I wouldn't try to fix this. Trying to capture complicated patterns in the URL itself is usually a mistake. For a search value, it's perfectly acceptable to move that to a GET query parameter, so that your URL would look something like this:
/servers/search/?search=^200.&page=1&range=30
or, if you like, you could still capture the page and range values in the URL, but leave the search value as a query param.
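A stdlib sketch of the query-parameter approach. `build_search_url` and `filter_servers` are hypothetical helpers, and `parse_qs` stands in for request.GET in the view. It also confirms that the original pattern rejects ^. Note that urlencode percent-encodes ^ as %5E, which is decoded again on the server side.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Pattern from the question, matched against the path without its leading "/".
ROUTE = re.compile(
    r"^servers/search/(?P<search>[a-zA-Z0-9.]+)/page/(?P<page>\d+)/range/(?P<count>\d+)/$"
)


def build_search_url(search, page, count):
    """Hypothetical helper: fixed path, values moved into the query string."""
    return "/servers/search/?" + urlencode({"search": search, "page": page, "range": count})


def filter_servers(ips, search):
    """Keep only the IP addresses matching the user-supplied regex."""
    pattern = re.compile(search)
    return [ip for ip in ips if pattern.match(ip)]


print(ROUTE.match("servers/search/^200./page/1/range/30/"))  # None: ^ is not in the class

url = build_search_url("^200.", 1, 30)
print(url)  # /servers/search/?search=%5E200.&page=1&range=30

params = parse_qs(urlsplit(url).query)  # roughly what request.GET would hold
search = params["search"][0]
print(filter_servers(["200.1.2.3", "10.200.0.1", "200.9.9.9"], search))
# ['200.1.2.3', '200.9.9.9']
```

Since the pattern comes from user input, re.compile can raise re.error on an invalid expression, so a real view would want to catch that.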