How can I escape certain characters while using Python's urllib.urlencode()?

I have a dictionary that I want to urlencode as query parameters.
The server that I am hitting expects the query to look like this: http://www.example.com?A=B,C
But when I try to use urllib.urlencode to build the URL, I find that the comma gets turned into %2C:
>>> import urllib
>>> urllib.urlencode({"A":"B,C"})
'A=B%2CC'
Is there any way I can escape the comma so that urlencode treats it like a normal character?
If not, how can I work around this problem?

You can leave certain characters unescaped by listing them explicitly in the safe argument:
urllib.quote(str, safe='~()*!.\'')
More : https://docs.python.org/3.0/library/urllib.parse.html#urllib.parse.quote
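In Python 3 the same functions live in urllib.parse, and urlencode() also accepts a safe argument that it hands on to the quoting function. A minimal sketch, assuming Python 3 and the dictionary from the question:

```python
from urllib.parse import quote, urlencode

params = {"A": "B,C"}

# Default behaviour: the comma is percent-encoded.
default_query = urlencode(params)

# Listing the comma in `safe` leaves it untouched.
comma_query = urlencode(params, safe=",")

# quote() takes the same argument when only one value needs encoding.
quoted_value = quote("B,C", safe=",")

print(default_query)
print(comma_query)
print(quoted_value)
```

Only the characters named in safe are exempted, so other reserved characters in the values are still percent-encoded.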

You can do this by adding the query params as a string before hitting the endpoint.
I have used requests for making a request.
For example:
GET Request
import requests
url = "https://www.example.com/?"
query = "A=B,C"
url_final = url + query
url = requests.get(url_final)
print(url.url)
# https://www.example.com/?A=B,C
The comma (along with some other characters) is defined in RFC 3986 as a reserved character. This means the comma has defined meaning at various parts in a URL, and if it is not being used in that context it needs to be percent-encoded.
That said, the query component doesn't give the comma any special syntax, so in query parameters we probably shouldn't be encoding it. It's not entirely Requests' fault, though: the parameters are encoded using urllib.urlencode(), which is what percent-encodes the comma.
This isn't easy to fix though, because some web services use , and some use %2C, and neither is wrong. You might just have to handle this encoding yourself.
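One way to "handle this encoding yourself" is to build the query string by hand and attach it to the base URL before making the request. The helper name build_url and the sample parameters below are made up for illustration, and the sketch assumes Python 3:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_url(base, params, safe=","):
    """Attach params to base, leaving the characters in `safe` unencoded."""
    query = "&".join(
        "{}={}".format(quote(str(key), safe=safe), quote(str(value), safe=safe))
        for key, value in params.items()
    )
    # Replace any existing query on the base URL with the one built above.
    scheme, netloc, path, _old_query, fragment = urlsplit(base)
    return urlunsplit((scheme, netloc, path, query, fragment))


final_url = build_url("https://www.example.com/", {"A": "B,C", "q": "x y&z"})
print(final_url)
```

The resulting string can then be used as the complete URL of the request, in the same way as url_final in the example above.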

Related

django.urls.reverse(): URL-encoding the slash

When one supplies URL args or kwargs to a django.urls.reverse() call,
Django will nicely URL-encode non-Ascii characters and URL-reserved characters.
For instance, given a declaration such as
path("prefix/<stuff>", view=MyView.as_view(), name="myurl")
we get
reverse('myurl', args=['aaa bbb']) == "/prefix/aaa%20bbb"
reverse('myurl', args=['aaa%bbb']) == "/prefix/aaa%25bbb"
reverse('myurl', args=['Ä']) == "/prefix/%C3%84"
and so on. So far, so good.
What Django will not encode, however, is the slash:
reverse('myurl', args=['aaa/bbb'])
will give us
django.urls.exceptions.NoReverseMatch:
Reverse for 'myurl' with arguments '('aaa/bbb',)' not found.
1 pattern(s) tried: ['prefix/(?P<stuff>[^/]+)$']
(The question what to encode and what not
has been discussed as a Django issue.
It's complicated.)
I found a remark in the code that may explain why
the slash is a special case:
_reverse_with_prefix in django/urls/resolvers.py contains a comment that says
# WSGI provides decoded URLs, without %xx escapes, and the URL
# resolver operates on such URLs. First substitute arguments
# without quoting to build a decoded URL and look for a match.
# Then, if we have a match, redo the substitution with quoted
# arguments in order to return a properly encoded URL.
Given that unencoded arguments are used in the matching initially,
it is no wonder that it does not work:
The slash looks like the end of the argument to Django and so there is
one more argument than expected.
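The effect can be reproduced outside Django with the pattern from the error message. This is only a simplified imitation of the matching step described in the comment, not Django's actual code; the helper name candidate_matches is made up:

```python
import re

# The pattern Django reported for path("prefix/<stuff>", ...).
pattern = re.compile(r"prefix/(?P<stuff>[^/]+)$")


def candidate_matches(arg):
    """Substitute the unquoted argument and test the result against the pattern."""
    candidate = "prefix/{}".format(arg)
    return pattern.match(candidate) is not None


print(candidate_matches("aaa bbb"))  # no slash in the argument
print(candidate_matches("aaa/bbb"))  # the slash ends the [^/]+ group early
```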
My question:
I dearly want to use user-supplied data in natural-looking URLs,
so slashes occur occasionally. How can I make them work?
The URL structure I need is basically
/show_rooms/<organization>/<department>/<building>
I can think of these approaches:
Replace a slash in an argument with some exotic Unicode character
that will never occur otherwise. And back for received arguments.
This would sort of do the job, but is inconvenient,
non-standard, and therefore ugly.
Use slugs instead of the real names.
This would require extending my models to store the slugs (because
the ORM needs to find objects by them) and appears out of proportion to me.
URL-quote my arguments before passing them to reverse()
and unquote arguments when I receive them.
This is as inconvenient as (1).
It leads to URLs that are more difficult to read than
those from (1), because each % produced by quoting
will subsequently be encoded as %25.
But at least it is a standard-ish approach.
Sigh. Is this really the "right" way?
Any comments or fourth solutions are welcome!
Now that I've written it up, solution (1) does not look quite
so horrible to me. What replacement character would you use for a slash?
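The double encoding described under approach (3) can be seen with urllib.parse alone. In this sketch the second quote() call is only a stand-in for the encoding that reverse() applies; Django's own set of safe characters may differ, but the % sign is encoded either way:

```python
from urllib.parse import quote, unquote

original = "aaa/bbb"

# Step 1: quote the argument yourself, encoding the slash as well.
pre_quoted = quote(original, safe="")

# Step 2: stand-in for the quoting applied when the URL is built.
in_url = quote(pre_quoted, safe="")

# On the way back: one decoding before the view runs, one inside the view.
after_server = unquote(in_url)
recovered = unquote(after_server)

print(pre_quoted)
print(in_url)
print(recovered)
```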
I suggest you try requesting a URL that contains the slash and see whether your view is able to match it. If it does, the problem is in the reverse function itself, not the view. How about passing the slash already encoded as %2F?
It's me, the asker.
Here is the solution that I finally used:
I decided for solution (1).
It turned out less inconvenient than I had expected and
it works spectacularly well.
My Firefox browser shows the URL in text form, not urlencoded form,
and when you pick the right replacement character it looks almost natural.
Very nice.
Here is the code for the escaping (to be called in the template, hence a
custom templatetag)
and unescaping (to be called in the view):
import django.template as djt
register = djt.Library()
# see https://docs.djangoproject.com/en/stable/howto/custom-template-tags/
ALT_SLASH = '\N{DIVISION SLASH}'
@register.filter
def escape_slash(urlparam: str) -> str:
    """
    Avoid having a slash in the urlparam URL part,
    because it would not get URL-encoded.
    See https://stackoverflow.com/questions/67849991/django-urls-reverse-url-encoding-the-slash
    Possible replacement characters are
      codepoint  char  utf8      name               oldname
      U+2044     ⁄     e2 81 84  FRACTION SLASH
      U+2215     ∕     e2 88 95  DIVISION SLASH
      U+FF0F     ／    ef bc 8f  FULLWIDTH SOLIDUS  FULLWIDTH SLASH
    None of them will look quite right if the browser shows the char rather than
    the %-escape in the address line, but DIVISION SLASH comes close.
    The normal slash is
      U+002F     /     2f        SOLIDUS            SLASH
    To get back the urlparam after calling escape_slash,
    the URL will be formed (via {% url ... %} or reverse()) and URL-encoded,
    sent to the browser,
    received by Django in a request, URL-unencoded, split,
    its param parts handed to a view as args or kwargs,
    and finally unescape_slash will be called by the view.
    """
    return urlparam.replace('/', ALT_SLASH)


def unescape_slash(urlparam_q: str) -> str:
    return urlparam_q.replace(ALT_SLASH, '/')
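The round trip described in the docstring can be checked without a running Django project. In this sketch quote() and unquote() merely stand in for the URL-encoding and decoding that Django performs, the two helpers are repeated without the template registration so the block runs on its own, and the department name is a made-up example:

```python
from urllib.parse import quote, unquote

ALT_SLASH = "\N{DIVISION SLASH}"


def escape_slash(urlparam: str) -> str:
    return urlparam.replace("/", ALT_SLASH)


def unescape_slash(urlparam_q: str) -> str:
    return urlparam_q.replace(ALT_SLASH, "/")


name = "R&D/Lab 3"                  # hypothetical department name
escaped = escape_slash(name)        # done in the template
encoded = quote(escaped, safe="")   # stand-in for the URL-encoding step
decoded = unquote(encoded)          # stand-in for the decoding before the view
restored = unescape_slash(decoded)  # done in the view

print(encoded)
print(restored)
```

One consequence of the design: a name that already contains a DIVISION SLASH would come back with a normal slash in its place, so the replacement character has to be one that never occurs in the data.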

How do you build a URL that contains a username and password from user input in Python?

I see you can parse usernames and passwords with:
from urlparse import urlparse
r = urlparse('http://myuser:mypass@example.com')
print r.username
# => 'myuser'
How can I go the other way? I can't use urlunparse because I can't do this:
r.username = request.args['username']
# => AttributeError: can't set attribute
I'm interested because the username contains characters that need to be escaped (namely: #, /).
Edit:
This string is coming from user input, so I won't know ahead of time what it is. It's a security risk to maintain your own list of escape characters, so string concatenation and custom character escaping with replace won't help here.
Turns out the username part of basic auth URLs uses the same escape characters that query parameters do.
So the answer is to use urllib.quote(username, safe='') then concatenate. (You'll need the safe parameter or / won't be escaped.)
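A minimal sketch of that answer, assuming Python 3 (where quote lives in urllib.parse). The function name build_auth_url and the sample credentials are made up for illustration:

```python
from urllib.parse import quote, unquote, urlsplit


def build_auth_url(username, password, host, path="/"):
    """Percent-encode the credentials, then concatenate the URL."""
    userinfo = "{}:{}".format(quote(username, safe=""), quote(password, safe=""))
    return "http://{}@{}{}".format(userinfo, host, path)


url = build_auth_url("my#user/name", "p@ss:word", "example.com")
parts = urlsplit(url)

print(url)
# The parsed attributes are still percent-encoded, so decode them explicitly.
print(unquote(parts.username), unquote(parts.password))
```

Passing safe='' matters here because quote() leaves "/" unencoded by default.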

Urls.py unable to pass # (pound) character to a view in Django

I pass data through a Django URL, but a # character does not get through urls.py. I am using this pattern:
url(r'^pass/(?P<sentence>[\w|\W]*)/$',pass)
I also tried this pattern:
url(r'^pass/(?P<sentence>[a-zA-Z0-9-/:-?##{-~!^_\'\[\]*]*)/$',pass)
Thanks in advance.
The "#" character marks inline anchors (links within the same page) in a URL, so the browser will never send it to Django.
For example, if the URL is /something/pass/#test/something-else/ the browser will send only /something/pass/ to the server. You can try /something/pass/%23test/something-else/ instead; 23 is the hexadecimal ASCII code for #. It is not pretty (and if it is going to be ugly anyway, just pass it as a GET variable instead).
There is nothing you can do on the Django side. You had better avoid characters with special meanings in the URL path when designing your routes. Of course it is a matter of taste, but I really think that strings passed in the URL path should be "slugified" in order to remove any funny characters.
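The split between path and fragment can be inspected with Python 3's urllib.parse, which follows the same rule. A small sketch using the URLs from this answer:

```python
from urllib.parse import quote, urlsplit

raw = urlsplit("/something/pass/#test/something-else/")
print(raw.path)      # the part that is requested from the server
print(raw.fragment)  # the part that stays in the browser

# Percent-encoding the "#" keeps the whole string inside the path.
encoded_segment = quote("#test", safe="")
encoded = urlsplit("/something/pass/{}/something-else/".format(encoded_segment))
print(encoded.path)
print(encoded.fragment)
```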
Browsers won't send the URL fragment part (everything from the "#" onward) to servers. Why not convert your data to base64 first, then pass the data via the URL?
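A sketch of the base64 idea, assuming Python 3 and the URL-safe alphabet from the standard library. The helper names encode_segment and decode_segment are made up:

```python
import base64


def encode_segment(text):
    """Encode arbitrary text with the URL-safe base64 alphabet."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def decode_segment(segment):
    """Reverse encode_segment()."""
    return base64.urlsafe_b64decode(segment.encode("ascii")).decode("utf-8")


sentence = "#test/with?odd chars"
segment = encode_segment(sentence)

print(segment)
print(decode_segment(segment))
```

The URL-safe variant is used because the standard base64 alphabet contains "/" and "+", which have their own meaning in URLs. The view would call decode_segment() on the captured value.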
RFC 1808 (Relative Uniform Resource Locators) : Note that the fragment identifier (and the "#" that precedes it) is
not considered part of the URL. However, since it is commonly used
within the same string context as a URL, a parser must be able to
recognize the fragment when it is present and set it aside as part of
the parsing process.

Python regex with question mark literal

I'm using Django's URLconf, the URL I will receive is /?code=authenticationcode
I want to match the URL using r'^\?code=(?P<code>.*)$' , but it doesn't work.
Then I found out it is the problem of '?'.
Because I tried to match /aaa?aaa using r'aaa\?aaa', r'aaa\\?aaa', even r'aaa.*aaa'; all failed, but it works when it's "+" or any other character.
How to match the '?', is it special?
>>> s="aaa?aaa"
>>> import re
>>> re.findall(r'aaa\?aaa', s)
['aaa?aaa']
The reason /aaa?aaa won't match inside your URL is because a ? begins a new GET query.
So, the matchable part of the URL is only up to the first 'aaa'. The remaining '?aaa' is a new query string separated by the '?' mark, containing a variable "aaa" being passed as a GET parameter.
What you can do here is encode the variable before it makes its way into the URL. The encoded form of ? is %3F.
You should also not match a GET query such as /?code=authenticationcode using regex at all. Instead, match your URL up to / using r'^$'. Django will pass the variable code as a GET parameter to the request object, which you can obtain in your view using request.GET.get('code').
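The same separation can be observed with Python 3's urllib.parse, independently of Django. A short sketch using the URLs from the question:

```python
from urllib.parse import parse_qs, quote, urlsplit

# The "?" ends the path; everything after it is the query string.
parts = urlsplit("/aaa?aaa")
print(parts.path)
print(parts.query)

# The authentication URL has the path "/" and one query parameter.
auth = urlsplit("/?code=authenticationcode")
params = parse_qs(auth.query)
print(auth.path)
print(params["code"][0])

# Encoding the "?" keeps it inside the path.
encoded = quote("aaa?aaa", safe="")
print(encoded)
```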
You are not allowed to use ? in a URL as a variable value. The ? indicates that there are variables coming in.
Like: http://www.example.com?variable=1&another_variable=2
Replace it or escape it. Here's some nice documentation.
Django's urls.py does not parse query strings, so there is no way to get this information at the urls.py file.
Instead, parse it in your view:
def foo(request):
    code = request.GET.get('code')
    if code:
        # do stuff
        pass
    else:
        # No code!
        pass
"How to match the '?', is it special?"
Yes, but you are properly escaping it by using the backslash. I do not see where you have accounted for the leading forward slash, though. That bit just needs to be added in:
r'^/\?code=(?P<code>.*)$'
Suppress the regex metacharacters with []
>>> s
'/?code=authenticationcode'
>>> r=re.compile(r'^/[?]code=(.+)')
>>> m=r.match(s)
>>> m.groups()
('authenticationcode',)

How do you do string interpolation with a URL that contains formatting characters?

I'm trying to use urllib2 to open a URL and read back the contents into an array. The issue seems to be that you cannot use string interpolation in a URL that has formatting characters such as %20 for space, %3C for '<'. The URL in question has spaces and a bit of XML in it.
My code is pretty simple, looks something like this:
#Python script to fetch NS Client Policies using GUID
import sys
import urllib2
def GetPolicies(ns, guid):
    ns = sys.argv[1]
    guid = sys.argv[2]
    fetch = urllib2.urlopen('http://%s/Altiris/NS/Agent/GetClientPolicies.aspx?xml=%3Crequest%20configVersion=%222%22%20guid=%22{%s}%22') % (ns, guid)
I've shortened the URL for brevity but you get the general idea, you get a 'Not enough arguments for format string' error since it assumes you're wanting to use the %3, %20, and other things as string interpolations. How do you get around this?
Edit: Solution requires Python 2.6+, 2.5 or prior has no support for the string.format() method
You can double the % signs
url = 'http://%s/Altiris/NS/Agent/GetClientPolicies.aspx?xml=%%3Crequest%%20configVersion=%%222%%22%%20guid=%%22{%s}%%22' % (ns, guid)
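or you can use the .format() method, where the percent signs stay as they are and the literal braces around the GUID are doubled instead
url = 'http://{hostname}/Altiris/NS/Agent/GetClientPolicies.aspx?xml=%3Crequest%20configVersion=%222%22%20guid=%22{{{id}}}%22'.format(hostname=ns, id=guid)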
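The two approaches can be compared side by side. In this sketch the host name and GUID are placeholder values, and the URL is split over two adjacent string literals only to keep the lines short:

```python
ns = "ns.example.com"  # hypothetical host name
guid = "1234-ABCD"     # hypothetical GUID

# %-formatting: every literal percent sign is doubled.
url_percent = (
    "http://%s/Altiris/NS/Agent/GetClientPolicies.aspx"
    "?xml=%%3Crequest%%20configVersion=%%222%%22%%20guid=%%22{%s}%%22"
) % (ns, guid)

# str.format(): percent signs are left alone, literal braces are doubled.
url_format = (
    "http://{hostname}/Altiris/NS/Agent/GetClientPolicies.aspx"
    "?xml=%3Crequest%20configVersion=%222%22%20guid=%22{{{id}}}%22"
).format(hostname=ns, id=guid)

print(url_percent)
print(url_percent == url_format)
```

Each formatting style has exactly one character that must be escaped by doubling: % for the % operator, and { or } for str.format().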
Use the .format method on the string instead. From its documentation:
str.format(*args, **kwargs)
Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.
>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
While we all sin by sticking to % as we're used to from C, the format method is really a more robust method of interpolating values into strings.
Build your string up in steps, doing each encoding layer separately. Much more manageable than trying to cope with the multiple levels of escaping in one go.
xml= '<request configVersion="2" guid="{%s}"/>' % cgi.escape(guid, True)
query= 'xml=%s' % urllib2.quote(xml)
url= 'http://%s/Altiris/NS/Agent/GetClientPolicies.aspx?%s' % (ns, query)
fetch= urllib2.urlopen(url)
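The snippet above is Python 2. A rough Python 3 equivalent of the same layering, assuming html.escape in place of cgi.escape and urllib.parse.quote in place of urllib2.quote, with placeholder values for the host and GUID and without the network call:

```python
from html import escape
from urllib.parse import parse_qs, quote, urlsplit

ns = "ns.example.com"  # hypothetical host name
guid = "1234-ABCD"     # hypothetical GUID

# Layer 1: XML attribute escaping for the GUID.
xml = '<request configVersion="2" guid="{%s}"/>' % escape(guid, quote=True)

# Layer 2: percent-encoding of the whole XML snippet as a query value.
query = "xml=%s" % quote(xml, safe="")

# Layer 3: plain interpolation of host and query into the URL.
url = "http://%s/Altiris/NS/Agent/GetClientPolicies.aspx?%s" % (ns, query)

# Decoding the query again returns the original XML.
round_trip = parse_qs(urlsplit(url).query)["xml"][0]

print(url)
print(round_trip == xml)
```

Because the percent-encoded text is passed in as a value rather than written into the format string, its % signs never need doubling.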
If you're trying to build the url yourself use urllib.urlencode. It will deal with a lot of the quoting issues for you. Just pass it a dict of the info you want:
from urllib import urlencode
args = urlencode({'xml': '<',
'request configVersion': 'bar',
'guid': 'zomg'})
As for replacing the hostname in the base of your URL string, just do what everyone else said and use the %s formatting. The final string could be something like:
print 'http://%s/Altiris/NS/Agent/GetClientPolicies.aspx?%s' % ('foobar.com', args)
