Why are the + characters not converted to spaces?
>>> import urllib
>>> url = 'Q=Who+am+I%3F'
>>> urllib.unquote(url)
'Q=Who+am+I?'
>>>
There are two variants; urllib.unquote() and urllib.unquote_plus(). Use the latter:
>>> import urllib
>>> url = 'Q=Who+am+I%3F'
>>> urllib.unquote_plus(url)
'Q=Who am I?'
That's because there are two variants of URL quoting; one for URL path segments, and one for URL query parameters; the latter uses a different specification. See Wikipedia:
When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20".
So forms using the application/x-www-form-urlencoded MIME type in a GET or POST request use slightly different rules: there, spaces are encoded as +, whereas elsewhere in a URL a space is encoded as %20. When decoding, you need to pick the right variant. You have form data (from the query part of the URL), so you need to use unquote_plus().
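For readers on Python 3, here is a minimal sketch of the same comparison. It assumes the Python 3 names urllib.parse.unquote and urllib.parse.unquote_plus; the variable names are only illustrative.

```python
from urllib.parse import unquote, unquote_plus

url = "Q=Who+am+I%3F"

# unquote() only decodes %XX escapes; a literal "+" is left alone.
path_style = unquote(url)

# unquote_plus() additionally turns "+" into a space (form encoding).
form_style = unquote_plus(url)

print(path_style)  # Q=Who+am+I?
print(form_style)  # Q=Who am I?
```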
Now, if you are parsing a query string, you may want to use the urlparse.parse_qs() or urlparse.parse_qsl() functions; these not only will use the right unquote*() function, but parse out the parameters into a dictionary or list of key-value pairs as well:
>>> import urlparse
>>> urlparse.parse_qs(url)
{'Q': ['Who am I?']}
>>> urlparse.parse_qsl(url)
[('Q', 'Who am I?')]
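In Python 3 the urlparse module was merged into urllib.parse, so the equivalent calls would look roughly like this (a sketch; the variable names are illustrative):

```python
from urllib.parse import parse_qs, parse_qsl

query = "Q=Who+am+I%3F"

# parse_qs() returns a dict whose values are lists, since keys may repeat.
as_dict = parse_qs(query)

# parse_qsl() returns a flat list of (key, value) tuples in order.
as_pairs = parse_qsl(query)

print(as_dict)   # {'Q': ['Who am I?']}
print(as_pairs)  # [('Q', 'Who am I?')]
```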
Related
I am using Python's requests module to post some data to my company's SharePoint site:
response = requests.post("https://my-sharepoint-site.com",
                         data={"myPost": "this is a test"},
                         auth=HttpNtlmAuth("\\MY_USER_NAME", MY_PASS))
The resulting post on my company's sharepoint site looks like:
myPost=this+is+a+test
How do I 1) remove "myPost=" and 2) stop the filling of white space with '+' ?
Note: I do not have access to the company's server-side application logic.
Another note: curl does not encode white spaces with '+'. The result of:
curl -f -v --ntlm -u MY_USER_NAME --data "this is a test" https://my-sharepoint-site.com
is:
this is a test
You're passing a dict as the data parameter, which means the dict is interpreted as key-value pairs to be encoded as application/x-www-form-urlencoded. Form encoding (which is also used for the query string of GET requests) substitutes + for spaces instead of %20, because that is the behavior of most browsers.
If you want to pass a POST body literally, you need to encode it into bytes yourself.
In your case, simply passing data="this is a test".encode("utf-8") should give you the behavior you want.
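To see the difference without touching the network, the standard library's urlencode can stand in for the form encoding applied to a dict. This is only an illustration using Python 3's urllib.parse, not a trace of what requests does internally:

```python
from urllib.parse import urlencode

# A dict is form-encoded: key=value pairs, with spaces turned into "+".
form_body = urlencode({"myPost": "this is a test"})

# A pre-encoded body is just the bytes you supply, with no key and no "+".
raw_body = "this is a test".encode("utf-8")

print(form_body)  # myPost=this+is+a+test
print(raw_body)   # b'this is a test'
```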
I am using python requests to connect to a website. I am passing some strings to get data about them.
The problem is that some strings contain a slash (/), so when they are passed in the URL, I get a ValueError.
This is my URL:
"https://api.flipkart.net/sellers/skus/%s/listings" % string
When a string that does not contain a slash is passed, I get:
https://api.flipkart.net/sellers/skus/A35-Charry-228_39/listings
It returns a valid response. But when I pass a string which contains a slash:
string = "L20-ORG/BLUE-109(38)"
I get url like:
https://api.flipkart.net/sellers/skus/L20-ORG/BLUE-109(38)/listings
which throws the error.
How can I solve this?
Raw string literals in Python
string = r"L20-ORG/BLUE-109(38)"
You could find more info here and here.
urllib.quote_plus is your friend. As urllib is a module from the standard library, you just have to import it with import urllib.
If you want to be conservative, just use it with default value:
string = urllib.quote_plus("L20-ORG/BLUE-109(38)")
gives 'L20-ORG%2FBLUE-109%2838%29'
If you know that some characters are harmless for your use case (say parentheses):
string = urllib.quote_plus("L20-ORG/BLUE-109(38)", '()')
gives 'L20-ORG%2FBLUE-109(38)'
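On Python 3 the same idea might be sketched with urllib.parse.quote and an explicit safe argument. Passing safe="" makes sure the slash is escaped too, since quote() treats "/" as safe by default. Whether the API accepts %2F inside a path segment is an assumption here, not something the question confirms.

```python
from urllib.parse import quote

sku = "L20-ORG/BLUE-109(38)"

# Escape everything that is not unreserved, including "/" and parentheses.
strict_segment = quote(sku, safe="")

# Keep parentheses literal if they are known to be harmless.
relaxed_segment = quote(sku, safe="()")

url = "https://api.flipkart.net/sellers/skus/%s/listings" % strict_segment

print(strict_segment)   # L20-ORG%2FBLUE-109%2838%29
print(relaxed_segment)  # L20-ORG%2FBLUE-109(38)
```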
params = {'token': 'JVFQ%2FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%2FKGS%2FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%2Bbpm80MCOE%3D'}
rsp = requests.get("http://xxxx/access", params=params)
print rsp.url
print params
When I print rsp.url, I get:
http://xxxx/access?token=JVFQ%252FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%252FKGS%252FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%252Bbpm80MCOE%253D
JVFQ%2FF
JVFQ%252FF
The value of the ?token= in the url is different from params['token'].
Why does it change?
You passed in a URL encoded value, but requests encodes the value for you. As a result, the value is encoded twice; the % character is encoded to %25.
Don't pass in a URL-encoded value. Decode it manually if you must:
from urllib import unquote
params['token'] = unquote(params['token'])
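The double encoding can be reproduced offline with the standard library. The sketch below uses Python 3's urllib.parse.urlencode as a stand-in for the encoding step, and a shortened, made-up token rather than the real one:

```python
from urllib.parse import unquote, urlencode

# Hypothetical, shortened token that is already percent-encoded.
encoded_token = "JVFQ%2FFb5Ri2%2Bbpm80MCOE%3D"

# Encoding it again escapes each "%" as "%25".
double_encoded = urlencode({"token": encoded_token})

# Decoding first and letting the encoder run once gives the intended query.
single_encoded = urlencode({"token": unquote(encoded_token)})

print(double_encoded)  # token=JVFQ%252FFb5Ri2%252Bbpm80MCOE%253D
print(single_encoded)  # token=JVFQ%2FFb5Ri2%2Bbpm80MCOE%3D
```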
URLs use a special type of syntax. The % character is a reserved character in URLs. It is used as an escape character to allow you to type other characters (such as space, #, and % itself).
Requests automatically encodes URLs to proper syntax when necessary. The % character had to be encoded as "%25". In other words, the URL parameters never changed; the URL was just encoded to proper syntax. Everywhere you put "%", it was encoded to the proper form, "%25".
You can check out URL Syntax info here if you want:
http://en.wikipedia.org/wiki/Uniform_resource_locator#Syntax
And you can encode/decode URLs here. Try encoding "%" or try decoding "%25" to see what you get:
http://www.url-encode-decode.com/
I am sending a GET request to a Python server. My query string is:
"http://192.168.4.106:3333/xx/xx/xx/xx?excelReport**&detail=&#tt**=475&dee=475&empi=&qwer=&start_date=03/01/2014&end_date=03/13/2014&SearchVar=0&report_format=D"
My query string contains a # character, so when I call request.keys() on my server, it does not show any of the parameters that were passed. It works with other special characters.
I have been stuck on this problem for quite a long time.
I am using the Zope framework.
Please suggest a solution.
The # character cannot be used like that in a query string.
You should encode it with %23 and decode it when you parse the string.
The reason behind that can be found on the W3 site.
# marks the end of the 'query' part of a URL and the start of the 'fragment'. If you need to have a '#' inside your query (that is, the GET params that you get with request.keys()), you need to encode it (with the standard urllib.urlencode or with whatever your framework provides).
I'm not sure what the purpose of # in that URL is, though. Is it supposed to be a key #tt** in your request.keys()? Is it in fact the start of the fragment?
Nowadays fragments are often used to have some routing in the client side of a webapp, since if you go from #a to #b inside a webpage, you don't need to reload the page. So if that may be the case then you can't encode the #, since it would lose its meaning. You would need then to extract the parameters you want from the fragment part manually.
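A short sketch of how the split behaves, using Python 3's urllib.parse and a simplified, made-up URL (example.com and the parameter values are placeholders, not the asker's real endpoint):

```python
from urllib.parse import parse_qs, urlsplit

# A literal "#" ends the query; everything after it is the fragment.
raw = urlsplit("http://example.com/report?detail=&#tt=475&dee=475")
print(raw.query)     # detail=&
print(raw.fragment)  # tt=475&dee=475

# With "#" escaped as %23 the whole string stays in the query.
escaped = urlsplit("http://example.com/report?detail=&%23tt=475&dee=475")
params = parse_qs(escaped.query)
print(params)  # {'#tt': ['475'], 'dee': ['475']}
```

Note that parse_qs() drops parameters with blank values by default, which is why detail does not appear in the result.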
You can use urllib.quote to solve your problem generally.
>>> import urllib
>>> urllib.quote('#')
'%23'
I've got a string from an HTTP header, but it's been escaped.. what function can I use to unescape it?
myemail%40gmail.com -> myemail@gmail.com
Would urllib.unquote() be the way to go?
I am pretty sure that urllib's unquote is the common way of doing this.
>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'
There's also unquote_plus:
Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.
In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.
The latter is used, for example, for query strings in HTTP URLs, where space characters are traditionally encoded as a plus character (+), and a literal + is percent-encoded as %2B.
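As a small Python 3 illustration of that distinction (the sample strings are made up for the example):

```python
from urllib.parse import unquote, unquote_plus

# A header value with an escaped "@" only needs unquote().
header_value = unquote("myemail%40gmail.com")

# In form data, "+" means space and %2B means a literal plus sign.
form_value = unquote_plus("1+%2B+1")

# unquote() would leave the "+" characters untouched.
plain_value = unquote("1+%2B+1")

print(header_value)  # myemail@gmail.com
print(form_value)    # 1 + 1
print(plain_value)   # 1+++1
```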
In addition to these, there is unquote_to_bytes, which converts the given encoded string to bytes and can be used when the encoding is not known or the encoded data is binary. However, there is no unquote_plus_to_bytes; if you need it, you can do:
from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Turn form-encoded "+" into spaces first, then decode the %XX escapes.
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    return unquote_to_bytes(s)
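To show why a bytes-returning variant can matter, here is a brief sketch with escapes that are not valid UTF-8 (the input string is invented for the example):

```python
from urllib.parse import unquote, unquote_to_bytes

binary_escaped = "%FF%FE"

# unquote() decodes as UTF-8 and, by default, replaces invalid
# sequences with U+FFFD, so the original bytes are lost.
as_text = unquote(binary_escaped)

# unquote_to_bytes() keeps the raw bytes intact.
as_bytes = unquote_to_bytes(binary_escaped)

print(repr(as_text))  # '\ufffd\ufffd'
print(as_bytes)       # b'\xff\xfe'
```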
More information on whether to use unquote or unquote_plus is available at URL encoding the space character: + or %20.
Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)