import requests

params = {'token': 'JVFQ%2FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%2FKGS%2FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%2Bbpm80MCOE%3D'}
rsp = requests.get("http://xxxx/access", params=params)
print(rsp.url)
print(params)
When I print rsp.url, I get:
http://xxxx/access?token=JVFQ%252FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%252FKGS%252FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%252Bbpm80MCOE%253D
JVFQ%2FF
JVFQ%252FF
The value of the ?token= in the url is different from params['token'].
Why does it change?
You passed in a URL encoded value, but requests encodes the value for you. As a result, the value is encoded twice; the % character is encoded to %25.
Don't pass in a URL-encoded value. Decode it manually if you must:
from urllib import unquote
params['token'] = unquote(params['token'])
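The double encoding can be reproduced without a network call. This is only a sketch: it assumes Python 3, where these functions live in urllib.parse (the import above is the Python 2 form), it uses urlencode as a stand-in for the query-string encoding that requests performs, and the shortened token value is made up for illustration.

```python
from urllib.parse import unquote, urlencode

# Shortened, already percent-encoded token (illustrative value only).
encoded_token = 'JVFQ%2FFb5Ri2aKNtzTjOo%3D'

# Encoding an already-encoded value escapes every '%' again as '%25'.
double_encoded = urlencode({'token': encoded_token})

# Decoding first means the value ends up encoded exactly once.
raw_token = unquote(encoded_token)
single_encoded = urlencode({'token': raw_token})

print(double_encoded)  # token=JVFQ%252FFb5Ri2aKNtzTjOo%253D
print(single_encoded)  # token=JVFQ%2FFb5Ri2aKNtzTjOo%3D
```

Passing the decoded raw_token in params should give requests the same chance to encode it only once.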
URLs use a special syntax. The % character is reserved in URLs: it is the escape character that lets you write other characters (such as space, #, and % itself).
Requests automatically encodes URLs to proper syntax when necessary. The % character had to be encoded as "%25". In other words, the URL parameters never changed. They are the same. The URL was just encoded to proper syntax: everywhere you put "%", it was encoded to its proper form, "%25".
You can check out URL Syntax info here if you want:
http://en.wikipedia.org/wiki/Uniform_resource_locator#Syntax
And you can encode/decode URLs here. Try encoding "%" or try decoding "%25" to see what you get:
http://www.url-encode-decode.com/
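The same experiment can be tried locally instead of on a website. A minimal sketch, assuming Python 3's urllib.parse; the sample string '100% #1' is just an illustration.

```python
from urllib.parse import quote, unquote

# '%' is itself reserved, so it is escaped as '%25'.
encoded_percent = quote('%')

# Decoding reverses the escape.
decoded_percent = unquote('%25')

# With safe='' every reserved character is escaped, including space and '#'.
encoded_text = quote('100% #1', safe='')

print(encoded_percent)  # %25
print(decoded_percent)  # %
print(encoded_text)     # 100%25%20%231
```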
I have been searching all over the place for this, but I couldn't solve my issue.
I am using a local API to fetch some data, in that API, the wildcard is the percent character %.
The URL looks like this: urlReq = 'http://myApiURL?ID=something&parameter=%w42'
And then I'm passing this to the get function:
req = requests.get(urlReq, auth=HTTPBasicAuth(user, password))
And get the following error: InvalidURL: Invalid percent-escape sequence: 'w4'
I have tried escaping the % character using %%, but in vain. I also tried the following:
urlReq = 'http://myApiURL?ID=something&parameter=%sw42' % '%' but that didn't work either.
Does anyone know how to solve this?
PS I'm using Python 2.7.8 :: Anaconda 1.9.1 (64-bit)
You should have a look at urllib.quote - that should do the trick. Have a look at the docs for reference.
To expand on this answer: the problem is that % (followed by a two-digit hexadecimal number) is the escape sequence for special characters in URLs. If you want the server to interpret your % literally, you need to escape it as well, which is done by replacing it with %25. The aforementioned quote function does that for you.
Let requests construct the query string for you by passing the parameters in the params argument to requests.get() (see documentation):
api_url = 'http://myApiURL'
params = {'ID': 'something', 'parameter': '%w42'}
r = requests.get(api_url, params=params, auth=(user, password))
requests should then percent encode the parameters in the query string for you. Having said that, at least with requests version 2.11.1 on my machine, I find that the % is encoded when passing it in the url, so perhaps you could check which version you are using.
Also for basic authentication you can simply pass the user name and password in a tuple as shown above.
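To preview the query string that should result, here is a rough sketch using the standard library. It assumes Python 3, and urlencode is only a stand-in for the encoding requests applies to params; it is not the requests implementation itself.

```python
from urllib.parse import urlencode

api_url = 'http://myApiURL'
params = {'ID': 'something', 'parameter': '%w42'}

# urlencode percent-encodes each value, so the literal '%' becomes '%25'.
query = urlencode(params)
full_url = api_url + '?' + query

print(full_url)  # http://myApiURL?ID=something&parameter=%25w42
```

The server then decodes %25w42 back to the wildcard value %w42.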
In requests, you can use requests.compat.quote_plus. Here is an example:
>>> requests.compat.quote_plus('example: parameter=%w42')
'example%3A+parameter%3D%25w42'
Credits to @Tryph:
The % is used to encode special characters in URLs. You can encode the % character itself with the sequence %25. See here for more detail: w3schools.com/tags/ref_urlencode.asp
I am using python requests to connect to a website. I am passing some strings to get data about them.
The problem is some string contains slash /, so when they are passed in url, I got a ValueError.
this is my url:
"https://api.flipkart.net/sellers/skus/%s/listings" % string
when string is passed (string that does not contain slash), I get:
https://api.flipkart.net/sellers/skus/A35-Charry-228_39/listings
It returns a valid response. But when I pass a string which contains a slash:
string = "L20-ORG/BLUE-109(38)"
I get url like:
https://api.flipkart.net/sellers/skus/L20-ORG/BLUE-109(38)/listings
Which throws the error.
how to solve this?
Raw string literals in Python
string = r"L20-ORG/BLUE-109(38)"
You could find more info here and here.
urllib.quote_plus is your friend. As urllib is a module from the standard library, you just have to import it with import urllib.
If you want to be conservative, just use it with default value:
string = urllib.quote_plus("L20-ORG/BLUE-109(38)")
gives 'L20-ORG%2FBLUE-109%2838%29'
If you know that some characters are harmless for your use case (say parentheses):
string = urllib.quote_plus("L20-ORG/BLUE-109(38)", '()')
gives 'L20-ORG%2FBLUE-109(38)'
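Putting this together for the URL in the question, a sketch assuming Python 3 (urllib.parse). The helper name listing_url is hypothetical, and it uses quote with safe='' rather than quote_plus, since inside a path a space is normally written as %20 and not as +; for this particular SKU both functions give the same result.

```python
from urllib.parse import quote

def listing_url(sku):
    """Build the listings URL with the SKU escaped as one path segment."""
    # safe='' makes quote() escape '/' too, so the SKU cannot split the path.
    return 'https://api.flipkart.net/sellers/skus/%s/listings' % quote(sku, safe='')

url = listing_url('L20-ORG/BLUE-109(38)')
print(url)
# https://api.flipkart.net/sellers/skus/L20-ORG%2FBLUE-109%2838%29/listings
```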
Now I'm working on Wikipedia. In many articles, I noticed some URLs, for example, https://www.google.com/search?q=%26%E0%B8%89%E0%B8%B1%E0%B8%99, are very long. The example URL can be replaced with "https://www.google.com/search?q=%26ฉัน" (ฉัน is a Thai word) which is shorter and cleaner. However, when I use urllib.unquote function to decode URL, it decodes even %26 and get "https://www.google.com/search?q=&ฉัน" as the result. As you might have noticed, this URL is useless; it doesn't make a valid link.
Therefore, I want to know how to decode the link while keeping it valid. I think that decoding only the non-ASCII characters would give a valid URL. Is that correct, and how can I do it?
Thanks :)
The easiest way is to replace every URL-encoded sequence below %80 (%00-%7F) with a placeholder, URL-decode the result, and then put the original sequences back in place of the placeholders.
Another way is to look for UTF-8 sequences. Your URL appears to be encoded in UTF-8, and Wikipedia uses UTF-8. See the Wikipedia entry for UTF-8 for how UTF-8 characters are encoded.
So, when encoded in URLs, each valid non-ascii UTF-8 character would follow one of these patterns:
(%C0-%DF)(%80-%BF)
(%E0-%EF)(%80-%BF)(%80-%BF)
(%F0-%F7)(%80-%BF)(%80-%BF)(%80-%BF)
(%F8-%FB)(%80-%BF)(%80-%BF)(%80-%BF)(%80-%BF)
(%FC-%FD)(%80-%BF)(%80-%BF)(%80-%BF)(%80-%BF)(%80-%BF)
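So you can match these patterns in the URL and unquote each character separately. Note that current UTF-8 (RFC 3629) allows at most four bytes per character, with lead bytes %C2-%DF, %E0-%EF and %F0-%F4, so the five- and six-byte patterns cannot occur in valid UTF-8.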
However, remember that not all URLs are encoded in UTF-8.
In some old websites, they still use other character sets, such as Windows-874 for Thai language.
In such cases, "ฉัน" for that particular website is encoded as "%A9%D1%B9" instead of "%E0%B8%89%E0%B8%B1%E0%B8%99". If you decode it using urllib.unquote you will get some garbled text like "?ѹ" instead of "ฉัน" and that could break the link.
So you have to be careful and check whether decoding the URL breaks the link. Make sure that the URL you're decoding is in UTF-8.
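One possible way to combine both ideas is sketched below, assuming Python 3 (urllib.parse). The function name unquote_non_ascii is hypothetical. Instead of matching each UTF-8 pattern separately, it takes whole runs of escapes at or above %80 and decodes a run only if it is valid UTF-8, so ASCII escapes such as %26 and non-UTF-8 escapes are left untouched.

```python
import re
from urllib.parse import unquote

# One or more consecutive escapes in the range %80-%FF.
_NON_ASCII_RUN = re.compile(r'(?:%[89A-Fa-f][0-9A-Fa-f])+')

def unquote_non_ascii(url):
    """Decode only percent-escaped non-ASCII UTF-8 text in url."""
    def decode(match):
        try:
            # errors='strict' raises if the bytes are not valid UTF-8.
            return unquote(match.group(0), encoding='utf-8', errors='strict')
        except UnicodeDecodeError:
            # Probably another charset (e.g. Windows-874): keep it encoded.
            return match.group(0)
    return _NON_ASCII_RUN.sub(decode, url)

short_url = unquote_non_ascii(
    'https://www.google.com/search?q=%26%E0%B8%89%E0%B8%B1%E0%B8%99')
print(short_url)  # https://www.google.com/search?q=%26ฉัน
```

A run that mixes valid and invalid bytes stays fully encoded in this sketch, which errs on the side of not breaking the link.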
When I enter a URL into maps.google.com such as https://dl.dropbox.com/u/94943007/file.kml , it will encode this URL into:
https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
I am wondering what this encoding is called, and whether there is a way to encode a URL like this using Python.
I tried this:
The process is called URL encoding:
>>> urllib.quote('https://dl.dropbox.com/u/94943007/file.kml', '')
'https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'
but did not get the expected results:
'https%3A//dl.dropbox.com/u/94943007/file.kml'
What I need is this:
https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
How do I encode this URL properly?
The documentation here:
https://developers.google.com/maps/documentation/webservices/
states:
All characters to be URL-encoded are encoded using a '%' character and
a two-character hex value corresponding to their UTF-8 character. For
example, 上海+中國 in UTF-8 would be URL-encoded as
%E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B. The string ? and the
Mysterians would be URL-encoded as %3F+and+the+Mysterians.
Use
urllib.quote_plus(url, safe=':')
Since you don't want the colon encoded you need to specify that when calling urllib.quote():
>>> expected = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'
>>> url = 'https://dl.dropbox.com/u/94943007/file.kml'
>>> urllib.quote(url, safe=':') == expected
True
urllib.quote() takes a keyword argument safe that defaults to / and indicates which characters are considered safe and therefore don't need to be encoded. In your first example you used '' which resulted in the slashes being encoded. The unexpected output you pasted below where the slashes weren't encoded probably was from a previous attempt where you didn't set the keyword argument safe at all.
Overriding the default of '/' and instead excluding the colon with ':' is what finally yields the desired result.
Edit: Additionally, the API calls for spaces to be encoded as plus signs. Therefore urllib.quote_plus() should be used (whose keyword argument safe doesn't default to '/').
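The effect of the safe argument and the difference between the two functions can be checked directly. A sketch assuming Python 3, where they are urllib.parse.quote and urllib.parse.quote_plus; the sample strings come from the question and the quoted documentation.

```python
from urllib.parse import quote, quote_plus

url = 'https://dl.dropbox.com/u/94943007/file.kml'

default_safe = quote(url)            # '/' is safe by default, ':' is not
nothing_safe = quote(url, safe='')   # both ':' and '/' are encoded
colon_safe = quote(url, safe=':')    # only '/' is encoded

# quote_plus additionally turns spaces into '+' and encodes a literal '+'.
spaces_plus = quote_plus('? and the Mysterians')
chinese_plus = quote_plus('上海+中國')

print(default_safe)  # https%3A//dl.dropbox.com/u/94943007/file.kml
print(nothing_safe)  # https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
print(colon_safe)    # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
print(spaces_plus)   # %3F+and+the+Mysterians
```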
I've got a string from an HTTP header, but it's been escaped.. what function can I use to unescape it?
myemail%40gmail.com -> myemail@gmail.com
Would urllib.unquote() be the way to go?
I am pretty sure that urllib's unquote is the common way of doing this.
>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'
There's also unquote_plus:
Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.
In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.
The latter is used, for example, for query strings in HTTP URLs, where the space character ( ) is traditionally encoded as a plus character (+), and a literal + is percent-encoded as %2B.
In addition to these there is the unquote_to_bytes that converts the given encoded string to bytes, which can be used when the encoding is not known or the encoded data is binary data. However there is no unquote_plus_to_bytes, if you need it, you can do:
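from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Turn '+' into a space first, then percent-decode to raw bytes.
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    return unquote_to_bytes(s)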
More information on whether to use unquote or unquote_plus is available at URL encoding the space character: + or %20.
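A quick comparison of the decoding functions mentioned above, as a sketch assuming Python 3 (urllib.parse). The sample strings other than the email address are made up for illustration.

```python
from urllib.parse import unquote, unquote_plus, unquote_to_bytes

email = unquote('myemail%40gmail.com')

# unquote leaves '+' alone; unquote_plus treats it as an encoded space.
kept_plus = unquote('a+b%2Bc')
plus_as_space = unquote_plus('a+b%2Bc')

# unquote_to_bytes returns raw bytes without assuming any text encoding.
raw = unquote_to_bytes('%A9%D1%B9')

print(email)          # myemail@gmail.com
print(kept_plus)      # a+b+c
print(plus_as_space)  # a b+c
print(raw)            # b'\xa9\xd1\xb9'
```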
Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)