I have been searching all over the place for this, but I couldn't solve my issue.
I am using a local API to fetch some data. In that API, the wildcard is the percent character %.
The URL looks like this: urlReq = 'http://myApiURL?ID=something&parameter=%w42'
And then I'm passing this to the get function:
req = requests.get(urlReq, auth=HTTPBasicAuth(user, password))
And get the following error: InvalidURL: Invalid percent-escape sequence: 'w4'
I have tried escaping the % character using %%, but in vain. I also tried the following:
urlReq = 'http://myApiURL?ID=something&parameter=%sw42' % '%', but that didn't work either.
Does anyone know how to solve this?
PS I'm using Python 2.7.8 :: Anaconda 1.9.1 (64-bit)
You should have a look at urllib.quote - that should do the trick. Have a look at the docs for reference.
To expand on this answer: the problem is that % followed by two hexadecimal digits is the escape sequence for special characters in URLs. If you want the server to interpret your % literally, you need to escape it as well, which is done by replacing it with %25. The aforementioned quote function does that for you.
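As a minimal sketch of what quoting does, assuming Python 3 (where the function lives in urllib.parse rather than directly in urllib as on Python 2.7). The URL is the placeholder from the question, not a real endpoint:

```python
from urllib.parse import quote, unquote

raw_value = '%w42'            # literal percent sign used as the API's wildcard
escaped = quote(raw_value)    # the '%' itself is escaped as '%25'
print(escaped)                # %25w42

# Decoding the escaped form gives back the original value.
assert unquote(escaped) == raw_value

# Illustrative URL only; 'myApiURL' is the placeholder from the question.
url_req = 'http://myApiURL?ID=something&parameter=' + escaped
print(url_req)
```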
Let requests construct the query string for you by passing the parameters in the params argument to requests.get() (see documentation):
api_url = 'http://myApiURL'
params = {'ID': 'something', 'parameter': '%w42'}
r = requests.get(api_url, params=params, auth=(user, password))
requests should then percent-encode the parameters in the query string for you. That said, at least with requests version 2.11.1 on my machine, the % is also encoded when it is passed directly in the URL, so it may be worth checking which version you are using.
Also for basic authentication you can simply pass the user name and password in a tuple as shown above.
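To get a rough idea of the query string such a params dict produces without sending a request, the standard library's urlencode can stand in for the encoding step. This is a sketch assuming Python 3; it is not requests' own code path, only an approximation of the result:

```python
from urllib.parse import urlencode

params = {'ID': 'something', 'parameter': '%w42'}

# urlencode percent-encodes each key and value and joins them with '&'.
query = urlencode(params)
print(query)  # ID=something&parameter=%25w42

# 'myApiURL' is the placeholder host from the question.
full_url = 'http://myApiURL' + '?' + query
print(full_url)
```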
With requests you can use requests.compat.quote_plus. Take a look at this example:
>>> requests.compat.quote_plus('example: parameter=%w42')
'example%3A+parameter%3D%25w42'
Credits to @Tryph:
The % is used to encode special characters in URLs. You can encode the % character itself with the sequence %25. See here for more detail: w3schools.com/tags/ref_urlencode.asp
I want to mimic URL encoding for Chinese characters. For my use case, I have a search URL for an e-commerce site:
'https://search.jd.com/Search?keyword={}'.format('ipad')
When I search for a product in English, this works fine. However, I need to search with Chinese input. I tried
'https://search.jd.com/Search?keyword={}'.format('耐克t恤')
, and found the following encoding under the network tab
https://list.tmall.com/search_product.htm?q=%C4%CD%BF%CBt%D0%F4
So basically, I need to encode inputs like '耐克t恤' into '%C4%CD%BF%CBt%D0%F4'. I'm not sure which encoding the website is using. Also, how can I convert Chinese characters to this encoding with Python?
Update: I checked the headers and it seems the content encoding is gzip?
Try using the urllib.parse module, more specifically the urllib.parse.urlencode() function. You can pass the encoding (in this case it appears to be 'gb2312') and a dict containing the query parameters to get a valid URL suffix which you can use directly.
In this case, your code will look something like:
import urllib.parse
keyword = '耐克t恤'
url = 'https://search.jd.com/Search?{url_suffix}'.format(url_suffix=urllib.parse.urlencode({'keyword': keyword}, encoding='gb2312'))
More info about encoding here
More info about urlencode here
The encoding used seems to be GB2312
This could help you:
def encodeGB2312(data):
    hexData = data.encode(encoding='GB2312').hex().upper()
    encoded = '%' + '%'.join(hexData[i:i + 2] for i in range(0, len(hexData), 2))
    return encoded
output = encodeGB2312('耐克t恤')
print(output)
url = f'https://list.tmall.com/search_product.htm?q={output}'
print(url)
Output:
%C4%CD%BF%CB%74%D0%F4
https://list.tmall.com/search_product.htm?q=%C4%CD%BF%CB%74%D0%F4
The only problem with my code is that its output doesn't correspond 100% to the link you are trying to reproduce. It percent-encodes every byte, including the t character (as %74), while your link keeps the t unencoded. The URL still seems to work when opened, though.
Edit:
Vignesh Bayari R's post handles the URL in the correct (intended) way. But in this case my solution works too.
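For comparison, here is a sketch using the standard library's urllib.parse.quote with the same assumed GB2312 encoding. Unlike the hand-written helper, it leaves unreserved ASCII characters such as t unescaped, which matches the URL seen in the network tab:

```python
from urllib.parse import quote, unquote

keyword = '耐克t恤'

# Encode the keyword to GB2312 bytes and percent-escape only the bytes
# that are not unreserved ASCII characters.
encoded = quote(keyword, encoding='gb2312')
print(encoded)  # %C4%CD%BF%CBt%D0%F4

# Round trip: decoding with the same encoding restores the keyword.
assert unquote(encoded, encoding='gb2312') == keyword

url = 'https://list.tmall.com/search_product.htm?q=' + encoded
print(url)
```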
I'm facing an issue calling an API with the requests library. The problem is as follows.
The code:
r = requests.post(url, data=json.dumps(json_data), headers=headers)
When I read r.text, the apostrophe in the string comes back
like this: Bachelor\u2019s Degree. It should actually give me Bachelor's Degree.
I also tried json.loads, but the single quote problem remains the same.
How can I get the string value correctly?
What you see here ("Bachelor\u2019s Degree") is the string's escaped representation, where "\u2019" is the Unicode code point for "RIGHT SINGLE QUOTATION MARK". This is perfectly correct; there's nothing wrong here. If you print() this string, you'll get what you expect:
>>> s = 'Bachelor\u2019s Degree'
>>> print(s)
Bachelor’s Degree
Learning about unicode and encodings might save you quite some time FWIW.
EDIT:
When I save in db and then on displaying on HTML it will cause issue
right?
Have you tried ?
Your database connector is supposed to encode it to the proper encoding (according to your fields, tables and client encoding settings).
wrt/ "displaying it on HTML", it mostly depends on whether you're using Python 2.7.x or Python 3.x AND on how you build your HTML, but if you're using some decent framework with a decent template engine (if not you should reconsider your stack) chances are it will work out of the box.
As I already mentioned, learning about unicode and encodings will save you a lot of time.
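To illustrate the difference between the escaped form and the actual character, here is a small sketch assuming Python 3. The JSON text is a made-up sample, not the real API response:

```python
import json

# Raw JSON text as it might travel over the wire, with a \u2019 escape.
body = '{"degree": "Bachelor\\u2019s Degree"}'

data = json.loads(body)
degree = data['degree']

print(degree)         # Bachelor’s Degree
print(ascii(degree))  # 'Bachelor\u2019s Degree'

# json.dumps escapes non-ASCII characters by default; ensure_ascii=False
# keeps the real character in the output instead.
print(json.dumps(data))
print(json.dumps(data, ensure_ascii=False))
```

The escape only appears in the serialized or repr form; the decoded string holds a single U+2019 character.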
It's just a Unicode escape sequence for the character; it is not "wrong".
string = 'Bachelor\u2019s Degree'
print(string)
Bachelor’s Degree
You can decode and encode it again, but I can't see any reason why you would want to do that (this might not work in Python 2):
string = 'Bachelor\u2019s Degree'.encode().decode('utf-8')
print(string)
Bachelor’s Degree
From requests docs:
When you make a request, Requests makes educated guesses about the
encoding of the response based on the HTTP headers. The text encoding
guessed by Requests is used when you access r.text
On the response object, you may use .content instead of .text to get the raw bytes of the response and decode them yourself (for example as UTF-8).
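A sketch of why the guessed encoding matters, using plain bytes in place of a real response. It assumes Python 3, and the byte string is a stand-in for what r.content would hold if the server sent UTF-8:

```python
# Stand-in for r.content: the UTF-8 bytes of the expected text.
raw = 'Bachelor\u2019s Degree'.encode('utf-8')

good = raw.decode('utf-8')    # correct encoding
bad = raw.decode('latin-1')   # wrong guess: the 3-byte quote becomes 3 characters

print(good)
print(ascii(bad))
print(good == bad)  # False
```

With requests itself, setting r.encoding = 'utf-8' before reading r.text should have the same effect as decoding r.content by hand, assuming the server really sent UTF-8.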
I have a function that downloads a website's HTML with the urllib3 library. I'm using the request_encode_url function to pass arguments via GET, and it works fine as long as I don't use special Latin characters like 'ñ'. If I use 'ñ', the URL is not properly encoded.
For instance, if I pass an argument like "El señor", the function converts it to "El+señor" instead of "El+se%F1or".
z='El señor'
fields={'sec':'search','value': z}
http = urllib3.PoolManager()
r = http.request_encode_url('GET', 'http://www.myurl.com/search.php',fields)
The expected URL is:
http://www.myurl.com/search.php?sec=search&value=El+se%F1or
but if I use special characters, I get the following URL:
http://www.myurl.com/search.php?sec=search&value=El+señor
Can somebody tell me how to pass arguments with special characters so that the URL is encoded correctly?
I'm using Python 3.4
I found a solution. Maybe it's a silly thing, but I'm a beginner in Python.
I solved it by encoding the string as latin1:
z='El señor'
fields={'sec':'search','value': z.encode('latin1')}
http = urllib3.PoolManager()
r = http.request_encode_url('GET', 'http://www.myurl.com/search.php',fields)
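The effect of the encoding choice can be checked offline with the standard library's urlencode. This is a sketch assuming Python 3; it does not go through urllib3, it only shows which bytes end up percent-encoded for each encoding:

```python
from urllib.parse import urlencode

fields = {'sec': 'search', 'value': 'El señor'}

# Default: characters are encoded as UTF-8 before percent-escaping.
utf8_query = urlencode(fields)
print(utf8_query)    # sec=search&value=El+se%C3%B1or

# Latin-1: 'ñ' is the single byte 0xF1, which is what the server expects here.
latin1_query = urlencode(fields, encoding='latin-1')
print(latin1_query)  # sec=search&value=El+se%F1or

url = 'http://www.myurl.com/search.php?' + latin1_query
print(url)
```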
I am using python requests to connect to a website. I am passing some strings to get data about them.
The problem is that some strings contain a slash /, so when they are passed in the URL, I get a ValueError.
This is my URL:
"https://api.flipkart.net/sellers/skus/%s/listings" % string
When the string does not contain a slash, I get:
https://api.flipkart.net/sellers/skus/A35-Charry-228_39/listings
It returns a valid response. But when I pass a string which contains a slash:
string = "L20-ORG/BLUE-109(38)"
I get url like:
https://api.flipkart.net/sellers/skus/L20-ORG/BLUE-109(38)/listings
which throws the error.
How can I solve this?
Raw string literals in Python
string = r"L20-ORG/BLUE-109(38)"
You could find more info here and here.
urllib.quote_plus is your friend. As urllib is a module from the standard library, you just have to import it with import urllib.
If you want to be conservative, just use it with default value:
string = urllib.quote_plus("L20-ORG/BLUE-109(38)")
gives 'L20-ORG%2FBLUE-109%2838%29'
If you know that some characters are harmless for your use case (say parentheses):
string = urllib.quote_plus("L20-ORG/BLUE-109(38)", '()')
gives 'L20-ORG%2FBLUE-109(38)'
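On Python 3 the same functions live in urllib.parse. A minimal sketch, assuming Python 3 and using quote rather than quote_plus since the value goes into the path; note that quote treats / as safe by default, so safe must be overridden:

```python
from urllib.parse import quote

sku = 'L20-ORG/BLUE-109(38)'

# safe='' forces the slash to be escaped as well.
strict = quote(sku, safe='')
print(strict)   # L20-ORG%2FBLUE-109%2838%29

# Keep parentheses readable if they are known to be harmless.
relaxed = quote(sku, safe='()')
print(relaxed)  # L20-ORG%2FBLUE-109(38)

url = 'https://api.flipkart.net/sellers/skus/%s/listings' % strict
print(url)
```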
params = {'token': 'JVFQ%2FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%2FKGS%2FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%2Bbpm80MCOE%3D'}
rsp = requests.get("http://xxxx/access", params=params)
print rsp.url
print params
When I print rsp.url, I get
http://xxxx/access?token=JVFQ%252FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%252FKGS%252FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%252Bbpm80MCOE%253D
JVFQ%2FF
JVFQ%252FF
The value of the ?token= in the url is different from params['token'].
Why does it change?
You passed in a URL encoded value, but requests encodes the value for you. As a result, the value is encoded twice; the % character is encoded to %25.
Don't pass in a URL-encoded value. Decode it manually if you must:
from urllib import unquote
params['token'] = unquote(params['token'])
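The double encoding can be reproduced without a network call. In this sketch (assuming Python 3), urlencode stands in for the encoding that requests applies to params, and the token is a shortened, hypothetical value rather than the one from the question:

```python
from urllib.parse import unquote, urlencode

# Hypothetical token that is already percent-encoded.
token = 'JVFQ%2FFb5%2Bbpm%3D'

# Encoding it again escapes every '%' as '%25'.
double = urlencode({'token': token})
print(double)  # token=JVFQ%252FFb5%252Bbpm%253D

# Decoding first means the value is encoded exactly once.
single = urlencode({'token': unquote(token)})
print(single)  # token=JVFQ%2FFb5%2Bbpm%3D
```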
URLs use a special syntax. The % character is a reserved character in URLs. It is used as an escape character to allow you to write other characters (such as space, #, and % itself).
Requests automatically encodes URLs to proper syntax when necessary. The % character had to be encoded as "%25". In other words, the parameter value never changed; the URL was just encoded to proper syntax. Everywhere you put "%", it was encoded as "%25".
You can check out URL Syntax info here if you want:
http://en.wikipedia.org/wiki/Uniform_resource_locator#Syntax
And you can encode/decode URLs here. Try encoding "%" or try decoding "%25" to see what you get:
http://www.url-encode-decode.com/