Character encoding in a GET request with http.client lib (Python)

I am a beginner in Python and I wrote this little script to send an HTTP GET request to my local server (localhost). It works great, except that I can't send Latin characters such as accented letters.
import http.client

httpMethod = "GET"
url = "localhost"
params = "Hello World"

def httpRequest(httpMethod, url, params):
    conn = http.client.HTTPConnection(url)
    conn.request(httpMethod, '/?param=' + params)
    conn.getresponse().read()
    conn.close()
    return

httpRequest(httpMethod, url, params)
When I put words with accents in my parameter "params", this is the error message that appears:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 14: ordinal not in range(128)
I don't know if there is a solution using the http.client library, but I think so. In the http.client documentation I can see this:
HTTPConnection.request
Strings are encoded as ISO-8859-1, the default charset for HTTP

You shouldn't construct arguments manually. Use urlencode instead:
>>> from urllib.parse import urlencode
>>> params = 'Aserejé'
>>> urlencode({'params': params})
'params=Aserej%C3%A9'
So, you can do:
conn.request(httpMethod, '/?' + urlencode({'params': params}))
Also note that your string will be encoded as UTF-8 before being URL-escaped.
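Putting the pieces together, the original function might look like this (a sketch; the host and parameter name follow the question, and the network call is only made when the function is invoked):

```python
from urllib.parse import urlencode
import http.client

# urlencode percent-escapes the UTF-8 bytes of each value, so accented
# characters never reach the ISO-8859-1 encoder
query = urlencode({'param': 'séjour'})  # 'param=s%C3%A9jour'

def httpRequest(httpMethod, url, params):
    conn = http.client.HTTPConnection(url)
    # build the query string with urlencode instead of '+' concatenation
    conn.request(httpMethod, '/?' + urlencode({'param': params}))
    body = conn.getresponse().read()
    conn.close()
    return body
```

The keys are escaped too, so this stays safe even if the parameter names themselves ever contain non-ASCII characters.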

Related

UnicodeEncodeError : 'latin-1' codec can't encode character '\u01e2' in position 8: ordinal not in range(256). How to solve? [duplicate]

I have to access a db through this code, provided by MobiDB, to get disorder predictions for proteins.
import urllib2
import json
# Define request
acceptHeader = 'My_File_TrEMBL.txt' # text/csv and text/plain supported
request = urllib2.Request("https://mobidb.org/ws/P04050/uniprot", headers={"Accept" : acceptHeader})
# Send request
response = urllib2.urlopen(request)
# Parse JSON response into a Python dict
data = json.load(response)
# handle data
print(data)
Since I'm not using Python 2.6, I changed the script as follows:
import urllib.request
import json
# Define request
acceptHeader ='My_File_TrEMBL.txt'
# text/csv and text/plain supported
request = urllib.request.Request("https://mobidb.org/ws/P04050/uniprot", headers={"Accept": acceptHeader})
# Send request
response = urllib.request.urlopen(request)
# Parse JSON response into a Python dict
data = json.load(response)
# handle data
print(data)
So I am not using urllib2 but urllib.request. The problem arises when the variable request is passed to urllib.request.urlopen, which raises this error:
" 'latin-1' codec can't encode character '\u01e2' in position 8: ordinal not in range(256) "
I understand that it is something related to ASCII encoding, but since I am new to Python and under a deadline, I'd appreciate any help you can give me.
Obliged
Decode the bytes content using utf-8 encoding and read the content using json.loads:
response = urllib.request.urlopen(request)
#get the content and decode it using utf-8
respcontent = response.read().decode('utf-8')
data = json.loads(respcontent)
print(data)
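As a side note, since Python 3.6 json.loads also accepts bytes directly and detects the UTF-8/16/32 encoding itself, so the explicit decode step can often be skipped. A small local sketch, with a literal standing in for response.read():

```python
import json

raw = '{"protein": "P04050"}'.encode('utf-8')  # stand-in for response.read()
data = json.loads(raw)  # no manual .decode() needed on Python 3.6+
```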

UnicodeDecodeError on Windows, but not when running the exact same code on Mac

I'm trying to download json data via an API. The code is as follows:
import urllib.request, ssl, json
context = ssl._create_unverified_context()
rsbURL = "https://rsbuddy.com/exchange/summary.json"
with urllib.request.urlopen(rsbURL, context=context) as url:
    data = json.loads(url.read().decode('UTF-8'))
This code works perfectly fine on my Mac, and I confirmed that data contains the expected JSON. However, when I run the exact same code on Windows, I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
What is going on and how do I fix it?
Looks like the server is sending a compressed response for some reason (it shouldn't be doing that unless you explicitly set the Accept-Encoding header). You can adapt your code to work with compressed responses like this:
import gzip
import urllib.request, ssl, json
context = ssl._create_unverified_context()
rsbURL = "https://rsbuddy.com/exchange/summary.json"
with urllib.request.urlopen(rsbURL, context=context) as url:
    if url.info().get('Content-Encoding') == 'gzip':
        body = gzip.decompress(url.read())
    else:
        body = url.read()
    data = json.loads(body)
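Incidentally, the 0x8b in the traceback is a telltale sign: a gzip stream always begins with the magic bytes 0x1f 0x8b. A defensive variant can sniff those bytes instead of trusting the Content-Encoding header, sketched here locally with gzip.compress standing in for the HTTP body:

```python
import gzip
import json

body = gzip.compress(json.dumps({"item": 2}).encode('utf-8'))  # stand-in for url.read()

# a gzip stream starts with the magic bytes 0x1f 0x8b
if body[:2] == b'\x1f\x8b':
    body = gzip.decompress(body)
data = json.loads(body)
```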

sending binary data over json

I want to upload a file to S3 using Python. I am using the requests-aws4auth library for this:
import requests
from requests_aws4auth import AWS4Auth
# data=encode_audio(data)
endpoint = 'http://bucket.s3.amazonaws.com/testing89.mp3'
data = ...  # some binary data (mp3) here
auth = AWS4Auth('xxxxxx', 'xxxxxx', 'eu-west-2', 's3')
response = requests.put(endpoint, data=data, auth=auth, headers={'x-amz-acl': 'public-read'})
print response.text
I have tried the above code and got the following error.
'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128).
This works fine if I send text data, since it is ASCII, but when binary data is sent I think there is some concatenation error of the binary data with the auth data. Am I wrong somewhere? Please guide me. Thanks.
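That error usually means a str ended up where bytes were expected: if the payload is read in text mode (or mixed with str during concatenation), Python 2 attempts an implicit ASCII decode and chokes on 0xff, a byte common at the start of MP3 files. One hedged fix is to make sure the payload is read in binary mode, sketched here with a tiny stand-in file instead of a real mp3:

```python
# Write a small stand-in file (real code would use the actual mp3)
with open('testing89.mp3', 'wb') as f:
    f.write(b'\xff\xfb\x90\x00')  # 0xff is exactly the byte from the error

# Read in binary mode so `data` is bytes; requests sends bytes unmodified
with open('testing89.mp3', 'rb') as f:
    data = f.read()

# response = requests.put(endpoint, data=data, auth=auth,
#                         headers={'x-amz-acl': 'public-read'})  # as in the question
```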

URLLIB Request unable to handle Chinese Character

I'm trying to make a curl-style request to an API, but the Chinese characters are giving an error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 96-97: ordinal not in range(128)
Unlike in PHP, where I can simply throw the characters into curl, Python seems to behave differently. Any idea how I can do the same?
Here's the code I'm using:
query = urllib.parse.urlencode({'': '不存'})
url = 'https://www.googleapis.com/language/translate/v2' + query
print(url)
search_response = urllib.request.urlopen(url)
search_results = search_response.read().decode("utf8")
results = json.loads(search_results)
data = results['responseData']
print(data)
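For what it's worth, the snippet as shown also drops the '?' between the path and the query string, and the empty parameter name looks unintended. With urlencode the Chinese characters are percent-escaped as UTF-8 and the final URL is pure ASCII, which is what urlopen requires. A sketch ('q' is a hypothetical parameter name here, and the real Translate API also requires an API key):

```python
from urllib.parse import urlencode

# the characters become UTF-8 percent-escapes, so the URL is pure ASCII
query = urlencode({'q': '不存'})
url = 'https://www.googleapis.com/language/translate/v2?' + query
```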

Decode Cookie variable extracted from a HTTP Stream - Python

I am using Python to send a request to a server. I get a cookie back from the server, and I am trying to work out the encoding scheme the server used. I suspect it's either utf-8 or base64.
So I create my header and connection objects.
resp, content = httpobj.request(server, 'POST', headers=HTTPheader, body=HTTPbody)
And then I extract the cookie from the HTTP stream:
cookie= resp['set-cookie']
I have tried str.decode() and unicode(), but I am unable to get the unpacked content of the cookie.
Assume the cookie is
MjAyMTNiZWE4ZmYxYTMwOVPJ7Jh0B%2BMUcE4si5oDcH7nKo4kAI8CMYgKqn6yXpgtXOSGs8J9gm20bgSlYMUJC5rmiQ1Ch5nUUlQEQNmrsy5LDgAuuidQaZJE5z%2BFqAJPnlJaAqG2Fvvk5ishG%2FsH%2FA%3D%3D
The output I am expecting is
20213bea8ff1a309SÉì˜tLQÁ8².hÁûœª8<Æ
*©úÉzµs’Ïö¶Ñ¸•ƒ$.kš$5gQIPf®Ì¹,8�ºèA¦IœöZ€$ùå% *ao¾Nb²¶ÁöÃ
Try like this:
import urllib
import base64
cookie_val = """MjAyMTNiZWE4ZmYxYTMwOVPJ7Jh0B%2BMUcE4si5oDcH7nKo4kAI8CMYgKqn6yXpgtXOSGs8J9gm20bgSlYMUJC5rmiQ1Ch5nUUlQEQNmrsy5LDgAuuidQaZJE5z%2BFqAJPnlJaAqG2Fvvk5ishG%2FsH%2FA%3D%3D"""
res = base64.b64decode(urllib.unquote(cookie_val))
print repr(res)
Output:
"20213bea8ff1a309S\xc9\xec\x98t\x07\xe3\x14pN,\x8b\x9a\x03p~\xe7*\x8e$\x00\x8f\x021\x88\n\xaa~\xb2^\x98-\\\xe4\x86\xb3\xc2}\x82m\xb4n\x04\xa5`\xc5\t\x0b\x9a\xe6\x89\rB\x87\x99\xd4RT\x04#\xd9\xab\xb3.K\x0e\x00.\xba'Pi\x92D\xe7?\x85\xa8\x02O\x9eRZ\x02\xa1\xb6\x16\xfb\xe4\xe6+!\x1b\xfb\x07\xfc"
Of course the result here is an 8-bit string, so you have to decode it to get the string you want. I'm not sure which encoding to use, but here is the decoding result using unicode-escape (unicode literals):
>>> print unicode(res, 'unicode-escape')
20213bea8ff1a309SÉìtãpN,p~ç*$1ª~²^-\ä³Â}m´n¥`ÅBÔRT#Ù«³.K.º'PiDç?¨ORZ¡¶ûäæ+!ûü
Hope this helps.
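The answer above is written for Python 2. On Python 3 the same unpacking looks roughly like this: unquote has moved to urllib.parse, and b64decode returns bytes that can be inspected with repr:

```python
import base64
from urllib.parse import unquote

cookie_val = """MjAyMTNiZWE4ZmYxYTMwOVPJ7Jh0B%2BMUcE4si5oDcH7nKo4kAI8CMYgKqn6yXpgtXOSGs8J9gm20bgSlYMUJC5rmiQ1Ch5nUUlQEQNmrsy5LDgAuuidQaZJE5z%2BFqAJPnlJaAqG2Fvvk5ishG%2FsH%2FA%3D%3D"""

# first undo the URL escaping (%2B -> '+', %2F -> '/', %3D -> '='),
# then base64-decode; the result is bytes
res = base64.b64decode(unquote(cookie_val))
print(repr(res))  # readable 16-character prefix, then raw bytes
```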
