Json UnicodeEncodeError for requests response - python

I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2082' in position 1088: character maps to <undefined>
when I try to load the text of a requests response as JSON in Python 3.9. My code is:
rq = requests.get(url)
rqJson = json.loads(rq.text)
print(rqJson)
I'm using Windows 10, and sys.stdout.encoding is utf-8.
Any help with this, please?
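One possible workaround, offered as a sketch rather than a confirmed fix for this setup: the error is raised by print, not by json.loads, so serializing the parsed object back with json.dumps before printing avoids the console codec entirely. With the default ensure_ascii=True, every non-ASCII character is written as a \uXXXX escape. The sample payload below is made up and stands in for rq.text.

```python
import json

# Hypothetical payload standing in for rq.text; the JSON escape \u2082
# decodes to the subscript-two character named in the error message.
sample_text = '{"formula": "CO\\u2082", "unit": "ppm"}'

parsed = json.loads(sample_text)

# ensure_ascii=True (the default) escapes non-ASCII characters as \uXXXX,
# so the dumped string contains only ASCII and any console codec can encode it.
ascii_dump = json.dumps(parsed, ensure_ascii=True)
print(ascii_dump)
```

The trade-off is readability: the console shows escapes instead of the real characters, but the data itself is unchanged.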

Related

How to decode python byte code to ASCII? (Selenium. Getting xml from network response)

How do I decode Python bytes to ASCII?
I extract data from a network response with Selenium. I should get XML.
Getting: ['b'\xa5\xff\xff\xc7\x88\xe4\xb4\xd7\x03\xa0\x11:|\xce\xdb\xb7\x0f\xf1\xdf\xfc\x1f\xdb\x93\x91^\xbc\xa3\xdd\xc2\x02V\x00\xba$\xbd\x10\xd2\xd0E\xf2\x90\xb6\xca\xee\x10\xbf\xbf_\xbf\xfc\xef?\xe9\x13{H\xf1\xa1\xa0\x00\x1c\x01(\x80\x1c\x81\x02(s\xe7Z\xf3\xb3N\xf5L\xdc>\xe7\x8f\xbbwl\xbf\x99\x91\xd4O\xde\xb4,\xf3PH\x02L1\x00\xc98\xc3,\x13!\x82\xc6\xc2\xa6Bd"k\xcb\x9d(\xb9\x13%WQr\x15%W\xb1\xe5J\t\x9e:\x8a\x03\x99\x06H\xd0\x8f\xd8\xfe\x9f9\xbc\xfc\x157\x111\xd7\x15\xaab\xfb\xe8;\xab\xee\xfc\x9b\xeeu\x10<d\x04\x06Y\xa8\xd7\x9f\x11...
Code:
...
for request in driver.requests:
    if request.response:
        text_file.write(str(request.response.body))
I've tried:
decoded = request.response.body.decode('ascii')
as well as request.response.body.decode('utf-8') and the cp1251/cp1252 codecs.
I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
The response should be XML (~1.5 MB), as shown in the attached photo.
If I use:
decoded = base64.b64decode(request.response.body)
I get something like: b'T#\x00\xad\x9a\xb5\xba\xfa3u\xca\x84PG\xbd\x8a\xab\x1f\xcdcJ%\r\xd4\xff\x0c$)\x9a>.... which is not what I expected.
Combining them as decoded = base64.b64decode(request.response.body).decode('ascii') also doesn't help:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
Help me, please.
It's because of the header 'Content-Encoding': 'br'.
Installing brotli helped. Also deleting
This message helped a lot
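A minimal sketch of what that answer implies: the body is compressed, so it has to be decompressed according to the Content-Encoding header before any text decoding. The helper name decode_body is hypothetical, the brotli import assumes the third-party brotli package is installed, and the demo uses gzip so that it runs with the standard library alone.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the HTTP Content-Encoding applied to a raw response body."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "br":
        # Third-party package, assumed installed (pip install brotli).
        import brotli
        return brotli.decompress(body)
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    # "identity" or an unrecognized value: return the bytes unchanged.
    return body


# Stand-in for request.response.body, compressed with gzip for the demo.
compressed = gzip.compress("<root>ok</root>".encode("utf-8"))
xml_text = decode_body(compressed, "gzip").decode("utf-8")
print(xml_text)
```

In the Selenium case the call would presumably look like decode_body(request.response.body, request.response.headers.get('Content-Encoding')), assuming the headers object supports get; only after that step does .decode('utf-8') have a chance of producing XML.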

How to fix UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d : character maps to <undefined>?

I am using curl to fetch data.
import os
cmd = "curl --data \"action=getdata\" https://localhost:8070"
print(cmd)
data = os.popen(cmd).read()
The line above produces an error UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 565334: character maps to <undefined>.
When I debugged using breakpoints, the command os.popen generates a large corpus of text and when it goes to read() the error arises in file cp1252.py in IncrementalDecoder class. I tried doing,
data = os.popen(cmd).read().encode('utf-8').decode('ascii')
and
data = os.popen(cmd).read().encode().decode('utf-8')
But the error persists. How can we solve this?
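A possible approach, assuming the server actually sends UTF-8 (the question does not say): os.popen decodes the output with the locale codec, which is cp1252 here, before .encode() or .decode() ever run, so the later conversions cannot help. Capturing raw bytes with subprocess and decoding them explicitly avoids that. The helper name run_and_decode is hypothetical, and the demo uses a child Python process in place of curl.

```python
import subprocess
import sys


def run_and_decode(cmd, encoding="utf-8"):
    """Run cmd, capture stdout as bytes, and decode it with an explicit codec."""
    completed = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return completed.stdout.decode(encoding, errors="replace")


# Demo: a child Python process emits the UTF-8 bytes of U+201D (right double
# quotation mark), whose last byte is 0x9d, the byte cp1252 cannot decode.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(b'\\xe2\\x80\\x9d')"]
demo_output = run_and_decode(demo_cmd)
print(ascii(demo_output))
```

For the curl command in the question, the argument list would be something like ["curl", "--data", "action=getdata", url], with url being whatever address is intended.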

UnicodeEncodeError: 'charmap' codec can't encode character in Python 3.5

I can't find a solution for this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\x96' in position 582: character maps to <undefined>
which appears when I try to redirect the output to a file with:
python.exe page_query.py > output.html
There is no problem displaying the output in PowerShell with just:
python.exe page_query.py
but I had to run the chcp 65001 command first.
Here is my short code:
import requests
payload = {'st': 'C3225X6S0J107M250AC'}
r = requests.get('http://pl.farnell.com/webapp/wcs/stores/servlet/Search?catalogId=15001&langId=-22&storeId=10170&categoryName=Wszystkie%20kategorie&selectedCategoryId=&gs=true&', params=payload)
print(r.encoding)
unicode_str = r.text
print(unicode_str)
Could you help with it?
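One workaround to consider, not taken from the thread: when output is redirected, Python likely falls back to the locale code page rather than the console code page set by chcp, so writing the file from Python with an explicit encoding avoids the redirect altogether. The helper name save_text and the sample string are made up for illustration.

```python
import os
import tempfile


def save_text(path, text, encoding="utf-8"):
    """Write text to path with an explicit encoding instead of the locale default."""
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Demo with a made-up string containing '\x96', the character from the error.
demo_path = os.path.join(tempfile.mkdtemp(), "output.html")
save_text(demo_path, "<p>10\x96 20</p>")

# Read the file back with the same encoding to confirm nothing was lost.
with open(demo_path, encoding="utf-8") as handle:
    round_trip = handle.read()
print(ascii(round_trip))
```

In the script from the question this would replace print(unicode_str) with save_text('output.html', unicode_str).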

Character encoding error: UnicodeEncodeError: 'charmap' codec can't encode character X in position Y: character maps to <undefined>

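I'm trying to scrape Yahoo Finance web pages to get stock price data with Python 3.3, httplib2, and beautifulsoup4. Here is the code:
import httplib2
from bs4 import BeautifulSoup

def getData(symbol='GOOG', period='m'):
    # Build the historical-prices URL for the given symbol and period
    baseUrl = 'http://finance.yahoo.com/q/hp?s='
    url = baseUrl + symbol + '&g=' + period
    # Fetch the page, caching responses in the .cache directory
    h = httplib2.Http('.cache')
    response, content = h.request(url)
    soup = BeautifulSoup(content)
    # This print is where the UnicodeEncodeError is raised
    print(soup.prettify())

getData()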
I get the following error trace:
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xd7' in position 11875: character maps to <undefined>
I'm new to python and the libraries and would greatly appreciate your help!
This is due to the encoding of your console.
Whichever console you're working in (Windows, Mac, Linux), it is being asked to display characters that its encoding doesn't support, so they can't be printed to the screen.
You could try converting the output string into the encoding of your console.
I found that an easy way was to just convert the data into a string, and it prints just fine.
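The suggestion to convert the output into the console's encoding could look roughly like this. It is a sketch, not the answerer's code: the helper name to_console_safe is hypothetical, and the demo forces the ascii codec so that the result is the same on every machine.

```python
import sys


def to_console_safe(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    # Fall back to the console's own encoding, then to UTF-8 if it is unknown.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# U+00D7 (multiplication sign) and U+2082 (subscript two) are not ASCII,
# so both are replaced instead of raising UnicodeEncodeError.
safe = to_console_safe("4 \u00d7 CO\u2082", "ascii")
print(safe)
```

Calling to_console_safe(soup.prettify()) without the second argument uses whatever encoding the console reports, so only the characters that console cannot show are replaced.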

UnicodeEncodeError with BeautifulSoup 3.1.0.1 and Python 2.5.2

I'm using BeautifulSoup 3.1.0.1 with Python 2.5.2 and trying to parse a web page in French. However, as soon as I call findAll, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1146: ordinal not in range(128)
Below is the code I am currently running:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://fr.encarta.msn.com/encyclopedia_761561798/Paris.html")
soup = BeautifulSoup(page, fromEncoding="latin1")
r = soup.findAll("table")
print r
Does anybody have an idea why?
Thanks!
UPDATE: As requested, below is the full traceback:
Traceback (most recent call last):
  File "[...]\test.py", line 6, in <module>
    print r
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1146-1147: ordinal not in range(128)
Here is another idea. Your terminal is not capable of displaying a Unicode string from Python, so the interpreter tries to convert it to ASCII first. You should encode it explicitly before printing. I don't know the exact semantics of soup.findAll(), but it is probably something like:
for t in soup.findAll("table"):
    print t.encode('latin1')
That works if t really is a string. Maybe it's just another object from which you have to build the data that you want to display.
