Another BeautifulSoup error 'charmap' codec can't encode character - python

I'm hoping to scrape data from the table for passengers going through TSA security lines, but I keep getting this error.
UnicodeEncodeError: 'charmap' codec can't encode character '\u2713' in position 33780: character maps to <undefined>
from this code:
import requests
from bs4 import BeautifulSoup
url = "https://www.tsa.gov/coronavirus/passenger-throughput"
page = requests.get(url).content
soup = BeautifulSoup(page, features = 'lxml')
text = soup.get_text()
soup.prettify()
print(soup)
Are there any suggestions?

Let me explain what actually happened.
Read the error message carefully:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2713' in position 33780: character maps to <undefined>
If I run the following on my side:
print("\u2713")
the output is the following Unicode character:
✓
I believe you are running on Windows, where the default console encoding is cp1252 rather than UTF-8, and cp1252 has no mapping for this character.
You can verify that using the following:
import sys
print(sys.getdefaultencoding())
print(sys.stdin.encoding)
print(sys.stdout.encoding)
print(sys.stderr.encoding)
Or directly via cmd by running the following command: chcp
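To see why the check mark triggers this error, the failure can be reproduced without any scraping. The sketch below assumes only that the console encoding is cp1252, as guessed above; the helper name can_encode is hypothetical and not part of any library.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

check_mark = "\u2713"

# cp1252 has no mapping for U+2713, which is what print() trips over
# when sys.stdout uses that encoding.
print(can_encode(check_mark, "cp1252"))

# UTF-8 can represent every Unicode code point, so this succeeds.
print(can_encode(check_mark, "utf-8"))
```

The same helper can be pointed at whatever sys.stdout.encoding reports on your machine.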
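You can then switch the console code page to UTF-8 by opening cmd and running the following command. It changes the active code page for the current console session only:
chcp 65001
See the official documentation on code page identifiers: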
Identifier   .NET Name   Additional information
65001        utf-8       Unicode (UTF-8)
Note that if you are using VS Code with Code Runner, either run your code in the terminal as py code.py, or add the following setting:
{
    "code-runner.executorMap": {
        "python": "set PYTHONIOENCODING=utf8 && python"
    }
}
See also my previous answer to a similar issue.
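If changing the console code page is not an option, the output stream can also be switched from inside the script. This is a minimal sketch rather than part of the original answer, assuming Python 3.7 or later (where sys.stdout.reconfigure is available) and that stdout has not been replaced by an object lacking that method. The helper name force_utf8_stdout is hypothetical.

```python
import sys

def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 if the stream supports reconfigure()."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        # Replaced streams (some IDEs, test runners) may lack reconfigure().
        return False
    # errors="replace" is a safety net; UTF-8 itself can encode U+2713.
    reconfigure(encoding="utf-8", errors="replace")
    return True

if force_utf8_stdout():
    print("\u2713")
```

Whether the glyph is actually drawn still depends on the console and its font; this only prevents the UnicodeEncodeError.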

Related

Json UnicodeEncodeError for requests response

I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2082' in position 1088: character maps to <undefined>
when I try to parse the text of a requests response as JSON in Python 3.9. My code is:
import json
import requests
rq = requests.get(url)
rqJson = json.loads(rq.text)
print(rqJson)
I'm using Windows 10, and sys.stdout.encoding is utf-8.
Any help with this, please?
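One way to inspect such a response without depending on the console encoding is to serialize it back with json.dumps, whose default ensure_ascii=True escapes every non-ASCII character. The sketch below uses a made-up payload standing in for rq.text, since the real response is not shown in the question.

```python
import json

# Hypothetical payload standing in for rq.text; once parsed it contains
# U+2082 (subscript two), the character from the error message.
sample_text = '{"formula": "H\\u2082O"}'

data = json.loads(sample_text)

# ensure_ascii=True (the default) writes non-ASCII characters as \uXXXX
# escapes, so the printed string is pure ASCII and safe for any console codec.
safe = json.dumps(data, indent=2)
print(safe)
```

This does not explain why the error appears while sys.stdout.encoding reports utf-8; it only sidesteps the encoding step when printing.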

How to use utf-8 characters in dryscrape in Python?

I need to use UTF-8 characters with dryscrape's set method, but running the code shows this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
My code (for example):
site = dryscrape.Session()
site.visit("https://www.website.com")
search = site.at_xpath('//*[@name="search"]')
search.set(u'فارسی')
search.form().submit()
I also tried changing u'فارسی' to search.set(unicode('فارسی', 'utf-8')), but the error still appears.
It's very easy. This method works perfectly with Google, and you can try it with any other site if you know its URL parameters.
import urllib.parse
import dryscrape as d
d.start_xvfb()
br = d.Session()
# Percent-encode the UTF-8 bytes of the query so the URL itself is plain ASCII.
query = urllib.parse.quote("فارسی")
print(query)  # prints: %D9%81%D8%A7%D8%B1%D8%B3%DB%8C
url = "http://google.com/search?q=" + query
br.visit(url)
print(br.xpath('//title')[0].text())
# prints: Google Search - فارسی
# You can also check it with br.render("url_screenshot.png")

UnicodeEncodeError: 'charmap' codec can't encode character in Python 3.5

I can't find a solution for this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\x96' in position 582: character maps to <undefined>
which appears when I try to redirect the output to a file with:
python.exe page_query.py > output.html
There is no problem displaying the output in PowerShell with just:
python.exe page_query.py
but I had to run the chcp 65001 command first.
Here is my short code:
import requests
payload = {'st': 'C3225X6S0J107M250AC'}
r = requests.get('http://pl.farnell.com/webapp/wcs/stores/servlet/Search?catalogId=15001&langId=-22&storeId=10170&categoryName=Wszystkie%20kategorie&selectedCategoryId=&gs=true&', params=payload)
print(r.encoding)
unicode_str = r.text
print(unicode_str)
Could you help with it?
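When stdout is redirected to a file, Python typically picks the output encoding from the locale rather than from the console, which would explain why chcp 65001 stops helping. One way around that, sketched here under the assumption that the goal is simply a UTF-8 file, is to write the file from Python with an explicit encoding. The helper save_text, the file name, and the sample string are hypothetical; in the question the text would be r.text.

```python
import os
import tempfile

def save_text(path, text):
    """Write text to path as UTF-8, independent of console or locale settings."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

# Hypothetical stand-in for r.text; it contains U+0096, the character
# reported in the error message.
sample = "<p>10 \x96 20</p>"

target = os.path.join(tempfile.gettempdir(), "output_example.html")
save_text(target, sample)

# Read the file back to confirm the round trip preserved the text.
with open(target, encoding="utf-8") as handle:
    print(handle.read() == sample)
```

Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option that keeps the shell redirection.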

Character encoding error: UnicodeEncodeError: 'charmap' codec can't encode character X in position Y: character maps to <undefined>

I'm trying to scrape Yahoo Finance web pages to get stock price data with Python 3.3, httplib2, and beautifulsoup4. Here is the code:
import httplib2
from bs4 import BeautifulSoup
def getData(symbol='GOOG', period='m'):
    baseUrl = 'http://finance.yahoo.com/q/hp?s='
    url = baseUrl + symbol + '&g=' + period
    h = httplib2.Http('.cache')
    response, content = h.request(url)
    soup = BeautifulSoup(content)
    print(soup.prettify())
getData()
I get the following error trace:
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xd7' in position 11875: character maps to <undefined>
I'm new to python and the libraries and would greatly appreciate your help!
This is due to the encoding of your console.
Python encodes everything it prints using the console's encoding (mac_roman in your traceback), and the page contains characters, such as '\xd7', that this encoding cannot represent, so they can't be printed to the screen.
You could try converting the output string into the encoding of your console.
I found that an easy way was to just convert the data into a string, after which it printed fine.
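The suggestion to convert the output into the console's encoding can be sketched as follows. This is one interpretation, not necessarily what the answer's author had in mind: characters the console codec cannot represent are turned into backslash escapes instead of raising. The helper name to_console_safe is hypothetical.

```python
import sys

def to_console_safe(text, encoding=None):
    """Round-trip text through the console codec, escaping unsupported characters."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# U+00D7 (multiplication sign) is the character from the traceback above;
# mac_roman has no mapping for it, so it comes back as the escape \xd7.
print(to_console_safe("3 \xd7 4", encoding="mac_roman"))
```

In the question's code this would be used as print(to_console_safe(soup.prettify())), which loses the unsupported characters but no longer raises.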

UnicodeEncodeError with BeautifulSoup 3.1.0.1 and Python 2.5.2

I am using BeautifulSoup 3.1.0.1 with Python 2.5.2 and trying to parse a web page in French. However, as soon as I call findAll, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1146: ordinal not in range(128)
Below is the code I am currently running:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://fr.encarta.msn.com/encyclopedia_761561798/Paris.html")
soup = BeautifulSoup(page, fromEncoding="latin1")
r = soup.findAll("table")
print r
Does anybody have an idea why?
Thanks!
UPDATE: As requested, below is the full traceback:
Traceback (most recent call last):
  File "[...]\test.py", line 6, in <module>
    print r
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1146-1147: ordinal not in range(128)
Here is another idea. Your terminal cannot display a Unicode string from Python directly, so the interpreter tries to convert it to ASCII first. You should encode it explicitly before printing. I don't know the exact semantics of soup.findAll(), but it is probably something like:
for t in soup.findAll("table"):
    print t.encode('latin1')
That works if t really is a string. It may instead be another object from which you have to build the data that you want to display.
