UnicodeEncodeError in Python 3.6 - python

I'm trying to find a way to disable this error logging in my Python code. The program seems to run fine, but the search function returns a large number of JSON objects with dozens of attributes each, and whenever it hits a character that it can't print, it dumps the thousands of returned JSON objects to the console.
I wrapped the offending code (below) in a try block, but it hasn't changed anything.
try:
    results = api.search(query)
    print('Station hits: ', len(results['station_hits']), '\nSong hits: ', len(results['song_hits']), '\nArtist hits: ', len(results['artist_hits']), '\nAlbum hits: ', len(results['album_hits'])).encode('ascii', 'ignore')
except UnicodeEncodeError:
    pass
Here is the error that is printed to the console (without the large JSON dump mentioned earlier):
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\logging\__init__.py", line 989, in emit
stream.write(msg)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2117' in position 108194: character maps to <undefined>
Call stack:
File "gpm.py", line 247, in <module>
main()
File "gpm.py", line 181, in main
results = api.search(query)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\mobileclient.py", line 1806, in search
res = self._make_call(mobileclient.Search, query, max_results)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\shared.py", line 84, in _make_call
return protocol.perform(self.session, self.validate, *args, **kwargs)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\protocol\shared.py", line 243, in perform
log.debug(cls.filter_response(parsed_response))

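The traceback shows that the logging handler's stream cannot encode '\u2117' (℗, the sound recording copyright sign) using cp1252. That is not surprising, because cp1252 has no mapping for that character. Use UTF-8 for the output stream instead.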
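As a minimal sketch of why the try block has no effect, and of one possible workaround: the logging module catches the encoding failure inside the handler and reports it itself, so nothing is raised at the call site. The example below simulates a cp1252 console with an in-memory buffer; the logger name unicode_demo, the helper make_handler, and the sample message are hypothetical and are not part of gmusicapi.

```python
import io
import logging


def make_handler(errors):
    """Return (handler, buffer) where the buffer stands in for a cp1252 console."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="cp1252", errors=errors)
    return logging.StreamHandler(stream), buffer


log = logging.getLogger("unicode_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False

# 1. Strict cp1252: the write fails inside the handler. logging reports the
#    failure on stderr ("--- Logging error ---") instead of raising it, so
#    the except clause below never runs.
strict_handler, _ = make_handler("strict")
log.addHandler(strict_handler)
caught = False
try:
    log.debug("\u2117 2017 Example Records")
except UnicodeEncodeError:
    caught = True
log.removeHandler(strict_handler)

# 2. Same encoding, but unencodable characters are written as escapes,
#    so the handler no longer fails.
safe_handler, safe_buffer = make_handler("backslashreplace")
log.addHandler(safe_handler)
log.debug("\u2117 2017 Example Records")
safe_handler.flush()
written = safe_buffer.getvalue()
```

With a real console, the same idea would wrap sys.stdout.buffer or sys.stderr.buffer instead of a BytesIO, or use encoding="utf-8" if the terminal can display it. Setting logging.raiseExceptions = False also suppresses the "--- Logging error ---" report, at the cost of hiding every handler failure.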

Related

Indian rupee symbol UnicodeEncodeError while uploading file to S3 using pandas

I have scraped some data from a website for my assignment using the Scrapy framework. The data contains the Indian rupee character "₹". When I save the data to a CSV file on my local machine using pandas, it saves without problems. When I change the delimiters and try to save the same data to S3 using pandas, I get a UnicodeEncodeError.
Earlier I was trying to save the file in Latin-1 ("ISO-8859-1"), so I changed the encoding to "utf-8", but the same error occurs. I'm using Python 3.7 for development.
The code below saves the file on the local machine and works:
result_df.to_csv(filename+str2+'.csv',index=False)
The code below is used to save the file to S3:
search_df.to_csv('s3://my-bucket/folder_path/filename_str2.csv',encoding = 'utf-8',line_terminator='^',sep='~',index=False)
Below is the error while saving the file to S3:
2019-10-29 19:24:27 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function Spider.close at 0x0000019CD3B1AA60>
Traceback (most recent call last):
File "c:\programdata\anaconda3\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
result = f(*args, **kw)
File "c:\programdata\anaconda3\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "c:\programdata\anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 94, in close
return closed(reason)
File "C:\local_path\spiders\Pduct_Scrape.py", line 430, in closed
search_df.to_csv('s3://my-bucket/folder_path/filename_str2.csv',encoding = 'utf-8',line_terminator='^',sep='~',index=False)
File "c:\programdata\anaconda3\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
formatter.save()
File "c:\programdata\anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 172, in save
self._save()
File "c:\programdata\anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 288, in _save
self._save_chunk(start_i, end_i)
File "c:\programdata\anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 315, in _save_chunk
self.cols, self.writer)
File "pandas/_libs/writers.pyx", line 75, in pandas._libs.writers.write_csv_rows
File "c:\programdata\anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 2661: character maps to <undefined>
I am very new to StackOverflow, so please let me know if more information is needed.
The error is evidence that the code tries to encode the filename_str2.csv file in cp1252. From your stack trace:
...File "C:\local_path\spiders\Pduct_Scrape.py", line 430, in closed
search_df.to_csv('s3://my-bucket/folder_path/filename_str2.csv',......
File "c:\programdata\anaconda3\lib\encodings\cp1252.py", line 19, in encode
I do not know the reason, because you explicitly ask for UTF-8 encoding. The codecs page in the Python Standard Library reference gives utf_8 as the canonical name for the codec (notice the underscore instead of the hyphen). Spellings that differ only by a hyphen are normally accepted as aliases, so utf-8 should work, but I would still first try utf_8 to rule the spelling out. If it still uses cp1252, then you will have to give the exact versions of Python and pandas that you are using.
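One way to take the platform default encoding out of the picture is to build the CSV text in memory and encode it to UTF-8 bytes explicitly before uploading. The sketch below uses only the standard library to show the idea; the sample rows and the helper rows_to_utf8_bytes are hypothetical, and the upload step is left out because it depends on the S3 client in use. With pandas, calling to_csv without a path returns the CSV as a string, which could be encoded the same way.

```python
import csv
import io

# Hypothetical sample data containing the rupee sign (U+20B9).
rows = [["item", "price"], ["tea", "\u20b9 120"]]


def rows_to_utf8_bytes(rows, sep="~", line_terminator="^"):
    """Serialise rows with the custom separators, then encode explicitly as UTF-8."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=sep, lineterminator=line_terminator)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


payload = rows_to_utf8_bytes(rows)

# cp1252 has no mapping for the rupee sign, which is what the traceback reports.
try:
    "\u20b9".encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
```

The resulting bytes object can be handed to whatever upload call is available, so no text-mode file handle with a locale-dependent encoding is involved.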

Geograpy3 library is not working properly and gives a traceback error

I am trying to extract locations from a text using the geograpy3 library, but it throws an error.
for content in feedContent:
    if content != "":
        place = geograpy.get_place_context(text=content)
        placesInFeed.append(place.places)
    else:
        placesInFeed.append("null")
The result is
Traceback (most recent call last):
File "C:/Users/Peshala/Documents/SDGP/Location-based-news-recommendation-master/Backend/rss_scraper.py", line 46, in <module>
place = geograpy.get_place_context(text=content)
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\__init__.py", line 11, in get_place_context
pc.set_cities()
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\places.py", line 137, in set_cities
self.populate_db()
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\places.py", line 30, in populate_db
for row in reader:
File "C:\Users\Peshala\AppData\Local\Programs\Python\Python36\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 277: character maps to <undefined>
As a committer of geograpy3, I added a test to the most recent geograpy3 to try to reproduce your issue, see https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py:
def testStackoverflow55548116(self):
    '''
    see https://stackoverflow.com/questions/55548116/geograpy3-library-is-not-working-properly-and-give-traceback-error
    '''
    feedContent=['Las Vegas is a city in Nevada']
    placesInFeed=[]
    for content in feedContent:
        if content != "":
            e=Extractor(text=content)
            e.find_entities()
            places = e.places
            if self.debug:
                print(places)
            placesInFeed.append(places)
The result might not be what you expect:
['Las', 'Vegas', 'Nevada']
but the test does not show any error, so please supply the feedContent that does. You might want to fork the project, modify the test, and open a pull request for your problem.
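The traceback in the question ends in cp1252.py while iterating over a CSV reader, which suggests that a data file is being opened with the Windows default encoding. As a hedged illustration, assuming the file is actually UTF-8, the sketch below reproduces the same failure on byte 0x8d and shows that passing the encoding explicitly avoids it. The temporary file, its contents, and the helper read_rows are made up for the example and are not geograpy3 code.

```python
import csv
import os
import tempfile

# Write a small UTF-8 CSV. "č" (U+010D) is encoded as the bytes C4 8D,
# and 0x8D has no mapping in cp1252.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".csv", delete=False
) as tmp:
    csv.writer(tmp).writerow(["\u010ca\u010dak", "RS"])
    path = tmp.name


def read_rows(path, encoding):
    """Read all CSV rows from path using an explicit encoding."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


try:
    read_rows(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

rows = read_rows(path, "utf-8")
os.remove(path)
```

If the file is opened inside a library, the choice of encoding has to be made there; newer releases of a library may already do this, so upgrading is worth trying first.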

Umlauts in JSON files lead to errors in Python code created by ANTLR4

I've created python modules from the JSON grammar on github / antlr4 with
antlr4 -Dlanguage=Python3 JSON.g4
I've written a main program "JSON2.py" following this guide: https://github.com/antlr/antlr4/blob/master/doc/python-target.md
and downloaded the example1.json also from github.
python3 ./JSON2.py example1.json # works perfectly, but
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"
...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
The offending line in JSON2.py is
input = FileStream(argv[1])
I've searched stackoverflow and tried this instead of using the above FileStream:
fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
    input = fp.read()
finally:
    fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message
Execution of this program ends with an error message, even if the input file doesn't contain Umlauts:
python3 ./JSON2.py example1.json
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'
This parses correctly:
javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.
So finally the question: How should I process the input file in python, so that lexer and parser can digest it?
Thanks in advance.
Make sure your input file is actually encoded as UTF-8. Many problems with character recognition by the lexer are caused by using other encodings. I just took a testbed application, added ë to the list of available characters for an IDENTIFIER, and it works again. UTF-8 is the key, and make sure your grammar also allows these characters where you want to accept them.
I solved it by passing the encoding info:
input = FileStream(sys.argv[1], encoding = 'utf8')
Without the encoding argument, I get the same error as you:
Traceback (most recent call last):
File "test.py", line 20, in <module>
main()
File "test.py", line 9, in main
input = FileStream(sys.argv[1])
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding, errors))
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
Where my input data is
[今明]天(台南|高雄)的?天氣如何
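Both tracebacks end in the same call, codecs.decode(bytes, encoding, errors), with the ascii codec. The short sketch below reproduces that step in isolation with the standard library only, to show why the first non-ASCII byte fails under ascii and why passing a UTF-8 encoding fixes it. The sample string is made up for the example.

```python
import codecs

# "ü" is stored in a UTF-8 file as the two bytes C3 BC.
raw = "M\u00fcnchen".encode("utf-8")

# Decoding with ascii fails on the first byte above 0x7F.
try:
    codecs.decode(raw, "ascii", "strict")
    failing_byte = None
except UnicodeDecodeError as exc:
    failing_byte = raw[exc.start]

# Decoding with the file's real encoding succeeds.
text = codecs.decode(raw, "utf-8", "strict")
```

Here failing_byte is 0xc3, the same byte reported in the question's error message.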

Python UnicodeEncodeError: codec can't encode character [duplicate]

Python throws this when using the Wolfram Alpha API:
Traceback (most recent call last):
File "c:\Python27\lib\threading.py", line 530, in __bootstrap_inner
self.run()
File "c:\Python27\lib\site-packages\Skype4Py\utils.py", line 225, in run
handler(*self.args, **self.kwargs)
File "s.py", line 38, in OnMessageStatus
if body[0:5] == '!math':wolfram(body[5:], '')
File "s.py", line 18, in wolfram
print "l: "+l
File "c:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 3: character maps to <undefined>
How can I solve this?
It looks like you're passing high-byte data to the API, and it doesn't like that (\xd7 is the "times" character, the multiplication sign that looks like an X). I'm not certain what the print is for, but changing it to print "l: " + repr(l) or print "l: ", l might at least get you past the above error, assuming you don't want to be in the business of converting the body to unicode (I'm assuming it's not already).
If that doesn't help, we'll need more details. Where is your input coming from? Is body unicode, or a byte string? Are you using Python 2.7 or 3.x?
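For comparison, here is a small sketch of two ways to make a string safe for a console that cannot display every character. It is written for Python 3 rather than the Python 2.7 shown in the traceback, and the helper console_safe and the sample text are hypothetical. The default encoding cp437 is taken from the traceback.

```python
def console_safe(text, encoding="cp437"):
    """Replace characters the console encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


message = "2 \u00d7 3"  # contains the multiplication sign from the error

# Option 1: substitute unencodable characters.
safe = console_safe(message)

# Option 2: print an escaped representation, similar in spirit to repr(l)
# in the answer above.
escaped = ascii(message)
```

The first option keeps the output readable at the cost of losing the original character; the second keeps the information but shows it as an escape sequence.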

