Python UnicodeEncodeError: codec can't encode character [duplicate] - python

Python throws this when using the wolfram alpha api:
Traceback (most recent call last):
File "c:\Python27\lib\threading.py", line 530, in __bootstrap_inner
self.run()
File "c:\Python27\lib\site-packages\Skype4Py\utils.py", line 225, in run
handler(*self.args, **self.kwargs)
File "s.py", line 38, in OnMessageStatus
if body[0:5] == '!math':wolfram(body[5:], '')
File "s.py", line 18, in wolfram
print "l: "+l
File "c:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 3
: character maps to <undefined>
how can I solve this?

Looks like you're passing in high-byte data to the API, and it's not liking that (\xd7 is the "Times" character; looks like an X). I'm not certain what purpose the print is for, but changing it to be print "l: " + repr(l) or print "l: ", l might at least get you past the above error, assuming you don't want to be in the business of converting the body to unicode (I'm assuming it's not...).
If that doesn't help, we'll need more details. Where is your input coming from? Is body unicode, or a byte string? Are you using python 2.7 or 3.x?

Related

How do we remove all emoji values from strings in python 3?

I am trying to write a program that will get tweets and then insert them into a csv file but I get this error:
Traceback (most recent call last):
File "c:/Users/Fateh Aliyev/Desktop/Python/AI/Data Mining/data.py", line 30, in <module>
csv.writerow([text, 0])
File "C:\Python\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f44c' in position 41: character maps to <undefined>
I am sure that this is from the emojis that are in the strings. I tried this solution but I got the same error. Is this caused by python not being able to encode the string in the first place or something else? How do we get rid of the emojis?
You can remove the emoji by ignoring it when it cannot be encoded:
import codecs
codecs.charmap_encode('\U0001f44c', 'ignore')
# outputs: (b'', 1)

UnicodeEncodeError in Python 3.6

I'm trying to find a way to disable this error logging in my python code. The program seems to actually run fine, the search function just returns a whole buttload of json objects with dozens of attributes each whenever it finds a character that it cant print it will print the thousands of json objects returned
to the console.
I wrapped the guilty code (below) in a try block but it hasn't changed anything.
try:
results = api.search(query)
print('Station hits: ', len(results['station_hits']), '\nSong hits: ', len(results['song_hits']), '\nArtist hits: ', len(results['artist_hits']), '\nAlbum hits: ', len(results['album_hits'])).encode('ascii', 'ignore')
except UnicodeEncodeError:
pass
Here is the error that is printed to the console. (Without the buttload of text referenced earlier)
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\logging\__init__.py", line 989, in emit
stream.write(msg)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2117' in position 108194: character maps to <undefined>
Call stack:
File "gpm.py", line 247, in <module>
main()
File "gpm.py", line 181, in main
results = api.search(query)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\mobileclient.py", line 1806, in search
res = self._make_call(mobileclient.Search, query, max_results)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\shared.py", line 84, in _make_call
return protocol.perform(self.session, self.validate, *args, **kwargs)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\protocol\shared.py", line 243, in perform
log.debug(cls.filter_response(parsed_response))
Trace-back etc reveals: Can't encode u"\u2117" using cp1252, not surprising, use utf8 instead.

Python num2words bug in french translation

I am using the num2words library for Python in order to translate numbers into french words.
However I have an encoding issue with the following case :
num2words((5.05),lang='fr')
output :
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/num2words/__init__.py", line 50, in num2words
return converter.to_cardinal(number)
File "/usr/local/lib/python2.7/dist-packages/num2words/base.py", line 93, in to_cardinal
return self.to_cardinal_float(value)
File "/usr/local/lib/python2.7/dist-packages/num2words/base.py", line 127, in to_cardinal_float
out.append(str(self.to_cardinal(curr)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
It seems to be coming from the "zéro" string. But I do not have the problem when there is no dot:
num2words((0),lang='fr')
output:
u'z\xe9ro'
Can you please help ?
Thanks !

UnicodeEncodeError: 'charmap' codec can't encode character

Python throws this when using the wolfram alpha api:
Traceback (most recent call last):
File "c:\Python27\lib\threading.py", line 530, in __bootstrap_inner
self.run()
File "c:\Python27\lib\site-packages\Skype4Py\utils.py", line 225, in run
handler(*self.args, **self.kwargs)
File "s.py", line 38, in OnMessageStatus
if body[0:5] == '!math':wolfram(body[5:], '')
File "s.py", line 18, in wolfram
print "l: "+l
File "c:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 3
: character maps to <undefined>
how can I solve this?
Looks like you're passing in high-byte data to the API, and it's not liking that (\xd7 is the "Times" character; looks like an X). I'm not certain what purpose the print is for, but changing it to be print "l: " + repr(l) or print "l: ", l might at least get you past the above error, assuming you don't want to be in the business of converting the body to unicode (I'm assuming it's not...).
If that doesn't help, we'll need more details. Where is your input coming from? Is body unicode, or a byte string? Are you using python 2.7 or 3.x?

UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

I want to parse my XML document. So I have stored my XML document as below
class XMLdocs(db.Expando):
id = db.IntegerProperty()
name=db.StringProperty()
content=db.BlobProperty()
Now my below is my code
parser = make_parser()
curHandler = BasketBallHandler()
parser.setContentHandler(curHandler)
for q in XMLdocs.all():
parser.parse(StringIO.StringIO(q.content))
I am getting below error
'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 517, in __call__
handler.post(*groups)
File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/base_handler.py", line 59, in post
self.handle()
File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 168, in handle
scan_aborted = not self.process_entity(entity, ctx)
File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 233, in process_entity
handler(entity)
File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 71, in process
parser.parse(StringIO.StringIO(q.content))
File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 136, in characters
print ch
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
The actual best answer for this problem depends on your environment, specifically what encoding your terminal expects.
The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:
print ch #fails
print ch.encode('ascii', 'ignore')
The better solution is to change your terminal's encoding to utf-8, and encode everything as utf-8 before printing. You should get in the habit of thinking about your unicode encoding EVERY time you print or read a string.
Just putting .encode('utf-8') at the end of object will do the job in recent versions of Python.
It seems you are hitting a UTF-8 byte order mark (BOM). Try using this unicode string with BOM extracted out:
import codecs
content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
parser.parse(StringIO.StringIO(content))
I used strip instead of lstrip because in your case you had multiple occurences of BOM, possibly due to concatenated file contents.
This worked for me:
from django.utils.encoding import smart_str
content = smart_str(content)
The problem according to your traceback is the print statement on line 136 of parseXML.py. Unfortunately you didn't see fit to post that part of your code, but I'm going to guess it is just there for debugging. If you change it to:
print repr(ch)
then you should at least see what you are trying to print.
The problem is that you're trying to print an unicode character to a possibly non-unicode terminal. You need to encode it with the 'replace option before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').
An easy solution to overcome this problem is to set your default encoding to utf8. Follow is an example
import sys
reload(sys)
sys.setdefaultencoding('utf8')

Categories