UnicodeEncodeError: 'charmap' codec can't encode character - python

Python throws this when using the Wolfram Alpha API:
Traceback (most recent call last):
File "c:\Python27\lib\threading.py", line 530, in __bootstrap_inner
self.run()
File "c:\Python27\lib\site-packages\Skype4Py\utils.py", line 225, in run
handler(*self.args, **self.kwargs)
File "s.py", line 38, in OnMessageStatus
if body[0:5] == '!math':wolfram(body[5:], '')
File "s.py", line 18, in wolfram
print "l: "+l
File "c:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 3
: character maps to <undefined>
How can I solve this?

It looks like the string contains a character that your console encoding cannot represent: the traceback ends in cp437.py, and \xd7 is the multiplication sign (it looks like an x), which cp437 does not include. I'm not certain what the print is for, but changing it to print "l: " + repr(l) should at least get you past the error, since repr() escapes non-ASCII characters. Note that print "l: ", l will likely fail the same way if l is a unicode string, because print still has to encode it for the console.
If that doesn't help, we'll need more details. Where is your input coming from? Is body a unicode string or a byte string? Are you using Python 2.7 or 3.x?
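If you would rather print the text itself than its repr(), one option is to replace whatever the console cannot encode before printing. The sketch below is written for Python 3 and assumes the console uses cp437, as in the traceback; console_safe is a hypothetical helper name, not something from the question.

```python
def console_safe(text, encoding="cp437"):
    """Return text with characters the given encoding lacks replaced by '?'.

    console_safe is a hypothetical helper; cp437 is assumed from the traceback.
    """
    # Encode with errors="replace", then decode back to get a printable str.
    return text.encode(encoding, errors="replace").decode(encoding)


# u'\xd7' (the multiplication sign) has no mapping in cp437.
line = "l: " + console_safe("3 \xd7 4")
print(line)
```

The round trip through encode and decode means the result is still a text string, so it can be concatenated and printed as usual.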

Related

How do we remove all emoji values from strings in python 3?

I am trying to write a program that will get tweets and then insert them into a CSV file, but I get this error:
Traceback (most recent call last):
File "c:/Users/Fateh Aliyev/Desktop/Python/AI/Data Mining/data.py", line 30, in <module>
csv.writerow([text, 0])
File "C:\Python\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f44c' in position 41: character maps to <undefined>
I am sure that this comes from the emojis in the strings. I tried this solution but got the same error. Is this caused by Python not being able to encode the string in the first place, or by something else? How do we get rid of the emojis?
You can remove the emoji by ignoring it when it cannot be encoded:
import codecs
codecs.charmap_encode('\U0001f44c', 'ignore')
# outputs: (b'', 1)
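The same idea can be applied with str.encode directly, which may be easier to read. This is a minimal sketch, assuming the target encoding is cp1252 as in the traceback; strip_unencodable and rows_to_csv are hypothetical helper names, and the CSV is written to an in-memory buffer only to keep the example self-contained.

```python
import csv
import io


def strip_unencodable(text, encoding="cp1252"):
    """Drop every character that `encoding` cannot represent (hypothetical helper)."""
    return text.encode(encoding, errors="ignore").decode(encoding)


def rows_to_csv(rows, encoding="cp1252"):
    """Return CSV text whose text cells are all encodable in `encoding`."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for text, label in rows:
        writer.writerow([strip_unencodable(text, encoding), label])
    return buffer.getvalue()


print(rows_to_csv([("nice \U0001f44c", 0), ("caf\xe9", 1)]))
```

Characters that cp1252 does support, such as é, are kept. If you would rather keep the emoji, opening the output file with open(path, "w", newline="", encoding="utf-8") avoids the cp1252 limitation altogether.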

Python num2words bug in french translation

I am using the num2words library for Python to translate numbers into French words.
However, I have an encoding issue with the following case:
num2words((5.05),lang='fr')
output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/num2words/__init__.py", line 50, in num2words
return converter.to_cardinal(number)
File "/usr/local/lib/python2.7/dist-packages/num2words/base.py", line 93, in to_cardinal
return self.to_cardinal_float(value)
File "/usr/local/lib/python2.7/dist-packages/num2words/base.py", line 127, in to_cardinal_float
out.append(str(self.to_cardinal(curr)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
It seems to be coming from the "zéro" string. But I do not have the problem when there is no dot:
num2words((0),lang='fr')
output:
u'z\xe9ro'
Can you please help?
Thanks!
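The last frame of the traceback calls str() on the converted value. In Python 2, str() on a unicode string performs an implicit ASCII encode, which cannot represent é. The sketch below reproduces the same failure explicitly; it is written for Python 3, where the encode has to be spelled out, and the variable names are only illustrative.

```python
word = "z\xe9ro"  # the value shown in the question for num2words(0, lang='fr')

try:
    # Equivalent to what Python 2's str(u'z\xe9ro') does implicitly.
    word.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)
    print(message)

# An encoding that covers é succeeds.
as_utf8 = word.encode("utf-8")
print(as_utf8)
```

This suggests the float code path differs from the integer one only in that it passes the unicode result through str().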

Python Pandas to_clipboard() UnicodeEncodeError: 'ascii' codec can't encode character

I want to pass dataframe data to my clipboard so I can paste into Excel. Problem is, the character '\xe9' is causing an encoding issue, like so:
>>> df.to_clipboard()
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\pandas\util\clipboard.py", line 65, in winSetClipboard
hCd = ctypes.windll.kernel32.GlobalAlloc(GMEM_DDESHARE, len(bytes(text))+1)
TypeError: string argument without an encoding
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#51>", line 1, in <module>
df.to_clipboard()
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1028, in to_clipboard
clipboard.to_clipboard(self, excel=excel, sep=sep, **kwargs)
File "C:\Python34\lib\site-packages\pandas\io\clipboard.py", line 98, in to_clipboard
clipboard_set(objstr)
File "C:\Python34\lib\site-packages\pandas\util\clipboard.py", line 68, in winSetClipboard
hCd = ctypes.windll.kernel32.GlobalAlloc(GMEM_DDESHARE, len(bytes(text, 'ascii'))+1)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 543793: ordinal not in range(128)
I decoded the character and it's an accented é:
>>> '\xe9'.encode().decode()
'é'
After reading the documentation for to_clipboard(), I noticed it says:
other keywords are passed to to_csv. OK, so by 'other keywords', I assume that means keyword arguments from to_csv() -- specifically I want to use encoding='cp1252'.
When I try this, to_clipboard() doesn't recognize the encoding keyword:
df.to_clipboard(encoding='cp1252')
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1028, in to_clipboard
clipboard.to_clipboard(self, excel=excel, sep=sep, **kwargs)
File "C:\Python34\lib\site-packages\pandas\io\clipboard.py", line 95, in to_clipboard
objstr = obj.to_string(**kwargs)
TypeError: to_string() got an unexpected keyword argument 'encoding'
Is there a way to pass all data to the clipboard (both ascii and non-ascii)?
df.to_clipboard(df.to_csv(encoding='cp1252'))
Encode it as CSV with the encoding specified, then put that on the clipboard.

Prevent encoding errors in Python

I have scripts which print out messages by the logging system or sometimes print commands. On the Windows console I get error messages like
Traceback (most recent call last):
File "C:\Python32\lib\logging\__init__.py", line 939, in emit
stream.write(msg)
File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 4537: character maps to <undefined>
Is there a general way to make all encodings in the logging system, print commands, etc. fail-safe (ignore errors)?
The problem is that your terminal/shell (cmd, as you are on Windows) cannot print every Unicode character.
You can encode your strings in a fail-safe way with the errors argument of the str.encode method. For example, you can replace unsupported characters with ? by setting errors='replace'.
>>> s = u'\u2019'
>>> print s
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position
0: character maps to <undefined>
>>> print s.encode('cp850', errors='replace')
?
See the documentation for other options.
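As a quick comparison, here is a small sketch of the standard error handlers that str.encode accepts, applied to the same character. It is written for Python 3 and assumes cp850 as the console encoding, as in the traceback.

```python
text = "it\u2019s"  # contains the right single quotation mark from the traceback

# Each error handler deals with the unencodable character differently.
results = {
    mode: text.encode("cp850", errors=mode)
    for mode in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}

for mode, encoded in results.items():
    print(mode, encoded)
```

'ignore' drops the character, 'replace' substitutes ?, and the last two keep the information in an escaped form, which is often more useful in logs.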
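Edit: If you want a general solution for logging, you can subclass StreamHandler and make the formatted message safe before it is written:
import logging

class CustomStreamHandler(logging.StreamHandler):
    def format(self, record):
        # Format as usual, then replace characters cp850 cannot encode with '?'.
        msg = logging.StreamHandler.format(self, record)
        return msg.encode('cp850', errors='replace').decode('cp850')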
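Another way to get the same effect in Python 3, without subclassing, is to give the handler a stream that already replaces unencodable characters. The sketch below simulates a cp850 console with an in-memory buffer so it can run anywhere; the logger name and format string are arbitrary choices for illustration.

```python
import io
import logging

raw = io.BytesIO()
# Stand-in for a cp850 console; errors="replace" makes every write fail-safe.
stream = io.TextIOWrapper(raw, encoding="cp850", errors="replace", newline="\n")

logger = logging.getLogger("failsafe_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.info("it\u2019s fine")
stream.flush()

output = raw.getvalue()
print(output)
```

For the real console on Python 3.7 or later, sys.stdout.reconfigure(errors="replace") applies the same error handling to print as well, so no wrapper stream is needed there.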
