Python Pandas to_clipboard() UnicodeEncodeError: 'ascii' codec can't encode character - python

I want to pass dataframe data to my clipboard so I can paste into Excel. Problem is, the character '\xe9' is causing an encoding issue, like so:
>>> df.to_clipboard()
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\pandas\util\clipboard.py", line 65, in winSetClipboard
hCd = ctypes.windll.kernel32.GlobalAlloc(GMEM_DDESHARE, len(bytes(text))+1)
TypeError: string argument without an encoding
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#51>", line 1, in <module>
df.to_clipboard()
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1028, in to_clipboard
clipboard.to_clipboard(self, excel=excel, sep=sep, **kwargs)
File "C:\Python34\lib\site-packages\pandas\io\clipboard.py", line 98, in to_clipboard
clipboard_set(objstr)
File "C:\Python34\lib\site-packages\pandas\util\clipboard.py", line 68, in winSetClipboard
hCd = ctypes.windll.kernel32.GlobalAlloc(GMEM_DDESHARE, len(bytes(text, 'ascii'))+1)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 543793: ordinal not in range(128)
I decoded the character and it is an accented é:
>>> '\xe9'.encode().decode()
'é'
After reading the documentation for to_clipboard(), I noticed it says:
"other keywords are passed to to_csv".
OK, so by 'other keywords', I assume that means keyword arguments from to_csv(). Specifically, I want to use encoding='cp1252'.
When I try this, to_clipboard() doesn't recognize the encoding keyword:
df.to_clipboard(encoding='cp1252')
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1028, in to_clipboard
clipboard.to_clipboard(self, excel=excel, sep=sep, **kwargs)
File "C:\Python34\lib\site-packages\pandas\io\clipboard.py", line 95, in to_clipboard
objstr = obj.to_string(**kwargs)
TypeError: to_string() got an unexpected keyword argument 'encoding'
Is there a way to pass all data to the clipboard (both ascii and non-ascii)?

df.to_clipboard(df.to_csv(encoding='cp1252'))
Just encode it as CSV with the encoding specified, then put it on the clipboard.
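To illustrate why the traceback mentions the 'ascii' codec, here is a small sketch that uses only the standard library rather than pandas. It builds tab-separated text (the layout Excel accepts on paste) and tries the encodings discussed above. The helper name rows_to_tsv and the sample rows are made up for the example, and it does not touch the clipboard itself.

```python
import csv
import io


def rows_to_tsv(rows):
    """Hypothetical helper: join rows into tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


text = rows_to_tsv([["name", "city"], ["André", "Montréal"]])

# ASCII has no code for 'é', so this is the step that fails in the traceback.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# cp1252 (the encoding proposed in the question) and UTF-8 both cover 'é'.
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")
```

In cp1252 the é becomes the single byte 0xe9, while UTF-8 uses two bytes, so whatever reads the bytes has to agree on the encoding.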

Related

Python write to json file gives UnicodeDecodeError [duplicate]

I know that this is a very common error, but it's the first time I've encountered it when trying to write a file.
I'm using networkx to work with graphs for network analysis, and when I try to write into any format:
nx.write_gml(G, "Graph.gml")
nx.write_pajek(G, "Graph.net")
nx.write_gexf(G, "graph.gexf")
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 2, in write_pajek
File "/Library/Python/2.7/site-packages/networkx/utils/decorators.py", line 263, in _open_file
result = func(*new_args, **kwargs)
File "/Library/Python/2.7/site-packages/networkx/readwrite/pajek.py", line 100, in write_pajek
path.write(line.encode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
I haven't found documentation on this, so I'm quite confused.
You may be able to solve it with the codecs module. Create a file object with codecs.open and pass that to networkx instead of a filename.
For example:
import codecs
f = codecs.open("graph.gml", "w", "utf-8")
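A minimal sketch of the same idea, written for Python 3 with only the standard library: text written through a codecs.open handle is encoded as UTF-8 on the way out. The temporary path and the sample line are made up for illustration. Whether a given networkx version accepts such a handle in place of a filename is an assumption to check against its documentation.

```python
import codecs
import os
import tempfile

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "graph.gml")
line = 'node [ id 0 label "Zürich" ]\n'

# The codecs handle takes text and writes UTF-8 bytes to disk.
with codecs.open(path, "w", "utf-8") as f:
    f.write(line)

# Read the raw bytes back to see what was actually stored.
with open(path, "rb") as f:
    raw = f.read()

# Read it back as text through the same codec.
with codecs.open(path, "r", "utf-8") as f:
    text = f.read()
```

The ü is stored as the two bytes 0xc3 0xbc. A lone 0xc3 is the byte that the 'ascii' codec rejects in the traceback above.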

Umlauts in JSON files lead to errors in Python code created by ANTLR4

I've created python modules from the JSON grammar on github / antlr4 with
antlr4 -Dlanguage=Python3 JSON.g4
I've written a main program "JSON2.py" following this guide: https://github.com/antlr/antlr4/blob/master/doc/python-target.md
and downloaded the example1.json also from github.
python3 ./JSON2.py example1.json # works perfectly, but
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"
...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
The offending line in JSON2.py is
input = FileStream(argv[1])
I've searched stackoverflow and tried this instead of using the above FileStream:
fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
input = fp.read()
finally:
fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message
Execution of this program ends with an error message, even if the input file doesn't contain Umlauts:
python3 ./JSON2.py example1.json
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'
This parses correctly:
javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.
So finally the question: How should I process the input file in python, so that lexer and parser can digest it?
Thanks in advance.
Make sure your input file is actually encoded as UTF-8. Many problems with character recognition by the lexer are caused by using other encodings. I just took a testbed application, added ë to the list of available characters for an IDENTIFIER and it works again. UTF-8 is the key, and make sure your grammar also allows these characters where you want to accept them.
I solved it by passing the encoding info:
input = FileStream(sys.argv[1], encoding = 'utf8')
Without the encoding argument, I get the same error as you:
Traceback (most recent call last):
File "test.py", line 20, in <module>
main()
File "test.py", line 9, in main
input = FileStream(sys.argv[1])
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding, errors))
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
Where my input data is
[今明]天(台南|高雄)的?天氣如何
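The decode step shown in both tracebacks (codecs.decode(bytes, encoding, errors)) can be reproduced without ANTLR. The sketch below is only an illustration with the standard library, using the first two characters of the input above. The variable names are made up.

```python
import codecs

# UTF-8 bytes for the start of the sample input.
raw = "[今".encode("utf-8")

# Decoding as ASCII fails at the first byte above 0x7f.
try:
    codecs.decode(raw, "ascii", "strict")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start

# Decoding with the encoding the file really uses succeeds.
decoded = codecs.decode(raw, "utf-8", "strict")
```

Here the failure is at position 1 on byte 0xe4, matching the traceback. As for the AttributeError in the question, the lexer expects a stream object rather than a plain str. As far as I know, the antlr4 Python runtime provides InputStream for wrapping text that has already been decoded, but check that against the runtime version you have installed.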

Python UnicodeEncodeError: codec can't encode character [duplicate]

Python throws this when using the wolfram alpha api:
Traceback (most recent call last):
File "c:\Python27\lib\threading.py", line 530, in __bootstrap_inner
self.run()
File "c:\Python27\lib\site-packages\Skype4Py\utils.py", line 225, in run
handler(*self.args, **self.kwargs)
File "s.py", line 38, in OnMessageStatus
if body[0:5] == '!math':wolfram(body[5:], '')
File "s.py", line 18, in wolfram
print "l: "+l
File "c:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xd7' in position 3
: character maps to <undefined>
How can I solve this?
Looks like you're passing in high-byte data to the API, and it's not liking that (\xd7 is the "Times" character; looks like an X). I'm not certain what purpose the print is for, but changing it to be print "l: " + repr(l) or print "l: ", l might at least get you past the above error, assuming you don't want to be in the business of converting the body to unicode (I'm assuming it's not...).
If that doesn't help, we'll need more details. Where is your input coming from? Is body unicode, or a byte string? Are you using python 2.7 or 3.x?
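The traceback shows the failure inside encodings\cp437.py, called from print, so the character is being encoded for a cp437 console. One hedged way around it is to encode with an error handler before printing. The sketch below is written for Python 3, where the same str.encode arguments apply as for unicode objects on Python 2.7. The sample text is made up.

```python
text = "3 \xd7 4"  # contains the multiplication sign U+00D7

# cp437 has no mapping for U+00D7, so strict encoding raises.
try:
    text.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Substitute a question mark for anything the codec cannot represent.
replaced = text.encode("cp437", errors="replace")

# Or keep the information as an escape sequence.
escaped = text.encode("cp437", errors="backslashreplace")
```

With "replace" the result is b"3 ? 4", and with "backslashreplace" it is b"3 \\xd7 4". Either can be written to a cp437 console without raising.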

UnicodeDecodeError reading string in CSV

I'm having a problem reading some characters in Python.
I have a CSV file in UTF-8 format that I'm reading, but when the script reads:
Preußen Münster-Kaiserslautern II
I get this error:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 515, in __call__
handler.get(*groups)
File "/Users/fermin/project/gae/cuotastats/controllers/controllers.py", line 50, in get
f.name = unicode( row[1])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
I tried to use Unicode functions and convert string to Unicode, but I haven't found the solution. I tried to use sys.setdefaultencoding('utf8') but that doesn't work either.
Try the unicode_csv_reader() generator described in the csv module docs.
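The question is on Python 2, where the csv module returns byte strings and unicode(row[1]) decodes them with the default 'ascii' codec. As a rough Python 3 sketch of the underlying idea (decode as UTF-8 before or while parsing), the example below reads UTF-8 bytes through a text wrapper. The in-memory data stands in for the real file, and the variable names are made up.

```python
import csv
import io

name = "Preußen Münster-Kaiserslautern II"
raw = ("1," + name + "\n").encode("utf-8")  # stands in for the CSV file

# The byte the 'ascii' codec rejects: ß starts with 0xc3 at index 4.
name_bytes = name.encode("utf-8")
first_non_ascii = next(i for i, b in enumerate(name_bytes) if b > 0x7F)

# Decode as UTF-8 while reading, then let csv split the fields.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(stream))
```

On Python 2 the corresponding one-line change would be to state the encoding explicitly, for example unicode(row[1], 'utf-8'), instead of relying on the default.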
