UnicodeDecodeError with the sys.stdout inside traceback.print_exc()

UnicodeDecodeError with the sys.stdout inside traceback.print_exc() - python

I am getting UnicodeDecodeError with the traceback.print_exc(file=sys.stdout). I am using Python3.4 and did not get the problem with Python2.7.
Am I missing something here? How can I make sure that sys.stdout passes the correct encoded/decoded to the traceback.print_exc() ?
My code looks something similar to this:
try:
# do something which might throw an exception
except Exception as e:
# do something
traceback.print_exc(file=sys.stdout) # Here I am getting the error
Error log:
traceback.print_exc(file=sys.stdout)
File "C:\Python34\lib\traceback.py", line 252, in print_exc
print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
File "C:\Python34\lib\traceback.py", line 169, in print_exception
for line in _format_exception_iter(etype, value, tb, limit, chain):
File "C:\Python34\lib\traceback.py", line 153, in _format_exception_iter
yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
File "C:\Python34\lib\traceback.py", line 18, in _format_list_iter
for filename, lineno, name, line in extracted_list:
File "C:\Python34\lib\traceback.py", line 65, in _extract_tb_or_stack_iter
line = linecache.getline(filename, lineno, f.f_globals)
File "C:\Python34\lib\linecache.py", line 15, in getline
lines = getlines(filename, module_globals)
File "C:\Python34\lib\linecache.py", line 41, in getlines
return updatecache(filename, module_globals)
File "C:\Python34\lib\linecache.py", line 127, in updatecache
lines = fp.readlines()
File "C:\Python34\lib\codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5213: invalid continuation byte

The traceback module wants to include source code lines with the traceback. Normally, a traceback consists only of pointers to source code, not the source code itself, as Python has been executing the compiled bytecode. In the bytecode are hints as to exactly what source code line the bytecode came from.
To then show the sourcecode, the actual source is read from disk, using the linecache module. This also means that Python has to determine the encoding for those files too. The default encoding for a Python 3 source file is UTF-8, but you can set a PEP 263 comment to let Python know if you are deviating from that.
Because source code is read after the code is already loaded and a traceback took place, it is possible that you changed the source code after starting the script, or there was a byte cache file (in a __pycache__ subdirectory) that appeared to be fresh but was no longer matching your source files.
Either way, when you started the script, Python was able to re-use a bytecache file or read the source code just fine and run your code. But when the traceback was being printed, at least one of the source code files was no longer decodable as UTF-8.
If you can reliably reproduce the traceback (so start the Python script again without encoding problems but printing the traceback fails), it is most likely a stale bytecode file somewhere, one that could even hold pointers to a filename that now contains nothing but binary data, not plain source.
If you know how to use the pdb module, add a pdb.set_trace() call before the traceback.print_exc() call and trace what filenames are being loaded from by the linecache module.
Otherwise edit your C:\Python34\lib\traceback.py file and insert a print('Loading {} from the linecache'.format(filename)) statement just before the linecache.checkcache(filename) line in the _extract_tb_or_stack_iter function.

Related

indian rupee symbol UnicodeEncodeError while uploading file to s3 using pandas

I have scraped some data from a website for my assignment. It consists of Indian rupee character - "₹". The data when I'm trying to save into CSV file in utf-8 characters on local machine using pandas, it is saving effortlessly. The same file, I have changed the delimiters and tried to save the file to s3 using pandas, but it gave "UnicodeEncodeError" error. I'm scraping the web page using scrapy framework.
Earlier I was trying to save the file in Latin-1 i.e. "ISO-8859-1" formatting and hence changed to "utf-8" but the same error is occurring. I'm using pythn 3.7 for the development.
Below code used for saving on the local machine which is working:
result_df.to_csv(filename+str2+'.csv',index=False)
Below code is used to save the file to S3:
search_df.to_csv('s3://my-bucket/folder_path/filename_str2.csv',encoding = 'utf-8',line_terminator='^',sep='~',index=False)
Below is the error while saving the file to S3:
2019-10-29 19:24:27 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function Spider.close at 0x0000019CD3B1AA60>
Traceback (most recent call last):
File "c:\programdata\anaconda3\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
result = f(*args, **kw)
File "c:\programdata\anaconda3\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "c:\programdata\anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 94, in close
return closed(reason)
File "C:\local_path\spiders\Pduct_Scrape.py", line 430, in closed
search_df.to_csv('s3://my-bucket/folder_path/filename_str2.csv',encoding = 'utf-8',line_terminator='^',sep='~',index=False)
File "c:\programdata\anaconda3\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
formatter.save()
File "c:\programdata\anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 172, in save
self._save()
File "c:\programdata\anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 288, in _save
self._save_chunk(start_i, end_i)
File "c:\programdata\anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 315, in _save_chunk
self.cols, self.writer)
File "pandas/_libs/writers.pyx", line 75, in pandas._libs.writers.write_csv_rows
File "c:\programdata\anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 2661: character maps to <undefined>
I am very new to this StackOverflow platform and please let me know if more information is to be presented.

The error gives an evidence that the code tries to encode the filename_str2.csv file in cp1252. From your stack trace:
...File "C:\local_path\spiders\Pduct_Scrape.py", line 430, in closed
search_df.to_csv('s3://my-bucket/folder_path/ filename_str2.csv ',......
File "c:\programdata\anaconda3\lib\encodings\ cp1252.py ", line 19, in encode
The reason I do not know, because you explicitely ask for an utf-8 encoding. But as the codecs page in the Python Standard Library reference says that the canonical name for utf8 is utf_8 (notice the underline instead of minus sign) and does not list utf-8 in allowed aliases, I would first try to use utf_8. If it still uses cp1252, then you will have to give the exact versions of Python and pandas that you are using.

tf.gfile.Glob gives me UnicodeDecodeError error anyway to fix this?

I was trying to get the list of name of txt file that was written in Korean in the specified directory with the code below
dir_list = tf.gfile.Glob(engine.TXT_DIR+"/*.txt")
However, This one gives me the following error:
Traceback (most recent call last):
File "D:/Prj_mayDay/Prj_FrankenShtine/shakespear_reborn/main.py", line 108, in <module>
dir_list = tf.gfile.Glob(engine.TXT_DIR+"/*.txt")
File "D:\KimKanna's Class\python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 326, in get_matching_files
compat.as_bytes(filename), status)
File "D:\KimKanna's Class\python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 325, in <listcomp>
for matching_filename in pywrap_tensorflow.GetMatchingFiles(
File "D:\KimKanna's Class\python35\lib\site-packages\tensorflow\python\util\compat.py", line 106, in as_str_any
return as_str(value)
File "D:\KimKanna's Class\python35\lib\site-packages\tensorflow\python\util\compat.py", line 84, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 19: invalid start byte
Now, throughout some research, I found out the reason
The error is because there is some non-ascii character in the dictionary and it can't be encoded/decoded
However, I do not see any way to apply the solution into my code. or is there?
**if there is alternative code for this. It should be applicable for both cloud stroage bucket / my personal hard drive as the code above did.
I'm using python3, Tensorflow version of 1.2.0-rc2

so after few hours of fiddling around with my code I finally found the solution.
Afterall one of the file inside of the directory I specified had a name in Korean. After I took that out of the directory. problem was gone.

Processing a for loop with an error

I have a for loop which has an error in it.
try:
for line in text:
'do stuff'
except:
pass
When the error occurs python just exits the for loop. I can't get python to ignore the error and keep iterating through the loop. Incidentally, it is a text file and I am looping through the lines. I should also point out that the error does not occur in the do stuff part, it literally occurs in the for loop. I also frankly don't understand why the error is being thrown since the line is just like any other line. I tried deleting the line to see if it was just a one time thing but the next line has an error in it too which leads me to believe that I cannot just delete bad lines. The name of the error is unicodedecodeerror
Here's the text:
https://drive.google.com/file/d/0B9zzW6-3m2qGVFRTbzlXMS0tVUU/view?usp=sharing
I'm trying to make a list of all the words that follow the word 'abstract'
Here's the actual code
with open(full_path_of_old_file) as old:
for i, line in enumerate(old):
if "abstract," not in line:
b = line.find('abstract')
line2 = line[b+9:]
list1 = line2.split()
list1[0] = list1[0].replace(",","")
list1[0] = list1[0].replace(".", "")
try:
if list1[0][-1] == "s":
list1[0] = list1[0][:-1]
except:
pass
objects_of_abstract.append(list1[0])
Here's the full traceback
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1596, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1023, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/kylefoley/PycharmProjects/inference_engine2/inference2/Proofs/z_natural_language.py", line 25, in <module>
for i, line in enumerate(old):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 764: invalid continuation byte
We've got an error while stopping in post-mortem: <class 'KeyboardInterrupt'>

Your problem is your file encoding. Change it to latin-1 like this:
objects_of_abstract = list()
with open('abstract.txt', encoding="latin-1") as old:
for i, line in enumerate(old):
if "abstract," not in line:
b = line.find('abstract')
line2 = line[b+9:]
list1 = line2.split()
list1[0] = list1[0].replace(",","")
list1[0] = list1[0].replace(".", "")
try:
if list1[0][-1] == "s":
list1[0] = list1[0][:-1]
except:
pass
objects_of_abstract.append(list1[0])

Somewhere in your code you open the file. There are two parameters in the open function that pertain to your problem:
encoding=None: encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used. See the codecs module for the list of supported encodings.
errors=None: errors is an optional string that specifies how encoding and decoding errors are to be handled–this cannot be used in binary mode. A variety of standard error handlers are available, though any error handling name that has been registered with codecs.register_error() is also valid.
[there is more; see the standard library docs]
Since you don't show how you open the file, I can't tell you specifically what is wrong. But the error message indicates that it is an encoding error, and you should start by looking into that.

A possible answer, which should bypass your problem to the other answers, is to use the suppress function from the contextlib
module in the standard library:
from contextlib import suppress
with suppress(UnicodeDecodeError): #might have to some some work here to get the error right
for i, line in enumerate(old):
.....
as it says in the docs though, better to try and fix your problem, rather than silently ignore it, if possible.

This should help with the first question (keeping the loop going):
In a python try...except block, it will try to run all the code in the try block, and if an error is thrown, it will stop and move to the except
In your case, you could move the for outside the try, so that if an error occurs it will be handled in the except and then continue on to the next iteration:
for line in text:
try:
'do stuff'
except: # if 'do stuff' throws an error, we just go to the next iteration
pass

You have to keep try/except block inside for loop
for line in text:
try:
# your Operation
except:
pass

UnicodeDecodeError when reading dictionary words file with simple Python script

First time doing Python in a while, and I'm having trouble doing a simple scan of a file when I run the following script with Python 3.0.1,
with open("/usr/share/dict/words", 'r') as f:
for line in f:
pass
I get this exception:
Traceback (most recent call last):
File "/home/matt/install/test.py", line 2, in <module>
for line in f:
File "/home/matt/install/root/lib/python3.0/io.py", line 1744, in __next__
line = self.readline()
File "/home/matt/install/root/lib/python3.0/io.py", line 1817, in readline
while self._read_chunk():
File "/home/matt/install/root/lib/python3.0/io.py", line 1565, in _read_chunk
self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
File "/home/matt/install/root/lib/python3.0/io.py", line 1299, in decode
output = self.decoder.decode(input, final=final)
File "/home/matt/install/root/lib/python3.0/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1689-1692: invalid data
The line in the file it blows up on is "Argentinian", which doesn't seem to be unusual in any way.
Update: I added,
encoding="iso-8559-1"
to the open() call, and it fixed the problem.

How have you determined from "position 1689-1692" what line in the file it has blown up on? Those numbers would be offsets in the chunk that it's trying to decode. You would have had to determine what chunk it was -- how?
Try this at the interactive prompt:
buf = open('the_file', 'rb').read()
len(buf)
ubuf = buf.decode('utf8')
# splat ... but it will give you the byte offset into the file
buf[offset-50:60] # should show you where/what the problem is
# By the way, from the error message, looks like a bad
# FOUR-byte UTF-8 character ... interesting

Can you check to make sure it is valid UTF-8? A way to do that is given at this SO question:
iconv -f UTF-8 /usr/share/dict/words -o /dev/null
There are other ways to do the same thing.

EOFError in Python script

I have the following code fragment:
def database(self):
databasename=""
host=""
user=""
password=""
try:
self.fp=file("detailing.dat","rb")
except IOError:
self.fp=file("detailing.dat","wb")
pickle.dump([databasename,host,user,password],self.fp,-1)
self.fp.close()
selffp=file("detailing.dat","rb")
[databasename,host,user,password]=pickle.load(self.fp)
return
It has the error:
Traceback (most recent call last):
File "detailing.py", line 91, in ?
app=myApp()
File "detailing.py", line 20, in __init__
wx.App.__init__(self,redirect,filename,useBestVisual,clearSigInt)
File "/usr/lib64/python2.4/site-packages/wx-2.6-gtk2-unicode/wx/_core.py", line 7473, in __init__
self._BootstrapApp()
File "/usr/lib64/python2.4/site-packages/wx-2.6-gtk2-unicode/wx/_core.py", line 7125, in _BootstrapApp
return _core_.PyApp__BootstrapApp(*args, **kwargs)
File "detailing.py", line 33, in OnInit
self.database()
File "detailing.py", line 87, in database
[databasename,host,user,password]=pickle.load(self.fp)
File "/usr/lib64/python2.4/pickle.py", line 1390, in load
return Unpickler(file).load()
File "/usr/lib64/python2.4/pickle.py", line 872, in load
dispatch[key](self)
File "/usr/lib64/python2.4/pickle.py", line 894, in load_eof
raise EOFError
EOFError
What am I doing wrong?

Unless you've got a typo, the issue may be in this line where you assign the file handle to selffp not self.fp:
selffp=file("detailing.dat","rb")
If that is a typo, and your code actually opens the file to self.fp, then you may wish to verify that the file actually has contents (ie: that the previous pickle worked)... the error suggests that the file is empty.
Edit: In the comments to this answer, S. Lott has a nice summary of why the typo generated the error you saw that I'm pasting here for completeness of the answer: "selffp will be the unused opened file, and self.fp (the old closed file) will be used for the load".

Here's the version that I would recommend using:
def database(self):
databasename=""
host=""
user=""
password=""
try:
self.fp=open("detailing.dat","rb")
except IOError:
with open("detailing.dat", "wb") as fp:
pickle.dump([databasename,host,user,password],fp,-1)
self.fp=open("detailing.dat","rb")
[databasename,host,user,password]=pickle.load(self.fp)
return
As has been pointed out, there was a typo on self.fp. But here are a few other things that I notice that can cause problems.
First of all, you shouldn't be using the file constructor directly. You should instead use the built-in open function.
Secondly, you should avoid calling a file's close method outside a finally block. In this case, I've used python 2.6's with block. You can use this in Python 2.5 with the following command:
from __future__ import with_statement
This will prevent the file from being stuck open if an exception is thrown anywhere (as it will close the file when the with block is exited). Although this isn't the cause of your problem, it is an important thing to remember because if one of the file object's methods throws an exception, the file will get held open in sys.traceback indefinitely.
(note that you should probably accept Jarret Hardie's answer though, he caught the bug :-) )

I got this error when I didn't chose the correct mode to read the file (wb instead of rb). Changing back to rb was not sufficient to solve the issue. However, generating again a new clean pickle file solved the issue. It seems that not choosing the correct mode to open the binary file somehow "damages" the file which is then not openable whatsoever afterward.
But I am quite a beginner with Python so I may have miss something too.

While this is not a direct answer to the OP's question -- I happened upon this answer while searching for a reason for an EOFError when trying to unpickle a binary file with : pickle.load(open(filename, "r")).
import cPickle as pickle
A = dict((v, i) for i, v in enumerate(words))
with open("words.pkl", "wb") as f:
pickle.dump(A, f)
#...later open the file -- mistake:trying to read a binary with non-binary method
with open("words.pkl", "r") as f:
A =pickle.load(f) # EOFError
# change that to
with open ("words.pkl", "rb") as f: # notice the "rb" instead of "r"
A = pickle.load(f)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.