Processing a for loop with an error - python

I have a for loop which has an error in it.
try:
for line in text:
'do stuff'
except:
pass
When the error occurs python just exits the for loop. I can't get python to ignore the error and keep iterating through the loop. Incidentally, it is a text file and I am looping through the lines. I should also point out that the error does not occur in the do stuff part, it literally occurs in the for loop. I also frankly don't understand why the error is being thrown since the line is just like any other line. I tried deleting the line to see if it was just a one time thing but the next line has an error in it too which leads me to believe that I cannot just delete bad lines. The name of the error is unicodedecodeerror
Here's the text:
https://drive.google.com/file/d/0B9zzW6-3m2qGVFRTbzlXMS0tVUU/view?usp=sharing
I'm trying to make a list of all the words that follow the word 'abstract'
Here's the actual code
with open(full_path_of_old_file) as old:
for i, line in enumerate(old):
if "abstract," not in line:
b = line.find('abstract')
line2 = line[b+9:]
list1 = line2.split()
list1[0] = list1[0].replace(",","")
list1[0] = list1[0].replace(".", "")
try:
if list1[0][-1] == "s":
list1[0] = list1[0][:-1]
except:
pass
objects_of_abstract.append(list1[0])
Here's the full traceback
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1596, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1023, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/kylefoley/PycharmProjects/inference_engine2/inference2/Proofs/z_natural_language.py", line 25, in <module>
for i, line in enumerate(old):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 764: invalid continuation byte
We've got an error while stopping in post-mortem: <class 'KeyboardInterrupt'>

Your problem is your file encoding. Change it to latin-1 like this:
objects_of_abstract = list()
with open('abstract.txt', encoding="latin-1") as old:
for i, line in enumerate(old):
if "abstract," not in line:
b = line.find('abstract')
line2 = line[b+9:]
list1 = line2.split()
list1[0] = list1[0].replace(",","")
list1[0] = list1[0].replace(".", "")
try:
if list1[0][-1] == "s":
list1[0] = list1[0][:-1]
except:
pass
objects_of_abstract.append(list1[0])

Somewhere in your code you open the file. There are two parameters in the open function that pertain to your problem:
encoding=None: encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used. See the codecs module for the list of supported encodings.
errors=None: errors is an optional string that specifies how encoding and decoding errors are to be handled–this cannot be used in binary mode. A variety of standard error handlers are available, though any error handling name that has been registered with codecs.register_error() is also valid.
[there is more; see the standard library docs]
Since you don't show how you open the file, I can't tell you specifically what is wrong. But the error message indicates that it is an encoding error, and you should start by looking into that.

A possible answer, which should bypass your problem to the other answers, is to use the suppress function from the contextlib
module in the standard library:
from contextlib import suppress
with suppress(UnicodeDecodeError): #might have to some some work here to get the error right
for i, line in enumerate(old):
.....
as it says in the docs though, better to try and fix your problem, rather than silently ignore it, if possible.

This should help with the first question (keeping the loop going):
In a python try...except block, it will try to run all the code in the try block, and if an error is thrown, it will stop and move to the except
In your case, you could move the for outside the try, so that if an error occurs it will be handled in the except and then continue on to the next iteration:
for line in text:
try:
'do stuff'
except: # if 'do stuff' throws an error, we just go to the next iteration
pass

You have to keep try/except block inside for loop
for line in text:
try:
# your Operation
except:
pass

Related

Python ldif3 parser and exception in for loop

From site: https://pypi.python.org/pypi/ldif3/3.2.0
I have this code:
from ldif3 import LDIFParser
from pprint import pprint
parser = LDIFParser(open('data.ldif', 'rb'))
for dn, entry in parser.parse():
print('got entry record: %s' % dn)
pprint(record)
And now, reading my file data.ldif I have exception in parser.parse().
Question is how to catch this exception and allow for loop to go to next record (continue)?
Trackback:
Traceback (most recent call last):
File "ldif.py", line 16, in <module>
for dn, entry in parser.parse():
File "/home/dlubom/anaconda2/lib/python2.7/site-packages/ldif3.py", line 373, in parse
yield self._parse_entry_record(block)
File "/home/dlubom/anaconda2/lib/python2.7/site-packages/ldif3.py", line 346, in _parse_entry_record
attr_type, attr_value = self._parse_attr(line)
File "/home/dlubom/anaconda2/lib/python2.7/site-packages/ldif3.py", line 309, in _parse_attr
return attr_type, attr_value.decode('utf8')
File "/home/dlubom/anaconda2/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 6: invalid start byte
i think it's not possible to handle exceptions in such case because they happend before / during the variable assignment.
BTW probably want to use the attribute:
strict (boolean) – If set to False, recoverable parse errors will
produce log warnings rather than exceptions.
Example:
parser = LDIFParser(ldif_file, strict=False)
https://ldif3.readthedocs.io/en/latest/
That helped me parsing an invalid ldif file containing commas inside CN attributes.

UnicodeDecodeError with the sys.stdout inside traceback.print_exc()

I am getting UnicodeDecodeError with the traceback.print_exc(file=sys.stdout). I am using Python3.4 and did not get the problem with Python2.7.
Am I missing something here? How can I make sure that sys.stdout passes the correct encoded/decoded to the traceback.print_exc() ?
My code looks something similar to this:
try:
# do something which might throw an exception
except Exception as e:
# do something
traceback.print_exc(file=sys.stdout) # Here I am getting the error
Error log:
traceback.print_exc(file=sys.stdout)
File "C:\Python34\lib\traceback.py", line 252, in print_exc
print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
File "C:\Python34\lib\traceback.py", line 169, in print_exception
for line in _format_exception_iter(etype, value, tb, limit, chain):
File "C:\Python34\lib\traceback.py", line 153, in _format_exception_iter
yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
File "C:\Python34\lib\traceback.py", line 18, in _format_list_iter
for filename, lineno, name, line in extracted_list:
File "C:\Python34\lib\traceback.py", line 65, in _extract_tb_or_stack_iter
line = linecache.getline(filename, lineno, f.f_globals)
File "C:\Python34\lib\linecache.py", line 15, in getline
lines = getlines(filename, module_globals)
File "C:\Python34\lib\linecache.py", line 41, in getlines
return updatecache(filename, module_globals)
File "C:\Python34\lib\linecache.py", line 127, in updatecache
lines = fp.readlines()
File "C:\Python34\lib\codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5213: invalid continuation byte
The traceback module wants to include source code lines with the traceback. Normally, a traceback consists only of pointers to source code, not the source code itself, as Python has been executing the compiled bytecode. In the bytecode are hints as to exactly what source code line the bytecode came from.
To then show the sourcecode, the actual source is read from disk, using the linecache module. This also means that Python has to determine the encoding for those files too. The default encoding for a Python 3 source file is UTF-8, but you can set a PEP 263 comment to let Python know if you are deviating from that.
Because source code is read after the code is already loaded and a traceback took place, it is possible that you changed the source code after starting the script, or there was a byte cache file (in a __pycache__ subdirectory) that appeared to be fresh but was no longer matching your source files.
Either way, when you started the script, Python was able to re-use a bytecache file or read the source code just fine and run your code. But when the traceback was being printed, at least one of the source code files was no longer decodable as UTF-8.
If you can reliably reproduce the traceback (so start the Python script again without encoding problems but printing the traceback fails), it is most likely a stale bytecode file somewhere, one that could even hold pointers to a filename that now contains nothing but binary data, not plain source.
If you know how to use the pdb module, add a pdb.set_trace() call before the traceback.print_exc() call and trace what filenames are being loaded from by the linecache module.
Otherwise edit your C:\Python34\lib\traceback.py file and insert a print('Loading {} from the linecache'.format(filename)) statement just before the linecache.checkcache(filename) line in the _extract_tb_or_stack_iter function.

Unicode problems in Python again

I have a strange problem. A friend of mine sent me a text file. If I copy/paste the text and then paste it into my text editor and save it, the following code works. If I choose the option to save the file directly from the browser, the following code breaks. What's going on? Is it the browser's fault for saving invalid characters?
This is an example line.
When I save it, the line says
What�s going on?
When I copy/paste it, the line says
What’s going on?
This is the code:
import codecs
def do_stuff(filename):
with codecs.open(filename, encoding='utf-8') as f:
def process_line(line):
return line.strip()
lines = f.readlines()
for line in lines:
line = process_line(line)
print line
do_stuff('stuff.txt')
This is the traceback I get:
Traceback (most recent call last):
File "test-encoding.py", line 13, in <module>
do_stuff('stuff.txt')
File "test-encoding.py", line 8, in do_stuff
lines = f.readlines()
File "/home/somebody/.python/lib64/python2.7/codecs.py", line 679, in readlines
return self.reader.readlines(sizehint)
File "/home/somebody/.python/lib64/python2.7/codecs.py", line 588, in readlines
data = self.read()
File "/home/somebody/.python/lib64/python2.7/codecs.py", line 477, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 4: invalid start byte
What can I do in such cases?
How can I distribute the script if I don't know what encoding the user who runs it will use?
Fixed:
codecs.open(filename, encoding='utf-8', errors='ignore') as f:
The "file-oriented" part of the browser works with raw bytes, not characters. The specific encoding used by the page should be specified either in the HTTP headers or in the HTML itself. You must use this encoding instead of assuming that you have UTF-8 data.

While reading file on Python, I got a UnicodeDecodeError. What can I do to resolve this?

This is one of my own projects. This will later help benefit other people in a game I am playing (AssaultCube). Its purpose is to break down the log file and make it easier for users to read.
I kept getting this issue. Anyone know how to fix this? Currently, I am not planning to write/create the file. I just want this error to be fixed.
The line that triggered the error is a blank line (it stopped on line 66346).
This is what the relevant part of my script looks like:
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r')
for line in log:
and the exception is:
Traceback (most recent call last):
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 159, in <module>
main()
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 7, in main
for line in log:
File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3074: character maps to <undefined>
Try:
enc = 'utf-8'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
if it won't work try:
enc = 'utf-16'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
you could also try it with
enc = 'iso-8859-15'
also try:
enc = 'cp437'
wich is very old but it also has the "ü" at 0x81 wich would fit to the string "üßer" wich I found on the homepage of assault cube.
If all the codings are wrong try to contact some of the guys developing assault cube or as mentioned in a comment: have a look at https://pypi.python.org/pypi/chardet

EOFError in Python script

I have the following code fragment:
def database(self):
databasename=""
host=""
user=""
password=""
try:
self.fp=file("detailing.dat","rb")
except IOError:
self.fp=file("detailing.dat","wb")
pickle.dump([databasename,host,user,password],self.fp,-1)
self.fp.close()
selffp=file("detailing.dat","rb")
[databasename,host,user,password]=pickle.load(self.fp)
return
It has the error:
Traceback (most recent call last):
File "detailing.py", line 91, in ?
app=myApp()
File "detailing.py", line 20, in __init__
wx.App.__init__(self,redirect,filename,useBestVisual,clearSigInt)
File "/usr/lib64/python2.4/site-packages/wx-2.6-gtk2-unicode/wx/_core.py", line 7473, in __init__
self._BootstrapApp()
File "/usr/lib64/python2.4/site-packages/wx-2.6-gtk2-unicode/wx/_core.py", line 7125, in _BootstrapApp
return _core_.PyApp__BootstrapApp(*args, **kwargs)
File "detailing.py", line 33, in OnInit
self.database()
File "detailing.py", line 87, in database
[databasename,host,user,password]=pickle.load(self.fp)
File "/usr/lib64/python2.4/pickle.py", line 1390, in load
return Unpickler(file).load()
File "/usr/lib64/python2.4/pickle.py", line 872, in load
dispatch[key](self)
File "/usr/lib64/python2.4/pickle.py", line 894, in load_eof
raise EOFError
EOFError
What am I doing wrong?
Unless you've got a typo, the issue may be in this line where you assign the file handle to selffp not self.fp:
selffp=file("detailing.dat","rb")
If that is a typo, and your code actually opens the file to self.fp, then you may wish to verify that the file actually has contents (ie: that the previous pickle worked)... the error suggests that the file is empty.
Edit: In the comments to this answer, S. Lott has a nice summary of why the typo generated the error you saw that I'm pasting here for completeness of the answer: "selffp will be the unused opened file, and self.fp (the old closed file) will be used for the load".
Here's the version that I would recommend using:
def database(self):
databasename=""
host=""
user=""
password=""
try:
self.fp=open("detailing.dat","rb")
except IOError:
with open("detailing.dat", "wb") as fp:
pickle.dump([databasename,host,user,password],fp,-1)
self.fp=open("detailing.dat","rb")
[databasename,host,user,password]=pickle.load(self.fp)
return
As has been pointed out, there was a typo on self.fp. But here are a few other things that I notice that can cause problems.
First of all, you shouldn't be using the file constructor directly. You should instead use the built-in open function.
Secondly, you should avoid calling a file's close method outside a finally block. In this case, I've used python 2.6's with block. You can use this in Python 2.5 with the following command:
from __future__ import with_statement
This will prevent the file from being stuck open if an exception is thrown anywhere (as it will close the file when the with block is exited). Although this isn't the cause of your problem, it is an important thing to remember because if one of the file object's methods throws an exception, the file will get held open in sys.traceback indefinitely.
(note that you should probably accept Jarret Hardie's answer though, he caught the bug :-) )
I got this error when I didn't chose the correct mode to read the file (wb instead of rb). Changing back to rb was not sufficient to solve the issue. However, generating again a new clean pickle file solved the issue. It seems that not choosing the correct mode to open the binary file somehow "damages" the file which is then not openable whatsoever afterward.
But I am quite a beginner with Python so I may have miss something too.
While this is not a direct answer to the OP's question -- I happened upon this answer while searching for a reason for an EOFError when trying to unpickle a binary file with : pickle.load(open(filename, "r")).
import cPickle as pickle
A = dict((v, i) for i, v in enumerate(words))
with open("words.pkl", "wb") as f:
pickle.dump(A, f)
#...later open the file -- mistake:trying to read a binary with non-binary method
with open("words.pkl", "r") as f:
A =pickle.load(f) # EOFError
# change that to
with open ("words.pkl", "rb") as f: # notice the "rb" instead of "r"
A = pickle.load(f)

Categories