python tarfile unpredictable error tarfile.ReadError: empty header

python tarfile unpredictable error tarfile.ReadError: empty header - python

When opening a tar file with the python tarfile module like
tarfile.open(path, mode='a')
i get the error
Traceback (most recent call last):
File "/home/IPP-HGW/dboe/anaconda2/lib/python2.7/tarfile.py", line 1711, in open
return cls.taropen(name, mode, fileobj, **kwargs)
File "/home/IPP-HGW/dboe/anaconda2/lib/python2.7/tarfile.py", line 1721, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/home/IPP-HGW/dboe/anaconda2/lib/python2.7/tarfile.py", line 1601, in __init__
raise ReadError(str(e))
tarfile.ReadError: empty header
I have tried to reproduce this for one day now, but can not find a general rule, when this occurs and when not. Thus it is impossible to provide a minimal example. Can anybody explain to me when this error can occur and how the header can become empty?
Many thanks in advance,
Daniel

This exception is raised when the buffer length is zero while parsing headers for the tarfile.
It is raised for an empty archive.
Reference:
https://github.com/python/cpython/blob/master/Lib/tarfile.py#L1028
http://bugs.python.org/issue6123

Related

Why don't I get a full traceback from a saved exception - and how do I get and save the full trace?

I have a user submitted data validation interface for a scientific site in django, and I want the user to be able to submit files of scientific data that will aid them in resolving simple problems with their data before they're allowed to make a formal submission (to reduce workload on the curators who actually load the data into our database).
The validation interface re-uses the loading code, which is good for code re-use. It has a "validate mode" that doesn't change the database. Everything is in an atomic transaction block and it gets rolled back in any case when it runs in validate mode.
I'm in the middle of a refactor to alleviate a problem. The problem is that the user has to submit the files multiple times, each time, getting the next error. So I've been refining the code to be able to "buffer" the exceptions in an array and only really stop if any error makes further processing impossible. So far, it's working great.
Since unexpected errors are expected in this interface (because the data is complex and lab users are continually finding new ways to screw up the data), I am catching and buffering any exception and intend to write custom exception classes for each case as I encounter them.
The problem is that when I'm adding new features and encounter a new error, the tracebacks in the buffered exceptions aren't being fully preserved, which makes it annoying to debug - even when I change the code to raise and immediately catch the exception so I can add it to the buffer with the traceback. For example, in my debugging, I may get an exception from a large block of code, and I can't tell what line it is coming from.
I have worked around this problem by saving the traceback as a string inside the buffered exception object, which just feels wrong. I had to play around in the shell to get it to work. Here is my simple test case to demonstrate what's happening. It's reproducible for me, but apparently not for others who try this toy example - and I don't know why:
import traceback
class teste(Exception):
"""This is an exception class I'm going to raise to represent some unanticipated exception - for which I will want a traceback."""
pass
def buf(exc, args):
"""This represents my method I call to buffer an exception, but for this example, I just return the exception and keep it in main in a variable. The actual method in my code appends to a data member array in the loader object."""
try:
raise exc(*args)
except Exception as e:
# This is a sanity check that prints the trace that I will want to get from the buffered exception object later
print("STACK:")
traceback.print_stack()
# This is my workaround where I save the trace as a string in the exception object
e.past_tb = "".join(traceback.format_stack())
return e
The above example raises the exception inside buf. (My original code supports both raising the exception for the first time and buffering an already raised and caught exception. In both cases, I wasn't getting a saved full traceback, so I'm only providing the one example case (where I raise it inside the buf method).
And here's what I see when I use the above code in the shell. This first call shows my sanity check - the whole stack, which is what I want to be able to access later:
In [5]: es = buf(teste, ["This is a test"])
STACK:
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/manage.py", line 22, in <module>
main()
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/manage.py", line 18, in main
execute_from_command_line(sys.argv)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
utility.execute()
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
self.execute(*args, **cmd_options)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
output = self.handle(*args, **options)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/django/core/management/commands/shell.py", line 100, in handle
return getattr(self, shell)(options)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/django/core/management/commands/shell.py", line 36, in ipython
start_ipython(argv=[])
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/__init__.py", line 126, in start_ipython
return launch_new_instance(argv=argv, **kwargs)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/traitlets/config/application.py", line 846, in launch_instance
app.start()
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/terminal/ipapp.py", line 356, in start
self.shell.mainloop()
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/terminal/interactiveshell.py", line 566, in mainloop
self.interact()
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/terminal/interactiveshell.py", line 557, in interact
self.run_cell(code, store_history=True)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 2914, in run_cell
result = self._run_cell(
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 2960, in _run_cell
return runner(coro)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner
coro.send(None)
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3185, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3377, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/Users/rleach/PROJECT-local/TRACEBASE/tracebase/.venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-92f4a0db918d>", line 1, in <module>
es = buf(teste, ["This is a test"])
File "<ipython-input-2-86e515dc1ec1>", line 6, in buf
traceback.print_stack()
But this is what I see when I want to see the original traceback from the es object (i.e. the buffered exception) later. It only has the last item from the traceback. This is exactly what I see in the original source code - a single item for the line of code inside the buffer method:
In [8]: traceback.print_exception(type(es), es, es.__traceback__)
Traceback (most recent call last):
File "<ipython-input-2-86e515dc1ec1>", line 3, in buf
raise exc(*args)
teste: This is a test
My workaround suffices for now, but I'd like to have a proper traceback object.
I debugged the issue by re-cloning our repo in a second directory to make sure I hadn't messed up my sandbox. I guess I should try this on another computer too - my office mac. But can anyone point me in the right direction to debug this issue? What could be the cause for losing the full traceback?

Python has a really weird way of building exception tracebacks. You might expect it to build the traceback when the exception is created, or when it's raised, but that's not how it works.
Python builds a traceback as an exception propagates. Every time the exception propagates up to a new stack frame, a traceback entry for that stack frame is added to the exception's traceback.
This means that an exception's traceback only goes as far as the exception itself propagates. If you catch it (and don't reraise it), the traceback only goes up to the point where it got caught.
Unfortunately, your workaround is about as good as it gets. You're not really losing the full traceback, because a full traceback was never created. If you want full stack info, you need to record it yourself, with something like the traceback.format_stack() function you're currently using.

shutil.move() throws TypeError: coercing to Unicode: need string or buffer, instance found

I am trying to use shutil.move but getting error as below:
Traceback (most recent call last):
File "packageTest.py", line 202, in <module>
writeResult()
File "packageTest.py", line 92, in writeResult
shutil.move(tempfile,'report.csv')
File "/usr/local/lib/python2.7/shutil.py", line 294, in move
os.rename(src, real_dst)
TypeError: coercing to Unicode: need string or buffer, instance found
I am new to python and not sure what is wrong in this. Can anybody help me in this? I checked even whether i am trying to open the file twice but it is not so.
def writeResult():
tempfile=NamedTemporaryFile(delete=False)
result='FAIL'
with open('report.csv','rb') as infile,tempfile:
csvreader=csv.DictReader(infile)
fieldnames=csvreader.fieldnames
csvwriter=csv.DictWriter(tempfile,fieldnames)
csvwriter.writeheader()
for node,row in enumerate(csvreader,1):
if(row['EXIST_IN_BOM']=='Yes'):
if(row['EXIST_IN_TAR']=='No'):
csvwriter.writerow(dict(row,RESULT='FAIL'))
csvwriter.writerow(dict(row,COMMENT='File is missing in package'))
else:
result='PASS'
else:
if(row['EXIST_IN_TAR']=='Yes'):
csvwriter.writerow(dict(row,RESULT='FAIL'))
csvwriter.writerow(dict(row,COMMENT='File is missing in BOM'))
if(row['SIZE_IN_TAR']==row['SIZE_IN_ALPHA']):
result='PASS'
else:
csvwriter.writerow(dict(row,RESULT='FAIL'))
csvwriter.writerow(dict(row,COMMENT='File size not same'))
if(result=='PASS'):
csvwriter.writerow(dict(row,RESULT='PASS'))
shutil.move(tempfile,'report.csv')
return
Edit: The below code works though:
tempfile=NamedTemporaryFile(delete=False)
with open('dict.csv','rb') as infile,tempfile:
csvreader=csv.DictReader(infile)
fieldnames=csvreader.fieldnames
csvwriter=csv.DictWriter(tempfile,fieldnames)
csvwriter.writeheader()
for node,row in enumerate(csvreader,1):
if(row['EmpId']=='119093'):
csvwriter.writerow(dict(row,Rank='1'))
if(row['EmpId']=='119094'):
csvwriter.writerow(dict(row,Rank='2'))
shutil.move(tempfile.name,'dict.csv')

shutil.move() takes two filenames, not a file-like and a filename. If you want to copy the contents of a file-like to a filename then open the filename in write mode and use shutil.copyfileobj() instead.

UnicodeDecodeError with the sys.stdout inside traceback.print_exc()

I am getting UnicodeDecodeError with the traceback.print_exc(file=sys.stdout). I am using Python3.4 and did not get the problem with Python2.7.
Am I missing something here? How can I make sure that sys.stdout passes the correct encoded/decoded to the traceback.print_exc() ?
My code looks something similar to this:
try:
# do something which might throw an exception
except Exception as e:
# do something
traceback.print_exc(file=sys.stdout) # Here I am getting the error
Error log:
traceback.print_exc(file=sys.stdout)
File "C:\Python34\lib\traceback.py", line 252, in print_exc
print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
File "C:\Python34\lib\traceback.py", line 169, in print_exception
for line in _format_exception_iter(etype, value, tb, limit, chain):
File "C:\Python34\lib\traceback.py", line 153, in _format_exception_iter
yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
File "C:\Python34\lib\traceback.py", line 18, in _format_list_iter
for filename, lineno, name, line in extracted_list:
File "C:\Python34\lib\traceback.py", line 65, in _extract_tb_or_stack_iter
line = linecache.getline(filename, lineno, f.f_globals)
File "C:\Python34\lib\linecache.py", line 15, in getline
lines = getlines(filename, module_globals)
File "C:\Python34\lib\linecache.py", line 41, in getlines
return updatecache(filename, module_globals)
File "C:\Python34\lib\linecache.py", line 127, in updatecache
lines = fp.readlines()
File "C:\Python34\lib\codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5213: invalid continuation byte

The traceback module wants to include source code lines with the traceback. Normally, a traceback consists only of pointers to source code, not the source code itself, as Python has been executing the compiled bytecode. In the bytecode are hints as to exactly what source code line the bytecode came from.
To then show the sourcecode, the actual source is read from disk, using the linecache module. This also means that Python has to determine the encoding for those files too. The default encoding for a Python 3 source file is UTF-8, but you can set a PEP 263 comment to let Python know if you are deviating from that.
Because source code is read after the code is already loaded and a traceback took place, it is possible that you changed the source code after starting the script, or there was a byte cache file (in a __pycache__ subdirectory) that appeared to be fresh but was no longer matching your source files.
Either way, when you started the script, Python was able to re-use a bytecache file or read the source code just fine and run your code. But when the traceback was being printed, at least one of the source code files was no longer decodable as UTF-8.
If you can reliably reproduce the traceback (so start the Python script again without encoding problems but printing the traceback fails), it is most likely a stale bytecode file somewhere, one that could even hold pointers to a filename that now contains nothing but binary data, not plain source.
If you know how to use the pdb module, add a pdb.set_trace() call before the traceback.print_exc() call and trace what filenames are being loaded from by the linecache module.
Otherwise edit your C:\Python34\lib\traceback.py file and insert a print('Loading {} from the linecache'.format(filename)) statement just before the linecache.checkcache(filename) line in the _extract_tb_or_stack_iter function.

Strange exception with python's cgitb and inspect.py

I have a function that decodes an exception and pushes the info to a file. Following is what I do basically:
exc_info = sys.exc_info
txt = cgitb.text(exc_info)
Using this, I got the following exception trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\job_queue\utils\start_workers.py", line 40, in start_worker
worker_loop(r_jq, worktype, worker_id)
File "C:\Python27\lib\site-packages\job_queue\server\jq_worker.py", line 55, in worker_loop
_job_machine(*job)
File "C:\Python27\lib\site-packages\job_queue\server\jq_worker.py", line 34, in _job_machine
do_verbose_exception()
File "C:\Python27\lib\site-packages\job_queue\server\errors.py", line 23, in do_verbose_exception
txt = cgitb.text(exc_info)
File "C:\Python27\lib\cgitb.py", line 214, in text
formatvalue=lambda value: '=' + pydoc.text.repr(value))
File "C:\Python27\lib\inspect.py", line 885, in formatargvalues
specs.append(strseq(args[i], convert, join))
File "C:\Python27\lib\inspect.py", line 840, in strseq
return convert(object)
File "C:\Python27\lib\inspect.py", line 882, in convert
return formatarg(name) + formatvalue(locals[name])
KeyError: 'connection'
I ran the code multiple times after this exception, but couldn't reproduce it. However, I didn't find any reference in files cgitb.py or inspect.py to a dict with 'connection' key either.
Will anybody know if this is an issue with python's cgitb or inspect files? Any helpful inputs?

You passed a wrong type to text function
below is the correct way.
cgitb.text((sys.last_type, sys.last_value, sys.last_traceback))

Im not sure specifically why this exception is happening, but have you read the docs for cgitb module? It seems that since python 2.2 it has supported writing exceptions to a file:
http://docs.python.org/library/cgitb.html
Probably something like:
cgitb.enable(0, "/my/log/directory") # or 1 if you want to see it in browser
As far as your actual traceback, are you sure 'connection' isnt a name you are using in your own code? 'inspect' module is most likely trying to examine your own code to build the cgi traceback info and getting a bad key somewhere?

EOFError in Python script

I have the following code fragment:
def database(self):
databasename=""
host=""
user=""
password=""
try:
self.fp=file("detailing.dat","rb")
except IOError:
self.fp=file("detailing.dat","wb")
pickle.dump([databasename,host,user,password],self.fp,-1)
self.fp.close()
selffp=file("detailing.dat","rb")
[databasename,host,user,password]=pickle.load(self.fp)
return
It has the error:
Traceback (most recent call last):
File "detailing.py", line 91, in ?
app=myApp()
File "detailing.py", line 20, in __init__
wx.App.__init__(self,redirect,filename,useBestVisual,clearSigInt)
File "/usr/lib64/python2.4/site-packages/wx-2.6-gtk2-unicode/wx/_core.py", line 7473, in __init__
self._BootstrapApp()
File "/usr/lib64/python2.4/site-packages/wx-2.6-gtk2-unicode/wx/_core.py", line 7125, in _BootstrapApp
return _core_.PyApp__BootstrapApp(*args, **kwargs)
File "detailing.py", line 33, in OnInit
self.database()
File "detailing.py", line 87, in database
[databasename,host,user,password]=pickle.load(self.fp)
File "/usr/lib64/python2.4/pickle.py", line 1390, in load
return Unpickler(file).load()
File "/usr/lib64/python2.4/pickle.py", line 872, in load
dispatch[key](self)
File "/usr/lib64/python2.4/pickle.py", line 894, in load_eof
raise EOFError
EOFError
What am I doing wrong?

Unless you've got a typo, the issue may be in this line where you assign the file handle to selffp not self.fp:
selffp=file("detailing.dat","rb")
If that is a typo, and your code actually opens the file to self.fp, then you may wish to verify that the file actually has contents (ie: that the previous pickle worked)... the error suggests that the file is empty.
Edit: In the comments to this answer, S. Lott has a nice summary of why the typo generated the error you saw that I'm pasting here for completeness of the answer: "selffp will be the unused opened file, and self.fp (the old closed file) will be used for the load".

Here's the version that I would recommend using:
def database(self):
databasename=""
host=""
user=""
password=""
try:
self.fp=open("detailing.dat","rb")
except IOError:
with open("detailing.dat", "wb") as fp:
pickle.dump([databasename,host,user,password],fp,-1)
self.fp=open("detailing.dat","rb")
[databasename,host,user,password]=pickle.load(self.fp)
return
As has been pointed out, there was a typo on self.fp. But here are a few other things that I notice that can cause problems.
First of all, you shouldn't be using the file constructor directly. You should instead use the built-in open function.
Secondly, you should avoid calling a file's close method outside a finally block. In this case, I've used python 2.6's with block. You can use this in Python 2.5 with the following command:
from __future__ import with_statement
This will prevent the file from being stuck open if an exception is thrown anywhere (as it will close the file when the with block is exited). Although this isn't the cause of your problem, it is an important thing to remember because if one of the file object's methods throws an exception, the file will get held open in sys.traceback indefinitely.
(note that you should probably accept Jarret Hardie's answer though, he caught the bug :-) )

I got this error when I didn't chose the correct mode to read the file (wb instead of rb). Changing back to rb was not sufficient to solve the issue. However, generating again a new clean pickle file solved the issue. It seems that not choosing the correct mode to open the binary file somehow "damages" the file which is then not openable whatsoever afterward.
But I am quite a beginner with Python so I may have miss something too.

While this is not a direct answer to the OP's question -- I happened upon this answer while searching for a reason for an EOFError when trying to unpickle a binary file with : pickle.load(open(filename, "r")).
import cPickle as pickle
A = dict((v, i) for i, v in enumerate(words))
with open("words.pkl", "wb") as f:
pickle.dump(A, f)
#...later open the file -- mistake:trying to read a binary with non-binary method
with open("words.pkl", "r") as f:
A =pickle.load(f) # EOFError
# change that to
with open ("words.pkl", "rb") as f: # notice the "rb" instead of "r"
A = pickle.load(f)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.