How to recover information from pkl file that was half-pickled? - python

I was pickling some variables and the process was interrupted. How do I recover data from the partially pickled file?
I pickled two lists of lists using:
import pickle
import sys
sys.setrecursionlimit(5000)  # to get around the maximum recursion depth error
with open('level4_half.pkl', 'wb') as f:
    pickle.dump([level4_url, level4_desc], f)
I checked in Windows Explorer that the file is not empty (158 MB).
I tried to unpickle the file using:
with open('level4_half.pkl', 'rb') as f:
    level4_url, level4_desc = pickle.load(f)
and encountered the error:
Traceback (most recent call last):
File "<ipython-input-18-32ed3a0e79d4>", line 2, in <module>
level4_url, level4_desc = pickle.load(f)
EOFError
(Previously I've tried to pickle and unpickle (fully pickled) files successfully using the commands above.)
I found a similar question here but I didn't use dill and am not sure if a partially pickled file is considered corrupted. My current technical skill is not so honed as to be able to quickly implement the solution there: "Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information." If this turns out to be the same solution to my question, it will be great to get guidance on how I can "hook all of the load_ methods".
Thank you.

Related

How to unpickle pickle extension file

I have downloaded a pickle file:
foo.pickle.gz.pickle
The page I downloaded this file from describes decompressing it to .pickle. I searched for Python pickle, and there are many pages that describe how to use it in Python, but not system-wide. How can I decompress or unzip it? I am using Ubuntu 16.04.
Thanks in advance!
Pickle is the name of Python's object serialisation module, so you have to 'unpickle' the file with a Python script. The basic syntax is:
import pickle
with open('filename', 'rb') as pickled_one:
    data = pickle.load(pickled_one)
More details are available here, on official Python documentation.
I do have to warn you about this, from that same page:
The pickle module is not secure against erroneous or maliciously
constructed data. Never unpickle data received from an untrusted or
unauthenticated source.
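The name foo.pickle.gz.pickle suggests the file may be gzip-compressed, although that is only a guess from the name. A minimal sketch, assuming the file is either a plain pickle or a gzip-compressed pickle, checks the two gzip magic bytes (b"\x1f\x8b") and then opens the file with gzip.open or the built-in open accordingly. load_maybe_gzipped is a hypothetical name, and the warning above still applies: only load files you trust.

```python
import gzip
import os
import pickle
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def load_maybe_gzipped(path):
    """Hypothetical helper: unpickle `path`, decompressing it first if it is gzip data."""
    with open(path, "rb") as f:
        magic = f.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    data = {"answer": 42, "items": [1, 2, 3]}
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "foo.pickle")
        packed = os.path.join(tmp, "foo.pickle.gz")
        with open(plain, "wb") as f:
            pickle.dump(data, f)
        with gzip.open(packed, "wb") as f:
            pickle.dump(data, f)
        print(load_maybe_gzipped(plain) == data)   # True
        print(load_maybe_gzipped(packed) == data)  # True
```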
A pickled object can only be deserialized in Python; you can't use non-Python environments to deserialize it. Please see the official page.
If the file contains multiple pickled objects, note that the answers above only unpickle one object. Use:
import pickle
pickle_list = []
with open(file_name, 'rb') as pickle_file:
    while True:
        try:
            pickle_list.append(pickle.load(pickle_file))
        except EOFError:
            break  # no more objects in the file
The try/except is inside the while loop, so loading continues until the end of the file is reached.
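The same loop can be wrapped in a generator that yields the objects written by repeated pickle.dump calls one at a time, so they don't all have to be held in a list. A sketch; iter_pickles is a hypothetical name. Like the loop above, it treats EOFError as the end of the file, so a truncated final object that happens to raise EOFError would be dropped silently.

```python
import os
import pickle
import tempfile


def iter_pickles(path):
    """Hypothetical helper: yield every object stored by repeated pickle.dump calls."""
    with open(path, "rb") as f:
        while True:
            try:
                obj = pickle.load(f)
            except EOFError:
                return  # reached the end of the file
            yield obj


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "many.pkl")
        with open(path, "wb") as f:
            for item in (["a", "b"], {"x": 1}, 3.5):
                pickle.dump(item, f)  # three separate pickles in one file
        print(list(iter_pickles(path)))  # [['a', 'b'], {'x': 1}, 3.5]
```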

recover a pickle corrupted file after getting OSError: [Errno 28] No space left on device [duplicate]

My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.
Is it possible to partially or fully recover the data? If so, how?
Here's what I've tried:
>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
obj = pik.load()
EOFError: Ran out of input
>>>
The file is not empty:
>>> os.stat(filename).st_size
31110059
Note: all data in the dictionary consisted of Python built-in types.
The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:
import io
import pickle
# Use the pure-Python version; we can't see the internal state of the C version
pickle.Unpickler = pickle._Unpickler
import dill
if __name__ == '__main__':
    obj = [1, 2, {3: 4, "5": ('6',)}]
    data = dill.dumps(obj)
    handle = io.BytesIO(data[:-5])  # cut it off
    unpickler = dill.Unpickler(handle)
    try:
        unpickler.load()
    except EOFError:
        pass
    print(unpickler.stack)
I get the following output:
[3, 4, '5', ('6',)]
The pickle data format isn't that complicated. Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information.
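One way to "hook all of the load_ methods", as suggested above, is to subclass the pure-Python unpickler and wrap every entry of its opcode dispatch table so each handler is logged before it runs. A sketch, assuming CPython 3, where pickle._Unpickler is a private class whose dispatch dictionary maps opcode bytes to its load_* methods; TracingUnpickler and salvage are hypothetical names. Protocol 2 is used so the example byte layout is simple (no framing). After a failure, metastack holds the outer containers whose MARK was read but whose APPENDS/SETITEMS never arrived, which matches what the second answer below observed.

```python
import io
import pickle

# pickle._Unpickler is CPython's pure-Python unpickler. It is private and
# undocumented, so this hook may stop working in a future Python version.


class TracingUnpickler(pickle._Unpickler):
    """Hypothetical helper: records the name of every load_* method it runs."""

    dispatch = {}  # own copy of the opcode table, filled in below

    def __init__(self, file, **kwargs):
        super().__init__(file, **kwargs)
        self.trace = []  # names of load_* methods, in execution order


def _make_tracer(handler):
    def tracer(self):
        # Log first, so the opcode that fails is recorded as well.
        self.trace.append(handler.__name__)
        handler(self)
    return tracer


for _opcode, _handler in pickle._Unpickler.dispatch.items():
    TracingUnpickler.dispatch[_opcode] = _make_tracer(_handler)


def salvage(data):
    """Unpickle `data` as far as possible: return (result, unpickler, error)."""
    unpickler = TracingUnpickler(io.BytesIO(data))
    try:
        return unpickler.load(), unpickler, None
    except Exception as exc:  # truncation can surface as EOFError, IndexError, ...
        return None, unpickler, exc


if __name__ == "__main__":
    obj = [1, 2, {3: 4, "5": ("6",)}]
    data = pickle.dumps(obj, protocol=2)
    result, unpickler, error = salvage(data[:-5])  # cut it off
    print(type(error).__name__)  # EOFError
    print(unpickler.trace[-3:])  # last handlers that ran before the data ended
    print(unpickler.metastack)   # [[[]], [1, 2, {}]]: unfinished outer containers
    print(unpickler.stack)       # [3, 4, '5', ('6',)]
```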
I can't comment on the above answer, but to extend Blender's answer:
With dill v0.3.5.1, unpickler.metastack worked for me (though, as far as I know, you could do this without dill). unpickler.stack did exist, but it was an empty list.
Also, with dill I got an UnpicklingError rather than an EOFError. This could be partly because of how my file got corrupted (the disk ran out of space).
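Separately from recovery, a common way to avoid half-written pickles in the first place (not something the answers above propose) is to dump into a temporary file in the same directory and only then rename it over the target with os.replace, which the Python docs describe as atomic on POSIX when both paths are on the same filesystem. If the process is killed or the disk fills up mid-dump, the previous file is left intact. A sketch; atomic_pickle_dump is a hypothetical name.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: write obj to path so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=protocol)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to disk
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        os.unlink(tmp_path)  # leave no stray temporary file behind on failure
        raise
```

One side effect of this design: tempfile.mkstemp creates the file with owner-only permissions (0o600), and the renamed file keeps them.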

Pickle - cPickle.UnpicklingError: invalid load key, '?'

I was trying to load data by using this repository (uses some Python 2 originally): https://github.com/hashbangCoder/Text-Summarization
However, I got an unpickling error (using Python 2.7; I also tried Python 2.6 with the same result):
>>> import cPickle as pickle
>>> pickle.load(open('train.bin', 'rb'))
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
cPickle.UnpicklingError: invalid load key, '?'.
I also tried Python 3, without success (same for _pickle):
import pickle
pickle.load(open(path, 'rb'))
Error:
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
<ipython-input-9-0129e43fa781> in <module>()
----> 1 data = pickle.load(open(path, 'rb'), encoding='utf8')
UnpicklingError: invalid load key, '\xd9'.
There are plenty of questions out there dealing with this error, but I haven't found anything that solves my problem.
I also tried on different systems and downloaded the file twice to be sure that it wasn't corrupted during the download. I'm also getting similar errors for the other files.
So I guess it may be some kind of version or encoding problem here?
Any idea what I can try to load the file?
Thanks in advance!
I recently had this issue when trying to unpickle a file. Try using joblib instead:
import joblib
fname = 'Path_to_filename.pkl'
model = joblib.load(fname)  # joblib.load accepts a file name or an open file object
Otherwise, the file is likely corrupted.
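Before concluding that a file is corrupted, it can help to look at its first bytes. A rough sketch for Python 3, covering only a few common cases: pickles written with protocol 2 or higher start with the PROTO opcode (0x80) followed by the protocol number, gzip files start with b"\x1f\x8b", and zip archives with b"PK\x03\x04". Anything else is inconclusive, because protocol 0 and 1 pickles have no such header. guess_file_kind is a hypothetical name.

```python
import os
import pickle
import tempfile


def guess_file_kind(path):
    """Hypothetical helper: a rough guess at what a supposed pickle file contains."""
    with open(path, "rb") as f:
        head = f.read(4)
    if not head:
        return "empty file"
    if head[0] == 0x80 and len(head) >= 2:
        # Protocol 2+ pickles start with the PROTO opcode and the protocol number.
        return "pickle, protocol %d" % head[1]
    if head[:2] == b"\x1f\x8b":
        return "gzip-compressed data"
    if head == b"PK\x03\x04":
        return "zip archive"
    return "unknown: maybe a protocol 0/1 pickle, maybe not a pickle at all"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.pkl")
        with open(path, "wb") as f:
            pickle.dump([1, 2, 3], f, protocol=2)
        print(guess_file_kind(path))  # pickle, protocol 2
```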
I had this issue after transferring files on a disk: they had not been saved properly. The issue disappeared after I verified that the files were written to disk correctly.
I also had the same problem because the file was not stored correctly on disk and was corrupted. After I re-downloaded it, the error was gone.
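One way to check that a copied or downloaded file is intact, instead of verifying it by hand as in the last two answers, is to compare checksums of the source and the copy (or the checksum published by the download page, if there is one). A minimal sketch using SHA-256 from the standard library's hashlib. It reads the file in chunks so large .pkl files need not fit in memory. sha256_of is a hypothetical name.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hypothetical helper: hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "train.pkl")
        copy = os.path.join(tmp, "train_copy.pkl")
        with open(original, "wb") as f:
            f.write(b"some pickled bytes")
        with open(copy, "wb") as f:
            f.write(b"some pickled byte")  # simulate a truncated copy
        print(sha256_of(original) == sha256_of(copy))  # False: the copy differs
```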
I think you should use a file with a .pkl extension; then it should work:
train_data = pickle.load(open('train_data.pkl','rb'))

save string to a binary file in python

I would like to know a very basic thing about Python programming (I am a beginner right now): how can I save a result (a list, a string, or whatever) to a file in Python?
I've been searching a lot, but I couldn't find any good answer to this.
I was thinking about the .write() method, but it doesn't seem to work with strings, and I'm not sure what it is supposed to do anyway.
My situation is that I have binary files that I would like to edit, so I found it easy to convert them to strings and modify them. Now I'd like to save them (i) back to binary files (JPEG images) and (ii) in the folder I want.
How would I do that? I need some help, please.
UPDATE
Here is the script I'm trying to run:
import os, sys
newpath= r'C:/Users/Umberto/Desktop/temporary'
if not os.path.exists (newpath):
    os.makedirs (newpath)
data= open ('C:/Users/Umberto/Desktop/Prove_Script/Varie/_BR_Browse.001_2065642654_1.BINARY', 'rb+')
edit_data= str (data.read () )
out_dir= os.path.join (newpath, 'feed', 'address')
data.close ()
# do my edits in a secon time...
edit_data.write (newpath)
edit_data.close ()
The error I get is:
AttributeError: 'str' object has no attribute 'write'
UPDATE_2
I tried to use the pickle module to serialize my binary file, modify it, and save it at the end, but I still can't get it to work. This is what I've tried so far:
import cPickle as pickle
binary= open ('C:\Users\Umberto\Desktop\Prove_Script\Varie\_BR_Browse.001_2065642654_1.BINARY', 'rb')
out= open ('C:\Users\Umberto\Desktop\Prove_Script\Varie\preview.txt', 'wb')
pickle.dump (binary, out, 1)
TypeError Traceback (most recent call last)
<ipython-input-6-981b17a6ad99> in <module>()
----> 1 pprint.pprint (pickle.dump (binary, out, 1))
C:\Python27\ArcGIS10.1\lib\copy_reg.pyc in _reduce_ex(self, proto)
68 else:
69 if base is self.__class__:
---> 70 raise TypeError, "can't pickle %s objects" % base.__name__
71 state = base(self)
72 args = (self.__class__, base, state)
TypeError: can't pickle file objects
Another thing I didn't understand is whether I am supposed to create a file to point to (in my case I had to create "out", otherwise I wouldn't have the right arguments for the pickle method) or whether that's not necessary.
Hope I'm getting close to the solution.
P.S.: I also tried pickle.dumps(), without a better result.
If you're opening a binary file and saving another binary file, you could do something like this:
with open('file.jpg', 'rb') as jpgFile:
    contents = jpgFile.read()  # contents is a bytes object
# (some operations on contents here)
with open('file2.jpg', 'wb') as jpgFile:
    jpgFile.write(contents)
Some comments:
'rb' and 'wb' mean read and write in binary mode, respectively. More info on why 'b' is recommended when working with binary files here.
Python's with statement takes care of closing the file when exiting the block.
If you need to save lists, strings or other objects and retrieve them later, use pickle, as others pointed out.
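For the asker's second point (saving into a particular folder), here is a Python 3 sketch that reads a binary file, applies an edit to its bytes, and writes the result into a target directory, creating that directory if needed. The byte replacement is only a placeholder for whatever edit is actually needed, and edit_binary_file is a hypothetical name. The data stays a bytes object throughout, so no conversion to str is involved.

```python
import os
import tempfile


def edit_binary_file(src_path, dst_dir, old, new):
    """Hypothetical helper: copy src_path into dst_dir, replacing the bytes old with new."""
    with open(src_path, "rb") as src:
        contents = src.read()              # a bytes object, not a str
    contents = contents.replace(old, new)  # placeholder edit; old and new must be bytes
    os.makedirs(dst_dir, exist_ok=True)    # create the target folder if it is missing
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    with open(dst_path, "wb") as dst:
        dst.write(contents)
    return dst_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "image.jpg")
        with open(src, "wb") as f:
            f.write(b"\xff\xd8 old marker \xff\xd9")  # fake JPEG-like bytes
        out = edit_binary_file(src, os.path.join(tmp, "feed", "address"), b"old", b"new")
        with open(out, "rb") as f:
            print(f.read())  # b'\xff\xd8 new marker \xff\xd9'
```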
You can use the standard Python module named pickle.
You can read about it here: pickle documentation
Reading and writing most data structures is then easy:
pickle.dump(obj, file_handler)  # serialize obj to a file opened in 'wb' mode
pickle.load(file)  # deserialize from a file opened in 'rb' mode
Or you can serialize to a string (bytes in Python 3) with pickle.dumps(...) and load from it with pickle.loads(...).
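A short Python 3 round trip through the four functions named above; the file name results.pkl and the sample data are made up for illustration.

```python
import os
import pickle
import tempfile

data = {"scores": [0.5, 0.75], "label": "run 1", "shape": (3, 4)}  # sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.pkl")  # hypothetical file name
    with open(path, "wb") as f:  # binary mode; Python 3 rejects text-mode files here
        pickle.dump(data, f)     # serialize to the file
    with open(path, "rb") as f:
        loaded = pickle.load(f)  # deserialize from the file

blob = pickle.dumps(data)        # serialize to a bytes object instead of a file
print(loaded == pickle.loads(blob) == data)  # True
```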

pickle.load Not Working

I got a file that contains a data structure with test results from a Windows user. He created this file using the pickle.dump command. On Ubuntu, I tried to load this test results with the following program:
import pickle
import my_module
f = open('results', 'r')
print pickle.load(f)
f.close()
But I get an error inside the pickle module saying there is no module named "my_module".
Could the problem be due to corruption in the file, or is moving from Windows to Linux the cause?
The problem lies in pickle's way of handling newline characters. Some of the line feed characters cripple module names in dumped / loaded data.
Storing and loading files in binary mode may help, but I was having trouble with that too. After a long time reading the docs and searching, I found that pickle supports several different "protocols" for storing data and, in Python 2, defaults to the oldest one for backward compatibility: protocol 0, the original ASCII protocol.
You can select a more modern protocol by specifying the protocol keyword argument when storing data in the dump file, something like this:
pickle.dump(someObj, open("dumpFile.dmp", 'wb'), protocol=2)
or by choosing the highest protocol available (2 in Python 2; Python 3 supports higher ones):
pickle.dump(someObj, open("dumpFile.dmp", 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
pickle.load() detects the protocol version from the dump file, so it handles it automatically.
Regards
You should open the pickled file in binary mode, especially if you are using pickle on different platforms. See these two questions for an explanation.
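The newline explanation in the answers above can be checked with a small Python 3 sketch. It pickles a class reference (fractions.Fraction, chosen only because it is a standard-library class) with protocol 0. It then simulates the "\n" to "\r\n" translation that text-mode writing on Windows performs, and shows that the unpickler looks for a module whose name ends in a stray "\r", the same kind of "no module named" failure as in the question. fix_imports=False just keeps Python 2/3 name remapping out of the picture.

```python
import fractions
import pickle

# Protocol 0 is a text-like format: opcode arguments end with b"\n",
# e.g. the GLOBAL opcode stores b"c" + module + b"\n" + name + b"\n".
p0 = pickle.dumps(fractions.Fraction, protocol=0, fix_imports=False)
p2 = pickle.dumps(fractions.Fraction, protocol=2, fix_imports=False)

print(b"\n" in p0)  # True: protocol 0 relies on newline characters
print(p2[:2])       # b'\x80\x02': PROTO opcode followed by the protocol number

# Simulate what text-mode writing on Windows does: "\n" becomes "\r\n".
corrupted = p0.replace(b"\n", b"\r\n")

try:
    pickle.loads(corrupted)
except ImportError as exc:
    # The module name is read up to "\n", so it keeps the stray "\r".
    print(repr(exc.name))  # 'fractions\r'
```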
