I have been trying to load a Python pickle file using the following Python function:
import os
import cPickle as pickle

def load_var(var_name):
    fid = open(var_name + '.pkl', 'rb')
    data = pickle.load(fid)
    fid.close()
    return data
but I keep on running into the following error:
ImportError: No module named sysid_functions
It's complaining about a module named sysid_functions that is being imported from inside pickle.py. If I import pickle instead of cPickle as pickle, I get the following more detailed traceback:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
klass = self.find_class(module, name)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1124, in find_class
__import__(module)
ImportError: No module named sysid_functions
Does anybody have any idea what might be causing the error?
Pickle files don't actually store class or module definitions; they store only attribute values, together with a reference to the class that should be used to rebuild each object. The benefit of this is that you can pickle an object, update the class definition in your source code, then read the pickled data back in, and it will use the new class definition instead of leaving you with two different versions of the same class.
The downside is that pickle files aren't really transferable between different Python environments (and can't even reliably be transferred across different Python files or modules). When it comes to loading a class or object, the pickle file relies on the same import structure/namespace that was present when the pickle was created. That means that pickle files created from the same module that defines a class can only be reloaded in that same module (unless you manually import that class into your global namespace).
For all we know, sysid_functions was a sub-module of some other package that you don't have installed. Even if you did have it installed, you would likely only be able to load the pickle file if you managed to set up your module globals() the same way as the module that created the pickle file.
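To make this concrete, here is a small sketch (the Widget class is invented purely for illustration) showing that a pickle stores the defining module and class name rather than their code:

import pickle
import pickletools

class Widget:      # stands in for whatever class was pickled
    pass

data = pickle.dumps(Widget())
pickletools.dis(data)
# The disassembly contains an opcode naming the defining module and class
# (here '__main__' and 'Widget'). Unpickling re-imports that module, which
# is why a missing module such as sysid_functions raises ImportError at
# load time even though the attribute data itself is all in the file.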
I am having a super weird error; the situation is the following:
I am writing some code in a package called hsltools. The module is called em_combiner1_4; I recently changed the version from 1_3 and also changed all imports accordingly.
If I run my code now, I get the following error:
Traceback (most recent call last):
File "C:\Users\...\git_home\hsltools\hsltools\tests\emc_tester\EMC_tester.py", line 85, in <module>
combine_2d()
File "C:\Users\...\git_home\hsltools\hsltools\tests\emc_tester\EMC_tester.py", line 48, in combine_2d
elec_field2d = emc.EM(import_path+"2DelecMap.ef2.npz")
File "c:\users\...\git_home\hsltools\hsltools\em_combiner1_4.py", line 441, in __init__
load_npz()
File "c:\users\...\git_home\hsltools\hsltools\em_combiner1_4.py", line 251, in load_npz
self.em_type = data["metadata"][1]
File "C:\Users\...\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 254, in __getitem__
return format.read_array(bytes,
File "C:\Users\...\Anaconda3\lib\site-packages\numpy\lib\format.py", line 744, in read_array
array = pickle.load(fp, **pickle_kwargs)
ModuleNotFoundError: No module named 'hsltools.em_combiner1_3'
The thing I don't get is the location of the error: ...\numpy\lib\format.py.
There is absolutely no such import there, and anyway, numpy would not import my module.
I am wondering if I am generally misunderstanding some basic concept, or if the error is wrong.
It's happening while deserializing pickled objects. Pickle stores the type information including the module name. When loading a pickled object, the module is imported to construct the final object.
Your pickle file contains an object that is part of hsltools.em_combiner1_3. So that is the module pickle tries to find.
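If the class definitions themselves are unchanged and only the module name moved, one common workaround (shown here only as a sketch) is to alias the old module path before loading, so that pickle's find_class() can resolve it:

import sys
from hsltools import em_combiner1_4

# Register the old dotted path as an alias of the renamed module; pickle
# resolves module names through sys.modules when rebuilding objects.
sys.modules['hsltools.em_combiner1_3'] = em_combiner1_4

# After this, loading the .npz (with pickling allowed) should be able to
# deserialize objects that were saved while the module was still called
# em_combiner1_3, assuming the class names inside it did not change.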
I saved a bunch of stuff in a single pickle file with dill, and some of the data are objects. Unfortunately, I didn't realize that if I changed my code structure (e.g., heavy refactoring), dill would no longer be able to find my code when loading those objects - or at least that is my guess from the output below:
------- Main Resume from Checkpoint --------
Traceback (most recent call last):
File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/site-packages/torch/serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/site-packages/torch/serialization.py", line 882, in _load
result = unpickler.load()
File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/site-packages/torch/serialization.py", line 875, in find_class
return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'meta_learning'
python-BaseException
However, it's OK if it cannot find them; there is other data in the file that I still need and can use to restore everything.
How can I "force" dill to load things even if it cannot find the code? Or can I point it to the paths of the new code, if that is what it needs?
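One possible approach, sketched here with made-up names (ForgivingUnpickler, _MissingObject) and not something dill or torch provides out of the box, is a custom unpickler that substitutes a placeholder whenever a class can no longer be imported; adapting it to dill.Unpickler or torch.load's pickle_module would need extra care:

import pickle

class _MissingObject:
    """Placeholder for objects whose defining module can no longer be imported."""
    def __new__(cls, *args, **kwargs):
        # Accept and ignore any constructor arguments stored in the pickle.
        return super().__new__(cls)
    def __setstate__(self, state):
        # Keep whatever attribute data was stored so it is not lost.
        self._pickled_state = state

class ForgivingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            # The class (e.g. something from the refactored-away
            # 'meta_learning' package) is gone; fall back to the placeholder
            # so the rest of the file still loads.
            return _MissingObject

with open("checkpoint.pkl", "rb") as fh:   # hypothetical file name
    data = ForgivingUnpickler(fh).load()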
I was going through this question: How do I return the definition of a class in python?
But I am unable to display the class definition. I am getting the below error:
>>> class A:
... pass
...
>>> import inspect
>>> source_text = inspect.getsource(A)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\**\Python\Python36\lib\inspect.py", line 968, in getsource
lines, lnum = getsourcelines(object)
File "C:\Users\**\Python\Python36\lib\inspect.py", line 955, in getsourcelines
lines, lnum = findsource(object)
File "C:\Users\**\Python\Python36\lib\inspect.py", line 768, in findsource
file = getsourcefile(object)
File "C:\Users\**\Python\Python36\lib\inspect.py", line 684, in getsourcefile
filename = getfile(object)
File "C:\Users\**\Python\Python36\lib\inspect.py", line 654, in getfile
raise TypeError('{!r} is a built-in class'.format(object))
TypeError: <module '__main__' (<_frozen_importlib_external.SourceFileLoader object at 0x0000026A79293F60>)> is a built-in class
>>>
Can someone please advise what I am doing wrong here? Thanks.
The inspect.getsource() function only works if there is a text file available to load the source code.
You typed the definition of the class into the interactive interpreter, which doesn't keep the original source around when compiling that source into in-memory class and code objects.
Put your class definition into a module, import the module, and then use inspect.getsource().
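For example, a minimal sketch (the file name a.py is made up):

# Save this as a.py:
class A:
    pass

# Then, in a separate session started from the same directory:
import inspect
import a

print(inspect.getsource(a.A))
# prints:
# class A:
#     pass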
inspect.getsource() works by first finding the module for a given object (for classes, by looking at the ClassObj.__module__ attribute for the module name and then fetching the module via sys.modules[modulename]), then checking whether that module has a __file__ attribute from which a readable source file can be determined. If there is such a filename and it can be read, the inspect module reads that file, searches for the class ClassName: line, and gives you all lines from that point on with the same or deeper indentation. The interactive interpreter executes everything in the __main__ module, and that module has no __file__ attribute in the interpreter, so any attempt at loading source code for objects defined there simply fails.
If you just wanted to know what members the class defines, use dir() or help() on the object instead. You don't need to see the full source code for that information.
I have a problem with pickle.load() from a file. Dumping and loading are done in dill_read_write.py:
dill_read_write.py
import os
import dill
from contact_geometry import ContactGeometry

def write_pickle(obj, filename):
    os.chdir(os.path.abspath(os.path.join(os.path.dirname(__file__))))
    filename = os.path.join(os.getcwd(), filename)
    with open(filename, 'wb') as output_:
        dill.dump(obj, output_)

def read_pickle(filename):
    with open(filename, 'rb') as input_:
        return dill.load(input_)

if __name__ == "__main__":
    read_pickle("ground_.pkl")
Saving the ContactGeometry object to the pickle file is done while the PyQt application (project) is running. The function write_pickle() is called in moduleC.py:
moduleC.py
from contact_geometry import ContactGeometry
from moduleA.moduleB import dill_read_write

class Foo(FooItem):
    def __init__(self, ...):
        ...

    def createGeometry(self):
        contact_geometry_ = ContactGeometry()
        # save object to pickle file
        dill_read_write.write_pickle(contact_geometry_, "object_data.pkl")
The object is saved and the pickle file is created. But when I run only the file dill_read_write.py to read (load) the object data from the pickle file, I get the following error:
Traceback (most recent call last):
File "C:\projectName\moduleA\moduleB\dill_read_write.py", line 29, in <module>
read("ground_.pkl")
File "C:\projectName\moduleA\moduleB\dill_read_write.py", line 24, in read
return dill.load(input_)
File "C:\Python27\lib\site-packages\dill-0.2.2-py2.7.egg\dill\dill.py", line 199, in load
obj = pik.load()
File "C:\Python27\Lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python27\Lib\pickle.py", line 1090, in load_global
klass = self.find_class(module, name)
File "C:\Python27\lib\site-packages\dill-0.2.2-py2.7.egg\dill\dill.py", line 278, in find_class
return StockUnpickler.find_class(self, module, name)
File "C:\Python27\Lib\pickle.py", line 1124, in find_class
__import__(module)
ImportError: No module named moduleA.moduleB.contact_geometry
I searched a bit and found that dill can handle classes better than pickle, but I am having problems implementing it. I've also found that I have to implement __reduce__() in the class ContactGeometry in the file contact_geometry.py.
contact_geometry.py
import os

class ContactGeometry(object):
    def __init__(self, ...):
        ...

    def __reduce__(self):
        return (self.__class__, (os.path.realpath(__file__),))
But I am not sure what this method should return. How can I successfully load the pickle file from the current situation?
Below is the project structure, if it is any help.
I'm the dill author. It's hard to tell how you are running the code, but it looks like the issue is the way you are running the code and the resulting module name, as in Antti Haapala's answer. His suggestions are also good ones to follow.
I'll add this… You need to make sure that (1) moduleA.moduleB.contact_geometry is on the PYTHONPATH, and (2) you are not dumping the module as __main__.moduleB.contact_geometry and trying to load it as moduleA.moduleB.contact_geometry -- dill treats __main__ as if it were a module (for the most part).
You shouldn't need to add __reduce__ methods to your classes, however.
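As a quick check of point (2), not a dill feature but a throwaway diagnostic, you can print how the class will be recorded in the pickle right before dumping it, for example inside createGeometry():

# hypothetical diagnostic placed just before dill_read_write.write_pickle(...)
print(type(contact_geometry_).__module__)
# dill stores this dotted name in the pickle and re-imports it on load, so
# whatever it prints must be importable, under exactly that name, in the
# process that later calls read_pickle().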
You cannot run a Python file from within a package like that; it wouldn't find the top-level package names. I'd propose any of the following:
Write a start script at the top level (where main.py is) that imports and runs dill_read_write from moduleA.moduleB (a sketch of this follows the list).
Alternatively, from the top-level directory where main.py is, you can run that module with python -m moduleA.moduleB.dill_read_write.
Or, my preferred alternative, write a setup.py for your project and write a script for that utility.
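A minimal version of the first option might look like this (run_reader.py is a made-up name; it sits at the top level next to main.py):

# run_reader.py -- hypothetical start script at the project root
from moduleA.moduleB import dill_read_write

if __name__ == "__main__":
    dill_read_write.read_pickle("ground_.pkl")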
I am using PyCharm. Normally, whenever a module is imported in PyCharm, the complete import line fades out, but in the case of import shelve it doesn't fade out. Also, when I run the file I get the following error:
Traceback (most recent call last):
File "/Users/abhimanyuaryan/PycharmProjects/shelve/main.py", line 13, in <module>
s = shelve.open("file.dat")
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 239, in open
return DbfilenameShelf(filename, flag, protocol, writeback)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 223, in __init__
Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/dbm/__init__.py", line 88, in open
raise error[0]("db type could not be determined")
dbm.error: db type could not be determined
Here's my code:
import shelve
s = shelve.open("file.dat")
s["first"] = (1182, 234, 632, 4560)
s["second"] = {"404": "file is not present", "googling": "Google to search your content"}
s[3] = ["abhilasha", "jyoti", "nirmal"]
s.sync()
print(s["first"])
print(s["second"])
print(s[3])
The OP explains in a comment that 'file.dat' was created by pickle -- and that's the problem! pickle doesn't use any DB format -- it uses its own! Create file.dat with shelve in the first place (i.e., run shelve when file.dat doesn't exist yet and save the stuff into it) and you'll be fine.
OP in comment: "I still don't get what's the problem in this case". Answer: the problem is that pickle does not create a file in any of the DB formats shelve can use. Use a single module for serializing and deserializing -- either just pickle or just shelve -- and it will work SO much better :-).
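If the data already sitting in the existing pickle still matters, one possible migration (sketched with made-up file names, and assuming the pickle holds a dict) is to load it once with pickle and re-save it through shelve:

import pickle
import shelve

# Load the data the way it was originally written.
with open("file.dat", "rb") as fh:
    old_data = pickle.load(fh)          # assumed to be a dict

# Re-save it as a proper shelf (note that shelve keys must be strings).
with shelve.open("file_shelf") as s:
    for key, value in old_data.items():
        s[str(key)] = value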
There is a bug with anydbm (https://bugs.python.org/issue13007) where it does not use the right identification for gdbm files.
So if you are trying to open a valid gdbm file with shelve and it is throwing that error, use this instead:
mod = __import__("gdbm")
file = shelve.Shelf(mod.open(filename, flag))
You should not include the file extension.
Do as follows:
import shelve

s = shelve.open("file")
s["first"] = (1182, 234, 632, 4560)
s["second"] = {"404": "file is not present", "googling": "Google to search your content"}
s["third"] = ["abhilasha", "jyoti", "nirmal"]   # shelve keys must be strings
s.sync()
print(s["first"])
print(s["second"])
print(s["third"])
It is important to note that if the shelf file is somewhere else, you need to include the directory name in the path too.