cPickle.load throwing ImportError in Python - python

I have Python 2.7.3 installed on my Windows 7 computer. When I run the following code
import nltk, json, cPickle, itertools
import numpy as np
from nltk.tokenize import word_tokenize
from pprint import pprint
t_given_a = json.load(open('conditional_probability.json','rb'))
a_unconditional = json.load(open('age.json','rb'))
t_unconditional = cPickle.load(open('freqdist.pkl','rb'))['distribution']
The command prompt gives me the error
"ImportError: No Module named Multiarray."
I'm fairly new to Python and I'm not exactly sure why this error happened. I searched other threads and many suggested to use 'rb' instead of 'r'. I have rb to begin with and it's still throwing me that error. Any suggestion?

When you pickle an object in python it saves its class as a string of package name + class name. On unpickle python tries to import that module and find that class for you to recreate an object. And if you don't have that module importable you'll get an ImportError.
Just install that Multiarray module, and if you don't know which is it then ask whoever you got that pickle file from.

From the docs:
Note that functions (built-in and user-defined) are pickled by “fully
qualified” name reference, not by value. This means that only the
function name is pickled, along with the name of the module the
function is defined in. Neither the function’s code, nor any of its
function attributes are pickled. Thus the defining module must be
importable in the unpickling environment, and the module must contain
the named object, otherwise an exception will be raised.
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class’s code
or data is pickled
[...] These restrictions are why picklable functions and classes must be
defined in the top level of a module

Related

ImportError for top-level package when trying to use dill to pickle entire package source code alongside instance

I have the following project structure:
Package1
|--__init__.py
|--__main__.py
|--Module1.py
|--Module2.py
where Module1.py contains something like:
import dill as pickle
import Package1.Module2
# from https://stackoverflow.com/questions/52402783/pickle-class-definition-in-module-with-dill
def mainify(obj):
import __main__
import inspect
import ast
s = inspect.getsource(obj)
m = ast.parse(s)
co = compile(m, "<string>", "exec")
exec(co, __main__.__dict__)
def Module1():
"""I hope the details of this class are not necessary for this example. I can add detail if necessary
"""
obj_to_pickle = Module1()
def write_session():
mainify(Module1)
mainify(Module2)
with FileHandler.open_file(...) as f:
pickle.dump(obj_to_pickle, f)
I run the code as a module via python -m Package1 ..., thus __main__.py is the entry point to package execution, though I hope these details aren't relevant (I can improve my example if necessary).
Now, when I try to load the pickled object, I get ModuleNotFoundError: No module named Package1.
How can tell dill in this situation to understand that Package1 is the package? The mainify function seems to be getting the modules' source code into the pickle, but I believe the import statement in Module1.py that is import Package1.Module2.py is causing the ImportError. How can I tell dill to understand the reference to Package1?
NOTE: this reference can be fixed by adding the directory that Package1 is in via sys.path.append. But the whole point of pickling the package source alongside the instance is to make pickled instance unpicklable without needed to do this.
Relevant posts:
Pickle class definition in module with dill
Why dill dumps external classes by reference, no matter what?
#courtyardz. I'm a contributor of dill and your question is similar to others that have been asked in the past.
First, let me explain that generally dill assumes that all the modules necessary to deserialize an object are importable in the "unpickling" environment. Therefore modules are almost always saved by reference, with the current exception of modules that are not properly installed, like local modules (e.g. located in the working directory) or modules at non-canonical paths added to sys.path. There's also a function that's able to save the complete state of a module, which can be restored afterwards, but not the module itself.
That said, what exactly do you need? It's to serialize an object alongside its class (including any objects in the module's namespace that it refers to), or it's really the whole module?
If you need to transfer the complete module to an interpreter session where it's not available, like in a different machine, this problem is under active discussion here: https://github.com/uqfoundation/dill/issues/123. There's no complete solution for this currently, but one possibility is to ship the module as a ZIP archive, and load it using the zipimport module (indirectly, by saving the zip file to disk, maybe in a temporary location, and adding its path to sys.path as described in Python's documentation).
If you just need to serialize an object with its class, note that doing such has the limitation that objects of that class pickled by separate calls to dill.dump() or dill.dumps() will end up having different (although identical) classes when unpickled. This may or may not be a problem. There's also an open discussion about forcing the serialization of a class by value: https://github.com/uqfoundation/dill/issues/424.
The workaround you are trying to use should work because dill pickles classes defined in the __main__ module by value, as well as "orphaned" classes, i.e. classes that can't be found in the module where they were defined. However, for this to work the object must be created by the __main__.Module1 class (I suppose this is a class, even though you used def instead of class in your code example), not the Package1.Module1.Module1 class. If the class references global objects in Module1 in its methods, you may need to use the option recurse=True with dill.dump(s).
A simpler workaround, that may not work for your specific case as it involves multiple modules, is to temporarily change the __module__ attribute of the class. For example, at a module's body:
import dill
class X:
pass
obj = X()
X.__module__ = None # temporarily orphan the class
with open('/path/to/file.pkl', 'wb') as file:
dill.dump(obj) # X will be pickled by value because __module__ is None
X.__module__ = __name__ # de-orphan the class
Going back to your example, if you can't create the object with the "mainified" class, you may change the object's class temporarily too:
obj_to_pickle = Module1()
def write_session():
mainify(Module1)
mainify(Module2)
obj_to_pickle.__class__ = __main__.Module1
with FileHandler.open_file(...) as f:
pickle.dump(obj_to_pickle, f)
obj_to_pickle.__class__ = Module1
If the object has instance attributes of types defined in Package1, it won't work however.

Does importing specific class from file instead of full file matters?

Most of the tutorials and books about Django or Flask import specific classes from files instead of importing the whole file.
For example, importing DataRequiered validator from wrtforms.validators is done via from wtforms import validators instead of importing it via import wtforms.validators as valids and then accessing DataRequiered with valids.DataRequiered.
My question is: Is there an reason for this ?
I thought to something like avoiding the loading a whole module for computation/memory optimization (is it really relevant?) ? Or is it simply to make the code more readable ?
My question is: Is there an reason for this ?
from module_or_package import something is the canonical pythonic idiom (when you only want to import something in your current namespace of course).
Also, import module_or_package.something only works if module_or_package is a package and something a submodule, it raises an ImportError(No module named something) if something is a function, class or whatever object defined in module_or_package, as can be seen in the stdlib with os.path (which is a submodule of the os.package) vs datetime.date (which is a class defined in the datetime module):
>>> import os.path as p
>>> p
<module 'posixpath' from '/home/bruno/.virtualenvs/blook/lib/python2.7/posixpath.pyc'>
vs
>>>import datetime.date as d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named date
thought to something like avoiding the loading a whole module for computation/memory optimization (is it really relevant?)
Totally irrelevant - importing a given name from a module requires importing the whole module. Actually, this:
from module_or_package import something
is only syntactic sugar for
import module_or_package
something = module_or_package.something
del module_or_package
EDIT: You mention in a comment that
Right, but importing the whole module means loading it to the memory, which can be a reason for importing only a submodule/class
so it seems I failed to make the point clear: in Python, you can not "import only a submodule/class", period.
In Python, import, class and def are all executable statements (and actually just syntactic sugar for operation you can do 'manually' with functions and classes). Importing a module actually consists in executing all the code at the module's top-level (which will instanciate function and class objects) and create a module object (instance of module type) which attributes will be all names defined at the top-level via import, def and class statements or via explicit assignment. It's only when all this has been done that you can access any name defined in the module, and this is why, as I stated above,
from module import obj
is only syntactic sugar for
import module
obj = module.obj
del module
But (unless you do something stupid like defining a terabyte-huge dict or list in your module) this doesn't actually take that much time nor eat much ram, and a module is only effectively executed once per process the first time it's imported - then it's cached in sys.modules so subsequent imports only fetch it from cache.
Also, unless you actively prevents it, Python will cache the compiled version of the module (the .pyc files) and only recompile it if the .pyc is missing or older than the source .py file.
wrt/ packages and submodules, importing a submodule will also execute the package's __init__.py and build a module instance from it (IOW, at runtime, a package is also a module). Package initializer are canonically rather short, and actually quite often empty FWIW...
It depends, in the tutorial that was probably done for readability
Usually if you use most of the classes in a file, you import the file. If the files contains many classes but you only need a few, just import those.
It's both a matter of readability and optimization.

How does importing a function from a module work in Python?

I have a module some_module.py which contains the following code:
def testf():
print(os.listdir())
Now, in a file named test.py, I have this code:
import os
from some_module import testf
testf()
But executing test.py gives me NameError: name 'os' is not defined. I've already imported os in test.py, and testf is in the namespace of test.py. So why does this error occur?
import is not the same as including the content of the file as if you had typed it directly in place of the import statement. You might think it works this way if you're coming from a C background, where the #include preprocessor directive does this, but Python is different.
The import statement in Python reads the content of the file being imported and evaluates it in its own separate context - so, in your example, the code in some_module.py has no access to or knowledge of anything that exists in test.py or any other file. It starts with a "blank slate", so to speak. If some_module.py's code wants to access the os module, you have to import it at the top of some_module.py.
When a module is imported in Python, it becomes an object. That is, when you write
import some_module
one of the first things Python does is to create a new object of type module to represent the module being imported. As the interpreter goes through the code in some_module.py, it assigns any variables, functions, classes, etc. that are defined in that file to be attributes of this new module object. So in your example, the module object will have one attribute, testf. When the code in the function testf wants to access the variable os, it looks in the function itself (local scope) and sees that os is not defined there, so it then looks at the attributes of the module object which testf belongs to (this is the "global" scope, although it's not truly global). In your example, it will not see os there, so you get an error. If you add
import os
to some_module.py, then that will create an attribute of the module under the name os, and your code will find what it needs to.
You may also be interested in some other answers I've written that may help you understand Python's import statement:
Why import when you need to use the full name?
Does Python import statement also import dependencies automatically?
The name testf is in the namespace of test. The contents of the testf function are still in some_module, and don't have access to anything in test.
If you have code that needs a module, you need to import that module in the same file where that code is. Importing a module only imports it into the one file where you import it. (Multiple imports of the same module, in different files, won't incur a meaningful performance penalty; the actual loading of the module only happens once, and later imports of the same module just get a reference to the already-imported module.)
Importing a module adds its name as an attribute of the current scope. Since different modules have independent scopes, any code in some_module cannot use names in __main__ (the executed script) without having imported it first.

Why am I getting a no attribute error in my python script but not in my python interpreter? [duplicate]

There is one thing, that I do not understand.
Why does this
import scipy # happens with several other modules, too. I took scipy as an example now...
matrix = scipy.sparse.coo_matrix(some_params)
produce this error:
AttributeError: 'module' object has no attribute 'sparse'
This happens because the scipy module doesn't have any attribute named sparse. That attribute only gets defined when you import scipy.sparse.
Submodules don't automatically get imported when you just import scipy; you need to import them explicitly. The same holds for most packages, although a package can choose to import its own submodules if it wants to. (For example, if scipy/__init__.py included a statement import scipy.sparse, then the sparse submodule would be imported whenever you import scipy.)
Because you imported scipy, not sparse. Try from scipy import sparse?
AttributeError is raised when attribute of the object is not available.
An attribute reference is a primary followed by a period and a name:
attributeref ::= primary "." identifier
To return a list of valid attributes for that object, use dir(), e.g.:
dir(scipy)
So probably you need to do simply: import scipy.sparse
The default namespace in Python is "__main__". When you use import scipy, Python creates a separate namespace as your module name.
The rule in Pyhton is: when you want to call an attribute from another namespaces you have to use the fully qualified attribute name.

Having to evaluate a string to access a class from a module

I've loaded one of my modules <module my_modules.mymodule from .../my_modules/mymodule.pyc> with __import__.
Now I'm having the module saved in a variable, but I'd like to create an instance of the class mymodule. Thing is - I've gotten the module name passed as string into my script.
So I've got a variable containing the module, and I've got the module name but as string.
variable_containing_module.string_variable_with_correct_classname() doesn't work. Because it says there is no such thing in the module as "string_variable_with_correct_classname" - because it doesn't evaluate the string name. How can I do so?
Your problem is that __import__ is designed for use by the import internals, not really for direct usage. One symptom of this is that when importing a module from inside a package, the top-level package is returned rather than the module itself.
There are two ways to deal with this. The preferred way is to use importlib.import_module() instead of using __import__ directly.
If you're on an older version of Python that doesn't provide importlib, then you can create your own helper function:
import sys
def import_module(module_name):
__import__(module_name)
return sys.modules[module_name]
Also see http://bugs.python.org/issue9254

Categories