I am developing a Python package that works on user-defined objects, all of which are instances of a class I wrote. By design, the user defines these objects in one or more Python scripts (see the example below) and passes the scripts to my program.
I want to access the objects that the user defines in those scripts. How can I do that?
I looked at importing by filename, but to no avail. I also tried imp.load_source, but that did not solve it either.
Some typical user-defined objects
Assume for the sake of the problem, all methods are defined in Base. I understand what I am asking for leads to arbitrary code execution, so I am open to suggestions wherein users can pass their instances of the Base class arbitrarily but safely.
foo.py has the following code:
from package import Base
foo = Base('foo')
foo.AddBar('bar', 'bar')
foo.AddCow('moo')
ooo.py :
from package import Base
ooo = Base('ooo')
ooo.AddBar('ooo','ooo')
ooo.AddO(12)
And I run my main program as,
main_program -p foo.py ooo.py
I want to be able to access foo, ooo in the main_program body.
Tried:
I am using Python 2.7. I know this is an older version; I will make the move soon.
importlib
Tried importlib.import_module but it throws ImportError: Import by filename is not supported.
__import__
I tried using __import__('/path/to/file.py') but it throws the same ImportError: Import by filename is not supported.
At this point, any solution which lets me use objects defined in user-input scripts works.
If you are okay with skipping the .py in the filename, this can be solved by asking the user to pass the module name (basically the file name without the .py extension).
Referring to this answer and this book, here is an example
tester.py
class A:
    def write(self):
        print("hello")
obj = A()
Now we want to dynamically access obj from a file called test.py, so we do
python test.py tester
And what does test.py do? It imports each module by name and accesses the objects defined in it. Note that this assumes you are not concerned about the order in which the user passes the objects.
test.py
import sys
# Get all parameters (sys.argv[0] is the file name so skipping that)
module_names = sys.argv[1:]
# I believe you can also do this using importlib
modules = list(map(__import__, module_names))
# modules[0] is now "tester"
modules[0].obj.write()
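The comment in test.py suggests importlib as an alternative to __import__. A minimal sketch of that variant, assuming the module names can be found on sys.path; the helper name load_modules is illustrative and not part of the original answer, and standard-library modules stand in for user modules such as tester:

```python
import importlib


def load_modules(module_names):
    """Import each named module and return the module objects in order."""
    return [importlib.import_module(name) for name in module_names]


# "json" and "math" stand in for user-supplied names such as "tester".
modules = load_modules(["json", "math"])
print([m.__name__ for m in modules])  # ['json', 'math']
```

Unlike __import__, importlib.import_module returns the submodule itself for dotted names such as "pkg.sub", which is usually what is wanted here.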
Mapping this to your example, I think this should be
foo.py
from package import Base
foo = Base('foo')
foo.AddBar('bar', 'bar')
foo.AddCow('moo')
ooo.py
from package import Base
ooo = Base('ooo')
ooo.AddBar('ooo','ooo')
ooo.AddO(12)
And run main program as
python main_program.py foo ooo
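If the user should be able to pass real file paths such as foo.py, one option on Python 3.5 or later (not the Python 2.7 mentioned in the question) is importlib.util. The following is only a sketch: load_instances is a hypothetical helper, and cls would be the package's Base class. It still executes whatever code the script contains, so it does not address the safety concern raised in the question.

```python
import importlib.util
import os


def load_instances(path, cls):
    """Run the script at `path` and return {name: object} for its instances of `cls`."""
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes arbitrary user code
    return {
        name: value
        for name, value in vars(module).items()
        if isinstance(value, cls)
    }
```

With the layout from the question, load_instances('foo.py', Base) would be expected to return a dictionary with the single key 'foo'.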
Have you tried
from (whatever the file name is) import *
Also don’t include .py in the file name.
I have the following project structure:
Package1
|--__init__.py
|--__main__.py
|--Module1.py
|--Module2.py
where Module1.py contains something like:
import dill as pickle
import Package1.Module2
# from https://stackoverflow.com/questions/52402783/pickle-class-definition-in-module-with-dill
def mainify(obj):
    import __main__
    import inspect
    import ast
    s = inspect.getsource(obj)
    m = ast.parse(s)
    co = compile(m, "<string>", "exec")
    exec(co, __main__.__dict__)
def Module1():
    """I hope the details of this class are not necessary for this example. I can add detail if necessary
    """
obj_to_pickle = Module1()
def write_session():
    mainify(Module1)
    mainify(Module2)
    with FileHandler.open_file(...) as f:
        pickle.dump(obj_to_pickle, f)
I run the code as a module via python -m Package1 ..., thus __main__.py is the entry point to package execution, though I hope these details aren't relevant (I can improve my example if necessary).
Now, when I try to load the pickled object, I get ModuleNotFoundError: No module named Package1.
How can I tell dill in this situation that Package1 is the package? The mainify function seems to get the modules' source code into the pickle, but I believe the import statement in Module1.py, that is, import Package1.Module2, is causing the ImportError. How can I tell dill to understand the reference to Package1?
NOTE: this reference can be fixed by adding the directory that contains Package1 via sys.path.append. But the whole point of pickling the package source alongside the instance is to make the pickled instance loadable without needing to do this.
Relevant posts:
Pickle class definition in module with dill
Why dill dumps external classes by reference, no matter what?
@courtyardz, I'm a contributor to dill, and your question is similar to others that have been asked in the past.
First, let me explain that generally dill assumes that all the modules necessary to deserialize an object are importable in the "unpickling" environment. Therefore modules are almost always saved by reference, with the current exception of modules that are not properly installed, like local modules (e.g. located in the working directory) or modules at non-canonical paths added to sys.path. There's also a function that's able to save the complete state of a module, which can be restored afterwards, but not the module itself.
That said, what exactly do you need? Is it to serialize an object alongside its class (including any objects in the module's namespace that it refers to), or is it really the whole module?
If you need to transfer the complete module to an interpreter session where it's not available, like in a different machine, this problem is under active discussion here: https://github.com/uqfoundation/dill/issues/123. There's no complete solution for this currently, but one possibility is to ship the module as a ZIP archive, and load it using the zipimport module (indirectly, by saving the zip file to disk, maybe in a temporary location, and adding its path to sys.path as described in Python's documentation).
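The ZIP route mentioned above can be sketched with the standard library alone. This is an illustration under assumptions, not a dill feature: the helper names build_archive and import_from_zip are hypothetical, and the archive is assumed to contain plain .py files at its top level.

```python
import importlib
import sys
import zipfile


def build_archive(zip_path, files):
    """Write {archive_name: source_text} into a new ZIP archive."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, source in files.items():
            archive.writestr(name, source)


def import_from_zip(zip_path, module_name):
    """Put the archive on sys.path and import `module_name` from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()  # make sure the new path entry is noticed
    return importlib.import_module(module_name)
```

The sending side would call build_archive, transfer the file together with the pickle, and the receiving side would call import_from_zip before unpickling, so that modules saved by reference can be resolved.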
If you just need to serialize an object with its class, note that doing such has the limitation that objects of that class pickled by separate calls to dill.dump() or dill.dumps() will end up having different (although identical) classes when unpickled. This may or may not be a problem. There's also an open discussion about forcing the serialization of a class by value: https://github.com/uqfoundation/dill/issues/424.
The workaround you are trying to use should work because dill pickles classes defined in the __main__ module by value, as well as "orphaned" classes, i.e. classes that can't be found in the module where they were defined. However, for this to work the object must be created by the __main__.Module1 class (I suppose this is a class, even though you used def instead of class in your code example), not the Package1.Module1.Module1 class. If the class references global objects in Module1 in its methods, you may need to use the option recurse=True with dill.dump(s).
A simpler workaround, that may not work for your specific case as it involves multiple modules, is to temporarily change the __module__ attribute of the class. For example, at a module's body:
import dill
class X:
    pass
obj = X()
X.__module__ = None # temporarily orphan the class
with open('/path/to/file.pkl', 'wb') as file:
    dill.dump(obj, file) # X will be pickled by value because __module__ is None
X.__module__ = __name__ # de-orphan the class
Going back to your example, if you can't create the object with the "mainified" class, you may change the object's class temporarily too:
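import __main__
obj_to_pickle = Module1()
def write_session():
    mainify(Module1)
    mainify(Module2)
    # temporarily point the instance at the "mainified" copy of its class
    obj_to_pickle.__class__ = __main__.Module1
    with FileHandler.open_file(...) as f:
        pickle.dump(obj_to_pickle, f)
    # restore the original class
    obj_to_pickle.__class__ = Module1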
If the object has instance attributes of types defined in Package1, it won't work however.
Is there a way (using only Python, i.e., without a bash script or code in another language) to call a specific function in every script inside a folder without needing to import all of them explicitly?
For example, let's say that this is my structure:
main.py
modules/
    module1.py
    module2.py
    module3.py
    module4.py
and every moduleX.py has this code:
import os
def generic_function(caller):
    print('{} was called by {}'.format(os.path.basename(__file__), caller))
def internal_function():
    print('ERROR: Someone called an internal function')
while main.py has this code:
import modules
import os
for module in modules.some_magic_function():
    module.generic_function(os.path.basename(__file__))
So if I run main.py, I should get this output:
module1.py was called by main.py
module2.py was called by main.py
module3.py was called by main.py
module4.py was called by main.py
*Please note that internal_function() shouldn't be called (unlike this question). Also, I don't want to declare explicitly every module file even on a __init__.py
By the way, I don't mind to use classes for this. In fact it could be even better.
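You can use exec or eval to do that. It would go roughly this way (for exec):
import glob
import os
def magic_execute():
    # MYPATH is the folder that contains the module files
    for pyfl in glob.glob(os.path.join(MYPATH, '*.py')):
        with open(pyfl, 'rt') as fh:
            pycode = fh.read()
        # append a call to generic_function, passing this script's name as a string literal
        pycode += '\ngeneric_function({!r})'.format(os.path.basename(__file__))
        # run each file in its own namespace so that its imports and __file__ resolve correctly
        exec(pycode, {'__file__': pyfl})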
The assumption here is that you are not going to import the modules at all.
Please note, that there are numerous security issues related to using exec in such a non-restricted manner. You can increase security a bit.
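An alternative that avoids exec is to let the import system enumerate the folder. The following is a sketch under the assumption that the folder can be imported as a package; the helper names iter_package_modules and call_in_all are hypothetical and do not come from the answers here.

```python
import importlib
import pkgutil


def iter_package_modules(package):
    """Import and yield every direct submodule of an already-imported package."""
    for info in pkgutil.iter_modules(package.__path__):
        yield importlib.import_module('{}.{}'.format(package.__name__, info.name))


def call_in_all(package, func_name, *args):
    """Call `func_name(*args)` in each submodule that defines it.

    Returns a dict mapping module names to return values; submodules
    without a callable of that name are skipped.
    """
    results = {}
    for module in iter_package_modules(package):
        func = getattr(module, func_name, None)
        if callable(func):
            results[module.__name__] = func(*args)
    return results
```

With the layout from the question, main.py could run import modules followed by call_in_all(modules, 'generic_function', 'main.py'). Because only the named function is looked up, internal_function is never called.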
While sophros' approach is quick and sufficient for implicitly importing the modules, you could run into trouble when you need control over each module or when the calls are complex (for example, a different condition for each call). So I went with another approach:
First I created a class with the function(s) (now methods) declared. With this I can avoid checking if the method exists as I can use the default one if I didn't declare it:
# main.py
class BaseModule:
    def __init__(self):
        pass  # Any code
    def generic_function(self, caller):
        # This could be a print (or a default return value) or an exception
        raise Exception('generic_function wasn\'t overridden or it was used with super')
Then I created another class that extends BaseModule. Sadly, I wasn't able to find a good way of checking inheritance without knowing the name of the child class, so I used the same class name in every module:
# modules/moduleX.py
import os
from main import BaseModule
class GenericModule(BaseModule):
    def __init__(self):
        BaseModule.__init__(self)
        # Any code
    def generic_function(self, caller):
        print('{} was called by {}'.format(os.path.basename(__file__), caller))
Finally, in my main.py, I used importlib to import the modules dynamically and create an instance for each one, so I can use them later (for the sake of simplicity I didn't save them in the following code, but it's as easy as appending every instance to a list):
# main.py
import importlib
import os
if __name__ == '__main__':
    relPath = 'modules' # This has to be relative to the working directory
    for pyFile in os.listdir('./' + relPath):
        # only load Python (.py) files, skipping __init__.py and similar
        if pyFile.endswith('.py') and not pyFile.startswith('__'):
            # each module has to be loaded with dots instead of slashes in the path and without the extension. Also, the modules folder must have an __init__.py file
            module = importlib.import_module('{}.{}'.format(relPath, pyFile[:-3]))
            # we have to test whether the class is actually defined in the module. This was extracted from [1]
            try:
                moduleInstance = module.GenericModule()
                moduleInstance.generic_function(os.path.basename(__file__)) # You can actually do whatever you want here. You can save the moduleInstance in a list and call the function (method) later, or save its return value.
            except AttributeError as e:
                # NOTE: This fires on ANY AttributeError, including one caused by a typo, so you should print or raise something here for diagnosis
                print('WARN:', pyFile, 'doesn\'t have a GenericModule class or there was a typo in its content')
References:
[1] Check for class existence
[2] Import module dynamically
[3] Method Overriding in Python
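The answer above mentions not finding a good way to detect the child class without knowing its name. One possibility is to inspect each imported module for subclasses of the base class. This is a sketch, not part of the original answer: find_subclasses is a hypothetical helper, and BaseModule here is a stripped-down stand-in for the class defined in main.py.

```python
import inspect


class BaseModule:
    """Stand-in for the BaseModule class from the answer."""
    def generic_function(self, caller):
        raise NotImplementedError('generic_function was not overridden')


def find_subclasses(module, base):
    """Return the classes defined in `module` that derive from `base`."""
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if issubclass(cls, base)
        and cls is not base                     # skip the imported base class itself
        and cls.__module__ == module.__name__   # skip classes imported from elsewhere
    ]
```

In the loop from main.py, find_subclasses(module, BaseModule) could replace the lookup of module.GenericModule, so each module is free to name its class as it likes and the broad AttributeError handler is no longer needed.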
I have defined several classes in a single Python file. My wish is to create a library with these. I would ideally like to import the library in such a way that I can use the classes without a prefix (just myclass() as opposed to mylibrary.myclass()), if "prefix" is the right word for it; I am not entirely sure, as I am a beginner.
What is the proper way to achieve this, or otherwise the best result? Define all classes in __init__.py? Define them all in a single file, as I currently have, like AllMyClasses.py? Or should I have a separate file for every class in the library directory, like FirstClass.py, SecondClass.py, etc.?
I realize this is a question that should be easy enough to google, but since I am still quite new to python and programming in general I haven't quite figured out what the correct keywords are for a problem in this context(such as my uncertainty about "prefix")
More information can be found in the tutorial on modules (single files) or packages (when in a directory with an __init__.py file) on the python site.
The suggested way (according to the style guide) is to spell out each class import specifically.
from my_module import MyClass1, MyClass2
object1 = MyClass1()
object2 = MyClass2()
While you can also shorten the module name:
import my_module as mo
object = mo.MyClass1()
Using from my_module import * is best avoided, as it can be confusing (even though it is the customary way for some libraries, like tkinter).
If it's for your personal use, you can just put all your classes Class1, Class2, ... in a myFile.py and to use them call import myFile (without the .py extension)
import myFile
myVar1 = myFile.Class1()
myVar2 = myFile.Class2()
from within another script. If you want to be able to use the classes without the file name prefix, import the file like this:
from myFile import *
Note that the file you want to import should be in a directory where Python can find it (the same where the script is running or a directory in PYTHONPATH).
An __init__.py file is needed if you want to create a Python package for distribution. Here are the instructions: Distributing Python Modules
EDIT after checking the Python's style guide PEP 8 on imports:
Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools
So in this example you should have used
from myFile import Class1, Class2
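If a wildcard import is used anyway, a module can limit what it exports by defining __all__. The sketch below builds a hypothetical module named demo_lib in memory so that it runs on its own; in practice the three lines of source would live in a file such as myFile.py.

```python
import sys
import types

# Build a throwaway module in memory; `demo_lib` is a hypothetical name.
demo_lib = types.ModuleType('demo_lib')
exec(
    "__all__ = ['Class1']\n"
    "class Class1: pass\n"
    "class Class2: pass\n",
    demo_lib.__dict__,
)
sys.modules['demo_lib'] = demo_lib

# Perform the wildcard import into a fresh namespace and list what arrived.
namespace = {}
exec('from demo_lib import *', namespace)
exported = sorted(name for name in namespace if not name.startswith('__'))
print(exported)  # ['Class1']
```

Class2 is still reachable through an explicit from demo_lib import Class2; __all__ only affects the wildcard form.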
I have a module some_module.py which contains the following code:
def testf():
    print(os.listdir())
Now, in a file named test.py, I have this code:
import os
from some_module import testf
testf()
But executing test.py gives me NameError: name 'os' is not defined. I've already imported os in test.py, and testf is in the namespace of test.py. So why does this error occur?
import is not the same as including the content of the file as if you had typed it directly in place of the import statement. You might think it works this way if you're coming from a C background, where the #include preprocessor directive does this, but Python is different.
The import statement in Python reads the content of the file being imported and evaluates it in its own separate context - so, in your example, the code in some_module.py has no access to or knowledge of anything that exists in test.py or any other file. It starts with a "blank slate", so to speak. If some_module.py's code wants to access the os module, you have to import it at the top of some_module.py.
When a module is imported in Python, it becomes an object. That is, when you write
import some_module
one of the first things Python does is to create a new object of type module to represent the module being imported. As the interpreter goes through the code in some_module.py, it assigns any variables, functions, classes, etc. that are defined in that file to be attributes of this new module object. So in your example, the module object will have one attribute, testf. When the code in the function testf wants to access the variable os, it looks in the function itself (local scope) and sees that os is not defined there, so it then looks at the attributes of the module object which testf belongs to (this is the "global" scope, although it's not truly global). In your example, it will not see os there, so you get an error. If you add
import os
to some_module.py, then that will create an attribute of the module under the name os, and your code will find what it needs to.
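The lookup described above can be observed directly. The sketch below builds some_module in memory rather than from a file, which is an assumption made only so that the example is self-contained; the helper call_or_error is hypothetical.

```python
import os
import types

# Build `some_module` in memory; its source mirrors the question,
# except that testf returns a value instead of printing it.
some_module = types.ModuleType("some_module")
exec("def testf():\n    return os.getcwd()\n", some_module.__dict__)
testf = some_module.testf  # comparable to `from some_module import testf`


def call_or_error(func):
    """Return the function's result, or the NameError message if a lookup fails."""
    try:
        return func()
    except NameError as exc:
        return str(exc)


before = call_or_error(testf)  # fails: `os` is not among some_module's globals
some_module.os = os            # same effect as `import os` inside some_module
after = call_or_error(testf)   # succeeds: the lookup now finds `os`

# A function always resolves global names in the module that defined it.
print(testf.__globals__ is vars(some_module))  # True
```

Importing os in the calling script never touches vars(some_module), which is why the import in test.py has no effect on testf.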
You may also be interested in some other answers I've written that may help you understand Python's import statement:
Why import when you need to use the full name?
Does Python import statement also import dependencies automatically?
The name testf is in the namespace of test. The contents of the testf function are still in some_module, and don't have access to anything in test.
If you have code that needs a module, you need to import that module in the same file where that code is. Importing a module only imports it into the one file where you import it. (Multiple imports of the same module, in different files, won't incur a meaningful performance penalty; the actual loading of the module only happens once, and later imports of the same module just get a reference to the already-imported module.)
Importing a module adds its name as an attribute of the current scope. Since different modules have independent scopes, any code in some_module cannot use names in __main__ (the executed script) without having imported it first.
parent/__init__.py:
favorite_numbers = [1]
def my_favorite_numbers():
    for num in favorite_numbers:
        print(num)
my_favorite_numbers()
from .child import *
my_favorite_numbers()
parent/child.py:
print favorite_numbers
favorite_numbers.append(7)
I then created a file one directory up from parent directory named tst.py:
import parent
So the directory structure looks like this:
parent (directory)
    __init__.py (file)
    child.py (file)
tst.py (file)
And I get this error upon execution:
NameError: name 'favorite_numbers' is not defined
How can I add a value to favorite_numbers within child.py so that when I execute the my_favorite_numbers() function, I get 1 and 7.
In Python, each module has its own separate globals. That's actually the whole point of modules (as opposed to, say, C preprocessor-style text inserts).
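When you do from .child import *, that imports .child, then copies all of its globals into the current module's globals. They're still separate modules, with their own globals. The copy only goes one way: the code in child.py runs with child's own globals, which never contain favorite_numbers, hence the NameError.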
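To see what copying globals means in practice, the sketch below performs a wildcard import from a hypothetical in-memory module named child_demo into a plain dictionary that plays the role of the importing module's globals. Nothing here is taken from the original code apart from the idea.

```python
import sys
import types

# A stand-in for child.py with one mutable and one immutable global.
child = types.ModuleType("child_demo")
exec("numbers = [1]\ncount = 0\n", child.__dict__)
sys.modules["child_demo"] = child

# `importer` plays the role of the importing module's globals.
importer = {}
exec("from child_demo import *", importer)

importer["numbers"].append(7)  # same list object, so both modules see the change
importer["count"] = 5          # rebinding affects only the importer's own name

print(child.numbers)  # [1, 7]
print(child.count)    # 0
```

The names are copied, not linked: both modules can share an object, but neither can see a name that exists only in the other.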
If you want to pass values between code in different modules, you probably want to wrap that code up in functions, then pass the values as function arguments and return values. For example:
parent/__init__.py:
from .child import *
favorite_numbers = [1]
def my_favorite_numbers():
    for num in favorite_numbers:
        print(num)
my_favorite_numbers()
child_stuff(favorite_numbers)
my_favorite_numbers()
parent/child.py:
def child_stuff(favorite_numbers):
    print(favorite_numbers)
    favorite_numbers.append(7)
In fact, you almost always want to wrap up any code besides initialization (defining functions and classes, creating constants and other singletons, etc.) in a function anyway. When you import a module (including from … import), that only runs its top-level code the first time. If you import again, the module object already exists in memory (inside sys.modules), so Python will just use that, instead of running the code to build it again.
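The caching in sys.modules can be checked with a throwaway module whose top-level code counts how often it runs. This is a sketch: the module name counted_demo and the environment variable LOAD_COUNT are hypothetical and exist only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Top-level code of the throwaway module: bump a counter on every execution.
source = (
    "import os\n"
    "os.environ['LOAD_COUNT'] = str(int(os.environ.get('LOAD_COUNT', '0')) + 1)\n"
)
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "counted_demo.py"), "w") as fh:
    fh.write(source)
sys.path.insert(0, folder)
importlib.invalidate_caches()
os.environ.pop("LOAD_COUNT", None)  # start from a clean counter

first = importlib.import_module("counted_demo")   # runs the top-level code
second = importlib.import_module("counted_demo")  # served from sys.modules

print(first is second)           # True
print(os.environ["LOAD_COUNT"])  # 1
```

Only importlib.reload (or removing the entry from sys.modules) would run the top-level code again, which is another reason to keep work that must repeat inside functions.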
If you really want to push a value into another module's namespace, you can, but you have to do it explicitly. And this means you have to have the module object available by importing it, not just importing from it:
from . import child
child.favorite_numbers = favorite_numbers
But this is rarely a good idea.
Did you ever run setup.py or a way of "building" your library?
I would create a setup.py file and likely run it in develop mode. Python setup.py develop vs install