Load modules conditionally Python - python

I'm wrote a main python module that need load a file parser to work, initially I was a only one text parser module, but I need add more parsers for different cases.
parser_class1.py
parser_class2.py
parser_class3.py
Only one is required for every running instance, then I'm thinking load it by command line:
mmain.py -p parser_class1
With this purpose I wrote this code in order to select the parser to load when the main module will be called:
#!/usr/bin/env python
import argparse
aparser = argparse.ArgumentParser()
aparser.add_argument('-p',
action='store',
dest='module',
help='-p module to import')
results = aparser.parse_args()
if not results.module:
aparser.error('Error! no module')
try:
exec("import %s" %(results.module))
print '%s imported done!'%(results.module)
except ImportError, e:
print e
But, I was reading that this way is dangerous, maybe no stardard..
Then, is this approach ok? or I must find another way to do it?
Why?
Thanks, any comment are welcome.

You could actually just execute the import statement inside a conditional block:
if x:
import module1a as module1
else:
import module1b as module1
You can account for various whitelisted module imports in different ways using this, but effectively the idea is to pre-program the imports, and then essentially use a GOTO to make the proper imports... If you do want to just let the user import any arbitrary argument, then the __import__ function would be the way to go, rather than eval.
Update:
As #thedox mentioned in the comment, the as module1 section is the idiomatic way for loading similar APIs with different underlying code.
In the case where you intend to do completely different things with entirely different APIs, that's not the pattern to follow.
A more reasonable pattern in this case would be to include the code related to a particular import with that import statement:
if ...:
import module1
# do some stuff with module1 ...
else:
import module2
# do some stuff with module2 ...
As for security, if you allow the user to cause an import of some arbitrary code-set (e.g. their own module, perhaps?), it's not much different than using eval on user-input. It's essentially the same vulnerability: the user can get your program to execute their own code.
I don't think there's a truly safe manner to let the user import arbitrary modules, at all. The exception here is if they have no access to the file-system, and therefore cannot create new code to be imported, in which case you're basically back to the whitelist case, and may as well implement an explicit whitelist to prevent future-vulnerabilities if/when at some point in the future the user does gain file-system access.

here is how to use __import__()
allowed_modules = ['os', 're', 'your_module', 'parser_class1.py', 'parser_class2.py']
if not results.module:
aparser.error('Error! no module')
try:
if results.module in allowed_modules:
module = __import__(results.module)
print '%s imported as "module"'%(results.module)
else:
print 'hey what are you trying to do?'
except ImportError, e:
print e
module.your_function(your_data)
EVAL vs __IMPORT__()
using eval allows the user to run any code on your computer. Don't do that. __import__() only allows the user to load modules, apparently not allowing user to run arbitrary code. But it's only apparently safer.
The proposed function, without allowed_modules is still risky since it can allow to load an arbitrary model that may have some malicious code running on when loaded. Potentially the attacker can load a file somewhere (a shared folder, a ftp folder, a upload folder managed by your webserver ...) and call it using your argument.
WHITELISTS
Using allowed_modules mitigates the problem but do not solve it completely: to hardening even more you still have to check if the attacker wrote a "os.py", "re.py", "your_module.py", "parser_class1.py" into your script folder, since python first searches module there (docs).
Eventually you may compare parser_class*.py code against a list of hashes, like sha1sum does.
FINAL REMARKS: At the real end, if user has write access to your script folder you cannot ensure an absolutely safe code.

You should think of all of the possible modules you may import for that parsing function and then use a case statement or dictionary to load the correct one. For example:
import parser_class1, parser_class2, parser_class3
parser_map = {
'class1': parser_class1,
'class2': parser_class2,
'class3': parser_class3,
}
if not args.module:
#report error
parser = None
else:
parser = parser_map[args.module]
#perform work with parser
If loading any of the parser_classN modules in this example is expensive, you can define lambdas or functions that return that module (i.e. def get_class1(): import parser_class1; return parser_class1) and alter the line to be parser = parser_map[args.module]()
The exec option could be very dangerous because you're executing unvalidated user input. Imagine if your user did something like -
mmain.py -p "parser_class1; some_function_or_code_that_is_malicious()"

Related

Pass-through/export whole third party module (using __all__?)

I have a module that wraps another module to insert some shim logic in some functions. The wrapped module uses a settings module mod.settings which I want to expose, but I don't want the users to import it from there, in case I would like to shim something there as well in the future. I want them to import wrapmod.settings.
Importing the module and exporting it works, but is a bit verbose on the client side. It results in having to write settings.thing instead of just thing.
I want the users to be able to do from wrapmod.settings import * and get the same results as if they did from mod.settings import * but right now, only from wrapmod import settings is available. How to I work around this?
If I understand the situation correctly, you're writing a module wrapmod that is intended to transform parts of an existing package mod. The specific part you're transforming is the submodule mod.settings. You've imported the settings module and made your changes to it, but even though it is available as wrapmod.settings, you can't use that module name in an from ... import ... statement.
I think the best way to fix that is to insert the modified module into sys.modules under the new dotted name. This makes Python accept that name as valid even though wrapmod isn't really a package.
So wrapmod would look something like:
import sys
from mod import settings
# modify settings here
sys.modules['wrapmod.settings'] = settings # add this line!
I ended up making a code-generator for a thin wrapper module instead, since the sys.module hacking broke all IDE integration.
from ... import mod
# this is just a pass-through wrapper around mod.settings
__all__ = mod.__all__
# generate pass-through wrapper around mod.settings; doesn't break IDE integration, unlike manual sys.modules editing.
if __name__ == "__main__":
for thing in settings.__all__:
print(thing + " = mod." + thing)
which when run as a script, outputs code that can then be appended to the end of this file.

How should one write the import procedures in a module that uses imported modules in a limited way?

I have a module that features numerous functions. Some of these functions are dependent on other modules. The module is to be used in some environments that are not going to have these other modules. In these environments, I want the functionality of the module that is not dependent on these other unavailable modules to be usable. What is a reasonable way of coding this?
I am happy for an exception to be raised when a function's dependency module is not met, but I want the module to be usable for other functions that do not have dependency problems.
At present, I'm thinking of a module-level function something like the following:
tryImport(moduleName = None):
try:
module = __import__(moduleName)
return(module)
except:
raise(Exception("module {moduleName} import error".format(
moduleName = moduleName)))
sys.exit()
This would then be used within functions in a way such as the following:
def function1():
pyfiglet = tryImport(moduleName = "pyfiglet")
For your use case, it sounds like there's nothing wrong with putting the imports you need inside functions, rather than at the top of your module (don't worry, it costs virtually nothing to reimport):
def some_function():
import foo # Throws if there is no foo
return foo.bar ()
Full stop, please.
The first thing is that the problem you described indicates that you should redesign your module. The most obvious solution is to break it into a couple of modules in one package. Each of them would contain group of functions with common external dependencies. You can easily define how many groups you need if you only know which dependencies might be missing on the target machine. Then you can import them separately. Basically in a given environment you import only those you need and the problem doesn't exist anymore.
If you still believe it's not the case and insist on keeping all the functions in one module then you can always do something like:
if env_type == 'env1':
import pyfiglet
if env_type in ('env1', 'env2'):
import pynose
import gzio
How you deduct the env_type is up to you. Might come from some configuration file or the environment variable.
Then you have your functions in this module. No problem occurs if none of the module consumers calls function which makes use of the module which is unavailable in a given environment.
I don't see any point in throwing your custom exception. NameError exception will be thrown anyway upon trial to access not imported name. By the way your sys.exit() would never be executed anyway.
If you don't want to define environment types, you can still achieve the goal with the following code:
try: import pyfiglet
except ImportError: pass
try: import pynose
except ImportError: pass
try: import gzio
except ImportError: pass
Both code snippets are supposed to be used on module level and not inside functions.
TL;DR I would just break this module into several parts. But if you really have to keep it monolithic, just use the basic language features and don't over-engineer it by using __import__ and a dedicated function.

python module import syntax

I'm teaching myself Python (I have experience in other languages).
I found a way to import a "module". In PHP, this would just be named an include file. But I guess Python names it a module. I'm looking for a simple, best-practices approach. I can get fancy later. But right now, I'm trying to keep it simple while not developing bad habits. Here is what I did:
I created a blank file named __init__.py, which I stored in Documents (the folder on the Mac)
I created a file named myModuleFile.py, which I stored in Documents
In myModuleFile.py, I created a function:
def myFunction()
print("hello world")
I created another file: myMainFile.py, which I stored in Documents
In this file, I typed the following:
import myModuleFile.py
myModuleFile.myFunction()
This successfully printed out "hello world" to the console when I ran it on the terminal.
Is this a best-practices way to do this for my simple current workflow?
I'm not sure the dot notation means I'm onto something good or something bad. It throws an error if I try to use myFunction() instead of myModuleFile.myFunction(). I kind of think it would be good. If there were a second imported module, it would know to call myFunction() from myModuleFile rather than the other one. So the dot notation makes everybody know exactly which file you are trying to call the function from.
I think there is some advanced stuff using sys or some sort of exotic configuration stuff. But I'm hoping my simple little way of doing things is ok for now.
Thanks for any clarification on this.
For your import you don't need the ".py" extension
You can use:
import myModuleFile
myModuleFile.myFunction()
Or
from myModuleFile import myFunction
myFunction()
Last syntax is common if you import several functions or globals of your module.
Besides to use the "main" function, I'd put this on your module:
from myModuleFile import myFunction
if __name__ == '__main__':
myFunction()
Otherwise the main code could be executed in imports or other cases.
I'd use just one module for myModuleFile.py and myMainFile.py, using the previous pattern let you know if your module is called from command line or as import.
Lastly, I'd change the name of your files to avoid the CamelCase, that is, I'd replace myModuleFile.py by my_module.py. Python loves the lowercase ;-)
You only need to have init.py if you are creating a package (a package in a simple sense is a subdirectory which has one or more modules in it, but I think it may be more complex than you need right now).
If you have just one folder which has MyModule.py and MyMainFile.py - you don't need the init.py.
In MyMainFile.py you can write :
import myModuleFile
and then use
myModuleFile.MyFunction()
The reason for including the module name is that you may reuse the same function name in more than one module and you need a way of saying which module your program is using.
Module Aliases
If you want to you can do this :
import myModuleFile as MyM
and then use
MyM.MyFunction()
Here you have created MyM as an alias for myModuleFile, and created less typing.
Here Lies Dragons
You will sometimes see one other forms of IMport, which can be dangerous, especially for the beginner.
from myModuleFile import MyFunction
if you do this you can use :
MyFunction()
but this has a problem if you have used the same function name in MyMainFile, or in any other library you have used, as you now can't get to any other definition of the name MyFunction. This is often termed Contaminating the namespace - and should really be avoided unless you are absolutely certain it is safe.
there is a final form which I will show for completeness :
from myModuleFile import *
While you will now be able to access every function defined in myModuleFile without using myModuleFile in front of it, you have also now prevented your MyMainFile from using any function in any library which matches any name defined in myModuleFile.
Using this form is generally not considered to be a good idea.
I hope this helps.

How do I override a Python import?

I'm working on pypreprocessor which is a preprocessor that takes c-style directives and I've been able to make it work like a traditional preprocessor (it's self-consuming and executes postprocessed code on-the-fly) except that it breaks library imports.
The problem is: The preprocessor runs through the file, processes it, outputs to a temporary file, and exec() the temporary file. Libraries that are imported need to be handled a little different, because they aren't executed, but rather they are loaded and made accessible to the caller module.
What I need to be able to do is: Interrupt the import (since the preprocessor is being run in the middle of the import), load the postprocessed code as a tempModule, and replace the original import with the tempModule to trick the calling script with the import into believing that the tempModule is the original module.
I have searched everywhere and so far and have no solution.
This Stack Overflow question is the closest I've seen so far to providing an answer:
Override namespace in Python
Here's what I have.
# Remove the bytecode file created by the first import
os.remove(moduleName + '.pyc')
# Remove the first import
del sys.modules[moduleName]
# Import the postprocessed module
tmpModule = __import__(tmpModuleName)
# Set first module's reference to point to the preprocessed module
sys.modules[moduleName] = tmpModule
moduleName is the name of the original module, and tmpModuleName is the name of the postprocessed code file.
The strange part is this solution still runs completely normal as if the first module completed loaded normally; unless you remove the last line, then you get a module not found error.
Hopefully someone on Stack Overflow know a lot more about imports than I do, because this one has me stumped.
Note: I will only award a solution, or, if this is not possible in Python; the best, most detailed explanation of why this is not impossible.
Update: For anybody who is interested, here is the working code.
if imp.lock_held() is True:
del sys.modules[moduleName]
sys.modules[tmpModuleName] = __import__(tmpModuleName)
sys.modules[moduleName] = __import__(tmpModuleName)
The 'imp.lock_held' part detects whether the module is being loaded as a library. The following lines do the rest.
Does this answer your question? The second import does the trick.
Mod_1.py
def test_function():
print "Test Function -- Mod 1"
Mod_2.py
def test_function():
print "Test Function -- Mod 2"
Test.py
#!/usr/bin/python
import sys
import Mod_1
Mod_1.test_function()
del sys.modules['Mod_1']
sys.modules['Mod_1'] = __import__('Mod_2')
import Mod_1
Mod_1.test_function()
To define a different import behavior or to totally subvert the import process you will need to write import hooks. See PEP 302.
For example,
import sys
class MyImporter(object):
def find_module(self, module_name, package_path):
# Return a loader
return self
def load_module(self, module_name):
# Return a module
return self
sys.meta_path.append(MyImporter())
import now_you_can_import_any_name
print now_you_can_import_any_name
It outputs:
<__main__.MyImporter object at 0x009F85F0>
So basically it returns a new module (which can be any object), in this case itself. You may use it to alter the import behavior by returning processe_xxx on import of xxx.
IMO: Python doesn't need a preprocessor. Whatever you are accomplishing can be accomplished in Python itself due to it very dynamic nature, for example, taking the case of the debug example, what is wrong with having at top of file
debug = 1
and later
if debug:
print "wow"
?
In Python 2 there is the imputil module that seems to provide the functionality you are looking for, but has been removed in python 3. It's not very well documented but contains an example section that shows how you can replace the standard import functions.
For Python 3 there is the importlib module (introduced in Python 3.1) that contains functions and classes to modify the import functionality in all kinds of ways. It should be suitable to hook your preprocessor into the import system.

Is this correct way to import python scripts residing in arbitrary folders?

This snippet is from an earlier answer here on SO. It is about a year old (and the answer was not accepted). I am new to Python and I am finding the system path a real pain. I have a few functions written in scripts in different directories, and I would like to be able to import them into new projects without having to jump through hoops.
This is the snippet:
def import_path(fullpath):
""" Import a file with full path specification. Allows one to
import from anywhere, something __import__ does not do.
"""
path, filename = os.path.split(fullpath)
filename, ext = os.path.splitext(filename)
sys.path.append(path)
module = __import__(filename)
reload(module) # Might be out of date
del sys.path[-1]
return module
Its from here:
How to do relative imports in Python?
I would like some feedback as to whether I can use it or not - and if there are any undesirable side effects that may not be obvious to a newbie.
I intend to use it something like this:
import_path(/home/pydev/path1/script1.py)
script1.func1()
etc
Is it 'safe' to use the function in the way I intend to?
The "official" and fully safe approach is the imp module of the standard Python library.
Use imp.find_module to find the module on your precisely-specified list of acceptable directories -- it returns a 3-tuple (file, pathname, description) -- if unsuccessful, file is actually None (but it can also raise ImportError so you should use a try/except for that as well as checking if file is None:).
If the search is successful, call imp.load_module (in a try/finally to make sure you close the file!) with the above three arguments after the first one which must be the same name you passed to find_module -- it returns the module object (phew;-).
As mentioned, please consider thread safety, if appropriate. I prefer something closer to a solution posted in a similar post. The main differences below: the use of insert to specify priority of the import, correct restoration of sys.path using try...finally, and setting the global namespace.
# inspired by Alex Martelli's solution to
# http://stackoverflow.com/questions/1096216/override-namespace-in-python/1096247#1096247
def import_from_absolute_path(fullpath, global_name=None):
"""Dynamic script import using full path."""
import os
import sys
script_dir, filename = os.path.split(fullpath)
script, ext = os.path.splitext(filename)
sys.path.insert(0, script_dir)
try:
module = __import__(script)
if global_name is None:
global_name = script
globals()[global_name] = module
sys.modules[global_name] = module
finally:
del sys.path[0]
It does feel like a bit of a hack, but at the moment, I can't think of any unintended side effects that are likely to occur, at least not as long as you're just using this for your own scripts. Basically what it does is temporarily add the parent directory of the specified file (in your example, /home/pydev/path1/) to the list of paths that Python checks when it's looking for a module to import.
The only risk I can think of right now would arise in a multithreaded environment, where two or more threads (or processes) are running this function simultaneously. If thread A wants to import module A from path dirA/A.py, and thread B wants to import module B from path dirB/B.py, you'd wind up with both dirA and dirB in sys.path for a short time. And if there is a file named B.py in dirA, it's possible that thread B will find that (dirA/B.py) instead of the file it's looking for (dirB/B.py), thus importing the wrong module. For this reason, I wouldn't use it in production code, or code that you're going to distribute to other people (at least not without warning them that this hack is in here!). In a situation like that, you could write a more complex function that allows you to specify the file to import without messing with the standard set of paths. (That's what mod_python does, for example)
I would be worried that your script name might correspond with a module that shows up earlier in the path. To dispel this fear, I would fully replace the path with a new list containing just the directory containing the module, then put it back once the import has completed. Also, you should wrap this in some sort of lock so that multiple threads trying to do the same thing don't interfere with each other.

Categories