Validating Arbitrary Python Code

I have an application that will take in a string and later run it as arbitrary python code. I wish to validate this string before I attempt to run it and evaluate it for a few things:
1. It is syntactically correct (this can be done via the compile(stringCode, "foo.py", "exec") builtin)
2. All imports are available locally
3. Whether a class in the arbitrary code string inherits from a specific class
4. Whether the class from #3 also implements a specifically named method (so I can later call foo.bar() on the arbitrary code without too much hassle)
I've looked around at code objects, but they don't seem to be able to do anything unless I run the code directly, whereas I would rather validate that it works beforehand.

You can use ast.parse to create a syntax tree of your string. Then you can iterate over the tree and validate whatever parse-time qualities you like.
As internet_user says, this will not tell you about the run-time qualities of your code; if modules are imported through a mechanism other than the usual import statement, those won't be validated. If your classes are dynamically changed to add or remove methods, you won't know that just from looking at the defs in their class definition.
Provided that you're not worried about any of that, here's a sample implementation:
import ast
import sys
import imp

s = """
import math, gzip
from os import system
import numpy
import obviouslyFakeModuleName

class A(int):
    def troz(self):
        return 23

class B(str):
    def zort(self):
        return 42
"""

def can_be_imported(name):
    try:
        imp.find_module(name)
        return True
    except ImportError:
        return False

def iter_nodes_by_type(code, type_or_types):
    for node in ast.walk(code):
        if isinstance(node, type_or_types):
            yield node

def iter_imported_module_names(code):
    for node in iter_nodes_by_type(code, ast.Import):
        for alias in node.names:
            yield alias.name
    for node in iter_nodes_by_type(code, ast.ImportFrom):
        yield node.module

def iter_globally_defined_classes(code):
    for child in ast.iter_child_nodes(code):
        if isinstance(child, ast.ClassDef):
            yield child

def iter_methods(class_):
    for node in ast.iter_child_nodes(class_):
        if isinstance(node, ast.FunctionDef):
            yield node

try:
    code = ast.parse(s)
except SyntaxError:
    print("That string is not valid Python.")
    sys.exit(0)

# inspection of imports
for name in iter_imported_module_names(code):
    if can_be_imported(name):
        print("module {} is available for import.".format(name))
    else:
        print("module {} is not available for import.".format(name))

# inspection of classes
for class_ in iter_globally_defined_classes(code):
    class_name = class_.name
    base_class_names = [name.id for name in class_.bases]
    function_names = [func.name for func in iter_methods(class_)]
    print("Inspecting class {}...".format(class_name))
    # we want to know if this class inherits directly from int
    if "int" in base_class_names:
        print("    Does inherit from int.")
    else:
        print("    Does not inherit from int.")
    # and does it implement zort()?
    if "zort" in function_names:
        print("    Implements `zort`.")
    else:
        print("    Does not implement `zort`.")
Result:

module math is available for import.
module gzip is available for import.
module numpy is not available for import.
module obviouslyFakeModuleName is not available for import.
module os is available for import.
Inspecting class A...
    Does inherit from int.
    Does not implement `zort`.
Inspecting class B...
    Does not inherit from int.
    Implements `zort`.
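Note that the imp module used above has since been deprecated (and was removed in Python 3.12). On modern Python the import-availability check can be written with importlib.util.find_spec instead; a minimal sketch:

```python
import importlib.util

def can_be_imported(name):
    """Return True if `name` resolves to an importable module."""
    try:
        # find_spec locates the module without executing it
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False

print(can_be_imported("math"))                     # True
print(can_be_imported("obviouslyFakeModuleName"))  # False
```

Like imp.find_module, this only checks that the module can be located; it does not run any of its code.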

Related

Best practices for importing rarely used package in Python

My Python package depends on an external library for a few of its functions. This is a non-Python package and can be difficult to install, so I'd like users to still be able to use my package but have it fail when using any functions that depend on this non-Python package.
What is the standard practice for this? I could only import the non-Python package inside the methods that use it, but I really hate doing this
My current setup:
myInterface.py
myPackage/
--classA.py
--classB.py
The interfaces script myInterface.py imports classA and classB and classB imports the non-Python package. If the import fails I print a warning. If myMethod is called and the package isn't installed there will be some error downstream but I do not catch it anywhere, nor do I warn the user.
classB is imported every time the interface script is called so I can't have anything fail there, which is why I included the pass. Like I said above, I could import inside the method and have it fail there, but I really like keeping all of my imports in one place.
From classB.py
try:
    import someWeirdPackage
except ImportError:
    print("Cannot import someWeirdPackage")
    pass

class ClassB():
    ...
    def myMethod():
        swp = someWeirdPackage()
        ...
If you are only importing one external library, I would go for something along these lines:
try:
    import weirdModule
    available = True
except ImportError:
    available = False

def func_requiring_weirdmodule():
    if not available:
        raise ImportError('weirdModule not available')
    ...
The conditional and error checking is only needed if you want to give more descriptive errors. If not, you can omit it and let python throw the corresponding error when you try to call the non-imported module, as you do in your current setup.
If multiple functions do use weirdModule, you can wrap the checking into a function:
def require_weird_module():
    if not available:
        raise ImportError('weirdModule not available')

def f1():
    require_weird_module()
    ...

def f2():
    require_weird_module()
    ...
On the other hand, if you have multiple libraries to be imported by different functions, you can load them dynamically. Although it doesn't look pretty, python caches modules after the first import, so there is nothing wrong with it. I would use importlib:
import importlib

def func_requiring_weirdmodule():
    weirdModule = importlib.import_module('weirdModule')
Again, if multiple of your functions import complicated external modules you can wrap them into:
def import_external(name):
    return importlib.import_module(name)

def f1():
    weird1 = import_external('weirdModule1')

def f2():
    weird2 = import_external('weirdModule2')
And last, you could create a handler to prevent importing the same module twice, something along the lines of:
class Importer(object):
    __loaded__ = {}

    @staticmethod
    def import_external(name):
        if name in Importer.__loaded__:
            return Importer.__loaded__[name]
        mod = importlib.import_module(name)
        Importer.__loaded__[name] = mod
        return mod

def f1():
    weird = Importer.import_external('weird1')

def f2():
    weird = Importer.import_external('weird1')
Although I'm pretty sure that importlib does caching behind the scenes, so you don't really need manual caching.
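Indeed, importlib.import_module consults sys.modules before loading anything, so repeated dynamic imports are cheap. A quick check, using the stdlib json module as a stand-in for weirdModule:

```python
import importlib
import sys

first = importlib.import_module("json")
second = importlib.import_module("json")

# The second call is a sys.modules cache hit: both names are
# bound to the very same module object.
print(first is second)               # True
print(sys.modules["json"] is first)  # True
```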
In short, although it does look ugly, there is nothing wrong with importing modules dynamically in python. In fact, a lot of libraries rely on this. On the other hand, if it is just a special case of 3 methods accessing 1 external function, use your approach, or my first one if you want to add custom exception handling.
I'm not really sure that there's any best practice in this situation, but I would redefine the function if it's not supported:
def warn_import():
    print("Cannot import someWeirdPackage")

try:
    import someWeirdPackage
    external_func = someWeirdPackage
except ImportError:
    external_func = warn_import

class ClassB():
    def myMethod(self):
        swp = external_func()

b = ClassB()
b.myMethod()
You can create two separate classes for the two cases. The first will be used when the package exists; the second will be used when the package does not exist.
class ClassB1():
    def myMethod(self):
        print("someWeirdPackage exists")
        # do something

class ClassB2(ClassB1):
    def myMethod(self):
        print("someWeirdPackage does not exist")
        # do something or raise Exception

try:
    import someWeirdPackage

    class ClassB(ClassB1):
        pass
except ImportError:
    class ClassB(ClassB2):
        pass
You can also use the approach given below to overcome the problem you're facing.
class UnAvailableName(object):
    def __init__(self, name):
        self.target = name

    def __getattr__(self, attr):
        raise ImportError("{} is not available.".format(attr))

try:
    import someWeirdPackage
except ImportError:
    print("Cannot import someWeirdPackage")
    someWeirdPackage = UnAvailableName("someWeirdPackage")

class ClassB():
    def myMethod(self):
        swp = someWeirdPackage.hello()

a = ClassB()
a.myMethod()
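For what it's worth, the same placeholder idea in Python 3 syntax (someWeirdPackage is the question's hypothetical dependency; the MissingModule name is my own):

```python
class MissingModule:
    """Stands in for an uninstalled module; fails loudly on first use."""
    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails,
        # i.e. for anything the placeholder doesn't define.
        raise ImportError(
            "{}.{} is unavailable: {} is not installed".format(
                self._name, attr, self._name))

try:
    import someWeirdPackage  # hypothetical optional dependency
except ImportError:
    someWeirdPackage = MissingModule("someWeirdPackage")
```

Attribute access such as someWeirdPackage.hello() then raises an ImportError naming the missing package, rather than an unrelated NameError somewhere downstream.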

Plone/Zope/ZODB: how to override storage method on a File field

I have a File Field in my Plone product that I want to allow the user to "Turn Off" blob storage. The file will be stored elsewhere. I can't seem to do it.
Below is my attempt. I can't get Products.Archetypes.Field.ObjectField.setStorage() to recognize that this 'noStorage' instance "provides" IStorage.
Much inconsequential code has been removed for brevity, but the complete source can be found at https://github.com/flipmcf/rfa.kaltura
The Archetype schema includes "ATBlob.schema" to pull in the "File" field, then:
class KalturaVideo(ATBlob, KalturaBase.KalturaContentMixin):
    nostorage = NoStorage()
    directlyProvides(nostorage, INoStorage)

    def __init__(self, oid, **kwargs):
        if settings.storageMethod == u"No Local Storage":
            import pdb; pdb.set_trace()
            # (Pdb) IStorage.providedBy(self.nostorage)
            # True
            self.getField('file').setStorage(self, self.nostorage)
My Storage class and interface is really boring:
from zope.interface import implements
from ZODB.interfaces import IStorage
from zope.component.zcml import interface
from zope.interface import Interface

class INoStorage(IStorage):
    pass

class NoStorage(object):
    """Completely skip storage on Plone."""
    implements(INoStorage)

    def __init__(self):
        pass

    def close(self):
        pass

    def getName(self):
        return "NoStorage - Blackhole"

    # etc... lots of implemented methods that do nothing.
configure.zcml in the 'storage' package also:
<adapter
    factory=".storage.NoStorage"
    provides=".storage.INoStorage"
    for="Products.Archetypes.interfaces.field.IObjectField"
    />
<adapter
    factory=".storage.NoStorage"
    provides=".storage.IStorage"
    for="Products.Archetypes.interfaces.field.IObjectField"
    />
Now, within the setStorage() method of Products.Archetypes.Field.ObjectField:
def setStorage(self, instance, storage):
    import pdb; pdb.set_trace()
    # (Pdb) IStorage.providedBy(storage)
    # False
    # HeadDesk
    if not IStorage.providedBy(storage):
        raise ObjectFieldException, "Not a valid Storage method"
And when I debug, IStorage.providedBy(storage) returns False
Why would it return False in setStorage and True in the calling code? Am I not registering the interface correctly?
Note that in the module Products.Archetypes.Field, the IStorage interface used there is actually imported like this:
from Products.Archetypes.interfaces.storage import IStorage
Comparing the interface resolution orders (via __iro__) we get:

>>> from pprint import pprint as pp
>>> pp(ZODB.interfaces.IStorage.__iro__)
(<InterfaceClass ZODB.interfaces.IStorage>,
 <InterfaceClass zope.interface.Interface>)
>>> pp(Products.Archetypes.interfaces.storage.IStorage.__iro__)
(<InterfaceClass Products.Archetypes.interfaces.storage.IStorage>,
 <InterfaceClass zope.interface.Interface>)
As INoStorage was subclassed from ZODB.interfaces.IStorage, and that interface class isn't a parent of Products.Archetypes.interfaces.storage.IStorage (which is what setStorage calls providedBy with), the NoStorage class as defined will not satisfy that check. To solve this, just have INoStorage inherit from the Archetypes version of IStorage and implement all its methods, and it should work as intended.
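The underlying pitfall here, two distinct interface classes that merely share a name, can be illustrated without Zope at all, using plain classes as hypothetical stand-ins:

```python
# Two unrelated classes that merely share the name "IStorage":
class IStorageZODB(object):        # stand-in for ZODB.interfaces.IStorage
    pass

class IStorageArchetypes(object):  # stand-in for Products.Archetypes' IStorage
    pass

class NoStorage(IStorageZODB):     # subclasses the "wrong" one
    pass

print(issubclass(NoStorage, IStorageZODB))        # True
print(issubclass(NoStorage, IStorageArchetypes))  # False, despite the similar name
```

Interface checks, like issubclass, compare object identity and ancestry, never names, which is why providedBy answered True in one frame and False in the other.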
That said, you could simplify your code somewhat further with regard to the way you provide the interfaces; see this example:
>>> class INoStorage(Products.Archetypes.interfaces.storage.IStorage):
...     pass
...
>>> class NoStorage(object):
...     zope.interface.implements(INoStorage)
...
>>> nostorage = NoStorage()
>>> Products.Archetypes.interfaces.storage.IStorage.providedBy(nostorage)
True
Inheritance between Interface subclasses persists correctly without extra directlyProvides definitions, so you can just drop that extra call inside the KalturaVideo class definition. Naturally, you can just do from ... import IStorage and simplify that to class INoStorage(IStorage). The example was written this way to make things more explicitly visible.

Python imported module is None

I have a module that imports fine (I print it at the top of the module that uses it):
from authorize import cim
print cim
Which produces:
<module 'authorize.cim' from '.../dist-packages/authorize/cim.pyc'>
However later in a method call, it has mysteriously turned to None
class MyClass(object):
    def download(self):
        print cim
which, when run, shows that cim is None. The module isn't ever directly assigned to None anywhere in this module.
Any ideas how this can happen?
As you comment yourself, it is likely that some code is assigning None to the "cim" name in your module. One way to check for this would be to make your module "read only" for other modules -- I think Python allows for this --
(20 min. hacking ) --
Here -- just put this snippet in a "protect_module.py" file, import it, and call ProtectedModule() at the end of the module in which the name "cim" is vanishing - it should give you the culprit:
"""
Protects a Module against naive monkey patching -
may be usefull for debugging large projects where global
variables change without notice.
Just call the "ProtectedModule" class, with no parameters from the end of
the module definition you want to protect, and subsequent assignments to it
should fail.
"""
from types import ModuleType
from inspect import currentframe, getmodule
import sys
class ProtectedModule(ModuleType):
def __init__(self, module=None):
if module is None:
module = getmodule(currentframe(1))
ModuleType.__init__(self, module.__name__, module.__doc__)
self.__dict__.update(module.__dict__)
sys.modules[self.__name__] = self
def __setattr__(self, attr, value):
frame = currentframe(1)
raise ValueError("Attempt to monkey patch module %s from %s, line %d" %
(self.__name__, frame.f_code.co_filename, frame.f_lineno))
if __name__ == "__main__":
from xml.etree import ElementTree as ET
ET = ProtectedModule(ET)
print dir(ET)
ET.bla = 10
print ET.bla
In my case, this was related to threading quirks: https://docs.python.org/2/library/threading.html#importing-in-threaded-code
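For reference, the failure mode described above (another piece of code assigning None to a module global) can be reproduced in a few lines; `victim` and `cim` here are made-up names standing in for the real module:

```python
import sys
import types

# Build a throwaway module with a global named `cim`.
victim = types.ModuleType("victim")
victim.cim = "the real cim module"
sys.modules["victim"] = victim

# Any code that can reach the module can silently rebind its globals:
sys.modules["victim"].cim = None

# Every function in `victim` that reads the global now sees None.
print(victim.cim)  # None
```

This is exactly what the ProtectedModule snippet above is designed to catch: the offending assignment raises instead of silently succeeding.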

Check if classes in modules implement the right interface

I have the following interface:

import abc

class Interface(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def run(self):
        """Run the process."""
        return
I have a collections of modules that are all in the same directory. Each module contains a single class that implements my interface.
For example Launch.py :
class Launch(Interface):
    def run(self):
        pass
Let's say I have 20 modules, that implements 20 classes. I would like to be able to launch a module that would check if some of the classes do not implement the Interface.
I know I have to use:
issubclass(Launch, ProcessInterface) to know if a certain class implements my process interface.
introspection to get the class that is in my module.
import modules at runtime
I am just not sure how to do that.
I can manage to use issubclass inside a module.
But I cannot use issubclass if I am outside the module.
I need to:

get the list of all modules in the directory
get the class in each module
do issubclass on each class

I would need a draft of a function that could do that.
You're probably looking for something like this:
from os import listdir
from sys import path

modpath = "/path/to/modules"

for modname in listdir(modpath):
    if modname.endswith(".py"):
        # look only in the modpath directory when importing
        oldpath, path[:] = path[:], [modpath]
        try:
            module = __import__(modname[:-3])
        except ImportError:
            print "Couldn't import", modname
            continue
        finally:  # always restore the real path
            path[:] = oldpath
        for attr in dir(module):
            cls = getattr(module, attr)
            if isinstance(cls, type) and not issubclass(cls, ProcessInterface):
                # do whatever
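On Python 3, where the sys.path juggling around __import__ is rarely needed, the same check can be sketched with importlib.util and inspect. The Interface base and the on-disk module below are illustrative stand-ins for the question's setup:

```python
import abc
import importlib.util
import inspect
import os
import tempfile

class Interface(abc.ABC):
    @abc.abstractmethod
    def run(self):
        """Run the process."""

def classes_in_file(path, namespace):
    """Import the module at `path` and yield the classes it defines."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Make Interface (and anything else in `namespace`) visible to the module.
    module.__dict__.update(namespace)
    spec.loader.exec_module(module)
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if obj.__module__ == name:  # skip classes imported from elsewhere
            yield obj

# Demo: a module defining one conforming class and one stray class.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "launch.py")
    with open(path, "w") as f:
        f.write("class Launch(Interface):\n"
                "    def run(self):\n"
                "        pass\n"
                "class Stray:\n"
                "    pass\n")
    results = {cls.__name__: issubclass(cls, Interface)
               for cls in classes_in_file(path, {"Interface": Interface})}

print(results)  # {'Launch': True, 'Stray': False}
```

Iterating the directory with os.listdir and feeding each .py file to classes_in_file then flags every class that does not implement the interface.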

Python package/module lazily loading submodules

Interesting use case today: I need to migrate a module in our codebase following code changes. The old mynamespace.Document will disappear, and I want to ensure smooth migration by replacing this package with a code object that will dynamically import the correct path and migrate the corresponding objects.
In short:
# instantiate a dynamic package, but do not load
# submodules statically
mynamespace.Document = SomeObject()
assert 'submodule' not in mynamespace.Document.__dict__

# and later on, when importing it, the submodule
# is built if not already available in __dict__
from mynamespace.Document.submodule import klass
c = klass()
A few things to note:
I am not talking only of migrating code. A simple huge sed would in a sense be enough to change the code in order to migrate some imports, and I would not need a dynamic module. I am talking of objects. A website, holding some live/stored objects will need migration. Those objects will be loaded assuming that mynamespace.Document.submodule.klass exists, and that's the reason for the dynamic module. I need to provide the site with something to load.
We cannot, or do not want to, change the way objects are unpickled/loaded. For simplicity, let's just say that we want the idiom from mynamespace.Document.submodule import klass to keep working. I cannot use instead from mynamespace import Document as container; klass = getattr(getattr(container, 'submodule'), 'klass')
What I tried:
import sys
from types import ModuleType

class VerboseModule(ModuleType):
    def __init__(self, name, doc=None):
        super(VerboseModule, self).__init__(name, doc)
        sys.modules[name] = self

    def __repr__(self):
        return "<%s %s>" % (self.__class__.__name__, self.__name__)

    def __getattribute__(self, name):
        if name not in ('__name__', '__repr__', '__class__'):
            print "fetching attribute %s for %s" % (name, self)
        return super(VerboseModule, self).__getattribute__(name)

class DynamicModule(VerboseModule):
    """
    This module generates a dummy class when asked for a component
    """
    def __getattr__(self, name):
        class Dummy(object):
            pass
        Dummy.__name__ = name
        Dummy.__module__ = self
        setattr(self, name, Dummy)
        return Dummy

class DynamicPackage(VerboseModule):
    """
    This package should generate dummy modules
    """
    def __getattr__(self, name):
        mod = DynamicModule("%s.%s" % (self.__name__, name))
        setattr(self, name, mod)
        return mod
DynamicModule("foobar")

# (the import prints:)
# fetching attribute __path__ for <DynamicModule foobar>
# fetching attribute DynamicModuleWorks for <DynamicModule foobar>
# fetching attribute DynamicModuleWorks for <DynamicModule foobar>
from foobar import DynamicModuleWorks
print DynamicModuleWorks

DynamicPackage('document')

# fetching attribute __path__ for <DynamicPackage document>
from document.submodule import ButDynamicPackageDoesNotWork
# Traceback (most recent call last):
#   File "dynamicmodule.py", line 40, in <module>
#     from document.submodule import ButDynamicPackageDoesNotWork
# ImportError: No module named submodule
As you can see, the DynamicPackage does not work. I do not understand what is happening, because document is never even asked for a ButDynamicPackageDoesNotWork attribute.
Can anyone clarify what is happening, and if/how I can fix this?
The problem is that python will bypass the entry for document in sys.modules and look for the file for submodule directly. Of course this doesn't exist.
demonstration:
>>> import multiprocessing
>>> multiprocessing.heap = None
>>> import multiprocessing.heap
>>> multiprocessing.heap
<module 'multiprocessing.heap' from '/usr/lib/python2.6/multiprocessing/heap.pyc'>
We would expect that heap is still None because python can just pull it out of sys.modules, but that doesn't happen. The dotted notation essentially maps directly to {something on python path}/document/submodule.py and an attempt is made to load that directly.
Update
The trick is to override Python's import system. The following code requires your DynamicModule class.
import sys

class DynamicImporter(object):
    """This class works as both a finder and a loader."""
    def __init__(self, lazy_packages):
        self.packages = lazy_packages

    def load_module(self, fullname):
        """This makes the class a loader. It is given the name of a module and
        is expected to return the module object."""
        print "loading {0}".format(fullname)
        components = fullname.split('.')
        components = ['.'.join(components[:i+1])
                      for i in range(len(components))]
        for component in components:
            if component not in sys.modules:
                DynamicModule(component)
                print "{0} created".format(component)
        return sys.modules[fullname]

    def find_module(self, fullname, path=None):
        """This makes the class a finder. It is given the name of a module as
        well as the package that contains it (if applicable). It is expected to
        return a loader for that module if it knows of one, or None, in which
        case other methods will be tried."""
        if fullname.split('.')[0] in self.packages:
            print "found {0}".format(fullname)
            return self
        else:
            return None

# This is a list of finder objects, which is empty by default.
# It is tried before anything else when a request to import a module is encountered.
sys.meta_path = [DynamicImporter(['foo'])]

from foo.bar import ThisShouldWork
from foo.bar import ThisShouldWork
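For completeness, the find_module/load_module pair used above is the legacy (PEP 302) protocol, which is deprecated on Python 3. A modern counterpart uses find_spec/create_module/exec_module; the LazyDummyFinder class and the mynamespace package name below are illustrative, and the dummy-class behavior is baked into the loader rather than requiring a separate DynamicModule class:

```python
import importlib.abc
import importlib.machinery
import sys
import types

class LazyDummyFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Synthesizes empty package modules for the configured namespaces."""

    def __init__(self, packages):
        self.packages = set(packages)

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split(".")[0] in self.packages:
            # is_package=True so the import system keeps descending into
            # dotted children such as mynamespace.Document.submodule.
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        return None

    def create_module(self, spec):
        module = types.ModuleType(spec.name)
        # PEP 562 module-level __getattr__: any unknown attribute lookup
        # on the module produces a fresh dummy class of that name.
        module.__getattr__ = lambda name: type(name, (), {"__module__": spec.name})
        return module

    def exec_module(self, module):
        pass  # nothing to execute; the module is synthesized on the fly

sys.meta_path.insert(0, LazyDummyFinder(["mynamespace"]))

from mynamespace.Document.submodule import klass
print(klass.__name__)  # klass
```

Because the finder answers for every dotted name under mynamespace, the whole chain of parent packages is synthesized on demand, which is exactly the step the original DynamicPackage could not intercept.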
