Can I handle imports in an Abstract Syntax Tree? - python

I want to parse and check config.py for admissible nodes.
config.py can import other config files, which also must be checked.
Does the ast module provide any functionality to resolve ast.Import and ast.ImportFrom nodes into the ast.Module objects of the files they import?
Here is a code example, I am checking a configuration file (path_to_config), but I want to also check any files that it imports:
import ast

with open(path_to_config) as config_file:
    ast_tree = ast.parse(config_file.read())

for script_object in ast_tree.body:
    if isinstance(script_object, ast.Import):
        pass  # Imported file must be checked too
    elif isinstance(script_object, ast.ImportFrom):
        pass  # Imported file must be checked too
    elif not _is_admissible_node(script_object):
        raise Exception("Config file '%s' contains unacceptable statements" % path_to_config)

This is a little more complex than you think. from foo import name is a valid way of importing both an object defined in the foo module, and the foo.name module, so you may have to try both forms to see if they resolve to a file. Python also allows for aliases, where code can import foo.bar, but the actual module is really defined as foo._bar_implementation and made available as an attribute of the foo package. You can't detect all of these cases purely by looking at Import and ImportFrom nodes.
If you ignore those cases and only look at the from name, then you'll still have to turn the module name into a filename, then parse the source from the file, for each import.
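For example, both of these statements are legal, which is why the import form alone doesn't tell you whether a separate file is involved (a minimal illustration using the standard library):

from os import path        # imports the os.path module, exposed as an attribute of os
from os.path import join   # imports a function defined in the os.path module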
In Python 2 you can use imp.find_module to get an open file object for the module (*). You want to keep the full module name around when parsing each module, because you'll need it to resolve package-relative imports later on. imp.find_module() can't handle package imports, so I created a wrapper function:
import imp

_package_paths = {}

def find_module(module):
    # imp.find_module can't handle package paths, so we need to do this ourselves.
    # Returns an open file object, the filename, and a flag indicating if this
    # is a package directory with an __init__.py file.
    path = None
    if '.' in module:
        # resolve the package path first
        parts = module.split('.')
        module = parts.pop()
        for i, part in enumerate(parts, 1):
            name = '.'.join(parts[:i])
            if name in _package_paths:
                path = [_package_paths[name]]
            else:
                _, filename, (_, _, type_) = imp.find_module(part, path)
                if type_ is not imp.PKG_DIRECTORY:
                    # no Python source code for this package, abort search
                    return None, None, False
                _package_paths[name] = filename
                path = [filename]
    source, filename, (_, _, type_) = imp.find_module(module, path)
    is_package = False
    if type_ is imp.PKG_DIRECTORY:
        # load the __init__ file in the package
        source, filename, (_, _, type_) = imp.find_module('__init__', [filename])
        is_package = True
    if type_ is not imp.PY_SOURCE:
        return None, None, False
    return source, filename, is_package
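For example, a dotted name resolves the package path first and then the module; logging.handlers is just an arbitrary standard-library module that ships as source:

source, filename, is_package = find_module('logging.handlers')
# source is an open file object for handlers.py; is_package is False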
I'd also track which module names you have already imported so you don't process them twice; track the canonical (fully dotted) names (on Python 3, the name attribute of the module spec) so the same module isn't seen under two different names.
Use a stack to process all the modules:
with open(path_to_config) as config_file:
    # stack consists of (modulename, ast) tuples
    stack = [('', ast.parse(config_file.read()))]

seen = set()
while stack:
    modulename, ast_tree = stack.pop()
    for script_object in ast_tree.body:
        if isinstance(script_object, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in script_object.names]
            from_names = []
            if hasattr(script_object, 'level'):  # ImportFrom
                from_names = names
                name = script_object.module
                if script_object.level:
                    package = modulename.rsplit('.', script_object.level - 1)[0]
                    if script_object.module:
                        name = "{}.{}".format(package, script_object.module)
                    else:
                        name = package
                names = [name]
            for name in names:
                if name in seen:
                    continue
                seen.add(name)
                source, filename, is_package = find_module(name)
                if source is None:
                    continue
                if is_package and from_names:
                    # importing from a package, assume the imported names
                    # are modules
                    names += ('{}.{}'.format(name, fn) for fn in from_names)
                    continue
                with source:
                    module_ast = ast.parse(source.read(), filename)
                    stack.append((name, module_ast))
        elif not _is_admissible_node(script_object):
            raise Exception("Config file '%s' contains unacceptable statements" % path_to_config)
In case of from foo import bar imports, if foo is a package then foo/__init__.py is skipped and it is assumed that bar will be a module.
(*) imp.find_module() is deprecated for Python 3 code. On Python 3 you would use importlib.util.find_spec() to get the module loader spec, and then use the ModuleSpec.origin attribute to get the filename. importlib.util.find_spec() knows how to handle packages.
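For illustration, here is a minimal Python 3 sketch with the same return signature as the find_module wrapper above; find_module_py3 is a hypothetical name, and the .py check is a simplification of the PY_SOURCE test:

import importlib.util

def find_module_py3(module):
    try:
        spec = importlib.util.find_spec(module)
    except (ImportError, ValueError):
        return None, None, False
    if spec is None or spec.origin is None or not spec.origin.endswith('.py'):
        return None, None, False
    # Packages carry submodule_search_locations; plain modules do not.
    is_package = spec.submodule_search_locations is not None
    return open(spec.origin), spec.origin, is_package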

Related

Python: always import the last revision in the directory

Imagine that we have the following Data Base structure with the data stored in python files ready to be imported:
data_base/
    foo_data/
        rev_1.py
        rev_2.py
    bar_data/
        rev_1.py
        rev_2.py
        rev_3.py
In my main script, I would like to import the last revision of the data available in the folder. For example, instead of doing this:
from data_base.foo_data.rev_2 import foofoo
from data_base.bar_data.rev_3 import barbar
I want to call a method:
import_from_db(path='data_base.foo_data', attr='foofoo', rev='last')
import_from_db(path='data_base.bar_data', attr='barbar', rev='last')
I could take a relative path to the database and use glob.glob to search for the last revision, but for that I would need to know the path to the data_base folder, which complicates things (imagine that the parent folder of data_base is in sys.path, so that from data_base.*** import works).
Is there an efficient way to maybe retrieve a full path knowing only part of it (data_base.foo_data)? Other ideas?
I think it's better to just install the latest revision.
But going with your flow, you can use getattr on the package:
from data_base import foo_data

# Revisions start at rev_1; keep the last attribute that resolves.
i = 1
your_module = None
while True:
    try:
        your_module = getattr(foo_data, f'rev_{i}')
    except AttributeError:
        break
    i += 1
# Now your_module is the latest rev (note: getattr only finds submodules
# that have already been imported, e.g. from foo_data/__init__.py)
@JohnDoriaN's idea led me to a quite simple solution:
import glob
import os

def import_from_db(import_path, attr, rev_id=None):
    # Get all the module/folder names
    dir_list = import_path.split('.')
    # Import the parent package of the revisions
    exec(f"from {'.'.join(dir_list[:-1])} import {dir_list[-1]}")
    db_parent = locals()[dir_list[-1]]
    # Get an absolute path corresponding to the db_parent folder
    abs_path = db_parent.__path__._path[0]
    rev_path = os.path.join(abs_path, 'rev_*.py')
    # glob returns paths in arbitrary order, so sort to make [-1] the last
    # revision (note: the sort is lexicographic, so rev_10 sorts before rev_2)
    rev_names = sorted(os.path.basename(x) for x in glob.glob(rev_path))
    if rev_id is None:
        revision = rev_names[-1]
    else:
        revision = rev_names[rev_id]
    revision = revision.split('.')[0]
    # Import the requested attribute into the global namespace
    exec(f'from {import_path}.{revision} import {attr}', globals())
Some explanations:
Apparently (I didn't know this), we can import a folder (a package) as a module; this module has a __path__ attribute (found using the built-in dir function).
glob.glob lets us search for files matching a pattern using shell-style wildcards (not full regular expressions).
Calling exec without explicit namespace arguments imports only into the local namespace (the namespace of the method), so it doesn't pollute the global namespace.
Calling exec with globals() imports into the global namespace.
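For comparison, here is a sketch of the same idea without exec, using importlib; import_from_db2 is a hypothetical variant that returns the attribute instead of injecting it into globals():

import glob
import importlib
import os

def import_from_db2(import_path, attr, rev_id=None):
    package = importlib.import_module(import_path)
    # __path__ is iterable for both regular and namespace packages
    abs_path = list(package.__path__)[0]
    rev_names = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(abs_path, 'rev_*.py')))
    index = -1 if rev_id is None else rev_id
    revision = rev_names[index].rsplit('.', 1)[0]
    module = importlib.import_module('{}.{}'.format(import_path, revision))
    return getattr(module, attr)

Returning the object keeps the caller's namespace explicit: foofoo = import_from_db2('data_base.foo_data', 'foofoo').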

How to mock a zip file

I want to mock a ZipFile. In particular, I need a mock
Which passes a zipfile.is_zipfile() test,
Returns a list of strings for zipfile.ZipFile().namelist(), and
Uses only the standard library.
The code I am testing looks for potential Python modules[1] within a given zip archive (i.e. .py, .zip, and .whl files):
# utils.py
import zipfile
from pathlib import Path

def find_modules(archive=None):
    """Find modules within a given zip archive.

    Inputs:
        archive (str/Path): Zip archive
    Returns:
        list (str): List of module names as strings
    """
    possible_ext = ['.py', '.zip', '.whl']
    modules = []
    if zipfile.is_zipfile(archive):
        paths = [Path(p) for p in zipfile.ZipFile(archive).namelist()]
        modules = [p.stem for p in paths
                   if p.stem != '__init__' and p.suffix in possible_ext]
    return modules
Voodoo solution
I have cobbled together the following test:
# test_utils.py
from unittest import mock

from mypackage import utils

class TestFunctions():
    MOCK_LISTING = ['single_file_module.py', 'dummy.txt',
                    'package_namespace.zip', 'wheel_namespace-0.1-py3-none-any.whl']

    @mock.patch('zipfile.ZipFile')
    @mock.patch('zipfile.is_zipfile')
    def test_find_modules_return_value(self, mock_is_zipfile, mock_zipfile):
        mock_is_zipfile.return_value = True
        mock_zipfile.return_value.namelist.return_value = self.MOCK_LISTING
        modules = utils.find_modules('dummy_archive.zip')
        assert len(modules) == 3

def main():
    """Main function used to run tests manually.

    Use PyTest to run tests in bulk.
    """
    tc = TestFunctions()
    tc.test_find_modules_return_value()

if __name__ == '__main__':
    import time
    start_time = time.time()
    main()
    print("\nThe chosen tests have all passed.")
    print("--- %s seconds ---" % (time.time() - start_time))
Questions
I found that a @mock.patch('zipfile.ZipFile') alone wouldn't meet my needs; it failed the zipfile.is_zipfile() test.
If I'm mocking a ZipFile object, shouldn't it automatically pass a zipfile.is_zipfile() test?
I found that I couldn't use the same approach to overriding is_zipfile as I did for namelist; an additional @mock.patch('zipfile.is_zipfile') was needed. My understanding is that because a ZipFile defines a context, the first return_value overrides the __enter__ of the context, and the next namespace down is the ZipFile method level. Why doesn't the same approach work for both is_zipfile and namelist?
# Test doesn't work
# Fails on: assert 0 == 3
#  + where 0 = len([])
@mock.patch('zipfile.ZipFile')
def test_find_modules_return_value(self, mock_zipfile):
    mock_zipfile.return_value.is_zipfile.return_value = True
    mock_zipfile.return_value.namelist.return_value = self.MOCK_LISTING
    modules = utils.find_modules('dummy_archive.zip')
    assert len(modules) == 3
Maybe I'm getting too far off-base and there's a simpler way to mock a .zip archive?
EDIT
Based on @Don Kirby's answer, the pattern I found most intuitive was:

def test_find_modules_return_value(self):
    # Mock the zipfile module and override the is_zipfile function
    with mock.patch('mypackage.utils.zipfile') as mock_zipfile:
        mock_zipfile.is_zipfile.return_value = True
        mock_zipfile.namelist.return_value = self.MOCK_LISTING
        # Since ZipFile is a separate object, which returns a zipfile (note
        # that that's lowercase), we need to mock ZipFile and have it return
        # the zipfile mock created above.
        with mock.patch('mypackage.utils.zipfile.ZipFile') as mock_ZipFile:
            mock_ZipFile.return_value = mock_zipfile
            modules = utils.find_modules("/dummy/path/to/check.zip")
            assert len(modules) == 3
[1] It's assumed that .zip files may contain modules and that .zip and .whl will be handled in a different process. The file names are all we care about in this step.
You have to patch is_zipfile() separately from ZipFile, because is_zipfile() is a function, not a method of the ZipFile class. I suppose you might be able to patch the whole zipfile module by patching mypackage.utils.zipfile, but that seems way more confusing.
The zipfile source code might be useful.
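As an aside, one way to sidestep mocking entirely is to build a real in-memory archive; this is a sketch that assumes find_modules is happy to receive a file object, which both is_zipfile() and ZipFile() accept:

import io
import zipfile

from mypackage import utils

def test_find_modules_with_real_archive():
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        for name in ['single_file_module.py', 'dummy.txt',
                     'package_namespace.zip',
                     'wheel_namespace-0.1-py3-none-any.whl']:
            zf.writestr(name, b'')  # empty members; only the names matter here
    buffer.seek(0)
    assert len(utils.find_modules(buffer)) == 3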

Getting attributes from a function in another file

I have a main file that looks at files within a /modules/ folder; it needs to look at every .py file and find all functions that have a specific attribute.
An example module will be like this:
def Command1_1():
    True
Command1_1.command = ['cmd1']

def Command1_2():
    True
The code I am currently using to look through each file and function is this:
for module in glob.glob('modules/*.py'):
    print(module)
    tree = ast.parse(open(module, "rt").read(), filename=PyBot.msggrp + module)
    for item in [x.name for x in ast.walk(tree) if isinstance(x, ast.FunctionDef)]:
        if item is not None:
            print(str(item))
Below is what the code produces but I cannot find a way to show if a function has a ".command" attribute:
modules/Placeholder001.py
Command1_1
Command1_2
modules/Placeholder002.py
Command2_1
Command2_2
Command2_3
The easiest way is to import each file and then look for functions in its global scope. Functions can be identified with the use of callable. Checking if a function has an attribute can be done with hasattr.
The code to import a module from a path is taken from this answer.
import importlib.util
from pathlib import Path

def import_from_path(path):
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

for module_path in Path('modules').glob('*.py'):
    module = import_from_path(module_path)
    for name, value in vars(module).items():
        if callable(value):
            has_attribute = hasattr(value, 'command')
            print(name, has_attribute)
Output:
Command1_1 True
Command1_2 False
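If importing the modules is undesirable (plugins may have import-time side effects), the check can also stay purely at the AST level: look for assignments of the form somefunction.command = ... alongside the FunctionDef nodes. A sketch, with names matching the question's code:

import ast
import glob

for module in glob.glob('modules/*.py'):
    tree = ast.parse(open(module, "rt").read(), filename=module)
    functions = {node.name for node in ast.walk(tree)
                 if isinstance(node, ast.FunctionDef)}
    with_command = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                # Matches e.g. Command1_1.command = ['cmd1']
                if (isinstance(target, ast.Attribute) and target.attr == 'command'
                        and isinstance(target.value, ast.Name)
                        and target.value.id in functions):
                    with_command.add(target.value.id)
    for name in sorted(functions):
        print(name, name in with_command)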

How to get the file path of an add_static_view() in Pyramid

When I am adding a static view like this:
cfg = config.Configurator(...)
cfg.add_static_view(name='static', path='MyPgk:static')
# And I want to add a view for 'favicon.ico'.
cfg.add_route(name='favicon', pattern='/favicon.ico')
cfg.add_view(route_name='favicon', view='MyPgk.views.mymodule.favicon_view')
I am trying to handle that annoying default /favicon.ico path requested by the browser when it isn't defined in the page. I would like to use the example at http://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/files.html and modify it to have:
def favicon_view(request, cache=dict()):
    if not cache:
        _path_to_MyPkg_static = __WHAT_GOES_HERE__
        _icon = open(os.path.join(_path_to_MyPkg_static, 'favicon.ico')).read()
        cache['response'] = Response(content_type='image/x-icon', body=_icon)
    return cache['response']
Since I can't really define the _here proposed in the example, how can I make it depend on request so it gets the actual full path at runtime? Or do I really have to deal with:
_here = os.path.dirname(__file__)
_path_to_MyPkg_static = os.path.join(os.path.dirname(_here), 'static')
and having to be careful when I decide to refactor and put the view in another package or subpackage, or wherever?
Something equivalent to request.static_path(), but returning a directory path instead of the URL path:
request.static_file_path('static') -> /path/to/site-packages/MyPkg/static
Thanks,
You can use the pkg_resources module to make paths that are relative to Python modules (and thus, independent of the module that retrieves them). For example:
import pkg_resources
print pkg_resources.resource_filename('os.path', 'static/favicon.ico')
# 'C:\\Python27\\lib\\static\\favicon.ico'
Just substitute os.path with whatever module that is the parent of your static files.
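For the MyPgk package from the question, that would look something like this (a sketch; resource_filename accepts any importable package or module name):

import os
import pkg_resources

path_to_static = pkg_resources.resource_filename('MyPgk', 'static')
icon_path = os.path.join(path_to_static, 'favicon.ico')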
EDIT: If you need to remember that 'static' route mapped to 'MyPkg:static', then the easiest way is to save it in some dictionary in the first place:
STATIC_ROUTES = {'static': 'MyPkg:static'}
for name, path in STATIC_ROUTES.iteritems():
cfg.add_static_view(name=name, path=path)
and then simply retrieve the path:
static_path = STATIC_ROUTES['static']
package, relative_path = static_path.split(':')
icon_path = pkg_resources.resource_filename(
    package, os.path.join(relative_path, 'favicon.ico'))
If that's impossible, though (e.g. you don't have access to the cfg object), you can still retrieve the path, it's just quite painful. Here's a sample function that uses undocumented calls (and so may change in future Pyramid versions) and ignores some additional settings (like the route_prefix configuration variable):
def get_static_path(request, name):
    from pyramid.config.views import StaticURLInfo
    registrations = StaticURLInfo()._get_registrations(request.registry)
    if not name.endswith('/'):
        name = name + '/'
    route_name = '__%s' % name
    for _url, spec, reg_route_name in registrations:
        # print ':', reg_route_name  # debug
        if reg_route_name == route_name:
            return spec
In your case, it should work like this:
>>> get_static_path(request, 'static')
MyPkg:static/

Dynamic importing of modules followed by instantiation of objects with a certain baseclass from said modules

I'm writing an application. No fancy GUIs or anything, just a plain old console application. This application, let's call it App, needs to be able to load plugins on startup. So, naturally, I created a class for the plugins to inherit from:
class PluginBase(object):
    def on_load(self):
        pass
    def on_unload(self):
        pass
    def do_work(self, data):
        pass
The idea being that on startup, App would walk through the current dir, including subdirs, searching for modules containing classes that themselves are subclasses of PluginBase.
More code:
class PluginLoader(object):
    def __init__(self, path, cls):
        """ path=path to search (unused atm), cls=baseclass """
        self.path = path

    def search(self):
        for root, dirs, files in os.walk('.'):
            candidates = [fname for fname in files if fname.endswith('.py')
                          and not fname.startswith('__')]
            ## this only works if the modules happen to be in the current working dir
            ## that is not important now, i'll fix that later
            if candidates:
                basename = os.path.split(os.getcwd())[1]
                for c in candidates:
                    modname = os.path.splitext(c)[0]
                    modname = '{0}.{1}'.format(basename, modname)
                    __import__(modname)
                    module = sys.modules[modname]
After that last line in search I'd like to somehow a) find all classes in the newly loaded module, b) check if one or more of those classes are subclasses of PluginBase and c) (if b) instantiate that/those classes and add to App's list of loaded modules.
I've tried various combinations of issubclass and others, followed by a period of intense dir()-ing and about an hour of panicked googling. I did find a similar approach to mine here and tried just copy-pasting it, but got an error saying that Python doesn't support imports by filename, at which point I kind of lost my concentration and, as a result, this post was written.
I'm at my wits end here, all help appreciated.
You might do something like this:
for c in candidates:
    modname = os.path.splitext(c)[0]
    try:
        module = __import__(modname)  # <-- You can get the module this way
    except (ImportError, NotImplementedError):
        continue
    for cls in dir(module):  # <-- Loop over all objects in the module's namespace
        cls = getattr(module, cls)
        if (inspect.isclass(cls)                      # Make sure it is a class
                and inspect.getmodule(cls) == module  # Make sure it was defined in module, not just imported
                and issubclass(cls, base)):           # Make sure it is a subclass of base
            # print('found in {f}: {c}'.format(f=module.__name__, c=cls))
            classList.append(cls)
To test the above, I had to modify your code a bit; below is the full script.
import sys
import inspect
import os

class PluginBase(object): pass

def search(base):
    for root, dirs, files in os.walk('.'):
        candidates = [fname for fname in files if fname.endswith('.py')
                      and not fname.startswith('__')]
        classList = []
        if candidates:
            for c in candidates:
                modname = os.path.splitext(c)[0]
                try:
                    module = __import__(modname)
                except (ImportError, NotImplementedError):
                    continue
                for cls in dir(module):
                    cls = getattr(module, cls)
                    if (inspect.isclass(cls)
                            and inspect.getmodule(cls) == module
                            and issubclass(cls, base)):
                        # print('found in {f}: {c}'.format(f=module.__name__, c=cls))
                        classList.append(cls)
        print(classList)

search(PluginBase)
You would make this a lot easier if you forced some constraints on the plugin writer, for example that all plugins must be packages that contain a load_plugin(app, config) function that returns a Plugin instance. Then all you have to do is try to import these packages and run the function, as sketched below.
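A sketch of that constrained approach; the plugins package name and the load_plugin convention are the assumptions stated above, not an existing API:

import importlib
import pkgutil

def load_plugins(app, config, package='plugins'):
    loaded = []
    pkg = importlib.import_module(package)
    # Walk the modules/packages directly inside the plugins package
    for _finder, name, _ispkg in pkgutil.iter_modules(pkg.__path__):
        module = importlib.import_module('{}.{}'.format(package, name))
        if hasattr(module, 'load_plugin'):
            loaded.append(module.load_plugin(app, config))
    return loaded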
Here is a meta-classier way to register the plugins:
Define PluginBase to be of type PluginType.
PluginType automatically registers any instance (class) in the plugins set.
plugin.py:
plugins = set()

class PluginType(type):
    def __init__(cls, name, bases, attrs):
        super(PluginType, cls).__init__(name, bases, attrs)
        # print(cls, name, cls.__module__)
        plugins.add(cls)

class PluginBase(object):
    # Python 2 syntax; on Python 3 write: class PluginBase(metaclass=PluginType)
    __metaclass__ = PluginType
This is the part that the user writes. Notice that there is nothing special here.
pluginDir/myplugin.py:
import plugin

class Foo(plugin.PluginBase):
    pass
Here is what the search function might look like:
test.py:
import os
import imp

import plugin

def search(plugindir):
    for root, dirs, files in os.walk(plugindir):
        for fname in files:
            modname = os.path.splitext(fname)[0]
            try:
                module = imp.load_source(modname, os.path.join(root, fname))
            except Exception:
                continue

search('pluginDir')
print(plugin.plugins)
Running test.py yields
set([<class 'myplugin.Foo'>])
Could you use execfile() instead of import, with a specified namespace dict, and then iterate over that namespace with issubclass, etc.?
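A sketch of that suggestion (Python 2, since execfile was removed in Python 3, where exec(open(path).read(), namespace) is the equivalent); load_plugin_classes is an illustrative name:

import inspect

def load_plugin_classes(path, base):
    namespace = {'PluginBase': base}
    execfile(path, namespace)  # run the plugin file in an isolated dict
    return [obj for obj in namespace.values()
            if inspect.isclass(obj) and issubclass(obj, base) and obj is not base]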
