__repr__ and __str__ not working in dynamic class construction - python

I'm building an interactive file explorer inside the Python console, such that when I pass in a path, I get an object; then with a dot (.) the auto-complete starts suggesting the contents of the path; I do that again to get the contents of the subfolder, and so on until I get to a file, which returns its path.
I have achieved my goal, except for this little nagging thing: I wanted a __repr__ method, but it never worked.
Here's my code:
import os
from glob import glob

path = r'C:\Users\eng_a\Downloads'

def browse(path):
    my_dict = {'_path': path}
    tmp = os.listdir(path)
    key_contents = []
    for akey in tmp:
        key_contents.append(akey.replace(".", "_").replace(" ", "_").replace("-", "_"))
    val_paths = glob(path + '//*')
    for akey, avalue in zip(key_contents, val_paths):
        if os.path.isfile(avalue):
            my_dict[akey] = avalue
        else:
            my_dict[akey] = browse(avalue)
    def func(self):
        return self._path
    my_dict["__repr__"] = func
    my_dict["__str__"] = func
    obj = type(os.path.basename(path), (), dict(zip(my_dict.keys(), my_dict.values())))
    return obj
>>> b = browse(path)
>>> b
Unfortunately it keeps printing something like <class '__main__....'> instead of the path.

As noted in the comments, obj is a class, not an instance. It contains a function __repr__ that will be bound to an instance as soon as you create it.
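For instance, here is a minimal standalone sketch of the same effect (the Demo class and its lambda are illustrative, not part of the code above):

>>> Demo = type('Demo', (), {'__repr__': lambda self: 'my repr'})
>>> Demo          # the class itself: its repr comes from the metaclass, type
<class '__main__.Demo'>
>>> Demo()        # an instance: now the __repr__ you supplied is used
my repr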
A simple and elegant solution to this would be to replace the function browse with a class of the same name. Calling a class creates an instance (unless you really mess with metaclasses or __new__), so the interface you have now would not have to change. Internally, however, you would instantiate your class for every directory that you delved into.
Another thing that this would allow you to do is to have a truly dynamic solution. Right now you actually recurse into all the children of your root. This can be very expensive in both time and memory. Ideally, you would only want to list the current directory, and recurse into children only when asked to.
from os import listdir
from os.path import isdir, join
import re

class browse:
    def __init__(self, path, directory=True):
        # Create an attribute in __dict__ for each child
        self.__path__ = path
        if directory:
            for file in listdir(path):
                full = join(path, file)
                key = re.sub(r'^(?=\d)|\W', '_', file)
                setattr(self, key, full if isdir(full) else browse(full, False))

    def __getattribute__(self, name):
        if name == '__path__':
            return super().__getattribute__(name)
        d = super().__getattribute__('__dict__')
        if name in d:
            child = d[name]
            if isinstance(child, str):
                child = browse(child)
                setattr(self, name, child)
            return child
        return super().__getattribute__(name)

    def __repr__(self):
        return self.__path__

    def __str__(self):
        return self.__path__
This solution adds an attribute for each entry in the root path. Files are recorded as browse objects, while directories are recorded as strings. Overriding __getattribute__ allows you to swap the strings you request for full browse objects on the fly, instead of having to expand all your folders up front.
A possible improvement, given the intended use case, would be to remove the line setattr(self, name, child). This way, you would not retain unnecessary references to directories that you accidentally browsed into, for example.
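In that variant, __getattribute__ would build the child on demand without caching it; a sketch of just the changed method:

    def __getattribute__(self, name):
        if name == '__path__':
            return super().__getattribute__(name)
        d = super().__getattribute__('__dict__')
        if name in d:
            child = d[name]
            if isinstance(child, str):
                child = browse(child)  # built on demand; not stored, so no stale references
            return child
        return super().__getattribute__(name)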

Related

How to load a method from a file into an existing class (a 'plugin' method)

Call me weird if you like, many have before, but I have a large class which I'd like to make extensible with methods loaded from a plugin directory. Essentially, I'm monkey-patching the class. What I have almost works, but the loaded method doesn't 'see' the globals defined in __main__. Ideally I'd like a way to tell globals() (or whatever mechanism is actually used to locate global variables) to use the one that exists in __main__. Here is the code I have (trimmed for the sake of brevity):
#!/usr/bin/env python3

import importlib.machinery
import os
import types

main_global = "Hi, I'm in main"

class MyClass:
    def __init__(self, plugin_dir=None):
        if plugin_dir:
            self.load_plugins(plugin_dir, ext="plugin")

    def load_plugins(self, plugin_dir, ext):
        """ Load plugins

        Plugins are files in 'plugin_dir' that have the given extension.
        The functions defined within are imported as methods of this class.
        """
        cls = self.__class__
        # First check that we're not importing the same extension twice into
        # the same class.
        try:
            plugins = getattr(cls, "_plugins")
        except AttributeError:
            plugins = set()
            setattr(cls, "_plugins", plugins)
        if ext in plugins:
            return
        plugins.add(ext)
        for file in os.listdir(plugin_dir):
            if not file.endswith(ext):
                continue
            filename = os.path.join(plugin_dir, file)
            loader = importlib.machinery.SourceFileLoader("bar", filename)
            module = types.ModuleType(loader.name)
            loader.exec_module(module)
            for name in dir(module):
                if name.startswith("__"):
                    continue
                obj = getattr(module, name)
                if callable(obj):
                    obj = obj.__get__(self, cls)
                    setattr(cls, name, obj)

z = MyClass(plugin_dir="plugins")
z.foo("Hello")
And this is 'foo.plugin' from the plugins directory:
#!/usr/bin/env python3

foo_global = "I am global within foo"

def foo(self, value):
    print(f"I am foo, called with {self} and {value}")
    print(f"foo_global = {foo_global}")
    print(f"main_global = {main_global}")
The output is...
I am foo, called with <__main__.MyClass object at 0x7fd4680bfac8> and Hello
foo_global = I am global within foo
Traceback (most recent call last):
File "./plugged", line 55, in <module>
z.foo("Hello")
File "plugins/foo.plugin", line 8, in foo
print(f"main_global = {main_global}")
NameError: name 'main_global' is not defined
I know it all feels a bit 'hacky', but it's become a challenge so please don't flame me on style etc. If there's another way to achieve this aim, I'm all ears.
Thoughts, learned friends?
You can do what you want with a variation of the technique shown in @Martijn Pieters' answer to the question How to inject variable into scope with a decorator?, tweaked to inject multiple values into a class method.
from functools import wraps
import importlib.machinery
import os
from pathlib import Path
import types

main_global = "Hi, I'm in main"

class MyClass:
    def __init__(self, plugin_dir=None):
        if plugin_dir:
            self.load_plugins(plugin_dir, ext="plugin")

    def load_plugins(self, plugin_dir, ext):
        """ Load plugins

        Plugins are files in 'plugin_dir' that have the given extension.
        The functions defined within are imported as methods of this class.
        """
        cls = self.__class__
        # First check that we're not importing the same extension twice into
        # the same class.
        try:
            plugins = getattr(cls, "_plugins")
        except AttributeError:
            plugins = set()
            setattr(cls, "_plugins", plugins)
        if ext in plugins:
            return
        plugins.add(ext)
        for file in Path(plugin_dir).glob(f'*.{ext}'):
            loader = importlib.machinery.SourceFileLoader("bar", str(file))
            module = types.ModuleType(loader.name)
            loader.exec_module(module)
            namespace = globals()
            for name in dir(module):
                if name.startswith("__"):
                    continue
                obj = getattr(module, name)
                if callable(obj):
                    obj = inject(obj.__get__(self, cls), namespace)
                    setattr(cls, name, obj)

def inject(method, namespace):
    @wraps(method)
    def wrapped(*args, **kwargs):
        method_globals = method.__globals__
        # Save copies of any of method's global values replaced by the namespace.
        replaced = {key: method_globals[key] for key in namespace if key in method_globals}
        method_globals.update(namespace)
        try:
            method(*args[1:], **kwargs)
        finally:
            method_globals.update(replaced)  # Restore any replaced globals.
    return wrapped

z = MyClass(plugin_dir="plugins")
z.foo("Hello")
Example output:
I am foo, called with <__main__.MyClass object at 0x0056F670> and Hello
foo_global = I am global within foo
main_global = Hi, I'm in main
You can approach the problem with a factory function and inheritance. Assuming each of your plugins is something like this, defined in a separate importable file:
class MyPlugin:
    foo = 'bar'

    def extra_method(self):
        print(self.foo)
You can use a factory like this:
def MyClassFactory(plugin_dir):
    def search_and_import_plugins(plugin_dir):
        # Look for all possible plugins and import them
        return plugin_list  # a list of plugin classes, like [MyPlugin]

    plugin_list = search_and_import_plugins(plugin_dir)

    class MyClass(*plugin_list):
        pass

    return MyClass()

z = MyClassFactory('/home/me/plugins')

In Python, how do I get the list of classes defined within a particular file?

If a file myfile.py contains:
class A(object):
    # Some implementation

class B(object):
    # Some implementation
How can I define a method so that, given myfile.py, it returns
[A, B]?
Here, the returned values for A and B can be either the name of the classes or the type of the classes.
(i.e. type(A) = type(str) or type(A) = type(type))
You can get both:
import importlib, inspect
for name, cls in inspect.getmembers(importlib.import_module("myfile"), inspect.isclass):
you may additionally want to check:
if cls.__module__ == 'myfile'
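Put together, a minimal sketch (assuming a module importable as myfile):

import importlib
import inspect

module = importlib.import_module("myfile")
classes = [cls for name, cls in inspect.getmembers(module, inspect.isclass)
           if cls.__module__ == "myfile"]
print(classes)  # e.g. [<class 'myfile.A'>, <class 'myfile.B'>]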
In case it helps someone else, here is the final solution that I used. This method returns all classes defined in a particular package.
I keep all of the subclasses of X in a particular folder (package) and then, using this method, I can load all the subclasses of X, even if they haven't been imported yet. (If they haven't been imported yet, they cannot be reached via __all__; otherwise things would have been much easier.)
import importlib, os, inspect

def get_modules_in_package(package_name: str):
    files = os.listdir(package_name)
    for file in files:
        if file not in ['__init__.py', '__pycache__']:
            if file[-3:] != '.py':
                continue
            file_name = file[:-3]
            module_name = package_name + '.' + file_name
            for name, cls in inspect.getmembers(importlib.import_module(module_name), inspect.isclass):
                if cls.__module__ == module_name:
                    yield cls
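Hypothetical usage, assuming a package directory named plugins that is importable from the working directory:

for cls in get_modules_in_package("plugins"):
    print(cls.__name__)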
It's a bit long-winded, but you first need to load the file as a module, then inspect its attributes to see which are classes:
import inspect
import importlib.util
# Load the module from file
spec = importlib.util.spec_from_file_location("foo", "foo.py")
foo = importlib.util.module_from_spec(spec)
spec.loader.exec_module(foo)
# Return a list of all attributes of foo which are classes
[x for x in dir(foo) if inspect.isclass(getattr(foo, x))]
Just building on the answers above.
If you need a list of the classes defined within the module (file) itself, not just those present in the module namespace, and you want that list from within the module (i.e. using reflection), then the code below works in both the __name__ == '__main__' case and the imported-module case.
import sys, inspect

# You can pass a lambda function as the predicate for getmembers()
classes = [cls for name, cls in inspect.getmembers(
    sys.modules[__name__],
    lambda x: inspect.isclass(x) and x.__module__ == __name__)]
In my very specific use case of registering classes to a calling framework, I used as follows:
def register():
    myLogger.info(f'Registering classes defined in module {__name__}')
    for name, cls in inspect.getmembers(
            sys.modules[__name__],
            lambda x: inspect.isclass(x) and x.__module__ == __name__):
        myLogger.debug(f'Registering class {cls} with name {name}')
        <framework>.register_class(cls)

what does __getattr__ do when it returns a class of itself?

I read an example on a website.
The code of the example is here:
class Chain(object):
    def __init__(self, path=''):
        self._path = path

    def __getattr__(self, path):
        return Chain('%s/%s' % (self._path, path))

    def __str__(self):
        return self._path

    __repr__ = __str__

print(Chain().status.user.timeline.list)
The output of this code:
'/status/user/timeline/list'
I understand what __getattr__ does when it returns a value, but it becomes more complicated when it returns an instance of its own class.
What I think this code does, step by step, is this:
print(Chain().status.user.timeline.list) starts.
Chain().__init__ will initialize self._path with path, which is ''.
Chain().__getattr__'s path parameter will become "status".
__getattr__ will return Chain('%s/%s' % (self._path, path)).
'%s/%s' becomes '/status'.
Then the new Chain object that was just created will be initialized.
self._path will be assigned path, which is '' again.
Chain().__getattr__'s path parameter will become "user".
__getattr__ will return Chain('%s/%s' % (self._path, path)).
'%s/%s' becomes '/user'.
loop...
My question:
Because of the initialization, every time __getattr__ creates a new
Chain, self._path will be assigned '', so I think the
final output should be "/list", but the result is not like that.
I don't understand what is happening inside this process.
Thank you to everyone who reads this question and tries to give me an answer.
Suppose you have a Chain whose _path is '/status'. Now you call .user on it.
In __getattr__, self is the current Chain (so self._path is '/status'), and the path parameter is the name of the attribute you're trying to access, which is 'user'. The __getattr__ method builds a new string from self._path and path, giving '/status/user'. It passes this string to Chain() to give a new Chain object whose _path is '/status/user'.
Etc.
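To spell the chain out concretely (a sketch using the Chain class above; each new Chain is constructed with the accumulated path, so _path is never reset to ''):

c0 = Chain()        # c0._path == ''
c1 = c0.status      # Chain('' + '/' + 'status')       -> _path == '/status'
c2 = c1.user        # Chain('/status' + '/' + 'user')  -> _path == '/status/user'
c3 = c2.timeline    # _path == '/status/user/timeline'
c4 = c3.list        # _path == '/status/user/timeline/list'
print(c4)           # /status/user/timeline/list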

Share plugin resources with implemented permission rules

I have multiple scripts that export the same interface, and they're executed using execfile() in an insulated scope.
The thing is, I want them to share some resources so that each new script doesn't have to load them again from the start, thus losing startup speed and using an unnecessary amount of RAM.
The scripts are in reality much better encapsulated and guarded from malicious plug-ins than presented in the example below; that's where the problems begin for me.
The thing is, I want the script that creates a resource to be able to fill it with data, remove data, or remove the resource, and of course access its data.
But other scripts shouldn't be able to change another script's resource, just read it. I want to be sure that newly installed plug-ins cannot interfere with already loaded and running ones by abusing shared resources.
Example:
class SharedResources:
    # Here should be a shared resource manager that I tried to write
    # but got stuck. That's why I ask this long and convoluted question!
    # Some beginning:
    def __init__(self, owner):
        self.owner = owner

    def __call__(self):
        # Here we should return some object that will do
        # required stuff. Read more for details.
        pass

class plugin(dict):
    def __init__(self, filename):
        dict.__init__(self)
        # Here some checks and filling with secure versions of __builtins__ etc.
        # ...
        self["__name__"] = "__main__"
        self["__file__"] = filename
        # Add a shared resources manager to this plugin
        self["SharedResources"] = SharedResources(filename)
        # And then:
        execfile(filename, self, self)

    # Expose the plug-in interface to outside world:
    def __getattr__(self, a):
        return self[a]
    def __setattr__(self, a, v):
        self[a] = v
    def __delattr__(self, a):
        del self[a]
    # Note: I didn't use self.__dict__ because this makes encapsulation easier.
    # In future I won't use object itself at all but separate dict to do it. For now let it be
----------------------------------------
# An example of two scripts that would use a shared resource and be run with
# plugins["name"] = plugin("<filename>"):
# The code presented is the same in both scripts; what comes after it will differ.

def loadSomeResource():
    # Do it here...
    return loadedresource

# Load this resource if it isn't already in shared resources; if it isn't, add it:
shr = SharedResources()  # This would be an instance allowing access to shared resources
if not shr.has_key("Default Resources"):
    shr.create("Default Resources")
if not shr["Default Resources"].has_key("SomeResource"):
    shr["Default Resources"].add("SomeResource", loadSomeResource())
resource = shr["Default Resources"]["SomeResource"]
# And then we use the resource variable normally; it can be any object.
# Here I used category "Default Resources" to add and/or retrieve a resource named "SomeResource".
# I want more categories so that plugins that deal with audio aren't mixed with
# plug-ins that deal with video, for instance. But this is not strictly needed.

# Here comes the code specific to each plug-in that will use the shared resource
# named "SomeResource" from category "Default Resources".
...
# And end of plugin script!
----------------------------------------
# And then, in the main program, we load the plug-ins:
import os

plugins = {}  # Here we store all loaded plugins
for x in os.listdir("plugins"):
    plugins[x] = plugin(x)
Let's say that our two scripts are stored in the plugins directory and both use some WAVE files loaded into memory.
The plugin that loads first will load the WAVE and put it into RAM.
The other plugin will be able to access the already loaded WAVE, but not to replace or delete it, thus messing with the other plugin.
Now, I want each resource to have an owner, some id or filename of the plugin script, such that the resource is writable only by its owner.
No tweaking or workarounds should enable another plugin to access the first one's resource.
I almost did it and then got stuck, and my head is spinning with concepts that, when implemented, do the thing, but only partially.
This is eating at me, so I cannot concentrate any more. Any suggestion is more than welcome!
Adding:
This is what I use now without any safety included:
# Dict that will hold a category of resources (should implement some security):
class ResourceCategory(dict):
    def __getattr__(self, i): return self[i]
    def __setattr__(self, i, v): self[i] = v
    def __delattr__(self, i): del self[i]

SharedResources = {}  # Resource pool

class ResourceManager:
    def __init__(self, owner):
        self.owner = owner

    def add(self, category, name, value):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        SharedResources[category][name] = value

    def get(self, category, name):
        return SharedResources[category][name]

    def rem(self, category, name=None):
        if name is None: del SharedResources[category]
        else: del SharedResources[category][name]

    def __call__(self, category):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        return SharedResources[category]

    __getattr__ = __getitem__ = __call__

    # When securing, this must not be left as-is; it is insecure and can
    # provide a way back to the SharedResources pool:
    has_category = has_key = SharedResources.has_key
Now a plugin capsule:
class plugin(dict):
    def __init__(self, path, owner):
        dict.__init__(self)
        self["__name__"] = "__main__"
        # etc. etc.
        # And when adding the resource manager to the plugin, register it with this plugin as an owner
        self["SharedResources"] = ResourceManager(owner)
        # ...
        execfile(path, self, self)
        # ...
Example of a plugin script:
#-----------------------------------
# Get the category we want (using __call__()). Note: if a category doesn't exist, it is created automatically.
AudioResource = SharedResources("Audio")
# Use an MP3 resource (let's say a bytestring):
if not AudioResource.has_key("Beep"):
    f = open("./sounds/beep.mp3", "rb")
    AudioResource.Beep = f.read()
    f.close()
# Take a reference out for fast access and a nicer look:
beep = AudioResource.Beep
# BTW, immutables don't propagate as references by themselves, do they? A copy would be
# returned, so RAM usage would increase instead. Immutables shall be wrapped in a composed data type.
This works perfectly, but, as I said, messing with resources is far too easy here.
I would like an instance of ResourceManager() to be in charge of deciding to whom to return which version of the stored data.
So, my general approach would be this.
Have a central shared resource pool. Access through this pool would be read-only for everybody. Wrap all data in the shared pool so that no one "playing by the rules" can edit anything in it.
Each agent (plugin) maintains knowledge of what it "owns" at the time it loads it. It keeps a read/write reference for itself, and registers a reference to the resource with the centralized read-only pool.
When a plugin is loaded, it gets a reference to the central, read-only pool that it can register new resources with.
So, addressing only the issue of Python native data structures (and not instances of custom classes), a fairly locked-down system of read-only implementations is as follows. Note that the tricks that are used to lock them down are the same tricks that someone could use to get around the locks, so the sandboxing is very weak if someone with a little Python knowledge is actively trying to break it.
import collections as _col
import sys

if sys.version_info >= (3, 0):
    immutable_scalar_types = (bytes, complex, float, int, str)
else:
    immutable_scalar_types = (basestring, complex, float, int, long)

# calling this will circumvent any control an object has on its own attribute lookup
getattribute = object.__getattribute__

# types that will be safe to return without wrapping them in a proxy
immutable_safe = immutable_scalar_types

def add_immutable_safe(cls):
    # decorator for adding a new class to the immutable_safe collection
    # Note: only ImmutableProxyContainer uses it in this initial
    # implementation
    global immutable_safe
    immutable_safe += (cls,)
    return cls

def get_proxied(proxy):
    # circumvent normal object attribute lookup
    return getattribute(proxy, "_proxied")

def set_proxied(proxy, proxied):
    # circumvent normal object attribute setting
    object.__setattr__(proxy, "_proxied", proxied)

def immutable_proxy_for(value):
    # Proxy for known container types, reject all others
    if isinstance(value, _col.Sequence):
        return ImmutableProxySequence(value)
    elif isinstance(value, _col.Mapping):
        return ImmutableProxyMapping(value)
    elif isinstance(value, _col.Set):
        return ImmutableProxySet(value)
    else:
        raise NotImplementedError(
            "Return type {} from an ImmutableProxyContainer not supported".format(
                type(value)))

@add_immutable_safe
class ImmutableProxyContainer(object):
    # the only names that are allowed to be looked up on an instance through
    # normal attribute lookup
    _allowed_getattr_fields = ()

    def __init__(self, proxied):
        set_proxied(self, proxied)

    def __setattr__(self, name, value):
        # never allow attribute setting through normal mechanism
        raise AttributeError(
            "Cannot set attributes on an ImmutableProxyContainer")

    def __getattribute__(self, name):
        # enforce attribute lookup policy
        allowed_fields = getattribute(self, "_allowed_getattr_fields")
        if name in allowed_fields:
            return getattribute(self, name)
        raise AttributeError(
            "Cannot get attribute {} on an ImmutableProxyContainer".format(name))

    def __repr__(self):
        proxied = get_proxied(self)
        return "{}({})".format(type(self).__name__, repr(proxied))

    def __len__(self):
        # works for all currently supported subclasses
        return len(get_proxied(self))

    def __hash__(self):
        # will error out if proxied object is unhashable
        proxied = getattribute(self, "_proxied")
        return hash(proxied)

    def __eq__(self, other):
        proxied = get_proxied(self)
        if isinstance(other, ImmutableProxyContainer):
            other = get_proxied(other)
        return proxied == other

class ImmutableProxySequence(ImmutableProxyContainer, _col.Sequence):
    _allowed_getattr_fields = ("count", "index")

    def __getitem__(self, index):
        proxied = get_proxied(self)
        value = proxied[index]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

class ImmutableProxyMapping(ImmutableProxyContainer, _col.Mapping):
    _allowed_getattr_fields = ("get", "keys", "values", "items")

    def __getitem__(self, key):
        proxied = get_proxied(self)
        value = proxied[key]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

    def __iter__(self):
        proxied = get_proxied(self)
        for key in proxied:
            if not isinstance(key, immutable_scalar_types):
                # If mutable keys are used, returning them could be dangerous.
                # If owner never puts a mutable key in, then integrity should
                # be okay. tuples and frozensets should be okay as keys, but
                # are not supported in this implementation for simplicity.
                raise NotImplementedError(
                    "keys of type {} not supported in "
                    "ImmutableProxyMapping".format(type(key)))
            yield key

class ImmutableProxySet(ImmutableProxyContainer, _col.Set):
    _allowed_getattr_fields = ("isdisjoint", "_from_iterable")

    def __contains__(self, value):
        return value in get_proxied(self)

    def __iter__(self):
        proxied = get_proxied(self)
        for value in proxied:
            if isinstance(value, immutable_safe):
                yield value
            else:
                yield immutable_proxy_for(value)

    @classmethod
    def _from_iterable(cls, it):
        return set(it)
NOTE: this is only tested on Python 3.4, but I tried to write it to be compatible with both Python 2 and 3.
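To illustrate the caveat above, a short sketch showing how the same low-level hook the proxies use internally also bypasses them (locked is a hypothetical example object):

locked = ImmutableProxyMapping({"volume": 11})
# locked["volume"] = 0    # TypeError: no item assignment on a Mapping
# locked._proxied         # AttributeError: the lookup policy blocks it
underlying = object.__getattribute__(locked, "_proxied")
underlying["volume"] = 0  # full read/write access to the real dict again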
Make the root of the shared resources a dictionary. Give an ImmutableProxyMapping of that dictionary to the plugins.
private_shared_root = {}
public_shared_root = ImmutableProxyMapping(private_shared_root)
Create an API where the plugins can register new resources to the public_shared_root, probably on a first-come-first-served basis (if it's already there, you can't register it). Pre-populate private_shared_root with any containers you know you're going to need, or any data you want to share with all plugins but you know you want to be read-only.
It might be convenient if the convention for the keys in the shared root mapping were all strings, like file-system paths (/home/dalen/local/python) or dotted paths like python library objects (os.path.expanduser). That way collision detection is immediate and trivial/obvious if plugins try to add the same resource to the pool.
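For example, a minimal sketch of such a first-come-first-served registration function; register_resource is a hypothetical name, not part of the code above:

def register_resource(key, value):
    # First-come-first-served: the host keeps write access to
    # private_shared_root, while plugins only ever see public_shared_root.
    if key in private_shared_root:
        raise KeyError("resource %r is already registered" % (key,))
    private_shared_root[key] = value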

python: Organizing object model of an application

I have the following problem: my application takes different archive files, e.g. rar, zip, 7z, and I have different processors to extract and save them locally.
Right now everything looks this way:
if extension == 'zip':
    archive = zipfile.ZipFile(file_contents)
    file_name = archive.namelist()[0]
    file_contents = ContentFile(archive.read(file_name))
elif extension == '7z':
    archive = py7zlib.Archive7z(file_contents)
    file_name = archive.getnames()[0]
    file_contents = ContentFile(
        archive.getmember(file_name).read())
elif extension == '...':
And I want to switch to more object oriented approach, with one main Processor class and subclasses responsible for specific archives.
E.g. I was thinking about:
class Processor(object):
    def __init__(self, filename, contents):
        self.filename = filename
        self.contents = contents

    def get_extension(self):
        return self.filename.split(".")[-1]

    def process(self):
        raise NotImplementedError("Need to implement something here")

class ZipProcessor(Processor):
    def process(self):
        archive = zipfile.ZipFile(self.contents)
        file_name = archive.namelist()[0]
        file_contents = ContentFile(archive.read(file_name))
        # etc.
But I am not sure that's the correct way. E.g. I can't come up with a way to call the needed processor based on the file extension if I follow this approach.
A rule of thumb is that if you have a class with two methods, one of which is __init__(), then it's not a class but a function in disguise.
Writing classes is overkill in this case, because you still have to use the correct class manually.
Since the handling of all kinds of archives will be subtly different, wrap each in a function;
def handle_zip(name):
    print name, 'is a zip file'
    return 'zip'

def handle_7z(name):
    print name, 'is a 7z file'
    return '7z'
Et cetera. Since functions are first-class objects in Python, you can use a dictionary using the extension as a key for calling the right function;
import os.path

filename = 'foo.zip'
dispatch = {'.zip': handle_zip, '.7z': handle_7z}

_, extension = os.path.splitext(filename)
try:
    rv = dispatch[extension](filename)
except KeyError:
    print 'Unknown extension', extension
    rv = None
It is important to handle the KeyError here, since dispatch doesn't contain all possible extensions.
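An equivalent sketch (not from the answer above) uses dict.get with a fallback handler instead of catching the exception:

def handle_unknown(name):
    # Fallback for extensions missing from the dispatch table
    print 'No handler for', name
    return None

rv = dispatch.get(extension, handle_unknown)(filename)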
An idea that might make sense before (or instead of) writing a custom class to perform your operations generally is making sure you offer a consistent interface to archives - wrapping zipfile.ZipFile and py7zlib.Archive7z into classes with, for example, a getfilenames method.
This approach ensures that you don't repeat yourself, without needing to "hide" your operations in a class if you don't want to.
You may want to use an abc (abstract base class) as a base class, to make things extra clear.
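For instance, a rough sketch of such wrapper classes; the names match the dictionary below, and the py7zlib calls follow the question's code:

import zipfile
import py7zlib

class MyZipExtractor(object):
    def __init__(self, file_contents):
        self._archive = zipfile.ZipFile(file_contents)
    def getfilenames(self):
        return self._archive.namelist()
    def read(self, name):
        return self._archive.read(name)

class My7zExtractor(object):
    def __init__(self, file_contents):
        self._archive = py7zlib.Archive7z(file_contents)
    def getfilenames(self):
        return self._archive.getnames()
    def read(self, name):
        return self._archive.getmember(name).read()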
Then, you can simply:
archive_extractors = {'zip': MyZipExtractor, '7z': My7zExtractor}
extractor = archive_extractors[extension](file_contents)
file_name = extractor.getfilenames()[0]
#...
If you want to stick to OOP, you could give Processor a static method to decide if a class can handle a certain file, and implement it in every subclass. Then, if you need to unpack a file, use the base class's __subclasses__() method to iterate over the subclasses and create an instance of the appropriate one:
class Processor(object):
    @staticmethod
    def is_appropriate_for(name):
        raise NotImplementedError()

    def process(self, name):
        raise NotImplementedError()

class ZipProcessor(Processor):
    @staticmethod
    def is_appropriate_for(name):
        return name[-4:] == ".zip"

    def process(self, name):
        print ".. handling ", name

name = "test.zip"
handler = None
for cls in Processor.__subclasses__():
    if cls.is_appropriate_for(name):
        handler = cls()
print name, "handled by", handler
