I have the following problem: my application receives arbitrary archive files (e.g. rar, zip, 7z), and I have different processors to extract and save them locally.
Right now everything looks like this:
if extension == 'zip':
    archive = zipfile.ZipFile(file_contents)
    file_name = archive.namelist()[0]
    file_contents = ContentFile(archive.read(file_name))
elif extension == '7z':
    archive = py7zlib.Archive7z(file_contents)
    file_name = archive.getnames()[0]
    file_contents = ContentFile(
        archive.getmember(file_name).read())
elif extension == '...':
I want to switch to a more object-oriented approach, with one main Processor class and subclasses responsible for the specific archive types.
For example, I was thinking about:
class Processor(object):
    def __init__(self, filename, contents):
        self.filename = filename
        self.contents = contents

    def get_extension(self):
        return self.filename.split(".")[-1]

    def process(self):
        raise NotImplementedError("Need to implement something here")

class ZipProcessor(Processor):
    def process(self):
        archive = zipfile.ZipFile(self.contents)
        file_name = archive.namelist()[0]
        file_contents = ContentFile(archive.read(file_name))
etc
But I am not sure that's the correct way; for example, I can't come up with a way to call the needed processor based on the file extension when following this approach.
A rule of thumb is that if you have a class with two methods, one of which is __init__(), then it's not a class but a function in disguise.
Writing classes is overkill in this case, because you still have to use the correct class manually.
Since the handling of each kind of archive will be subtly different, wrap each in a function:
def handle_zip(name):
    print(name, 'is a zip file')
    return 'zip'

def handle_7z(name):
    print(name, 'is a 7z file')
    return '7z'
Et cetera. Since functions are first-class objects in Python, you can use a dictionary with the extension as the key to dispatch to the right function:
import os.path

filename = 'foo.zip'
dispatch = {'.zip': handle_zip, '.7z': handle_7z}

_, extension = os.path.splitext(filename)
try:
    rv = dispatch[extension](filename)
except KeyError:
    print('Unknown extension', extension)
    rv = None
It is important to handle the KeyError here, since dispatch doesn't contain all possible extensions.
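If you prefer to avoid the try/except, a dict.get fallback works too; a minimal sketch, assuming the same dispatch table and handlers as above (handle_unknown is an invented name):

def handle_unknown(name):
    # Fallback used when no handler is registered for the extension.
    print('No handler for', name)
    return None

rv = dispatch.get(extension, handle_unknown)(filename)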
An idea that might make sense before (or instead of) writing a custom class to perform your operations generally is making sure you offer a consistent interface to archives, wrapping zipfile.ZipFile and py7zlib.Archive7z into classes with, for example, a getfilenames method.
This ensures that you don't repeat yourself, without needing to "hide" your operations in a class if you don't want to.
You may want to use an abc (abstract base class) as the base, to make things extra clear.
Then, you can simply:
archive_extractors = {'zip': MyZipExtractor, '7z': My7zExtractor}
extractor = archive_extractors[extension](file_contents)  # instantiate the wrapper
file_name = extractor.getfilenames()[0]
#...
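For illustration, here is a minimal Python 3 sketch of what such a consistent wrapper interface could look like; the names (ArchiveWrapper, MyZipExtractor, getfilenames) are placeholders, not an established API:

import abc
import zipfile

class ArchiveWrapper(abc.ABC):
    """Common read-only interface over different archive libraries."""

    @abc.abstractmethod
    def getfilenames(self):
        """Return the list of member names in the archive."""

class MyZipExtractor(ArchiveWrapper):
    def __init__(self, file_contents):
        self._archive = zipfile.ZipFile(file_contents)

    def getfilenames(self):
        return self._archive.namelist()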
If you want to stick to OOP, you could give Processor a static method that decides whether a class can handle a certain file, and implement it in every subclass. Then, if you need to unpack a file, use the base class's __subclasses__() method to iterate over the subclasses and create an instance of the appropriate one:
class Processor(object):
    @staticmethod
    def is_appropriate_for(name):
        raise NotImplementedError()

    def process(self, name):
        raise NotImplementedError()

class ZipProcessor(Processor):
    @staticmethod
    def is_appropriate_for(name):
        return name.endswith(".zip")

    def process(self, name):
        print(".. handling", name)

name = "test.zip"
handler = None
for cls in Processor.__subclasses__():
    if cls.is_appropriate_for(name):
        handler = cls()

print(name, "handled by", handler)
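Note that handler stays None when no subclass claims the file; depending on your needs you may prefer to fail loudly. A small sketch of that variant using for/else:

for cls in Processor.__subclasses__():
    if cls.is_appropriate_for(name):
        handler = cls()
        break
else:  # no break: no subclass matched
    raise ValueError("No processor available for {!r}".format(name))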
I'm building an interactive file explorer inside the Python console: when I pass in a path, I get an object; then, with a dot (.), auto-complete starts suggesting the contents of the path; I do that again to get to the contents of a subfolder, and so on until I reach a file, which returns its path.
I have achieved my goal, except for this little nagging thing: I wanted a __repr__ method, but it never worked.
Here's my code:
import os
from glob import glob

path = r'C:\Users\eng_a\Downloads'

def browse(path):
    my_dict = {'_path': path}
    tmp = os.listdir(path)
    key_contents = []
    for akey in tmp:
        key_contents.append(akey.replace(".", "_").replace(" ", "_").replace("-", "_"))
    val_paths = glob(path + '//*')
    for akey, avalue in zip(key_contents, val_paths):
        if os.path.isfile(avalue):
            my_dict[akey] = avalue
        else:
            my_dict[akey] = browse(avalue)

    def func(self):
        return self._path

    my_dict["__repr__"] = func
    my_dict["__str__"] = func
    obj = type(os.path.basename(path), (), dict(zip(my_dict.keys(), my_dict.values())))
    return obj
>>> b = browse(path)
>>> b
Unfortunately it keeps printing the default class representation, <class '__main__.Downloads'>, instead.
As noted in the comments, obj is a class, not an instance. It contains a function __repr__ that will be bound to an instance as soon as you create it.
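You can see this by instantiating the returned class; a quick demonstration with the original browse left unchanged:

>>> b = browse(path)   # b is a class object
>>> b()                # instantiating it binds __repr__, so this prints the path
C:\Users\eng_a\Downloads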
A simple and elegant solution to this would be to replace the function browse with a class of the same name. Calling a class creates an instance (unless you really mess with metaclasses or __new__), so the interface you have now would not have to change. Internally, however, you would instantiate your class for every directory that you delved into.
Another thing that this would allow you to do is to have a truly dynamic solution. Right now you actually recurse into all the children of your root. This can be very expensive in both time and memory. Ideally, you would only want to list the current directory, and recurse into children only when asked to.
from os import listdir
from os.path import isdir, join
import re

class browse:
    def __init__(self, path, directory=True):
        # Create an attribute in __dict__ for each child
        self.__path__ = path
        if directory:
            for file in listdir(path):
                full = join(path, file)
                key = re.sub(r'^(?=\d)|\W', '_', file)
                setattr(self, key, full if isdir(full) else browse(full, False))

    def __getattribute__(self, name):
        if name == '__path__':
            return super().__getattribute__(name)
        d = super().__getattribute__('__dict__')
        if name in d:
            child = d[name]
            if isinstance(child, str):
                child = browse(child)
                setattr(self, name, child)
            return child
        return super().__getattribute__(name)

    def __repr__(self):
        return self.__path__

    def __str__(self):
        return self.__path__
This solution adds an attribute for each entry in the root path. Files are recorded as browse objects, while directories are recorded as strings. Overriding __getattribute__ allows you to swap the strings you request for full browse objects on the fly, instead of having to expand all your folders up front.
A possible improvement, given the intended use case, would be to remove the line setattr(self, name, child). This way, you would not retain unnecessary references to directories that you accidentally browsed into, for example.
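A hypothetical interactive session with the class above (the folder and file names are invented for illustration):

>>> b = browse(r'C:\Users\eng_a')
>>> b.Downloads            # a directory: the stored string is swapped for a browse object
C:\Users\eng_a\Downloads
>>> b.Downloads.notes_txt  # a file: stored as a browse object up front
C:\Users\eng_a\Downloads\notes.txt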
I am writing a module that converts between several different file formats (e.g. VHDL to Verilog, Excel tables to VHDL, etc.). It's not so hard, but there is a lot of language-specific formatting to do. It occurred to me that an elegant way to do this would be to have a class for each file format type, built on Python's file I/O. The class would inherit the methods of a file object but add the ability to read or write syntax specific to that format. I could not find any examples of a file I/O superclass and how to write one. My idea was that to instantiate it (open the file) I could use:
my_lib_file = Libfile(filename, 'w')
and to write a simple parameter to the libfile I could use something like
my_lib_file.simple_parameter(param, value)
Such a class would tie together the many file-specific functions I currently have in a neat way. Actually, I would prefer to be able to instantiate the class as part of a with statement, e.g.:
with Libfile(filename, 'w') as my_lib_file:
    for param, value in my_stuff.items():
        my_lib_file.simple_parameter(param, value)
This is the wrong way to think about it.
You inherit in order to be reused: the base class provides an interface which others can use. For file-like objects that's mainly read and write. But you only want callers to use another function, simple_parameter; calling write directly could mess up the format.
Really you don't want it to be a file-like object. You want to write to a file when the user calls simple_parameter. The implementation should delegate to a member file-like object instead, e.g.:
class LibFile:
    def __init__(self, file):
        self.file = file

    def simple_parameter(self, param, value):
        self.file.write('{}: {}\n'.format(param, value))
This is easy to test as you could pass in anything that supports write:
>>> import sys
>>> lib = LibFile(sys.stdout)
>>> lib.simple_parameter('name', 'Stephen')
name: Stephen
edit:
If you really want the class to manage the lifetime of the file you can provide a close function and use the closing context manager:
class Formatter:
    def __init__(self, filename, mode):
        self.file = open(filename, mode)

    def close(self):
        self.file.close()
Usage:
class LibFormatter(Formatter):
    def simple_parameter(self, param, value):
        self.file.write('{}: {}\n'.format(param, value))

from contextlib import closing

with closing(LibFormatter('library.txt', 'w')) as lib:
    ... # etc
2nd edit:
If you don't want to use closing, you can write your own context manager:
class ManagedFile:
    def __init__(self, filename, mode):
        self.file = open(filename, mode)

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    def close(self):
        self.file.close()
Usage:
class LibFormatter(ManagedFile):
    def simple_parameter(self, param, value):
        self.file.write('{}: {}\n'.format(param, value))

with LibFormatter('library.txt', 'w') as lib:
    ... # etc
My two-line solution is as follows:
with open(lib_loc + '\\' + lib_name + '.lib', 'w') as lib_file_handle:
    lib_file = Liberty(lib_file_handle)
    # do stuff using lib_file
The class initialization is as follows:
def __init__(self, file):
    ''' associate this instance with the given file handle '''
    self.f = file
Now, instead of passing the raw file handle, I pass the class instance, with its functions attached, to my functions.
The simplest function is:
def wr(self, line):
    ''' write line given to file '''
    self.f.write(line + '\n')
This means I am replicating the write function built into Python's file objects, which is what I was trying to avoid.
I have now found a satisfactory way of doing what I wanted. The following is my base class, which is built on the basic file I/O functions (but is not a subclass of the file type), and a simple example for writing CSV files. I also have Formatters for HTML, Verilog and others. Code is:
class Formatter():
    ''' base class to manage the opening of a file in order to build classes which write a file
        with a specific format so as to be able to pass the formatting functions to a subroutine
        along with the file handle

        Designed for use with the "with" statement and to shorten argument lists of functions
        which use the file
    '''
    def __init__(self, filename):
        ''' associate this instance with the given file handle
        '''
        self.f = open(filename, 'w')

    def wr(self, line, end='\n'):
        ''' write line given to file
        '''
        self.f.write(line + end)

    def wr_newline(self, num):
        ''' write num newlines to file
        '''
        self.f.write('\n'*num)

    def __enter__(self):
        ''' needed for use with with statement
        '''
        return self

    def __exit__(self, *args):
        ''' needed for use with with statement
        '''
        self.close()

    def close(self):
        ''' explicit file close for use with procedural programming style
        '''
        self.f.close()

class CSV(Formatter):
    ''' class to write items using comma-separated file format string formatting

        inherits:
        => wr(line, end='\n'):
        => wr_newline(n):
        all file io functions via the self.f variable
    '''
    @staticmethod
    def pp(item):
        ''' pretty-print: can't have , or \n in strings used in CSV files
        '''
        return str(item).replace('\n', '/').replace(',', '-')

    def __init__(self, filename):
        ''' open the file given as a CSV file
        '''
        super().__init__(filename + '.csv')

    def wp(self, item):
        ''' write a single item to the file
        '''
        self.f.write(self.pp(item) + ', ')

    def ws(self, itemlist):
        ''' write a csv list from a list variable
        '''
        self.wr(','.join([self.pp(item) for item in itemlist]))
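A short usage sketch of the CSV formatter above (the file name and data are made up):

rows = [['name', 'count'], ['widget, large', 3], ['gadget\nv2', 7]]

with CSV('inventory') as out:   # creates inventory.csv
    for row in rows:
        out.ws(row)             # pp() sanitizes commas/newlines inside items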
I have multiple scripts that export the same interface, and they're executed using execfile() in an insulated scope.
The thing is, I want them to share some resources so that each new script doesn't have to load them again from the start, losing startup speed and using an unnecessary amount of RAM.
The scripts are in reality much better encapsulated and guarded from malicious plug-ins than presented in the example below; that's where the problems begin for me.
The thing is, I want the script that creates a resource to be able to fill it with data, remove data, or remove the resource, and of course access its data.
But other scripts shouldn't be able to change another script's resources, just read them. I want to be sure that newly installed plug-ins cannot interfere with already loaded and running ones via abuse of shared resources.
Example:
class SharedResources:
    # Here should be a shared resource manager that I tried to write
    # but got stuck. That's why I ask this long and convoluted question!
    # Some beginning:
    def __init__ (self, owner):
        self.owner = owner

    def __call__ (self):
        # Here we should return some object that will do
        # required stuff. Read more for details.
        pass

class plugin (dict):
    def __init__ (self, filename):
        dict.__init__(self)
        # Here some checks and filling with secure versions of __builtins__ etc.
        # ...
        self["__name__"] = "__main__"
        self["__file__"] = filename
        # Add a shared resources manager to this plugin
        self["SharedResources"] = SharedResources(filename)
        # And then:
        execfile(filename, self, self)

    # Expose the plug-in interface to the outside world:
    def __getattr__ (self, a):
        return self[a]
    def __setattr__ (self, a, v):
        self[a] = v
    def __delattr__ (self, a):
        del self[a]
    # Note: I didn't use self.__dict__ because this makes encapsulation easier.
    # In future I won't use the object itself at all but a separate dict to do it. For now let it be.
----------------------------------------
# An example of two scripts that would use a shared resource and be run with
# plugins["name"] = plugin("<filename>"):
# The presented code is the same in both scripts; what comes after will be different.

def loadSomeResource ():
    # Do it here...
    return loadedresource

# Load this resource if it's not already in shared resources; if it isn't, add it:
shr = SharedResources() # This would be an instance allowing access to shared resources
if not shr.has_key("Default Resources"):
    shr.create("Default Resources")
if not shr["Default Resources"].has_key("SomeResource"):
    shr["Default Resources"].add("SomeResource", loadSomeResource())
resource = shr["Default Resources"]["SomeResource"]
# And then we use the resource variable normally; it can be any object.
# Here I used the category "Default Resources" to add and/or retrieve a resource named "SomeResource".
# I want more categories so that plugins that deal with audio aren't mixed with
# plug-ins that deal with video, for instance. But this is not strictly needed.

# Here comes code specific to each plug-in that will use the shared resource
# named "SomeResource" from the category "Default Resources".
...
# And end of plugin script!
----------------------------------------
# And then, in the main program, we load the plug-ins:
import os

plugins = {} # Here we store all loaded plugins
for x in os.listdir("plugins"):
    plugins[x] = plugin(os.path.join("plugins", x))  # full path so execfile() can find it
Let's say that our two scripts are stored in the plugins directory and both use some WAVE files loaded into memory.
The plugin that loads first will load the WAVE and put it into RAM.
The other plugin will be able to access the already loaded WAVE, but not to replace or delete it, and thus cannot mess with the first plugin.
Now, I want each resource to have an owner (some id or the filename of the plugin script), and for each resource to be writable only by its owner.
No tweaking or workarounds should enable another plugin to write to the first one's resources.
I almost did it and then got stuck, and my head is spinning with concepts that, when implemented, do the thing, but only partially.
This eats at me, so I cannot concentrate any more. Any suggestion is more than welcome!
Adding:
This is what I use now without any safety included:
# Dict that will hold a category of resources (should implement some security):
class ResourceCategory (dict):
    def __getattr__ (self, i): return self[i]
    def __setattr__ (self, i, v): self[i] = v
    def __delattr__ (self, i): del self[i]

SharedResources = {} # Resource pool

class ResourceManager:
    def __init__ (self, owner):
        self.owner = owner

    def add (self, category, name, value):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        SharedResources[category][name] = value

    def get (self, category, name):
        return SharedResources[category][name]

    def rem (self, category, name=None):
        if name==None: del SharedResources[category]
        else: del SharedResources[category][name]

    def __call__ (self, category):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        return SharedResources[category]

    __getattr__ = __getitem__ = __call__
    # When securing, this must not be left as is; it is insecure and can provide
    # a way back to the SharedResources pool:
    has_category = has_key = SharedResources.has_key
Now a plugin capsule:
class plugin(dict):
    def __init__ (self, path, owner):
        dict.__init__(self)
        self["__name__"] = "__main__"
        # etc. etc.
        # And when adding the resource manager to the plugin, register this plugin as its owner:
        self["SharedResources"] = ResourceManager(owner)
        # ...
        execfile(path, self, self)
        # ...
Example of a plugin script:
#-----------------------------------
# Get the category we want (using __call__()). Note: if a category doesn't exist, it is created automatically.
AudioResource = SharedResources("Audio")

# Use an MP3 resource (let's say a bytestring):
if not AudioResource.has_key("Beep"):
    f = open("./sounds/beep.mp3", "rb")
    AudioResource.Beep = f.read()
    f.close()

# Take a reference out for fast access and a nicer look:
beep = AudioResource.Beep
# BTW, immutables don't propagate as references by themselves, do they?
# A copy will be returned, so RAM usage will increase instead; immutables
# shall be wrapped in a composite data type.
This works perfectly, but as I said, messing with resources is far too easy here.
I would like an instance of ResourceManager() to be in charge of deciding to whom to return which version of the stored data.
So, my general approach would be this.
Have a central shared resource pool. Access through this pool would be read-only for everybody. Wrap all data in the shared pool so that no one "playing by the rules" can edit anything in it.
Each agent (plugin) maintains knowledge of what it "owns" at the time it loads it. It keeps a read/write reference for itself, and registers a reference to the resource in the centralized read-only pool.
When a plugin is loaded, it gets a reference to the central, read-only pool that it can register new resources with.
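As a rough sketch of that ownership rule (names like SharedPool and register are invented for illustration, separate from the proxy code below):

class SharedPool:
    '''Central pool: anyone can read, but only a resource's owner may write.'''
    def __init__(self):
        self._data = {}    # key -> value
        self._owners = {}  # key -> owner id (e.g. plugin filename)

    def register(self, owner, key, value):
        if key in self._owners and self._owners[key] != owner:
            raise ValueError('%r is owned by %r' % (key, self._owners[key]))
        self._data[key] = value
        self._owners[key] = owner

    def get(self, key):
        return self._data[key]

    def remove(self, owner, key):
        if self._owners.get(key) != owner:
            raise ValueError('only the owner may remove %r' % key)
        del self._data[key]
        del self._owners[key]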
So, only addressing the issue of python native data structures (and not instances of custom classes), a fairly locked down system of read-only implementations is as follows. Note that the tricks that are used to lock them down are the same tricks that someone could use to get around the locks, so the sandboxing is very weak if someone with a little python knowledge is actively trying to break it.
try:
    import collections.abc as _col  # Python 3.3+ (ABCs moved out of collections in 3.10)
except ImportError:
    import collections as _col      # Python 2
import sys

if sys.version_info >= (3, 0):
    immutable_scalar_types = (bytes, complex, float, int, str)
else:
    immutable_scalar_types = (basestring, complex, float, int, long)

# calling this will circumvent any control an object has on its own attribute lookup
getattribute = object.__getattribute__

# types that will be safe to return without wrapping them in a proxy
immutable_safe = immutable_scalar_types

def add_immutable_safe(cls):
    # decorator for adding a new class to the immutable_safe collection
    # Note: only ImmutableProxyContainer uses it in this initial
    # implementation
    global immutable_safe
    immutable_safe += (cls,)
    return cls

def get_proxied(proxy):
    # circumvent normal object attribute lookup
    return getattribute(proxy, "_proxied")

def set_proxied(proxy, proxied):
    # circumvent normal object attribute setting
    object.__setattr__(proxy, "_proxied", proxied)

def immutable_proxy_for(value):
    # Proxy for known container types, reject all others
    if isinstance(value, _col.Sequence):
        return ImmutableProxySequence(value)
    elif isinstance(value, _col.Mapping):
        return ImmutableProxyMapping(value)
    elif isinstance(value, _col.Set):
        return ImmutableProxySet(value)
    else:
        raise NotImplementedError(
            "Return type {} from an ImmutableProxyContainer not supported".format(
                type(value)))

@add_immutable_safe
class ImmutableProxyContainer(object):
    # the only names that are allowed to be looked up on an instance through
    # normal attribute lookup
    _allowed_getattr_fields = ()

    def __init__(self, proxied):
        set_proxied(self, proxied)

    def __setattr__(self, name, value):
        # never allow attribute setting through normal mechanism
        raise AttributeError(
            "Cannot set attributes on an ImmutableProxyContainer")

    def __getattribute__(self, name):
        # enforce attribute lookup policy
        allowed_fields = getattribute(self, "_allowed_getattr_fields")
        if name in allowed_fields:
            return getattribute(self, name)
        raise AttributeError(
            "Cannot get attribute {} on an ImmutableProxyContainer".format(name))

    def __repr__(self):
        proxied = get_proxied(self)
        return "{}({})".format(type(self).__name__, repr(proxied))

    def __len__(self):
        # works for all currently supported subclasses
        return len(get_proxied(self))

    def __hash__(self):
        # will error out if proxied object is unhashable
        proxied = getattribute(self, "_proxied")
        return hash(proxied)

    def __eq__(self, other):
        proxied = get_proxied(self)
        if isinstance(other, ImmutableProxyContainer):
            other = get_proxied(other)
        return proxied == other

class ImmutableProxySequence(ImmutableProxyContainer, _col.Sequence):
    _allowed_getattr_fields = ("count", "index")

    def __getitem__(self, index):
        proxied = get_proxied(self)
        value = proxied[index]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

class ImmutableProxyMapping(ImmutableProxyContainer, _col.Mapping):
    _allowed_getattr_fields = ("get", "keys", "values", "items")

    def __getitem__(self, key):
        proxied = get_proxied(self)
        value = proxied[key]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

    def __iter__(self):
        proxied = get_proxied(self)
        for key in proxied:
            if not isinstance(key, immutable_scalar_types):
                # If mutable keys are used, returning them could be dangerous.
                # If the owner never puts a mutable key in, then integrity should
                # be okay. tuples and frozensets should be okay as keys, but
                # are not supported in this implementation for simplicity.
                raise NotImplementedError(
                    "keys of type {} not supported in "
                    "ImmutableProxyMapping".format(type(key)))
            yield key

class ImmutableProxySet(ImmutableProxyContainer, _col.Set):
    _allowed_getattr_fields = ("isdisjoint", "_from_iterable")

    def __contains__(self, value):
        return value in get_proxied(self)

    def __iter__(self):
        proxied = get_proxied(self)
        for value in proxied:
            if isinstance(value, immutable_safe):
                yield value
            else:
                yield immutable_proxy_for(value)

    @classmethod
    def _from_iterable(cls, it):
        return set(it)
NOTE: this is only tested on Python 3.4, but I tried to write it to be compatible with both Python 2 and 3.
Make the root of the shared resources a dictionary. Give an ImmutableProxyMapping of that dictionary to the plugins.
private_shared_root = {}
public_shared_root = ImmutableProxyMapping(private_shared_root)
Create an API where the plugins can register new resources to the public_shared_root, probably on a first-come-first-served basis (if it's already there, you can't register it). Pre-populate private_shared_root with any containers you know you're going to need, or any data you want to share with all plugins but you know you want to be read-only.
It might be convenient if the convention for the keys in the shared root mapping were all strings, like file-system paths (/home/dalen/local/python) or dotted paths like python library objects (os.path.expanduser). That way collision detection is immediate and trivial/obvious if plugins try to add the same resource to the pool.
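A small sketch of such a first-come-first-served registration API (register_resource is an invented name, building on private_shared_root above):

def register_resource(key, value):
    '''Add a resource to the shared root; first registration wins.'''
    if not isinstance(key, str):
        raise TypeError('resource keys must be strings')
    if key in private_shared_root:
        raise KeyError('resource %r is already registered' % key)
    private_shared_root[key] = value

# Plugins receive public_shared_root (read-only) plus this function.
register_resource('audio/beep', b'...mp3 bytes...')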
I'm creating some classes for dealing with filenames in various types of file shares (NFS, AFP, S3, local disk, etc.). I get as user input a string that identifies the data source (e.g. "nfs://192.168.1.3" or "s3://mybucket/data").
I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:
import os

class FileSystem(object):

    class NoAccess(Exception):
        pass

    def __new__(cls, path):
        if cls is FileSystem:
            if path.upper().startswith('NFS://'):
                return super(FileSystem, cls).__new__(Nfs)
            else:
                return super(FileSystem, cls).__new__(LocalDrive)
        else:
            # Note: object.__new__() accepts no extra arguments in Python 3.
            return super(FileSystem, cls).__new__(cls)

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    def __init__(self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')

print(type(data1))
print(type(data2))
print(data2.count_files())
I thought this would be a good use of __new__, but most posts I read about its use discourage it. Is there a more accepted way to approach this problem?
I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".
If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (however, see the Python 3.6+ Update below). Given the choices available, making the __new__() method the factory (it's a static method by default) is a perfectly sensible approach.
That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important improvement: adding subclasses no longer requires modifying the __new__() method. This means it's now easily extensible, since it effectively supports what you could call virtual constructors.
A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method, so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function, regardless of what name it's given.
# Works in Python 2 and 3.
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')

    @classmethod
    def _get_all_subclasses(cls):
        """ Recursive generator of all class' subclasses. """
        for subclass in cls.__subclasses__():
            yield subclass
            for subclass in subclass._get_all_subclasses():
                yield subclass

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass using path prefix. """
        path_prefix = cls._get_prefix(path)

        for subclass in cls._get_all_subclasses():
            if subclass.prefix == path_prefix:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
        else:  # No subclass with matching prefix found (& no default defined)
            raise FileSystem.Unknown(
                'path "{}" has no known file system prefix'.format(path))

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    prefix = 'nfs'
    def __init__ (self, path):
        pass
    def count_files(self):
        pass

class LocalDrive(FileSystem):
    prefix = None  # Default when no file system prefix is found.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path
    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                       for filename in os.listdir(self.path))

if __name__ == '__main__':

    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.

    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive
    print(data2.count_files())   # -> <some number>
Python 3.6+ Update
The code above works in both Python 2 and 3.x. However, in Python 3.6 a new class method named __init_subclass__() was added to object, which makes finding subclasses simpler: it can be used to automatically create a "registry" of them instead of potentially having to check every subclass recursively, as the _get_all_subclasses() method does above.
I got the idea of using __init_subclass__() this way from the Subclass registration section of PEP 487 -- Simpler customisation of class creation. Since the method is inherited by all the base class' subclasses, registration automatically happens for sub-subclasses, too (as opposed to only direct subclasses), which completely eliminates the need for a method like _get_all_subclasses().
# Requires Python 3.6+ (3.8+ as written, because of the positional-only "/" marker).
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Pattern for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    @classmethod
    def __init_subclass__(cls, /, path_prefix, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = cls._registry.get(path_prefix)
        if subclass:
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise cls.Unknown(
                f'path "{path}" has no known file system prefix')

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem, path_prefix='nfs'):
    def __init__ (self, path):
        pass
    def count_files(self):
        pass

class Ufs(Nfs, path_prefix='ufs'):
    def __init__ (self, path):
        pass
    def count_files(self):
        pass

class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise self.NoAccess(f'Cannot read directory {path!r}')
        self.path = path
    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                       for filename in os.listdir(self.path))

if __name__ == '__main__':

    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.
    data4 = FileSystem('ufs://192.168.1.18')

    print(type(data1))  # -> <class '__main__.Nfs'>
    print(type(data2))  # -> <class '__main__.LocalDrive'>
    print(f'file count: {data2.count_files()}')  # -> file count: <some number>

    try:
        data3 = FileSystem('c:/foobar')  # A non-existent directory.
    except FileSystem.NoAccess as exc:
        print(f'{exc} - FileSystem.NoAccess exception raised as expected')
    else:
        raise RuntimeError("Non-existent path should have raised Exception!")

    try:
        data4 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(f'{exc} - FileSystem.Unknown exception raised as expected')
    else:
        raise RuntimeError("Unregistered path prefix should have raised Exception!")
In my opinion, using __new__ in such a way is really confusing for other people who might read your code. It also requires somewhat hackish code to distinguish guessing the file system from user input from creating Nfs and LocalDrive directly via their corresponding classes.
Why not make a separate function with this behaviour? It can even be a static method of FileSystem class:
class FileSystem(object):

    # other code ...

    @staticmethod
    def from_path(path):
        if path.upper().startswith('NFS://'):
            return Nfs(path)
        else:
            return LocalDrive(path)
And you call it like this:
data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
Edit [BLUF]: there is no problem with the answer provided by @martineau; this post is merely a follow-up for completeness, to discuss a potential error encountered when using additional keywords in a class definition that are not managed by the metaclass.
I'd like to supply some additional information on the use of __init_subclass__ in conjunction with using __new__ as a factory. The answer that @martineau has posted is very useful, and I have implemented an altered version of it in my own programs, as I prefer using the class creation sequence over adding a factory method to the namespace; very similar to how pathlib.Path is implemented.
To follow up on a comment trail with @martineau, I have taken the following snippet from his answer:
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    @classmethod
    def __init_subclass__(cls, /, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = FileSystem._registry.get(path_prefix)
        if subclass:
            # Using "object" base class method avoids recursion here.
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise FileSystem.Unknown(
                f'path "{path}" has no known file system prefix')

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem, path_prefix='nfs'):
    def __init__ (self, path):
        pass
    def count_files(self):
        pass

class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path
    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                       for filename in os.listdir(self.path))

if __name__ == '__main__':

    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.

    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive
    print(data2.count_files())   # -> <some number>

    try:
        data3 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(str(exc), '- raised as expected')
    else:
        raise RuntimeError(
            "Unregistered path prefix should have raised Exception!")
This answer, as written, works, but I wish to address a few items (potential pitfalls) that others may run into through inexperience, or because of codebase standards their team requires.
Firstly, for the decorator on __init_subclass__, per the PEP:
One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.
Not a problem, since it's already implied, and the Zen tells us "explicit is better than implicit"; nevertheless, when abiding by the PEP, there you go (and the rationale is further explained there).
In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here:
class Nfs(FileSystem, path_prefix='nfs'): ...
class LocalDrive(FileSystem, path_prefix=None): ...
When browsing through the PEP:
As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.
Why is this (potentially) problematic? Well, there are several questions (notably this) describing the problem surrounding additional keyword arguments in a class definition, the use of metaclasses (subsequently the metaclass= keyword), and the overridden __init_subclass__. However, that doesn't explain why it works in the currently given solution. The answer: kwargs.pop().
If we look at the following:
# code in CPython 3.7
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    def __init_subclass__(cls, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    ...

class Nfs(FileSystem, path_prefix='nfs'): ...
This will still run without issue, but if we remove the kwargs.pop():
def __init_subclass__(cls, **kwargs):
    super().__init_subclass__(**kwargs)  # throws TypeError
    cls._registry[path_prefix] = cls  # Add class to registry.
The error thrown is already known and described in the PEP:
In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out it's keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.
What is happening is that the path_prefix= keyword is being "popped" off of kwargs, not just accessed, so **kwargs is empty by the time it is passed up the MRO and is thus compliant with the default implementation (which receives no keyword arguments).
To avoid this entirely, I propose not relying on kwargs, but instead using what is already present in the call to __init_subclass__, namely the cls reference:
# code in CPython 3.7
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[cls._path_prefix] = cls  # Add class to registry.

    ...

class Nfs(FileSystem):
    _path_prefix = 'nfs'
    ...
Adding the prior keyword as a class attribute also extends its use to later methods, if one needs to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to keywords supplied in the class definition (without some complexity), and this seemed trivial and useful.
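For instance, a hypothetical helper method on the base class could refer back to the prefix the subclass was registered under:

def describe(self):
    # _path_prefix is the class attribute each subclass defines above.
    return f'{type(self).__name__} handles {self._path_prefix or "(default)"} paths'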
So, to @martineau: I apologize if my comments seemed confusing; there is only so much space to type them in, and as shown, the full explanation is more detailed.
I have a component that uses a simple pub/sub module I wrote as a message queue. I would like to try out other implementations like RabbitMQ. However, I want to make this backend change configurable so I can switch between my implementation and 3rd party modules for cleanliness and testing.
The obvious answer seems to be to:
Read a config file
Create a modifiable settings object/dict
Modify the target component to lazily load the specified implementation.
something like:
# component.py
from test.queues import Queue

class Component:
    def __init__(self, Queue=Queue):
        self.queue = Queue()

    def publish(self, message):
        self.queue.publish(message)

# queues.py
import test.settings as settings

def Queue(*args, **kwargs):
    klass = settings.get('queue')
    return klass(*args, **kwargs)
I'm not sure if __init__ should take in the Queue class; I figure it would make it easy to specify the queue used while testing.
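That injection point does make testing straightforward; a minimal sketch, with FakeQueue invented for illustration:

class FakeQueue:
    """Test double that records published messages instead of sending them."""
    def __init__(self):
        self.messages = []

    def publish(self, message):
        self.messages.append(message)

component = Component(Queue=FakeQueue)
component.publish('hello')
assert component.queue.messages == ['hello']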
Another thought I had was something like http://www.voidspace.org.uk/python/mock/patch.html, though that seems like it would get messy. The upside would be that I wouldn't have to modify the code to support swapping the component.
Any other ideas or anecdotes would be appreciated.
EDIT: Fixed indent.
One thing I've done before is to create a common class that each specific implementation inherits from. Then there's a spec that can easily be followed, and each implementation can avoid repeating certain code they'll all share.
This is a bad example, but you can see how you could make the saver object use any of the classes specified and the rest of your code wouldn't care.
class SaverTemplate(object):
    def __init__(self, name, obj):
        self.name = name
        self.obj = obj

    def save(self):
        raise NotImplementedError

import json

class JsonSaver(SaverTemplate):
    def save(self):
        file = open(self.name + '.json', 'w')
        json.dump(self.obj, file)
        file.close()

import pickle

class PickleSaver(SaverTemplate):
    def save(self):
        file = open(self.name + '.pickle', 'wb')
        pickle.dump(self.obj, file, protocol=pickle.HIGHEST_PROTOCOL)
        file.close()

import yaml

class YamlSaver(SaverTemplate):
    def save(self):
        file = open(self.name + '.yaml', 'w')
        yaml.dump(self.obj, file)
        file.close()
saver = PickleSaver('whatever', foo)
saver.save()
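Combining this with the dispatch-dict idea from earlier in the thread, a configurable backend choice could look like this sketch (the format names are arbitrary):

savers = {'json': JsonSaver, 'pickle': PickleSaver, 'yaml': YamlSaver}

def make_saver(fmt, name, obj):
    # Look up the configured implementation; KeyError means an unknown format.
    return savers[fmt](name, obj)

saver = make_saver('json', 'whatever', {'a': 1})
saver.save()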