How to do dependency injection the Python way?

I've been reading a lot about the Python way of doing things lately, so my question is:
How do you do dependency injection the Python way?
I'm talking about the usual scenario where, for example, service A needs access to a UserService for authorization checks.

It all depends on the situation. For example, if you use dependency injection for testing purposes -- so you can easily mock out something -- you can often forgo injection altogether: you can instead mock out the module or class you would otherwise inject:
subprocess.Popen = some_mock_Popen
result = subprocess.call(...)
assert some_mock_Popen.result == result
subprocess.call() will call subprocess.Popen(), and we can mock it out without having to inject the dependency in a special way. We can just replace subprocess.Popen directly. (This is just an example; in real life you would do this in a much more robust way.)
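For instance, a slightly more robust variant might use unittest.mock.patch as a context manager, so the original attribute is restored automatically; the command being run here is purely illustrative:
from unittest import mock
import subprocess

def test_call_invokes_popen():
    # mock.patch replaces subprocess.Popen for the duration of the block
    # and restores the real class afterwards, even if the test fails.
    with mock.patch("subprocess.Popen") as mock_popen:
        subprocess.call(["echo", "hello"])
    mock_popen.assert_called_once()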
If you use dependency injection for more complex situations, or when mocking whole modules or classes isn't appropriate (because, for example, you want to mock out only one particular call), then using class attributes or module globals for the dependencies is the usual choice. For example, consider a my_subprocess.py:
from subprocess import Popen
def my_call(...):
    return Popen(...).communicate()
You can easily replace only the Popen call made by my_call() by assigning to my_subprocess.Popen; it wouldn't affect any other calls to subprocess.Popen (but it would replace all calls to my_subprocess.Popen, of course.) Similarly, class attributes:
import subprocess

class MyClass(object):
    Popen = staticmethod(subprocess.Popen)

    def call(self):
        return self.Popen(...).communicate(...)
When using class attributes like this (which is rarely necessary, given the other options), take care to wrap the callable in staticmethod. If you don't, and the object you insert is a normal function or another kind of descriptor, such as a property -- something that does special things when retrieved from a class or instance -- it will do the wrong thing. Worse, if you use something that isn't a descriptor today (like the subprocess.Popen class in the example), it will work now, but if the object in question later changes to a normal function, it will break confusingly.
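For example, a test could swap the class attribute for a fake and restore it afterwards; FakePopen and the assertion are illustrative, assuming call() forwards to self.Popen as sketched above:
class FakePopen:
    """Illustrative stand-in that never spawns a real process."""
    def __init__(self, *args, **kwargs):
        self.args = args
    def communicate(self, *args, **kwargs):
        return (b"fake output", b"")

def test_my_class_call():
    original = MyClass.Popen
    MyClass.Popen = FakePopen        # a class, so no descriptor surprises
    try:
        assert MyClass().call() == (b"fake output", b"")
    finally:
        MyClass.Popen = original     # always restore the real Popen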
Lastly, there are plain callbacks: if you just want to tie a particular instance of a class to a particular service, you can pass the service (or one or more of the service's methods) to the class initializer and have it use that:
class MyClass(object):
    def __init__(self, authenticate=None, authorize=None):
        if authenticate is None:
            authenticate = default_authenticate
        if authorize is None:
            authorize = default_authorize
        self.authenticate = authenticate
        self.authorize = authorize

    def request(self, user, password, action):
        self.authenticate(user, password)
        self.authorize(user, action)
        self._do_request(action)

...
helper = AuthService(...)
# Pass bound methods to helper.authenticate and helper.authorize to MyClass.
inst = MyClass(authenticate=helper.authenticate, authorize=helper.authorize)
inst.request(...)
When setting instance attributes like that, you never have to worry about descriptors firing, so just assigning the functions (or classes or other callables or instances) is fine.

How about this "setter-only" injection recipe?
http://code.activestate.com/recipes/413268/
It is quite pythonic, using the descriptor protocol with __get__()/__set__(), but rather invasive: it requires you to replace all your attribute-setting code with a RequiredFeature instance initialized with the string name of the required Feature.

After years of using Python without any DI autowiring framework, and Java with Spring, I've come to realize that plain, simple Python code often doesn't need a framework for dependency injection with autowiring (autowiring is what Guice and Spring both do in Java); just doing something like this is enough:
class Foo:
    def __init__(self, dep=None):  # great for unit testing!
        self.dep = dep or Dep()    # callers don't have to care about this
        ...
This is pure dependency injection (quite simple), but without a magical framework automatically injecting the dependencies for you (i.e., autowiring) and without Inversion of Control.
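For illustration (StubDep and the attribute names are hypothetical), this is how the pattern plays out in a test versus production code:
class StubDep:
    """Hypothetical stand-in used only by the test."""
    def fetch(self):
        return "canned value"

def test_foo_uses_injected_dep():
    foo = Foo(dep=StubDep())           # the test injects the stub explicitly
    assert foo.dep.fetch() == "canned value"

# Production code keeps the zero-argument call and gets the real Dep():
foo = Foo()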
Though as I dealt with bigger applications this approach wasn't cutting it anymore, so I came up with injectable, a micro-framework that wouldn't feel unpythonic and yet would provide first-class dependency injection autowiring.
Under the motto Dependency Injection for Humans™ this is what it looks like:
# some_service.py
class SomeService:
    @autowired
    def __init__(
        self,
        database: Autowired(Database),
        message_brokers: Autowired(List[MessageBroker]),
    ):
        pending = database.retrieve_pending_messages()
        for broker in message_brokers:
            broker.send_pending(pending)

# database.py
@injectable
class Database:
    ...

# message_broker.py
class MessageBroker(ABC):
    def send_pending(self, messages):
        ...

# kafka_producer.py
@injectable
class KafkaProducer(MessageBroker):
    ...

# sqs_producer.py
@injectable
class SQSProducer(MessageBroker):
    ...

What is dependency injection?
Dependency injection is a principle that helps to decrease coupling and increase cohesion.
Coupling and cohesion describe how tightly the components are tied together.
High coupling. High coupling is like superglue or welding: there is no easy way to disassemble the parts.
High cohesion. High cohesion is like using screws: it is very easy to disassemble and reassemble the parts, or assemble them in a different way. It is the opposite of high coupling.
When cohesion is high, coupling is low.
Low coupling brings flexibility. Your code becomes easier to change and test.
How to implement dependency injection?
Objects do not create each other anymore. They provide a way to inject the dependencies instead.
before:
import os

class ApiClient:
    def __init__(self):
        self.api_key = os.getenv('API_KEY')  # <-- dependency
        self.timeout = os.getenv('TIMEOUT')  # <-- dependency

class Service:
    def __init__(self):
        self.api_client = ApiClient()  # <-- dependency

def main() -> None:
    service = Service()  # <-- dependency
    ...

if __name__ == '__main__':
    main()
after:
import os

class ApiClient:
    def __init__(self, api_key: str, timeout: int):
        self.api_key = api_key  # <-- dependency is injected
        self.timeout = timeout  # <-- dependency is injected

class Service:
    def __init__(self, api_client: ApiClient):
        self.api_client = api_client  # <-- dependency is injected

def main(service: Service):  # <-- dependency is injected
    ...

if __name__ == '__main__':
    main(
        service=Service(
            api_client=ApiClient(
                api_key=os.getenv('API_KEY'),
                timeout=os.getenv('TIMEOUT'),
            ),
        ),
    )
ApiClient is decoupled from knowing where the options come from. You can read a key and a timeout from a configuration file or even get them from a database.
Service is decoupled from the ApiClient. It does not create it anymore. You can provide a stub or other compatible object.
Function main() is decoupled from Service. It receives it as an argument.
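For example, a test can now exercise Service with a stub in place of the real client (StubApiClient here is illustrative):
class StubApiClient:
    """Illustrative stand-in: same shape as ApiClient, no environment access."""
    def __init__(self):
        self.api_key = 'test-key'
        self.timeout = 1

def test_service_accepts_any_compatible_client():
    service = Service(api_client=StubApiClient())
    assert service.api_client.timeout == 1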
Flexibility comes with a price.
Now you need to assemble and inject the objects like this:
main(
    service=Service(
        api_client=ApiClient(
            api_key=os.getenv('API_KEY'),
            timeout=os.getenv('TIMEOUT'),
        ),
    ),
)
The assembly code might get duplicated and it’ll become harder to change the application structure.
Conclusion
Dependency injection brings you 3 advantages:
Flexibility. The components are loosely coupled. You can easily extend or change the functionality of the system by combining the components in different ways. You can even do it on the fly.
Testability. Testing is easy because you can easily inject mocks instead of real objects that use an API, a database, etc.
Clearness and maintainability. Dependency injection helps you reveal the dependencies. Implicit becomes explicit, and "Explicit is better than implicit" (PEP 20 - The Zen of Python). You have all the components and dependencies defined explicitly in the container. This provides an overview of and control over the application structure. It is easy to understand and change.
I believe that through the examples already presented you will understand the idea and be able to apply it to your problem, i.e., implementing a UserService for authorization.

I recently released a DI framework for Python that might help you here. I think it's a fairly fresh take on it, but I'm not sure how 'pythonic' it is. Judge for yourself. Feedback is very welcome.
https://github.com/suned/serum

Related

Is it bad practice to modify attributes of one module from another module?

I want to define a bunch of config variables that can be imported in all the modules in my project. The values of those variables will be constant during runtime but are not known before runtime; they depend on the input. Usually I'd define a dict in my top module which would be passed to all functions and classes from other modules; however, I was thinking it may be cleaner to simply create a blank config.py module which would be dynamically filled with config variables by the top module:
# top.py
import config
config.x = x
# config.py
x = None
# other.py
import config
print(config.x)
I like this approach because I don't have to save the parameters as attributes of classes in my other modules; which makes sense to me because parameters do not describe classes themselves.
This works but is it considered bad practice?
The question as such may be open to dispute, but I would generally say yes, it's "bad practice", because the scope and impact of changes get blurred. Note that the use case you're describing is not really about sharing configuration, but about different parts of the program (functions, objects, modules) exchanging data, and as such it's a bit of a variation on a (meta-)global variable.
Reading common configuration values can be fine, but changing them along the way... you may lose track of what happened where, and in which order, as modules get imported and values get modified. For instance, assume the config.py above and two modules, m1.py:
import config
print(config.x)
config.x=1
and m2.py:
import config
print(config.x)
config.x=2
and a main.py that just does:
import m1
import m2
import config
print(config.x)
or:
import m2
import m1
import config
print(config.x)
The state in which you find config in each module (and really any other, incl. main.py here) depends on the order in which the imports occurred and on who assigned what value when. Even for a program entirely under your control, this can get confusing (and become a source of mistakes) rather quickly.
For runtime data and for passing information between objects and modules (and your example really is that, not configuration that is predefined and shared between modules), I would suggest describing the information in a custom state (config) object and passing it around through an appropriate interface. Often just a function or method argument is all that is needed. The exact form depends on what exactly you're trying to achieve and what your overall design is.
In your example, other.py behaves differently when called or imported before top.py, which may still seem obvious and manageable in a minimal example, but really is not a very sound design. Anyone reading the code (incl. future you) should be able to follow its logic, and this IMO breaks its flow.
The most trivial (and procedural) example for what you've described, now that I hopefully have a better grasp of it, would be an other.py recreating your current behavior:
def do_stuff(value):
    print(value)  # We did something useful here

if __name__ == "__main__":
    do_stuff(None)  # Could also use config with defaults
And your top.py, presumably being the entry point and orchestrating imports and execution, doing:
import other
x = get_the_value()
other.do_stuff(x)
You can of course introduce an interface to configure do_stuff, perhaps a dict or a custom class, even with a default implementation in config.py:
class Params:
    def __init__(self, x=None):
        self.x = x
and your other.py:
import config

def do_stuff(params=config.Params()):
    print(params.x)  # We did something useful here
And in your top.py you can use:
params = config.Params(get_the_value())
other.do_stuff(params)
But you could also have any use case specific source of value(s):
class TopParams:
    def __init__(self, url):
        self.x = get_value_from_url(url)

params = TopParams("https://example.com/value-source")
other.do_stuff(params)
x could even be a property which you retrieve every time you access it... or lazily when needed and then cached... Again, it really then is a matter of what you need to do.
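A minimal sketch of that lazy variant, using functools.cached_property (get_value_from_url is the assumed helper from above; the value is computed on first access and then reused):
from functools import cached_property

class Params:
    def __init__(self, url):
        self._url = url

    @cached_property
    def x(self):
        # computed only on first access, then cached on the instance
        return get_value_from_url(self._url)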
"Is it bad practice to modify attributes of one module from another module?"
Yes, it is considered bad practice: it is a violation of the Law of Demeter, which in essence means "talk to friends, not to strangers".
Objects should expose behaviour and functions, but should HIDE the data.
DataStructures should EXPOSE data, but should not have any methods (which are exposed). The law of demeter does not apply to such DataStructures. OOP Purists might cover such DataStructures with setters and getters, but it really adds no value in Python.
There is a lot of literature about that, e.g. https://en.wikipedia.org/wiki/Law_of_Demeter
and of course a must-read: "Clean Code" by Robert C. Martin (Uncle Bob); check it out on YouTube as well.
For procedural programming it is perfectly normal to keep data in a DataStructure which does not have any (exposed) methods.
The procedures in the program work with that data. Consider using the attrs module, see https://www.attrs.org/en/stable/, for easy creation of such classes.
My preferred method for keeping config is (here without using attrs):
# conf_xy.py
"""
config is code - so why use damned parsers, text files, xml, yaml, toml and all that,
if you can just use testable code as config that can deliver the correct types, etc.,
as well as hinting in your favorite IDE?
Here, for demonstration, without using the attrs package - usually I use attrs (read the docs)
"""

class ConfXY(object):
    def __init__(self) -> None:
        self.x: int = 1
        self.z: float = get_z_from_input()
        ...

conf_xy = ConfXY()
# other.py
from conf_xy import conf_xy
...
y = conf_xy.x * 2
...
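For comparison, a rough sketch of the same config class using attrs, as suggested above (get_z_from_input is the helper assumed in the example):
# conf_xy_attrs.py - sketch of the attrs-based variant mentioned above
import attr

@attr.s(auto_attribs=True)
class ConfXY:
    x: int = 1
    z: float = attr.ib(factory=get_z_from_input)  # evaluated when ConfXY() is created

conf_xy = ConfXY()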

Factory methods vs inject framework in Python - what is cleaner?

What I usually do in my applications is that I create all my services/dao/repo/clients using factory methods
class Service:
    def __init__(self, db):
        self._db = db

    @classmethod
    def from_env(cls):
        return cls(db=PostgresDatabase.from_env())
And when I create the app I do:
service = Service.from_env()
which creates all the dependencies,
and in tests, when I don't want to use a real DB, I just do DI:
service = Service(db=InMemoryDatabase())
I suppose that is quite far from clean/hex architecture, since Service knows how to create a Database and knows which database type it creates (it could also be an InMemoryDatabase or a MongoDatabase).
I guess that in clean/hex architecture I would have
class DatabaseInterface(ABC):
    @abstractmethod
    def get_user(self, user_id: int) -> User:
        pass
import inject

class Service:
    @inject.autoparams()
    def __init__(self, db: DatabaseInterface):
        self._db = db
And I would set up the injector framework to do:
# in app
inject.clear_and_configure(lambda binder: binder
    .bind(DatabaseInterface, PostgresDatabase()))

# in test
inject.clear_and_configure(lambda binder: binder
    .bind(DatabaseInterface, InMemoryDatabase()))
And my questions are:
Is my way really bad? Is it not a clean architecture anymore?
What are the benefits of using inject?
Is it worth the bother of using an inject framework?
Are there any other better ways of separating the domain from the outside?
There are several main goals of the Dependency Injection technique, including (but not limited to):
Lowering coupling between parts of your system. This way you can change each part with less effort. See "High cohesion, low coupling"
To enforce stricter rules about responsibilities. One entity must do only one thing on its level of abstraction. Other entities must be defined as dependencies to this one. See "IoC"
Better testing experience. Explicit dependencies allow you to stub different parts of your system with some primitive test behaviour that has the same public API as your production code. See "Mocks Aren't Stubs".
The other thing to keep in mind is that we should usually rely on abstractions, not implementations. I see a lot of people who use DI to inject only a particular implementation. There's a big difference.
When you inject and rely on an implementation, there's no difference in what method we use to create objects; it just does not matter. For example, if you inject requests without a proper abstraction, anything you swap in would still need the same methods, signatures, and return types; you would not really be able to replace the implementation at all. But when you inject fetch_order(order: OrderID) -> Order, anything can be behind it: requests, a database, whatever.
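A small sketch of that difference (the names are illustrative): the code depends on a fetch_order callable, not on requests or any database module, so anything with the right shape can be injected:
from typing import Callable

OrderID = int    # hypothetical aliases, just for the example
Order = dict

def process_refund(order_id: OrderID,
                   fetch_order: Callable[[OrderID], Order]) -> Order:
    # The caller decides how orders are fetched: HTTP, database, in-memory...
    order = fetch_order(order_id)
    order["refunded"] = True
    return order

# In a test, a plain function or lambda is enough:
assert process_refund(1, lambda _id: {"id": 1})["refunded"] is True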
To sum things up:
What are the benefits of using inject?
The main benefit is that you don't have to assemble your dependencies manually. However, this comes with a huge cost: you are using complex, even magical, tools to solve problems. One day or another complexity will fight you back.
Is it worth the bother of using an inject framework?
One more thing about the inject framework in particular: I don't like it when the objects I inject something into know about it. It is an implementation detail!
Why in the world should the Postcard domain model, for example, know about this?
I would recommend using punq for simple cases and dependencies for complex ones.
inject also does not enforce a clean separation of "dependencies" and object properties. As was said, one of the main goals of DI is to enforce stricter responsibilities.
In contrast, let me show how punq works:
from datetime import datetime

from typing_extensions import final
from attr import dataclass

# Note, we import protocols, not implementations:
from project.postcards.repository.protocols import PostcardsForToday
from project.postcards.services.protocols import (
    SendPostcardsByEmail,
    CountPostcardsInAnalytics,
)

@final
@dataclass(frozen=True, slots=True)
class SendTodaysPostcardsUsecase(object):
    _repository: PostcardsForToday
    _email: SendPostcardsByEmail
    _analytics: CountPostcardsInAnalytics

    def __call__(self, today: datetime) -> None:
        postcards = self._repository(today)
        self._email(postcards)
        self._analytics(postcards)
See? We don't even have a constructor. We declaratively define our dependencies and punq will automatically inject them. And we do not define any specific implementations, only protocols to follow. This style is called "functional objects", or SRP-styled classes.
Then we define the punq container itself:
# project/implemented.py
import punq
container = punq.Container()
# Low level dependencies:
container.register(Postgres)
container.register(SendGrid)
container.register(GoogleAnalytics)
# Intermediate dependencies:
container.register(PostcardsForToday)
container.register(SendPostcardsByEmail)
container.register(CountPostcardsInAnalytics)
# End dependencies:
container.register(SendTodaysPostcardsUsecase)
And use it:
from project.implemented import container
send_postcards = container.resolve(SendTodaysPostcardsUsecase)
send_postcards(datetime.now())
See? Now our classes have no idea who creates them and how. No decorators, no special values.
Read more about SRP-styled classes here:
Enforcing Single Responsibility Principle in Python
Are there any other better ways of separating the domain from the outside?
You can use functional programming concepts instead of imperative ones. The main idea of functional dependency injection is that you don't call things that rely on context you don't have; you schedule those calls for later, when the context is present. Here's how you can illustrate dependency injection with just simple functions:
from django.conf import settings
from django.http import HttpRequest, HttpResponse
from words_app.logic import calculate_points

def view(request: HttpRequest) -> HttpResponse:
    user_word: str = request.POST['word']  # just an example
    points = calculate_points(user_word)(settings)  # passing the dependencies and calling
    ...  # later you show the result to the user somehow

# Somewhere in your `words_app/logic.py`:
from typing import Callable
from typing_extensions import Protocol

class _Deps(Protocol):  # we rely on abstractions, not direct values or types
    WORD_THRESHOLD: int

def calculate_points(word: str) -> Callable[[_Deps], int]:
    guessed_letters_count = len([letter for letter in word if letter != '.'])
    return _award_points_for_letters(guessed_letters_count)

def _award_points_for_letters(guessed: int) -> Callable[[_Deps], int]:
    def factory(deps: _Deps):
        return 0 if guessed < deps.WORD_THRESHOLD else guessed
    return factory
The only problem with this pattern is that _award_points_for_letters will be hard to compose.
That's why we made a special wrapper to help with the composition (it is a part of the returns library):
import random

from typing_extensions import Protocol
from returns.context import RequiresContext

class _Deps(Protocol):  # we rely on abstractions, not direct values or types
    WORD_THRESHOLD: int

def calculate_points(word: str) -> RequiresContext[_Deps, int]:
    guessed_letters_count = len([letter for letter in word if letter != '.'])
    awarded_points = _award_points_for_letters(guessed_letters_count)
    return awarded_points.map(_maybe_add_extra_holiday_point)  # it has special methods!

def _award_points_for_letters(guessed: int) -> RequiresContext[_Deps, int]:
    def factory(deps: _Deps):
        return 0 if guessed < deps.WORD_THRESHOLD else guessed
    return RequiresContext(factory)  # here, we added the `RequiresContext` wrapper

def _maybe_add_extra_holiday_point(awarded_points: int) -> int:
    return awarded_points + 1 if random.choice([True, False]) else awarded_points
For example, RequiresContext has special .map method to compose itself with a pure function. And that's it. As a result you have just simple functions and composition helpers with simple API. No magic, no extra complexity. And as a bonus everything is properly typed and compatible with mypy.
Read more about this approach here:
Typed functional dependency injection
returns docs
The initial example is pretty close to a "proper" clean/hex. What's missing is the idea of a Composition Root, and you can do clean/hex without any injector framework. Without it, you'd do something like:
class Service:
    def __init__(self, db):
        self._db = db

# In your app entry point:
service = Service(PostGresDb(config.host, config.port, config.dbname))
which goes by Pure/Vanilla/Poor Man's DI, depending on who you talk to. An abstract interface is not absolutely necessary, since you can rely on duck-typing or structural typing.
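A sketch of that structural-typing option with typing.Protocol, so the dependency is described by its shape rather than by a shared base class (the method name mirrors the question's example):
from typing import Protocol

class SupportsGetUser(Protocol):
    def get_user(self, user_id: int) -> "User": ...

class Service:
    def __init__(self, db: SupportsGetUser):
        # anything with a matching get_user() is acceptable:
        # a Postgres-backed database, an in-memory fake, a test double...
        self._db = db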
Whether or not you want to use a DI framework is a matter of opinion and taste, but there are other simpler alternatives to inject like punq that you could consider, if you choose to go down that path.
https://www.cosmicpython.com/ is a good resource that looks at these issues in depth.
You may want to use a different database, and you want the flexibility to do so in a simple way; for this reason, I consider dependency injection a better way to configure your service.

Global state in Python module

I am writing a Python wrapper for a C library using the cffi.
The C library has to be initialized and shut down. Also, the cffi needs some place to save the state returned from ffi.dlopen().
I can see two paths here:
Either I wrap this whole stateful business in a class like this
class wrapper(object):
    def __init__(self):
        self.c = ffi.dlopen("mylibrary")
        self.c.initialize()

    def __del__(self):
        self.c.terminate()
Or I provide two global functions that hide the state in a global variable
def initialize():
    global __library
    __library = ffi.dlopen("mylibrary")
    __library.initialize()

def terminate():
    global __library
    __library.terminate()
    del __library
The first path is somewhat cumbersome in that it requires the user to always create an object that really serves no other purpose other than managing the library state. On the other hand, it makes sure that terminate() is actually called every time.
The second path seems to result in a somewhat easier API. However, it exposes some hidden global state, which might be a bad thing. Also, if the user forgets to call terminate(), the C library is not unloaded correctly (which is not a big problem on the C side).
Which one of these paths would be more pythonic?
Exposing a wrapper object only makes sense in python if the library actually supports something like multiple instances in one application. If it doesn't support that or it's not really relevant go for kindall's suggestion and just initialize the library when imported and add an atexit handler for cleanup.
Adding wrappers around a stateless api or even an api without support for keeping different sets of state is not really pythonic and would raise expectations that different instances have some kind of isolation.
Example code:
import atexit
# Normal library initialization
__library = ffi.dlopen("mylibrary")
__library.initialize()
# Private library cleanup function
def __terminate():
    __library.terminate()
# register function to be called on clean interpreter termination
atexit.register(__terminate)
For more details about atexit, this question has some more information, as does the Python documentation, of course.

Giving parameters into TestCase from Suite in python

From the Python documentation (http://docs.python.org/library/unittest.html):
import unittest
class WidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

    def tearDown(self):
        self.widget.dispose()
        self.widget = None

    def test_default_size(self):
        self.assertEqual(self.widget.size(), (50, 50),
                         'incorrect default size')

    def test_resize(self):
        self.widget.resize(100, 150)
        self.assertEqual(self.widget.size(), (100, 150),
                         'wrong size after resize')
Here is how those test cases are invoked:
def suite():
    suite = unittest.TestSuite()
    suite.addTest(WidgetTestCase('test_default_size'))
    suite.addTest(WidgetTestCase('test_resize'))
    return suite
Is it possible to insert parameter custom_parameter into WidgetTestCase like:
class WidgetTestCase(unittest.TestCase):
    def setUp(self, custom_parameter):
        self.widget = Widget('The widget')
        self.custom_parameter = custom_parameter
?
What I've done is, in the test_suite module, simply add:
WidgetTestCase.CustomParameter = "some_address"
The simplest solutions are the best :)
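In other words, something along these lines (the class attribute name matches the snippet above; Widget is the class from the question):
import unittest

class WidgetTestCase(unittest.TestCase):
    CustomParameter = None                 # overridden by the suite module

    def setUp(self):
        self.widget = Widget('The widget')
        self.custom_parameter = self.CustomParameter

# in the test_suite module, before running the suite:
WidgetTestCase.CustomParameter = "some_address"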
I've found a way to do this, but it's a bit of a kludge.
Basically, what I do is add, to the TestCase, an __init__ method which defines a 'default' parameter and a __str__ so that we can distinguish cases:
class WidgetTestCase(unittest.TestCase):
    def __init__(self, methodName='runTest'):
        self.parameter = default_parameter
        unittest.TestCase.__init__(self, methodName)

    def __str__(self):
        ''' Override this so that we know which instance it is '''
        return "%s(%s) (%s)" % (self._testMethodName, self.parameter, unittest._strclass(self.__class__))
Then in suite(), I iterate over my test parameters, replacing the default parameter with one specific to each test:
def suite():
    suite = unittest.TestSuite()
    for test_parameter in test_parameters:
        loadedtests = unittest.TestLoader().loadTestsFromTestCase(WidgetTestCase)
        for t in loadedtests:
            t.parameter = test_parameter
        suite.addTests(loadedtests)
    suite.addTests(unittest.TestLoader().loadTestsFromTestCase(OtherWidgetTestCases))
    return suite
where OtherWidgetTestCases are tests which don't need to be parameterised.
For instance I have a bunch of tests on real data for which a suite of tests need to be applied to each, but I also have some synthetic data sets, designed to test certain edge cases not normally present in the data, and I only need to apply certain tests to those, so they get their own tests in OtherWidgetTestCases.
This is something that has been on my mind recently. Yes it is very possible to do. I called it scenario testing, but I think parameterized may be more accurate. I put a proof of concept up as a gist here. In short it is a meta class that allows you to define a scenario and run the tests against it a bunch. With it your example can be something like this:
class WidgetTestCase(unittest.TestCase):
    __metaclass__ = ScenarioMeta

    class widget_width(ScenerioTest):
        scenarios = [
            dict(widget_in=Widget("One Way"), expected_tuple=(50, 50)),
            dict(widget_in=Widget("Another Way"), expected_tuple=(100, 150))
        ]

        def __test__(self, widget_in, expected_tuple):
            self.assertEqual(widget_in.size, expected_tuple)
When run, the meta class writes 2 separate tests out, so the output would be something like:
$ python myscerariotest.py -v
test_widget_width_0 (__main__.widget_width) ... ok
test_widget_width_1 (__main__.widget_width) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK
As you can see the scenarios are converted to tests at runtime.
Now I am not yet sure if this is even a good idea. I use it in tests where I have a lot of text-centric cases that repeat the same assertions on slightly different data, which helps me catch the little edge cases. But the classes in that gist do work and I believe they accomplish what you are after.
Note that with some trickery the test cases can be given names and even pulled from an external source like a text file or database. It's not documented yet, but some digging around in the meta class should get you started. There is also some more info and examples in my post here.
Edit
This is an ugly hack that I do not support anymore. The implementation should have been done as a subclass of TestCase, not as a hacked meta class. Live and learn. An even better solution would be to use nose generators.
I don't believe so; the signature for setUp needs to be what unittest is expecting. AFAIK, setUp is automagically called within the test case's run method as setUp(), so you're not going to be able to pass anything in unless you override run to pass in the variable you want. But I think what you want defeats the purpose of unit testing. Don't try to use a DRY philosophy with this; each unit you're testing should be a part of a class or even part of a function/method.
I don't think this is a good idea. Unit tests should be thorough enough that you test all functionality in your cases, so passing in different parameters shouldn't be required.
You mention you're passing in a www address - this is almost certainly not a good idea. What happens if you try and run the tests on a machine where the 'net connection is down? Your tests should be:
Automatic - they will run on all machines and platforms where your app is supported, without user intervention. They shouldn't rely on external environment to pass. This means (amongst other things) that relying on a properly set up connection to the Internet is a bad idea. You can get around this by providing dummy data. Instead of passing in a URL to a resource, abstract away the data source and pass in a data-stream or whatever. This is especially easy in python since you can make use of python's duck-typing to present a stream-like object (python frequently uses a "file-like" object for this very reason!).
Thorough - your unit tests should have 100% code coverage, and cover all possible situations. You want to test your code with multiple sites? Instead, test your code with all the possible features that a site may include. Without knowing more about what your application does, I can't offer much advice in this point.
Now, it looks like your tests are going to be heavily data-driven. There are many tools that allow you to define data sets for unit tests and load them in the tests. Check out Python test fixtures, for example.
I realise that this isn't the answer you're looking for, but I think you'll have more joy in the long-run if you follow these principles.

Building a minimal plugin architecture in Python

I have an application, written in Python, which is used by a fairly technical audience (scientists).
I'm looking for a good way to make the application extensible by the users, i.e. a scripting/plugin architecture.
I am looking for something extremely lightweight. Most scripts, or plugins, are not going to be developed and distributed by a third-party and installed, but are going to be something whipped up by a user in a few minutes to automate a repeating task, add support for a file format, etc. So plugins should have the absolute minimum boilerplate code, and require no 'installation' other than copying to a folder (so something like setuptools entry points, or the Zope plugin architecture seems like too much.)
Are there any systems like this already out there, or any projects that implement a similar scheme that I should look at for ideas / inspiration?
Mine is, basically, a directory called "plugins" which the main app can poll and then use imp.load_module to pick up files, look for a well-known entry point possibly with module-level config params, and go from there. I use file-monitoring stuff for a certain amount of dynamism in which plugins are active, but that's a nice-to-have.
Of course, any requirement that comes along saying "I don't need [big, complicated thing] X; I just want something lightweight" runs the risk of re-implementing X one discovered requirement at a time. But that's not to say you can't have some fun doing it anyway :)
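A rough, modern sketch of that directory-polling idea (importlib replaces the now-deprecated imp module; the plugins/ path and the register() entry point are assumptions for the example):
import importlib.util
import pathlib

def load_plugins(plugin_dir="plugins", entry_point="register"):
    """Import every .py file in plugin_dir and call its entry-point function."""
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)          # actually runs the plugin file
        hook = getattr(module, entry_point, None)
        if callable(hook):
            hook()                               # hand control to the plugin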
module_example.py:
def plugin_main(*args, **kwargs):
    print(args, kwargs)
loader.py:
def load_plugin(name):
    mod = __import__("module_%s" % name)
    return mod

def call_plugin(name, *args, **kwargs):
    plugin = load_plugin(name)
    plugin.plugin_main(*args, **kwargs)

call_plugin("example", 1234)
It's certainly "minimal", it has absolutely no error checking, probably countless security problems, it's not very flexible - but it should show you how simple a plugin system in Python can be..
You probably want to look into the imp module too, although you can do a lot with just __import__, os.listdir and some string manipulation.
Have a look at at this overview over existing plugin frameworks / libraries, it is a good starting point. I quite like yapsy, but it depends on your use-case.
While that question is really interesting, I think it's fairly hard to answer, without more details. What sort of application is this? Does it have a GUI? Is it a command-line tool? A set of scripts? A program with an unique entry point, etc...
Given the little information I have, I will answer in a very generic manner.
What means do you have to add plugins?
You will probably have to add a configuration file, which will list the paths/directories to load.
Another way would be to say "any file in that plugin/ directory will be loaded", but it has the inconvenience of requiring your users to move files around.
A last, intermediate option would be to require all plugins to be in the same plugin/ folder, and then to activate/deactivate them using relative paths in a config file.
On a pure code/design level, you'll have to determine clearly what behavior/specific actions you want your users to extend. Identify the common entry point/set of functionalities that will always be overridden, and determine groups within these actions. Once this is done, it should be easy to extend your application.
Example using hooks, inspired by MediaWiki (PHP, but does the language really matter?):
import hooks

# In your core code, at key points, you allow the user to run actions:
def compute(*args, **kwargs):
    try:
        hooks.runHook(hooks.registered.beforeCompute)
    except hooks.hookException:
        print('Error while executing plugin')

    # [compute main code] ...

    try:
        hooks.runHook(hooks.registered.afterCompute)
    except hooks.hookException:
        print('Error while executing plugin')
# The idea is to insert possibilities for users to extend the behavior
# where it matters.
# If you need to, pass context parameters to runHook. Remember that
# runHook can be defined as a runHook(*args, **kwargs) function, not
# requiring you to define a common interface for *all* hooks. Quite flexible :)
# --------------------
# And in the plugin code:
# [...] plugin magic
def doStuff():
    # ....
    pass

# and register the functionality in hooks:
# doStuff will be called at the end of each core.compute() call
hooks.registered.afterCompute.append(doStuff)
Another example, inspired by mercurial. Here, extensions only add commands to the hg command-line executable, extending its behavior.
def doStuff(ui, repo, *args, **kwargs):
    # when called, an extension function always receives:
    # * a ui object (user interface, prints, warnings, etc)
    # * a repository object (main object from which most operations are doable)
    # * command-line arguments that were not used by the core program
    doMoreMagicStuff()
    obj = maybeCreateSomeObjects()

# each extension defines a commands dictionary in the main extension file
commands = {'newcommand': doStuff}
For both approaches, you might need common initialize and finalize steps for your extension.
You can either use a common interface that all your extensions will have to implement (this fits better with the second approach; mercurial uses a reposetup(ui, repo) that is called for all extensions), or use a hook-style approach, with a hooks.setup hook.
But again, if you want more useful answers, you'll have to narrow down your question ;)
Marty Alchin's simple plugin framework is the base I use for my own needs. I really recommend taking a look at it; I think it is a really good start if you want something simple and easily hackable. You can also find it as a Django snippet.
I am a retired biologist who dealt with digital micrographs and found himself having to write an image processing and analysis package (not technically a library) to run on an SGI machine. I wrote the code in C and used Tcl for the scripting language. The GUI, such as it was, was done using Tk. The commands that appeared in Tcl were of the form "extensionName commandName arg0 arg1 ... param0 param1 ...", that is, simple space-separated words and numbers. When Tcl saw the "extensionName" substring, control was passed to the C package. That in turn ran the command through a lexer/parser (done in lex/yacc) and then called C routines as necessary.
The commands to operate the package could be run one by one via a window in the GUI, but batch jobs were done by editing text files which were valid Tcl scripts; you'd pick the template that did the kind of file-level operation you wanted to do and then edit a copy to contain the actual directory and file names plus the package commands. It worked like a charm. Until ...
1) The world turned to PCs and 2) the scripts got longer than about 500 lines, when Tcl's iffy organizational capabilities started to become a real inconvenience. Time passed ...
I retired, Python got invented, and it looked like the perfect successor to Tcl. Now, I have never done the port, because I have never faced up to the challenges of compiling (pretty big) C programs on a PC, extending Python with a C package, and doing GUIs in Python/Gt?/Tk?/??. However, the old idea of having editable template scripts seems still workable. Also, it should not be too great a burden to enter package commands in a native Python form, e.g.:
packageName.command( arg0, arg1, ..., param0, param1, ...)
A few extra dots, parens, and commas, but those aren't showstoppers.
I remember seeing that someone has done versions of lex and yacc in Python (try: http://www.dabeaz.com/ply/), so if those are still needed, they're around.
The point of this rambling is that it has seemed to me that Python itself IS the desired "lightweight" front end usable by scientists. I'm curious to know why you think that it is not, and I mean that seriously.
added later: The application gedit anticipates plugins being added and their site has about the clearest explanation of a simple plugin procedure I've found in a few minutes of looking around. Try:
https://wiki.gnome.org/Apps/Gedit/PythonPluginHowToOld
I'd still like to understand your question better. I am unclear whether you 1) want scientists to be able to use your (Python) application quite simply in various ways or 2) want to allow the scientists to add new capabilities to your application. Choice #1 is the situation we faced with the images and that led us to use generic scripts which we modified to suit the need of the moment. Is it Choice #2 which leads you to the idea of plugins, or is it some aspect of your application that makes issuing commands to it impracticable?
When I was searching for Python decorators, I found a simple but useful code snippet. It may not fit your needs, but it is very inspiring.
Scipy Advanced Python#Plugin Registration System
class TextProcessor(object):
    PLUGINS = []

    def process(self, text, plugins=()):
        if not plugins:
            for plugin in self.PLUGINS:
                text = plugin().process(text)
        else:
            for plugin in plugins:
                text = plugin().process(text)
        return text

    @classmethod
    def plugin(cls, plugin):
        cls.PLUGINS.append(plugin)
        return plugin

@TextProcessor.plugin
class CleanMarkdownBolds(object):
    def process(self, text):
        return text.replace('**', '')
Usage:
processor = TextProcessor()
processed = processor.process(text="**foo bar**", plugins=(CleanMarkdownBolds, ))
processed = processor.process(text="**foo bar**")
I enjoyed the nice discussion of different plugin architectures given by Dr Andre Roberge at PyCon 2009. He gives a good overview of different ways of implementing plugins, starting from something really simple.
It's available as a podcast (the second part, following an explanation of monkey-patching), accompanied by a series of six blog entries.
I recommend giving it a quick listen before you make a decision.
I arrived here looking for a minimal plugin architecture, and found a lot of things that all seemed like overkill to me. So, I've implemented Super Simple Python Plugins. To use it, you create one or more directories and drop a special __init__.py file in each one. Importing those directories will cause all other Python files to be loaded as submodules, and their name(s) will be placed in the __all__ list. Then it's up to you to validate/initialize/register those modules. There's an example in the README file.
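A sketch of roughly what such a special __init__.py might do; this is the idea as described, not the project's actual code:
# plugins/__init__.py
import importlib
import pathlib

__all__ = []

for _path in pathlib.Path(__file__).parent.glob("*.py"):
    if _path.name == "__init__.py":
        continue
    _module = importlib.import_module(f"{__name__}.{_path.stem}")
    globals()[_path.stem] = _module   # expose each plugin file as a submodule
    __all__.append(_path.stem)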
Actually setuptools works with a "plugins directory", as in the following example taken from the project's documentation:
http://peak.telecommunity.com/DevCenter/PkgResources#locating-plugins
Example usage:
plugin_dirs = ['foo/plugins'] + sys.path
env = Environment(plugin_dirs)
distributions, errors = working_set.find_plugins(env)
map(working_set.add, distributions) # add plugins+libs to sys.path
print("Couldn't load plugins due to: %s" % errors)
In the long run, setuptools is a much safer choice since it can load plugins without conflicts or missing requirements.
Another benefit is that the plugins themselves can be extended using the same mechanism, without the original applications having to care about it.
Expanding on @edomaur's answer, may I suggest taking a look at simple_plugins (shameless plug), which is a simple plugin framework inspired by the work of Marty Alchin.
A short usage example based on the project's README:
# All plugin info
>>> BaseHttpResponse.plugins.keys()
['valid_ids', 'instances_sorted_by_id', 'id_to_class', 'instances',
'classes', 'class_to_id', 'id_to_instance']
# Plugin info can be accessed using either dict...
>>> BaseHttpResponse.plugins['valid_ids']
set([304, 400, 404, 200, 301])
# ... or object notation
>>> BaseHttpResponse.plugins.valid_ids
set([304, 400, 404, 200, 301])
>>> BaseHttpResponse.plugins.classes
set([<class '__main__.NotFound'>, <class '__main__.OK'>,
<class '__main__.NotModified'>, <class '__main__.BadRequest'>,
<class '__main__.MovedPermanently'>])
>>> BaseHttpResponse.plugins.id_to_class[200]
<class '__main__.OK'>
>>> BaseHttpResponse.plugins.id_to_instance[200]
<OK: 200>
>>> BaseHttpResponse.plugins.instances_sorted_by_id
[<OK: 200>, <MovedPermanently: 301>, <NotModified: 304>, <BadRequest: 400>, <NotFound: 404>]
# Coerce the passed value into the right instance
>>> BaseHttpResponse.coerce(200)
<OK: 200>
As another approach to a plugin system, you may check out the Extend Me project.
For example, let's define a simple class and its extension:
# Define base class for extensions (mount point)
class MyCoolClass(Extensible):
    my_attr_1 = 25

    def my_method1(self, arg1):
        print('Hello, %s' % arg1)

# Define an extension, which implements some additional logic
# or modifies the existing logic of the base class (MyCoolClass).
# An extension class may be placed in any module you like;
# it just needs to be imported at the start of the app.
class MyCoolClassExtension1(MyCoolClass):
    def my_method1(self, arg1):
        super(MyCoolClassExtension1, self).my_method1(arg1.upper())

    def my_method2(self, arg1):
        print("Good bye, %s" % arg1)
And try to use it:
>>> my_cool_obj = MyCoolClass()
>>> print(my_cool_obj.my_attr_1)
25
>>> my_cool_obj.my_method1('World')
Hello, WORLD
>>> my_cool_obj.my_method2('World')
Good bye, World
And here is what is hidden behind the scenes:
>>> my_cool_obj.__class__.__bases__
[MyCoolClassExtension1, MyCoolClass]
The extend_me library manipulates the class creation process via metaclasses; thus, in the example above, when creating a new instance of MyCoolClass we get an instance of a new class that is a subclass of both MyCoolClassExtension1 and MyCoolClass, having the functionality of both, thanks to Python's multiple inheritance.
For better control over class creation there are a few metaclasses defined in this lib:
ExtensibleType - allows simple extensibility by subclassing
ExtensibleByHashType - similar to ExtensibleType, but having the ability to build specialized versions of a class, allowing global extension of the base class and extension of its specialized versions
This lib is used in the OpenERP Proxy project, and seems to be working well enough!
For a real example of usage, look at the OpenERP Proxy 'field_datetime' extension:
from ..orm.record import Record
import datetime

class RecordDateTime(Record):
    """ Provides auto conversion of datetime fields from
        strings got from server to comparable datetime objects
    """

    def _get_field(self, ftype, name):
        res = super(RecordDateTime, self)._get_field(ftype, name)
        if res and ftype == 'date':
            return datetime.datetime.strptime(res, '%Y-%m-%d').date()
        elif res and ftype == 'datetime':
            return datetime.datetime.strptime(res, '%Y-%m-%d %H:%M:%S')
        return res
Record here is the extensible object; RecordDateTime is the extension.
To enable the extension, just import the module that contains the extension class, and (in the case above) all Record objects created after that will have the extension class among their base classes, and thus all of its functionality.
The main advantage of this library is that code operating on extensible objects does not need to know about the extensions, while extensions can change everything in the extensible objects.
setuptools has an EntryPoint:
Entry points are a simple way for distributions to “advertise” Python
objects (such as functions or classes) for use by other distributions.
Extensible applications and frameworks can search for entry points
with a particular name or group, either from a specific distribution
or from all active distributions on sys.path, and then inspect or load
the advertised objects at will.
AFAIK this package is always available if you use pip or virtualenv.
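A sketch of consuming such entry points with importlib.metadata (the group name myapp.plugins is illustrative; plugins would declare it in their packaging metadata):
from importlib.metadata import entry_points

def load_plugins(group="myapp.plugins"):
    """Load every object advertised under the given entry-point group."""
    discovered = entry_points(group=group)   # selection by group (Python 3.10+)
    return {ep.name: ep.load() for ep in discovered}

# plugins = load_plugins()
# plugins["csv"]()   # whatever interface your application defines for plugins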
You can use pluginlib.
Plugins are easy to create and can be loaded from other packages, file paths, or entry points.
Create a plugin parent class, defining any required methods:
import pluginlib

@pluginlib.Parent('parser')
class Parser(object):

    @pluginlib.abstractmethod
    def parse(self, string):
        pass
Create a plugin by inheriting a parent class:
import json

class JSON(Parser):

    _alias_ = 'json'

    def parse(self, string):
        return json.loads(string)
Load the plugins:
loader = pluginlib.PluginLoader(modules=['sample_plugins'])
plugins = loader.plugins
parser = plugins.parser.json()
print(parser.parse('{"json": "test"}'))
I have spent time reading this thread while searching for a plugin framework in Python now and then. I have used some, but there were shortcomings with them. Here is what I came up with, for your scrutiny, in 2017: an interface-free, loosely coupled plugin management system, Load me later. There are tutorials on how to use it.
I've spent a lot of time trying to find a small plugin system for Python which would fit my needs. But then I just thought: if there is already inheritance, which is natural and flexible, why not use it?
The only problem with using inheritance for plugins is that you don't know what the most specific (the lowest on the inheritance tree) plugin classes are.
But this can be solved with a metaclass which keeps track of the inheritance of the base class and can build a class that inherits from the most specific plugins ('Root extended' below).
So I came up with a solution by coding such a metaclass:
class PluginBaseMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super(PluginBaseMeta, mcls).__new__(mcls, name, bases, namespace)
        if not hasattr(cls, '__pluginextensions__'):  # parent class
            cls.__pluginextensions__ = {cls}  # set reflects lowest plugins
            cls.__pluginroot__ = cls
            cls.__pluginiscachevalid__ = False
        else:  # subclass
            assert not set(namespace) & {'__pluginextensions__',
                                         '__pluginroot__'}  # only in parent
            exts = cls.__pluginextensions__
            exts.difference_update(set(bases))  # remove parents
            exts.add(cls)                       # and add current
            cls.__pluginroot__.__pluginiscachevalid__ = False
        return cls

    @property
    def PluginExtended(cls):
        # After PluginExtended creation we'll have only 1 item in the set,
        # so this is used for caching, mainly not to create the same PluginExtended
        if cls.__pluginroot__.__pluginiscachevalid__:
            return next(iter(cls.__pluginextensions__))  # only 1 item in set
        else:
            name = cls.__pluginroot__.__name__ + 'PluginExtended'
            extended = type(name, tuple(cls.__pluginextensions__), {})
            cls.__pluginroot__.__pluginiscachevalid__ = True
            return extended
So when you have a Root base made with this metaclass, and a tree of plugins which inherit from it, you can automatically get a class which inherits from the most specific plugins by just subclassing:
class RootExtended(RootBase.PluginExtended):
    ...  # your code here
Code base is pretty small (~30 lines of pure code) and as flexible as inheritance allows.
If you're interested, get involved at https://github.com/thodnev/pluginlib
You may also have a look at Groundwork.
The idea is to build applications around reusable components, called patterns and plugins. Plugins are classes that derive from GwBasePattern.
Here's a basic example:
from groundwork import App
from groundwork.patterns import GwBasePattern

class MyPlugin(GwBasePattern):
    def __init__(self, app, **kwargs):
        self.name = "My Plugin"
        super().__init__(app, **kwargs)

    def activate(self):
        pass

    def deactivate(self):
        pass

my_app = App(plugins=[MyPlugin])        # register plugin
my_app.plugins.activate(["My Plugin"])  # activate it
There are also more advanced patterns to handle e.g. command line interfaces, signaling or shared objects.
Groundwork finds its plugins either by programmatically binding them to an app as shown above or automatically via setuptools. Python packages containing plugins must declare these using a special entry point groundwork.plugin.
Here are the docs.
Disclaimer: I'm one of the authors of Groundwork.
In our current healthcare product we have a plugin architecture implemented with an interface class. Our tech stack is Django on top of Python for the API and Nuxt.js on top of Node.js for the frontend.
We have a plugin manager app written for our product, which is basically a pip and an npm package built to work with Django and Nuxt.js.
For new plugin development (pip and npm) we made the plugin manager a dependency.
In the pip package:
With the help of setup.py you can add an entry point for the plugin to do something with the plugin manager (registration, initialization, etc.).
https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
In the npm package:
Similar to pip, there are hooks in npm scripts to handle the installation.
https://docs.npmjs.com/misc/scripts
Our use case:
The plugin development team is now separate from the core development team. The scope of plugin development is integrating with third-party apps, which are defined in any of the categories of the product. The plugin interfaces are categorised, e.g. fax, phone, email, etc., and the plugin manager can be extended with new categories.
In your case: maybe you can have one plugin written and reuse it for several tasks.
If plugin developers need to reuse core objects, that can be achieved with a level of abstraction within the plugin manager, so that any plugin can inherit those methods.
Just sharing how we implemented it in our product; I hope it gives you a little idea.
