When programming for fun, I've noticed that managing dependencies feels like a boring chore that I want to minimize. After reading this, I've come up with a super trivial dependency injector, whereby the dependency instances are looked up by a string key:
def run_job(job, args, instance_keys, injected):
    # append each dependency, looked up by its string key, to the call args
    args = list(args) + [injected[key] for key in instance_keys]
    return job(*args)
This cheap trick works since calls in my program are always lazily defined (where the function handle is stored separately from its arguments) in an iterator, e.g.:
jobs_to_run = [[some_func, ("arg1", "arg2"), ("obj_key",)], [other_func, (), ()]]
The reason is a central main loop that must schedule all events. It holds a reference to all dependencies, so the injection for "obj_key" can be passed in a dict, e.g.:
# inside main loop
injection = {"obj_key": injected_instance}
for (callable, with_args, and_dependencies) in jobs_to_run:
    run_job(callable, with_args, and_dependencies, injection)
So when an event happens (user input, etc.), the main loop may call update() on a particular object which reacts to that input, and which in turn builds a list of jobs for the main loop to schedule when resources are available. To me it is cleaner to key-reference any dependencies for someone else to inject rather than having all objects form direct relationships.
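For illustration, such a reacting object might hand jobs back to the loop like this (a minimal sketch; Player, move, and "world_key" are hypothetical names):

class Player:
    def update(self, event):
        # React to input by queueing lazily-defined work for the main loop.
        # "world_key" names a dependency the loop will inject at run time.
        return [[self.move, (event.direction,), ("world_key",)]]

    def move(self, direction, world):
        # "world" arrives via run_job's injection, appended after the args
        world.apply_move(self, direction)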
Because I am lazily defining all callables (functions) for a game clock engine to run on its own accord, the above naive approach worked with very little added complexity. Still, there is a code smell in having to reference objects by strings. At the same time, passing dependencies around explicitly felt just as smelly, and constructor or setter injection would be overkill, as would perhaps most large dependency injection libraries.
For the special case of injecting dependencies in callables that are lazily defined, are there more expressive design patterns in existence?
I've noticed that managing dependencies feels like a boring chore that I want to minimize.
First of all, you shouldn't assume that dependency injection is a means to minimize the chore of dependency management. It doesn't go away; it is just deferred to another place and time, and possibly delegated to someone else.
That said, if what you are building is going to be used by others, it would be wise to include some form of version checking in your 'injectables' so that your users have an easy way to check whether their version matches the one that is expected.
are there more expressive design patterns in existence?
Your method, as I understand it, is essentially the Strategy pattern: the job's code (the callable) relies on calling methods on one of several concrete objects. The way you do it is perfectly reasonable - it works and is efficient.
You may want to formalize it a bit more to make it easier to read and maintain, e.g.
from collections import namedtuple

Job = namedtuple('Job', ['callable', 'args', 'strategies'])

def run_job(job, using=None):
    # look up only the strategies this job declares
    strategies = {k: using[k] for k in job.strategies}
    return job.callable(*job.args, **strategies)
jobs_to_run = [
    Job(callable=some_func, args=(1, 2), strategies=('A', 'B')),
    Job(callable=other_func, ...),
]

strategies = {"A": injected_strategy, ...}

for job in jobs_to_run:
    run_job(job, using=strategies)

# actual job
def some_func(arg1, arg2, A=None, B=None):
    ...
As you can see, the code still does the same thing, but it is instantly more readable, and it concentrates knowledge about the structure of the Job() objects in run_job. Also, the call to a job function like some_func will fail if the wrong number of arguments is passed, and the job functions are easier to code and debug thanks to their explicitly listed and named arguments.
About the strings: you could just make them constants in a dependencies.py file and use those constants.
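For instance (a small sketch; the names are illustrative):

# dependencies.py -- a single home for the injection keys
OBJ_KEY = "obj_key"

# elsewhere
from dependencies import OBJ_KEY

jobs_to_run = [[some_func, ("arg1", "arg2"), (OBJ_KEY,)]]
injection = {OBJ_KEY: injected_instance}

A mistyped key then fails at import time with a NameError instead of a KeyError at run time.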
A more robust option, still with little overhead, would be to use a dependency injection framework such as Injectable:
from injectable import autowired, Autowired

@autowired
def job42(some_instance: Autowired("SomeInstance", lazy=True)):
    ...

# some_instance is autowired to job42 calls and
# it will be automatically injected for you
job42()
Disclosure: I am the project maintainer.
Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions it may call need to operate on a sub-array of the DataFrame, and I have written multiple selector functions for that. At present, however, each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important because the end code is aimed at non-specialists (possibly with no Python/programming knowledge), so it must be "batteries included". The entire project is written in a functional style (that's what has always worked for me).
Sample Code
filter_func = simple_filter()

# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2, data_filter=filter_func),  # needs access to the user-defined filter function
    transform3(arg1, arg2, data_filter=filter_func),  # needs access to the user-defined filter function
    transform4(arg1, arg2),
)
Expected API
filter_func = simple_filter()

# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),
    transform3(arg1, arg2),
    transform4(arg1, arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it would also become available (something similar to a closure) to all the functions it calls. This seems to happen with some toy examples but won't work in this case (UnboundLocalError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable state in the past, and functional programming has worked quite well. Hence I've set a goal for myself NOT to use classes (to the extent that Python enables me to, anyway). So no classes.
I should have probably clarified this initially: in the pipeline, the output of every function is a DataFrame, and the input of every function (except load_data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial, because we want the user to supply the args to each function but not execute it. The actual execution is done by a for loop in make_pipeline.
Each function accepts df: pandas.DataFrame plus all arguments that are specific to that function. The statement seen above, transform1(arg1, arg2, ...), actually calls the decorated transform1, which returns functools.partial(transform, arg1, arg2, ...), which now has a signature like transform(df: pandas.DataFrame).
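A rough sketch of what such a decorator could look like (assuming the DataFrame is passed last, so that functools.partial can prepend the user-supplied args; pipeline_step is a hypothetical name):

import functools

def pipeline_step(func):
    @functools.wraps(func)
    def collect(*args, **kwargs):
        # don't execute yet; return a partial that only needs the DataFrame
        return functools.partial(func, *args, **kwargs)
    return collect

@pipeline_step
def transform1(arg1, arg2, df):
    return df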
load_dataframe is just a convenience function to load the initial DataFrame so that all the other functions can begin operating on it. It just felt more intuitive to users to have it be part of the chain rather than a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument. If you're wondering why, it's because I feel that end users will find it unintuitive and arbitrary: some functions need it, some don't. I'm also pretty certain that they will make all kinds of errors, like passing different filters, forgetting it sometimes, etc.
(Update) I've also tried inspect.signature() in make_pipeline to check if each function accepts a data_filter argument and pass it on. However, this raises an incorrect function signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function signature, this would solve the issue, but I couldn't find much in the docs.
Turns out it was pretty easy. The solution is inspect.signature.
import inspect
from typing import Any, Callable, Optional

def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        # hand the filter only to functions that declare a data_filter parameter
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d
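For this check to find the parameter, a transform that needs the filter just declares it by name (a hypothetical example):

def transform2(df, data_filter=None):
    # operate only on the user-selected sub-array when a filter is given
    subset = data_filter(df) if data_filter is not None else df
    return subset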
Leaving this here mostly for reference, because I think this is a mini design pattern. I've also seen function.__closure__ mentioned on an unrelated subject; that may also work, but it would likely be more complicated.
As I cannot find any guidelines about syntax highlighting, I decided to prepare a simple write-as-plain-text-and-then-highlight-everything-in-HTML-preview approach, which is enough for my scope at the moment.
By overriding many custom meta-model classes I have a to_source method, which actually reimplements the whole syntax in reverse, since reverse parsing is not yet available. It's fine, but it ignores user formatting.
To retain user formatting, we can use the only available thing: _tx_position and _tx_position_end. Descending from the main textX rule to its children via the stored custom meta-model class attributes works for most cases, but it fails with primitives.
// textX meta-model file
NonsenseProgram:
    "begin" foo=Foo "," count=INT "end"
;

Foo:
    "fancy" a=ID "separator" b=ID "finished"
;
# textX custom meta-model classes
class NonsenseProgram:
    def __init__(self, foo, count):
        self.foo = foo
        self.count = count

    def to_source(self):
        pass  # some recursive magic that uses _tx_position and _tx_position_end

class Foo:
    def __init__(self, parent, a, b):
        self.parent = parent
        self.a = a
        self.b = b

    def to_source(self):
        pass  # some recursive magic that uses _tx_position and _tx_position_end
Let's consider the given example. As we have NonsenseProgram and Foo classes that we can override, we are in control of their returned source as a whole. We can modify the code generated for NonsenseProgram, and for the NonsenseProgram.foo fragment (by overriding Foo), by accessing their _tx_* attributes. We can't do the same with NonsenseProgram.count, Foo.a and Foo.b, as we only have a primitive string or int value.
Depending on the usage of primitives in our grammar, we have the following options:
Wrap every primitive with a rule that contains only that primitive and nothing else (see the sketch after this list).
Pros: It just works right now!
Cons: Produces a massive overhead of nested values that our grammar toolchain needs to handle. It's messing with the grammar just to be pretty...
Ignore the user's formatting and use only our reverse-parsing rules.
Pros: It just works too!
Cons: You need to reimplement your syntax for nearly every grammar element. It forces a code reformat on every highlighting attempt.
Use some external highlighting rules.
Pros: It would work...
Cons: Again, grammar reimplementation.
Use language server.
Pros: Would be the best option on long run.
Cons: It's only mentioned once without any in-depth docs.
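For reference, option 1 could look like this in the grammar (a sketch; Count is a made-up wrapper rule):

// textX meta-model file -- wrapping the INT primitive in its own rule
Count:
    value=INT
;

NonsenseProgram:
    "begin" foo=Foo "," count=Count "end"
;

The matched Count object then carries its own _tx_position and _tx_position_end, at the cost of the extra nesting described above.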
Any suggestions about any other options?
You are right. There is no position information for primitive types. It seems that you have covered the available options at the moment.
An easy-to-implement option would be to add bookkeeping of the positions of all attributes directly to textX, as a special structure on each created object (e.g. a dict keyed by attribute name). It should be straightforward to implement, so you can register a feature request in the issue tracker if you wish.
There was some work in the past to support full language services for textX-based languages. The idea is to get all the features you would expect from a decent code editor/IDE for any language specified using textX.
The work stalled for a while but resumed recently as a full rewrite. It should be officially supported by the textX team. You can follow the progress here. Although the project doesn't mention syntax highlighting at the moment, it is on our agenda.
I've been programming in Python for a year now, having come from a Java background, and I've noticed that, at least in my organization, the style for passing complex parameters to functions is to use dicts or tuples rather than instances of a specialized parameter class. For example, we have a method that takes three dicts, each structured in a particular way, with values that are themselves nested tuples. It's complicated for me to build the args and to read the code. Here's an example of a passed arg:
{'[A].X': ((DiscreteMarginalDistribution, ('red', 'blue')), ()),
 '[A].Y': ((DiscreteConditionalDistribution, ('yellow', 'green'), ('red', 'blue')),
           (IdentityAggregator('[A].X'),))}
My questions are:
Is passing dicts/tuples like this a common Python idiom?
When, if ever, do you write Python code to use the latter (parameter instances)? E.g., when the nested structure surpasses some complexity threshold.
Thanks in advance!
Yes, it is very common to pass a dictionary to Python functions in order to reduce the number of arguments. Dictionary-style configuration with proper key naming is much more readable than just using tuples.
I consider it rather uncommon to dynamically construct dedicated instances of a custom config class; I'd stick with dictionaries for that. In case your config dict and its consumer go out of sync, you get KeyErrors, which are pretty easy to debug.
Some recommendations and reasoning:
If some parts of your application require really really complex configuration, I consider it a good idea to have a configuration object that properly represents the current config. However, in my projects I never ended up passing such objects as function arguments. This smells. In some applications, I have a constant global configuration object, set up during bootstrap. Such an object is globally available and treated as "immutable".
Single functions should never be so complex that they require a tremendously complex configuration. If they are, that indicates you should split your code into several components, each subunit having a rather simple parameterization.
If the runtime configuration of a function is somewhat more complex than is easily handled with normal keyword arguments, it is absolutely common to pass a dictionary as a "lightweight" configuration object, so to speak. A well-thought-through selection of key names makes such an approach very readable. Of course, you can also build up a hierarchy in case one level is not enough for your use case, as in the sketch below.
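For example, a two-level configuration might look like this (the names are illustrative):

config = {
    "io": {"path": "data.csv", "encoding": "utf-8"},
    "model": {"alpha": 0.5, "max_iter": 100},
}

def train(data, model_cfg):
    # a missing key fails loudly with a KeyError, as noted above
    alpha = model_cfg["alpha"]
    ...

train(data, config["model"])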
Most importantly, please note that in many cases the best way is to explicitly define the parameterization of a function via its signature, using the normal argument specification:
def f(a, b, c, d, e):
    ...
In the calling code, you can then prepare the values for these arguments in a dictionary:
arguments = {
    "a": 1,
    "b": 2,
    "c": 3,
    "d": 4,
    "e": "x",
}
and then use Python's syntactic sugar for keyword expansion upon function call:
f(**arguments)
I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[('Ad', 'Ac'), ('Ad', 'Ah'), ('Ad', 'As'), ('Ac', 'Ah'), ('Ac', 'As'), ('Ah', 'As')]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translator), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polymorphism come into play; but there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a Singleton (or Borg) design pattern, or writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes after due consideration you'll unearth some different reason for preferring a class, and that's all right too, but the point is, don't just pick a class over a module with functions "by reflex", without critically thinking about it!)
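In that spirit, the whole workflow can stay a plain module-level function composed from the three classes (a sketch reusing the names from the question; equities_for is a made-up name):

# poker.py -- a module-level "verb" composing the building blocks
def equities_for(range_string):
    hand_range = HandRange(range_string)
    translated = Translation.translateList(hand_range)
    return Evaluator.getEquities(translated)

Any interface (shell UI, GUI, web) then just calls equities_for(...).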
You could create a Poker class that ties these all together and initialize all of that stuff in the __init__() method:

class Poker(object):
    def __init__(self, user_input=HandRange()):
        self.user_input = user_input
        self.translation = Translation.translateList(user_input)
        self.evaluator = Evaluator.getEquities(self.translation)
        # and so on...

p = Poker()
# etc, etc...
My question is rather a design question.
In Python, if code in your "constructor" fails, the object ends up not being defined. Thus:
someInstance = MyClass("test123")  # let's say that the constructor throws an exception
someInstance.doSomething()  # will fail: name 'someInstance' is not defined
I do have a situation, though, where a lot of code copying would occur if I removed the error-prone code from my constructor. Basically, my constructor fills a few attributes (via IO, where a lot can go wrong) that can be accessed with various getters. If I removed the code from the constructor, I'd have 10 getters with copy-pasted code along the lines of:
Is the attribute really set?
Do some IO actions to fill the attribute.
Return the contents of the attribute in question.
I dislike that, because all my getters would contain a lot of code. Instead, I perform my IO operations in a central location, the constructor, and fill all my attributes there.
What's a proper way of doing this?
There is a difference between a constructor in C++ and an __init__ method
in Python. In C++, the task of a constructor is to construct an object. If it fails,
no destructor is called. Therefore if any resources were acquired before an
exception was thrown, the cleanup should be done before exiting the constructor.
Thus, some prefer two-phase construction with most of the construction done
outside the constructor (ugh).
Python has a much cleaner two-phase construction (construct, then
initialize). However, many people confuse an __init__ method (initializer)
with a constructor. The actual constructor in Python is called __new__.
Unlike in C++, it does not take an instance, but
returns one. The task of __init__ is to initialize the created instance.
If an exception is raised in __init__, the destructor __del__ (if any)
will be called as expected, because the object was already created (even though it was not properly initialized) by the time __init__ was called.
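A minimal sketch illustrating that order (Demo is a made-up class; in CPython the destructor runs once the half-initialized object is released):

class Demo:
    def __new__(cls, *args, **kwargs):
        print("__new__: object constructed")
        return super().__new__(cls)

    def __init__(self, fail=False):
        print("__init__: initializing")
        if fail:
            raise ValueError("initialization failed")

    def __del__(self):
        print("__del__: called even though __init__ raised")

try:
    Demo(fail=True)
except ValueError:
    pass  # __new__ and __init__ have printed; __del__ follows on release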
Answering your question:
In Python, if code in your
"constructor" fails, the object ends
up not being defined.
That's not precisely true. If __init__ raises an exception, the object is
created but not initialized properly (e.g., some attributes are not
assigned). But at the time that it's raised, you probably don't have any references to
this object, so the fact that the attributes are not assigned doesn't matter. Only the destructor (if any) needs to check whether the attributes actually exist.
Whats a proper way of doing this?
In Python, initialize objects in __init__ and don't worry about exceptions.
In C++, use RAII.
Update [about resource management]:
In garbage collected languages, if you are dealing with resources, especially limited ones such as database connections, it's better not to release them in the destructor.
This is because objects are destroyed in a non-deterministic way, and if you happen
to have a loop of references (which is not always easy to tell), and at least one of the objects in the loop has a destructor defined, they will never be destroyed.
Garbage collected languages have other means of dealing with resources. In Python, it's a with statement.
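For example, a database connection is better scoped with a context manager than cleaned up in a destructor (a sketch using only the standard library):

import sqlite3
from contextlib import closing

# the connection is closed deterministically when the block exits,
# even if an exception is raised inside it
with closing(sqlite3.connect("example.db")) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")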
In C++ at least, there is nothing wrong with putting failure-prone code in the constructor - you simply throw an exception if an error occurs. If the code is needed to properly construct the object, there really is no alternative (although you can abstract the code into subfunctions, or better, into the constructors of subobjects). The worst practice is to half-construct the object and then expect the user to call other functions to complete the construction somehow.
It is not bad practice per se.
But I think you may be after something different here. In your example the doSomething() method will not be called when the MyClass constructor fails. Try the following code:
class MyClass:
    def __init__(self, s):
        print(s)
        raise Exception("Exception")

    def doSomething(self):
        print("doSomething")

try:
    someInstance = MyClass("test123")
    someInstance.doSomething()
except Exception:
    print("except")
It should print:
test123
except
For your software design you could ask the following questions:
What should the scope of the someInstance variable be? Who are its users? What are their requirements?
Where and how should the error be handled for the case that one of your 10 values is not available?
Should all 10 values be cached at construction time or cached one-by-one when they are needed the first time?
Can the I/O code be refactored into a helper method, so that doing something similar 10 times does not result in code repetition?
...
I'm not a Python developer, but in general, it's best to avoid complex/error-prone operations in your constructor. One way around this would be to put a "LoadFromFile" or "Init" method in your class to populate the object from an external source. This load/init method must then be called separately after constructing the object.
One common pattern is two-phase construction, also suggested by Andy White.
First phase: Regular constructor.
Second phase: Operations that can fail.
Integration of the two: add a factory method that performs both phases, and make the constructor protected/private to prevent instantiation outside the factory method.
Oh, and I'm not a Python developer either.
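For reference, that integration might look like this in Python (a sketch; Connection and open_to are illustrative names, and since Python has no private constructors, the restriction is by convention only):

import socket

class Connection:
    def __init__(self, sock):
        # phase one: trivial constructor, nothing that can fail
        self._sock = sock

    @classmethod
    def open_to(cls, host, port):
        # phase two: the failure-prone work lives in the factory method
        sock = socket.create_connection((host, port))
        return cls(sock)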
If the code to initialise the various values is really so extensive that copying it is undesirable (which it sounds like it is in your case), I would personally opt for putting the required initialisation into a private method, adding a flag to indicate whether the initialisation has taken place, and making all accessors call the initialisation method if it has not run yet.
In threaded scenarios you may have to add extra protection in case initialisation is only allowed to occur once for valid semantics (which may or may not be the case since you are dealing with a file).
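A Python version of that idea might look like this (a sketch; Record and _load are made-up names):

class Record:
    def __init__(self, path):
        self._path = path    # cheap, nothing that can fail
        self._loaded = False
        self._data = None

    def _load(self):
        # the error-prone IO, run at most once, on first access
        if not self._loaded:
            with open(self._path) as f:
                self._data = f.read()
            self._loaded = True

    @property
    def data(self):
        self._load()
        return self._data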
Again, I've got little experience with Python, but in C# it's better to try to avoid having a constructor that throws an exception. An example of why springs to mind: if you place your constructor at a point where it's not possible to surround it with a try {} catch {} block, for example the initialisation of a field in a class:

class MyClass
{
    MySecondClass mySecondClass = new MySecondClass();
    // Rest of class
}

If the constructor of MySecondClass throws an exception that you wish to handle inside MyClass, then you need to refactor the above - it's certainly not the end of the world, but a nice-to-have.
In this case my approach would probably be to move the failure-prone initialisation logic into an initialisation method, and have the getters call that initialisation method before returning any values.
As an optimisation you should have the getter (or the initialisation method) set some sort of "IsInitialised" boolean to true, to indicate that the (potentially costly) initialisation does not need to be done again.
In pseudo-code (C#, because I'll just mess up the syntax of Python):
class MyClass
{
private bool IsInitialised = false;
private string myString;
public void Init()
{
// Put initialisation code here, e.g. populate myString via IO
this.IsInitialised = true;
}
public string MyString
{
get
{
if (!this.IsInitialised)
{
this.Init();
}
return myString;
}
}
}
This is of course not thread-safe, but I don't think multithreading is used that commonly in Python, so this is probably a non-issue for you.
Seems Neil had a good point: my friend just pointed me to this:
http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
which is basically what Neil said...