Why is IoC / DI not common in Python? - python

In Java IoC / DI is a very common practice which is extensively used in web applications, nearly all available frameworks and Java EE. On the other hand, there are also lots of big Python web applications, but beside of Zope (which I've heard should be really horrible to code) IoC doesn't seem to be very common in the Python world. (Please name some examples if you think that I'm wrong).
There are of course several clones of popular Java IoC frameworks available for Python, springpython for example. But none of them seems to get used practically. At least, I've never stumpled upon a Django or sqlalchemy+<insert your favorite wsgi toolkit here> based web application which uses something like that.
In my opinion IoC has reasonable advantages and would make it easy to replace the django-default-user-model for example, but extensive usage of interface classes and IoC in Python looks a bit odd and not »pythonic«. But maybe someone has a better explanation, why IoC isn't widely used in Python.

I don't actually think that DI/IoC are that uncommon in Python. What is uncommon, however, are DI/IoC frameworks/containers.
Think about it: what does a DI container do? It allows you to
wire together independent components into a complete application ...
... at runtime.
We have names for "wiring together" and "at runtime":
scripting
dynamic
So, a DI container is nothing but an interpreter for a dynamic scripting language. Actually, let me rephrase that: a typical Java/.NET DI container is nothing but a crappy interpreter for a really bad dynamic scripting language with butt-ugly, sometimes XML-based, syntax.
When you program in Python, why would you want to use an ugly, bad scripting language when you have a beautiful, brilliant scripting language at your disposal? Actually, that's a more general question: when you program in pretty much any language, why would you want to use an ugly, bad scripting language when you have Jython and IronPython at your disposal?
So, to recap: the practice of DI/IoC is just as important in Python as it is in Java, for exactly the same reasons. The implementation of DI/IoC however, is built into the language and often so lightweight that it completely vanishes.
(Here's a brief aside for an analogy: in assembly, a subroutine call is a pretty major deal - you have to save your local variables and registers to memory, save your return address somewhere, change the instruction pointer to the subroutine you are calling, arrange for it to somehow jump back into your subroutine when it is finished, put the arguments somewhere where the callee can find them, and so on. IOW: in assembly, "subroutine call" is a Design Pattern, and before there were languages like Fortran which had subroutine calls built in, people were building their own "subroutine frameworks". Would you say that subroutine calls are "uncommon" in Python, just because you don't use subroutine frameworks?)
BTW: for an example of what it looks like to take DI to its logical conclusion, take a look at Gilad Bracha's Newspeak Programming Language and his writings on the subject:
Constructors Considered Harmful
Lethal Injection
A Ban on Imports (continued)

IoC and DI are super common in mature Python code. You just don't need a framework to implement DI thanks to duck typing.
The best example is how you set up a Django application using settings.py:
# settings.py
CACHES = {
'default': {
'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': REDIS_URL + '/1',
},
'local': {
'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
'LOCATION': 'snowflake',
}
}
Django Rest Framework utilizes DI heavily:
class FooView(APIView):
# The "injected" dependencies:
permission_classes = (IsAuthenticated, )
throttle_classes = (ScopedRateThrottle, )
parser_classes = (parsers.FormParser, parsers.JSONParser, parsers.MultiPartParser)
renderer_classes = (renderers.JSONRenderer,)
def get(self, request, *args, **kwargs):
pass
def post(self, request, *args, **kwargs):
pass
Let me remind (source):
"Dependency Injection" is a 25-dollar term for a 5-cent concept. [...] Dependency injection means giving an object its instance variables. [...].

Part of it is the way the module system works in Python. You can get a sort of "singleton" for free, just by importing it from a module. Define an actual instance of an object in a module, and then any client code can import it and actually get a working, fully constructed / populated object.
This is in contrast to Java, where you don't import actual instances of objects. This means you are always having to instantiate them yourself, (or use some sort of IoC/DI style approach). You can mitigate the hassle of having to instantiate everything yourself by having static factory methods (or actual factory classes), but then you still incur the resource overhead of actually creating new ones each time.

Django makes great use of inversion of control. For instance, the database server is selected by the configuration file, then the framework provides appropriate database wrapper instances to database clients.
The difference is that Python has first-class types. Data types, including classes, are themselves objects. If you want something to use a particular class, simply name the class. For example:
if config_dbms_name == 'postgresql':
import psycopg
self.database_interface = psycopg
elif config_dbms_name == 'mysql':
...
Later code can then create a database interface by writing:
my_db_connection = self.database_interface()
# Do stuff with database.
Instead of the boilerplate factory functions that Java and C++ need, Python does it with one or two lines of ordinary code. This is the strength of functional versus imperative programming.

It seems that people really don't get what Dependency injection and inversion of control mean anymore.
The practice of using inversion of control is to have classes or functions that depend on other classes or functions, but instead of creating the instances whithin the class or function code it is better to receive them as parameters, so loose coupling can be achieved. That has many benefits as more testability and to achieve the liskov substitution principle.
You see, by working with interfaces and injections, your code gets more maintainable, since you can change the behavior easily, because you won't have to rewrite a single line of code (maybe a line or two on the DI configuration) of your class to change its behavior, since the classes that implement the interface your class is waiting for can vary independently as long as they follow the interface. One of the best strategies to keep code decoupled and easy to maintain is to follow at least the single responsibility, substitution and dependency inversion principles.
What's a DI library good for if you can instantiate an object yourself inside a package and import it to inject it yourself? The chosen answer is right, since java has no procedural sections (code outside of classes), all that goes into boring configuration xml's, hence the need of a class to instantiate and inject dependencies on a lazy load fashion so you don't blow away your performance, while on python you just code the injections in the "procedural" (code outside classes) sections of your code.

Haven't used Python in several years, but I would say that it has more to do with it being a dynamically typed language than anything else. For a simple example, in Java, if I wanted to test that something wrote to standard out appropriately I could use DI and pass in any PrintStream to capture the text being written and verify it. When I'm working in Ruby, however, I can dynamically replace the 'puts' method on STDOUT to do the verify, leaving DI completely out of the picture. If the only reason I'm creating an abstraction is to test the class that's using it (think File system operations or the clock in Java) then DI/IoC creates unnecessary complexity in the solution.

Actually, it is quite easy to write sufficiently clean and compact code with DI (I wonder, will it be/stay pythonic then, but anyway :) ), for example I actually perefer this way of coding:
def polite(name_str):
return "dear " + name_str
def rude(name_str):
return name_str + ", you, moron"
def greet(name_str, call=polite):
print "Hello, " + call(name_str) + "!"
_
>>greet("Peter")
Hello, dear Peter!
>>greet("Jack", rude)
Hello, Jack, you, moron!
Yes, this can be viewed as just a simple form of parameterizing functions/classes, but it does its work. So, maybe Python's default-included batteries are enough here too.
P.S. I have also posted a larger example of this naive approach at Dynamically evaluating simple boolean logic in Python.

IoC/DI is a design concept, but unfortunately it's often taken as a concept that applies to certain languages (or typing systems). I'd love to see dependency injection containers become far more popular in Python. There's Spring, but that's a super-framework and seems to be a direct port of the Java concepts without much consideration for "The Python Way."
Given Annotations in Python 3, I decided to have a crack at a full featured, but simple, dependency injection container: https://github.com/zsims/dic . It's based on some concepts from a .NET dependency injection container (which IMO is fantastic if you're ever playing in that space), but mutated with Python concepts.

I think due to the dynamic nature of python people don't often see the need for another dynamic framework. When a class inherits from the new-style 'object' you can create a new variable dynamically (https://wiki.python.org/moin/NewClassVsClassicClass).
i.e.
In plain python:
#application.py
class Application(object):
def __init__(self):
pass
#main.py
Application.postgres_connection = PostgresConnection()
#other.py
postgres_connection = Application.postgres_connection
db_data = postgres_connection.fetchone()
However have a look at https://github.com/noodleflake/pyioc this might be what you are looking for.
i.e. In pyioc
from libs.service_locator import ServiceLocator
#main.py
ServiceLocator.register(PostgresConnection)
#other.py
postgres_connection = ServiceLocator.resolve(PostgresConnection)
db_data = postgres_connection.fetchone()

pytest fixtures all based on DI (source)

Check out FastAPI, it has dependency injection built-in. For example:
from fastapi import Depends, FastAPI
async def get_db():
db = DBSession()
try:
yield db
except Exception:
db.rollback()
raise
finally:
db.close()
app = FastAPI()
#app.get("/items")
def get_items(db=Depends(get_db)):
return db.get_items()

I back "Jörg W Mittag" answer: "The Python implementation of DI/IoC is so lightweight that it completely vanishes".
To back up this statement, take a look at the famous Martin Fowler's example ported from Java to Python: Python:Design_Patterns:Inversion_of_Control
As you can see from the above link, a "Container" in Python can be written in 8 lines of code:
class Container:
def __init__(self, system_data):
for component_name, component_class, component_args in system_data:
if type(component_class) == types.ClassType:
args = [self.__dict__[arg] for arg in component_args]
self.__dict__[component_name] = component_class(*args)
else:
self.__dict__[component_name] = component_class

My 2cents is that in most Python applications you don't need it and, even if you needed it, chances are that many Java haters (and incompetent fiddlers who believe to be developers) consider it as something bad, just because it's popular in Java.
An IoC system is actually useful when you have complex networks of objects, where each object may be a dependency for several others and, in turn, be itself a dependant on other objects. In such a case you'll want to define all these objects once and have a mechanism to put them together automatically, based on as many implicit rules as possible. If you also have configuration to be defined in a simple way by the application user/administrator, that's an additional reason to desire an IoC system that can read its components from something like a simple XML file (which would be the configuration).
The typical Python application is much simpler, just a bunch of scripts, without such a complex architecture. Personally I'm aware of what an IoC actually is (contrary to those who wrote certain answers here) and I've never felt the need for it in my limited Python experience (also I don't use Spring everywhere, not when the advantages it gives don't justify its development overhead).
That said, there are Python situations where the IoC approach is actually useful and, in fact, I read here that Django uses it.
The same reasoning above could be applied to Aspect Oriented Programming in the Java world, with the difference that the number of cases where AOP is really worthwhile is even more limited.

You can do dependency injection with Python manually, but manual approach has its downsides:
lots of boilerplate code to do the wiring. You can use dynamic features of Python to do the injection, but then you're loosing IDE support (e.g. Ctrl+Space in PyCharm), and you're making code harder to understand and debug
no standards: every programmer has its own way for solving same problems, this leads to reinventing the wheel, understanding each other's code can quickly become a pain. Dependency injection library provides easy framework to plug-in
To have it all we NEED a dependency injection framework, for example this one https://python-dependency-injector.ets-labs.org/index.html seems to be the most mature DI framework for Python.
For smaller apps DI container is not necessary, for anything that has few hundred lines of code or more, DI container is a must have to keep your code maintaineable.

I agree with #Jorg in the point that DI/IoC is possible, easier and even more beautiful in Python. What's missing is the frameworks supporting it, but there are a few exceptions. To point a couple of examples that come to my mind:
Django comments let you wire your own Comment class with your custom logic and forms. [More Info]
Django let you use a custom Profile object to attach to your User model. This is not completely IoC but is a good approach. Personally I'd like to replace the hole User model as the comments framework does. [More Info]

IoC containers are "mimicked" mostly using **kwargs
class A:
def __init__(self, **kwargs):
print(kwargs)
Class B:
pass
Class C:
pass
Ainstance = A(b=B, c=C)

In my opinion, things like dependency injection are symptoms of a rigid and over-complex framework. When the main body of code becomes much too weighty to change easily, you find yourself having to pick small parts of it, define interfaces for them, and then allowing people to change behaviour via the objects that plug into those interfaces. That's all well and good, but it's better to avoid that sort of complexity in the first place.
It's also the symptom of a statically-typed language. When the only tool you have to express abstraction is inheritance, then that's pretty much what you use everywhere. Having said that, C++ is pretty similar but never picked up the fascination with Builders and Interfaces everywhere that Java developers did. It is easy to get over-exuberant with the dream of being flexible and extensible at the cost of writing far too much generic code with little real benefit. I think it's a cultural thing.
Typically I think Python people are used to picking the right tool for the job, which is a coherent and simple whole, rather than the One True Tool (With A Thousand Possible Plugins) that can do anything but offers a bewildering array of possible configuration permutations. There are still interchangeable parts where necessary, but with no need for the big formalism of defining fixed interfaces, due to the flexibility of duck-typing and the relative simplicity of the language.

Unlike the strong typed nature in Java. Python's duck typing behavior makes it so easy to pass objects around.
Java developers are focusing on the constructing the class strcuture and relation between objects, while keeping things flexible. IoC is extremely important for achieving this.
Python developers are focusing on getting the work done. They just wire up classes when they need it. They don't even have to worry about the type of the class. As long as it can quack, it's a duck! This nature leaves no room for IoC.

Related

How to apply Closed-Open and Inversion of Control principles in Python?

Building out a new application now and struggling a lot with the implementation part of "Closed-Open" and "Inversion of Control" principles I following after reading Clean Architecture book by Uncle Bob.
How can I implement them in Python?
Usually, these two principles coming hand in hand and depicted in the UML as an Interface reversing control from module/package A to B.
I'm confused because:
Python does not possess Interfaces as Java and C++ do. Yes, there are ABC and #abstractmethod, but it is not a Pythonic style and redundant from my point of view if you are not developing a framework
Passing a class to the method of another one (I understood that it is a way to implement open-closed principle) is a little bit dangerous in Python, since it does not have a compiler which is catching issues may (and will) happen if one of two loosely coupled objects change
After neglecting interfaces and passing a top-level class to lower-level ones... I still need to import everything somewhere at the top module. And by that, the whole thing is violated.
So, as you can see I'm super confused and having a hard time programming according to my design. I came up with. Can you help me, please?
You just pass an object that implements the methods you need it to implement.
True, there is no "Interface" to define what those methods have to be, but that's just the way it is in python.
You pass around arguments all the time that have to be lists, maps, tuples, or whatever, and none of these are type-checked. You can write code that calls whatever you want on these things and python will not notice any kind of problem until that code is actually executed.
It's exactly the same when you need those arguments to implement whatever IoC interface you're using. Make sure you detail the requirements in comments.
Yes, this is all pretty dangerous. That's why we prefer statically typed languages for large systems that have complex interfaces.

Which way is ideal for Python factory registration?

This is a question about which of these methods would be considered as the most Pythonic. I'm not looking for personal opinions, but, instead, what is idiomatic. My background is not in Python, so this will help me.
I'm working on a Python 3 project which is extensible. The idea is similar to the factory pattern, except it is based on functions.
Essentially, users will be able to create a custom function (across packages and projects) which my tool can locate and dynamically invoke. It will also be able to use currying to pass arguments down (but that code is not included here)
My goal is for this to follow good-Pythonic practice. I'm torn between two strategies. And, since Python is not my expertise, I would like to know the pros/cons of the following practices:
Use a decorator
registered = {}
def factoried(name):
def __inner_factory_function(fn):
registered[name] = fn
return fn
return __inner_factory_function
def get_function(name):
return registered[name]
Then, the following function is automatically registered...
#factoried('foo')
def build_foo():
print('hi')
This seems reasonable, but does appear slightly magical to those who are not familiar with decorators.
Force sub-classing of an abstract class and use __subclasses__()
If subclasses are used, there's no need for registration. However, I feel like this forces classes to be defined when a full class may be unnecessary. Also, the use of .__subclasses__() under the hood could seem magical to consumers as well. However, even Java can be used to search for classes with annotations.
Explicit registration
Forget all of the above and force explicit registration. No decorators. No subclasses. Just something like this:
def build_foo():
# ...
factory.register('foo', build_foo)
There is no answer to this question.
The only standard practices promoted by the Python Foundation are PEP 8.
PEP 8 has very little related to higher-level "design-pattern" questions like this, and, in particular, nothing related to your specific question.
And, even if it did, PEP 8 is explicitly only a guideline for "code comprising the standard library in the main Python distribution", and Guido has rejected suggestions to make it some kind of wide-ranging standard that should be enforced on every Python project.
Plus, it hammers home the point that it's only a guideline, not a rigid recommendation.
Of course there are subjective reasons to prefer one design over another.
Ideally, these subjective reasons will usually be driven by some community consensus on what's "idiomatic" or "pythonic". But that community consensus isn't written down anywhere as some objective source you can cite.
There may be arguments that appeal to The Zen of Python, but that itself is just Tim Peters' attempts to distill Guido's own subjective guidelines into a collection of pithy sound bites, not an objective source. (And anyone who takes a brief look at, e.g., the python-ideas list can see that both sides of almost any question can appeal to the Zen…)

Tracking changes in python source files?

I'm learning python and came into a situation where I need to change the behvaviour of a function. I'm initially a java programmer so in the Java world a change in a function would let Eclipse shows that a lot of source files in Java has errors. That way I can know which files need to get modified. But how would one do such a thing in python considering there are no types?! I'm using TextMate2 for python coding.
Currently I'm doing the brute-force way. Opening every python script file and check where I'm using that function and then modify. But I'm sure this is not the way to deal with large projects!!!
Edit: as an example I define a class called Graph in a python script file. Graph has two objects variables. I created many objects (each with different name!!!) of this class in many script files and then decided that I want to change the name of the object variables! Now I'm going through each file and reading my code again in order to change the names again :(. PLEASE help!
Example: File A has objects x,y,z of class C. File B has objects xx,yy,zz of class C. Class C has two instance variables names that should be changed Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you serisouly going to open each file and search for x,y,z,xx,yy,zz and then change the names individually?!!!
Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.
Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables - like I wrongly did when I first learned OO coding in Python. The code smell is that when the type/class of some class internal changes, it should generally not affect the class methods. If you're refactoring all your code every 10 mins, you're doing something seriously wrong. Step back and think about clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new fn does something very different to the old one, in which case what was the argument against calling it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence etc.) to put the return values in, then yes you need to change its caller code. However...
...however if the function should really be a method of a class, declare that class and the method already. The previous paragraph was a code smell that your function really should have been a method, specifically a polymorphic method.
Read about code smells, anti-patterns and When do you know you're dealing with an anti-pattern?. There e.g. you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, sounds like you want to use Test-Driven Design and add some unittests.
If you give us the specifics we can critique it better.
You won't get this functionality in a text editor. I use sublime text 3, and I love it, but it doesn't have this functionality. It does however jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains pycharm has some of the amazing re-factoring tools that you might be looking for.
The also free Python Tools for Visual Studio (see free install options here which can use the free VS shell) has some excellent Refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in sublime text, I like pycharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these you will break things when you refactor.
One answer only specific to your edit:
if your old code was working and does not need to be modified, you could just keep old names as alias of the new ones, resulting in your old code not to be broken. Example:
class MyClass(object):
def __init__(self):
self.t = time.time()
# creating new names
def new_foo(self, arg):
return 'new_foo', arg
def new_bar(self, arg):
return 'new_bar', arg
# now creating functions aliases
foo = new_foo
bar = new_bar
if your code need rework, rewrite your common code, execute everything, and correct any failure. You could also look for any import/instantiation of your class.
One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.

What is a real-world example of Dependency Injection in a Dynamic Language?

I have a thorough background in .NET but have been using Python and Ruby lately. I found myself pondering how to best provide dependencies to objects that need them in Ruby.
At first thought, I did not actually think DI and IoC frameworks would be required to interact with dependencies because of the leniency of dynamic languages (a la redefinition, mixins, stubs, etc). Then, however, I came across answers as to why DI/IoC frameworks are not needed in dynamic languages. The reasons provided don't sit too well with me. I'm hoping I can see an example that might clear things up.
Recommended suggestions that I kind of disagree with:
Reason 1: A dependent class can be changed at run time (think testing)
In Why are IOC containers unnecessary with dynamic languages we see that a dependent class (non-injected), say X, can be stubbed or mocked in a test. Sure, but that requires us to know our System Under Test is depending on something called X. If our System Under Test suddenly depends on N instead of X, we must now remember to mock N instead of X. The benefit of using DI is we'd never accidentally run a test with production dependencies because we'd always be passing in mocked dependencies.
Reason 2: Subclass or use constructor injection for testing
In everyone's favorite goto resource for all things DI + Ruby, LEGOs, Play-Doh, and Programming, we see an example of subclassing a System Under Test to mock dependencies. Alternatively, we can use constructor injection. Okay, so B depends on A. We call B.get_dependency which provides B with an instance of A. But what if A depends on N which depends on X? Must we call get_dependency on each successive object in the chain?
Reason 3: Dependencies can be mixed in or monkeypatched
Fabio mentions we can just use mixins/monkeypatch. So X is mixedin to N. But The issue is what if X depends on A which depends on B? Do we just use mixins for every dependency down the chain? I see how that can work but it could get messy and confusing quickly.
Side note: Many users say DI frameworks are not needed in dynamic languages. However, Angular.JS has really benefited from implementing a pretty solid DI system. Angular is built on JavaScript, a dynamic language. Is this approach comparable to Ruby or Python?
Please keep in mind I'm not saying I want to force DI/IoC into Ruby, Python, etc.
While many think DI is not needed, I agree with you, that it's indeed needed a lot; but sometimes it gets mixed with other techniques Python provides. I suggest you to look at venusian, it may kind of verbose, but if you come from .NET you'll see the relation. In a word: venusian allows you to annotate your methods without changing their behavior. Thus, you may write venusian decorators so that your unit-testing does not get affected. Pyramid uses venusian to annotate views, for instance.

Statically Typed Metaprogramming?

I've been thinking about what I would miss in porting some Python code to a statically typed language such as F# or Scala; the libraries can be substituted, the conciseness is comparable, but I have lots of python code which is as follows:
#specialclass
class Thing(object):
#specialFunc
def method1(arg1, arg2):
...
#specialFunc
def method2(arg3, arg4, arg5):
...
Where the decorators do a huge amount: replacing the methods with callable objects with state, augmenting the class with additional data and properties, etc.. Although Python allows dynamic monkey-patch metaprogramming anywhere, anytime, by anyone, I find that essentially all my metaprogramming is done in a separate "phase" of the program. i.e.:
load/compile .py files
transform using decorators
// maybe transform a few more times using decorators
execute code // no more transformations!
These phases are basically completely distinct; I do not run any application level code in the decorators, nor do I perform any ninja replace-class-with-other-class or replace-function-with-other-function in the main application code. Although the "dynamic"ness of the language says I can do so anywhere I want, I never go around replacing functions or redefining classes in the main application code because it gets crazy very quickly.
I am, essentially, performing a single re-compile on the code before i start running it.
The only similar metapogramming i know of in statically typed languages is reflection: i.e. getting functions/classes from strings, invoking methods using argument arrays, etc. However, this basically converts the statically typed language into a dynamically typed language, losing all type safety (correct me if i'm wrong?). Ideally, I think, I would have something like the following:
load/parse application files
load/compile transformer
transform application files using transformer
compile
execute code
Essentially, you would be augmenting the compilation process with arbitrary code, compiled using the normal compiler, that will perform transformations on the main application code. The point is that it essentially emulates the "load, transform(s), execute" workflow while strictly maintaining type safety.
If the application code are borked the compiler will complain, if the transformer code is borked the compiler will complain, if the transformer code compiles but doesn't do the right thing, either it will crash or the compilation step after will complain that the final types don't add up. In any case, you will never get the runtime type-errors possible by using reflection to do dynamic dispatch: it would all be statically checked at every step.
So my question is, is this possible? Has it already been done in some language or framework which I do not know about? Is it theoretically impossible? I'm not very familiar with compiler or formal language theory, I know it would make the compilation step turing complete and with no guarantee of termination, but it seems to me that this is what I would need to match the sort of convenient code-transformation i get in a dynamic language while maintaining static type checking.
EDIT: One example use case would be a completely generic caching decorator. In python it would be:
cacheDict = {}
def cache(func):
#functools.wraps(func)
def wrapped(*args, **kwargs):
cachekey = hash((args, kwargs))
if cachekey not in cacheDict.keys():
cacheDict[cachekey] = func(*args, **kwargs)
return cacheDict[cachekey]
return wrapped
#cache
def expensivepurefunction(arg1, arg2):
# do stuff
return result
While higher order functions can do some of this or objects-with-functions-inside can do some of this, AFAIK they cannot be generalized to work with any function taking an arbitrary set of parameters and returning an arbitrary type while maintaining type safety. I could do stuff like:
public Thingy wrap(Object O){ //this probably won't compile, but you get the idea
return (params Object[] args) => {
//check cache
return InvokeWithReflection(O, args)
}
}
But all the casting completely kills type safety.
EDIT: This is a simple example, where the function signature does not change. Ideally what I am looking for could modify the function signature, changing the input parameters or output type (a.l.a. function composition) while still maintaining type checking.
Very interesting question.
Some points regarding metaprogramming in Scala:
In scala 2.10 there will be developments in scala reflection
There is work in source to source transformation (macros) which is something you are looking for: scalamacros.org
Java has introspection (through the reflection api) but does not allow self modification. However you can use tools to support this (such as javassist). In theory you could use these tools in Scala to achieve more than introspection.
From what I could understand of your development process, you separate your domain code from your decorators (or a cross cutting concern if you will) which allow to achieve modularity and code simplicity. This can be a good use for aspect oriented programming, which allows to just that. For Java theres is a library (aspectJ), however I'm dubious it will run with Scala.
So my question is, is this possible?
There are many ways to achieve the same effect in statically-typed programming languages.
You have essentially described the process of doing some term rewriting on a program before executing it. This functionality is perhaps best known in the form of the Lisp macro but some statically typed languages also have macro systems, most notably OCaml's camlp4 macro system which can be used to extend the language.
More generally, you are describing one form of language extensibility. There are many alternatives and different languages provide different techniques. See my blog post Extensibility in Functional Programming for more information. Note that many of these languages are research projects so the motivation is to add novel features and not necessarily good features, so they rarely retrofit good features that were invented elsewhere.
The ML (meta language) family of languages including Standard ML, OCaml and F# were specifically designed for metaprogramming. Consequently, they tend to have awesome support for lexing, parsing, rewriting, interpreting and compiling. However, F# is the most far removed member of this family and lacks the mature tools that languages like OCaml benefit from (e.g. camlp4, ocamllex, dypgen, menhir etc.). F# does have a partial implementation of fslex, fsyacc and a Haskell-inspired parser combinator library called FParsec.
You may well find that the problem you are facing (which you have not described) is better solved using more traditional forms of metaprogramming, most notably a DSL or EDSL.
Without knowing why you're doing this, it's difficult to know whether this kind of approach is the right one in Scala or F#. But ignoring that for now, it's certainly possible to achieve in Scala, at least, although not at the language level.
A compiler plugin gives you access to the tree and allows you to perform all kinds of manipulation of that tree, all fully typechecked.
There are some issues with generating synthetic methods in Scala compiler plugins - it's difficult for me to know whether that will be a problem for you.
It is possible to work around this by creating a compiler plugin that generates source code which is then compiled in a separate pass. This is how ScalaMock works, for instance.
You might be interested in source-to-source program transformation systems (PTS).
Such tools parse the source code, producing an AST, and then allow one to define arbitrary analyses and/or transformations on the code, finally regenerating source code from the modified AST.
Some tools provide parsing, tree building and AST navigation by a procedural interface, such as ANTLR. Many of the more modern dynamic languages (Python, Scala, etc.) have had some self-hosting parser libraries built, and even Java (compiler plug-ins) and C# (open compiler) are catching on to this idea.
But mostly these tools only provide procedural access to the AST. A system with surface-syntax rewriting allows you to express "if you see this change it to that" using patterns with the syntax of the language(s) being manipulated. These include Stratego/XT and TXL.
It is our experience that manipulating complex languages requires complex compiler support and reasoning; this is the canonical lesson from 70 years of people building compilers. All of the above tools suffer from not having access to symbol tables and various kinds of flow analysis; after all, how one part of the program operates, depends on action taken in remote parts, so information flow is fundamental. [As noted in comments on another answer, you can implement symbol tables/flow analysis with those tools; my point is they give you no special support for doing so, and these are difficult tasks, even worse on modern languages with complex type systems and control flows].
Our DMS Software Reengineering Toolkit is a PTS that provides all of the above facilities (Life After Parsing), at some cost in configuring it to your particular language or DSL, which we try to ameliorate by providing these off-the-shelf for mainstream languages. [DMS provides explicit infrastructure for building/managing symbol tables, control and data flow; this has been used to implement these mechanisms for Java 1.8 and full C++14].
DMS has also been used to define meta-AOP, tools that enable one to build AOP systems for arbitrary languages and apply AOP like operations.
In any case, to the extent that you simply modify the AST, directly or indirectly, you have no guarantee of "type safety". You can only get that by writing transformation rules that don't break it. For that, you'd need a theorem prover to check that each modification (or composition of such) didn't break type safety, and that's pretty much beyond the state of the art. However, you can be careful how you write your rules, and get pretty useful systems.
You can see an example of specification of a DSL and manipulation with surface-syntax source-to-source rewriting rules, that preserves semantics, in this example that defines and manipulates algebra and calculus using DMS. I note this example is simple to make it understandable; in particular, its does not exhibit any of the flow analysis machinery DMS offers.
Ideally what I am looking for could modify the function signature, changing the input parameters or output type (a.l.a. function composition) while still maintaining type checking.
I have same need for making R APIs available in the type safe world. This way we would bring the wealth of scientific code from R into the (type) safe world of Scala.
Rationale
Make possible documenting the business domain aspects of the APIs through Specs2 (see https://etorreborre.github.io/specs2/guide/SPECS2-3.0/org.specs2.guide.UserGuide.html; is generated from Scala code). Think Domain Driven Design applied backwards.
Take a language oriented approach to the challenges faced by SparkR which tries to combine Spark with R.
See https://spark-summit.org/east-2015/functionality-and-performance-improvement-of-sparkr-and-its-application/ for attempts to improve how it is currently done in SparkR. See also https://github.com/onetapbeyond/renjin-spark-executor for a simplistic way to integrate.
In terms of solutioning this we could use Renjin (Java based interpreter) as runtime engine but use StrategoXT Metaborg to parse R and generate strongly typed Scala APIs (like you describe).
StrategoTX (http://www.metaborg.org/en/latest/) is the most powerful DSL development platform I know. Allows combining/embedding languages using a parsing technology that allows composing languages (longer story).

Categories