In my Python program, I have a ton of functions that are really wrappers for more complicated functions (the more complicated functions take more arguments, so the simple functions calculate the extra arguments and pass them along with the original arguments to the complex functions). I don't want the more complicated functions to be visible from the outer scope. However, my understanding is that if you define a function inside a function, the inner function is redefined every time the outer function gets called, which is wasteful. How can I hide my inner functions without redefining them over and over again? There must be some way for the interpreter to parse my file and just do the definitions once but still keep them in the inner scope.
Rather than controlling access to your "inner functions" by nesting them, use either or both of:
naming conventions (a leading underscore on a name means private-by-convention, see the style guide); and
defining a list named __all__ to specify what gets imported from the package by default (see the tutorial on modules).
In use:
# define the names that get imported from this package
__all__ = ['outer_func']

def _inner_func(...):
    """Private-by-convention inner function."""
    ...

def outer_func(...):
    """Public outer function to call _inner_func."""
    ...
This makes testing much easier, too, as you can still get direct access to _inner_func when necessary.
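A minimal sketch of how the two mechanisms behave, assuming a throwaway module built in memory (the module name demo_private_mod and the scale argument are made up for illustration): a star-import honours __all__, while the underscore-prefixed helper stays reachable through the module object for tests.

```python
import sys
import types

# Source of a hypothetical module; in practice this would be a .py file.
SOURCE = '''
__all__ = ["outer_func"]

def _inner_func(x, scale):
    """Private-by-convention helper that takes the extra argument."""
    return x * scale

def outer_func(x):
    """Public wrapper that computes the extra argument itself."""
    return _inner_func(x, scale=2)
'''

# Build the module once; both functions are defined a single time here.
mod = types.ModuleType("demo_private_mod")
exec(SOURCE, mod.__dict__)
sys.modules["demo_private_mod"] = mod

# Simulate "from demo_private_mod import *" in a fresh namespace.
namespace = {}
exec("from demo_private_mod import *", namespace)

star_imported = sorted(name for name in namespace if not name.startswith("__"))
```

Here star_imported contains only outer_func, yet mod._inner_func can still be called directly when a test needs it.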
I think the convention is to prepend the function name with two underscores.
(See: http://www.diveintopython.net/object_oriented_framework/private_functions.html)
Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions it may call need to operate on a sub-array of the DataFrame. I have written multiple selector functions. However, at present each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important because the end code is aimed at non-specialists (possibly with no Python/programming knowledge), so it must be "batteries-included". This entire project is written in a functional style (that's what's always worked for me).
Sample Code
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform3(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform4(arg1, arg2),
)
Expected API
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),
    transform3(arg1, arg2),
    transform4(arg1, arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it also becomes available (something similar to a closure) to all functions it calls. This seems to happen with some toy examples but won't work in this case (UnboundLocalError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable states in the past, and functional programming has worked quite well. Hence I've set a goal for myself to NOT use classes (to the extent that Python enables me to anyways). So no classes.
I should have probably clarified this initially: in the pipeline, the output of every function is a DataFrame and the input of every function (except load data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial because we want the user to supply the args to each function but not execute it. The actual execution is done by a for loop in make_pipeline.
Each function accepts df:pandas.DataFrame plus all arguments that are specific to that function. The statement seen above, transform1(arg1,arg2,...), actually calls the decorated transform1, which returns functools.partial(transform, arg1,arg2,...), which now has a signature like transform(df:pandas.DataFrame).
load_dataframe is just a convenience function to load the initial dataframe so that all other functions can begin operating on it. It just felt more intuitive to users to have it be part of the chain rather than a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument to said function. If you're wondering why this is the case, it's because I feel that end users will find it unintuitive and arbitrary. Some functions need it, some don't. I'm also pretty certain that they will make all kinds of errors, like passing different filters, forgetting it sometimes, etc.
(Update) I've also tried inspect.signature() in make_pipeline to check if each function accepts a data_filter argument and pass it on. However, this raises an incorrect function signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function signature, this would solve the issue, but I couldn't find much info in the docs.
Turns out it was pretty easy. The solution is inspect.signature.
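import inspect
from typing import Any, Callable, Optional

def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        # Pass the filter only to steps whose signature accepts it.
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d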
Leaving this here mostly for reference because I think this is a mini design pattern. I've also seen a function.__closure__ attribute mentioned on an unrelated subject. That may also work, but will likely be more complicated.
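For reference, here is a self-contained sketch of the whole pattern. It makes several assumptions: plain lists stand in for the DataFrame, the step decorator and the add and double_selected functions are hypothetical, and the data argument comes after the user-supplied arguments so that functools.partial leaves it as the next positional parameter.

```python
import functools
import inspect

def step(func):
    """Hypothetical decorator: calling the decorated function only binds
    the user's arguments and returns a partial that still awaits the data."""
    @functools.wraps(func)
    def configure(*args, **kwargs):
        return functools.partial(func, *args, **kwargs)
    return configure

@step
def add(amount, data):
    # A step that does not care about any filter.
    return [x + amount for x in data]

@step
def double_selected(data, data_filter=None):
    # A step that only transforms the elements selected by the filter.
    keep = data_filter if data_filter is not None else (lambda x: True)
    return [x * 2 if keep(x) else x for x in data]

def make_pipeline(data, *steps, data_filter=None):
    for s in steps:
        # inspect.signature understands partial objects and reports
        # the parameters that are still unbound.
        if "data_filter" in inspect.signature(s).parameters:
            data = s(data, data_filter=data_filter)
        else:
            data = s(data)
    return data

result = make_pipeline(
    [1, 2, 3],
    add(1),
    double_selected(),
    data_filter=lambda x: x % 2 == 0,
)
# result == [4, 3, 8]
```

The filter is supplied once, to make_pipeline, and reaches only the steps that declare a data_filter parameter.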
I need to create an algorithm which is executed as a sequence of functions chosen at run-time, a bit like the strategy pattern. I want to avoid creating many single method classes (100+), but use simple functions, and assemble them using a decorator.
actions = []

def act(specific_fn):
    actions.append(specific_fn)
    return specific_fn

@act
def specific_fn1():
    print("specific_fn1")

@act
def specific_fn2():
    print("specific_fn2")

def execute_strategy():
    for f in actions:
        f()
I have a couple of questions:
How can I modify the decorator function act to take the list actions as a parameter, so that it adds the decorated function into the list?
How do I use specific_fnX defined in another file? Currently, I just import the file inside the calling function - but that seems odd. Any other options?
Also, any other ideas on implementing this pattern?
This is a pretty good tutorial on using decorators, both with and without arguments. Others may know of others.
The trick is to remember that a decorator is a function call, but when applied to a function definition, it magically replaces the function with the returned result. If you want @my_decorator(my_list) to be a decorator, then my_decorator(my_list) must return either a function or an object with a __call__ method, and that function is then called on specific_fn1.
So yes, you need a function that returns a function that returns a function. Looking at some examples in a tutorial will make this clearer.
As to your second question, you can just call my_decorator(my_list)(specific_fn1) without using the decorator syntax, and then ignore the result.
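A minimal sketch of such a parameterised decorator, reusing the names from the question (the third function and the explicit registry argument to execute_strategy are assumptions added for illustration):

```python
def act(registry):
    """Return a decorator that records the decorated function in registry."""
    def decorator(specific_fn):
        registry.append(specific_fn)
        return specific_fn  # hand the function back unchanged
    return decorator

actions = []

@act(actions)
def specific_fn1():
    return "specific_fn1"

@act(actions)
def specific_fn2():
    return "specific_fn2"

def specific_fn3():
    # Stands in for a function defined elsewhere, e.g. in another file.
    return "specific_fn3"

# Registering without the @ syntax; the return value is simply ignored.
act(actions)(specific_fn3)

def execute_strategy(registry):
    return [f() for f in registry]

results = execute_strategy(actions)
```

Because act returns the original function, each specific_fnX stays directly callable after registration.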
I am very new to Python (switching from Matlab) and I am currently working with the SymPy package. I realised that I can calculate the derivative of a function with f.diff(x), even when I have not imported the diff function. So, basically f.diff(x) works but diff(f,x) returns an error.
from sympy import symbols
x = symbols('x')
f = x**2 + 1
f.diff(x)
The reason that I could think of was that diff is actually defined as a method attribute for the class Symbol and thus, f.diff(x) works as long as x is of Symbol type and f has been defined using x. Is there a way to somehow view the Symbol class definition in order to verify that a diff method attribute actually exists?
The reason that I could think of was that diff is actually defined as a method attribute for the class Symbol and thus, f.diff(x) works as long as x is of Symbol type and f has been defined using x.
This is mostly correct (corrections below).
In contrast to Matlab, Python uses namespaces. This means that you only have very basic functions, classes, etc. available by default and everything else needs to be imported into the main namespace or is only available with a “prefix” specifying the namespace. What you gain from this is that you avoid name clashes and it’s easy to trace from which module a function is coming. For instance, in your example, the reader can see that symbols was imported from the sympy module (into the main namespace). This module also has a diff function (not the method) that you could use after importing with from sympy import diff.
In this sense, each object comes along with its own namespace, which is for most practical purposes determined by its class¹.
Functions in this namespace are called methods and (usually) do something on the object itself or using the specifics of the object itself.
Now, for the promised corrections or clarifications:
It is f’s class which is relevant here, not x’s.
You can see the class of f with type(f) and it is Add (residing in sympy.core.add).
This is because it is primarily a sum (of x**2 and 1).
More importantly, Add is a subclass of Expr (expression), which is the parent class for all SymPy expressions.
For example, the class Symbol is also a subclass of Expr.
(You can see this with type(f).mro().)
And this is the important thing here: All SymPy expressions have the diff method.
It is actually not relevant that the argument of f.diff is a Symbol or Expr.
It only needs to be something that SymPy can reasonably interpret as one.
For example f.diff("x") also works, because SymPy can translate the string "x" to a Symbol that is equivalent to your x.
Is there a way to somehow view the Symbol class definition in order to verify that a diff method attribute actually exists?
Yes. The easiest way is the built-in Python function dir, which returns a list of all attributes (everything accessible by the . operator) of an object. Typically, most of these are methods. In your case, you can just call dir(f). Note that this list also contains quite a few attributes starting with _, which indicates that they are not intended for user consumption. In any reasonable programming environment (IDE, IPython, Jupyter), this list is also shown to you when you use tab completion (F, ., Tab).
However, while learning about a class by going through all its methods is usually a good approach, for SymPy expressions this is not feasible.
There are a lot of things somebody could want to do with these expressions, but you will only ever use a fraction of them.
Instead, you can guess the name of the method and thus narrow down your search considerably.
For example, you can guess that the method for differentiation starts with a d (be it for differentiate or derivative), and here the tab completion (F, ., D, Tab) only gives you four results instead of three hundred.
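Outside an interactive environment, the same narrowing can be sketched with dir and the method resolution order. The two helper functions below are hypothetical, and built-in objects stand in for SymPy expressions so that the snippet runs without SymPy installed; with SymPy one would pass f and a prefix such as "d" instead.

```python
def attributes_starting_with(obj, prefix):
    """Filter dir(obj) by prefix, roughly what tab completion does."""
    return [name for name in dir(obj) if name.startswith(prefix)]

def defining_class(obj, name):
    """Walk the MRO of obj's class and return the first class whose own
    namespace contains the attribute, or None if no class defines it."""
    for cls in type(obj).mro():
        if name in vars(cls):
            return cls
    return None

# A string has only two attributes starting with "sp".
matches = attributes_starting_with("abc", "sp")  # ['split', 'splitlines']

# bool inherits bit_length from its parent class int, much like a sum
# expression inherits its methods from a parent expression class.
owner = defining_class(True, "bit_length")  # int
```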
Another approach is to start searching the documentation (or the Internet in general) for your operation of interest (here, differentiating) instead of the object of your operation (here, SymPy expressions, i.e., instances of Expr). After all, SymPy is all about the latter, so that is kind of a given.
Finally, normally there is a documentation of a class featuring all its methods.
For Expr, this is here.
Unfortunately, in case of Expr the documentation is not exhaustive, e.g., it lacks the diff method.
While this is not ideal, it is somewhat understandable given the number of methods as well as the duality of methods and functions in SymPy: for most methods of Expr, an analogous function can be directly imported from sympy.
¹ You can also just add stuff there (symbols.wrzlprmft = "foo"), but that’s a pretty advanced and rare usage. Also some classes are made to block this, e.g., you cannot do f.wrzlprmft = "foo".
Suppose I have a module PyFoo.py that has a function bar. I want bar to print all of the local variables associated with the namespace that called it.
For example:
#! /usr/bin/env python
import PyFoo as pf
var1 = 'hi'
print locals()
pf.bar()
The two last lines would give the same output. So far I've tried defining bar as such:
def bar(x=locals):
    print x()

def bar(x=locals()):
    print x
But neither works. The first ends up being what's local to bar's namespace (which I guess is because that's when it's evaluated), and the second is as if I passed in globals (which I assume is because it's evaluated during import).
Is there a way I can have the default value of argument x of bar be all variables in the namespace which called bar?
EDIT 2018-07-29:
As has been pointed out, what was given was an XY Problem; as such, I'll give the specifics.
The module I'm putting together will allow the user to create various objects that represent different aspects of a numerical problem (e.g., various topology definitions, boundary conditions, constitutive models, etc.) and define how any given object interacts with any other object(s). The idea is for the user to import the module, define the various model entities that they need, and then call a function which will take all objects passed to it, make needed adjustments to ensure compatibility between them, and then write out a file that represents the entire numerical problem as a text file.
The module has a function generate that accepts each of the various types of aspects of the numerical problem. The default value for all arguments is an empty list. If a non-empty list is passed, then generate will use those instances for generating the completed numerical problem. If an argument is an empty list, then I'd like it to take in all instances in the namespace that called generate (which I will then parse out the appropriate instances for the argument).
EDIT 2018-07-29:
Sorry for any lack of understanding on my part (I'm not that strong of a programmer), but I think I might understand what you're saying with respect to an instance being declared or registered.
From my limited understanding, could this be done by creating some sort of registry dataset (like a list or dict) in the module that will be created when the module is imported, and that all module classes take in by default? During class initialization, self can be appended to said dataset, and then the generate function will take the registry as a default value for one of the arguments?
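One possible reading of this registry idea, as a rough sketch only: the class names, the _registry list, and the return value of generate are all hypothetical, and a real module would probably also need a way to clear or scope the registry.

```python
# Module-level registry, created once when the module is imported.
_registry = []

class ModelEntity:
    """Hypothetical base class: every instance registers itself."""
    def __init__(self, name):
        self.name = name
        _registry.append(self)

class BoundaryCondition(ModelEntity):
    pass

class Topology(ModelEntity):
    pass

def generate(boundary_conditions=None, topologies=None):
    """Use the explicitly passed instances, else fall back to the registry."""
    if not boundary_conditions:
        boundary_conditions = [
            obj for obj in _registry if isinstance(obj, BoundaryCondition)
        ]
    if not topologies:
        topologies = [obj for obj in _registry if isinstance(obj, Topology)]
    return {
        "boundary_conditions": [bc.name for bc in boundary_conditions],
        "topologies": [t.name for t in topologies],
    }

fixed = BoundaryCondition("fixed")
mesh = Topology("mesh")
summary = generate()
```

Note that the registry holds strong references, so registered objects stay alive until they are removed from the list.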
There's no way you can do what you want directly.
locals just returns the local variables in whatever namespace it's called in. As you've seen, you have access to the namespace the function is defined in at the time of definition, and you have access to the namespace of the function itself from within the function, but you don't have access to any other namespaces.
You can do what you want indirectly… but it's almost certainly a bad idea. At least this smells like an XY problem, and whatever it is you're actually trying to do, there's probably a better way to do it.
But occasionally it is necessary, so in case you have one of those cases:
The main good reason to want to know the locals of your caller is for some kind of debugging or other introspection function. And the way to do introspection is almost always through the inspect library.
In this case, what you want to inspect is the interpreter call stack. The calling function will be the first frame on the call stack behind your function's own frame.
You can get the raw stack frame:
inspect.currentframe().f_back
… or you can get a FrameInfo representing it:
inspect.stack()[1]
As explained at the top of the inspect docs, a frame object's local namespace is available as:
frame.f_locals
Note that this has all the same caveats that apply to getting your own locals with locals: what you get isn't the live namespace, but a mapping that, even if it is mutable, can't be used to modify the namespace (or, worse in 2.x, one that may or may not modify the namespace, unpredictably), and that has all cell and free variables flattened into their values rather than their cell references.
Also, see the big warning in the docs about not keeping frame objects alive unnecessarily (or calling their clear method if you need to keep a snapshot but not all of the references, but I think that only exists in 3.x).
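Put together, a minimal sketch for Python 3 might look like this (the function names caller_locals and demo are made up; inspect.currentframe() is documented as possibly returning None on implementations without stack frame support, which this sketch does not handle):

```python
import inspect

def caller_locals():
    """Return a snapshot (a plain dict copy) of the caller's local variables."""
    frame = inspect.currentframe()
    try:
        # f_back is the frame of whoever called caller_locals().
        return dict(frame.f_back.f_locals)
    finally:
        # Drop the frame reference promptly, as the inspect docs advise.
        del frame

def demo():
    var1 = "hi"
    return caller_locals()

snapshot = demo()
# snapshot == {'var1': 'hi'}
```

Copying into a dict makes it explicit that the result is a snapshot, not a handle for modifying the caller's namespace.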
I have two functions like the following:
def fitnesscompare(x, y):
    if x.fitness > y.fitness:
        return 1
    elif x.fitness == y.fitness:
        return 0
    else:  # x.fitness < y.fitness
        return -1
that are used with 'sort' to sort on different attributes of class instances.
These are used from within other functions and methods in the program.
Can I make them visible everywhere rather than having to pass them to each object in which they are used?
Thanks
The best approach (to get the visibility you ask about) is to put this def statement in a module (say fit.py), import fit from any other module that needs access to items defined in this one, and use fit.fitnesscompare in any of those modules as needed.
What you ask, and what you really need, may actually be different...:
as I explained in another post earlier today, custom comparison functions are not the best way to customize sorting in Python (which is why in Python 3 they're not even allowed any more): rather, a custom key-extraction function will serve you much better (future-proof, more general, faster). I.e., instead of calling, say
somelist.sort(cmp=fit.fitnesscompare)
call
somelist.sort(key=fit.fitnessextract)
where
def fitnessextract(x):
    return x.fitness
or, for really blazing speed,
import operator
somelist.sort(key=operator.attrgetter('fitness'))
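A runnable sketch of the key-based approach in Python 3, assuming a hypothetical Individual class with a fitness attribute. It also shows functools.cmp_to_key, which adapts an existing comparison function when rewriting it is not practical.

```python
import functools
import operator
from dataclasses import dataclass

@dataclass
class Individual:
    # Hypothetical stand-in for the class instances being sorted.
    name: str
    fitness: float

def fitnesscompare(x, y):
    """Old-style comparison function, as in the question."""
    if x.fitness > y.fitness:
        return 1
    elif x.fitness == y.fitness:
        return 0
    return -1

population = [Individual("a", 0.3), Individual("b", 0.9), Individual("c", 0.1)]

# Preferred: sort by a key extracted from each element.
by_key = sorted(population, key=operator.attrgetter("fitness"))

# Fallback: wrap the legacy comparison function so Python 3 accepts it.
by_cmp = sorted(population, key=functools.cmp_to_key(fitnesscompare))

names = [individual.name for individual in by_key]
# names == ['c', 'a', 'b']
```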
Defining a function with def makes that function available within whatever scope you've defined it in. At module level, using def will make that function available to any other function inside that module.
Can you perhaps post an example of what is not working for you? The code you've posted appears to be unrelated to your actual problem.