How to check if a function is pure in Python? - python

A pure function is a function similar to a Mathematical function, where there is no interaction with the "Real world" nor side-effects. From a more practical point of view, it means that a pure function can not:
Print or otherwise show a message
Be random
Depend on system time
Change global variables
And others
All this limitations make it easier to reason about pure functions than non-pure ones. The majority of the functions should then be pure so that the program can have less bugs.
In languages with a huge type-system like Haskell the reader can know right from the start if a function is or is not pure, making the successive reading easier.
In Python this information may be emulated by a #pure decorator put on top of the function. I would also like that decorator to actually do some validation work. My problem lies in the implementation of such a decorator.
Right now I simply look the source code of the function for buzzwords such as global or random or print and complains if it finds one of them.
import inspect
def pure(function):
source = inspect.getsource(function)
for non_pure_indicator in ('random', 'time', 'input', 'print', 'global'):
if non_pure_indicator in source:
raise ValueError("The function {} is not pure as it uses `{}`".format(
function.__name__, non_pure_indicator))
return function
However it feels like a weird hack, that may or may not work depending on your luck, could you please help me in writing a better decorator?

I kind of see where you are coming from but I don't think this can work. Let's take a simple example:
def add(a,b):
return a + b
So this probably looks "pure" to you. But in Python the + here is an arbitrary function which can do anything, just depending on the bindings in force when it is called. So that a + b can have arbitrary side effects.
But it's even worse than that. Even if this is just doing standard integer + then there's more 'impure' stuff going on.
The + is creating a new object. Now if you are sure that only the caller has a reference to that new object then there is a sense in which you can think of this as a pure function. But you can't be sure that, during the creation process of that object, no reference to it leaked out.
For example:
class RegisteredNumber(int):
numbers = []
def __new__(cls,*args,**kwargs):
self = int.__new__(cls,*args,**kwargs)
self.numbers.append(self)
return self
def __add__(self,other):
return RegisteredNumber(super().__add__(other))
c = RegisteredNumber(1) + 2
print(RegisteredNumber.numbers)
This will show that the supposedly pure add function has actually changed the state of the RegisteredNumber class. This is not a stupidly contrived example: in my production code base we have classes which track each created instance, for example, to allow access via key.
The notion of purity just doesn't make much sense in Python.

(not an answer, but too long for a comment)
So if a function can return different values for the same set of arguments, it is not pure?
Remember that functions in Python are objects, so you want to check the purity of an object...
Take this example:
def foo(x):
ret, foo.x = x*x+foo.x, foo.x+1
return ret
foo.x=0
calling foo(3) repeatedly gives:
>>> foo(3)
9
>>> foo(3)
10
>>> foo(3)
11
...
Moreover, reading globals does not require to use the global statement, or the global() builtin inside your function. Global variables might change somewhere else, affecting the purity of your function.
All the above situation might be difficult to detect at runtime.

Related

What is the purpose of decorators (why use them)?

I've been learning and experimenting with decorators. I understand what they do: they allow you to write modular code by allowing you to add functionality to existing functions without changing them.
I found a great thread which really helped me learn how to do it by explaining all the ins and outs here: How to make a chain of function decorators?
But what is missing from this thread (and from other resources I've looked at) is WHY do we need the decorator syntax? Why do I need nested functions to create a decorator? Why can't I just take an existing function, write another one with some extra functionality, and feed the first into the 2nd to make it do something else?
Is it just because this is the convention? What am I missing? I assume my inexperience is showing here.
I will try to keep it simple and explain it with examples.
One of the things that I often do is measure the time taken by API's that I build and then publish them on AWS.
Since it is a very common use case I have created a decorator for it.
def log_latency():
def actual_decorator(f):
#wraps(f)
def wrapped_f(*args, **kwargs):
t0 = time()
r = f(*args, **kwargs)
t1 = time()
async_daemon_execute(public_metric, t1 - t0, log_key)
return r
return wrapped_f
return actual_decorator
Now if there is any method that I want to measure the latency, I just put annotate it with the required decorator.
#log_latency()
def batch_job_api(param):
pass
Suppose you want to write a secure API which only works if you send a header with a particular value then you can use a decorator for it.
def secure(f):
#wraps(f)
def wrapper(*args, **kw):
try:
token = request.headers.get("My_Secret_Token")
if not token or token != "My_Secret_Text":
raise AccessDenied("Required headers missing")
return f(*args, **kw)
return wrapper
Now just write
#secure
def my_secure_api():
pass
I have also been using the above syntax for API specific exceptions, and if a method needs a database interaction instead of acquiring a connection I use #session decorator which tells that this method will use a database connection and you don't need to handle one yourself.
I could have obviously avoided it, by writing a function that checks for the header or prints time taken by API on AWS, but that just looks a bit ugly and nonintuitive.
There is no convention for this, at least I am not aware of them but it definitely makes the code more readable and easier to manage.
Most of the IDE's also have different color syntax for annotation which makes it easier to understand and organize code.
So, I would it may just be because of inexperience; if you start using them you will automatically start knowing where to use them.
From PEP 318 -- Decorators for Functions and Methods (with my own added emphasis):
Motivation
The current method of applying a transformation to a
function or method places the actual transformation after the function
body. For large functions this separates a key component of the
function's behavior from the definition of the rest of the function's
external interface. For example:
def foo(self):
perform method operation
foo = classmethod(foo)
This becomes less readable with longer methods. It also seems less than pythonic to name
the function three times for what is conceptually a single
declaration. A solution to this problem is to move the transformation
of the method closer to the method's own declaration. The intent of
the new syntax is to replace
def foo(cls):
pass
foo = synchronized(lock)(foo)
foo = classmethod(foo)
with an alternative that places the decoration in the function's declaration:
#classmethod
#synchronized(lock)
def foo(cls):
pass
I came across a neatly-written article which explains decorators in Python, why we should leverage them, how to use them and more.
Some of the benefits of using decorators in Python include:
Allows us to extend functions without actually modifying them.
Helps us stick with the Open-Closed Principle.
It helps reduce code duplication -- we can factorize out repeated code into a decorator.
Decorators help keep a project organized and easy to maintain.
Reference: https://tomisin.dev/blog/understanding-decorators-in-python
A great example of why decorators are useful is the numba package.
It provides a decorator to speed up Python functions:
#jit
def my_slow_function():
# ...
The #jit decorator does some very amazing and complex operations - it compiles the whole function (well, parts of it) to machine code. If Python did not have decorators you would have to write the function in machine code yourself... or more seriously, the syntax would look like this:
def my_slow_function():
# ...
my_slow_function = jit(my_slow_function)
The purpose of the decorator syntax is to make the second example nicer. It's just syntactic sugar so you don't have to type the function name three times.

Why was the mutable default argument's behavior never changed? [duplicate]

This question already has answers here:
"Least Astonishment" and the Mutable Default Argument
(33 answers)
Closed 6 months ago.
I had a very difficult time with understanding the root cause of a problem in an algorithm. Then, by simplifying the functions step by step I found out that evaluation of default arguments in Python doesn't behave as I expected.
The code is as follows:
class Node(object):
def __init__(self, children = []):
self.children = children
The problem is that every instance of Node class shares the same children attribute, if the attribute is not given explicitly, such as:
>>> n0 = Node()
>>> n1 = Node()
>>> id(n1.children)
Out[0]: 25000176
>>> id(n0.children)
Out[0]: 25000176
I don't understand the logic of this design decision? Why did Python designers decide that default arguments are to be evaluated at definition time? This seems very counter-intuitive to me.
The alternative would be quite heavyweight -- storing "default argument values" in the function object as "thunks" of code to be executed over and over again every time the function is called without a specified value for that argument -- and would make it much harder to get early binding (binding at def time), which is often what you want. For example, in Python as it exists:
def ack(m, n, _memo={}):
key = m, n
if key not in _memo:
if m==0: v = n + 1
elif n==0: v = ack(m-1, 1)
else: v = ack(m-1, ack(m, n-1))
_memo[key] = v
return _memo[key]
...writing a memoized function like the above is quite an elementary task. Similarly:
for i in range(len(buttons)):
buttons[i].onclick(lambda i=i: say('button %s', i))
...the simple i=i, relying on the early-binding (definition time) of default arg values, is a trivially simple way to get early binding. So, the current rule is simple, straightforward, and lets you do all you want in a way that's extremely easy to explain and understand: if you want late binding of an expression's value, evaluate that expression in the function body; if you want early binding, evaluate it as the default value of an arg.
The alternative, forcing late binding for both situation, would not offer this flexibility, and would force you to go through hoops (such as wrapping your function into a closure factory) every time you needed early binding, as in the above examples -- yet more heavy-weight boilerplate forced on the programmer by this hypothetical design decision (beyond the "invisible" ones of generating and repeatedly evaluating thunks all over the place).
In other words, "There should be one, and preferably only one, obvious way to do it [1]": when you want late binding, there's already a perfectly obvious way to achieve it (since all of the function's code is only executed at call time, obviously everything evaluated there is late-bound); having default-arg evaluation produce early binding gives you an obvious way to achieve early binding as well (a plus!-) rather than giving TWO obvious ways to get late binding and no obvious way to get early binding (a minus!-).
[1]: "Although that way may not be obvious at first unless you're Dutch."
The issue is this.
It's too expensive to evaluate a function as an initializer every time the function is called.
0 is a simple literal. Evaluate it once, use it forever.
int is a function (like list) that would have to be evaluated each time it's required as an initializer.
The construct [] is literal, like 0, that means "this exact object".
The problem is that some people hope that it to means list as in "evaluate this function for me, please, to get the object that is the initializer".
It would be a crushing burden to add the necessary if statement to do this evaluation all the time. It's better to take all arguments as literals and not do any additional function evaluation as part of trying to do a function evaluation.
Also, more fundamentally, it's technically impossible to implement argument defaults as function evaluations.
Consider, for a moment the recursive horror of this kind of circularity. Let's say that instead of default values being literals, we allow them to be functions which are evaluated each time a parameter's default values are required.
[This would parallel the way collections.defaultdict works.]
def aFunc( a=another_func ):
return a*2
def another_func( b=aFunc ):
return b*3
What is the value of another_func()? To get the default for b, it must evaluate aFunc, which requires an eval of another_func. Oops.
Of course in your situation it is difficult to understand. But you must see, that evaluating default args every time would lay a heavy runtime burden on the system.
Also you should know, that in case of container types this problem may occur -- but you could circumvent it by making the thing explicit:
def __init__(self, children = None):
if children is None:
children = []
self.children = children
The workaround for this, discussed here (and very solid), is:
class Node(object):
def __init__(self, children = None):
self.children = [] if children is None else children
As for why look for an answer from von Löwis, but it's likely because the function definition makes a code object due to the architecture of Python, and there might not be a facility for working with reference types like this in default arguments.
I thought this was counterintuitive too, until I learned how Python implements default arguments.
A function's an object. At load time, Python creates the function object, evaluates the defaults in the def statement, puts them into a tuple, and adds that tuple as an attribute of the function named func_defaults. Then, when a function is called, if the call doesn't provide a value, Python grabs the default value out of func_defaults.
For instance:
>>> class C():
pass
>>> def f(x=C()):
pass
>>> f.func_defaults
(<__main__.C instance at 0x0298D4B8>,)
So all calls to f that don't provide an argument will use the same instance of C, because that's the default value.
As far as why Python does it this way: well, that tuple could contain functions that would get called every time a default argument value was needed. Apart from the immediately obvious problem of performance, you start getting into a universe of special cases, like storing literal values instead of functions for non-mutable types to avoid unnecessary function calls. And of course there are performance implications galore.
The actual behavior is really simple. And there's a trivial workaround, in the case where you want a default value to be produced by a function call at runtime:
def f(x = None):
if x == None:
x = g()
This comes from python's emphasis on syntax and execution simplicity. a def statement occurs at a certain point during execution. When the python interpreter reaches that point, it evaluates the code in that line, and then creates a code object from the body of the function, which will be run later, when you call the function.
It's a simple split between function declaration and function body. The declaration is executed when it is reached in the code. The body is executed at call time. Note that the declaration is executed every time it is reached, so you can create multiple functions by looping.
funcs = []
for x in xrange(5):
def foo(x=x, lst=[]):
lst.append(x)
return lst
funcs.append(foo)
for func in funcs:
print "1: ", func()
print "2: ", func()
Five separate functions have been created, with a separate list created each time the function declaration was executed. On each loop through funcs, the same function is executed twice on each pass through, using the same list each time. This gives the results:
1: [0]
2: [0, 0]
1: [1]
2: [1, 1]
1: [2]
2: [2, 2]
1: [3]
2: [3, 3]
1: [4]
2: [4, 4]
Others have given you the workaround, of using param=None, and assigning a list in the body if the value is None, which is fully idiomatic python. It's a little ugly, but the simplicity is powerful, and the workaround is not too painful.
Edited to add: For more discussion on this, see effbot's article here: http://effbot.org/zone/default-values.htm, and the language reference, here: http://docs.python.org/reference/compound_stmts.html#function
I'll provide a dissenting opinion, by addessing the main arguments in the other posts.
Evaluating default arguments when the function is executed would be bad for performance.
I find this hard to believe. If default argument assignments like foo='some_string' really add an unacceptable amount of overhead, I'm sure it would be possible to identify assignments to immutable literals and precompute them.
If you want a default assignment with a mutable object like foo = [], just use foo = None, followed by foo = foo or [] in the function body.
While this may be unproblematic in individual instances, as a design pattern it's not very elegant. It adds boilerplate code and obscures default argument values. Patterns like foo = foo or ... don't work if foo can be an object like a numpy array with undefined truth value. And in situations where None is a meaningful argument value that may be passed intentionally, it can't be used as a sentinel and this workaround becomes really ugly.
The current behaviour is useful for mutable default objects that should be shared accross function calls.
I would be happy to see evidence to the contrary, but in my experience this use case is much less frequent than mutable objects that should be created anew every time the function is called. To me it also seems like a more advanced use case, whereas accidental default assignments with empty containers are a common gotcha for new Python programmers. Therefore, the principle of least astonishment suggests default argument values should be evaluated when the function is executed.
In addition, it seems to me that there exists an easy workaround for mutable objects that should be shared across function calls: initialise them outside the function.
So I would argue that this was a bad design decision. My guess is that it was chosen because its implementation is actually simpler and because it has a valid (albeit limited) use case. Unfortunately, I don't think this will ever change, since the core Python developers want to avoid a repeat of the amount of backwards incompatibility that Python 3 introduced.
Python function definitions are just code, like all the other code; they're not "magical" in the way that some languages are. For example, in Java you could refer "now" to something defined "later":
public static void foo() { bar(); }
public static void main(String[] args) { foo(); }
public static void bar() {}
but in Python
def foo(): bar()
foo() # boom! "bar" has no binding yet
def bar(): pass
foo() # ok
So, the default argument is evaluated at the moment that that line of code is evaluated!
Because if they had, then someone would post a question asking why it wasn't the other way around :-p
Suppose now that they had. How would you implement the current behaviour if needed? It's easy to create new objects inside a function, but you cannot "uncreate" them (you can delete them, but it's not the same).

Assign a numeric value that is returned from a function by using exec command

I need to assign a numeric value, which is returned from a function, to a variable name by using exec() command:
def func1(x,y):
return x+y
def main(x,y,n):
x=3; y=5; n=1
t = exec("func%s(%s,%s)" % (n,x,y))
return t**2
main(3,5,1)
I have many functions like "func1", "func2" and so on... I try to return t = func1(x,y) with theexec("func%s(%s,%s)" % (n,x,y)) statement. However, I cannot assign a value, returned fromexec().
There is a partly similar question on SE, but it is written for Python3 and is also not applicable to my case.
How can we resolve this problem or is there a more elegant way to perform such an operation, maybe without the use of'exec()'?
By the way, as "func%s(%s,%s)" % (n,x,y) is a statement, I used exec. Or should I better use eval?
It is almost always a really bad idea to get at functions and variables using their names as strings (or bits of their names as integers, as in the example code in the question). One reason why is that eval or exec can do literally anything and you generally want to avoid using code constructs whose behaviour is so hard to predict and reason about.
So here are two ways to get similar results with less pain.
First: Instead of passing around magic index numbers, like the 1 in the code above, pass the actual function which is, itself, a perfectly reasonable value for a variable in Python.
def main(x,y,f):
return f(x,y)**2
main(3, 5, func1)
(Incidentally, the definition of main in the question throws away the values of x,y,n that are passed in to it. That's probably a mistake.)
Second: Instead of making these mere functions, make them methods on classes, and pass around not the functions but either the classes themselves or instances of the classes.
class Solver:
def solve(self, x,y): return "eeeek, not implemented"
class Solver1:
def solve(self, x,y): return x+y
def main(x, y, obj):
return obj.solve(x,y)**2
main(3, 5, Solver1())
Which of these is the better approach depends on the details of your application. For instance, if you actually have multiple "parallel" sets of functions -- as well as your func1, func2, etc., there are also otherfunc1, otherfunc2 etc. -- then this is crying out for an object-oriented solution (the second approach above).
In general I would argue for the second approach; it tends to lead to cleaner code as your requirements grow.

Should I use a class in this: Reading a XML file using lxml

This question is in continuation to my previous question, in which I asked about passing around an ElementTree.
I need to read the XML files only and to solve this, I decided to create a global ElementTree and then parse it wherever required.
My question is:
Is this an acceptable practice? I heard global variables are bad. If I don't make it global, I was suggested to make a class. But do I really need to create a class? What benefits would I have from that approach. Note that I would be handling only one ElementTree instance per run, the operations are read-only. If I don't use a class, how and where do I declare that ElementTree so that it available globally? (Note that I would be importing this module)
Please answer this question in the respect that I am a beginner to development, and at this stage I can't figure out whether to use a class or just go with the functional style programming approach.
There are a few reasons that global variables are bad. First, it gets you in the habit of declaring global variables which is not good practice, though in some cases globals make sense -- PI, for instance. Globals also create problems when you on purpose or accidentally re-use the name locally. Or worse, when you think you're using the name locally but in reality you're assigning a new value to the global variable. This particular problem is language dependent, and python handles it differently in different cases.
class A:
def __init__(self):
self.name = 'hi'
x = 3
a = A()
def foo():
a.name = 'Bedevere'
x = 9
foo()
print x, a.name #outputs 3 Bedevere
The benefit of creating a class and passing your class around is you will get a defined, constant behavior, especially since you should be calling class methods, which operate on the class itself.
class Knights:
def __init__(self, name='Bedevere'):
self.name = name
def knight(self):
self.name = 'Sir ' + self.name
def speak(self):
print self.name + ":", "Run away!"
class FerociousRabbit:
def __init__(self):
self.death = "awaits you with sharp pointy teeth!"
def speak(self):
print "Squeeeeeeee!"
def cave(thing):
thing.speak()
if isinstance(thing, Knights):
thing.knight()
def scene():
k = Knights()
k2 = Knights('Launcelot')
b = FerociousRabbit()
for i in (b, k, k2):
cave(i)
This example illustrates a few good principles. First, the strength of python when calling functions - FerociousRabbit and Knights are two different classes but they have the same function speak(). In other languages, in order to do something like this, they would at least have to have the same base class. The reason you would want to do this is it allows you to write a function (cave) that can operate on any class that has a 'speak()' method. You could create any other method and pass it to the cave function:
class Tim:
def speak(self):
print "Death awaits you with sharp pointy teeth!"
So in your case, when dealing with an elementTree, say sometime down the road you need to also start parsing an apache log. Well if you're doing purely functional program you're basically hosed. You can modify and extend your current program, but if you wrote your functions well, you could just add a new class to the mix and (technically) everything will be peachy keen.
Pragmatically, is your code expected to grow? Even though people herald OOP as the right way, I found that sometimes it's better to weigh cost:benefit(s) whenever you refactor a piece of code. If you are looking to grow this, then OOP is a better option in that you can extend and customise any future use case, while saving yourself from unnecessary time wasted in code maintenance. Otherwise, if it ain't broken, don't fix it, IMHO.
I generally find myself regretting it when I give in to the temptation to give a module, for example, a load_file() method that sets a global that the module's other functions can then use to find the file they're supposed to be talking about. It makes testing far more difficult, for example, and as soon as I need two XML files there is a problem. Plus, every single function needs to check whether the file's there and give an error if it's not.
If I want to be functional, I simply therefore have every function take the XML file as an argument.
If I want to be object oriented, I'll have a MyXMLFile class whose methods can just look at self.xmlfile or whatever.
The two approaches are more or less equivalent when there's just one single thing, like a file, to be passed around; but when the number of things in the "state" becomes larger than a few, then I find classes simpler because I can stick all of those things in the class.
(Am I answering your question? I'm still a big vague on what kind of answer you want.)

Python: how to make a function visible throughout a program

I have two functions like the following:
def fitnesscompare(x, y):
if x.fitness>y.fitness:
return 1
elif x.fitness==y.fitness:
return 0
else: #x.fitness<y.fitness
return -1
that are used with 'sort' to sort on different attributes of class instances.
These are used from within other functions and methods in the program.
Can I make them visible everywhere rather than having to pass them to each object in which they are used?
Thanks
The best approach (to get the visibility you ask about) is to put this def statement in a module (say fit.py), import fit from any other module that needs access to items defined in this one, and use fit.fitnesscompare in any of those modules as needed.
What you ask, and what you really need, may actually be different...:
as I explained in another post earlier today, custom comparison functions are not the best way to customize sorting in Python (which is why in Python 3 they're not even allowed any more): rather, a custom key-extraction function will serve you much better (future-proof, more general, faster). I.e., instead of calling, say
somelist.sort(cmp=fit.fitnesscompare)
call
somelist.sort(key=fit.fitnessextract)
where
def fitnessextract(x):
return x.fitness
or, for really blazing speed,
import operator
somelist.sort(key=operator.attrgetter('fitness'))
Defining a function with def makes that function available within whatever scope you've defined it in. At module level, using def will make that function available to any other function inside that module.
Can you perhaps post an example of what is not working for you? The code you've posted appears to be unrelated to your actual problem.

Categories