What is the 'Pythonic' way to handle functions and subfunctions when they must be used in a particular order?
Since one of the guiding ideas seems to be that a function should do one thing, I find myself splitting up functions even though they have a fixed order of execution.
When the functions are really a kind of 'do step 1', 'then, with the outcome of step 1, do step 2', I currently end up wrapping the step functions in another function defined at the same level. However, I'm wondering whether this is really how I should be doing it.
Example code:
def step_1(data):
    # do stuff on data
    return a

def step_2(data, a):
    # do stuff on data with a
    return b

def part_1(data):
    a = step_1(data)
    b = step_2(data, a)
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c
I'd be calling this from another file/script (or Jupyter notebook) as part_1 and then part_2
Seems to be working just fine for my purposes right now, but as I said I'm wondering at this (early) stage if I should be using a different approach for this.
I guess you can use a class here; otherwise your code can be made shorter using the following:
def step_1(data):
    a = ...  # do stuff on data to produce a
    return step_2(data, a)

def step_2(data, a):
    b = ...  # do stuff on data with a to produce b
    return a, b

def part_2(data_set_2, a, b):
    c = ...  # do stuff on data_set_2 with a and b as input
    return c
As a rule of thumb, if several functions use the same arguments, it is a good idea to group them together into a class. But you can also define a main() or run() function that makes use of your functions in a sequential fashion. Since your example is not too complex, I would avoid classes and go for something like:
def step_1(data):
    a = ...  # do stuff on data to produce a
    return step_2(data, a)

def step_2(data, a):
    b = ...  # do stuff on data with a to produce b
    return a, b

def part_2(data_set_2, a, b):
    c = ...  # do stuff on data_set_2 with a and b as input
    return c

def run(data, data_set_2):
    a, b = step_1(data)
    c = part_2(data_set_2, a, b)
    return c

run(data, data_set_2)
If the code grows in complexity, using classes is advised. In the end, it's your choice.
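For illustration, here is a minimal sketch of what such a class-based version might look like; the class name and attributes are placeholders of my own, not something prescribed above:

class Pipeline:
    def __init__(self, data):
        self.data = data
        self.a = None
        self.b = None

    def step_1(self):
        self.a = ...  # do stuff on self.data to produce a
        return self.a

    def step_2(self):
        self.b = ...  # do stuff on self.data with self.a to produce b
        return self.b

    def part_2(self, data_set_2):
        # do stuff on data_set_2 with self.a and self.b as input
        return ...

    def run(self, data_set_2):
        self.step_1()
        self.step_2()
        return self.part_2(data_set_2)

The shared arguments become attributes, so each step stays small while the call order lives in run().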
I have a function that processes some quite nested data, using nested loops. Its simplified structure is something like this:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        for b in a.elements:
            if b.some_condition:
                continue
            for c in b.elements:
                if c.some_condition:
                    continue
                for d in c.elements:
                    if d.some_condition:
                        do_something_using_all(a, b, c, d)
This does not look very Pythonic to me, so I want to refactor it. My idea was to break it into multiple functions, like:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        process_a_elements(a)

def process_a_elements(a):
    for b in a.elements:
        if b.some_condition:
            continue
        process_b_elements(b)

def process_b_elements(b):
    for c in b.elements:
        if c.some_condition:
            continue
        process_c_elements(c)

def process_c_elements(c):
    for d in c.elements:
        if d.some_condition:
            do_something_using_all(a, b, c, d)  # Problem: I do not have a nor b!
As you can see, at the most nested level I need to do something using all of its "parent" elements. The functions would have separate scopes, so I couldn't access those elements. Passing all the previous elements to each function (like process_c_elements(c, a, b)) looks ugly and not very Pythonic to me either...
Any ideas?
I don't know the exact data structures or the complexity of your code, but you could use a list to pass the accumulated object references along to the next daisy-chained function, something like the following:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        listobjects = []
        listobjects.append(a)
        process_a_elements(a, listobjects)

def process_a_elements(a, listobjects):
    for b in a.elements:
        if b.some_condition:
            continue
        listobjects.append(b)
        process_b_elements(b, listobjects)

def process_b_elements(b, listobjects):
    for c in b.elements:
        if c.some_condition:
            continue
        listobjects.append(c)
        process_c_elements(c, listobjects)

def process_c_elements(c, listobjects):
    for d in c.elements:
        if d.some_condition:
            listobjects.append(d)
            do_something_using_all(listobjects)

def do_something_using_all(listobjects):
    print(listobjects)
FWIW, I've found a solution, which is to encapsulate all the processing inside a class, with attributes that track the currently processed elements:
class ElementsProcessor:
    def __init__(self, root):
        self.root = root
        self.currently_processed_a = None
        self.currently_processed_b = None

    def process_elements(self):
        for a in self.root.elements:
            if a.some_condition:
                continue
            self.process_a_elements(a)

    def process_a_elements(self, a):
        self.currently_processed_a = a
        for b in a.elements:
            if b.some_condition:
                continue
            self.process_b_elements(b)

    def process_b_elements(self, b):
        self.currently_processed_b = b
        for c in b.elements:
            if c.some_condition:
                continue
            self.process_c_elements(c)

    def process_c_elements(self, c):
        for d in c.elements:
            if d.some_condition:
                do_something_using_all(
                    self.currently_processed_a,
                    self.currently_processed_b,
                    c,
                    d,
                )
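A brief usage sketch (assuming the same root object as in the question):

processor = ElementsProcessor(root)
processor.process_elements()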
I have a large number of blending functions:
mix(a, b)
add(a, b)
sub(a, b)
xor(a, b)
...
These functions all take the same inputs and provide different outputs, all of the same type.
However, I do not know which function must be run until runtime.
How would I go about implementing this behavior?
Example code:
def add(a, b):
    return a + b

def mix(a, b):
    return a * b

# Required blend -> decided by other code
blend_name = "add"

a = input("Some input")
b = input("Some other input")

result = run(add, a, b)  # I need a run function
I have looked online, but most searches lead to either running functions from the console, or how to define a function.
I'm not really a big fan of using a dictionary in this case, so here is my approach using getattr. Although it is technically almost the same thing and the principle is almost the same, the code looks cleaner to me, at least:
class operators():
    def add(self, a, b):
        return a + b

    def mix(self, a, b):
        return a * b

# Required blend -> decided by other code
blend_name = "add"

a = input("Some input")
b = input("Some other input")

method = getattr(operators(), blend_name)
result = method(a, b)
print(result)  # prints "12" for inputs 1 and 2, because input() returns strings that get concatenated
EDIT
This is the edited code without getattr, and it looks much cleaner. You can make this class a module and import it as needed; adding new operators is also easy, without having to add each operator in two places (as you would when using a dictionary to store the functions as key/value pairs):
class operators():
    def add(self, a, b):
        return a + b

    def mix(self, a, b):
        return a * b

    def calculate(self, blend_name, a, b):
        return operators.__dict__[blend_name](self, a, b)

# Required blend -> decided by other code
oper = operators()
blend_name = "add"

a = input("Some input")
b = input("Some other input")

result = oper.calculate(blend_name, a, b)
print(result)
You can create a dictionary that maps the function names to their function objects and use that to call them. For example:
functions = {"add": add, "sub": sub} # and so on
func = functions[blend_name]
result = func(a, b)
Or, a little more compact, but perhaps less readable:
result = functions[blend_name](a, b)
You could use the globals() dictionary for the module.
result = globals()[blend_name](a, b)
It would be prudent to add some validation for the values of blend_name.
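For example, a minimal sketch of that validation; the whitelist of allowed names here is my own illustration, not part of the original answer:

ALLOWED_BLENDS = {"add", "mix", "sub", "xor"}  # hypothetical whitelist of known blend functions

def run(blend_name, a, b):
    if blend_name not in ALLOWED_BLENDS:
        raise ValueError(f"Unknown blend function: {blend_name!r}")
    return globals()[blend_name](a, b)

result = run(blend_name, a, b)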
I have an existing pytest test that makes use of some predefined lists to test the cross-product of them all:
A_ITEMS = [1, 2, 3]
B_ITEMS = [4, 5, 6]
C_ITEMS = [7, 8, 9]
I also have an expensive fixture that has internal conditions dependent on A and B items (but not C), called F:
import time

import pytest

class Expensive:
    def __init__(self):
        # expensive set up
        time.sleep(10)

    def check(self, a, b, c):
        return True  # keep it simple, but in reality this depends on a, b and c

@pytest.fixture
def F():
    return Expensive()
Currently I have a naive approach that simply parametrizes a test function:
@pytest.mark.parametrize("A", A_ITEMS)
@pytest.mark.parametrize("B", B_ITEMS)
@pytest.mark.parametrize("C", C_ITEMS)
def test_each(F, A, B, C):
    assert F.check(A, B, C)
This tests all combinations of F with A, B and C items, however it constructs a new Expensive instance via the F fixture for every test. More specifically, it reconstructs a new Expensive via fixture F for every combination of A, B and C.
This is very inefficient, because I should only need to construct a new Expensive when the values of A and B change, which they don't between all tests of C.
What I would like to do is somehow combine the F fixture with the A_ITEMS and B_ITEMS lists, so that the F fixture only instantiates a new instance once for each run through the values of C.
My first approach involves separating the A and B lists into their own fixtures and combining them with the F fixture:
class Expensive:
    def __init__(self, A, B):
        # expensive set up
        self.A = A
        self.B = B
        time.sleep(10)

    def check(self, c):
        return True  # keep it simple

@pytest.fixture(params=[1, 2, 3])
def A(request):
    return request.param

@pytest.fixture(params=[4, 5, 6])
def B(request):
    return request.param

@pytest.fixture
def F(A, B):
    return Expensive(A, B)

@pytest.mark.parametrize("C", C_ITEMS)
def test_each2(F, C):
    assert F.check(C)
Although this tests all combinations, unfortunately this creates a new instance of Expensive for each test, rather than combining each A and B item into a single instance that can be reused for each value of C.
I've looked into indirect fixtures, but I can't see a way to send multiple lists (i.e. both the A and B items) to a single fixture.
Is there a better approach I can take with pytest? Essentially what I'm looking to do is minimise the number of times Expensive is instantiated, given that it's dependent on values of item A and B.
Note: I've tried to simplify this, however the real-life situation is that F represents creation of a new process, A and B are command-line parameters for this process, and C is simply a value passed to the process via a socket. Therefore I want to be able to send each value of C to this process without recreating it every time C changes, but obviously if A or B change, I need to restart it (as they are command-line parameters to the process).
I've had some success using a more broadly scoped fixture (module or session) as a "cache" for the per-test fixture, for this sort of situation where the lifetimes of the fixtures proper don't align cleanly with the costs you want to amortise.
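A minimal sketch of that idea, reusing the Expensive class and the A/B fixtures from the question; the cache layout is my own illustration rather than something from the original answer:

@pytest.fixture(scope="module")
def expensive_cache():
    # lives for the whole module; maps (A, B) -> Expensive instance
    return {}

@pytest.fixture
def F(expensive_cache, A, B):
    # only build a new Expensive the first time this (A, B) pair is seen
    if (A, B) not in expensive_cache:
        expensive_cache[(A, B)] = Expensive(A, B)
    return expensive_cache[(A, B)]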
If using pytest scoping (as proposed in the other answer) is not an option, you may try to cache the expensive object, so that it is only constructed when needed.
Basically, this expands the proposal given in the question with additional static caching of the last used parameters, to avoid creating a new Expensive if it is not needed:
@pytest.fixture(params=A_ITEMS)
def A(request):
    return request.param

@pytest.fixture(params=B_ITEMS)
def B(request):
    return request.param

class FFactory:
    lastAB = None
    lastF = None

    @classmethod
    def F(cls, A, B):
        if (A, B) != cls.lastAB:
            cls.lastAB = (A, B)
            cls.lastF = Expensive(A, B)
        return cls.lastF

@pytest.fixture
def F(A, B):
    return FFactory.F(A, B)

@pytest.mark.parametrize("C", C_ITEMS)
def test_each(F, C):
    assert F.check(C)
I'm learning this language, hence I'm new to Python. The code is:
def add(a, b):
    return a + b

def double_add(x, a, b):
    return x(x(a, b), x(a, b))

a = 4
b = 5

print(double_add(add, a, b))
The add function is simple, it adds two numbers. The double_add function has three arguments. I understand what is happening (with some doubts). The result is 18. What I can't understand is how double_add uses add.
The question is, what is the connection between these two functions?
It would be helpful if you could show me some examples of using a function as an argument to another function.
Thanks in advance.
In Python, functions (and methods) are first-class objects. First-class objects are objects that can be handled uniformly, e.g. stored in variables or passed around.
So you are just passing a function as an argument.
Your call will return add(add(4, 5), add(4, 5)), which is add(9, 9), and that equals 18.
A function is an object just like any other in Python. So you can pass it as an argument, assign attributes to it, and, perhaps most importantly, call it. We can look at a simpler example to understand how passing a function works:
def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def operate(func, a, b):
    return func(a, b)

a = 4
b = 5

print(operate(add, a, b))
print(operate(sub, a, b))
operate(print, a, b)
And this prints out:
9
-1
4 5
That is because in each case, func is assigned with the respective function object passed as an argument, and then by doing func(a, b) it actually calls that function on the given arguments.
So what happens with your line:
return x(x(a, b), x(a, b))
is that first both x(a, b) calls are evaluated as add(4, 5), which gives 9, and then the outer x(...) is evaluated as add(9, 9), which gives 18.
If you added print(x) inside the double_add function, you would see that it prints something like <function add at 0x10dd12290>.
Therefore, the code of double_add is basically the same as if you did the following:
print(add(add(a,b), add(a,b))) # returns 18 in your case
Functions are objects in Python, just like anything else such as lists or strings, and you can pass them around the same way you pass variables.
The function object add is passed as an argument to double_add, where it is locally referred to as x. x is then called twice on (a, b), and then once more on the two return values of those calls.
def double_add(x, a, b):
    return x(x(a, b), x(a, b))

Let's write it differently so it's easier to explain:

def double_add(x, a, b):
    result1 = x(a, b)
    result2 = x(a, b)
    return x(result1, result2)
This means: take the function x and apply it to the parameters a and b. x could be any function here.
print(double_add(add, a, b))
Then this means: call the double_add function, giving it add as the first parameter. So double_add would do:
result1 = add(a, b)
result2 = add(a, b)
return add(result1, result2)
This is a very simple example of what is called "dependency injection". It means that you are not explicitly defining an interaction between the two functions; instead, you are defining that double_add should use some function, but it only knows which one when the code is actually run. (At runtime you are injecting the dependency on a specific function, instead of hardcoding it in the function itself.)
Try, for example, the following:
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def double_add(x, a, b):
    return x(x(a, b), x(a, b))

a = 4
b = 5

print(double_add(add, a, b))
print(double_add(subtract, a, b))
In other words, double_add has become a generic function that applies whatever function you give it to the arguments twice, and then once more to combine the two results.
I understand from this answer why the warning exists. However, why would the default value of it be 2?
It seems to me that classes with a single public method aside from __init__ are perfectly normal! Is there any caveat to just setting
min-public-methods=1
in the pylintrc file?
The number 2 is completely arbitrary. If min-public-methods=1 is a more fitting policy for your project and better matches your aesthetic opinions about code, then by all means go for it. As was once said, "Pylint doesn't know what's best".
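If it helps, this is roughly where the setting lives in a classic pylintrc file (the section name comes from pylint's design checker; adjust to however your configuration is laid out):

[DESIGN]
min-public-methods=1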
For another perspective, Jack Diederich gave a talk at PyCon 2012 called "Stop Writing Classes".
One of his examples is the class with a single method, which he suggests should be just a function. If the idea is to set up an object containing a load of data and a single method that can be called later (perhaps many times) to act on that data, then you can still do that with a regular function by making an inner function the return value.
Something like:
def complicated(a, b, c, d, e):
    def inner(k):
        return (a*k, b*k, c*k, d*k, e*k)
    return inner

foo = complicated(1, 2, 3, 4, 5)
result = foo(100)
This does seem much simpler to me than:
class Complicated:
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e

    def calc(self, k):
        return (self.a*k, self.b*k, self.c*k, self.d*k, self.e*k)

foo = Complicated(1, 2, 3, 4, 5)
result = foo.calc(100)
The main limitation of the function-based approach is that you cannot read back the values of a, b, c, d, and e in the example.
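If you ever do need to read them back while keeping the closure style, one workaround (my own illustration, not part of the original answer) is to attach the captured values to the returned function as an attribute:

def complicated(a, b, c, d, e):
    def inner(k):
        return (a*k, b*k, c*k, d*k, e*k)
    # expose the captured values for later inspection (hypothetical convenience attribute)
    inner.params = (a, b, c, d, e)
    return inner

foo = complicated(1, 2, 3, 4, 5)
print(foo.params)  # (1, 2, 3, 4, 5)
print(foo(100))    # (100, 200, 300, 400, 500)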