I have a function that processes some deeply nested data using nested loops. Its simplified structure is something like this:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        for b in a.elements:
            if b.some_condition:
                continue
            for c in b.elements:
                if c.some_condition:
                    continue
                for d in c.elements:
                    if d.some_condition:
                        do_something_using_all(a, b, c, d)
This does not look very Pythonic to me, so I want to refactor it. My idea was to break it into multiple functions, like:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        process_a_elements(a)

def process_a_elements(a):
    for b in a.elements:
        if b.some_condition:
            continue
        process_b_elements(b)

def process_b_elements(b):
    for c in b.elements:
        if c.some_condition:
            continue
        process_c_elements(c)

def process_c_elements(c):
    for d in c.elements:
        if d.some_condition:
            do_something_using_all(a, b, c, d)  # Problem: I do not have a nor b!
As you can see, at the most deeply nested level I need to do something using all of its "parent" elements. Each function has its own scope, so I can't access those elements. Passing all the previous elements to each function (like process_c_elements(c, a, b)) looks ugly and not very Pythonic to me either...
Any ideas?
I don't know the exact data structures or the complexity of your code, but you could use a list to pass the object references along to the next function in the chain, something like the following:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        process_a_elements(a, [a])

def process_a_elements(a, listobjects):
    for b in a.elements:
        if b.some_condition:
            continue
        # Pass a new list for each branch so that siblings do not accumulate
        process_b_elements(b, listobjects + [b])

def process_b_elements(b, listobjects):
    for c in b.elements:
        if c.some_condition:
            continue
        process_c_elements(c, listobjects + [c])

def process_c_elements(c, listobjects):
    for d in c.elements:
        if d.some_condition:
            do_something_using_all(listobjects + [d])

def do_something_using_all(listobjects):
    print(listobjects)
FWIW, I've found a solution, which is to encapsulate all the processing inside a class, with attributes that track the currently processed elements:
class ElementsProcessor:
    def __init__(self, root):
        self.root = root
        self.currently_processed_a = None
        self.currently_processed_b = None

    def process_elements(self):
        for a in self.root.elements:
            if a.some_condition:
                continue
            self.process_a_elements(a)

    def process_a_elements(self, a):
        self.currently_processed_a = a
        for b in a.elements:
            if b.some_condition:
                continue
            self.process_b_elements(b)

    def process_b_elements(self, b):
        self.currently_processed_b = b
        for c in b.elements:
            if c.some_condition:
                continue
            self.process_c_elements(c)

    def process_c_elements(self, c):
        for d in c.elements:
            if d.some_condition:
                do_something_using_all(
                    self.currently_processed_a,
                    self.currently_processed_b,
                    c,
                    d
                )
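Another option for the nested-loop question above, offered only as a sketch: move the traversal into a generator that yields each qualifying (a, b, c, d) tuple, so the caller is left with a single flat loop. The names iter_kept, iter_paths and node are hypothetical, and node only builds stand-in test data with the elements and some_condition attributes the question assumes.

```python
from types import SimpleNamespace


def iter_kept(parent):
    """Yield the children of `parent` whose some_condition is false (not skipped)."""
    return (child for child in parent.elements if not child.some_condition)


def iter_paths(root):
    """Yield every (a, b, c, d) combination the original nested loops would act on."""
    for a in iter_kept(root):
        for b in iter_kept(a):
            for c in iter_kept(b):
                for d in c.elements:
                    if d.some_condition:
                        yield a, b, c, d


def node(name, some_condition, *children):
    """Hypothetical helper that builds stand-in data for the demo."""
    return SimpleNamespace(name=name, some_condition=some_condition,
                           elements=list(children))


root = node(
    "root", False,
    node("a1", False,
         node("b1", False,
              node("c1", False, node("d1", True), node("d2", False)),
              node("c2", True, node("d3", True)))),
    node("a2", True, node("b2", False)),
)

# The caller now needs only one flat loop.
paths = [tuple(n.name for n in path) for path in iter_paths(root)]
print(paths)
```

The traversal is still nested inside iter_paths, but the filtering rule lives in one place and do_something_using_all(a, b, c, d) can be called from a single loop over iter_paths(root).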
I have a large number of blending functions:
mix(a, b)
add(a, b)
sub(a, b)
xor(a, b)
...
These functions all take the same inputs and provide different outputs, all of the same type.
However, I do not know which function must be run until runtime.
How would I go about implementing this behavior?
Example code:
def add(a, b):
    return a + b

def mix(a, b):
    return a * b

# Required blend -> decided by other code
blend_name = "add"
a = input("Some input")
b = input("Some other input")
result = run(blend_name, a, b)  # I need a run function
I have looked online, but most searches lead to either running functions from the console, or how to define a function.
I'm not a big fan of using a dictionary in this case, so here is my approach using getattr. Technically it is almost the same thing and the principle is almost the same, but the code looks cleaner, to me at least:
class operators():
    def add(self, a, b):
        return (a + b)

    def mix(self, a, b):
        return (a * b)

# Required blend -> decided by other code
blend_name = "add"
a = input("Some input")
b = input("Some other input")
# Look the method up on an instance so that self is bound automatically
method = getattr(operators(), blend_name)
result = method(a, b)
print(result)  # prints 12 for inputs 1 and 2, because input() returns strings
EDIT
This is the edited code without getattr, and it looks much cleaner. You can put this class in a module and import it as needed. Adding new operators is also easy, because you do not have to register each operator in two places (as you would when a dictionary stores the functions as key/value pairs).
class operators():
    def add(self, a, b):
        return (a + b)

    def mix(self, a, b):
        return (a * b)

    def calculate(self, blend_name, a, b):
        return (operators.__dict__[blend_name](self, a, b))

# Required blend -> decided by other code
oper = operators()
blend_name = "add"
a = input("Some input")
b = input("Some other input")
result = oper.calculate(blend_name, a, b)
print(result)
You can create a dictionary that maps the function names to their function objects and use that to call them. For example:
functions = {"add": add, "sub": sub} # and so on
func = functions[blend_name]
result = func(a, b)
Or, a little more compact, but perhaps less readable:
result = functions[blend_name](a, b)
You could use the globals() dictionary for the module.
result = globals()[blend_name](a, b)
It would be prudent to validate the value of blend_name first, since globals() exposes every module-level name, not just the blend functions.
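To make the validation point concrete, here is a minimal sketch of the dictionary approach with an explicit check. The BLENDS registry and the run name are illustrative assumptions, and operator.add, operator.sub and operator.xor merely stand in for the real blending functions.

```python
import operator

# Hypothetical registry: only names listed here can be called.
BLENDS = {
    "add": operator.add,
    "sub": operator.sub,
    "xor": operator.xor,
}


def run(blend_name, a, b):
    """Look up a blend function by name and call it, rejecting unknown names."""
    try:
        func = BLENDS[blend_name]
    except KeyError:
        raise ValueError(
            f"unknown blend {blend_name!r}; expected one of {sorted(BLENDS)}"
        ) from None
    return func(a, b)


print(run("add", 1, 2))
print(run("xor", 6, 3))
```

Because the lookup is limited to the registry, a typo or an unexpected name fails with a clear ValueError instead of calling some unrelated module-level object.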
Let's say I have a calculate() method that performs a complicated calculation with many variables, and I want to log the values of those variables at different phases (EDIT: not only for verification but also for data study purposes). For example:
# These assignments are arbitrary,
# but my calculate() method is more complex
def calculate(a, b):
    c = 2*a+b
    d = a-b
    if c > d+10:
        g = another_calc(a, c)
    else:
        g = another_calc(a, d)
    return c, d, g

def another_calc(a, c_d):
    e = a+c_d
    f = a*c_d
    g = e+f
    return g
You may assume the method will be modified a lot for experimental exploration.
There is not much logging here, and I want to log what happens. For example, I can write aggressive code like this:
# These assignments are arbitrary,
# but my calculate() method is more complex
def calculate(a, b):
    info = {"a": a, "b": b}
    c = 2*a+b
    d = a-b
    info["c"], info["d"] = c, d
    if c > d+10:
        info["switch"] = "entered c"
        g, info = another_calc(a, c, info)
    else:
        info["switch"] = "entered d"
        g, info = another_calc(a, d, info)
    return c, d, g, info

def another_calc(a, c_d, info):
    e = a+c_d
    f = a*c_d
    g = e+f
    info["e"], info["f"], info["g"] = e, f, g
    return g, info
This serves my purpose (I get the info object, which is then exported as CSV for further study).
But it is pretty ugly to add more (non-functional) lines to the originally clean calculate() method, changing its signature and return value.
Can I write cleaner code?
I am wondering whether it is possible to use a decorator to wrap this method. I hope you have some great answers. Thanks.
One way to write cleaner code (in my opinion) is to wrap the info dictionary inside a class.
Here is my simple code example:
# These assignments are arbitrary,
# but my calculate() method is more complex
def calculate(a, b, logger):
    logger.log("a", a)
    logger.log("b", b)
    c = 2*a+b
    d = a-b
    logger.log("c", c)
    logger.log("d", d)
    if c > d+10:
        logger.log("switch", "entered c")
        g = another_calc(a, c, logger)
    else:
        logger.log("switch", "entered d")
        g = another_calc(a, d, logger)
    return c, d, g

def another_calc(a, c_d, logger):
    e = a+c_d
    f = a*c_d
    g = e+f
    logger.log("e", e)
    logger.log("f", f)
    logger.log("g", g)
    return g

class Logger(object):
    def __init__(self):
        # Per-instance storage, so separate loggers do not share entries
        self.data = []

    def log(self, key, value):
        self.data.append({key: value})

    def getLog(self):
        return self.data

logger = Logger()
print(calculate(4, 7, logger))
print(logger.getLog())
Pros and cons
I use a separate logger class here because then I don't need to know how the logger is implemented. In the example it is just a simple list of key-value dictionaries, but if needed you can change the implementation when creating a new logger.
Also, you have a way to choose how to print the data or where to output it. Maybe you can have an interface for Logger.
I used a dictionary because it looked like you just needed key-value pairs.
Now, using the logger, we need to change the method signature. Of course, you could give the logger a default value of None, for example, but then you would have to check for None everywhere, which is why I didn't define a default value. If you own the code and can change every call to the calculate() method, then it should not be a problem.
There is also one interesting thing that could be important later. When you have debugged your output and do not need to log anything anymore, you can implement a Null object. Using a Null object, you can turn off all logging without changing the code again.
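As a rough illustration of the Null object idea, the sketch below pairs a recording logger with a do-nothing one that exposes the same methods. NullLogger and get_log are hypothetical names, and calculate is shortened to two logged values for brevity.

```python
class Logger:
    """Records every logged key/value pair."""

    def __init__(self):
        self.data = []

    def log(self, key, value):
        self.data.append({key: value})

    def get_log(self):
        return self.data


class NullLogger:
    """Same interface as Logger, but silently discards everything."""

    def log(self, key, value):
        pass

    def get_log(self):
        return []


def calculate(a, b, logger):
    c = 2 * a + b
    d = a - b
    logger.log("c", c)
    logger.log("d", d)
    return c, d


recording = Logger()
print(calculate(4, 7, recording), recording.get_log())
print(calculate(4, 7, NullLogger()))
```

The calculation code stays identical in both cases; only the object passed in decides whether anything is recorded.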
I was trying to think of how to use a decorator but did not find any good way. If only the output needs to be logged, then a decorator could work.
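For the case where only inputs and outputs need to be captured, a decorator might look like the sketch below. log_calls and records are hypothetical names; intermediate values such as e and f are not visible to the wrapper, which is the limitation mentioned above.

```python
import functools


def log_calls(records):
    """Decorator factory: append each call's arguments and result to `records`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            records.append({
                "function": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "result": result,
            })
            return result
        return wrapper
    return decorator


records = []


@log_calls(records)
def another_calc(a, c_d):
    e = a + c_d
    f = a * c_d
    return e + f


result = another_calc(4, 15)
print(result, records)
```

The decorated function keeps its original signature and return value, so callers do not need to change.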
What is the 'Pythonic' way of handling functions and subfunctions in a scenario where they are used in a particular order?
Since one of the ideas seems to be that a function should do one thing, I find myself splitting up functions even though they have a fixed order of execution.
When functions are really a kind of 'do step 1', 'then with the outcome of step 1, do step 2', I currently end up wrapping the step functions in another function while defining them all at the same level. However, I'm wondering if this is indeed the way I should be doing this.
Example code:
def step_1(data):
    # do stuff on data
    return a

def step_2(data, a):
    # do stuff on data with a
    return b

def part_1(data):
    a = step_1(data)
    b = step_2(data, a)
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c
I'd be calling this from another file/script (or Jupyter notebook) as part_1 and then part_2
Seems to be working just fine for my purposes right now, but as I said I'm wondering at this (early) stage if I should be using a different approach for this.
I guess you could use a class here; otherwise, your code can be made shorter using the following:
def step_1(data):
    # do stuff on data
    return step_2(data, a)

def step_2(data, a):
    # do stuff on data with a
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c
As a rule of thumb, if several functions use the same arguments, it is a good idea to group them together into a class. But you can also define a main() or run() function that makes use of your functions in a sequential fashion. Since the example you have given is not too complex, I would avoid using classes and go for something like:
def step_1(data):
    # do stuff on data
    return step_2(data, a)

def step_2(data, a):
    # do stuff on data with a
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c

def run(data, data_set_2):
    # step_1 already calls step_2 and returns its (a, b) result
    a, b = step_1(data)
    return part_2(data_set_2, a, b)

run(data, data_set_2)
If the code grows in complexity, using classes is advised. In the end, it's your choice.
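To show the run() idea with something executable, here is a small sketch in which the step bodies are invented placeholders (a mean, a range, and a rescaling); only the way the return values are threaded from one step to the next is the point.

```python
def step_1(data):
    # Placeholder work: the mean of the data.
    return sum(data) / len(data)


def step_2(data, a):
    # Placeholder work: the spread of the data (a is unused in this toy step).
    return max(data) - min(data)


def part_1(data):
    a = step_1(data)
    b = step_2(data, a)
    return a, b


def part_2(data_set_2, a, b):
    # Placeholder work: rescale the second data set using a and b.
    return [(x - a) / b for x in data_set_2]


def run(data, data_set_2):
    """Run the steps in their fixed order, passing results along explicitly."""
    a, b = part_1(data)
    return part_2(data_set_2, a, b)


c = run([1, 2, 3], [2, 4])
print(c)
```

Each step stays a small function that can be tested on its own, while run() documents the required order in one place.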
I've often been frustrated by the lack of flexibility in Python's iterable unpacking.
Take the following example:
a, b = range(2)
Works fine. a contains 0 and b contains 1, just as expected. Now let's try this:
a, b = range(1)
Now, we get a ValueError:
ValueError: not enough values to unpack (expected 2, got 1)
Not ideal, when the desired result was 0 in a, and None in b.
There are a number of hacks to get around this. The most elegant I've seen is this:
a, *b = function_with_variable_number_of_return_values()
b = b[0] if b else None
Not pretty, and could be confusing to Python newcomers.
So what's the most Pythonic way to do this? Store the return value in a variable and use an if block? The *varname hack? Something else?
As mentioned in the comments, the best way to do this is to simply have your function return a constant number of values and if your use case is actually more complicated (like argument parsing), use a library for it.
However, your question explicitly asked for a Pythonic way of handling functions that return a variable number of arguments and I believe it can be cleanly accomplished with decorators. They're not super common and most people tend to use them more than create them so here's a down-to-earth tutorial on creating decorators to learn more about them.
Below is a decorated function that does what you're looking for. The function returns an iterator with a variable number of arguments and it is padded up to a certain length to better accommodate iterator unpacking.
def variable_return(max_values, default=None):
    # This decorator is somewhat more complicated because the decorator
    # itself needs to take arguments.
    def decorator(f):
        def wrapper(*args, **kwargs):
            actual_values = f(*args, **kwargs)
            try:
                # This will fail if `actual_values` is a single value,
                # such as a single integer or just `None`.
                actual_values = list(actual_values)
            except TypeError:
                actual_values = [actual_values]
            extra = [default] * (max_values - len(actual_values))
            actual_values.extend(extra)
            return actual_values
        return wrapper
    return decorator

@variable_return(max_values=3)
# This would be a function that actually does something.
# It should not return more values than `max_values`.
def ret_n(n):
    return list(range(n))

a, b, c = ret_n(1)
print(a, b, c)
a, b, c = ret_n(2)
print(a, b, c)
a, b, c = ret_n(3)
print(a, b, c)
Which outputs what you're looking for:
0 None None
0 1 None
0 1 2
The decorator basically takes the decorated function and returns its output along with enough extra values to fill in max_values. The caller can then assume that the function always returns exactly max_values number of arguments and can use fancy unpacking like normal.
Here's an alternative version of the decorator solution by @supersam654, using iterators rather than lists for efficiency:
def variable_return(max_values, default=None):
    def decorator(f):
        def wrapper(*args, **kwargs):
            actual_values = f(*args, **kwargs)
            # Start at 0 so an empty iterable still pads correctly
            count = 0
            try:
                for count, value in enumerate(actual_values, 1):
                    yield value
            except TypeError:
                count = 1
                yield actual_values
            yield from [default] * (max_values - count)
        return wrapper
    return decorator
It's used in the same way:
@variable_return(3)
def ret_n(n):
    return tuple(range(n))

a, b, c = ret_n(2)
This could also be used with non-user-defined functions like so:
a, b, c = variable_return(3)(range)(2)
The shortest version known to me (thanks to @KellyBundy in the comments below):
a, b, c, d, e, *_ = *my_list_or_iterable, *[None]*5
Obviously, it's possible to use a default value other than None if necessary.
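If the star-expression one-liner feels too cryptic, the same padding can be wrapped in a small helper. This is only a sketch: pad is a hypothetical name, and unlike the one-liner it also truncates input that is longer than the requested size.

```python
from itertools import chain, islice, repeat


def pad(iterable, size, default=None):
    """Yield exactly `size` items: those of `iterable`, then `default` as filler."""
    return islice(chain(iterable, repeat(default)), size)


a, b = pad(range(1), 2)
print(a, b)

x, y, z = pad("ab", 3, default="-")
print(x, y, z)
```

Because the helper always yields exactly size items, the unpacking on the left-hand side needs no trailing *_ catch-all.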
Python 3.10 also has a nice feature that comes in handy here when we know the possible numbers of arguments up front, such as when unpacking sys.argv:
Previous method:
import sys
_, x, y, z, *_ = *sys.argv, *[None]*3
New method:
import sys
match sys.argv[1:]:  # slice needed to drop the first value of sys.argv
    case [x]:
        print(f'x={x}')
    case [x, y]:
        print(f'x={x}, y={y}')
    case [x, y, z]:
        print(f'x={x}, y={y}, z={z}')
    case _:
        print('No arguments')
I understand from this answer why the warning exists. However, why would the default value of it be 2?
It seems to me that classes with a single public method aside from __init__ are perfectly normal! Is there any caveat to just setting
min-public-methods=1
in the pylintrc file?
The number 2 is completely arbitrary. If min-public-methods=1 is a more fitting policy for your project and better matches your code esthetic opinions, then by all means go for it. As was once said, "Pylint doesn't know what's best".
For another perspective, Jack Diederich gave a talk at PyCon 2012 called "Stop Writing Classes".
One of his examples is the class with a single method, which he suggests should be just a function. If the idea is to set up an object containing a load of data and a single method that can be called later (perhaps many times) to act on that data, then you can still do that with a regular function by making an inner function the return value.
Something like:
def complicated(a, b, c, d, e):
    def inner(k):
        return (a*k, b*k, c*k, d*k, e*k)
    return inner

foo = complicated(1, 2, 3, 4, 5)
result = foo(100)
This does seem much simpler to me than:
class Complicated:
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e

    def calc(self, k):
        return (self.a*k, self.b*k, self.c*k, self.d*k, self.e*k)

foo = Complicated(1, 2, 3, 4, 5)
result = foo.calc(100)
The main limitation of the function based approach is that you cannot read back the values of a, b, c, d, and e in the example.
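One possible middle ground, sketched here as an assumption rather than something from the talk, is functools.partial: it pre-binds the data like the closure does, but the resulting object keeps the bound values in its args attribute. scale_all is a hypothetical stand-in for the real computation.

```python
from functools import partial


def scale_all(a, b, c, d, e, k):
    """Multiply each stored value by k."""
    return (a * k, b * k, c * k, d * k, e * k)


# Bind the first five arguments now; supply k later.
foo = partial(scale_all, 1, 2, 3, 4, 5)
result = foo(100)

print(result)
print(foo.args)  # the bound positional values can be read back
```

This stays function-based while still allowing the stored values to be inspected, although a class remains the clearer choice once the values need names or need to change.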