I wrote a generic framework that helps me benchmark the critical sections of code.
Here is an explanation of the framework; at the end are the problem I am facing and a few ideas I have for solutions.
Basically, I am looking for more elegant solutions.
Suppose I have a function that does this (in pseudo code):
#Pseudo Code - Don't expect it to run
def foo():
    do_begin()
    do_critical()
    some_value = do_end()
    return some_value
I want to run the "do_critical" section many times in a loop, measure the time, and still get the return value.
So I wrote a BenchMarker class whose API is something like this:
#Pseudo Code - Don't expect it to run
bm = BenchMarker(first=do_begin, critical=do_critical, end=do_end)
bm.start_benchmarking()
returned_value = bm.returned_value
benchmark_result = bm.time
Internally, this BenchMarker performs the following:
#Pseudo Code - Don't expect it to run
class BenchMarker:
    def __init__(self):
        ...

    def start_benchmarking(self):
        first()
        t0 = take_time
        for i in range(n_loops):
            critical()
        t1 = take_time
        self.time = (t1 - t0) / n_loops
        value = end()
        self.returned_value = value
It is important to mention that I am also able to pass context between the first, critical, and end functions, but I omitted that for simplicity, as it is not the gist of my question.
This framework worked like a charm until the following use case.
I have the following code:
#Pseudo Code - Don't expect it to run
def bar():
    do_begin()
    with some_context_manager() as ctx:
        do_critical()
    some_value = do_end()
    return some_value
Now, after this long introduction (sorry ...), I am getting to the real question.
I don't want to run the with statement inside the time-measuring loop, but the critical code needs the context manager.
So what I basically want is the equivalent of the following decomposition of bar:
first -> do_begin() + "what happens in the with before the with body"
critical -> do_critical()
end -> "what happens after the with body" + do_end()
Two solutions I have thought of (but don't like):
Solution 1
Mimic what with does under the hood:
At the end of first(), create the context manager object and call its __enter__() method.
At the start of end(), call the context manager's __exit__() method.
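For illustration, a minimal sketch of this decomposition (still pseudo code in the spirit of the snippets above; the module-level _cm slot is my own illustrative addition):

_cm = None  # hypothetical slot carrying the context manager between phases

def first():
    global _cm
    do_begin()
    _cm = some_context_manager()
    _cm.__enter__()  # what `with` runs before its body

def end():
    # what `with` runs after its body (a real `with` would also pass
    # exception details to __exit__ instead of None, None, None)
    _cm.__exit__(None, None, None)
    return do_end()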
Solution 2
Framework Enhancement to handle CM
Add a "context work mode" (a flag, whatever ...) to the framework, under which the start_benchmarking flow would look like this:
#Pseudo Code - Don't expect it to run
def start_benchmarking(self):
    first()  # including instantiating the context manager
    ctx = get_the_context_manager_created_in_first()
    with ctx ...:
        t0 = take_time
        for i in range(n_loops):
            critical()
        t1 = take_time
        self.time = (t1 - t0) / n_loops
    value = end()
    self.returned_value = value
Any other, more elegant, solutions?
This is way over-complicated, and I cannot quite figure out why you'd actually want to do this. But assuming that you have reasons, just create a function that does your timing for you:
import time

def run_func_n_times(n_times, func, *args, **kwargs):
    start = time.time()
    for _ in range(n_times):
        res = func(*args, **kwargs)
    return res, (time.time() - start) / n_times
No need for a class, just a simple function:
def example():
    do_begin()
    print('look, i am here')
    with ctx() as blah:
        res, timed = run_func_n_times(27, f, foo, bar)
    do_end()
Related
Let's say we have a Python class with methods intended to be called one or more times inline. My goal is to make the methods behave differently when they are invoked last in an inline chain of calls.
Example:
class InlineClass:
    def __init__(self):
        self.called = False

    def execute(self):
        if self.called:
            print('do something with awareness of prior call')
            return self
        else:
            print('first call detected')
            self.called = True
            return self

    def end(self):
        print('last call detected, do something with awareness this is the last call')
        self.called = False
        return self
x = InlineClass()
x.execute().execute().execute().end() # runs 3 execute calls inline and end.
The example above only knows it has reached the last inline call once the end method is invoked. What I would like to do, in essence, is to make that step redundant.
QUESTION
Keeping in mind the intent for this class's methods to always be called one or more times inline, is there an elegant way to format the class so that it is aware it has reached its last inline method call, without necessitating the end call as in the example above?
Instead of chaining the functions, you can try creating a function that handles passing different parameters depending on how many times the function has been / will be called.
Here is some example code:
class Something:
    def repeat(self, function, count):
        for i in range(count):
            if i == 0:
                function("This is the first time")
            elif i == count - 1:
                function("This is the last time")
            else:
                function("This is somewhere in between")

    def foo_function(self, text):
        print(text)
foo = Something()
foo.repeat(foo.foo_function, 5)
foo.repeat(foo.foo_function, 2)
foo.repeat(foo.foo_function, 6)
foo.repeat(foo.foo_function, 8)
Output:
This is the first time
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is the last time
This is the first time
This is the last time
This is the first time
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is the last time
This is the first time
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is somewhere in between
This is the last time
You need to return modified copies instead of self. Here is an example with the behaviour you described:
class InlineClass:
    def __init__(self, counter=0):
        self.counter = counter

    def execute(self):
        return InlineClass(self.counter + 1)

    def __str__(self):
        return f'InlineClass<counter={self.counter}>'
x = InlineClass()
print(x)
# => InlineClass<counter=0>
y = x.execute().execute().execute()
print(y)
# => InlineClass<counter=3>
print(x.execute().execute().execute())
# => InlineClass<counter=3>
print(y.execute().execute().execute())
# => InlineClass<counter=6>
Python offers tracing through its trace module. There are also custom solutions like this. But these approaches capture most low-level executions, inside and out of nearly every library you use. Other than deep-dive debugging, this isn't very useful.
It would be nice to have something that captures only the highest-level functions laid out in your pipeline. For example, if I had:
def funct1():
    res = funct2()
    print(res)

def funct2():
    factor = 3
    res = funct3(factor)
    return res

def funct3(factor):
    res = 1 + 100 * factor
    return res
...and called:
funct1()
...it would be nice to capture:
function order:
- funct1
- funct2
- funct3
I have looked at:
trace
tracefunc
sys.settrace
trace.py
I am happy to manually mark the functions inside the scripts, like we do with docstrings. Is there a way to add "hooks" to functions, and then track them as they get called?
You can always use a decorator to track which functions are called. Here is an example that allows you to keep track of what nesting level the function is called at:
class Tracker:
    level = 0

    def __init__(self, indent=2):
        self.indent = indent

    def __call__(self, fn):
        def wrapper(*args, **kwargs):
            print(' ' * (self.indent * self.level) + '-' + fn.__name__)
            self.level += 1
            out = fn(*args, **kwargs)
            self.level -= 1
            return out
        return wrapper
track = Tracker()
@track
def funct1():
    res = funct2()
    print(res)

@track
def funct2():
    factor = 3
    res = funct3(factor)
    return res

@track
def funct3(factor):
    res = 1 + 100 * factor
    return res
It uses the class variable level to keep track of how many functions have been called and simply prints out the function name with a space indent. So calling funct1 gives:
funct1()
# prints:
-funct1
  -funct2
    -funct3
# returns:
301
Depending on how you want to save the output, you can use the logging module instead, as in the sketch below.
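For example, a minimal variation of the Tracker above that routes the same output through a logger instead of print (the basicConfig call and the LoggingTracker name are my own assumptions for the sketch):

import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")

class LoggingTracker:
    level = 0

    def __init__(self, indent=2, logger=None):
        self.indent = indent
        self.logger = logger or logging.getLogger(__name__)

    def __call__(self, fn):
        def wrapper(*args, **kwargs):
            # same indentation scheme as Tracker, emitted as a DEBUG record
            self.logger.debug('%s-%s', ' ' * (self.indent * self.level), fn.__name__)
            self.level += 1
            out = fn(*args, **kwargs)
            self.level -= 1
            return out
        return wrapper

track = LoggingTracker()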
With the goal of capturing meaningful elapsed-time information in logs, I have replicated the following time-capture and logging code across many functions:
import datetime
import logging
import time

def elapsed_str(seconds):
    """ Returns elapsed number of seconds in format '(elapsed HH:MM:SS)' """
    return "({} elapsed)".format(str(datetime.timedelta(seconds=int(seconds))))

def big_job(job_obj):
    """ Do a big job and return the result """
    start = time.time()
    logging.info(f"Starting big '{job_obj.name}' job...")
    logging.info(f"Doing stuff related to '{job_obj.name}'...")
    time.sleep(10)  # Do some stuff...
    logging.info(f"Big '{job_obj.name}' job completed! "
                 f"{elapsed_str(time.time() - start)}")
    return my_result
With sample usage output:
big_job("sheep-counting")
# Log Output:
# 2019-09-04 01:10:48,027 - INFO - Starting big 'sheep-counting' job...
# 2019-09-04 01:10:48,092 - INFO - Doing stuff related to 'sheep-counting'
# 2019-09-04 01:10:58,802 - INFO - Big 'sheep-counting' job completed! (0:00:10 elapsed)
I'm looking for an elegant (Pythonic) way to avoid rewriting these redundant lines each time:
start = time.time() - should just automatically capture the start time at function launch.
time.time() - start - should use the previously captured start time and infer the current time from now(). (Ideally, elapsed_str() would be callable with zero arguments.)
My specific use case is to generate large datasets in the data science / data engineering field. Runtimes could be anywhere from seconds to days, and it is critical that (1) logs are easily searchable (for the word "elapsed" in this case) and (2) that the developer cost of adding the logs is very low (since we don't know ahead of time which jobs may be slow and we may not be able to modify source code once we identify a performance problem).
Since Python 3.7, the recommended way is to use time.perf_counter() or time.perf_counter_ns().
To measure the runtime of functions, it is convenient to use a decorator, for example this one:
import time

def benchmark(fn):
    def _timing(*a, **kw):
        st = time.perf_counter()
        r = fn(*a, **kw)
        print(f"{fn.__name__} execution: {time.perf_counter() - st} seconds")
        return r
    return _timing
@benchmark
def your_test():
    print("IN")
    time.sleep(1)
    print("OUT")

your_test()
(The code of this decorator is slightly modified from the sosw package.)
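For completeness, here is a sketch of the same decorator built on time.perf_counter_ns(), which counts integer nanoseconds and so sidesteps float rounding (benchmark_ns is my name for this variant, not something from sosw):

import time

def benchmark_ns(fn):
    def _timing(*a, **kw):
        st = time.perf_counter_ns()
        r = fn(*a, **kw)
        elapsed_ns = time.perf_counter_ns() - st
        print(f"{fn.__name__} execution: {elapsed_ns / 1e9} seconds")
        return r
    return _timing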
If I understood you correctly, you could write a decorator that times the function.
A good example here: https://stackoverflow.com/a/5478448/6001492
This may be overkill for others' use cases, but the solution I found required a few difficult hurdles, and I'll document them here for anyone who wants to accomplish something similar.
1. Helper function to dynamically evaluate f-strings
def fstr(fstring_text, locals, globals=None):
    """
    Dynamically evaluate the provided fstring_text

    Sample usage:
        format_str = "{i}*{i}={i*i}"
        i = 2
        fstr(format_str, locals())  # "2*2=4"
        i = 4
        fstr(format_str, locals())  # "4*4=16"
        fstr(format_str, {"i": 12})  # "12*12=144"
    """
    locals = locals or {}
    globals = globals or {}
    ret_val = eval(f'f"{fstring_text}"', locals, globals)
    return ret_val
2. The @logged decorator class
import datetime
import inspect
import logging
import time

class logged(object):
    """
    Decorator class for logging function start, completion, and elapsed time.
    """

    def __init__(
        self,
        desc_text="'{desc_detail}' call to {fn.__name__}()",
        desc_detail="",
        start_msg="Beginning {desc_text}...",
        success_msg="Completed {desc_text} {elapsed}",
        log_fn=logging.info,
        **addl_kwargs,
    ):
        """ All arguments optional """
        self.context = addl_kwargs.copy()  # start with addl. args
        self.context.update(locals())  # merge all constructor args
        self.context["elapsed"] = None
        self.context["start"] = time.time()

    def re_eval(self, context_key: str):
        """ Evaluate the f-string in self.context[context_key], store back the result """
        self.context[context_key] = fstr(self.context[context_key], locals=self.context)

    def elapsed_str(self):
        """ Return a formatted string, e.g. '(HH:MM:SS elapsed)' """
        seconds = time.time() - self.context["start"]
        return "({} elapsed)".format(str(datetime.timedelta(seconds=int(seconds))))

    def __call__(self, fn):
        """ Call the decorated function """

        def wrapped_fn(*args, **kwargs):
            """
            The decorated function definition. Note that the log needs access to
            all passed arguments to the decorator, as well as all of the function's
            native args in a dictionary, even if args are not provided by keyword.
            If start_msg is None or success_msg is None, those log entries are skipped.
            """
            self.context["fn"] = fn
            fn_arg_names = inspect.getfullargspec(fn).args
            for x, arg_value in enumerate(args, 0):
                self.context[fn_arg_names[x]] = arg_value
            self.context.update(kwargs)
            desc_detail_fn = None
            log_fn = self.context["log_fn"]
            # If desc_detail is callable, evaluate dynamically (both before and after)
            if callable(self.context["desc_detail"]):
                desc_detail_fn = self.context["desc_detail"]
                self.context["desc_detail"] = desc_detail_fn()
            # Re-evaluate any decorator args which are fstrings
            self.re_eval("desc_detail")
            self.re_eval("desc_text")
            # Remove 'desc_detail' if blank or unused
            self.context["desc_text"] = self.context["desc_text"].replace("'' ", "")
            self.re_eval("start_msg")
            if self.context["start_msg"]:
                # log the start of execution
                log_fn(self.context["start_msg"])
            ret_val = fn(*args, **kwargs)
            if desc_detail_fn:
                # If desc_detail callable, then reevaluate
                self.context["desc_detail"] = desc_detail_fn()
            self.context["elapsed"] = self.elapsed_str()
            # log the end of execution
            log_fn(fstr(self.context["success_msg"], locals=self.context))
            return ret_val

        return wrapped_fn
Sample usage:
@logged()
def my_func_a():
    pass

# 2019-08-18 - INFO - Beginning call to my_func_a()...
# 2019-08-18 - INFO - Completed call to my_func_a() (00:00:00 elapsed)

@logged(log_fn=logging.debug)
def my_func_b():
    pass

# 2019-08-18 - DEBUG - Beginning call to my_func_b()...
# 2019-08-18 - DEBUG - Completed call to my_func_b() (00:00:00 elapsed)

@logged("doing a thing")
def my_func_c():
    pass

# 2019-08-18 - INFO - Beginning doing a thing...
# 2019-08-18 - INFO - Completed doing a thing (00:00:00 elapsed)

@logged("doing a thing with {foo_obj.name}")
def my_func_d(foo_obj):
    pass

# 2019-08-18 - INFO - Beginning doing a thing with Foo...
# 2019-08-18 - INFO - Completed doing a thing with Foo (00:00:00 elapsed)

@logged("doing a thing with '{custom_kwarg}'", custom_kwarg="foo")
def my_func_e(foo_obj):
    pass

# 2019-08-18 - INFO - Beginning doing a thing with 'foo'...
# 2019-08-18 - INFO - Completed doing a thing with 'foo' (00:00:00 elapsed)
Conclusion
The main advantages versus simpler decorator solutions are:
By leveraging delayed execution of f-strings, and by injecting context variables from both the decorator constructor as well as the function call itself, the log messaging can be easily customized to be human readable.
(And most importantly), almost any derivation of the function's arguments can be used to distinguish the logs in subsequent calls, without changing how the function itself is defined.
Advanced callback scenarios can be achieved by sending a function or complex object to the decorator argument desc_detail, in which case the function is evaluated both before and after function execution. This could eventually be extended to use a callback function to count rows in a created data table (for instance) and to include the table row counts in the completion log. A usage sketch follows.
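For instance, a hypothetical usage of a callable desc_detail (the names and the log output shown are illustrative, not taken from a real run):

table_name = "sheep"

# desc_detail is a callable, so it is re-evaluated around the wrapped call
@logged("row count for '{desc_detail}'", desc_detail=lambda: table_name)
def count_rows():
    pass

count_rows()
# 2019-08-18 - INFO - Beginning row count for 'sheep'...
# 2019-08-18 - INFO - Completed row count for 'sheep' (0:00:00 elapsed)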
How can I monkey patch the builtin input and print functions using Pytest so as to capture the output of someone else's code and test it with pytest before refactoring?
For example, I have acquired some code similar to this:
class QueryProcessor:
    def __init__(self, ...):
        ...

    def write_search_result(self, was_found):
        print('yes' if was_found else 'no')

    def read_query(self):
        return Query(input().split())
I don't want to read dozens of input parameters from stdin, and I don't want to print the outputs. I want to use the functions I've written that sift through a directory full of mytest.in and mytest.out files and pass the inputs to pytest using @pytest.mark.parametrize(...).
But I can't figure out how to patch the awkward read… and write… functions in this class.
I suspect it's something along the lines of:
@pytest.mark.parametrize("inputs…, expected outputs…", data_reading_func())
def test_QueryProcessor(monkeypatch, inputs…, expected outputs…):
    """Docstring
    """
    q = QueryProcessor()

    def my_replacement_read():
        ...
        return [...]

    def my_replacement_write():
        ...
        return [...]

    monkeypatch.???
    assert ...
Can you help?
Many thanks
While awaiting a response, I came up with the following myself. I think the ideal answer will be what I've done, implemented in the way @hoefling suggests: using patch.
@pytest.mark.parametrize("m, n, commands, expec", helpers.get_external_inputs_outputs('spampath', helpers.read_spam_input_output))
def test_QueryProcessor(monkeypatch, m, n, commands, expec):
    def mock_process_queries(cls):
        for cmd in commands:
            cls.process_query(Query(cmd.split()))  # also mocks read_query()

    def mock_write_search_result(cls, was_found):
        outputs.append('yes' if was_found else 'no')

    monkeypatch.setattr('test.QueryProcessor.process_queries', mock_process_queries)
    monkeypatch.setattr('test.QueryProcessor.write_search_result', mock_write_search_result)
    outputs = []
    proc = QueryProcessor(m)
    proc.process_queries()
    assert outputs == expec
UPDATE:
@pytest.mark.parametrize("m, n, commands, expec",
                         helpers.get_external_inputs_outputs(
                             'spampath',
                             helpers.read_input_output))
def test_QueryProcessor_mockpatch(m, n, commands, expec):
    commands.insert(0, n)
    mock_stdout = io.StringIO()
    with patch('spammodule.input', side_effect=commands):
        with patch('sys.stdout', mock_stdout):
            proc = hash_chains.QueryProcessor(m)
            proc.process_queries()
    assert mock_stdout.getvalue().split('\n')[:-1] == expec
Hey there, I guess it is too late for you, but for other people asking themselves how to monkeypatch input(), I did it like this:
monkeypatch.setattr(builtins, 'input', lambda *args, **kwargs: 'Yes, I like monkeypatching')
So I would refactor the code you posted in the update to your own answer as follows (assuming that commands is callable, since you specified it as a side_effect):
# don't forget the imports
import builtins
import io
import sys

@pytest.mark.parametrize("m, n, commands, expec",
                         helpers.get_external_inputs_outputs('spampath', helpers.read_input_output))
def test_QueryProcessor_mockpatch(monkeypatch, m, n, commands, expec):
    commands.insert(0, n)
    mock_stdout = io.StringIO()
    monkeypatch.setattr(builtins, 'input', lambda description: commands())
    monkeypatch.setattr(sys, 'stdout', mock_stdout)
    proc = hash_chains.QueryProcessor(m)
    proc.process_queries()
    assert mock_stdout.getvalue().split('\n')[:-1] == expec
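If commands is actually a plain list of canned input lines (as it would be when handed to patch as a side_effect), an iterator gives the same one-value-per-input() behaviour. A hedged variant of the input line (command_iter is my own name, not from the answer above):

# assumption: commands is a list of canned input lines, not a callable
command_iter = iter(commands)
monkeypatch.setattr(builtins, 'input', lambda prompt='': next(command_iter))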
I have a class with code that fits into the following template:
class aClass:
    def __init__(self, switch=False):
        self.switch = switch

    def f(self):
        done = False
        while not done:
            ...  # a dozen lines of code
            if self.switch:
                ...  # a single line of code
            ...  # another dozen lines of code
So the single line of code in the if statement will either never be executed, or it will be executed in all iterations. And this is actually known as soon as the object is initialized.
When self.switch is True, I would like the single line of code to be executed without having to check for self.switch at every single iteration. And when self.switch is False, I would like the single line of code to be ignored, again without having to repeatedly check for self.switch.
I have of course considered writing two versions of f and selecting the appropriate one in __init__ according to the value of the switch, but duplicating all this code except for a single line doesn't feel right.
Can anyone suggest an elegant way to solve this problem? Perhaps a way to generate the appropriate version of the f method at initialization?
That's a completely valid ask. If not for performance then for readability.
Extract the three pieces of logic (before, inside, and after your condition) into three separate methods, and in f() just write two implementations of the big loop:
def first(self):
    pass

def second(self):
    pass

def third(self):
    pass

def f(self):
    if self.switch:
        while ...:
            self.first()
            self.third()
    else:
        while ...:
            self.first()
            self.second()
            self.third()
If you want it more elegant (although that depends on taste), you can express the two branches of my f() as two methods, first_loop and second_loop, and then in __init__ assign self.f = self.first_loop or self.f = self.second_loop depending on the switch:
class SuperUnderperformingAccordingToManyYetReadable(object):
    def __init__(self, switch):
        self.switch = switch
        if switch:
            self.f = self._first_loop
        else:
            self.f = self._second_loop

    def _first(self):
        pass

    def _second(self):
        pass

    def _third(self):
        pass

    def _first_loop(self):
        while ...:
            self._first()
            self._third()

    def _second_loop(self):
        while ...:
            self._first()
            self._second()
            self._third()
You may need to do some extra work to manage breaking out of the while loop; a sketch of one approach follows.
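For example, a minimal sketch in which the final chunk of the loop body decides when the loop is finished (the Worker class and its method names are placeholders, not code from the question):

class Worker:
    def __init__(self, switch=False):
        # pick the loop implementation once, at construction time
        self.f = self._loop_with_extra if switch else self._loop_plain

    def _first(self):
        pass  # a dozen lines of code

    def _second(self):
        pass  # the single optional line

    def _third(self):
        return True  # another dozen lines; returns True when done

    def _loop_plain(self):
        done = False
        while not done:
            self._first()
            done = self._third()

    def _loop_with_extra(self):
        done = False
        while not done:
            self._first()
            self._second()
            done = self._third()

Worker(switch=True).f()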
If the .switch attribute is not supposed to change, try selecting the loop body dynamically in the __init__() method:
def __init__(self, switch=False):
    self.switch = switch
    self.__fBody = self.__fSwitchTrue if switch else self.__fSwitchFalse

def f(self):
    self.__done = False
    while not self.__done:
        self.__fBody()

def __fSwitchTrue(self):
    self.__fBodyStart()
    ...  # a single line of code
    self.__fBodyEnd()

def __fSwitchFalse(self):
    self.__fBodyStart()
    self.__fBodyEnd()

def __fBodyStart(self):
    ...  # a dozen lines of code

def __fBodyEnd(self):
    ...  # another dozen lines of code
Remember to change values used by more than one of the defined methods to attributes (like done is changed to .__done).
In a comment to my original question, JohnColeman suggested using exec and provided a link to another relevant question.
That was an excellent suggestion, and the solution I was led to is:
_template_pre = """\
def f(self):
    for i in range(5):
        print("Executing code before the optional segment.")
"""

_template_opt = """\
        print("Executing the optional segment")
"""

_template_post = """\
        print("Executing code after the optional segment.")
"""

class aClass:
    def __init__(self, switch=False):
        if switch:
            fdef = _template_pre + _template_opt + _template_post
        else:
            fdef = _template_pre + _template_post
        exec(fdef, globals(), self.__dict__)
        # bind the function
        self.f = self.f.__get__(self)
You can verify this actually works:
aClass(switch = False).f()
aClass(switch = True).f()
Before jumping to conclusions as to how "pythonic" this is, let me point out that such an approach is employed in a couple of metaclass recipes I have encountered and even in the Python Standard Library (check the implementation of namedtuple, to name one example).