How to clear/reset results of the Python line profiler?

I'm trying to start and stop the line profiling of a Python function multiple times during runtime. Therefore I'd like to reset the already collected stats when starting a new profiling. Is there a way to do that?
Lacking an obvious solution, I also tried replacing the line profiler lp with a fresh instance:
#!/usr/bin/env python3
from line_profiler import LineProfiler

lp = LineProfiler()

@lp
def count():
    return sum(range(1_000_000))

count()
lp.print_stats()

# reset line profiler
new_lp = LineProfiler()
for f in lp.functions:
    new_lp(f)
lp = new_lp

count()
lp.print_stats()
But somehow the new stats are empty, possibly because the function count() can't be wrapped twice?

I came up with the following solution, based on a new profiler class. Every time profiling is started, it creates a new instance of LineProfiler. The key is to store the wrapped functions next to the originals, so that they can be restored when the profiler is stopped.
from typing import Optional
from functools import wraps

from line_profiler import LineProfiler


class MyLineProfiler:
    def __init__(self):
        self.functions: list[list] = []
        self.line_profiler: Optional[LineProfiler] = None

    def __call__(self, func):
        index = len(self.functions)

        @wraps(func)
        def wrap(*args, **kw):
            return self.functions[index][1](*args, **kw)

        self.functions.append([func, func])
        return wrap

    def start(self):
        self.line_profiler = LineProfiler()
        for f in self.functions:
            f[1] = self.line_profiler(f[0])

    def stop(self, *, print: bool = True):
        for f in self.functions:
            f[1] = f[0]
        if self.line_profiler and print:
            self.line_profiler.print_stats()

    def reset(self):
        self.stop(print=False)
        self.start()
The wrapped functions call whatever is currently stored at functions[index][1]: either the original func (while profiling is stopped) or the profiled version (after start() was called).
It can be used as follows:
profile = MyLineProfiler()

@profile
def count():
    return sum(range(1_000_000))

count()

profile.start()
count()
count()
profile.stop()

profile.start()
count()
profile.stop()
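The reset() method can be used the same way when you want to discard the stats collected so far without printing them. A short usage sketch based on the class above:

profile.start()
count()
profile.reset()   # drop the stats gathered so far, keep profiling
count()
profile.stop()    # prints stats covering only the call after reset()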

Related

Measure elapsed time for dependent generators

I want to measure the time that various functions take. The thing is, the functions are generators which are piped together, like this:
import functools
import string
from time import sleep
from timeit import default_timer as timer

lines = (string.ascii_lowercase for _ in range(1000))


class Timer:
    _results = {}

    @classmethod
    def measure(cls):
        def decorator(method):
            @functools.wraps(method)
            def wrapper(*args, **kwargs):
                obj = args[0]
                start = timer()
                gen = method(*args, **kwargs)
                yield from gen
                end = timer()
                cls._results[str(obj)] = end - start
            return wrapper
        return decorator


class Source:
    def __init__(self, lines):
        self._lines = lines

    def __iter__(self):
        for line in self._lines:
            yield line


class Log:
    def __init__(self, stream):
        self._stream = stream

    def __next__(self):
        return next(self._stream)

    def __iter__(self):
        yield from self._stream

    def __or__(self, filter):
        return filter(self)

    @classmethod
    def from_source(cls, source):
        return cls(iter(source))


class Filter1:
    def __call__(self, log):
        return Log(self._generator(log))

    @Timer.measure()
    def _generator(self, log):
        for event in log:
            sleep(0.001)
            yield event


class Filter2:
    def __call__(self, log):
        return Log(self._generator(log))

    @Timer.measure()
    def _generator(self, log):
        for event in log:
            yield event


if __name__ == "__main__":
    source = Source(lines)
    pipeline = Log.from_source(source) | Filter2() | Filter1()
    list(pipeline)
    print(Timer._results)
Filter1._generator and Filter2._generator are the functions I want to measure. As for the Log class, it has an __or__ operator that lets me pipe the filters over the data. Notice that the filters are identical, except that Filter1 has some sleeps added (in my real code they both actually do some stuff, different stuff).
The Timer decorator is a standard decorator that uses timeit.default_timer to measure the function's execution time.
The result is:
{'<__main__.Filter2 object at 0x000001D0CB7B62C0>': 15.599821100011468, '<__main__.Filter1 object at 0x000001D0CB7B6500>': 15.599853199906647}
So, the times are pretty much identical. That's because one filter parses the data (here it only yields it; I just created a small representation of what I'm working on) and yields each line to the next filter to be picked up. This is how it's supposed to work.
The question is: can I measure the execution times accurately here? What I want to measure is how much time each filter takes to process all the lines. Obviously Filter1._generator takes more time, but I cannot see it, because Timer.measure() waits for the whole generator to exit.
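One way to make the difference visible (a sketch of my own, not from the thread) is to time each resumption of the generator rather than its whole lifetime, so the time a filter spends suspended at its yield while the rest of the pipeline works is not charged to it. This would replace @Timer.measure() on the _generator methods; note that each filter is still charged for the time its upstream takes to produce an item:

import functools
from timeit import default_timer as timer

def measure_per_item(results):
    '''Accumulate only the time spent inside each next() call.'''
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            gen = method(self, *args, **kwargs)
            total = 0.0
            while True:
                start = timer()
                try:
                    item = next(gen)  # advance this filter by one step
                except StopIteration:
                    return
                total += timer() - start
                results[str(self)] = total  # running total so far
                yield item  # time spent suspended here is not counted
        return wrapper
    return decorator

Used as @measure_per_item(Timer._results), Filter2's total stays small while Filter1's total includes its sleeps (plus the cost of pulling items from Filter2).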

Cannot use ProcessPoolExecutor if in a decorator?

I have this minimal example:
from functools import wraps
from concurrent import futures
import random

def decorator(func):
    num_process = 4
    def impl(*args, **kwargs):
        with futures.ProcessPoolExecutor() as executor:
            fs = []
            for i in range(num_process):
                fut = executor.submit(func, *args, **kwargs)
                fs.append(fut)
            result = []
            for f in futures.as_completed(fs):
                result.append(f.result())
            return result
    return impl

@decorator
def get_random_int():
    return random.randint(0, 100)

if __name__ == "__main__":
    result = get_random_int()
    print(result)
If we try to run this function, we get the following error:
_pickle.PicklingError: Can't pickle <function get_random_int at 0x7f06cee666a8>: it's not the same object as __main__.get_random_int
I think the main issue here is that the wraps decorator itself alters the func object and thus makes it impossible to pickle. I found this rather strange. I am just wondering if there is any way to get around this behavior? I would want to use wraps if possible. Thanks!
This is because run_in_executor calls functools.partial on the decorated function (see: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio-pass-keywords). The picklability of partial objects is spotty (see: Are partial functions "officially" picklable?), but as noted in this comment on Pickling wrapped partial functions, partial functions are only picklable when the function being pickled is in the global namespace. We know run_in_executor with a ProcessPoolExecutor works for non-wrapped functions, since that pattern is documented in asyncio. To get around this, I decorate a dummy function and pass the function I actually want executed in multiple processes as an argument to the decorator:
from functools import wraps
from concurrent import futures
import random

def decorator(multiprocess_func):
    def _decorate(func):
        num_process = 4
        def impl(*args, **kwargs):
            with futures.ProcessPoolExecutor() as executor:
                fs = []
                for i in range(num_process):
                    fut = executor.submit(multiprocess_func, *args, **kwargs)
                    fs.append(fut)
                result = []
                for f in futures.as_completed(fs):
                    result.append(f.result())
                return result
        return impl
    return _decorate

def _get_random_int():
    return random.randint(0, 100)

@decorator(_get_random_int)
def get_random_int():
    return _get_random_int()

if __name__ == "__main__":
    result = get_random_int()
    print(result)
I ultimately decided that not using a decorator was cleaner:
from concurrent import futures
import random

def decorator(multiprocess_func):
    num_process = 4
    def impl(*args, **kwargs):
        with futures.ProcessPoolExecutor(max_workers=num_process) as executor:
            fs = []
            for i in range(num_process):
                fut = executor.submit(multiprocess_func, *args, **kwargs)
                fs.append(fut)
            result = []
            for f in futures.as_completed(fs):
                result.append(f.result())
            return result
    return impl

def _get_random_int():
    return random.randint(0, 100)

get_random_int = decorator(_get_random_int)

if __name__ == "__main__":
    result = get_random_int()
    print(result)
Similar to the linked answer above about pickling wrapped partial functions.
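As a footnote, pickle serializes a function by its qualified name and requires that looking that name up in its module returns the very same object. After decoration, the module-level name is bound to the wrapper, so the inner function that executor.submit tries to pickle no longer matches. A minimal sketch (not from the thread) that reproduces the mismatch:

import pickle
from functools import wraps

def decorator(func):
    @wraps(func)  # impl now reports __qualname__ 'get_random_int'
    def impl(*args, **kwargs):
        return func(*args, **kwargs)
    return impl

@decorator
def get_random_int():
    return 4

pickle.dumps(get_random_int)        # OK: the name resolves to the wrapper itself
inner = get_random_int.__wrapped__  # the original function, saved by functools.wraps
# pickle.dumps(inner)               # PicklingError: it's not the same object
#                                   # as __main__.get_random_int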

How to define a function that I can use efficiently across modules and directories

I want to implement a timer to measure how long a block of code takes to run. I then want to do this across an entire application containing multiple modules (40+) across multiple directories (4+).
My timer is created with two functions that are within a class with a structure like this:
import logging
import time

class SubClass(Class1):
    def getStartTime(self):
        start = time.time()
        return start

    def logTiming(self, classstring, start):
        fin = time.time() - start
        logging.getLogger('perf_log_handler').info((classstring + ' sec').format(round(fin, 3)))
The first function gets the start time, and the second function calculates the time for the block to run and then logs it to a logger.
This code is in a module that we'll call module1.py.
In practice, generically, it will be implemented as such:
class SubSubClass(SubClass):
    def Some_Process(self):
        stim = super().getStartTime()
        # code ...
        super().logTiming("The Process took: {}", stim)
        return Result_Of_Process
This code resides in a module called module2.py and already works and successfully logs. My problem is that when structured like this, I can seemingly only use the timer inside code that is under the umbrella of SubClass, where it is defined (my application fails to render and I get a "can't find page" error in my browser). But I want to use this code everywhere in all the application modules, globally. Whether the module is within another directory, whether some blocks of code are within other classes and subclasses inside other modules, everywhere.
What is the easiest, most efficient way to create this timing instrument so that I can use it anywhere in my application? I understand I may have to define it completely differently. I am very new to all of this, so any help is appreciated.
OPTION 1) You should define another module, for example, "mytimer.py" fully dedicated to the timer:
import time

class MyTimer():
    def __init__(self):
        self.start = time.time()

    def log(self):
        now = time.time()
        return now - self.start
And then, from any line of your code, for example, in module2.py:
from mytimer import MyTimer

class SomeClass():
    def Some_Function(self):
        t = MyTimer()
        # ...
        t.log()  # seconds elapsed since t was created
        return ...
OPTION 2) You could also use a simple function instead of a class:
import time

def mytimer(start=None, tag=""):
    if start is None:
        start = time.time()
    now = time.time()
    delay = float(now - start)
    print("%(tag)s %(delay).2f seconds." % {'tag': tag, 'delay': delay})
    return now
And then, in your code:
from mytimer import mytimer

class SomeClass():
    def Some_Function(self):
        t = mytimer(tag='BREAK0')
        # ...
        t = mytimer(start=t, tag='BREAK1')
        # ...
        t = mytimer(start=t, tag='BREAK2')
        # ...
        t = mytimer(start=t, tag='BREAK3')
        return ...
I am not quite sure what you are looking for, but once upon a time I used a decorator for a similar type of problem.
The snippet below is the closest I can remember to what I implemented at that time. Hopefully it is useful to you.
Brief explanation
timed is a decorator that wraps methods of a Python object and times each method call.
The class contains a log that is updated by the wrapper as the @timed methods are called.
Note that if you want to make the @property act as a "class property" you can draw inspiration from this post.
from time import sleep, time

# -----------------
# Define Decorators
# -----------------
def timed(wrapped):
    def wrapper(self, *arg, **kwargs):
        start = time()
        res = wrapped(self, *arg, **kwargs)
        stop = time()
        self.log = {'method': wrapped.__name__, 'called': start, 'elapsed': stop - start}
        return res
    return wrapper

# -----------------
# Define Classes
# -----------------
class Test(object):
    __log = []

    @property
    def log(self):
        return self.__log

    @log.setter
    def log(self, kwargs):
        self.__log.append(kwargs)

    @timed
    def test(self):
        print("Running timed method")
        sleep(2)

    @timed
    def test2(self, a, b=2):
        print("Running another timed method")
        sleep(2)
        return a + b

# ::::::::::::::::::
if __name__ == '__main__':
    t = Test()
    res = t.test()
    res = t.test2(1)
    print(t.log)
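A third variant (my own sketch, not from the answers above) is a context manager, which avoids pairing getStartTime/logTiming calls by hand and can be imported anywhere a with block fits. It assumes the same 'perf_log_handler' logger the question uses:

import logging
import time
from contextlib import contextmanager

@contextmanager
def timed_block(tag):
    start = time.time()
    try:
        yield
    finally:
        elapsed = time.time() - start
        logging.getLogger('perf_log_handler').info('%s took %.3f sec', tag, elapsed)

# usage, from any module:
# with timed_block('The Process'):
#     ...  # block to measure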

Running unit tests in Python with a caching decorator

So I'm working on an application that, upon import of certain records, requires some fields to be recalculated. To prevent a database read for each check, there is a caching decorator, so the database read is only performed once every n seconds during import. The trouble comes with building test cases. The following works, but it has an ugly sleep in it:
# The decorator I need to patch
@cache_function_call(2.0)
def _latest_term_modified():
    return PrimaryTerm.objects.latest('object_modified').object_modified

# The 2.0 sets the TTL of the decorator. So I need to switch out
# self.ttl for this decorated function before this test. Right now
# I'm just using a sleep, which works.
@mock.patch.object(models.Student, 'update_anniversary')
def test_import_on_term_update(self, mock_update):
    self._import_student()
    latest_term = self._latest_term_mod()
    latest_term.save()
    time.sleep(3)
    self._import_student()
    self.assertEqual(mock_update.call_count, 2)
The decorator itself looks like the following:
import time
from functools import wraps

class cache_function_call(object):
    """Cache an argument-less function call for 'ttl' seconds."""
    def __init__(self, ttl):
        self.cached_result = None
        self.timestamp = 0
        self.ttl = ttl

    def __call__(self, func):
        @wraps(func)
        def inner():
            now = time.time()
            if now > self.timestamp + self.ttl:
                self.cached_result = func()
                self.timestamp = now
            return self.cached_result
        return inner
I have attempted to set the decorator before the import of the models:
decorators.cache_function_call = lambda x: x
import models
But even with that at the top of the file, Django still initializes the models before running my tests.py, and the function still gets decorated with the caching decorator instead of my lambda/noop one.
What's the best way to write this test so I don't have a sleep? Can I set the ttl of the decorator before running my import somehow?
You can change the decorator class just a little bit.
At module level in decorators.py set the global
BAILOUT = False
and in your decorator class, change:
def __call__(self, func):
    @wraps(func)
    def inner():
        now = time.time()
        if BAILOUT or now > self.timestamp + self.ttl:
            self.cached_result = func()
            self.timestamp = now
        return self.cached_result
    return inner
Then in your tests set decorators.BAILOUT = True, and, hey presto!-)
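With that change the test can drop the sleep entirely. A sketch of the rewritten test (assuming the decorator module is imported as decorators and the rest of the fixture is unchanged):

@mock.patch.object(models.Student, 'update_anniversary')
def test_import_on_term_update(self, mock_update):
    with mock.patch.object(decorators, 'BAILOUT', True):
        self._import_student()
        self._latest_term_mod().save()
        self._import_student()  # cache is bypassed, no sleep needed
    self.assertEqual(mock_update.call_count, 2)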

Threads with decorators

I'm trying to add threading (using decorators) to my application, but I can't understand some things about locks and managing threads.
import threading

def run_in_thread(fn):
    def run(*k, **kw):
        t = threading.Thread(target=fn, args=k, kwargs=kw)
        t.start()
    return run

class A:
    @run_in_thread
    def method1(self):
        for x in range(10000):
            print(x)

    @run_in_thread
    def method2(self):
        for y in list('wlkefjwfejwiefwhfwfkjshkjadgfjhkewgfjwjefjwe'):
            print(y)

    def stop_thread(self):
        pass

c = A()
c.method1()
c.method2()
As I understand it, method1 and method2 are not synchronized, and synchronization is usually implemented with locks. How can I add locks to my decorator function?
And how can I implement a way to stop long-running threads using decorators?
If you extend the function to
def run_in_thread(fn):
    def run(*k, **kw):
        t = threading.Thread(target=fn, args=k, kwargs=kw)
        t.start()
        return t  # <-- this is new!
    return run
i.e., let the wrapper function return the created thread, then you can do
c = A()
t1 = c.method1()
t1.join()  # wait for it to finish
t2 = c.method2()
# ...
i.e., get the thread the original method runs in, do whatever you want with it (e.g., join it), and only then call the next method.
If you don't need it in a given case, you are free to omit it.
If you want to synchronize the two threads, you simply need to add locks inside the decorated functions, not in the decorators themselves.
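Following that advice, a minimal sketch (reusing run_in_thread from the question): share one lock and acquire it inside each decorated method, so the threads serialize their work:

import threading

lock = threading.Lock()

class A:
    @run_in_thread
    def method1(self):
        with lock:  # only one of the two methods runs at a time
            for x in range(10000):
                print(x)

    @run_in_thread
    def method2(self):
        with lock:
            for y in list('wlkefjwfejwiefwhfwfkjshkjadgfjhkewgfjwjefjwe'):
                print(y)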
There is no simple way to directly stop a Thread; the only way is to use an Event to signal the thread that it must exit.
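A minimal sketch of that Event pattern (my own illustration, not from the answer): a variant of run_in_thread that hands the target an Event it is expected to check periodically:

import threading
import time

def run_in_stoppable_thread(fn):
    def run(*args, **kwargs):
        stop_event = threading.Event()
        t = threading.Thread(target=fn, args=(stop_event,) + args, kwargs=kwargs)
        t.start()
        return t, stop_event
    return run

@run_in_stoppable_thread
def worker(stop_event):
    n = 0
    while not stop_event.is_set():  # check the event on every iteration
        n += 1
    print('stopped after', n, 'iterations')

t, stop = worker()
time.sleep(0.1)
stop.set()  # signal the thread to exit
t.join()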
For threading decorators you can take a look at pebble.
Maybe Semaphores could help in decorators. Something like this calculates the factorials of the numbers 0 through 999:
import threading
from functools import wraps
from math import factorial

DIC = {}

def limit(number):
    '''This decorator limits the number of simultaneous Threads.'''
    sem = threading.Semaphore(number)
    def wrapper(func):
        @wraps(func)
        def wrapped(*args):
            with sem:
                return func(*args)
        return wrapped
    return wrapper

def run_async(f):  # renamed from `async`, which is a reserved word since Python 3.7
    '''This decorator executes a function in a Thread.'''
    @wraps(f)
    def wrapper(*args, **kwargs):
        thr = threading.Thread(target=f, args=args, kwargs=kwargs)
        thr.start()
    return wrapper

@limit(10)  # always use @limit as the outer decorator
@run_async
def calcula_fatorial(number):
    DIC.update({number: factorial(number)})

@limit(10)
def main(lista):
    for elem in lista:
        calcula_fatorial(elem)

if __name__ == '__main__':
    from pprint import pprint
    main(range(1000))
    pprint(DIC)
