I tried to use the memory_profiler module to get the memory used, as in the following code:
from memory_profiler import memory_usage

memories = []

def get_memory(mem, ended):
    if ended:
        highest_mem = max(mem)
        print highest_mem
    else:
        memories.append(mem)

def f1():
    # do something
    ended = False
    get_memory(memory_usage(), ended)
    return  # something

def f2():
    # do something
    ended = False
    get_memory(memory_usage(), ended)
    return  # something

# main
f1()
f2()
ended = True
get_memory(memory_usage(), ended)  # code end

# expected output:
# highest memory
However, it did not execute successfully. It got stuck when ended=True and the values of memory_usage() and ended were passed to get_memory. It did not show any error either; it just kept waiting for a very long time until I forced execution to stop. Does anyone know a better way, or a solution?
An easy way to use memory_usage to get the peak / maximum memory from a block of code is to first put that code in a function, and then pass that function - without the () call - to memory_usage() as the proc argument:
from memory_profiler import memory_usage

def myfunc():
    # code
    return

mem = max(memory_usage(proc=myfunc))
print("Maximum memory used: {} MiB".format(mem))
Other arguments allow you to collect timestamps, return values, pass arguments to myfunc, etc. The docstring seems to be the only complete source for documentation on this: https://github.com/fabianp/memory_profiler/blob/master/memory_profiler.py
https://github.com/fabianp/memory_profiler/blob/4089e3ed4d5c4197925a2df8393d4cbfca745ae5/memory_profiler.py#L244
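For instance, here is a minimal sketch of passing arguments and collecting the return value, based on that docstring (the keyword names and behaviour should be double-checked against your installed version; myfunc and its arguments are just placeholders):

from memory_profiler import memory_usage

def myfunc(n, factor=2):
    data = [x * factor for x in range(n)]
    return len(data)

# proc can be a (callable, args, kwargs) tuple; retval=True also returns
# myfunc's return value, and interval controls the sampling rate in seconds.
mem_samples, result = memory_usage(
    (myfunc, (1000000,), {'factor': 3}),
    interval=0.05,
    retval=True,
)
print("Peak: {} MiB, result: {}".format(max(mem_samples), result))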
I mainly use Heapy because it's really easy to use.
Just type the following code where you want to test for memory usage.
from guppy import hpy
hp = hpy()
print hp.heap()
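If you only want to see what a specific block of code allocated, Heapy can also measure relative to a reference point; here is a sketch using setrelheap() (as I recall the guppy API, so verify against your version):

from guppy import hpy

hp = hpy()
hp.setrelheap()  # objects allocated before this point are excluded

data = [object() for _ in range(100000)]  # the code under investigation

print(hp.heap())  # reports only objects created since setrelheap()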
While trying to answer another question, it dawned on me that you can have code run any time in a thread when you theoretically should not have control. CPython has a settrace function for registering a tracing function in one's code. To test this idea from the use of a class, the following code was written. The problem is that tracing does not seem to occur, and no data is generated in the tracing log. What is causing the problem in the code shown below?
#! /usr/bin/env python3
import atexit
import collections
import pprint
import sys


def main():
    i = create_instance()
    print(len(i.data), flush=True)


def create_instance():
    instance = Tracer()
    return instance


class Tracer:

    def __init__(self):
        self.data = []
        sys.settrace(self.trace)
        atexit.register(pprint.pprint, self.data)
        atexit.register(sys.settrace, None)

    def trace(self, frame, event, arg):
        print('Tracing ...', flush=True)
        self.data.append(TraceRecord(frame, event, arg))
        return self.trace


TraceRecord = collections.namedtuple('TraceRecord', 'frame, event, arg')

if __name__ == '__main__':
    main()
Addendum:
The problem is not apparent when running Python 3.5 on Windows. However, tracing does not occur in Python 3.6, so the trace log is not printed. If someone can confirm the bug for me in a well-presented answer, there is a good chance the submission will be accepted and awarded the bounty.
I tried your program, and indeed as posted it does not have anything to trace. The built-in functions print() and len() do not generate trace events, presumably since they are defined by the platform and it is assumed that their internals work correctly and are not interesting.
The documentation states:
The trace function is invoked (with event set to 'call') whenever a new local scope is entered;
I modified your program to define a function and to call it. When I did this, then your trace function was called.
The function I defined:
def doSomething():
    print("Hello World")
My version of your main() function:
def main():
    i = create_instance()
    print(len(i.data))
    doSomething()
    print(len(i.data))
The output I see:
0
Tracing ...
Tracing ...
Hello World
Tracing ...
3
I'm trying to create a timeout function in Python 2.7.11 (on Windows) with the multiprocessing library.
My basic goal is to return one value if the function times out and the actual value if it doesn't timeout.
My approach is the following:
from multiprocessing import Process, Manager

def timeoutFunction(puzzleFileName, timeLimit):
    manager = Manager()
    returnVal = manager.list()

    # Create worker function
    def solveProblem(return_val):
        return_val[:] = doSomeWork(puzzleFileName)  # doSomeWork() returns list

    p = Process(target=solveProblem, args=[returnVal])
    p.start()
    p.join(timeLimit)
    if p.is_alive():
        p.terminate()
        returnVal = ['Timeout']

    return returnVal
And I call the function like this:
if __name__ == '__main__':
    print timeoutFunction('example.txt', 600)
Unfortunately this doesn't work and I receive some sort of EOF error in pickle.py
Can anyone see what I'm doing wrong?
Thanks in advance,
Alexander
Edit: doSomeWork() is not an actual function. Just a filler for some other work I do. That work is not done in parallel and does not use any shared variables. I'm only trying to run a single function and have it possibly timeout.
You can use the Pebble library for this.
from pebble import concurrent
from concurrent.futures import TimeoutError

TIMEOUT_IN_SECONDS = 10

@concurrent.process(timeout=TIMEOUT_IN_SECONDS)
def function(foo, bar=0):
    return foo + bar

future = function(1, bar=2)

try:
    result = future.result()  # blocks until results are ready or timeout expires
except TimeoutError as error:
    print "Function took longer than %d seconds" % error.args[1]
    result = 'timeout'
The documentation has more complete examples.
The library will terminate the function if it times out, so you don't need to worry about IO or CPU being wasted.
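If you have many independent calls to run with a per-call timeout (for example one per puzzle file), Pebble's pool interface may be more convenient. A sketch, assuming the ProcessPool.schedule API; the solve() helper and the file name are placeholders, not from the question:

from pebble import ProcessPool
from concurrent.futures import TimeoutError

def solve(puzzle_file):
    # placeholder for the real work done by doSomeWork()
    with open(puzzle_file) as f:
        return len(f.read())

with ProcessPool() as pool:
    future = pool.schedule(solve, args=['example.txt'], timeout=600)
    try:
        result = future.result()
    except TimeoutError:
        result = ['Timeout']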
EDIT:
If you're doing an assignment, you can still look at its implementation.
Short example:
from multiprocessing import Pipe, Process

def worker(pipe, function, args, kwargs):
    try:
        results = function(*args, **kwargs)
    except Exception as error:
        results = error
    pipe.send(results)

# Pipe(duplex=False) returns a (receive, send) connection pair
receiver, sender = Pipe(duplex=False)
process = Process(target=worker, args=(sender, function, args, kwargs))
process.start()

if receiver.poll(timeout=5):  # a result arrived within the timeout
    results = receiver.recv()
else:                         # no result in time: kill the worker
    process.terminate()
    process.join()
    results = 'timeout'
Pebble provides a neat API, takes care of corner cases and uses more robust mechanisms. Yet this is more or less what it does under the hood.
The problem seems to have been that the function solveProblem was defined inside my outer function. Python doesn't seem to like that. Once I moved it outside it worked fine.
I'll mark noxdafox's answer as the accepted answer, as implementing the Pebble solution is what led me to this fix.
Thanks all!
Specification of the problem:
I'm searching through a really large number of lines in a log file and distributing those lines into groups according to regular expressions (regexes) I have stored, using the re.match() function. Unfortunately some of my regexes are too complicated and Python sometimes gets itself into backtracking hell. Because of this I need to protect it with some kind of timeout.
Problems:
The re.match function I'm using is part of Python's standard library, and as I found out somewhere here on StackOverflow (I'm really sorry, I can't find the link now), it is very difficult to interrupt a thread while it is running a library function. For this reason, threads are out of the game.
Because evaluating re.match takes a relatively short time and I want to analyse a large number of lines with it, I need a timeout function that won't take too long to execute (this makes threads even less suitable, since it takes a really long time to initialise a new thread) and that can be set to less than one second.
For those reasons, answers here - Timeout on a function call
and here - Timeout function if it takes too long to finish with decorator (alarm - 1sec and more) are off the table.
I've spent this morning searching for solution to this question but I did not find any satisfactory answer.
Solution:
I've just modified a script posted here: Timeout function if it takes too long to finish.
And here is the code:
from functools import wraps
import errno
import os
import signal


class TimeoutError(Exception):
    pass


def timeout(seconds=10, error_message=os.strerror(errno.ETIME)):
    def decorator(func):
        def _handle_timeout(signum, frame):
            raise TimeoutError(error_message)

        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, _handle_timeout)
            signal.setitimer(signal.ITIMER_REAL, seconds)  # used timer instead of alarm
            try:
                result = func(*args, **kwargs)
            finally:
                signal.alarm(0)  # cancel the pending timer
            return result

        return wraps(func)(wrapper)
    return decorator
And then you can use it like this:
import time

from timeout import timeout, TimeoutError


@timeout(0.01)
def loop():
    while True:
        pass

try:
    begin = time.time()
    loop()
except TimeoutError, e:
    print "Time elapsed: {:.3f}s".format(time.time() - begin)
Which prints
Time elapsed: 0.010s
I'm looking to generate, from a large Python codebase, a summary of heap usage or memory allocations over the course of a function's run.
I'm familiar with heapy, and it's served me well for taking "snapshots" of the heap at particular points in my code, but I've found it difficult to generate a "memory-over-time" summary with it. I've also played with line_profiler, but that works with run time, not memory.
My fallback right now is Valgrind with massif, but that lacks a lot of the contextual Python information that both Heapy and line_profiler give. Is there some sort of combination of the latter two that give a sense of memory usage or heap growth over the execution span of a Python program?
I would use sys.settrace at program startup to register a custom trace function. The trace function will be called for each line of code, and you can use it to store information gathered by heapy or meliae in a file for later processing.
Here is a very simple example which logs the output of hpy.heap() each second to a plain text file:
import sys
import time
import atexit
from guppy import hpy

_last_log_time = time.time()
_logfile = open('logfile.txt', 'w')


def heapy_profile(frame, event, arg):
    global _last_log_time
    currtime = time.time()
    if currtime - _last_log_time < 1:
        return heapy_profile
    _last_log_time = currtime
    code = frame.f_code
    filename = code.co_filename
    lineno = code.co_firstlineno
    idset = hpy().heap()
    _logfile.write('%s %s:%s\n%s\n\n' % (currtime, filename, lineno, idset))
    _logfile.flush()
    return heapy_profile


atexit.register(_logfile.close)
sys.settrace(heapy_profile)
You might be interested in memory_profiler.
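A minimal sketch of how it can give a memory-over-time view (assuming the @profile decorator and the mprof command that ship with memory_profiler; check the project's README for your version):

# pip install memory_profiler
from memory_profiler import profile

@profile
def process_data():
    big = [0] * (10 ** 6)    # this allocation shows up in the line-by-line report
    small = [1] * (10 ** 4)
    del big
    return small

if __name__ == '__main__':
    process_data()

# Line-by-line report:   python -m memory_profiler this_script.py
# Memory-over-time plot: mprof run this_script.py && mprof plot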
I am trying to measure the time of raw_queries(...), unsuccessfully so far. I found that I should use the timeit module. The problem is that I can't (= I don't know how) pass the arguments to the function from the environment.
Important note: Before calling raw_queries, we have to execute phase2() (environment initialization).
Side note: The code is in Python 3.
def raw_queries(queries, nlp):
    """ Submit queries without getting visual response """
    for q in queries:
        nlp.query(q)


def evaluate_queries(queries, nlp):
    """ Measure the time that the queries need to return their results """
    t = Timer("raw_queries(queries, nlp)", "?????")
    print(t.timeit())


def phase2():
    """ Load dictionary to memory and subsequently submit queries """
    # prepare Linguistic Processor to submit it the queries
    all_files = get_files()
    b = LinguisticProcessor(all_files)
    b.loadDictionary()
    # load the queries
    queries_file = 'queries.txt'
    queries = load_queries(queries_file)


if __name__ == '__main__':
    phase2()
Thanks for any help.
UPDATE: We can call phase2() using the second argument of Timer. The problem is that we need the arguments (queries, nlp) from the environment.
UPDATE: The best solution so far, with unutbu's help (only what has changed):
def evaluate_queries():
    """ Measure the time that the queries need to return their results """
    t = Timer("main.raw_queries(queries, nlp)",
              "import main; (queries, nlp) = main.phase2()")
    sf = 'Execution time: {} ms'
    print(sf.format(t.timeit(number=1000)))


def phase2():
    ...
    return queries, b


def main():
    evaluate_queries()


if __name__ == '__main__':
    main()
First, never use the time module to time functions. It can easily lead to wrong conclusions. See timeit versus timing decorator for an example.
The easiest way to time a function call is to use IPython's %timeit command.
There, you simply start an interactive IPython session, call phase2(), define queries,
and then run
%timeit raw_queries(queries,nlp)
The second easiest way that I know to use timeit is to call it from the command-line:
python -mtimeit -s"import test; queries=test.phase2()" "test.raw_queries(queries)"
(In the command above, I assume the script is called test.py)
The idiom here is
python -mtimeit -s"SETUP_COMMANDS" "COMMAND_TO_BE_TIMED"
To be able to pass queries to the raw_queries function call, you have to define the queries variable. In the code you posted, queries is defined in phase2(), but only locally. So to set up queries as a global variable, you need to do something like have phase2 return queries:
def phase2():
    ...
    return queries
If you don't want to mess up phase2 this way, create a dummy function:
def phase3():
    # Do stuff like phase2() but return queries
    return queries
A custom timer function may be a solution:
import time

def timer(fun, *args):
    start = time.time()
    ret = fun(*args)
    end = time.time()
    return (ret, end - start)
Use it like this:
>>> from math import sin
>>> timer(sin, 0.5)
(0.47942553860420301, 6.9141387939453125e-06)
It means that sin returned 0.479... and it took 6.9e-6 seconds. Make sure your functions run long enough if you want to obtain reliable numbers (not like in the example above).
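For a very fast function like sin, one way to do that is to average over many repetitions; here is a small variation on the timer above (the default repetition count is just an illustrative choice):

import time

def timer_repeat(fun, args=(), number=100000):
    start = time.time()
    for _ in range(number):
        ret = fun(*args)
    end = time.time()
    # return the last result and the average time per call
    return ret, (end - start) / number

Called as timer_repeat(sin, (0.5,)), this gives a per-call figure that is much less noisy than timing a single call.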
Normally, you would use timeit.
Examples are here and here.
Also note that, by default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured.
Or you can write your own custom timer using the time module.
If you go with a custom timer, remember that you should use time.clock() on Windows and time.time() on other platforms (timeit chooses the right one internally).
import sys
import time

# choose the timer to use
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

start = default_timer()
# do something
finish = default_timer()
elapsed = (finish - start)
I'm not sure about this, I've never used it, but from what I've read it should be something like this:
....
t = Timer("raw_queries(queries, nlp)", "from __main__ import raw_queries")
print(t.timeit())
I took this from http://docs.python.org/library/timeit.html (if this helps).
You don't say so, but are you by any chance trying to make the code go faster? If so, I suggest you not focus in on a particular routine and try to time it. Even if you get a number, it won't really tell you what to fix. If you can pause the program under the IDE several times and examine its state, including the call stack, it will tell you what is taking the time and why. Here is a link that gives a brief explanation of how and why it works.*
*When you follow the link, you may have to go to the bottom of the previous page of answers. SO is having trouble following a link to an answer.