Check maximum memory usage of Python program under Windows - python

I would like to obtain, from within the Python program, the maximum physical memory used by a Python program (ActiveState Python 3.2, under Windows 7).
Ideally, every 0.1 sec or so, I'd like to have memory usage polled, and if it exceeds the maximum seen so far, the maximum value (a global variable stored somewhere), updated.
UPDATE:
I realize my question is a close duplicate of Which Python memory profiler is recommended?.
Sorry for being unclear. I have two specific problems that are not, to my knowledge, addressed by a regular memory profiler:
I need to see not only the memory allocated by Python, but the total memory used by the Python program (under Windows, this would include DLLs, etc.). In other words, under Windows, this is exactly what you'd see in the Task Manager.
I need to see the maximum memory rather than the memory at any given instant. I can't think of a way to do that other than to place numerous memory checks all around the code whenever I think I'm allocating something large.

I think you could go under Control Panel, Admin Tools, Performances, click on the '+', select 'Processor' under 'Performance Object', Pool Paged Bytes or Pool Unpaged Bytes under the left column and you select your process under the right column.
You can generate a log file with the Performance monitor.

I did not find any way to do precisely what I want with Python.
However, in playing with Windows 7 Task Manager, I found I can add a column "Peak Working Set (Memory)". So if I simply pause the python program before it can exit (and catch exceptions from main() to pause for the same reason), I can see the peak memory in the task manager.
I know it's stupid (e.g., it doesn't allow me to print it to a log file, do something based on how memory I'm using, etc.), but it's better than nothing.

Here's a link to a similar question:
Which Python memory profiler is recommended?
I've used Guppy/Heapy and Meliae. They're a little different, but I've found both are quite useful.

Related

Is using `multiprocessing` still the easiest way to use make a subprocess that releases all memory when finished?

This is a follow up to a stackoverflow answer from 2009
How can I explicitly free memory in Python?
Unfortunately (depending on your version and release of Python) some
types of objects use "free lists" which are a neat local optimization
but may cause memory fragmentation, specifically by making more and
more memory "earmarked" for only objects of a certain type and thereby
unavailable to the "general fund".
The only really reliable way to ensure that a large but temporary use
of memory DOES return all resources to the system when it's done, is
to have that use happen in a subprocess, which does the memory-hungry
work then terminates. Under such conditions, the operating system WILL
do its job, and gladly recycle all the resources the subprocess may
have gobbled up. Fortunately, the multiprocessing module makes this
kind of operation (which used to be rather a pain) not too bad in
modern versions of Python.
In your use case, it seems that the best way for the subprocesses to
accumulate some results and yet ensure those results are available to
the main process is to use semi-temporary files (by semi-temporary I
mean, NOT the kind of files that automatically go away when closed,
just ordinary files that you explicitly delete when you're all done
with them).
It's been 10 years since that answer, and I am wondering if there is a better way to create some sort of process/subprocess/function/method that releases all of it's memory when completed.
The motivation for this is an issue I am having, where a forloop creates a memory error, despite creating no new variables.
Repeated insertions into sqlite database via sqlalchemy causing memory leak?
It is insertion to a database. I know it's not the database itself that is causing the memory error because when I restart my runtime, the database is still preserved, but the crash doesn't happen until another several hundred iterations of the for loop.

Python 3 - Faster Print & I/O

I'm currently involved in a Python project that involves handling massive amounts of data. In this, I have to print massive amounts of data to files. They are always one-liners, but sometimes consisting of millions of digits.
The actual mathematical operations in Python only take seconds, minutes at most. Printing them to a file takes up to several hours; which I don't always have.
Is there any way of speeding up the I/O?
From what I figure, the number is stored in the RAM (Or at least I assume so, it's the only thing which would take up 11GB of RAM), but Python does not print it to a text file immediately. Is there a way to dump that information -- if it is the number -- to a file? I've tried Task Manager's Dump, which gave me a 22GB dump file (Yes, you read that right), and it doesn't look like there's what I was looking for in there, albeit it wasn't very clear.
If it makes a difference, I have Python 3.5.1 (Anaconda and Spyder), Windows 8.1 x64 and 16GB RAM.
By the way, I do run Garbage Collect (gc module) inside the script, and I delete variables that are not needed, so those 11GB aren't just junk.
If you are indeed I/O bound by the time it takes to write the file, multi-threading with a pool of threads may help. Of course, there is a limit to that, but at least, it would allow you to issue non-blocking file writes.
Multithreading could speed it up (have printers on other threads that you write to in memory that have a queue).
Maybe a system design standpoint, but maybe evaluate whether or not you need to write everything to the file. Perhaps consider creating various levels of logging so that a release mode could run faster (if that makes sense in your context).
Use HDF5 file format
The problem is, you have to write a lot of data.
HDF5 is format being very efficient in size and allowing access to it by various tools.
Be prepared for few challenges:
there are multiple python packages for HDF5, you will have to find the one which fits your needs
installation is not always very simple (but there might be Windows installation binary)
expect a bit of study to understand the data structures to be stored.
it will occasionally need some CPU cycles - typically you write a lot of data quickly and at one moment it has to be flushed to the disk. At this moment it starts compressing the data what can take few seconds. See GIL for IO bounded thread in C extension (HDF5)
Anyway, I think, it is very likely, you will manage and apart of faster writes to the files you will also gain smaller files, which are simpler to handle.

Release memory in python when removing item from a list [duplicate]

I have a few related questions regarding memory usage in the following example.
If I run in the interpreter,
foo = ['bar' for _ in xrange(10000000)]
the real memory used on my machine goes up to 80.9mb. I then,
del foo
real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is "planning ahead", thinking that you may use that much memory again?
Why does it release 50.5mb in particular - what is the amount that is released based on?
Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?
NOTE
This question is different from How can I explicitly free memory in Python?
because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).
I'm guessing the question you really care about here is:
Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?
No, there is not. But there is an easy workaround: child processes.
If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won't touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.
This isn't completely trivial and free, but it's pretty easy and cheap, which is usually good enough for the trade to be worthwhile.
First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
result = executor.submit(func, *args, **kwargs).result()
If you need a little more control, use the multiprocessing module.
The costs are:
Process startup is kind of slow on some platforms, notably Windows. We're talking milliseconds here, not minutes, and if you're spinning up one child to do 300 seconds' worth of work, you won't even notice it. But it's not free.
If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you're saving time in the long run, because that if that memory hung around forever it would have to lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
Sending large amounts of data between processes can be slow. Again, if you're talking about sending over 2K of arguments and getting back 64K of results, you won't even notice it, but if you're sending and receiving large amounts of data, you'll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).
Memory allocated on the heap can be subject to high-water marks. This is complicated by Python's internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes -- up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.
Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.
Try it like this, and tell me what you get. Here's the link for psutil.Process.memory_info.
import os
import gc
import psutil
proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss
# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss
# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss
# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss
pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)
Output:
Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%
Edit:
I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.
The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn't surprising if the heap shrinks by more -- even a lot more -- than the block that you free.
In 3.x range doesn't create a list, so the test above won't create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn't implement a freelist.
eryksun has answered question #1, and I've answered question #3 (the original #4), but now let's answer question #2:
Why does it release 50.5mb in particular - what is the amount that is released based on?
What it's based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.
First, depending on how you're measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as "freed", even though it hasn't been freed.
Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like linux), pages that are allocated but tagged MADV_FREE, etc.
If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you're asking about), and pages have really been deallocated, two circumstances in which this can happen: Either you've used brk or equivalent to shrink the data segment (very rare nowadays), or you've used munmap or similar to release a mapped segment. (There's also theoretically a minor variant to the latter, in that there are ways to release part of a mapped segment—e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)
But most programs don't directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There's no way your application can reasonably predict this, or even detect that it happened in advance.
CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don't even directly control when the top-level objects are deallocated.
So, when you release an object, how do you know whether it's going to release memory to the OS? Well, first you have to know that you've released the last reference (including any internal references you didn't know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it's allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you're releasing the PyString object, and the string buffer).
If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it's implemented. (It obviously can't happen unless you're deallocating the last thing in the block, and even then, it may not happen.)
If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it's implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)
If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it's implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)
So, if you want to understand why it happened to release exactly 50.5mb, you're going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You'd have to read your platform's malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn't usually a problem.) And then you have to do the same thing at the 3 levels above that.
So, the only useful answer to the question is "Because."
Unless you're doing resource-limited (e.g., embedded) development, you have no reason to care about these details.
And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).
First, you may want to install glances:
sudo apt-get install python-pip build-essential python-dev lm-sensors
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances
Then run it in the terminal!
glances
In your Python code, add at the begin of the file, the following:
import os
import gc # Garbage Collector
After using the "Big" variable (for example: myBigVar) for which, you would like to release memory, write in your python code the following:
del myBigVar
gc.collect()
In another terminal, run your python code and observe in the "glances" terminal, how the memory is managed in your system!
Good luck!
P.S. I assume you are working on a Debian or Ubuntu system

Quit Python program when it hits memory limit

I have a couple of Python/Numpy programs that tend to cause the PC to freeze/run very slowly when they use too much memory. I can't even stop the scripts or move the cursor anymore, when it uses to much memory (e.g. 3.8/4GB)
Therefore, I would like to quit the program automatically when it hits a critical limit of memory usage, e.g. 3GB.
I could not find a solution yet. Is there a Pythonic way to deal with this, since I run my scripts on Windows and Linux machines.
You could limit the process'es memory limit, but that is OS specific.
Another solution would be checking value of psutil.virtual_memory(), and exiting your program if it reaches some point.
Though OS-independent, the second solution is not Pythonic at all. Memory management is one of the things we have operating systems for.
I'd agree that in general you want to do this from within the operating system - only because there's a reliability factor in having "possibly runaway code check itself for possibly runaway behavior"
If a hard and fast requirement is to do this WITHIN the script, then I think we'd need to know more about what you're actually doing. If you have a single large data structure that's consuming the majority of the memory, you can use sys.getsizeof to identify how large that structure is, and throw/catch an error if it gets larger than you want.
But without knowing at least a little more about the program structure, I think it'll be hard to help...

Releasing memory in Python

I have a few related questions regarding memory usage in the following example.
If I run in the interpreter,
foo = ['bar' for _ in xrange(10000000)]
the real memory used on my machine goes up to 80.9mb. I then,
del foo
real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is "planning ahead", thinking that you may use that much memory again?
Why does it release 50.5mb in particular - what is the amount that is released based on?
Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?
NOTE
This question is different from How can I explicitly free memory in Python?
because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).
I'm guessing the question you really care about here is:
Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?
No, there is not. But there is an easy workaround: child processes.
If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won't touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.
This isn't completely trivial and free, but it's pretty easy and cheap, which is usually good enough for the trade to be worthwhile.
First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
result = executor.submit(func, *args, **kwargs).result()
If you need a little more control, use the multiprocessing module.
The costs are:
Process startup is kind of slow on some platforms, notably Windows. We're talking milliseconds here, not minutes, and if you're spinning up one child to do 300 seconds' worth of work, you won't even notice it. But it's not free.
If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you're saving time in the long run, because that if that memory hung around forever it would have to lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
Sending large amounts of data between processes can be slow. Again, if you're talking about sending over 2K of arguments and getting back 64K of results, you won't even notice it, but if you're sending and receiving large amounts of data, you'll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).
Memory allocated on the heap can be subject to high-water marks. This is complicated by Python's internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes -- up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.
Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.
Try it like this, and tell me what you get. Here's the link for psutil.Process.memory_info.
import os
import gc
import psutil
proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss
# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss
# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss
# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss
pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)
Output:
Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%
Edit:
I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.
The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn't surprising if the heap shrinks by more -- even a lot more -- than the block that you free.
In 3.x range doesn't create a list, so the test above won't create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn't implement a freelist.
eryksun has answered question #1, and I've answered question #3 (the original #4), but now let's answer question #2:
Why does it release 50.5mb in particular - what is the amount that is released based on?
What it's based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.
First, depending on how you're measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as "freed", even though it hasn't been freed.
Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like linux), pages that are allocated but tagged MADV_FREE, etc.
If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you're asking about), and pages have really been deallocated, two circumstances in which this can happen: Either you've used brk or equivalent to shrink the data segment (very rare nowadays), or you've used munmap or similar to release a mapped segment. (There's also theoretically a minor variant to the latter, in that there are ways to release part of a mapped segment—e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)
But most programs don't directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There's no way your application can reasonably predict this, or even detect that it happened in advance.
CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don't even directly control when the top-level objects are deallocated.
So, when you release an object, how do you know whether it's going to release memory to the OS? Well, first you have to know that you've released the last reference (including any internal references you didn't know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it's allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you're releasing the PyString object, and the string buffer).
If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it's implemented. (It obviously can't happen unless you're deallocating the last thing in the block, and even then, it may not happen.)
If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it's implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)
If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it's implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)
So, if you want to understand why it happened to release exactly 50.5mb, you're going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You'd have to read your platform's malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn't usually a problem.) And then you have to do the same thing at the 3 levels above that.
So, the only useful answer to the question is "Because."
Unless you're doing resource-limited (e.g., embedded) development, you have no reason to care about these details.
And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).
First, you may want to install glances:
sudo apt-get install python-pip build-essential python-dev lm-sensors
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances
Then run it in the terminal!
glances
In your Python code, add at the begin of the file, the following:
import os
import gc # Garbage Collector
After using the "Big" variable (for example: myBigVar) for which, you would like to release memory, write in your python code the following:
del myBigVar
gc.collect()
In another terminal, run your python code and observe in the "glances" terminal, how the memory is managed in your system!
Good luck!
P.S. I assume you are working on a Debian or Ubuntu system

Categories