Python savefig memory leak unresolved with formerly proposed solutions


EDIT: SOLVED
But I'm not sure how. I moved as much as I could to pylab instead of pyplot and implemented a multiprocessing approach described [here][1], and at that point it was working fine. Then, in an attempt to pinpoint the issue, I reverted my code step by step. But it kept working, even without the multiprocessing, even with pyplot, etc. Only when I take out fig.clf() does it stop working, which is what most people seem to experience, but that was not the issue for me initially. Or maybe it was; maybe the clf() call just wasn't in the right place. Thanks!
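The fix reported in this edit comes down to calling fig.clf() on the reused figure before drawing each new map. A minimal sketch of that pattern, assuming the non-interactive Agg backend and small placeholder arrays instead of the real 1800x3600 maps (the function save_maps and its fake data are hypothetical):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np


def save_maps(names, shape=(180, 360), outdir=None, dpi=100):
    """Save one image per name while reusing a single figure (hypothetical helper)."""
    outdir = outdir or tempfile.mkdtemp()
    fig = plt.figure()
    paths = []
    for i, name in enumerate(names):
        # Drop the axes and image artists from the previous iteration,
        # together with their references to the previous map array.
        fig.clf()
        pftmap = np.full(shape, np.nan)
        pftmap[::2, ::2] = i  # placeholder for pftmap[indlat, indlon] = data[i, :]
        ax = fig.add_subplot(3, 1, 1)
        ax.imshow(pftmap, interpolation="nearest", origin="upper", aspect="auto")
        path = os.path.join(outdir, name + "testredtarget.png")
        fig.savefig(path, dpi=dpi)
        paths.append(path)
    plt.close(fig)  # finally remove the figure from pyplot's registry
    return fig, paths
```

Because the figure is cleared rather than recreated, at most one map's worth of artists is alive at any time.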
EDIT: ISSUE STILL NOT SOLVED
That's very surprising. I have now moved my savefig() call into a function in an external module, which I import when running my script. Here is the function:
# myfile.py
import matplotlib.pyplot as plt

def savethefig(fig, outpath, luname):
    plt.savefig(outpath + luname + 'testredtarget.png', dpi=500)
So now I do something like this:
from numpy import zeros, nan
import matplotlib.pyplot as plt
import pylab
from myfile import savethefig

fig = plt.figure()
ax1 = fig.add_subplot(311)
pftmap = zeros(shape=(1800, 3600)) * nan
for i in range(len(pftspatialnames)):
    pftmap[indlat, indlon] = data[i, :]
    fig = pylab.gcf()
    a1 = ax1.imshow(pftmap, interpolation='nearest', origin='upper', aspect='auto')
    savethefig(fig, outpath, luname)
I went through everything step by step, line by line, and the RAM definitely goes up when hitting the savefig() call inside the external function: it grows by about 500 MB. Then, when control returns to the main script, that memory is not released.
Isn't everything a function allocates supposed to be cleared when it returns? I'm missing something...
ORIGINAL POST:
I'm using Python EPD 7.3-2 (32-bit) (Python 2.7.3).
I have a program that does some computations, then maps some of the results and saves the images with matplotlib.
That's a pretty large amount of data, and if I try to map and save too many images, I hit a memory limit, which I shouldn't, since I reuse the same figure and don't create new variables. I've struggled with this for a while: I tried all the figure clearing/deleting solutions, changed the backend used by matplotlib, etc., but nothing works. Every time the code hits the savefig function, it adds a lot of memory and never releases it afterwards.
I'm far from being an expert on memory (or on Python, for that matter), but here is one diagnostic attempt I ran, using the objgraph module:
from numpy import *
import gc
import objgraph
import pylab
import matplotlib
matplotlib.use('Agg')
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib import pyplot as plt
outpath = '/Users/yannick/Documents/outputs/'
pftspatialnames = ['forest','shrubs','grass']
# Computations (not shown)
fig = plt.figure()
for i in range(len(pftspatialnames)):
    pftmap = zeros(shape=(1800,3600))*nan
    pftmap[indlat,indlon] = data[i,:]
    fig = pylab.gcf()
    ax1 = plt.subplot2grid((3,1),(0,0))
    a1 = ax1.imshow(pftmap, interpolation='nearest', origin='upper', aspect='auto')
    del(pftmap)
    gc.collect()
    print('MEMORY LEAK DETECTOR before figsave')
    objgraph.show_growth()
    plt.savefig(outpath+pftspatialnames[i]+'testredtarget.png', dpi=500)
    print('MEMORY LEAK DETECTOR after figsave')
    objgraph.show_growth()
These are pretty big maps (and there are 3 of them, in 3 subplots; I only showed one here), but it handles them pretty well. It takes about 4 figures to saturate memory. Some steps may be superfluous (e.g. deleting pftmap), but I was trying everything to free some memory.
And here is the printed output:
MEMORY LEAK DETECTOR before figsave
dict 6640 +2931
weakref 3218 +1678
instance 1440 +1118
tuple 3501 +939
function 12486 +915
Bbox 229 +229
instancemethod 684 +171
Line2D 147 +147
TransformedBbox 115 +115
Path 127 +114
MEMORY LEAK DETECTOR after figsave
dict 7188 +548
Path 422 +295
weakref 3494 +276
instance 1679 +239
tuple 3703 +202
function 12670 +184
TransformedPath 87 +87
instancemethod 741 +57
Line2D 201 +54
FontProperties 147 +36
So before calling savefig there are a lot of new objects (that's normal; I do a bunch of stuff in the code before). But just calling savefig adds about 550 dicts, etc. Could that be the source of my problem? Note that this happens only the first time I call savefig. Every subsequent call gives the following:
MEMORY LEAK DETECTOR before figsave
MEMORY LEAK DETECTOR after figsave
tuple 3721 +6
dict 7206 +6
function 12679 +3
list 2001 +3
weakref 3503 +3
instance 1688 +3
Bbox 260 +3
But memory still keeps growing, and I soon hit the memory limit.
Thanks a lot!

Anecdotal evidence from Googling suggests that calling matplotlib.pyplot.close after savefig will reclaim/free the memory associated with the figure. See the pyplot docs for all the calling options.
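A minimal sketch of that advice, assuming the Agg backend and random placeholder data (save_and_close is a hypothetical helper). It also hints at why returning from an external function does not free anything: pyplot keeps its own reference to every figure it creates until the figure is closed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def save_and_close(data, path, dpi=100):
    fig, ax = plt.subplots()
    ax.imshow(data, interpolation="nearest")
    fig.savefig(path, dpi=dpi)
    # pyplot still holds a reference to fig at this point, even though the
    # function is about to return; close() removes that reference so the
    # figure can be garbage collected.
    plt.close(fig)


outdir = tempfile.mkdtemp()
for i in range(3):
    save_and_close(np.random.rand(50, 100), os.path.join(outdir, "map%d.png" % i))
print(plt.get_fignums())  # -> []
```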

I forget which Stack Overflow thread/website I found this solution on, but using the figure directly (without touching the pyplot state machine) usually does the trick. So:
from matplotlib import figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
fig = figure.Figure()
canvas = FigureCanvas(fig)
...
canvas.print_figure(outpath+pftspatialnames[i]+'testredtarget.png', dpi=500)
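A self-contained sketch of the same object-oriented approach, assuming random placeholder data and a temporary output directory (save_without_pyplot is a hypothetical helper). Since the figure is never registered with pyplot, there is no global registry keeping it alive after the function returns.

```python
import os
import tempfile

import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas


def save_without_pyplot(data, path, dpi=100):
    """Render `data` to `path` without going through pyplot."""
    fig = Figure()
    canvas = FigureCanvas(fig)  # attach an Agg canvas to the figure
    ax = fig.add_subplot(3, 1, 1)
    ax.imshow(data, interpolation="nearest", origin="upper", aspect="auto")
    canvas.print_figure(path, dpi=dpi)
    # fig and canvas are referenced only by local names here, so nothing
    # outside this function keeps them alive after it returns.
    return path


outdir = tempfile.mkdtemp()
for name in ["forest", "shrubs", "grass"]:
    save_without_pyplot(np.random.rand(180, 360),
                        os.path.join(outdir, name + "testredtarget.png"))
```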


Related

PyCharm - Auto Completion for matplotlib (and other imported modules)

I am using PyCharm 2016.1 and Python 2.7 on Windows 10 and imported the matplotlib module.
As the matplotlib module is very extensive and I am relatively new to Python, I hoped the auto-completion feature in PyCharm could help me get an overview of the existing properties/functions of an object. It would be more convenient than digging through the API documentation every time, not knowing what to look for and where to find it.
For example:
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
When I type ax. there is no auto-completion for the properties, functions, etc. of the axes; I only get the suggestions list.
I already tried this and imported the axis module directly with:
import matplotlib.axis as axis
or
from matplotlib.axis import Axis as axis
Smart auto-completion and 'Collect run-time types information' are already enabled.
Is there a way to enable auto-completion as described, or is there another IDE that supports that?

I believe your problem is highlighted here:
https://intellij-support.jetbrains.com/hc/en-us/community/posts/205816499-Improving-collecting-run-time-type-information-for-code-insight?sort_by=votes
TL;DR: return types can vary, so PyCharm can't figure them out statically.
The most accepted way is to use a type hint, since otherwise the type is only known at run time:
import matplotlib.pyplot as plt
import matplotlib.axes._axes as axes
fig = plt.figure(figsize=(5,10))
ax1 = fig.add_subplot(3,1,1) # type:axes.Axes
ax1.set_xlabel('Test') # <- now autocompletes
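On Python 3.6+ the same hint can be written as a variable annotation instead of a `# type:` comment, and the public `matplotlib.axes.Axes` can be imported instead of the private `_axes` module. A sketch, assuming PyCharm treats the annotation the same way it treats the type comment:

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig = plt.figure(figsize=(5, 10))
# The annotation tells the IDE which type add_subplot() returns here.
ax1: Axes = fig.add_subplot(3, 1, 1)
ax1.set_xlabel("Test")
```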
You can also try an assert isinstance:
import matplotlib.pyplot as plt
import matplotlib.axes._axes as axes
fig = plt.figure(figsize=(5,10))
ax1 = fig.add_subplot(3,1,1)
assert isinstance(ax1, axes.Axes)
ax1.set_xlabel('Test')
It won't offer the autocompletion if you put the assert after the method call you are looking for:
ax1.set_xlabel('Test')
assert isinstance(ax1, axes.Axes)
With this approach, you shouldn't let isinstance dictate the control flow of your code. If you try to call a method that doesn't exist on an object, it should crash; however, if a different object happens to have a method of the same name (!), the call silently succeeds, which is exactly what the annotation alone would not catch. So I like the assert better, since you want the code to crash early and in the correct place. YMMV.
From the doc:
Assertions should not be used to test for failure cases that can
occur because of bad user input or operating system/environment
failures, such as a file not being found. Instead, you should raise an
exception, or print an error message, or whatever is appropriate. One
important reason why assertions should only be used for self-tests of
the program is that assertions can be disabled at compile time.
If Python is started with the -O option, then assertions will be
stripped out and not evaluated. So if code uses assertions heavily,
but is performance-critical, then there is a system for turning them
off in release builds. (But don't do this unless it's really
necessary.
https://wiki.python.org/moin/UsingAssertionsEffectively
Alternatively, if you don't want to add to your code in this fashion and have IPython/Jupyter installed through Anaconda, you can get code completion from the console by right-clicking the code to be run and choosing "execute selection in console".

In addition to Paul's answer: if you are using fig, ax = plt.subplots(), you can add a figure type hint as well. See the example below:
from matplotlib import pyplot as plt
import matplotlib.axes._axes as axes
import matplotlib.figure as figure
fig, ax = plt.subplots() # type:figure.Figure, axes.Axes
# ax.  <- now autocompletes Axes attributes
# fig. <- now autocompletes Figure attributes
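With Python 3.6+ variable annotations, a tuple assignment cannot be annotated directly (`fig, ax: ... = ...` is a syntax error), so one option is to declare the names first and unpack afterwards. A sketch, again assuming PyCharm picks up the declared types:

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# Declare the types first; annotations are not allowed inside the unpacking.
fig: Figure
ax: Axes
fig, ax = plt.subplots()

ax.set_title("typed")
fig.set_size_inches(4, 3)
```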

How to manipulate figures while a script is running in Python?

Introduction
As I am coming from MATLAB, I am used to an interactive interface where a script can update figures while it is running. During processing, each figure can be resized or even closed. This probably means that each figure runs in its own thread, which is obviously not the case with matplotlib.
IPython can imitate the MATLAB behavior using the magic command %pylab or %matplotlib, which does something that I don't understand yet and which is the very point of my question.
My goal is then to allow standalone Python scripts to work as MATLAB does (or as IPython with %matplotlib does). In other words, I would like this script to be executed from the command line. I am expecting a new figure to pop up every 3 seconds. During execution I would like to be able to zoom, resize or even close the figure.
#!/usr/bin/python
import matplotlib.pyplot as plt
import time

def do_some_work():
    time.sleep(3)

for i in range(10):
    plt.plot([1,2,3,4])
    plt.show() # this is way too boilerplate, I'd like to avoid it too.
    do_some_work()
What alternative to %matplotlib can I use to manipulate figures while a script is running in Python (not IPython)?
What solutions have I already investigated?
So far I have found 3 ways to get a plot shown.
1. %pylab / %matplotlib
As tom said, the use of %pylab should be avoided to prevent the namespace from being polluted.
>>> %pylab
>>> plot([1,2,3,4])
This solution is sweet: the plot is non-blocking, there is no need for an additional show(), I can still add a grid with grid() afterwards, and I can close, resize or zoom on my figure with no additional issues.
Unfortunately the %matplotlib command is only available on IPython.
2. from pylab import * or import matplotlib.pyplot as plt
>>> from pylab import *
>>> plot([1,2,3,4])
Things are quite different here. I need to add the command show() to display my figure, and it is blocking. I cannot do anything but close the figure to execute the next command, such as grid(), which then has no effect since the figure is already closed...
3. from pylab import * or import matplotlib.pyplot as plt + ion()
Some suggestions recommend using the ion() command as follows:
>>> from pylab import *
>>> ion()
>>> plot([1,2,3,4])
>>> draw()
>>> pause(0.0001)
Unfortunately, even if the plot shows, I cannot close the figure manually. I need to execute close() in the terminal, which is not very convenient. Moreover, needing two additional commands such as draw(); pause(0.0001) is not what I am expecting.
Summary
With %pylab, everything is wonderful, but I cannot use it outside of IPython
With from pylab import * followed by a plot, I get a blocking behavior and all the power of IPython is wasted.
from pylab import * followed by ion() offers a nice alternative to the previous one, but I have to use the weird pause(0.0001) command, and it leads to a window that I cannot close manually (I know that the pause is not needed with some backends; I am using WxAgg, which is the only one that works well on Cygwin x64).
This question advises using matplotlib.interactive(True). Unfortunately it does not work and gives the same behavior as ion() does.

Change your do_some_work function to the following and it should work.
def do_some_work():
    plt.pause(3)
For interactive backends plt.pause(3) starts the event loop for 3 seconds so that it can process your resize events. Note that the documentation says that it is an experimental function and that for complex animations you should use the animation module.
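A minimal sketch of the question's script with that change applied, assuming an interactive backend is available. It also swaps the blocking plt.show() inside the loop for plt.ion(), since the answer notes elsewhere that interactive mode is what makes plotting non-blocking. The main() wrapper and its parameters are hypothetical, added so the delay and number of iterations can be changed; with a non-GUI backend such as Agg, plt.pause() essentially just waits.

```python
import matplotlib.pyplot as plt


def do_some_work(seconds=3.0):
    # Stand-in for a real computation. On an interactive backend,
    # plt.pause() redraws the active figure and runs the GUI event loop
    # for `seconds`, so the window can be resized or zoomed meanwhile.
    plt.pause(seconds)


def main(n_iterations=10, seconds=3.0):
    plt.ion()  # interactive mode: plotting calls do not block
    fig, ax = plt.subplots()
    for _ in range(n_iterations):
        ax.plot([1, 2, 3, 4])
        do_some_work(seconds)
    return ax
```

Calling main() from a script started on the command line should then update the window every few seconds; adding plt.ioff() and plt.show() at the end keeps the last figure open until it is closed by hand.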
The %pylab and %matplotlib magic commands also start an event loop, which is why user interaction with the plots is possible. Alternatively, you can start the event loop with %gui wx, and turn it off with %gui. You can use the IPython.lib.guisupport.is_event_loop_running_wx() function to test whether it is running.
The reason for using ion() or ioff() is very well explained in the 'What is interactive mode' page. In principle, user interaction is possible without IPython. However, I could not get the interactive-example from that page to work with the Qt4Agg backend, only with the MacOSX backend (on my Mac). I didn't try with the WX backend.
Edit
I did manage to get the interactive-example to work with the Qt4Agg backend by using PyQt4 instead of PySide (so by setting backend.qt4 : PyQt4 in my ~/.config/matplotlibrc file). I think the example doesn't work with all backends. I submitted an issue here.
Edit 2
I'm afraid I can't think of a way of manipulating the figure while a long calculation is running without using threads. As you mentioned, Matplotlib doesn't start a thread, and neither does IPython. The %pylab and %matplotlib commands alternate between processing commands from the read-eval-print loop and letting the GUI process events for a short time. They do this sequentially.
In fact, I'm unable to reproduce your behavior, even with the %matplotlib or %pylab magic. (Just to be clear: in ipython I first call %matplotlib and then %run yourscript.py). The %matplotlib magic puts Matplotlib in interactive-mode, which makes the plt.show() call non-blocking so that the do_some_work function is executed immediately. However, during the time.sleep(3) call, the figure is unresponsive (this becomes even more apparent if I increase the sleeping period). I don't understand how this can work at your end.
Unless I'm wrong you'll have to break up your calculation in smaller parts and use plt.pause (or even better, the animation module) to update the figures.

My advice would be to keep using IPython, since it manages the GUI event loop for you (that's what pylab/pyplot does).
I tried interactive plotting in a normal interpreter and it worked the way it is expected, even without calling ion() (Debian unstable, Python 3.4.3+, Matplotlib 1.4.2-3.1). If I recall it right, it's a fairly new feature in Matplotlib.
Alternatively, you can also use Matplotlib's animation capabilities to update a plot periodically:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import time

plt.ion()

tt = np.linspace(0, 1, 200)
freq = 1  # parameter for sine
t0 = time.time()  # for measuring elapsed time

fig, ax = plt.subplots()

def draw_func(i):
    """ This function gets called repeated times """
    global freq  # needed because freq is changed in this function
    xx = np.sin(2*np.pi*freq*tt)/freq
    ax.set_title("Passed Time: %.1f s, " % (time.time()-t0) +
                 "Parameter i=%d" % i)
    ax.plot(tt, xx, label="$f=%d$ Hz" % freq)
    ax.legend()
    freq += 1

# call draw_func every 3 seconds 1 + 4 times (first time is initialization):
ani = animation.FuncAnimation(fig, draw_func, np.arange(4), interval=3000,
                              repeat=False)
# plt.show()
Check out matplotlib.animation.FuncAnimation for details. You'll find further examples in the examples section.

Why does python keep crashing when plotting with matplotlib?

I am running a python code that simulates and predicts conditions inside a rapid compression machine (such as temperature, pressure, chemical species, etc.). The code is set up so that it calculates the time evolution of all these parameters (which are saved in numpy arrays) and then outputs them to a function, which creates and saves plots of these parameters against time.
These arrays are very large, and the function creates about 14 plots using matplotlib. The function will get to about the 7th plot (sometimes more, sometimes less), when I get an error reading "python.exe has stopped working". Not sure if it's a memory issue because the plots are so big or what's going on.
I have tried both plt.figure().clf() and plt.close() (as you'll see in my sample code). I've even tried doing a time.sleep in between plots. The following is an example snippet of my function that does the plotting (not the real code, just an example necessary to illustrate my problem).
I am using Windows, Python 2.7, and Matplotlib 1.4.2
import time
import matplotlib.pyplot as plt

# time_ is named so that it does not shadow the time module used for time.sleep
def graph_parameters(time_, a, b, c, d, e):  # a,b,c,d,e are my parameters
    delay = 10
    plt.figure(1)
    plt.figure(facecolor="white")
    plt.plot(time_, a)
    plt.ylabel('a')
    plt.xlabel('Time(s)')
    plt.savefig('fig1.png')
    plt.figure(1).clf()
    plt.close('all')
    time.sleep(delay)
    plt.figure(2)
    plt.figure(facecolor="white")
    plt.plot(time_, b)
    plt.ylabel('b')
    plt.xlabel('Time(s)')
    plt.savefig('fig2.png')
    plt.figure(2).clf()
    plt.close('all')
    time.sleep(delay)
    # etc.
Thanks in advance.

I'm not completely sure, but I think it was a memory issue. I found a way (from another stackoverflow entry) to delete variables I don't need. Per my example above, if I only want to keep 'time_' and parameters 'a_','b_','c_','d_' and 'e_', I run the following at the end of my main code:
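import gc
import sys
import numpy as np

graph_array = np.array([time_, a_, b_, c_, d_, e_])
# Delete variables that aren't going to be plotted in order to free up memory
attributes = sys.modules[__name__]
# Names to keep; plt and time are used inside graph_parameters, and keep
# itself must survive the loop that reads it.
keep = ('graph_array', 'attributes', 'np', 'gc', 'graph_parameters', 'plt', 'time', 'keep')
for name in dir():
    if name[0] != '_' and name not in keep:
        delattr(attributes, name)
gc.collect()
graph_parameters(*graph_array)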
Basically, the for loop keeps private and "magic" (dunder) names as well as the names of the specified variables, and deletes all other names. I got this idea from the answer to the following question: How do I clear all variables in the middle of a Python script?
I watched my task manager as the code ran, and there was a significant drop in memory usage (~1,600,000 K to ~100,000 K) by Python right before plots began being saved, indicating that this method did free up memory. Also, no crash.
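A less invasive way to get a similar effect, sketched under the assumption that the simulation can be wrapped in a function, is to let large intermediate arrays go out of scope when that function returns, and to close each figure right after saving it. The names run_simulation, save_parameter_plots and the synthetic data are hypothetical:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file output only, no GUI needed
import matplotlib.pyplot as plt
import numpy as np


def run_simulation(n=2000):
    """Hypothetical stand-in for the real simulation."""
    time_ = np.linspace(0.0, 0.05, n)
    # Large intermediate result needed only inside this function.
    workspace = np.random.rand(n, 500)
    offset = workspace.mean()
    params = {name: np.sin(2 * np.pi * (k + 1) * 20 * time_) + offset
              for k, name in enumerate("abcde")}
    # Only time_ and params are returned; nothing refers to workspace
    # after the return, so its memory can be reclaimed.
    return time_, params


def save_parameter_plots(time_, params, outdir):
    paths = []
    for k, (name, values) in enumerate(params.items(), start=1):
        fig, ax = plt.subplots(facecolor="white")
        ax.plot(time_, values)
        ax.set_ylabel(name)
        ax.set_xlabel("Time(s)")
        path = os.path.join(outdir, "fig%d.png" % k)
        fig.savefig(path)
        plt.close(fig)  # release each figure before creating the next one
        paths.append(path)
    return paths


time_, params = run_simulation()
paths = save_parameter_plots(time_, params, tempfile.mkdtemp())
```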

Visualizing data with matplotlib

In Matlab, you can use drawnow to view a calculation's result while it's in progress. I have tried a similar syntax in Python, both with matplotlib and mayavi.
I know it's possible to animate in one dimension with ion and set_data. However, animating in two dimensions (via imshow) is useful and I couldn't find an easy way to do that.
I know it's possible to animate using a function call but that's not as useful for algorithm development (since you can't use IPython's %run and query your program).
In matplotlib, I can use
from numpy import arange
from matplotlib.pyplot import imshow, draw

N = 16
first_image = arange(N*N).reshape(N,N)
myobj = imshow(first_image)
for i in arange(N*N):
    first_image.flat[i] = 0
    myobj.set_data(first_image)
    draw()
to animate an image, but this script doesn't respond to Ctrl-C: it hangs and disables future animations (on this machine). Despite this SO answer, different ways of invoking this animation process do not work. How do I view 2D data as it's being calculated?

EDIT: I have since made a package called python-drawnow to implement the following answer.
You just want to visualize the data from some complicated computation and not smoothly animate an image, correct? Then you can just define some simple functions:
import time
from matplotlib.pyplot import close, draw, ion

def drawnow(draw_fig, wait_secs=1):
    """
    draw_fig: (callable, no args by use of python's global scope) your
    function to draw the figure. it should include the figure() call --
    just like you'd normally do it. However, you must leave out the
    show().
    wait_secs : optional, how many seconds to wait. note that if this is 0
    and your computation is fast, you don't really see the plot update.
    does not work in ipy-qt. only works in the ipython shell.
    """
    close()
    draw_fig()
    draw()
    time.sleep(wait_secs)

def drawnow_init():
    ion()
An example of this:
from numpy import arange, linspace, meshgrid
from matplotlib.pyplot import figure, imshow

def draw_fig():
    figure()
    imshow(z, interpolation='nearest')
    #show()

N = 16
x = linspace(-1, 1, num=N)
x, y = meshgrid(x, x)
z = x**2 + y**2
drawnow_init()
for i in arange(2*N):
    z.flat[i] = 0
    drawnow(draw_fig)
Notice this requires that the variables you're plotting be global. This shouldn't be a problem since it seems that the variables you want to visualize are global.
This method responds fine to Ctrl-C and is visible even during fast computations (through wait_secs).
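For comparison, the question's original set_data loop can also be made to repaint by replacing draw() with plt.pause(), which redraws the figure and briefly runs the GUI event loop. A minimal sketch (zero_out_image is a hypothetical name; this is not part of the drawnow package):

```python
import numpy as np
import matplotlib.pyplot as plt


def zero_out_image(n=16, pause_secs=0.05):
    image = np.arange(n * n, dtype=float).reshape(n, n)
    fig, ax = plt.subplots()
    # Fix the colour limits explicitly: set_data() only replaces the pixels
    # and does not rescale the colour limits.
    im = ax.imshow(image, interpolation="nearest", vmin=0, vmax=n * n - 1)
    for i in range(n * n):
        image.flat[i] = 0
        im.set_data(image)
        # Redraw and let the GUI event loop run briefly instead of draw().
        plt.pause(pause_secs)
    return im
```

Unlike drawnow, this keeps one figure and updates its image in place rather than closing and recreating the figure on every step.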

python (matplotlib) __NSAutoreleaseNoPool error ... - just leaking

[I originally posted this in serverfault, but was advised there to post it here instead.]
Matplotlib is a python library for data visualization. When I attempt to display a graph on the screen, I get the following error/warnings:
2012-12-21 16:40:05.532 python[9705:903] *** __NSAutoreleaseNoPool(): Object 0x103e25d80 of class NSCFArray autoreleased with no pool in place - just leaking
2012-12-21 16:40:05.534 python[9705:903] *** __NSAutoreleaseNoPool(): Object 0x103e26820 of class __NSFastEnumerationEnumerator autoreleased with no pool in place - just leaking
2012-12-21 16:40:05.535 python[9705:903] *** __NSAutoreleaseNoPool(): Object 0x103e9f080 of class NSObject autoreleased with no pool in place - just leaking
FWIW, one way to produce these results is shown below; all the steps shown (including the call to ipython) are taken from a matplotlib tutorial:
% ipython
...
In [1]: import matplotlib.pyplot as plt
In [2]: plt.plot([1, 3, 2, 4])
Out[3]: [<matplotlib.lines.Line2D at 0x106aabd90>]
In [3]: plt.show()
Also, FWIW, I've observed exactly the same behavior with multiple styles of installation (on the same machine) of python+numpy+matplotlib+ipython, including installs that use the system-supplied Python, those that use the Python installed by Homebrew, and those that use a Python installed directly from source into a location under my home directory.
Any ideas of what may be going on, or what I could do about it?

I am having the same problem, one solution I found is to add the line:
plt.ion()
before the first plotting command. This turns on interactive plotting mode and the error messages go away. This has only worked for me when plotting on the command line: if I call ion() and then show() in a script, the plots don't show up at all, and if I leave out ion(), I can see my plots but I get the error messages. This has only happened since I updated to version 1.2.0.

It's trying to do something with Cocoa, but Cocoa hasn't really been initialized or anything. You may be able to silence the errors and fix the problems by running this before:
from Foundation import NSAutoreleasePool
pool = NSAutoreleasePool()
And this after:
from AppKit import NSApplication
NSApplication.sharedApplication().run()
This requires PyObjC. Unfortunately, this may only allow for displaying one plot per IPython session. You may wish to try the IPython notebook instead, which removes the dependency on Cocoa.
