How can I release memory after creating matplotlib figures - python

I have several matlpotlib functions rolled into some django-celery tasks.
Every time the tasks are called more RAM is dedicated to python. Before too long, python is taking up all of the RAM.
QUESTION: How can I release this memory?
UPDATE 2 - A Second Solution:
I asked a similar question specifically about the memory locked up when matplotlib errors, but I got a good answer to this question .clf(), .close(), and gc.collect() aren't needed if you use multiprocess to run the plotting function in a separate process whose memory will automatically be freed once the process ends.
Matplotlib errors result in a memory leak. How can I free up that memory?
UPDATE - The Solution:
These stackoverflow posts suggested that I can release the memory used by matplotlib objects with the following commands:
.clf(): Matplotlib runs out of memory when plotting in a loop
.close(): Python matplotlib: memory not being released when specifying figure size
import gc
gc.collect()
Here is the example I used to test the solution:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from pylab import import figure, savefig
import numpy as np
import gc
a = np.arange(1000000)
b = np.random.randn(1000000)
fig = plt.figure(num=1, dpi=100, facecolor='w', edgecolor='w')
fig.set_size_inches(10,7)
ax = fig.add_subplot(111)
ax.plot(a, b)
fig.clf()
plt.close()
del a, b
gc.collect()

Did you try to run you task function several times (in a for) to be sure that not your function is leaking no matter of celery?
Make sure that django.settings.DEBUG is set False( The connection object holds all queries in memmory when DEBUG=True).

import matplotlib.pyplot as plt
from datetime import datetime
import gc
class MyClass:
def plotmanytimesandsave(self):
plt.plot([1, 2, 3])
ro2 = datetime.now()
f =ro2.second
name =str(f)+".jpg"
plt.savefig(name)
plt.draw()
plt.clf()
plt.close("all")
for y in range(1, 10):
k = MyClass()
k.plotmanytimesandsave()
del k
k = "now our class object is a string"
print(k)
del k
gc.collect
with this program you will save directly as many times you want without the plt.show() command. And the memory consumption will be low.

Related

Matplotlib and python garbage collection in for loop

I need to export multiple plots with Python and Matplotlib but I have issues with the memory management.
I'm using Jupyter Notebook in Windows and the code is structured as shown in the working example here below. Basically, I'm generating a daily plot over a period of several years.
Following the memory usage I can see that usually for the first 4 "months" the memory usage is fairly stable but then it increases more than 15MB for each "day". With the complete code this leads to a memory error.
I tried to use plt.close() as suggested here. I also tried plt.close(fig), plt.close('all') and to manually delete some variables with del. Finally I've put gc.collect in the loop. Nothing seems to have any effect.
import numpy as np
import matplotlib.pyplot as plt
import gc
sz = 1000
i=0
for year in [2014,2015,2016,2017]:
for month in [1,2,3,4,5,9,10,11,12]:
for day in range(1,30):
# Plot
fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2,2,figsize=(14,10))
x1 = np.arange(1,sz+1,1)
y1 = np.random.random(sz)
ax1.scatter(x1,y1)
ax2.scatter(x1,y1)
ax3.scatter(x1,y1)
ax4.scatter(x1,y1)
filename = 'test/test_output{}{}{}.png'.format(year,month,day)
plt.tight_layout()
plt.savefig(filename, bbox_inches='tight')
#plt.close(fig)
#plt.close('all')
plt.close()
del x1, y1, ax1, ax2, ax3, ax4, fig
# counter
i=i+1
# Monthly operations
print(filename,i)
gc.collect()
# Triggered outside the loops
gc.collect()
On the other hand, if I interrupt the code (or I wait for the "Memory error" to occur) and then I run manually gc.collect, the memory is cleaned right away.
Why gc.collect does not have any effect in the nested loops?
Is there a way to improve the memory management for this code?

How to avoid running out of memory in Python plotting?

I need to plot a great bunch of different objects (~10^5 filled ellipses and similar shapes). What I do is add them one at a time using the command plt.gcf().gca().add_artist(e) and then use plt.show() at the end. This requires more memory than what I have.
Is there a way to plot them one at a time (that is, without adding them as I did above), and thus reduce the amount of memory I consume? I would be fine even with a solution that significantly increases the amount of time required for the plotting.
To draw a large quantity of similar objects you have to use one of the different matplotlib.collections classes — alas, their usage is a bit arcane, at least when it is my understanding that is involved...
Anyway, starting from the docs and this official example
I was able to put together the following code
$ cat ellipses.py
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import EllipseCollection
N = 10**5
# centres of ellipses — uniform distribution, -5<=x<5, -3<=y<3
xy = np.random.random((N,2))*np.array((5*2,3*2))-np.array((5,3))
# width, height of ellipses
w, h = np.random.random(N)/10, np.random.random(N)/10
# rotation angles, anticlockwise
a = np.random.random(N)*180-90
# we need an axes object for the correct scaling of the ellipses
fig, ax = plt.subplots()
# create the collection
ec = EllipseCollection(w, h, a,
units='x',
offsets=xy,
transOffset=ax.transData)
ax.add_collection(ec)
ax.autoscale(tight=True)
plt.savefig('el10^5.png')
I timed it on my almost low-end notebook
$ time python -c 'import numpy; import matplotlib.pyplot as p; f, a = p.subplots()'
real 0m0.697s
user 0m0.620s
sys 0m0.072s
$ time python ellipses.py
real 0m5.704s
user 0m5.616s
sys 0m0.080s
$
As you can see, when you discount the staging required for every plot, it takes about
5 seconds — and what is the result?
I think that the details about eccentricity and angle are lost in such a dense representation, but I don't know the specifics of your task and won't comment further.

Matplotlib multiprocessing fonts corruption using savefig

I have troubles with multiprocessing in Matplotlib since version 1.5. The fonts are randomly jumping around their original position. Example is here:
The simple example to reproduce this bug is here:
import multiprocessing
import matplotlib.pyplot as plt
fig = plt.figure()
def plot(i):
fig = plt.gcf()
plt.plot([],[])
fig.savefig('%d.png' % i)
plot(0)
pool = multiprocessing.Pool(4)
pool.map(plot, range(10))
if the order of multiprocessing and simple plotting is reversed
pool = multiprocessing.Pool(4)
plot(0)
pool.map(plot, range(10))
then it works, but this workaround is useless for my purpose.
Thank you.
I've recently run into this same problem while testing methods for parallel plotting large numbers of plots. While I haven't found a solution using the multiprocessing module, I've found that I do not see the same errors using the Parallel Python package (http://www.parallelpython.com/). It seems to be ~50% slower than the multiprocessing module in my early tests, but still a significant speedup over serial plotting. It's also a little finicky regarding module imports so I would ultimately prefer to find a solution using multiprocessing, but for now this is a passable workaround (for me at least). That said, I'm pretty new to parallel processing so there may be some nuances of the two approaches that I'm missing here.
###############################################################################
import os
import sys
import time
#import numpy as np
import numpy # Importing with 'as' doesn't work with Parallel Python
#import matplotlib.pyplot as plt
import matplotlib.pyplot # Importing with 'as' doesn't work with Parallel Python
import pp
import multiprocessing as mp
###############################################################################
path1='./Test_PP'
path2='./Test_MP'
nplots=100
###############################################################################
def plotrandom(plotid,N,path):
numpy.random.seed() # Required for multiprocessing module but not Parallel Python...
x=numpy.random.randn(N)
y=x**2
matplotlib.pyplot.scatter(x,y)
matplotlib.pyplot.savefig(os.path.join(path,'test_%d.png'%(plotid)),dpi=150)
matplotlib.pyplot.close('all')
############################################################################## #
# Parallel Python implementation
tstart_1=time.time()
if not os.path.exists(path1):
os.makedirs(path1)
ppservers = ()
if len(sys.argv) > 1:
ncpus = int(sys.argv[1])
job_server = pp.Server(ncpus, ppservers=ppservers)
else:
job_server = pp.Server(ppservers=ppservers)
print "Starting Parallel Python v2 with", job_server.get_ncpus(), "workers"
jobs = [(input_i, job_server.submit(plotrandom,(input_i,10,path1),(),("numpy","matplotlib.pyplot"))) for input_i in range(nplots)]
for input_i, job in jobs:
job()
tend_1=time.time()
t1=tend_1-tstart_1
print 'Parallel Python = %0.5f sec'%(t1)
job_server.print_stats()
############################################################################## #
# Multiprocessing implementation
tstart_2=time.time()
if not os.path.exists(path2):
os.makedirs(path2)
if len(sys.argv) > 1:
ncpus = int(sys.argv[1])
else:
ncpus = mp.cpu_count()
print "Starting multiprocessing v2 with %d workers"%(ncpus)
pool = mp.Pool(processes=ncpus)
jobs = [pool.apply_async(plotrandom, args=(i,10,path2)) for i in range(nplots)]
results = [r.get() for r in jobs] # This line actually runs the jobs
pool.close()
pool.join()
tend_2=time.time()
t2=tend_2-tstart_2
print 'Multiprocessing = %0.5f sec'%(t2)
###############################################################################
I have found a solution. The main cause of the troubles is the font caching in dictionary _fontd in /matplotlib/backends/backend_agg.py
Therefore, I have used a different hash for each process by adding multiprocessing.current_process().pid to hash called key in function _get_agg_font.
If anybody know more elegant solution which would not require modification of matplotlib files, just let me know.
Here is shown how I changed function _get_agg_font in backend_agg.py:
from multiprocessing import current_process
def _get_agg_font(self, prop):
"""
Get the font for text instance t, cacheing for efficiency
"""
if __debug__: verbose.report('RendererAgg._get_agg_font',
'debug-annoying')
key = hash(prop)
key += current_process().pid
font = RendererAgg._fontd.get(key)
if font is None:
fname = findfont(prop)
#font = RendererAgg._fontd.get(fname)
if font is None:
font = FT2Font(
fname,
hinting_factor=rcParams['text.hinting_factor'])
RendererAgg._fontd[fname] = font
RendererAgg._fontd[key] = font
font.clear()
size = prop.get_size_in_points()
font.set_size(size, self.dpi)
return font
The solution to this is to place your matplotlib imports inside the function you're passing to multiprocessing.
I had the same problem: Concurrent use of cached fonts created glitches in text plots (e.g. axes numbers). The problems appears to be font caching by an lru cache:
https://github.com/matplotlib/matplotlib/blob/7b6eb77731ff2b58c43c0d75a9cc038ada8d89cd/lib/matplotlib/font_manager.py#L1316
For me upgrading to python 3.7 solved it (which apparently supports cleaning state after forking).
Running the following in your workers might help as well:
import matplotlib
matplotlib.font_manager._get_font.cache_clear()

How to manipulate figures while a script is running in Python?

Introduction
As I am coming from matlab, I am used to an interactive interface where a script can update figures while it is running. During the processing each figure can be re-sized or even closed. This probably means that each figure is running in its own thread which is obviously not the case with matplotlib.
IPython can imitate the Matlab behavior using the magic command %pylab or %matplotlib which does something that I don't understand yet and which is the very point of my question.
My goal is then to allow standalone Python scripts to work as Matlab does (or as IPython with %matplotlib does). In other words, I would like this script to be executed from the command line. I am expecting a new figure that pop-up every 3 seconds. During the execution I would be able to zoom, resize or even close the figure.
#!/usr/bin/python
import matplotlib.pyplot as plt
import time
def do_some_work():
time.sleep(3)
for i in range(10):
plt.plot([1,2,3,4])
plt.show() # this is way too boilerplate, I'd like to avoid it too.
do_some_work()
What alternative to %matplotlib I can use to manipulate figures while a script is running in Python (not IPython)?
What solutions I've already investigated?
I currently found 3 way to get a plot show.
1. %pylab / %matplotlib
As tom said, the use of %pylab should be avoided to prevent the namespace to be polluted.
>>> %pylab
>>> plot([1,2,3,4])
This solution is sweet, the plot is non-blocking, there is no need for an additionnal show(), I can still add a grid with grid() afterwards and I can close, resize or zoom on my figure with no additional issues.
Unfortunately the %matplotlib command is only available on IPython.
2. from pylab import * or from matplotlib.pyplot import plt
>>> from pylab import *
>>> plot([1,2,3,4])
Things are quite different here. I need to add the command show() to display my figure which is blocking. I cannot do anything but closing the figure to execute the next command such as grid() which will have no effect since the figure is now closed...
** 3. from pylab import * or from matplotlib.pyplot import plt + ion()**
Some suggestions recommend to use the ion() command as follow:
>>> from pylab import *
>>> ion()
>>> plot([1,2,3,4])
>>> draw()
>>> pause(0.0001)
Unfortunately, even if the plot shows, I cannot close the figure manually. I will need to execute close() on the terminal which is not very convenient. Moreover the need for two additional commands such as draw(); pause(0.0001) is not what I am expecting.
Summary
With %pylab, everything is wonderful, but I cannot use it outside of IPython
With from pylab import * followed by a plot, I get a blocking behavior and all the power of IPython is wasted.
from pylab import * followed by ion offers a nice alternative to the previous one, but I have to use the weird pause(0.0001) command that leads to a window that I cannot close manually (I know that the pause is not needed with some backends. I am using WxAgg which is the only one that works well on Cygwin x64.
This question advices to use matplotlib.interactive(True). Unfortunately it does not work and gives the same behavior as ion() does.
Change your do_some_work function to the following and it should work.
def do_some_work():
plt.pause(3)
For interactive backends plt.pause(3) starts the event loop for 3 seconds so that it can process your resize events. Note that the documentation says that it is an experimental function and that for complex animations you should use the animation module.
The, %pylab and %matplotlib magic commands also start an event loop, which is why user interaction with the plots is possible. Alternatively, you can start the event loop with %gui wx, and turn it off with %gui. You can use the IPython.lib.guisupport.is_event_loop_running_wx() function to test if it is running.
The reason for using ion() or ioff() is very well explained in the 'What is interactive mode' page. In principle, user interaction is possible without IPython. However, I could not get the interactive-example from that page to work with the Qt4Agg backend, only with the MacOSX backend (on my Mac). I didn't try with the WX backend.
Edit
I did manage to get the interactive-example to work with the Qt4Agg backend by using PyQt4 instead of PySide (so by setting backend.qt4 : PyQt4 in my ~/.config/matplotlibrc file). I think the example doesn't work with all backends. I submitted an issue here.
Edit 2
I'm afraid I can't think of a way of manipulating the figure while a long calculation is running, without using threads. As you mentioned: Matplotlib doesn't start a thread, and neither does IPython. The %pylab and %matplotlib commands alternate between processing commands from the read-eval-print loop and letting the GUI processing events for a short time. They do this sequentially.
In fact, I'm unable to reproduce your behavior, even with the %matplotlib or %pylab magic. (Just to be clear: in ipython I first call %matplotlib and then %run yourscript.py). The %matplotlib magic puts Matplotlib in interactive-mode, which makes the plt.show() call non-blocking so that the do_some_work function is executed immediately. However, during the time.sleep(3) call, the figure is unresponsive (this becomes even more apparent if I increase the sleeping period). I don't understand how this can work at your end.
Unless I'm wrong you'll have to break up your calculation in smaller parts and use plt.pause (or even better, the animation module) to update the figures.
My advice would be to keep using IPython, since it manages the GUI event loop for you (that's what pylab/pylot does).
I tried interactive plotting in a normal interpreter and it worked the way it is expected, even without calling ion() (Debian unstable, Python 3.4.3+, Matplotlib 1.4.2-3.1). If I recall it right, it's a fairly new feature in Matplotlib.
Alternatively, you can also use Matplotlib's animation capabilities to update a plot periodically:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import time
plt.ion()
tt = np.linspace(0, 1, 200)
freq = 1 # parameter for sine
t0 = time.time() # for measuring ellapsed time
fig, ax = plt.subplots()
def draw_func(i):
""" This function gets called repeated times """
global freq # needed because freq is changed in this function
xx = np.sin(2*np.pi*freq*tt)/freq
ax.set_title("Passed Time: %.1f s, " % (time.time()-t0) +
"Parameter i=%d" % i)
ax.plot(tt, xx, label="$f=%d$ Hz" % freq)
ax.legend()
freq += 1
# call draw_func every 3 seconds 1 + 4 times (first time is initialization):
ani = animation.FuncAnimation(fig, draw_func, np.arange(4), interval=3000,
repeat=False)
# plt.show()
Checkout matplotlib.animation.FuncAnimation for details. You'll find further examples in the examples section.

How to update matplotlib plot in while loop? [duplicate]

what I am trying to do is having a script compute something, prepare a plot and show the already obtained results as a pylab.figure - in python 2 (specifically python 2.7) with a stable matplotlib (which is 1.1.1).
In python 3 (python 3.2.3 with a matplotlib git build ... version 1.2.x), this works fine. As a simple example (simulating a lengthy computation by time.sleep()) consider
import pylab
import time
import random
dat=[0,1]
pylab.plot(dat)
pylab.ion()
pylab.draw()
for i in range (18):
dat.append(random.uniform(0,1))
pylab.plot(dat)
pylab.draw()
time.sleep(1)
In python 2 (version 2.7.3 vith matplotlib 1.1.1), the code runs cleanly without errors but does not show the figure. A little trial and error with the python2 interpreter seemed to suggest to replace pylab.draw() with pylab.show(); doing this once is apparently sufficient (not, as with draw calling it after every change/addition to the plot). Hence:
import pylab
import time
import random
dat=[0,1]
pylab.plot(dat)
pylab.ion()
pylab.show()
for i in range (18):
dat.append(random.uniform(0,1))
pylab.plot(dat)
#pylab.draw()
time.sleep(1)
However, this doesn't work either. Again, it runs cleanly but does not show the figure. It seems to do so only when waiting for user input. It is not clear to me why this is, but the plot is finally shown when a raw_input() is added to the loop
import pylab
import time
import random
dat=[0,1]
pylab.plot(dat)
pylab.ion()
pylab.show()
for i in range (18):
dat.append(random.uniform(0,1))
pylab.plot(dat)
#pylab.draw()
time.sleep(1)
raw_input()
With this, the script will of course wait for user input while showing the plot and will not continue computing the data before the user hits enter. This was, of course, not the intention.
This may be caused by different versions of matplotlib (1.1.1 and 1.2.x) or by different python versions (2.7.3 and 3.2.3).
Is there any way to accomplish with python 2 with a stable (1.1.1) matplotlib what the above script (the first one) does in python 3, matplotlib 1.2.x:
- computing data (which takes a while, in the above example simulated by time.sleep()) in a loop or iterated function and
- (while still computing) showing what has already been computed in previous iterations
- and not bothering the user to continually hit enter for the computation to continue
Thanks; I'd appreciate any help...
You want the pause function to give the gui framework a chance to re-draw the screen:
import pylab
import time
import random
import matplotlib.pyplot as plt
dat=[0,1]
fig = plt.figure()
ax = fig.add_subplot(111)
Ln, = ax.plot(dat)
ax.set_xlim([0,20])
plt.ion()
plt.show()
for i in range (18):
dat.append(random.uniform(0,1))
Ln.set_ydata(dat)
Ln.set_xdata(range(len(dat)))
plt.pause(1)
print 'done with loop'
You don't need to create a new Line2D object every pass through, you can just update the data in the existing one.
Documentation:
pause(interval)
Pause for *interval* seconds.
If there is an active figure it will be updated and displayed,
and the gui event loop will run during the pause.
If there is no active figure, or if a non-interactive backend
is in use, this executes time.sleep(interval).
This can be used for crude animation. For more complex
animation, see :mod:`matplotlib.animation`.
This function is experimental; its behavior may be changed
or extended in a future release.
A really over-kill method to is to use the matplotlib.animate module. On the flip side, it gives you a nice way to save the data if you want (ripped from my answer to Python- 1 second plots continous presentation).
example, api, tutorial
Some backends (in my experience "Qt4Agg") require the pause function, as #tcaswell suggested.
Other backends (in my experience "TkAgg") seem to just update on draw() without requiring a pause. So another solution is to switch your backend, for example with matplotlib.use('TkAgg').

Categories