Matplotlib and python garbage collection in for loop

I need to export multiple plots with Python and Matplotlib, but I'm having issues with memory management.
I'm using Jupyter Notebook on Windows and the code is structured as in the working example below. Basically, I'm generating a daily plot over a period of several years.
Watching the memory usage, I can see that it is usually fairly stable for the first 4 "months", but then it increases by more than 15 MB for each "day". With the complete code this leads to a memory error.
I tried to use plt.close() as suggested here. I also tried plt.close(fig), plt.close('all'), and manually deleting some variables with del. Finally, I put gc.collect() in the loop. Nothing seems to have any effect.
import numpy as np
import matplotlib.pyplot as plt
import gc

sz = 1000
i = 0
for year in [2014, 2015, 2016, 2017]:
    for month in [1, 2, 3, 4, 5, 9, 10, 11, 12]:
        for day in range(1, 30):
            # Plot
            fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))
            x1 = np.arange(1, sz+1, 1)
            y1 = np.random.random(sz)
            ax1.scatter(x1, y1)
            ax2.scatter(x1, y1)
            ax3.scatter(x1, y1)
            ax4.scatter(x1, y1)
            filename = 'test/test_output{}{}{}.png'.format(year, month, day)
            plt.tight_layout()
            plt.savefig(filename, bbox_inches='tight')
            #plt.close(fig)
            #plt.close('all')
            plt.close()
            del x1, y1, ax1, ax2, ax3, ax4, fig
            # counter
            i = i + 1
        # Monthly operations
        print(filename, i)
        gc.collect()
# Triggered outside the loops
gc.collect()
On the other hand, if I interrupt the code (or wait for the "Memory error" to occur) and then run gc.collect() manually, the memory is cleaned up right away.
Why does gc.collect() not have any effect inside the nested loops?
Is there a way to improve the memory management for this code?
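For reference, a workaround that is often suggested for this kind of loop (not taken from any answer in this thread) is to create the figure once and reuse it, clearing the axes on every iteration instead of building a new Figure object each day. A minimal sketch, using the same random data as above:

import numpy as np
import matplotlib.pyplot as plt

sz = 1000
x1 = np.arange(1, sz + 1, 1)

# Create the figure once, outside the loops, and reuse it for every "day".
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

for year in [2014, 2015, 2016, 2017]:
    for month in [1, 2, 3, 4, 5, 9, 10, 11, 12]:
        for day in range(1, 30):
            y1 = np.random.random(sz)
            for ax in (ax1, ax2, ax3, ax4):
                ax.cla()             # clear the axes instead of rebuilding the figure
                ax.scatter(x1, y1)
            fig.tight_layout()
            fig.savefig('test/test_output{}{}{}.png'.format(year, month, day),
                        bbox_inches='tight')

plt.close(fig)

Because only a single Figure ever exists, this usually avoids the per-day growth that comes from accumulating figure objects.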

Related

plt.show() behaving weirdly with multiprocessing and jupyter notebook

So I have a module that is supposed to read stuff from a couple of csv files, do some math and create plots out of it. I call the function that does this from inside a jupyter notebook.
The structure is roughly like this
Module:
import matplotlib.pyplot as plt
from multiprocessing import Pool, cpu_count

def fun(file):
    data = math(file)   # math() stands for the actual number crunching
    plt.plot(data)
    plt.show()

def multi_fun(files):
    with Pool(cpu_count(), maxtasksperchild=1) as pool:
        m = pool.imap_unordered(fun, files)
        for res in m:
            pass
And in the notebook: multi_fun([file1, file2])
Done like this, the plots don't pop up in the notebook. However, if I change it to
def fun(file):
    data = math(file)
    plt.plot(data)
    print("")
    plt.show()
the plots do appear.
Adding the print statement to multi_fun or to the notebook does not work. Adding print('', end='') also does not show the plots; however, print(u'\u200c', end='') does.
I don't really need this to work (it was more for debugging), but I'm super curious how this happens. Is there some magic going on with print forcing some joining of the threads?

When I plot something in Python, the program's execution stops until I close the plot figure

This is my code; after calculating some values I want it to draw them at each step:
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
FilePatch = 'E:\\# Civil Engineering Undergraduate\\Projects\\Python\\Frame'
NodesFile = FilePatch + '\\nodes.xlsx'
MemsFile = FilePatch + '\\members.xlsx'
MatsFile = FilePatch + '\\sections.xlsx'

nodes = pd.read_excel(NodesFile)
mems = pd.read_excel(MemsFile)
mats = pd.read_excel(MatsFile)

nodes = np.array(nodes)
mems = np.array(mems)
mats = np.array(mats)

# np.nan_to_num returns a new array, so the result has to be assigned back
nodes = np.nan_to_num(nodes)
mems = np.nan_to_num(mems)
mats = np.nan_to_num(mats)

Segments = 100
Scale = 1
n = np.size(nodes[:, 0])
m = np.size(mems[:, 0])
UsedEIA = np.zeros((m, 3))
.
.
.
But the problem is that when it calls plt.plot(...) for the first time, it stops execution and won't go on unless I close the figure!
Is there any solution for this issue?
.
.
.
for i in range(1, 1+n):
    dx = Scale*D[3*i-3, 0]
    dy = Scale*D[3*i-2, 0]
    xn = nodes[nodes[:, 0]==i, 1] + dx
    yn = nodes[nodes[:, 0]==i, 2] + dy
    plt.text(xn, yn, str(i))
    s = np.sum(nodes[nodes[:, 0]==i, 3:5])
    if nodes[nodes[:, 0]==i, 5] == 1:
        plt.scatter(xn, yn, c='r', marker='s')
    elif nodes[nodes[:, 0]==i, 3] == 1 or nodes[nodes[:, 0]==i, 4] == 1:
        plt.scatter(xn, yn, c='g', marker='^')
    plt.axis('equal')
    plt.show()
    time.sleep(0.1)
Also, I want to add some text to my plot, but it gives me an error that I can't understand.
Here it is:
p = mems[i, 4]
px = mems[i, 3]
dl = mems[i, 5]*L
w = mems[i, 6]
xtxt = (FrameShape[0, 0] + FrameShape[0:])/2
ytxt = (FrameShape[1, 0] + FrameShape[1:])/2
xtxtp = FrameShape[0, 0]
xtxtpx = FrameShape[0, 0] + abs(px)/(1 + abs(p))
xtxtw = FrameShape[0, 0] + abs(p)/(1 + abs(p)) + abs(px)/(1 + abs(px))
if p != 0 or px != 0:
    btxt = ' Py=' + str(p) + ' , Px=', str(px) + ' #' + str(dl)
    plt.text(xtxtp, ytxt-0.5, btxt)
XY = np.array([X, Shape])
FrameShape = np.transpose(T[0:2, 0:2])#XY
FrameShape[0, :] = FrameShape[0, :] + xi
FrameShape[1, :] = FrameShape[1, :] + yi
if w != 0:
    atxt = 'UL=' + str(w)
    plt.text(xtxtw, ytxt+0.5, atxt)
This is the error it gives me in the console:
TypeError: only size-1 arrays can be converted to Python scalars
plt.show() blocks the execution of your code. To avoid that, you could replace that line with plt.show(block=False). Your application will then run, but, as described in this post, your plots will likely not show up during execution.
So instead, try replacing plt.show() by
plt.show(block=False)
plt.pause(0.001)
in order to see the plots during runtime.
Finally, add a plt.show() at the very end of your program to keep the plots open; otherwise every figure will be closed upon program termination.
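A minimal, self-contained sketch of that pattern (with made-up data, purely to illustrate the calls):

import matplotlib.pyplot as plt

for step in range(5):
    # ... one step of the calculation would go here ...
    plt.scatter(step, step ** 2)
    plt.show(block=False)   # returns immediately instead of blocking
    plt.pause(0.001)        # lets the GUI event loop actually draw the figure

plt.show()  # keep the figure open once the script has finished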

How to manipulate figures while a script is running in Python?

Introduction
As I am coming from Matlab, I am used to an interactive interface where a script can update figures while it is running. During the processing each figure can be resized or even closed. This probably means that each figure is running in its own thread, which is obviously not the case with matplotlib.
IPython can imitate the Matlab behavior using the magic command %pylab or %matplotlib, which does something that I don't understand yet and which is the very point of my question.
My goal is to allow standalone Python scripts to work as Matlab does (or as IPython with %matplotlib does). In other words, I would like this script to be executed from the command line. I am expecting a new figure to pop up every 3 seconds. During the execution I would be able to zoom, resize or even close the figure.
#!/usr/bin/python
import matplotlib.pyplot as plt
import time

def do_some_work():
    time.sleep(3)

for i in range(10):
    plt.plot([1, 2, 3, 4])
    plt.show()  # this is way too boilerplate, I'd like to avoid it too.
    do_some_work()
What alternative to %matplotlib can I use to manipulate figures while a script is running in Python (not IPython)?
Which solutions have I already investigated?
I have currently found three ways to get a plot to show.
1. %pylab / %matplotlib
As tom said, the use of %pylab should be avoided to prevent the namespace from being polluted.
>>> %pylab
>>> plot([1,2,3,4])
This solution is sweet: the plot is non-blocking, there is no need for an additional show(), I can still add a grid with grid() afterwards, and I can close, resize or zoom on my figure with no additional issues.
Unfortunately the %matplotlib command is only available in IPython.
2. from pylab import * or import matplotlib.pyplot as plt
>>> from pylab import *
>>> plot([1,2,3,4])
Things are quite different here. I need to add the command show() to display my figure, and it is blocking. I cannot do anything but close the figure to execute the next command, such as grid(), which then has no effect since the figure is already closed...
3. from pylab import * or import matplotlib.pyplot as plt + ion()
Some suggestions recommend using the ion() command as follows:
>>> from pylab import *
>>> ion()
>>> plot([1,2,3,4])
>>> draw()
>>> pause(0.0001)
Unfortunately, even if the plot shows, I cannot close the figure manually. I have to execute close() in the terminal, which is not very convenient. Moreover, the need for two additional commands such as draw(); pause(0.0001) is not what I am expecting.
Summary
With %pylab, everything is wonderful, but I cannot use it outside of IPython.
With from pylab import * followed by a plot, I get blocking behavior and all the power of IPython is wasted.
from pylab import * followed by ion() offers a nice alternative to the previous one, but I have to use the weird pause(0.0001) command, which leads to a window that I cannot close manually (I know that the pause is not needed with some backends; I am using WxAgg, which is the only one that works well on Cygwin x64).
This question advises using matplotlib.interactive(True). Unfortunately it does not work and gives the same behavior as ion() does.
Change your do_some_work function to the following and it should work.
def do_some_work():
    plt.pause(3)
For interactive backends plt.pause(3) starts the event loop for 3 seconds so that it can process your resize events. Note that the documentation says that it is an experimental function and that for complex animations you should use the animation module.
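A standalone-script variant of that idea might look like this (my own sketch, not code from the answer; it assumes an interactive backend and uses plt.show(block=False) so the script keeps running):

#!/usr/bin/python
import matplotlib.pyplot as plt

def do_some_work():
    plt.pause(3)   # run the GUI event loop for 3 s, so resize/zoom/close keep working

for i in range(10):
    plt.plot([1, 2, 3, 4])
    plt.show(block=False)   # show the figure without blocking the script
    do_some_work()

plt.show()  # keep all figures open at the end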
The %pylab and %matplotlib magic commands also start an event loop, which is why user interaction with the plots is possible. Alternatively, you can start the event loop with %gui wx, and turn it off with %gui. You can use the IPython.lib.guisupport.is_event_loop_running_wx() function to test whether it is running.
The reason for using ion() or ioff() is very well explained in the 'What is interactive mode' page. In principle, user interaction is possible without IPython. However, I could not get the interactive-example from that page to work with the Qt4Agg backend, only with the MacOSX backend (on my Mac). I didn't try with the WX backend.
Edit
I did manage to get the interactive-example to work with the Qt4Agg backend by using PyQt4 instead of PySide (so by setting backend.qt4 : PyQt4 in my ~/.config/matplotlibrc file). I think the example doesn't work with all backends. I submitted an issue here.
Edit 2
I'm afraid I can't think of a way of manipulating the figure while a long calculation is running, without using threads. As you mentioned: Matplotlib doesn't start a thread, and neither does IPython. The %pylab and %matplotlib commands alternate between processing commands from the read-eval-print loop and letting the GUI processing events for a short time. They do this sequentially.
In fact, I'm unable to reproduce your behavior, even with the %matplotlib or %pylab magic. (Just to be clear: in IPython I first call %matplotlib and then %run yourscript.py.) The %matplotlib magic puts Matplotlib in interactive mode, which makes the plt.show() call non-blocking, so the do_some_work function is executed immediately. However, during the time.sleep(3) call the figure is unresponsive (this becomes even more apparent if I increase the sleeping period). I don't understand how this can work at your end.
Unless I'm wrong, you'll have to break up your calculation into smaller parts and use plt.pause (or, even better, the animation module) to update the figures.
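As a rough illustration of that suggestion (a sketch with a made-up calculation, not code from the question):

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
line, = ax.plot([], [])
xs, ys = [], []

for chunk in range(200):
    # ... one small piece of the long calculation ...
    xs.append(chunk)
    ys.append(np.sin(chunk / 10.0))
    line.set_data(xs, ys)
    ax.relim()
    ax.autoscale_view()
    plt.pause(0.05)   # redraw the figure and keep it responsive between chunks

plt.show()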
My advice would be to keep using IPython, since it manages the GUI event loop for you (that's what pylab/pyplot does).
I tried interactive plotting in a normal interpreter and it worked as expected, even without calling ion() (Debian unstable, Python 3.4.3+, Matplotlib 1.4.2-3.1). If I recall correctly, it's a fairly new feature in Matplotlib.
Alternatively, you can also use Matplotlib's animation capabilities to update a plot periodically:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import time

plt.ion()

tt = np.linspace(0, 1, 200)
freq = 1  # parameter for sine
t0 = time.time()  # for measuring elapsed time

fig, ax = plt.subplots()

def draw_func(i):
    """ This function gets called repeatedly """
    global freq  # needed because freq is changed in this function
    xx = np.sin(2*np.pi*freq*tt)/freq
    ax.set_title("Passed Time: %.1f s, " % (time.time()-t0) +
                 "Parameter i=%d" % i)
    ax.plot(tt, xx, label="$f=%d$ Hz" % freq)
    ax.legend()
    freq += 1

# call draw_func every 3 seconds, 1 + 4 times (the first call is the initialization):
ani = animation.FuncAnimation(fig, draw_func, np.arange(4), interval=3000,
                              repeat=False)
# plt.show()
Check out matplotlib.animation.FuncAnimation for details. You'll find further examples in the examples section.

Why does python keep crashing when plotting with matplotlib?

I am running a python code that simulates and predicts conditions inside a rapid compression machine (such as temperature, pressure, chemical species, etc.). The code is set up so that it calculates the time evolution of all these parameters (which are saved in numpy arrays) and then outputs them to a function, which creates and saves plots of these parameters against time.
These arrays are very large, and the function creates about 14 plots using matplotlib. The function gets to about the 7th plot (sometimes more, sometimes less) when I get an error reading "python.exe has stopped working". I'm not sure whether it's a memory issue because the plots are so big, or what's going on.
I have tried both (as you'll see in my sample code) plt.figure.clf() and plt.close(). I've even tried doing a time.sleep in between plots. The following is an example snippet of my function that does the plotting (not the real code, just an example needed to illustrate my problem).
I am using Windows, Python 2.7, and Matplotlib 1.4.2
def graph_parameters(time, a, b, c, d, e):  # a, b, c, d, e are my parameters
    delay = 10

    plt.figure(1)
    plt.figure(facecolor="white")
    plt.plot(time, a)
    plt.ylabel('a')
    plt.xlabel('Time(s)')
    plt.savefig('fig1.png')
    plt.figure(1).clf()
    plt.close('all')
    time.sleep(delay)

    plt.figure(2)
    plt.figure(facecolor="white")
    plt.plot(time, b)
    plt.ylabel('b')
    plt.xlabel('Time(s)')
    plt.savefig('fig2.png')
    plt.figure(2).clf()
    plt.close('all')
    time.sleep(delay)

    # etc.
Thanks in advance.
I'm not completely sure, but I think it was a memory issue. I found a way (from another Stack Overflow entry) to delete the variables I don't need. Per my example above, if I only want to keep 'time_' and the parameters 'a_', 'b_', 'c_', 'd_' and 'e_', I run the following at the end of my main code:
graph_array = np.array([time_, a_, b_, c_, d_, e_])

# Delete variables that aren't going to be plotted in order to free up memory
attributes = sys.modules[__name__]
for name in dir():
    if name[0] != '_' and np.all(name != np.array(['graph_array', 'attributes',
                                                   'np', 'gc', 'graph_parameters'])):
        delattr(attributes, name)
gc.collect()

graph_parameters(*graph_array)
Basically what the for loop does is keep private and magic names as well as the names of the specified variables, and delete all other names. I got this idea from the answer to the following link: How do I clear all variables in the middle of a Python script?
I watched my task manager as the code ran, and there was a significant drop in memory usage (~1,600,000 K to ~100,000 K) by Python right before plots began being saved, indicating that this method did free up memory. Also, no crash.

How can I release memory after creating matplotlib figures

I have several matplotlib functions rolled into some django-celery tasks.
Every time the tasks are called, more RAM is dedicated to Python. Before too long, Python is taking up all of the RAM.
QUESTION: How can I release this memory?
UPDATE 2 - A Second Solution:
I asked a similar question specifically about the memory locked up when matplotlib errors, but I got a good answer to this question: .clf(), .close(), and gc.collect() aren't needed if you use multiprocessing to run the plotting function in a separate process, whose memory will automatically be freed once the process ends.
Matplotlib errors result in a memory leak. How can I free up that memory?
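A minimal sketch of that approach (my own illustration; plot_task and the file names are hypothetical):

import multiprocessing as mp

import matplotlib
matplotlib.use('Agg')           # non-interactive backend for the worker process
import matplotlib.pyplot as plt
import numpy as np

def plot_task(filename):
    # Hypothetical plotting job; everything it allocates lives in the child process.
    fig, ax = plt.subplots()
    ax.plot(np.random.randn(1000000))
    fig.savefig(filename)

if __name__ == '__main__':
    for name in ['plot1.png', 'plot2.png']:
        p = mp.Process(target=plot_task, args=(name,))
        p.start()
        p.join()                # when the child exits, its memory is returned to the OS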
UPDATE - The Solution:
These stackoverflow posts suggested that I can release the memory used by matplotlib objects with the following commands:
.clf(): Matplotlib runs out of memory when plotting in a loop
.close(): Python matplotlib: memory not being released when specifying figure size
import gc
gc.collect()
Here is the example I used to test the solution:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from pylab import figure, savefig
import numpy as np
import gc

a = np.arange(1000000)
b = np.random.randn(1000000)

fig = plt.figure(num=1, dpi=100, facecolor='w', edgecolor='w')
fig.set_size_inches(10, 7)
ax = fig.add_subplot(111)
ax.plot(a, b)

fig.clf()
plt.close()
del a, b
gc.collect()
Did you try running your task function several times (in a for loop), to make sure that it isn't your function that is leaking regardless of Celery?
Make sure that django.settings.DEBUG is set to False (the connection object holds all queries in memory when DEBUG=True).
import matplotlib.pyplot as plt
from datetime import datetime
import gc

class MyClass:
    def plotmanytimesandsave(self):
        plt.plot([1, 2, 3])
        ro2 = datetime.now()
        f = ro2.second
        name = str(f) + ".jpg"
        plt.savefig(name)
        plt.draw()
        plt.clf()
        plt.close("all")

for y in range(1, 10):
    k = MyClass()
    k.plotmanytimesandsave()
    del k
    k = "now our class object is a string"
    print(k)
    del k

gc.collect()
With this program you can save as many figures as you want without calling plt.show(), and the memory consumption stays low.
