Is matplotlib savefig threadsafe? - python

I have an in-house distributed computing library that we use all the time for parallel computing jobs. After the processes are partitioned, they run their data loading and computation steps and then finish with a "save" step. Usually this involved writing data to database tables.
But for a specific task, I need the output of each process to be a .png file with some data plots. There are 95 processes in total, so 95 .pngs.
Inside of my "save" step (executed on each process), I have some very simple code that makes a boxplot with matplotlib's boxplot function and some code that uses savefig to write it to a .png file that has a unique name based on the specific data used in that process.
However, I occasionally see output where it appears that two or more sets of data were written into the same output file, despite the unique names.
Does matplotlib use temporary file saves when making boxplots or saving figures? If so, does it always use the same temp file names (thus leading to over-write conflicts)? I have run my process using strace and cannot see anything that obviously looks like temp file writing from matplotlib.
How can I ensure that this will be threadsafe? I definitely want to conduct the file saving in parallel, as I am looking to expand the number of output .pngs considerably, so the option of first storing all the data and then just serially executing the plot/save portion is very undesirable.
It's impossible for me to reproduce the full parallel infrastructure we are using, but below is the function that gets called to create the plot handle, and then the function that gets called to save the plot. You should assume for the sake of the question that the thread safety has nothing to do with our distributed library. We know it's not coming from our code, which has been used for years for our multiprocessing jobs without threading issues like this (especially not for something we don't directly control, like any temp files from matplotlib).
import pandas
import numpy as np
import matplotlib.pyplot as plt
def plot_category_data(betas, category_name):
"""
Function to organize beta data by date into vectors and pass to box plot
code for producing a single chart of multi-period box plots.
"""
beta_vector_list = []
yms = np.sort(betas.yearmonth.unique())
for ym in yms:
beta_vector_list.append(betas[betas.yearmonth==ym].Beta.values.flatten().tolist())
###
plot_output = plt.boxplot(beta_vector_list)
axs = plt.gcf().gca()
axs.set_xticklabels(betas.FactorDate.unique(), rotation=40, horizontalalignment='right')
axs.set_xlabel("Date")
axs.set_ylabel("Beta")
axs.set_title("%s Beta to BMI Global"%(category_name))
axs.set_ylim((-1.0, 3.0))
return plot_output
### End plot_category_data
def save(self):
"""
Make calls to store the plot to the desired output file.
"""
out_file = self.output_path + "%s.png"%(self.category_name)
fig = plt.gcf()
fig.set_figheight(6.5)
fig.set_figwidth(10)
fig.savefig(out_file, bbox_inches='tight', dpi=150)
print "Finished and stored output file %s"%(out_file)
return None
### End save

In your two functions, you're calling plt.gcf(). I would try generating a new figure every time you plot with plt.figure() and referencing that one explicitly so you skirt the whole issue entirely.

Related

plot in matplotlib that continuously updates

I am trying to make a live plot in matplotlib, meaning that the data is coming from a CSV file that keeps getting updated with new data. So far, I can't succeed to make the plot update continually.
My intention is that I want that with time passing the graph old point will get out of the plot figure.
Can someone help me please?
This is my code:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import csv
import time
fig=plt.figure()
ax1=fig.add_subplot(1,1,1)
def animate(i):
graph_data=open('DATA.csv')
xs=[]
ys=[]
for line in graph_data:
time,hrt=line.split(',')
xs.append(float(time))
ys.append(float(hrt))
ax1.clear()
ax1.plot(xs,ys,'b',linewidth=0.5)
ani=animation.FuncAnimation(fig,animate,interval=8)
plt.show()
Use How to update/refresh my graph (animated graph) after another cycle, with matplotlib?
but open the file outside of the function.
Inside of the function check if there is new data. It's not that good to use it from a file.
You could count the number of lines that you read and reopen the file new each time and skip that number of lines.
You could check the time stamp and read the last line.
You could read all the data and plot the whole data again. Ok for little data.
You could open the file in binary mode and check the number of bytes you read and then read up to the next line separator.

Save an "interactive figure" with matplotlib [duplicate]

Is there a way to save a Matplotlib figure such that it can be re-opened and have typical interaction restored? (Like the .fig format in MATLAB?)
I find myself running the same scripts many times to generate these interactive figures. Or I'm sending my colleagues multiple static PNG files to show different aspects of a plot. I'd rather send the figure object and have them interact with it themselves.
I just found out how to do this. The "experimental pickle support" mentioned by #pelson works quite well.
Try this:
# Plot something
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([1,2,3],[10,-10,30])
After your interactive tweaking, save the figure object as a binary file:
import pickle
pickle.dump(fig, open('FigureObject.fig.pickle', 'wb')) # This is for Python 3 - py2 may need `file` instead of `open`
Later, open the figure and the tweaks should be saved and GUI interactivity should be present:
import pickle
figx = pickle.load(open('FigureObject.fig.pickle', 'rb'))
figx.show() # Show the figure, edit it, etc.!
You can even extract the data from the plots:
data = figx.axes[0].lines[0].get_data()
(It works for lines, pcolor & imshow - pcolormesh works with some tricks to reconstruct the flattened data.)
I got the excellent tip from Saving Matplotlib Figures Using Pickle.
As of Matplotlib 1.2, we now have experimental pickle support. Give that a go and see if it works well for your case. If you have any issues, please let us know on the Matplotlib mailing list or by opening an issue on github.com/matplotlib/matplotlib.
This would be a great feature, but AFAIK it isn't implemented in Matplotlib and likely would be difficult to implement yourself due to the way figures are stored.
I'd suggest either (a) separate processing the data from generating the figure (which saves data with a unique name) and write a figure generating script (loading a specified file of the saved data) and editing as you see fit or (b) save as PDF/SVG/PostScript format and edit in some fancy figure editor like Adobe Illustrator (or Inkscape).
EDIT post Fall 2012: As others pointed out below (though mentioning here as this is the accepted answer), Matplotlib since version 1.2 allowed you to pickle figures. As the release notes state, it is an experimental feature and does not support saving a figure in one matplotlib version and opening in another. It's also generally unsecure to restore a pickle from an untrusted source.
For sharing/later editing plots (that require significant data processing first and may need to be tweaked months later say during peer review for a scientific publication), I still recommend the workflow of (1) have a data processing script that before generating a plot saves the processed data (that goes into your plot) into a file, and (2) have a separate plot generation script (that you adjust as necessary) to recreate the plot. This way for each plot you can quickly run a script and re-generate it (and quickly copy over your plot settings with new data). That said, pickling a figure could be convenient for short term/interactive/exploratory data analysis.
Why not just send the Python script? MATLAB's .fig files require the recipient to have MATLAB to display them, so that's about equivalent to sending a Python script that requires Matplotlib to display.
Alternatively (disclaimer: I haven't tried this yet), you could try pickling the figure:
import pickle
output = open('interactive figure.pickle', 'wb')
pickle.dump(gcf(), output)
output.close()
Good question. Here is the doc text from pylab.save:
pylab no longer provides a save function, though the old pylab
function is still available as matplotlib.mlab.save (you can still
refer to it in pylab as "mlab.save"). However, for plain text
files, we recommend numpy.savetxt. For saving numpy arrays,
we recommend numpy.save, and its analog numpy.load, which are
available in pylab as np.save and np.load.
I figured out a relatively simple way (yet slightly unconventional) to save my matplotlib figures. It works like this:
import libscript
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
#<plot>
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
plt.show()
#</plot>
save_plot(fileName='plot_01.py',obj=sys.argv[0],sel='plot',ctx=libscript.get_ctx(ctx_global=globals(),ctx_local=locals()))
with function save_plot defined like this (simple version to understand the logic):
def save_plot(fileName='',obj=None,sel='',ctx={}):
"""
Save of matplolib plot to a stand alone python script containing all the data and configuration instructions to regenerate the interactive matplotlib figure.
Parameters
----------
fileName : [string] Path of the python script file to be created.
obj : [object] Function or python object containing the lines of code to create and configure the plot to be saved.
sel : [string] Name of the tag enclosing the lines of code to create and configure the plot to be saved.
ctx : [dict] Dictionary containing the execution context. Values for variables not defined in the lines of code for the plot will be fetched from the context.
Returns
-------
Return ``'done'`` once the plot has been saved to a python script file. This file contains all the input data and configuration to re-create the original interactive matplotlib figure.
"""
import os
import libscript
N_indent=4
src=libscript.get_src(obj=obj,sel=sel)
src=libscript.prepend_ctx(src=src,ctx=ctx,debug=False)
src='\n'.join([' '*N_indent+line for line in src.split('\n')])
if(os.path.isfile(fileName)): os.remove(fileName)
with open(fileName,'w') as f:
f.write('import sys\n')
f.write('sys.dont_write_bytecode=True\n')
f.write('def main():\n')
f.write(src+'\n')
f.write('if(__name__=="__main__"):\n')
f.write(' '*N_indent+'main()\n')
return 'done'
or defining function save_plot like this (better version using zip compression to produce lighter figure files):
def save_plot(fileName='',obj=None,sel='',ctx={}):
import os
import json
import zlib
import base64
import libscript
N_indent=4
level=9#0 to 9, default: 6
src=libscript.get_src(obj=obj,sel=sel)
obj=libscript.load_obj(src=src,ctx=ctx,debug=False)
bin=base64.b64encode(zlib.compress(json.dumps(obj),level))
if(os.path.isfile(fileName)): os.remove(fileName)
with open(fileName,'w') as f:
f.write('import sys\n')
f.write('sys.dont_write_bytecode=True\n')
f.write('def main():\n')
f.write(' '*N_indent+'import base64\n')
f.write(' '*N_indent+'import zlib\n')
f.write(' '*N_indent+'import json\n')
f.write(' '*N_indent+'import libscript\n')
f.write(' '*N_indent+'bin="'+str(bin)+'"\n')
f.write(' '*N_indent+'obj=json.loads(zlib.decompress(base64.b64decode(bin)))\n')
f.write(' '*N_indent+'libscript.exec_obj(obj=obj,tempfile=False)\n')
f.write('if(__name__=="__main__"):\n')
f.write(' '*N_indent+'main()\n')
return 'done'
This makes use a module libscript of my own, which mostly relies on modules inspect and ast. I can try to share it on Github if interest is expressed (it would first require some cleanup and me to get started with Github).
The idea behind this save_plot function and libscript module is to fetch the python instructions that create the figure (using module inspect), analyze them (using module ast) to extract all variables, functions and modules import it relies on, extract these from the execution context and serialize them as python instructions (code for variables will be like t=[0.0,2.0,0.01] ... and code for modules will be like import matplotlib.pyplot as plt ...) prepended to the figure instructions. The resulting python instructions are saved as a python script whose execution will re-build the original matplotlib figure.
As you can imagine, this works well for most (if not all) matplotlib figures.
If you are looking to save python plots as an interactive figure to modify and share with others like MATLAB .fig file then you can try to use the following code. Here z_data.values is just a numpy ndarray and so you can use the same code to plot and save your own data. No need of using pandas then.
The file generated here can be opened and interactively modified by anyone with or without python just by clicking on it and opening in browsers like Chrome/Firefox/Edge etc.
import plotly.graph_objects as go
import pandas as pd
z_data=pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')
fig = go.Figure(data=[go.Surface(z=z_data.values)])
fig.update_layout(title='Mt Bruno Elevation', autosize=False,
width=500, height=500,
margin=dict(l=65, r=50, b=65, t=90))
fig.show()
fig.write_html("testfile.html")

Python SAC plot w/ grid

I'm required to use the information from a .sac file and plot it against a grid. I know that using various ObsPy functions one is able to plot the Seismograms using st.plot() but I can't seem to get it against a grid. I've also tried following the example given here "How do I draw a grid onto a plot in Python?" but have trouble when trying to configure my x axis to use UTCDatetime. I'm new to python and programming of this sort so any advice / help would be greatly appreciated.
Various resources used:
"http://docs.obspy.org/tutorial/code_snippets/reading_seismograms.html"
"http://docs.obspy.org/packages/autogen/obspy.core.stream.Stream.plot.html#obspy.core.stream.Stream.plot"
The Stream's plot() method actually automatically generates a grid, e.g. if you take the default example and plot it via:
from obspy.core import read
st = read() # without filename an example file is loaded
tr = st[0] # we will use only the first channel
tr.plot()
You may want to play with the number_of_ticks, tick_format and tick_rotationparameters as pointed out in http://docs.obspy.org/packages/autogen/obspy.core.stream.Stream.plot.html.
However if you want more control you can pass a matplotlib figure as input parameter to the plot() method:
from obspy.core import read
import matplotlib.pyplot as plt
fig = plt.figure()
st = read('/path/to/file.sac')
st.plot(fig=fig)
# at this point do whatever you want with your figure, e.g.
fig.gca().set_axis_off()
# finally display your figure
fig.show()
Hope it helps.

Efficient memory usage in python

I wrote a Python script using pyodbc to transfer data from an excel sheet into ms access and also using matplotlib to use some of the data in the excel sheet to create plots and save them to a folder. When I ran the script, it did what I expected it to do ; however, I was monitoring it with task manager and it ended up using over 1500 MB of RAM!
I don't even understand how that is possible. It created 560 images but the total size of those images was only 17 MB. The excel sheet is 8.5 MB. I understand that maybe you can't tell me exactly what the problem is without seeing all my code (I don't know exactly what the problem is so I would just have to post the whole thing and I don't think it is reasonable to ask you to read my entire code) but some general guidelines would suffice.
Thanks.
Update
I did just as #HYRY suggested and split my code up. I ran the script first with only the matplotlib functions and then afterwards without them. As those who have commented so far have suspected, the memory hog is from the matplotlib functions. Now that we have narrowed it down I will post some of my code. Note that the code below executes in two for loops. The inner for loop will always execute four times while the outer for loop executes however many times is necessary.
#Plot waveform and then relative harmonic orders on a bar graph.
#Remember that table is the sheet name which is named after the ExperimentID
cursorEx.execute('select ['+phase+' Time] from ['+table+']')
Time = cursorEx.fetchall()
cursorEx.execute('select ['+phase+' Waveform] from ['+table+']')
Current = np.asanyarray(cursorEx.fetchall())
experiment = table[ :-1]
plt.figure()
#A scale needs to be added to the primary current values
if line == 'P':
ratioCurrent = Current / 62.5
plt.plot(Time, ratioCurrent)
else:
plt.plot(Time, Current)
plt.title(phaseTitle)
plt.xlabel('Time (s)')
plt.ylabel('Current (A)')
plt.savefig(os.getcwd()+'\\HarmonicsGraph\\'+line+'H'+experiment+'.png')
cursorEx.execute('select ['+phase+' Order] from ['+table+']')
cursorEx.fetchone() #the first row is zero
order = cursorEx.fetchmany(51)
cursorEx.execute('select ['+phase+' Harmonics] from ['+table+']')
cursorEx.fetchone()
percentage = np.asanyarray(cursorEx.fetchmany(51))
intOrder = np.arange(1, len(order) + 1, 1)
plt.figure()
plt.bar(intOrder, percentage, width = 0.35, color = 'g')
plt.title(orderTitle)
plt.xlabel('Harmonic Order')
plt.ylabel('Percentage')
plt.axis([1, 51, 0, 100])
plt.savefig(os.getcwd()+'\\HarmonicsGraph\\'+line+'O'+experiment+'.png')
I think that plt.close() is a very hard solution when you will do more than one plot in your script. A lot of times, if you keep the figure reference, you can do all the work on them, calling before:
plt.clf()
you will see how your code is faster (is not the same to generate the canvas every time!). Memory leaks are terrible when you call multiple axes o figures without the proper clean!
I don't see a cleanup portion in your code, but I'd be willing to bet that the issue is that you are not calling
plt.close()
after you are finished with each plot. Add that one line after you are done with each figure and see if it helps.

Saving interactive Matplotlib figures

Is there a way to save a Matplotlib figure such that it can be re-opened and have typical interaction restored? (Like the .fig format in MATLAB?)
I find myself running the same scripts many times to generate these interactive figures. Or I'm sending my colleagues multiple static PNG files to show different aspects of a plot. I'd rather send the figure object and have them interact with it themselves.
I just found out how to do this. The "experimental pickle support" mentioned by #pelson works quite well.
Try this:
# Plot something
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([1,2,3],[10,-10,30])
After your interactive tweaking, save the figure object as a binary file:
import pickle
pickle.dump(fig, open('FigureObject.fig.pickle', 'wb')) # This is for Python 3 - py2 may need `file` instead of `open`
Later, open the figure and the tweaks should be saved and GUI interactivity should be present:
import pickle
figx = pickle.load(open('FigureObject.fig.pickle', 'rb'))
figx.show() # Show the figure, edit it, etc.!
You can even extract the data from the plots:
data = figx.axes[0].lines[0].get_data()
(It works for lines, pcolor & imshow - pcolormesh works with some tricks to reconstruct the flattened data.)
I got the excellent tip from Saving Matplotlib Figures Using Pickle.
As of Matplotlib 1.2, we now have experimental pickle support. Give that a go and see if it works well for your case. If you have any issues, please let us know on the Matplotlib mailing list or by opening an issue on github.com/matplotlib/matplotlib.
This would be a great feature, but AFAIK it isn't implemented in Matplotlib and likely would be difficult to implement yourself due to the way figures are stored.
I'd suggest either (a) separate processing the data from generating the figure (which saves data with a unique name) and write a figure generating script (loading a specified file of the saved data) and editing as you see fit or (b) save as PDF/SVG/PostScript format and edit in some fancy figure editor like Adobe Illustrator (or Inkscape).
EDIT post Fall 2012: As others pointed out below (though mentioning here as this is the accepted answer), Matplotlib since version 1.2 allowed you to pickle figures. As the release notes state, it is an experimental feature and does not support saving a figure in one matplotlib version and opening in another. It's also generally unsecure to restore a pickle from an untrusted source.
For sharing/later editing plots (that require significant data processing first and may need to be tweaked months later say during peer review for a scientific publication), I still recommend the workflow of (1) have a data processing script that before generating a plot saves the processed data (that goes into your plot) into a file, and (2) have a separate plot generation script (that you adjust as necessary) to recreate the plot. This way for each plot you can quickly run a script and re-generate it (and quickly copy over your plot settings with new data). That said, pickling a figure could be convenient for short term/interactive/exploratory data analysis.
Why not just send the Python script? MATLAB's .fig files require the recipient to have MATLAB to display them, so that's about equivalent to sending a Python script that requires Matplotlib to display.
Alternatively (disclaimer: I haven't tried this yet), you could try pickling the figure:
import pickle
output = open('interactive figure.pickle', 'wb')
pickle.dump(gcf(), output)
output.close()
Good question. Here is the doc text from pylab.save:
pylab no longer provides a save function, though the old pylab
function is still available as matplotlib.mlab.save (you can still
refer to it in pylab as "mlab.save"). However, for plain text
files, we recommend numpy.savetxt. For saving numpy arrays,
we recommend numpy.save, and its analog numpy.load, which are
available in pylab as np.save and np.load.
I figured out a relatively simple way (yet slightly unconventional) to save my matplotlib figures. It works like this:
import libscript
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
#<plot>
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
plt.show()
#</plot>
save_plot(fileName='plot_01.py',obj=sys.argv[0],sel='plot',ctx=libscript.get_ctx(ctx_global=globals(),ctx_local=locals()))
with function save_plot defined like this (simple version to understand the logic):
def save_plot(fileName='',obj=None,sel='',ctx={}):
"""
Save of matplolib plot to a stand alone python script containing all the data and configuration instructions to regenerate the interactive matplotlib figure.
Parameters
----------
fileName : [string] Path of the python script file to be created.
obj : [object] Function or python object containing the lines of code to create and configure the plot to be saved.
sel : [string] Name of the tag enclosing the lines of code to create and configure the plot to be saved.
ctx : [dict] Dictionary containing the execution context. Values for variables not defined in the lines of code for the plot will be fetched from the context.
Returns
-------
Return ``'done'`` once the plot has been saved to a python script file. This file contains all the input data and configuration to re-create the original interactive matplotlib figure.
"""
import os
import libscript
N_indent=4
src=libscript.get_src(obj=obj,sel=sel)
src=libscript.prepend_ctx(src=src,ctx=ctx,debug=False)
src='\n'.join([' '*N_indent+line for line in src.split('\n')])
if(os.path.isfile(fileName)): os.remove(fileName)
with open(fileName,'w') as f:
f.write('import sys\n')
f.write('sys.dont_write_bytecode=True\n')
f.write('def main():\n')
f.write(src+'\n')
f.write('if(__name__=="__main__"):\n')
f.write(' '*N_indent+'main()\n')
return 'done'
or defining function save_plot like this (better version using zip compression to produce lighter figure files):
def save_plot(fileName='',obj=None,sel='',ctx={}):
import os
import json
import zlib
import base64
import libscript
N_indent=4
level=9#0 to 9, default: 6
src=libscript.get_src(obj=obj,sel=sel)
obj=libscript.load_obj(src=src,ctx=ctx,debug=False)
bin=base64.b64encode(zlib.compress(json.dumps(obj),level))
if(os.path.isfile(fileName)): os.remove(fileName)
with open(fileName,'w') as f:
f.write('import sys\n')
f.write('sys.dont_write_bytecode=True\n')
f.write('def main():\n')
f.write(' '*N_indent+'import base64\n')
f.write(' '*N_indent+'import zlib\n')
f.write(' '*N_indent+'import json\n')
f.write(' '*N_indent+'import libscript\n')
f.write(' '*N_indent+'bin="'+str(bin)+'"\n')
f.write(' '*N_indent+'obj=json.loads(zlib.decompress(base64.b64decode(bin)))\n')
f.write(' '*N_indent+'libscript.exec_obj(obj=obj,tempfile=False)\n')
f.write('if(__name__=="__main__"):\n')
f.write(' '*N_indent+'main()\n')
return 'done'
This makes use a module libscript of my own, which mostly relies on modules inspect and ast. I can try to share it on Github if interest is expressed (it would first require some cleanup and me to get started with Github).
The idea behind this save_plot function and libscript module is to fetch the python instructions that create the figure (using module inspect), analyze them (using module ast) to extract all variables, functions and modules import it relies on, extract these from the execution context and serialize them as python instructions (code for variables will be like t=[0.0,2.0,0.01] ... and code for modules will be like import matplotlib.pyplot as plt ...) prepended to the figure instructions. The resulting python instructions are saved as a python script whose execution will re-build the original matplotlib figure.
As you can imagine, this works well for most (if not all) matplotlib figures.
If you are looking to save python plots as an interactive figure to modify and share with others like MATLAB .fig file then you can try to use the following code. Here z_data.values is just a numpy ndarray and so you can use the same code to plot and save your own data. No need of using pandas then.
The file generated here can be opened and interactively modified by anyone with or without python just by clicking on it and opening in browsers like Chrome/Firefox/Edge etc.
import plotly.graph_objects as go
import pandas as pd
z_data=pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')
fig = go.Figure(data=[go.Surface(z=z_data.values)])
fig.update_layout(title='Mt Bruno Elevation', autosize=False,
width=500, height=500,
margin=dict(l=65, r=50, b=65, t=90))
fig.show()
fig.write_html("testfile.html")

Categories