I'm hoping to find a way to optimise the following situation. I have a large contour plot created with matplotlib's imshow. I then want to use this plot to create a large number of PNG images, where each image shows a small section of the plot, obtained by changing the x and y limits and the aspect ratio.
So no plot data is changing in the loop, only the axis limits and the aspect ratio are changing between each png image.
The following MWE creates 70 png images in a "figs" folder demonstrating the simplified idea. About 80% of the runtime is taken up by fig.savefig('figs/'+filename).
I've looked into the following without coming up with an improvement:
An alternative to matplotlib with a focus on speed -- I've struggled to find any examples/documentation of contour/surface plots with similar requirements
Multiprocessing -- Similar questions I've seen here appear to require fig = plt.figure() and ax.imshow to be called within the loop, since fig and ax can't be pickled. In my case this will be more expensive than any speed gains achieved by implementing multiprocessing.
I'd appreciate any insight or suggestions you might have.
import numpy as np
import matplotlib as mpl
mpl.use('agg')
import matplotlib.pyplot as plt
import time, os
def make_plot(x, y, fig, ax):
    # random aspect ratio and random x/y window for this image
    aspect = np.random.random(1)+y/2.0-x
    xrand = np.random.random(2)*x
    xlim = [min(xrand), max(xrand)]
    yrand = np.random.random(2)*y
    ylim = [min(yrand), max(yrand)]
    filename = '{:d}_{:d}.png'.format(x,y)
    ax.set_aspect(abs(aspect[0]))
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
    fig.savefig('figs/'+filename)
if not os.path.isdir('figs'):
    os.makedirs('figs')
data = np.random.rand(25, 25)
fig = plt.figure()
ax = fig.add_axes([0., 0., 1., 1.])
# in the real case, imshow is an expensive calculation which can't be put inside the loop
ax.imshow(data, interpolation='nearest')
tstart = time.perf_counter()  # time.clock() was removed in Python 3.8
for i in range(1, 8):
    for j in range(3, 13):
        make_plot(i, j, fig, ax)
print('took {:.2f} seconds'.format(time.perf_counter()-tstart))
Since the bottleneck in this case is the call to savefig(), there is not much to optimize: internally, the figure is rendered from scratch on every call, and that takes a while. Reducing the number of vertices to be drawn might shave off a little time.
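To see how much of savefig's time goes into rendering rather than PNG encoding, one can time a bare fig.canvas.draw() against a full savefig into an in-memory buffer. A minimal sketch, assuming the Agg backend and the 25x25 test data from the question; time_draw_vs_savefig is an illustrative name, not a Matplotlib function:

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import numpy as np


def time_draw_vs_savefig(fig, repeats=5):
    """Return the mean seconds for a bare canvas draw and for a full savefig."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        fig.canvas.draw()  # render only, nothing is encoded or written
    t_draw = (time.perf_counter() - t0) / repeats

    t0 = time.perf_counter()
    for _ in range(repeats):
        # savefig renders the figure again and then encodes it as PNG;
        # an in-memory buffer keeps disk I/O out of the measurement
        fig.savefig(io.BytesIO(), format="png")
    t_save = (time.perf_counter() - t0) / repeats
    return {"draw": t_draw, "savefig": t_save}


data = np.random.rand(25, 25)
fig = plt.figure()
ax = fig.add_axes([0., 0., 1., 1.])
ax.imshow(data, interpolation="nearest")
timings = time_draw_vs_savefig(fig)
print(timings)
```

Roughly, the gap between the two numbers is the encoding cost; if they are close, rendering dominates.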
The time to run your code on my machine (Win 8, i5 with 4 cores at 3.5 GHz) is 2.5 seconds, which does not seem too bad. Multiprocessing can give a little improvement.
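A note about multiprocessing: it may seem surprising that using pyplot's state machine inside multiprocessing works at all, but it does. Each worker process has its own copy of the module-level figure: with the fork start method it is inherited from the parent process, and with spawn (the default on Windows) the module is re-imported, so the figure is rebuilt once per worker rather than once per image.
And in this case, since every image is based on the same figure and axes, one does not even have to create new figures and axes.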
I modified an answer I gave here a while ago for your case and the total time is roughly halved using multiprocessing and 5 processes on 4 cores. I appended a barplot which shows the effect of multiprocessing.
import numpy as np
#import matplotlib as mpl
#mpl.use('agg') # use of agg seems to slow things down a bit
import matplotlib.pyplot as plt
import multiprocessing
import time, os
def make_plot(d):
    # wall-clock time is comparable across processes (time.clock() no longer exists)
    start = time.time()
    x,y=d
    #using aspect in this way causes a warning for me
    #aspect = np.random.random(1)+y/2.0-x
    xrand = np.random.random(2)*x
    xlim = [min(xrand), max(xrand)]
    yrand = np.random.random(2)*y
    ylim = [min(yrand), max(yrand)]
    filename = '{:d}_{:d}.png'.format(x,y)
    ax = plt.gca()
    #ax.set_aspect(abs(aspect[0]))
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
    plt.savefig('figs/'+filename)
    stop = time.time()
    return np.array([x,y, start, stop])
# module-level setup: every worker process inherits or rebuilds this figure
if not os.path.isdir('figs'):
    os.makedirs('figs')
data = np.random.rand(25, 25)
fig = plt.figure()
ax = fig.add_axes([0., 0., 1., 1.])
ax.imshow(data, interpolation='nearest')
some_list = []
for i in range(1, 8):
    for j in range(3, 13):
        some_list.append((i,j))
if __name__ == "__main__":
    multiprocessing.freeze_support()
    tstart = time.time()
    num_proc = 5
    with multiprocessing.Pool(num_proc) as p:
        nu = p.map(make_plot, some_list)
    tooktime = 'Plotting of {} frames took {:.2f} seconds'
    tooktime = tooktime.format(len(some_list), time.time()-tstart)
    print(tooktime)
    nu = np.array(nu)
    plt.close("all")
    fig, ax = plt.subplots(figsize=(8,5))
    plt.suptitle(tooktime)
    # one bar per image, from its start to its stop time (relative to tstart)
    ax.barh(np.arange(len(some_list)), nu[:,3]-nu[:,2],
            height=np.ones(len(some_list)), left=nu[:,2]-tstart, align="center")
    ax.set_xlabel("time [s]")
    ax.set_ylabel("image number")
    ax.set_ylim([-1, len(some_list)])
    plt.tight_layout()
    plt.savefig(__file__+".png")
    plt.show()
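The code above relies on each worker finding the module-level figure. A variant that makes this explicit is the initializer argument of multiprocessing.Pool, which runs a function once in every worker process. A minimal sketch, assuming the Agg backend; the random array stands in for the expensive imshow input, and init_worker, render_view and render_all are hypothetical names:

```python
import os
import multiprocessing

import matplotlib
matplotlib.use("Agg")  # headless backend: workers never open windows
import matplotlib.pyplot as plt
import numpy as np

# Per-process state, filled in once by init_worker() in every worker.
_fig = None
_ax = None


def init_worker(seed=0):
    """Build the (expensive) figure once per process, not once per image."""
    global _fig, _ax
    rng = np.random.default_rng(seed)
    data = rng.random((25, 25))  # stand-in for the expensive imshow input
    _fig = plt.figure()
    _ax = _fig.add_axes([0., 0., 1., 1.])
    _ax.imshow(data, interpolation="nearest")


def render_view(job):
    """Apply one (xlim, ylim, path) job to the per-process axes and save it."""
    xlim, ylim, path = job
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    _ax.set_xlim(xlim)
    _ax.set_ylim(ylim)
    _fig.savefig(path)
    return path


def render_all(jobs, processes=4):
    """Render all jobs with a pool whose workers each run init_worker once."""
    with multiprocessing.Pool(processes, initializer=init_worker) as pool:
        return pool.map(render_view, jobs)
```

For example, render_all([((0, 5), (5, 0), "figs/a.png"), ((2, 9), (9, 2), "figs/b.png")]) would be called from inside an if __name__ == "__main__": guard, so that it also works with the spawn start method.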
I am filling a matrix (e.g. with shape 100x100) with data using the following code:
from random import randint
import matplotlib.pyplot as plt
import numpy as np
import random as rand
tab = np.eye(100, 100)
x = np.arange(0, 100, 1)
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
for i in range(100):
    for j in range(100):
        tab[i, j] = rand.randint(0, 254)
        line1, = ax.plot(x, tab[i, :], 'r-')
        line1.set_ydata(tab[i, j])
        fig.canvas.draw()
        fig.canvas.flush_events()
        line1.remove()  # ax.lines.remove(line1) fails in recent Matplotlib versions
I need to update the matrix in the loops and update the plot at the same time.
When the j loop ends, the i loop should clear the plot and start plotting again. Is this possible?
After reading your comment, I think I understand what you were trying to do.
The reason you got those horizontal lines is that you set ydata again after plotting, to a constant, which is like plotting a horizontal line.
Consider the code below:
from random import randint
import matplotlib.pyplot as plt
import numpy as np
import random as rand
tab = np.eye(100, 100)
x = np.arange(0, 100, 1)
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
# fill the whole map first (simulated values instead of sensor data)
for i in range(100):
    for j in range(100):
        tab[i, j] = ((50-i/2)*(50-i/2)-(50-j)*(50-j))/100
# then draw one complete cross section per frame
for i in range(100):
    line1, = ax.plot(x, tab[i, :], 'r-')
    fig.canvas.draw()
    fig.canvas.flush_events()
    line1.remove()  # ax.lines.remove(line1) fails in recent Matplotlib versions
I used a separate loop to fill the tab map first. Since you're using sensor data, I guess that is what your code has to do anyway: you need all of the data (at least the values for the current cross section) before you can plot the kind of graph you want, which is equivalent to reading all of the data at the beginning and then plotting it.
(I also used simulated values instead of random values for the sake of testing.)
If you want to draw the data as they come from the sensor, you must define a function that reads the current cross section from the sensor and returns it as an array. I don't know which library you're using for the sensor, but assuming its scan functions are synchronous, the function will return right after the input is complete, which makes the whole thing pseudo-real-time:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 100, 1)
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
for i in range(100):
    # READ_CURRENT_CROSS_SECTION is a placeholder: implement it for your sensor
    # so that it returns the 100 values of the current cross section
    data = READ_CURRENT_CROSS_SECTION()
    line1, = ax.plot(x, data, 'r-')
    fig.canvas.draw()
    fig.canvas.flush_events()
    line1.remove()
Again, if plotting the data as they come from the sensor is your goal, a lot depends on the library you're using. Apart from that, the problem with your code was that it tried to plot while reading the data point by point, which does not give you enough data to plot a cross section (hence the straight lines). (PS: there are ways to make the point-by-point approach work, but they will be extremely slow!)
So either
write a function to scan the whole 2D area and return the whole map before you start plotting (like my first code, where this function replaces the nested loop that fills tab). This takes away the real-time feature, but it gives you a nice animated plot in a short time, or
write a function to scan each cross section and return it as a 100-element array. This makes it roughly real-time, but I guess it is harder to implement. This is like my second code, but you have to define READ_CURRENT_CROSS_SECTION yourself.
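Both versions create a new line in every iteration and then remove it. An alternative is to create the line once and only replace its data with Line2D.set_ydata. A minimal sketch, assuming a fixed y range; read_cross_section is a hypothetical stand-in that reuses the simulated formula from the first code instead of a real sensor read, and Agg is used only so that the sketch runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # use an interactive backend (and plt.ion()) to watch it live
import matplotlib.pyplot as plt
import numpy as np

N = 100
x = np.arange(N)


def read_cross_section(i):
    """Hypothetical stand-in for the sensor read: N simulated values for row i."""
    j = np.arange(N)
    return ((50 - i / 2) ** 2 - (50 - j) ** 2) / 100


fig, ax = plt.subplots()
# Create the line once; later frames only replace its y data.
(line,) = ax.plot(x, read_cross_section(0), "r-")
ax.set_ylim(-30, 30)  # fixed limits so successive frames are comparable

for i in range(N):
    line.set_ydata(read_cross_section(i))
    fig.canvas.draw()
    fig.canvas.flush_events()
```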
The animation updates very slowly (only n+=3 each time), but my data has 10000 elements. It tries to render every single frame, n=1, n=2, n=3, ..., and the hist function is really expensive. Is there any way I could skip frames, e.g. go from n=1 straight to n=500 and then to n=1000?
import matplotlib.animation as animation
import matplotlib.gridspec as gridspec
import numpy as np
import matplotlib.pyplot as plt
# x1, x2, x3, x4 are the four data arrays of n samples each (not shown here)
n=10000
def update(curr):
    if curr==n:
        a.event_source.stop()
    first_histogram.cla()
    sec_histogram.cla()
    thi_histogram.cla()
    for_histogram.cla()
    first_histogram.hist(x1[:curr], bins=np.arange(-6,2,0.5))
    sec_histogram.hist(x2[:curr], bins=np.arange(-1,15,1))
    thi_histogram.hist(x3[:curr], bins=np.arange(2,22,1))
    for_histogram.hist(x4[:curr], bins=np.arange(13,21,1))
    first_histogram.set_title('n={}'.format(curr))
fig=plt.figure()
gspec=gridspec.GridSpec(2,2)
first_histogram=plt.subplot(gspec[0,0])
sec_histogram=plt.subplot(gspec[0,1])
thi_histogram=plt.subplot(gspec[1,0])
for_histogram=plt.subplot(gspec[1,1])
a = animation.FuncAnimation(fig,update,blit=True,interval=1,repeat=False)
How can I make it faster? Thank you!
There are several things to note here.
blit=True is not useful when clearing the axes in between. It would either not take effect, or you would get wrong tick labels on the axes.
It would only be useful if the axes limits do not change from frame to frame. However, in a normal histogram, where more and more data is animated, the limits necessarily have to change; otherwise your bars either grow out of the axes, or you do not see the low counts at the start. As an alternative, you could plot a normalized histogram (i.e. a density plot).
Also, interval=1 is not useful. You will not be able to animate 4 subplots at 1 ms per frame on any normal system; Matplotlib is too slow for that. Moreover, the human brain usually cannot resolve frame rates above about 25 fps, i.e. 40 ms per frame, anyway. That's probably the frame rate to aim for (although Matplotlib may not achieve it).
So a simple way to set this up is:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
x1 = np.random.normal(-2.5, 1, 10000)
def update(curr):
    ax.clear()
    ax.hist(x1[:curr], bins=np.arange(-6,2,0.5))
    ax.set_title('n={}'.format(curr))
fig, ax = plt.subplots()
a = animation.FuncAnimation(fig, update, frames=len(x1), interval=40, repeat=False, blit=False)
plt.show()
If you want to arrive more quickly at the final number of items in the list, use fewer frames. E.g. for a 25 times faster animation, show only every 25th state:
a = animation.FuncAnimation(fig, update, frames=np.arange(0, len(x1)+1, 25),
interval=40, repeat=False, blit=False)
On my machine this code runs at about 11 fps (an interval of ~85 ms), which is slower than specified; this means we could just as well set interval=85 directly.
In order to increase the frame rate one may use blitting.
For that, the axes limits must not be updated at all. To optimize further, you may precompute all the histograms to show. Note, however, that since the axes limits must then stay fixed, we set them at the beginning, which leads to a different-looking plot.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
x1 = np.random.normal(-2.5, 1, 10000)
bins = np.arange(-6,2,0.5)
hist = np.empty((len(x1), len(bins)-1))
for i in range(len(x1)):
    hist[i, :], _ = np.histogram(x1[:i], bins=bins)
def update(i):
    for bar, y in zip(bars, hist[i,:]):
        bar.set_height(y)
    text.set_text('n={}'.format(i))
    return list(bars) + [text]
fig, ax = plt.subplots()
ax.set_ylim(0,hist.max()*1.05)
bars = ax.bar(bins[:-1], hist[0,:], width=np.diff(bins), align="edge")
text = ax.text(.99,.99, "", ha="right", va="top", transform=ax.transAxes)
ani = animation.FuncAnimation(fig, update, frames=len(x1), interval=1, repeat=False, blit=True)
plt.show()
Running this code gives me a frame rate of 215 fps (4.6 ms per frame), so we could set the interval to 4.6 ms.
Tested in python 3.10 and matplotlib 3.5.1
10000 samples creates a 40MB animation, which exceeds the 2MB limit for posting a gif.
The animated example therefore uses only 500 samples, x1 = np.random.normal(-2.5, 1, 500).
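The precomputation loop in the blitting code calls np.histogram on ever-longer slices, so its total work grows quadratically with the number of samples. Since each frame adds exactly one sample, the same counts can be built with a single cumulative sum. A minimal sketch (cumulative_histograms is our name, not a NumPy function) that reproduces np.histogram's convention of closing the last bin on the right:

```python
import numpy as np


def cumulative_histograms(x, bins):
    """Row i holds np.histogram(x[:i], bins)[0], for i = 0 .. len(x)."""
    nbins = len(bins) - 1
    # Bin index of every sample; digitize gives i with bins[i-1] <= v < bins[i].
    idx = np.digitize(x, bins) - 1
    # np.histogram closes the last bin on the right, so include v == bins[-1].
    idx[x == bins[-1]] = nbins - 1
    valid = (idx >= 0) & (idx < nbins)  # samples outside the bins are dropped

    onehot = np.zeros((len(x), nbins), dtype=np.int64)
    onehot[np.nonzero(valid)[0], idx[valid]] = 1

    # Prepend a zero row so that row i counts exactly the first i samples.
    counts = np.zeros((len(x) + 1, nbins), dtype=np.int64)
    np.cumsum(onehot, axis=0, out=counts[1:])
    return counts


x1 = np.random.normal(-2.5, 1, 10000)
bins = np.arange(-6, 2, 0.5)
hist = cumulative_histograms(x1, bins)
```

The result has one extra row (the histogram of all samples), so with this array frames=len(x1) + 1 would also show the final state.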
I'm trying to make an interactive program which primarily uses matplotlib to make scatter plots of rather a lot of points (10k-100k or so). Right now it works, but changes take too long to render. Small numbers of points are OK, but once the number rises things get frustrating in a hurry. So I'm working on ways to speed up scatter, but I'm not having much luck.
There's the obvious way to do things (the way it's implemented now):
(I realize the plot redraws without updating. I didn't want to alter the fps result with large calls to random).
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import time
X = np.random.randn(10000) #x pos
Y = np.random.randn(10000) #y pos
C = np.random.random(10000) #will be color
S = (1+np.random.randn(10000)**2)*3 #size
#build the colors from a color map
colors = mpl.cm.jet(C)
#there are easier ways to do static alpha, but this allows
#per point alpha later on.
colors[:,3] = 0.1
fig, ax = plt.subplots()
fig.show()
background = fig.canvas.copy_from_bbox(ax.bbox)
#this makes the base collection
coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None',marker='D')
fig.canvas.draw()
sTime = time.time()
for i in range(10):
    print(i)
    #don't change anything, but redraw the plot
    ax.cla()
    coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None',marker='D')
    fig.canvas.draw()
# frames per second = number of frames / elapsed time
print('%2.1f FPS' % (10/(time.time()-sTime)))
Which gives a speedy 0.7 seconds per frame (about 1.4 fps).
Alternatively, I can edit the collection returned by scatter. With that I can change the color and position, but I don't know how to change the size of each point. I think it would look something like this:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import time
X = np.random.randn(10000) #x pos
Y = np.random.randn(10000) #y pos
C = np.random.random(10000) #will be color
S = (1+np.random.randn(10000)**2)*3 #size
#build the colors from a color map
colors = mpl.cm.jet(C)
#there are easier ways to do static alpha, but this allows
#per point alpha later on.
colors[:,3] = 0.1
fig, ax = plt.subplots()
fig.show()
background = fig.canvas.copy_from_bbox(ax.bbox)
#this makes the base collection
coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None', marker='D')
fig.canvas.draw()
sTime = time.time()
for i in range(10):
    print(i)
    #don't change anything, but redraw the plot
    coll.set_facecolors(colors)
    coll.set_offsets( np.array([X,Y]).T )
    #for starters lets not change anything!
    fig.canvas.restore_region(background)
    ax.draw_artist(coll)
    fig.canvas.blit(ax.bbox)
# frames per second = number of frames / elapsed time
print('%2.1f FPS' % (10/(time.time()-sTime)))
This is no faster, at 0.7 seconds per frame. I wanted to try using CircleCollection or RegularPolygonCollection, as this would allow me to change the sizes easily, and I don't care about changing the marker. But I can't get either to draw, so I have no idea whether they'd be faster. So, at this point I'm looking for ideas.
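For reference, the object scatter returns is a PathCollection, and it has a set_sizes method (sizes in points squared, like scatter's s argument), so position, color and size can all be changed in place without calling scatter again. A minimal sketch along the lines of the second attempt, using the Agg backend and random size changes purely for illustration, with FPS computed as frames divided by elapsed time; it does not by itself establish that this is faster than re-creating the scatter:

```python
import time

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

n = 10000
X, Y = np.random.randn(n), np.random.randn(n)
S = (1 + np.random.randn(n) ** 2) * 3
colors = plt.get_cmap("jet")(np.random.random(n))  # (n, 4) RGBA array
colors[:, 3] = 0.1

fig, ax = plt.subplots()
coll = ax.scatter(X, Y, facecolor=colors, s=S, edgecolor="none", marker="D")
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)

n_frames = 10
t0 = time.time()
for _ in range(n_frames):
    # Mutate the existing collection instead of creating a new one.
    coll.set_offsets(np.column_stack([X, Y]))
    coll.set_facecolors(colors)
    coll.set_sizes(S * np.random.uniform(0.5, 1.5, n))  # per-point sizes
    fig.canvas.draw()
fps = n_frames / (time.time() - t0)
print(f"{fps:.1f} FPS")
```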
I've been through this a few times trying to speed up scatter plots with large numbers of points, variously trying:
Different marker types
Limiting colours
Cutting down the dataset
Using a heatmap / grid instead of a scatter plot
And none of these things worked. Matplotlib is just not very performant when it comes to scatter plots. My only recommendation is to use a different plotting library, though I haven't personally found one that was suitable. I know this doesn't help much, but it may save you some hours of fruitless tinkering.
We are actively working on performance for large matplotlib scatter plots.
I'd encourage you to get involved in the conversation (http://matplotlib.1069221.n5.nabble.com/mpl-1-2-1-Speedup-code-by-removing-startswith-calls-and-some-for-loops-td41767.html) and, even better, test out the pull request that has been submitted to make life much better for a similar case (https://github.com/matplotlib/matplotlib/pull/2156).
HTH
I've run into a fairly serious issue with matplotlib and Python. I have a dense periodogram data set and want to plot it. The issue is that when there are more data points than there are pixels to display them, the package does not make sure the minimum and maximum are displayed. This means a casual look at the plot can lead you to incorrect conclusions.
Here's an example of such a problem:
The dataset was plotted with plot() and scatter() overlaid. You can see that in the dense data regions, the blue line that connects the data does not reach the actual peaks, leading a human viewer to conclude the peak at ~2.4 is the maximum, when it's really not.
If you zoom in or force a wide viewing window, it is displayed correctly. The rasterized and aa (antialiased) keywords have no effect on the issue.
Is there a way to ensure that the min/max points of a plot() call are always rendered? Otherwise, this needs to be addressed in an update to matplotlib. I've never had a plotting package behave like this, and this is a pretty major issue.
Edit:
import numpy
from matplotlib.pyplot import plot, show
x = numpy.linspace(0,1,2000000)
y = numpy.random.random(x.shape)
y[1000000]=2
plot(x,y)
show()
This should reproduce the problem, though it may depend on your monitor resolution. When dragging and resizing the window, you should see it: one data point should stick out at y=2, but it is not always displayed.
This is due to the path-simplification algorithm in matplotlib. While it's certainly not desirable in some cases, it's deliberate behavior to speed up rendering.
The simplification algorithm was changed at some point to avoid skipping "outlier" points, so newer versions of mpl don't exhibit this exact behavior (the path is still simplified, though).
If you don't want to simplify paths, then you can disable it in the rc parameters (either in your .matplotlibrc file or at runtime).
E.g.
import matplotlib as mpl
mpl.rcParams['path.simplify'] = False
import matplotlib.pyplot as plt
However, it may make more sense to use an "envelope" style plot. As a quick example:
import matplotlib.pyplot as plt
import numpy as np
def main():
    num = 10000
    x = np.linspace(0, 10, num)
    y = np.cos(x) + 5 * np.random.random(num)
    fig, (ax1, ax2) = plt.subplots(nrows=2)
    ax1.plot(x, y)
    envelope_plot(x, y, winsize=40, ax=ax2)
    plt.show()
def envelope_plot(x, y, winsize, ax=None, fill='gray', color='blue'):
    if ax is None:
        ax = plt.gca()
    # Coarsely chunk the data, discarding the last window if it's not evenly
    # divisible. (Fast and memory-efficient)
    numwin = x.size // winsize
    ywin = y[:winsize * numwin].reshape(-1, winsize)
    xwin = x[:winsize * numwin].reshape(-1, winsize)
    # Find the min, max, and mean within each window
    ymin = ywin.min(axis=1)
    ymax = ywin.max(axis=1)
    ymean = ywin.mean(axis=1)
    xmean = xwin.mean(axis=1)
    fill_artist = ax.fill_between(xmean, ymin, ymax, color=fill,
                                  edgecolor='none', alpha=0.5)
    line, = ax.plot(xmean, ymean, color=color, linestyle='-')
    return fill_artist, line
if __name__ == '__main__':
    main()
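The envelope shows the per-window minimum and maximum as a filled band. If a single line is preferred, a related trick is to keep only each window's minimum and maximum samples, in their original order, so that a plain plot() of far fewer points still reaches every extreme. A minimal sketch under that assumption (minmax_decimate is our name), applied to the 2,000,000-point example from the question:

```python
import numpy as np


def minmax_decimate(x, y, winsize):
    """Keep each window's min and max samples, in their original order.

    Samples of an incomplete last window are kept unchanged.
    """
    numwin = x.size // winsize
    n = winsize * numwin
    ywin = y[:n].reshape(numwin, winsize)
    # Absolute indices of the min and max sample inside each window.
    offsets = np.arange(numwin) * winsize
    imin = offsets + ywin.argmin(axis=1)
    imax = offsets + ywin.argmax(axis=1)
    keep = np.sort(np.concatenate([imin, imax]))  # preserve x order
    keep = np.concatenate([keep, np.arange(n, x.size)])
    return x[keep], y[keep]


x = np.linspace(0, 1, 2000000)
y = np.random.random(x.shape)
y[1000000] = 2
xd, yd = minmax_decimate(x, y, winsize=1000)
```

plt.plot(xd, yd) then draws 4000 points, and the spike at y=2 is among them.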
I need to animate data as they come in with a 2D histogram (histogram2d) (maybe later in 3D, but I hear Mayavi is better for that).
Here's the code:
import numpy as np
import matplotlib.pyplot as plt
import time
plt.ion()
# Generate some test data
x = np.random.randn(50)
y = np.random.randn(50)
heatmap, xedges, yedges = np.histogram2d(x, y, bins=5)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
# start counting for FPS
n_frames = 10
tstart = time.time()
for i in range(n_frames):
    x = np.random.randn(50)
    y = np.random.randn(50)
    heatmap, xedges, yedges = np.histogram2d(x, y, bins=5)
    plt.clf()
    plt.imshow(heatmap, extent=extent)
    plt.draw()
# calculate and print FPS (frames drawn / elapsed time)
print('FPS:', n_frames/(time.time()-tstart))
It gives 3 fps, which is apparently too slow. Is it the use of numpy.random in each iteration? Should I use blit? If so, how?
The docs have some nice examples, but I need to understand what everything does.
Thanks to @Chris I took another look at the examples and also found this incredibly helpful post.
As @bmu states in his answer (see post), using animation.FuncAnimation was the way to go for me.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
def generate_data():
    # do calculations and stuff here; return the 2D array (rows, cols)
    # you want the color map to show. Random data is only a stand-in.
    return np.random.rand(10, 10)
def update(data):
    mat.set_data(data)
    return [mat]
def data_gen():
    while True:
        yield generate_data()
fig, ax = plt.subplots()
mat = ax.matshow(generate_data())
plt.colorbar(mat)
ani = animation.FuncAnimation(fig, update, data_gen, interval=500,
                              save_count=50)
plt.show()
I suspect it is either the use of np.histogram2d in each loop iteration, or that in each iteration of the for loop you are clearing the figure and drawing a new one. To speed things up, you should create the figure once and only update its properties and data in the loop. Have a look through the matplotlib animation examples for some pointers on how to do this. Typically it involves calling matplotlib.pyplot.plot once and then, in a loop, calling set_xdata and set_ydata on the returned line.
In your case, however, take a look at the matplotlib animation example dynamic image 2. In this example the generation of data is separated from the animation of the data (which may not be a great approach if you have lots of data). By splitting these two parts up you can see which is causing the bottleneck, numpy.histogram2d or imshow (use time.time() around each part).
P.S. np.random.randn uses a pseudo-random number generator. These generators are cheap and can produce many millions of (pseudo-)random numbers per second, so this is almost certainly not your bottleneck: drawing to screen is almost always slower than any number crunching.
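Following the suggestion to time each part separately, here is a minimal sketch that creates the image once, updates it with AxesImage.set_data, and measures np.histogram2d and the redraw separately. The fixed histogram range and color limits are assumptions added here so that all frames share one extent and color scale (the question's extent came from the first frame only), and Agg is used so that it runs headless:

```python
import time

import matplotlib
matplotlib.use("Agg")  # use an interactive backend to watch it live
import matplotlib.pyplot as plt
import numpy as np

n_frames = 10
fig, ax = plt.subplots()
# Create the image once, with a fixed extent and color scale.
im = ax.imshow(np.zeros((5, 5)), extent=[-3, 3, -3, 3],
               origin="lower", vmin=0, vmax=20)

t_hist = t_draw = 0.0
for _ in range(n_frames):
    t0 = time.perf_counter()
    x = np.random.randn(50)
    y = np.random.randn(50)
    heatmap, _, _ = np.histogram2d(x, y, bins=5, range=[[-3, 3], [-3, 3]])
    t1 = time.perf_counter()
    im.set_data(heatmap.T)  # histogram2d puts x along the first axis
    fig.canvas.draw()
    t2 = time.perf_counter()
    t_hist += t1 - t0
    t_draw += t2 - t1

print(f"histogram2d: {t_hist / n_frames * 1e3:.2f} ms/frame, "
      f"draw: {t_draw / n_frames * 1e3:.2f} ms/frame, "
      f"{n_frames / (t_hist + t_draw):.1f} FPS")
```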