matplotlib major display issue with dense data sets - python

I've run into a fairly serious issue with matplotlib and Python. I have a dense periodogram data set and want to plot it. The issue is that when there are more data points than can be plotted on a pixel, the package does not pick the min and max to display. This means a casual look at the plot can lead you to incorrect conclusions.
Here's an example of such a problem:
The dataset was plotted with plot() and scatter() overlayed. You can see that in the dense data fields, the blue line that connects the data does not reach the actual peaks, leading a human viewer to conclude the peak at ~2.4 is the maximum, when it's really not.
If you zoom in or force a wide viewing window, the peak is displayed correctly. The rasterize and aa keywords have no effect on the issue.
Is there a way to ensure that the min/max points of a plot() call are always rendered? Otherwise, this needs to be addressed in an update to matplotlib. I've never had a plotting package behave like this, and this is a pretty major issue.
Edit:
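import numpy
from matplotlib.pyplot import plot, show

x = numpy.linspace(0, 1, 2000000)
y = numpy.random.random(x.shape)
y[1000000] = 2  # a single outlier that should always be visible
plot(x, y)
show()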
This should replicate the problem, though it may depend on your monitor resolution. By dragging and resizing the window, you should see the problem: one data point should stick out at y=2, but it doesn't always display.

This is due to the path-simplification algorithm in matplotlib. While it's certainly not desirable in some cases, it's deliberate behavior to speed up rendering.
The simplification algorithm was changed at some point to avoid skipping "outlier" points, so newer versions of mpl don't exhibit this exact behavior (the path is still simplified, though).
If you don't want to simplify paths, then you can disable it in the rc parameters (either in your .matplotlibrc file or at runtime).
E.g.
import matplotlib as mpl
mpl.rcParams['path.simplify'] = False
import matplotlib.pyplot as plt
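As a quick way to check which setting is in effect, the sketch below builds a Path by hand and reads its should_simplify property. It assumes a reasonably recent matplotlib in which that property reflects rcParams['path.simplify'] at the time the path is created and only applies to longer paths without curve codes; treat those details as assumptions to verify against your installed version.

```python
import numpy as np
import matplotlib as mpl
from matplotlib.path import Path

# 1000 random vertices: long enough for simplification to be considered.
verts = np.column_stack([np.linspace(0, 1, 1000), np.random.random(1000)])

mpl.rcParams['path.simplify'] = True
simplified = Path(verts).should_simplify

mpl.rcParams['path.simplify'] = False
unsimplified = Path(verts).should_simplify

# Report whether each path would be simplified when drawn.
print(simplified, unsimplified)
```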
However, it may make more sense to use an "envelope" style plot. As a quick example:
import matplotlib.pyplot as plt
import numpy as np

def main():
    num = 10000
    x = np.linspace(0, 10, num)
    y = np.cos(x) + 5 * np.random.random(num)
    fig, (ax1, ax2) = plt.subplots(nrows=2)
    ax1.plot(x, y)
    envelope_plot(x, y, winsize=40, ax=ax2)
    plt.show()

def envelope_plot(x, y, winsize, ax=None, fill='gray', color='blue'):
    if ax is None:
        ax = plt.gca()
    # Coarsely chunk the data, discarding the last window if it's not evenly
    # divisible. (Fast and memory-efficient)
    numwin = x.size // winsize
    ywin = y[:winsize * numwin].reshape(-1, winsize)
    xwin = x[:winsize * numwin].reshape(-1, winsize)
    # Find the min, max, and mean within each window
    ymin = ywin.min(axis=1)
    ymax = ywin.max(axis=1)
    ymean = ywin.mean(axis=1)
    xmean = xwin.mean(axis=1)
    fill_artist = ax.fill_between(xmean, ymin, ymax, color=fill,
                                  edgecolor='none', alpha=0.5)
    line, = ax.plot(xmean, ymean, color=color, linestyle='-')
    return fill_artist, line

if __name__ == '__main__':
    main()
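If you would rather keep an ordinary line than a filled envelope, a related idea is to decimate the data so that only the minimum and maximum of each window survive. The following is a minimal sketch of that idea, not part of matplotlib; minmax_decimate and the window size of 1000 are hypothetical choices, and it reuses the test data from the question.

```python
import numpy as np

def minmax_decimate(x, y, winsize):
    """Keep only the min and max sample of each window, in x order."""
    # Chunk the data, discarding an incomplete final window.
    numwin = x.size // winsize
    ywin = y[:winsize * numwin].reshape(-1, winsize)
    xwin = x[:winsize * numwin].reshape(-1, winsize)
    rows = np.arange(numwin)
    imin = ywin.argmin(axis=1)
    imax = ywin.argmax(axis=1)
    # Order the two picks by position so that x stays monotonic.
    first = np.minimum(imin, imax)
    second = np.maximum(imin, imax)
    idx = np.column_stack([first, second])
    xs = xwin[rows[:, None], idx].ravel()
    ys = ywin[rows[:, None], idx].ravel()
    return xs, ys

x = np.linspace(0, 1, 2000000)
y = np.random.random(x.shape)
y[1000000] = 2
xs, ys = minmax_decimate(x, y, winsize=1000)
```

Passing xs and ys to plot() then draws 4000 points instead of two million, and the outlier at y=2 is among them by construction.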

Related

Adjust axes to make space for offset line plot

I would like to plot a series of curves in the same Axes each having a constant y offset from eachother. Because the data I have needs to be displayed in log scale, simply adding a y offset to each curve (as done here) does not give the desired output.
I have tried using matplotlib.transforms to achieve the same, i.e. artificially shifting the curve in Figure coordinates. This achieves the desired result, but requires adjusting the Axes y limits so that the shifted curves are visible. Here is an example to illustrate this, though such data would not require log scale to be visible:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1,1)
for i in range(1,19):
    x, y = np.arange(200), np.random.rand(200)
    dy = 0.5*i
    shifted = mpl.transforms.offset_copy(ax.transData, y=dy, fig=fig, units='inches')
    ax.set_xlim(0, 200)
    ax.set_ylim(0.1, 1e20)
    ax.set_yscale('log')
    ax.plot(x, y, transform=shifted, c=mpl.cm.plasma(i/18), lw=2)
The problem is that to make all the shifted curves visible, I would need to adjust the ylim to a very high number, which compresses all the curves so that the features visible because of the log scale cannot be seen anymore.
Since the displayed y axis values are meaningless to me, is there any way to artificially extend the Axes limits to display all the curves, without having to make the Figure very large? Apparently this can be done with seaborn, but if possible I would like to stick to matplotlib.
EDIT:
This is the kind of data I need to plot (an X-ray diffraction pattern varying with temperature):
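One option that stays within plain matplotlib, offered here as a suggestion rather than a confirmed answer: on a logarithmic axis, multiplying a curve by a constant factor shifts it by a constant distance. The sketch below uses made-up random data and a hypothetical spacing of half a decade per curve.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(200)
# Made-up positive data standing in for the real patterns.
curves = [rng.random(200) + 0.1 for _ in range(18)]

decades_per_step = 0.5  # hypothetical spacing between neighbouring curves
shifted = [c * 10 ** (decades_per_step * i) for i, c in enumerate(curves)]

fig, ax = plt.subplots()
for s in shifted:
    ax.plot(x, s, lw=1)
ax.set_yscale('log')
# The absolute y values no longer mean anything, so hide ticks and labels.
ax.tick_params(axis='y', which='both', left=False, labelleft=False)
```

Because the offset is applied to the data itself, the default autoscaling picks limits that contain every curve.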

Faster way to provide rotation to scatter point plots in matplotlib?

Currently I use the following to plot a set of rotated lines (geologic strike indicators). However, this section of code takes a long time even with only a modest amount of strikes (5000). Each point has a unique rotation. Is there a way to give matplotlib a list with the rotations and perform the plotting faster than rotating one-by-one like this?
import matplotlib
import matplotlib.pyplot as plt
# sample is an array of shape (3, N) holding rows (x, y, theta),
# where theta is the amount, in degrees, to rotate each point by.
for i in range(len(sample.T)):
    t = matplotlib.markers.MarkerStyle(marker='|')
    t._transform = t.get_transform().rotate_deg(sample[2,i])
    plt.scatter(sample[0,i],sample[1,i],marker=t,s=50,c='0',linewidth=1)
Here you create 5000 individual scatter plots, which is certainly inefficient. You may use a solution I proposed in this answer, namely to set the individual markers as paths of a PathCollection. This works similarly to scatter, with an additional argument m for the markers.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.markers as mmarkers

def mscatter(x,y,ax=None, m=None, **kw):
    if not ax: ax=plt.gca()
    sc = ax.scatter(x,y,**kw)
    if (m is not None) and (len(m)==len(x)):
        paths = []
        for marker in m:
            if isinstance(marker, mmarkers.MarkerStyle):
                marker_obj = marker
            else:
                marker_obj = mmarkers.MarkerStyle(marker)
            path = marker_obj.get_path().transformed(
                        marker_obj.get_transform())
            paths.append(path)
        sc.set_paths(paths)
    return sc

np.random.seed(42)
data = np.random.rand(5000,3)
data[:,2] *= 360
markers = []
fig, ax = plt.subplots()
for i in range(len(data)):
    t = mmarkers.MarkerStyle(marker='|')
    t._transform = t.get_transform().rotate_deg(data[i,2])
    markers.append(t)
mscatter(data[:,0], data[:,1], m=markers, s=50, c='0', linewidth=1)
plt.show()
If we time this we find that this takes ~250 ms to create the plot with 5000 points and 5000 different angles. The loop solution would in contrast take more than 12 seconds.
So much for the general question of how to rotate many markers. For the special case here, it seems you want to use simple line markers. This could easily be done using a quiver plot. One may then turn the arrow heads off to have the arrows look like lines.
fig, ax = plt.subplots()
ax.quiver(data[:,0], data[:,1], 1,1, angles=data[:,2]+90, scale=1/10, scale_units="dots",
          units="dots", color="k", pivot="mid",width=1, headwidth=1, headlength=0)
The result is pretty much the same, with the benefit of this plot only taking ~80 ms, which is again three times faster than the PathCollection.
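A further variation, sketched here as an assumption rather than something timed above, is to compute the end points of each tick with numpy and draw them all as one LineCollection. Note that the tick length is then in data units rather than in points, so the ticks scale with the axes; half_len is a hypothetical value chosen for this random data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

np.random.seed(42)
data = np.random.rand(5000, 3)
data[:, 2] *= 360

half_len = 0.01  # hypothetical half-length of each tick, in data units
# A '|' marker rotated by theta points along the direction theta + 90 degrees.
theta = np.deg2rad(data[:, 2] + 90)
offsets = half_len * np.column_stack([np.cos(theta), np.sin(theta)])
centers = data[:, :2]
# One (start, end) pair per point: array of shape (N, 2, 2).
segments = np.stack([centers - offsets, centers + offsets], axis=1)

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors='k', linewidths=1))
ax.autoscale_view()
```

Call plt.show() afterwards to display the figure.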

Zoom an inline 3D matplotlib figure *without* using the mouse?

This question explains how to change the "camera position" of a 3D plot in matplotlib by specifying the elevation and azimuth angles. ax.view_init(elev=10,azim=20), for example.
Is there a similar way to specify the zoom of the figure numerically -- i.e. without using the mouse?
The only relevant question I could find is this one, but the accepted answer to that involves installing another library, which then also requires using the mouse to zoom.
EDIT:
Just to be clear, I'm not talking about changing the figure size (using fig.set_size_inches() or similar). The figure size is fine; the problem is that the plotted stuff only takes up a small part of the figure:
The closest solution to view_init is setting ax.dist directly. According to the docs for get_proj "dist is the distance of the eye viewing point from the object point". The initial value is currently hardcoded with dist = 10. Lower values (above 0!) will result in a zoomed in plot.
Note: This behavior is not really documented and may change. Changing the limits of the axes to plot only the relevant parts is probably a better solution in most cases. You could use ax.autoscale(tight=True) to do this conveniently.
Working IPython/Jupyter example:
%matplotlib inline
from IPython.display import display
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Grab some test data.
X, Y, Z = axes3d.get_test_data(0.05)
# Plot a basic wireframe.
ax.view_init(90, 0)
ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
plt.close()
from ipywidgets import interact
@interact(dist=(1, 20, 1))
def update(dist=10):
    ax.dist = dist
    display(fig)
Output
dist = 10
dist = 5
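Newer matplotlib releases expose a zoom keyword on Axes3D.set_box_aspect, and ax.dist has been deprecated there. Which of the two applies depends on your installed version, so take the sketch below as an assumption to check rather than a guaranteed replacement; the zoom value of 1.5 is arbitrary.

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Same test data and wireframe as in the example above.
X, Y, Z = axes3d.get_test_data(0.05)
ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)

# aspect=None keeps the default box proportions; zoom > 1 enlarges the plot.
ax.set_box_aspect(None, zoom=1.5)
```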

Wireframe joins the wrong way in numpy matplotlib mplot3d

I'm trying to create a 3D wireframe in Python using matplotlib.
When I get to the actual graph plotting, however, the wireframe joins the wrong way, as shown in the images below.
How can I force matplotlib to join the wireframe along a certain axis?
My code is below:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
def rossler(x_n, y_n, z_n, h, a, b, c):
    #defining the rossler function
    x_n1=x_n+h*(-y_n-z_n)
    y_n1=y_n+h*(x_n+a*y_n)
    z_n1=z_n+h*(b+z_n*(x_n-c))
    return x_n1,y_n1,z_n1
#defining a, b, and c
a = 1.0/5.0
b = 1.0/5.0
c = 5
#defining time limits and steps
t_0 = 0
t_f = 32*np.pi
h = 0.01
steps = int((t_f-t_0)/h)
#3dify
c_list = np.linspace(5,10,6)
c_size = len(c_list)
c_array = np.zeros((c_size,steps))
for i in range (0, c_size):
    for j in range (0, steps):
        c_array[i][j] = c_list[i]
#create plotting values
t = np.zeros((c_size,steps))
for i in range (0, c_size):
    t[i] = np.linspace(t_0,t_f,steps)
x = np.zeros((c_size,steps))
y = np.zeros((c_size,steps))
z = np.zeros((c_size,steps))
binvar, array_size = x.shape
#initial conditions
x[0] = 0
y[0] = 0
z[0] = 0
for j in range(0, c_size-1):
    for i in range(array_size-1):
        c = c_list[j]
        #re-evaluate the values of the x-arrays depending on the initial conditions
        [x[j][i+1],y[j][i+1],z[j][i+1]]=rossler(x[j][i],y[j][i],z[j][i],t[j][i+1]-t[j][i],a,b,c)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(t,x,c_array, rstride=10, cstride=10)
plt.show()
I am getting this as an output:
The same output from another angle:
Whereas I'd like the wireframe to join along the wave-peaks. Sorry, I can't give you an image I'd like to see, that's my problem, but I guess it'd be more like the tutorial image.
If I understood correctly, you want to link the 6 traces with polygons. You can do that by triangulating the traces 2 by 2, then plotting the surface with no edges or antialiasing. Choosing a good colormap may also help.
Just keep in mind that this will be a very heavy plot. The exported SVG weighs about 10 MB :)
import matplotlib.tri as mtri
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for LineIndex in range(c_size-1):
    # If plotting all at once, you get a MemoryError. I'll plot each 6 points
    for Sample in range(0, array_size-1, 3):
        # I switched x and c_array, because the surface and the triangles
        # will look better by default
        X = np.concatenate([t[LineIndex,Sample:Sample+3], t[LineIndex+1,Sample:Sample+3]])
        Y = np.concatenate([c_array[LineIndex,Sample:Sample+3], c_array[LineIndex+1,Sample:Sample+3]])
        Z = np.concatenate([x[LineIndex,Sample:Sample+3], x[LineIndex+1,Sample:Sample+3]])
        T = mtri.Triangulation(X, Y)
        ax.plot_trisurf(X, Y, Z, triangles=T.triangles, edgecolor='none', antialiased=False)
ax.set_xlabel('t')
ax.set_zlabel('x')
plt.savefig('Test.png', format='png', dpi=600)
plt.show()
Here is the resulting image:
I'm quite unsure about what you're exactly trying to achieve, but I don't think it will work.
Here's what your data looks like when plotted layer by layer (without and with filling):
You're trying to plot this as a wireframe plot. Here's what a wireframe plot looks like as per the manual:
Note the huge difference: a wireframe plot is essentially a proper surface plot; the only difference is that the faces of the surface are fully transparent. This also implies that you can only plot
single-valued functions of the form z(x,y), which are furthermore
specified on a rectangular mesh (at least topologically)
Your data is neither: your points are given along lines, and they are stacked on top of each other, so there's no chance that this is a single surface that can be plotted.
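To make the rectangular-mesh requirement concrete, here is a minimal sketch of input that plot_wireframe handles well: 2D coordinate arrays from np.meshgrid and one z value per grid node. The function sin(x)*cos(y) and the grid ranges are arbitrary illustrations, not taken from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d  # registers the '3d' projection on older matplotlib

xs = np.linspace(0, 2 * np.pi, 30)
ys = np.linspace(0, 2 * np.pi, 20)
# X and Y are 2D arrays of shape (20, 30) describing a rectangular mesh.
X, Y = np.meshgrid(xs, ys)
# Exactly one z value per (x, y) node, i.e. a single-valued z(x, y).
Z = np.sin(X) * np.cos(Y)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z, rstride=2, cstride=2)
```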
If you just want to visualize your functions above each other, here's how I plotted the above figures:
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for zind in range(t.shape[0]):
    tnow,xnow,cnow = t[zind,:],x[zind,:],c_array[zind,:]
    hplot = ax.plot(tnow,xnow,cnow)
    # alternatively fill:
    stride = 10
    tnow,xnow,cnow = tnow[::stride],xnow[::stride],cnow[::stride]
    slice_from = slice(None,-1)
    slice_to = slice(1,None)
    xpoly = np.array([tnow[slice_from],
                      tnow[slice_to],
                      tnow[slice_to],
                      tnow[slice_from]]
                     ).T
    ypoly = np.array([xnow[slice_from],
                      xnow[slice_to],
                      np.zeros_like(xnow[slice_to]),
                      np.zeros_like(xnow[slice_from])]
                     ).T
    zpoly = np.array([cnow[slice_from],
                      cnow[slice_to],
                      cnow[slice_to],
                      cnow[slice_from]]
                     ).T
    tmppoly = [tuple(zip(xrow,yrow,zrow)) for xrow,yrow,zrow in zip(xpoly,ypoly,zpoly)]
    poly3dcoll = Poly3DCollection(tmppoly,linewidth=0.0)
    poly3dcoll.set_edgecolor(hplot[0].get_color())
    poly3dcoll.set_facecolor(hplot[0].get_color())
    ax.add_collection3d(poly3dcoll)
plt.xlabel('t')
plt.ylabel('x')
plt.show()
There is one other option: switching your coordinate axes, such that the (x,t) pair corresponds to a vertical plane rather than a horizontal one. In this case your functions for various c values are drawn on parallel planes. This allows a wireframe plot to be used properly, but since your functions have extrema in different time steps, the result is as confusing as your original plot. You can try using very few plots along the t axis, and hoping that the extrema are close. This approach needs so much guesswork that I didn't try to do this myself. You can plot each function as a filled surface instead, though:
from matplotlib.collections import PolyCollection
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for zind in range(t.shape[0]):
    tnow,xnow,cnow = t[zind,:],x[zind,:],c_array[zind,:]
    hplot = ax.plot(tnow,cnow,xnow)
    # alternative to fill:
    stride = 10
    tnow,xnow,cnow = tnow[::stride],xnow[::stride],cnow[::stride]
    slice_from = slice(None,-1)
    slice_to = slice(1,None)
    xpoly = np.array([tnow[slice_from],
                      tnow[slice_to],
                      tnow[slice_to],
                      tnow[slice_from]]
                     ).T
    ypoly = np.array([xnow[slice_from],
                      xnow[slice_to],
                      np.zeros_like(xnow[slice_to]),
                      np.zeros_like(xnow[slice_from])]
                     ).T
    tmppoly = [tuple(zip(xrow,yrow)) for xrow,yrow in zip(xpoly,ypoly)]
    polycoll = PolyCollection(tmppoly,linewidth=0.5)
    polycoll.set_edgecolor(hplot[0].get_color())
    polycoll.set_facecolor(hplot[0].get_color())
    ax.add_collection3d(polycoll,zdir='y',zs=cnow[0])
    hplot[0].set_color('none')
ax.set_xlabel('t')
ax.set_zlabel('x')
plt.show()
This results in something like this:
There are a few things to note, however.
3d scatter and wire plots are very hard to comprehend, due to the lack of depth information. You might be approaching your visualization problem in a fundamentally wrong way: maybe there are other options with which you can visualize your data.
Even if you do something like the plots I showed, you should be aware that matplotlib has historically been failing to plot complicated 3d objects properly. Now by "properly" I mean "with physically reasonable apparent depth", see also the mplot3d FAQ note describing exactly this. The core of the problem is that matplotlib projects every 3d object to 2d, and draws these pancakes on the screen one after the other. Sometimes the asserted drawing order of the pancakes doesn't correspond to their actual relative depth, which leads to artifacts that are both very obvious to humans and uncanny to look at. If you take a closer look at the first filled plot in this post, you'll see that the gold flat plot is behind the magenta one, even though it should be on top of it. Similar things often happen with 3d bar plots and convoluted surfaces.
When you're saying "Sorry, I can't give you an image I'd like to see, that's my problem", you're very wrong. It's not just your problem. It might be crystal clear in your head what you're trying to achieve, but unless you very clearly describe what you see in your head, the outside world will have to resort to guesswork. You can make the work of others and yourself alike easier by trying to be as informative as possible.

How do I plot a spectrogram the same way that pylab's specgram() does?

In Pylab, the specgram() function creates a spectrogram for a given list of amplitudes and automatically creates a window for the spectrogram.
I would like to generate the spectrogram (instantaneous power is given by Pxx), modify it by running an edge detector on it, and then plot the result.
(Pxx, freqs, bins, im) = pylab.specgram( self.data, Fs=self.rate, ...... )
The problem is that whenever I try to plot the modified Pxx using imshow or even NonUniformImage, I run into the error message below.
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/image.py:336: UserWarning: Images are not supported on non-linear axes.
warnings.warn("Images are not supported on non-linear axes.")
For example, a part of the code I'm working on right now is below.
# how many instantaneous spectra did we calculate
(numBins, numSpectra) = Pxx.shape
# how many seconds in entire audio recording
numSeconds = float(self.data.size) / self.rate
ax = fig.add_subplot(212)
im = NonUniformImage(ax, interpolation='bilinear')
x = np.arange(0, numSpectra)
y = np.arange(0, numBins)
z = Pxx
im.set_data(x, y, z)
ax.images.append(im)
ax.set_xlim(0, numSpectra)
ax.set_ylim(0, numBins)
ax.set_yscale('symlog') # see http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_yscale
ax.set_title('Spectrogram 2')
Actual Question
How do you plot image-like data with a logarithmic y axis with matplotlib/pylab?
Use pcolor or pcolormesh. pcolormesh is much faster, but is limited to rectilinear grids, whereas pcolor can handle arbitrarily shaped cells. specgram uses pcolormesh, if I recall correctly. (Correction: it uses imshow.)
As a quick example:
import numpy as np
import matplotlib.pyplot as plt
z = np.random.random((11,11))
x, y = np.mgrid[:11, :11]
fig, ax = plt.subplots()
ax.set_yscale('symlog')
ax.pcolormesh(x, y, z)
plt.show()
The differences you're seeing are due to plotting the "raw" values that specgram returns. What specgram actually plots is a scaled version.
import matplotlib.pyplot as plt
import numpy as np
x = np.cumsum(np.random.random(1000) - 0.5)
fig, (ax1, ax2) = plt.subplots(nrows=2)
data, freqs, bins, im = ax1.specgram(x)
ax1.axis('tight')
# "specgram" actually plots 10 * log10(data)...
ax2.pcolormesh(bins, freqs, 10 * np.log10(data))
ax2.axis('tight')
plt.show()
Notice that when we plot things using pcolormesh, there's no interpolation. (That's part of the point of pcolormesh--it's just vector rectangles instead of an image.)
If you want things on a log scale, you can use pcolormesh with it:
import matplotlib.pyplot as plt
import numpy as np
x = np.cumsum(np.random.random(1000) - 0.5)
fig, (ax1, ax2) = plt.subplots(nrows=2)
data, freqs, bins, im = ax1.specgram(x)
ax1.axis('tight')
# We need to explicitly set the linear threshold in this case...
# Ideally you should calculate this from your bin size...
# (Matplotlib 3.3+ spells this keyword linthresh; older releases used linthreshy.)
ax2.set_yscale('symlog', linthresh=0.01)
ax2.pcolormesh(bins, freqs, 10 * np.log10(data))
ax2.axis('tight')
plt.show()
Just to add to Joe's answer...
I was getting small differences between the visual output of specgram compared to pcolormesh (as noisygecko also was) that were bugging me.
Turns out that if you pass frequency and time bins returned from specgram to pcolormesh, it treats these values as values on which to centre the rectangles rather than edges of them.
A bit of fiddling gets them to align better (though still not 100% perfectly). The colours are now identical as well.
import numpy as np
import matplotlib.pyplot as plt
import pylab
x = np.cumsum(np.random.random(1024) - 0.2)
overlap_frac = 0
plt.subplot(3,1,1)
data, freqs, bins, im = pylab.specgram(x, NFFT=128, Fs=44100, noverlap = 128*overlap_frac, cmap='plasma')
plt.title("specgram plot")
plt.subplot(3,1,2)
plt.pcolormesh(bins, freqs, 20 * np.log10(data), cmap='plasma')
plt.title("pcolormesh no adj.")
# bins actually returns middle value of each chunk
# so need to add an extra element at zero, and then add first to all
bins = bins+(bins[0]*(1-overlap_frac))
bins = np.concatenate((np.zeros(1),bins))
max_freq = freqs.max()
diff = (max_freq/freqs.shape[0]) - (max_freq/(freqs.shape[0]-1))
temp_vec = np.arange(freqs.shape[0])
freqs = freqs+(temp_vec*diff)
freqs = np.concatenate((freqs,np.ones(1)*max_freq))
plt.subplot(3,1,3)
plt.pcolormesh(bins, freqs, 20 * np.log10(data), cmap='plasma')
plt.title("pcolormesh post adj.")
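The centre-versus-edge adjustment can also be written as a small general helper. This is a sketch under the assumption that the centres are sorted and there are at least two of them; centers_to_edges is a hypothetical name, not a matplotlib or numpy function.

```python
import numpy as np

def centers_to_edges(centers):
    """Turn N cell centres into N + 1 cell edges."""
    centers = np.asarray(centers, dtype=float)
    # Interior edges sit halfway between neighbouring centres.
    mid = 0.5 * (centers[:-1] + centers[1:])
    # Extrapolate the outer edges by mirroring the nearest interior edge.
    first = centers[0] - (mid[0] - centers[0])
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])

edges = centers_to_edges([0.5, 1.5, 2.5])
```

Applied to both the bins and freqs arrays returned by specgram, this gives edge arrays that are each one element longer than the corresponding axis of data, which is the layout pcolormesh expects for cell edges.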
