I have some data that I have clustered, and wish to compare it to human-annotations of the same data. The problem I have is that the human-annotator has marked some of the points to have co-existing events (i.e. some points in the space would have two or more labels associated with them). Is there a way that I can show this using matplotlib?
I am thinking something along the lines of the following simplified example:
The main point is to not have the points with events 1 & 2 be classified as some new event '3', but instead to plot the points that have two events, to display these events independently. I assume that this is not an easy task, as there could potentially be more than two coexisting events, but for this example I am only focussing on two.
My plan was to create a one-hot array of shape=(n_points, n_events), and linearly selecting colours for a colourmap by using plt.cm.rainbow which would represent each unique event. But I have gotten stuck here as I do not know how to plot the points with >1 label.
The style in which the points are plotted does not strictly matter (i.e. having the side-by-side colours as I have illustrated is not a requirement), any method of displaying them should be adequate so long as points with multiple events are easily identifiable.
I would post my attempt so far, but as I am stuck on such an early step, it only goes as far as generating a random toy dataset of shape (20, 2), and creating the one-hot array of labels as I had previously mentioned.
You can specify color as well as markerfacecoloralt together with fillstyle='left' in order to obtain a side-by-side color plot. For more information and other styles see this tutorial.
import matplotlib.pyplot as plt
import numpy as np
x = np.sort(np.random.random(size=20))
y = x + np.random.normal(scale=0.2, size=x.shape)
i, j = len(x)//2 - 2, len(x)//2 + 3 # separate the points in left and right
colors = ['#1f77b4', '#ff7f0e']
fig, ax = plt.subplots()
ax.plot(x[:i], y[:i], 'o', color=colors[0], ms=15) # left part
ax.plot(x[j:], y[j:], 'o', color=colors[1], ms=15) # right part
ax.plot(x[i:j], y[i:j], 'o', # middle part
fillstyle='left', color=colors[0], markerfacecoloralt=colors[1], ms=15)
plt.show()
Example plot:
If you don't require side-by-side colors you could plot two points on top of each other, using different sizes.
import matplotlib.pyplot as plt
import numpy as np
x = np.sort(np.random.random(size=20))
y = x + np.random.normal(scale=0.2, size=x.shape)
i, j = len(x)//2 - 2, len(x)//2 + 3 # separate the points in left and right
colors = ['#1f77b4', '#ff7f0e']
fig, ax = plt.subplots()
ax.scatter(x[:j], y[:j], c=colors[0], s=100) # left part (including middle)
ax.scatter(x[j:], y[j:], c=colors[1], s=100) # right part
ax.scatter(x[i:j], y[i:j], c=colors[1], s=20) # middle part (using smaller size)
plt.show()
Example plot:
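The overlay idea also extends to more than two coexisting events. Below is a minimal sketch, assuming the labels are stored as the one-hot array of shape (n_points, n_events) described in the question. The helper name `plot_multilabel` and the `base_size` and `shrink` values are hypothetical choices for illustration, not part of matplotlib. Each active label of a point is drawn as a progressively smaller disc on top of the previous one.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_multilabel(ax, xy, onehot, base_size=300, shrink=0.45):
    """Draw each point as concentric discs, one disc per active label.

    xy     : float array of shape (n_points, 2)
    onehot : 0/1 integer array of shape (n_points, n_events)
    Returns the total number of discs drawn.
    """
    n_events = onehot.shape[1]
    colors = plt.cm.rainbow(np.linspace(0, 1, n_events))
    # rank[p, k] = number of active labels of point p that come before label k
    rank = np.cumsum(onehot, axis=1) - onehot
    n_discs = 0
    for k in range(n_events):
        active = onehot[:, k].astype(bool)
        if not active.any():
            continue
        # later labels of the same point get smaller discs, drawn on top
        sizes = base_size * shrink ** rank[active, k]
        ax.scatter(xy[active, 0], xy[active, 1], s=sizes,
                   color=colors[k], zorder=2 + k, label=f"event {k}")
        n_discs += int(active.sum())
    return n_discs


# toy data: 20 points, 3 possible events, some points with several events
rng = np.random.default_rng(0)
xy = rng.random((20, 2))
onehot = (rng.random((20, 3)) < 0.4).astype(int)
onehot[onehot.sum(axis=1) == 0, 0] = 1  # make sure every point has a label

fig, ax = plt.subplots()
n_discs = plot_multilabel(ax, xy, onehot)
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. With many coexisting events the innermost discs become small, so `shrink` may need tuning.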
Related
I have a two-dimensional set of time series data containing about 1000 samples, i.e. a long list of 1000 elements in which each entry is a list of two numbers.
It can be thought of as the position (x and y coordinates) of a car, with a picture taken each second for 1000 seconds. When I plot this, as seen below, you get a decent idea of the trajectory, but it's unclear where the car starts or finishes, i.e. which direction it is traveling in. I was thinking about including arrows between each pair of points, but I think this would get quite cluttered (maybe you know a way to overcome that issue?). I also thought of colouring each point with a spectrum that makes the passage of time clear, i.e. going from hotter points to colder points as time goes on. Any idea how to achieve this in matplotlib?
I believe both your ideas would work well, I just think you need to test which option works best for your case.
Option 1: arrows
To avoid a cluttered plot I believe you could plot arrows between only a selection of points to show the general direction of your trajectory. In my example below I only plot an arrow between points 1 and 2, 6 and 7, and so on. You might want to increase the spacing between the arrows to make this work for your long series. It is also possible to connect points that are separated by, say, 10 points to make the arrows more clearly visible.
import numpy as np
import matplotlib.pyplot as plt
# example data
x = np.linspace(0, 10, 100)
y = x
plt.figure()
# plot the data points
for i in range(len(x)):
    plt.plot(x[i], y[i], "ro")
# plot arrows between points 1 and 2, 6 and 7 and so on.
for i in range(0, len(x)-1, 5):
    plt.arrow(x[i], y[i], x[i+1] - x[i], y[i+1] - y[i], color="black", zorder=2, width=0.05)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
This yields this plot.
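For a long series, the arrows can also be drawn in a single call with `Axes.quiver` instead of one `plt.arrow` per segment. This is a minimal sketch, not the method used above: the spiral is made-up stand-in data for the 1000-sample track, and `step` is a hypothetical spacing that makes each arrow connect points 50 samples apart.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up example data: a spiral standing in for the 1000-sample track
t = np.linspace(0, 4 * np.pi, 1000)
x = t * np.cos(t)
y = t * np.sin(t)

step = 50  # hypothetical spacing: one arrow every 50 samples
start = np.arange(0, len(x) - step, step)
dx = x[start + step] - x[start]
dy = y[start + step] - y[start]

fig, ax = plt.subplots()
ax.plot(x, y, color="0.7", zorder=1)
# angles/scale_units/scale are set so each arrow spans (dx, dy) in data units
arrows = ax.quiver(x[start], y[start], dx, dy, angles="xy",
                   scale_units="xy", scale=1, color="black", zorder=2)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Finish with `plt.show()` to display the figure. A larger `step` gives fewer, longer arrows.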
Option 2: colors
You can generate any number of colors from a colormap, meaning you can make a list of 1000 sequential colors. This way you can plot each of your points in an increasingly warm color.
Example:
import numpy as np
import matplotlib.pyplot as plt
# example data
x = np.linspace(0, 10, 100)
y = x
# generate 100 (number of data points) colors from colormap
colors = [plt.get_cmap("coolwarm")(i) for i in np.linspace(0,1, len(x))]
plt.figure()
# plot the data points with the generated colors
for i in range(len(x)):
    plt.plot(x[i], y[i], color=colors[i], marker="o")
plt.xlabel('x')
plt.ylabel('y')
plt.show()
This yields this figure, where the oldest data point is cool (blue) and the newest is red (warm).
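The same colouring can be obtained without a Python loop by passing the sample index to `Axes.scatter` through its `c` argument. A short sketch under the same toy data as above; the colorbar label is my own wording:

```python
import numpy as np
import matplotlib.pyplot as plt

# example data
x = np.linspace(0, 10, 100)
y = x
t = np.arange(len(x))  # sample index, used here as a stand-in for time

fig, ax = plt.subplots()
# c maps each point's index through the colormap (cool = old, warm = new)
sc = ax.scatter(x, y, c=t, cmap="coolwarm")
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("sample index (time)")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The colorbar doubles as a legend for the direction of travel, which the per-point loop does not provide.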
I have some code that expresses the balance of an election result in the form of a violin plot:
plt.plot((0, 0), (0.85, 1.15), 'k-')
p = plt.violinplot(np.array(results['Balance']),vert=False, bw_method='scott', widths=0.2,showextrema=False)
This returns a plot that looks like the following:
I would like to colour this plot either side of the "Decision Line" to reflect voting intention. Something like the mock-up shown below:
I've tried plotting the two distinct sets independently of one another,
e.g.
p = plt.violinplot(np.array(results[results['Balance']>=0]['Balance']),vert=False, bw_method='scott', widths=0.2,showextrema=False)
n = plt.violinplot(np.array(results[results['Balance']<0]['Balance']),vert=False, bw_method='scott', widths=0.2,showextrema=False)
I might then use the method discussed here to colour each PolyCollection differently.
But the shape returned no longer reflects the original distribution, and so is too far from what I'm looking for to be helpful in this instance.
Does anyone have any ideas or techniques for achieving something closer to my coloured mockup?
The problem is that violinplot will compute a different violin if the data is cut, so one needs to work with the same violin on both sides of the separating line.
An idea could be to use the path of the violin, which can be obtained via path = p['bodies'][0].get_paths()[0], to cut out part of two differently-colored, plot-filling rectangles on either side of the separating line.
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
import pandas as pd
#generate some data
a = np.log(1+np.random.poisson(size=1500))
b = 0.2+np.random.rand(1500)*0.3
c = 0.1+np.random.rand(1500)*0.6
results=pd.DataFrame({'Balance':np.r_[a*b,-a*c]})
#create figure and axes
fig, ax=plt.subplots()
# sep is the point where the separation should occur
sep = -0.05
plt.plot((sep, sep), (0.85, 1.15), 'k-')
# plot the violin
p = plt.violinplot(np.array(results['Balance']),vert=False,
bw_method='scott', widths=0.2,showextrema=False)
# obtain path of violin surrounding
path = p['bodies'][0].get_paths()[0]
#create two rectangles left and right of the separation line
r = matplotlib.patches.Rectangle((results['Balance'].min(),0.85),
width=sep-results['Balance'].min(), height=0.3, facecolor="r")
r2 = matplotlib.patches.Rectangle((sep,0.85),
width=results['Balance'].max()-sep, height=0.3, facecolor="b")
# clip the rectangles with the violin path
r.set_clip_path(path, transform=ax.transData)
r2.set_clip_path(path, transform=ax.transData)
ax.add_patch(r)
ax.add_patch(r2)
#optionally add edge around violin.
s = matplotlib.patches.PathPatch(path, linewidth=1, edgecolor="k", fill=False)
ax.add_patch(s)
plt.show()
I'm trying to create a 3D wireframe in Python using matplotlib.
When I get to the actual graph plotting, however, the wireframe joins the wrong way, as shown in the images below.
How can I force matplotlib to join the wireframe along a certain axis?
My code is below:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
def rossler(x_n, y_n, z_n, h, a, b, c):
    #defining the rossler function
    x_n1=x_n+h*(-y_n-z_n)
    y_n1=y_n+h*(x_n+a*y_n)
    z_n1=z_n+h*(b+z_n*(x_n-c))
    return x_n1,y_n1,z_n1
#defining a, b, and c
a = 1.0/5.0
b = 1.0/5.0
c = 5
#defining time limits and steps
t_0 = 0
t_f = 32*np.pi
h = 0.01
steps = int((t_f-t_0)/h)
#3dify
c_list = np.linspace(5,10,6)
c_size = len(c_list)
c_array = np.zeros((c_size,steps))
for i in range (0, c_size):
    for j in range (0, steps):
        c_array[i][j] = c_list[i]
#create plotting values
t = np.zeros((c_size,steps))
for i in range (0, c_size):
    t[i] = np.linspace(t_0,t_f,steps)
x = np.zeros((c_size,steps))
y = np.zeros((c_size,steps))
z = np.zeros((c_size,steps))
binvar, array_size = x.shape
#initial conditions
x[0] = 0
y[0] = 0
z[0] = 0
for j in range(0, c_size-1):
    for i in range(array_size-1):
        c = c_list[j]
        #re-evaluate the values of the x-arrays depending on the initial conditions
        [x[j][i+1],y[j][i+1],z[j][i+1]]=rossler(x[j][i],y[j][i],z[j][i],t[j][i+1]-t[j][i],a,b,c)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(t,x,c_array, rstride=10, cstride=10)
plt.show()
I am getting this as an output:
The same output from another angle:
Whereas I'd like the wireframe to join along the wave-peaks. Sorry, I can't give you an image I'd like to see, that's my problem, but I guess it'd be more like the tutorial image.
If I understood correctly, you want to link the 6 traces with polygons. You can do that by triangulating the traces two at a time, then plotting the surface with no edges or antialiasing. Choosing a good colormap may also help.
Just keep in mind that this will be a very heavy plot. The exported SVG weighs 10 MB :)
import matplotlib.tri as mtri
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for LineIndex in range(c_size-1):
    # If plotting all at once, you get a MemoryError. I'll plot each 6 points
    for Sample in range(0, array_size-1, 3):
        # I switched x and c_array, because the surface and the triangles
        # will look better by default
        X = np.concatenate([t[LineIndex,Sample:Sample+3], t[LineIndex+1,Sample:Sample+3]])
        Y = np.concatenate([c_array[LineIndex,Sample:Sample+3], c_array[LineIndex+1,Sample:Sample+3]])
        Z = np.concatenate([x[LineIndex,Sample:Sample+3], x[LineIndex+1,Sample:Sample+3]])
        T = mtri.Triangulation(X, Y)
        ax.plot_trisurf(X, Y, Z, triangles=T.triangles, edgecolor='none', antialiased=False)
ax.set_xlabel('t')
ax.set_zlabel('x')
plt.savefig('Test.png', format='png', dpi=600)
plt.show()
Here is the resulting image:
I'm quite unsure about what you're exactly trying to achieve, but I don't think it will work.
Here's what your data looks like when plotted layer by layer (without and with filling):
You're trying to plot this as a wireframe plot. Here's what a wireframe plot looks like as per the manual:
Note the huge difference: a wireframe plot is essentially a proper surface plot, the only difference being that the faces of the surface are fully transparent. This also implies that you can only plot single-valued functions of the form z(x,y), which are furthermore specified on a rectangular mesh (at least topologically).
Your data is neither: your points are given along lines, and they are stacked on top of each other, so there's no chance that this is a single surface that can be plotted.
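For contrast, here is a minimal sketch of the kind of input a wireframe plot is designed for: a single-valued function sampled on a rectangular mesh. The function `sin(t)/c` is a made-up toy example and has nothing to do with the Rössler data; the stride values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed for '3d' on old versions

# rectangular mesh: every (t, c) pair has exactly one z value
tt, cc = np.meshgrid(np.linspace(0, 4 * np.pi, 60), np.linspace(5, 10, 6))
zz = np.sin(tt) / cc  # toy single-valued function z(t, c)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# rows run along t at fixed c, columns connect neighbouring c values
ax.plot_wireframe(tt, cc, zz, rstride=1, cstride=5)
ax.set_xlabel('t')
ax.set_ylabel('c')
ax.set_zlabel('z')
```

Here neighbouring rows differ only slightly at each t, which is what makes the connecting lines between them meaningful.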
If you just want to visualize your functions above each other, here's how I plotted the above figures:
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for zind in range(t.shape[0]):
    tnow,xnow,cnow = t[zind,:],x[zind,:],c_array[zind,:]
    hplot = ax.plot(tnow,xnow,cnow)
    # alternatively fill:
    stride = 10
    tnow,xnow,cnow = tnow[::stride],xnow[::stride],cnow[::stride]
    slice_from = slice(None,-1)
    slice_to = slice(1,None)
    xpoly = np.array([tnow[slice_from],
                      tnow[slice_to],
                      tnow[slice_to],
                      tnow[slice_from]]
                     ).T
    ypoly = np.array([xnow[slice_from],
                      xnow[slice_to],
                      np.zeros_like(xnow[slice_to]),
                      np.zeros_like(xnow[slice_from])]
                     ).T
    zpoly = np.array([cnow[slice_from],
                      cnow[slice_to],
                      cnow[slice_to],
                      cnow[slice_from]]
                     ).T
    tmppoly = [tuple(zip(xrow,yrow,zrow)) for xrow,yrow,zrow in zip(xpoly,ypoly,zpoly)]
    poly3dcoll = Poly3DCollection(tmppoly,linewidth=0.0)
    poly3dcoll.set_edgecolor(hplot[0].get_color())
    poly3dcoll.set_facecolor(hplot[0].get_color())
    ax.add_collection3d(poly3dcoll)
plt.xlabel('t')
plt.ylabel('x')
plt.show()
There is one other option: switching your coordinate axes, such that the (x,t) pair corresponds to a vertical plane rather than a horizontal one. In this case your functions for various c values are drawn on parallel planes. This allows a wireframe plot to be used properly, but since your functions have extrema in different time steps, the result is as confusing as your original plot. You can try using very few plots along the t axis, and hoping that the extrema are close. This approach needs so much guesswork that I didn't try to do this myself. You can plot each function as a filled surface instead, though:
from matplotlib.collections import PolyCollection
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for zind in range(t.shape[0]):
    tnow,xnow,cnow = t[zind,:],x[zind,:],c_array[zind,:]
    hplot = ax.plot(tnow,cnow,xnow)
    # alternative to fill:
    stride = 10
    tnow,xnow,cnow = tnow[::stride],xnow[::stride],cnow[::stride]
    slice_from = slice(None,-1)
    slice_to = slice(1,None)
    xpoly = np.array([tnow[slice_from],
                      tnow[slice_to],
                      tnow[slice_to],
                      tnow[slice_from]]
                     ).T
    ypoly = np.array([xnow[slice_from],
                      xnow[slice_to],
                      np.zeros_like(xnow[slice_to]),
                      np.zeros_like(xnow[slice_from])]
                     ).T
    tmppoly = [tuple(zip(xrow,yrow)) for xrow,yrow in zip(xpoly,ypoly)]
    polycoll = PolyCollection(tmppoly,linewidth=0.5)
    polycoll.set_edgecolor(hplot[0].get_color())
    polycoll.set_facecolor(hplot[0].get_color())
    ax.add_collection3d(polycoll,zdir='y',zs=cnow[0])
    hplot[0].set_color('none')
ax.set_xlabel('t')
ax.set_zlabel('x')
plt.show()
This results in something like this:
There are a few things to note, however.
3d scatter and wire plots are very hard to comprehend, due to the lacking depth information. You might be approaching your visualization problem in a fundamentally wrong way: maybe there are other options with which you can visualize your data.
Even if you do something like the plots I showed, you should be aware that matplotlib has historically failed to plot complicated 3d objects properly. By "properly" I mean "with physically reasonable apparent depth"; see also the mplot3d FAQ note describing exactly this. The core of the problem is that matplotlib projects every 3d object to 2d, and draws these pancakes on the screen one after the other. Sometimes the asserted drawing order of the pancakes doesn't correspond to their actual relative depth, which leads to artifacts that are both very obvious to humans and uncanny to look at. If you take a closer look at the first filled plot in this post, you'll see that the gold flat plot is behind the magenta one, even though it should be on top of it. Similar things often happen with 3d bar plots and convoluted surfaces.
When you're saying "Sorry, I can't give you an image I'd like to see, that's my problem", you're very wrong. It's not just your problem. It might be crystal clear in your head what you're trying to achieve, but unless you very clearly describe what you see in your head, the outside world will have to resort to guesswork. You can make the work of others and yourself alike easier by trying to be as informative as possible.
I've got a lot of points to plot and am noticing that plotting them individually in matplotlib takes much longer (more than 100 times longer, according to cProfile) than plotting them all at once.
However, I need to color code the points (based on data associated with each one) and can't figure out how to plot more than one color for a given call to Axes.plot(). For example, I can get a result similar to the one I want with something like
import numpy
import matplotlib.pyplot
fig, ax = matplotlib.pyplot.subplots()
rands = numpy.random.random_sample((10000,))
for x in range(10000):
    ax.plot(x, rands[x], 'o', color=str(rands[x]))
matplotlib.pyplot.show()
but would rather do something much faster like
fig, ax = matplotlib.pyplot.subplots()
rands = numpy.random.random_sample((10000,))
# List of colors doesn't work
ax.plot(range(10000), rands, 'o', color=[str(y) for y in rands])
matplotlib.pyplot.show()
but providing a list as the value for color doesn't work in this way.
Is there a way to provide a list of colors (and for that matter, edge colors, face colors , shapes, z-order, etc.) to Axes.plot() so that each point can potentially be customized, but all points can be plotted at once?
Using Axes.scatter() seems to get part way there, since it allows for individual setting of point color; but color is as far as that seems to go. (Axes.scatter() also lays out the figure completely differently.)
It is about 5 times faster for me to create the objects (patches) directly. To illustrate the example, I have changed the limits (which have to be set manually with this method). The circles themselves are drawn with matplotlib.path.Path.circle. Minimal working example:
import numpy as np
import pylab as plt
from matplotlib.patches import Circle
from matplotlib.collections import PatchCollection
fig, ax = plt.subplots(figsize=(10,10))
N = 10000  # number of points, as in the question
rands = np.random.random_sample((N,))
patches = []
colors = []
for x in range(N):
    C = Circle((x/float(N), rands[x]), .01)
    colors.append([rands[x],rands[x],rands[x]])
    patches.append(C)
plt.axis('equal')
ax.set_xlim(0,1)
ax.set_ylim(0,1)
collection = PatchCollection(patches)
collection.set_facecolor(colors)
ax.add_collection(collection)
plt.show()
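For comparison, here is a sketch of how far a single `Axes.scatter` call goes for per-point styling. Face colour, edge colour and size can each be given as an array with one entry per point. As far as I know, the marker shape and z-order are set once per call, so varying those would still require one call per group. The size formula and the use of the viridis colormap for the edges are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 10000
rng = np.random.default_rng(0)
rands = rng.random(N)

fig, ax = plt.subplots()
pts = ax.scatter(
    np.arange(N), rands,
    c=rands, cmap="gray", vmin=0, vmax=1,  # face colour per point (grey level)
    s=10 + 40 * rands,                     # marker area per point
    edgecolors=plt.cm.viridis(rands),      # edge colour per point (RGBA rows)
    linewidths=0.5,
)
```

All points end up in one collection, which is what makes this fast compared with calling `ax.plot` once per point.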
Changing the vertical distance between two subplots using tight_layout(h_pad=-1) changes the total figure size. How can I keep the figure size I defined when using tight_layout?
Here is the code:
#define figure
pl.figure(figsize=(10, 6.25))
ax1=subplot(211)
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
subplot(212)
x=linspace(0,50)
pl.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
And here is the result:
If I write
pl.tight_layout(h_pad=-2)
in the last line, then I get this:
As you can see, the figure is bigger...
You can use a GridSpec object to control precisely width and height ratios, as answered on this thread and documented here.
Experimenting with your code, I could produce something like what you want, by using a height_ratio that assigns twice the space to the upper subplot, and increasing the h_pad parameter to the tight_layout call. This does not sound completely right, but maybe you can adjust this further ...
import numpy as np
from matplotlib.pyplot import *
import matplotlib.pyplot as pl
import matplotlib.gridspec as gridspec
#define figure
fig = pl.figure(figsize=(10, 6.25))
gs = gridspec.GridSpec(2, 1, height_ratios=[2,1])
ax1=subplot(gs[0])
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
ax2=subplot(gs[1])
x=np.linspace(0,50)
ax2.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
fig.tight_layout(h_pad=-5)
show()
There were other issues, like correcting the imports, adding numpy, and plotting to ax2 instead of directly with pl. The output I see is this:
This case is peculiar because of the fact that the default aspect ratios of images and plots are not the same. So it is worth noting for people looking to remove the spaces in a grid of subplots consisting of images only or of plots only that you may find an appropriate solution among the answers to this question (and those linked to it): How to remove the space between subplots in matplotlib.pyplot?.
The aspect ratios of the subplots in this particular example are as follows:
# Default aspect ratio of images:
ax1.get_aspect()
# 1.0
# Which is as it is expected based on the default settings in rcParams file:
matplotlib.rcParams['image.aspect']
# 'equal'
# Default aspect ratio of plots:
ax2.get_aspect()
# 'auto'
The size of ax1 and the space beneath it are adjusted automatically based on the number of pixels along the x-axis (i.e. width) so as to preserve the 'equal' aspect ratio while fitting both subplots within the figure. As you mentioned, using fig.tight_layout(h_pad=xxx) or the similar fig.set_constrained_layout_pads(hspace=xxx) is not a good option as this makes the figure larger.
To remove the gap while preserving the original figure size, you can use fig.subplots_adjust(hspace=xxx) or the equivalent plt.subplots(gridspec_kw=dict(hspace=xxx)), as shown in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
np.random.seed(1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25),
gridspec_kw=dict(hspace=-0.206))
# For those not using plt.subplots, you can use this instead:
# fig.subplots_adjust(hspace=-0.206)
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
# Create plot of a line that is aligned with the image above
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())
plt.show()
I am not aware of any way to define the appropriate hspace automatically so that the gap can be removed for any image width. As stated in the docstring for fig.subplots_adjust(), it corresponds to the height of the padding between subplots, as a fraction of the average axes height. So I attempted to compute hspace by dividing the gap between the subplots by the average height of both subplots like this:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Compute negative hspace to close the vertical gap between subplots
ax1_h = ax1_y1-ax1_y0
ax2_h = ax2_y1-ax2_y0
avg_h = (ax1_h+ax2_h)/2
gap = ax1_y0-ax2_y1
hspace=-(gap/avg_h) # this divided by 2 also does not work
fig.subplots_adjust(hspace=hspace)
Unfortunately, this does not work. Maybe someone else has a solution for this.
It is also worth mentioning that I tried removing the gap between subplots by editing the y positions like in this example:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Set new y positions: shift ax1 down over gap
gap = ax1_y0-ax2_y1
ax1.set_position([ax1_x0, ax1_y0-gap, ax1_x1, ax1_y1-gap])
ax2.set_position([ax2_x0, ax2_y0, ax2_x1, ax2_y1])
Unfortunately, this (and variations of this) produces seemingly unpredictable results, including a figure resizing similar to when using fig.tight_layout(). Maybe someone else has an explanation for what is happening here behind the scenes.
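One more option that avoids guessing a negative hspace is to set hspace to zero and anchor the image axes to the bottom of its grid cell with `Axes.set_anchor`. This is a sketch that is not taken from the answers above, and it assumes the remaining gap comes only from the equal-aspect image axes being shrunk and centred inside its cell; I have not checked it across matplotlib versions.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25),
                               gridspec_kw=dict(hspace=0))
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
# The equal-aspect image axes is shrunk inside its grid cell; anchoring it
# to the bottom ('S') of the cell places it directly on top of ax2.
ax1.set_anchor('S')

# Create plot of a line that is aligned with the image above
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())

fig.canvas.draw()  # applies the aspect adjustment so the positions are final
gap = ax1.get_position().y0 - ax2.get_position().y1
```

The figure size is left untouched because neither `hspace` nor the anchor resizes the figure; the unused space simply moves above the image.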