I have a scatter plot showing a data set with the symbols coloured according to some colour scale. I want to highlight several of these points by drawing an open circle around them and connecting these circles with a line. In other words, in a very simplified example, I want the result to look like this:
I can make a plot that looks a bit like that using the following code:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5,5))
X, Y = np.meshgrid(np.arange(10), np.arange(10))
Z = X**2 + Y**2
plt.scatter(X,Y,c=Z, s=300, cmap='viridis_r')
plt.plot([1, 4, 7], [7, 1, 4], 'k-o', linewidth=3,
markersize=14, markerfacecolor='none', markeredgewidth=2,
)
However, the result looks like this:
What I would like is for the line segments that lie inside the marker symbols to be hidden from view. This is because I want to draw attention to specific data points, and do not want to partially hide them from sight.
Workarounds would be:
1. Duplicate the data points I want to highlight, and fill the markers with exactly the same colour as the original symbol that the marker is hiding. This is not satisfactory because (a) I don't always have access to the data, only to the output figure or axes instance, and (b) I wouldn't know how to do this for semi-transparent symbols, which is the case in my actual dataset.
2. Manually compute adapted coordinates so that the connector lines only reach the edge of the markers. This also seems quite unsatisfactory (and I wouldn't know how to go about it, given that the coordinate systems of the axes and of the symbols are not going to be the same).
My question is: what would be the best way to go about this? Is there a better way than options 1 and 2 above, and if not, what would be the best approach? As said, with 1 I foresee issues with transparency (which I am using), and with 2 I foresee coordinate system issues, e.g. when zooming in or out.
(the other small observation in my example that I'm slightly confused about: plt.plot and plt.scatter don't seem to plot things in quite the same location (see this figure) and also the size definition is inconsistent)
You want transparent (open) circles, positioned in data coordinates but with a radius in points. You want the lines to connect the circles themselves, not their center points, which means you cannot use a normal line. Instead, several ConnectionPatches could help. You can shrink each one by half the markersize at both ends, so that it touches the circles' borders.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import ConnectionPatch
fig, ax = plt.subplots(figsize=(5,5))
X, Y = np.meshgrid(np.arange(10), np.arange(10))
Z = X**2 + Y**2
ax.scatter(X,Y,c=Z, s=300, cmap='viridis_r')
xc = [1, 4, 7]
yc = [7, 1, 4]
ax.plot(xc, yc, linewidth=0, marker="o", color="black",
        markersize=np.sqrt(300), markerfacecolor='none', markeredgewidth=3)
for i in range(len(xc)-1):
    cp = ConnectionPatch((xc[i],yc[i]), (xc[i+1], yc[i+1]),
                         coordsA='data', coordsB='data', axesA=ax, axesB=ax,
                         shrinkA=np.sqrt(300)/2, shrinkB=np.sqrt(300)/2,
                         linewidth=2)
    ax.add_patch(cp)
plt.show()
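The np.sqrt(300) above follows from the two size conventions: scatter's s is a marker area in points squared, while plot's markersize is a width in points. A minimal sketch of that conversion, where the helper names markersize_from_scatter_s and shrink_for_marker are hypothetical and the marker edge width is ignored:

```python
import numpy as np

def markersize_from_scatter_s(s):
    """Convert a scatter `s` (area, points**2) into a plot `markersize` (points)."""
    return float(np.sqrt(s))

def shrink_for_marker(s):
    """Half the marker width in points: a value to pass as shrinkA / shrinkB."""
    return markersize_from_scatter_s(s) / 2.0

ms = markersize_from_scatter_s(300)   # width of the open circle, in points
shrink = shrink_for_marker(300)       # gap between marker centre and line end
print(ms, shrink)
```

If the markers use a thick edge, a slightly larger shrink value may look cleaner; that adjustment is a matter of taste rather than something the answer prescribes.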
An alternative to the accepted answer is to use a separate scatter for the markers and a separate line for the connections. Using z-order, you can ensure that the lines are drawn below the main scatter plot, and the markers are drawn above.
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5,5))
X, Y = np.meshgrid(np.arange(10), np.arange(10))
Z = X**2 + Y**2
pts = np.array([[1, 4, 7],
[7, 1, 4]])
plt.scatter(X, Y, c=Z, s=300, cmap='viridis_r', vmin=Z.min(), vmax=Z.max())
plt.scatter(X[pts[1], pts[0]], Y[pts[1], pts[0]], s=300, c=Z[pts[1], pts[0]],
marker='o', edgecolor='k', linewidth=2, zorder=20,
vmin=Z.min(), vmax=Z.max(), cmap='viridis_r')
plt.plot(X[pts[1], pts[0]], Y[pts[1], pts[0]], 'k-', linewidth=2, zorder=10)
Instead of facecolor='none', we're overlaying the circles with another set of markers. Using vmin and vmax for the two scatter plots ensures that the colors of the vertex overlays are identical to the underlying circles.
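As a possible variation, the same colour agreement can be obtained by passing one shared Normalize object to both scatter calls instead of repeating vmin and vmax. This is only a sketch; the names norm, rows, cols, base and overlay are chosen here for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

X, Y = np.meshgrid(np.arange(10), np.arange(10))
Z = X**2 + Y**2
rows = np.array([7, 1, 4])   # y indices of the highlighted points
cols = np.array([1, 4, 7])   # x indices of the highlighted points

# One normalisation shared by both scatter calls.
norm = Normalize(vmin=Z.min(), vmax=Z.max())

fig, ax = plt.subplots(figsize=(5, 5))
base = ax.scatter(X, Y, c=Z, s=300, cmap='viridis_r', norm=norm)
ax.plot(X[rows, cols], Y[rows, cols], 'k-', linewidth=2, zorder=10)
overlay = ax.scatter(X[rows, cols], Y[rows, cols], c=Z[rows, cols], s=300,
                     cmap='viridis_r', norm=norm,
                     edgecolor='k', linewidth=2, zorder=20)
```

Like the original, this relies on opaque markers; with semi-transparent symbols the overlay would no longer match the markers underneath.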
I have a matplotlib scatter plot with many markers:
plt.scatter(x_position,y_position,c=z_position,s=90, cmap=cm.bwr,linewidth=1,edgecolor='k')
Sometimes the markers overlap. I want the zorder of each to be based on the z_position of the individual marker.
Is this possible in a scatter plot, or would I have to have a separate line for each data point with its own zorder value?
Thank you.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([0,1,0,1])
y = np.array([0,0,1,1])
z = np.array([8,4,6,2])
If you now call
plt.scatter(x, y, c=z, s=1000, marker="X",
cmap=plt.cm.bwr, linewidth=1, edgecolor='k')
markers overlap:
The last marker in the arrays is drawn last, hence the one with z=2 is in front.
You can sort the arrays by z to change the order of appearance.
order = np.argsort(z)
plt.scatter(x[order], y[order], c=z[order], s=1000, marker="X",
cmap=plt.cm.bwr, linewidth=1, edgecolor='k')
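To make the effect of the sort concrete, here is a small sketch using the z values from above. The names back_to_front and front_to_back are illustrative only. Because later markers are drawn over earlier ones, ascending order puts the largest z on top and the reversed order puts the smallest z on top:

```python
import numpy as np

z = np.array([8, 4, 6, 2])

back_to_front = np.argsort(z)          # indices that sort z ascending
front_to_back = back_to_front[::-1]    # reversed: indices that sort z descending

print(z[back_to_front])   # drawing order when the largest z should end up in front
print(z[front_to_back])   # drawing order when the smallest z should end up in front
```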
I have the following matplotlib snippet:
fig, ax = plt.subplots(figsize=(6,6))
values = np.random.normal(loc=0, scale=1, size=10)
ax.plot(range(10), values, 'r^', markersize=15, alpha=0.4);
which produces
as planned.
I'd like to make the line invisible where it overlaps with the points, so that the points look joined by the line rather than lying on top of it. Is it possible to do this, either by making the line invisible where they overlap or by creating a new line object that simply links the points rather than tracing through them?
To be explicit, I do not want the entire line removed, just the sections that overlap with the points.
It is in general hard to let the lines stop at the edges of the markers. The reason is that lines are defined in data coordinates, while the markers are defined in points.
A workaround would be to hide the lines where the markers are. We may think of a three layer system. The lowest layer (zorder=1) contains the lines, just as they are. The layer above contains markers of the same shape and size as those which are to be shown. Yet they would be colored in the same color as the background (usually white). The topmost layer contains the markers as desired.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
fig, ax = plt.subplots(figsize=(6,5))
def plot_hidden_lines(x,y, ax = None, ms=15, color="r",
                      marker="^", alpha=0.4,**kwargs):
    if not ax: ax=plt.gca()
    ax.scatter(x,y, c=color, s=ms**2, marker=marker, alpha=alpha, zorder=3)
    ax.scatter(x,y, c="w", s=ms**2, marker=marker, alpha=1, zorder=2)
    ax.plot(x,y, color=color, zorder=1,alpha=alpha,**kwargs)
values1 = np.random.normal(loc=0, scale=1, size=10)
values2 = np.random.normal(loc=0, scale=1, size=10)
x = np.arange(len(values1))
plot_hidden_lines(x,values1)
plot_hidden_lines(x,values2, color="indigo", ms=20, marker="s")
plt.show()
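The masking layer above is hard-coded to white. A hedged variation, assuming the axes background is a single solid colour, reads that colour from the axes with ax.get_facecolor() instead. The function name plot_masked_line is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt

def plot_masked_line(x, y, ax=None, ms=15, color="r", marker="^", alpha=0.4):
    """Three layers: line, background-coloured mask markers, visible markers."""
    ax = ax if ax is not None else plt.gca()
    mask_color = ax.get_facecolor()  # RGBA tuple of the axes background
    ax.plot(x, y, color=color, alpha=alpha, zorder=1)
    ax.scatter(x, y, color=mask_color, s=ms**2, marker=marker, alpha=1, zorder=2)
    ax.scatter(x, y, color=color, s=ms**2, marker=marker, alpha=alpha, zorder=3)

fig, ax = plt.subplots(figsize=(6, 5))
ax.set_facecolor("lightyellow")  # a non-white background, to show the mask adapts
x = np.arange(10)
plot_masked_line(x, np.sin(x), ax=ax)
```

The mask still hides anything else drawn beneath the markers (grid lines, other series), which is the same limitation as the white version.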
I think the best way to go about it is to overlay the triangles over the lines:
import matplotlib.pyplot as plt
import numpy as np
values = np.random.normal(loc=0, scale=1, size=10)
plt.plot( range(10), values, marker='^', markerfacecolor='red', markersize=15, color='red', linewidth=2)
plt.show()
The program outputs:
If you really want the see-through aspect, I suggest you somehow calculate where the lines overlap with the markers and only draw the lines in between:
import numpy as np
import matplotlib.pyplot as plt
values = np.random.normal(loc= 0, scale=1, size=10)
# some_function and whatever_other_arguments are placeholders, not real functions or variables
for i in range(9):
    start_coordinate, end_coordinate = some_function(values[i], values[i+1])
    plt.plot([i, i+1], [start_coordinate, end_coordinate], *whatever_other_arguments)
plt.scatter(range(10), values, *whatever_other_arguments)
plt.show()
The hard part here is of course calculating these coordinates (if you want to zoom in this won't work), but honestly, given the difficulty of this question, I think you won't find anything much better...
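One way the coordinate calculation could be sketched is to do the trimming in display (pixel) space and convert back to data coordinates. The helper shortened_segment and its gap_points parameter are hypothetical names, and the result is only valid for the axes limits, figure size and dpi in effect when it is called, so it does not survive zooming, as noted above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt

def shortened_segment(ax, p0, p1, gap_points):
    """Trim the segment p0 -> p1 by gap_points (in points) at both ends.

    Returns a (2, 2) array of data coordinates, or None if the markers
    are so close together that nothing of the segment remains.
    """
    gap_px = gap_points * ax.figure.dpi / 72.0     # points -> pixels
    a, b = ax.transData.transform([p0, p1])        # data -> pixels
    d = b - a
    length = np.hypot(d[0], d[1])
    if length <= 2 * gap_px:
        return None
    u = d / length                                 # unit vector along the segment
    trimmed_px = np.array([a + gap_px * u, b - gap_px * u])
    return ax.transData.inverted().transform(trimmed_px)  # pixels -> data

rng = np.random.default_rng(0)
values = rng.normal(loc=0, scale=1, size=10)
x = np.arange(10)

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(x, values, 'r^', markersize=15, alpha=0.4)
fig.canvas.draw()  # make sure the autoscaled limits are applied before transforming

for i in range(len(x) - 1):
    seg = shortened_segment(ax, (x[i], values[i]), (x[i + 1], values[i + 1]),
                            gap_points=15 / 2)
    if seg is not None:
        # scalex/scaley=False keeps the limits fixed, so the trimming stays valid
        ax.plot(seg[:, 0], seg[:, 1], color='r', alpha=0.4,
                scalex=False, scaley=False)
```

The gap here treats the marker as a circle of radius markersize/2, which is only an approximation for a triangle.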
I'm trying to include an outline to lines plotted with plt.errorbar(). As suggested by "Can I give a border (outline) to a line in matplotlib plot function?", I tried to use path_effects; however, I need a different path_effect for the markers and the line.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe
x = np.arange(20)
y = x**1.3
# Two lines.
fig, ax = plt.subplots()
ax.errorbar(x, y, fmt='-o', lw=5, mew=3, ms=5, c='k')
ax.errorbar(x, y, fmt='-o', lw=2, mew=0, ms=5, c='r')
# Single line with path_effects.
y += 10
ax.errorbar(x, y, fmt='-o', lw=2, mew=0, ms=5, c='b',
path_effects=[pe.Stroke(linewidth=5, foreground='k'), pe.Normal()])
which produces the following output:
The difference between these methods is that in the former, the outline appears as a constant width around both the line and the marker, while in the one using path_effects, the outline is thicker around the markers. Is there a way to adjust the outline linewidth for the marker and the line separately?
Here's some code that makes a scatter plot of a number of different series using matplotlib and then adds the line y=x:
import numpy as np, matplotlib.pyplot as plt, matplotlib.cm as cm, pylab
nseries = 10
colors = cm.rainbow(np.linspace(0, 1, nseries))
all_x = []
all_y = []
for i in range(nseries):
    x = np.random.random(12)+i/10.0
    y = np.random.random(12)+i/5.0
    plt.scatter(x, y, color=colors[i])
    all_x.extend(x)
    all_y.extend(y)
# Could I somehow do the next part (add identity_line) if I haven't been keeping track of all the x and y values I've seen?
identity_line = np.linspace(max(min(all_x), min(all_y)),
min(max(all_x), max(all_y)))
plt.plot(identity_line, identity_line, color="black", linestyle="dashed", linewidth=3.0)
plt.show()
In order to achieve this I've had to keep track of all the x and y values that went into the scatter plot so that I know where identity_line should start and end. Is there a way I can get y=x to show up even if I don't have a list of all the points that I plotted? I would think that something in matplotlib can give me a list of all the points after the fact, but I haven't been able to figure out how to get that list.
You don't need to know anything about your data per se. You can get away with what your matplotlib Axes object will tell you about the data.
See below:
import numpy as np
import matplotlib.pyplot as plt
# random data
N = 37
x = np.random.normal(loc=3.5, scale=1.25, size=N)
y = np.random.normal(loc=3.4, scale=1.5, size=N)
c = x**2 + y**2
# now sort it just to make it look like it's related
x.sort()
y.sort()
fig, ax = plt.subplots()
ax.scatter(x, y, s=25, c=c, cmap=plt.cm.coolwarm, zorder=10)
Here's the good part:
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against each other
ax.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax.set_aspect('equal')
ax.set_xlim(lims)
ax.set_ylim(lims)
fig.savefig('/Users/paul/Desktop/so.png', dpi=300)
Et voilà
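If the plotted points themselves are wanted after the fact, as the question asks, they can also be read back from the Axes. A minimal sketch, assuming every series was added with scatter (each call creates one collection whose get_offsets() holds its x, y pairs); the variable names are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, ax = plt.subplots()
for i in range(3):
    ax.scatter(rng.random(12) + i / 10.0, rng.random(12) + i / 5.0)

# Collect the (x, y) pairs of every scatter collection on the axes.
points = np.vstack([np.asarray(coll.get_offsets()) for coll in ax.collections])

# Same start and end rule as in the question, but without manual bookkeeping.
lo = max(points[:, 0].min(), points[:, 1].min())
hi = min(points[:, 0].max(), points[:, 1].max())
ax.plot([lo, hi], [lo, hi], color="black", linestyle="dashed", linewidth=3.0)
```

Data drawn with ax.plot would live in ax.lines instead and would need get_xdata() and get_ydata(); that case is not handled here.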
In one line:
ax.plot([0,1],[0,1], transform=ax.transAxes)
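No need to modify the xlim or ylim. Note that this draws the diagonal of the axes rectangle, which coincides with y=x only when the x and y limits are identical.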
Starting with matplotlib 3.3 this has been made very simple with the axline method which only needs a point and a slope. To plot x=y:
ax.axline((0, 0), slope=1)
You don't need to look at your data to use this because the point you specify (i.e. here (0,0)) doesn't actually need to be in your data or plotting range.
If you set scalex and scaley to False, it saves a bit of bookkeeping. This is what I have been using lately to overlay y=x:
xpoints = ypoints = plt.xlim()
plt.plot(xpoints, ypoints, linestyle='--', color='k', lw=3, scalex=False, scaley=False)
or if you've got an axis:
xpoints = ypoints = ax.get_xlim()
ax.plot(xpoints, ypoints, linestyle='--', color='k', lw=3, scalex=False, scaley=False)
Of course, this won't give you a square aspect ratio. If you care about that, go with Paul H's solution.
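As a quick check of the scalex/scaley behaviour, the following sketch records the limits before and after adding the line. The data and the variable names are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
fig, ax = plt.subplots()
ax.scatter(rng.random(20), 2 * rng.random(20))

xlim_before = ax.get_xlim()
ylim_before = ax.get_ylim()

# The line is added without asking the axes to rescale in x or y.
ax.plot(xlim_before, xlim_before, linestyle='--', color='k', lw=3,
        scalex=False, scaley=False)

xlim_after = ax.get_xlim()
ylim_after = ax.get_ylim()
print(xlim_before == xlim_after, ylim_before == ylim_after)
```

Because the line spans only the x limits, it may stop short of, or be clipped by, the plotted area when the y range differs from the x range.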