scatter plot with aligned annotations at each data point - python

I want to produce a scatter plot with dozens of points, which could potentially be very close to each other. I've tried the method of annotation from the answer to the question:
>> matplotlib scatter plot with different text at each data point
but you can see that the labels / annotations overlap when the points are close enough to each other. Is there any library or method to generate such plots with individual annotations that don't collide with each other, nor with borders of the plot?
import matplotlib.pyplot as plt
z = [0.15, 0.3, 0.45, 0.46, 0.6, 0.75]
y = [2.56422, 3.77284, 3.52623, 3.52623, 3.51468, 3.02199]
n = [58, 651, 393, "393(2)", 203, 123]
fig, ax = plt.subplots()
ax.scatter( z, y )
for i, txt in enumerate( n ):
    ax.annotate( txt, ( z[i] + .01, y[i] + .01 ) )
plt.show()

I have written a library adjustText which does exactly this. https://github.com/Phlya/adjustText
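import matplotlib.pyplot as plt
from adjustText import adjust_text  # pip install adjustText
z = [0.15, 0.3, 0.45, 0.46, 0.6, 0.75]
y = [2.56422, 3.77284, 3.52623, 3.52623, 3.51468, 3.02199]
n = [58, 651, 393, "393(2)", 203, 123]
fig, ax = plt.subplots()
ax.scatter( z, y )
texts = []
for i, txt in enumerate( n ):
    # place each label on its point first; adjust_text moves them apart afterwards
    texts.append(ax.text(z[i], y[i], txt))
adjust_text(texts)
plt.show()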
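If installing adjustText is not an option, a much cruder fallback is to stagger the label offsets yourself. The sketch below is not what adjustText does internally. `stagger_offsets` and its `min_dx`, `min_dy` and `step` parameters are hypothetical names chosen for illustration, and the thresholds are guesses tuned to the example data.

```python
def stagger_offsets(xs, ys, min_dx=0.05, min_dy=0.1, step=12):
    """Return a vertical label offset (in points) for each data point.

    Hypothetical helper: a point that lies within (min_dx, min_dy) of an
    earlier point gets its label pushed up by one extra `step` per neighbour.
    """
    offsets = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        # count earlier points that are close enough to collide with this label
        crowding = sum(
            1
            for j in range(i)
            if abs(x - xs[j]) < min_dx and abs(y - ys[j]) < min_dy
        )
        offsets.append(step * crowding)
    return offsets


z = [0.15, 0.3, 0.45, 0.46, 0.6, 0.75]
y = [2.56422, 3.77284, 3.52623, 3.52623, 3.51468, 3.02199]

# only the fourth point sits next to an earlier one, so only it is shifted
offsets = stagger_offsets(z, y)
```

Each offset can then be passed to `ax.annotate(txt, (z[i], y[i]), xytext=(5, 5 + offsets[i]), textcoords="offset points")`, so only the crowded label moves. This does nothing about the plot borders, which adjustText does take into account.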

Related

How do I smooth out the edges of a closed line similar to d3's curveCardinal method implementation?

I have a few data points that I am connecting using a closed line plot, and I want the line to have smooth edges similar to how the curveCardinal methods in d3 do it.
Here's a minimal example of what I want to do:
import numpy as np
from matplotlib import pyplot as plt
x = np.array([0.5, 0.13, 0.4, 0.5, 0.6, 0.7, 0.5])
y = np.array([1.0, 0.7, 0.5, 0.2, 0.4, 0.6, 1.0])
fig, ax = plt.subplots()
ax.plot(x, y)
ax.scatter(x, y)
Now, I'd like to smooth out/interpolate the line similar to d3's curveCardinal methods. Here are a few things that I've tried.
from scipy import interpolate
tck, u = interpolate.splprep([x, y], s=0, per=True)
xi, yi = interpolate.splev(np.linspace(0, 1, 100), tck)
fig, ax = plt.subplots(1, 1)
ax.plot(xi, yi, '-b')
ax.plot(x, y, 'k')
ax.scatter(x[:2], y[:2], s=200)
ax.scatter(x, y)
The result of the above code is not bad, but I was hoping that the curve would stay closer to the line when the data points are far apart (I increased the size of two such data points above to highlight this). Essentially, have the curve stay close to the line.
Using interp1d (has the same problem as the code above):
from scipy.interpolate import interp1d
x = [0.5, 0.13, 0.4, 0.5, 0.6, 0.7, 0.5]
y = [1.0, 0.7, 0.5, 0.2, 0.4, 0.6, 1.0]
orig_len = len(x)
x = x[-3:-1] + x + x[1:3]
y = y[-3:-1] + y + y[1:3]
t = np.arange(len(x))
ti = np.linspace(2, orig_len + 1, 10 * orig_len)
kind='cubic'
xi = interp1d(t, x, kind=kind)(ti)
yi = interp1d(t, y, kind=kind)(ti)
fig, ax = plt.subplots()
ax.plot(xi, yi, 'g')
ax.plot(x, y, 'k')
ax.scatter(x, y)
I also looked at the Chaikins Corner Cutting algorithm, but I don't like the result.
def chaikins_corner_cutting(coords, refinements=5):
    coords = np.array(coords)
    for _ in range(refinements):
        L = coords.repeat(2, axis=0)
        R = np.empty_like(L)
        R[0] = L[0]
        R[2::2] = L[1:-1:2]
        R[1:-1:2] = L[2::2]
        R[-1] = L[-1]
        coords = L * 0.75 + R * 0.25
    return coords
fig, ax = plt.subplots()
ax.plot(x, y, 'k', linewidth=1)
ax.plot(chaikins_corner_cutting(x, 4), chaikins_corner_cutting(y, 4))
I also, superficially, looked at Bezier curves, matplotlibs PathPatch, and Fancy box implementations, but I couldn't get any satisfactory results.
Suggestions are greatly appreciated.
So, here's how I ended up doing it. I decided to introduce new points between every two existing data points. The following image shows how I am adding these new points. Red are data that I have. Using a convex hull I calculate the geometric center of the data points and draw lines to it from each point (shown with blue lines). Divide these lines twice in half and connect the resulting points (green line). The center of the green line is the new point added.
Here are the functions that accomplish this:
def midpoint(p1, p2, sf=1):
    """Calculate the midpoint, with an optional
    scaling-factor (sf)"""
    xm = ((p1[0]+p2[0])/2) * sf
    ym = ((p1[1]+p2[1])/2) * sf
    return (xm, ym)
def star_curv(old_x, old_y):
    """ Interpolates every point by a star-shaped curve. It does so by adding
    "fake" data points in-between every two data points, and pushes these "fake"
    points towards the center of the graph (roughly 1/4 of the way).
    """
    try:
        # one (x, y) row per data point; unlike reshape(7, 2) this keeps the
        # x/y pairs intact and works for any number of points
        points = np.column_stack([old_x, old_y])
        hull = ConvexHull(points)
        x_mid = np.mean(hull.points[hull.vertices,0])
        y_mid = np.mean(hull.points[hull.vertices,1])
    except Exception:
        # fall back to a fixed center if the hull cannot be computed
        x_mid = 0.5
        y_mid = 0.5
    c=1
    x, y = [], []
    for i, j in zip(old_x, old_y):
        x.append(i)
        y.append(j)
        try:
            # quarter-way points from each end of the edge towards the center
            xm_i, ym_i = midpoint((i, j),
                                  midpoint((i, j), (x_mid, y_mid)))
            xm_j, ym_j = midpoint((old_x[c], old_y[c]),
                                  midpoint((old_x[c], old_y[c]), (x_mid, y_mid)))
            # the new "fake" point is the middle of those two
            xm, ym = midpoint((xm_i, ym_i), (xm_j, ym_j))
            x.append(xm)
            y.append(ym)
            c += 1
        except IndexError:
            # no next point after the last one
            break
    # pad both ends so the interpolation wraps around the closed curve
    orig_len = len(x)
    x = x[-3:-1] + x + x[1:3]
    y = y[-3:-1] + y + y[1:3]
    t = np.arange(len(x))
    ti = np.linspace(2, orig_len + 1, 10 * orig_len)
    kind='quadratic'
    xi = interp1d(t, x, kind=kind)(ti)
    yi = interp1d(t, y, kind=kind)(ti)
    return xi, yi
Here's how it looks:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.spatial import ConvexHull
x = [0.5, 0.13, 0.4, 0.5, 0.6, 0.7, 0.5]
y = [1.0, 0.7, 0.5, 0.2, 0.4, 0.6, 1.0]
xi, yi = star_curv(x, y)
fig, ax = plt.subplots()
ax.plot(xi, yi, 'g')
ax.plot(x, y, 'k', alpha=0.5)
ax.scatter(x, y, color='r')
The result is especially noticeable when the data points are more symmetric, for example the following x, y values give the results in the image below:
x = [0.5, 0.32, 0.34, 0.5, 0.66, 0.65, 0.5]
y = [0.71, 0.6, 0.41, 0.3, 0.41, 0.59, 0.71]
Comparison between the interpolation presented here, with the default interp1d interpolation.
I would create another array with the vertices extended in/out or up/down by about 5%. So if a point is lower than the average of the neighbouring points, make it a bit lower still.
Then do a linear interpolation between the new points, say 10 points/edge. Finally do a spline between the second last point per edge and the actual vertex. If you use Bezier curves, you can make the spline come in at the same angle on each side.
It's a bit of work, but of course you can use this anywhere.
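For comparison, a closed cardinal spline can also be written directly in NumPy. This is a minimal sketch of the textbook cardinal (cubic Hermite) spline, not a port of d3's code, so whether it matches curveCardinal pixel for pixel is an assumption. `closed_cardinal_spline`, `tension` and `samples` are names made up for this example.

```python
import numpy as np


def closed_cardinal_spline(x, y, tension=0.5, samples=20):
    """Sample a closed cardinal spline through the points (x[i], y[i]).

    Assumes the first point is NOT repeated at the end. tension=0 gives a
    Catmull-Rom spline; tension=1 makes every segment a straight line.
    """
    p = np.column_stack([x, y]).astype(float)
    # tangent at each point, from its two neighbours (wrapping around)
    m = (1.0 - tension) * (np.roll(p, -1, axis=0) - np.roll(p, 1, axis=0)) / 2.0

    t = np.linspace(0.0, 1.0, samples, endpoint=False)[:, None]
    # cubic Hermite basis functions
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2

    segments = []
    for i in range(len(p)):
        j = (i + 1) % len(p)  # next point, wrapping to the first
        segments.append(h00 * p[i] + h10 * m[i] + h01 * p[j] + h11 * m[j])
    curve = np.vstack(segments + [p[:1]])  # repeat the start to close the loop
    return curve[:, 0], curve[:, 1]


x = [0.5, 0.13, 0.4, 0.5, 0.6, 0.7, 0.5]
y = [1.0, 0.7, 0.5, 0.2, 0.4, 0.6, 1.0]

# the question's data repeats the first point at the end, so drop it
xi, yi = closed_cardinal_spline(x[:-1], y[:-1], tension=0.5)
```

Plotting is then just `ax.plot(xi, yi)`. Raising `tension` towards 1 pulls the curve onto the straight edges, which is the "stay close to the line" behaviour asked for; lowering it gives rounder corners.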

Place multiple plots into one big axis at specific coordinates

I am trying to put multiple matplotlib subplots into a big axis, where tick labels on the big axis correspond to some parameter values for which the data in each subplot has been obtained. Here's an example,
import matplotlib.pyplot as plt
data = {}
data[(10, 10)] = [0.45, 0.30, 0.25]
data[(10, 20)] = [0.2, 0.5, 0.3]
data[(20, 10)] = [0.1, 0.3, 0.6]
data[(20, 20)] = [0.6, 0.15, 0.25]
data[(30, 10)] = [0.4, 0.35, 0.25]
data[(30, 20)] = [0.5, 0.1, 0.4]
# x and y coordinates for the big plot
x_coords = sorted(set(k[0] for k in data.keys()))  # sorted: set order is not guaranteed
y_coords = sorted(set(k[1] for k in data.keys()))
labels = ['Frogs', 'Hogs', 'Dogs']
explode = (0.05, 0.05, 0.05) #
colors = ['gold', 'beige', 'lightcoral']
fig, axes = plt.subplots(len(y_coords), len(x_coords))
for row_topToDown in range(len(y_coords)):
    row = (len(y_coords)-1) - row_topToDown
    for col in range(len(x_coords)):
        axes[row][col].pie(data[(x_coords[col], y_coords[row_topToDown])], explode=explode, colors = colors, \
                           autopct=None, pctdistance = 1.4, \
                           shadow=True, startangle=90, radius=0.7, \
                           wedgeprops = {'linewidth':1, 'edgecolor':'Black'}
                           )
        axes[row][col].axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
        axes[row][col].set_title('(' + str(x_coords[col]) + ', ' + str(y_coords[row_topToDown]) + ')')
fig.tight_layout()
plt.show()
and here's how I'd like the output to look like:
I see two options:
A. use a single axes
You may plot all pie charts to the same axes. Use the center and radius arguments to place and scale the pies in data coordinates. This could look as follows.
import matplotlib.pyplot as plt
data = {}
data[(10, 10)] = [0.45, 0.30, 0.25]
data[(10, 20)] = [0.2, 0.5, 0.3]
data[(20, 10)] = [0.1, 0.3, 0.6]
data[(20, 20)] = [0.6, 0.15, 0.25]
data[(30, 10)] = [0.4, 0.35, 0.25]
data[(30, 20)] = [0.5, 0.1, 0.4]
labels = ['Frogs', 'Hogs', 'Dogs']
explode = [.2]*3
colors = ['gold', 'beige', 'lightcoral']
radius = 4
margin = 2
fig, ax = plt.subplots()
for x,y in data.keys():
    d = data[(x,y)]
    ax.pie(d, explode=explode, colors = colors, center=(x,y),
           shadow=True, startangle=90, radius=radius,
           wedgeprops = {'linewidth':1, 'edgecolor':'Black'})
    ax.annotate("({},{})".format(x,y), xy = (x, y+radius),
                xytext = (0,5), textcoords="offset points", ha="center")
ax.set_frame_on(True)
xaxis = list(set([x for x,y in data.keys()]))
yaxis = list(set([y for x,y in data.keys()]))
ax.set(aspect="equal",
xlim=(min(xaxis)-radius-margin,max(xaxis)+radius+margin),
ylim=(min(yaxis)-radius-margin,max(yaxis)+radius+margin),
xticks=xaxis, yticks=yaxis)
fig.tight_layout()
plt.show()
B. use inset axes
You can put each pie in its own axes and position the axes in data coordinates. This is facilitated by using mpl_toolkits.axes_grid1.inset_locator.inset_axes. The main difference to the above is that you may use a non-equal aspect of the parent axes, and that it's not possible to use tight_layout.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
data = {}
data[(10, 10)] = [0.45, 0.30, 0.25]
data[(10, 20)] = [0.2, 0.5, 0.3]
data[(20, 10)] = [0.1, 0.3, 0.6]
data[(20, 20)] = [0.6, 0.15, 0.25]
data[(30, 10)] = [0.4, 0.35, 0.25]
data[(30, 20)] = [0.5, 0.1, 0.4]
labels = ['Frogs', 'Hogs', 'Dogs']
explode = [.05]*3
colors = ['gold', 'beige', 'lightcoral']
radius = 4
margin = 2
fig, axes = plt.subplots()
for x,y in data.keys():
    d = data[(x,y)]
    ax = inset_axes(axes, "100%", "100%",
                    bbox_to_anchor=(x-radius, y-radius, radius*2, radius*2),
                    bbox_transform=axes.transData, loc="center")
    ax.pie(d, explode=explode, colors = colors,
           shadow=True, startangle=90,
           wedgeprops = {'linewidth':1, 'edgecolor':'Black'})
    ax.set_title("({},{})".format(x,y))
xaxis = list(set([x for x,y in data.keys()]))
yaxis = list(set([y for x,y in data.keys()]))
axes.set(aspect="equal",
xlim=(min(xaxis)-radius-margin,max(xaxis)+radius+margin),
ylim=(min(yaxis)-radius-margin,max(yaxis)+radius+margin),
xticks=xaxis, yticks=yaxis)
plt.show()
For how to put a legend outside the plot, I would refer you to How to put the legend out of the plot. And for how to create a legend for a pie chart to How to add a legend to matplotlib pie chart?
Also Python - Legend overlaps with the pie chart may be of interest.
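Both snippets above define `labels` but never use it. One way to put it to work is a single shared legend built from proxy patches instead of one legend per pie. This is only a sketch under that assumption; the placement values in `bbox_to_anchor` are arbitrary and will need adjusting for your figure.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

labels = ['Frogs', 'Hogs', 'Dogs']
colors = ['gold', 'beige', 'lightcoral']

fig, ax = plt.subplots()

# one proxy patch per category, reusing the wedge colors of the pies
handles = [
    Patch(facecolor=color, edgecolor='black', label=label)
    for color, label in zip(colors, labels)
]

# anchor the legend just outside the right edge of the parent axes
legend = ax.legend(handles=handles, loc='center left',
                   bbox_to_anchor=(1.02, 0.5))
```

Because the legend is attached to the parent axes, it works the same way for option A and option B. Call `plt.show()` or `fig.savefig(...)` afterwards to see the result.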

Drawing heat map in python

I have two lists, x and y, representing coordinates in 2D. For example x = [1,4,0.5,2,5,10,33,0.04] and y = [2,5,44,0.33,2,14,20,0.03]. x[i] and y[i] together represent one point in 2D. I also have a list of "heat" values, one for each (x,y) point, for example z = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8,0.95]. Of course the real x, y and z are much longer than in this example.
Now I would like to plot a heat map in 2D where x and y represents the axis coordinates and z represents the color. How can this be done in python?
This code produces a heat map. With a few more data points, the plot starts looking pretty nice and I've found it to be very quick in general even for >100k points.
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np
import math
x = [1,4,0.5,2,5,10,33,0.04]
y = [2,5,44,0.33,2,14,20,0.03]
z = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8, 0.95]
levels = [0.7, 0.75, 0.8, 0.85, 0.9]
plt.figure()
ax = plt.gca()
ax.set_aspect('equal')
CS = ax.tricontourf(x, y, z, levels, cmap=plt.get_cmap('jet'))
cbar = plt.colorbar(CS, ticks=np.sort(np.array(levels)),ax=ax, orientation='horizontal', shrink=.75, pad=.09, aspect=40,fraction=0.05)
cbar.ax.set_xticklabels(list(map(str,np.sort(np.array(levels))))) # horizontal colorbar
cbar.ax.tick_params(labelsize=8)
plt.title('Heat Map')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.show()
Produces this image:
or if you're looking for a more gradual color change, change the tricontourf line to this:
CS = ax.tricontourf(x, y, z, np.linspace(min(levels),max(levels),256), cmap=plt.get_cmap('jet'))
and then the plot will change to:
Based on this answer, you might want to do something like:
import numpy as np
# matplotlib.mlab.griddata was removed in Matplotlib 3.1; scipy's griddata is used instead
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
xs0 = [1,4,0.5,2,5,10,33,0.04]
ys0 = [2,5,44,0.33,2,14,20,0.03]
zs0 = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8,0.95]
N = 30j
extent = (np.min(xs0),np.max(xs0),np.min(ys0),np.max(ys0))
xs,ys = np.mgrid[extent[0]:extent[1]:N, extent[2]:extent[3]:N]
# linear interpolation onto the regular grid; cells outside the data hull become NaN
resampled = griddata((xs0, ys0), zs0, (xs, ys), method='linear')
plt.imshow(np.fliplr(resampled).T, extent=extent,interpolation='none')
plt.colorbar()
The example here might also help: http://matplotlib.org/examples/pylab_examples/griddata_demo.html
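If you would rather stay inside matplotlib for the resampling step, `matplotlib.tri` can interpolate the scattered values onto a regular grid as well. A minimal sketch, assuming linear interpolation is good enough; the 50 x 60 grid size is an arbitrary choice.

```python
import numpy as np
import matplotlib.tri as tri

x = [1, 4, 0.5, 2, 5, 10, 33, 0.04]
y = [2, 5, 44, 0.33, 2, 14, 20, 0.03]
z = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8, 0.95]

# Delaunay triangulation of the scattered points
triang = tri.Triangulation(x, y)
interpolate = tri.LinearTriInterpolator(triang, z)

# regular grid covering the data range
xi = np.linspace(min(x), max(x), 50)
yi = np.linspace(min(y), max(y), 60)
Xi, Yi = np.meshgrid(xi, yi)

# masked array: grid cells outside the triangulation are masked out
Zi = interpolate(Xi, Yi)
```

`Zi` can then be drawn with `ax.pcolormesh(Xi, Yi, Zi)` or `ax.imshow(Zi, origin="lower", extent=(xi[0], xi[-1], yi[0], yi[-1]))`. Masked cells are left blank, so the heat map only covers the area actually spanned by the data.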

Label python data points on plot

I searched for ages (hours, which feels like ages) for the answer to an annoying, seemingly basic problem. Because I can't find a question that quite fits the answer, I am posting a question and answering it myself, in the hope that it saves someone else the time I just spent on my newbie plotting skills.
If you want to label your plot points using python matplotlib:
from matplotlib import pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
A = anyarray
B = anyotherarray
plt.plot(A,B)
for i,j in zip(A,B):
    ax.annotate('%s)' %j, xy=(i,j), xytext=(30,0), textcoords='offset points')
    ax.annotate('(%s,' %i, xy=(i,j))
plt.grid()
plt.show()
xytext=(30,0) works together with textcoords='offset points': the label is drawn 30 points to the right of the data point and 0 points above it.
You need both annotate lines, one for i and one for j; otherwise only the x or only the y value is labelled.
You get something like this out (note the labels only):
It's not ideal, there is still some overlap, but it's better than nothing, which is what I had.
How about printing (x, y) at once?
from matplotlib import pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
A = -0.75, -0.25, 0, 0.25, 0.5, 0.75, 1.0
B = 0.73, 0.97, 1.0, 0.97, 0.88, 0.73, 0.54
ax.plot(A,B)
for xy in zip(A, B): # <--
    ax.annotate('(%s, %s)' % xy, xy=xy, textcoords='data') # <--
ax.grid()
plt.show()
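The two ideas above can be combined: one label per point, shifted by a few points so it does not sit on top of the line. A minimal sketch; the 5-point offset and the `:g` number format are arbitrary choices, not something the answers above prescribe.

```python
import matplotlib.pyplot as plt

A = [-0.75, -0.25, 0, 0.25, 0.5, 0.75, 1.0]
B = [0.73, 0.97, 1.0, 0.97, 0.88, 0.73, 0.54]

fig, ax = plt.subplots()
ax.plot(A, B, marker="o")

annotations = []
for x, y in zip(A, B):
    # xy is the labelled data point; xytext is read as an offset from it,
    # in points, because textcoords="offset points"
    ann = ax.annotate(f"({x:g}, {y:g})", xy=(x, y),
                      xytext=(5, 5), textcoords="offset points")
    annotations.append(ann)

ax.grid()
```

Since the offset is in points rather than data units, the gap between marker and label stays the same when you zoom or resize the figure. Finish with `plt.show()` to display it.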
I had a similar issue and ended up with this:
For me this has the advantage that data and annotation are not overlapping.
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
A = -0.75, -0.25, 0, 0.25, 0.5, 0.75, 1.0
B = 0.73, 0.97, 1.0, 0.97, 0.88, 0.73, 0.54
plt.plot(A,B)
# annotations at the side (ordered by B values)
x0,x1=ax.get_xlim()
y0,y1=ax.get_ylim()
for ii, ind in enumerate(np.argsort(B)):
    x = A[ind]
    y = B[ind]
    xPos = x1 + .02 * (x1 - x0)
    yPos = y0 + ii * (y1 - y0)/(len(B) - 1)
    ax.annotate('',#label,
                xy=(x, y), xycoords='data',
                xytext=(xPos, yPos), textcoords='data',
                arrowprops=dict(
                    connectionstyle="arc3,rad=0.",
                    shrinkA=0, shrinkB=10,
                    arrowstyle= '-|>', ls= '-', linewidth=2
                ),
                va='bottom', ha='left', zorder=19
                )
    ax.text(xPos + .01 * (x1 - x0), yPos,
            '({:.2f}, {:.2f})'.format(x,y),
            transform=ax.transData, va='center')
plt.grid()
plt.show()
Using the text argument in .annotate ended up with unfavorable text positions.
Drawing lines between a legend and the data points is a mess, as the location of the legend is hard to address.

Matplotlib Scatter plot [duplicate]

I am trying to make a scatter plot and annotate data points with different numbers from a list.
So, for example, I want to plot y vs x and annotate with corresponding numbers from n.
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
ax = fig.add_subplot(111)
ax1.scatter(z, y, fmt='o')
Any ideas?
I'm not aware of any plotting method which takes arrays or lists but you could use annotate() while iterating over the values in n.
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
ax.scatter(z, y)
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
There are a lot of formatting options for annotate(), see the matplotlib website:
In case anyone is trying to apply the above solutions to a .scatter() instead of a .subplot(),
I tried running the following code
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.scatter(z, y)
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
But ran into errors stating "cannot unpack non-iterable PathCollection object", with the error specifically pointing at codeline fig, ax = plt.scatter(z, y)
I eventually solved the error using the following code
import matplotlib.pyplot as plt
plt.scatter(z, y)
for i, txt in enumerate(n):
    plt.annotate(txt, (z[i], y[i]))
I didn't expect there to be a difference between .scatter() and .subplot()
I should have known better.
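The unpacking error comes from the return types: `plt.subplots()` returns a (figure, axes) pair, while `scatter()` returns a single `PathCollection` holding the drawn markers. A short sketch of the difference, reusing the data from the question:

```python
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()      # a (Figure, Axes) pair, safe to unpack
points = ax.scatter(z, y)     # one PathCollection, not a pair

for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
```

So `fig, ax = plt.scatter(z, y)` fails because there is nothing to unpack. Either create the axes first, as here, or stay with the `plt.` functions throughout, as in the working version above.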
In versions earlier than matplotlib 2.0, ax.scatter is not necessary to plot text without markers. In version 2.0 you'll need ax.scatter to set the proper range and markers for text.
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
# on matplotlib >= 2.0, call ax.scatter(z, y) here so the axis range covers the labels
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
And in this link you can find an example in 3d.
You may also use pyplot.text (see here).
import numpy as np
import matplotlib.pyplot as plt
def plot_embeddings(M_reduced, word2Ind, words):
    """
    Plot in a scatterplot the embeddings of the words specified in the list "words".
    Include a label next to each point.
    """
    for word in words:
        # look up the 2D coordinates of this word
        x, y = M_reduced[word2Ind[word]]
        plt.scatter(x, y, marker='x', color='red')
        plt.text(x+.03, y+.03, word, fontsize=9)
    plt.show()
M_reduced_plot_test = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1], [0, 0]])
word2Ind_plot_test = {'test1': 0, 'test2': 1, 'test3': 2, 'test4': 3, 'test5': 4}
words = ['test1', 'test2', 'test3', 'test4', 'test5']
plot_embeddings(M_reduced_plot_test, word2Ind_plot_test, words)
I would love to add that you can even use arrows /text boxes to annotate the labels. Here is what I mean:
import random
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
ax.scatter(z, y)
ax.annotate(n[0], (z[0], y[0]), xytext=(z[0]+0.05, y[0]+0.3),
arrowprops=dict(facecolor='red', shrink=0.05))
ax.annotate(n[1], (z[1], y[1]), xytext=(z[1]-0.05, y[1]-0.3),
arrowprops = dict( arrowstyle="->",
connectionstyle="angle3,angleA=0,angleB=-90"))
ax.annotate(n[2], (z[2], y[2]), xytext=(z[2]-0.05, y[2]-0.3),
arrowprops = dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1))
ax.annotate(n[3], (z[3], y[3]), xytext=(z[3]+0.05, y[3]-0.2),
arrowprops = dict(arrowstyle="fancy"))
ax.annotate(n[4], (z[4], y[4]), xytext=(z[4]-0.1, y[4]-0.2),
bbox=dict(boxstyle="round", alpha=0.1),
arrowprops = dict(arrowstyle="simple"))
plt.show()
Which will generate the following graph:
For a limited set of values, matplotlib is fine. But when you have lots of values, the labels start to overlap other data points, and with limited space you can't simply drop them. In that case it's better to use an interactive plot that you can zoom in and out of.
Using plotly
import plotly.express as px
import pandas as pd
df = px.data.tips()
df = px.data.gapminder().query("year==2007 and continent=='Americas'")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", text="country", log_x=True, size_max=100, color="lifeExp")
fig.update_traces(textposition='top center')
fig.update_layout(title_text='Life Expectancy', title_x=0.5)
fig.show()
Python 3.6+:
coordinates = [('a',1,2), ('b',3,4), ('c',5,6)]
for x in coordinates: plt.annotate(x[0], (x[1], x[2]))
This might be useful when you need to annotate points individually at different times (that is, not in a single for loop):
ax = plt.gca()
ax.annotate('your_label', (x,y))
where x and y are your target coordinates, of type float or int.
As a one liner using list comprehension and numpy:
[ax.annotate(x[0], (x[1], x[2])) for x in np.array([n,z,y]).T]
The setup is the same as in Rutger's answer.
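One thing to watch with the NumPy one-liner: stacking `n`, `z` and `y` into one array forces a single dtype, so the integer labels come back as floats (58 becomes 58.0). A small sketch of the effect, with plain `zip` as an alternative that keeps the labels untouched:

```python
import numpy as np

n = [58, 651, 393, 203, 123]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]

# one array, one dtype: the int labels are converted to float
stacked = np.array([n, z, y]).T

# zip keeps each label exactly as it was in n
rows = list(zip(n, z, y))
```

With `rows`, the loop becomes `for label, xv, yv in rows: ax.annotate(label, (xv, yv))`, which also works for string labels such as "393(2)" that would not survive the float conversion.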
