I am trying to plot a bivariate gaussian distribution using matplotlib. I want to do this using the xy coordinates of two scatter points (Group A), (Group B).
I want to adjust the distribution by adjusting the COV matrix to account for each Groups velocity and their distance to an additional xy coordinate used as a reference point.
I've calculated the distance of each groups xy coordinate to that of the reference point. The distance is expressed as a radius, labelled [GrA_Rad],[GrB_Rad].
So the further they are away from the reference point the greater the radius. I've also calculated velocity labelled [GrA_Vel],[GrB_Vel]. The direction of each group is expressed as the orientation. This is labelled [GrA_Rotation],[GrB_Rotation]
Question on how I want the distribution to be adjusted for velocity and distance (radius):
I'm hoping to use SVD. Specifically, if I have the rotation angle of each scatter, this provides the direction. The velocity can be used to describe a scaling matrix [GrA_Scaling],[GrB_Scaling]. So this scaling matrix can be used to expand the radius in the x-direction and contract the radius in the y-direction. This expresses the COV matrix.
Finally, the distribution mean value is found by translating the groups location (x,y) by half the velocity.
Put simply: the radius is applied to each group's scatter point. The COV matrix is adjusted by the radius and velocity. So using the scaling matrix to expand the radius in x-direction and contract in y-direction. The direction is measured from the rotation angle. Then determine the distribution mean value by translating the groups location (x,y) by half the velocity.
Below is the df of these variables
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as animation
d = ({
'Time' : [1,2,3,4,5,6,7,8],
'GrA_X' : [10,12,17,16,16,14,12,8],
'GrA_Y' : [10,12,13,7,6,7,8,8],
'GrB_X' : [5,8,13,16,19,15,13,5],
'GrB_Y' : [6,15,12,7,8,9,10,8],
'Reference_X' : [6,8,14,18,13,11,16,15],
'Reference_Y' : [10,12,8,12,15,12,10,8],
'GrA_Rad' : [8.3,8.25,8.2,8,8.15,8.15,8.2,8.3],
'GrB_Rad' : [8.3,8.25,8.3,8.4,8.6,8.4,8.3,8.65],
'GrA_Vel' : [0,2.8,5.1,6.1,1.0,2.2,2.2,4.0],
'GrB_Vel' : [0,9.5,5.8,5.8,3.16,4.12,2.2,8.2],
'GrA_Scaling' : [0,0.22,0.39,0.47,0.07,0.17,0.17,0.31],
'GrB_Scaling' : [0,0.53,0.2,0.2,0.06,0.1,0.03,0.4],
'GrA_Rotation' : [0,45,23.2,-26.56,-33.69,-36.86,-45,-135],
'GrB_Rotation' : [0,71.6,36.87,5.2,8.13,16.70,26.57,90],
})
df = pd.DataFrame(data = d)
I've made an animated plot of each xy coordinate.
GrA_X = [10,12,17,16,16,14,12,8]
GrA_Y = [10,12,13,7,6,7,8,8]
GrB_X = [5,8,13,16,19,15,13,5]
GrB_Y = [6,15,12,10,8,9,10,8]
Item_X = [6,8,14,18,13,11,16,15]
Item_Y = [10,12,8,12,15,12,10,8]
scatter_GrA = ax.scatter(GrA_X, GrA_Y)
scatter_GrB = ax.scatter(GrB_X, GrB_Y)
scatter_Item = ax.scatter(Item_X, Item_Y)
def animate(i) :
scatter_GrA.set_offsets([[GrA_X[0+i], GrA_Y[0+i]]])
scatter_GrB.set_offsets([[GrB_X[0+i], GrB_Y[0+i]]])
scatter_Item.set_offsets([[Item_X[0+i], Item_Y[0+i]]])
ani = animation.FuncAnimation(fig, animate, np.arange(0,9),
interval = 1000, blit = False)
Update
The question has been updated, and has gotten somewhat clearer. I've updated my code to match. Here's the latest output:
Aside from the styling, I think this matches what the OP described.
Here's the code that was used to produce the above plot:
dfake = ({
'GrA_X' : [15,15],
'GrA_Y' : [15,15],
'Reference_X' : [15,3],
'Reference_Y' : [15,15],
'GrA_Rad' : [15,25],
'GrA_Vel' : [0,10],
'GrA_Scaling' : [0,0.5],
'GrA_Rotation' : [0,45]
})
dffake = pd.DataFrame(dfake)
fig,axs = plt.subplots(1, 2, figsize=(16,8))
fig.subplots_adjust(0,0,1,1)
plotone(dffake, 'A', 0, xlim=(0,30), ylim=(0,30), fig=fig, ax=axs[0])
plotone(dffake, 'A', 1, xlim=(0,30), ylim=(0,30), fig=fig, ax=axs[1])
plt.show()
and the complete implementation of the plotone function that I used is in the code block below. If you just want to know about the math used to generate and transform the 2D gaussian PDF, check out the mvpdf function (and the rot and getcov functions it depends on):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts
def rot(theta):
theta = np.deg2rad(theta)
return np.array([
[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]
])
def getcov(radius=1, scale=1, theta=0):
cov = np.array([
[radius*(scale + 1), 0],
[0, radius/(scale + 1)]
])
r = rot(theta)
return r # cov # r.T
def mvpdf(x, y, xlim, ylim, radius=1, velocity=0, scale=0, theta=0):
"""Creates a grid of data that represents the PDF of a multivariate gaussian.
x, y: The center of the returned PDF
(xy)lim: The extent of the returned PDF
radius: The PDF will be dilated by this factor
scale: The PDF be stretched by a factor of (scale + 1) in the x direction, and squashed by a factor of 1/(scale + 1) in the y direction
theta: The PDF will be rotated by this many degrees
returns: X, Y, PDF. X and Y hold the coordinates of the PDF.
"""
# create the coordinate grids
X,Y = np.meshgrid(np.linspace(*xlim), np.linspace(*ylim))
# stack them into the format expected by the multivariate pdf
XY = np.stack([X, Y], 2)
# displace xy by half the velocity
x,y = rot(theta) # (velocity/2, 0) + (x, y)
# get the covariance matrix with the appropriate transforms
cov = getcov(radius=radius, scale=scale, theta=theta)
# generate the data grid that represents the PDF
PDF = sts.multivariate_normal([x, y], cov).pdf(XY)
return X, Y, PDF
def plotmv(x, y, xlim=None, ylim=None, radius=1, velocity=0, scale=0, theta=0, xref=None, yref=None, fig=None, ax=None):
"""Plot an xy point with an appropriately tranformed 2D gaussian around it.
Also plots other related data like the reference point.
"""
if xlim is None: xlim = (x - 5, x + 5)
if ylim is None: ylim = (y - 5, y + 5)
if fig is None:
fig = plt.figure(figsize=(8,8))
ax = fig.gca()
elif ax is None:
ax = fig.gca()
# plot the xy point
ax.plot(x, y, '.', c='C0', ms=20)
if not (xref is None or yref is None):
# plot the reference point, if supplied
ax.plot(xref, yref, '.', c='w', ms=12)
# plot the arrow leading from the xy point
if velocity > 0:
ax.arrow(x, y, *rot(theta) # (velocity, 0),
width=.4, length_includes_head=True, ec='C0', fc='C0')
# fetch the PDF of the 2D gaussian
X, Y, PDF = mvpdf(x, y, xlim=xlim, ylim=ylim, radius=radius, velocity=velocity, scale=scale, theta=theta)
# normalize PDF by shifting and scaling, so that the smallest value is 0 and the largest is 1
normPDF = PDF - PDF.min()
normPDF = normPDF/normPDF.max()
# plot and label the contour lines of the 2D gaussian
cs = ax.contour(X, Y, normPDF, levels=6, colors='w', alpha=.5)
ax.clabel(cs, fmt='%.3f', fontsize=12)
# plot the filled contours of the 2D gaussian. Set levels high for smooth contours
cfs = ax.contourf(X, Y, normPDF, levels=50, cmap='viridis', vmin=-.9, vmax=1)
# create the colorbar and ensure that it goes from 0 -> 1
cbar = fig.colorbar(cfs, ax=ax)
cbar.set_ticks([0, .2, .4, .6, .8, 1])
# add some labels
ax.grid()
ax.set_xlabel('X distance (M)')
ax.set_ylabel('Y distance (M)')
# ensure that x vs y scaling doesn't disrupt the transforms applied to the 2D gaussian
ax.set_aspect('equal', 'box')
return fig, ax
def fetchone(df, l, i, **kwargs):
"""Fetch all the needed data for one xy point
"""
keytups = (
('x', 'Gr%s_X'%l),
('y', 'Gr%s_Y'%l),
('radius', 'Gr%s_Rad'%l),
('velocity', 'Gr%s_Vel'%l),
('scale', 'Gr%s_Scaling'%l),
('theta', 'Gr%s_Rotation'%l),
('xref', 'Reference_X'),
('yref', 'Reference_Y')
)
ret = {k:df.loc[i, l] for k,l in keytups}
# add in any overrides
ret.update(kwargs)
return ret
def plotone(df, l, i, xlim=None, ylim=None, fig=None, ax=None, **kwargs):
"""Plot exactly one point from the dataset
"""
# look up all the data to plot one datapoint
xydata = fetchone(df, l, i, **kwargs)
# do the plot
return plotmv(xlim=xlim, ylim=ylim, fig=fig, ax=ax, **xydata)
Old answer -2
I've adjusted my answer to match the example the OP posted:
Here's the code that produced the above image:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts
def rot(theta):
theta = np.deg2rad(theta)
return np.array([
[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]
])
def getcov(radius=1, scale=1, theta=0):
cov = np.array([
[radius*(scale + 1), 0],
[0, radius/(scale + 1)]
])
r = rot(theta)
return r # cov # r.T
def datalimits(*data, pad=.15):
dmin,dmax = min(d.min() for d in data), max(d.max() for d in data)
spad = pad*(dmax - dmin)
return dmin - spad, dmax + spad
d = ({
'Time' : [1,2,3,4,5,6,7,8],
'GrA_X' : [10,12,17,16,16,14,12,8],
'GrA_Y' : [10,12,13,7,6,7,8,8],
'GrB_X' : [5,8,13,16,19,15,13,5],
'GrB_Y' : [6,15,12,7,8,9,10,8],
'Reference_X' : [6,8,14,18,13,11,16,15],
'Reference_Y' : [10,12,8,12,15,12,10,8],
'GrA_Rad' : [8.3,8.25,8.2,8,8.15,8.15,8.2,8.3],
'GrB_Rad' : [8.3,8.25,8.3,8.4,8.6,8.4,8.3,8.65],
'GrA_Vel' : [0,2.8,5.1,6.1,1.0,2.2,2.2,4.0],
'GrB_Vel' : [0,9.5,5.8,5.8,3.16,4.12,2.2,8.2],
'GrA_Scaling' : [0,0.22,0.39,0.47,0.07,0.17,0.17,0.31],
'GrB_Scaling' : [0,0.53,0.2,0.2,0.06,0.1,0.03,0.4],
'GrA_Rotation' : [0,45,23.2,-26.56,-33.69,-36.86,-45,-135],
'GrB_Rotation' : [0,71.6,36.87,5.2,8.13,16.70,26.57,90],
})
df = pd.DataFrame(data=d)
limitpad = .5
clevels = 5
cflevels = 50
xmin,xmax = datalimits(df['GrA_X'], df['GrB_X'], pad=limitpad)
ymin,ymax = datalimits(df['GrA_Y'], df['GrB_Y'], pad=limitpad)
X,Y = np.meshgrid(np.linspace(xmin, xmax), np.linspace(ymin, ymax))
fig = plt.figure(figsize=(10,6))
ax = plt.gca()
Zs = []
for l,color in zip('AB', ('red', 'yellow')):
# plot all of the points from a single group
ax.plot(df['Gr%s_X'%l], df['Gr%s_Y'%l], '.', c=color, ms=15, label=l)
Zrows = []
for _,row in df.iterrows():
x,y = row['Gr%s_X'%l], row['Gr%s_Y'%l]
cov = getcov(radius=row['Gr%s_Rad'%l], scale=row['Gr%s_Scaling'%l], theta=row['Gr%s_Rotation'%l])
mnorm = sts.multivariate_normal([x, y], cov)
Z = mnorm.pdf(np.stack([X, Y], 2))
Zrows.append(Z)
Zs.append(np.sum(Zrows, axis=0))
# plot the reference points
# create Z from the difference of the sums of the 2D Gaussians from group A and group B
Z = Zs[0] - Zs[1]
# normalize Z by shifting and scaling, so that the smallest value is 0 and the largest is 1
normZ = Z - Z.min()
normZ = normZ/normZ.max()
# plot and label the contour lines
cs = ax.contour(X, Y, normZ, levels=clevels, colors='w', alpha=.5)
ax.clabel(cs, fmt='%2.1f', colors='w')#, fontsize=14)
# plot the filled contours. Set levels high for smooth contours
cfs = ax.contourf(X, Y, normZ, levels=cflevels, cmap='viridis', vmin=0, vmax=1)
# create the colorbar and ensure that it goes from 0 -> 1
cbar = fig.colorbar(cfs, ax=ax)
cbar.set_ticks([0, .2, .4, .6, .8, 1])
ax.set_aspect('equal', 'box')
Old answer -1
It's a little hard to tell exactly what you're after. It is possible to scale and rotate a multivariate gaussian distribution via its covariance matrix. Here's an example of how to do so based on your data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts
def rot(theta):
theta = np.deg2rad(theta)
return np.array([
[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]
])
def getcov(scale, theta):
cov = np.array([
[1*(scale + 1), 0],
[0, 1/(scale + 1)]
])
r = rot(theta)
return r # cov # r.T
d = ({
'Time' : [1,2,3,4,5,6,7,8],
'GrA_X' : [10,12,17,16,16,14,12,8],
'GrA_Y' : [10,12,13,7,6,7,8,8],
'GrB_X' : [5,8,13,16,19,15,13,5],
'GrB_Y' : [6,15,12,7,8,9,10,8],
'Reference_X' : [6,8,14,18,13,11,16,15],
'Reference_Y' : [10,12,8,12,15,12,10,8],
'GrA_Rad' : [8.3,8.25,8.2,8,8.15,8.15,8.2,8.3],
'GrB_Rad' : [8.3,8.25,8.3,8.4,8.6,8.4,8.3,8.65],
'GrA_Vel' : [0,2.8,5.1,6.1,1.0,2.2,2.2,4.0],
'GrB_Vel' : [0,9.5,5.8,5.8,3.16,4.12,2.2,8.2],
'GrA_Scaling' : [0,0.22,0.39,0.47,0.07,0.17,0.17,0.31],
'GrB_Scaling' : [0,0.53,0.2,0.2,0.06,0.1,0.03,0.4],
'GrA_Rotation' : [0,45,23.2,-26.56,-33.69,-36.86,-45,-135],
'GrB_Rotation' : [0,71.6,36.87,5.2,8.13,16.70,26.57,90],
})
df = pd.DataFrame(data=d)
xmin,xmax = min(df['GrA_X'].min(), df['GrB_X'].min()), max(df['GrA_X'].max(), df['GrB_X'].max())
ymin,ymax = min(df['GrA_Y'].min(), df['GrB_Y'].min()), max(df['GrA_Y'].max(), df['GrB_Y'].max())
X,Y = np.meshgrid(
np.linspace(xmin - (xmax - xmin)*.1, xmax + (xmax - xmin)*.1),
np.linspace(ymin - (ymax - ymin)*.1, ymax + (ymax - ymin)*.1)
)
fig,axs = plt.subplots(df.shape[0], sharex=True, figsize=(4, 4*df.shape[0]))
fig.subplots_adjust(0,0,1,1,0,-.82)
for (_,row),ax in zip(df.iterrows(), axs):
for c in 'AB':
x,y = row['Gr%s_X'%c], row['Gr%s_Y'%c]
cov = getcov(scale=row['Gr%s_Scaling'%c], theta=row['Gr%s_Rotation'%c])
mnorm = sts.multivariate_normal([x, y], cov)
Z = mnorm.pdf(np.stack([X, Y], 2))
ax.contour(X, Y, Z)
ax.plot(row['Gr%s_X'%c], row['Gr%s_Y'%c], 'x')
ax.set_aspect('equal', 'box')
This outputs:
I cannot find a way to draw an arbitrary line with matplotlib Python library. It allows to draw horizontal and vertical lines (with matplotlib.pyplot.axhline and matplotlib.pyplot.axvline, for example), but i do not see how to draw a line through two given points (x1, y1) and (x2, y2). Is there a way? Is there a simple way?
This will draw a line that passes through the points (-1, 1) and (12, 4), and another one that passes through the points (1, 3) et (10, 2)
x1 are the x coordinates of the points for the first line, y1 are the y coordinates for the same -- the elements in x1 and y1 must be in sequence.
x2 and y2 are the same for the other line.
import matplotlib.pyplot as plt
x1, y1 = [-1, 12], [1, 4]
x2, y2 = [1, 10], [3, 2]
plt.plot(x1, y1, x2, y2, marker = 'o')
plt.show()
I suggest you spend some time reading / studying the basic tutorials found on the very rich matplotlib website to familiarize yourself with the library.
What if I don't want line segments?
[edit]:
As shown by #thomaskeefe, starting with matplotlib 3.3, this is now builtin as a convenience: plt.axline((x1, y1), (x2, y2)), rendering the following obsolete.
There are no direct ways to have lines extend to infinity... matplotlib will either resize/rescale the plot so that the furthest point will be on the boundary and the other inside, drawing line segments in effect; or you must choose points outside of the boundary of the surface you want to set visible, and set limits for the x and y axis.
As follows:
import matplotlib.pyplot as plt
x1, y1 = [-1, 12], [1, 10]
x2, y2 = [-1, 10], [3, -1]
plt.xlim(0, 8), plt.ylim(-2, 8)
plt.plot(x1, y1, x2, y2, marker = 'o')
plt.show()
As of matplotlib 3.3, you can do this with plt.axline((x1, y1), (x2, y2)).
I was checking how ax.axvline does work, and I've written a small function that resembles part of its idea:
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
def newline(p1, p2):
ax = plt.gca()
xmin, xmax = ax.get_xbound()
if(p2[0] == p1[0]):
xmin = xmax = p1[0]
ymin, ymax = ax.get_ybound()
else:
ymax = p1[1]+(p2[1]-p1[1])/(p2[0]-p1[0])*(xmax-p1[0])
ymin = p1[1]+(p2[1]-p1[1])/(p2[0]-p1[0])*(xmin-p1[0])
l = mlines.Line2D([xmin,xmax], [ymin,ymax])
ax.add_line(l)
return l
So, if you run the following code you will realize how does it work. The line will span the full range of your plot (independently on how big it is), and the creation of the line doesn't rely on any data point within the axis, but only in two fixed points that you need to specify.
import numpy as np
x = np.linspace(0,10)
y = x**2
p1 = [1,20]
p2 = [6,70]
plt.plot(x, y)
newline(p1,p2)
plt.show()
Just want to mention another option here.
You can compute the coefficients using numpy.polyfit(), and feed the coefficients to numpy.poly1d(). This function can construct polynomials using the coefficients, you can find more examples here
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.poly1d.html
Let's say, given two data points (-0.3, -0.5) and (0.8, 0.8)
import numpy as np
import matplotlib.pyplot as plt
# compute coefficients
coefficients = np.polyfit([-0.3, 0.8], [-0.5, 0.8], 1)
# create a polynomial object with the coefficients
polynomial = np.poly1d(coefficients)
# for the line to extend beyond the two points,
# create the linespace using the min and max of the x_lim
# I'm using -1 and 1 here
x_axis = np.linspace(-1, 1)
# compute the y for each x using the polynomial
y_axis = polynomial(x_axis)
fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 1, 1])
axes.set_xlim(-1, 1)
axes.set_ylim(-1, 1)
axes.plot(x_axis, y_axis)
axes.plot(-0.3, -0.5, 0.8, 0.8, marker='o', color='red')
Hope it helps.
In case somebody lands here trying to plot many segments in one go, here is a way. Say the segments are defined by two 2-d arrays of same length, e.g. a and b. We want to plot segments between each a[i] and b[i]. In that case:
Solution 1
ab_pairs = np.c_[a, b]
plt_args = ab_pairs.reshape(-1, 2, 2).swapaxes(1, 2).reshape(-1, 2)
ax.plot(*plt_args, ...)
Example:
np.random.seed(0)
n = 32
a = np.random.uniform(0, 1, (n, 2))
b = np.random.uniform(0, 1, (n, 2))
fig, ax = plt.subplots(figsize=(3, 3))
ab_pairs = np.c_[a, b]
ab_args = ab_pairs.reshape(-1, 2, 2).swapaxes(1, 2).reshape(-1, 2)
# segments
ax.plot(*ab_args, c='k')
# identify points: a in blue, b in red
ax.plot(*a.T, 'bo')
ax.plot(*b.T, 'ro')
plt.show()
Solution 2
The above creates many matplotlib.lines.Line2D. If you'd like a single line, we can do it by interleaving NaN between pairs:
ax.plot(*np.c_[a, b, a*np.nan].reshape(-1, 2).T, ...)
Example:
# same init as example above, then
fig, ax = plt.subplots(figsize=(3, 3))
# segments (all at once)
ax.plot(*np.c_[a, b, a*np.nan].reshape(-1, 2).T, 'k')
# identify points: a in blue, b in red
ax.plot(*a.T, 'bo')
ax.plot(*b.T, 'ro')
plt.show()
(Same figure as above).
Based on #Alejandro's answer:
if you want to add a line to an existing Axes (e.g. a scatter plot), and
all you know is the slope and intercept of the desired line (e.g. a regression line), and
you want it to cover the entire visible X range (already computed), and
you want to use the object-oriented interface (not pyplot).
Then you can do this (existing Axes in ax):
# e.g. slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(xs, ys)
xmin, xmax = ax.get_xbound()
ymin = (xmin * slope) + intercept
ymax = (xmax * slope) + intercept
l = matplotlib.lines.Line2D([xmin, xmax], [ymin, ymax])
ax.add_line(l)