matplotlib separating scatterplot points and creating a divisionary curve - python

I'm attempting to draw a dividing curve on a scatter plot in matplotlib that would separate the points according to marker size.
The (x, y) coordinates are phi0 and phi0dot, and I'm coloring and sizing the markers according to a third variable, 'e-folds'. I'd like to draw an 'S'-shaped curve that divides the plot into small, black markers and large, cyan markers.
Here is a sample scatterplot made with only a few points as an example. Ultimately I will run with tens of thousands of data points, so the division would be much finer and more obviously 'S'-shaped. This is roughly what I have in mind.
My code thus far looks like this:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Set up the PDF
pdf_pages = PdfPages(outfile)
plt.rcParams["font.family"] = "serif"
# Create the canvas
canvas = plt.figure(figsize=(14.0, 14.0), dpi=100)
plt.subplot(1, 1, 1)
for a, phi0, phi0dot, efolds in datastore:
    # Large cyan marker if the run ends with more than 65 e-folds, small black marker otherwise
    if efolds[-1] > 65:
        plt.scatter(phi0[0], phi0dot[0], s=200, color='aqua')
    else:
        plt.scatter(phi0[0], phi0dot[0], s=30, color='black')
# Apply labels
plt.xlabel(r"$\phi_0$")
plt.ylabel(r"$\dot{\phi}_0$")
# Finish the file
pdf_pages.savefig(canvas)
pdf_pages.close()
print("Finished!")
This type of separation is very close to what I'd like to do, but I don't immediately see how I would extend it to my problem. Any advice would be much appreciated.

I would assume that the separation line between the differently classified points is a simple contour line along the threshold value.
Here I'm assuming the classification takes values of 0 or 1, so one can draw a contour at 0.5:
ax.contour(x,y,clas, [0.5])
Example:
import numpy as np
import matplotlib.pyplot as plt
# Some data on a grid
x,y = np.meshgrid(np.arange(20), np.arange(10))
z = np.sin(y+1) + 2*np.cos(x/5) + 2
fig, ax = plt.subplots()
# Threshold; values above the threshold belong to a different class than those below.
thresh = 2.5
clas = z > thresh
size = 100*clas + 30*~clas
# scatter plot
ax.scatter(x.flatten(), y.flatten(), s = size.flatten(), c=clas.flatten(), cmap="bwr")
# threshold line
ax.contour(x,y,clas, [.5], colors="k", linewidths=2)
plt.show()
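The example above relies on gridded data, whereas the points in the question are scattered. A minimal sketch of the same idea for unstructured points, assuming matplotlib's tricontour (which triangulates the points itself) is acceptable here; the synthetic data and the tanh-shaped boundary are made up purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical scattered data standing in for (phi0, phi0dot)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 2000)
y = rng.uniform(-2, 2, 2000)

# Made-up classification with an S-shaped boundary (stand-in for efolds > 65)
clas = y > np.tanh(x)
size = np.where(clas, 100, 30)

fig, ax = plt.subplots()
ax.scatter(x, y, s=size, c=clas.astype(float), cmap="bwr")

# Contour of the 0/1 classification at 0.5, computed on a triangulation
# of the scattered points instead of a regular grid
cs = ax.tricontour(x, y, clas.astype(float), levels=[0.5],
                   colors="k", linewidths=2)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. With few points the line will look jagged, since it follows the edges of the triangulation; it should get smoother as the number of points grows.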

Related

How do I correctly implement contours of histograms with logscale binning in numpy/matplotlib

I am trying to plot contours of data that has been binned using numpy.histogram2d, except that the bins are set using numpy.logspace (equal binning in log space).
Unfortunately, this results in a strange behavior that I can't seem to resolve: the placement of the contours does not match the location of the points in x/y. I plot both the 2d histogram of the data, and the contours, and they do not overlap.
It looks like what is actually happening is the contours are being placed on the physical location of the plot in linear space where I expect them to be placed in log space.
It's a strange phenomenon that I think is best described by the following plots, which use identical data binned in different ways:
Here is a minimum working example to produce the logbinned data:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(loc=500, scale=100,size=10000)
y = np.random.normal(loc=600, scale=60, size=10000)
nbins = 50
bins = (np.logspace(np.log10(10),np.log10(1000),nbins),np.logspace(np.log10(10),np.log10(1000),nbins))
HH, xe, ye = np.histogram2d(x,y,bins=bins)
plt.hist2d(x,y,bins=bins,cmin=1);
grid = HH.transpose()
extent = np.array([xe.min(), xe.max(), ye.min(), ye.max()])
cs = plt.contourf(grid,2,extent=extent,extend='max',cmap='plasma',alpha=0.5,zorder=100)
plt.contour(grid,2,extent=extent,colors='k',zorder=100)
plt.yscale('log')
plt.xscale('log')
It's fairly clear what is happening: the contour is getting misplaced due to the scaling of the bins. I'd like to be able to plot the histogram and the contour here together.
If anyone has an idea of how to resolve this, that would be very helpful - thanks!
This is your problem:
cs = plt.contourf(grid,2,extent=extent,...)
You are passing in a single 2d array specifying the values of the histograms, but you aren't passing the x and y coordinates these data correspond to. By only passing in extent there's no way for pyplot to do anything other than assume that the underlying grid is uniform, stretched out to fit extent.
So instead what you have to do is to define x and y components for each value in grid. You have to think a bit how to do this, because you have (n, n)-shaped data and (n+1,)-shaped edges to go with it. We should probably choose the center of each bin to associate a data point with. So we need to find the midpoint of each bin, and pass those arrays to contour[f].
Something like this:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
size = 10000
x = rng.normal(loc=500, scale=100, size=size)
y = rng.normal(loc=600, scale=60, size=size)
nbins = 50
bins = (np.geomspace(10, 1000, nbins),) * 2
HH, xe, ye = np.histogram2d(x, y, bins=bins)
fig, ax = plt.subplots()
ax.hist2d(x, y, bins=bins, cmin=1)
grid = HH.transpose()
# compute bin midpoints
midpoints = (xe[1:] + xe[:-1])/2, (ye[1:] + ye[:-1])/2
cs = ax.contourf(*midpoints, grid, levels=2, extend='max', cmap='plasma', alpha=0.5, zorder=100)
ax.contour(*midpoints, grid, levels=2, colors='k', zorder=100)
# these are a red herring during debugging:
#ax.set_yscale('log')
#ax.set_xscale('log')
(I've cleaned up your code a bit.)
Alternatively, if you want to avoid having those white strips at the top and edge, you can keep your bin edges, and pad your grid with zeros:
grid_padded = np.pad(grid, [(0, 1)])
cs = ax.contourf(xe, ye, grid_padded, levels=2, extend='max', cmap='plasma', alpha=0.5, zorder=100)
ax.contour(xe, ye, grid_padded, levels=2, colors='k', zorder=100)
This gives us something like
This seems prettier, but if you think about your data this is less exact, because your data points are shifted with respect to the bin coordinates they correspond to. If you look closely you can see the contours being shifted with respect to the output of hist2d. You could fix this by generating geomspaces with one more final value which you only use for this final plotting step, and again use the midpoints of these edges (complete with a last auxiliary one).
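As a variation on the midpoint approach above (not something the answer itself proposes): since the bins are equally spaced in log space, the geometric mean of adjacent edges is the bin centre on a log axis, whereas the arithmetic midpoint sits slightly to the right of it. A minimal sketch, with the fixed seed being an assumption made only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=500, scale=100, size=10000)
y = rng.normal(loc=600, scale=60, size=10000)

# Same log-spaced edges as in the answer
edges = np.geomspace(10, 1000, 50)
HH, xe, ye = np.histogram2d(x, y, bins=(edges, edges))

# Arithmetic midpoints, as used above
arith_x = (xe[1:] + xe[:-1]) / 2

# Geometric midpoints: the centre of each bin when drawn on a log axis
geom_x = np.sqrt(xe[1:] * xe[:-1])
geom_y = np.sqrt(ye[1:] * ye[:-1])
```

geom_x, geom_y and HH.transpose() can then be passed to ax.contourf and ax.contour in place of the arithmetic midpoints. With 50 edges over two decades the difference is small, so this mostly matters for coarse binning.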

How to determine the x value on the edge of the violinplot for a mean line

I am trying to draw a mean line on violin plots. Since I was not able to find a way to make seaborn replace the "median" line that comes from inner="quartile", I decided to draw the mean on top of each violin myself. I am planning to draw horizontal lines with plt.plot at the mean value (y value) of each of the three violins I have.
I have the exact y (height) values where I want my horizontal line to be drawn, however, I am having difficulty trying to figure out the bound of each violin graph on that specific y value. I know since it is symmetric the domain is (-x, x), so I need a way to find that "x" value for me to be able to have 3 added horizontal lines which each bounded by the violin graphs that I have.
Here is my code, the x value of the plt.plot is -0.37, which is something I found by trial and error, I want python to find that for me for a given y value.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = [2.57e-05, 4.17e-06, -5.4e-06, -5.05e-06, 1.15e-05, -6.7e-06, 1.01e-05, 5.53e-06, 8.13e-06, 1.27e-05, 1.11e-06, -2.87e-06, -1.38e-06, -1.07e-05, -8.04e-06, 4.77e-06, 3.22e-07, 9.86e-06, 1.38e-05, 1.32e-05, -3.48e-06, -4.69e-06, 8.15e-06, 4.21e-07, 2.71e-06, 7.52e-08, 1.04e-06, -1.92e-06, -4.08e-06, 4.76e-06]
vg = sns.violinplot(data=data, inner="quartile", scale="width")
a = sns.pointplot(data=data, linestyles='-', join=False, ci=None, color='red')
for p in vg.lines:
    p.set_linestyle('-')
    p.set_linewidth(0.8)  # Sets the thickness of the quartile lines
    p.set_color('white')  # Sets the color of the quartile lines
    p.set_alpha(0.8)
for p in vg.lines[1::3]:  # these are the median lines; not means
    p.set_linestyle('-')
    p.set_linewidth(0)  # Sets the thickness of the median lines
    p.set_color('black')  # Sets the color of the median lines
    p.set_alpha(0.8)
# add a mean line from the edge of the violin plot
plt.plot([-0.37, 0], [np.mean(data), np.mean(data)], 'k-', lw=1)
plt.show()
Refer to the picture where I removed the median point but left the quartile lines, where I want to draw mean lines across where the blue dots are visible
And here is a picture once I draw that plt.plot with the x value I found via trial and error: For case I only
You can draw a line that is too long, and then clip it with the polygon forming the violin.
Note that inner='quartile' shows the 25%, 50% and 75% lines. The 50% line is also known as the median. This is similar to how boxplots are usually drawn. It is rather confusing to show the mean in a too similar style. That's why seaborn (and many other libraries) prefer to show the mean as a point.
Here is some example code (note that the return value of sns.violinplot is an ax, and giving it a very different name makes it rather hard to find your way around the matplotlib and seaborn docs and examples).
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
import seaborn as sns
import pandas as pd
import numpy as np
tips = sns.load_dataset('tips')
tips['day'] = pd.Categorical(tips['day'])
ax = sns.violinplot(data=tips, x='day', y='total_bill', hue='day', inner='quartile', scale='width', dodge=False)
sns.pointplot(data=tips, x='day', y='total_bill', join=False, ci=None, color='yellow', ax=ax)
ax.legend_.remove()
for p in ax.lines:
    p.set_linestyle('-')
    p.set_linewidth(0.8)  # Sets the thickness of the quartile lines
    p.set_color('white')  # Sets the color of the quartile lines
    p.set_alpha(0.8)
for x, (day, violin) in enumerate(zip(tips['day'].cat.categories, ax.collections)):
    line = ax.hlines(tips[tips['day'] == day]['total_bill'].mean(), x - 0.5, x + 0.5, color='black', ls=':', lw=2)
    patch = PathPatch(violin.get_paths()[0], transform=ax.transData)
    line.set_clip_path(patch)  # clip the line by the form of the violin
plt.show()
Updated to use a list of lists of data:
data = [np.random.randn(10, 7).cumsum(axis=0).ravel() for _ in range(3)]
ax = sns.violinplot(data=data, inner='quartile', scale='width', palette='Set2')
# sns.pointplot(data=data, join=False, ci=None, color='red', ax=ax) # shows the means
ax.set_xticks(range(len(data)))
ax.set_xticklabels(['I' * (k + 1) for k in range(len(data))])
for p in ax.lines:
    p.set_linestyle('-')
    p.set_linewidth(0.8)  # Sets the thickness of the quartile lines
    p.set_color('white')  # Sets the color of the quartile lines
    p.set_alpha(0.8)
for x, (data_x, violin) in enumerate(zip(data, ax.collections)):
    line = ax.hlines(np.mean(data_x), x - 0.5, x + 0.5, color='black', ls=':', lw=2)
    patch = PathPatch(violin.get_paths()[0], transform=ax.transData)
    line.set_clip_path(patch)
plt.show()
PS: Some further explanation about enumerate(zip(...))
for data_x in data: would loop through the entries of the list data, first assigning data[0] to data_x, etc.
for x, data_x in enumerate(data): would loop through the entries of the list data and at the same time increment a variable x from 0 to 1 and finally to 2.
for data_x, violin in zip(data, ax.collections): would loop data_x through the entries of the list data and simultaneously loop a variable violin through the list stored in ax.collections (this is where matplotlib stores the shapes of the violins).
for x, (data_x, violin) in enumerate(zip(data, ax.collections)): combines the enumeration with zip.
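If you do want the actual x values at the edge of the violin rather than clipping, one possibility is to intersect a horizontal line with the outline stored in the violin's path. A rough sketch, not part of the answer above: the helper name x_bounds_at is made up, and it assumes the outline is a simple polygon given as an (N, 2) array of data coordinates, such as violin.get_paths()[0].vertices.

```python
import numpy as np

def x_bounds_at(vertices, y_value):
    """Return (x_min, x_max) where the horizontal line y = y_value crosses
    the closed outline given by `vertices` (an (N, 2) array of x, y pairs)."""
    v = np.asarray(vertices, dtype=float)
    # Close the outline if the last vertex does not repeat the first
    if not np.allclose(v[0], v[-1]):
        v = np.vstack([v, v[:1]])
    x0, y0 = v[:-1, 0], v[:-1, 1]
    x1, y1 = v[1:, 0], v[1:, 1]
    # An edge crosses the line when its endpoints lie on opposite sides of it
    crosses = (y0 <= y_value) != (y1 <= y_value)
    if not crosses.any():
        raise ValueError("y_value lies outside the outline")
    # Linear interpolation along each crossing edge
    t = (y_value - y0[crosses]) / (y1[crosses] - y0[crosses])
    xs = x0[crosses] + t * (x1[crosses] - x0[crosses])
    return xs.min(), xs.max()

# Toy outline: a diamond with corners at (0, -1), (1, 0), (0, 1), (-1, 0)
diamond = np.array([(0, -1), (1, 0), (0, 1), (-1, 0)])
left, right = x_bounds_at(diamond, 0.5)
```

Inside the loop above this could be used as left, right = x_bounds_at(violin.get_paths()[0].vertices, np.mean(data_x)), followed by ax.hlines(np.mean(data_x), left, right). Clipping is still the simpler route; this is only useful when the numbers themselves are needed.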

Matplotlib plot has slanted lines

I'm trying to plot projections of coordinates onto a line, but for some reason, Matplotlib is plotting the projections in a slightly slanted manner. Ideally, I would like the (blue) projections to be perpendicular to the (green) line. Here's an image of how it looks with sample data:
As you can see, the angles between the blue lines and the green line are slightly obtuse instead of right. I tried playing around with the rotation parameter to the annotate function, but this did not help. The code for this plot is below, although the data might look a bit different since the random generator is not seeded:
import numpy as np
import matplotlib.pyplot as plt
prefs = {'color':'purple','edgecolors':'black'}
X = np.dot(np.random.rand(2,2), np.random.rand(2,50)).T
pts = np.linspace(-1,1)
v1_m = 0.8076549717643662
plt.scatter(X[:,0],X[:,1],**prefs)
plt.plot(pts, [v1_m*x for x in pts], color='lightgreen')
for x,y in X:
    # slope of connecting line
    # y = mx+b
    m = -np.reciprocal(v1_m)
    b = y-m*x
    # find intersecting point
    zx = b/(v1_m-m)
    zy = v1_m*zx
    # draw line
    plt.annotate('',(zx,zy),(x,y),arrowprops=dict(linewidth=2,arrowstyle='-',color='lightblue'))
plt.show()
The problem lies in the unequal axes, which make it look like the lines are not at a right angle. Use plt.axis('equal') to have equal axis spans on the x- and y-axis and a square figure with equal height and width. plt.axis('scaled') works the same way. As pointed out by @CedricZoppolo, you should set the equal aspect ratio before plt.show(). As per the docs, setting the aspect ratio to "equal" means:
same scaling from data to plot units for x and y
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8,8))
# Your code here
plt.axis('equal')
plt.show()
Choosing a square figure is not necessary as it works also with rectangular figures as
fig = plt.figure(figsize=(8,6))
# Your code here
plt.axis('equal')
plt.show()
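To convince yourself that the projections are already perpendicular in data coordinates, and that the slant is purely a display effect, you can check the dot product numerically. A small sketch that reuses the slope from the question; the seed is an assumption added only so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.dot(rng.random((2, 2)), rng.random((2, 50))).T
v1_m = 0.8076549717643662

# Unit vector along the green line y = v1_m * x
direction = np.array([1.0, v1_m]) / np.hypot(1.0, v1_m)

# Foot of the perpendicular for every point (orthogonal projection)
feet = np.outer(X @ direction, direction)

# Dot product between each connecting segment and the line direction
dots = (X - feet) @ direction
print(np.abs(dots).max())  # should be zero up to floating-point rounding
```

The feet computed here coincide with the (zx, zy) points from the question's formula, so the geometry is fine and only the axis scaling needs fixing.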
The blue lines not being perpendicular is due to axis not being equal.
You just need to add below line before plt.show()
plt.gca().set_aspect('equal')
Below you can see the resulting graph:

Reducing axis length while maintaining equal aspect ratio in 3D plot

I am trying to create a 3-D plot and a 2-D plot side-by-side in python. I need equal aspect ratios for both plots, which I managed using code provided by this answer: https://stackoverflow.com/a/31364297/125507. The problem I'm having now is how to effectively "crop" the 3-D plot so it doesn't take up so much white space. That is to say, I want to reduce the length of the X and Y axes while maintaining equal scale to the (longer) Z-axis. Here is a sample code and plot:
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
import numpy as np
def set_axes_equal(ax):
    '''Make axes of 3D plot have equal scale so that spheres appear as spheres,
    cubes as cubes, etc.. This is one possible solution to Matplotlib's
    ax.set_aspect('equal') and ax.axis('equal') not working for 3D.
    Input
      ax: a matplotlib axis, e.g., as output from plt.gca().
    '''
    x_limits = ax.get_xlim3d()
    y_limits = ax.get_ylim3d()
    z_limits = ax.get_zlim3d()
    x_range = abs(x_limits[1] - x_limits[0])
    x_middle = np.mean(x_limits)
    y_range = abs(y_limits[1] - y_limits[0])
    y_middle = np.mean(y_limits)
    z_range = abs(z_limits[1] - z_limits[0])
    z_middle = np.mean(z_limits)
    # The plot bounding box is a sphere in the sense of the infinity
    # norm, hence I call half the max range the plot radius.
    plot_radius = 0.5*max([x_range, y_range, z_range])
    ax.set_xlim3d([x_middle - plot_radius, x_middle + plot_radius])
    ax.set_ylim3d([y_middle - plot_radius, y_middle + plot_radius])
    ax.set_zlim3d([z_middle - plot_radius, z_middle + plot_radius])
ax = [None]*2
fig = plt.figure()
ax[0] = fig.add_subplot(121, projection='3d', aspect='equal')
ax[1] = fig.add_subplot(122, aspect='equal')
nn = 30
phis = np.linspace(0,np.pi, nn).reshape(1,nn)
psis = np.linspace(0,np.pi*2,nn).reshape(nn,1)
ones = np.ones((nn,1))
el_h = np.linspace(-5, 5, nn).reshape(1,nn)
x_sph = np.sin(phis)*np.cos(psis)
y_sph = np.sin(phis)*np.sin(psis)
z_sph = np.cos(phis)*ones
x_elp = np.sin(phis)*np.cos(psis)*.25
y_elp = np.sin(phis)*np.sin(psis)*.25
z_elp = el_h*ones
ax[0].scatter(x_sph, y_sph, z_sph)
ax[0].scatter(x_elp, y_elp, z_elp)
ax[1].scatter(y_sph, z_sph)
ax[1].scatter(y_elp, z_elp)
for ii in range(2):
    ax[ii].set_xlabel('X')
    ax[ii].set_ylabel('Y')
ax[0].set_zlabel('Z')
set_axes_equal(ax[0])
plt.savefig('SphereElipse.png', dpi=300)
And here is its image output:
3-D and 2-D sphere and ellipse side-by-side
Clearly the 2D plot automatically modifies the length of the axes while maintaining the scale, but the 3D plot doesn't, leading to a tiny representation which does not well use the space allotted to its subplot. Is there any way to do this? This question is similar to an earlier unanswered question How do I crop an Axes3D plot with square aspect ratio?, except it adds the stipulation of multiple subplots, which means the answers provided there do not work.
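One possible direction, offered as a sketch rather than a tested answer to this exact layout: Matplotlib 3.3 and newer provide Axes3D.set_box_aspect, which sets the proportions of the 3-D box directly. Passing the current data ranges keeps the scale equal on all three axes without padding the X and Y limits out to the Z range. Only the elongated ellipsoid from the question is reproduced here, and the version requirement is an assumption about your installation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Elongated ellipsoid from the question (short in X and Y, long in Z)
nn = 30
phis = np.linspace(0, np.pi, nn).reshape(1, nn)
psis = np.linspace(0, 2 * np.pi, nn).reshape(nn, 1)
x_elp = np.sin(phis) * np.cos(psis) * 0.25
y_elp = np.sin(phis) * np.sin(psis) * 0.25
z_elp = np.linspace(-5, 5, nn).reshape(1, nn) * np.ones((nn, 1))

fig = plt.figure()
ax = fig.add_subplot(121, projection="3d")
ax.scatter(x_elp.ravel(), y_elp.ravel(), z_elp.ravel())

# Use the automatically chosen limits as the box proportions, so one data
# unit has the same length along X, Y and Z
limits = np.array([ax.get_xlim3d(), ax.get_ylim3d(), ax.get_zlim3d()])
ranges = limits[:, 1] - limits[:, 0]
ax.set_box_aspect(tuple(ranges))
```

The box then becomes tall and narrow instead of a cube with mostly empty X and Y extents. How well it fills the subplot still depends on the viewing angle and the subplot size.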

plotting coordinate as a matrix matplotlib python

I have a set of coordinates, say [(2,3),(45,4),(3,65)].
I need to plot them as a matrix. Is there any way I can do this in matplotlib? I want it to look something like this: http://imgur.com/Q6LLhmk
Edit: My original answer used ax.scatter. There is a problem with this: If two points are side-by-side, ax.scatter may draw them with a bit of space in between, depending on the scale:
For example, with
data = np.array([(2,3),(3,3)])
Here is a zoomed-in detail:
So here is an alternative solution that fixes this problem:
import matplotlib.pyplot as plt
import numpy as np
data = np.array([(2,3),(3,3),(45,4),(3,65)])
N = data.max() + 5
# color the background white (1 is white)
arr = np.ones((N,N), dtype = 'bool')
# color the dots black (0)
arr[data[:,1], data[:,0]] = 0
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.imshow(arr, interpolation='nearest', cmap = 'gray')
ax.invert_yaxis()
# ax.axis('off')
plt.show()
No matter how much you zoom in, the adjacent squares at (2,3) and (3,3) will remain side-by-side.
Unfortunately, unlike ax.scatter, using ax.imshow requires building an N x N array, so it could be more memory-intensive than using ax.scatter. That should not be a problem unless data contains very large numbers, however.
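If memory does become a concern, one small saving is to size the array from the x and y extents separately instead of using a square N x N array. A minimal sketch, assuming non-negative integer coordinates; the helper name coords_to_mask and the pad parameter are made up for illustration:

```python
import numpy as np

def coords_to_mask(coords, pad=5):
    """Build a boolean image: True is background (white), False marks a point."""
    pts = np.asarray(coords, dtype=int)
    width = pts[:, 0].max() + 1 + pad   # columns follow the x extent
    height = pts[:, 1].max() + 1 + pad  # rows follow the y extent
    mask = np.ones((height, width), dtype=bool)
    mask[pts[:, 1], pts[:, 0]] = False  # row index is y, column index is x
    return mask

mask = coords_to_mask([(2, 3), (3, 3), (45, 4), (3, 65)])
```

The result can be passed to ax.imshow(mask, interpolation='nearest', cmap='gray') exactly like arr above. The saving only matters when the x and y ranges differ a lot.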
