Python matplotlib - Combine categorical background along with scatter plot

I am trying to find the right Python library to create a complex plot that looks something like this:
The plot background is divided into 3 regions (yellow, red, green) based on conditions on X and Y. For example:
For Green area: (X<=1 and Y<=1) OR (X<0.5)
For Yellow area: (0.5<X<1 and Y>1) OR (1<X<1.5 and 1<Y<3) OR (1.5<X<2 and Y<2)
Similarly for the Red area....
These conditions remain the same throughout my application.
I have the coordinates in a csv file and know how to plot the scatter plot. But I am stuck on the colored background.
Is there a Python library that I can use to plot the scatter plot along with these grid colors at the back? I checked many sites and questions but unfortunately found nothing useful or related.
Any suggestions/help is appreciated.

You can use matplotlib's imshow() with a 2D array. The coordinates for the 2D array can be created using np.meshgrid(). These coordinates are the lower left vertices of each grid cell. Boolean masks built from them can index into the 2D array, e.g. backgr[((X < 1) & (Y < 1)) | (X < 0.5)]. Filling the 2D array with 0, 1 and 2 at the appropriate locations creates the background.
Matplotlib's scatter() will place scatter dots.
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib.ticker import MultipleLocator
import numpy as np
xvals = np.random.uniform(0, 3, 50)
yvals = np.random.uniform(0, 4.2, 50)
X1d = np.arange(0, 3.0001, 0.25)
Y1d = np.arange(0, 4.2001, 0.20)
X, Y = np.meshgrid(X1d, Y1d)
backgr = np.full_like(X, 2)
backgr[((X < 1.5) & (Y < 3)) | ((X < 2) & (Y < 2)) | (X < 1)] = 1
backgr[((X < 1) & (Y < 1)) | (X < 0.5)] = 0
fig, ax = plt.subplots()
ax.scatter(xvals, yvals, color='black')
cmap = ListedColormap(['lime', 'gold', 'crimson'])
ax.imshow(backgr[:-1, :-1], cmap=cmap, alpha=0.2, extent=[0, X1d[-1], 0, Y1d[-1]], origin='lower', aspect='auto')
ax.set_xticks(X1d, minor=True)
ax.set_yticks(Y1d, minor=True)
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.yaxis.set_major_locator(MultipleLocator(1))
ax.grid(True, which='both', lw=1, ls=':', color='black')
plt.show()
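The same masks can also be applied to the scatter points themselves, for example to find out which region each point from the csv file falls in. The following is only a sketch: region_codes is a hypothetical helper, the conditions are copied from the answer's code rather than from the exact inequalities in the question, and treating everything that is neither green nor yellow as red is an assumption.

```python
import numpy as np

def region_codes(x, y):
    """Return 0 (green), 1 (yellow) or 2 (red) for each point (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Same conditions as used for the background array in the answer above
    green = ((x < 1) & (y < 1)) | (x < 0.5)
    yellow = ((x < 1.5) & (y < 3)) | ((x < 2) & (y < 2)) | (x < 1)
    # np.select takes the first matching condition, so green wins over yellow;
    # everything left over is treated as red (an assumption)
    return np.select([green, yellow], [0, 1], default=2)

# Three illustrative points, one per region
xs = np.array([0.2, 1.2, 2.5])
ys = np.array([3.5, 2.0, 1.0])
codes = region_codes(xs, ys)
print(codes)
```

The resulting codes could then be passed to scatter, e.g. ax.scatter(xs, ys, c=codes, cmap=cmap, vmin=0, vmax=2), to color the dots by region.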

Related

matplotlib fill_between leaving gaps between regions

I'm trying to use fill_between to fill different regions of a plot, but I get gaps between the regions I'm trying to fill.
I've tried using interpolate=True, but this results in non rectangular shapes...
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.arange(0, 4 * np.pi, 0.01)
y = np.sin(x)
ax.plot(x, y, color='black')
threshold = 0.75
ax.axhline(threshold, color='green', lw=2, alpha=0.7)
ax.fill_between(x, 0, 1, where=y > threshold,
                facecolor=(0.5, 0, 0, 0.5), ec=None, transform=ax.get_xaxis_transform())
ax.fill_between(x, 0, 1, where=y <= threshold,
                facecolor=(0, 0.5, 0, 0.5), ec=None, transform=ax.get_xaxis_transform())
I've attached a zoomed-in screenshot of the plot.

You could do one or both of the following:
Use finer-grained x values, e.g. x = np.arange(0, 4 * np.pi, 0.0001). This removes the white stripes at full view, but they reappear once you zoom in far enough.
First draw the green background over the full x range without a where condition, then draw the red sections on top where required. With non-opaque colors as in the example, you'll need to manually convert the semitransparent color on the default white background to a fully opaque one:
x = np.arange(0, 4 * np.pi, 0.001)
# ...
ax.fill_between(x, 0, 1, facecolor=(0, 0.5, 0, 0.5), ec=None,
transform=ax.get_xaxis_transform())
ax.fill_between(x, 0, 1, where=y>threshold, facecolor=(0.75, 0.5, 0.5),
ec=None, transform=ax.get_xaxis_transform())
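The opaque color (0.75, 0.5, 0.5) used above is what the semitransparent red (0.5, 0, 0, 0.5) looks like on a white background. A minimal sketch of that conversion, assuming standard alpha compositing over white; blend_over_white is a hypothetical helper name, not a matplotlib function:

```python
def blend_over_white(rgba):
    """Convert an RGBA color to the opaque RGB it shows as on a white background."""
    r, g, b, a = rgba
    # Each channel is a weighted mix of the color and white (1.0)
    return tuple(a * c + (1 - a) * 1.0 for c in (r, g, b))

opaque_red = blend_over_white((0.5, 0, 0, 0.5))
print(opaque_red)
```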

I found an alternative way of solving this problem, by using pcolormesh where the color array is 1xn:
C = np.reshape(np.array(trnsys_out["LCG_state"][:-1].values), (-1, 1)).T
x = trnsys_out.index
y = [Pmin, Pmax]
ctrl = ax2.pcolormesh(x, y, C, shading="flat", cmap="binary", alpha=0.5, vmin=0, vmax=5)
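Since trnsys_out, Pmin and Pmax are not available here, the following is a self-contained sketch of the same idea using the sine curve from the question. The variable names, the colormap and the y range are assumptions for illustration. With shading="flat", the 1 x n color array needs one value per interval, so it has one column fewer than there are x values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 4 * np.pi, 0.01)
y = np.sin(x)
threshold = 0.75

# One color value per interval between consecutive x values: shape (1, len(x) - 1)
state = (y[:-1] > threshold).astype(float).reshape(1, -1)

fig, ax = plt.subplots()
ax.plot(x, y, color='black')
# Adjacent cells share their edges, so no gaps appear between regions
mesh = ax.pcolormesh(x, [-1.0, 1.0], state, shading="flat",
                     cmap="RdYlGn_r", alpha=0.5, vmin=0, vmax=1)
```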

Weird behaviors on interactive imshow plot in Python

I'm trying to construct a grid of black squares, and every time you click on one it turns white. Now for some reason my code does very weird things:
The coordinates I input don't correspond to the array coordinates. I tried to change that by letting i = y - (N-1) and j = x with (x,y) the mouse coordinates. But only the first line is converted properly (top row of the plot). The rest is inverted vertically.
When all squares are white, the plot automatically resets to black squares.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import math
N = 3
# Make an empty data set
data = np.zeros((N, N))
# Make a figure + axes
fig, ax = plt.subplots(1, 1, tight_layout=True)
# Draw the boxes
box = ax.imshow(data, cmap='gray', extent=[0, N, 0, N])
# Draw the grid
for x in range(N + 1):
    ax.axhline(x, lw=2, color='w', zorder=5)
    ax.axvline(x, lw=2, color='w', zorder=5)
# Create interactivity
def on_click(event):
    gx = event.xdata
    gy = event.ydata
    print('x=',gx)
    print('y=',gy)
    i = int(gy) - N + 1
    j = int(gx)
    data[i,j] = 1
    ax.imshow(data, cmap='gray', extent=[0, N, 0, N])
    fig.canvas.draw_idle()
fig = plt.gcf()
fig.canvas.mpl_connect('button_press_event', on_click)
# Turn off the axis labels
ax.axis('off')
plt.show()
Thanks for your help

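Well, I found the issues:
The row index should be i = N - 1 - int(gy).
When vmin and vmax aren't specified, imshow scales the colors to the data's own minimum and maximum. So a grid of all zeros looks the same as a grid of all ones (in both cases every cell has the same value). Passing vmin=0 and vmax=1 to imshow keeps the color scale fixed.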
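The index fix can be sketched as a small helper. This assumes the setup from the question: imshow with its default origin (row 0 drawn at the top) and extent=[0, N, 0, N]. click_to_index is a hypothetical name, and the None check covers clicks outside the axes, where event.xdata and event.ydata are None.

```python
def click_to_index(gx, gy, n):
    """Map data coordinates of a click to (row, col) of an n x n array, or None."""
    if gx is None or gy is None:
        return None  # click landed outside the axes
    col = int(gx)
    # Row 0 is drawn at the top, where y is largest, so flip the vertical index
    row = n - 1 - int(gy)
    if not (0 <= row < n and 0 <= col < n):
        return None
    return row, col

top_left = click_to_index(0.5, 2.5, 3)
bottom_right = click_to_index(2.9, 0.1, 3)
print(top_left, bottom_right)
```

Inside on_click, the returned pair could then be used as data[row, col] = 1.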

Is there a way to bin a set of 2D coordinates in Python

Hey so I have a 2D array of x,y coordinates in the form: [[x0,y0],[x1,y1],....,[xn,yn]]. These coordinates lie within a rectangle of size x_length and y_length. I want to split the rectangle into a set of squares and find how many coordinates lie within each square, if that makes any sense. Using the 2D histogram function (np.histogram2d()) I've managed to do something similar, but it doesn't tell me the actual number of points within each bin (which is what I'm trying to get). I've attached an example of the 2D histogram for reference.

values, xbins, ybins = np.histogram2d(x=a[:,0], y=a[:,1]) returns the actual number of points in each bin as values. Note that many matplotlib functions index first by y, so you might need values.T depending on the use case.
Here is a visualization showing how the values can be used.
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
import numpy as np
x = np.linspace(-0.212, 0.233, 50)
y = x * 0.5 - 0.01
hist, xbins, ybins = np.histogram2d(x=x, y=y, bins=(np.arange(-0.25, 0.25001, 0.02), np.arange(-0.15, 0.15001, 0.02)))
fig, ax = plt.subplots(figsize=(11, 6))
for i in range(len(xbins) - 1):
    for j in range(len(ybins) - 1):
        text = ax.text((xbins[i] + xbins[i + 1]) / 2, (ybins[j] + ybins[j + 1]) / 2, f"{hist[i, j]:.0f}",
                       color='cornflowerblue', size=16, ha='center', va='center')
        text.set_path_effects([path_effects.Stroke(linewidth=3, foreground='white', alpha=0.6), path_effects.Normal()])
ax.plot(x, y, '-ro')
ax.set_xlim(xbins.min(), xbins.max())
ax.set_ylim(ybins.min(), ybins.max())
ax.set_xticks(xbins + 0.0001, minor=True)
ax.set_yticks(ybins + 0.0001, minor=True)
ax.grid(which='minor', color='dodgerblue', ls='--')
ax.set_aspect(1)
plt.show()
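For the case described in the question, a rectangle split into squares, the bin edges can be given explicitly. A small sketch with made-up numbers: the four points, x_length = 4, y_length = 2 and a square side of 1 are all assumptions for illustration.

```python
import numpy as np

# Four illustrative points inside a 4 x 2 rectangle
a = np.array([[0.5, 0.5], [0.6, 0.4], [1.5, 0.5], [3.5, 1.5]])
x_length, y_length, side = 4.0, 2.0, 1.0

# Edges of the squares along each axis
xedges = np.arange(0, x_length + side, side)
yedges = np.arange(0, y_length + side, side)

counts, _, _ = np.histogram2d(a[:, 0], a[:, 1], bins=(xedges, yedges))
# counts[i, j] is the number of points in x-bin i and y-bin j
print(counts)
```

Here counts has shape (4, 2), indexed as [x-bin, y-bin]; counts.T gives the [row, column] layout that imshow expects.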

How to plot a mean line on a distplot between 0 and the y value of the mean?

I have a distplot and I would like to plot a mean line that goes from 0 to the y value of the curve at the mean. I want the line to stop where the distplot curve does. Why isn't there a simple parameter that does this? It would be very useful.
I have some code that gets me almost there:
plt.plot([x.mean(),x.mean()], [0, *what here?*])
This code plots a line just as I'd like except for my desired y-value. What would the correct math be to get the y max to stop at the frequency of the mean in the distplot? An example of one of my distplots is below using 0.6 as the y-max. It would be awesome if there was some math to make it stop at the y-value of the mean. I have tried dividing the mean by the count etc.

Update for the latest versions of matplotlib (3.3.4) and seaborn (0.11.1): the kdeplot with shade=True now doesn't create a line object anymore. To get the same outcome as before, setting shade=False will still create the line object. The curve can then be filled with ax.fill_between(). The code below is changed accordingly. (Use the revision history to see the older version.)
ax.lines[0] gets the curve of the kde, of which you can extract the x and y data.
np.interp then can find the height of the curve for a given x-value:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = np.random.normal(np.tile(np.random.uniform(10, 30, 5), 50), 3)
ax = sns.kdeplot(x, shade=False, color='crimson')
kdeline = ax.lines[0]
mean = x.mean()
xs = kdeline.get_xdata()
ys = kdeline.get_ydata()
height = np.interp(mean, xs, ys)
ax.vlines(mean, 0, height, color='crimson', ls=':')
ax.fill_between(xs, 0, ys, facecolor='crimson', alpha=0.2)
plt.show()
The same approach can be extended to show the mean together with the standard deviation, or the median and the quartiles:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = np.random.normal(np.tile(np.random.uniform(10, 30, 5), 50), 3)
fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
for ax in axes:
    sns.kdeplot(x, shade=False, color='crimson', ax=ax)
    kdeline = ax.lines[0]
    xs = kdeline.get_xdata()
    ys = kdeline.get_ydata()
    if ax == axes[0]:
        middle = x.mean()
        sdev = x.std()
        left = middle - sdev
        right = middle + sdev
        ax.set_title('Showing mean and sdev')
    else:
        left, middle, right = np.percentile(x, [25, 50, 75])
        ax.set_title('Showing median and quartiles')
    ax.vlines(middle, 0, np.interp(middle, xs, ys), color='crimson', ls=':')
    ax.fill_between(xs, 0, ys, facecolor='crimson', alpha=0.2)
    ax.fill_between(xs, 0, ys, where=(left <= xs) & (xs <= right), interpolate=True, facecolor='crimson', alpha=0.2)
    # ax.set_ylim(ymin=0)
plt.show()
PS: for the mode of the kde:
mode_idx = np.argmax(ys)
ax.vlines(xs[mode_idx], 0, ys[mode_idx], color='lime', ls='--')
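The np.interp lookup and the argmax-based mode can be checked in isolation, without seaborn. The sketch below swaps the kde line for a sampled standard normal density, which is an assumption made purely so that the expected height is known in closed form.

```python
import numpy as np

# Stand-in for the x and y data of the kde line: a sampled standard normal density
xs = np.linspace(-4, 4, 801)
ys = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)

# Height of the curve at an arbitrary x position, by linear interpolation
position = 0.3
height = np.interp(position, xs, ys)

# Mode of the curve: the x value where the sampled density is largest
mode_idx = np.argmax(ys)
mode = xs[mode_idx]
print(height, mode)
```

Note that np.interp expects xs to be increasing, which holds for the x data of a kde line.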

With ax.get_ylim() you can get the limits of the current plot: (bottom, top).
So, in your case, you can extract the actual limits and save them in ylim, then draw the line:
fig, ax = plt.subplots()
ylim = ax.get_ylim()
ax.plot([x.mean(), x.mean()], ylim)
ax.set_ylim(ylim)
Because ax.plot can change the y limits afterwards, you have to reset them with ax.set_ylim as above.

How can I create custom break points in a matplotlib colorbar?

I'm borrowing an example from the matplotlib custom cmap examples page:
https://matplotlib.org/examples/pylab_examples/custom_cmap.html
This produces the same image with different numbers of shading contours, as specified in the number of bins: n_bins:
https://matplotlib.org/_images/custom_cmap_00.png
However, I'm interested not only in the number of bins, but also in the specific break points between the color values. For example, when nbins=6 in the top right subplot, how can I specify the ranges of the bins such that the shading is filled in these custom areas:
n_bins_ranges = ([-10,-5],[-5,-2],[-2,-0.5],[-0.5,2.5],[2.5,7.5],[7.5,10])
Is it also possible to specify the inclusivity of the break points? For example, I'd like to specify in the range between -2 and -0.5 whether it's -2 < x <= -0.5 or -2 <= x < -0.5.
EDIT WITH ANSWER BELOW:
Using the accepted answer below, here is code that plots each step including finally adding custom colorbar ticks at the midpoint. Note I can't post an image since I'm a new user.
Set up data and 6 color bins:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
# Make some illustrative fake data:
x = np.arange(0, np.pi, 0.1)
y = np.arange(0, 2*np.pi, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y) * 10
# Create colormap with 6 discrete bins
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)] # R -> G -> B
n_bin = 6
cmap_name = 'my_list'
cm = matplotlib.colors.LinearSegmentedColormap.from_list(
    cmap_name, colors, N=n_bin)
Plot different options:
# Set up 4 subplots
fig, axs = plt.subplots(2, 2, figsize=(6, 9))
fig.subplots_adjust(left=0.02, bottom=0.06, right=0.95, top=0.94, wspace=0.05)
# Plot 6 bin figure
im = axs[0,0].imshow(Z, interpolation='nearest', origin='lower', cmap=cm)
axs[0,0].set_title("Original 6 Bin")
fig.colorbar(im, ax=axs[0,0])
# Change the break points
n_bins_ranges = [-10,-5,-2,-0.5,2.5,7.5,10]
norm = matplotlib.colors.BoundaryNorm(n_bins_ranges, len(n_bins_ranges))
im = axs[0,1].imshow(Z, interpolation='nearest', origin='lower', cmap=cm, norm=norm)
axs[0,1].set_title("Custom Break Points")
fig.colorbar(im, ax=axs[0,1])
# Arrange color labels by data interval (not colors)
im = axs[1,0].imshow(Z, interpolation='nearest', origin='lower', cmap=cm, norm=norm)
axs[1,0].set_title("Linear Color Distribution")
fig.colorbar(im, ax=axs[1,0], spacing="proportional")
# Provide custom labels at color midpoints
# And change inclusive equality by adding arbitrary small value
n_bins_ranges_arr = np.asarray(n_bins_ranges)+1e-9
# Use the shifted boundaries in the norm, otherwise the shift has no effect
norm = matplotlib.colors.BoundaryNorm(n_bins_ranges_arr, len(n_bins_ranges_arr))
n_bins_ranges_midpoints = (n_bins_ranges_arr[1:] + n_bins_ranges_arr[:-1])/2.0
im = axs[1,1].imshow(Z, interpolation='nearest', origin='lower', cmap=cm ,norm=norm)
axs[1,1].set_title("Midpoint Labels\n Switched Equal Sign")
cbar = fig.colorbar(im, ax=axs[1,1], spacing="proportional",
                    ticks=n_bins_ranges_midpoints.tolist())
cbar.ax.set_yticklabels(['Red', 'Brown', 'Green 1','Green 2','Gray Blue','Blue'])
plt.show()

You can use a BoundaryNorm as follows:
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
x = np.arange(0, np.pi, 0.1)
y = np.arange(0, 2*np.pi, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y) * 10
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)] # R -> G -> B
n_bin = 6 # Discretizes the interpolation into bins
n_bins_ranges = [-10,-5,-2,-0.5,2.5,7.5,10]
cmap_name = 'my_list'
fig, ax = plt.subplots()
# Create the colormap
cm = matplotlib.colors.LinearSegmentedColormap.from_list(
    cmap_name, colors, N=n_bin)
norm = matplotlib.colors.BoundaryNorm(n_bins_ranges, len(n_bins_ranges))
# Fewer bins will result in "coarser" colormap interpolation
im = ax.imshow(Z, interpolation='nearest', origin='lower', cmap=cm, norm=norm)
ax.set_title("N bins: %s" % n_bin)
fig.colorbar(im, ax=ax)
plt.show()
Or, if you want proportional spacing, i.e. the distance between colors according to their values,
fig.colorbar(im, ax=ax, spacing="proportional")
As the boundary norm documentation states:
If b[i] <= v < b[i+1]
then v is mapped to color j; as i varies from 0 to len(boundaries)-2, j goes from 0 to ncolors-1.
So values are always binned as -2 <= x < -0.5. To move the equal sign to the other side (-2 < x <= -0.5), shift the boundaries up by a tiny amount, e.g.
n_bins_ranges = np.array([-10,-5,-2,-0.5,2.5,7.5,10])+1e-9
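Which side of a boundary ends up inclusive after nudging the boundaries can be checked with plain NumPy. This is only a sketch: np.digitize is used here as a stand-in that follows the same b[i] <= v < b[i+1] rule quoted from the documentation, and bin_index is a hypothetical helper, not part of matplotlib.

```python
import numpy as np

bounds = np.array([-10, -5, -2, -0.5, 2.5, 7.5, 10])

def bin_index(values, boundaries):
    # np.digitize returns i with boundaries[i-1] <= v < boundaries[i],
    # so subtracting 1 gives the 0-based bin number
    return np.digitize(values, boundaries) - 1

# Two values that sit exactly on a boundary
probe = np.array([-2.0, -0.5])

default_bins = bin_index(probe, bounds)         # lower boundary inclusive
shifted_up = bin_index(probe, bounds + 1e-9)    # values on a boundary drop to the bin below
shifted_down = bin_index(probe, bounds - 1e-9)  # same bins as the default for these values
print(default_bins, shifted_up, shifted_down)
```

With the boundaries shifted up, -0.5 lands in the same bin as values just below it, i.e. that bin becomes -2 < x <= -0.5.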
