I am trying to draw a scatter point inside a boxplot, within a grid of subplots. When I do this for just one boxplot, it works: I can place a specific point with a specific color inside the boxplot. The green dot (Image 1) represents a specific number in comparison with the boxplot values.
for columnName in data_num.columns:
    plt.figure(figsize=(2, 2), dpi=100)
    bp = data_num.boxplot(column=columnName, grid=False)
    y = S[columnName]
    x = columnName
    if y > data_num[columnName].describe().iloc[5]:
        plt.plot(1, y, 'r.', alpha=0.7,color='green',markersize=12)
        count_G = count_G + 1
    elif y < data_num[columnName].describe().iloc[5]:
        plt.plot(1, y, 'r.', alpha=0.7,color='red',markersize=12)
        count_L = count_L + 1
    else:
        plt.plot(1, y, 'r.', alpha=0.7,color='yellow',markersize=12)
        count_E = count_E + 1
Image 1 - Scatter + 1 boxplot
I can create a subplot with boxplots.
fig, axes = plt.subplots(6,10,figsize=(16,16)) # create figure and axes
fig.subplots_adjust(hspace=0.6, wspace=1)
for j,columnName in enumerate(list(data_num.columns.values)[:-1]):
    bp = data_num.boxplot(columnName,ax=axes.flatten()[j])
Image 2 - Subplots + Boxplots
But when I try to plot a specific number inside each boxplot, it overwrites the entire plot.
plt.subplot(6,10,j+1)
if y > data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, 'r.', alpha=0.7,color='green',markersize=12)
    count_G = count_G + 1
elif y < data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, 'r.', alpha=0.7,color='red',markersize=12)
    count_L = count_L + 1
else:
    plt.plot(1, y, 'r.', alpha=0.7,color='black',markersize=12)
    count_E = count_E + 1
Image 3 - Subplots + scatter
It is not completely clear what is going wrong. Probably the call to plt.subplot(6,10,j+1) is erasing what was already drawn. However, such a call is not necessary with the standard modern use of matplotlib, where the subplots are created via fig, axes = plt.subplots(). Be careful to use ax.plot() instead of plt.plot(). plt.plot() draws on the "current" axes, which can be confusing when there are lots of subplots.
The sample code below first creates some toy data (hopefully similar to the data in the question). Then the boxplots and the individual dots are drawn in a loop. To avoid repetition, the counts and the colors are stored in dictionaries. As data_num[columnName].describe().iloc[5] seems to be the median, for readability the code directly calculates that median.
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
column_names = list('abcdef')
S = {c: np.random.randint(2, 6) for c in column_names}
data_num = pd.DataFrame({c: np.random.randint(np.random.randint(0, 3), np.random.randint(4, 8), 20)
                         for c in column_names})
colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
counts = {c: 0 for c in colors}
fig, axes = plt.subplots(1, 6, figsize=(12, 3), gridspec_kw={'hspace': 0.6, 'wspace': 1})
for columnName, ax in zip(data_num.columns, axes.flatten()):
    data_num.boxplot(column=columnName, grid=False, ax=ax)
    y = S[columnName] # in case S would be a dataframe with one row: y = S[columnName].values[0]
    data_median = data_num[columnName].median()
    classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
    ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
    counts[classification] += 1
print(counts)
plt.show()
Related
I am plotting several lines in the same figure (which is a football pitch) like so:
fig, ax = create_pitch(120, 80,'white')
for index, pass_ in cluster5.iterrows():
    if (index < 0):
        continue
    x, y = pass_['x'], pass_['y']
    end_x, end_y = pass_['end_x'], pass_['end_y']
    y = 80 - y
    end_y = 80 - end_y
    color = color_map[pass_['possession']]
    ax.plot([x, end_x], [y, end_y], linewidth = 3, alpha = 0.75, color = color, label = pass_['possession'])
ax.legend(loc = 'upper left')
There are several groups and I would like to plot a single legend for them.
However, I now have a legend of repeated items (one for each call to ax plot for each label).
How can I just plot a single legend item for each label?
Thanks a lot in advance!
I solved this by adding proxy plots, one per label, to act as the legend handles:
for c in color_labels:
    ax.plot(x, y, linewidth = 3, color = color_map[c], alpha = 0.75, label = c)
Here x, y are the last coordinates used in the loop, so that the final color is the same.
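Another common way to get one legend entry per label, sketched here as an alternative rather than as the method used above, is to let every line carry its label and then keep only one handle per label when building the legend. The pass data and the team names below are made up for illustration; create_pitch and cluster5 from the question are not reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up stand-in for the passes: (x, y, end_x, end_y, possession)
passes = [
    (10, 20, 30, 40, "home"),
    (15, 60, 50, 55, "away"),
    (40, 10, 70, 30, "home"),
    (60, 70, 90, 50, "away"),
]
color_map = {"home": "tab:blue", "away": "tab:red"}

fig, ax = plt.subplots()
for x, y, end_x, end_y, team in passes:
    ax.plot([x, end_x], [y, end_y], linewidth=3, alpha=0.75,
            color=color_map[team], label=team)

# get_legend_handles_labels() returns one handle per plot call;
# a dict keyed by label keeps a single handle for each label
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
legend = ax.legend(list(unique.values()), list(unique.keys()), loc="upper left")

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Since lines with the same label share the same color here, it does not matter which of the duplicate handles survives.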
I am plotting separate figures for each attribute and label for each data sample. Here is the illustration:
As illustrated in the last subplot (Label), my data contains seven numeric classes (0 to 6). I'd like to visualize these classes using different colors and a legend. Please note that I only want colors for the last subplot. How should I do that?
Here is the code of above plot:
x, y = test_data["x"], test_data["y"]
# determine the total number of plots
n, off = x.shape[1] + 1, 0
plt.rcParams["figure.figsize"] = (40, 15)
# plot all the attributes
for i in range(6):
    plt.subplot(n, 1, off + 1)
    plt.plot(x[:, off])
    plt.title('Attribute:' + str(i), y=0, loc='left')
    off += 1
# plot Labels
plt.subplot(n, 1, n)
plt.plot(y)
plt.title('Label', y=0, loc='left')
plt.savefig(save_file_name, bbox_inches="tight")
plt.close()
First, just to set up a similar dataset:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.random((100,6))
y = np.random.randint(0, 6, (100))
fig, axs = plt.subplots(6, figsize=(40,15))
We could use scatter() to give individual points different colors according to their class:
for i in range(x.shape[-1]):
    axs[i].scatter(range(x.shape[0]), x[:,i], c=y)
Or we could mask the arrays we're plotting:
for i in range(x.shape[-1]):
    for j in np.unique(y):
        axs[i].plot(np.ma.masked_where(y!=j, x[:,i]), 'o')
Either way we get the same results:
Edit: Ah you've edited your question! You can do exactly the same thing for your last plot only, just modify my code above to take it out of the loop of subplots :)
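A minimal sketch of that idea, assuming toy data in place of test_data: the attribute subplots are plain lines, and only the last (label) subplot is colored by class, with a legend built from the scatter's legend_elements(). The figure size, colormap and marker size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 6))       # toy attributes
y = rng.integers(0, 7, 100)    # toy labels, classes 0..6

n = x.shape[1] + 1
fig, axs = plt.subplots(n, 1, figsize=(10, 12), sharex=True)

# Attribute subplots stay uncolored
for i in range(x.shape[1]):
    axs[i].plot(x[:, i])
    axs[i].set_title(f"Attribute: {i}", y=0, loc="left")

# Only the label subplot is colored by class
sc = axs[-1].scatter(np.arange(y.size), y, c=y, cmap="tab10", vmin=0, vmax=9, s=15)
axs[-1].set_title("Label", y=0, loc="left")

# legend_elements() builds one handle per distinct class value
handles, labels = sc.legend_elements()
axs[-1].legend(handles, labels, title="class", ncol=len(labels), loc="upper right")
```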
As suggested, we imitate the matplotlib step function by creating a LineCollection to color the different line segments:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection
from matplotlib.patches import Patch
#random data generation
np.random.seed(12345)
number_of_categories=4
y = np.concatenate([np.repeat(np.random.randint(0, number_of_categories), np.random.randint(1, 30)) for _ in range(20)])
#check the results with less points
#y = y[:10]
x = y[None] * np.linspace(1, 5, 3)[:, None]
x += 2 * np.random.random(x.shape) - 1
#your initial plot
num_plots = x.shape[0] + 1
fig, axes = plt.subplots(num_plots, 1, sharex=True, figsize=(10, 8))
for i, ax in enumerate(axes.flat[:-1]):
    ax.plot(x[i,:])
#first we create the matplotlib step function with x-values as their midpoint
axes.flat[-1].step(np.arange(y.size), y, where="mid", color="lightgrey", zorder=-1)
#then we plot colored segments with shifted index simulating the step function
shifted_x = np.arange(y.size+1)-0.5
#and identify the step indexes
idx_steps, = np.nonzero(np.diff(y, prepend=np.inf, append=np.inf))
#create collection of plateau segments
colored_segments = np.zeros((idx_steps.size-1, 2, 2))
colored_segments[:, :, 0] = np.vstack((shifted_x[idx_steps[:-1]], shifted_x[idx_steps[1:]])).T
colored_segments[:, :, 1] = np.repeat(y[idx_steps[:-1]], 2).reshape(-1, 2)
#generate discrete color list
n_levels, idx_levels = np.unique(y[idx_steps[:-1]], return_inverse=True)
colorarr = np.asarray(plt.cm.tab10.colors[:n_levels.size])
#and plot the colored segments
lc_cs = LineCollection(colored_segments, colors=colorarr[idx_levels, :], lw=10)
lines_cs = axes.flat[-1].add_collection(lc_cs)
#scaling and legend generation
axes.flat[-1].set_ylim(n_levels.min()-0.5, n_levels.max()+0.5)
axes.flat[-1].legend([Patch(color=colorarr[i, :]) for i, _ in enumerate(n_levels)],
                     [f"cat {i}" for i in n_levels],
                     loc="upper center", bbox_to_anchor=(0.5, -0.15),
                     ncol=n_levels.size)
plt.show()
Sample output:
Alternatively, you can use broken barh plots or color this axis or even all axes using axvspan.
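The axvspan variant could look roughly like the following. This is a sketch on a small made-up label array, not the data used above: each run of equal labels becomes one shaded vertical band behind the curve, and the legend is built from Patch proxies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

y = np.array([0, 0, 1, 1, 1, 3, 3, 2, 2, 2, 0])   # toy class labels
signal = np.arange(y.size)                         # toy attribute to plot

# Start index of every run of equal labels, and the matching end index
starts = np.flatnonzero(np.diff(y, prepend=y[0] - 1))
ends = np.append(starts[1:], y.size)

colors = plt.cm.tab10.colors
fig, ax = plt.subplots()
ax.plot(signal, color="black")
for start, end in zip(starts, ends):
    # shift by 0.5 so each band is centered on its sample indices
    ax.axvspan(start - 0.5, end - 0.5, color=colors[y[start]], alpha=0.3)

classes = np.unique(y)
ax.legend([Patch(color=colors[c], alpha=0.3) for c in classes],
          [f"cat {c}" for c in classes], loc="upper left")

runs = list(zip(starts.tolist(), ends.tolist()))
print(runs)
```

The same loop can be repeated over several axes if all subplots should be shaded.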
I am trying to plot 4 columns data using bar3d command in python.
column1 - X
column2 - Y
column3 - Z
column4 - e
So far I am able to plot three column data as seen in figure:
Now, I would like to show column 4 ("e") as the color of the bars in this plot.
Could someone please recommend a way to do this in Python?
Code: plot1 = ax.bar3d(X,Y,Z,dx,dy,dz)
Thanks in advance.
In general, you define a color array by using
colors = plt.cm.jet(norm(your_4th_dimension)), where norm scales the values to the range 0 to 1, and apply it to your plot as in plot1 = ax.bar3d(X,Y,Z,dx,dy,dz, color=colors). Since you didn't provide your data, here is a generic example. A color bar is also created, using the same colormap and normalization as the bars:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize=(10, 10))
# fig.gca(projection='3d') no longer works in recent matplotlib versions
ax = fig.add_subplot(projection='3d', facecolor='white')
x = [i for i in range(10)]
y = [2*i + 3 for i in range(10)]
z = [i**2 for i in range(10)]
#fourth dimension:
fourth_dim = np.array([2* i**2 for i in range(10)])
#scale to 0..1 so that the colormap covers its full range
norm = plt.Normalize(fourth_dim.min(), fourth_dim.max())
colors = plt.cm.jet(norm(fourth_dim))
ax.bar3d(x, y, 0, 1, 1, z, color=colors)
#colorbar creation (same colormap and norm as the bars):
colorMap = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.jet)
colorMap.set_array(fourth_dim)
colBar = plt.colorbar(colorMap, ax=ax)
colBar.set_label('fourth dimension')
ax.set_xlabel("time")
ax.set_ylabel("distance")
ax.set_zlabel("cost")
plt.show()
I'm borrowing an example from the matplotlib custom cmap examples page:
https://matplotlib.org/examples/pylab_examples/custom_cmap.html
This produces the same image with different numbers of shading contours, as specified in the number of bins: n_bins:
https://matplotlib.org/_images/custom_cmap_00.png
However, I'm interested not only in the number of bins, but also in the specific break points between the color values. For example, when nbins=6 in the top right subplot, how can I specify the ranges of the bins such that the shading is filled in these custom areas:
n_bins_ranges = ([-10,-5],[-5,-2],[-2,-0.5],[-0.5,2.5],[2.5,7.5],[7.5,10])
Is it also possible to specify the inclusivity of the break points? For example, I'd like to specify in the range between -2 and -0.5 whether it's -2 < x <= -0.5 or -2 <= x < -0.5.
EDIT WITH ANSWER BELOW:
Using the accepted answer below, here is code that plots each step including finally adding custom colorbar ticks at the midpoint. Note I can't post an image since I'm a new user.
Set up data and 6 color bins:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
# Make some illustrative fake data:
x = np.arange(0, np.pi, 0.1)
y = np.arange(0, 2*np.pi, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y) * 10
# Create colormap with 6 discrete bins
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)] # R -> G -> B
n_bin = 6
cmap_name = 'my_list'
cm = matplotlib.colors.LinearSegmentedColormap.from_list(
cmap_name, colors, N=n_bin)
Plot different options:
# Set up 4 subplots
fig, axs = plt.subplots(2, 2, figsize=(6, 9))
fig.subplots_adjust(left=0.02, bottom=0.06, right=0.95, top=0.94, wspace=0.05)
# Plot 6 bin figure
im = axs[0,0].imshow(Z, interpolation='nearest', origin='lower', cmap=cm)
axs[0,0].set_title("Original 6 Bin")
fig.colorbar(im, ax=axs[0,0])
# Change the break points
n_bins_ranges = [-10,-5,-2,-0.5,2.5,7.5,10]
norm = matplotlib.colors.BoundaryNorm(n_bins_ranges, len(n_bins_ranges))
im = axs[0,1].imshow(Z, interpolation='nearest', origin='lower', cmap=cm, norm=norm)
axs[0,1].set_title("Custom Break Points")
fig.colorbar(im, ax=axs[0,1])
# Arrange color labels by data interval (not colors)
im = axs[1,0].imshow(Z, interpolation='nearest', origin='lower', cmap=cm, norm=norm)
axs[1,0].set_title("Linear Color Distribution")
fig.colorbar(im, ax=axs[1,0], spacing="proportional")
# Provide custom labels at color midpoints
# And change inclusive equality by adding arbitrary small value
n_bins_ranges_arr = np.asarray(n_bins_ranges)+1e-9
# pass the shifted boundaries to the norm, otherwise the shift has no effect
norm = matplotlib.colors.BoundaryNorm(n_bins_ranges_arr, len(n_bins_ranges))
n_bins_ranges_midpoints = (n_bins_ranges_arr[1:] + n_bins_ranges_arr[:-1])/2.0
im = axs[1,1].imshow(Z, interpolation='nearest', origin='lower', cmap=cm ,norm=norm)
axs[1,1].set_title("Midpoint Labels\n Switched Equal Sign")
cbar=fig.colorbar(im, ax=axs[1,1], spacing="proportional",
ticks=n_bins_ranges_midpoints.tolist())
cbar.ax.set_yticklabels(['Red', 'Brown', 'Green 1','Green 2','Gray Blue','Blue'])
plt.show()
You can use a BoundaryNorm as follows:
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
x = np.arange(0, np.pi, 0.1)
y = np.arange(0, 2*np.pi, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y) * 10
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)] # R -> G -> B
n_bin = 6 # Discretizes the interpolation into bins
n_bins_ranges = [-10,-5,-2,-0.5,2.5,7.5,10]
cmap_name = 'my_list'
fig, ax = plt.subplots()
# Create the colormap
cm = matplotlib.colors.LinearSegmentedColormap.from_list(
cmap_name, colors, N=n_bin)
norm = matplotlib.colors.BoundaryNorm(n_bins_ranges, len(n_bins_ranges))
# Fewer bins will result in "coarser" colormap interpolation
im = ax.imshow(Z, interpolation='nearest', origin='lower', cmap=cm, norm=norm)
ax.set_title("N bins: %s" % n_bin)
fig.colorbar(im, ax=ax)
plt.show()
Or, if you want proportional spacing, i.e. the distance between colors according to their values,
fig.colorbar(im, ax=ax, spacing="proportional")
As the boundary norm documentation states
If b[i] <= v < b[i+1]
then v is mapped to color j; as i varies from 0 to len(boundaries)-2, j goes from 0 to ncolors-1.
So by default each interval is closed on the left, e.g. -2 <= x < -0.5. To obtain the equal sign on the other side (-2 < x <= -0.5), you would need to shift the boundaries up by a tiny amount,
something like n_bins_ranges = np.array([-10,-5,-2,-0.5,2.5,7.5,10])+1e-9
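A quick way to check which side of a break point is inclusive is to ask the norm directly for the bin index of a value lying exactly on a boundary. The sketch below compares the unshifted boundaries with boundaries shifted up by 1e-9; the variable names are illustrative only.

```python
import numpy as np
from matplotlib.colors import BoundaryNorm

bounds = np.array([-10, -5, -2, -0.5, 2.5, 7.5, 10])
ncolors = len(bounds) - 1   # one color per interval

default_norm = BoundaryNorm(bounds, ncolors=ncolors)
shifted_norm = BoundaryNorm(bounds + 1e-9, ncolors=ncolors)

value = -0.5  # lies exactly on a break point
default_bin = int(default_norm(value))   # interval [-0.5, 2.5)
shifted_bin = int(shifted_norm(value))   # interval (-2, -0.5]
print(default_bin, shifted_bin)          # prints: 3 2
```

With the unshifted boundaries the value falls into the interval that starts at -0.5; with the shifted ones it falls into the interval that ends there.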
I want 3 graphs on one axes object, for example:
#example x- and y-data
x_values1=[1,2,3,4,5]
y_values1=[1,2,3,4,5]
x_values2=[-1000,-800,-600,-400,-200]
y_values2=[10,20,39,40,50]
x_values3=[150,200,250,300,350]
y_values3=[10,20,30,40,50]
#make axes
fig=plt.figure()
ax=fig.add_subplot(111)
Now I want to add all three data sets to ax. But they shouldn't share any x- or y-axis (because of the different scales, one would be way smaller than the other). I need something like ax.twinx(), ax.twiny(), but both the x- and y-axis need to be independent.
I want to do this, because I want to put the two attached plots (and a third one, that is similar to the second one) in one plot ("put them on top of each other").
Plot1
Plot2
I would then put the x/y-labels (and/or ticks, limits) of the second plot on the right/top and the x/y-limits of another plot on the bottom/left. I don't need x/y-labels for the third plot.
How do I do this?
The idea would be to create three subplots at the same position. To make sure they are recognized as different plots, their properties need to differ, and the easiest way to achieve this is simply to give each one a different label, e.g. ax=fig.add_subplot(111, label="1").
The rest is simply adjusting all the axes parameters, such that the resulting plot looks appealing.
It's a little bit of work to set all the parameters, but the following should do what you need.
import matplotlib.pyplot as plt
x_values1=[1,2,3,4,5]
y_values1=[1,2,2,4,1]
x_values2=[-1000,-800,-600,-400,-200]
y_values2=[10,20,39,40,50]
x_values3=[150,200,250,300,350]
y_values3=[10,20,30,40,50]
fig=plt.figure()
ax=fig.add_subplot(111, label="1")
ax2=fig.add_subplot(111, label="2", frame_on=False)
ax3=fig.add_subplot(111, label="3", frame_on=False)
ax.plot(x_values1, y_values1, color="C0")
ax.set_xlabel("x label 1", color="C0")
ax.set_ylabel("y label 1", color="C0")
ax.tick_params(axis='x', colors="C0")
ax.tick_params(axis='y', colors="C0")
ax2.scatter(x_values2, y_values2, color="C1")
ax2.xaxis.tick_top()
ax2.yaxis.tick_right()
ax2.set_xlabel('x label 2', color="C1")
ax2.set_ylabel('y label 2', color="C1")
ax2.xaxis.set_label_position('top')
ax2.yaxis.set_label_position('right')
ax2.tick_params(axis='x', colors="C1")
ax2.tick_params(axis='y', colors="C1")
ax3.plot(x_values3, y_values3, color="C3")
ax3.set_xticks([])
ax3.set_yticks([])
plt.show()
You could also standardize the data so it shares the same limits and then plot the limits of the desired second scale "manually".
This function standardizes the data to the limits of the first set of points:
def standardize(data):
    for a in range(2):  # a = 0: x values, a = 1: y values
        span = max(data[0][a]) - min(data[0][a])
        min_ = min(data[0][a])
        for idx in range(len(data)):
            # ratio of this data set's range to the range of the first set
            scale = (max(data[idx][a]) - min(data[idx][a]))/span
            data[idx][a] = [i/scale + min_ - min([i/scale
                            for i in data[idx][a]]) for i in data[idx][a]]
    return data
Then, plotting the data is easy:
import matplotlib.pyplot as plt
data = [[[1,2,3,4,5],[1,2,2,4,1]], [[-1000,-800,-600,-400,-200], [10,20,39,40,50]], [[150,200,250,300,350], [10,20,30,40,50]]]
limits = [(min(data[1][a]), max(data[1][a])) for a in range(2)]
norm_data = standardize(data)
fig, ax = plt.subplots()
for x, y in norm_data:
    ax.plot(x, y)
ax2, ax3 = ax.twinx(), ax.twiny()
ax2.set_ylim(limits[1])
ax3.set_xlim(limits[0])
plt.show()
Since all data points have the limits of the first set of points, we can just plot them on the same axis. Then, using the limits of the desired second x and y axis we can set the limits for these two.
In this example, you can plot multiple lines on each pair of x/y axes and add a legend entry for each line.
import numpy as np
import matplotlib.pyplot as plt
X1 = np.arange(10)
X1 = np.stack([X1, X1])
Y1 = np.random.randint(1, 10, (2, 10))
X2 = np.arange(0, 1000, 200)
X2 = np.stack([X2, X2])
Y2 = np.random.randint(100, 200, (2, 5))
x_label_names = ['XXX', 'xxx']
y_label_names = ['YYY', 'yyy']
X1_legend_names = ['X1_legend1', 'X1_legend2']
X2_legend_names = ['X2_legend1', 'X2_legend2']
def plot_by_two_xaxis(X1, Y1, X2, Y2, x_label_names: list, y_label_names: list, X1_legend_names: list, X2_legend_names: list):
    fig = plt.figure()
    ax1s = []
    ax2s = []
    lines = []
    j = 0
    # one axes per line, all at the same position; only the first draws the frame
    for i in range(len(X1)):
        j += 1
        ax1s.append(fig.add_subplot(111, label=f"{j}", frame_on=(j == 1)))
    for i in range(len(X2)):
        j += 1
        ax2s.append(fig.add_subplot(111, label=f"{j}", frame_on=(j == 1)))
    k = 0
    # first group: ticks and labels on the bottom/left
    for i in range(len(X1)):
        lines.append(ax1s[i].plot(X1[i], Y1[i], color=f"C{k}")[0])
        if i == 0:
            ax1s[i].set_xlabel(x_label_names[0], color=f"C{k}")
            ax1s[i].set_ylabel(y_label_names[0], color=f"C{k}")
            ax1s[i].tick_params(axis='x', colors=f"C{k}")
            ax1s[i].tick_params(axis='y', colors=f"C{k}")
        else:
            ax1s[i].set_xticks([])
            ax1s[i].set_yticks([])
        k += 1
    # second group: ticks and labels on the top/right
    for i in range(len(X2)):
        lines.append(ax2s[i].plot(X2[i], Y2[i], color=f"C{k}")[0])
        if i == 0:
            ax2s[i].xaxis.tick_top()
            ax2s[i].yaxis.tick_right()
            ax2s[i].set_xlabel(x_label_names[1], color=f"C{k}")
            ax2s[i].set_ylabel(y_label_names[1], color=f"C{k}")
            ax2s[i].xaxis.set_label_position('top')
            ax2s[i].yaxis.set_label_position('right')
            ax2s[i].tick_params(axis='x', colors=f"C{k}")
            ax2s[i].tick_params(axis='y', colors=f"C{k}")
        else:
            ax2s[i].set_xticks([])
            ax2s[i].set_yticks([])
        k += 1
    ax1s[0].legend(lines, X1_legend_names + X2_legend_names)
    plt.show()
plot_by_two_xaxis(X1, Y1, X2, Y2, x_label_names,
                  y_label_names, X1_legend_names, X2_legend_names)