I am plotting several lines in the same figure (which is a football pitch) like so:
fig, ax = create_pitch(120, 80, 'white')
for index, pass_ in cluster5.iterrows():
    if index < 0:
        continue
    x, y = pass_['x'], pass_['y']
    end_x, end_y = pass_['end_x'], pass_['end_y']
    y = 80 - y
    end_y = 80 - end_y
    color = color_map[pass_['possession']]
    ax.plot([x, end_x], [y, end_y], linewidth=3, alpha=0.75, color=color, label=pass_['possession'])
ax.legend(loc='upper left')
There are several groups, and I would like a single legend for them.
However, the legend now contains repeated items (one entry per call to ax.plot, even when the label is the same).
How can I show just one legend entry per label?
Thanks a lot in advance!
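I solved this by dropping the label from the ax.plot call inside the loop and adding one labeled proxy plot (and therefore one legend handle) per group:
for c in color_labels:
    ax.plot(x, y, linewidth=3, color=color_map[c], alpha=0.75, label=c)
Here x, y are the last values used in the loop. Since they are scalars (a single point) and no marker is set, each proxy line draws nothing visible on the pitch but still provides a legend entry in the right color.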
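A common alternative that avoids proxy lines is to let every segment keep its label and then drop duplicate labels when building the legend, using Axes.get_legend_handles_labels(). A minimal sketch, assuming hypothetical stand-ins for cluster5 and color_map (the question's create_pitch helper is replaced by a plain plt.subplots() axes):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for cluster5: (x, y, end_x, end_y, possession) per pass
passes = [
    (10, 20, 30, 40, "A"),
    (15, 25, 35, 45, "B"),
    (50, 60, 70, 10, "A"),
    (60, 10, 80, 30, "B"),
]
color_map = {"A": "tab:blue", "B": "tab:orange"}  # hypothetical colors

fig, ax = plt.subplots()
for x, y, end_x, end_y, group in passes:
    # every segment keeps its label, so the raw legend would list each segment
    ax.plot([x, end_x], [80 - y, 80 - end_y], linewidth=3, alpha=0.75,
            color=color_map[group], label=group)

handles, labels = ax.get_legend_handles_labels()
# a dict keeps one handle per label, ordered by the label's first appearance
unique = dict(zip(labels, handles))
legend = ax.legend(list(unique.values()), list(unique.keys()), loc="upper left")
print([t.get_text() for t in legend.get_texts()])  # ['A', 'B']
```

Because all segments of one group share the same style, it does not matter which of their handles ends up in the legend.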
Dataset: I have a series (n = 30) of X (wavelength) and Y (reflectance) data, each associated with a unique value Z (age). Z values are stored as a separate ordered list.
Goal: I am trying to create a series of line plots that display all 30 datasets together, where each line is colored according to its Z value (age). I am hoping for a colorization weighted by the Z value, with an associated colorbar() or similar.
Attempts: I tried manipulating rcParams to do this by iterating through a color scheme per plot [i], but the colors are not weighted by the Z value. See the example figure. I think my issue is similar to this question here.
I feel like this shouldn't be so hard and that I am missing something obvious!
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# plot
target_x = nm_names
target_y = data_plot
target_names = ages
N = len(target_y)  # number of objects to plot, i.e. color cycle count
plt.rcParams["figure.figsize"] = [16, 7]  # fig size
plt.rcParams["axes.prop_cycle"] = plt.cycler("color", plt.cm.PiYG(np.linspace(0, 1, N)))  # colors to cycle through; choose a colormap like 'viridis' or 'PiYG'
fig, ax = plt.subplots()
for i in range(N):
    ax.plot(target_x, target_y.iloc[i], label=target_names[i])  # plot x, y for each object
# axes
plt.xticks(rotation=70, fontsize=8)  # 'size' is an alias of 'fontsize'; pass only one
ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
plt.xlabel('Wavelength (nm)', fontsize=14)
plt.yticks(fontsize=12)
plt.ylabel('Normalized Relative Reflectance', fontsize=13)
plt.title("Spectral Profile", size=14)
plt.xlim(375, 2500)
# legend location
box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.1,
                 box.width, box.height * 0.9])
ax.legend(loc='lower left', bbox_to_anchor=(1, 0),
          fancybox=True, shadow=True, ncol=1, title='Age (ky)')  # put the legend to the right of the axes
plt.rcdefaults()  # reset global plt parameters, IMPORTANT!
plt.show()
My plot, where 'age' is the 'Z' value
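One way to get colors weighted by the actual Z values (rather than by position in the color cycle) is to map each age through a matplotlib.colors.Normalize instance into a colormap, and to build the colorbar from a ScalarMappable that shares the same norm and colormap. A minimal sketch; wavelengths, ages and spectra below are hypothetical stand-ins for nm_names, ages and data_plot:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
from matplotlib.cm import ScalarMappable

rng = np.random.default_rng(0)
wavelengths = np.linspace(375, 2500, 200)   # hypothetical X values (nm)
ages = np.sort(rng.uniform(0, 100, 30))     # hypothetical, unevenly spaced Z values (ky)
spectra = [np.sin(wavelengths / 300 + a / 20) for a in ages]  # hypothetical Y curves

# Map age -> [0, 1] linearly, then [0, 1] -> RGBA through the colormap
norm = Normalize(vmin=ages.min(), vmax=ages.max())
cmap = plt.cm.viridis

fig, ax = plt.subplots(figsize=(16, 7))
for age, spectrum in zip(ages, spectra):
    ax.plot(wavelengths, spectrum, color=cmap(norm(age)))

# The colorbar uses the same norm and cmap, so its scale matches the line colors
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax, label="Age (ky)")
ax.set_xlabel("Wavelength (nm)")
ax.set_ylabel("Normalized Relative Reflectance")
```

Two ages that are close together get similar colors even if they are neighbors in the list, which is the weighting the rcParams color cycle cannot provide. With a colorbar, the per-line legend is usually no longer needed.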
(This question was closed as a duplicate of "Plotting stochastic processes in Python".)
Basically, I want to make a scatter plot of two variables with shaded bands for different percentiles. I have plotted the scatter plot with the following toy code, but I am unable to add the bands for the different percentiles (quantiles).
quantiles = [1, 10, 25, 50, 50, 75, 90, 99]
grays = ["#DCDCDC", "#A9A9A9", "#2F4F4F", "#A9A9A9", "#DCDCDC"]
alpha = 0.3
data = df[['area_log', 'mr_ecdf']]
y = data['mr_ecdf']
x = data['area_log']
idx = np.argsort(x)
x = np.array(x)[idx]
y = np.array(y)[idx]
for i in range(len(quantiles)//2):
    plt.fill_between(x, y, y, color='black', alpha=alpha, label=f"{quantiles[i]}")
    lower_lim = np.percentile(y, quantiles[i])
    upper_lim = np.percentile(y, 100-quantiles[i])
    data = data[data['mr_ecdf'] >= lower_lim]
    data = data[data['mr_ecdf'] <= upper_lim]
    y = data['mr_ecdf']
    x = data['area_log']
    idx = np.argsort(x)
    x = np.array(x)[idx]
    y = np.array(y)[idx]
data = df[['area_log', 'mr_ecdf']]
y = data['mr_ecdf']
x = data['area_log']
plt.scatter(x, y, s=1, color='r', label='data')
plt.legend()
# axes.set_ylim([0,1])
Data link: here
I want to plot something like this (the first panel, (1,1)):
As @Mr. T mentioned, one way to do this is to calculate the confidence intervals (CIs) yourself and then plot them using plt.fill_between. The data you show pose a problem: there are not enough points and not enough variance, so you will never get what is in your pictures (the separation in my figure is not clear either, so I have put another example below to show how it works). If you have suitable data, post it and I will update the answer. Anyway, you should check the post I mentioned in the comment; one way of doing it follows:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])
idx = np.argsort(x)
x = np.array(x)[idx]
y = np.array(y)[idx]
# Create a list of quantiles to calculate
# note: np.percentile expects q in [0, 100], so 0.05 here means the 0.05th percentile
quantiles = [0.05, 0.25, 0.75, 0.95]
grays = ["#DCDCDC", "#A9A9A9", "#2F4F4F", "#A9A9A9", "#DCDCDC"]
alpha = 0.3
plt.fill_between(x, y-np.percentile(y, 0.5), y+np.percentile(y, 0.5), color=grays[2], alpha=alpha, label="0.50")
# if the percentiles are symmetrical and we want labels on both sides
for i in range(len(quantiles)//2):
    plt.fill_between(x, y, y+np.percentile(y, quantiles[i]), color=grays[i], alpha=alpha, label=f"{quantiles[i]}")
    plt.fill_between(x, y-np.percentile(y, quantiles[-(i+1)]), y, color=grays[-(i+1)], alpha=alpha, label=f"{quantiles[-(i+1)]}")
plt.scatter(x, y, color='r', label='data')
plt.legend()
EDIT:
Some explanation. I am not sure what is incorrect in my code, but I would be happy if you could tell me; there is always room for improvement (thanks again to @Mr. T for the catch). Nevertheless, the fill_between function does the following:
Fill the area between two horizontal curves.
The curves are defined by the points (x, y1) and (x, y2).
So you specify with y1 and y2 where you want the graph filled with a color. Let me give another example:
import numpy as np
import matplotlib.pyplot as plt
alpha = 0.3
X = np.linspace(120, 50, 71)
Y = X + 20*np.random.randn(71)
plt.fill_between(X, Y-np.percentile(Y, 95), Y+np.percentile(Y, 95), color="k", alpha=alpha)
plt.fill_between(X, Y-np.percentile(Y, 80), Y+np.percentile(Y, 80), color="r", alpha=alpha)
plt.fill_between(X, Y-np.percentile(Y, 60), Y, color="b", alpha=alpha)
plt.scatter(X, Y, color='r', label='data')
I generated some random data to see what is happening. The line plt.fill_between(X, Y-np.percentile(Y, 60), Y, color="b", alpha=alpha) fills only the band between Y minus the 60th percentile of Y and Y itself. The other two lines cover the space on both sides of Y (hence the ±). You can see that the bands overlap, and they must: the band built from the 95th percentile also contains the narrower ones built from the 80th and 60th percentiles. So you see only the differences between them. You could plot the data in the opposite order (or change the zorder), but then everything would be covered by the highest-percentile band. I hope this clarifies the answer. Also, your question is perfectly fine; sorry if my answer does not sound neutral. If you had also shared the data behind the graphs and not only the picture, my answer (or others') could be more tailored :).
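If the goal is bands that describe how the spread of y changes along x (as in the fan charts of the linked stochastic-process question), another option is to compute the percentiles of y separately within bins of x and pass the lower and upper curves to fill_between. This is a different technique from the y ± percentile approach above; a minimal sketch on hypothetical random data, where the bin count and percentile pairs are arbitrary choices:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 2000)                   # hypothetical predictor
y = 2 * x + rng.normal(0, 1 + x / 3, x.size)   # hypothetical response, spread grows with x

edges = np.linspace(x.min(), x.max(), 21)      # 20 equal-width bins along x
centers = 0.5 * (edges[:-1] + edges[1:])
# bin index per point; clip so x == x.max() falls into the last bin
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, len(centers) - 1)

def binned_percentile(q):
    """Percentile q (0-100) of y computed separately inside each x bin."""
    return np.array([np.percentile(y[bin_idx == b], q) for b in range(len(centers))])

pairs = [(1, 99), (10, 90), (25, 75)]          # symmetric percentile pairs, widest first
fig, ax = plt.subplots()
ax.scatter(x, y, s=1, color="r", label="data")
bands = {}
for lo, hi in pairs:
    lower, upper = binned_percentile(lo), binned_percentile(hi)
    bands[(lo, hi)] = (lower, upper)
    # translucent bands stack, so the inner (narrower) bands look darker
    ax.fill_between(centers, lower, upper, color="k", alpha=0.2, label=f"{lo}-{hi}th percentile")
median = binned_percentile(50)
ax.plot(centers, median, color="k", label="median")
ax.legend()
```

Here the band edges come from the y values that fall in each x bin, so the shading widens where the data are more spread out instead of being a constant offset around each point.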
I am trying to draw a scatter point inside a boxplot, with the boxplots arranged as subplots. When I do it for just one boxplot, it works: I can place a specific point with a specific color inside the boxplot. The green dot (Image 1) represents a specific number in comparison with the boxplot values.
for columnName in data_num.columns:
    plt.figure(figsize=(2, 2), dpi=100)
    bp = data_num.boxplot(column=columnName, grid=False)
    y = S[columnName]
    x = columnName
    if y > data_num[columnName].describe().iloc[5]:
        plt.plot(1, y, '.', alpha=0.7, color='green', markersize=12)
        count_G = count_G + 1
    elif y < data_num[columnName].describe().iloc[5]:
        plt.plot(1, y, '.', alpha=0.7, color='red', markersize=12)
        count_L = count_L + 1
    else:
        plt.plot(1, y, '.', alpha=0.7, color='yellow', markersize=12)
        count_E = count_E + 1
Image 1 - Scatter + 1 boxplot
I can also create a grid of subplots with boxplots:
fig, axes = plt.subplots(6, 10, figsize=(16, 16))  # create figure and axes
fig.subplots_adjust(hspace=0.6, wspace=1)
for j, columnName in enumerate(list(data_num.columns.values)[:-1]):
    bp = data_num.boxplot(columnName, ax=axes.flatten()[j])
Image 2 - Subplots + Boxplots
But when I try to plot a specific number inside each boxplot, it overwrites the entire plot:
plt.subplot(6, 10, j+1)
if y > data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, '.', alpha=0.7, color='green', markersize=12)
    count_G = count_G + 1
elif y < data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, '.', alpha=0.7, color='red', markersize=12)
    count_L = count_L + 1
else:
    plt.plot(1, y, '.', alpha=0.7, color='black', markersize=12)
    count_E = count_E + 1
Image 3 - Subplots + scatter
It is not completely clear what is going wrong. Probably the call to plt.subplot(6,10,j+1) erases what was already drawn in that position. However, such a call is not necessary with the standard modern use of matplotlib, where the subplots are created via fig, axes = plt.subplots(). Be careful to use ax.plot() instead of plt.plot(): plt.plot() plots on the "current" ax, which can be confusing when there are many subplots.
The sample code below first creates some toy data (hopefully similar to the data in the question). Then the boxplots and the individual dots are drawn in a loop. To avoid repetition, the counts and the colors are stored in dictionaries. As data_num[columnName].describe().iloc[5] is the 50% row of describe(), i.e. the median, the code calculates that median directly for readability.
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
column_names = list('abcdef')
S = {c: np.random.randint(2, 6) for c in column_names}
data_num = pd.DataFrame({c: np.random.randint(np.random.randint(0, 3), np.random.randint(4, 8), 20)
                         for c in column_names})
colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
counts = {c: 0 for c in colors}
fig, axes = plt.subplots(1, 6, figsize=(12, 3), gridspec_kw={'hspace': 0.6, 'wspace': 1})
for columnName, ax in zip(data_num.columns, axes.flatten()):
    data_num.boxplot(column=columnName, grid=False, ax=ax)
    y = S[columnName]  # in case S would be a dataframe with one row: y = S[columnName].values[0]
    data_median = data_num[columnName].median()
    classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
    ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
    counts[classification] += 1
print(counts)
plt.show()
I have a 2 dimensional time series plotted as FacetGrid via xarray.
p = gmt.plot.line(x='time', add_legend=False, alpha = 0.1, color = ('k'), ylim = (-1, 1.2), col='MCrun', col_wrap = 5)
I want to add another lineplot with the same axes and dimensions on top. For individual members that's simply:
gmt.isel(MCrun=0).plot.line(x='time', add_legend=False, alpha = 0.1, color = 'k', ylim = (-3, 1.2))
gmt_esmean.isel(MCrun=0).plot.line(x='time', add_legend=False, color = 'red')
But using the same approach with two facet grids results in 20 plots: 10 with the individual lines and 10 with the mean. The closest I've come is:
def smean_plot(*args, **kwargs):
    gmt_esmean.plot.line(x='time', add_legend=False, color='red')
p = gmt.plot.line(x='time', add_legend=False, alpha=0.1, color=('k'), ylim=(-1, 1.2), col='MCrun', col_wrap=5)
p.map(smean_plot)
This plots all the means in every panel and adds unwanted axes titles.
Any ideas how to only add the mean to the corresponding ensemble are greatly appreciated.
OK, one approach I was happy with is to plot the panels one by one in a loop over subplots, with shared x and y axes and reduced spacing between the panels. It's not as elegant as I would have hoped, but it works just fine.
import matplotlib.pyplot as plt
fig, axs = plt.subplots(ncols=5, nrows=2, figsize=(18, 6), sharex=True, sharey=True, gridspec_kw={'hspace': 0.2, 'wspace': 0.1})
axs = axs.ravel()
for i in range(10):
    gmt.isel(MCrun=i).plot.line(ax=axs[i], x='time', add_legend=False, alpha=0.1, color='k', ylim=(-1.2, 0.8))
    gmt_esmean.isel(MCrun=i).plot.line(ax=axs[i], x='time', add_legend=False, color='red')
plt.draw()
I just created a horizontal stacked bar chart using matplotlib, and I can't figure out why there is extra space between the x axis and the first bar (code and picture below). Any suggestions or questions? Thanks!
Code:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(facecolor="white")
ax1 = fig.add_subplot(111, facecolor="white")  # 'axisbg' was renamed 'facecolor' in newer matplotlib
heights = .43
data = np.array([source['loan1'], source['loan2'], source['loan3']])
dat2 = np.array(source2)
ind = np.arange(N)
left = np.vstack((np.zeros((data.shape[1],), dtype=data.dtype), np.cumsum(data, axis=0)[:-1]))
colors = ('#27A545', '#7D3CBD', '#C72121')
for dat, col, lefts, pname2 in zip(data, colors, left, pname):
    ax1.barh(ind + (heights/2), dat, color=col, left=lefts, height=heights, align='center', alpha=.5)
p4 = ax1.barh(ind - (heights/2), dat2, height=heights, color="#C6C6C6", align='center', alpha=.7)
ax1.spines['right'].set_visible(False)
ax1.yaxis.set_ticks_position('left')
ax1.spines['top'].set_visible(False)
ax1.xaxis.set_ticks_position('bottom')
ax1.set_yticks(range(N))
ax1.set_yticklabels(namelist)
# mostly for the legend
plt.rcParams.update({'legend.fontsize': 8})
box = ax1.get_position()
ax1.set_position([box.x0, box.y0 + box.height * 0.1, box.width, box.height * 0.9])
l = ax1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=4)
plt.show()
This is because matplotlib tries to intelligently choose minimum and maximum limits for the plot (i.e. "round-ish" numbers) by default.
This makes a lot of sense for some plots, but not for others.
To disable it, just do ax.axis('tight') to snap the axis limits to the strict extents of the data.
If you want a bit of padding despite the "tight" bounds on the axes limits, use ax.margins.
In your case, you'd probably want something like:
# 5% padding on the y-axis and none on the x-axis
ax.margins(0, 0.05)
# Snap to data limits (with padding specified above)
ax.axis('tight')
Also, if you want to set the extents manually, you can just do
ax.axis([xmin, xmax, ymin, ymax])
or use set_xlim, set_ylim, or even
ax.set(xlim=[xmin, xmax], ylim=[ymin, ymax], title='blah', xlabel='etc')
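A small sketch of the margins-plus-tight approach on a stacked horizontal bar chart; the data below are hypothetical and stand in for source, source2 and N from the question:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical stacked data: 3 loan types (rows) for 4 borrowers (columns)
data = np.array([[3, 1, 2, 4],
                 [2, 2, 1, 1],
                 [1, 3, 2, 2]])
N = data.shape[1]
ind = np.arange(N)
# left edge of each segment = cumulative sum of the segments before it
lefts = np.vstack([np.zeros(N), np.cumsum(data, axis=0)[:-1]])

fig, ax = plt.subplots()
for row, left in zip(data, lefts):
    ax.barh(ind, row, left=left, height=0.43, align="center")

ax.margins(0, 0.05)  # no padding on x, 5% padding on y
ax.axis("tight")     # snap to the data limits plus the margins above
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()
print(xmin, xmax, ymin, ymax)
```

In matplotlib 2.0 and later the default limits are no longer rounded (rcParams['axes.autolimit_mode'] defaults to 'data'), but a default 5% margin (rcParams['axes.xmargin'] and ['axes.ymargin']) is added, so with a recent version it is mainly ax.margins that controls the gap below the first bar.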