I've been trying to adjust the tick settings for a heat map through several different methods with no success. The only method that actually changes the settings of the plot is plt.xticks(np.arange(217, 8850, 85)) but even when using several different intervals for this method the data is skewed greatly to the right.
When the tick labels aren't clumped together (for example using plt.xticks(np.arange(217, 8850, 500))), the last tick mark on the axis is nowhere near the 8850 maximum I need in order to show all the data.
I'm trying to adjust these tick settings on both the x and y axes in order to view the full range of data (Xmax: 8848, Xmin: 7200, Ymax: 8848, Ymin: 217) with intervals that keep the tick labels readable.
Images of Heatmap:
First image is with plt.xticks(np.arange(217, 8850, 500)):
Second image is with plt.xticks(np.arange(217, 8850, 85)):
Third is original Heatmap:
color = 'seismic'
success_rate = (m['Ascents'] / ((m['Ascents']) + (m['Failed_Attempts'])))*100
success_rate = success_rate.fillna(0).astype(float).round(2)  # fillna/round return new objects, so keep the result
mm['success_rate'] = success_rate
vm = mm.pivot(index="Height(m)", columns="Prominence(m)", values="success_rate")
cPreference = sns.heatmap(vm, vmax = 100, cmap = color, cbar_kws= {'label': 'Success Rate of Climbs (%)'})
cPreference.invert_yaxis()  # returns None, so the result must not be assigned back to cPreference
# Methods I've tried
plt.xticks(np.arange(217, 8850, 1000))  # << Only line that actually makes visible changes, but data is skewed greatly
cPreference.xaxis.set_ticks(np.arange(mm["Height(m)"].min(), mm["Height(m)"].max(), (mm["Height(m)"].max() - \
mm["Height(m)"].min()) / 10))
cPreference.yaxis.set_ticks(np.arange(mm["Prominence(m)"].min(), mm["Prominence(m)"].max(), (mm["Prominence(m)"].max() \
- mm["Prominence(m)"].min()) / 10))
sns.set_style("ticks", {"xtick.major.size" : 8, "ytick.major.size" : 8})
plt.title("What is a good Mountain to Climb?")
plt.show()  # sns.plt is no longer available in current seaborn; use matplotlib.pyplot directly
cPreference = sns.heatmap(vm, vmax = 100, cmap = color, xticklabels = 10, yticklabels = 5, cbar_kws={'label': 'Success Rate of Climbs (%)'})
The change is the two arguments xticklabels = 10 and yticklabels = 5. When xticklabels or yticklabels is set to an integer n, the heatmap still plots every row and column, but it only labels every nth one.
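This also explains why plt.xticks(np.arange(217, 8850, ...)) pushes everything to one side: the heatmap cells are drawn at index positions (cell i spans i to i + 1), not at the data values, so ticks placed at 217, 717, ... land far outside the cells. A minimal sketch of labelling every nth column by hand, using plain matplotlib pcolormesh as a stand-in for the heatmap; the column and row values are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the pivot table's column and row labels
columns = np.arange(7200, 8850, 50)
rows = np.arange(200, 8850, 400)
data = np.random.default_rng(0).uniform(0, 100, (len(rows), len(columns)))

fig, ax = plt.subplots()
ax.pcolormesh(data, cmap="seismic", vmin=0, vmax=100)  # cell i spans [i, i + 1]

step = 5
xpos = np.arange(0, len(columns), step) + 0.5  # centre of every 5th column
ax.set_xticks(xpos)
ax.set_xticklabels(columns[::step], rotation=90)
```

The tick positions are indices plus 0.5, and only the labels carry the data values.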
You can also specify the tick positioning manually by setting a custom tick locator, for example:
from matplotlib import ticker
tick_locator = ticker.MaxNLocator(10)
ax.xaxis.set_major_locator(tick_locator)
See the matplotlib.ticker documentation for the many other locator options.
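As a rough, self-contained illustration of the locator on an ordinary numeric axis (not a heatmap, where positions are cell indices), something like the following should cap the number of tick intervals at 10. The plotted data is invented for the example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

x = np.linspace(217, 8848, 200)  # made-up data spanning the range in the question
fig, ax = plt.subplots()
ax.plot(x, np.sqrt(x))
ax.xaxis.set_major_locator(ticker.MaxNLocator(10))  # at most 10 intervals

# Ticks that fall inside the visible x-range
lo, hi = ax.get_xlim()
visible = [t for t in ax.get_xticks() if lo <= t <= hi]
```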
I have looked at many ways of centering histogram bars on tick marks but have not found a solution that works with seaborn displot. displot lets me stack the histogram according to a column in the dataframe, so I would prefer a solution using displot, or something else that allows stacking based on a dataframe column with color-coding via a palette.
Even after setting the tick values, I am not able to get the bars to center around the tick marks.
Example code
# Center the histogram on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
hue="day", multiple = 'stack', data=tips)
plt.xticks(np.arange(0, 50, 5))
I would also like to plot a histogram of a variable that takes a single value, and choose the bin width so that the resulting bar is centered on that value (0.5 in this example).
I can get the center point by setting the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I increase the bin width when there is only one bar but I still want to display all the other possible points on the axis?
I want the bar to stay centered on the 0.5 tick mark but be wider, since it is the only value for which counts are displayed.
Any solutions?
tips['single'] = 0.5
sns.displot(x='single',
hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))
Edit:
Would it be possible to have more control over the tick marks in the second case? I don't want the labels rounded to 1 decimal place; I want to choose which tick marks to display. Is it possible to display just one value on the axis and have the bar centered on it?
Do min_val and max_val in this case refer to the value of the variable? If so, the x-axis would extend into negative values even though there are none, and I don't want to display them.
For your first problem, you may want to work out a few properties of the data that you're plotting, for example its range. You may also want to choose beforehand the number of bins that you want displayed.
tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins
sns.histplot(x="total_bill",
hue="day", multiple = 'stack', data=tips,
bins=n_bins, binrange=(min_val, max_val),
palette='Paired')
plt.xlim(0, 55) # Define x-axis limits
Another thing to remember is that the width of a bar in a histogram marks the bounds of its range. So a bar spanning [2,5] on the x-axis means that the values counted in that bar fall within that range.
With this in mind, it is easy to formulate a solution. If we want ticks at the bounds of each bar, one solution may look like
plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))
Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.
plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))
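The edge and centre arithmetic above can be checked without seaborn. This is only a sketch with randomly generated stand-in data (not the tips dataset), using NumPy to compute the bin edges and then placing one tick at the middle of each bar:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(0).uniform(3, 50, 244)  # stand-in data
n_bins = 10
edges = np.histogram_bin_edges(values, bins=n_bins)  # n_bins + 1 edges from min to max
bin_width = edges[1] - edges[0]
centers = edges[:-1] + bin_width / 2  # left edge of each bin plus half a bin

fig, ax = plt.subplots()
ax.hist(values, bins=edges)
ax.set_xticks(centers)
ax.set_xticklabels([f"{c:.1f}" for c in centers])
```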
For your single-value plot, the idea remains the same: control the bin width, the x-axis range and the ticks. The bin width has to be set explicitly, since the automatically inferred width is unlikely to suit a single value and can leave the bar too thin to see. Histogram bars always indicate a range, even when there is only one value. This is illustrated in the following example and figure.
from matplotlib.ticker import FormatStrFormatter  # needed for the tick formatter used below
single_val = 23.5
tips['single'] = single_val
bin_width = 4
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots
# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
title='Given value as tick-label starts the bin on x-axis',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[1])
ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
title='Bin centre is offset from single_value by bin_width/2',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
Output:
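If the bar should sit symmetrically around the value itself (0.5 in the question) rather than start at it, one option is to pass explicit bin edges half a bin width either side of the value. This is a sketch in plain matplotlib with a hand-picked bin_width of 0.2, not the seaborn call from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

single_val = 0.5
values = np.full(244, single_val)  # every observation has the same value
bin_width = 0.2                    # chosen by hand; make it as wide as you like
edges = [single_val - bin_width / 2, single_val + bin_width / 2]

fig, ax = plt.subplots()
counts, _, _ = ax.hist(values, bins=edges)  # one bin, centred on single_val
ax.set_xticks([single_val])                 # show only the one tick
ax.set_xlim(0, 1)                           # no negative range on the axis
bar = ax.patches[0]
```

Because the x-limits are set explicitly, the axis never extends below 0 even though the bin is wider than the data.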
I feel from your description that what you really mean by a bar graph is a categorical bar graph. The centering is then automatic, because the bar no longer represents a range but a discrete category. Given the numeric and continuous nature of the variable in the example data, I would not recommend such an approach. Pandas provides for plotting categorical bar plots. See here. For our example, one way to do this is as follows:
n_colors = len(tips['day'].unique()) # Get number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
val_count=('single', 'count'),
val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
stacked=True,
color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
width=0.05 # Set bar width
)
plt.show()
This yields:
I have a series of plots with categorical data on the y-axis. It seems that the additional margin between the axis and the data is correlated with the number of categories on the y-axis. If there are many categories, an additional margin appears, but if there are few, the margin is so small that the data points are being cut. The plots look like this:
The plot with few categories and too small margin:
The plot with many categories and too big margins (click for full size):
For now, I only found solutions to manipulate the white space around the plot, like bbox_inches='tight' or fig.tight_layout(), but this doesn't solve my problem.
I don't have such problems with the x-axis. Could this be because the x-axis contains only numerical values while the y-axis holds categorical data?
The code I'm using to generate all the plots looks like this:
sns.set(style='whitegrid')
plt.xlim(left=left_lim, right=right_lim)
plt.xticks(np.arange(left_lim, right_lim, step))
plot = sns.scatterplot(x = method.loc[:,'Len'],  # x and y passed by keyword, as recent seaborn versions require
y = method.loc[:,'Bond'],
hue = method.loc[:,'temp'],
palette= palette,
legend = False,
s = 50)
set_size(width, height)
plt.savefig("method.png", dpi = 100, bbox_inches='tight', pad_inches=0)
plt.show()
The set_size() comes from the first answer to Axes class - set explicitly size (width/height) of axes in given units.
We can slightly adapt the function from "Axes class - set explicitly size (width/height) of axes in given units" to add a line setting the axes margins.
import matplotlib.pyplot as plt
def set_size(w,h, ax=None, marg=(0.1, 0.1)):
    """ w, h: width, height in inches """
    if not ax: ax=plt.gca()
    l = ax.figure.subplotpars.left
    r = ax.figure.subplotpars.right
    t = ax.figure.subplotpars.top
    b = ax.figure.subplotpars.bottom
    figw = float(w)/(r-l)
    figh = float(h)/(t-b)
    ax.figure.set_size_inches(figw, figh)
    ax.margins(x=marg[0]/w, y=marg[1]/h)
And call it with
set_size(width, height, marg=(xmargin, ymargin))
where xmargin, ymargin are the margins in inches.
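To check that this does what it claims, here is a small self-contained sketch with a categorical y-axis. The data and the sizes (5 x 2 inches, margins of 0.2 and 0.3 inches) are invented for the example, and the function is a lightly condensed copy of the one above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def set_size(w, h, ax=None, marg=(0.1, 0.1)):
    """Resize the figure so the axes are w x h inches; marg is in inches."""
    if not ax:
        ax = plt.gca()
    pars = ax.figure.subplotpars
    ax.figure.set_size_inches(w / (pars.right - pars.left),
                              h / (pars.top - pars.bottom))
    # margins() takes fractions of the data range, so convert from inches
    ax.margins(x=marg[0] / w, y=marg[1] / h)

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], ["a", "b", "c"])  # categorical y-axis
set_size(5, 2, ax=ax, marg=(0.2, 0.3))

# Axes size in inches = axes fraction of the figure * figure size
bbox = ax.get_position()
axes_w = bbox.width * fig.get_figwidth()
axes_h = bbox.height * fig.get_figheight()
```

Since the margin is expressed in inches and divided by the axes size, the padding around the points should stay the same whether there are three categories or thirty.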
I want to create a figure using matplotlib where I can explicitly specify the size of the axes, i.e. I want to set the width and height of the axes bbox.
I have looked around all over and I cannot find a solution for this. What I typically find is how to adjust the size of the complete Figure (including ticks and labels), for example using fig, ax = plt.subplots(figsize=(w, h))
This is very important for me as I want to have a 1:1 scale of the axes, i.e. 1 unit in paper is equal to 1 unit in reality. For example, if xrange is 0 to 10 with major tick = 1 and x axis is 10cm, then 1 major tick = 1cm. I will save this figure as pdf to import it to a latex document.
This question brought up a similar topic but the answer does not solve my problem (using plt.gca().set_aspect('equal', adjustable='box') code)
From this other question I see that it is possible to get the axes size, but not how to modify them explicitly.
Any ideas how I can set the axes box size and not just the figure size? The figure size should adapt to the axes size.
Thanks!
For those familiar with pgfplots in LaTeX, I would like something similar to the scale only axis option (see here for example).
The axes size is determined by the figure size and the figure spacings, which can be set using figure.subplots_adjust(). Conversely, this means that you can set the axes size by setting the figure size while taking the figure spacings into account:
import matplotlib.pyplot as plt
def set_size(w,h, ax=None):
    """ w, h: width, height in inches """
    if not ax: ax=plt.gca()
    l = ax.figure.subplotpars.left
    r = ax.figure.subplotpars.right
    t = ax.figure.subplotpars.top
    b = ax.figure.subplotpars.bottom
    figw = float(w)/(r-l)
    figh = float(h)/(t-b)
    ax.figure.set_size_inches(figw, figh)
fig, ax=plt.subplots()
ax.plot([1,3,2])
set_size(5,5)
plt.show()
It appears that Matplotlib has helper classes that allow you to define axes with a fixed size; see the "Demo fixed size axes" example in the gallery.
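For reference, the core of that approach looks roughly like the sketch below: a Divider with Size.Fixed entries, all in inches. The particular numbers (1.0 and 0.7 inch padding, a 4.5 x 5 inch axes in a 6 x 6 inch figure) are only example values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import Divider, Size

fig = plt.figure(figsize=(6, 6))
# Sizes in inches: the first entry is padding, the second is the axes itself
horiz = [Size.Fixed(1.0), Size.Fixed(4.5)]
vert = [Size.Fixed(0.7), Size.Fixed(5.0)]
divider = Divider(fig, (0, 0, 1, 1), horiz, vert, aspect=False)
ax = fig.add_axes(divider.get_position(),
                  axes_locator=divider.new_locator(nx=1, ny=1))
ax.plot([1, 2, 3])

fig.canvas.draw()  # the locator is applied at draw time
bbox = ax.get_position()
axes_w = bbox.width * fig.get_figwidth()    # axes width in inches
axes_h = bbox.height * fig.get_figheight()  # axes height in inches
```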
I have found that ImportanceOfBeingErnest's answer, which modifies the figure size to adjust the axes size, gives inconsistent results with the particular matplotlib settings I use to produce publication-ready plots. Slight errors were present in the final figure size, and I was unable to find a way to solve the issue with that approach. For most use cases I think this is not a problem; however, the errors were noticeable when combining multiple PDFs for publication.
In lieu of developing a minimal working example to find the real issue I am having with the figure-resizing approach, I instead found a workaround that fixes the axes size using the Divider class.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import Divider, Size
def fix_axes_size_incm(axew, axeh):
    # convert the requested axes size from cm to inches
    axew = axew/2.54
    axeh = axeh/2.54
    # let's use the tight layout function to get a good padding size for our axes labels.
    fig = plt.gcf()
    ax = plt.gca()
    fig.tight_layout()
    # obtain the current ratio values for padding and fix size
    oldw, oldh = fig.get_size_inches()
    l = ax.figure.subplotpars.left
    r = ax.figure.subplotpars.right
    t = ax.figure.subplotpars.top
    b = ax.figure.subplotpars.bottom
    # work out what the new ratio values for padding are, and the new fig size.
    neww = axew+oldw*(1-r+l)
    newh = axeh+oldh*(1-t+b)
    newr = r*oldw/neww
    newl = l*oldw/neww
    newt = t*oldh/newh
    newb = b*oldh/newh
    # right(top) padding, fixed axes size, left(bottom) padding
    hori = [Size.Scaled(newr), Size.Fixed(axew), Size.Scaled(newl)]
    vert = [Size.Scaled(newt), Size.Fixed(axeh), Size.Scaled(newb)]
    divider = Divider(fig, (0.0, 0.0, 1., 1.), hori, vert, aspect=False)
    # the width and height of the rectangle is ignored.
    ax.set_axes_locator(divider.new_locator(nx=1, ny=1))
    # we need to resize the figure now, as we may have made our axes bigger than the figure.
    fig.set_size_inches(neww,newh)
Things worth noting:
Once you call set_axes_locator() on an axes instance, you break the tight_layout() function.
The original figure size you choose will be irrelevant; the final figure size is determined by the axes size you choose and the size of the labels/tick labels/outward ticks.
This approach doesn't work with colour scale bars.
This is my first ever Stack Overflow post.
Another method, using fig.add_axes, was quite accurate. I have included a 1 cm grid as well.
import matplotlib.pyplot as plt
import matplotlib as mpl
# This example fits a4 paper with 5mm margin printers
# figure settings
figure_width = 28.7 # cm
figure_height = 20 # cm
left_right_margin = 1 # cm
top_bottom_margin = 1 # cm
# Don't change
left = left_right_margin / figure_width # Fraction of the figure width
bottom = top_bottom_margin / figure_height # Fraction of the figure height
width = 1 - left*2
height = 1 - bottom*2
cm2inch = 1/2.54 # inch per cm
# specifying the width and the height of the box in inches
fig = plt.figure(figsize=(figure_width*cm2inch,figure_height*cm2inch))
ax = fig.add_axes((left, bottom, width, height))
# limits settings (important)
plt.xlim(0, figure_width * width)
plt.ylim(0, figure_height * height)
# Ticks settings
ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(5))
ax.xaxis.set_minor_locator(mpl.ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(mpl.ticker.MultipleLocator(5))
ax.yaxis.set_minor_locator(mpl.ticker.MultipleLocator(1))
# Grid settings
ax.grid(color="gray", which="both", linestyle=':', linewidth=0.5)
# your Plot (consider above limits)
ax.plot([1,2,3,5,6,7,8,9,10,12,13,14,15,17])
# save figure ( printing png file had better resolution, pdf was lighter and better on screen)
plt.show()
fig.savefig('A4_grid_cm.png', dpi=1000)
fig.savefig('tA4_grid_cm.pdf')
result:
It looks like the data points from the first graph accidentally overlay the second graph. The code is run several times: the first time with a short period and the second time with a longer period, and the data points in the short period are also part of the longer period.
So is there a way to clean the plot before you start building a graph?
You can see the code for building the graph here:
def create_graph(self, device):
    # 800 and 355 pixels.
    ticks = 5
    width = 8
    height = 3.55
    dpi = 100
    bgcolor = '#f3f6f6'
    font = {
        'size': 16,
        'family': 'Arial'
    }
    plt.rc('font', **font)
    # size of figure and setting background color
    fig = plt.gcf()
    fig.set_size_inches(width, height)
    fig.set_facecolor(bgcolor)
    # axis color, no ticks and bottom line in grey color.
    # (facecolor replaces the axisbg keyword, which newer matplotlib versions no longer accept)
    ax = plt.axes(facecolor=bgcolor, frameon=True)
    ax.xaxis.set_ticks_position('none')
    ax.spines['bottom'].set_color('#aabcc2')
    ax.yaxis.set_ticks_position('none')
    # removing all but bottom spines
    for key, sp in ax.spines.items():
        if key != 'bottom':
            sp.set_visible(False)
    # setting amounts of ticks on y axis
    yloc = plt.MaxNLocator(ticks)
    ax.yaxis.set_major_locator(yloc)
    x_no_ticks = 8
    # Deciding how many ticks we want on the graph
    locator = AutoDateLocator(maxticks=x_no_ticks)
    formatter = AutoDateFormatter(locator)
    # Formatter always chooses the most granular since we have granular dates
    # either change format or round dates depending on how granular
    # we want them to be for different date ranges.
    formatter.scaled[1/(24.*60.)] = '%d/%m %H:%M'
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)
    # turns off small ticks (booleans instead of the old 'on'/'off' strings)
    plt.tick_params(axis='x',
                    which='both',
                    bottom=True,
                    top=False,
                    pad=10)
    # Can't seem to set label color differently, changing tick_params color changes labels.
    ax.xaxis.label.set_color('#FFFFFF')
    # setting dates in x-axis automatically triggers use of AutoDateLocator
    x = [datetime.fromtimestamp(point['x']) for point in device['data']]
    y = [point['y'] for point in device['data']]
    plt.plot(x, y, color='#53b4d4', linewidth=2)
    # pick values for y-axis
    y_ticks_values = np.array([point['y'] for point in device['data']])
    y_ticks = np.linspace(y_ticks_values.min(), y_ticks_values.max(), ticks)
    y_ticks = np.round(y_ticks, decimals=2)
    plt.yticks(y_ticks, [str(val) + self.extract_unit(device) for val in y_ticks])
    # plt.ylim(ymin=0.1) # Only show values of a certain threshold.
    plt.tight_layout()
    buf = io.BytesIO()
    plt.savefig(buf,
                format='png',
                facecolor=fig.get_facecolor(),
                dpi=dpi)
You have to add plt.close() after plt.savefig(), so that the finished figure is not picked up again by the next plt.gcf() call.
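A stripped-down sketch of the pattern, with a hypothetical render_png helper standing in for create_graph: each call grabs the current figure, plots, saves to a buffer and then closes the figure, so the second call starts from an empty figure instead of drawing over the first.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def render_png(points):
    """Plot points on the current figure, return (png_bytes, lines_drawn)."""
    fig = plt.gcf()          # reuses the open figure if one exists
    ax = fig.gca()
    ax.plot(points)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    n_lines = len(ax.lines)
    plt.close(fig)           # without this, the next call draws on the same axes
    return buf.getvalue(), n_lines

png1, lines1 = render_png([1, 2, 3])
png2, lines2 = render_png([1, 2, 3, 4, 5])
```

If the plt.close(fig) line is removed, the second call should report two lines on the axes, which is the overlay described in the question.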
I currently use the align='edge' parameter and positive/negative widths in pyplot.bar() to plot the bar data of one metric to each axis. However, if I try to plot a second set of data to one axis, it covers the first set. Is there a way for pyplot to automatically space this data correctly?
lns3 = ax[1].bar(bucket_df.index,bucket_df.original_revenue,color='c',width=-0.4,align='edge')
lns4 = ax[1].bar(bucket_df.index,bucket_df.revenue_lift,color='m',bottom=bucket_df.original_revenue,width=-0.4,align='edge')
lns5 = ax3.bar(bucket_df.index,bucket_df.perc_first_priced,color='grey',width=0.4,align='edge')
lns6 = ax3.bar(bucket_df.index,bucket_df.perc_revenue_lift,color='y',width=0.4,align='edge')
This is what it looks like when I show the plot:
The data shown in yellow completely covers the data in grey. I'd like it to be shown next to the grey data.
Is there any easy way to do this? Thanks!
The first argument to the bar() plotting method is an array of the x-coordinates for your bars. Since you pass the same x-coordinates they will all overlap. You can get what you want by staggering the bars by doing something like this:
x = np.arange(10) # define your x-coordinates
width = 0.1 # set a width for your bars
offset = 0.15 # define an offset to separate each set of bars
fig, ax = plt.subplots() # define your figure and axes objects
ax.bar(x, y1, width) # plot the first set of bars (y1 is your first data series)
ax.bar(x + offset, y2, width) # plot the second set of bars, shifted by offset
Since you have a few sets of data to plot, it makes more sense to make the code a bit more concise (assume y_vals is a list containing the y-coordinates you'd like to plot, bucket_df.original_revenue, bucket_df.revenue_lift, etc.). Then your plotting code could look like this:
for i, y in enumerate(y_vals):
    ax.bar(x + i * offset, y, width)
If you want to plot more sets of bars you can decrease the width and offset accordingly.
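One way to avoid tuning width and offset by hand is to derive both from the number of series, so each group of bars stays centred on its x tick. This is only a sketch; the three series in y_vals are invented, and the 0.8 total group width is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)
# Made-up data: one array per set of bars
y_vals = [np.array([3, 5, 2, 4, 6]),
          np.array([1, 2, 3, 2, 1]),
          np.array([4, 4, 5, 3, 2])]

n = len(y_vals)
width = 0.8 / n  # each group of bars occupies 0.8 of the space between ticks

fig, ax = plt.subplots()
for i, y in enumerate(y_vals):
    # Shift series i so the whole group is centred on the tick at x
    ax.bar(x + (i - (n - 1) / 2) * width, y, width=width, label=f"series {i}")
ax.set_xticks(x)
ax.legend()
```

Adding a fourth series only requires appending to y_vals; the bar width and offsets shrink automatically.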