How to change the catplot bar positions


I'm having trouble with the location of the bars on the scale. I think this happens because some of the hue counts are 0, which throws off the position of the bars. In the image, the top right plot shows the green and brown bars for 'labour' with a gap between them, presumably because the color in between has a count of 0. Is there a way to put the bars together, and in line with their correspondence on the y-axis?
grid = sns.catplot(x='Type', y='count',
                   row='Age', col='Gender',
                   hue='Nationality',
                   data=dfNorthumbria2, kind='bar', ci=None, legend=True)
grid.set(ylim=(0,5), yticks=[0,1,2,3,4,5])
grid.set(xlabel="Type of Exploitation", ylabel="Total Referrals")
for ax in grid.axes.flatten():
    ax.tick_params(labelbottom=True, rotation=90)
    ax.tick_params(labelleft=True)
grid.fig.tight_layout()
leg = grid._legend
leg.set_bbox_to_anchor([1.1,0.5])

You can pass a hue_order argument to sns.barplot() via sns.catplot, e.g.
grid = sns.catplot(..., hue_order=['British', 'Romanian', 'Vietnamese',
'Albanian', 'Pakistani', 'Slovak'])
This should close the gap between the green and brown bars, and they will be centered at the tick mark, as they are now in the middle of the list. However, groups of other bars will still not be centered around their tick mark.
This may be an unavoidable consequence of how this plotting function works: it is not designed for such sparse data. So if you want all the different groups of bars to be centered at their respective tick marks, you may have to use a more flexible matplotlib plotting function and create the color subsets manually.
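One possible way to do that with plain matplotlib is sketched below. It is only an illustration: the `counts` layout and the helper names `centered_positions` and `draw_centered_bars` are assumptions made for this sketch, not something from the question's data. The idea is to skip hues with a zero count within each category and spread the remaining bars symmetrically around the tick.

```python
def centered_positions(n_bars, tick, bar_width=0.2):
    """Return x centres for n_bars adjacent bars, centred as a group on `tick`."""
    first = tick - bar_width * (n_bars - 1) / 2
    return [first + i * bar_width for i in range(n_bars)]


def draw_centered_bars(ax, counts, colors, bar_width=0.2):
    """Draw grouped bars on a matplotlib Axes, leaving out zero counts.

    counts: {category: {hue: value}}  (hypothetical layout for this sketch)
    colors: {hue: matplotlib colour}
    """
    labelled = set()
    for tick, (category, by_hue) in enumerate(counts.items()):
        # Keep only the hues that actually have a bar in this category.
        present = [(hue, value) for hue, value in by_hue.items() if value > 0]
        xs = centered_positions(len(present), tick, bar_width)
        for x, (hue, value) in zip(xs, present):
            # Label each hue once; "_nolegend_" keeps repeats out of the legend.
            label = hue if hue not in labelled else "_nolegend_"
            labelled.add(hue)
            ax.bar(x, value, width=bar_width, color=colors[hue], label=label)
    ax.set_xticks(range(len(counts)))
    ax.set_xticklabels(list(counts))
    ax.legend()
```

To try it, create an Axes with `fig, ax = plt.subplots()` and call `draw_centered_bars(ax, counts, colors)` once per facet. Recreating the row/column grid of catplot would need one such call per subplot.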

Related

How to center the histogram bars around tick marks using seaborn displot? Stacking bars is essential

I have searched for ways to center histogram bars on tick marks but have not found a solution that works with seaborn's displot. displot lets me stack the histogram according to a column in the dataframe, so I would prefer a solution using displot, or something else that allows stacking based on a dataframe column with color-coding via palette.
Even after setting the tick values, I am not able to get the bars to center around the tick marks.
Example code
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Center the histogram on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
            hue="day", multiple='stack', data=tips)
plt.xticks(np.arange(0, 50, 5))
I would also like to plot a histogram of a variable that takes a single value (0.5 in this example) and choose the bin width so that the resulting bar is centered on that value.
I can get the center point by setting the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I make the bin wider when there is only one bar, while still displaying all the other possible tick positions?
I want the same centering of the bar at the 0.5 tick mark, but with a wider bar, since it is the only value for which counts are displayed.
Any solutions?
tips['single'] = 0.5
sns.displot(x='single',
hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))
Edit:
Would it be possible to have more control over the tick marks in the second case? I don't want to display values rounded to 1 decimal place, but rather choose which tick marks to display. Is it possible to display just one tick value and have the bar centered on it?
Do min_val and max_val in this case refer to the value of the variable, which will be 0 in this case? The x-axis would then be plotted over negative values even though there are none, and I don't want to display them.

For your first problem, you may want to figure out a few properties of the data that you're plotting, for example the range of the data. Additionally, you may want to choose beforehand the number of bins that you want displayed.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins
sns.histplot(x="total_bill",
             hue="day", multiple='stack', data=tips,
             bins=n_bins, binrange=(min_val, max_val),
             palette='Paired')
plt.xlim(0, 55) # Define x-axis limits
Another thing to remember is that the width of a bar in a histogram identifies the bounds of its range. So a bar spanning [2,5] on the x-axis implies that the values represented by that bar belong to that range.
Considering this, it is easy to formulate a solution. Assume first that we want ticks that identify the bounds of each bar. One solution may look like
plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))
Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.
plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))
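The arithmetic behind those two tick choices can be checked without any plotting. The following is a small sketch; `bin_edges` and `bin_centers` are helper names introduced here, and the range 0 to 10 with 5 bins is an arbitrary example rather than the tips data.

```python
def bin_edges(min_val, max_val, n_bins):
    """Edges of n_bins equal-width bins spanning [min_val, max_val]."""
    width = (max_val - min_val) / n_bins
    return [min_val + i * width for i in range(n_bins + 1)]


def bin_centers(min_val, max_val, n_bins):
    """Midpoint of each bin: every left edge shifted right by half a bin width."""
    width = (max_val - min_val) / n_bins
    return [min_val + (i + 0.5) * width for i in range(n_bins)]


print(bin_edges(0, 10, 5))    # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(bin_centers(0, 10, 5))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Passing the second list to `plt.xticks` puts one tick under the middle of each bar, provided the histogram was drawn with the same `bins` and `binrange`.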
For your single-value plot, the idea remains the same: control the bin width, the x-axis range, and the ticks. The bin width has to be set explicitly, since an automatically inferred bin width will probably be 1 unit wide, which will show no visible thickness on the plot. Histogram bars always indicate a range, even when there is just one single value. This is illustrated in the following example and figure.
from matplotlib.ticker import FormatStrFormatter # needed for the tick formatting below
single_val = 23.5
tips['single'] = single_val
bin_width = 4
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots
# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
title='Given value as tick-label starts the bin on x-axis',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[1])
ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
title='Bin centre is offset from single_value by bin_width/2',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
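Output: [figure: two side-by-side stacked histograms, titled 'Given value as tick-label starts the bin on x-axis' and 'Bin centre is offset from single_value by bin_width/2']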
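If the bar should instead be centered on the value itself, one option (a sketch, not part of the original answer) is to make `binrange` symmetric around the value, so that a single bin of width `bin_width` runs from `value - bin_width/2` to `value + bin_width/2`. `centered_binrange` and `plot_single_value` are hypothetical helper names; the seaborn call mirrors the ones above and assumes the data has a 'day' column as in the tips example.

```python
def centered_binrange(value, bin_width):
    """Range of one bin of width bin_width whose midpoint is `value`."""
    half = bin_width / 2
    return (value - half, value + half)


def plot_single_value(data, column, value, bin_width, ax=None):
    """Stacked histogram with a single bin centred on `value`.

    Assumes seaborn is installed and `data` has a 'day' column.
    """
    import seaborn as sns  # imported here so the helper above has no dependencies

    ax = sns.histplot(x=column, hue="day", multiple="stack", data=data,
                      binwidth=bin_width,
                      binrange=centered_binrange(value, bin_width),
                      ax=ax)
    ax.set_xticks([value])  # one tick, at the centre of the bar
    return ax


print(centered_binrange(23.5, 4))  # (21.5, 25.5)
```

With the tips example this would be called as `plot_single_value(tips, 'single', 23.5, 4)`. The x-limits can still be widened afterwards with `ax.set_xlim(...)` so the bar does not fill the whole axis.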
From your description, I suspect that what you really want is a categorical bar graph. The centering is then automatic, because each bar represents a discrete category rather than a range. Given the numeric and continuous nature of the variable in the example data, I would not recommend such an approach. Pandas supports categorical bar plots through DataFrame.plot.bar. For our example, one way to do this is as follows:
n_colors = len(tips['day'].unique()) # Get number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
val_count=('single', 'count'),
val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
stacked=True,
color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
width=0.05 # Set bar width
)
plt.show()
This yields: [figure: a single stacked bar at the value 23.5, colored by day]

Cannot position plotly legend outside plot borders

I have a plotly graph that uses the Dash library to manipulate the x-values on the plot for simple comparison. When the number of x values (countries, in this case) is greater than 1, the legend is properly positioned outside of the graph. However, when there is only one country on the plot, the legend covers half of the plot.
I have tried: setting the legend x attribute to 3, x anchor to right, and changing the graph margin to add padding. I haven't found a solution that works.
Below is the section of the code that updates the plotly fig.
window_width = 1500
fig = px.bar(data_frame=dfi5.loc[(value_country)], width=(window_width / 4) * len(value_country))
fig.update_xaxes(showticklabels=True, title='')
fig.update_yaxes(showticklabels=True, title='Percent of Respondents')
fig.update_layout(autosize=False, title_text='Favorite K-Drama by Country', title_x=.46, title_font_size=20,
legend_title_text='Drama Title', margin_r=1, legend_xanchor='right', legend_x=3, legend_itemsizing='constant')
Update: I was able to find a workaround by reducing the legend font size, which reduces the area of the legend. I would be interested to learn if there is a way to explicitly set the location of the legend. The issue seems to occur when the legend area is wider than the plot area.
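One thing that might be worth trying, sketched below and not tested against the question's data, is to move the legend under the plot so that it no longer competes with the narrow plotting area for horizontal space. The legend attributes used (`orientation`, `x`, `y`, `xanchor`, `yanchor`) are standard plotly layout settings; the helper names and the margin value are assumptions of this sketch.

```python
def legend_below_plot(y_offset=-0.2):
    """Layout settings that put a horizontal legend under the plotting area.

    x and y are in 'paper' coordinates: 0..1 spans the plotting area,
    so a negative y lies below it.
    """
    return dict(
        legend=dict(
            orientation="h",   # lay the entries out in rows instead of a column
            x=0, xanchor="left",
            y=y_offset, yanchor="top",
        ),
        margin=dict(b=120),    # leave room under the axes (value is a guess)
    )


def apply_legend_layout(fig):
    """Apply the settings to an existing plotly figure (e.g. one from px.bar)."""
    fig.update_layout(**legend_below_plot())
    return fig
```

Calling `apply_legend_layout(fig)` after the existing `fig.update_layout(...)` call would override the `legend_x` and `legend_xanchor` values set there. The offset and bottom margin probably need tuning to the number of legend entries.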

Python - Legend overlaps with the pie chart

I am using matplotlib in Python, and the legend overlaps with my pie chart. I tried various options for "loc" such as "best", 1, 2, 3..., but to no avail. Any suggestions on how to either specify the legend position exactly (such as giving padding from the pie chart boundaries) or at least make sure that it does not overlap?

The short answer is: You may use plt.legend's arguments loc, bbox_to_anchor and additionally bbox_transform and mode, to position the legend in an axes or figure.
The long version:
Step 1: Making sure a legend is needed.
In many cases no legend is needed at all, and the information can be inferred from the context or directly from the color.
If indeed the plot cannot live without a legend, proceed to step 2.
Step 2: Making sure a pie chart is needed.
In many cases pie charts are not the best way to convey information.
If the need for a pie chart is unambiguously determined, let's proceed to place the legend.
Placing the legend
plt.legend() has two main arguments to determine the position of the legend. The most important and in itself sufficient is the loc argument.
E.g. plt.legend(loc="upper left") places the legend such that it sits in the upper left corner of its bounding box. If no further argument is specified, this bounding box will be the entire axes.
However, we may specify our own bounding box using the bbox_to_anchor argument. If bbox_to_anchor is given a 2-tuple e.g. bbox_to_anchor=(1,1) it means that the bounding box is located at the upper right corner of the axes and has no extent. It then acts as a point relative to which the legend will be placed according to the loc argument. It will then expand out of the zero-size bounding box. E.g. if loc is "upper left", the upper left corner of the legend is at position (1,1) and the legend will expand to the right and downwards.
This concept is used in the following example, which tells us the shocking truth about the bias in Miss Universe elections.
import matplotlib.pyplot as plt
import matplotlib.patches
total = [100]
labels = ["Earth", "Mercury", "Venus", "Mars", "Jupiter", "Saturn",
          "Uranus", "Neptune", "Pluto *"]
plt.title('Origin of Miss Universe since 1952')
plt.gca().axis("equal")
pie = plt.pie(total, startangle=90, colors=[plt.cm.Set3(0)],
              wedgeprops={'linewidth': 2, "edgecolor": "k"})
handles = []
for i, l in enumerate(labels):
    handles.append(matplotlib.patches.Patch(color=plt.cm.Set3(i/8.), label=l))
plt.legend(handles, labels, bbox_to_anchor=(0.85, 1.025), loc="upper left")
plt.gcf().text(0.93, 0.04, "* out of competition since 2006", ha="right")
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.75)
In order for the legend not to exceed the figure, we use plt.subplots_adjust to obtain more space between the figure edge and the axis, which can then be taken up by the legend.
There is also the option to pass a 4-tuple to bbox_to_anchor. How to use or interpret it is detailed in the question "What does a 4-element tuple argument for 'bbox_to_anchor' mean in matplotlib?", and one may then use the mode="expand" argument to make the legend fit into the specified bounding box.
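As a rough illustration of the 4-tuple form, the sketch below places the legend in a thin strip just above the axes and stretches it to the full axes width. The helper name `legend_above_axes` and the particular box values are choices made for this example; the keyword arguments are regular `Axes.legend` parameters.

```python
def legend_above_axes(ax, ncol=3):
    """Place a legend in a strip just above the axes, stretched to full width.

    bbox_to_anchor=(x0, y0, width, height) is in axes coordinates here:
    the box starts slightly above the top-left corner of the axes (0, 1.02),
    is as wide as the axes (1.0) and 0.1 axes-heights tall.
    """
    return ax.legend(
        bbox_to_anchor=(0.0, 1.02, 1.0, 0.1),
        loc="lower left",    # put the legend's lower-left corner at the box's lower-left corner
        mode="expand",       # stretch horizontally to the width of the box
        ncol=ncol,           # several columns so the entries form a row
        borderaxespad=0.0,
    )
```

This relies on the artists already having labels, e.g. `ax.pie(total, labels=None)` plus labelled handles, or on passing handles and labels explicitly as in the examples above. A title placed at the default position would collide with this strip and would need to be moved.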
There are some useful alternatives to this approach:
Using figure coordinates
Instead of specifying the legend position in axes coordinates, one may use figure coordinates. The advantage is that this will allow to simply place the legend in one corner of the figure without adjusting much of the rest. To this end, one would use the bbox_transform argument and supply the figure transformation to it. The coordinates given to bbox_to_anchor are then interpreted as figure coordinates.
plt.legend(pie[0],labels, bbox_to_anchor=(1,0), loc="lower right",
bbox_transform=plt.gcf().transFigure)
Here (1,0) is the lower right corner of the figure. Because of the default spacings between axes and figure edge, this suffices to place the legend such that it does not overlap with the pie.
In other cases, one might still need to adapt those spacings such that no overlap is seen, e.g.
title = plt.title('What slows down my computer')
title.set_ha("left")
plt.gca().axis("equal")
pie = plt.pie(total, startangle=0)
labels=["Trojans", "Viruses", "Too many open tabs", "The anti-virus software"]
plt.legend(pie[0],labels, bbox_to_anchor=(1,0.5), loc="center right", fontsize=10,
bbox_transform=plt.gcf().transFigure)
plt.subplots_adjust(left=0.0, bottom=0.1, right=0.45)
Saving the file with bbox_inches="tight"
Now there may be cases where we are more interested in the saved figure than in what is shown on the screen. We may then simply position the legend at the edge of the figure, but then save it by passing bbox_inches="tight" to savefig:
plt.savefig("output.png", bbox_inches="tight")
This will create a larger figure, which sits tight around the contents of the canvas.
A sophisticated approach, which allows placing the legend tightly inside the figure without changing the figure size, is presented in the question "Creating figure with exact size and no padding (and legend outside the axes)".
Using Subplots
An alternative is to use subplots to reserve space for the legend. In this case one subplot could take the pie chart, another subplot would contain the legend. This is shown below.
fig = plt.figure(4, figsize=(3,3))
ax = fig.add_subplot(211)
total = [4,3,2,81]
labels = ["tough working conditions", "high risk of accident",
"harsh weather", "it's not allowed to watch DVDs"]
ax.set_title('What people know about oil rigs')
ax.axis("equal")
pie = ax.pie(total, startangle=0)
ax2 = fig.add_subplot(212)
ax2.axis("off")
ax2.legend(pie[0],labels, loc="center")

Matplotlib ticks and tick labels position anchored separately from axis

Is there a way to anchor the ticks and tick labels of the x-axis so that they cross the y-axis at a different location than where the actual x-axis crosses the y-axis? This can basically be accomplished with:
ax = plt.gca()
ax.get_xaxis().set_tick_params(pad=5)
or
ax.xaxis.set_tick_params(pad=500)
For example:
Except that I am working with audio file inputs and the y-axis is variable (based on the highest/lowest amplitude of the waveform). Therefore, the maximum and minimum y-axis values change depending on the audio file. I am concerned that pad=NUM will be moving around relative to the y-axis.
Therefore, I am looking for a way to accomplish what pad does, but have the ticks and tick labels be anchored at the minimum y-axis value.
As a bonus, flipping this around so that the y-axis is anchored somewhere differently than the y-axis tick labels would surely benefit someone also.
In my particular case, I have the x-axis crossing the y-axis at y=0. The x-axis ticks and tick labels will sometimes be at -1.0, sometimes at -0.5, sometimes at -0.25, etc. I always know what the minimum value of the y-axis is, and therefore want it to be the anchor point for x-axis ticks and tick labels. (In fact, I am happy to do it with only the x-axis tick labels, if it is possible to treat ticks and tick labels separately). An example of this is shown in this image above (which I accomplished with pad=500).
I looked around other threads and in the documentation, but I'm either missing it or don't know the correct terms to find it.

UPDATE: I added gridlines and was getting very unexpected behavior (e.g. linestyle and linewidth didn't work as expected) due to the top x-axis being shifted. I realized there is an even better way: keep the axes where they are (turn off the spines) and simply plot a second line from (0, 0) to (max_time, 0).
ax.plot([0,times[-1]], [0,0], color='k') # Creates a 'false' x-axis at y=0
ax.spines['top'].set_color('none') # Position unchanged
ax.spines['bottom'].set_color('none') # Position unchanged
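A variation on the same idea, sketched here as an assumption rather than as the code actually used, replaces the plotted segment with `ax.axhline`, which spans the full width of the axes whatever the data range is. `add_zero_line_axis` is a hypothetical helper name.

```python
def add_zero_line_axis(ax, color="k", linewidth=1.0):
    """Draw a horizontal line at y=0 to stand in for the x-axis.

    The top and bottom spines are hidden, but the x ticks and tick labels
    stay at the bottom edge of the axes, i.e. at the current minimum of
    the y-range, however that range changes between audio files.
    """
    ax.axhline(0, color=color, linewidth=linewidth)  # spans the whole x-range
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    return ax
```

Because the ticks belong to the axis and not to the spine, hiding the spine leaves them in place, so no `pad` value has to be tuned per file.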
Figured it out! I was thinking about this the wrong way...
Problem: Moving the bottom x-axis to the center and padding the tick labels
Solution: Keep the bottom x-axis where it is (turn off the bottom spine) and move the top x-axis to the center (keep top spine, but turn off ticks and tick labels).
ax.spines['top'].set_position('center')
ax.spines['bottom'].set_color('none') # Position unchanged
ax.xaxis.set_tick_params(top=False) # hide the top ticks (booleans replace the old 'on'/'off' strings)

plt.setp() as in https://matplotlib.org/stable/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py solved the problem for me.

align grid lines on two plots

I have 2 subplots in matplotlib in Python. They are stacked on top of each other.
I want to have gridlines on each plot, which I have done successfully. But each plot has a different x axis and, therefore, the vertical grid lines of the top plot are not aligned with those of the bottom plot.
I would like the grid lines of the top plot to be in the same position on the x axis as they are on the bottom plot i.e. the vertical grid lines in both plots should be aligned.
I imagine that I can tell my grid lines exactly where to be, and so I could achieve my goal by adjusting the lines until they match as well as possible.
I just hoped that there might be some easier way that would just allow me to align the gridlines on both plots.
Edit:
I don't think the shared axis stuff is quite what I want.
My top and bottom plot have very different scales, so when I share the axes, it shifts the scaling too. For example, say my top plot has data that runs from 0-100 on the x axis and on the bottom plot the data runs from 0-50. When I share the axis, the top plot only shows data from 0-50, which I don't want it to.
I want it to show from 0-100 as it did before, but just want it to share the axis and gridlines from the other plot.

You could use LinearLocator:
from matplotlib.ticker import LinearLocator
Then call the following on each x-axis (or on only one of them):
N = 6 # Set number of gridlines you want to have in each graph
ax1.xaxis.set_major_locator(LinearLocator(N))
ax2.xaxis.set_major_locator(LinearLocator(N))
Or get the number of ticks from your source axis and set it on target axis:
N = len(source_ax.xaxis.get_major_ticks()) # number of ticks, not the list of tick objects
target_ax.xaxis.set_major_locator(LinearLocator(N))
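Why this lines up can be seen from the tick positions themselves: a LinearLocator with N ticks spaces them evenly between the two view limits, so two axes with the same N get ticks at the same relative positions even if their scales differ. The sketch below reproduces that spacing in plain Python; `linear_ticks` and `align_vertical_grids` are helper names introduced here, and the 0-100 and 0-50 ranges are the ones from the question.

```python
def linear_ticks(vmin, vmax, n):
    """n evenly spaced tick positions from vmin to vmax inclusive."""
    step = (vmax - vmin) / (n - 1)
    return [vmin + i * step for i in range(n)]


def align_vertical_grids(axes, n=6):
    """Give every Axes in `axes` the same number of evenly spaced x ticks."""
    from matplotlib.ticker import LinearLocator  # needs matplotlib installed

    for ax in axes:
        ax.xaxis.set_major_locator(LinearLocator(n))
        ax.grid(True, axis="x")


top = linear_ticks(0, 100, 6)     # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
bottom = linear_ticks(0, 50, 6)   # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
# Same relative positions along the axis, so the grid lines line up:
print([t / 100 for t in top] == [b / 50 for b in bottom])  # True
```

This assumes both subplots have the same width on the page. It also helps to set the x-limits explicitly (e.g. `ax1.set_xlim(0, 100)`), since the ticks are spread over the view limits, which by default include a small margin around the data.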
