seaborn barplot add xticks for hue - python

Seaborn barplots have an xtick for every unique value along the x coordinate. But for the hue values, there are no ticks:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=["model", "time", "value"])
df["model"] = ["on"]*2 + ["off"]*2
df["time"] = ["short", "long"] * 2
df["value"] = [1, 10, 2, 4]
fig, ax = plt.subplots()
bar = sns.barplot(data=df, x="model", hue="time", y="value", edgecolor="white")
Is it possible to add ticks for the hue, too?
Some of the hue colors are quite similar and I would like to add a text description, too.

Be careful here: the tick positions depend on the number of categories and the number of hue levels in your dataset.
If you have N categories, they are plotted at axis coordinates 0, 1, ..., N-1, and the bars for the hue levels are centered around each of these coordinates. For 2 hues, as in your example, the bars are at x±0.2 (with the default total bar width of 0.8):
fig, ax = plt.subplots()
bar = sns.barplot(data=df, x="model", hue="time", y="value", edgecolor="white")
ax.set_xticks([-0.2,0.2, 0.8,1.2])
ax.set_xticklabels(["on/short","on/long",'off/short','off/long'])
Note that I would strongly recommend that you use order= and hue_order= in your call to barplot() to be sure that your labels match the bars.
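The ±0.2 offsets above can be generalized to other numbers of categories and hue levels. The following is a minimal sketch, not part of seaborn's API: `hue_tick_positions` and `hue_tick_labels` are hypothetical helpers, and they assume the default total group width of 0.8, a bar for every category/hue combination, and label lists given in the same order as `order=` and `hue_order=`.

```python
def hue_tick_positions(n_categories, n_hues, group_width=0.8):
    """Return the x position of every bar center, category by category.

    Assumes the bars of each category share `group_width` equally and are
    centered on the category coordinates 0, 1, ..., n_categories - 1.
    """
    bar_width = group_width / n_hues
    # Offsets of the bar centers relative to the category coordinate
    offsets = [(j - (n_hues - 1) / 2) * bar_width for j in range(n_hues)]
    return [i + off for i in range(n_categories) for off in offsets]


def hue_tick_labels(categories, hues):
    """Build "category/hue" labels in the same order as the positions."""
    return [f"{c}/{h}" for c in categories for h in hues]


positions = hue_tick_positions(2, 2)
labels = hue_tick_labels(["on", "off"], ["short", "long"])
# Then, on the axes returned by sns.barplot(..., order=..., hue_order=...):
# ax.set_xticks(positions)
# ax.set_xticklabels(labels)
```

For the example data this reproduces the four positions and labels used above; if a category/hue combination is missing from the data, check the result against the plot.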

Related

Heatmap with multi-color y-axis and correspondend colorbar

I want to create a heatmap with seaborn, similar to this (with the following code):
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Default heatmap
ax = sns.heatmap(df)
plt.show()
I'd also like to add a new variable (let's say new_var = pd.DataFrame(np.random.random((5,1)), columns=["new variable"])), such that the values (and possibly the spine and ticks as well) of the y-axis are colored according to the new variable, and a second colorbar is plotted in the same plot to represent the colors of the y-axis values. How can I do that?
This uses the new values to color the y-ticks and the y-tick labels and adds the associated colorbar.
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pandas as pd
import numpy as np
# Create data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Default heatmap
ax = sns.heatmap(df)
new_var = pd.DataFrame(np.random.random((5,1)), columns=["new variable"])
# Create the colorbar for y-ticks and labels
norm = plt.Normalize(new_var["new variable"].min(), new_var["new variable"].max())
cmap = plt.get_cmap('turbo')  # matplotlib.cm.get_cmap is deprecated in newer matplotlib releases
yticks_locations = ax.get_yticks()
yticks_labels = df.index.values
#hide original ticks
ax.tick_params(axis='y', left=False)
ax.set_yticklabels([])
for var, ytick_loc, ytick_label in zip(new_var.values, yticks_locations, yticks_labels):
    color = cmap(norm(float(var[0])))  # var is a one-element row of new_var
    ax.annotate(ytick_label, xy=(1, ytick_loc), xycoords='data', xytext=(-0.4, ytick_loc),
                arrowprops=dict(arrowstyle="-", color=color, lw=1), zorder=0, rotation=90, color=color)
# Add colorbar for y-tick colors
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
cb = ax.figure.colorbar(sm)
# Match the seaborn style
cb.outline.set_visible(False)
I found your problem interesting and was inspired by this unanswered comment:
How do you change the second colorbar position? For example, one on top the other on bottom sides. - Py-ser
I decided to spend a while doing some tests. After a little digging, I found that cbar_kws={"orientation": "horizontal"} is the argument for sns.heatmap that makes the colorbar horizontal.
Borrowing the code from the solution and making some changes, you can format your plot the way you want as in:
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pandas as pd
import numpy as np
# Create data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Default heatmap
ax = sns.heatmap(df, cbar_kws={"orientation": "horizontal"}, square = False, annot = True)
new_var = pd.DataFrame(np.random.random((5,1)), columns=["new variable"])
# Create the colorbar for y-ticks and labels
norm = plt.Normalize(new_var["new variable"].min(), new_var["new variable"].max())
cmap = plt.get_cmap('turbo')  # matplotlib.cm.get_cmap is deprecated in newer matplotlib releases
yticks_locations = ax.get_yticks()
yticks_labels = df.index.values
#hide original ticks
ax.tick_params(axis='y', left=False)
ax.set_yticklabels([])
for var, ytick_loc, ytick_label in zip(new_var.values, yticks_locations, yticks_labels):
    color = cmap(norm(float(var[0])))  # var is a one-element row of new_var
    ax.annotate(ytick_label, xy=(1, ytick_loc), xycoords='data', xytext=(-0.4, ytick_loc),
                arrowprops=dict(arrowstyle="-", color=color, lw=1), zorder=0, rotation=90, color=color)
# Add colorbar for y-tick colors
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
cb = ax.figure.colorbar(sm)
# Match the seaborn style
cb.outline.set_visible(False)
Also, you will notice that I annotated each cell of the heatmap with its value, simply to make it easier to check that everything was working as expected.
I'm still not very happy with the shape/size of the horizontal colorbar, but I'll keep testing and update any progress by editing this answer!
==========================================
EDIT
Just to keep track of the updates: first I tried changing only some parameters of seaborn's heatmap function, though I wouldn't consider this a major improvement on the task. By adding
ax = sns.heatmap(df, cbar_kws = dict(use_gridspec=True, location="top", shrink =0.6), square = True, annot = True)
I end up with:
I did manage to separate the colorbar using matplotlib's subplots routine, and honestly I believe this is the right way, given the control over parameters it offers:
# Define two rows for subplots
fig, (cax, ax) = plt.subplots(nrows=2, figsize=(5,5.025), gridspec_kw={"height_ratios":[0.025, 1]})
# Default heatmap, drawn explicitly on the lower axes
ax = sns.heatmap(df, cbar=False, annot=True, ax=ax)
# Colorbar for the heatmap's mesh, drawn in the thin upper axes
fig.colorbar(ax.collections[0], cax=cax, orientation="horizontal")
plt.show()
I obtained:
Which is still not the prettiest graph I've ever made, but now the position and size of the heatmap can be edited normally within the plt.subplots subroutines that give absolute control over these parameters.

How to remove or hide y-axis ticklabels from a matplotlib / seaborn plot

I made a plot that looks like this
I want to turn off the ticklabels along the y axis. And to do that I am using
plt.tick_params(labelleft=False, left=False)
And now the plot looks like this. Even though the labels are turned off the scale 1e67 still remains.
Turning off the scale 1e67 would make the plot look better. How do I do that?
seaborn is used to draw the plot, but it's just a high-level API for matplotlib.
The functions called to remove the y-axis labels and ticks are matplotlib methods.
After creating the plot, use .set().
.set(yticklabels=[]) should remove tick labels.
This doesn't work if you use .set_title(), but you can use .set(title='')
.set(ylabel=None) should remove the axis label.
.tick_params(left=False) will remove the ticks.
Similarly, for the x-axis: How to remove or hide x-axis labels from a seaborn / matplotlib plot?
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
Example 1
import seaborn as sns
import matplotlib.pyplot as plt
# load data
exercise = sns.load_dataset('exercise')
pen = sns.load_dataset('penguins')
# create figures
fig, ax = plt.subplots(2, 1, figsize=(8, 8))
# plot data
g1 = sns.boxplot(x='time', y='pulse', hue='kind', data=exercise, ax=ax[0])
g2 = sns.boxplot(x='species', y='body_mass_g', hue='sex', data=pen, ax=ax[1])
plt.show()
Remove Labels
fig, ax = plt.subplots(2, 1, figsize=(8, 8))
g1 = sns.boxplot(x='time', y='pulse', hue='kind', data=exercise, ax=ax[0])
g1.set(yticklabels=[]) # remove the tick labels
g1.set(title='Exercise: Pulse by Time for Exercise Type') # add a title
g1.set(ylabel=None) # remove the axis label
g2 = sns.boxplot(x='species', y='body_mass_g', hue='sex', data=pen, ax=ax[1])
g2.set(yticklabels=[])
g2.set(title='Penguins: Body Mass by Species for Gender')
g2.set(ylabel=None) # remove the y-axis label
g2.tick_params(left=False) # remove the ticks
plt.tight_layout()
plt.show()
Example 2
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# sinusoidal sample data
sample_length = range(1, 1+1) # number of columns of frequencies
rads = np.arange(0, 2*np.pi, 0.01)
data = np.array([(np.cos(t*rads)*10**67) + 3*10**67 for t in sample_length])
df = pd.DataFrame(data.T, index=pd.Series(rads.tolist(), name='radians'), columns=[f'freq: {i}x' for i in sample_length])
df.reset_index(inplace=True)
# plot
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot('radians', 'freq: 1x', data=df)
# or skip the previous two lines and plot df directly
# ax = df.plot(x='radians', y='freq: 1x', figsize=(8, 8), legend=False)
Remove Labels
# plot
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot('radians', 'freq: 1x', data=df)
# or skip the previous two lines and plot df directly
# ax = df.plot(x='radians', y='freq: 1x', figsize=(8, 8), legend=False)
ax.set(yticklabels=[]) # remove the tick labels
ax.tick_params(left=False) # remove the ticks
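The scale text itself (the 1e67 in the question) is drawn by matplotlib as a separate "offset text" artist on the axis. A minimal sketch, assuming you want to hide only that text and leave the ticks and tick labels alone; the data is made up to reproduce the large scale, and the Agg backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rads = np.arange(0, 2 * np.pi, 0.01)
values = np.cos(rads) * 1e67 + 3e67  # large values trigger the 1e67 scale text

fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(rads, values)

# Hide only the scale (offset) text; ticks and tick labels are left untouched
offset_text = ax.yaxis.get_offset_text()
offset_text.set_visible(False)

fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Note that hiding the offset text while keeping the tick labels leaves labels such as 2.0 and 3.0 without their scale, so this is mostly useful in combination with removing the tick labels as shown above.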

Shift bar locations on multi-bar bar plot

Much searching has not yielded a working solution to a Python matplotlib problem. I'm sure I'm missing something simple...
MWE:
import pandas as pd
import matplotlib.pyplot as plt
#MWE plot
T = [1, 2, 3, 4, 5, 6]
n = len(T)
d1 = list(zip([500]*n, [250]*n))
d2 = list(zip([250]*n, [125]*n))
df1 = pd.DataFrame(data=d1, index=T)
df2 = pd.DataFrame(data=d2, index=T)
fig = plt.figure()
ax = fig.add_subplot(111)
df1.plot(kind='bar', stacked=True, align='edge', width=-0.4, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=0.4, ax=ax)
plt.show()
Generates the following plot (image: Shifted Plot):
No matter what parameters I play around with, that first bar is cut off on the left. If I only plot a single bar (i.e. not clusters of bars), the bars are not cut off and in fact there is nice even white space on both sides.
I hard-coded the data for this MWE; however, I am trying to find a generic way to ensure the correct alignment since I will likely produce a LOT of these plots with varying numbers of items on the x axis and potentially a varying number of bars in each cluster.
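How do I shift the bars so that they are spaced correctly on the x axis with even white space?
The limits you need depend on the bar widths you use. With align='edge' and width=-0.4, the left-hand bars extend 0.4 to the left of the first tick position, so set the x-limits explicitly: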
import pandas as pd
import matplotlib.pyplot as plt
#MWE plot
T = [1, 2, 3, 4, 5, 6]
n = len(T)
d1 = list(zip([500]*n, [250]*n))
d2 = list(zip([250]*n, [125]*n))
df1 = pd.DataFrame(data=d1, index=T)
df2 = pd.DataFrame(data=d2, index=T)
fig = plt.figure()
ax = fig.add_subplot(111)
df1.plot(kind='bar', stacked=True, align='edge', width=-0.4, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=0.4, ax=ax)
plt.xlim(-.4,5.4)
plt.show()
Hope it works!
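For a varying number of groups or bars per cluster, the limits can be computed instead of hard-coded. This is only a sketch under stated assumptions: `cluster_xlim` is a hypothetical helper, it relies on pandas placing the groups of a bar plot at integer positions 0, 1, ..., n-1, and `pad` is an arbitrary amount of extra white space added on both sides.

```python
def cluster_xlim(n_groups, cluster_half_width, pad=0.1):
    """Return (left, right) x-limits that fully contain every bar cluster.

    n_groups           -- number of tick positions (rows in the DataFrame)
    cluster_half_width -- how far a cluster extends on each side of its tick
    pad                -- extra white space added on both sides
    """
    left = -(cluster_half_width + pad)
    right = (n_groups - 1) + cluster_half_width + pad
    return left, right


# Six groups; each cluster spans 0.4 on either side of its tick position
left, right = cluster_xlim(n_groups=6, cluster_half_width=0.4)
# Then, on the axes used for the bar plots:
# ax.set_xlim(left, right)
```

With pad=0 this gives the same limits as the hard-coded plt.xlim(-.4, 5.4) above; a positive pad leaves equal white space on both sides.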

Matplotlib: how to give xticks values from a list

I have the following code:
import matplotlib.pyplot as plt
import numpy as np
xticks = ['A','B','C']
Scores = np.array([[5,7],[4,6],[8,3]])
colors = ['red','blue']
fig, ax = plt.subplots()
ax.hist(Scores,bins=3,density=True,histtype='bar',color=colors)
plt.show()
Which gives the following output:
I have two questions:
How can I make the height of bars represent the values in Scores e.g. the left most red column should be of height 5 and left most blue column should be of height 7, and so on.
How can I assign values across x-axis from xticks list e.g. the left two columns should have 'A' written under them, the next two 'B' and so on.
You are confusing a histogram with a bar plot. Here you want a bar plot. If you want to use pandas, this is going to be very easy:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
xticks = ['A','B','C']
Scores = np.array([[5,7],[4,6],[8,3]])
colors = ['red','blue']
names = ["Cat", "Dog"]
fig, ax = plt.subplots()
pd.DataFrame(Scores, index=xticks, columns=names).plot.bar(color=colors, ax=ax)
plt.show()
If using matplotlib alone, it's slightly more complicated, because each column needs to be plotted independently:
import matplotlib.pyplot as plt
import numpy as np
xticks = ['A','B','C']
Scores = np.array([[5,7],[4,6],[8,3]])
colors = ['red','blue']
names = ["Cat", "Dog"]
fig, ax = plt.subplots()
x = np.arange(len(Scores))
ax.bar(x-0.2, Scores[:,0], color=colors[0], width=0.4, label=names[0])
ax.bar(x+0.2, Scores[:,1], color=colors[1], width=0.4, label=names[1])
ax.set(xticks=x, xticklabels=xticks)
ax.legend()
plt.show()
You already did a lot of the work for the histogram. Now you just need some bar plots.
import matplotlib.pyplot as plt
import numpy as np
xticks = ['A','B','C']
Scores = np.array([[5,7],[4,6],[8,3]])
colors = ['red','blue']
fig, ax = plt.subplots()
# Width of bars
w=.2
# Plot both separately
ax.bar([1,2,3],Scores[:,0],width=w,color=colors[0])
ax.bar(np.add([1,2,3],w),Scores[:,1],width=w,color=colors[1])
# Assumes you want ticks in the middle
ax.set_xticks(ticks=np.add([1,2,3],w/2))
ax.set_xticklabels(xticks)
plt.show()
plt.xticks(range(0, 6), ('A', 'A', 'B', 'B', 'C', 'C')) would work to answer question part 2 I believe. I'm not sure about the heights, as I haven't made histograms.

Matplotlib Colorbar change ticks labels and locators

I would like to change the ticks locators and labels in the colorbar of the following plot.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates as mdates
import numpy as np
# fontdict to control style of text and labels
font = {'family': 'serif',
        'color': (0.33, 0.33, 0.33),
        'weight': 'normal',
        'size': 18,
        }
num = 1000
x = np.linspace(-4,4,num) + (0.5 - np.random.rand(num))
y = np.linspace(-2,2,num) + (0.5 - np.random.rand(num))
t = pd.date_range('1/1/2014', periods=num)
# make plot with vertical (default) colorbar
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(6, 6))
ax.set_title('Scatter plot', fontdict=font)
# plot data
s = ax.scatter(x=x, y=y,
               s=50, c=t, marker='o',
               cmap=plt.cm.rainbow)
# plot settings
ax.grid(True)
ax.set_aspect('equal')
ax.set_ylabel('Northing [cm]', fontdict=font)
ax.set_xlabel('Easting [cm]', fontdict=font)
# add colorbar
cbar = fig.colorbar(mappable=s, ax=ax)
cbar.set_label('Date')
# change colorbar tick labels and locators
????
The colorbar illustrates the time dependency. Thus, I would like to change the ticks from their numerical values (nanoseconds?) to a more sensible date format like month and year (e.g., %b%Y or %Y-%m), where the interval could be, for example, 3 or 6 months. Is that possible?
I tried to play unsuccessfully with cbar.formatter, cbar.locator and mdates.
You can keep the same locators as proposed by the colorbar function but change the ticklabels in order to print the formatted date as follows:
# change colorbar tick labels and locators
cbar.set_ticks([s.colorbar.vmin + t*(s.colorbar.vmax-s.colorbar.vmin) for t in cbar.ax.get_yticks()])
cbar.set_ticklabels([mdates.datetime.datetime.fromtimestamp((s.colorbar.vmin + t*(s.colorbar.vmax-s.colorbar.vmin))/1000000000).strftime('%c') for t in cbar.ax.get_yticks()])
plt.show()
which gives the result below:
If you really want to control the tick locations, you can compute the desired values (here for intervals of approximately 3 months, ~91.25 days):
# start at i = 1 so that vmin is not added twice
i, ticks = 1, [s.colorbar.vmin]
while ticks[-1] < s.colorbar.vmax:
    ticks.append(s.colorbar.vmin + i*24*3600*91.25*1e9)
    i = i + 1
# clamp the last tick, which overshoots, to the end of the color range
ticks[-1] = s.colorbar.vmax
cbar.set_ticks(ticks)
cbar.set_ticklabels([mdates.datetime.datetime.fromtimestamp(t/1e9).strftime('%c') for t in ticks])
The colormapping machinery of matplotlib has no concept of "units" like an x or y axis does, so you can convert the dates to floats yourself before mapping and then set the locator and formatter manually. You can also look into how pandas maps its date objects to floats; it may be a bit different from the native matplotlib mapping:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
dates = np.datetime64('2019-11-01') + np.arange(10)*np.timedelta64(1, 'D')
X= np.random.randn(10, 2)
plt.scatter(X[:, 0], X[:, 1], c=mdates.date2num(dates))
cb = plt.colorbar()
loc = mdates.AutoDateLocator()
cb.ax.yaxis.set_major_locator(loc)
cb.ax.yaxis.set_major_formatter(mdates.ConciseDateFormatter(loc))
plt.show()
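Applied to the original request (month/year labels at fixed intervals), the same idea could look like the sketch below. It is an illustration under assumptions rather than a tested recipe for every matplotlib version: the data is random, the 6-month interval and the %Y-%m format are example choices taken from the question, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

num = 1000
rng = np.random.default_rng(0)
x = np.linspace(-4, 4, num) + (0.5 - rng.random(num))
y = np.linspace(-2, 2, num) + (0.5 - rng.random(num))
dates = np.datetime64("2014-01-01") + np.arange(num) * np.timedelta64(1, "D")

fig, ax = plt.subplots(figsize=(6, 6))
# Convert the dates to matplotlib date numbers (floats) before color mapping
sc = ax.scatter(x, y, s=50, c=mdates.date2num(dates), cmap="rainbow")

# Pass a date locator and formatter to the colorbar: one tick every 6 months
cbar = fig.colorbar(
    sc,
    ax=ax,
    ticks=mdates.MonthLocator(interval=6),
    format=mdates.DateFormatter("%Y-%m"),
)
cbar.set_label("Date")
```

Changing interval=6 to interval=3, or the format string to "%b%Y", gives the other variants mentioned in the question.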
