Create an artificial legend in seaborn - python

I want to add an artificial legend to my plot. It is artificial because I didn't group my observations (see code below), which means I can't solve this with the plt.legend() function: it requires grouped variables. Is there any way to handle this?
My code:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_theme(style="white")
ax = sns.boxplot(data = data.values.tolist(),palette=['white', 'black'])
ax.set_xticklabels(labels, fontsize=14)
ax.tick_params(labelsize=14)
and the plot looks like this:
I would like to add a legend (perhaps not a true legend, just a drawing) containing something like the following (sorry for the size):

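You can create a legend from the artists created by seaborn as follows. Note that newer matplotlib versions may store the box patches in ax.patches rather than ax.artists, in which case the slice below needs to use that list instead: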
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_theme(style="white")
ax = sns.boxplot(data = np.random.randn(20,20), palette=['white', 'black'])
handles = ax.artists[:2]
handles[0].set_label("First")
handles[1].set_label("Second")
ax.legend(handles=handles)
plt.show()

As I do not have your data, I cannot replicate your chart. However, you might try adding the following line at the end (after importing matplotlib.pyplot as plt):
plt.legend(['First','Second'])

Related

How can I rotate axis tickmark labels if I set axis properties before making my plot?

I'm experimenting with seaborn and have a question about specifying axes properties. In my code below, I've taken two approaches to creating a heatmap of a matrix and placing the results on two sets of axes in a figure.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
A=np.random.randn(4,4)
labels=['a','b','c','d']
fig, ax = plt.subplots(2)
sns.heatmap(ax =ax[0], data = A)
ax[0].set_xticks(range(len(labels)))
ax[0].set_xticklabels(labels,fontsize=10,rotation=45)
ax[0].set_yticks(range(len(labels)))
ax[0].set_yticklabels(labels,fontsize=10,rotation=45)
ax[1].set_xticks(range(len(labels)))
ax[1].set_xticklabels(labels,fontsize=10,rotation=45)
ax[1].set_yticks(range(len(labels)))
ax[1].set_yticklabels(labels,fontsize=10,rotation=45)
sns.heatmap(ax =ax[1], data = A,xticklabels=labels, yticklabels=labels)
plt.show()
The resulting figure looks like this:
Normally, I would always take the first approach of creating the heatmap and then specifying axis properties. However, when creating an animation (to be embedded on a tkinter canvas), which is what I'm ultimately interested in doing, I found such an ordering in my update function leads to "flickering" of axis labels. The second approach will eliminate this effect, and it also centers the tickmarks within squares along the axes.
However, the second approach does not rotate the y-axis tickmark labels as desired. Is there a simple fix to this?
I'm not sure this is what you're looking for, but it looks like you draw the heatmap after you set the yticklabels, so the heatmap overwrites them.
Setting the y tick labels after the heatmap call, as below, fixes the issue:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
A=np.random.randn(4,4)
labels=['a','b','c','d']
fig, ax = plt.subplots(2)
sns.heatmap(ax =ax[0], data = A)
ax[0].set_xticks(range(len(labels)))
ax[0].set_xticklabels(labels,fontsize=10,rotation=45)
ax[0].set_yticks(range(len(labels)))
ax[0].set_yticklabels(labels,fontsize=10,rotation=45)
ax[1].set_xticks(range(len(labels)))
ax[1].set_xticklabels(labels,fontsize=10,rotation=45)
ax[1].set_yticks(range(len(labels)))
sns.heatmap(ax =ax[1], data = A,xticklabels=labels, yticklabels=labels)
ax[1].set_yticklabels(labels,fontsize=10,rotation=45)
plt.show()
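The same idea can be written without repeating the label list: let heatmap() place and label the ticks, then restyle the existing tick labels. This is a minimal sketch assuming a single Axes and random placeholder data; it does not address the animation or tkinter part of the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

A = np.random.randn(4, 4)
labels = ['a', 'b', 'c', 'd']

fig, ax = plt.subplots()
# heatmap() sets the tick positions and label texts itself
sns.heatmap(A, ax=ax, xticklabels=labels, yticklabels=labels)

# Restyle the labels afterwards; anything set before heatmap() may be overwritten
for tick_label in ax.get_xticklabels() + ax.get_yticklabels():
    tick_label.set_rotation(45)
    tick_label.set_fontsize(10)
```

Call plt.show() afterwards to display the figure. The tick positions stay where heatmap() put them, so the labels remain centred on the squares.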

How to add error bars in matplotlib for multiple groups from dataframe?

I've run multiple regressions and stored the coefficients and standard errors into a data frame like this:
I wanted to make a graph that shows how the coefficient changes for each group over time, like so:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14,8))
sns.set(style= "whitegrid")
sns.lineplot(x="time", y="coef",
hue="group",
data=eventstudy)
plt.axhline(y=0 , color='r', linestyle='--')
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.savefig('eventstudygraph.png')
plt.show()
Which produces:
But I would like to include error bars using the 'stderr' data from my main data set.
I think I can do it using 'plt.errorbar', but I can't figure out how to make it work. So far, I've tried adding the 'plt.errorbar' line and experimenting with different variations:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14,8))
sns.set(style= "whitegrid")
sns.lineplot(x="time", y="coef",
hue="group",
data=eventstudy)
plt.axhline(y=0 , color='r', linestyle='--')
plt.errorbar("time", "coef", xerr="stderr", data=eventstudy)
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.savefig('eventstudygraph.png')
plt.show()
As you can see, it seems to be creating its own group/line in the graph. I think I would know how to use 'plt.errorbar' if I had just one group, but I don't have a clue how to make it work for 3 groups. Is there some way of making 3 versions of 'plt.errorbar' so I can create the error bars for each group separately? Or is there something simpler?
You need to iterate through the groups and plot the error bars for each one separately (with the standard errors passed as yerr); what you have above plots all the error bars in one go:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(111)
df = pd.DataFrame({"time":[1,2,3,4,5]*3,"coef":np.random.uniform(-0.5,0.5,15),
"stderr":np.random.uniform(0.05,0.1,15),
"group":np.repeat(['Monthly','3 Monthly','6 Monthly'],5)})
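# set the style before creating the figure so that it applies to it
sns.set(style= "whitegrid")
fig,ax = plt.subplots(figsize=(14,8))
lvls = df.group.unique()
# one errorbar call per group, so each group gets its own line, colour and label
for i in lvls:
    ax.errorbar(x = df[df['group']==i]["time"],
                y=df[df['group']==i]["coef"],
                yerr=df[df['group']==i]["stderr"],label=i)
ax.axhline(y=0 , color='r', linestyle='--')
ax.legend()
plt.show()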
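The per-group loop can also be written with groupby, which avoids filtering the dataframe three times per group. This is a minimal sketch using the same made-up data as above; the capsize and marker settings are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(111)
# Made-up data with the same columns as in the question
df = pd.DataFrame({"time": [1, 2, 3, 4, 5] * 3,
                   "coef": np.random.uniform(-0.5, 0.5, 15),
                   "stderr": np.random.uniform(0.05, 0.1, 15),
                   "group": np.repeat(['Monthly', '3 Monthly', '6 Monthly'], 5)})

fig, ax = plt.subplots(figsize=(14, 8))
# sort=False keeps the groups in their order of first appearance
for name, sub in df.groupby("group", sort=False):
    # yerr (not xerr) draws vertical bars around each coefficient
    ax.errorbar(sub["time"], sub["coef"], yerr=sub["stderr"],
                marker="o", capsize=3, label=name)

ax.axhline(y=0, color='r', linestyle='--')
ax.legend(bbox_to_anchor=(1, 1), loc=2)
```

Call plt.show() or fig.savefig(...) afterwards. Each errorbar call produces one line plus its bars, so the legend gets exactly one entry per group.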

Combo Seaborn plots don't line up properly

I'm trying to overlay a lineplot above a countplot in seaborn. They both work when they are separated:
But when put together they end up at opposite ends of the chart:
Does anybody know why this is?
You need to use twinx() from matplotlib, and the first plot needs to be plain matplotlib rather than seaborn. I'm not sure why seaborn has a problem with combo charts, but I ran into exactly the same problem as you did. Here's my code with population data from Kaggle:
# Combo chart: bar plot of annual growth by year with a line on a second y-axis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# import dataframe for data
df = pd.read_csv('df.csv')
# create the figure and the first axes
fig, ax1 = plt.subplots(figsize=(10,6))
# bar plot creation (plain matplotlib)
ax1.bar(df['Year'],df['Population Growth'],color='y')
# second y-axis that shares the same x-axis
ax2 = ax1.twinx()
# lineplot creation, drawn explicitly on the twin axes
sns.lineplot(x='Year', y='Percent Growth', data=df, color='#C33E3E', ax=ax2)
plt.show()
With this code I get the following graph:
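A likely explanation for the misalignment, offered here as an assumption since the question's code is not shown: seaborn's categorical plots such as countplot draw their bars at positions 0, 1, 2, ... regardless of the category values, while lineplot uses the actual numeric x values, so a line with x values around 2018 lands far away from bars sitting at 0 to 2. The sketch below keeps the seaborn countplot and draws the line at the bars' tick positions instead. The data and the column names year and rate are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration only
events = pd.DataFrame({"year": [2018, 2018, 2019, 2020, 2020, 2020]})
rates = pd.DataFrame({"year": [2018, 2019, 2020], "rate": [0.2, 0.5, 0.3]})

fig, ax1 = plt.subplots(figsize=(10, 6))
sns.countplot(x="year", data=events, ax=ax1)

# countplot treats x as categorical, so the bars sit at the tick positions
bar_positions = ax1.get_xticks()

# Draw the line on a twin axis at those same positions; the rates are sorted
# by year so that each value lines up with its bar
ax2 = ax1.twinx()
ax2.plot(bar_positions, rates.sort_values("year")["rate"],
         color="#C33E3E", marker="o")
ax2.set_ylabel("rate")
```

Call plt.show() afterwards to display the figure. This only works when the line has exactly one value per bar, in the same order as the bars.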

Change y-axis scale - FacetGrid

I cannot work out how to change the scale of the y-axis. My code is:
grid = sns.catplot(x='Nationality', y='count',
row='Age', col='Gender',
hue='Type',
data=dfNorthumbria2, kind='bar', ci='No')
I want the y-axis ticks to go up in whole numbers rather than in steps of 0.5.
Update
I just found this tutorial; probably the easiest solution is the following:
grid.set(yticks=list(range(5)))
From the help of grid.set
Help on method set in module seaborn.axisgrid:
set(**kwargs) method of seaborn.axisgrid.FacetGrid instance
Set attributes on each subplot Axes.
Since seaborn is built on top of matplotlib, you can use yticks from plt:
import matplotlib.pyplot as plt
plt.yticks(range(5))
However, this changed only the yticks of the upper row in my mockup example.
For this reason you probably want to change the y ticks per axis with ax.set_yticks(). To get the axes from your grid object you can use a list comprehension as follows (grid.axes is a 2D array, so ax[0] is the first column of each row):
[ax[0].set_yticks(range(0, 150, 5)) for ax in grid.axes]
A full replicable example would look like this (adapted from here)
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks")
exercise = sns.load_dataset("exercise")
grid = sns.catplot(x="time", y="pulse", hue="kind",
row="diet", data=exercise)
# plt.yticks(range(0,150,5)) # Changed only one y-axis
# Changed y-ticks to steps of 20
[ax[0].set_yticks(range(0, 150, 20)) for ax in grid.axes]
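If the goal is simply whole-number ticks without choosing the range by hand, matplotlib's MaxNLocator with integer=True is another option. The following is a minimal sketch with made-up data; the column names are borrowed from the question, but the values are hypothetical, and kind="count" is used here only to keep the example small.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.ticker import MaxNLocator

# Made-up data for illustration only
df = pd.DataFrame({
    "Nationality": ["UK", "UK", "UK", "FR", "FR", "DE"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
})
grid = sns.catplot(x="Nationality", col="Gender", data=df, kind="count")

# grid.axes is a 2D array of Axes; .flat visits every facet, not just one column
for ax in grid.axes.flat:
    ax.yaxis.set_major_locator(MaxNLocator(integer=True))
```

Call plt.show() afterwards to display the grid. The locator picks the tick spacing itself, so the ticks stay on whole numbers even if the data range changes.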

Multiple graphs instead one using Matplotlib

The code below takes a dataframe, filters it by a string in one column, and then plots the values of another column.
I plotted the values as a histogram, and that worked fine until I added the mean, median and standard deviation. Now I just get an empty graph, whereas all of the variables mentioned below should be plotted together in one graph with their labels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:/Users/output.csv', delimiter=";", encoding='unicode_escape')
df['Plot_column'] = df['Plot_column'].str.split(',').str[0]
df['Plot_column'] = df['Plot_column'].astype('int64', copy=False)
X=df[df['goal_colum']=='start running']['Plot_column'].values
dev_x= X
mean_=np.mean(dev_x)
median_=np.median(dev_x)
standard_=np.std(dev_x)
plt.hist(dev_x, bins=5)
plt.plot(mean_, label='Mean')
plt.plot(median_, label='Median')
plt.plot(standard_, label='Std Deviation')
plt.title('Data')
https://matplotlib.org/3.1.1/gallery/statistics/histogram_features.html
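There are two main ways to plot in matplotlib: the pyplot interface (the easy way) and the Axes interface (the harder way). The Axes interface lets you customize your plot more, and it is worth moving towards it. Note that plotting a single number, as in plt.plot(mean_), draws no visible line, so the version below marks the statistics with vertical lines instead. Try something like the following:
num_bins = 50
fig, ax = plt.subplots()
# the histogram of the data
n, bins, patches = ax.hist(dev_x, num_bins, density=True)
# mark each statistic with a vertical line at its value on the x-axis
mean_, std_ = np.mean(dev_x), np.std(dev_x)
ax.axvline(mean_, color='k', label='Mean')
ax.axvline(np.median(dev_x), color='r', linestyle='--', label='Median')
# the standard deviation is a spread, so it is shown here as a band around the mean
ax.axvspan(mean_ - std_, mean_ + std_, alpha=0.2, label='Mean +/- 1 Std Deviation')
ax.set_title('Data')
ax.legend()
# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()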
