I am trying to write a helper function that plots a figure with subplots in Seaborn.
The code currently looks like this:
import seaborn as sns

def granular_barplot(data, col_name, separator):
    '''
    data: dataframe
    col_name: the column to be analysed
    separator: column to be plotted in subplot
    '''
    g = sns.catplot(data=data, y=col_name, col=separator, kind='count', color='blue')
    g.fig.set_size_inches(16, 8)
    g.fig.suptitle(f'{col_name.capitalize()} Changes by {separator.capitalize()}', fontsize=16, fontweight='bold')
    g.despine()
    # label every bar in every subplot with its count
    for ax in g.axes.ravel():
        for c in ax.containers:
            ax.bar_label(c)
and it produces a graph like this:
What I'm trying to achieve is to make the left and bottom spines visible for each subplot in the helper function, as shown below (similar to what the sns.despine function does):
I'd appreciate your help and ideas. Thanks.
Try setting this style:
import seaborn as sns

def granular_barplot(data, col_name, separator):
    '''
    data: dataframe
    col_name: the column to be analysed
    separator: column to be plotted in subplot
    '''
    sns.set_style({'axes.linewidth': 2, 'axes.edgecolor': 'black'})
    g = sns.catplot(data=data, y=col_name, col=separator, kind='count', color='blue')
    g.fig.set_size_inches(16, 8)
    g.fig.suptitle(f'{col_name.capitalize()} Changes by {separator.capitalize()}', fontsize=16, fontweight='bold')
    g.despine()
    # explicitly turn the left and bottom spines back on for each subplot
    for ax in g.axes.ravel():
        ax.spines['left'].set_visible(True)
        ax.spines['bottom'].set_visible(True)

df = sns.load_dataset('tips')
granular_barplot(df, 'sex', 'smoker')
Output:
You should either be able to pass some settings to seaborn's despine function or use matplotlib's ability to set spine visibility:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(1,100)
y = np.arange(1,100)
g = sns.lineplot(x=x,y=y)
plt.title("Seaborn despine")
sns.despine(left=False, bottom=False)
plt.show()
g = sns.lineplot(x=x,y=y)
plt.title("False Spines")
g.spines['right'].set_visible(False)
g.spines['top'].set_visible(False)
plt.show()
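As a rough sketch of how the matplotlib approach above carries over to a figure with several subplots, the loop below applies the same spine settings to every Axes. It uses plain matplotlib rather than a seaborn FacetGrid, and the helper name show_left_bottom_spines and the bar values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def show_left_bottom_spines(axes):
    """Keep the left and bottom spines and hide the top and right ones.

    Hypothetical helper: `axes` is any iterable of matplotlib Axes.
    """
    for ax in axes:
        for side in ("left", "bottom"):
            ax.spines[side].set_visible(True)
        for side in ("top", "right"):
            ax.spines[side].set_visible(False)


fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax in axes:
    ax.barh(["A", "B"], [3, 5])  # placeholder data, not from the question

show_left_bottom_spines(axes.ravel())
```

With a seaborn FacetGrid, the same helper could presumably be called as show_left_bottom_spines(g.axes.ravel()), since g.axes is also an array of matplotlib Axes.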
Related
Given this example code:
import pandas as pd
import matplotlib.pyplot as plt
data = 'https://raw.githubusercontent.com/marsja/jupyter/master/flanks.csv'
df = pd.read_csv(data, index_col=0)
# Subsetting using Pandas query():
congruent = df.query('TrialType == "congruent"')['RT']
incongruent = df.query('TrialType == "incongruent"')['RT']
# Combine data
plot_data = list([incongruent, congruent])
fig, ax = plt.subplots()
xticklabels = ['Incongruent', 'Congruent']
ax.set_xticks([1, 2])
ax.set_xticklabels(xticklabels)
ax.violinplot(plot_data, showmedians=True)
Which results in the following plot:
How can I annotate the min, max, and mean lines with their respective values?
I haven't been able to find examples online that show how to annotate violin plots in this way. If we set plot = ax.violinplot(plot_data, showmedians=True), we can access attributes like plot['cmaxes'], but I can't quite figure out how to use them for annotations.
Here is an example of what I am trying to achieve:
So this was as easy as getting the medians/mins/maxes and then enumerating, adding the annotation with plt.text, and adding some small values for positioning:
medians = results_df.groupby(['model_cat'])['test_f1'].median()
for i, v in enumerate(medians):
    # nudge the label slightly left of and above the median line
    plt.text((i+.85), (v+.001), str(round(v, 3)), fontsize = 12)
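The snippet above uses the answerer's own results_df. A minimal sketch of the same idea applied to a violin plot like the one in the question might look as follows. The random data are a stand-in for the reaction times (the CSV is not loaded here), and the statistics are computed directly from the data rather than read from plot['cmaxes'].

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the incongruent and congruent reaction times
plot_data = [rng.normal(0.6, 0.1, 200), rng.normal(0.5, 0.1, 200)]

fig, ax = plt.subplots()
ax.violinplot(plot_data, showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["Incongruent", "Congruent"])

# violinplot draws the violins at x = 1, 2, ... by default,
# so enumerate from 1 to line the labels up with them
for pos, values in enumerate(plot_data, start=1):
    for stat in (np.min(values), np.median(values), np.max(values)):
        ax.annotate(
            f"{stat:.3f}",
            xy=(pos, stat),
            xytext=(8, 0),               # shift the label 8 points to the right
            textcoords="offset points",
            va="center",
        )
```

Using an offset in points instead of adding small data-unit values keeps the label spacing the same regardless of the y-axis scale.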
In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import pandas as pd
import matplotlib.pyplot as plt
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach is to use the seaborn module. It plots the two density estimates on the same axes without needing a variable to hold the axes (using part of the data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot:
1. When using pandas.DataFrame.groupby, specify the column to be plotted (e.g. the aggregation column).
2. Use seaborn.kdeplot or seaborn.displot and specify the hue parameter.
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
   duration  waiting   kind
0     3.600       79   long
1     1.800       54  short
2     3.333       74   long
3     2.283       62  short
4     4.533       85   long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
fig, ax = plt.subplots(figsize=(10,8))
# 'class' is a Python keyword, so the column must be accessed with brackets
classes = list(df['class'].unique())
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()
So before anyone says, I'm not trying to create a horizontal bar plot. I'm trying to make a scatter graph that categorises the different plots based on the y values.
So this is my current code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import datetime
import random
f = []
for i in range(10):
    f.append(random.randint(60,80))
df = pd.DataFrame({
    "Weight": f,
    "Dates": ["01/12/20", "05/11/20", "12/02/20", "18/09/20", "22/04/20", "19/01/20", "18/02/20", "02/01/20", "28/11/20", "26/03/20"]
}, columns=["Weight", "Dates"])
# the dates are written day first (e.g. "18/09/20"), so say so explicitly
df["Dates"] = pd.to_datetime(df["Dates"], dayfirst=True)
df.sort_values(by="Dates", inplace=True, ascending=True)
sns.set_theme(style="dark")
dates = [datetime.datetime.date(x) for x in df["Dates"]]
graph = sns.stripplot(data=df, x=dates, y="Weight")
graph.set_xticklabels(graph.get_xticklabels(), rotation=45)
plt.show()
So this is the current output:
But I want to be able to add some bars so I can categorise the data like (sorry for my poor drawing):
I still want to see the points afterwards, but I don't care about what colour they are.
I don't know if this is possible, but thanks!
EDIT: Answered by tmdavidson in comments.
I would recommend axhspan, which was made for this very purpose:
bands = [77.5,72.5,67.5,60]
# one colour per band, i.e. one fewer than the number of limits
colors = plt.get_cmap('tab10')(range(len(bands) - 1))
for y1,y2,c in zip(bands, bands[1:], colors):
    graph.axhspan(ymin=y1, ymax=y2, color=c, zorder=0, alpha=0.5)
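For reference, here is a self-contained sketch of the axhspan idea on a plain matplotlib scatter plot. The weights, band limits, and colours are made-up values for illustration, not the data from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

weights = [62, 65, 71, 74, 79]        # hypothetical measurements
bands = [60, 67.5, 72.5, 77.5, 80]    # hypothetical category limits, ascending
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]

fig, ax = plt.subplots()
# zorder=2 keeps the points drawn on top of the coloured bands
ax.scatter(range(len(weights)), weights, color="black", zorder=2)

# pair each limit with the next one: (60, 67.5), (67.5, 72.5), ...
spans = [
    ax.axhspan(lo, hi, color=c, alpha=0.5, zorder=0)
    for lo, hi, c in zip(bands[:-1], bands[1:], colors)
]
```

Because zip pairs each limit with its successor, n limits always produce n - 1 bands, so the colour list needs only n - 1 entries.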
I have 15 barh subplots that look like this:
I can't seem to get the legend working. I want to see [2,3,4] as separate labels in the graph and in the legend.
I'm having trouble with making this work for subgraphs. My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def plot_bars_by_data(data, title):
    fig, axs = plt.subplots(8,2, figsize=(20,40))
    fig.suptitle(title, fontsize=20)
    fig.subplots_adjust(top=0.95)
    plt.rcParams.update({'font.size': 13})
    axs[7,1].remove()
    column_index = 0
    for ax_line in axs:
        for ax in ax_line:
            if column_index < len(data.columns):
                column_name = data.columns[column_index]
                current_column_values = data[column_name].value_counts().sort_index()
                ax.barh([str(i) for i in current_column_values.index], current_column_values.values)
                ax.legend([str(i) for i in current_column_values.index])
                ax.set_title(column_name)
                column_index +=1
    plt.show()
# random data
df_test = pd.DataFrame([np.random.randint(2,5,size=15) for i in range(15)], columns=list('abcdefghijlmnop'))
plot_bars_by_data(df_test, "testing")
I just get an 8x2 grid of bar plots that look like the graph above. How can I fix this?
I'm using Python 3.6 and Jupyter Python notebook.
Use the following lines in your code. I can't include the whole output here because it's a large figure with many subplots, so I'm showing one particular subplot. It turns out that you first have to keep a handle to the bars returned by barh, and then pass that handle together with the legend labels to ax.legend to produce the desired legend.
colors = ['r', 'g', 'b']
axx = ax.barh([str(i) for i in current_column_values.index], current_column_values.values, color=colors)
ax.legend(axx, [str(i) for i in current_column_values.index])
Sample Output
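The three lines of that answer depend on variables from the question's loop. A standalone sketch for a single subplot, with made-up counts for the values 2, 3, and 4, could look like this. It passes bars.patches (the individual bar rectangles) as the legend handles, which should be equivalent to passing the container itself as the answer does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

labels = ["2", "3", "4"]
counts = [6, 4, 5]            # hypothetical value counts for one column
colors = ["r", "g", "b"]

fig, ax = plt.subplots()
# barh returns a container holding one rectangle per bar
bars = ax.barh(labels, counts, color=colors)
# pair each rectangle with its label so every bar gets its own legend entry
legend = ax.legend(bars.patches, labels)
legend_texts = [t.get_text() for t in legend.get_texts()]
```

Giving each bar its own colour matters here: without it, all legend entries would show the same swatch.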