Given this example code:
import pandas as pd
import matplotlib.pyplot as plt
data = 'https://raw.githubusercontent.com/marsja/jupyter/master/flanks.csv'
df = pd.read_csv(data, index_col=0)
# Subsetting using Pandas query():
congruent = df.query('TrialType == "congruent"')['RT']
incongruent = df.query('TrialType == "incongruent"')['RT']
# Combine data
plot_data = list([incongruent, congruent])
fig, ax = plt.subplots()
xticklabels = ['Incongruent', 'Congruent']
ax.set_xticks([1, 2])
ax.set_xticklabels(xticklabels)
ax.violinplot(plot_data, showmedians=True)
Which results in the following plot:
How can I annotate the min, max, and mean lines with their respective values?
I haven't been able to find examples online that allude to how to annotate violin plots in this way. If we set plot = ax.violinplot(plot_data, showmedians=True) then we can access attributes like plot['cmaxes'] but I cant quite figure out how to use that for annotations.
Here is an example of what I am trying to achieve:
So this was as easy as getting the medians/mins/maxes and then enumerating, adding the annotation with plt.text, and adding some small values for positioning:
medians = results_df.groupby(['model_cat'])['test_f1'].median()
for i, v in enumerate(medians):
plt.text((i+.85), (v+.001), str(round(v, 3)), fontsize = 12)
Related
I try to come up with a helper function to plot figure with subplots in Seaborn.
The codes currently look like below:
def granular_barplot(data, col_name, separator):
'''
data = dataframe
col_name: the column to be analysed
separator: column to be plotted in subplot
'''
g = sns.catplot(data=data, y=col_name, col=separator, kind='count',color=blue)
g.fig.set_size_inches(16,8)
g.fig.suptitle(f'{col_name.capitalize()} Changes by {separator.capitalize()}',fontsize=16, fontweight='bold')
g.despine()
for ax in g.axes.ravel():
for c in ax.containers:
ax.bar_label(c)
and it produces the graph like this:
What I'm trying to achieve is to make the left and bottom spines visible for each subplots in the helper function like below (which is similar to the sns.despine function):
Appreciate your helps and idea. Thanks.
Try setting this style:
def granular_barplot(data, col_name, separator):
'''
data = dataframe
col_name: the column to be analysed
separator: column to be plotted in subplot
'''
sns.set_style({'axes.linewidth': 2, 'axes.edgecolor':'black'})
g = sns.catplot(data=data, y=col_name, col=separator, kind='count',color='blue')
g.fig.set_size_inches(16,8)
g.fig.suptitle(f'{col_name.capitalize()} Changes by {separator.capitalize()}',fontsize=16, fontweight='bold')
g.despine()
for ax in g.axes.ravel():
ax.spines['left'].set_visible(True)
ax.spines['bottom'].set_visible(True)
df = sns.load_dataset('tips')
granular_barplot(df, 'sex', 'smoker')
Output:
You should either be able to pass some settings to seaborn's despine function or use matplotlib's ability to set spine visibility:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(1,100)
y = np.arange(1,100)
g = sns.lineplot(x=x,y=y)
plt.title("Seaborn despine")
sns.despine(left=False, bottom=False)
plt.show()
g = sns.lineplot(x=x,y=y)
plt.title("False Spines")
g.spines['right'].set_visible(False)
g.spines['top'].set_visible(False)
plt.show()
I'm a beginner in Python.
In my internship project I am trying to plot bloxplots from data contained in a csv
I need to plot bloxplots for each of the 4 (four) variables showed above (AAG, DENS, SRG e RCG). Since each variable presents values in the range from [001] to [100], there will be 100 boxplots for each variable, which need to be plotted in a single graph as shown in the image.
This is the graph I need to plot, but for each variable there will be 100 bloxplots as each one has 100 columns of values:
The x-axis is the "Year", which ranges from 2025 to 2030, so I need a graph like the one shown in figure 2 for each year and the y-axis is the sets of values for each variable.
Using Pandas-melt function and seaborn library I was able to plot only the boxplots of a column. But that's not what I need:
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
mdf= df.melt(id_vars=['Year'], value_vars='AAG[001]')
print(mdf)
ax=sns.boxplot(x='Year', y='value',width = 0.2, data=mdf)
Result of the code above:
What can I try to resolve this?
The following code gives you five subplots, where each subplot only contains the data of one variable. Then a boxplot is generated for each year. To change the range of columns used for each variable, change the upper limit in var_range = range(1, 101), and to see the outliers change showfliers to True.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
variables = ["AAG", "DENS", "SRG", "RCG", "Thick"]
period = range(2025, 2031)
var_range = range(1, 101)
fig, axes = plt.subplots(2, 3)
flattened_axes = fig.axes
flattened_axes[-1].set_visible(False)
for i, var in enumerate(variables):
var_columns = [f"TB_acc_{var}[{j:05}]" for j in var_range]
data = df.melt(id_vars=["Period"], value_vars=var_columns, value_name=var)
ax = flattened_axes[i]
sns.boxplot(x="Period", y=var, width=0.2, data=data, ax=ax, showfliers=False)
plt.tight_layout()
plt.show()
output:
In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import matplotlib.pyplot as plt
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach would be using seaborn module. This would plot the two density estimates on the same axes without specifying a variable to hold the axes as follows (using some data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "indices":idx, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot.
When using pandas.DataFrame.groupby, the column to be plotted, (e.g. the aggregation column) should be specified.
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
duration waiting kind
0 3.600 79 long
1 1.800 54 short
2 3.333 74 long
3 2.283 62 short
4 4.533 85 long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
fig, ax = plt.subplots(figsize=(10,8))
classes = list(df.class.unique())
for c in classes:
df2 = data.loc[data['class'] == c]
df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()
Given the following example:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
df.plot(linewidth=10)
The order of plotting puts the last column on top:
How can I make this keep the data & legend order but change the behaviour so that it plots X on top of Y on top of Z?
(I know I can change the data column order and edit the legend order but I am hoping for a simpler easier method leaving the data as is)
UPDATE: final solution used:
(Thanks to r-beginners) I used the get_lines to modify the z-order of each plot
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(ax=ax, linewidth=10)
lines = ax.get_lines()
for i, line in enumerate(lines, -len(lines)):
line.set_zorder(abs(i))
fig
In a notebook produces:
Get the default zorder and sort it in the desired order.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
ax = df.plot(linewidth=10)
l = ax.get_children()
print(l)
l[0].set_zorder(3)
l[1].set_zorder(1)
l[2].set_zorder(2)
Before definition
After defining zorder
I will just put this answer here because it is a solution to the problem, but probably not the one you are looking for.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate data
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
# read columns in reverse order and plot them
# so normally, the legend will be inverted as well, but if we invert it again, you should get what you want
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
Note that in this example, you don't change the order of your data, you just read it differently, so I don't really know if that's what you want.
You can also make it easier on the eyes by creating a corresponding method.
def plot_dataframe(df: pd.DataFrame) -> None:
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
# then you just have to call this
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
plot_dataframe(df)
I am currently generating clustermaps in seaborn and labeling the row colors as below.
matrix = pd.DataFrame(np.random.random_integers(0,1, size=(50,4)))
labels = np.random.random_integers(0,5, size=50)
lut = dict(zip(set(labels), sns.hls_palette(len(set(labels)), l=0.5, s=0.8)))
row_colors = pd.DataFrame(labels)[0].map(lut)
g=sns.clustermap(matrix, col_cluster=False, linewidths=0.1, cmap='coolwarm', row_colors=row_colors)
plt.show()
I have a second annotation column similar to the labels data I would also like to add to the plot. The seaborn API doesn't support adding a second row_colors column, which is fine, but I am struggling in finding a workaround using matplotlib to add this annotation column to the clustering.
If I cannot use seaborn to do this and have to generate all of this manually using matplotlib that would be fine, I just can't figure that out either.
Thanks for your help!
The solution is below. The seaborn API does actually allow this to be done.
matrix = pd.DataFrame(np.random.random_integers(0,1, size=(50,4)))
labels = np.random.random_integers(0,5, size=50)
lut = dict(zip(set(labels), sns.hls_palette(len(set(labels)), l=0.5, s=0.8)))
row_colors = pd.DataFrame(labels)[0].map(lut)
#Create additional row_colors here
labels2 = np.random.random_integers(0,1, size=50)
lut2 = dict(zip(set(labels2), sns.hls_palette(len(set(labels2)), l=0.5, s=0.8)))
row_colors2 = pd.DataFrame(labels2)[0].map(lut2)
g=sns.clustermap(matrix, col_cluster=False, linewidths=0.1, cmap='coolwarm', row_colors=[row_colors, row_colors2])
plt.show()
This produces a Clustermap with two additional columns:
I tried to concat the row_colors dataframe by pandas and it worked!
Please try this code:
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
import pandas as pd
iris = sns.load_dataset("iris")
print(iris)
species = iris.pop("species")
lut1 = dict(zip(species.unique(), ['#ED2323','#60FD00','#808080']))
row_colors1 = species.map(lut1)
lut2 = dict(zip(species.unique(), "rbg"))
row_colors2 = species.map(lut2)
row_colors = pd.concat([row_colors1,row_colors2],axis=1)
print(row_colors)
g = sns.clustermap(iris, row_colors=row_colors, col_cluster=False,cmap="mako", yticklabels=False, xticklabels=False)
plt.show()
There is another option for feeding in the annotation colors: you can provide a whole dataframe in the row colors or col_colors options, instead of a list of lists.
This strategy might be particularly helpful if you have a dataframe with several annotations you want represented. Instead of map, you can use the pandas function replace.
Something such as this bit can be used to modify the other answer:
## This step is necessary because you can't use replace with the tuple rgb values
lut = {k:matplotlib.colors.to_hex(v) for k, v in lut.iteritems()}
annotations_df = annotations_df.replace(lut)
g=sns.clustermap(matrix, col_cluster=False, linewidths=0.1, cmap='coolwarm', row_colors=annotations_df)
plt.show()