I was wondering whether every line in this example can be annotated automatically, using the column headers as labels.
import seaborn as sns
import pandas as pd
d = {'a': [100, 125, 300, 520],..., 'z': [250, 270, 278, 248]}
df = pd.DataFrame(data=d, index=[25, 26, 26, 30])
a ... z
25 100 ... 250
26 125 ... 270
26 300 ... 278
30 520 ... 248
When I use this code, I only get the column headers as a legend. However, I want the labels to be directly beside/above my graphs.
sns.lineplot(data=df, dashes=False, estimator=None)
Is this what you are looking for?
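ax = sns.lineplot(data=df, dashes=False, estimator=None, legend=False)
# label each line at its first point, slightly above the value
for label, pos in df.iloc[0].items():  # Series.iteritems() was removed in pandas 2.0
    ax.annotate(label, (df.index[0], pos*1.05), ha='left', va='bottom')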
Something like:
ax = sns.lineplot(data=df, dashes=False, estimator=None)
for c, l in zip(df.columns, ax.lines):
    y = l.get_ydata()
    ax.annotate(f'{c}', xy=(1.01, y[-1]), xycoords=('axes fraction', 'data'),
                ha='left', va='center', color=l.get_color())
Source: https://stackoverflow.com/a/62703420/15239951
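For a self-contained version of the same idea, here is a minimal sketch in plain matplotlib. It assumes only the two columns that the question shows in full ('a' and 'z'), and the 5-point offset is an arbitrary choice, not something taken from either answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# only the two columns that are spelled out in the question
df = pd.DataFrame({'a': [100, 125, 300, 520], 'z': [250, 270, 278, 248]},
                  index=[25, 26, 26, 30])

fig, ax = plt.subplots()
annotations = []
for col in df.columns:
    (line,) = ax.plot(df.index, df[col])
    # place the column name 5 points to the right of the last data point,
    # in the same colour as the line it belongs to
    ann = ax.annotate(col, xy=(df.index[-1], df[col].iloc[-1]),
                      xytext=(5, 0), textcoords='offset points',
                      ha='left', va='center', color=line.get_color())
    annotations.append(ann)
```

Anchoring the label to the last data point keeps it next to the line even if the axis limits change later.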
This is a continuation of this question. But now I have a bar-chart with hue.
Here's what I have:
df = pd.DataFrame({'age': ['20-30', '20-30', '20-30', '30-40', '30-40', '30-40', '40-50', '40-50', '40-50', '50-60', '50-60', '50-60'],
'expenses':['50$', '100$', '200$', '50$', '100$', '200$', '50$', '100$', '200$', '50$', '100$', '200$'],
'users': [59, 42, 57, 68, 47, 98, 75, 73, 54, 81, 52, 43],
'buyers': [22, 35, 18, 27, 12, 57, 19, 29, 31, 47, 10, 5],
'percentage': [37.2881, 83.3333, 31.5789, 39.7058, 25.5319, 58.1632, 25.3333, 39.7260, 57.4074, 58.0246, 19.2307, 11.6279]})
      age expenses  users  buyers  percentage
0   20-30      50$     59      22     37.2881
1   20-30     100$     42      35     83.3333
2   20-30     200$     57      18     31.5789
3   30-40      50$     68      27     39.7058
4   30-40     100$     47      12     25.5319
5   30-40     200$     98      57     58.1632
6   40-50      50$     75      19     25.3333
7   40-50     100$     73      29     39.7260
8   40-50     200$     54      31     57.4074
9   50-60      50$     81      47     58.0246
10  50-60     100$     52      10     19.2307
11  50-60     200$     43       5     11.6279
fig, ax = plt.subplots(figsize=(20, 10))
# Plot all users
sns.barplot(x='age', y='users', data=df, hue='expenses', palette='Blues', edgecolor='grey', alpha=0.7, ax=ax)
# Plot the buyers
sns.barplot(x='age', y='buyers', data=df, hue='expenses', palette='Blues', edgecolor='darkgrey', hatch='//', ax=ax)
plt.show()
I need the same annotated chart as in the previous question. However, with hue, the code:
# extract the separate containers
c1, c2 = ax.containers
# annotate with the users values
ax.bar_label(c1, fontsize=13)
# annotate with the buyer and percentage values
l2 = [f"{v.get_height()}: {df.loc[i, 'percentage']}%" for i, v in enumerate(c2)]
ax.bar_label(c2, labels=l2, fontsize=8, label_type='center', fontweight='bold')
no longer works.
I would be glad for any hints.
Each object in ax.containers holds the bars of a single hue group.
When bar_label is called on the containers in order, it annotates every bar of '50$' first, then '100$', and then '200$'.
I think it's easier to select the correct data by annotating the 'buyers' group separately.
The answer to your previous question selects the data from the entire dataframe, but here Boolean indexing is used to select only the rows of one hue group at a time. Adding print(data) inside the loop helps with understanding what is selected.
fig, ax = plt.subplots(figsize=(20, 10))
# plot all users
sns.barplot(x='age', y='users', data=df, hue='expenses', palette='Blues', edgecolor='grey', alpha=0.7, ax=ax)
# annotate the bars in the 3 containers (1 container per hue group)
for c in ax.containers:
    ax.bar_label(c)
# plot the 'buyers', which adds 3 more containers to ax
sns.barplot(x='age', y='buyers', data=df, hue='expenses', palette='Blues', edgecolor='darkgrey', hatch='//', ax=ax)
# iterate through the last 3 new containers containing the hatched groups
for c in ax.containers[3:]:
    # get the hue label, which will be used to select the data group
    hue_label = c.get_label()
    # select the data based on hue_label
    data = df.loc[df.expenses.eq(hue_label), ['buyers', 'percentage']]
    # customize the labels
    labels = [f"{v.get_height()}: {data.iloc[i, 1]:0.2f}%" for i, v in enumerate(c)]
    # add the labels
    ax.bar_label(c, labels=labels)
plt.show()
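To see what the Boolean indexing step selects without drawing anything, the label strings can be built from the dataframe alone. This is only a sketch: it takes the buyer counts from the data rather than from the bar heights, and it assumes the rows of each hue group appear in the same age order as the bars.

```python
import pandas as pd

df = pd.DataFrame({
    'age': ['20-30', '20-30', '20-30', '30-40', '30-40', '30-40',
            '40-50', '40-50', '40-50', '50-60', '50-60', '50-60'],
    'expenses': ['50$', '100$', '200$'] * 4,
    'buyers': [22, 35, 18, 27, 12, 57, 19, 29, 31, 47, 10, 5],
    'percentage': [37.2881, 83.3333, 31.5789, 39.7058, 25.5319, 58.1632,
                   25.3333, 39.7260, 57.4074, 58.0246, 19.2307, 11.6279]})

labels_by_hue = {}
for hue_label in df['expenses'].unique():
    # keep only the rows that belong to this hue group (one row per age bin)
    data = df.loc[df['expenses'].eq(hue_label), ['buyers', 'percentage']]
    # one label per bar: "<buyers>: <percentage>%"
    labels_by_hue[hue_label] = [f"{b}: {p:0.2f}%"
                                for b, p in zip(data['buyers'], data['percentage'])]

print(labels_by_hue['50$'])
```

Each list has one entry per age bin, which is the number of bars in the corresponding container.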
I have a dataset, df, that looks like this:
Date | Code | City | State | Quantity x | Quantity y | Population | Cases | Deaths
2019-01 | 10001 | Los Angeles | CA | 445 | 0 | 0
2019-01 | 10002 | Sacramento | CA | 4450 | 556 | 0 | 0
2020-03 | 12223 | Houston | TX | 440 | 4440 | 35000000 | 23 | 11
... | ... | ... | ... | ... | ... | ... | ... | ...
2021-07 | 10002 | Sacramento | CA | 3220 | NA | 5444000 | 211 | 22
My start and end dates are the same for all cities. I have over 4000 different cities and would like to plot a graph with two y-axes for each city, using something similar to the following code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots(figsize=(9,9))
color = 'tab:red'
ax1.set_xlabel('Date')
ax1.set_ylabel('Quantity X', color=color)
ax1.plot(df['Quantity x'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx()
color2 = 'tab:blue'
ax2.set_ylabel('Deaths', color=color2)
ax2.plot(df['Deaths'], color=color2)
ax2.tick_params(axis='y', labelcolor=color2)
plt.show()
I would like to create a loop so that the code above runs for every Code (each Code corresponds to one City), plotting Quantity x and Deaths, and saves each graph into a folder. How can I create a loop that does this and starts a new plot for each distinct Code?
Note: some values in df['Quantity x'] and df['Population'] are blank.
If I understood you correctly, you are looking for a filtering functionality:
import matplotlib.pyplot as plt
import pandas as pd
def plot_quantity_and_death(df):
    # your code
    fig, ax1 = plt.subplots(figsize=(9, 9))
    color = 'tab:red'
    ax1.set_xlabel('Date')
    ax1.set_ylabel('Quantity X', color=color)
    ax1.plot(df['Quantity x'], color=color)
    ax1.tick_params(axis='y', labelcolor=color)
    ax2 = ax1.twinx()
    color2 = 'tab:blue'
    ax2.set_ylabel('Deaths', color=color2)
    ax2.plot(df['Deaths'], color=color2)
    ax2.tick_params(axis='y', labelcolor=color2)
    # save & close addon
    plt.savefig(f"Code_{df['Code'].iloc[0]}.png")
    plt.close()
df = pd.DataFrame() # this needs to be replaced by your dataset
# get unique city codes, loop over them, filter data and plot it
unique_codes = pd.unique(df['Code'])
for code in unique_codes:
    filtered_df = df[df['Code'] == code]
    plot_quantity_and_death(filtered_df)
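The same loop can also be written with groupby, which yields one sub-frame per Code, and the files can be written into a dedicated folder as the question asks. The sketch below is an assumption-laden variant, not the answer's code: the four sample rows are made up, the Date column is parsed with pd.to_datetime and used as the x-axis, and out_dir is a temporary folder standing in for whatever output path you actually want.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, nothing is shown on screen
import matplotlib.pyplot as plt
import pandas as pd

# made-up sample rows in the shape of the question's table (one blank Quantity x)
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01', '2019-02', '2019-01', '2019-02']),
    'Code': [10001, 10001, 10002, 10002],
    'Quantity x': [445, None, 4450, 3220],
    'Deaths': [0, 1, 0, 22],
})

out_dir = tempfile.mkdtemp()  # hypothetical output folder; replace with your own path
saved = []
for code, group in df.groupby('Code'):
    fig, ax1 = plt.subplots(figsize=(9, 9))
    ax1.set_xlabel('Date')
    ax1.set_ylabel('Quantity X', color='tab:red')
    ax1.plot(group['Date'], group['Quantity x'], color='tab:red')
    ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
    ax2.set_ylabel('Deaths', color='tab:blue')
    ax2.plot(group['Date'], group['Deaths'], color='tab:blue')
    path = os.path.join(out_dir, f"Code_{code}.png")
    fig.savefig(path)
    plt.close(fig)  # free the figure; important when looping over thousands of codes
    saved.append(path)
```

Blank values become NaN and simply leave a gap in the line, so they do not need special handling for plotting.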
I have the following dataset:
# Make the data
df = pd.DataFrame({'weight': [200, 170, 160, 150, 145],
'days': [0, 91, 174, 205, 279]})
# Display the data
df
weight days
0 200 0
1 170 91
2 160 174
3 150 205
4 145 279
I want to make a line plot with an axvspan using the following code.
# Plot
sns.lineplot(
    x=df['days'],
    y=df['weight'],
    marker="o",
    alpha=0.5)
plt.axvspan(200, max(df['days']), facecolor='g', alpha=0.4, label='Intermittent Fasting Starts')
plt.xlabel('Days Passed')
plt.ylabel('Total Weight (Lbs)')
plt.legend();
However, the span doesn't extend to the right border of the plot.
How can I make the span reach the edge of the plot?
Any suggestions would be appreciated. Thanks!
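One possible approach, sketched here in plain matplotlib rather than seaborn: the gap appears because the axes add a small margin beyond the largest x value, so the span can be drawn up to the current right axis limit instead of max(df['days']), and the limits then pinned so they do not move again. Treat this as one option under those assumptions; removing the margin with ax.margins(x=0) would be another.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'weight': [200, 170, 160, 150, 145],
                   'days': [0, 91, 174, 205, 279]})

fig, ax = plt.subplots()
ax.plot(df['days'], df['weight'], marker='o', alpha=0.5)

# axis limits after autoscaling; these already include the default margin
left, right = ax.get_xlim()

# draw the span up to the right axis limit instead of the last data point
ax.axvspan(200, right, facecolor='g', alpha=0.4, label='Intermittent Fasting Starts')

# pin the limits so that adding the span does not widen the axis again
ax.set_xlim(left, right)

ax.set_xlabel('Days Passed')
ax.set_ylabel('Total Weight (Lbs)')
ax.legend()
```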
I'm trying to plot the data (see below) with company_name on the x-axis, status_mission_2_y on one y-axis and percentage on the other y-axis. I have tried using the twinx() function but I can't get it to work.
Please can you help? Thanks in advance!
def twinplot(data):
    x_ = data.columns[0]
    y_ = data.columns[1]
    y_2 = data.columns[2]
    data1 = data[[x_, y_]]
    data2 = data[[x_, y_2]]
    plt.figure(figsize=(15, 8))
    ax = sns.barplot(x=x_, y=y_, data=data1)
    ax2 = ax.twinx()
    g2 = sns.barplot(x=x_, y=y_2, data=data2, ax=ax2)
    plt.show()
data = ten_company_missions_failed
twinplot(data)
company_name  percentage   status_mission_2_y
EER           1            1
Ghot          1            1
Trv           1            1
Sandia        1            1
Test          1            1
US Navy       0.823529412  17
Zed           0.8          5
Gov           0.75         4
Knight        0.666666667  3
Had           0.666666667  3
Seaborn plots the two bar plots with the same color and on the same x-positions.
The following example code shrinks the bar widths, so that the bars belonging to ax sit on the left and the bars of ax2 are moved to the right. To differentiate the right bars, semi-transparency (alpha=0.7) and hatching are used.
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd
import seaborn as sns
from io import StringIO
data_str = '''company_name percentage status_mission_2_y
EER 1 1
Ghot 1 1
Trv 1 1
Sandia 1 1
Test 1 1
"US Navy" 0.823529412 17
Zed 0.8 5
Gov 0.75 4
Knight 0.666666667 3
Had 0.666666667 3'''
data = pd.read_csv(StringIO(data_str), sep=r'\s+')  # delim_whitespace=True is deprecated since pandas 2.2
x_ = data.columns[0]
y_ = data.columns[1]
y_2 = data.columns[2]
data1 = data[[x_, y_]]
data2 = data[[x_, y_2]]
plt.figure(figsize=(15, 8))
ax = sns.barplot(x=x_, y=y_, data=data1)
width_scale = 0.45
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width_scale)
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax2 = ax.twinx()
sns.barplot(x=x_, y=y_2, data=data2, alpha=0.7, hatch='xx', ax=ax2)
for bar in ax2.containers[0]:
    x = bar.get_x()
    w = bar.get_width()
    bar.set_x(x + w * (1 - width_scale))
    bar.set_width(w * width_scale)
plt.show()
A simpler alternative could be to combine a barplot on ax and a lineplot on ax2.
plt.figure(figsize=(15, 8))
ax = sns.barplot(x=x_, y=y_, data=data1)
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax2 = ax.twinx()
sns.lineplot(x=x_, y=y_2, data=data2, marker='o', color='crimson', lw=3, ax=ax2)
plt.show()
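If seaborn is not required, the side-by-side layout can also be produced directly in matplotlib by giving each set of bars its own x offset, so no bars need to be moved afterwards. This is a minimal sketch using three rows from the table above; the width of 0.4 and the colours are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# three rows taken from the question's table
names = ['US Navy', 'Zed', 'Gov']
percentage = [0.823529412, 0.8, 0.75]
missions = [17, 5, 4]

x = np.arange(len(names))  # one integer position per company
width = 0.4

fig, ax = plt.subplots(figsize=(15, 8))
# left bars: percentage on the primary y-axis, shifted half a width to the left
left_bars = ax.bar(x - width / 2, percentage, width=width, color='tab:blue')
ax.set_ylabel('percentage')

ax2 = ax.twinx()
# right bars: mission count on the secondary y-axis, shifted half a width to the right
right_bars = ax2.bar(x + width / 2, missions, width=width,
                     color='tab:orange', alpha=0.7, hatch='xx')
ax2.set_ylabel('status_mission_2_y')

ax.set_xticks(x)
ax.set_xticklabels(names)
```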
I want to connect the means of my box plots with a line. I can draw the basic plot, but the line does not pass through the box plot means, and the box plots are offset from their x-axis labels. There is a similar post, but it does not connect the means: Python: seaborn pointplot and boxplot in one plot but shifted on the x-axis
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'pre_score': [4, 24, 31, 2, 3,25, 94, 57, 62, 70,5, 43, 23, 23, 51]
}
data = pd.DataFrame(raw_data, columns = ['first_name', 'pre_score'])
first_name pre_score
0 Jason 4
1 Molly 24
2 Tina 31
3 Jake 2
4 Amy 3
5 Jason 25
6 Molly 94
7 Tina 57
8 Jake 62
9 Amy 70
10 Jason 5
11 Molly 43
12 Tina 23
13 Jake 23
14 Amy 51
sns.set_style("ticks")
ax = sns.stripplot(x='first_name', y='pre_score', hue='first_name', jitter=True, dodge=True, size=6, zorder=0, alpha=0.5, linewidth =1, data=data)
ax = sns.boxplot(x='first_name', y='pre_score', hue='first_name', dodge=True, showfliers=True, linewidth=0.8, showmeans=True, data=data)
ax = sns.lineplot(x='first_name', y='pre_score', color='k', data=data.groupby(['first_name'], as_index=False).mean())
fig_size = [18.0, 10.0]
plt.rcParams["figure.figsize"] = fig_size
handles, labels = ax.get_legend_handles_labels()
half = len(labels) // 2
ax.legend(handles[half:], labels[half:], bbox_to_anchor=(1.01, 1), loc=2, borderaxespad=0.1);
As we can see, the sns.lineplot line does not follow the means, and the box plots are offset from the names on the x-axis.
How can I fix this?
When working with seaborn plots, I strongly recommend always providing order= (and hue_order= if applicable) to avoid nasty surprises with categories not showing up in a consistent order between calls.
For the purpose of your question, you can replace the lineplot with a pointplot, which automatically aggregates the values by category and connects them with a line:
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'pre_score': [4, 24, 31, 2, 3,25, 94, 57, 62, 70,5, 43, 23, 23, 51]
}
data = pd.DataFrame(raw_data, columns = ['first_name', 'pre_score'])
# define the order in which the categories will be plotted on the x-axis
order = np.sort(data['first_name'].unique()) # you could also create a list by hand if you want a specific order
sns.set_style("ticks")
ax = sns.stripplot(x='first_name', y='pre_score', order=order, jitter=True, size=6, zorder=0, alpha=0.5, linewidth =1, data=data)
ax = sns.boxplot(x='first_name', y='pre_score', order=order, showfliers=True, linewidth=0.8, showmeans=True, data=data)
ax = sns.pointplot(x='first_name', y='pre_score', order=order, data=data, errorbar=None, color='black')  # on seaborn < 0.12 use ci=None
If for some reason you don't want to or cannot use a seaborn function that takes an order argument, then aggregate by hand in pandas, and reindex() with your order to make sure the values appear in the right order in the dataframe before plotting with the tool of your choice.
For instance, you could replace the call to pointplot() above with:
# calculate the means and make sure they are in the same order as the boxplots
means = data.groupby('first_name')['pre_score'].mean().reindex(order)
ax.plot(means.index, means.values, 'ko-', lw=3)
and get exactly the same result.
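To illustrate why the reindex() step matters, here is a small pandas-only sketch. By default groupby sorts its keys alphabetically, whereas the names first appear in the data in a different order (which, as far as I know, is also the order seaborn uses for plain string columns when no order= is given). The variable names are only illustrative.

```python
import pandas as pd

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'] * 3,
            'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51]}
data = pd.DataFrame(raw_data, columns=['first_name', 'pre_score'])

# order in which the names first appear in the data
appearance_order = list(data['first_name'].unique())

# groupby sorts the group keys alphabetically by default
means = data.groupby('first_name')['pre_score'].mean()

# reindex puts the means back into the order used for plotting
aligned = means.reindex(appearance_order)

print(list(means.index))
print(list(aligned.index))
```

Plotting aligned (or passing the same list as order= to every seaborn call) keeps the line and the boxes on matching x positions.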