I have the following dataset:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Make the data
df = pd.DataFrame({'weight': [200, 170, 160, 150, 145],
                   'days': [0, 91, 174, 205, 279]})
# Display the data
df
   weight  days
0     200     0
1     170    91
2     160   174
3     150   205
4     145   279
I want to make a line plot with axvspan using the following code.
# Plot
sns.lineplot(
    x=df['days'],
    y=df['weight'],
    marker="o",
    alpha=0.5)
plt.axvspan(200, max(df['days']), facecolor='g', alpha=0.4, label='Intermittent Fasting Starts')
plt.xlabel('Days Passed')
plt.ylabel('Total Weight (Lbs)')
plt.legend();
However, the span does not extend all the way to the plot border.
How can I make the span reach the edge of the plot?
Any suggestions would be appreciated. Thanks!
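One possible fix, sketched here under a few assumptions: by default Matplotlib pads the x-axis with a small margin beyond the data, so a span that ends at max(df['days']) stops short of the border. Pinning the right x-limit to the same value (or calling ax.margins(x=0)) should remove that gap. The sketch uses plain ax.plot in place of sns.lineplot to keep it dependency-light; sns.lineplot draws onto the same kind of Axes object, so the set_xlim call should carry over unchanged.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'weight': [200, 170, 160, 150, 145],
                   'days': [0, 91, 174, 205, 279]})

fig, ax = plt.subplots()
# Stand-in for sns.lineplot(x=df['days'], y=df['weight'], marker="o", alpha=0.5)
ax.plot(df['days'], df['weight'], marker='o', alpha=0.5)
ax.axvspan(200, df['days'].max(), facecolor='g', alpha=0.4,
           label='Intermittent Fasting Starts')
# Pin the right edge of the x-axis to the end of the span, removing the margin
ax.set_xlim(right=df['days'].max())
ax.set_xlabel('Days Passed')
ax.set_ylabel('Total Weight (Lbs)')
ax.legend()
```

Only the right limit is pinned, so the left margin around day 0 is kept and the first marker is not clipped.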
Related
I am trying to produce multiple plots from a for loop.
My dataframe is multi-indexed as below:
          temperature  depth
ID Month
33 2              150     95
   3              148     79
   4              148     54
   5              155     77
55 2              168     37
   3              172     33
   4              107     32
   5              155     77
61 2              168     37
   3              172     33
   4              107     32
   5              155     77
I want to loop through each ID and plot:
Temperature as a line against Month (x-axis)
Depth as a bar against Month (x-axis)
I want these to be on the same plot.
This is what I have so far:
# group the dataframe
grp = df.groupby([df.index.get_level_values(0), df.index.get_level_values(1)])
# create empty plots
fig, ax = plt.subplots()
# create an empty plot for combining with ax
ax2 = ax.twinx()
# for loop
for ID, group in grp:
    ax.bar(df.index.get_level_values(1), group["temperature"], color='blue', label='Release')
    ax2.plot(df.index.get_level_values(1), group["depth"], color='green', label='Hold')
    ax.set_xticklabels(df.index.get_level_values(1))
    plt.savefig("value{y}.png".format(y=ID))
dataframe reprex:
import pandas as pd
index = pd.MultiIndex.from_product([[33, 55, 61], ['2', '3', '4', '5']], names=['ID', 'Month'])
df = pd.DataFrame([[150, 95],
                   [148, 79],
                   [148, 54],
                   [155, 77],
                   [168, 37],
                   [172, 33],
                   [107, 32],
                   [155, 77],
                   [168, 37],
                   [172, 33],
                   [107, 32],
                   [155, 77]],
                  columns=['temperature', 'depth'], index=index)
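A minimal sketch of one way the loop could be arranged, assuming the goal stated above (depth as bars, temperature as a line, one figure per ID). It groups on the ID level only, creates a fresh figure inside the loop, and takes the months from each group rather than from the whole dataframe. The figures dictionary, the axis labels, and the titles are illustrative additions, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

index = pd.MultiIndex.from_product([[33, 55, 61], ['2', '3', '4', '5']],
                                   names=['ID', 'Month'])
values = [[150, 95], [148, 79], [148, 54], [155, 77],
          [168, 37], [172, 33], [107, 32], [155, 77],
          [168, 37], [172, 33], [107, 32], [155, 77]]
df = pd.DataFrame(values, columns=['temperature', 'depth'], index=index)

figures = {}  # hypothetical container so each figure can be inspected later
for ID, group in df.groupby(level='ID'):
    # months belonging to this ID only
    months = group.index.get_level_values('Month').tolist()
    fig, ax = plt.subplots()   # a fresh figure per ID
    ax2 = ax.twinx()           # second y-axis sharing the same x-axis
    ax.bar(months, group['depth'].to_numpy(), color='blue', label='Depth')
    ax2.plot(months, group['temperature'].to_numpy(), color='green',
             marker='o', label='Temperature')
    ax.set_xlabel('Month')
    ax.set_ylabel('Depth')
    ax2.set_ylabel('Temperature')
    ax.set_title(f'ID {ID}')
    figures[ID] = fig
    # fig.savefig(f"value{ID}.png")  # uncomment to write one file per ID
```

Creating the figure inside the loop keeps the bars and lines of different IDs from piling up on a single pair of axes.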
This is a continuation of this question, but now I have a bar chart with hue.
Here's what I have:
df = pd.DataFrame({'age': ['20-30', '20-30', '20-30', '30-40', '30-40', '30-40', '40-50', '40-50', '40-50', '50-60', '50-60', '50-60'],
                   'expenses': ['50$', '100$', '200$', '50$', '100$', '200$', '50$', '100$', '200$', '50$', '100$', '200$'],
                   'users': [59, 42, 57, 68, 47, 98, 75, 73, 54, 81, 52, 43],
                   'buyers': [22, 35, 18, 27, 12, 57, 19, 29, 31, 47, 10, 5],
                   'percentage': [37.2881, 83.3333, 31.5789, 39.7058, 25.5319, 58.1632, 25.3333, 39.7260, 57.4074, 58.0246, 19.2307, 11.6279]})
index    age  expenses  users  buyers  percentage
    0  20-30       50$     59      22     37.2881
    1  20-30      100$     42      35     83.3333
    2  20-30      200$     57      18     31.5789
    3  30-40       50$     68      27     39.7058
    4  30-40      100$     47      12     25.5319
    5  30-40      200$     98      57     58.1632
    6  40-50       50$     75      19     25.3333
    7  40-50      100$     73      29     39.7260
    8  40-50      200$     54      31     57.4074
    9  50-60       50$     81      47     58.0246
   10  50-60      100$     52      10     19.2307
   11  50-60      200$     43       5     11.6279
fig, ax = plt.subplots(figsize=(20, 10))
# Plot the all users
sns.barplot(x='age', y='users', data=df, hue='expenses', palette='Blues', edgecolor='grey', alpha=0.7, ax=ax)
# Plot the buyers
sns.barplot(x='age', y='buyers', data=df, hue='expenses', palette='Blues', edgecolor='darkgrey', hatch='//', ax=ax)
plt.show()
I need to get the same chart. In the case of hue, the code:
# extract the separate containers
c1, c2 = ax.containers
# annotate with the users values
ax.bar_label(c1, fontsize=13)
# annotate with the buyer and percentage values
l2 = [f"{v.get_height()}: {df.loc[i, 'percentage']}%" for i, v in enumerate(c2)]
ax.bar_label(c2, labels=l2, fontsize=8, label_type='center', fontweight='bold')
no longer works.
I would be glad for any hints.
Each object in ax.containers represents the bars for a single hue group.
When using bar_label, the annotations are added for all the bars in '50$', then all the bars in '100$', and then all the bars in '200$'.
I think it's easier to select the correct data by annotating the 'buyers' group separately.
The answer to your previous question selects the data from the entire dataframe, but here Boolean indexing is used to select only a segment of the dataframe. Using print(data) in each loop will help with understanding.
fig, ax = plt.subplots(figsize=(20, 10))
# plot the all users
sns.barplot(x='age', y='users', data=df, hue='expenses', palette='Blues', edgecolor='grey', alpha=0.7, ax=ax)
# annotate the bars in the 3 containers (1 container per hue group)
for c in ax.containers:
    ax.bar_label(c)
# plot the 'buyers', which adds 3 more containers to ax
sns.barplot(x='age', y='buyers', data=df, hue='expenses', palette='Blues', edgecolor='darkgrey', hatch='//', ax=ax)
# iterate through the last 3 new containers containing the hatched groups
for c in ax.containers[3:]:
    # get the hue label, which will be used to select the data group
    hue_label = c.get_label()
    # select the data based on hue_label
    data = df.loc[df.expenses.eq(hue_label), ['buyers', 'percentage']]
    # customize the labels
    labels = [f"{v.get_height()}: {data.iloc[i, 1]:0.2f}%" for i, v in enumerate(c)]
    # add the labels
    ax.bar_label(c, labels=labels)
plt.show()
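To see what the Boolean indexing selects for each hue group without drawing anything, the selection step can be run on its own. This is a small sketch using only the columns that the selection touches; the selected dictionary is a hypothetical helper added for inspection.

```python
import pandas as pd

df = pd.DataFrame({
    'age': ['20-30'] * 3 + ['30-40'] * 3 + ['40-50'] * 3 + ['50-60'] * 3,
    'expenses': ['50$', '100$', '200$'] * 4,
    'buyers': [22, 35, 18, 27, 12, 57, 19, 29, 31, 47, 10, 5],
    'percentage': [37.2881, 83.3333, 31.5789, 39.7058, 25.5319, 58.1632,
                   25.3333, 39.7260, 57.4074, 58.0246, 19.2307, 11.6279]})

selected = {}  # hypothetical: hue label -> rows that label selects
for hue_label in df['expenses'].unique():
    # same selection as in the plotting loop: one row per age group
    data = df.loc[df.expenses.eq(hue_label), ['buyers', 'percentage']]
    selected[hue_label] = data
    print(hue_label)
    print(data)
```

Each selection holds four rows, one per age group, in the same left-to-right order as the bars in the matching container, which is why data.iloc[i, 1] lines up with the i-th bar.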
I was wondering whether every line in this example can be annotated automatically, using the column headers as labels.
import seaborn as sns
import pandas as pd
d = {'a': [100, 125, 300, 520],..., 'z': [250, 270, 278, 248]}
df = pd.DataFrame(data=d, index=[25, 26, 26, 30])
      a  ...    z
25  100  ...  250
26  125  ...  270
26  300  ...  278
30  520  ...  248
When I use this code, I only get the column headers as a legend. However, I want the labels to be directly beside/above my graphs.
sns.lineplot(data=df, dashes=False, estimator=None)
Is this what you are looking for?
ax = sns.lineplot(data=df, dashes=False, estimator=None, legend=False)
for label, pos in df.iloc[0].items():  # .iteritems() was removed in pandas 2.0
    ax.annotate(label, (df.index[0], pos*1.05), ha='left', va='bottom')
Something like:
ax = sns.lineplot(data=df, dashes=False, estimator=None)
for c, l in zip(df.columns, ax.lines):
    y = l.get_ydata()
    ax.annotate(f'{c}', xy=(1.01, y[-1]), xycoords=('axes fraction', 'data'),
                ha='left', va='center', color=l.get_color())
Source: https://stackoverflow.com/a/62703420/15239951
I have a pandas dataframe, which looks like this:
import seaborn as sns
import pandas as pd
d = {'a': [100, 125, 300, 520], 'b': [250, 270, 278, 248]}
df = pd.DataFrame(data=d, index=[25, 26, 26, 30])
      a    b
25  100  250
26  125  270
26  300  278
30  520  248
When I try to plot this dataframe with
ax = sns.lineplot(data=df, dashes=False)
the values for 26 are averaged and an error bar shows up. However, I want the values for 26 plotted separately.
That's what the estimator parameter does. See the docs: https://seaborn.pydata.org/generated/seaborn.lineplot.html
sns.lineplot(data=df, dashes=False, estimator=None)
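As a rough illustration of what the default does: sns.lineplot defaults to estimator='mean', so rows that share an x value are collapsed into their mean before plotting. The pandas groupby in this sketch is only an analogy for that aggregation, not seaborn's internal code.

```python
import pandas as pd

d = {'a': [100, 125, 300, 520], 'b': [250, 270, 278, 248]}
df = pd.DataFrame(data=d, index=[25, 26, 26, 30])

# Analogy for estimator='mean': the two rows at index 26 become one averaged row
averaged = df.groupby(level=0).mean()
print(averaged)

# With estimator=None nothing is aggregated, so all four rows are plotted as-is
print(len(df), len(averaged))
```

The averaged frame has three rows instead of four, which matches the single averaged point (with an error bar) that the default plot shows at 26.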
I get an error when I try to plot an interval.
I created intervals from my age column, and now I want to show on a chart how revenue compares across the age intervals.
My code:
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
clients['tranche'] = pd.cut(clients.age, bins)
clients.head()
  client_id  sales  revenue  birth  age sex   tranche
0       c_1     39   558.18   1955   66   m  (60, 70]
1      c_10     58  1353.60   1956   65   m  (60, 70]
2     c_100      8   254.85   1992   29   m  (20, 30]
3    c_1000    125  2261.89   1966   55   f  (50, 60]
4    c_1001    102  1812.86   1982   39   m  (30, 40]
# Plot a scatter tranche x revenue
df = clients.groupby('tranche')[['revenue']].sum().reset_index().copy()
plt.scatter(df.tranche, df.revenue)
plt.show()
But an error appears, ending with:
TypeError: float() argument must be a string or a number, not 'pandas._libs.interval.Interval'
How can I use an interval for plotting?
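You'll need to add labels. (I tried converting the intervals to str using .astype(str), but that did not seem to work in 3.9.)
If you do the following, it should work. Note that pd.cut needs exactly one label per bin, and the labels have to be set where the intervals are created (on clients, before the groupby):
labels = [f'{lo}-{hi}' for lo, hi in zip(bins[:-1], bins[1:])]  # '10-20', '20-30', ..., '90-100'
clients['tranche'] = pd.cut(clients.age, bins, labels=labels)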
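Putting the pieces together, here is a hedged end-to-end sketch. The small clients frame is a hypothetical stand-in built from the five rows shown by clients.head(), and passing observed=True to groupby is an added assumption so that empty age groups are left out regardless of the pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for `clients`, using the rows shown by clients.head()
clients = pd.DataFrame({'age': [66, 65, 29, 55, 39],
                        'revenue': [558.18, 1353.60, 254.85, 2261.89, 1812.86]})

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# one string label per bin: '10-20', '20-30', ..., '90-100'
labels = [f'{lo}-{hi}' for lo, hi in zip(bins[:-1], bins[1:])]
clients['tranche'] = pd.cut(clients['age'], bins, labels=labels)

# observed=True keeps only the age groups that actually occur in the data
df = clients.groupby('tranche', observed=True)[['revenue']].sum().reset_index()

# plain strings on the x-axis are treated by Matplotlib as categories
x = df['tranche'].astype(str).tolist()
fig, ax = plt.subplots()
ax.scatter(x, df['revenue'])
ax.set_xlabel('Age group')
ax.set_ylabel('Revenue')
```

Because the x values are now plain strings rather than Interval objects, Matplotlib no longer tries to convert them to floats.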