I have a dataframe such as this:
data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
'100m': [19, 14, 12, 11],
'200m': [36, 25, 24, 24],
'400m': [67, 64, 58, 57],
'800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)
    name  100m  200m  400m  800m
0    Bob    19    36    67   117
1  Chuck    14    25    64   120
2  Daren    12    24    58   123
3  Elisa    11    24    57   121
My task is simple: Plot the times (along the y-axis), with the name of the event (100m, 200m, etc. along the x-axis). The hue of each bar should be determined by the 'name' column, and look something like this.
Furthermore, I would like to overlay the results (not stack). However, there is no functionality in seaborn nor matplotlib to do this.
Instead of using seaborn, which is an API for matplotlib, plot df directly with pandas.DataFrame.plot. matplotlib is the default plotting backend for pandas.
Tested in python 3.11, pandas 1.5.1, matplotlib 3.6.2, seaborn 0.12.1
ax = df.set_index('name').T.plot.bar(alpha=.7, rot=0, stacked=True)
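The question asks for overlaid rather than stacked bars. A minimal sketch of one way to approximate that, assuming it is acceptable to draw every runner's bars at the same x positions with some transparency (the loop and the alpha value are choices made here, not part of the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
        '100m': [19, 14, 12, 11],
        '200m': [36, 25, 24, 24],
        '400m': [67, 64, 58, 57],
        '800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)

# events become the index (x-axis), one column per runner
times = df.set_index('name').T

fig, ax = plt.subplots()
for runner in times.columns:
    # each runner is drawn at the same x positions, so the bars overlap
    ax.bar(list(times.index), times[runner].to_numpy(), alpha=0.4, label=runner)
ax.legend(title='name')
ax.set(xlabel='event', ylabel='time')
```

Because the bars share positions, shorter bars can be hidden behind taller ones; the transparency is what keeps them distinguishable.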
seaborn.barplot does not have an option for stacked bars; however, stacking can be implemented with seaborn.histplot, as shown in Stacked Bar Chart with Centered Labels.
df must first be converted from a wide format to a long format with df.melt.
# melt the dataframe
dfm = df.melt(id_vars='name')
# plot
ax = sns.histplot(data=dfm, x='variable', weights='value', hue='name', discrete=True, multiple='stack')
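To make the wide-to-long step concrete, here is a small sketch of what df.melt produces for the dataframe in the question. The column names 'variable' and 'value' are the pandas defaults, which is why the histplot call refers to them:

```python
import pandas as pd

data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
        '100m': [19, 14, 12, 11],
        '200m': [36, 25, 24, 24],
        '400m': [67, 64, 58, 57],
        '800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)

# one row per (name, event) pair instead of one column per event
dfm = df.melt(id_vars='name')
print(dfm.head())
```

Four runners times four events gives 16 rows, each with a name, the event in 'variable', and the time in 'value'.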
I need to plot a bar chart and color the bars according to the "Attribute" column of my dataframe
x axis = Shares
y axis = Price
fig, ax = plt.subplots()
ax.barh(df['Share'],df['Price'], align='center')
ax.set_xlabel('Shares')
ax.set_ylabel('Price')
ax.set_title('Bar Chart & Colors')
plt.show()
Thanks for your help !
There are two easy ways to plot the bars with separate colors for 'Attribute'
Transform the dataframe with .pivot and then plot with pandas.DataFrame.plot and specify kind='barh' for a horizontal bar plot
The index will be the x-axis if using kind='bar', and will be the y-axis if using kind='barh'
The columns of the transformed dataframe will each be plotted with a separate color.
pandas uses matplotlib as the default plotting backend.
Use seaborn.barplot with hue='Attribute' and orient='h'. This option works with the dataframe in a long format, as shown in the OP.
seaborn is a high-level API for matplotlib
Tested with pandas 1.3.0, seaborn 0.11.1, and matplotlib 3.4.2
Imports and DataFrame
import pandas as pd
import seaborn as sns
# test dataframe
data = {'Price': [110, 105, 119, 102, 111, 117, 110, 110], 'Share': [110, -50, 22, 79, 29, -2, 130, 140], 'Attribute': ['A', 'B', 'C', 'D', 'A', 'B', 'B', 'C']}
df = pd.DataFrame(data)
Price Share Attribute
0 110 110 A
1 105 -50 B
2 119 22 C
3 102 79 D
4 111 29 A
5 117 -2 B
6 110 130 B
7 110 140 C
pandas.DataFrame.plot
# transform the dataframe with .pivot
dfp = df.pivot(index='Price', columns='Attribute', values='Share')
Attribute      A      B      C     D
Price
102          NaN    NaN    NaN  79.0
105          NaN  -50.0    NaN   NaN
110        110.0  130.0  140.0   NaN
111         29.0    NaN    NaN   NaN
117          NaN   -2.0    NaN   NaN
119          NaN    NaN   22.0   NaN
# plot
ax = dfp.plot(kind='barh', title='Bar Chart of Colors', figsize=(6, 4))
ax.set(xlabel='Shares')
ax.legend(title='Attribute', bbox_to_anchor=(1, 1), loc='upper left')
ax.grid(axis='x')
with stacked=True
ax = dfp.plot(kind='barh', stacked=True, title='Bar Chart of Colors', figsize=(6, 4))
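One caveat with the .pivot step: it only works while each (Price, Attribute) pair occurs once. A hedged sketch of what happens otherwise, using a hypothetical extra row that is not in the original data, and pivot_table with a sum as one possible way to combine duplicates (the choice of aggregation is an assumption, not something the answer prescribes):

```python
import pandas as pd

data = {'Price': [110, 105, 119, 102, 111, 117, 110, 110],
        'Share': [110, -50, 22, 79, 29, -2, 130, 140],
        'Attribute': ['A', 'B', 'C', 'D', 'A', 'B', 'B', 'C']}
df = pd.DataFrame(data)

# hypothetical extra row that repeats the (Price=110, Attribute='A') pair
extra = pd.DataFrame({'Price': [110], 'Share': [5], 'Attribute': ['A']})
df_dup = pd.concat([df, extra], ignore_index=True)

# .pivot cannot reshape when an index/column pair is duplicated
try:
    df_dup.pivot(index='Price', columns='Attribute', values='Share')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (here: sum of shares)
dfp_sum = df_dup.pivot_table(index='Price', columns='Attribute',
                             values='Share', aggfunc='sum')
print(dfp_sum)
```

The result has the same shape as dfp above and can be passed to .plot(kind='barh') in the same way.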
seaborn.barplot
Note that the order of the y-axis values is reversed compared to the previous plot
ax = sns.barplot(data=df, x='Share', y='Price', hue='Attribute', orient='h')
ax.set(xlabel='Shares', title='Bar Chart of Colors')
ax.legend(title='Attribute', bbox_to_anchor=(1, 1), loc='upper left')
ax.grid(axis='x')
Consider my series as below: First column is article_id and the second column is frequency count.
article_id
1 39
2 49
3 187
4 159
5 158
...
16947 14
16948 7
16976 2
16977 1
16978 1
16980 1
Name: article_id, dtype: int64
I got this series from a dataframe with the following command:
logs.loc[logs['article_id'] <= 17029].groupby('article_id')['article_id'].count()
logs is the dataframe here and article_id is one of the columns in it.
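For reference, a small sketch of how such a series comes about, using a made-up logs dataframe (the values are hypothetical). It also shows value_counts as an alternative that should give the same counts once sorted by index:

```python
import pandas as pd

# hypothetical stand-in for the real logs dataframe
logs = pd.DataFrame({'article_id': [1, 1, 2, 3, 3, 3, 20000]})

subset = logs.loc[logs['article_id'] <= 17029]

# the command from the question: index is article_id, values are counts
by_groupby = subset.groupby('article_id')['article_id'].count()

# alternative: value_counts, sorted by article_id instead of by frequency
by_value_counts = subset['article_id'].value_counts().sort_index()

print(by_groupby)
```

In both cases the article_id lives in the index of the series, which is why .tolist() on the values alone loses it.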
How do I plot a bar chart (using Matplotlib) such that the article_id is on the x-axis and the frequency count is on the y-axis?
My natural instinct was to convert it into a list using .tolist() but that doesn't preserve the article_id.
IIUC you need Series.plot.bar:
#pandas 0.17.0 and above
s.plot.bar()
#pandas below 0.17.0
s.plot('bar')
Sample:
import pandas as pd
import matplotlib.pyplot as plt
s = pd.Series({16976: 2, 1: 39, 2: 49, 3: 187, 4: 159,
5: 158, 16947: 14, 16977: 1, 16948: 7, 16978: 1, 16980: 1},
name='article_id')
print (s)
1 39
2 49
3 187
4 159
5 158
16947 14
16948 7
16976 2
16977 1
16978 1
16980 1
Name: article_id, dtype: int64
s.plot.bar()
plt.show()
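Since the question mentions Matplotlib specifically, here is a sketch of the same chart drawn with Axes.bar directly. Converting the ids to strings is a choice made here so that the bars are evenly spaced; with numeric positions the ids 1 to 5 and 16947 to 16980 would sit far apart on the axis:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series({16976: 2, 1: 39, 2: 49, 3: 187, 4: 159,
               5: 158, 16947: 14, 16977: 1, 16948: 7, 16978: 1, 16980: 1},
              name='article_id').sort_index()

# the index holds the article ids, the values hold the counts
labels = [str(article_id) for article_id in s.index]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(labels, s.to_numpy())
ax.set(xlabel='article_id', ylabel='frequency count')
ax.tick_params(axis='x', rotation=90)

heights = [bar.get_height() for bar in ax.patches]
```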
The new pandas API suggests the following way:
import pandas as pd
s = pd.Series({16976: 2, 1: 39, 2: 49, 3: 187, 4: 159,
5: 158, 16947: 14, 16977: 1, 16948: 7, 16978: 1, 16980: 1},
name='article_id')
s.plot(kind="bar", figsize=(20,10))
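If you are working in Jupyter, you don't need to import matplotlib yourself or call plt.show(): pandas uses matplotlib internally (it still has to be installed) and the notebook renders the figure.
Just pass 'bar' as the kind parameter of plot.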
Example
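import pandas as pd
# `parser` is the poster's own date-parsing function (not shown); squeeze=True was removed in pandas 2.0, so squeeze the result instead
series = pd.read_csv('BwsCount.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser).squeeze('columns')
series.plot(kind='bar')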
The default value of kind is 'line' (i.e. series.plot() draws a line graph).
For your reference:
kind : str
‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
I'm trying to plot different duration entries on a graph, not sure if the best way would be to plot a bar chart and have the duration variable define the width?
The data looks like this:
Variable 1 Variable 2 Duration (s)
50 36 14
70 41 25
60 40 20
55 18 27
Thanks in advance to anyone who can help out here!
plt.step draws a step function of the accumulated time. An extra zero time point and repeating the first entry makes sure all the values are shown.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = [[50, 36, 14],
[70, 41, 25],
[60, 40, 20],
[55, 18, 27]]
df = pd.DataFrame(data=data, columns=['Variable 1', 'Variable 2', 'Duration'])
xs = df['Duration'].cumsum()
for col in df.columns[:-1]:
    plt.step(np.append(0, xs), np.append(df[col][0], df[col]), where='pre')
plt.xticks(xs, df['Duration'])
plt.yticks(df.iloc[0, :-1], df.columns[:-1])
plt.tight_layout()
plt.show()
You can use cumsum to compute the cumulative duration, then plot with step:
(pd.concat([df, df.iloc[[-1]]])  # DataFrame.append was removed in pandas 2.0
.assign(TimeDuration=lambda x: x['Duration (s)'].shift(fill_value=0).cumsum())
.plot(x="TimeDuration", y=['Variable 1', 'Variable 2'],drawstyle='steps-post')
)
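The question also floats the idea of a bar chart whose bar widths encode the duration. A minimal sketch of that variant, assuming each entry should start where the previous one ends (the transparency and the overlay of both variables on one Axes are choices made here):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Variable 1': [50, 70, 60, 55],
                   'Variable 2': [36, 41, 40, 18],
                   'Duration (s)': [14, 25, 20, 27]})

durations = df['Duration (s)'].to_numpy()
# left edge of each bar = total duration of all previous entries
starts = df['Duration (s)'].cumsum().shift(fill_value=0).to_numpy()

fig, ax = plt.subplots()
for col in ['Variable 1', 'Variable 2']:
    # align='edge' makes x the left edge, so width spans exactly one duration
    ax.bar(starts, df[col].to_numpy(), width=durations,
           align='edge', alpha=0.5, edgecolor='black', label=col)
ax.set(xlabel='elapsed time (s)', ylabel='value')
ax.legend()
```

This carries the same information as the step plots above; the bars just make the length of each interval easier to see.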
I have to generate a series of scatter plots (roughly 100 in total).
I have created an example to illustrate the problem.
First do an import.
import pandas as pd
Create a pandas dataframe.
# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
'report_value': [4, 24, 31, 2, 3, 5, 10],
'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)
print(df)
Output:
coverage_id name report_value
0 m1 Jason 4
1 m2 Jason 24
2 m3 Tina 31
3 m4 Tina 2
4 m5 Tina 3
5 m6 Jason 5
6 m7 Tina 10
The goal is to generate two scatter plots without using a for-loop. The name of the person, Jason or Tina, should be displayed in the title. The report_value should be on the y-axis in both plots and the coverage_id (which is a string) on the x-axis.
I thought I should start with:
df.groupby('name')
Then I need to apply the operation to every group.
This way I have the dataframe grouped by their names. I don't know how to proceed and get Python to make the two plots for me.
Thanks a lot for any help.
I think you can use this solution, but first it is necessary to convert the string column to numeric codes, then plot, and finally set the x-tick labels:
import matplotlib.pyplot as plt
import numpy as np
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
groups = df.groupby('name')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.coverage_id,
            group.report_value,
            marker='o',
            linestyle='',
            ms=12,
            label=name)
ax.set(xticks=range(len(u)), xticklabels=u)  # one tick per unique coverage_id
ax.legend()
plt.show()
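The code above puts both people on one Axes. If one plot per person with the name in the title is wanted, as the question describes, a sketch along the following lines may help. It still loops, but only once per group rather than once per point, and the side-by-side subplot layout is an assumption made here:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)

groups = df.groupby('name')

# one subplot per name; squeeze=False always returns a 2-D array of Axes
fig, axes = plt.subplots(1, groups.ngroups, sharey=True, squeeze=False)
for ax, (name, group) in zip(axes[0], groups):
    # string x values are treated as categories by matplotlib
    ax.scatter(group['coverage_id'].tolist(), group['report_value'].to_numpy())
    ax.set_title(name)
    ax.set_xlabel('coverage_id')
axes[0][0].set_ylabel('report_value')

titles = [ax.get_title() for ax in axes[0]]
```

For roughly 100 groups, a grid of subplots or one saved figure per group would probably be more practical than a single row.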
Another solution, this time with seaborn.pairplot:
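import numpy as np
import seaborn as sns
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
# the `size` parameter was renamed to `height` in seaborn 0.9
g = sns.pairplot(x_vars=["coverage_id"], y_vars=["report_value"], data=df, hue="name", height=5)
# set the tick positions explicitly so each label lines up with its integer code
g.set(xticks=range(len(u)), xticklabels=u, xlim=(0, None))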
I am trying to plot three lines on the same plot in Matplotlib. They are InvoicesThisYear, DisputesThisYear, and PercentThisYear (Which is Disputes/Invoices)
The original input is two columns of dates -- one for the date of a logged dispute and one for the date of a logged invoice.
I use the dates to count up the number of disputes and invoices per month during a certain year.
Then I try to graph it, but it comes up empty. I started with just trying to print PercentThisYear and InvoicesThisYear.
PercentThisYear = (DisputesFYThisYear/InvoicesFYThisYear).fillna(0.0)
#Percent_ThisYear.plot(kind = 'line')
#InvoicesFYThisYear.plot(kind = 'line')
plt.plot(PercentThisYear)
plt.xlabel('Date')
plt.ylabel('Percent')
plt.title('Customer Disputes')
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
ax2 = ax.twinx()
ax2.plot(InvoicesFYThisYear)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
#ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
datenow = datetime.datetime.now()
dstart = datetime.datetime(2015,4,1)
print(datenow)
#plt.ylim(0, .14)
plt.xlim(dstart, datenow)
firsts=[]
for i in range(dstart.month, datenow.month+1):
    firsts.append(datetime.datetime(2015,i,1))
plt.xticks(firsts)
plt.show()
This is the output... The dates are all messed up and nothing is plotted, but the scales on the axes look right. What am I doing wrong?
Here is the set up leading up to the graph if that is helpful
The Input looks like this:
InvoicesThisYear
Out[82]:
7 7529
5 5511
6 4934
8 3552
dtype: int64
DisputesThisYear
Out[83]:
2 211
1 98
7 54
4 43
3 32
6 29
5 21
8 8
dtype: int64
PercentThisYear
Out[84]:
1 0.000000
2 0.000000
3 0.000000
4 0.000000
5 0.003810
6 0.005877
7 0.007172
8 0.002252
dtype: float64
Matplotlib has no way of knowing which dates are associated with which data points. When you call plot with only one argument y, Matplotlib automatically assumes that the x-values are range(len(y)). You need to supply the dates as the first argument to plot. Assuming that InvoicesThisYear is a count of the number of invoices each month, starting at 1 and ending at 8, you could do something like
import datetime
import matplotlib.pyplot as plt
import pandas as pd
InvoicesFYThisYear = pd.DataFrame([0, 0, 0, 0, 5511, 4934, 7529, 3552])
Disputes = pd.DataFrame([98, 211, 32, 43, 21, 29, 54, 8])
PercentThisYear = (Disputes / InvoicesFYThisYear)
datenow = datetime.date.today()
ax = plt.subplot(111)
dates = [datetime.date(2015, i, 1) for i in range(1, 9)]  # range replaces Python 2's xrange
plt.plot(dates, PercentThisYear)
ax2 = ax.twinx()
ax2.plot(dates, InvoicesFYThisYear)
dstart = datetime.datetime(2015,4,1)
plt.xlim(dstart, datenow)
plt.xticks(dates, dates)
plt.show()
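A Python 3 sketch of the same idea that starts from the month-indexed counts shown in the question instead of hand-built lists. It assumes the index values are month numbers of 2015 and that months without invoices should count as 0 percent, as the fillna(0.0) in the question suggests; the variable names are chosen here for illustration:

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

invoices = pd.Series({7: 7529, 5: 5511, 6: 4934, 8: 3552}).sort_index()
disputes = pd.Series({2: 211, 1: 98, 7: 54, 4: 43,
                      3: 32, 6: 29, 5: 21, 8: 8}).sort_index()

# division aligns on the month index; months without invoices become NaN, then 0
percent = (disputes / invoices).fillna(0.0).sort_index()

# explicit x values: the first day of each month
percent_dates = [datetime.date(2015, int(m), 1) for m in percent.index]
invoice_dates = [datetime.date(2015, int(m), 1) for m in invoices.index]

fig, ax = plt.subplots()
ax.plot(percent_dates, percent.to_numpy(), color='tab:blue')
ax.set_ylabel('Percent')
ax2 = ax.twinx()  # second y-axis sharing the same date axis
ax2.plot(invoice_dates, invoices.to_numpy(), color='tab:orange')
ax2.set_ylabel('Invoices')
fig.autofmt_xdate()
```

Because both lines receive real dates as x values, the x-limits and ticks can then be set with date or datetime objects.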
If your data is in a Pandas series and the index is an integer representing the month, all you have to do is change the index to datetime objects instead. The plot method for pandas.Series will handle things automatically from there. Here's how you might do that:
Invoices = pd.Series((211, 98, 54, 43, 32, 29, 21, 8), index = (2, 1, 7, 4, 3, 6, 5, 8))
dates = [datetime.date(2015, month, 1) for month in Invoices.index]
Invoices.index = dates
Invoices.plot()
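A variation on the snippet above, sketched with pd.to_datetime so that the index becomes a DatetimeIndex, and sorted so the points are in chronological order. The year 2015 and the invoice counts are taken from the question; building the dates from formatted strings is just one of several possible ways:

```python
import pandas as pd

invoices = pd.Series([7529, 5511, 4934, 3552], index=[7, 5, 6, 8])

# turn month numbers into the first day of that month in 2015
invoices.index = pd.to_datetime([f"2015-{int(month):02d}-01"
                                 for month in invoices.index])

# sort so that a line plot runs from May to August
invoices = invoices.sort_index()
print(invoices)
```

Calling invoices.plot() on the result then lets pandas format the date axis.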