I've been struggling to plot the result of a GroupBy on three columns.
I have data on the different absences (AbsenceType) of employees (Employee) over 3 years (MonthYear). I would like to show, in a single plot, how many absences of a particular type each employee had in each month-year. The example has only two employees, but the real data contains more employees and more month-year values.
Create data
import pandas as pd
import matplotlib.pyplot as plt
data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}
columns = ['Employee','MonthYear','AbsenceType']
df = pd.DataFrame(data, columns=columns)
Then I map each of the codes of the AbsenceType into two categories: Sick or Injury.
df['SickOrInjury'] = df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
What I want to achieve is the following groupby:
test = df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count()
But, when I try to plot it, it does not fully show what I want. So far I managed to get to the stage:
df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count().unstack('SickOrInjury', fill_value=0).plot()
plt.show()
test plot
However, the employee IDs are shown on the X axis and not in the legend.
What I want to have is something like this:
desired plot
I would like to have time on the X axis and the count for each absence type (sick or injury) on the Y axis. There should be two different types of lines (e.g. solid and dashed) for each absence type and different colors for each employee (e.g. black and red).
I think unstacking is the right approach to fill missing values but you should probably convert MonthYear to date and resample by month. You can then plot your dataframe using seaborn.lineplot:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}
columns = ['Employee','MonthYear','AbsenceType']
df = pd.DataFrame(data, columns=columns)
df['SickOrInjury'] = df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
df['MonthYear'] = pd.to_datetime(df['MonthYear'], format="%Y%m")
df = df.groupby(['MonthYear', 'Employee', 'SickOrInjury']).count()
# renaming the aggregated (and unique) column
df = df.rename(columns={'AbsenceType': 'EmpAbsCount'})
df = df.unstack(['Employee', 'SickOrInjury'], fill_value=0)
# resampling for monthly values:
df = df.resample('M').sum().stack(['Employee', 'SickOrInjury'])
sns.lineplot(x='MonthYear', y='EmpAbsCount', data=df, hue='Employee', style='SickOrInjury', markers=True, ci=None)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
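For comparison, here is a rough sketch of the same idea using only pandas and matplotlib, with one color per employee and one line style per absence type as the question asks. The color and style mappings are my own assumptions, and reindexing onto a month-start date range is swapped in for the resample step above.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Employee': ['ID1', 'ID1', 'ID1', 'ID1', 'ID1', 'ID1', 'ID1', 'ID1', 'ID1',
                     'ID2', 'ID2', 'ID2', 'ID2', 'ID2', 'ID2'],
        'MonthYear': ['201708', '201601', '201601', '201708', '201710', '201801', '201801',
                      '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
        'AbsenceType': ['0210', '0210', '0250', '0215', '0217', '0260', '0210', '0210', '0210',
                        '0260', '0250', '0215', '0217', '0215', '0250']}
df = pd.DataFrame(data)
df['SickOrInjury'] = df['AbsenceType'].replace(
    {'0210': 'Sick', '0215': 'Sick', '0217': 'Sick', '0250': 'Injury', '0260': 'Injury'})
df['MonthYear'] = pd.to_datetime(df['MonthYear'], format='%Y%m')

# Count rows per (month, employee, category); absent combinations become 0.
counts = (df.groupby(['MonthYear', 'Employee', 'SickOrInjury'])
            .size()
            .unstack(['Employee', 'SickOrInjury'], fill_value=0))

# Put the counts on a continuous month-start index so empty months show as 0.
months = pd.date_range(counts.index.min(), counts.index.max(), freq='MS')
counts = counts.reindex(months, fill_value=0)

# Assumed mappings: adjust to the employees and categories in the real data.
colors = {'ID1': 'black', 'ID2': 'red'}
styles = {'Sick': '-', 'Injury': '--'}

fig, ax = plt.subplots(figsize=(10, 4))
for employee, kind in counts.columns:
    ax.plot(counts.index, counts[(employee, kind)],
            color=colors[employee], linestyle=styles[kind],
            label=f'{employee} {kind}')
ax.set_xlabel('MonthYear')
ax.set_ylabel('Absence count')
ax.legend()
fig.autofmt_xdate()
# Call plt.show() in an interactive session to display the figure.
```

With more employees, the colors dictionary would need one entry per employee, for example built from a matplotlib colormap.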
I have a dataframe with a participants column and Stress, Depression, and Anxiety columns.
I need a plot for each of the columns Stress, Depression, and Anxiety that shows each participant's data in each category.
I wrote this Python code:
ax = data_full.plot(x="participants", y=["Stress","Depression","Anxiety"],kind="line", lw=3, ls='--', figsize = (12,6))
plt.grid(True)
plt.show()
and get output like this:
Split the participant column and merge it with the original data frame. Change the data frame to a data frame with only the columns you need in the merged data frame. Transform the data frame in its final form by pivoting. The resulting data frame is then used as the basis for the graph. Now we can adjust the x-axis tick marks, the legend position, and the y-axis limits.
dfs = pd.concat([df,df['participants'].str.split('_', expand=True)],axis=1)
dfs.columns = ['Stress', 'Depression', 'Anxiety', 'participants', 'category', 'group']
fin_df = dfs[['category','group','Stress']]
fin_df = fin_df.pivot(index='category', columns='group', values='Stress')
# sort descending so that 'pre' comes before 'post'
fin_df = fin_df.sort_index(ascending=False)
g = fin_df.plot(kind='line', title='Stress')
g.set_xticks([0,1])
g.set_xticklabels(['pre','post'])
g.legend(loc='center right')
g.set_ylim(5,25)
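Since the question asks for the same plot for Stress, Depression, and Anxiety, the steps above can be wrapped in a loop. This is only a sketch: the original dataframe is not shown, so the participants format ('pre_1', 'post_1', ...) and every value below are assumptions made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data standing in for the dataframe from the question.
df = pd.DataFrame({
    'participants': ['pre_1', 'pre_2', 'post_1', 'post_2'],
    'Stress': [20, 15, 12, 10],
    'Depression': [18, 14, 11, 9],
    'Anxiety': [16, 13, 10, 8],
})

# Split e.g. 'pre_1' into category 'pre' and group '1'.
df[['category', 'group']] = df['participants'].str.split('_', expand=True)

measures = ['Stress', 'Depression', 'Anxiety']
fig, axes = plt.subplots(1, len(measures), figsize=(12, 4), sharey=True)
pivots = {}
for ax, measure in zip(axes, measures):
    # One row per category, one column (line) per participant group.
    wide = (df.pivot(index='category', columns='group', values=measure)
              .reindex(['pre', 'post']))
    pivots[measure] = wide
    wide.plot(ax=ax, marker='o', title=measure)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(['pre', 'post'])
fig.tight_layout()
# Call plt.show() in an interactive session to display the figure.
```

Using reindex(['pre', 'post']) instead of a descending sort makes the intended order explicit, which helps if the category labels ever change.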
I am running into an issue where I can't get my bar chart to show up in descending order of a column grouped by region.
I have tried to order the values and then group by and plot on a bar chart.
df1 = df.drop(['Total Volume', '4046', '4225', '4770', 'Total Bags', 'Small Bags', 'Large Bags', 'XLarge Bags', 'year', 'Unnamed: 0', 'Date'], axis=1)
df1 = df1.sort_values(['AveragePrice'],ascending=True).groupby('region').mean().plot(kind='bar', figsize=(15,5))
The graph still plots the values out in alphabetical order by region.
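groupby sorts its result by the group key, so any sorting done before grouping is lost. Group the values first, then sort, and change ascending=True to ascending=False: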
df1 = df1.groupby('region').mean().sort_values(['AveragePrice'],ascending=False).plot(kind='bar', figsize=(15,5))
Also, that code will overwrite df1 as a Matplotlib subplot instead of updating the dataframe. Further calls to df1 will just output the type (matplotlib.axes._subplots.AxesSubplot) instead of displaying the dataframe.
To update df1 with the grouped and sorted dataframe you should first manipulate the dataframe and save it, then call plot on the updated dataframe, as shown below:
# Manipulate the dataframe
df1 = df1.groupby('region').mean().sort_values(['AveragePrice'],ascending=False)
# Plot the results
df1.plot(kind='bar', figsize=(15,5))
This way, further calls to df1 will display the grouped and sorted dataframe as expected.
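To see the difference between the two orderings without the full dataset, here is a small sketch. The region names and prices are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'region': ['Albany', 'Boston', 'Albany', 'Chicago', 'Boston', 'Chicago'],
    'AveragePrice': [1.0, 2.0, 1.5, 3.0, 2.5, 3.5],
})

# Sorting first has no lasting effect: groupby orders the result by region.
sorted_first = (df.sort_values('AveragePrice', ascending=False)
                  .groupby('region').mean())

# Sorting after the aggregation orders the regions by their mean price.
grouped_first = (df.groupby('region').mean()
                   .sort_values('AveragePrice', ascending=False))

print(list(sorted_first.index))   # ['Albany', 'Boston', 'Chicago']
print(list(grouped_first.index))  # ['Chicago', 'Boston', 'Albany']
```

Calling grouped_first.plot(kind='bar', figsize=(15,5)) then draws the bars in that descending order.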
The original dataset consists of four data sets named df1, df2, df3, and df4 (all pandas DataFrames).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv("./df1.csv")
df2 = pd.read_csv("./df2.csv")
df3 = pd.read_csv("./df3.csv")
df4 = pd.read_csv("./df4.csv")
# Collect the data in a list
dataset = [df1, df2, df3, df4]
# Plotting
fig = plt.figure()
bpl = plt.boxplot(dataset, positions=np.array(range(len(dataset)))*2.0-0.4,
                  sym='+', widths=0.5, patch_artist=True)
plt.show()
But the box for the first data set, df1, was missing. I checked df1 and found nothing abnormal.
I have uploaded the four data sets here in .csv format.
Any advice would be appreciated!
Update
I could make the plot without any problem.
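The cause of the missing box is not established here. One possibility, which is purely an assumption because the CSV contents are not shown, is that the first data set contains NaN values: matplotlib's boxplot draws an empty box for a sample that contains NaN. The sketch below uses made-up arrays in place of the four files and drops NaN values before plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the four CSV files; the first one gets a NaN.
rng = np.random.default_rng(0)
samples = [rng.normal(loc=i, size=50) for i in range(4)]
samples[0][3] = np.nan

# Remove NaN values from each sample so every box can be computed.
clean = [s[~np.isnan(s)] for s in samples]

# Same spacing as in the question: 2.0 apart, shifted left by 0.4.
positions = np.arange(len(clean)) * 2.0 - 0.4

fig, ax = plt.subplots()
bp = ax.boxplot(clean, positions=positions, sym='+', widths=0.5,
                patch_artist=True)
# Call plt.show() in an interactive session to display the figure.
```

With DataFrames read from CSV, something like df1.isna().sum() would show whether this applies to the real data.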
I would like to plot 12 graphs (one graph per month) including columns 'A' and 'B' on the left y axis and column 'C' on the right.
Code below plots everything on the left side.
import numpy as np
import pandas as pd
index=pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
df=pd.DataFrame(np.random.rand(len(index),3),columns=['A','B','C'],index=index)
df2 = df.groupby(lambda x: x.month)
for key, group in df2:
    group.plot()
How can I separate the columns and use something like this: group.plot({'A','B':style='g'},{'C':secondary_y=True})?
You can capture the axes which the Pandas plot() command returns and use it again to plot C specifically on the right axis.
import numpy as np
import pandas as pd
index=pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
df=pd.DataFrame(np.random.randn(len(index),3).cumsum(axis=0),columns=['A','B','C'],index=index)
df2 = df.groupby(lambda x: x.month)
for key, group in df2:
    # plot A and B on the left axis and keep the returned axes
    ax = group[['A', 'B']].plot()
    # reuse those axes to put C on the right axis
    group[['C']].plot(secondary_y=True, ax=ax)
To get all lines in a single legend see:
Legend only shows one label when plotting with pandas
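As a rough sketch of the single-legend idea, the version below uses matplotlib's twinx() directly instead of pandas' secondary_y, which is a swapped-in technique and not the code from the answer above. It collects the legend handles from both axes and passes them to one legend call. The random seed and the 'C (right)' label are my own choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(index), 3)).cumsum(axis=0),
                  columns=['A', 'B', 'C'], index=index)

axes_by_month = {}
for month, group in df.groupby(df.index.month):
    fig, ax = plt.subplots()
    ax.plot(group.index, group['A'], label='A')
    ax.plot(group.index, group['B'], label='B')
    # Second y axis on the right that shares the x axis with ax.
    right = ax.twinx()
    right.plot(group.index, group['C'], color='g', label='C (right)')
    # Merge the handles and labels of both axes into a single legend.
    handles_l, labels_l = ax.get_legend_handles_labels()
    handles_r, labels_r = right.get_legend_handles_labels()
    ax.legend(handles_l + handles_r, labels_l + labels_r, loc='upper left')
    ax.set_title(f'Month {month}')
    fig.autofmt_xdate()
    axes_by_month[month] = (ax, right)
# Call plt.show() in an interactive session to display the 12 figures.
```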