How to plot two DataFrames with Matplotlib? - python

I have two DataFrames, north and south, with the same rows and columns. I would like to plot the speed columns of both DataFrames in one figure as a bar chart. I am trying this:
ax = south['speed'].plot(kind='bar', color='gray')
north['speed'].plot(kind = 'bar', color='red', ax=ax)
plt.show()
But it plots only the last DataFrame, i.e. only the north DataFrame. Can you help me?

1) If you would like to plot just the 'speed' column, concatenate the DataFrames row-wise:
df = pd.concat([north, south])
or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0):
df = north.append(south)
2) If you would like to compare the 'speed' columns of both DataFrames, join the DataFrames along axis=1:
df = pd.concat([north, south], axis=1, ignore_index=True)
Note that ignore_index=True replaces the column names with the integers 0..n-1. Then call the plot method of df.
For more info: https://pandas.pydata.org/pandas-docs/stable/merging.html
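The second option can be sketched as follows. This is a minimal example, not the answerer's exact code: the north and south values are made up, and it selects only the 'speed' columns and names them with keys= (instead of ignore_index=True) so the legend stays readable. Calling plot once on the combined frame gives side-by-side bars, whereas two separate bar plots on the same axes are drawn at the same positions, so the second can cover the first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the question's DataFrames (same rows and columns)
north = pd.DataFrame({"speed": [10, 12, 9]}, index=["a", "b", "c"])
south = pd.DataFrame({"speed": [8, 11, 13]}, index=["a", "b", "c"])

# Put the two speed columns side by side; keys= names the resulting columns
speeds = pd.concat([north["speed"], south["speed"]], axis=1, keys=["north", "south"])

# One plot call on the combined frame draws grouped (side-by-side) bars
ax = speeds.plot(kind="bar", color=["red", "gray"])
ax.set_ylabel("speed")
plt.tight_layout()
# call plt.show() here when using an interactive backend
```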

Related

pandas GroupBy plotting two lines for each group on one plot

I've been struggling to plot the results of the GroupBy on three columns.
I have data on the different absences (AbsenceType) of employees (Employee) over 3 years (MonthYear). I would like to show in one plot how many absences of a particular type an employee had in each month-year. The example has only two employees, but the real data has more, as well as more month-year values.
Create the data:
data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}
columns = ['Employee','MonthYear','AbsenceType']
df = pd.DataFrame(data, columns=columns)
Then I map each of the codes of the AbsenceType into two categories: Sick or Injury.
df['SickOrInjury'] =df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
What I want to achieve is the following groupby:
test = df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count()
But, when I try to plot it, it does not fully show what I want. So far I managed to get to the stage:
df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count().unstack('SickOrInjury', fill_value=0).plot()
plt.show()
(image: test plot)
However, the employee IDs are shown on the X axis and not in the legend.
What I want to have is something like this:
(image: desired plot)
I would like to have time on the X axis and the count for each absence type (sick or injury) on the Y axis. There should be two different types of lines (e.g. solid and dashed) for each absence type and different colors for each employee (e.g. black and red).
I think unstacking is the right approach to fill missing values, but you should probably convert MonthYear to dates and resample by month. You can then plot your DataFrame using seaborn.lineplot:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}
columns = ['Employee','MonthYear','AbsenceType']
df = pd.DataFrame(data, columns=columns)
df['SickOrInjury'] = df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
df['MonthYear'] = pd.to_datetime(df['MonthYear'], format="%Y%m")
df = df.groupby(['MonthYear', 'Employee', 'SickOrInjury']).count()
# renaming the aggregated (and unique) column
df = df.rename(columns={'AbsenceType': 'EmpAbsCount'})
df = df.unstack(['Employee', 'SickOrInjury'], fill_value=0)
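# resampling for monthly values ('M' is deprecated in pandas 2.2+; use 'ME' there):
df = df.resample('M').sum().stack(['Employee', 'SickOrInjury'])
# ci=None disables the error bands (seaborn 0.12+ prefers errorbar=None)
sns.lineplot(x='MonthYear', y='EmpAbsCount', data=df, hue='Employee', style='SickOrInjury', markers=True, ci=None)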
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
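If seaborn is not available, the same idea (one color per employee, one line style per absence type) can be sketched with plain matplotlib. This is only an illustration under assumptions: the sample is a shortened, made-up version of the question's data, the colors and styles dictionaries are hypothetical choices, and the months are not resampled, so the lines connect only the months that actually occur.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Shortened, hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "Employee": ["ID1", "ID1", "ID1", "ID2", "ID2", "ID2"],
    "MonthYear": ["201601", "201601", "201708", "201705", "201705", "201810"],
    "AbsenceType": ["0210", "0250", "0215", "0260", "0215", "0217"],
})
kind_by_code = {"0210": "Sick", "0215": "Sick", "0217": "Sick",
                "0250": "Injury", "0260": "Injury"}
df["SickOrInjury"] = df["AbsenceType"].map(kind_by_code)
df["MonthYear"] = pd.to_datetime(df["MonthYear"], format="%Y%m")

# Rows: month; columns: (Employee, SickOrInjury); values: number of absences
counts = (
    df.groupby(["MonthYear", "Employee", "SickOrInjury"])
    .size()
    .unstack(["Employee", "SickOrInjury"], fill_value=0)
    .sort_index()
)

colors = {"ID1": "black", "ID2": "red"}   # one color per employee (assumed)
styles = {"Sick": "-", "Injury": "--"}    # one line style per absence type (assumed)

fig, ax = plt.subplots()
for employee, kind in counts.columns:
    ax.plot(counts.index, counts[(employee, kind)],
            color=colors[employee], linestyle=styles[kind],
            marker="o", label=f"{employee} {kind}")
ax.set_xlabel("MonthYear")
ax.set_ylabel("absences")
ax.legend()
fig.autofmt_xdate()
```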

How to do line plot for each column separately based on another column for each two samples for pandas dataframe?

I have a DataFrame with the columns Stress, Depression, Anxiety and participants.
I need a line plot for each of the columns Stress, Depression and Anxiety, showing each participant's data in each category.
I wrote this Python code:
ax = data_full.plot(x="participants", y=["Stress","Depression","Anxiety"],kind="line", lw=3, ls='--', figsize = (12,6))
plt.grid(True)
plt.show()
but the output puts all three columns into a single plot.
Split the participants column on '_' and concatenate the pieces with the original DataFrame. Keep only the columns you need, then pivot so that each group becomes its own column. The pivoted DataFrame is the basis for the graph. Finally, adjust the x-axis tick marks, the legend position, and the y-axis limits.
dfs = pd.concat([df,df['participants'].str.split('_', expand=True)],axis=1)
dfs.columns = ['Stress', 'Depression', 'Anxiety', 'participants', 'category', 'group']
fin_df = dfs[['category','group','Stress']]
fin_df = fin_df.pivot(index='category', columns='group', values='Stress')
# sort descending so that 'pre' comes before 'post'
fin_df = fin_df.sort_index(ascending=False)
g = fin_df.plot(kind='line', title='Stress')
g.set_xticks([0,1])
g.set_xticklabels(['pre','post'])
g.legend(loc='center right')
g.set_ylim(5,25)
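The snippet above handles Stress only. To get one panel per column, the same pivot can be repeated in a loop. The following is a rough sketch under assumptions: the sample values are invented, and the participants strings are assumed to look like "<category>_<group>" (for example "pre_A"), which the question does not show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; participants values are assumed to be "<category>_<group>"
df = pd.DataFrame({
    "Stress": [20, 12, 18, 10],
    "Depression": [15, 9, 14, 11],
    "Anxiety": [22, 16, 19, 13],
    "participants": ["pre_A", "post_A", "pre_B", "post_B"],
})

# Split "pre_A" into category="pre" and group="A"
parts = df["participants"].str.split("_", expand=True)
parts.columns = ["category", "group"]
dfs = pd.concat([df, parts], axis=1)

measures = ["Stress", "Depression", "Anxiety"]
fig, axes = plt.subplots(1, len(measures), figsize=(12, 4), sharey=True)
for ax, measure in zip(axes, measures):
    # One column per group, one row per category
    wide = dfs.pivot(index="category", columns="group", values=measure)
    wide = wide.reindex(["pre", "post"])  # explicit row order instead of sorting
    wide.plot(kind="line", marker="o", title=measure, ax=ax)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["pre", "post"])
fig.tight_layout()
```

Using reindex with an explicit list keeps the pre/post order independent of alphabetical sorting.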

Sorting bar chart values that have been grouped in descending order

I am running into an issue where I can't get my bar chart to show up in descending order of a column grouped by region.
I have tried to order the values and then group by and plot on a bar chart.
df1 = df.drop(['Total Volume', '4046', '4225', '4770', 'Total Bags', 'Small Bags', 'Large Bags', 'XLarge Bags', 'year', 'Unnamed: 0', 'Date'], axis=1)
df1 = df1.sort_values(['AveragePrice'],ascending=True).groupby('region').mean().plot(kind='bar', figsize=(15,5))
The graph still plots the values out in alphabetical order by region.
Group the values first, then sort, and change ascending=True to False:
df1 = df1.groupby('region').mean().sort_values(['AveragePrice'],ascending=False).plot(kind='bar', figsize=(15,5))
Also, that code overwrites df1 with the Matplotlib Axes object returned by plot instead of updating the DataFrame. Further references to df1 will just show that type (matplotlib.axes._subplots.AxesSubplot) instead of the DataFrame.
To update df1 with the grouped and sorted DataFrame, first manipulate the DataFrame and save it, then call plot on the updated DataFrame:
# Manipulate the dataframe
df1 = df1.groupby('region').mean().sort_values(['AveragePrice'],ascending=False)
# Plot the results
df1.plot(kind='bar', figsize=(15,5))
This way, further calls to df1 will display the grouped and sorted dataframe as expected.
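A small self-contained sketch of the group-then-sort order, using made-up data (the region names and prices below are hypothetical, and only the AveragePrice column is aggregated):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with the two columns that matter here
df = pd.DataFrame({
    "region": ["West", "East", "West", "South", "East", "South"],
    "AveragePrice": [1.2, 1.8, 1.4, 0.9, 1.6, 1.1],
})

# Aggregate first, then sort the aggregated values in descending order
mean_price = (
    df.groupby("region")["AveragePrice"]
    .mean()
    .sort_values(ascending=False)
)

# Bars are drawn in the order of the sorted index
ax = mean_price.plot(kind="bar", figsize=(15, 5))
ax.set_ylabel("AveragePrice")
```

Sorting before groupby has no lasting effect, because groupby orders the result by group key by default.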

Missing data in Boxplot using matplotlib

The original dataset contains four tables named df1, df2, df3 and df4 (all pandas DataFrames).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv("./df1.csv")
df2 = pd.read_csv("./df2.csv")
df3 = pd.read_csv("./df3.csv")
df4 = pd.read_csv("./df4.csv")
# Collect the DataFrames in a list
dataset = [df1, df2, df3, df4]
# Plotting
fig = plt.figure()
# np.arange replaces the Python 2 xrange call
bpl = plt.boxplot(dataset, positions=np.arange(len(dataset))*2.0-0.4,
                  sym='+', widths=0.5, patch_artist=True)
plt.show()
But the first dataset, df1, is missing from the plot. I checked df1 and found nothing abnormal.
I have uploaded these four files in .csv format.
Any advice would be appreciated!
Update
I could make the plot without any problem.

Plot groupby data using secondary_y axis

I would like to plot 12 graphs (one graph per month) including columns 'A' and 'B' on the left y axis and column 'C' on the right.
Code below plots everything on the left side.
import numpy as np
import pandas as pd
index=pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
df=pd.DataFrame(np.random.rand(len(index),3),columns=['A','B','C'],index=index)
df2 = df.groupby(lambda x: x.month)
for key, group in df2:
    group.plot()
How can I separate the columns and use something like this: group.plot({'A','B':style='g'},{'C':secondary_y=True})?
You can capture the axes which the Pandas plot() command returns and use it again to plot C specifically on the right axis.
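import numpy as np
import pandas as pd
index=pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
df=pd.DataFrame(np.random.randn(len(index),3).cumsum(axis=0),columns=['A','B','C'],index=index)
df2 = df.groupby(lambda x: x.month)
for key, group in df2:
    # plot A and B on the left axis and keep the returned axes
    ax = group[['A', 'B']].plot()
    # reuse those axes to draw C against a secondary (right) y axis
    group[['C']].plot(secondary_y=True, ax=ax)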
To get all lines in a single legend see:
Legend only shows one label when plotting with pandas
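As an alternative sketch, pandas also accepts a list of column names for secondary_y, so a single plot call per month can put 'C' on the right axis; pandas should then mark it as "C (right)" in one combined legend. The daily index and the seeded random data below are assumptions chosen to keep the example small, and axes_by_month is a hypothetical helper dictionary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to open windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily data for one year (the question uses hourly data)
index = pd.date_range("2011-01-01", "2011-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((len(index), 3)).cumsum(axis=0),
    columns=["A", "B", "C"],
    index=index,
)

axes_by_month = {}
for month, group in df.groupby(df.index.month):
    # A and B use the left y axis; columns listed in secondary_y use the right one
    ax = group.plot(secondary_y=["C"], title=f"Month {month}")
    axes_by_month[month] = ax

# The right-hand axes of each figure is available as ax.right_ax
axes_by_month[1].right_ax.set_ylabel("C")
```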
