After a count operation in Pandas, I have the following dataframe:
Cancer No Yes
AgeGroups Factor
0-5 w-statin 108 0
wo-statin 6575 223
11-15 w-statin 5 1
wo-statin 3669 143
16-20 w-statin 28 1
wo-statin 6174 395
21-25 w-statin 80 2
wo-statin 8173 624
26-30 w-statin 110 2
wo-statin 9143 968
30-35 w-statin 171 5
wo-statin 9046 1225
35-40 w-statin 338 21
wo-statin 8883 1475
41-45 w-statin 782 65
wo-statin 11155 2533
I am having a problem with my barchart. With the code:
ax = counts.plot(kind='bar',stacked=True,colormap='Paired',rot = 45)
for p in ax.patches:
    ax.annotate(np.round(p.get_height(),decimals=0).astype(np.int64), (p.get_x()+p.get_width()/2., p.get_y()), ha='center', va='center', xytext=(2, 10), textcoords='offset points', fontsize=10)
yielded me:
My target is to achieve two different subplots with two different factors (w-statin/wo-statin) with agegroups as my x-axis. It should approximately look like this:
I would appreciate any help provided. Thank you so much.
by_factor = counts.groupby(level='Factor')
k = by_factor.ngroups
fig, axes = plt.subplots(1, k, sharex=True, sharey=False, figsize=(15, 8))
for i, (gname, grp) in enumerate(by_factor):
    grp.xs(gname, level='Factor').plot.bar(
        stacked=True, colormap='Paired', rot=45, ax=axes[i], title=gname)
fig.tight_layout()
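For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. It assumes the counts frame has a two-level index named AgeGroups and Factor (as in the question) and rebuilds only the first two age groups by hand. The Agg backend and the use of droplevel instead of xs are my own choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A small slice of the counts table from the question (first two age groups only).
index = pd.MultiIndex.from_product(
    [["0-5", "11-15"], ["w-statin", "wo-statin"]],
    names=["AgeGroups", "Factor"],
)
counts = pd.DataFrame(
    {"No": [108, 6575, 5, 3669], "Yes": [0, 223, 1, 143]}, index=index
)
counts.columns.name = "Cancer"

by_factor = counts.groupby(level="Factor")
fig, axes = plt.subplots(1, by_factor.ngroups, sharey=False, figsize=(15, 8))

for ax, (gname, grp) in zip(axes, by_factor):
    # Drop the Factor level so only AgeGroups remains on the x-axis.
    grp.droplevel("Factor").plot.bar(
        stacked=True, colormap="Paired", rot=45, ax=ax, title=gname
    )

fig.tight_layout()
```

Each subplot keeps its own y-axis (sharey=False), which seems sensible here because the w-statin counts are much smaller than the wo-statin ones.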
I'm still having trouble doing this.
Here is what my data looks like:
date positive negative neutral
0 2015-09 23 6 18
1 2016-04 709 288 704
2 2016-08 1478 692 1750
3 2016-09 1881 926 2234
4 2016-10 3196 1594 3956
In my csv file I don't have those 0-4 indexes, only the 4 columns from 'date' to 'neutral'.
I don't know how to fix my code to get it to look like this:
Seaborn code
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x=df['positive'], y=df['negative'], ax=ax)
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
ax.set_ylabel("Percentage")
plt.show()
To do this in seaborn you'll need to transform your data into long format. You can easily do this via melt:
plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
date sign percentage
0 2015-09 positive 23
1 2016-04 positive 709
2 2016-08 positive 1478
3 2016-09 positive 1881
4 2016-10 positive 3196
Then you can plot this long-format dataframe with seaborn in a straightforward manner:
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x="date", y="percentage", ax=ax, hue="sign", data=plotting_df)
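If it helps to see what melt actually does to the shape of the table, here is a small pandas-only sketch using the five rows from the question. The pivot step at the end is only my own sanity check that the long table can be turned back into the wide one.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2015-09", "2016-04", "2016-08", "2016-09", "2016-10"],
    "positive": [23, 709, 1478, 1881, 3196],
    "negative": [6, 288, 692, 926, 1594],
    "neutral": [18, 704, 1750, 2234, 3956],
})

plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")

# 5 dates x 3 value columns -> 15 rows, one per (date, sign) pair.
print(plotting_df.shape)
print(plotting_df["sign"].unique())

# Pivoting back recovers the wide table, so no information was lost.
wide_again = plotting_df.pivot(index="date", columns="sign", values="percentage")
print(wide_again)
```

The "sign" column is what seaborn uses for hue, which is why the long format is needed for grouped bars.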
Based on the data you posted
sns.set(style='darkgrid', context='talk', palette='Dark2')
# fig, ax = plt.subplots(figsize=(8, 8))
df.plot(x="date",y=["positive","neutral","negative"],kind="bar")
plt.xticks(rotation=-360)
# ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
# ax.set_ylabel("Percentage")
plt.show()
I would like to know how to plot a bar chart in which the upper and lower limits of the bins are given by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a chart like this:
then you can make it with sns.barplot, setting age_classes as x and one column (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax = ax,
            data = df,
            x = 'age_classes',
            y = 'total_cases')
plt.show()
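If you instead want the bars to really span from the lower to the upper limit of each bin, one possible approach (a sketch under my own assumptions, not something from the answer above) is to parse the "lo-hi" labels into numbers and draw with plain matplotlib. The regular expression and the column names lower and upper are mine. Labels such as ">90" and "Unknown" have no two bounds, so this sketch simply drops them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29", ">90", "Unknown"],
    "total_cases": [693, 931, 4530, 4356, 168],
})

# Pull out the two numbers of each "lo-hi" label; non-matching labels become NaN.
bounds = df["age_classes"].str.extract(r"^(\d+)-(\d+)$").apply(pd.to_numeric)
bounds.columns = ["lower", "upper"]
binned = df.join(bounds).dropna(subset=["lower", "upper"])

fig, ax = plt.subplots()
# align="edge" starts each bar at its lower bound; the width reaches the next bin.
ax.bar(
    binned["lower"],
    binned["total_cases"],
    width=binned["upper"] - binned["lower"] + 1,
    align="edge",
    edgecolor="black",
)
# Put a tick on every bin edge, including the right edge of the last bin.
ax.set_xticks(list(binned["lower"]) + [binned["upper"].iloc[-1] + 1])
```

The + 1 on the width is a design choice: it treats "0-9" as covering ages 0 up to (but not including) 10, so neighbouring bars touch.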
I've got a dual axis bar and line plot using matplotlib. I read the data in as a dataframe,
[WEEK SIGNUPS APPLICATIONS PRECOURSE_WORK QUALIFIED ENROLLED SPEND
2019-10-07 5674 2938 2220 106 2 77581.67
2019-10-14 4538 2225 2309 567 204 61258.08
2019-10-21 3865 1997 1801 121 39 53700.58
2019-10-28 3559 1886 1641 162 39 53543.28
2019-11-04 3782 1946 1980 190 109 49495.64
2019-11-11 4033 2035 1568 118 109 49952.17
2019-11-18 3999 2009 1537 83 77 58545.72
2019-11-25 6170 3322 1660 110 61 52332.4
2019-12-02 5189 2658 7041 73 30 56727.55
2019-12-09 4631 2497 7904 174 116 60977.49
2019-12-16 4935 2501 3492 108 82 68179.54
2019-12-23 5289 2603 1983 80 38 76956.81
2019-12-30 5843 3037 2150 90 80 76246.14
2020-01-06 4194 1930 1619 74 57 46114.68]
My code works and produces a graph (below)
Here is my code
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from pylab import rcParams
from matplotlib import style
style.use('seaborn-paper')
#print(plt.style.available)
rcParams['figure.figsize'] = 20, 10
#plt.xticks(df[['WEEK']])
ax = df[['SPEND']].plot(kind='bar', color = 'lightblue')
ax.set_ylabel("Spend",color="blue",fontsize=20)
ax.set_xlabel('Weeks',color="blue",fontsize=20)
ax2 = ax.twinx()
ax2.plot(df[['SIGNUPS','APPLICATIONS','ENROLLED']].values, linestyle='-', marker='o', linewidth=4.0)
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
When I uncomment the line plt.xticks(df[['WEEK']]) I get the following error
ConversionError Failed to convert value(s) to axis unit.
Can anyone help me out?
plt.xticks is expecting the tick locations to be specified and optionally the labels, from the docs the signature is
xticks(ticks, [labels], **kwargs)
So when you do
plt.xticks(df[['WEEK']])
It is trying to interpret the dates in the 'WEEK' column as the locations for the ticks. What you want to do instead is use ax.set_xticklabels (a method of the Axes object, not of plt), which expects only the labels to be specified, i.e.
ax.set_xticklabels(df['WEEK'])
# or
ax.set_xticklabels(df['WEEK'].values)
Although you may also need to manually convert the values to strings, depending on how they are defined.
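To make the locations-versus-labels distinction concrete, here is a small sketch with the first three weeks of the data. It assumes WEEK is an ordinary column (not the index) and that the bars are drawn at positions 0, 1, 2, ..., which is what a pandas bar plot does. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "WEEK": ["2019-10-07", "2019-10-14", "2019-10-21"],
    "SIGNUPS": [5674, 4538, 3865],
    "SPEND": [77581.67, 61258.08, 53700.58],
})

ax = df[["SPEND"]].plot(kind="bar", color="lightblue")
ax2 = ax.twinx()
ax2.plot(df[["SIGNUPS"]].values, linestyle="-", marker="o")

# Bars sit at positions 0, 1, 2, ...; keep those locations and only swap the labels.
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df["WEEK"].astype(str).tolist(), rotation=45)
```

Setting the tick locations explicitly before the labels keeps the number of labels and ticks in step.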
This is my data frame:
6month final-formula Question Text numPatients6month
286231 1 0.031730 CI_FINANCE 977
286270 1 0.147390 CI_MJO 977
286276 1 0.106448 CI_CONCENTRATING 977
286700 2 0.010323 CI_MJO 775
286323 2 0.018065 CI_FINANCE 775
286401 2 0.034839 CI_CONCENTRATING 775
286228 3 0.032020 CI_CONCENTRATING 812
286238 3 0.061576 CI_MJO 812
286292 3 0.008621 CI_FINANCE 812
286690 4 0.008097 CI_MJO 741
286342 4 0.005398 CI_FINANCE 741
286430 4 0.060729 CI_CONCENTRATING 741
286481 5 0.009840 CI_FINANCE 813
287441 5 0.008610 CI_MJO 813
286362 5 0.041820 CI_CONCENTRATING 813
286360 6 0.021622 CI_CONCENTRATING 740
286492 6 0.017568 CI_FINANCE 740
286494 6 0.014865 CI_MJO 740
286482 7 0.015464 CI_FINANCE 776
286483 7 0.042526 CI_MJO 776
286599 7 0.011598 CI_CONCENTRATING 776
286361 8 0.024490 CI_CONCENTRATING 735
286989 8 0.004082 CI_FINANCE 735
286402 8 0.021769 CI_MJO 735
287119 9 0.003916 CI_FINANCE 766
286408 9 0.011749 CI_MJO 766
286399 9 0.019582 CI_CONCENTRATING 766
286267 10 0.019337 CI_CONCENTRATING 724
286249 10 0.037293 CI_MJO 724
286810 10 0.008287 CI_FINANCE 724
I have plotted this data frame as a stacked bar chart.
This stacked bar chart is based on (6month, final-formula).
As you can see, there is a numPatients6month column in the data frame.
I would like to show this number on each category of each stacked bar.
for example:
this is my barchart:
So, according to the stacked bar chart above, I want to show 977 in the blue segment of the first bar, and also 977 in the orange segment, which is CI_FINANCE.
It is different from this question, as that one isn't about a stacked bar.
It is also different from this one, as I want to show another column (numPatients6month) from my data frame, not the column on the y-axis.
The y-axis is final-formula, but I would like to show numPatients6month on each color of each stacked bar.
For information, I plotted the above using this code:
df = dffinal.drop('numPatients6month', axis=1).groupby(['6month','Question Text']).sum().unstack('Question Text')
df.columns = df.columns.droplevel()
ax=df.plot(kind='bar', stacked=True)
import matplotlib.pyplot as plt
plt.xticks(range(0,10), ['6month','1 year','1.5 year','2 year','2.5 year','3 year','3.5 year','4 year','4.5 year','5 year'], fontsize=8, rotation=45)
plt.title('Cognitive Impairement-Stack bar')
plt.show()
Thanks, :)
here is one way to do it:
ax=df.plot(kind='bar', stacked=True)
#loop to add the text
list_values = (dffinal['numPatients6month'].tolist()[::3]
+ dffinal['numPatients6month'].tolist()[1::3]
+ dffinal['numPatients6month'].tolist()[2::3])
for rect, value in zip(ax.patches, list_values):
    h = rect.get_height() /2.
    w = rect.get_width() /2.
    x, y = rect.get_xy()
    ax.text(x+w, y+h,value,horizontalalignment='center',verticalalignment='center')
# same as your code
plt.xticks(range(0,10), ['6month','1 year','1.5 year','2 year','2.5 year','3 year','3.5 year','4 year','4.5 year','5 year'], fontsize=8, rotation=45)
plt.title('Cognitive Impairement-Stack bar')
plt.show()
list_values collects the values from the 'numPatients6month' column in the same order as the rects in ax.patches, and the result is:
but because some bars are small, the result is not really easy to read.
EDIT: about the loop: ax.patches contains information about all the bars you plot. For each bar, which I named rect, get_xy gives the position of the bottom-left corner of the bar, and get_height (resp. get_width) gives the height (resp. width) of the bar. Since h and w are already halved, (x+w, y+h) gives the coordinates of the middle of the bar, where the text value (from list_values) is added with ax.text (the parameters horizontalalignment and verticalalignment center the text).
EDIT 2: a more general method, thanks to #SpghttCd for getting list_values
list_values = (dffinal.drop('final-formula', axis=1).groupby(['6month','Question Text']).sum()
               .unstack('Question Text').fillna(0).astype(int).values.flatten('F'))
for rect, value in zip(ax.patches, list_values):
    if value != 0:
        h = rect.get_height() /2.
        w = rect.get_width() /2.
        x, y = rect.get_xy()
        ax.text(x+w, y+h,value,horizontalalignment='center',verticalalignment='center')
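As a side note, newer Matplotlib versions (3.4 and later) have Axes.bar_label, which can place the text in the middle of each segment without computing the coordinates by hand. The sketch below is a different technique from the loop above, not a rewrite of it. It uses only the first two periods of the data, and the pivot calls assume each (6month, Question Text) pair occurs exactly once.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

dffinal = pd.DataFrame({
    "6month": [1, 1, 1, 2, 2, 2],
    "final-formula": [0.031730, 0.147390, 0.106448, 0.010323, 0.018065, 0.034839],
    "Question Text": ["CI_FINANCE", "CI_MJO", "CI_CONCENTRATING",
                      "CI_MJO", "CI_FINANCE", "CI_CONCENTRATING"],
    "numPatients6month": [977, 977, 977, 775, 775, 775],
})

# Same row/column layout for the bar heights and for the labels.
heights = dffinal.pivot(index="6month", columns="Question Text", values="final-formula")
labels = dffinal.pivot(index="6month", columns="Question Text", values="numPatients6month")

ax = heights.plot(kind="bar", stacked=True)

# pandas draws one bar container per column, in column order, so the tables line up.
for container, column in zip(ax.containers, heights.columns):
    ax.bar_label(container, labels=labels[column].astype(str).tolist(), label_type="center")
```

Because both tables come from the same pivot, the labels stay matched to the segments even if the rows of dffinal are not sorted.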
You can calculate the x- and y-positions of the labels directly from your dataset:
x_lbl = dffinal['6month'].values - 1
y_lbl = (df.cumsum(axis=1) - df/2).values.flatten()
The arrangement of the labels can be done the same way you did for your data:
df_lbl = dffinal.drop('final-formula', axis=1).groupby(['6month','Question Text']).sum().unstack('Question Text')
lbl = df_lbl.values.flatten()
and then just loop over the lists of your x-, y- and label-arrays:
for x, y, txt in zip(x_lbl, y_lbl, lbl):
    plt.text(x, y, txt, va='center', ha='center')
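The y-position formula above may look a bit cryptic, so here is a tiny worked example with made-up numbers (two bars, three segments each; the values and the column names A, B, C are purely illustrative). The cumulative sum along the columns gives the top edge of each segment, and subtracting half the segment height gives its centre.

```python
import pandas as pd

# Two stacked bars (rows) with three segments each (columns); values are illustrative.
df = pd.DataFrame(
    {"A": [2.0, 1.0], "B": [4.0, 3.0], "C": [6.0, 5.0]},
    index=[1, 2],
)

tops = df.cumsum(axis=1)  # top edge of every segment within its bar
mids = tops - df / 2      # step back half a segment height to reach its centre

print(mids)
```

For the first bar the segments end at 2, 6 and 12, so the centres are 1, 4 and 9. Note that flatten() reads these values row by row, i.e. bar by bar, so the x positions and labels have to be in that same order.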
I currently have a dataframe that has as an index the years from 1990 to 2014 (25 rows). I want my plot to have the X axis with all the years showing. I'm using add_subplot as I plan to have 4 plots in this figure (all of them with the same X axis).
To create the dataframe:
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot = pop_plot.fillna(0)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
Total Population Urban Population
1990 150 50
1991 151 53
1992 152 56
1993 153 59
1994 154 62
1995 155 65
1996 156 68
1997 157 71
1998 158 74
1999 159 77
2000 160 80
2001 161 83
2002 162 86
2003 163 89
2004 164 92
2005 165 95
2006 166 98
2007 167 101
2008 168 104
2009 169 107
2010 170 110
2011 171 113
2012 172 116
2013 173 119
2014 174 122
The code that I currently have:
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1, xticklabels=pop_plot.index)
plt.subplot(2, 2, 1)
plt.plot(pop_plot)
legend = plt.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(range(len(pop_plot.index)))
This is the plot that I get:
When I comment out the set_xticks line I get the following plot:
#ax1.set_xticks(range(len(pop_plot.index)))
I've tried a couple of answers that I found here, but I didn't have much success.
It's not clear what ax1.set_xticks(range(len(pop_plot.index))) should be used for. It will set the ticks to the numbers 0,1,2,3 etc. while your plot should range from 1990 to 2014.
Instead, you want to set the ticks to the numbers of your data:
ax1.set_xticks(pop_plot.index)
Complete corrected example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1)
ax1.plot(pop_plot)
legend = ax1.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(pop_plot.index)
plt.show()
The easiest option is to use the xticks parameter for pandas.DataFrame.plot
Pass the dataframe index to xticks: xticks=pop_plot.index
# given the dataframe in the OP
ax = pop_plot.plot(xticks=pop_plot.index, figsize=(15, 5))
# move the legend
ax.legend(bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand', frameon=False)
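Since the question mentions four plots sharing the same x-axis, here is a rough sketch of how the same idea might carry over to a 2x2 grid. It assumes plt.subplots with sharex=True is acceptable instead of add_subplot, and it draws the same frame in every panel purely as a placeholder. The rotation of the tick labels is my own addition to keep 25 year labels readable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = np.arange(1990, 2015)
pop_plot = pd.DataFrame({
    "Total Population": np.arange(150, 175),
    "Urban Population": np.arange(50, 125, 3),
}, index=index)

fig, axes = plt.subplots(2, 2, sharex=True, figsize=(15, 8))
for ax in axes.flat:
    # Placeholder: the same frame is drawn in every panel.
    pop_plot.plot(ax=ax, xticks=pop_plot.index, legend=False)
    ax.tick_params(axis="x", labelrotation=90)

# One legend is enough because every panel shows the same two columns.
axes[0, 0].legend(frameon=False)
fig.tight_layout()
```

With sharex=True the tick locations are shared, so every panel ends up with one tick per year.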