Looping over Pandas DataFrame and Plotting Each Column - python

I'd like to create a function or for loop that will display a sorted horizontal bar chart for each column of a DataFrame. An example of the code for one column can be seen below. The plt.savefig() is a nice to have but not necessary.
frame['Population-current year'] is one of the columns from the DataFrame. I can't seem to formulate the necessary structure for the loop, syntax-wise.
DataFrame header:
Place name Population-current year
Santa Fe 82980.0
Nashville-Davidson 654187.0
Sandia Park 72.0
Stone Mountain 6209.0
Asheville 89318.0
Place name Pct change in population, 2010-current year \
Santa Fe 0.2277
Nashville-Davidson 0.1130
Sandia Park 0.0286
Stone Mountain 0.0362
Asheville 0.0896
Place name Median age of population Median age of housing \
Santa Fe 42.8 34.0
Nashville-Davidson 34.1 38.0
Sandia Park 60.2 24.0
Stone Mountain 38.8 38.0
Asheville 38.6 44.0
Thanks in Advance.
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

ax = (frame['Population-current year']/1000).sort_values().plot(kind='barh', figsize=(8, 10), color='#86bf91', zorder=2, width=0.85)
# Hide all four spines
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
# Recent matplotlib versions expect booleans here rather than "on"/"off" strings
ax.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True)
# Draw a dashed vertical guide line at each x tick
vals = ax.get_xticks()
for tick in vals:
    ax.axvline(x=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)
ax.set_title("Population - 2017 (1000s)", weight='bold', size=12)
ax.set_xlabel("Population - 2017 (1000s)", labelpad=20, weight='bold', size=12)
ax.set_ylabel("City", labelpad=20, weight='bold', size=12)
ax.xaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
plt.savefig('pop_chart.png')
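One possible way to turn the single-column code into a loop is sketched below. It is only a sketch under assumptions: the function name plot_sorted_barh and the file-naming scheme are made up for illustration, "Place name" is assumed to be the index, and the division by 1000 is left out because it only makes sense for the population column. The sample rows are copied from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import StrMethodFormatter


def plot_sorted_barh(frame, save=False):
    """Draw one sorted horizontal bar chart per numeric column of `frame`.

    Hypothetical helper: returns a dict mapping column name -> Axes.
    """
    axes = {}
    for column in frame.select_dtypes("number").columns:
        fig, ax = plt.subplots(figsize=(8, 10))
        # Sort ascending so the largest bar ends up at the top of the chart.
        frame[column].sort_values().plot(
            kind="barh", ax=ax, color="#86bf91", zorder=2, width=0.85
        )
        for side in ("top", "right", "left", "bottom"):
            ax.spines[side].set_visible(False)
        ax.set_title(column, weight="bold", size=12)
        ax.set_xlabel(column, labelpad=20, weight="bold", size=12)
        ax.set_ylabel("City", labelpad=20, weight="bold", size=12)
        ax.xaxis.set_major_formatter(StrMethodFormatter("{x:,g}"))
        if save:
            # Assumed naming scheme: one PNG per column, spaces replaced.
            fig.savefig(f"{column.replace(' ', '_')}.png", bbox_inches="tight")
        axes[column] = ax
    return axes


# Two columns of sample data taken from the question.
frame = pd.DataFrame(
    {
        "Population-current year": [82980.0, 654187.0, 72.0, 6209.0, 89318.0],
        "Median age of population": [42.8, 34.1, 60.2, 38.8, 38.6],
    },
    index=pd.Index(
        ["Santa Fe", "Nashville-Davidson", "Sandia Park", "Stone Mountain", "Asheville"],
        name="Place name",
    ),
)
axes = plot_sorted_barh(frame)
```

Calling plot_sorted_barh(frame, save=True) would also write one image per column; column names containing characters that are not valid in file names would need extra cleaning first.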

Related

crosstab - plot - how to return a bar chart with the top 5

I want the plot to show only the top 5 teams that won the most medals (gold, silver and bronze).
medal_ranks = olympics[olympics['Medal'] != 'NaN'].groupby(['NOC', 'Year', 'Sport', 'Event', 'Season'])
medal_ranks = medal_ranks.first()
medal_ranks = medal_ranks.reset_index()
medal_ranks['NOC'].value_counts().head(5)
medal_colors = ['darkgoldenrod', 'gold', 'silver']
cross = pd.crosstab(medal_ranks.NOC, medal_ranks.Medal).plot(kind='bar',color = medal_colors,
stacked=True, figsize=(18,5))
(The Teams that I want to show:
USA 5261
GBR 4152
FRA 4135
ITA 3738
CAN 3649)
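A possible approach, sketched under assumptions: take the index of value_counts().head(5) and use it to select rows from the crosstab before plotting. The medal_ranks frame below is a made-up toy stand-in with invented counts, not the real Olympics data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `medal_ranks` (counts are invented for illustration).
toy_counts = {"USA": 6, "GBR": 5, "FRA": 4, "ITA": 3, "CAN": 2, "SWE": 1}
medals = ["Gold", "Silver", "Bronze"]
records = [(noc, medals[i % 3]) for noc, n in toy_counts.items() for i in range(n)]
medal_ranks = pd.DataFrame(records, columns=["NOC", "Medal"])

# The five teams with the most medal rows, in descending order.
top5 = medal_ranks["NOC"].value_counts().head(5).index

# Build the full crosstab, then keep only the rows for those five teams.
counts = pd.crosstab(medal_ranks["NOC"], medal_ranks["Medal"]).loc[top5]

# Crosstab columns come out alphabetically (Bronze, Gold, Silver),
# which matches the colour order used in the question.
medal_colors = ["darkgoldenrod", "gold", "silver"]
ax = counts.plot(kind="bar", color=medal_colors, stacked=True, figsize=(18, 5))
```

Selecting with .loc[top5] also keeps the bars in descending order of total medals, which a plain isin() filter would not guarantee.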

Pandas Plot, arrange bar value from high to low, and colour Individual bar separately

I have a series as below:
Country
India 691904
China 659962
United Kingdom of Great Britain and Northern Ireland 551500
Philippines 511391
Pakistan 241600
United States of America 241122
Iran (Islamic Republic of) 175923
Sri Lanka 148358
Republic of Korea 142581
After I plotted a horizontal bar graph, the bars were arranged from low to high.
Q:
1) How can I arrange the y-axis from High to Low?
2) Is there an easier way to colour all bars grey except for the highest value (in this case India) without typing all the country names?
Thanks!
ax = top15.plot(kind='barh',
figsize=(20,10),
color = {'red':'India'})
ax.set_title('Immigration from 1980 - 2013')
ax.set_xlabel('Total Number')
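First, sort the series with sort_values():
top15=top15.sort_values(ascending=True)
Then build a list of colours, one per bar (grey for every bar, red for the last and largest one), and pass it to plot():
color=['0.8','0.8','0.8','0.8','0.8','0.8','0.8','0.8','r']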
ax = top15.plot(kind='barh',
figsize=(20,10),
color = color)
ax.set_title('Immigration from 1980 - 2013')
ax.set_xlabel('Total Number')
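To avoid typing the colour list by hand, it could also be derived from the data. The following is a minimal sketch, assuming the series from the question (rebuilt here from the values shown) and that exactly one country holds the maximum:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Series rebuilt from the values shown in the question.
top15 = pd.Series(
    {
        "India": 691904,
        "China": 659962,
        "United Kingdom of Great Britain and Northern Ireland": 551500,
        "Philippines": 511391,
        "Pakistan": 241600,
        "United States of America": 241122,
        "Iran (Islamic Republic of)": 175923,
        "Sri Lanka": 148358,
        "Republic of Korea": 142581,
    }
)
top15.index.name = "Country"

# Ascending sort puts the largest value last, i.e. at the top of a barh plot.
top15 = top15.sort_values(ascending=True)

# Red for the country with the maximum value, light grey for everything else.
highest = top15.idxmax()
colors = ["r" if country == highest else "0.8" for country in top15.index]

ax = top15.plot(kind="barh", figsize=(20, 10), color=colors)
ax.set_title("Immigration from 1980 - 2013")
ax.set_xlabel("Total Number")
```

This works for any number of rows, so the list does not have to be edited when the series changes length.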

How do I plot the graphs side by side for comparison?

I have made two bar graphs like this:
dirmedtop.plot.barh()
dirmeantop.plot.barh()
Then I tried:
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Horizontally stacked subplots')
ax1 = dirmedtop.plot.barh
ax2 = dirmeantop.plot.barh
but the result is a TypeError ('AxesSubplot' object is not callable) and this.
I want the bar graphs to be side by side so that I can compare them. Can anyone help me do this?
Ultimately I want the graphs to look like this.
Here is what I mean by dirmedtop and dirmeantop: dirmedtop is the top 10 directors by median gross per director, and dirmeantop is the top 10 directors by average IMDB score.
dirmean= df.loc[df['director_name'].isin(director2.index)].groupby('director_name')['imdb_score'].mean()
dirmean
dirmeansort= dirmean.sort_values(ascending=False)
dirmeansort
dirmeantop=dirmeansort.head(10)
dirmeantop
director_name
Christopher Nolan 8.425000
Quentin Tarantino 8.200000
Stanley Kubrick 8.000000
James Cameron 7.914286
David Fincher 7.750000
Peter Jackson 7.675000
Martin Scorsese 7.660000
Wes Anderson 7.628571
Paul Greengrass 7.585714
Sam Mendes 7.500000
Name: imdb_score, dtype: float64
dirmed= df.loc[df['director_name'].isin(director2.index)].groupby('director_name')['gross'].median()
dirmed
dirmedsort= dirmed.sort_values(ascending=False)
dirmedsort
dirmedtop= dirmedsort.head(10)
dirmedtop
director_name
Jon Favreau 312057433.0
Peter Jackson 236579815.0
Christopher Nolan 196667606.5
Bryan Singer 156142402.0
James Cameron 146282411.0
Sam Raimi 138480208.0
Michael Bay 138396624.0
Steven Spielberg 132014112.0
Tom Shadyac 128769345.0
Jay Roach 126561111.0
Name: gross, dtype: float64
Pass the ax parameter to Series.plot.barh so that each series is drawn on its own subplot, and sort both series; subplots_adjust adds space between the two plots:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,5))
plt.subplots_adjust(wspace = 0.7)
fig.suptitle('Top 10 movie directors')
dirmeantop.rename_axis(None).sort_values().plot.barh(ax=ax1, title='By IMDB rank')
dirmedtop.rename_axis(None).sort_values().plot.barh(ax=ax2, title='By Gross')
ax1.set_ylabel('Director')
ax1.set_xlabel('IMDB Score')
ax2.set_xlabel('Gross')

How do I make a plot like the one given below with df.plot function?

I have this data for 2007, with population in millions and GDP in billions; the index column is country.
continent year lifeExpectancy population gdpPerCapita GDP Billions
country
China Asia 2007 72.961 1318.6831 4959.11485 6539.50093
India Asia 2007 64.698 1110.39633 2452.21041 2722.92544
United States Americas 2007 78.242 301.139947 42951.6531 12934.4585
Indonesia Asia 2007 70.65 223.547 3540.65156 791.502035
Brazil Americas 2007 72.39 190.010647 9065.80083 1722.59868
Pakistan Asia 2007 65.483 169.270617 2605.94758 441.110355
Bangladesh Asia 2007 64.062 150.448339 1391.25379 209.311822
Nigeria Africa 2007 46.859 135.031164 2013.97731 271.9497
Japan Asia 2007 82.603 127.467972 31656.0681 4035.1348
Mexico Americas 2007 76.195 108.700891 11977.575 1301.97307
I am trying to plot a bar chart like the following:
This was plotted using matplotlib (code below), and I want to get this with df.plot method.
The code for plotting with matplotlib:
ax = data.plot(y=[3],kind = "bar")  # keep the returned Axes so the second plot can reuse it
data.plot(y = [3,5],kind = "bar",secondary_y = True,ax = ax,style='g:', figsize = (24, 6))
plt.show()
You can call df.plot() on just the columns you want to plot and pass the column that should use the right-hand y axis as the secondary_y argument:
data[['population','gdpPerCapita']].plot(kind='bar', secondary_y='gdpPerCapita')
If you want to set the y label for each side, you have to get all the axes of the figure (in this case two y axes) and set the labels on each one:
ax1, ax2 = plt.gcf().get_axes()
ax1.set_ylabel('Population')
ax2.set_ylabel('GDP')

Pandas subplot using two series

I have two series that contain the same labels, but with a different number of occurrences for each label. I want to compare these two series in a single bar chart. Below is what I've done so far.
import matplotlib.patches as mpatches
fig = plt.figure()
ax = fig.add_subplot(111)
width = 0.3
tree_amount15.plot(kind='bar', color='red', ax=ax, width=width, position=1, label='NYC')
queens_tree_types.plot(kind='bar', color='blue', ax=ax, width=width, position=0, label='Queens')
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
ncol=2, mode="expand", borderaxespad=0.)
ax.set_ylabel('Total trees')
ax.set_xlabel('Tree names')
plt.show()
Which gives me the following chart:
The problem I have is that, even though all the 'Tree names' are the same in each series, the 'Total trees' values are of course different. For example, Callery pear is #3 in 'tree_amount15' but #5 in 'queens_tree_types', and so on. How can I order the series so that each value is drawn against the right label on the chart? Right now only the labels from the series that is plotted first are shown, which makes the values of the second series misleading.
Any hints?
Here's how the two series look when I call value_counts() on them.
tree_amount15:
London planetree 87014
honeylocust 64264
Callery pear 58931
pin oak 53185
Norway maple 34189
littleleaf linden 29742
cherry 29279
Japanese zelkova 29258
ginkgo 21024
Sophora 19338
red maple 17246
green ash 16251
American linden 13530
silver maple 12277
sweetgum 10657
northern red oak 8400
silver linden 7995
American elm 7975
maple 7080
purple-leaf plum 6879
queens_tree_types:
London planetree 31111
pin oak 22610
honeylocust 20290
Norway maple 19407
Callery pear 16547
cherry 13497
littleleaf linden 11902
Japanese zelkova 8987
green ash 7389
silver maple 6116
ginkgo 5971
Sophora 5386
red maple 4935
American linden 4769
silver linden 4146
purple-leaf plum 3035
maple 2992
northern red oak 2697
sweetgum 2489
American elm 1709
You can create a data frame from your two series; pd.concat aligns them on the tree name index, so each value ends up next to the right label. Depending on the pandas version, the combined index may come out sorted alphabetically, so we sort explicitly by the values of NYC. With both series as columns, a single call to the plot method puts them on the same graph.
df = pd.concat([tree_amount15, queens_tree_types], axis=1, keys=['NYC', 'Queens']) # keys sets the column names
df = df.sort_values('NYC', ascending=False) # sort_values returns a new frame, so assign it back
df.plot.bar(color=['red','blue'])
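To check that combining the series lines values up by label rather than by position, a small test on four rows from the question can help. This is only a sketch: the two series are rebuilt by hand from the posted counts, and the column names are passed through the keys argument of pd.concat.

```python
import pandas as pd

# Four rows from each series in the question; note the different orderings.
tree_amount15 = pd.Series(
    {"London planetree": 87014, "honeylocust": 64264,
     "Callery pear": 58931, "pin oak": 53185}
)
queens_tree_types = pd.Series(
    {"London planetree": 31111, "pin oak": 22610,
     "honeylocust": 20290, "Callery pear": 16547}
)

# concat with axis=1 aligns on the index labels; keys names the columns.
df = pd.concat([tree_amount15, queens_tree_types], axis=1, keys=["NYC", "Queens"])

# Order the rows by the NYC counts, largest first.
df = df.sort_values("NYC", ascending=False)

# Each row now holds both counts for the same tree name.
print(df)
```

Because the rows are matched by tree name, 'pin oak' keeps its own Queens count even though it sits in a different position in the two series.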
