I am trying to plot a pie chart of the result of a groupby on a DataFrame.
I get the correct result from the groupby, but when I try to plot the pie chart it doesn't display all of the results.
Code:
import pandas as pd
import plotly.graph_objects as go
tl = pd.unique(df['event_type'].tolist())
th = pd.unique(df['event_mohafaza'].tolist())
ct = df.groupby(['event_mohafaza','event_type']).aggregate('sum')
ct = ct.reset_index()
ct['labels'] = df['event_mohafaza'] + ' ' + df['event_type']
trace = go.Pie(labels=ct['labels'],
               hoverinfo='label+percent',
               values=ct['number_person'],
               textposition='outside',
               rotation=90)
layout = go.Layout(
    title="Percentage of events",
    font=dict(family='Arial', size=12, color='#909090'),
    legend=dict(x=0.9, y=0.5)
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
fig.show()
Result of the groupby function:
                          number_person
event_mohafaza event_type
loc1           swimming            9157
               football             690
               baseball            2292
loc2           swimming           10560
               football            8987
               baseball           70280
loc3           basketball           130
               swimming           19395
               football            5370
               baseball           19078
loc4           swimming            9492
               football              50
               baseball            5279
loc5           swimming            4652
               football            2215
               baseball            3000
The plotted pie chart:
It doesn't display all the values. The pie should be divided into 16 slices, but it is divided into only 8.
Try using:
values = ct['number_person'].value_counts()
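A likely cause worth checking, offered as a hedged sketch rather than a confirmed fix: the line `ct['labels'] = df['event_mohafaza'] + ' ' + df['event_type']` builds the labels from the original `df`, not from the grouped `ct`. The assignment aligns on the index, so row i of `ct` receives the label of row i of `df`, and those labels can repeat. Plotly's `go.Pie` sums the values of entries that share a label, which would merge slices. The sketch below uses a small hypothetical `df` with the question's column names and builds the labels from `ct` itself.

```python
import pandas as pd

# Hypothetical raw data with the question's column names;
# each (event_mohafaza, event_type) pair appears more than once.
df = pd.DataFrame({
    'event_mohafaza': ['loc1', 'loc1', 'loc1', 'loc2', 'loc2', 'loc1'],
    'event_type': ['swimming', 'swimming', 'football', 'baseball', 'baseball', 'football'],
    'number_person': [100, 50, 20, 70, 30, 10],
})

# Sum per (location, event type) and keep the group keys as columns.
ct = (df.groupby(['event_mohafaza', 'event_type'], as_index=False)['number_person']
        .sum())

# The question's version: labels come from df rows 0..n-1, not from ct,
# so two different groups can end up with the same label.
wrong_labels = (df['event_mohafaza'] + ' ' + df['event_type']).reindex(ct.index)

# Build the labels from the grouped frame so every group gets its own label.
ct['labels'] = ct['event_mohafaza'] + ' ' + ct['event_type']

print(ct)
```

`ct['labels']` and `ct['number_person']` can then be passed to `go.Pie` exactly as in the question; with one distinct label per group, each group should get its own slice.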
I want to show only the top 5 teams that won the most medals (Gold, Silver and Bronze) in the plot.
medal_ranks = olympics[olympics['Medal'] != 'NaN'].groupby(['NOC', 'Year', 'Sport', 'Event', 'Season'])
medal_ranks = medal_ranks.first()
medal_ranks = medal_ranks.reset_index()
medal_ranks['NOC'].value_counts().head(5)
medal_colors = ['darkgoldenrod', 'gold', 'silver']
cross = pd.crosstab(medal_ranks.NOC, medal_ranks.Medal).plot(kind='bar', color=medal_colors,
                                                               stacked=True, figsize=(18, 5))
The teams that I want to show:
USA 5261
GBR 4152
FRA 4135
ITA 3738
CAN 3649
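One way to keep only the five teams with the most medals, sketched under assumptions: count medals per `NOC`, take the top five index labels from `value_counts()`, and select those rows of the crosstab with `.loc` before plotting. The small `medal_ranks` frame below is hypothetical; it only mimics the question's `NOC` and `Medal` columns. It filters with `notna()` on the assumption that missing medals are real missing values, since comparing against the string `'NaN'` does not drop those.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
import pandas as pd

# Hypothetical stand-in for medal_ranks: one row per medal entry,
# plus one row (ESP) with no medal.
medal_ranks = pd.DataFrame({
    'NOC': ['USA'] * 6 + ['GBR'] * 5 + ['FRA'] * 4 + ['ITA'] * 3
           + ['CAN'] * 2 + ['GER'] * 1 + ['ESP'],
    'Medal': ['Gold', 'Silver', 'Bronze'] * 7 + [None],
})

# Drop rows without a medal (real missing values, not the string 'NaN').
medal_ranks = medal_ranks[medal_ranks['Medal'].notna()]

# The five NOCs with the most medals, largest count first.
top5 = medal_ranks['NOC'].value_counts().head(5).index

# Medal counts per team, restricted to (and ordered by) the top five teams.
cross = pd.crosstab(medal_ranks['NOC'], medal_ranks['Medal']).loc[top5]

# crosstab sorts the medal columns alphabetically: Bronze, Gold, Silver.
medal_colors = ['darkgoldenrod', 'gold', 'silver']
ax = cross.plot(kind='bar', color=medal_colors, stacked=True, figsize=(18, 5))
```

Selecting with `.loc[top5]` also keeps the bars in descending order of total medals, which a plain crosstab (sorted alphabetically by `NOC`) would not.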
I need to plot a histogram of the data below, showing the summed quantity per country.
Country Quantity
0 United Kingdom 4263829
1 Netherlands 200128
2 EIRE 142637
3 Germany 117448
4 France 110480
5 Australia 83653
6 Sweden 35637
7 Switzerland 30325
8 Spain 26824
9 Japan 25218
So far I have tried this, but I can't specify the axes myself:
df.plot(x='Country', y='Quantity', kind='hist', bins=10)
Try a bar plot instead of a histogram:
df.plot.bar(x='Country', y='Quantity')
Try this:
import matplotlib.pyplot as plt
plt.bar(df['Country'],df['Quantity'])
plt.show()
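Why a bar chart fits better here: a histogram bins one numeric column and counts how many values fall into each bin, so it has no place for country names on the x-axis. A bar chart draws one bar per row. A minimal sketch using the first five rows of the data from the question (the `rot` and `legend` arguments are standard `DataFrame.plot` options):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in a notebook
import pandas as pd

# First rows of the country totals from the question.
df = pd.DataFrame({
    'Country': ['United Kingdom', 'Netherlands', 'EIRE', 'Germany', 'France'],
    'Quantity': [4263829, 200128, 142637, 117448, 110480],
})

# One bar per country: Country on the x-axis, Quantity on the y-axis.
ax = df.plot.bar(x='Country', y='Quantity', rot=90, legend=False)
ax.set_ylabel('Quantity')
```

Because the United Kingdom total is much larger than the rest, a log scale (`logy=True`) may make the smaller countries easier to compare.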
I have made two bar graphs like this:
dirmedtop.plot.barh()
dirmeantop.plot.barh()
Now I tried:
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Horizontally stacked subplots')
ax1 = dirmedtop.plot.barh
ax2 = dirmeantop.plot.barh
but the result is a TypeError: 'AxesSubplot' object is not callable.
I want the bar graphs side by side so that I can compare them. Can anyone help me do this?
Ultimately I want the graphs to look like this:
By dirmedtop and dirmeantop I mean the following: dirmedtop is the top 10 directors by median gross per director, and dirmeantop is the top 10 directors by average IMDB score.
dirmean= df.loc[df['director_name'].isin(director2.index)].groupby('director_name')['imdb_score'].mean()
dirmean
dirmeansort= dirmean.sort_values(ascending=False)
dirmeansort
dirmeantop=dirmeansort.head(10)
dirmeantop
director_name
Christopher Nolan 8.425000
Quentin Tarantino 8.200000
Stanley Kubrick 8.000000
James Cameron 7.914286
David Fincher 7.750000
Peter Jackson 7.675000
Martin Scorsese 7.660000
Wes Anderson 7.628571
Paul Greengrass 7.585714
Sam Mendes 7.500000
Name: imdb_score, dtype: float64
dirmed= df.loc[df['director_name'].isin(director2.index)].groupby('director_name')['gross'].median()
dirmed
dirmedsort= dirmed.sort_values(ascending=False)
dirmedsort
dirmedtop= dirmedsort.head(10)
dirmedtop
director_name
Jon Favreau 312057433.0
Peter Jackson 236579815.0
Christopher Nolan 196667606.5
Bryan Singer 156142402.0
James Cameron 146282411.0
Sam Raimi 138480208.0
Michael Bay 138396624.0
Steven Spielberg 132014112.0
Tom Shadyac 128769345.0
Jay Roach 126561111.0
Name: gross, dtype: float64
Pass the ax parameter to Series.plot.barh and sort both Series; subplots_adjust adds space between the two plots:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,5))
plt.subplots_adjust(wspace = 0.7)
fig.suptitle('Top 10 movie directors')
dirmeantop.rename_axis(None).sort_values().plot.barh(ax=ax1, title='By IMDB rank')
dirmedtop.rename_axis(None).sort_values().plot.barh(ax=ax2, title='By Gross')
ax1.set_ylabel('Director')
ax1.set_xlabel('IMDB Score')
ax2.set_xlabel('Gross')
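In the original attempt, `ax1 = dirmedtop.plot.barh` only rebinds the name `ax1` to the plotting method; nothing is drawn into the subplot. The pattern in the answer is to call the method and hand it the target Axes with `ax=`. A self-contained sketch, using only the first three values of each Series copied from the question's output:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few values copied from the question's output.
dirmeantop = pd.Series({'Christopher Nolan': 8.425, 'Quentin Tarantino': 8.2,
                        'Stanley Kubrick': 8.0}, name='imdb_score')
dirmedtop = pd.Series({'Jon Favreau': 312057433.0, 'Peter Jackson': 236579815.0,
                       'Christopher Nolan': 196667606.5}, name='gross')

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
fig.subplots_adjust(wspace=0.7)  # room for the director names of the right plot
fig.suptitle('Top movie directors')

# Each Series draws into its own Axes; ascending sort puts the largest bar on top.
dirmeantop.sort_values().plot.barh(ax=ax1, title='By IMDB score')
dirmedtop.sort_values().plot.barh(ax=ax2, title='By gross')
ax1.set_xlabel('IMDB score')
ax2.set_xlabel('Gross')
```

The sort is ascending because `barh` draws the first row at the bottom of the axis.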
Below is the DataFrame:
Quantity UnitPrice CustomerID
Country
Netherlands 200128 6492.55 34190538.0
EIRE 142637 48447.19 110391745.0
Germany 117448 37666.00 120075093.0
France 110480 43031.99 107648864.0
Australia 83653 4054.75 15693002.0
How can I plot a histogram with Country on the x-axis (labels rotated by 90 degrees) and Quantity on the y-axis? I tried:
df.hist(x,y)
You can try this:
df.plot.bar(y='Quantity')
Here's the output:
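Here `Country` is the index, so `DataFrame.plot.bar` uses it for the x-axis without an `x=` argument; `y='Quantity'` restricts the bars to that column, and `rot=90` sets the label rotation the question asks for explicitly. A sketch built from the Quantity and UnitPrice rows shown in the question (CustomerID is left out for brevity):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in a notebook
import pandas as pd

# Rows from the question, with Country as the index.
df = pd.DataFrame(
    {'Quantity': [200128, 142637, 117448, 110480, 83653],
     'UnitPrice': [6492.55, 48447.19, 37666.00, 43031.99, 4054.75]},
    index=pd.Index(['Netherlands', 'EIRE', 'Germany', 'France', 'Australia'],
                   name='Country'),
)

# Index on the x-axis, only Quantity as bars, tick labels rotated 90 degrees.
ax = df.plot.bar(y='Quantity', rot=90, legend=False)
ax.set_ylabel('Quantity')
```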
I have two Series that contain the same categories, but with different numbers of occurrences. I want to compare the two Series in a bar chart. Below is what I've done so far.
import matplotlib.patches as mpatches
fig = plt.figure()
ax = fig.add_subplot(111)
width = 0.3
tree_amount15.plot(kind='bar', color='red', ax=ax, width=width, position=1, label='NYC')
queens_tree_types.plot(kind='bar', color='blue', ax=ax, width=width, position=0, label='Queens')
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
ncol=2, mode="expand", borderaxespad=0.)
ax.set_ylabel('Total trees')
ax.set_xlabel('Tree names')
plt.show()
Which gives me the following chart:
The problem is that, even though the 'Tree names' are the same in both Series, the 'Total trees' differ, so, for example, Callery pear is #3 in 'tree_amount15' but #5 in 'queens_tree_types', and so on. How can I order the Series so that each value is drawn against the correct label on the chart? Right now the chart shows the labels of the Series that is plotted first, which makes the values of the second Series misleading.
Any hints?
Here's how the two Series look after calling value_counts() on them:
tree_amount15:
London planetree 87014
honeylocust 64264
Callery pear 58931
pin oak 53185
Norway maple 34189
littleleaf linden 29742
cherry 29279
Japanese zelkova 29258
ginkgo 21024
Sophora 19338
red maple 17246
green ash 16251
American linden 13530
silver maple 12277
sweetgum 10657
northern red oak 8400
silver linden 7995
American elm 7975
maple 7080
purple-leaf plum 6879
queens_tree_types:
London planetree 31111
pin oak 22610
honeylocust 20290
Norway maple 19407
Callery pear 16547
cherry 13497
littleleaf linden 11902
Japanese zelkova 8987
green ash 7389
silver maple 6116
ginkgo 5971
Sophora 5386
red maple 4935
American linden 4769
silver linden 4146
purple-leaf plum 3035
maple 2992
northern red oak 2697
sweetgum 2489
American elm 1709
You can create a DataFrame from your two Series, indexed by tree name. The combined index is not guaranteed to follow the NYC ranking (older pandas versions sort it alphabetically), so we sort it explicitly using the NYC values. With both Series as columns, a single call to the plot method puts them on the same graph.
df = pd.concat([tree_amount15, queens_tree_types], axis=1,
               keys=['NYC', 'Queens'])  # sets the column names
df = df.sort_values('NYC', ascending=False)  # sort the df using NYC values
df.plot.bar(color=['red','blue'])
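The alignment can be checked on a small scale. This sketch uses four trees from the value_counts() output above, deliberately listed in each Series' own order; `concat` aligns both Series on the tree name, so after sorting by NYC each Queens value still sits next to its own tree.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in a notebook
import pandas as pd

# Four trees from the question, each Series in its own value_counts() order.
tree_amount15 = pd.Series({'London planetree': 87014, 'honeylocust': 64264,
                           'Callery pear': 58931, 'pin oak': 53185})
queens_tree_types = pd.Series({'London planetree': 31111, 'pin oak': 22610,
                               'honeylocust': 20290, 'Callery pear': 16547})

# keys= names the columns; concat aligns the two Series on the tree name.
df = pd.concat([tree_amount15, queens_tree_types], axis=1, keys=['NYC', 'Queens'])

# Order rows by the NYC counts; Queens values stay attached to their tree.
df = df.sort_values('NYC', ascending=False)

ax = df.plot.bar(color=['red', 'blue'])
ax.set_ylabel('Total trees')
ax.set_xlabel('Tree names')
```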