I have made two bar graphs like this:
dirmedtop.plot.barh()
dirmeantop.plot.barh()
Now I tried:
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Horizontally stacked subplots')
ax1 = dirmedtop.plot.barh
ax2 = dirmeantop.plot.barh
but the result is a TypeError: 'AxesSubplot' object is not callable.
I want the bar graphs to be side by side so that I can compare them. Can anyone help me do this?
Ultimately I want the graphs to look like this:
Here is what I mean by dirmedtop and dirmeantop: dirmedtop is the top 10 directors by median gross per director, and dirmeantop is the top 10 directors by average IMDB score.
dirmean= df.loc[df['director_name'].isin(director2.index)].groupby('director_name')['imdb_score'].mean()
dirmean
dirmeansort= dirmean.sort_values(ascending=False)
dirmeansort
dirmeantop=dirmeansort.head(10)
dirmeantop
director_name
Christopher Nolan 8.425000
Quentin Tarantino 8.200000
Stanley Kubrick 8.000000
James Cameron 7.914286
David Fincher 7.750000
Peter Jackson 7.675000
Martin Scorsese 7.660000
Wes Anderson 7.628571
Paul Greengrass 7.585714
Sam Mendes 7.500000
Name: imdb_score, dtype: float64
dirmed= df.loc[df['director_name'].isin(director2.index)].groupby('director_name')['gross'].median()
dirmed
dirmedsort= dirmed.sort_values(ascending=False)
dirmedsort
dirmedtop= dirmedsort.head(10)
dirmedtop
director_name
Jon Favreau 312057433.0
Peter Jackson 236579815.0
Christopher Nolan 196667606.5
Bryan Singer 156142402.0
James Cameron 146282411.0
Sam Raimi 138480208.0
Michael Bay 138396624.0
Steven Spielberg 132014112.0
Tom Shadyac 128769345.0
Jay Roach 126561111.0
Name: gross, dtype: float64
Pass the ax parameter to Series.plot.barh so each plot is drawn on its own subplot, and sort both Series so the bars are ordered. Use subplots_adjust to add space between the two plots:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,5))
plt.subplots_adjust(wspace = 0.7)
fig.suptitle('Top 10 movie directors')
dirmeantop.rename_axis(None).sort_values().plot.barh(ax=ax1, title='By IMDB rank')
dirmedtop.rename_axis(None).sort_values().plot.barh(ax=ax2, title='By Gross')
ax1.set_ylabel('Director')
ax1.set_xlabel('IMDB Score')
ax2.set_xlabel('Gross')
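A self-contained version of the same idea, as a minimal sketch. The two Series are made-up stand-ins for dirmeantop and dirmedtop (hypothetical directors with invented numbers), and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for dirmeantop and dirmedtop (values are invented).
score = pd.Series({"Director A": 8.4, "Director B": 7.9, "Director C": 8.1},
                  name="imdb_score")
gross = pd.Series({"Director A": 2.0e8, "Director D": 3.1e8, "Director C": 1.5e8},
                  name="gross")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
fig.subplots_adjust(wspace=0.7)  # leave room for the long y tick labels

# ax= tells pandas which existing Axes to draw on. ax1 and ax2 are used as
# targets here; they are never reassigned to the plot method itself.
score.sort_values().plot.barh(ax=ax1, title="By IMDB score")
gross.sort_values().plot.barh(ax=ax2, title="By gross")

# In an interactive session, call plt.show() here.
```

The original attempt rebinds ax1 and ax2 to the unbound plotting method, so nothing is ever drawn on the subplots; passing ax= keeps the Axes created by plt.subplots and draws into them.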
I have built a network with NetworkX from a source-target dataframe:
import networkx as nx
G = nx.from_pandas_edgelist(df, source='Person1', target='Person2')
Dataset
Person1 Age Person2 Wedding
0 Adam John 3 Yao Ming Green
1 Mary Abbey 5 Adam Lebron Green
2 Samuel Bradley 24 Mary Lane Orange
3 Lucas Barney 12 Julie Lime Yellow
4 Christopher Rice 0.9 Matt Red Green
I would like to set the size/weights of the links based on the Age column (i.e. age of marriage) and the colour of nodes as in the column Wedding.
I know that, if I wanted to add an edge, I could set it as follows: G.add_edge(Person1,Person2, size = 10). To apply different colours to nodes I should probably use the parameter node_color=color_map, where color_map is the list of colours in the Wedding column (if I am right).
Can you please explain how to apply these settings to my case?
IIUC:
df = pd.read_clipboard(sep=r'\s\s+')  # raw string so the regex backslashes are not treated as escapes
collist = df.drop('Age', axis=1).melt('Wedding')
collist
G = nx.from_pandas_edgelist(df, source='Person1', target='Person2', edge_attr='Age')
pos=nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, nodelist=collist['value'], node_color=collist['Wedding'])
nx.draw_networkx_edges(G, pos, width = [i['Age'] for i in dict(G.edges).values()])
Output:
I need to plot a histogram of the data below, showing the total quantity per country.
Country Quantity
0 United Kingdom 4263829
1 Netherlands 200128
2 EIRE 142637
3 Germany 117448
4 France 110480
5 Australia 83653
6 Sweden 35637
7 Switzerland 30325
8 Spain 26824
9 Japan 25218
So far I have tried this, but I am unable to specify the axes myself:
df.plot(x='Country', y='Quantity', kind='hist', bins=10)
Try a bar plot instead of a histogram:
df.plot.bar(x='Country', y='Quantity')
Try this:
import matplotlib.pyplot as plt
plt.bar(df['Country'],df['Quantity'])
plt.show()
I'd like to create a function or for loop that will display a sorted horizontal bar chart for each column of a DataFrame. An example of the code for one column can be seen below. The plt.savefig() is a nice to have but not necessary.
frame['Population-current year'] is one of the columns from the DataFrame. I can't seem to formulate the necessary structure for the loop, syntax-wise.
DataFrame header:
Place name Population-current year
Santa Fe 82980.0
Nashville-Davidson 654187.0
Sandia Park 72.0
Stone Mountain 6209.0
Asheville 89318.0
Place name Pct change in population, 2010-current year \
Santa Fe 0.2277
Nashville-Davidson 0.1130
Sandia Park 0.0286
Stone Mountain 0.0362
Asheville 0.0896
Place name Median age of population Median age of housing \
Santa Fe 42.8 34.0
Nashville-Davidson 34.1 38.0
Sandia Park 60.2 24.0
Stone Mountain 38.8 38.0
Asheville 38.6 44.0
Thanks in Advance.
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
ax = (frame['Population-current year']/1000).sort_values().plot(kind='barh', figsize=(8, 10), color='#86bf91', zorder=2, width=0.85)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
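# Hide the tick marks but keep the tick labels (these parameters take booleans)
ax.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True)
# Draw a faint dashed vertical line at each x tick
vals = ax.get_xticks()
for tick in vals:
    ax.axvline(x=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)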
ax.set_title("Population - 2017 (1000s)", weight='bold', size=12)
ax.set_xlabel("Population - 2017 (1000s)", labelpad=20, weight='bold', size=12)
ax.set_ylabel("City", labelpad=20, weight='bold', size=12)
ax.xaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
plt.savefig('pop_chart.png')
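One way to structure the loop asked about above, as a minimal sketch: wrap the single-column code in a function and iterate over the numeric columns. The function name plot_sorted_barh, the save flag, and the output file naming are hypothetical choices; the sample frame uses three rows from the data shown in the question, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_sorted_barh(frame, save=False):
    """Draw one sorted horizontal bar chart per numeric column.

    Returns a dict mapping column name -> Axes. The function name and the
    output file naming are hypothetical choices for this sketch.
    """
    axes = {}
    for col in frame.select_dtypes("number").columns:
        fig, ax = plt.subplots(figsize=(8, 10))  # a fresh figure per column
        frame[col].sort_values().plot(kind="barh", ax=ax, color="#86bf91", width=0.85)
        ax.set_title(col, weight="bold", size=12)
        ax.set_xlabel(col, labelpad=20, weight="bold", size=12)
        ax.set_ylabel(frame.index.name or "", labelpad=20, weight="bold", size=12)
        if save:
            fig.savefig(col.replace(" ", "_") + ".png")
        axes[col] = ax
    return axes


# Three rows taken from the sample data in the question.
frame = pd.DataFrame(
    {
        "Population-current year": [82980.0, 72.0, 89318.0],
        "Median age of population": [42.8, 60.2, 38.6],
    },
    index=pd.Index(["Santa Fe", "Sandia Park", "Asheville"], name="Place name"),
)
axes = plot_sorted_barh(frame)
```

Creating a new figure inside the loop keeps each column on its own chart; the spine and grid styling from the single-column example can be added inside the loop body in the same way.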
I am researching ATP Tour male tennis data. Currently, I have a Pandas dataframe that contains ~60,000 matches. Every row contains information / statistics about the match, split between the winner and the loser. I have sorted the dataframe on date. Currently I am trying to calculate the ELO-rating of both the winner and the loser for every match (thus every row).
To calculate the ELO-rating, one needs the ELO-rating for both players in their previous match. Another difficulty arises, as the winner of the current match might have been a loser in his previous match. As a result, the 'winner_player_id' value of the current match might be in the 'loser_player_id' column for the previous match.
I am not sure how to efficiently select the previous ELO-ratings for both players per row, as this entails a search across multiple columns.
Every row includes the following columns:
array(['match_id', 'tourney_dates', 'round_order', 'tourney_name',
'tourney_year_id', 'tourney_round_name', 'winner_player_id',
'winner_slug', 'loser_player_id', 'loser_slug', 'elo_player_1', 'elo_player_2'])
Your time is appreciated!
One approach is to sort the winner and loser within each row by player name/ID, so the column order is stable regardless of who wins or loses. Here's an example:
# join returns a new frame, so assign the result back to df
df = df.join(pd.DataFrame(
    np.sort(df[['winner_name', 'loser_name']].values, axis=1),
    columns=['name1', 'name2'], index=df.index))
df.head(10)
Output:
winner_name loser_name name1 name2
0 Nicklas Kulti Michael Stich Michael Stich Nicklas Kulti
1 Michael Stich Jim Courier Jim Courier Michael Stich
2 Nicklas Kulti Magnus Larsson Magnus Larsson Nicklas Kulti
3 Jim Courier Martin Sinner Jim Courier Martin Sinner
4 Michael Stich Jimmy Arias Jimmy Arias Michael Stich
5 Nicklas Kulti Fabrice Santoro Fabrice Santoro Nicklas Kulti
6 Magnus Larsson Patrik Kuhnen Magnus Larsson Patrik Kuhnen
7 Jim Courier Paul Haarhuis Jim Courier Paul Haarhuis
8 Nicklas Kulti Magnus Gustafsson Magnus Gustafsson Nicklas Kulti
9 Michael Stich Gilad Bloom Gilad Bloom Michael Stich
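A different way to find each player's previous rating, whether he won or lost his last match, is to keep the current ratings in a dictionary keyed by player ID while walking through the matches in date order. The sketch below assumes the standard Elo update with a K-factor of 32 and a starting rating of 1500 (both are assumptions, not taken from the question); add_elo is a hypothetical helper name.

```python
from collections import defaultdict


def add_elo(matches, k=32.0, start=1500.0):
    """Compute pre-match Elo ratings for chronologically ordered matches.

    matches: iterable of (winner_id, loser_id) pairs, oldest first.
    Returns (pre_match, ratings): a list of (winner_elo, loser_elo) tuples
    holding the ratings before each match, and the final ratings dict.
    k and start are assumed defaults, not values from the question.
    """
    ratings = defaultdict(lambda: start)  # unseen players begin at `start`
    pre_match = []
    for winner, loser in matches:
        rw, rl = ratings[winner], ratings[loser]
        pre_match.append((rw, rl))
        # Expected score of the winner under the standard Elo formula
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected_w)
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return pre_match, dict(ratings)


# Hypothetical player IDs: p1 beats p2, then p2 beats p1.
pre_match, final = add_elo([("p1", "p2"), ("p2", "p1")])
print(pre_match)
```

With the dataframe from the question sorted by date, the pairs could come from zip(df['winner_player_id'], df['loser_player_id']), and the returned list could be split into the elo_player_1 and elo_player_2 columns. The dictionary lookup avoids searching both ID columns of earlier rows for every match.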
I have two Series that contain the same labels, but with a different number of occurrences of each. I want to compare the two Series in a single bar chart. Below is what I've done so far.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
fig = plt.figure()
ax = fig.add_subplot(111)
width = 0.3
tree_amount15.plot(kind='bar', color='red', ax=ax, width=width, position=1, label='NYC')
queens_tree_types.plot(kind='bar', color='blue', ax=ax, width=width, position=0, label='Queens')
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
ncol=2, mode="expand", borderaxespad=0.)
ax.set_ylabel('Total trees')
ax.set_xlabel('Tree names')
plt.show()
Which gives me the following chart:
The problem is that, although both Series contain the same tree names, their counts differ, so the ordering differs too. For example, Callery pear is #3 in tree_amount15 but #5 in queens_tree_types. How can I order the Series so that each value is plotted against the correct label? Right now the chart shows the labels of whichever Series is plotted first, which makes the values of the second Series misleading.
Any hints?
Here is what the two Series look like; each is the result of a value_counts() call.
tree_amount15:
London planetree 87014
honeylocust 64264
Callery pear 58931
pin oak 53185
Norway maple 34189
littleleaf linden 29742
cherry 29279
Japanese zelkova 29258
ginkgo 21024
Sophora 19338
red maple 17246
green ash 16251
American linden 13530
silver maple 12277
sweetgum 10657
northern red oak 8400
silver linden 7995
American elm 7975
maple 7080
purple-leaf plum 6879
queens_tree_types:
London planetree 31111
pin oak 22610
honeylocust 20290
Norway maple 19407
Callery pear 16547
cherry 13497
littleleaf linden 11902
Japanese zelkova 8987
green ash 7389
silver maple 6116
ginkgo 5971
Sophora 5386
red maple 4935
American linden 4769
silver linden 4146
purple-leaf plum 3035
maple 2992
northern red oak 2697
sweetgum 2489
American elm 1709
You can build a DataFrame from the two Series, which aligns them on the tree-name index. The row order of the combined frame is not necessarily the one you want (older pandas versions sort the index alphabetically), so sort it by the NYC values. With both Series as columns, a single call to the plot method puts them on the same graph.
df = pd.concat([tree_amount15, queens_tree_types], axis=1,
               keys=['NYC', 'Queens'])  # keys sets the column names
df = df.sort_values('NYC', ascending=False)  # sort_values returns a new frame, so reassign
df.plot.bar(color=['red','blue'])
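To see the label alignment without plotting, here is a small sketch using the first four tree types from the question. The variable names nyc and queens are stand-ins for tree_amount15 and queens_tree_types.

```python
import pandas as pd

# First four entries of each Series from the question, in their original order.
nyc = pd.Series({"London planetree": 87014, "honeylocust": 64264,
                 "Callery pear": 58931, "pin oak": 53185})
queens = pd.Series({"London planetree": 31111, "pin oak": 22610,
                    "honeylocust": 20290, "Callery pear": 16547})

# concat along axis=1 matches rows by index label, not by position.
df = pd.concat([nyc, queens], axis=1, keys=["NYC", "Queens"])
df = df.sort_values("NYC", ascending=False)
print(df)
```

Each row now holds both counts for the same tree name, so bars drawn from df sit next to the correct label regardless of how the two Series were ordered individually.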