Getting the nlargest of each group in a Multiindex Pandas Series - python

I have a DataFrame that consists of information about every NFL play that has occurred since 2009. My goal is to find out which teams had the most "big plays" in each season. To do this, I found all plays which gained over 20 yards, grouped them by year and team, and got the size of each of those groups.
big_plays = (df[df['yards_gained'] >= 20]
.groupby([df['game_date'].dt.year, 'posteam'])
.size())
This results in the following Series:
game_date posteam
2009 ARI 55
ATL 51
BAL 55
BUF 37
CAR 52
CHI 58
CIN 51
CLE 31
DAL 68
DEN 42
DET 42
GB 65
HOU 63
IND 67
JAC 51
KC 44
MIA 34
MIN 64
NE 48
NO 72
NYG 69
NYJ 54
OAK 38
PHI 68
PIT 72
SD 71
SEA 45
SF 51
STL 42
TB 51
..
2018 BAL 44
BUF 55
CAR 64
CHI 66
CIN 69
CLE 70
DAL 51
DEN 59
DET 51
GB 63
HOU 53
IND 57
JAX 51
KC 88
LA 80
LAC 77
MIA 47
MIN 56
NE 64
NO 66
NYG 70
NYJ 49
OAK 63
PHI 54
PIT 66
SEA 62
SF 69
TB 73
TEN 51
WAS 46
Length: 323, dtype: int64
So far, this is exactly what I want. However, I am stuck on the next step. I want the n-largest values for each group in the MultiIndex, or the n-teams with the most number of "big plays" per season.
I have semi-successfully solved this task in a cumbersome way. If I groupby the 0th level of the MultiIndex, then run the nlargest function on that groupby, I get the following (truncated to the first two years for brevity):
big_plays.groupby(level=0).nlargest(5)
returns
game_date game_date posteam
2009 2009 NO 72
PIT 72
SD 71
NYG 69
DAL 68
2010 2010 PHI 81
NYG 78
PIT 78
SD 75
DEN 73
This (rather inelegantly) solves the problem, but I'm wondering how I can better achieve more or less the same results.

In my opinion your code is nice; it only needs a small change: pass group_keys=False to Series.groupby to avoid the duplicated MultiIndex levels:
s = big_plays.groupby(level=0, group_keys=False).nlargest(5)
print (s)
game_date posteam
2009 NO 72
PIT 72
SD 71
NYG 69
DAL 68
...
2018 KC 88
LA 80
LAC 77
TB 73
CLE 70
dtype: int64
df = big_plays.groupby(level=0, group_keys=False).nlargest(5).reset_index(name='count')
print (df)
game_date posteam count
0 2009 NO 72
1 2009 PIT 72
2 2009 SD 71
3 2009 NYG 69
4 2009 DAL 68
5 2018 KC 88
6 2018 LA 80
7 2018 LAC 77
8 2018 TB 73
9 2018 CLE 70
An alternative is a bit more complicated:
df = (big_plays.reset_index(name='count')
.sort_values(['game_date','count'], ascending=[True, False])
.groupby('game_date')
.head(5))
print (df)
game_date posteam count
19 2009 NO 72
24 2009 PIT 72
25 2009 SD 71
20 2009 NYG 69
8 2009 DAL 68
43 2018 KC 88
44 2018 LA 80
45 2018 LAC 77
57 2018 TB 73
35 2018 CLE 70
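If you want to experiment with the approach outside the NFL data, here is a minimal, self-contained sketch on made-up numbers (the teams and counts below are illustrative only, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for big_plays: a Series with a (year, team) MultiIndex
big_plays = pd.Series(
    [72, 71, 68, 88, 80, 70],
    index=pd.MultiIndex.from_tuples(
        [(2009, 'NO'), (2009, 'SD'), (2009, 'DAL'),
         (2018, 'KC'), (2018, 'LA'), (2018, 'CLE')],
        names=['game_date', 'posteam']),
)

# Top 2 teams per year; group_keys=False keeps the original two-level index
top2 = big_plays.groupby(level=0, group_keys=False).nlargest(2)
print(top2)
```

The key point is that without group_keys=False the grouping key is prepended again, giving the triple index from the question.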

Related

Comparing Pandas DataFrame rows against two threshold values

I have two DataFrames shown below. The DataFrames in reality are larger than the sample below.
df1
route_no cost_h1 cost_h2 cost_h3 cost_h4 cost_h5 max min location
0 0010 20 22 21 23 26 26 20 NY
1 0011 30 25 23 31 33 33 23 CA
2 0012 67 68 68 69 65 69 67 GA
3 0013 34 33 31 30 35 35 31 MO
4 0014 44 42 40 39 50 50 39 WA
df2
route_no cost_h1 cost_h2 cost_h3 cost_h4 cost_h5 location
0 0020 19 27 21 24 20 NY
1 0021 31 22 23 30 33 CA
2 0023 66 67 68 70 65 GA
3 0022 34 33 31 30 35 MO
4 0025 41 42 40 39 50 WA
5 0030 19 26 20 24 20 NY
6 0032 37 31 31 20 35 MO
7 0034 40 41 39 39 50 WA
The idea is to compare each row of df2 against the appropriate max and min values specified in df1. The threshold values to compare against depend on the match in the location column. If any of the row values fall outside the range defined by the min and max values, the row will be put in a separate dataframe. Please note that the number of cost segments may vary.
Solution
# Merge the dataframes on location to append the min/max columns to df2
df3 = df2.merge(df1[['location', 'max', 'min']], on='location', how='left')
# select the cost like columns
cost = df3.filter(like='cost')
# Check whether the cost values satisfy the interval condition
mask = cost.ge(df3['min'], axis=0) & cost.le(df3['max'], axis=0)
# filter the rows where one or more values in row do not satisfy the condition
df4 = df2[~mask.all(axis=1)]
Result
print(df4)
route_no cost_h1 cost_h2 cost_h3 cost_h4 cost_h5 location
0 0020 19 27 21 24 20 NY
1 0021 31 22 23 30 33 CA
2 0023 66 67 68 70 65 GA
3 0022 34 33 31 30 35 MO
5 0030 19 26 20 24 20 NY
6 0032 37 31 31 20 35 MO
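The masking steps above can be run end-to-end on tiny made-up frames (column names follow the question; the numbers are invented for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'location': ['NY', 'CA'], 'max': [26, 33], 'min': [20, 23]})
df2 = pd.DataFrame({'route_no': ['0020', '0021', '0030'],
                    'cost_h1': [19, 31, 25],
                    'cost_h2': [27, 22, 24],
                    'location': ['NY', 'CA', 'NY']})

# Append min/max to each df2 row by location, build the interval mask,
# and keep the rows where at least one cost falls outside [min, max]
df3 = df2.merge(df1[['location', 'max', 'min']], on='location', how='left')
cost = df3.filter(like='cost')
mask = cost.ge(df3['min'], axis=0) & cost.le(df3['max'], axis=0)
df4 = df2[~mask.all(axis=1)]
print(df4)
```

Because filter(like='cost') picks up every cost column by name, the same code works however many cost segments a given file has.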

Not able to view CSV from Python Webscrape

I am new to python and am doing a webscraping tutorial. I am having trouble getting my CSV file in the appropriate folder. Basically, I am not able to view the resulting CSV. Does anyone have a solution regarding this problem?
import pandas as pd
import re
from bs4 import BeautifulSoup
import requests
#Pulling in website source code#
url = 'https://www.espn.com/mlb/history/leaders/_/breakdown/season/year/2022'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
#Pulling in player rows
##Identify Player Rows
players = soup.find_all('tr', attrs= {'class':re.compile('row-player-10-')})
for players in players:
    ##Pulling stats for each players
    stats = [stat.get_text() for stat in players.findall('td')]
    ##Create a data frame for the single player stats
    temp.df = pd.DataFrame(stats).transpose()
    temp.df = columns
    ##Join single players stats with the overall dataset
    final_dataframe = pd.concat([final_df,temp_df], ignore_index=True)
print(final_dataframe)
final_dataframe.to_csv(r'C\Users\19794\OneDrive\Desktop\Coding Projects', index = False, sep =',', encoding='utf-8')
I've checked your code and found one issue, here:
for players in players:
    ##Pulling stats for each players
    stats = [stat.get_text() for stat in players.findall('td')]
    ##Create a data frame for the single player stats
    temp.df = pd.DataFrame(stats).transpose()
    temp.df = columns
    ##Join single players stats with the overall dataset
    final_dataframe = pd.concat([final_df,temp_df], ignore_index=True)
print(final_dataframe)
final_dataframe.to_csv(r'C\Users\19794\OneDrive\Desktop\Coding Projects', index = False, sep =',', encoding='utf-8')
You have to use this instead (players changed to player, and the filename given a .csv extension):
for player in players:
    ##Pulling stats for each players
    stats = [stat.get_text() for stat in player.findall('td')]
    ##Create a data frame for the single player stats
    temp.df = pd.DataFrame(stats).transpose()
    temp.df = columns
    ##Join single players stats with the overall dataset
    final_dataframe = pd.concat([final_df,temp_df], ignore_index=True)
print(final_dataframe)
final_dataframe.to_csv(r'C:\Users\19794\OneDrive\Desktop\Coding Projects\result.csv', index = False, sep =',', encoding='utf-8')
A few issues:
As stated in the previous solution, you need to change your for loop to for player in players: since you can't reuse the same variable name as the sequence you are looping over.
You shouldn't use . in your variable names, as in temp.df; a dot indicates attribute or method access. Use an underscore instead: temp_df.
You never define final_df, yet try to use it in your pd.concat().
You never define columns, yet try to use it (and temp.df = columns would overwrite your temp_df as well). What you want instead is temp_df.columns = columns. But note you still need to define columns.
Your find_all() for the players is incorrect in that you're searching for a class that contains row-player-10-. There is no class with that. It is row player-10. Very subtle difference, but it's the difference between returning no elements and 50 elements.
stats = [stat.get_text() for stat in player.findall('td')] again needs to reference player from the for loop, as mentioned in 1). There are also a few syntax issues in there to actually pull out the text; it should be [stat.text for stat in player.find_all('td')].
You pd.concat the temp_df to a final_df within your loop. You can do that (provided you create an initial final_dataframe or final_df first; you use two different variable names, so I'm not sure which you wanted), but it will repeat the headers/column names and require an extra cleanup step. What I would rather do is store each temp_df in a list, and only after the loop has gone through all the players concat the list of dataframes into a final one.
So here is the full code:
import pandas as pd
import re
from bs4 import BeautifulSoup
import requests
#Pulling in website source code#
url = 'https://www.espn.com/mlb/history/leaders/_/breakdown/season/year/2022'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
#Pulling in player rows
##Identify Player Rows
players = soup.find_all('tr', attrs= {'class':re.compile('.*row player-10-.*')})
columns = soup.find('tr', {'class':'colhead'})
columns = [x.text for x in columns.find_all('td')]
#Initialize a list of dataframes
final_df_list = []
# Loop through the players
for player in players:
    ##Pulling stats for each players
    stats = [stat.text for stat in player.find_all('td')]
    ##Create a data frame for the single player stats
    temp_df = pd.DataFrame(stats).transpose()
    temp_df.columns = columns
    #Put temp_df in a list of dataframes
    final_df_list.append(temp_df)
##Join your list of single players stats
final_dataframe = pd.concat(final_df_list, ignore_index=True)
print(final_dataframe)
final_dataframe.to_csv(r'C:\Users\19794\OneDrive\Desktop\Coding Projects\result.csv', index = False, sep =',', encoding='utf-8')
Output:
print(final_dataframe)
PLAYER YRS G AB R H ... HR RBI BB SO SB CS BA
0 1 J.D. Martinez 11 54 211 38 74 ... 8 28 24 55 0 0 .351
1 2 Paul Goldschmidt 11 62 236 47 82 ... 16 56 35 50 3 0 .347
2 3 Xander Bogaerts 9 62 232 39 77 ... 6 31 23 50 3 0 .332
3 4 Rafael Devers 5 63 258 53 85 ... 16 40 18 49 1 0 .329
4 5 Manny Machado 10 63 244 46 80 ... 11 43 29 46 7 1 .328
5 6 Jeff McNeil 4 61 216 30 70 ... 4 32 16 27 2 0 .324
6 7 Ty France 3 63 249 29 79 ... 10 41 18 40 0 0 .317
7 8 Bryce Harper 10 58 225 46 71 ... 15 46 24 48 7 2 .316
8 9 Yordan Alvarez 3 57 205 39 64 ... 17 45 31 38 0 1 .312
9 10 Aaron Judge 6 61 232 53 72 ... 25 49 31 66 4 0 .310
10 11 Jose Ramirez 9 59 222 40 68 ... 16 62 34 19 11 3 .306
11 12 Andrew Benintendi 6 61 226 23 68 ... 2 22 24 37 0 0 .301
12 13 Michael Brantley 13 55 207 23 62 ... 4 21 28 24 1 1 .300
13 14 Trea Turner 7 62 242 32 72 ... 8 47 21 48 13 2 .298
14 15 J.P. Crawford 5 59 216 28 64 ... 5 16 28 37 3 1 .296
15 16 Dansby Swanson 6 64 234 39 69 ... 9 37 23 70 9 2 .295
16 17 Mike Trout 11 57 201 44 59 ... 18 38 30 64 0 0 .294
17 Josh Bell 6 65 235 33 69 ... 8 39 28 37 0 1 .294
18 19 Santiago Espinal 2 63 219 25 64 ... 5 31 18 40 3 2 .292
19 20 Trey Mancini 5 58 217 25 63 ... 6 25 24 47 0 0 .290
20 21 Austin Hays 4 60 228 33 66 ... 9 37 18 41 1 3 .289
21 22 Eric Hosmer 11 59 222 23 64 ... 4 29 22 38 0 0 .288
22 23 Freddie Freeman 12 62 241 40 69 ... 5 34 32 43 6 0 .286
23 24 C.J. Cron 8 64 249 36 71 ... 14 44 16 74 0 0 .285
24 Tommy Edman 3 63 246 52 70 ... 7 26 26 45 15 2 .285
25 26 Starling Marte 10 54 222 40 63 ... 7 34 10 45 8 5 .284
26 27 Ian Happ 5 61 209 30 59 ... 7 31 34 50 5 1 .282
27 28 Pete Alonso 3 64 239 41 67 ... 18 59 26 56 2 1 .280
28 29 Lourdes Gurriel Jr. 4 58 206 21 57 ... 3 25 15 41 2 1 .277
29 30 Nathaniel Lowe 3 58 217 25 60 ... 8 24 15 57 1 1 .276
30 31 Mookie Betts 8 60 245 53 67 ... 17 40 27 47 6 1 .273
31 32 Jose Abreu 8 59 224 34 61 ... 9 30 33 42 0 0 .272
32 Amed Rosario 5 53 217 31 59 ... 1 16 10 31 7 1 .272
33 Ke'Bryan Hayes 2 57 213 26 58 ... 2 22 26 53 7 3 .272
34 35 Nolan Arenado 9 61 229 28 62 ... 11 41 25 31 0 2 .271
35 George Springer 8 58 218 39 59 ... 12 33 20 51 4 1 .271
36 37 Ryan Mountcastle 2 53 211 28 57 ... 12 35 11 57 2 0 .270
37 Vladimir Guerrero Jr. 3 62 233 34 63 ... 16 39 27 45 0 1 .270
38 39 Cesar Hernandez 9 65 271 37 73 ... 0 16 17 55 2 2 .269
39 Ketel Marte 7 61 223 33 60 ... 4 22 22 45 4 0 .269
40 Connor Joe 2 60 238 32 64 ... 5 16 32 52 3 2 .269
41 42 Brandon Nimmo 6 57 209 36 56 ... 4 21 27 44 0 1 .268
42 Thairo Estrada 3 59 205 34 55 ... 4 26 14 31 9 1 .268
43 44 Shohei Ohtani 4 63 243 42 64 ... 13 37 24 67 7 5 .263
44 45 Randy Arozarena 3 61 233 30 61 ... 7 31 14 58 12 5 .262
45 46 Nelson Cruz 17 60 222 29 58 ... 7 36 25 50 2 0 .261
46 Hunter Dozier 5 55 203 25 53 ... 6 21 15 50 1 2 .261
47 48 Kyle Tucker 4 58 204 24 53 ... 12 39 31 41 11 1 .260
48 Bo Bichette 3 63 265 35 69 ... 10 33 17 65 4 3 .260
49 50 Charlie Blackmon 11 57 232 29 60 ... 10 33 17 41 2 1 .259
[50 rows x 16 columns]
Lastly, tables are a great way to learn how to use BeautifulSoup because of their structure. But I do want to throw out there that pandas can parse <table> tags for you with less code:
import pandas as pd
url = 'https://www.espn.com/mlb/history/leaders/_/breakdown/season/year/2022'
final_dataframe = pd.read_html(url, header=1)[0]
final_dataframe = final_dataframe[final_dataframe['PLAYER'].ne('PLAYER')]

Pandas: How to structure data in more than two dimensions?

I have a dataframe of quantities for each of a number products (below labeled 'A','B','C',etc) for each month...
import pandas as pd
import numpy as np
np.random.seed(0)
range = pd.date_range('2020-01-31', periods=12, freq='M')
column_names = list('ABCDEFGH')
quantities = pd.DataFrame(np.random.randint(0,100,size=(12, 8)), index=range, columns=column_names)
quantities
# Output
# A B C D E F G H
# 2020-01-31 44 47 64 67 67 9 83 21
# 2020-02-29 36 87 70 88 88 12 58 65
# 2020-03-31 39 87 46 88 81 37 25 77
# 2020-04-30 72 9 20 80 69 79 47 64
# 2020-05-31 82 99 88 49 29 19 19 14
# 2020-06-30 39 32 65 9 57 32 31 74
# 2020-07-31 23 35 75 55 28 34 0 0
# 2020-08-31 36 53 5 38 17 79 4 42
# 2020-09-30 58 31 1 65 41 57 35 11
# 2020-10-31 46 82 91 0 14 99 53 12
# 2020-11-30 42 84 75 68 6 68 47 3
# 2020-12-31 76 52 78 15 20 99 58 23
I also have a dataframe of the unit costs for each product for each month. And from these, I have calculated a third dataframe of the costs (quantity x unit cost) for each product for each month.
unit_costs = pd.DataFrame(np.random.rand(12, 8), index=range, columns=column_names)
costs = quantities*unit_costs
The code below produces a dataframe of the bill for the first month (bill0)...
bill0 = pd.DataFrame({'quantity': quantities.iloc[0],'unit_cost': unit_costs.iloc[0],'cost': costs.iloc[0]})
bill0
# Output
# quantity unit_cost cost
# A 44 0.338008 14.872335
# B 47 0.674752 31.713359
# C 64 0.317202 20.300911
# D 67 0.778345 52.149147
# E 67 0.949571 63.621261
# F 9 0.662527 5.962742
# G 83 0.013572 1.126446
# H 21 0.622846 13.079768
I would like to efficiently produce a dataframe of the bill for any specific month. It seems that a 3D data structure is required, and I'm too new to python to know how to approach it.
Perhaps an array of bill dataframes - one for each month? (If so, how?)
Or perhaps the quantities, unit_costs, and amounts dataframes should first be combined into a multi-indexed dataframe and then that could be filtered (or otherwise manipulated) to produce the bill dataframe for whichever month I'm after? (If so, how?)
Or is there a more elegant way of doing this?
Thanks so much for your time!
IIUC, you can use MultiIndex column headers:
pd.concat(
[quantities, unit_costs, costs], keys=["Quantity", "Unit Cost", "Cost"], axis=1
).swaplevel(0, 1, axis=1).sort_index(level=0, axis=1)
Output (just printed A and B but dataframe has all products):
A B
Cost Quantity Unit Cost Cost Quantity Unit Cost
2020-01-31 14.872335 44 0.338008 31.713359 47 0.674752
2020-02-29 24.251747 36 0.673660 84.559215 87 0.971945
2020-03-31 38.203882 39 0.979587 31.271668 87 0.359444
2020-04-30 62.287384 72 0.865103 4.580721 9 0.508969
2020-05-31 53.068279 82 0.647174 83.297226 99 0.841386
2020-06-30 22.215118 39 0.569618 22.519593 32 0.703737
2020-07-31 20.505752 23 0.891554 23.801945 35 0.680056
2020-08-31 8.992666 36 0.249796 16.600572 53 0.313218
2020-09-30 35.890897 58 0.618809 14.720893 31 0.474868
2020-10-31 4.731714 46 0.102863 7.574659 82 0.092374
2020-11-30 5.933084 42 0.141264 8.169834 84 0.097260
2020-12-31 35.662937 76 0.469249 43.739287 52 0.841140
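One way to pull a single month's bill out of such a combined frame is to select that month's row and unstack it back into a products-by-measures table. A small self-contained sketch on made-up numbers (the measure labels are my own choice, and here I concat without the swaplevel for brevity):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31'])
quantities = pd.DataFrame(np.random.randint(0, 100, size=(3, 2)),
                          index=dates, columns=['A', 'B'])
unit_costs = pd.DataFrame(np.random.rand(3, 2), index=dates, columns=['A', 'B'])
costs = quantities * unit_costs

# Stack the three measures side by side under MultiIndex column headers
combined = pd.concat([quantities, unit_costs, costs],
                     keys=['quantity', 'unit_cost', 'cost'], axis=1)

# The bill for any one month is a single row, reshaped to products x measures
bill = combined.loc[dates[0]].unstack(0)
print(bill)
```

This reproduces the shape of bill0 from the question for whichever date you index with.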

Simple python question about data visualization

I want a bar plot that shows the number of all diseases in 2000 for Albania.
I tried this, but I could not get what I want.
fig, ax = plt.subplots()
ax.bar(df[country['Albania']], df['2000'])
plt.xlabel('Fruit', fontsize=17, fontname='Times New Roman')
plt.ylabel('Spent', fontsize=17, fontname='Times New Roman')
plt.title('Share of diseases in Albania in 2000 ', fontsize=17, fontname="Times New Roman")
plt.show()
Let's first set up a dummy example:
import numpy as np
import pandas as pd
import itertools
np.random.seed(0)
df = pd.DataFrame({('Country_%s' % c, y): {'disease_%d' % (i+1): np.random.randint(100)
                                           for i in range(4)}
                   for c, y in itertools.product(list('ABCD'), range(1998, 2002))
                   }).T
df.index.names = ('country', 'year')
disease_1 disease_2 disease_3 disease_4
country year
Country_A 1998 44 47 64 67
1999 67 9 83 21
2000 36 87 70 88
2001 88 12 58 65
Country_B 1998 39 87 46 88
1999 81 37 25 77
2000 72 9 20 80
2001 69 79 47 64
Country_C 1998 82 99 88 49
1999 29 19 19 14
2000 39 32 65 9
2001 57 32 31 74
Country_D 1998 23 35 75 55
1999 28 34 0 0
2000 36 53 5 38
2001 17 79 4 42
You can then subset one multi-indexed row per country and year
df.loc[('Country_B', 2000)]
output:
disease_1 72
disease_2 9
disease_3 20
disease_4 80
Name: (Country_B, 2000), dtype: int64
and plot (here using pandas+matplotlib):
ax = df.loc[('Country_B', 2000)].plot.bar()
ax.set_ylabel('number of cases')
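For a quick self-contained check, the same bar plot can be driven from a plain Series (the counts below are the Country_B/2000 values from the dummy example; the Agg backend is only there to keep the script non-interactive):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, safe for scripts without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series({'disease_1': 72, 'disease_2': 9, 'disease_3': 20, 'disease_4': 80})
ax = s.plot.bar()
ax.set_ylabel('number of cases')

# each bar is a Rectangle patch whose height is the plotted value
heights = [patch.get_height() for patch in ax.patches]
```

Selecting the row first and letting pandas plot it avoids the broken df[country['Albania']] indexing in the question.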

Delete rows in pandas which match your header

I'm kind of new to pandas and now I have a question.
I read a table from an HTML site and set my header according to the table on the website.
df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)
Now I have my dataframe with a matching header, BUT I have some rows that are the same as the header, like the example below.
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG
1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06 253
2 John Tavares, C NYI 82 38 48 86 5 46 1.05 278
...
10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95 264
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG
14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88 268
I know that it's possible to delete duplicate rows with pandas, but is it possible to delete rows that are duplicates of the header or of a specific row?
Hope you can help me out!
You can use boolean indexing:
df = df[df.PLAYER != 'PLAYER']
If you also need to remove rows with PP in column PLAYER, use isin.
Notice: I add [0] to the end of read_html, because it returns a list of dataframes and you need to select the first item of the list:
df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)[0]
print (df)
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G \
0 1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06
1 2 John Tavares, C NYI 82 38 48 86 5 46 1.05
2 3 Sidney Crosby, C PIT 77 28 56 84 5 47 1.09
3 4 Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00
4 NaN Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99
5 6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95
6 7 Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08
7 8 Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97
8 NaN Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93
9 10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95
10 NaN PP SH NaN NaN NaN NaN NaN NaN NaN
11 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
12 NaN Nick Foligno, LW CBJ 79 31 42 73 16 50 0.92
13 NaN Claude Giroux, C PHI 81 25 48 73 -3 36 0.90
14 NaN Henrik Sedin, C VAN 82 18 55 73 11 22 0.89
15 14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88
...
...
mask = df['PLAYER'].isin(['PLAYER', 'PP'])
print (df[~mask])
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG \
0 1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06 253
1 2 John Tavares, C NYI 82 38 48 86 5 46 1.05 278
2 3 Sidney Crosby, C PIT 77 28 56 84 5 47 1.09 237
3 4 Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00 395
4 NaN Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99 221
5 6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95 153
6 7 Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08 280
7 8 Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97 158
8 NaN Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93 226
9 10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95 264
12 NaN Nick Foligno, LW CBJ 79 31 42 73 16 50 0.92 182
13 NaN Claude Giroux, C PHI 81 25 48 73 -3 36 0.90 279
14 NaN Henrik Sedin, C VAN 82 18 55 73 11 22 0.89 101
15 14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88 268
16 NaN Tyler Johnson, C TB 77 29 43 72 33 24 0.94 203
17 16 Ryan Johansen, C CBJ 82 26 45 71 -6 40 0.87 202
18 17 Joe Pavelski, C SJ 82 37 33 70 12 29 0.85 261
19 NaN Evgeni Malkin, C PIT 69 28 42 70 -2 60 1.01 212
20 NaN Ryan Getzlaf, C ANA 77 25 45 70 15 62 0.91 191
21 20 Rick Nash, LW NYR 79 42 27 69 29 36 0.87 304
...
...
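The boolean-indexing idea in isolation, on a tiny made-up frame where the header line was re-read as a data row (as happens with read_html on paginated tables):

```python
import pandas as pd

# Header row 'RK'/'PLAYER' re-appears in the middle of the data
df = pd.DataFrame({'RK': ['1', '2', 'RK', '3'],
                   'PLAYER': ['Jamie Benn, LW', 'John Tavares, C',
                              'PLAYER', 'Sidney Crosby, C']})

# Keep only the rows whose PLAYER cell is not literally 'PLAYER'
clean = df[df['PLAYER'] != 'PLAYER'].reset_index(drop=True)
print(clean)
```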
