Soccer Stats Python Scraper

Soccer Stats Python Scraper - python

I'm looking to scrape some Houston Dynamo stats from this season into a CSV and then visualize that data with R.
How can I scrape both the tr and td elements using lxml? Is there an easier selector I should be looking at?

For (reasonably) well formed HTML tables, the XML package in R makes this sort of thing pretty stupidly easy:
library(XML)
> url <- "http://www.houstondynamo.com/stats/season?page=0"
> tbl <- readHTMLTable(url)
> head(tbl[[1]])
Player POS GP GS MINS G A SHTS SOG GWG PKG/A HmG RdG G/90min SC%
1 Will Bruin F 32 31 2510 12 4 78 35 0 0/0 6 6 0.43 15.4
2 Brad Davis M 31 28 2523 8 12 53 22 3 3/4 5 3 0.29 15.1
3 Brian Ching F 30 13 1385 5 5 35 15 1 2/2 2 3 0.32 14.3
4 Boniek Garcia M 17 17 1530 4 6 30 12 1 0/0 3 1 0.24 13.3
5 Calen Carr M 26 17 1512 4 2 29 11 2 0/0 3 1 0.24 13.8
6 Macoumba Kandji F 29 21 1630 4 2 34 16 1 0/0 3 1 0.22 11.8

Related

How can I turn this into a DataFrame?

I am new to Python, and was trying to run a basic web scraper. My code looks like this
import requests
import pandas as pd
x = requests.get('https://www.baseball-reference.com/players/p/penaje02.shtml')
dfs = pd.read_html(x.content)
print(dfs)
df = pd.DataFrame(dfs)
when printing dfs it looks like this. I only want the second table.
[ Year Age Tm Lg G PA AB \
0 2018 20 HOU-min A- 36 156 136
1 2019 21 HOU-min A,A+ 109 473 409
2 2021 23 HOU-min AAA,Rk 37 160 145
3 2022 24 HOU AL 136 558 521
4 1 Yr 1 Yr 1 Yr 1 Yr 136 558 521
5 162 Game Avg. 162 Game Avg. 162 Game Avg. 162 Game Avg. 162 665 621
R H 2B ... OPS OPS+ TB GDP HBP SH SF IBB Pos \
0 22 34 5 ... 0.649 NaN 42 0 1 0 1 0 NaN
1 72 124 21 ... 0.825 NaN 180 4 11 0 6 0 NaN
2 25 43 5 ... 0.942 NaN 84 0 7 0 0 0 NaN
3 72 132 20 ... 0.715 101.0 222 5 6 1 5 0 *6/H
4 72 132 20 ... 0.715 101.0 222 5 6 1 5 0 NaN
5 86 157 24 ... 0.715 101.0 264 6 7 1 6 0 NaN
Awards
0 TRC · NYPL
1 DAV,FAY · MIDW,CARL
2 SKT,AST · AAAW,FCL
3 GG
4 NaN
5 NaN
[6 rows x 30 columns]]
however, i end up with error Must pass 2-d input. shape=(1, 6, 30) after my last line. I have tried using df=dfs[1], but got the error list index our of range. Any way i can turn dfs from a list to a datframe?

What do you mean you only want the second table? There's only one table, it's 6 rows and 30 columns. The backslashes show up when whatever you're trying to print to isn't wide enough to contain the dataframe without line wrapping. Here's the dataframe printed in a wider terminal:
The pd.read_html() function returns a List[DataFrame] so you first need to grab your dataframe from the list, and then you can subset it to get the columns you care about:
df = dfs[0]
columns = ['R', 'H', '2B', '3B', 'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'BA', 'OBP', 'SLG', 'OPS', 'OPS+', 'TB', 'GDP', 'HBP', 'SH', 'SF', 'IBB', 'Pos']
print(df[columns])
Output:
R H 2B 3B HR RBI SB CS BB SO BA OBP SLG OPS OPS+ TB GDP HBP SH SF IBB Pos
0 22 34 5 0 1 10 3 0 18 19 0.250 0.340 0.309 0.649 NaN 42 0 1 0 1 0 NaN
1 72 124 21 7 7 54 20 10 47 90 0.303 0.385 0.440 0.825 NaN 180 4 11 0 6 0 NaN
2 25 43 5 3 10 21 6 1 8 41 0.297 0.363 0.579 0.942 NaN 84 0 7 0 0 0 NaN
3 72 132 20 2 22 63 11 2 22 135 0.253 0.289 0.426 0.715 101.0 222 5 6 1 5 0 *6/H
4 72 132 20 2 22 63 11 2 22 135 0.253 0.289 0.426 0.715 101.0 222 5 6 1 5 0 NaN
5 86 157 24 2 26 75 13 2 26 161 0.253 0.289 0.426 0.715 101.0 264 6 7 1 6 0 NaN

Drop certain rows based on quantity of rows with specific values

I am newer data science and am working on a project to analyze sports statistics. I have a dataset of hockey statistics for a group of players over multiple seasons. Players have anywhere between 1 row to 12 rows representing their season statistics over however many seasons they've played.
Example:
Player Season Pos GP G A P +/- PIM P/GP ... PPG PPP SHG SHP OTG GWG S S% TOI/GP FOW%
0 Nathan MacKinnon 2022 1 65 32 56 88 22 42 1.35 ... 7 27 0 0 1 5 299 10.7 21.07 45.4
1 Nathan MacKinnon 2021 1 48 20 45 65 22 37 1.35 ... 8 25 0 0 0 2 206 9.7 20.37 48.5
2 Nathan MacKinnon 2020 1 69 35 58 93 13 12 1.35 ... 12 31 0 0 2 4 318 11.0 21.22 43.1
3 Nathan MacKinnon 2019 1 82 41 58 99 20 34 1.21 ... 12 37 0 0 1 6 365 11.2 22.08 43.7
4 Nathan MacKinnon 2018 1 74 39 58 97 11 55 1.31 ... 12 32 0 1 3 12 284 13.7 19.90 41.9
5 Nathan MacKinnon 2017 1 82 16 37 53 -14 16 0.65 ... 2 14 2 2 2 4 251 6.4 19.95 50.6
6 Nathan MacKinnon 2016 1 72 21 31 52 -4 20 0.72 ... 7 16 0 1 0 6 245 8.6 18.87 48.4
7 Nathan MacKinnon 2015 1 64 14 24 38 -7 34 0.59 ... 3 7 0 0 0 2 192 7.3 17.05 47.0
8 Nathan MacKinnon 2014 1 82 24 39 63 20 26 0.77 ... 8 17 0 0 0 5 241 10.0 17.35 42.9
9 J.T. Compher 2022 2 70 18 15 33 6 25 0.47 ... 4 6 1 1 0 0 102 17.7 16.32 51.4
10 J.T. Compher 2021 2 48 10 8 18 10 19 0.38 ... 1 2 0 0 0 2 47 21.3 14.22 45.9
11 J.T. Compher 2020 2 67 11 20 31 9 18 0.46 ... 1 5 0 3 1 3 106 10.4 16.75 47.7
12 J.T. Compher 2019 2 66 16 16 32 -8 31 0.48 ... 4 9 3 3 0 3 118 13.6 17.48 49.2
13 J.T. Compher 2018 2 69 13 10 23 -29 20 0.33 ... 4 7 2 2 2 3 131 9.9 16.00 45.1
14 J.T. Compher 2017 2 21 3 2 5 0 4 0.24 ... 1 1 0 0 0 1 30 10.0 14.93 47.6
15 Darren Helm 2022 1 68 7 8 15 -5 14 0.22 ... 0 0 1 2 0 1 93 7.5 10.55 44.2
16 Darren Helm 2021 1 47 3 5 8 -3 10 0.17 ... 0 0 0 0 0 0 83 3.6 14.68 66.7
17 Darren Helm 2020 1 68 9 7 16 -6 37 0.24 ... 0 0 1 2 0 0 102 8.8 13.73 53.6
18 Darren Helm 2019 1 61 7 10 17 -11 20 0.28 ... 0 0 1 4 0 0 107 6.5 14.57 44.4
19 Darren Helm 2018 1 75 13 18 31 3 39 0.41 ... 0 0 2 4 0 0 141 9.2 15.57 44.1
[sample of my dataset][1]
[1]: https://i.stack.imgur.com/7CsUd.png
If any player has played more than 6 seasons, I want to drop the row corresponding to Season 2021. This is because COVID drastically shortened the season and it is causing issues as I work with averages.
As you can see from the screenshot, Nathan MacKinnon has played 9 seasons. Across those 9 seasons, except for 2021, he plays in no fewer than 64 games. Due to the shortened season of 2021, he only got 48 games.
Removing Season 2021 results in an Average Games Played of 73.75.
Keeping Season 2021 in the data, the Average Games Played becomes 70.89.
While not drastic, it compounds into the other metrics as well.
I have been trying this for a little while now, but as I mentioned, I am new to this world and am struggling to figure out how to accomplish this.
I don't want to just completely drop ALL rows for 2021 across all players, though, as some players only have 1-5 years' worth of data and for those players, I need to use as much data as I can and remove 1 row from a player with only 2 seasons would also negatively skew averages.
I would really appreciate some assistance from anyone more experienced than me!

This can be accomplished by using groupby and apply. For example:
edited_players = (players
.groupby("Player")
.apply(lambda subset: subset if len(subset) <= 6 else subset.query("Season != 2021"))
)
Round brackets for formatting purposes.
The combination of groupby and apply basically feeds a grouped subset of your dataframe to a function. So, first all the rows of Nathan MacKinnon will be used, then rows for J.T. Compher, then Darren Helm rows, etc.
The function used is an anonymous/lambda function which operates under the following logic: "if the dataframe subset that I receive has 6 or fewer rows, I'll return the subset unedited. Otherwise, I will filter out rows within that subset which have the value 2021 in the Season column".

In Selenium see if the 'a' anchor tag contains the text I want, and then extract multiple td's of text in the same row

For my python code, I have been trying to scrape data from NCAAF Stats. I have been having issues extracting the td's text after I evaluate if the anchor tag 'a', contains the text I want. I want to be able to find the teams amount of tds, points, and ppg. I have been able to successfully find the school by text in selenium, but after that I am unable to extract the info I want. Here is what I have coded so far.
from selenium import webdriver
driver = webdriver.Chrome('C:\\Users\\Carl\\Downloads\\chromedriver.exe')
driver.get('https://www.ncaa.com/stats/football/fbs/current/team/27')
# I plan to make a while or for loop later, that is why I used f strings
team = "Coastal Carolina"
first = driver.find_element_by_xpath(f'//a[text()="{team}"]')
# This was the way another similiarly asked question was answered but did not work
#tds = driver.find_element_by_xpath(f'//td//a[text()="{apples}"]/../td[4]').text
# This grabs data from the very first row of data... not the one I want
tds = first.find_element_by_xpath('//following-sibling::td[4]').text
total_points = first.find_element_by_xpath('//following-sibling::td[10]').text
ppg = first.find_element_by_xpath('//following-sibling::td[11]').text
print(tds, total_points, ppg)
driver.quit()
I have tried to look around for a similarly asked question and was able to find this snippet
tds = driver.find_element_by_xpath(f'//td//a[text()="{apples}"]/../td[4]').text
it unfortunately did not help me out much. The html structure looks like this. I appreciate any help, and thank you!

No need to use Selenium, the page isn't dynamic. Just use pandas to parse the table for you:
import pandas as pd
url = 'https://www.ncaa.com/stats/football/fbs/current/team/27'
dfs = pd.read_html(url)[0]
Output:
print(df)
Rank Team G TDs PAT 2PT Def Pts FG Saf Pts PPG
0 1 Ohio St. 6 39 39 0 0 6 0 291.0 48.5
1 2 Pittsburgh 6 40 36 0 0 4 1 290.0 48.3
2 3 Coastal Carolina 7 43 42 0 0 6 1 320.0 45.7
3 4 Alabama 7 41 40 1 0 9 0 315.0 45.0
4 5 Ole Miss 6 35 30 1 0 6 1 262.0 43.7
5 6 Cincinnati 6 36 34 1 0 3 0 261.0 43.5
6 7 Oklahoma 7 35 34 1 1 17 0 299.0 42.7
7 - SMU 7 40 36 1 0 7 0 299.0 42.7
8 9 Texas 7 38 37 0 0 8 1 291.0 41.6
9 10 Western Ky. 6 31 27 1 0 10 0 245.0 40.8
10 11 Tennessee 7 36 36 0 0 7 1 275.0 39.3
11 12 Wake Forest 6 28 24 2 0 12 0 232.0 38.7
12 13 UTSA 7 33 33 0 0 13 0 270.0 38.6
13 14 Michigan 6 28 25 1 0 12 0 231.0 38.5
14 15 Georgia 7 34 33 0 0 10 1 269.0 38.4
15 16 Baylor 7 35 35 0 0 7 1 268.0 38.3
16 17 Houston 6 30 28 0 0 5 0 223.0 37.2
17 - TCU 6 29 28 0 0 7 0 223.0 37.2
18 19 Marshall 7 34 33 0 0 7 0 258.0 36.9
19 - North Carolina 7 34 32 2 0 6 0 258.0 36.9
20 21 Nevada 6 26 24 1 0 12 0 218.0 36.3
21 22 Virginia 7 31 29 2 0 10 2 253.0 36.1
22 23 Fresno St. 7 32 27 1 0 10 0 251.0 35.9
23 - Memphis 7 33 26 3 0 7 0 251.0 35.9
24 25 Texas Tech 7 32 31 0 0 9 0 250.0 35.7
25 26 Auburn 7 29 28 1 0 12 1 242.0 34.6
26 27 Florida 7 33 29 1 0 4 0 241.0 34.4
27 - Missouri 7 31 31 0 0 8 0 241.0 34.4
28 29 Liberty 7 33 29 1 0 3 1 240.0 34.3
29 - Michigan St. 7 30 30 0 0 10 0 240.0 34.3
30 31 UCF 6 28 26 0 0 3 1 205.0 34.2
31 32 Oregon St. 6 27 27 0 0 5 0 204.0 34.0
32 33 Oregon 6 26 26 0 0 7 0 203.0 33.8
33 34 Iowa St. 6 23 22 0 0 14 0 202.0 33.7
34 35 UCLA 7 30 28 0 0 9 0 235.0 33.6
35 36 San Diego St. 6 25 24 1 0 7 0 197.0 32.8
36 37 LSU 7 29 29 0 0 8 0 227.0 32.4
37 38 Louisville 6 24 23 0 0 9 0 194.0 32.3
38 - Miami (FL) 6 24 22 1 0 8 1 194.0 32.3
39 - NC State 6 25 24 0 0 6 1 194.0 32.3
40 41 Southern California 6 22 19 3 0 12 0 193.0 32.2
41 42 Tulane 7 31 23 4 0 2 0 223.0 31.9
42 43 Arizona St. 7 30 25 2 0 4 0 221.0 31.6
43 44 Utah 6 25 22 1 0 5 0 189.0 31.5
44 45 Air Force 7 29 27 1 0 5 1 220.0 31.4
45 46 App State 7 27 24 0 0 11 0 219.0 31.3
46 47 Arkansas 7 27 25 0 0 10 0 217.0 31.0
47 - Army West Point 6 25 22 0 0 4 1 186.0 31.0
48 - Notre Dame 6 23 20 2 0 8 0 186.0 31.0
49 - Western Mich. 7 28 25 0 0 8 0 217.0 31.0

Dataframe Interpolation based on table values

I have a dataset with different lats values, the range of these latitudes are between 0 to 20, and months between 1-12
How can I compute a new row in my dataset that has the result of N by each latitude and month?
As the latitude is not integer value in my dataset its necessary to do the interpolation
INPUT DATASET
LAT YEAR MONTH
0 11 2000 1
1 9 2000 2
2 11 2000 3
3 10 2000 4
4 17 2000 5
5 6 2000 6
6 18 2000 7
7 11 2000 8
8 17 2000 9
9 12 2000 10
10 19 2000 11
11 8 2000 12
12 14 2001 1
13 13 2001 2
14 14 2001 3
15 12 2001 4
16 12 2001 5
17 5 2001 6
18 18 2001 7
19 13 2001 8
20 7 2001 9
21 18 2001 10
22 12 2001 11
23 10 2001 12
24 14 2002 1
25 14 2002 2
26 20 2002 3
27 20 2002 4
28 9 2002 5
29 15 2002 6
30 15 2002 7
31 5 2002 8
32 7 2002 9
33 5 2002 10
34 6 2002 11
35 7 2002 12
N values by month according to the latitude
1 2 3 4 5 6 7 8 9 10 11 12
lat
0 1.04 0.94 1.04 1.01 1.04 1.01 1.04 1.04 1.01 1.04 1.01 1.04
10 1.00 0.91 1.03 1.03 1.08 1.06 1.08 1.07 1.02 1.02 0.98 0.99
15 0.97 0.91 1.03 1.04 1.11 1.08 1.12 1.08 1.02 1.01 0.95 0.97
20 0.95 0.90 1.03 1.65 1.13 1.11 1.14 1.12 1.02 1.00 0.93 0.94
the code for N values table is:
data2 = {"lat":[0,10,15,20],"1":[1.04,1,0.97,0.95],"2":[0.94,0.91,0.91,0.9],"3":[1.04,1.03,1.03,1.03],
"4":[1.01,1.03,1.04,1.65],"5":[1.04,1.08,1.11,1.13],"6":[1.01,1.06,1.08,1.11],"7":[1.04,1.08,1.12,1.14],"8":[1.04,1.07,1.08,1.12],
"9":[1.01,1.02,1.02,1.02],"10":[1.04,1.02,1.01,1],"11":[1.01,0.98,0.95,0.93],"12":[1.04,0.99,0.97,0.94]}
df2 = pd.DataFrame(data2)
For example if the latitude is 20 and the month is 3 the result in the N column must be 1.03, if the latitude is 11 now and the month is 1 the result in N column must be 0.97 more or less

Using pd.interpolate() method would come extremely handy in your case:
import pandas as pd
data2 = {"lat":[0,10,15,20],"1":[1.04,1,0.97,0.95],"2":[0.94,0.91,0.91,0.9],"3":[1.04,1.03,1.03,1.03],
"4":[1.01,1.03,1.04,1.65],"5":[1.04,1.08,1.11,1.13],"6":[1.01,1.06,1.08,1.11],"7":[1.04,1.08,1.12,1.14],"8":[1.04,1.07,1.08,1.12],
"9":[1.01,1.02,1.02,1.02],"10":[1.04,1.02,1.01,1],"11":[1.01,0.98,0.95,0.93],"12":[1.04,0.99,0.97,0.94]}
df2 = pd.DataFrame(data2)
df2 = df2.set_index('lat')
index_set = df2.index.unique()
for i in range(20):
if i not in index_set:
df2.loc[i] = pd.Series()
df2 = df2.sort_values(by=['lat'])
res_df = df2.interpolate()
tdf = pd.read_csv('try.tsv', sep='\s+', header=None, index_col=None)
tdf.columns = ['id', 'lat', 'year', 'month']
tdf['lat'] = tdf.lat.astype(int)
tdf['N'] = tdf.apply(lambda x: res_df.loc[x['lat'], str(x['month'])], axis=1)
print(tdf)
Output:
id lat year month N
0 0 11 2000 1 0.994
1 1 9 2000 2 0.913
2 2 11 2000 3 1.030
3 3 10 2000 4 1.030
4 4 17 2000 5 1.118
5 5 6 2000 6 1.040
6 6 18 2000 7 1.132
7 7 11 2000 8 1.072
8 8 17 2000 9 1.020
9 9 12 2000 10 1.016
10 10 19 2000 11 0.934
11 11 8 2000 12 1.000
12 12 14 2001 1 0.976
13 13 13 2001 2 0.910
14 14 14 2001 3 1.030
15 15 12 2001 4 1.034
16 16 12 2001 5 1.092
17 17 5 2001 6 1.035
18 18 18 2001 7 1.132
19 19 13 2001 8 1.076
20 20 7 2001 9 1.017
21 21 18 2001 10 1.004
22 22 12 2001 11 0.968
23 23 10 2001 12 0.990
24 24 14 2002 1 0.976
25 25 14 2002 2 0.910
26 26 20 2002 3 1.030
27 27 20 2002 4 1.650
28 28 9 2002 5 1.076
29 29 15 2002 6 1.080
30 30 15 2002 7 1.120
31 31 5 2002 8 1.055
32 32 7 2002 9 1.017
33 33 5 2002 10 1.030
34 34 6 2002 11 0.992
35 35 7 2002 12 1.005

Sum values of a column for each value based on another column and divide it by total

Today I'm struggling once again with Python and data-analytics.
I got a dataframe which looks like this:
name totdmgdealt
0 Warwick 96980.0
1 Nami 25995.0
2 Draven 171568.0
3 Fiora 113721.0
4 Viktor 185302.0
5 Skarner 148791.0
6 Galio 130692.0
7 Ahri 145731.0
8 Jinx 182680.0
9 VelKoz 85785.0
10 Ziggs 46790.0
11 Cassiopeia 62444.0
12 Yasuo 117896.0
13 Warwick 129156.0
14 Evelynn 179252.0
15 Caitlyn 163342.0
16 Wukong 122919.0
17 Syndra 146754.0
18 Karma 35766.0
19 Warwick 117790.0
20 Draven 74879.0
21 Janna 11242.0
22 Lux 66424.0
23 Amumu 87826.0
24 Vayne 76085.0
25 Ahri 93334.0
..
..
..
this is a dataframe, which includes the total damage of a champion for one game.
Now I want to group these information, so I can see which champion overall has the most damage dealt.
I tried groupby('name') but it didn't work at all.
I already went through some threads about groupby and summing values, but I didn't solve my specific problem.
The dealt damage of each champion should also be shown as percentage of the total.
I'm looking for something like this as an output:
name totdmgdealt percentage
0 Warwick 2378798098 2.1 %
1 Nami 2837491074 2.3 %
2 Draven 1231451224 ..
3 Fiora 1287301724 ..
4 Viktor 1239808504 ..
5 Skarner 1487911234 ..
6 Galio 1306921234 ..

We can groupby on name and get the sum then we divide each value by the total with .div and multiply it by 100 with .mul and finally round it to one decimal with .round:
total = df['totdmgdealt'].sum()
summed = df.groupby('name', sort=False)['totdmgdealt'].sum().reset_index()
summed['percentage'] = summed.groupby('name', sort=False)['totdmgdealt']\
.sum()\
.div(total)\
.mul(100)\
.round(1).values
name totdmgdealt percentage
0 Warwick 343926.0 12.2
1 Nami 25995.0 0.9
2 Draven 246447.0 8.7
3 Fiora 113721.0 4.0
4 Viktor 185302.0 6.6
5 Skarner 148791.0 5.3
6 Galio 130692.0 4.6
7 Ahri 239065.0 8.5
8 Jinx 182680.0 6.5
9 VelKoz 85785.0 3.0
10 Ziggs 46790.0 1.7
11 Cassiopeia 62444.0 2.2
12 Yasuo 117896.0 4.2
13 Evelynn 179252.0 6.4
14 Caitlyn 163342.0 5.8
15 Wukong 122919.0 4.4
16 Syndra 146754.0 5.2
17 Karma 35766.0 1.3
18 Janna 11242.0 0.4
19 Lux 66424.0 2.4
20 Amumu 87826.0 3.1
21 Vayne 76085.0 2.7

you can use sum() to get the total dmg, and apply to calculate the precent relevant for each row, like this:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
name totdmgdealt
0 Warwick 96980.0
1 Nami 25995.0
2 Draven 171568.0
3 Fiora 113721.0
4 Viktor 185302.0
5 Skarner 148791.0
6 Galio 130692.0
7 Ahri 145731.0
8 Jinx 182680.0
9 VelKoz 85785.0
10 Ziggs 46790.0
11 Cassiopeia 62444.0
12 Yasuo 117896.0
13 Warwick 129156.0
14 Evelynn 179252.0
15 Caitlyn 163342.0
16 Wukong 122919.0
17 Syndra 146754.0
18 Karma 35766.0
19 Warwick 117790.0
20 Draven 74879.0
21 Janna 11242.0
22 Lux 66424.0
23 Amumu 87826.0
24 Vayne 76085.0
25 Ahri 93334.0"""), sep=r"\s+")
summed_df = df.groupby('name')['totdmgdealt'].agg(['sum']).rename(columns={"sum": "totdmgdealt"}).reset_index()
summed_df['percentage'] = summed_df.apply(
lambda x: "{:.2f}%".format(x['totdmgdealt'] / summed_df['totdmgdealt'].sum() * 100), axis=1)
print(summed_df)
Output:
name totdmgdealt percentage
0 Ahri 239065.0 8.48%
1 Amumu 87826.0 3.12%
2 Caitlyn 163342.0 5.79%
3 Cassiopeia 62444.0 2.21%
4 Draven 246447.0 8.74%
5 Evelynn 179252.0 6.36%
6 Fiora 113721.0 4.03%
7 Galio 130692.0 4.64%
8 Janna 11242.0 0.40%
9 Jinx 182680.0 6.48%
10 Karma 35766.0 1.27%
11 Lux 66424.0 2.36%
12 Nami 25995.0 0.92%
13 Skarner 148791.0 5.28%
14 Syndra 146754.0 5.21%
15 Vayne 76085.0 2.70%
16 VelKoz 85785.0 3.04%
17 Viktor 185302.0 6.57%
18 Warwick 343926.0 12.20%
19 Wukong 122919.0 4.36%
20 Yasuo 117896.0 4.18%
21 Ziggs 46790.0 1.66%

Maybe You can Try this:
I tried to achieve the same using my sample data and try to run the below code into your Jupyter Notebook:
import pandas as pd
name=['abhit','mawa','vaibhav','dharam','sid','abhit','vaibhav','sid','mawa','lakshya']
totdmgdealt=[24,45,80,22,89,55,89,51,93,85]
name=pd.Series(name,name='name') #converting into series
totdmgdealt=pd.Series(totdmgdealt,name='totdmgdealt') #converting into series
data=pd.concat([name,totdmgdealt],axis=1)
data=pd.DataFrame(data) #converting into Dataframe
final=data.pivot_table(values="totdmgdealt",columns="name",aggfunc="sum").transpose() #actual aggregating method
total=data['totdmgdealt'].sum() #calculating total for calculating percentage
def calPer(row,total): #actual Function for Percentage
return ((row/total)*100).round(2)
total=final['totdmgdealt'].sum()
final['Percentage']=calPer(final['totdmgdealt'],total) #assigning the function to the column
final
Sample Data :
name totdmgdealt
0 abhit 24
1 mawa 45
2 vaibhav 80
3 dharam 22
4 sid 89
5 abhit 55
6 vaibhav 89
7 sid 51
8 mawa 93
9 lakshya 85
Output:
totdmgdealt Percentage
name
abhit 79 12.48
dharam 22 3.48
lakshya 85 13.43
mawa 138 21.80
sid 140 22.12
vaibhav 169 26.70
Understand and run the code and just replace the dataset with Yours. Maybe This Helps.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Soccer Stats Python Scraper - python

I'm looking to scrape some Houston Dynamo stats from this season into a CSV and then visualize that data with R. How can I scrape both the tr and td elements using lxml? Is there an easier selector I should be looking at?

Related

How can I turn this into a DataFrame?

Drop certain rows based on quantity of rows with specific values

In Selenium see if the 'a' anchor tag contains the text I want, and then extract multiple td's of text in the same row

Dataframe Interpolation based on table values

Sum values of a column for each value based on another column and divide it by total

Categories

Resources