pandas merge python sort data frame - python

Name Sex Age Height Weight
0 Alfred M 14 69.0 112.5
1 Alice F 13 56.5 84.0
2 Barbara F 13 65.3 98.0
3 Carol F 14 62.8 102.5
4 Henry M 14 63.5 102.5
5 James M 12 57.3 83.0
6 Jane F 12 59.8 84.5
7 Janet F 15 62.5 112.5
8 Jeffrey M 13 62.5 84.0
9 John M 12 59.0 99.5
10 Joyce F 11 51.3 50.5
11 Judy F 14 64.3 90.0
12 Louise F 12 56.3 77.0
13 Mary F 15 66.5 112.0
14 Philip M 16 72.0 150.0
15 Robert M 12 64.8 128.0
16 Ronald M 15 67.0 133.0
17 Thomas M 11 57.5 85.0
18 William M 15 66.5 112.0
I want the output rows to alternate by the Sex column:
Name Sex Age Height Weight
Alice F 13 56.5 84.0
Alfred M 14 69.0 112.5
Barbara F 13 65.3 98.0
Henry M 14 63.5 102.5
Carol F 14 62.8 102.5
James M 12 57.3 83.0
Jane F 12 59.8 84.5
Jeffrey M 13 62.5 84.0
Janet F 15 62.5 112.5
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Philip M 16 72.0 150.0
Judy F 14 64.3 90.0
Robert M 12 64.8 128.0
Louise F 12 56.3 77.0
Ronald M 15 67.0 133.0
Mary F 15 66.5 112.0
Thomas M 11 57.5 85.0

You can use groupby().cumcount() to number the rows within each Sex group, then sort_values on that counter so the groups interleave:
(df.assign(order=df.groupby(['Sex']).cumcount())
   .sort_values(['order','Sex'])
   .drop('order',axis=1)
)
Output:
Name Sex Age Height Weight
1 Alice F 13 56.5 84.0
0 Alfred M 14 69.0 112.5
2 Barbara F 13 65.3 98.0
4 Henry M 14 63.5 102.5
3 Carol F 14 62.8 102.5
5 James M 12 57.3 83.0
6 Jane F 12 59.8 84.5
8 Jeffrey M 13 62.5 84.0
7 Janet F 15 62.5 112.5
9 John M 12 59.0 99.5
10 Joyce F 11 51.3 50.5
14 Philip M 16 72.0 150.0
11 Judy F 14 64.3 90.0
15 Robert M 12 64.8 128.0
12 Louise F 12 56.3 77.0
16 Ronald M 15 67.0 133.0
13 Mary F 15 66.5 112.0
17 Thomas M 11 57.5 85.0
18 William M 15 66.5 112.0
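To see why this works, here is a minimal self-contained sketch on a six-row subset of the data. The intermediate `order` column is printed so the counter is visible; the subset and the variable names are chosen for illustration only.

```python
import pandas as pd

# Small subset of the table above, in the original row order.
df = pd.DataFrame({
    "Name": ["Alfred", "Alice", "Barbara", "Carol", "Henry", "James"],
    "Sex": ["M", "F", "F", "F", "M", "M"],
})

# cumcount() numbers rows 0, 1, 2, ... separately inside each Sex group.
order = df.groupby("Sex").cumcount()
print(df.assign(order=order))

# Sorting by the counter first and Sex second interleaves the two groups.
result = (df.assign(order=order)
            .sort_values(["order", "Sex"])
            .drop(columns="order"))
print(result)
```

If one group has more rows than the other, the surplus rows simply end up at the bottom, which is what happens to William in the output above.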

Related

Looping through HTML to collect data

I am new to web scraping so looking to test with the NBA data on Basketball Reference. I am trying to collect the data for the standings for the league, conference and divisions. I then want to store them into a database.
So far I have the code below, which gives me the team names of the Eastern Conference.
I need to loop through the HTML and collect the data points, but I am unsure how to proceed.
import requests
from bs4 import BeautifulSoup
url = 'https://www.basketball-reference.com/leagues/NBA_2022_standings.html'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
eastern_conf_table = soup.find('table', id='confs_standings_E')
for team in eastern_conf_table.find_all('tbody'):
    rows = team.find_all("tr")
    # loop over all rows, get all header cells
    for row in rows:
        try:
            teams = row.find_all('th')
            # print the team name from the first header cell in the row
            print(teams[0].a.text)
        except (IndexError, AttributeError):
            # skip rows that have no linked team name
            pass
I will then need to collect the same data for the other conferences, divisions and leagues.
The easiest way is to use pandas:
import pandas as pd
df = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2022_standings.html', match='Eastern Conference')
print(df[0])
OUTPUT:
Eastern Conference W L W/L% GB PS/G PA/G SRS
0 Miami Heat* (1) 53 29 0.646 — 110.0 105.6 4.23
1 Boston Celtics* (2) 51 31 0.622 2.0 111.8 104.5 7.02
2 Milwaukee Bucks* (3) 51 31 0.622 2.0 115.5 112.1 3.22
3 Philadelphia 76ers* (4) 51 31 0.622 2.0 109.9 107.3 2.57
4 Toronto Raptors* (5) 48 34 0.585 5.0 109.4 107.1 2.38
5 Chicago Bulls* (6) 46 36 0.561 7.0 111.6 112.0 -0.38
6 Brooklyn Nets* (7) 44 38 0.537 9.0 112.9 112.1 0.82
7 Cleveland Cavaliers (8) 44 38 0.537 9.0 107.8 105.7 2.04
8 Atlanta Hawks* (9) 43 39 0.524 10.0 113.9 112.4 1.55
9 Charlotte Hornets (10) 43 39 0.524 10.0 115.3 114.9 0.53
10 New York Knicks (11) 37 45 0.451 16.0 106.5 106.6 -0.01
11 Washington Wizards (12) 35 47 0.427 18.0 108.6 112.0 -3.23
12 Indiana Pacers (13) 25 57 0.305 28.0 111.5 114.9 -3.26
13 Detroit Pistons (14) 23 59 0.280 30.0 104.8 112.5 -7.36
14 Orlando Magic (15) 22 60 0.268 31.0 104.2 112.2 -7.67
But if you need BeautifulSoup and Requests, here is an example:
import requests
from bs4 import BeautifulSoup
url = 'https://www.basketball-reference.com/leagues/NBA_2022_standings.html'
soup = BeautifulSoup(requests.get(url).text, features='lxml')
confs_standings_E = soup.find('table', attrs={'id': 'confs_standings_E'})
for stats in confs_standings_E.find_all('tr', class_='full_table'):
    team_name = stats.find('th', attrs={'data-stat': 'team_name'}).getText().strip()
    wins = stats.find('td', attrs={'data-stat': 'wins'}).getText().strip()
    losses = stats.find('td', attrs={'data-stat': 'losses'}).getText().strip()
    win_loss_pct = stats.find('td', attrs={'data-stat': 'win_loss_pct'}).getText().strip()
    gb = stats.find('td', attrs={'data-stat': 'gb'}).getText().strip()
    pts_per_g = stats.find('td', attrs={'data-stat': 'pts_per_g'}).getText().strip()
    opp_pts_per_g = stats.find('td', attrs={'data-stat': 'opp_pts_per_g'}).getText().strip()
    srs = stats.find('td', attrs={'data-stat': 'srs'}).getText().strip()
    print(team_name, wins, losses, win_loss_pct, gb, pts_per_g, opp_pts_per_g, srs)
OUTPUT:
Miami Heat* (1) 53 29 .646 — 110.0 105.6 4.23
Boston Celtics* (2) 51 31 .622 2.0 111.8 104.5 7.02
Milwaukee Bucks* (3) 51 31 .622 2.0 115.5 112.1 3.22
Philadelphia 76ers* (4) 51 31 .622 2.0 109.9 107.3 2.57
Toronto Raptors* (5) 48 34 .585 5.0 109.4 107.1 2.38
Chicago Bulls* (6) 46 36 .561 7.0 111.6 112.0 -0.38
Brooklyn Nets* (7) 44 38 .537 9.0 112.9 112.1 0.82
Cleveland Cavaliers (8) 44 38 .537 9.0 107.8 105.7 2.04
Atlanta Hawks* (9) 43 39 .524 10.0 113.9 112.4 1.55
Charlotte Hornets (10) 43 39 .524 10.0 115.3 114.9 0.53
New York Knicks (11) 37 45 .451 16.0 106.5 106.6 -0.01
Washington Wizards (12) 35 47 .427 18.0 108.6 112.0 -3.23
Indiana Pacers (13) 25 57 .305 28.0 111.5 114.9 -3.26
Detroit Pistons (14) 23 59 .280 30.0 104.8 112.5 -7.36
Orlando Magic (15) 22 60 .268 31.0 104.2 112.2 -7.67
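Since the goal is to store the rows in a database, it may be more convenient to collect each row as a dictionary instead of printing it. The sketch below does that by reading every cell that carries a data-stat attribute. The HTML is an invented two-team snippet that only mimics the attribute layout used in the code above, so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Invented snippet that mimics the data-stat layout; not real page content.
html = """
<table id="confs_standings_E">
  <tbody>
    <tr class="full_table">
      <th data-stat="team_name"><a href="#">Team A</a></th>
      <td data-stat="wins">53</td>
      <td data-stat="losses">29</td>
    </tr>
    <tr class="thead"><th>Divider row</th></tr>
    <tr class="full_table">
      <th data-stat="team_name"><a href="#">Team B</a></th>
      <td data-stat="wins">51</td>
      <td data-stat="losses">31</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="confs_standings_E")

records = []
for row in table.find_all("tr", class_="full_table"):
    # Every cell with a data-stat attribute becomes one key/value pair.
    cells = row.find_all(["th", "td"], attrs={"data-stat": True})
    records.append({cell["data-stat"]: cell.get_text(strip=True) for cell in cells})

print(records)
```

A list of dictionaries like this can be passed to pd.DataFrame(records) or inserted row by row with a database driver. The same loop can be reused for other tables by changing the table id.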

How to scrape specific tables from web page with multiple tables?

I'm trying to scrape some NFL data from:
url = https://www.pro-football-reference.com/years/2019/opp.htm.
I first tried to scrape the data from the tables with pandas. I've done this before and it's always been straightforward. I expected pandas to return a list of all tables found on the page. However, when I ran
dfs = pd.read_html(url)
I only received the first two tables from the web page, Team Defense and Team Advanced Defense.
I then went to try to scrape the other tables with bs4 and requests. To test, I first only tried to scrape the first table:
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.find('table', id = 'advanced_defense')
rows = table.find_all('tr')
for tr in rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
I was then able to simply change the id such that I returned both the Team Defense and Team Advanced Defense - the same two tables that pandas returned.
However, when I try to use the same method to scrape the other tables on the page I receive an error. I obtained the id by inspecting the web page in the same manner as the first two tables and am unable to get a result.
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.find('table', id = 'passing')
rows = table.find_all('tr')
for tr in rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
soup.find returns nothing for table when I attempt to scrape any of the other tables on the page, so I receive the following error:
AttributeError: 'NoneType' object has no attribute 'find_all'
I find it strange how both pandas and bs4 are only able to return the Team Defense and Team Advanced Defense tables.
I only intend to scrape the Team Defense, Passing Defense, and Rushing Defense tables.
How could I approach successfully scraping the Passing Defense and Rushing Defense tables?
The Sports Reference sites are tricky in that only the first table (or the first few tables) shows up as regular markup in the HTML source. The other tables are rendered dynamically. However, the markup for those other tables is still in the source, wrapped inside HTML comments. So to get them, you have to pull out the comments, and then you can use pandas or BeautifulSoup to parse the table tags inside.
So you can grab the team stats as you normally would, then pull the comments and parse the other tables from them.
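from io import StringIO
import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment
url = 'https://www.pro-football-reference.com/years/2019/opp.htm'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# collect every HTML comment node on the page
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
# the first table is regular markup, so read it directly
dfs = [pd.read_html(url, header=0, attrs={'id':'team_stats'})[0]]
# promote the first data row to the column header
dfs[0].columns = dfs[0].iloc[0,:]
dfs[0] = dfs[0].iloc[1:,:].reset_index(drop=True)
for each in comments:
    # the passing and rushing tables are inside comments
    if 'table' in each and ('id="passing"' in each or 'id="rushing"' in each):
        # StringIO avoids the literal-HTML deprecation warning in newer pandas
        dfs.append(pd.read_html(StringIO(str(each)))[0])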
Output:
for df in dfs:
    print(df)
0 Rk Tm G PF ... 1stPy Sc% TO% EXP
0 1 New England Patriots 16 225 ... 39 19.4 17.3 165.75
1 2 Buffalo Bills 16 259 ... 33 23.6 12.4 39.85
2 3 Baltimore Ravens 16 282 ... 39 32.9 14.6 16.61
3 4 Chicago Bears 16 298 ... 30 31.5 10.7 -4.15
4 5 Minnesota Vikings 16 303 ... 31 34.5 17.0 -7.88
5 6 Pittsburgh Steelers 16 303 ... 30 29.9 19.0 85.78
6 7 Kansas City Chiefs 16 308 ... 39 34.6 13.6 -65.69
7 8 San Francisco 49ers 16 310 ... 30 29.0 14.2 77.41
8 9 Green Bay Packers 16 313 ... 20 34.5 14.1 -63.65
9 10 Denver Broncos 16 316 ... 34 37.3 8.4 -35.98
10 11 Dallas Cowboys 16 321 ... 38 35.5 9.9 -36.81
11 12 Tennessee Titans 16 331 ... 27 32.1 11.8 -54.20
12 13 New Orleans Saints 16 341 ... 43 34.7 12.7 -41.89
13 14 Los Angeles Chargers 16 345 ... 28 37.3 8.2 -86.11
14 15 Philadelphia Eagles 16 354 ... 28 33.9 10.2 -29.57
15 16 New York Jets 16 359 ... 40 34.4 10.1 -0.06
16 17 Los Angeles Rams 16 364 ... 30 33.7 12.7 -11.53
17 18 Indianapolis Colts 16 373 ... 23 39.3 13.1 -58.37
18 19 Houston Texans 16 385 ... 28 39.3 13.1 -160.87
19 20 Cleveland Browns 16 393 ... 37 36.9 11.2 -91.15
20 21 Jacksonville Jaguars 16 397 ... 33 37.4 9.2 -120.09
21 22 Seattle Seahawks 16 398 ... 25 37.1 16.3 -92.02
22 23 Atlanta Falcons 16 399 ... 30 42.8 9.0 -105.34
23 24 Oakland Raiders 16 419 ... 52 41.2 8.5 -159.71
24 25 Cincinnati Bengals 16 420 ... 21 39.8 8.8 -132.66
25 26 Detroit Lions 16 423 ... 39 40.1 9.0 -142.55
26 27 Washington Redskins 16 435 ... 34 41.9 12.2 -135.83
27 28 Arizona Cardinals 16 442 ... 38 42.6 9.5 -174.55
28 29 Tampa Bay Buccaneers 16 449 ... 39 39.6 13.5 12.23
29 30 New York Giants 16 451 ... 32 39.7 8.7 -105.11
30 31 Carolina Panthers 16 470 ... 30 41.4 9.4 -116.88
31 32 Miami Dolphins 16 494 ... 34 45.6 8.8 -175.02
32 NaN Avg Team NaN 365.0 ... 32.9 36.0 11.8 -56.6
33 NaN League Total NaN 11680 ... 1054 36.0 11.8 NaN
34 NaN Avg Tm/G NaN 22.8 ... 2.1 36.0 11.8 NaN
[35 rows x 28 columns]
Rk Tm G Cmp ... NY/A ANY/A Sk% EXP
0 1.0 San Francisco 49ers 16.0 318.0 ... 4.80 4.6 8.5 58.30
1 2.0 New England Patriots 16.0 303.0 ... 5.00 3.5 8.1 117.74
2 3.0 Pittsburgh Steelers 16.0 314.0 ... 5.50 4.7 9.5 20.19
3 4.0 Buffalo Bills 16.0 348.0 ... 5.20 4.7 7.4 30.01
4 5.0 Los Angeles Chargers 16.0 328.0 ... 6.50 6.3 6.1 -92.16
5 6.0 Baltimore Ravens 16.0 318.0 ... 5.70 5.2 6.4 15.40
6 7.0 Cleveland Browns 16.0 318.0 ... 6.30 6.1 6.9 -64.09
7 8.0 Kansas City Chiefs 16.0 352.0 ... 5.70 5.2 7.2 -36.78
8 9.0 Chicago Bears 16.0 362.0 ... 5.90 5.7 5.3 -47.04
9 10.0 Dallas Cowboys 16.0 370.0 ... 5.90 6.1 6.4 -67.46
10 11.0 Denver Broncos 16.0 348.0 ... 6.30 6.1 6.9 -61.45
11 12.0 Los Angeles Rams 16.0 348.0 ... 5.90 5.7 8.2 -42.76
12 13.0 Carolina Panthers 16.0 347.0 ... 6.20 5.8 8.9 -63.03
13 14.0 Green Bay Packers 16.0 326.0 ... 6.30 5.7 7.0 -27.30
14 15.0 Minnesota Vikings 16.0 394.0 ... 5.80 5.3 7.4 -34.01
15 16.0 Jacksonville Jaguars 16.0 327.0 ... 6.70 6.7 8.3 -98.77
16 17.0 New York Jets 16.0 363.0 ... 6.10 6.0 5.6 -79.16
17 18.0 Washington Redskins 16.0 371.0 ... 6.50 6.7 7.8 -135.17
18 19.0 Philadelphia Eagles 16.0 348.0 ... 6.30 6.4 7.0 -88.15
19 20.0 New Orleans Saints 16.0 371.0 ... 5.90 5.8 7.8 -94.59
20 21.0 Cincinnati Bengals 16.0 308.0 ... 7.40 7.4 5.8 -126.81
21 22.0 Atlanta Falcons 16.0 351.0 ... 6.90 7.0 5.0 -128.75
22 23.0 Indianapolis Colts 16.0 394.0 ... 6.60 6.4 6.8 -86.44
23 24.0 Tennessee Titans 16.0 386.0 ... 6.40 6.2 6.7 -92.39
24 25.0 Oakland Raiders 16.0 337.0 ... 7.40 7.8 5.7 -177.69
25 26.0 Miami Dolphins 16.0 344.0 ... 7.40 7.7 4.0 -172.01
26 27.0 Seattle Seahawks 16.0 383.0 ... 6.70 6.2 4.5 -77.18
27 28.0 New York Giants 16.0 369.0 ... 7.10 7.4 6.1 -152.48
28 29.0 Houston Texans 16.0 375.0 ... 6.90 7.1 5.0 -160.60
29 30.0 Tampa Bay Buccaneers 16.0 408.0 ... 6.10 6.2 6.6 -38.17
30 31.0 Arizona Cardinals 16.0 421.0 ... 7.00 7.7 6.2 -190.81
31 32.0 Detroit Lions 16.0 381.0 ... 7.10 7.7 4.4 -162.94
32 NaN Avg Team NaN 354.1 ... 6.29 6.2 6.7 -73.60
33 NaN League Total NaN 11331.0 ... 6.29 6.2 6.7 NaN
34 NaN Avg Tm/G NaN 22.1 ... 6.29 6.2 6.7 NaN
[35 rows x 25 columns]
Rk Tm G Att ... TD Y/A Y/G EXP
0 1.0 Tampa Bay Buccaneers 16.0 362.0 ... 11.0 3.3 73.8 56.23
1 2.0 New York Jets 16.0 417.0 ... 12.0 3.3 86.9 72.34
2 3.0 Philadelphia Eagles 16.0 353.0 ... 13.0 4.1 90.1 47.64
3 4.0 New Orleans Saints 16.0 345.0 ... 12.0 4.2 91.3 39.45
4 5.0 Baltimore Ravens 16.0 340.0 ... 12.0 4.4 93.4 -1.25
5 6.0 New England Patriots 16.0 365.0 ... 7.0 4.2 95.5 33.13
6 7.0 Indianapolis Colts 16.0 383.0 ... 8.0 4.1 97.9 21.54
7 8.0 Oakland Raiders 16.0 405.0 ... 15.0 3.9 98.1 17.69
8 9.0 Chicago Bears 16.0 414.0 ... 16.0 3.9 102.0 38.83
9 10.0 Buffalo Bills 16.0 388.0 ... 12.0 4.3 103.1 10.92
10 11.0 Dallas Cowboys 16.0 407.0 ... 14.0 4.1 103.5 25.11
11 12.0 Tennessee Titans 16.0 415.0 ... 14.0 4.0 104.5 28.27
12 13.0 Minnesota Vikings 16.0 404.0 ... 8.0 4.3 108.0 21.01
13 14.0 Pittsburgh Steelers 16.0 462.0 ... 7.0 3.8 109.6 63.09
14 15.0 Atlanta Falcons 16.0 421.0 ... 13.0 4.2 110.9 17.98
15 16.0 Denver Broncos 16.0 426.0 ... 9.0 4.2 111.4 12.72
16 17.0 San Francisco 49ers 16.0 401.0 ... 11.0 4.5 112.6 9.91
17 18.0 Los Angeles Chargers 16.0 429.0 ... 15.0 4.2 112.8 1.08
18 19.0 Los Angeles Rams 16.0 444.0 ... 15.0 4.1 113.1 21.49
19 20.0 New York Giants 16.0 469.0 ... 19.0 3.9 113.3 40.51
20 21.0 Detroit Lions 16.0 455.0 ... 13.0 4.1 115.9 17.32
21 22.0 Seattle Seahawks 16.0 388.0 ... 22.0 4.9 117.7 -17.45
22 23.0 Green Bay Packers 16.0 411.0 ... 15.0 4.7 120.1 -42.18
23 24.0 Arizona Cardinals 16.0 439.0 ... 9.0 4.4 120.1 15.13
24 25.0 Houston Texans 16.0 403.0 ... 12.0 4.8 121.1 -6.34
25 26.0 Kansas City Chiefs 16.0 416.0 ... 14.0 4.9 128.2 -41.35
26 27.0 Miami Dolphins 16.0 485.0 ... 15.0 4.5 135.4 -6.14
27 28.0 Jacksonville Jaguars 16.0 435.0 ... 23.0 5.1 139.3 -21.95
28 29.0 Carolina Panthers 16.0 445.0 ... 31.0 5.2 143.5 -62.69
29 30.0 Cleveland Browns 16.0 463.0 ... 19.0 5.0 144.7 -37.50
30 31.0 Washington Redskins 16.0 493.0 ... 14.0 4.7 146.2 -6.89
31 32.0 Cincinnati Bengals 16.0 504.0 ... 17.0 4.7 148.9 -12.07
32 NaN Avg Team NaN 418.3 ... 14.0 4.3 112.9 11.10
33 NaN League Total NaN 13387.0 ... 447.0 4.3 112.9 NaN
34 NaN Avg Tm/G NaN 26.1 ... 0.9 4.3 112.9 NaN
[35 rows x 9 columns]
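The comment trick can also be reproduced offline, which makes it easier to see why soup.find returned None in the question. The sketch below uses an invented page with one regular table and one table wrapped in a comment. It parses the comment text with BeautifulSoup only, so it does not depend on pandas. The team names and numbers are placeholders.

```python
from bs4 import BeautifulSoup, Comment

# Invented page: one regular table plus one table hidden inside a comment.
html = """
<div><table id="team_stats"><tr><th>Tm</th></tr><tr><td>Team A</td></tr></table></div>
<div>
<!--
<table id="passing">
  <tr><th>Tm</th><th>Cmp</th></tr>
  <tr><td>Team A</td><td>318</td></tr>
  <tr><td>Team B</td><td>303</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A direct lookup misses the commented-out table, hence the AttributeError.
direct = soup.find("table", id="passing")

rows = []
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    # Re-parse the comment text as HTML so its tags become searchable.
    inner = BeautifulSoup(str(comment), "html.parser")
    table = inner.find("table", id="passing")
    if table is None:
        continue
    for tr in table.find_all("tr"):
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])

print(direct)
print(rows)
```

Checking for None before calling find_all, as done here, also turns the original error into an explicit "table not found" case.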

Getting table from webpage: Problem getting full html

I need to get the table from this page: https://stats.nba.com/teams/traditional/?sort=GP&dir=-1. From the html of the page one can see that the table is encoded in the descendants of the tag
<nba-stat-table filters="filters" ... >
<div class="nba-stat-table">
<div class="nba-stat-table__overflow" data-fixed="2" role="grid">
<table>
...
</nba-stat-table>
(I cannot add a screenshot since I am new to Stack Overflow, but if you right-click anywhere in the table and choose Inspect Element you will see what I mean.)
I've tried a few different approaches, such as the first and second answers to the question "How to extract tables from websites in Python", as well as the answers to another question, "pandas read_html ValueError: No tables found" (trying the first solution gave me exactly the error that the second question is about).
First try using pandas:
import requests
import pandas as pd
url = 'http://stats.nba.com/teams/traditional/?sort=GP&dir=-1'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
Or another try with BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = "https://stats.nba.com/teams/traditional/?sort=GP&dir=-1"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
stats_table = soup.find('nba-stat-table')
for child in stats_table.descendants:
    print(child)
For the first I got the pandas read_html error "ValueError: No tables found". For the second I didn't get any error, but nothing was printed. I then tried writing the HTML to a file to see what was actually happening:
with open('html.txt', 'w') as fout:
    fout.write(str(page.content))
and/or:
with open('html.txt', 'w') as fout:
    fout.write(str(soup))
In the text file, the part of the HTML where the table should be looks like this:
<nba-stat-table filters="filters"
ng-if="!isLoading && !noData"
options="options"
params="params"
rows="teamStats.rows"
template="teams/teams-traditional">
</nba-stat-table>
So it appears that I am not getting the descendants of this tag, which are what actually contain the table data. Does anyone have a solution that obtains the whole HTML of the page so that I can parse it, or an alternative way of obtaining the table?
Here's what I try when attempting to scrape data. (By the way I LOVE scraping/working with sports data.)
1) pandas pd.read_html() (it uses lxml or BeautifulSoup under the hood). I like this method because it's easy and quick, and it usually needs only a small amount of manipulation when it does return what I want. However, pd.read_html() only works if the data is inside <table> tags in the HTML. Since the HTML returned here has no <table> tags, it raises the "ValueError: No tables found" you saw. So good work trying that first; it's the easiest method when it works.
2) My other go-to method is to check whether the data is pulled in through XHR. This might actually be my first choice, since it can let you filter what is returned, but it requires a little more (not much) investigative work to find the correct request URL and query parameters. (This is the route I took for this solution.)
3) If the page is generated through JavaScript, you can sometimes find the data in JSON format inside <script> tags using BeautifulSoup. This requires a bit more investigation to pull out the right <script> tag, followed by string manipulation to get a valid JSON string that json.loads() can read.
4a) Use BeautifulSoup to pull out the data elements if they are present in other tags and not rendered by JavaScript.
4b) Selenium is an option: let the page render first, then parse the HTML with BeautifulSoup (or, if the rendered page contains <table> tags, with pd.read_html()). It is usually my last choice. It's not that it doesn't work or is bad; it's just slow and unnecessary if any of the options above work.
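Option 3 is the least obvious of these, so here is a rough sketch of it. The page, the JavaScript variable name window.__DATA__, and the payload are all invented for illustration; a real site will use its own variable name and structure, which you have to find by inspecting the page source.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page: the variable name and payload are made up for this sketch.
html = """
<html><body>
<script>var unrelated = 1;</script>
<script>window.__DATA__ = {"teams": [{"name": "A", "wins": 3}, {"name": "B", "wins": 1}]};</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Pick the <script> tag whose text mentions the variable we care about.
script_text = None
for script in soup.find_all("script"):
    if script.string and "window.__DATA__" in script.string:
        script_text = script.string
        break

# Cut the JSON object out of the JavaScript assignment, then decode it.
match = re.search(r"window\.__DATA__\s*=\s*(\{.*\})\s*;", script_text, re.DOTALL)
data = json.loads(match.group(1))

teams = data["teams"]
print(teams)
```

The regular expression is the fragile part: it assumes the assignment ends with a semicolon and that the object is valid JSON, which is not guaranteed for arbitrary JavaScript.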
So I went with option 2. Here's the code and output:
import requests
import pandas as pd
url = 'https://stats.nba.com/stats/leaguedashteamstats'
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Mobile Safari/537.36'}
payload = {
    'Conference': '',
    'DateFrom': '',
    'DateTo': '',
    'Division': '',
    'GameScope': '',
    'GameSegment': '',
    'LastNGames': '82',
    'LeagueID': '00',
    'Location': '',
    'MeasureType': 'Base',
    'Month': '0',
    'OpponentTeamID': '0',
    'Outcome': '',
    'PORound': '0',
    'PaceAdjust': 'N',
    'PerMode': 'PerGame',
    'Period': '0',
    'PlayerExperience': '',
    'PlayerPosition': '',
    'PlusMinus': 'N',
    'Rank': 'N',
    'Season': '2019-20',
    'SeasonSegment': '',
    'SeasonType': 'Regular Season',
    'ShotClockRange': '',
    'StarterBench': '',
    'TeamID': '0',
    'TwoWay': '0',
    'VsConference': '',
    'VsDivision': '',
}
jsonData = requests.get(url, headers=headers, params=payload).json()
df = pd.DataFrame(jsonData['resultSets'][0]['rowSet'], columns=jsonData['resultSets'][0]['headers'])
Output:
print (df.to_string())
TEAM_ID TEAM_NAME GP W L W_PCT MIN FGM FGA FG_PCT FG3M FG3A FG3_PCT FTM FTA FT_PCT OREB DREB REB AST TOV STL BLK BLKA PF PFD PTS PLUS_MINUS GP_RANK W_RANK L_RANK W_PCT_RANK MIN_RANK FGM_RANK FGA_RANK FG_PCT_RANK FG3M_RANK FG3A_RANK FG3_PCT_RANK FTM_RANK FTA_RANK FT_PCT_RANK OREB_RANK DREB_RANK REB_RANK AST_RANK TOV_RANK STL_RANK BLK_RANK BLKA_RANK PF_RANK PFD_RANK PTS_RANK PLUS_MINUS_RANK CFID CFPARAMS
0 1610612737 Atlanta Hawks 4 2 2 0.500 48.0 39.5 84.3 0.469 10.0 31.8 0.315 16.0 23.0 0.696 8.5 34.5 43.0 25.0 18.0 10.0 5.3 7.3 23.8 21.5 105.0 1.0 1 11 14 14 10 14 27 5 23 21 21 24 21 27 24 21 25 9 19 3 15 29 17 25 22 15 10 Atlanta Hawks
1 1610612738 Boston Celtics 3 2 1 0.667 48.0 39.0 97.3 0.401 11.7 35.0 0.333 18.0 26.3 0.684 14.7 33.0 47.7 21.0 11.3 9.3 6.3 5.7 25.0 29.3 107.7 5.0 19 11 4 11 10 18 3 28 13 12 18 19 12 28 2 25 13 22 1 7 7 19 20 1 16 10 10 Boston Celtics
2 1610612751 Brooklyn Nets 3 1 2 0.333 51.3 43.0 93.3 0.461 15.3 38.7 0.397 22.7 32.3 0.701 10.7 38.3 49.0 22.7 19.7 8.3 5.3 6.0 26.0 27.3 124.0 0.7 19 18 14 18 1 4 9 8 3 7 4 7 3 24 11 9 7 19 26 15 13 22 21 3 1 16 10 Brooklyn Nets
3 1610612766 Charlotte Hornets 4 1 3 0.250 48.0 38.3 86.5 0.442 14.8 36.8 0.401 14.3 19.8 0.722 10.0 31.5 41.5 24.3 19.3 5.3 4.0 6.5 22.5 21.8 105.5 -13.8 1 18 23 23 10 23 23 16 4 9 3 26 26 23 14 28 29 14 24 30 24 25 10 22 21 28 10 Charlotte Hornets
4 1610612741 Chicago Bulls 4 1 3 0.250 48.0 38.5 95.0 0.405 9.8 35.5 0.275 17.5 23.0 0.761 12.3 32.0 44.3 20.0 12.8 10.0 4.5 7.0 21.3 20.8 104.3 -6.0 1 18 23 23 10 20 6 27 24 11 29 20 21 15 7 26 24 26 2 3 20 27 7 26 24 23 10 Chicago Bulls
5 1610612739 Cleveland Cavaliers 3 1 2 0.333 48.0 39.3 89.3 0.440 10.7 34.7 0.308 13.0 18.7 0.696 10.7 38.0 48.7 20.7 15.7 6.3 4.0 4.7 19.0 19.3 102.3 -5.0 19 18 14 18 10 17 16 19 19 13 24 29 27 26 11 10 10 23 13 25 24 11 3 30 26 22 10 Cleveland Cavaliers
6 1610612742 Dallas Mavericks 4 3 1 0.750 48.0 39.5 86.8 0.455 12.8 40.8 0.313 23.0 31.0 0.742 9.8 36.0 45.8 24.0 13.0 6.8 5.0 2.8 19.3 27.0 114.8 4.0 1 1 4 4 10 14 21 10 8 5 22 5 4 19 17 15 21 15 3 19 19 1 4 7 10 12 10 Dallas Mavericks
7 1610612743 Denver Nuggets 4 3 1 0.750 49.3 37.3 90.5 0.412 11.5 31.8 0.362 19.8 24.3 0.814 13.0 35.5 48.5 22.0 14.3 8.0 5.5 4.5 22.8 23.8 105.8 3.3 1 1 4 4 4 27 13 25 14 21 11 13 20 7 4 19 11 20 7 16 11 9 13 14 20 13 10 Denver Nuggets
8 1610612765 Detroit Pistons 4 2 2 0.500 48.0 38.5 80.0 0.481 10.5 26.0 0.404 19.0 25.3 0.752 8.3 33.5 41.8 21.8 18.8 6.0 5.3 3.8 21.8 21.8 106.5 -3.0 1 11 14 14 10 20 29 3 20 28 2 15 15 17 26 24 28 21 21 27 15 4 9 22 18 21 10 Detroit Pistons
9 1610612744 Golden State Warriors 3 1 2 0.333 48.0 40.0 98.3 0.407 11.3 36.7 0.309 24.7 28.3 0.871 15.3 32.0 47.3 27.0 15.3 9.3 1.3 5.7 19.3 23.3 116.0 -12.0 19 18 14 18 10 10 2 26 15 10 23 3 8 2 1 26 14 4 10 7 30 19 5 17 9 27 10 Golden State Warriors
10 1610612745 Houston Rockets 3 2 1 0.667 48.0 38.3 91.3 0.420 13.0 45.7 0.285 28.0 34.0 0.824 9.3 38.0 47.3 24.3 15.7 6.3 5.3 5.0 23.7 28.0 117.7 0.3 19 11 4 11 10 22 11 23 6 3 27 1 1 5 21 10 14 12 13 25 13 15 15 2 8 18 10 Houston Rockets
11 1610612754 Indiana Pacers 3 0 3 0.000 48.0 39.7 90.0 0.441 8.0 23.3 0.343 13.7 16.7 0.820 9.7 29.3 39.0 24.3 13.3 8.7 4.3 5.3 23.7 19.7 101.0 -7.3 19 28 23 28 10 13 14 18 29 30 14 27 29 6 19 30 30 12 5 10 23 18 15 28 27 26 10 Indiana Pacers
12 1610612746 LA Clippers 4 3 1 0.750 48.0 43.0 82.8 0.520 13.0 32.0 0.406 22.5 28.5 0.789 8.3 34.0 42.3 25.0 17.0 8.5 5.5 3.3 26.3 25.5 121.5 9.0 1 1 4 4 10 4 28 1 6 19 1 8 7 11 26 22 26 9 18 11 11 2 23 10 3 3 10 LA Clippers
13 1610612747 Los Angeles Lakers 4 3 1 0.750 48.0 40.0 87.5 0.457 9.8 29.0 0.336 19.5 24.5 0.796 10.0 36.0 46.0 23.5 15.3 8.5 8.0 3.5 21.5 24.3 109.3 11.8 1 1 4 4 10 10 17 9 24 25 17 14 17 8 14 15 19 17 9 11 1 3 8 12 15 1 10 Los Angeles Lakers
14 1610612763 Memphis Grizzlies 4 1 3 0.250 49.3 39.5 95.3 0.415 9.0 32.0 0.281 19.0 24.5 0.776 11.3 36.5 47.8 24.8 18.8 9.0 6.5 7.0 27.0 23.8 107.0 -13.8 1 18 23 23 4 14 5 24 27 19 28 15 17 14 10 14 12 11 21 9 5 27 26 14 17 28 10 Memphis Grizzlies
15 1610612748 Miami Heat 4 3 1 0.750 49.3 40.3 86.0 0.468 12.8 32.3 0.395 24.8 33.8 0.733 9.8 39.0 48.8 23.8 22.5 8.5 6.5 4.8 27.0 27.3 118.0 8.0 1 1 4 4 4 9 25 6 8 17 5 2 2 20 17 6 9 16 30 11 5 12 26 6 7 6 10 Miami Heat
16 1610612749 Milwaukee Bucks 3 2 1 0.667 49.7 45.0 95.0 0.474 16.7 46.0 0.362 17.3 25.7 0.675 6.3 43.7 50.0 27.3 13.7 8.0 7.0 4.0 24.7 25.7 124.0 6.0 19 11 4 11 2 2 6 4 2 1 10 21 14 29 29 2 3 3 6 16 2 5 19 9 1 9 10 Milwaukee Bucks
17 1610612750 Minnesota Timberwolves 3 3 0 1.000 49.7 42.7 96.7 0.441 12.7 42.0 0.302 23.3 30.7 0.761 13.0 37.0 50.0 25.7 15.3 10.7 3.7 7.7 20.0 27.3 121.3 10.0 19 1 1 1 2 6 4 17 10 4 25 4 5 15 4 13 3 5 10 1 28 30 6 3 4 2 10 Minnesota Timberwolves
18 1610612740 New Orleans Pelicans 4 0 4 0.000 49.3 45.5 100.8 0.452 16.8 45.8 0.366 13.3 18.3 0.726 12.0 34.0 46.0 30.8 16.3 8.0 5.3 4.0 26.5 21.8 121.0 -7.3 1 28 29 28 4 1 1 13 1 2 8 28 28 21 8 22 19 1 17 16 15 5 25 22 5 24 10 New Orleans Pelicans
19 1610612752 New York Knicks 4 1 3 0.250 48.0 37.8 87.0 0.434 10.8 27.8 0.387 18.8 28.0 0.670 13.8 35.3 49.0 18.8 20.3 10.0 3.8 5.3 27.0 23.0 105.0 -7.3 1 18 23 23 10 24 19 20 17 27 6 17 10 30 3 20 7 27 27 3 27 17 26 18 22 24 10 New York Knicks
20 1610612760 Oklahoma City Thunder 4 1 3 0.250 48.0 37.5 84.5 0.444 10.8 29.3 0.368 17.3 24.8 0.697 9.5 40.3 49.8 18.8 18.5 6.8 4.5 4.8 23.5 22.8 103.0 1.8 1 18 23 23 10 25 26 15 17 23 7 22 16 25 20 3 5 27 20 19 20 12 14 19 25 14 10 Oklahoma City Thunder
21 1610612753 Orlando Magic 3 1 2 0.333 48.0 35.3 91.3 0.387 8.7 33.3 0.260 16.7 21.0 0.794 10.7 35.7 46.3 20.3 13.0 9.7 5.7 4.3 17.7 20.3 96.0 -1.3 19 18 14 18 10 28 11 30 28 16 30 23 25 9 11 18 17 24 3 6 9 8 1 27 29 20 10 Orlando Magic
22 1610612755 Philadelphia 76ers 3 3 0 1.000 48.0 38.7 86.7 0.446 10.3 34.7 0.298 22.0 30.3 0.725 10.0 39.7 49.7 25.3 20.3 10.7 7.0 4.0 29.7 27.3 109.7 7.3 19 1 1 1 10 19 22 14 22 13 26 11 6 22 14 5 6 7 29 1 2 5 29 3 14 7 10 Philadelphia 76ers
23 1610612756 Phoenix Suns 4 2 2 0.500 49.3 39.8 87.5 0.454 12.3 34.5 0.355 22.3 26.8 0.832 7.8 39.0 46.8 27.8 16.0 8.5 4.0 6.5 31.3 27.0 114.0 8.8 1 11 14 14 4 12 17 11 12 15 13 10 11 4 28 6 16 2 15 11 24 25 30 7 11 4 10 Phoenix Suns
24 1610612757 Portland Trail Blazers 4 2 2 0.500 48.0 41.5 89.8 0.462 9.3 28.3 0.327 21.0 24.5 0.857 8.5 37.8 46.3 17.0 15.5 6.8 5.3 4.5 26.3 22.5 113.3 0.3 1 11 14 14 10 7 15 7 26 26 20 12 17 3 24 12 18 30 12 19 15 9 23 20 12 19 10 Portland Trail Blazers
25 1610612758 Sacramento Kings 4 0 4 0.000 48.0 34.3 86.5 0.396 11.0 32.3 0.341 16.0 21.5 0.744 11.5 30.8 42.3 18.8 18.8 6.5 4.5 5.0 22.5 22.0 95.5 -19.5 1 28 29 28 10 30 23 29 16 17 15 24 24 18 9 29 26 27 21 23 20 15 10 21 30 30 10 Sacramento Kings
26 1610612759 San Antonio Spurs 3 3 0 1.000 48.0 44.3 92.0 0.482 8.0 23.7 0.338 22.3 28.3 0.788 12.7 38.7 51.3 25.3 16.0 5.7 7.0 5.7 18.7 24.7 119.0 4.7 19 1 1 1 10 3 10 2 29 29 16 9 8 12 6 8 2 7 15 29 2 19 2 11 6 11 10 San Antonio Spurs
27 1610612761 Toronto Raptors 4 3 1 0.750 49.3 37.5 87.0 0.431 14.3 39.3 0.363 22.8 25.8 0.883 9.3 44.3 53.5 22.8 20.3 6.8 5.8 6.3 24.3 24.0 112.0 8.8 1 1 4 4 4 25 19 22 5 6 9 6 13 1 22 1 1 18 27 19 8 24 18 13 13 4 10 Toronto Raptors
28 1610612762 Utah Jazz 4 3 1 0.750 48.0 35.0 77.3 0.453 10.5 29.3 0.359 18.3 23.0 0.793 5.5 39.8 45.3 20.3 19.5 6.5 3.3 4.8 26.0 23.5 98.8 7.3 1 1 4 4 10 29 30 12 20 23 12 18 21 10 30 4 22 25 25 23 29 12 21 16 28 8 10 Utah Jazz
29 1610612764 Washington Wizards 3 1 2 0.333 48.0 41.0 95.0 0.432 12.7 38.7 0.328 11.7 15.0 0.778 9.0 36.0 45.0 25.7 15.0 6.0 5.7 6.0 22.7 19.7 106.3 0.7 19 18 14 18 10 8 6 21 10 7 19 30 30 13 23 15 23 5 8 27 9 22 12 28 19 16 10 Washington Wizards
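The last line of the code above relies on the response having a resultSets list whose entries each carry headers and rowSet. The sketch below isolates that conversion step using a hand-made dictionary in the same shape, so it runs without calling the API. The helper name result_set_to_frame and the sample values are made up for illustration.

```python
import pandas as pd

# Hand-made stand-in for the JSON response; the values are illustrative only.
json_data = {
    "resultSets": [
        {
            "name": "LeagueDashTeamStats",
            "headers": ["TEAM_ID", "TEAM_NAME", "GP", "W", "L"],
            "rowSet": [
                [1, "Team A", 4, 2, 2],
                [2, "Team B", 3, 2, 1],
            ],
        }
    ]
}


def result_set_to_frame(payload, index=0):
    """Build a DataFrame from one entry of payload['resultSets']."""
    result = payload["resultSets"][index]
    return pd.DataFrame(result["rowSet"], columns=result["headers"])


df = result_set_to_frame(json_data)

# Once the data is in a DataFrame, derived columns are straightforward.
df["W_PCT"] = df["W"] / df["GP"]
print(df)
```

Keeping the conversion in a small function makes it easy to reuse with a different index if a response contains more than one result set.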
Using Selenium is probably the best way to do it, since it lets you get the whole content after it has been rendered by JavaScript.
https://towardsdatascience.com/simple-web-scraping-with-pythons-selenium-4cedc52798cd

pandasql EOL error while scanning string literal

I have the code below, where I'm trying to use pandasql to run a SQL query with sqldf. I'm doing some division and aggregation. The query runs just fine when I run it in R with sqldf. I'm totally new to pandasql and I'm getting the error below. Can anyone see what my issue is and suggest how to fix it? I've also included some sample data.
Code:
import pandasql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
ExampleDf=pysqldf("select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
,zipcode
from data
where priorSaleDate between '2010-01-01' and '2011-01-01'
group by zipcode
order by
sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) desc")
Error:
File "<ipython-input-100-679165684772>", line 1
ExampleDf=pysqldf("select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
^
SyntaxError: EOL while scanning string literal
Sample Data:
print(data.iloc[:50])
id address city state zipcode latitude \
0 39525749 8171 E 84th Ave Denver CO 80022 39.849160
1 184578398 10556 Wheeling St Denver CO 80022 39.888020
2 184430015 3190 Wadsworth Blvd Denver CO 80033 39.761710
3 155129946 3040 Wadsworth Blvd Denver CO 80033 39.760780
4 245107 5615 S Eaton St Denver CO 80123 39.616181
5 3523925 6535 W Sumac Ave Denver CO 80123 39.615136
6 30560679 6673 W Berry Ave Denver CO 80123 39.616350
7 39623928 5640 S Otis St Denver CO 80123 39.615213
8 148975825 5342 S Gray St Denver CO 80123 39.620158
9 184623176 4967 S Wadsworth Blvd Denver CO 80123 39.626770
10 39811456 6700 W Dorado Dr # 11 Denver CO 80123 39.614540
11 39591617 4956 S Perry St Denver CO 80123 39.628740
12 39577604 4776 S Gar Way Denver CO 80123 39.630547
13 153665665 8890 W Tanforan Dr Denver CO 80123 39.630738
14 39868673 5538 W Prentice Cir Denver CO 80123 39.620625
15 184328555 4254 W Monmouth Ave Denver CO 80123 39.629000
16 30554949 6600 W Berry Ave Denver CO 80123 39.616165
17 24157982 6560 W Sumac Ave Denver CO 80123 39.614712
18 51335315 5655 S Fenton St Denver CO 80123 39.615488
19 152799217 5626 S Fenton St Denver CO 80123 39.616153
20 51330641 5599 S Fenton St Denver CO 80123 39.616514
21 15598828 6595 W Sumac Ave Denver CO 80123 39.615144
22 49360310 6420 W Sumac Ave Denver CO 80123 39.614531
23 39777745 4962 S Field Ct Denver CO 80123 39.625819
24 18021201 9664 W Grand Ave Denver CO 80123 39.625826
25 39776096 4881 S Jellison St Denver CO 80123 39.628401
26 29850085 5012 S Field Ct Denver CO 80123 39.625537
27 51597934 4982 S Field Ct Denver CO 80123 39.625757
28 39563379 4643 S Hoyt St Denver CO 80123 39.632457
29 18922140 5965 W Sumac Ave Denver CO 80123 39.615199
30 39914328 9740 W Chenango Ave Denver CO 80123 39.627226
31 51323181 5520 W Prentice Cir Denver CO 80123 39.620548
32 3493378 4665 S Garland Way Denver CO 80123 39.632063
33 4115341 5466 W Prentice Cir Denver CO 80123 39.619027
34 39639069 5735 W Berry Ave Denver CO 80123 39.617727
35 184333944 9015 W Tanforan Dr Denver CO 80123 39.631178
36 18197471 4977 S Garland St Denver CO 80123 39.626080
37 49430482 9540 W Bellwood Pl Denver CO 80123 39.624558
38 39868648 5535 S Fenton St Denver CO 80123 39.617145
39 143684222 3761 W Wagon Trail Dr Denver CO 80123 39.631251
40 152898579 4850 S Yukon St Denver CO 80123 39.629025
41 43174426 4951 S Ammons St Denver CO 80123 39.626582
42 39615194 7400 W Grant Ranch Blvd # 31 Denver CO 80123 39.618440
43 184340029 7400 W Grant Ranch Blvd # 7 Denver CO 80123 39.618440
44 3523919 5425 S Gray St Denver CO 80123 39.618265
45 151444231 6610 W Berry Ave Denver CO 80123 39.616148
46 19150871 4756 S Perry St Denver CO 80123 39.630389
47 39545155 4328 W Bellewood Dr Denver CO 80123 39.627883
48 3523923 6585 W Sumac Ave Denver CO 80123 39.615145
49 51337334 5737 W Alamo Dr Denver CO 80123 39.615881
longitude bedrooms bathrooms rooms squareFootage lotSize yearBuilt \
0 -104.893468 3 2.0 6 1378 9968 2003.0
1 -104.830930 2 2.0 6 1653 6970 2004.0
2 -105.081070 3 1.0 0 1882 23875 1917.0
3 -105.081060 4 3.0 0 2400 11500 1956.0
4 -105.058812 3 4.0 8 2305 5600 1998.0
5 -105.069018 3 5.0 7 2051 6045 1996.0
6 -105.070760 4 4.0 8 2051 6315 1997.0
7 -105.070617 3 3.0 7 2051 8133 1997.0
8 -105.063094 3 3.0 7 1796 5038 1999.0
9 -105.081990 3 3.0 0 2054 4050 2007.0
10 -105.071350 3 4.0 7 2568 6397 2000.0
11 -105.040126 3 2.0 6 1290 9000 1962.0
12 -105.100242 3 4.0 6 1804 6952 1983.0
13 -105.097718 3 3.0 6 1804 7439 1983.0
14 -105.059503 4 5.0 8 3855 9656 1998.0
15 -105.042330 2 2.0 4 1297 16600 1962.0
16 -105.069424 4 4.0 9 2321 5961 1996.0
17 -105.069264 4 4.0 8 2321 6337 1997.0
18 -105.060173 3 3.0 7 2321 6151 1998.0
19 -105.059696 3 3.0 7 2071 6831 1999.0
20 -105.060193 3 3.0 7 2071 6050 1998.0
21 -105.069803 3 3.0 7 2074 6022 1996.0
22 -105.067815 4 4.0 9 2588 6432 1996.0
23 -105.099825 3 2.0 7 1567 6914 1980.0
24 -105.106423 3 2.0 5 1317 9580 1983.0
25 -105.108440 3 3.0 5 1317 6718 1982.0
26 -105.099012 2 2.0 6 808 8568 1980.0
27 -105.099484 2 1.0 6 808 6858 1980.0
28 -105.104752 3 2.0 6 1321 6000 1978.0
29 -105.062378 3 4.0 8 2350 6839 1997.0
30 -105.107806 2 2.0 5 1586 6510 1982.0
31 -105.058600 2 4.0 6 2613 8250 1998.0
32 -105.101493 3 2.0 8 1590 7044 1977.0
33 -105.057427 3 5.0 7 2614 9350 1999.0
34 -105.059123 3 4.0 7 2107 6491 1998.0
35 -105.099179 2 1.0 5 1340 6741 1982.0
36 -105.103470 3 2.0 6 1085 6120 1985.0
37 -105.104316 3 1.0 6 1085 13500 1981.0
38 -105.060195 4 3.0 8 2365 6050 1998.0
39 -105.036567 3 2.0 5 1344 9240 1959.0
40 -105.081998 2 3.0 5 1601 6660 1986.0
41 -105.087250 3 2.0 8 1858 6890 1986.0
42 -105.079900 2 2.0 5 1603 5742 1997.0
43 -105.079900 2 2.0 5 1603 6168 1997.0
44 -105.061397 3 3.0 7 1860 6838 1998.0
45 -105.069618 3 4.0 8 2376 5760 1996.0
46 -105.038707 3 2.0 5 1355 9600 1960.0
47 -105.042611 2 2.0 6 1867 11000 1973.0
48 -105.069604 3 3.0 7 2382 5830 1996.0
49 -105.059085 3 3.0 6 1872 5500 1999.0
lastSaleDate lastSaleAmount priorSaleDate priorSaleAmount \
0 2009-12-17 75000 2004-05-13 165700.0
1 2004-09-23 216935 NaN NaN
2 2008-04-03 330000 NaN NaN
3 2008-12-02 185000 2008-06-27 0.0
4 2012-07-18 308000 2011-12-29 0.0
5 2006-09-12 363500 2005-05-16 339000.0
6 2014-12-15 420000 2006-07-07 345000.0
7 2004-03-15 328700 1998-04-09 225200.0
8 2011-08-16 274900 2011-01-10 0.0
9 2015-12-01 407000 2012-10-30 312000.0
10 2014-11-12 638000 2005-03-22 530000.0
11 2004-02-02 235000 2000-10-12 171000.0
12 2004-07-19 247000 1999-06-07 187900.0
13 2013-08-14 249700 2000-09-07 217900.0
14 2004-08-17 580000 1999-01-11 574000.0
15 2011-11-07 150000 NaN NaN
16 2006-01-18 402800 2004-08-16 335000.0
17 2013-12-31 422000 2012-11-05 399000.0
18 1999-12-02 277900 NaN NaN
19 2000-02-04 271800 NaN NaN
20 1999-10-20 274400 NaN NaN
21 2007-11-30 314500 NaN NaN
22 2001-12-31 342500 NaN NaN
23 2016-12-02 328000 2016-08-02 231200.0
24 2017-06-21 376000 2008-02-29 244000.0
25 2004-08-31 225000 NaN NaN
26 2016-09-06 310000 2015-09-15 258900.0
27 1999-12-06 128000 NaN NaN
28 2004-04-28 197000 NaN NaN
29 2011-08-11 365000 2004-08-04 365000.0
30 2015-07-08 302000 2004-07-15 210000.0
31 2000-02-10 425000 1999-04-08 396500.0
32 2016-02-26 275000 2004-12-03 204000.0
33 2005-08-29 580000 1999-09-10 398200.0
34 2004-06-30 355000 2001-02-22 320000.0
35 2015-05-26 90000 1983-06-01 80000.0
36 2017-06-08 312500 2017-05-12 258000.0
37 2001-04-27 184000 1999-11-10 164900.0
38 2004-02-08 335000 2001-05-08 339950.0
39 2016-10-17 290000 NaN 70200.0
40 2010-09-02 260000 1998-04-14 189900.0
41 2012-07-30 231600 2012-03-30 0.0
42 2013-10-24 400000 2004-08-04 388400.0
43 2004-11-19 350000 1998-10-05 292400.0
44 2005-06-23 295000 2004-07-26 300000.0
45 2009-06-24 404500 2000-05-04 304900.0
46 1999-12-14 153500 1999-12-14 153500.0
47 2004-05-25 208000 NaN NaN
48 2016-10-20 502000 2005-05-31 357000.0
49 2013-04-05 369000 2000-08-07 253000.0
estimated_value
0 239753
1 343963
2 488840
3 494073
4 513676
5 496062
6 514953
7 494321
8 496079
9 424514
10 721350
11 331915
12 389415
13 386694
14 784587
15 354031
16 515537
17 544960
18 504791
19 495121
20 495894
21 496281
22 528343
23 349041
24 367754
25 356934
26 346001
27 342927
28 337969
29 500105
30 353827
31 693035
32 350857
33 716655
34 493156
35 349355
36 348079
37 343957
38 504705
39 311996
40 391469
41 418814
42 502894
43 478049
44 475615
45 521467
46 366187
47 386913
48 527104
49 497239
Use triple quotes so the query string can span multiple lines:
ExampleDf=pysqldf("""select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
,zipcode
from data
where priorSaleDate between '2010-01-01' and '2011-01-01'
group by zipcode
order by
sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) desc""")
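To try the triple-quoted query without pandasql installed, here is a minimal sketch that runs the same SQL through the standard-library sqlite3 module and pandas. The rows in the toy table are made-up values for illustration; only the column names are taken from the question, and the in-memory database stands in for whatever pysqldf does internally.

```python
import sqlite3
import pandas as pd

# Toy rows with hypothetical values; only the column names come from the question.
data = pd.DataFrame({
    "zipcode": [80123, 80123, 80236],
    "squareFootage": [1000, 2000, 1500],
    "lastSaleAmount": [300000, 500000, 250000],
    "priorSaleAmount": [200000.0, 300000.0, 100000.0],
    "priorSaleDate": ["2010-03-01", "2010-06-15", "2012-01-01"],
})

# Load the frame into an in-memory SQLite table named "data".
conn = sqlite3.connect(":memory:")
data.to_sql("data", conn, index=False)

# Triple quotes let the SQL text run over several lines.
query = """select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
,zipcode
from data
where priorSaleDate between '2010-01-01' and '2011-01-01'
group by zipcode
order by
sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) desc"""

result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

Here the dates are stored as plain 'YYYY-MM-DD' strings, so the between filter is a string comparison; with a real datetime column the upper bound may behave differently.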

iterating over loc on dataframes

I'm trying to extract row ranges from a list of dataframes. The dataframes don't all contain the same rows, so I have a list of possible index ranges that I would like loc to try in turn. For example, from the sample below I might want CIN to LAN, but in another dataframe the CIN row doesn't exist, so I would want DET to LAN or HOU to LAN instead.
I was thinking of putting the ranges in a list and iterating over it, i.e.
for df in dfs:
    ranges = [[df.loc["CIN":"LAN"]], [df.loc["DET":"LAN"]]]
    extracted_ranges = (i for i in ranges)
I'm not sure how to iterate over a list and feed each entry into loc, or perhaps .query().
df1 stint g ab r h X2b X3b hr rbi sb cs bb \
year team
2007 CIN 6 379 745 101 203 35 2 36 125.0 10.0 1.0 105
DET 5 301 1062 162 283 54 4 37 144.0 24.0 7.0 97
HOU 4 311 926 109 218 47 6 14 77.0 10.0 4.0 60
LAN 11 413 1021 153 293 61 3 36 154.0 7.0 5.0 114
NYN 13 622 1854 240 509 101 3 61 243.0 22.0 4.0 174
SFN 5 482 1305 198 337 67 6 40 171.0 26.0 7.0 235
TEX 2 198 729 115 200 40 4 28 115.0 21.0 4.0 73
TOR 4 459 1408 187 378 96 2 58 223.0 4.0 2.0 190
df2 so ibb hbp sh sf gidp
year team
2008 DET 176.0 3.0 10.0 4.0 8.0 28.0
HOU 212.0 3.0 9.0 16.0 6.0 17.0
LAN 141.0 8.0 9.0 3.0 8.0 29.0
NYN 310.0 24.0 23.0 18.0 15.0 48.0
SFN 188.0 51.0 8.0 16.0 6.0 41.0
TEX 140.0 4.0 5.0 2.0 8.0 16.0
TOR 265.0 16.0 12.0 4.0 16.0 38.0
Here is a solution:
import pandas as pd

# Prepare a list of ranges, one per data frame
ranges = [('CIN', 'LAN'), ('DET', 'LAN')]

# Declare an empty list for the results and a list with the existing data frames
df_ranges = []
df_list = [df1, df2]

# Loop over the ranges and slice the matching data frame on the team level
for i, idx_range in enumerate(ranges):
    df = df_list[i]
    row1, row2 = idx_range
    df_ranges.append(df.loc[(slice(None), slice(row1, row2)), :])

# Print the extracted data
print('Extracted data:\n')
print(df_ranges)
Output:
[ stint g ab r h X2b X3b hr rbi sb cs bb
year team
2007 CIN 6 379 745 101 203 35 2 36 125 10 1 105
DET 5 301 1062 162 283 54 4 37 144 24 7 97
HOU 4 311 926 109 218 47 6 14 77 10 4 60
LAN 11 413 1021 153 293 61 3 36 154 7 5 114
so ibb hbp sh sf gidp
year team
2008 DET 176 3 10 4 8 28
HOU 212 3 9 16 6 17
LAN 141 8 9 3 8 29]
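The solution above pairs each range with one specific data frame. If the goal is instead to try several candidate ranges on every data frame and keep the first one that fits, something along these lines might work. This is only a sketch: the helper name first_available_range is made up for illustration, the toy frames reuse a single column from the sample data, and it assumes the (year, team) index is sorted so that label slicing on the team level is allowed.

```python
import pandas as pd

def first_available_range(df, candidates, level="team"):
    """Return the rows for the first (start, stop) pair whose labels both exist.

    Hypothetical helper; assumes df has a sorted MultiIndex with a `level` level.
    """
    labels = df.index.get_level_values(level)
    for start, stop in candidates:
        if start in labels and stop in labels:
            # Same slicing as in the answer: every year, teams from start to stop.
            return df.loc[(slice(None), slice(start, stop)), :]
    return df.iloc[0:0]  # no candidate matched: empty frame with the same columns

# Toy frames built from one column each of the sample data.
idx1 = pd.MultiIndex.from_product(
    [[2007], ["CIN", "DET", "HOU", "LAN", "NYN", "SFN", "TEX", "TOR"]],
    names=["year", "team"])
df1 = pd.DataFrame({"g": [379, 301, 311, 413, 622, 482, 198, 459]}, index=idx1)

idx2 = pd.MultiIndex.from_product(
    [[2008], ["DET", "HOU", "LAN", "NYN", "SFN", "TEX", "TOR"]],
    names=["year", "team"])
df2 = pd.DataFrame({"so": [176.0, 212.0, 141.0, 310.0, 188.0, 140.0, 265.0]}, index=idx2)

# Candidate ranges in order of preference.
candidates = [("CIN", "LAN"), ("DET", "LAN"), ("HOU", "LAN")]
extracted = [first_available_range(df, candidates) for df in (df1, df2)]

for part in extracted:
    print(part, "\n")
```

For df1 the first candidate matches, so CIN to LAN is returned; df2 has no CIN row, so the helper falls through to DET to LAN.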
