VLookup in Pandas using merge - python

I have 2 dataframes:
df_dict:
    Bet365                Team (Dataset)      Record ID
--  --------------------  ----------------  -----------
 0  Lincoln City          Lincoln                    50
 1  Peterborough          Peterboro                  65
 2  Cambridge Utd         Cambridge                  72
 3  Harrogate Town        Harrogate                  87
 4  Cologne               FC Koln                   160
 5  Hertha Berlin         Hertha                    167
 6  Arminia Bielefeld     Bielefeld                 169
 7  Schalke               Schalke 04                173
 8  TSG Hoffenheim        Hoffenheim                174
 9  SC Freiburg           Freiburg                  175
10  Zulte-Waregem         Waregem                   320
11  Royal Excel Mouscron  Mouscron                  325
Other dataframe:
df_odds:
    DateTime                    League                  HomeTeam           AwayTeam                B365H    B365D    B365A
--  --------------------------  ----------------------  -----------------  --------------------  -------  -------  -------
 0  2021-01-09 12:30:00.000001  England League 1        Lincoln City       Peterborough             2.29      3.4      3.1
 1  2021-01-09 15:00:00         England League 2        Cambridge Utd      Harrogate Town           2.29      3.2     3.25
 2  2021-01-09 15:14:59.999999  Belgium First Division  Zulte-Waregem      Royal Excel Mouscron     1.85     3.75      3.8
 3  2021-01-09 14:29:59.999999  Germany Bundesliga 1    SC Freiburg        Cologne                   1.9     3.75     3.75
 4  2021-01-09 14:29:59.999999  Germany Bundesliga 1    Schalke            TSG Hoffenheim            3.8      3.8     1.85
 5  2021-01-10 17:00:00.000001  Germany Bundesliga 1    Arminia Bielefeld  Hertha Berlin               4      3.5      1.9
 6  2021-01-16 14:29:59.999999  Germany Bundesliga 1    Cologne            Hertha Berlin             3.2      3.3     2.25
I would like to merge the dataset to get the final dataframe as:
df_expected
    DateTime                    League                  HomeTeam    AwayTeam      B365H    B365D    B365A
--  --------------------------  ----------------------  ----------  ----------  -------  -------  -------
 0  2021-01-09 12:30:00.000001  England League 1        Lincoln     Peterboro      2.29      3.4      3.1
 1  2021-01-09 15:00:00         England League 2        Cambridge   Harrogate      2.29      3.2     3.25
 2  2021-01-09 15:14:59.999999  Belgium First Division  Waregem     Mouscron       1.85     3.75      3.8
 3  2021-01-09 14:29:59.999999  Germany Bundesliga 1    Freiburg    FC Koln         1.9     3.75     3.75
 4  2021-01-09 14:29:59.999999  Germany Bundesliga 1    Schalke 04  Hoffenheim      3.8      3.8     1.85
 5  2021-01-10 17:00:00.000001  Germany Bundesliga 1    Bielefeld   Hertha            4      3.5      1.9
 6  2021-01-16 14:29:59.999999  Germany Bundesliga 1    FC Koln     Hertha          3.2      3.3     2.25
The common key is df_dict.Bet365, which matches the HomeTeam and AwayTeam names in df_odds.
I am trying to use pd.merge, but I am unable to get the right keys and the correct join.
Help would be greatly appreciated.

Use Series.map on both columns, mapping through a Series built from df_dict with the Bet365 column as its index:
s = df_dict.set_index('Bet365')['Team (Dataset)']
df_odds['HomeTeam'] = df_odds['HomeTeam'].map(s)
df_odds['AwayTeam'] = df_odds['AwayTeam'].map(s)
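One thing to watch with this approach: map returns NaN for any name that is missing from the lookup. A minimal sketch of the same idea with a fallback to the original name is shown below. It uses a cut-down copy of the two frames, and "Unknown FC" is a made-up team added only to show the fallback.

```python
import pandas as pd

# Cut-down versions of the frames from the question.
df_dict = pd.DataFrame({
    "Bet365": ["Lincoln City", "Peterborough", "Cologne"],
    "Team (Dataset)": ["Lincoln", "Peterboro", "FC Koln"],
})
df_odds = pd.DataFrame({
    # "Unknown FC" is hypothetical: it has no entry in df_dict.
    "HomeTeam": ["Lincoln City", "Cologne", "Unknown FC"],
    "AwayTeam": ["Peterborough", "Lincoln City", "Cologne"],
})

# Lookup Series: index = Bet365 name, value = dataset name.
lookup = df_dict.set_index("Bet365")["Team (Dataset)"]

for col in ["HomeTeam", "AwayTeam"]:
    # Unmatched names come back as NaN; keep the original name in that case.
    df_odds[col] = df_odds[col].map(lookup).fillna(df_odds[col])

print(df_odds)
```

Whether to keep the original name or leave NaN for unmatched teams depends on how you want to spot gaps in the lookup table.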


How Do I Merge DFs with a for loop

Below is my code. I want to merge the spread and total values for each week, which I have saved in separate files. It works perfectly for individual weeks, but not when I introduce the for loop. I assume it's overwriting each time it merges, but when I place the .merge code outside the for loop, it only writes the last iteration to the Excel file.
year = 2015
weeks = np.arange(1,18)
for week in weeks:
    odds = pd.read_excel(fr'C:\Users\logan\Desktop\Gambling_Scraper\Odds_{year}\Odds{year}Wk{week}.xlsx')
    odds['Favorite'] = odds['Favorite'].map(lambda x: x.lstrip('at '))
    odds['Underdog'] = odds['Underdog'].map(lambda x: x.lstrip('at '))
    odds['UD_Spread'] = odds['Spread'] * -1
    #new df to add spread
    new_df = pd.DataFrame(odds['Favorite'].append(odds['Underdog']))
    new_df['Tm'] = new_df
    new_df['Wk'] = new_df['Tm'] + str(week)
    new_df['Spread'] = odds['Spread'].append(odds['UD_Spread'])
    #new df to add total
    total_df = pd.DataFrame(odds['Favorite'].append(odds['Underdog']))
    total_df['Tm'] = total_df
    total_df['Wk'] = total_df['Tm'] + str(week)
    total_df['Total']= pd.DataFrame(odds['Total'].append(odds['Total']))
    df['Week'] = df['Week'].astype(int)
    df['Merge'] = df['Tm'].astype(str) + df['Week'].astype(str)
    df = df.merge(new_df['Spread'], left_on='Merge', right_on=new_df['Wk'], how='left')
    df = df.merge(total_df['Total'], left_on='Merge', right_on=total_df['Wk'], how='left')
    df['Implied Tm Pts'] = df['Total'].astype(float) /2 - df['Spread'].astype(float)/2
    df.to_excel('DFS2015.xlsx')
What I get:
Name Position Week Tm Merge Spread Total Implied Tm Pts
Devonta Freeman RB 1 Falcons Falcons1 3 55 26
Devonta Freeman RB 2 Falcons Falcons2
Devonta Freeman RB 3 Falcons Falcons3
Devonta Freeman RB 4 Falcons Falcons4
Devonta Freeman RB 5 Falcons Falcons5
Devonta Freeman RB 6 Falcons Falcons6
Devonta Freeman RB 7 Falcons Falcons7
Devonta Freeman RB 8 Falcons Falcons8
Devonta Freeman RB 9 Falcons Falcons9
Devonta Freeman RB 11 Falcons Falcons11
Devonta Freeman RB 13 Falcons Falcons13
Devonta Freeman RB 14 Falcons Falcons14
Devonta Freeman RB 15 Falcons Falcons15
Devonta Freeman RB 16 Falcons Falcons16
Devonta Freeman RB 17 Falcons Falcons17
Antonio Brown WR 1 Steelers Steelers1 7 51 22
But I need a value in each row.
Trying to merge 'Spread' and Total from this data:
Date Favorite Spread Underdog Spread2 Total Away Money Line Home Money Line Week Favs Spread Uds Spread2
September 10, 2015 8:30 PM Patriots -7.0 Steelers 7 51.0 +270 -340 1 Patriots1 -7.0 Steelers1 7
September 13, 2015 1:00 PM Packers -6.0 Bears 6 48.0 -286 +230 1 Packers1 -6.0 Bears1 6
September 13, 2015 1:00 PM Chiefs -1.0 Texans 1 40.0 -115 -105 1 Chiefs1 -1.0 Texans1 1
September 13, 2015 1:00 PM Jets -4.0 Browns 4 40.0 +170 -190 1 Jets1 -4.0 Browns1 4
September 13, 2015 1:00 PM Colts -1.0 Bills 1 44.0 -115 -105 1 Colts1 -1.0 Bills1 1
September 13, 2015 1:00 PM Dolphins -4.0 Football Team 4 46.0 -210 +175 1 Dolphins1 -4.0 Football Team1 4
September 13, 2015 1:00 PM Panthers -3.0 Jaguars 3 41.0 -150 +130 1 Panthers1 -3.0 Jaguars1 3
September 13, 2015 1:00 PM Seahawks -4.0 Rams 4 42.0 -185 +160 1 Seahawks1 -4.0 Rams1 4
September 13, 2015 4:05 PM Cardinals -2.0 Saints 2 49.0 +120 -140 1 Cardinals1 -2.0 Saints1 2
September 13, 2015 4:05 PM Chargers -4.0 Lions 4 46.0 +160 -180 1 Chargers1 -4.0 Lions1 4
September 13, 2015 4:25 PM Buccaneers -3.0 Titans 3 40.0 +130 -150 1 Buccaneers1 -3.0 Titans1 3
September 13, 2015 4:25 PM Bengals -3.0 Raiders 3 43.0 -154 +130 1 Bengals1 -3.0 Raiders1 3
September 13, 2015 4:25 PM Broncos -4.0 Ravens 4 46.0 +180 -220 1 Broncos1 -4.0 Ravens1 4
September 13, 2015 8:30 PM Cowboys -7.0 Giants 7 52.0 +240 -300 1 Cowboys1 -7.0 Giants1 7
September 14, 2015 7:10 PM Eagles -3.0 Falcons 3 55.0 -188 +150 1 Eagles1 -3.0 Falcons1 3
September 14, 2015 10:20 PM Vikings -2.0 49ers 2 42.0 -142 +120 1 Vikings1 -2.0 49ers1 2
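One possible way around the overwriting problem, offered as a sketch rather than a tested fix for the exact files above: build one lookup table per week inside the loop, concatenate them all, and merge into the player frame once, on the pair (Tm, Week) instead of a concatenated string key. The helper build_week_lookup and the in-memory weekly_odds dict are hypothetical stand-ins for the Excel files, and the week 2 numbers are invented for illustration.

```python
import pandas as pd

def build_week_lookup(odds, week):
    """Return one row per team (favorite and underdog) for one week's odds."""
    favorites = pd.DataFrame({
        "Tm": odds["Favorite"],
        "Spread": odds["Spread"],
        "Total": odds["Total"],
    })
    underdogs = pd.DataFrame({
        "Tm": odds["Underdog"],
        "Spread": -odds["Spread"],  # the underdog gets the opposite spread
        "Total": odds["Total"],
    })
    lookup = pd.concat([favorites, underdogs], ignore_index=True)
    lookup["Week"] = week
    return lookup

# Hypothetical stand-ins for the per-week files (week 2 values are made up).
weekly_odds = {
    1: pd.DataFrame({"Favorite": ["Eagles"], "Underdog": ["Falcons"],
                     "Spread": [-3.0], "Total": [55.0]}),
    2: pd.DataFrame({"Favorite": ["Giants"], "Underdog": ["Falcons"],
                     "Spread": [-2.0], "Total": [51.0]}),
}
df = pd.DataFrame({"Name": ["Devonta Freeman"] * 2,
                   "Tm": ["Falcons"] * 2,
                   "Week": [1, 2]})

# Collect every week first, then merge a single time.
all_weeks = pd.concat(
    [build_week_lookup(odds, week) for week, odds in weekly_odds.items()],
    ignore_index=True,
)
merged = df.merge(all_weeks, on=["Tm", "Week"], how="left")
merged["Implied Tm Pts"] = merged["Total"] / 2 - merged["Spread"] / 2
print(merged)
```

Merging once avoids the suffixed duplicate columns (Spread_x, Spread_y) that repeated merges on the same frame tend to produce.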

python selenium: <input> element not interactable

I was trying to scrape NBA player info from https://nba.com/players by clicking the "Show Historic" button on the webpage.
Part of the HTML for the input button is shown below:
<div aria-label="Show Historic Toggle" class="Toggle_switch__2e_90">
<input type="checkbox" class="Toggle_input__gIiFd" name="showHistoric">
<span class="Toggle_slider__hCMQQ Toggle_sliderActive__15Jrf Toggle_slidercerulean__1UnnV">
</span>
</div>
I simply use find_element_by_xpath to locate the input button and click it:
button_show_historic = driver.find_element_by_xpath("//input[@name='showHistoric']")
button_show_historic.click()
However it says:
Exception has occurred: ElementNotInteractableException
Message: element not interactable
(Session info: chrome=88.0.4324.192)
Could anyone help solve this issue? Is this because the input is not visible?
Simply wait for the span element, not the input element, and click it.
wait = WebDriverWait(driver, 30)
driver.get('https://www.nba.com/players')
wait.until(EC.element_to_be_clickable((By.XPATH,"//button[.='I Accept']"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH,"//input[@name='showHistoric']/preceding::span[1]"))).click()
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Also, to find an API, look under Developer Tools -> Network -> Headers
and Response to see whether the data gets populated from a request.
Most probably the problem is that you don't have any wait code. You should wait until the page is loaded. You can use Python's simple sleep function:
import time
time.sleep(3) #it will wait 3 seconds
##Do your action
Or you can use an explicit wait. Check this page: selenium.dev
No need to use Selenium when there's an API. Try this:
import requests
import pandas as pd
url = 'https://stats.nba.com/stats/playerindex'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
'Referer': 'http://stats.nba.com'}
payload = {
'College': '',
'Country': '',
'DraftPick': '',
'DraftRound': '',
'DraftYear': '',
'Height': '' ,
'Historical': '1',
'LeagueID': '00',
'Season': '2020-21',
'SeasonType': 'Regular Season',
'TeamID': '0',
'Weight': ''}
jsonData = requests.get(url, headers=headers, params=payload).json()
cols = jsonData['resultSets'][0]['headers']
data = jsonData['resultSets'][0]['rowSet']
df = pd.DataFrame(data, columns=cols)
Output: [4589 rows x 26 columns]
print(df.head(20).to_string())
PERSON_ID PLAYER_LAST_NAME PLAYER_FIRST_NAME PLAYER_SLUG TEAM_ID TEAM_SLUG IS_DEFUNCT TEAM_CITY TEAM_NAME TEAM_ABBREVIATION JERSEY_NUMBER POSITION HEIGHT WEIGHT COLLEGE COUNTRY DRAFT_YEAR DRAFT_ROUND DRAFT_NUMBER ROSTER_STATUS PTS REB AST STATS_TIMEFRAME FROM_YEAR TO_YEAR
0 76001 Abdelnaby Alaa alaa-abdelnaby 1.610613e+09 blazers 0 Portland Trail Blazers POR 30 F 6-10 240 Duke USA 1990.0 1.0 25.0 NaN 5.7 3.3 0.3 Career 1990 1994
1 76002 Abdul-Aziz Zaid zaid-abdul-aziz 1.610613e+09 rockets 0 Houston Rockets HOU 54 C 6-9 235 Iowa State USA 1968.0 1.0 5.0 NaN 9.0 8.0 1.2 Career 1968 1977
2 76003 Abdul-Jabbar Kareem kareem-abdul-jabbar 1.610613e+09 lakers 0 Los Angeles Lakers LAL 33 C 7-2 225 UCLA USA 1969.0 1.0 1.0 NaN 24.6 11.2 3.6 Career 1969 1988
3 51 Abdul-Rauf Mahmoud mahmoud-abdul-rauf 1.610613e+09 nuggets 0 Denver Nuggets DEN 1 G 6-1 162 Louisiana State USA 1990.0 1.0 3.0 NaN 14.6 1.9 3.5 Career 1990 2000
4 1505 Abdul-Wahad Tariq tariq-abdul-wahad 1.610613e+09 kings 0 Sacramento Kings SAC 9 F-G 6-6 235 San Jose State France 1997.0 1.0 11.0 NaN 7.8 3.3 1.1 Career 1997 2003
5 949 Abdur-Rahim Shareef shareef-abdur-rahim 1.610613e+09 grizzlies 0 Memphis Grizzlies MEM 3 F 6-9 245 California USA 1996.0 1.0 3.0 NaN 18.1 7.5 2.5 Career 1996 2007
6 76005 Abernethy Tom tom-abernethy 1.610613e+09 warriors 0 Golden State Warriors GSW 5 F 6-7 220 Indiana USA 1976.0 3.0 43.0 NaN 5.6 3.2 1.2 Career 1976 1980
7 76006 Able Forest forest-able 1.610613e+09 sixers 0 Philadelphia 76ers PHI 6 G 6-3 180 Western Kentucky USA 1956.0 NaN NaN NaN 0.0 1.0 1.0 Career 1956 1956
8 76007 Abramovic John john-abramovic 1.610610e+09 None 1 Pittsburgh Ironmen PIT None F 6-3 195 Salem USA NaN NaN NaN NaN 9.5 NaN 0.7 Career 1946 1947
9 203518 Abrines Alex alex-abrines 1.610613e+09 thunder 0 Oklahoma City Thunder OKC 8 G 6-6 190 FC Barcelona Spain 2013.0 2.0 32.0 NaN 5.3 1.4 0.5 Career 2016 2018
10 1630173 Achiuwa Precious precious-achiuwa 1.610613e+09 heat 0 Miami Heat MIA 5 F 6-8 225 Memphis Nigeria 2020.0 1.0 20.0 1.0 5.9 3.9 0.6 Season 2020 2020
11 101165 Acker Alex alex-acker 1.610613e+09 clippers 0 LA Clippers LAC 3 G 6-5 185 Pepperdine USA 2005.0 2.0 60.0 NaN 2.7 1.0 0.5 Career 2005 2008
12 76008 Ackerman Donald donald-ackerman 1.610613e+09 knicks 0 New York Knicks NYK G 6-0 183 Long Island-Brooklyn USA 1953.0 2.0 NaN NaN 1.5 0.5 0.8 Career 1953 1953
13 76009 Acres Mark mark-acres 1.610613e+09 magic 0 Orlando Magic ORL 42 C 6-11 220 Oral Roberts USA 1985.0 2.0 40.0 NaN 3.6 4.1 0.5 Career 1987 1992
14 76010 Acton Charles charles-acton 1.610613e+09 rockets 0 Houston Rockets HOU 24 F 6-6 210 Hillsdale USA NaN NaN NaN NaN 3.3 2.0 0.5 Career 1967 1967
15 203112 Acy Quincy quincy-acy 1.610613e+09 kings 0 Sacramento Kings SAC 13 F 6-7 240 Baylor USA 2012.0 2.0 37.0 NaN 4.9 3.5 0.6 Career 2012 2018
16 76011 Adams Alvan alvan-adams 1.610613e+09 suns 0 Phoenix Suns PHX 33 C 6-9 210 Oklahoma USA 1975.0 1.0 4.0 NaN 14.1 7.0 4.1 Career 1975 1987
17 76012 Adams Don don-adams 1.610613e+09 pistons 0 Detroit Pistons DET 10 F 6-7 210 Northwestern USA 1970.0 8.0 120.0 NaN 8.7 5.6 1.8 Career 1970 1976
18 200801 Adams Hassan hassan-adams 1.610613e+09 nets 0 Brooklyn Nets BKN 8 F 6-4 220 Arizona USA 2006.0 2.0 54.0 NaN 2.5 1.2 0.2 Career 2006 2008
19 1629121 Adams Jaylen jaylen-adams 1.610613e+09 bucks 0 Milwaukee Bucks MIL 6 G 6-0 225 St. Bonaventure USA NaN NaN NaN 1.0 0.3 0.4 0.3 Season 2018 2020
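The conversion step above relies on the response having a resultSets list whose entries carry parallel headers and rowSet fields. A small sketch of that step in isolation, using a hand-made payload of the same shape (the helper name result_set_to_frame and the sample rows are illustrative, not part of the API):

```python
import pandas as pd

def result_set_to_frame(payload, index=0):
    """Turn one entry of payload['resultSets'] into a DataFrame."""
    result_set = payload["resultSets"][index]
    return pd.DataFrame(result_set["rowSet"], columns=result_set["headers"])

# Hand-made sample mimicking the shape of the JSON used above.
sample = {
    "resultSets": [{
        "headers": ["PERSON_ID", "PLAYER_LAST_NAME", "FROM_YEAR"],
        "rowSet": [
            [76001, "Abdelnaby", "1990"],
            [76003, "Abdul-Jabbar", "1969"],
        ],
    }]
}

players = result_set_to_frame(sample)
print(players)
```

Keeping this step in a function makes it easy to test without hitting the network.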

Reshape pivot table in pandas

I need to reshape a csv pivot table. A small extract looks like:
country location confirmedcases_10-02-2020 deaths_10-02-2020 confirmedcases_11-02-2020 deaths_11-02-2020
0 Australia New South Wales 4.0 0.0 4 0.0
1 Australia Victoria 4.0 0.0 4 0.0
2 Australia Queensland 5.0 0.0 5 0.0
3 Australia South Australia 2.0 0.0 2 0.0
4 Cambodia Sihanoukville 1.0 0.0 1 0.0
5 Canada Ontario 3.0 0.0 3 0.0
6 Canada British Columbia 4.0 0.0 4 0.0
7 China Hubei 31728.0 974.0 33366 1068.0
8 China Zhejiang 1177.0 0.0 1131 0.0
9 China Guangdong 1177.0 1.0 1219 1.0
10 China Henan 1105.0 7.0 1135 8.0
11 China Hunan 912.0 1.0 946 2.0
12 China Anhui 860.0 4.0 889 4.0
13 China Jiangxi 804.0 1.0 844 1.0
14 China Chongqing 486.0 2.0 505 3.0
15 China Sichuan 417.0 1.0 436 1.0
16 China Shandong 486.0 1.0 497 1.0
17 China Jiangsu 515.0 0.0 543 0.0
18 China Shanghai 302.0 1.0 311 1.0
19 China Beijing 342.0 3.0 352 3.0
Is there a ready-to-use pandas tool to reshape it into something like this?
country location date confirmedcases deaths
0 Australia New South Wales 2020-02-10 4.0 0.0
1 Australia Victoria 2020-02-10 4.0 0.0
2 Australia Queensland 2020-02-10 5.0 0.0
3 Australia South Australia 2020-02-10 2.0 0.0
4 Cambodia Sihanoukville 2020-02-10 1.0 0.0
5 Canada Ontario 2020-02-10 3.0 0.0
6 Canada British Columbia 2020-02-10 4.0 0.0
7 China Hubei 2020-02-10 31728.0 974.0
8 China Zhejiang 2020-02-10 1177.0 0.0
9 China Guangdong 2020-02-10 1177.0 1.0
10 China Henan 2020-02-10 1105.0 7.0
11 China Hunan 2020-02-10 912.0 1.0
12 China Anhui 2020-02-10 860.0 4.0
13 China Jiangxi 2020-02-10 804.0 1.0
14 China Chongqing 2020-02-10 486.0 2.0
15 China Sichuan 2020-02-10 417.0 1.0
16 China Shandong 2020-02-10 486.0 1.0
17 China Jiangsu 2020-02-10 515.0 0.0
18 China Shanghai 2020-02-10 302.0 1.0
19 China Beijing 2020-02-10 342.0 3.0
20 Australia New South Wales 2020-02-11 4.0 0.0
21 Australia Victoria 2020-02-11 4.0 0.0
22 Australia Queensland 2020-02-11 5.0 0.0
23 Australia South Australia 2020-02-11 2.0 0.0
24 Cambodia Sihanoukville 2020-02-11 1.0 0.0
25 Canada Ontario 2020-02-11 3.0 0.0
26 Canada British Columbia 2020-02-11 4.0 0.0
27 China Hubei 2020-02-11 33366.0 1068.0
28 China Zhejiang 2020-02-11 1131.0 0.0
29 China Guangdong 2020-02-11 1219.0 1.0
30 China Henan 2020-02-11 1135.0 8.0
31 China Hunan 2020-02-11 946.0 2.0
32 China Anhui 2020-02-11 889.0 4.0
33 China Jiangxi 2020-02-11 844.0 1.0
34 China Chongqing 2020-02-11 505.0 3.0
35 China Sichuan 2020-02-11 436.0 1.0
36 China Shandong 2020-02-11 497.0 1.0
37 China Jiangsu 2020-02-11 543.0 0.0
38 China Shanghai 2020-02-11 311.0 1.0
39 China Beijing 2020-02-11 352.0 3.0
Use pd.wide_to_long:
print (pd.wide_to_long(df,stubnames=["confirmedcases","deaths"],
i=["country","location"],j="date",sep="_",
suffix=r'\d{2}-\d{2}-\d{4}').reset_index())
country location date confirmedcases deaths
0 Australia New South Wales 10-02-2020 4.0 0.0
1 Australia New South Wales 11-02-2020 4.0 0.0
2 Australia Victoria 10-02-2020 4.0 0.0
3 Australia Victoria 11-02-2020 4.0 0.0
4 Australia Queensland 10-02-2020 5.0 0.0
5 Australia Queensland 11-02-2020 5.0 0.0
6 Australia South Australia 10-02-2020 2.0 0.0
7 Australia South Australia 11-02-2020 2.0 0.0
8 Cambodia Sihanoukville 10-02-2020 1.0 0.0
9 Cambodia Sihanoukville 11-02-2020 1.0 0.0
10 Canada Ontario 10-02-2020 3.0 0.0
11 Canada Ontario 11-02-2020 3.0 0.0
12 Canada British Columbia 10-02-2020 4.0 0.0
13 Canada British Columbia 11-02-2020 4.0 0.0
14 China Hubei 10-02-2020 31728.0 974.0
15 China Hubei 11-02-2020 33366.0 1068.0
16 China Zhejiang 10-02-2020 1177.0 0.0
17 China Zhejiang 11-02-2020 1131.0 0.0
18 China Guangdong 10-02-2020 1177.0 1.0
19 China Guangdong 11-02-2020 1219.0 1.0
20 China Henan 10-02-2020 1105.0 7.0
21 China Henan 11-02-2020 1135.0 8.0
22 China Hunan 10-02-2020 912.0 1.0
23 China Hunan 11-02-2020 946.0 2.0
24 China Anhui 10-02-2020 860.0 4.0
25 China Anhui 11-02-2020 889.0 4.0
26 China Jiangxi 10-02-2020 804.0 1.0
27 China Jiangxi 11-02-2020 844.0 1.0
28 China Chongqing 10-02-2020 486.0 2.0
29 China Chongqing 11-02-2020 505.0 3.0
30 China Sichuan 10-02-2020 417.0 1.0
31 China Sichuan 11-02-2020 436.0 1.0
32 China Shandong 10-02-2020 486.0 1.0
33 China Shandong 11-02-2020 497.0 1.0
34 China Jiangsu 10-02-2020 515.0 0.0
35 China Jiangsu 11-02-2020 543.0 0.0
36 China Shanghai 10-02-2020 302.0 1.0
37 China Shanghai 11-02-2020 311.0 1.0
38 China Beijing 10-02-2020 342.0 3.0
39 China Beijing 11-02-2020 352.0 3.0
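The date column produced this way still holds strings such as 10-02-2020, while the desired output shows real dates (2020-02-10) grouped by day. A minimal sketch of that extra step on a two-row extract of the data, assuming the suffixes are always day-month-year:

```python
import pandas as pd

# Two-row extract of the wide table from the question.
df = pd.DataFrame({
    "country": ["Australia", "China"],
    "location": ["Victoria", "Hubei"],
    "confirmedcases_10-02-2020": [4.0, 31728.0],
    "deaths_10-02-2020": [0.0, 974.0],
    "confirmedcases_11-02-2020": [4, 33366],
    "deaths_11-02-2020": [0.0, 1068.0],
})

long_df = pd.wide_to_long(
    df,
    stubnames=["confirmedcases", "deaths"],
    i=["country", "location"],
    j="date",
    sep="_",
    suffix=r"\d{2}-\d{2}-\d{4}",
).reset_index()

# Parse the day-month-year strings, then order rows by date.
long_df["date"] = pd.to_datetime(long_df["date"], format="%d-%m-%Y")
long_df = long_df.sort_values(["date", "country", "location"]).reset_index(drop=True)
print(long_df)
```

Note that this sorts locations alphabetically within each date; if the original row order matters, sort on the date alone with kind="stable" instead.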
Yes, you can achieve it by reshaping the dataframe.
First you have to melt the columns to have them as values:
df = df.melt(['country', 'location'],
[ p for p in df.columns if p not in ['country', 'location'] ],
'key',
'value')
#> country location key value
#> 0 Australia New South Wales confirmedcases_10-02-2020 4
#> 1 Australia Victoria confirmedcases_10-02-2020 4
#> 2 Australia Queensland confirmedcases_10-02-2020 5
#> 3 Australia South Australia confirmedcases_10-02-2020 2
#> 4 Cambodia Sihanoukville confirmedcases_10-02-2020 1
#> .. ... ... ... ...
#> 75 China Sichuan deaths_11-02-2020 1
#> 76 China Shandong deaths_11-02-2020 1
#> 77 China Jiangsu deaths_11-02-2020 0
#> 78 China Shanghai deaths_11-02-2020 1
#> 79 China Beijing deaths_11-02-2020 3
After that you need to separate the values in the column key:
key_split_series = df.key.str.split("_", expand=True)
df["key"] = key_split_series[0]
df["date"] = key_split_series[1]
#> country location key value date
#> 0 Australia New South Wales confirmedcases 4 10-02-2020
#> 1 Australia Victoria confirmedcases 4 10-02-2020
#> 2 Australia Queensland confirmedcases 5 10-02-2020
#> 3 Australia South Australia confirmedcases 2 10-02-2020
#> 4 Cambodia Sihanoukville confirmedcases 1 10-02-2020
#> .. ... ... ... ... ...
#> 75 China Sichuan deaths 1 11-02-2020
#> 76 China Shandong deaths 1 11-02-2020
#> 77 China Jiangsu deaths 0 11-02-2020
#> 78 China Shanghai deaths 1 11-02-2020
#> 79 China Beijing deaths 3 11-02-2020
In the end, you just need to pivot the table to have confirmedcases and deaths back as columns:
df = df.set_index(["country", "location", "date", "key"])["value"].unstack().reset_index()
#> key country location date confirmedcases deaths
#> 0 Australia New South Wales 10-02-2020 4 0
#> 1 Australia New South Wales 11-02-2020 4 0
#> 2 Australia Queensland 10-02-2020 5 0
#> 3 Australia Queensland 11-02-2020 5 0
#> 4 Australia South Australia 10-02-2020 2 0
#> .. ... ... ... ... ...
#> 35 China Shanghai 11-02-2020 311 1
#> 36 China Sichuan 10-02-2020 417 1
#> 37 China Sichuan 11-02-2020 436 1
#> 38 China Zhejiang 10-02-2020 1177 0
#> 39 China Zhejiang 11-02-2020 1131 0
Use {dataframe}.reshape((-1,1)) if there is only one feature and {dataframe}.reshape((1,-1)) if there is only one sample

creating new column by merging on column name and other column value

I am trying to create a new column in DF1 that lists the home team's number of All-Stars for that season.
DF1
Date Visitor V_PTS Home H_PTS \
0 2012-10-30 19:00:00 Washington Wizards 84 Cleveland Cavaliers 94
1 2012-10-30 19:30:00 Dallas Mavericks 99 Los Angeles Lakers 91
2 2012-10-30 20:00:00 Boston Celtics 107 Miami Heat 120
3 2012-10-31 19:00:00 Dallas Mavericks 94 Utah Jazz 113
4 2012-10-31 19:00:00 San Antonio Spurs 99 New Orleans Pelicans 95
Attendance Arena Location Capacity \
0 20562 Quicken Loans Arena Cleveland, Ohio 20562
1 18997 Staples Center Los Angeles, California 18997
2 20296 American Airlines Arena Miami, Florida 19600
3 17634 Vivint Smart Home Arena Salt Lake City, Utah 18303
4 15358 Smoothie King Center New Orleans, Louisiana 16867
Yr Arena Opened Season
0 1994 2012-13
1 1992 2012-13
2 1999 2012-13
3 1991 2012-13
4 1999 2012-13
DF2
2012-13 2013-14 2014-15 2015-16 2016-17
Cleveland Cavaliers 1 1 2 1 3
Los Angeles Lakers 2 1 1 1 0
Miami Heat 3 3 2 2 1
Chicago Bulls 2 1 2 2 1
Detroit Pistons 0 0 0 1 1
Los Angeles Clippers 2 2 2 1 1
New Orleans Pelicans 0 1 1 1 1
Philadelphia 76ers 1 0 0 0 0
Phoenix Suns 0 0 0 0 0
Portland Trail Blazers 1 2 2 0 0
Toronto Raptors 0 1 1 2 2
DF1['H_Allstars'] = DF2[DF1['Season'], DF1['Home']]
This results in TypeError: 'Series' objects are mutable, thus they cannot be hashed
I understand the error; I'm just not sure how else to do it.
I've removed the extra columns and just focused on the necessary ones for demonstration.
Input:
df1
Home 2012-13 2013-14 2014-15 2015-16 2016-17
0 Cleveland Cavaliers 1 1 2 1 3
1 Los Angeles Lakers 2 1 1 1 0
2 Miami Heat 3 3 2 2 1
3 Chicago Bulls 2 1 2 2 1
4 Detroit Pistons 0 0 0 1 1
5 Los Angeles Clippers 2 2 2 1 1
6 New Orleans Pelicans 0 1 1 1 1
7 Philadelphia 76ers 1 0 0 0 0
8 Phoenix Suns 0 0 0 0 0
9 Portland Trail Blazers 1 2 2 0 0
10 Toronto Raptors 0 1 1 2 2
df2
Visitor Home Season
0 Washington Wizards Cleveland Cavaliers 2012-13
1 Dallas Mavericks Los Angeles Lakers 2012-13
2 Boston Celtics Miami Heat 2012-13
3 Dallas Mavericks Utah Jazz 2012-13
4 San Antonio Spurs New Orleans Pelicans 2012-13
Step 1: Melt df1 to get the allstars column
df3 = pd.melt(df1, id_vars='Home', value_vars=df1.columns[df1.columns.str.contains('20')], var_name='Season', value_name='H_Allstars')
Output:
Home Season H_Allstars
0 Cleveland Cavaliers 2012-13 1
1 Los Angeles Lakers 2012-13 2
2 Miami Heat 2012-13 3
3 Chicago Bulls 2012-13 2
4 Detroit Pistons 2012-13 0
5 Los Angeles Clippers 2012-13 2
6 New Orleans Pelicans 2012-13 0
7 Philadelphia 76ers 2012-13 1
8 Phoenix Suns 2012-13 0
...
Step 2: Merge this new dataframe with df2 to get the H_Allstars column
df4 = pd.merge(df2, df3, how='left', on=['Home', 'Season'])
Output:
Visitor Home Season H_Allstars
0 Washington Wizards Cleveland Cavaliers 2012-13 1.0
1 Dallas Mavericks Los Angeles Lakers 2012-13 2.0
2 Boston Celtics Miami Heat 2012-13 3.0
3 Dallas Mavericks Utah Jazz 2012-13 NaN
4 San Antonio Spurs New Orleans Pelicans 2012-13 0.0
Step 3: Add the V_Allstars column
# renaming column as required
df3.rename(columns={'Home': 'Visitor', 'H_Allstars': 'V_Allstars'}, inplace=True)
df5 = pd.merge(df4, df3, how='left', on=['Visitor', 'Season'])
Output:
Visitor Home Season H_Allstars V_Allstars
0 Washington Wizards Cleveland Cavaliers 2012-13 1.0 NaN
1 Dallas Mavericks Los Angeles Lakers 2012-13 2.0 NaN
2 Boston Celtics Miami Heat 2012-13 3.0 NaN
3 Dallas Mavericks Utah Jazz 2012-13 NaN NaN
4 San Antonio Spurs New Orleans Pelicans 2012-13 0.0 NaN
You can use pandas.melt. Bring df2 into long format, i.e. Home and Season as columns and the All-Star counts as values, and then merge it into df1 on 'Home' and 'Season'.
import pandas as pd
df2['Home'] = df2.index
df2 = pd.melt(df2, id_vars = 'Home', value_vars = ['2012-13', '2013-14', '2014-15', '2015-16', '2016-17'], var_name = 'Season', value_name='H_Allstars')
df = df1.merge(df2, on=['Home','Season'], how='left')
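A self-contained sketch of the same melt-then-merge idea on a few rows, for anyone who wants to run it end to end. The names allstars and games are stand-ins for the two frames in the question, and the nullable Int64 cast is an optional extra that keeps the counts as integers while leaving unmatched teams as missing.

```python
import pandas as pd

# Wide table: teams on the index, one column per season.
allstars = pd.DataFrame(
    {"2012-13": [1, 2, 3], "2013-14": [1, 1, 3]},
    index=["Cleveland Cavaliers", "Los Angeles Lakers", "Miami Heat"],
)
# Game log: one row per game with the season it belongs to.
games = pd.DataFrame({
    "Visitor": ["Washington Wizards", "Dallas Mavericks", "Dallas Mavericks"],
    "Home": ["Cleveland Cavaliers", "Los Angeles Lakers", "Utah Jazz"],
    "Season": ["2012-13", "2012-13", "2012-13"],
})

# Move the team names out of the index, then go from wide to long.
long_allstars = (
    allstars.rename_axis("Home")
    .reset_index()
    .melt(id_vars="Home", var_name="Season", value_name="H_Allstars")
)

# A left merge keeps every game; teams absent from the table get a missing value.
games = games.merge(long_allstars, on=["Home", "Season"], how="left")
games["H_Allstars"] = games["H_Allstars"].astype("Int64")
print(games)
```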

Python Pandas pivot with values equal to simple function of specific column

import pandas as pd
olympics = pd.read_csv('olympics.csv')
Edition NOC Medal
0 1896 AUT Silver
1 1896 FRA Gold
2 1896 GER Gold
3 1900 HUN Bronze
4 1900 GBR Gold
5 1900 DEN Bronze
6 1900 USA Gold
7 1900 FRA Bronze
8 1900 FRA Silver
9 1900 USA Gold
10 1900 FRA Silver
11 1900 GBR Gold
12 1900 SUI Silver
13 1900 ZZX Gold
14 1904 HUN Gold
15 1904 USA Bronze
16 1904 USA Gold
17 1904 USA Silver
18 1904 CAN Gold
19 1904 USA Silver
I can pivot the data frame to have some aggregate value
pivot = olympics.pivot_table(index='Edition', columns='NOC', values='Medal', aggfunc='count')
NOC AUT CAN DEN FRA GBR GER HUN SUI USA ZZX
Edition
1896 1.0 NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN
1900 NaN NaN 1.0 3.0 2.0 NaN 1.0 1.0 2.0 1.0
1904 NaN 1.0 NaN NaN NaN NaN 1.0 NaN 4.0 NaN
Rather than having the total number of medals in values= , I am interested to have a tuple (a triple) with (#Gold, #Silver, #Bronze), (0,0,0) for NaN
How do I do that succinctly and elegantly?
There is no need to use pivot_table here, since pivot (or unstack) is perfectly fine with a tuple as a value. The steps are:
use value_counts to count all medals
create a MultiIndex for all combinations of editions, countries and medals
reindex with fill_value=0
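# count each medal type per (Edition, NOC)
counts = olympics.groupby(['Edition', 'NOC']).Medal.value_counts()
# full grid of editions x countries x medals, so missing combinations become 0
mux = pd.MultiIndex.from_product(
    [c.values for c in counts.index.levels], names=counts.index.names)
counts = counts.reindex(mux, fill_value=0).unstack('Medal')
# order the columns as (Gold, Silver, Bronze), as requested in the question
counts = counts[['Gold', 'Silver', 'Bronze']]
pd.Series([tuple(l) for l in counts.values.tolist()], counts.index).unstack()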
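For comparison, here is a runnable sketch of a variant that swaps groupby/value_counts for pd.crosstab and fills the gaps after unstacking. It is an alternative take on the same idea, not the method above, and it uses only a few rows of the olympics data.

```python
import pandas as pd

# A few rows from the olympics table in the question.
olympics = pd.DataFrame({
    "Edition": [1896, 1896, 1896, 1900, 1900, 1900, 1900, 1900],
    "NOC": ["AUT", "FRA", "GER", "FRA", "FRA", "FRA", "USA", "USA"],
    "Medal": ["Silver", "Gold", "Gold", "Bronze", "Silver", "Silver", "Gold", "Gold"],
})

# Count medals per (Edition, NOC); columns are the medal types.
counts = pd.crosstab([olympics["Edition"], olympics["NOC"]], olympics["Medal"])
# Fix the column order and add any medal type that never occurs.
counts = counts.reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)

# One (gold, silver, bronze) tuple per (Edition, NOC), then countries to columns.
triples = pd.Series([tuple(row) for row in counts.to_numpy().tolist()],
                    index=counts.index)
result = triples.unstack("NOC")

# Combinations with no medals come out as NaN; replace them with (0, 0, 0).
result = result.apply(
    lambda col: col.map(lambda v: v if isinstance(v, tuple) else (0, 0, 0))
)
print(result)
```

Filling after the unstack avoids building the full MultiIndex up front, at the cost of an element-wise pass over the result.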
