A B C D E
0 2002-01-13 Dan 2002-01-15 26 -1
1 2002-01-13 Dan 2002-01-15 10 0
2 2002-01-13 Dan 2002-01-15 16 1
3 2002-01-13 Vic 2002-01-17 14 0
4 2002-01-13 Vic 2002-01-03 18 0
5 2002-01-28 Mel 2002-02-08 37 0
6 2002-01-28 Mel 2002-02-06 29 0
7 2002-01-28 Mel 2002-02-10 20 0
8 2002-01-28 Rob 2002-02-12 30 -1
9 2002-01-28 Rob 2002-02-12 48 1
10 2002-01-28 Rob 2002-02-12 0 1
11 2002-01-28 Rob 2002-02-01 19 0
Wen answered a very similar question an hour ago, but I forgot to include some conditions. I'll write them down below:
I want to create a new df['F'] column with the following conditions, applied per B group and ignoring zeros in the D column:
F = the D value of the row whose C date is nearest to 10 days after the A date and where E=0.
If no E=0 row exists at that nearest date (the case of Rob on 2002-01-28), F is the mean of the D values where E=-1 and E=1.
If two C dates lie at the same distance from 10 days after A (the case of Mel on 2002-01-28), F is the mean of those equally-near D values.
Output should be:
A B C D E F
0 2002-01-13 Dan 2002-01-15 26 -1 10
1 2002-01-13 Dan 2002-01-15 10 0 10
2 2002-01-13 Dan 2002-01-15 16 1 10
3 2002-01-13 Vic 2002-01-17 14 0 14
4 2002-01-13 Vic 2002-01-03 18 0 14
5 2002-01-28 Mel 2002-02-08 37 0 33
6 2002-01-28 Mel 2002-02-06 29 0 33
7 2002-01-28 Mel 2002-02-10 20 0 33
8 2002-01-28 Rob 2002-02-12 30 -1 39
9 2002-01-28 Rob 2002-02-12 48 1 39
10 2002-01-28 Rob 2002-02-12 0 1 39
11 2002-01-28 Rob 2002-02-01 19 0 39
Wen answered:
df['F'] = abs((df.C - df.A).dt.days - 10)  # distance of each C date from 10 days after A
df['F'] = df.B.map(df.loc[df.F == df.groupby('B').F.transform('min')].groupby('B').D.mean())  # keep rows at the minimum distance per group, then map the mean of D back to every row
df
But now I can't work out how to incorporate the new conditions listed above.
Change the mapper to
m = df.loc[(df.F == df.groupby('B').F.transform('min')) & (df.D != 0)].groupby('B').apply(
    lambda x: x['D'][x['E'] == 0].mean() if (x['E'] == 0).any() else x['D'].mean())
df['F'] = df.B.map(m)
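For reference, on the sample frame above the mapper m works out to one value per B group, which map then broadcasts back to every row:
print(m)
# B
# Dan    10.0
# Mel    33.0
# Rob    39.0
# Vic    14.0
# dtype: float64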
For my Python code, I have been trying to scrape data from NCAAF Stats. I am having issues extracting a td's text after locating the anchor tag 'a' that contains the text I want. I want to find a team's number of TDs, total points, and PPG. I have been able to find the school by text in Selenium, but after that I am unable to extract the info I want. Here is what I have coded so far.
from selenium import webdriver
driver = webdriver.Chrome('C:\\Users\\Carl\\Downloads\\chromedriver.exe')
driver.get('https://www.ncaa.com/stats/football/fbs/current/team/27')
# I plan to make a while or for loop later, that is why I used f strings
team = "Coastal Carolina"
first = driver.find_element_by_xpath(f'//a[text()="{team}"]')
# This was the way another similarly asked question was answered, but it did not work
#tds = driver.find_element_by_xpath(f'//td//a[text()="{apples}"]/../td[4]').text
# This grabs data from the very first row of data... not the one I want
tds = first.find_element_by_xpath('//following-sibling::td[4]').text
total_points = first.find_element_by_xpath('//following-sibling::td[10]').text
ppg = first.find_element_by_xpath('//following-sibling::td[11]').text
print(tds, total_points, ppg)
driver.quit()
I have tried to look around for a similarly asked question and was able to find this snippet
tds = driver.find_element_by_xpath(f'//td//a[text()="{apples}"]/../td[4]').text
Unfortunately it did not help me out much. The HTML structure looks like this. I appreciate any help, and thank you!
No need to use Selenium, the page isn't dynamic. Just use pandas to parse the table for you:
import pandas as pd
url = 'https://www.ncaa.com/stats/football/fbs/current/team/27'
df = pd.read_html(url)[0]
Output:
print(df)
Rank Team G TDs PAT 2PT Def Pts FG Saf Pts PPG
0 1 Ohio St. 6 39 39 0 0 6 0 291.0 48.5
1 2 Pittsburgh 6 40 36 0 0 4 1 290.0 48.3
2 3 Coastal Carolina 7 43 42 0 0 6 1 320.0 45.7
3 4 Alabama 7 41 40 1 0 9 0 315.0 45.0
4 5 Ole Miss 6 35 30 1 0 6 1 262.0 43.7
5 6 Cincinnati 6 36 34 1 0 3 0 261.0 43.5
6 7 Oklahoma 7 35 34 1 1 17 0 299.0 42.7
7 - SMU 7 40 36 1 0 7 0 299.0 42.7
8 9 Texas 7 38 37 0 0 8 1 291.0 41.6
9 10 Western Ky. 6 31 27 1 0 10 0 245.0 40.8
10 11 Tennessee 7 36 36 0 0 7 1 275.0 39.3
11 12 Wake Forest 6 28 24 2 0 12 0 232.0 38.7
12 13 UTSA 7 33 33 0 0 13 0 270.0 38.6
13 14 Michigan 6 28 25 1 0 12 0 231.0 38.5
14 15 Georgia 7 34 33 0 0 10 1 269.0 38.4
15 16 Baylor 7 35 35 0 0 7 1 268.0 38.3
16 17 Houston 6 30 28 0 0 5 0 223.0 37.2
17 - TCU 6 29 28 0 0 7 0 223.0 37.2
18 19 Marshall 7 34 33 0 0 7 0 258.0 36.9
19 - North Carolina 7 34 32 2 0 6 0 258.0 36.9
20 21 Nevada 6 26 24 1 0 12 0 218.0 36.3
21 22 Virginia 7 31 29 2 0 10 2 253.0 36.1
22 23 Fresno St. 7 32 27 1 0 10 0 251.0 35.9
23 - Memphis 7 33 26 3 0 7 0 251.0 35.9
24 25 Texas Tech 7 32 31 0 0 9 0 250.0 35.7
25 26 Auburn 7 29 28 1 0 12 1 242.0 34.6
26 27 Florida 7 33 29 1 0 4 0 241.0 34.4
27 - Missouri 7 31 31 0 0 8 0 241.0 34.4
28 29 Liberty 7 33 29 1 0 3 1 240.0 34.3
29 - Michigan St. 7 30 30 0 0 10 0 240.0 34.3
30 31 UCF 6 28 26 0 0 3 1 205.0 34.2
31 32 Oregon St. 6 27 27 0 0 5 0 204.0 34.0
32 33 Oregon 6 26 26 0 0 7 0 203.0 33.8
33 34 Iowa St. 6 23 22 0 0 14 0 202.0 33.7
34 35 UCLA 7 30 28 0 0 9 0 235.0 33.6
35 36 San Diego St. 6 25 24 1 0 7 0 197.0 32.8
36 37 LSU 7 29 29 0 0 8 0 227.0 32.4
37 38 Louisville 6 24 23 0 0 9 0 194.0 32.3
38 - Miami (FL) 6 24 22 1 0 8 1 194.0 32.3
39 - NC State 6 25 24 0 0 6 1 194.0 32.3
40 41 Southern California 6 22 19 3 0 12 0 193.0 32.2
41 42 Tulane 7 31 23 4 0 2 0 223.0 31.9
42 43 Arizona St. 7 30 25 2 0 4 0 221.0 31.6
43 44 Utah 6 25 22 1 0 5 0 189.0 31.5
44 45 Air Force 7 29 27 1 0 5 1 220.0 31.4
45 46 App State 7 27 24 0 0 11 0 219.0 31.3
46 47 Arkansas 7 27 25 0 0 10 0 217.0 31.0
47 - Army West Point 6 25 22 0 0 4 1 186.0 31.0
48 - Notre Dame 6 23 20 2 0 8 0 186.0 31.0
49 - Western Mich. 7 28 25 0 0 8 0 217.0 31.0
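From there you can pull out a single team's numbers directly (a quick sketch, assuming the column labels come through exactly as printed above):
row = df.loc[df['Team'] == 'Coastal Carolina']
print(row[['TDs', 'Pts', 'PPG']])
#    TDs    Pts   PPG
# 2   43  320.0  45.7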
I have one dataframe like this,
import pandas as pd

tabla_aciertos = {'Numeros_acertados': [5, 5, 5, 4, 4, 3, 4, 2, 3, 3, 1, 2, 2],
                  'Estrellas_acertadas': [2, 1, 0, 2, 1, 2, 0, 2, 1, 0, 2, 1, 0]}
categorias = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
categoria_de_premios = pd.DataFrame(tabla_aciertos, index=categorias)
categoria_de_premios
Numeros_acertados Estrellas_acertadas
1 5 2
2 5 1
3 5 0
4 4 2
5 4 1
6 3 2
7 4 0
8 2 2
9 3 1
10 3 0
11 1 2
12 2 1
13 2 0
and another df:
sorteos_anteriores.iloc[:,:]
uno dos tres cuatro cinco Estrella1 Estrella2 bolas_Acertadas estrellas_Acertadas
Fecha
2020-10-13 5 14 38 41 46 1 10 0 1
2020-09-10 11 15 35 41 50 5 8 1 0
2020-06-10 4 21 36 41 47 9 11 0 0
2020-02-10 6 12 15 40 45 3 9 0 0
2020-09-29 4 14 16 41 44 11 12 0 1
... ... ... ... ... ... ... ... ... ...
2004-12-03 15 24 28 44 47 4 5 0 0
2004-05-03 4 7 33 37 39 1 5 0 1
2004-02-27 14 18 19 31 37 4 5 0 0
2004-02-20 7 13 39 47 50 2 5 1 0
2004-02-13 16 29 32 36 41 7 9 0 0
1363 rows × 9 columns
Now I need to check, for each row of the df "sorteos_anteriores", whether it matches one of the rows of the first df, "tabla_aciertos".
Let me give you an example:
Imagine that in "sorteos_anteriores", on 2019-11-02, you have "bolas_Acertadas" = 5 and "estrellas_Acertadas" = 1. Now you go to the first table, "tabla_aciertos", and find that at index 2, "Numeros_acertados" = 5 and "Estrellas_acertadas" = 1: you have won a second-class (index=2) prize. A new column "Prize" should be created in "sorteos_anteriores", writing in each row a number from 1 to 13 if there is some kind of prize, or 0/NaN if not.
I have tried:
sorteos_anteriores ['categorias'] = sorteos_anteriores(sorteos_anteriores.loc[:,'bolas_Acertadas':'estrellas_Acertadas'] == tabla_premios.iloc[ : ,0:2])
Also with where and merge, but nothing works.
Thanks for your help.
Thanks to Cuina Max I was able to do it. Answer here:
# assuming the indexes, starting from one, correspond to the prize categories
categoria_de_premios['Categoria'] = categoria_de_premios.index
# merge using pd.merge and the appropriate arguments
sorteos_anteriores = (sorteos_anteriores.merge(
    categoria_de_premios,
    how='outer',
    left_on=['bolas_Acertadas', 'estrellas_Acertadas'],
    right_on=['Numeros_acertados', 'Estrellas_acertadas']
)).drop(columns=['Numeros_acertados', 'Estrellas_acertadas'])
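If you would rather keep the original row order and mark non-winning draws with 0 instead of NaN, a left-merge variant might look like this (a sketch; how='left' and the fillna step are my additions, the answer above uses 'outer'):
sorteos_anteriores = (sorteos_anteriores.merge(
    categoria_de_premios,
    how='left',
    left_on=['bolas_Acertadas', 'estrellas_Acertadas'],
    right_on=['Numeros_acertados', 'Estrellas_acertadas'])
    .drop(columns=['Numeros_acertados', 'Estrellas_acertadas']))
# draws without a prize get 0 instead of NaN
sorteos_anteriores['Categoria'] = sorteos_anteriores['Categoria'].fillna(0).astype(int)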
I have a dataset in which I have to either fill certain cells conditionally or drop those rows, but so far I have been unsuccessful.
Idx Fruits Days Name
0 60 20
1 15 85.5
2 10 62 Peter
3 40 90 Maria
4 5 10.2
5 92 66
6 65 87 John
7 50 1 Eric
8 50 0 Maria
9 80 87 John
Now, I have some empty cells. I could fill them with fillna or a regex, or drop them entirely.
But I only want to target the leading empty cells, up to where the strings start, either dropping those rows or filling them with ".".
Like below
Idx Fruits Days Name
0 60 20 .
1 15 85.5 .
2 10 62 Peter
3 40 90 Maria
4 5 10.2
5 92 66
6 65 87 John
7 50 1 Eric
8 50 0 Maria
9 80 87 John
and
Idx Fruits Days Name
2 10 62 Peter
3 40 90 Maria
4 5 10.2
5 92 66
6 65 87 John
7 50 1 Eric
8 50 0 Maria
9 80 87 John
Is this possible using pandas, or with some looping?
You can try this:
import numpy as np

# treat empty strings as missing values
df['Name'] = df['Name'].replace('', np.nan)
# ffill().notna() is False only for the leading NaNs (those before the first name),
# so only those cells are replaced with '.'
df['Name'] = df['Name'].where(df['Name'].ffill().notna(), '.')
print(df)
Idx Fruits Days Name
0 0 60 20.0 .
1 1 15 85.5 .
2 2 10 62.0 Peter
3 3 40 90.0 Maria
4 4 5 10.2
5 5 92 66.0
6 6 65 87.0 John
7 7 50 1.0 Eric
8 8 50 0.0 Maria
9 9 80 87.0 John
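The same ffill trick covers the dropping variant from the question (a sketch, applied to the original frame, i.e. before the fill above):
# keep only the rows from the first non-empty Name onwards
df = df[df['Name'].replace('', np.nan).ffill().notna()]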
I have two data frames. I have to compare them and get the positions of the unmatched data using Python.
Note:
The first column will not always be unique.
Data Frame 1:
0 1 2 3 4
0 1 Dhoni 24 Kota 60000.0
1 2 Raina 90 Delhi 41500.0
2 3 Kholi 67 Ahmedabad 20000.0
3 4 Ashwin 45 Bhopal 8500.0
4 5 Watson 64 Mumbai 6500.0
5 6 KL Rahul 19 Indore 4500.0
6 7 Hardik 24 Bengaluru 1000.0
Data Frame 2
0 1 2 3 4
0 3 Kholi 67 Ahmedabad 20000.0
1 7 Hardik 24 Bengaluru 1000.0
2 4 Ashwin 45 Bhopal 8500.0
3 2 Raina 90 Delhi 41500.0
4 6 KL Rahul 19 Chennai 4500.0
5 1 Dhoni 24 Kota 60000.0
6 5 Watson 64 Mumbai 6500.0
I expect an output of (3,5) - (Indore, Chennai).
df1 = pd.DataFrame({'A': ['Dhoni', 'Raina', 'KL Rahul'], 'B': [24, 90, 67],
                    'C': ['Kota', 'Delhi', 'Indore'], 'D': [6000.0, 41500.0, 4500.0]})
df2 = pd.DataFrame({'A': ['Dhoni', 'Raina', 'KL Rahul'], 'B': [24, 90, 67],
                    'C': ['Kota', 'Delhi', 'Chennai'], 'D': [6000.0, 41500.0, 4500.0]})
df1['df'] = 'df1'
df2['df'] = 'df2'
df = pd.concat([df1, df2], sort=False).drop_duplicates(subset=['A', 'B', 'C', 'D'], keep=False)
print(df)
A B C D df
2 KL Rahul 67 Indore 4500.0 df1
2 KL Rahul 67 Chennai 4500.0 df2
I have added a df column to show which frame each difference comes from.
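If you also need the exact (row, column) positions, a cell-wise comparison is one option (a sketch; it assumes both frames cover the same rows and can be aligned by sorting on the key column 'A'):
import numpy as np

# align the two frames on the key column, then compare cell by cell
a = df1.drop(columns='df').sort_values('A').reset_index(drop=True)
b = df2.drop(columns='df').sort_values('A').reset_index(drop=True)
rows, cols = np.where(a.ne(b))
for r, c in zip(rows, cols):
    print((r, c), '-', a.iat[r, c], '-', b.iat[r, c])
# (1, 2) - Indore - Chennai   (positions after sorting)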
I'm using Python, and I have data with a team name and the dates of games that have been played. It looks something like this (except there are a few hundred rows):
team date
0 TOR 2016/10/15
1 LAK 2016/10/20
2 CGY 2016/11/03
3 BUF 2016/10/30
4 PIT 2016/10/27
5 CHI 2016/11/05
6 VAN 2016/10/20
7 BUF 2016/10/16
8 STL 2016/10/13
9 BUF 2016/10/29
10 MIN 2016/10/29
11 PIT 2016/11/05
12 CHI 2016/10/18
13 BOS 2016/10/29
14 PIT 2016/10/20
15 COL 2016/10/20
16 MTL 2016/10/20
17 MTL 2016/11/05
18 BOS 2016/11/03
19 EDM 2016/11/05
20 NSH 2016/11/01
I would like to add indicator columns showing the most recent 10 games for each team, as well as the most recent 5 games, with a 1 if the game is in that group and a 0 if it is not.
I'm stumped. Any ideas would be much appreciated!
I think you can use SeriesGroupBy.nlargest (the most recent games have the largest dates) with numpy.where, selecting the rows by isin:
import numpy as np
import pandas as pd

df.date = pd.to_datetime(df.date)
# in the real data use nlargest(10); this small sample uses 2 for illustration
idx = df.groupby('team')['date'].nlargest(2).index.get_level_values(1)
df['indicator'] = np.where(df.index.isin(idx), 1, 0)
print(df)
team date indicator
0 TOR 2016-10-15 1
1 LAK 2016-10-20 1
2 CGY 2016-11-03 1
3 BUF 2016-10-30 1
4 PIT 2016-10-27 1
5 CHI 2016-11-05 1
6 VAN 2016-10-20 1
7 BUF 2016-10-16 0
8 STL 2016-10-13 1
9 BUF 2016-10-29 1
10 MIN 2016-10-29 1
11 PIT 2016-11-05 1
12 CHI 2016-10-18 1
13 BOS 2016-10-29 1
14 PIT 2016-10-20 0
15 COL 2016-10-20 1
16 MTL 2016-10-20 1
17 MTL 2016-11-05 1
18 BOS 2016-11-03 1
19 EDM 2016-11-05 1
20 NSH 2016-11-01 1
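The question also asks for a second column flagging the most recent 5 games; the same pattern covers that (a sketch reusing the frame above; every team here has fewer than 5 games, so all rows get 1):
# second indicator column: most recent 5 games per team
idx5 = df.groupby('team')['date'].nlargest(5).index.get_level_values(1)
df['indicator_5'] = np.where(df.index.isin(idx5), 1, 0)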