Below is part of a dataframe containing football game results.
FTHG stands for "Full time home goals"
FTAG stands for "Full time away goals"
Date HomeTeam AwayTeam FTHG FTAG FTR
14/08/93 Arsenal Coventry 0 3 A
14/08/93 Aston Villa QPR 4 1 H
16/08/93 Tottenham Arsenal 0 1 A
17/08/93 Everton Man City 1 0 H
21/08/93 QPR Southampton 2 1 H
21/08/93 Sheffield Arsenal 0 1 A
24/08/93 Arsenal Leeds 2 1 H
24/08/93 Man City Blackburn 0 2 A
28/08/93 Arsenal Everton 2 0 H
I want to write Python code that calculates a rolling sum (for example, over 3 games) of the goals scored by each team, regardless of whether the team played at home or away.
The groupby method does half the job. Say "a" is a variable and "df" is the dataframe:
a = df.groupby("HomeTeam")["FTHG"].rolling(3).sum()
The result is something like this:
FTHG
Arsenal NaN
NaN
4.0
.....
However, I would like the code to also count the goals Arsenal scored as the visiting team, and to produce a new column (it should not be called FTHG):
Arsenal NaN
NaN
2
4
5
Ideas will be much appreciated
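You can combine those columns and then apply groupby:
import pandas as pd

# Stack home and away results into one long table with common column names
tmp1 = df[['Date','HomeTeam', 'FTHG']]
tmp2 = df[['Date','AwayTeam', 'FTAG']]
tmp1.columns = ['Date','name', 'score']
tmp2.columns = ['Date','name', 'score']
tmp = pd.concat([tmp1,tmp2])
# Note: 'Date' holds dd/mm/yy strings, so this sort is alphabetical; it is only
# chronological here because every sample date falls in the same month
tmp.sort_values(by='Date').groupby("name")["score"].rolling(3).sum()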
name
Arsenal 0 NaN
2 NaN
5 2.0
6 4.0
8 5.0
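To make the idea above reproducible, here is a minimal self-contained sketch built from the sample rows in the question. The column names team, goals and rolling_goals and the variable long_df are my own choices for illustration, and parsing the dates with the format "%d/%m/%y" is an assumption based on how the sample looks.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["14/08/93", "14/08/93", "16/08/93", "17/08/93", "21/08/93",
             "21/08/93", "24/08/93", "24/08/93", "28/08/93"],
    "HomeTeam": ["Arsenal", "Aston Villa", "Tottenham", "Everton", "QPR",
                 "Sheffield", "Arsenal", "Man City", "Arsenal"],
    "AwayTeam": ["Coventry", "QPR", "Arsenal", "Man City", "Southampton",
                 "Arsenal", "Leeds", "Blackburn", "Everton"],
    "FTHG": [0, 4, 0, 1, 2, 0, 2, 0, 2],
    "FTAG": [3, 1, 1, 0, 1, 1, 1, 2, 0],
})

# Assumption: dates are day/month/two-digit-year; parsing them makes the
# sort chronological instead of alphabetical.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# One row per team per game, whether the team was home or away.
home = df[["Date", "HomeTeam", "FTHG"]].rename(
    columns={"HomeTeam": "team", "FTHG": "goals"})
away = df[["Date", "AwayTeam", "FTAG"]].rename(
    columns={"AwayTeam": "team", "FTAG": "goals"})
long_df = pd.concat([home, away], ignore_index=True).sort_values("Date")

# Rolling 3-game sum within each team, kept aligned with long_df's rows.
long_df["rolling_goals"] = (
    long_df.groupby("team")["goals"]
           .transform(lambda s: s.rolling(3).sum())
)

arsenal = long_df.loc[long_df["team"] == "Arsenal", "rolling_goals"]
print(arsenal.tolist())
```

For Arsenal this gives NaN, NaN, 2, 4, 5, which is the column asked for in the question. Using transform rather than a bare rolling call keeps the result as an ordinary column of long_df, which may be easier to merge back later.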
I have the following two data frames.
df_1:
unique_id amount
1 NaN
2 5
df_2:
unique_id amount city email
1 90 Kansas True
2 100 Miami False
3 NaN Kent True
4 123 Newport True
I would like to only update the amount column where unique_id is 1 or 2 or any other that might match on unique_id. The output should be:
unique_id amount city email
1 NaN Kansas True
2 5 Miami False
3 NaN Kent True
4 123 Newport True
I've tried merging and concatenating but I am not getting the desired result. I just want an idea of the best approach when two data frames are of different sizes and I want to update certain column values. Any guidance is greatly appreciated.
Try with mask:
df_2['amount'] = df_2['amount'].mask(df_2['unique_id'].isin(df_1['unique_id']),
df_2['unique_id'].map(df_1.set_index('unique_id')['amount'])
)
Output:
unique_id amount city email
0 1 NaN Kansas True
1 2 5.0 Miami False
2 3 NaN Kent True
3 4 123.0 Newport True
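For anyone who wants to run this end to end, here is a small sketch that rebuilds both frames from the question and applies the same mask idea. The helper names lookup and matches are mine, not from the answer.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({"unique_id": [1, 2], "amount": [np.nan, 5]})
df_2 = pd.DataFrame({
    "unique_id": [1, 2, 3, 4],
    "amount": [90, 100, np.nan, 123],
    "city": ["Kansas", "Miami", "Kent", "Newport"],
    "email": [True, False, True, True],
})

# Series mapping unique_id -> replacement amount (may itself be NaN).
lookup = df_1.set_index("unique_id")["amount"]

# True for rows of df_2 whose id also appears in df_1.
matches = df_2["unique_id"].isin(lookup.index)

# Where the id matches, take df_1's amount (even if it is NaN);
# elsewhere keep df_2's original amount.
df_2["amount"] = df_2["amount"].mask(matches, df_2["unique_id"].map(lookup))
print(df_2)
```

One reason to prefer mask here over DataFrame.update is that update only copies non-NA values, so the NaN for unique_id 1 would not overwrite the 90.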
I don't know if this is possible but I have a data frame like this one:
df
State County Homicides Man Woman Not_Register
Gto Celaya 2 2 0 0
NaN NaN 8 4 2 2
NaN NaN 3 2 1 0
NaN Yiriria 2 1 1 0
NaN Acambaro 1 1 0 0
Sin Culiacan 3 1 1 1
NaN NaN 5 4 0 1
Chih Juarez 1 1 0 0
I want to group by State and County and sum Homicides, Man, Woman and Not_Register. Like this:
State County Homicides Man Woman Not_Register
Gto Celaya 13 8 3 2
Gto Yiriria 2 1 1 0
Gto Acambaro 1 1 0 0
Sin Culiacan 8 5 1 2
Chih Juarez 1 1 0 0
So far, I have been able to group by State and County and fill the NaN rows with the right State and County names. My code and result:
import numpy as np
import math
df = df.ffill()  # forward-fill so each State and County name repeats down its rows
# Group and sum
df = df.groupby(["State", "County"]).agg('sum')
df = df.reset_index()
df
State County Homicides
Gto Celaya 13
Gto Yiriria 2
Gto Acambaro 1
Sin Culiacan 8
Chih Juarez 1
But when I tried to add the Man and Woman columns:
df1 = df.groupby(["State","County", "Man", "Women", "Not_Register"]).agg('sum')
df1 =df.reset_index()
df1
My result repeats the counties instead of giving a unique county per state.
How can I resolve this issue?
Thanks for your help
Convert the count columns to numeric and group by State and County only:
df[['Homicides','Man','Woman','Not_Register']]=df[['Homicides','Man','Woman','Not_Register']].apply(pd.to_numeric,errors = 'coerce')
df = df.groupby(['State',"County"]).sum().reset_index()
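Here is a runnable sketch of the whole pipeline on the sample from the question. It assumes the blank State and County cells are real NaN values (not the string "Nan"), and the names count_cols and totals are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "State": ["Gto", np.nan, np.nan, np.nan, np.nan, "Sin", np.nan, "Chih"],
    "County": ["Celaya", np.nan, np.nan, "Yiriria", "Acambaro",
               "Culiacan", np.nan, "Juarez"],
    "Homicides": [2, 8, 3, 2, 1, 3, 5, 1],
    "Man": [2, 4, 2, 1, 1, 1, 4, 1],
    "Woman": [0, 2, 1, 1, 0, 1, 0, 0],
    "Not_Register": [0, 2, 0, 0, 0, 1, 1, 0],
})
count_cols = ["Homicides", "Man", "Woman", "Not_Register"]

# Forward-fill only the key columns so every row carries its State and County.
df[["State", "County"]] = df[["State", "County"]].ffill()

# Make sure the counts are numeric (this changes nothing if they already are).
df[count_cols] = df[count_cols].apply(pd.to_numeric, errors="coerce")

# Group by the keys only; every remaining numeric column is summed.
totals = df.groupby(["State", "County"], sort=False, as_index=False).sum()
print(totals)
```

The key point is that only State and County go into groupby. Putting Man, Woman or Not_Register into the key list makes each distinct combination of counts its own group, which is why the counties were repeated.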
I'm sorry if this has been asked but I can't find another question like this.
I have a data frame in Pandas like this:
Home Away Home_Score Away_Score
MIL NYC 1 2
ATL NYC 1 3
NYC PHX 2 1
HOU NYC 1 6
I want to calculate the moving average for each team, but the catch is that I want to do it for all of their games, both home and away combined.
So for a moving average window of size 3 for 'NYC' the answer should be (2+3+2)/3 for row 1 and then (3+2+6)/3 for row 2, etc.
You can use stack to convert the two columns into one and then groupby:
(df[['Home_Score','Away_Score']]
.stack()
.groupby(df[['Home','Away']].stack().values)
.rolling(3).mean()
.reset_index(level=0, drop=True)
.unstack()
.add_prefix('Avg_')
)
Output:
Avg_Away_Score Avg_Home_Score
0 NaN NaN
1 NaN NaN
2 NaN 2.333333
3 3.666667 NaN
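To check the NYC arithmetic from the question, here is a small sketch that puts the games into one row per team per game and keeps the team name next to each average. The names team, score, long_df and rolling_avg are my own, and it assumes the row order of the frame is the order the games were played in.

```python
import pandas as pd

df = pd.DataFrame({
    "Home": ["MIL", "ATL", "NYC", "HOU"],
    "Away": ["NYC", "NYC", "PHX", "NYC"],
    "Home_Score": [1, 1, 2, 1],
    "Away_Score": [2, 3, 1, 6],
})

home = df[["Home", "Home_Score"]].rename(
    columns={"Home": "team", "Home_Score": "score"})
away = df[["Away", "Away_Score"]].rename(
    columns={"Away": "team", "Away_Score": "score"})

# Keep the original row labels (game numbers) and order the rows by game.
long_df = pd.concat([home, away]).sort_index(kind="stable")

# Rolling 3-game mean per team; the index of the result is the game number.
rolling_avg = (
    long_df.groupby("team")["score"]
           .rolling(3).mean()
           .reset_index(level=0)
)

nyc = rolling_avg.loc[rolling_avg["team"] == "NYC", "score"]
print(nyc)
```

For NYC the third game gives (2+3+2)/3 and the fourth gives (3+2+6)/3, which agrees with the 2.333333 and 3.666667 in the output above.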
I have a df like this,
case step deep value
0 case 1 1 ram in India ram,cricket
1 NaN 2 ram plays cricket NaN
2 case 2 1 ravi played football ravi
3 NaN 2 ravi works welll NaN
4 case 3 1 Sri bought a car sri
5 NaN 2 sri went out NaN
and a dictionary, my_dict = {'ram':1,'cricket':1,'ravi':2.5,'sri':1}
I am trying to reorder the dataframe according to the values in the dictionary, which I obtained with a tf-idf method. The difficulty is that the rows have to be reordered together with their values.
My expected output is,
case step deep value
2 case 2 1 ravi played football ravi
3 NaN 2 ravi works welll NaN
0 case 1 1 ram in India ram,cricket
1 NaN 2 ram plays cricket NaN
4 case 3 1 Sri bought a car sri
5 NaN 2 sri went out NaN
Please help, thanks in advance!
You can create a MultiIndex for sorting; the only requirement is that every value in the value column is a key in my_dict:
my_dict = {'ram':1,'cricket':1,'ravi':2.5,'sri':1}
#create DataFrame from value column, replace and sum columns
a = df['value'].str.split(',', expand=True).replace(my_dict).sum(axis=1)
#create groups
b = df['step'].diff().le(0).cumsum()
#create Series by summing per groups
c = a.groupby(b).transform('sum')
#create MultiIndex
df.index = [c,b]
print (df)
case step deep value
step
2.0 0 case 1 1 ram in India ram,cricket
0 NaN 2 ram plays cricket NaN
2.5 1 case 2 1 ravi played football ravi
1 NaN 2 ravi works welll NaN
1.0 2 case 3 1 Sri bought a car sri
2 NaN 2 sri went out NaN
#sort by the MultiIndex, then drop it
df = df.sort_index(ascending=False).reset_index(drop=True)
print (df)
case step deep value
0 case 2 1 ravi played football ravi
1 NaN 2 ravi works welll NaN
2 case 1 1 ram in India ram,cricket
3 NaN 2 ram plays cricket NaN
4 case 3 1 Sri bought a car sri
5 NaN 2 sri went out NaN
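As a variation on the same idea, here is a sketch that scores each block with explode and map instead of replace, and keeps the original row labels as in the expected output of the question. The names row_score, block, block_score and keys are my own. Ties between blocks are broken by their original position here, which is my assumption and differs from the descending sort above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "case": ["case 1", np.nan, "case 2", np.nan, "case 3", np.nan],
    "step": [1, 2, 1, 2, 1, 2],
    "deep": ["ram in India", "ram plays cricket", "ravi played football",
             "ravi works welll", "Sri bought a car", "sri went out"],
    "value": ["ram,cricket", np.nan, "ravi", np.nan, "sri", np.nan],
})
my_dict = {"ram": 1, "cricket": 1, "ravi": 2.5, "sri": 1}

# Score per row: split the keywords, look each one up, sum per original row.
# Rows without keywords end up with a score of 0.
row_score = (
    df["value"].str.split(",").explode().map(my_dict)
      .groupby(level=0).sum()
)

# A new block starts wherever the step counter stops increasing.
block = df["step"].diff().le(0).cumsum()

# Every row of a block gets the block's total score.
block_score = row_score.groupby(block).transform("sum")

# Sort blocks by score (highest first), keeping steps in order inside a block.
keys = pd.DataFrame({"score": block_score, "block": block, "step": df["step"]})
order = keys.sort_values(["score", "block", "step"],
                         ascending=[False, True, True]).index
result = df.loc[order]
print(result)
```

On the sample this puts the rows in the order 2, 3, 0, 1, 4, 5, so the "ravi" block comes first.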
I have a dataset like this:
Country Name Match Result
US Martin Win 3
US Martin Lose 1
US Martin Draw 5
UK Luther Win 5
UK Luther Draw 3
I'd like to add two more columns: the sum of Result over Win, Lose and Draw for each player, and each match's share of that sum, like this:
Country Name Match Result All Percentage
US Martin Win 3 8 0.375
US Martin Lose 1 8 0.125
US Martin Draw 5 8 0.625
UK Luther Win 6 10 0.6
UK Luther Draw 4 10 0.4
I've already tried groupby and got the total per group. However, I don't know how to put it in a new column.
Thank you
If I understand correctly, you need GroupBy.transform (note that the sample DataFrame was changed):
df['All'] = df.groupby(['Country','Name'])['Result'].transform('sum')
df['Percentage'] = df.Result.div(df.All)
print (df)
Country Name Match Result All Percentage
0 US Martin Win 2 8 0.250
1 US Martin Lose 1 8 0.125
2 US Martin Draw 5 8 0.625
3 UK Luther Win 6 10 0.600
4 UK Luther Draw 4 10 0.400
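To show why transform is the piece that was missing, here is a short self-contained sketch using the changed sample from the answer. The variable name totals is mine. A plain sum gives one row per group, while transform returns one value per original row, so it can be assigned straight to a new column.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["US", "US", "US", "UK", "UK"],
    "Name": ["Martin", "Martin", "Martin", "Luther", "Luther"],
    "Match": ["Win", "Lose", "Draw", "Win", "Draw"],
    "Result": [2, 1, 5, 6, 4],
})

grouped = df.groupby(["Country", "Name"])["Result"]

# Aggregation: one row per (Country, Name) group, so it cannot be
# assigned as a column of df without a merge.
totals = grouped.sum()

# Transform: the group total repeated on every row of df.
df["All"] = grouped.transform("sum")
df["Percentage"] = df["Result"] / df["All"]
print(df)
```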