Pull specific values from one dataframe based on values in another


I have two dataframes.
df1:
Country  value  Average  Week Rank
UK       42     42       1
US       9      9.5      2
DE       10     9.5      3
NL       15     15.5     4
ESP      16     15.5     5
POL      17     18       6
CY       18     18       7
IT       20     18       8
AU       17     18       9
FI       18     18       10
SW       20     18       11
df2:
Country  value  Average  Year Rank
US       42     42       1
UK       9      9.5      2
ESP      10     9.5      3
SW       15     15.5     4
IT       16     15.5     5
POL      17     18       6
NO       18     18       7
SL       20     18       8
PO       17     18       9
FI       18     18       10
NL       20     18       11
DE       17     18       12
AU       18     18       13
CY       20     18       14
I'm looking to add a column to df1 that shows each country's 'Year Rank' from df2, so that I get the following:
Country  value  Average  Week Rank  Year Rank
UK       42     42       1          2
US       9      9.5      2          1
DE       10     9.5      3          12
NL       15     15.5     4          11
ESP      16     15.5     5          3
POL      17     18       6          6
CY       18     18       7          14
IT       20     18       8          5
AU       17     18       9          13
FI       18     18       10         10
SW       20     18       11         4
How would I loop through the countries in df1 and find the corresponding rank in df2?
Edit: I am only looking for the yearly rank of the countries in df1.
Thanks!

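Use a left merge on Country, taking only the Year Rank column from df2:
df1['Year Rank'] = df1.merge(df2[['Country', 'Year Rank']], on='Country', how='left')['Year Rank'].to_numpy()
As long as each country appears only once in df2, how='left' keeps every row of df1 in its original order (countries missing from df2 get NaN), and .to_numpy() assigns the values by position, because the merged frame has a fresh 0..n-1 index that need not match df1's.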
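If the goal is only to copy one column across, a lookup with Series.map is another option. The sketch below rebuilds df1 and df2 from the tables in the question, keeping only the columns the lookup needs, and assumes each Country appears at most once in df2 (map needs a unique lookup index). Unlike an inner merge, map keeps df1's own index and row order, and any country missing from df2 would get NaN.

```python
import pandas as pd

# df1 and df2 rebuilt from the question, trimmed to the columns the lookup needs
df1 = pd.DataFrame({
    "Country": ["UK", "US", "DE", "NL", "ESP", "POL", "CY", "IT", "AU", "FI", "SW"],
    "Week Rank": list(range(1, 12)),
})
df2 = pd.DataFrame({
    "Country": ["US", "UK", "ESP", "SW", "IT", "POL", "NO", "SL",
                "PO", "FI", "NL", "DE", "AU", "CY"],
    "Year Rank": list(range(1, 15)),
})

# Country -> Year Rank lookup table (assumes Country is unique in df2)
year_rank = df2.set_index("Country")["Year Rank"]

# map looks up each df1 country; df1's index and row order are untouched
df1["Year Rank"] = df1["Country"].map(year_rank)
print(df1)
```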

Related

combining specific row conditionally and add output to existing row in pandas

Suppose I have the following data frame:
import pandas as pd
data = {'age' :[10,11,12,11,11,10,11,13,13,13,14,14,15,15,15],
'num1':[10,11,12,13,14,15,16,17,18,19,20,21,22,23,24],
'num2':[20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]}
df = pd.DataFrame(data)
I want to sum the rows for ages 14 and 15 and keep the result as a single row with age 14. My expected output would be:
    age  num1  num2
1    10    10    20
2    11    11    21
3    12    12    22
4    11    13    23
5    11    14    24
6    10    15    25
7    11    16    26
8    13    17    27
9    13    18    28
10   13    19    29
11   14   110   160
In the code below, I tried to group by age, but it does not work for me:
df1 =df.groupby(age[age >=14])['num1', 'num2'].apply(', '.join).reset_index(drop=True).to_frame()

limit_age = 14
new = df.query("age < @limit_age").copy()
new.loc[len(new)] = [limit_age,
                     *df.query("age >= @limit_age").drop(columns="age").sum()]
First take the "under 14" rows, then append one new row in which age is 14 and the other values are the column sums of the "14 and over" rows (new.loc[len(new)] creates a fresh label because the kept rows are numbered 0 to 9), to get:
>>> new
age num1 num2
0 10 10 20
1 11 11 21
2 12 12 22
3 11 13 23
4 11 14 24
5 10 15 25
6 11 16 26
7 13 17 27
8 13 18 28
9 13 19 29
10 14 110 160
(new.index += 1 can be used for a 1-based index at the end.)

I would use a mask and concat:
m = df['age'].isin([14, 15])
out = pd.concat([df[~m],
df[m].agg({'age': 'min', 'num1': 'sum', 'num2': 'sum'})
.to_frame().T
], ignore_index=True)
Output:
age num1 num2
0 10 10 20
1 11 11 21
2 12 12 22
3 11 13 23
4 11 14 24
5 10 15 25
6 11 16 26
7 13 17 27
8 13 18 28
9 13 19 29
10 14 110 160
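Both answers hard-code the cut-off. As a sketch, the same mask-and-concat idea can be wrapped in a small helper; collapse_from is a hypothetical name, and the helper assumes the grouping column is called age and that every other column should be summed.

```python
import pandas as pd

def collapse_from(df: pd.DataFrame, limit: int) -> pd.DataFrame:
    """Hypothetical helper: keep rows with age < limit and replace all rows
    with age >= limit by one row (age = limit, other columns summed)."""
    mask = df["age"] >= limit
    # Column-wise sums of the rows being collapsed (age itself is dropped)
    sums = df.loc[mask].drop(columns="age").sum()
    tail = pd.DataFrame([{"age": limit, **sums.to_dict()}])
    # Kept rows first, then the collapsed row, with a fresh 0..n-1 index
    return pd.concat([df.loc[~mask], tail], ignore_index=True)

data = {'age': [10, 11, 12, 11, 11, 10, 11, 13, 13, 13, 14, 14, 15, 15, 15],
        'num1': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
        'num2': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]}
df = pd.DataFrame(data)
out = collapse_from(df, 14)
print(out)
```

Passing a different limit (for example 13) collapses every row from that age upward in the same way.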

create dataframe with increasing numbers in python

I want to create a dataframe with m rows and n columns in which every column counts up from 1 to m, so row i holds the value i in every column.
In R, this would be generated by:
ia=array((1:m),c(m,n))
But I do not know how I can achieve the same in Python.
Kind regards,

Use numpy.broadcast_to with the DataFrame constructor:
import numpy as np
import pandas as pd
m = 24
n = 13
df = pd.DataFrame(np.broadcast_to(np.arange(1, m + 1)[:, None], (m, n)))
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5 5 5 5 5 5
5 6 6 6 6 6 6 6 6 6 6 6 6 6
6 7 7 7 7 7 7 7 7 7 7 7 7 7
7 8 8 8 8 8 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9 9 9 9 9 9
9 10 10 10 10 10 10 10 10 10 10 10 10 10
10 11 11 11 11 11 11 11 11 11 11 11 11 11
11 12 12 12 12 12 12 12 12 12 12 12 12 12
12 13 13 13 13 13 13 13 13 13 13 13 13 13
13 14 14 14 14 14 14 14 14 14 14 14 14 14
14 15 15 15 15 15 15 15 15 15 15 15 15 15
15 16 16 16 16 16 16 16 16 16 16 16 16 16
16 17 17 17 17 17 17 17 17 17 17 17 17 17
17 18 18 18 18 18 18 18 18 18 18 18 18 18
18 19 19 19 19 19 19 19 19 19 19 19 19 19
19 20 20 20 20 20 20 20 20 20 20 20 20 20
20 21 21 21 21 21 21 21 21 21 21 21 21 21
21 22 22 22 22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 23 23 23 23 23 23 23
23 24 24 24 24 24 24 24 24 24 24 24 24 24
df = df.rename(index = lambda x: x+1, columns=lambda x: x+1)
print (df)
1 2 3 4 5 6 7 8 9 10 11 12 13
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13 13 13 13 13 13 13
14 14 14 14 14 14 14 14 14 14 14 14 14 14
15 15 15 15 15 15 15 15 15 15 15 15 15 15
16 16 16 16 16 16 16 16 16 16 16 16 16 16
17 17 17 17 17 17 17 17 17 17 17 17 17 17
18 18 18 18 18 18 18 18 18 18 18 18 18 18
19 19 19 19 19 19 19 19 19 19 19 19 19 19
20 20 20 20 20 20 20 20 20 20 20 20 20 20
21 21 21 21 21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22 22 22 22 22
23 23 23 23 23 23 23 23 23 23 23 23 23 23
24 24 24 24 24 24 24 24 24 24 24 24 24 24

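You can use np.repeat or np.tile:
import numpy as np
import pandas as pd
n = 5  # 13
m = 8  # 24
# Enhanced by @mozway
df = pd.DataFrame(np.tile(np.arange(1, m+1), (n, 1)).T)
# OR: repeat each value n times, then reshape into m rows of n columns
df = pd.DataFrame(np.repeat(np.arange(1, m+1), n).reshape(-1, n))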
print(df)
# Output
0 1 2 3 4
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
4 5 5 5 5 5
5 6 6 6 6 6
6 7 7 7 7 7
7 8 8 8 8 8
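A variant that builds the R-style layout in one step is sketched below. np.repeat with axis=1 copies the column vector 1..m across n columns (unlike np.broadcast_to, which returns a read-only view, this gives an ordinary array), and the 1-based labels are passed straight to the constructor instead of renaming afterwards. The values of m and n are taken from the first answer.

```python
import numpy as np
import pandas as pd

m, n = 24, 13  # m rows, n columns, as in R's array(1:m, c(m, n))

# Column vector 1..m, copied n times along axis 1 -> shape (m, n)
values = np.repeat(np.arange(1, m + 1)[:, None], n, axis=1)

# 1-based row and column labels, like R's default dimnames-free display
df = pd.DataFrame(values, index=range(1, m + 1), columns=range(1, n + 1))
print(df)
```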

Create a new column in pandas dataframe based on multiple conditions

I have a dataframe like the one below, and I need to create a new column, year_val, that takes its value from one of col2016 through col2019 depending on the Years column: year_val should equal col#### when Years equals that column's #### suffix.
import pandas as pd
sampleDF = pd.DataFrame({'Years':[2016,2016,2017,2017,2018,2018,2019,2019],
'col2016':[1,2,3,4,5,6,7,8],
'col2017':[9,10,11,12,13,14,15,16],
'col2018':[17,18,19,20,21,22,23,24],
'col2019':[25,26,27,28,29,30,31,32]})
sampleDF['year_val'] = ?????

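Use DataFrame.lookup, building each row's column label by casting Years to string and prepending 'col' (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions):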
sampleDF['year_val'] = sampleDF.lookup(sampleDF.index, 'col' + sampleDF['Years'].astype(str))
print (sampleDF)
Years col2016 col2017 col2018 col2019 year_val
0 2016 1 9 17 25 1
1 2016 2 10 18 26 2
2 2017 3 11 19 27 11
3 2017 4 12 20 28 12
4 2018 5 13 21 29 21
5 2018 6 14 22 30 22
6 2019 7 15 23 31 31
7 2019 8 16 24 32 32
EDIT: If you check the definition of the lookup function:
result = [df.get_value(row, col) for row, col in zip(row_labels, col_labels)]
you can rewrite it as a loop that uses DataFrame.at inside a try/except. This avoids the warning
FutureWarning: get_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
and returns NaN for any row whose Years value has no matching column (2015 in the example below):
sampleDF = pd.DataFrame({'Years':[2015,2016,2017,2017,2018,2018,2019,2019],
'col2016':[1,2,3,4,5,6,7,8],
'col2017':[9,10,11,12,13,14,15,16],
'col2018':[17,18,19,20,21,22,23,24],
'col2019':[25,26,27,28,29,30,31,32]})
print (sampleDF)
Years col2016 col2017 col2018 col2019
0 2015 1 9 17 25
1 2016 2 10 18 26
2 2017 3 11 19 27
3 2017 4 12 20 28
4 2018 5 13 21 29
5 2018 6 14 22 30
6 2019 7 15 23 31
7 2019 8 16 24 32
import numpy as np
out = []
for row, col in zip(sampleDF.index, 'col' + sampleDF['Years'].astype(str)):
    try:
        out.append(sampleDF.at[row, col])
    except KeyError:
        # no column for this year, e.g. col2015
        out.append(np.nan)
sampleDF['year_val'] = out
print (sampleDF)
Years col2016 col2017 col2018 col2019 year_val
0 2015 1 9 17 25 NaN
1 2016 2 10 18 26 2.0
2 2017 3 11 19 27 11.0
3 2017 4 12 20 28 12.0
4 2018 5 13 21 29 21.0
5 2018 6 14 22 30 22.0
6 2019 7 15 23 31 31.0
7 2019 8 16 24 32 32.0
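Since DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of a vectorised replacement along the lines of the one the pandas indexing guide suggests: factorize the wanted column labels, reindex the columns so every label exists (a missing one such as col2015 becomes an all-NaN column), then pick one value per row with NumPy fancy indexing. It uses the second sampleDF, with 2015 in the first row.

```python
import numpy as np
import pandas as pd

sampleDF = pd.DataFrame({'Years': [2015, 2016, 2017, 2017, 2018, 2018, 2019, 2019],
                         'col2016': [1, 2, 3, 4, 5, 6, 7, 8],
                         'col2017': [9, 10, 11, 12, 13, 14, 15, 16],
                         'col2018': [17, 18, 19, 20, 21, 22, 23, 24],
                         'col2019': [25, 26, 27, 28, 29, 30, 31, 32]})

# Column label each row should read from, e.g. 2017 -> 'col2017'
labels = 'col' + sampleDF['Years'].astype(str)

# idx[i] is the position of row i's label within the unique labels `cols`
idx, cols = pd.factorize(labels)

# One column per needed label; labels that don't exist become NaN columns
picked = sampleDF.reindex(cols, axis=1).to_numpy()[np.arange(len(sampleDF)), idx]
sampleDF['year_val'] = picked
print(sampleDF)
```

Because the reindexed block contains a NaN column, the result is float, matching the NaN/2.0/11.0 output of the loop version above.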

Sum values of a column for each value based on another column and divide it by total

Today I'm struggling once again with Python and data analytics.
I have a dataframe that looks like this:
name totdmgdealt
0 Warwick 96980.0
1 Nami 25995.0
2 Draven 171568.0
3 Fiora 113721.0
4 Viktor 185302.0
5 Skarner 148791.0
6 Galio 130692.0
7 Ahri 145731.0
8 Jinx 182680.0
9 VelKoz 85785.0
10 Ziggs 46790.0
11 Cassiopeia 62444.0
12 Yasuo 117896.0
13 Warwick 129156.0
14 Evelynn 179252.0
15 Caitlyn 163342.0
16 Wukong 122919.0
17 Syndra 146754.0
18 Karma 35766.0
19 Warwick 117790.0
20 Draven 74879.0
21 Janna 11242.0
22 Lux 66424.0
23 Amumu 87826.0
24 Vayne 76085.0
25 Ahri 93334.0
..
..
..
Each row holds the total damage a champion dealt in one game.
Now I want to group this information so I can see which champion has dealt the most damage overall.
I tried groupby('name'), but it didn't work at all.
I already went through some threads about groupby and summing values, but they didn't solve my specific problem.
The dealt damage of each champion should also be shown as percentage of the total.
I'm looking for something like this as an output:
name totdmgdealt percentage
0 Warwick 2378798098 2.1 %
1 Nami 2837491074 2.3 %
2 Draven 1231451224 ..
3 Fiora 1287301724 ..
4 Viktor 1239808504 ..
5 Skarner 1487911234 ..
6 Galio 1306921234 ..

We can group by name and sum, then divide each sum by the overall total with .div, multiply by 100 with .mul, and finally round to one decimal place with .round:
total = df['totdmgdealt'].sum()
summed = df.groupby('name', sort=False)['totdmgdealt'].sum().reset_index()
summed['percentage'] = summed['totdmgdealt'].div(total).mul(100).round(1)
name totdmgdealt percentage
0 Warwick 343926.0 12.2
1 Nami 25995.0 0.9
2 Draven 246447.0 8.7
3 Fiora 113721.0 4.0
4 Viktor 185302.0 6.6
5 Skarner 148791.0 5.3
6 Galio 130692.0 4.6
7 Ahri 239065.0 8.5
8 Jinx 182680.0 6.5
9 VelKoz 85785.0 3.0
10 Ziggs 46790.0 1.7
11 Cassiopeia 62444.0 2.2
12 Yasuo 117896.0 4.2
13 Evelynn 179252.0 6.4
14 Caitlyn 163342.0 5.8
15 Wukong 122919.0 4.4
16 Syndra 146754.0 5.2
17 Karma 35766.0 1.3
18 Janna 11242.0 0.4
19 Lux 66424.0 2.4
20 Amumu 87826.0 3.1
21 Vayne 76085.0 2.7

You can use sum() to get the total damage, and apply() to calculate each row's percentage of it, like this:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
name totdmgdealt
0 Warwick 96980.0
1 Nami 25995.0
2 Draven 171568.0
3 Fiora 113721.0
4 Viktor 185302.0
5 Skarner 148791.0
6 Galio 130692.0
7 Ahri 145731.0
8 Jinx 182680.0
9 VelKoz 85785.0
10 Ziggs 46790.0
11 Cassiopeia 62444.0
12 Yasuo 117896.0
13 Warwick 129156.0
14 Evelynn 179252.0
15 Caitlyn 163342.0
16 Wukong 122919.0
17 Syndra 146754.0
18 Karma 35766.0
19 Warwick 117790.0
20 Draven 74879.0
21 Janna 11242.0
22 Lux 66424.0
23 Amumu 87826.0
24 Vayne 76085.0
25 Ahri 93334.0"""), sep=r"\s+")
summed_df = df.groupby('name')['totdmgdealt'].agg(['sum']).rename(columns={"sum": "totdmgdealt"}).reset_index()
summed_df['percentage'] = summed_df.apply(
lambda x: "{:.2f}%".format(x['totdmgdealt'] / summed_df['totdmgdealt'].sum() * 100), axis=1)
print(summed_df)
Output:
name totdmgdealt percentage
0 Ahri 239065.0 8.48%
1 Amumu 87826.0 3.12%
2 Caitlyn 163342.0 5.79%
3 Cassiopeia 62444.0 2.21%
4 Draven 246447.0 8.74%
5 Evelynn 179252.0 6.36%
6 Fiora 113721.0 4.03%
7 Galio 130692.0 4.64%
8 Janna 11242.0 0.40%
9 Jinx 182680.0 6.48%
10 Karma 35766.0 1.27%
11 Lux 66424.0 2.36%
12 Nami 25995.0 0.92%
13 Skarner 148791.0 5.28%
14 Syndra 146754.0 5.21%
15 Vayne 76085.0 2.70%
16 VelKoz 85785.0 3.04%
17 Viktor 185302.0 6.57%
18 Warwick 343926.0 12.20%
19 Wukong 122919.0 4.36%
20 Yasuo 117896.0 4.18%
21 Ziggs 46790.0 1.66%

Maybe you can try this:
I reproduced the problem with my own sample data; try running the code below in your Jupyter Notebook:
import pandas as pd
name=['abhit','mawa','vaibhav','dharam','sid','abhit','vaibhav','sid','mawa','lakshya']
totdmgdealt=[24,45,80,22,89,55,89,51,93,85]
name=pd.Series(name,name='name') #converting into series
totdmgdealt=pd.Series(totdmgdealt,name='totdmgdealt') #converting into series
data=pd.concat([name,totdmgdealt],axis=1) #combining the two series into a DataFrame
final=data.pivot_table(values="totdmgdealt",columns="name",aggfunc="sum").transpose() #summing per name, one row per name
def calPer(row,total): #percentage of the total, rounded to 2 decimals
    return ((row/total)*100).round(2)
total=final['totdmgdealt'].sum() #overall total used for the percentage
final['Percentage']=calPer(final['totdmgdealt'],total) #storing the result in a new column
final
Sample Data :
name totdmgdealt
0 abhit 24
1 mawa 45
2 vaibhav 80
3 dharam 22
4 sid 89
5 abhit 55
6 vaibhav 89
7 sid 51
8 mawa 93
9 lakshya 85
Output:
totdmgdealt Percentage
name
abhit 79 12.48
dharam 22 3.48
lakshya 85 13.43
mawa 138 21.80
sid 140 22.12
vaibhav 169 26.70
Run the code to see how it works, then just replace the sample data with yours. Hopefully this helps.
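None of the answers sort the result, although the question asks which champion dealt the most damage. The sketch below sums per champion, sorts in descending order and adds each champion's share of the total. It uses only the 26 rows shown in the question, so the numbers will differ on the full dataset.

```python
import pandas as pd

names = ["Warwick", "Nami", "Draven", "Fiora", "Viktor", "Skarner", "Galio", "Ahri",
         "Jinx", "VelKoz", "Ziggs", "Cassiopeia", "Yasuo", "Warwick", "Evelynn",
         "Caitlyn", "Wukong", "Syndra", "Karma", "Warwick", "Draven", "Janna", "Lux",
         "Amumu", "Vayne", "Ahri"]
damage = [96980.0, 25995.0, 171568.0, 113721.0, 185302.0, 148791.0, 130692.0,
          145731.0, 182680.0, 85785.0, 46790.0, 62444.0, 117896.0, 129156.0,
          179252.0, 163342.0, 122919.0, 146754.0, 35766.0, 117790.0, 74879.0,
          11242.0, 66424.0, 87826.0, 76085.0, 93334.0]
df = pd.DataFrame({"name": names, "totdmgdealt": damage})

# Total damage per champion, largest first
per_champ = (df.groupby("name")["totdmgdealt"].sum()
               .sort_values(ascending=False)
               .to_frame())

# Share of the overall total in percent, rounded to one decimal place
total = per_champ["totdmgdealt"].sum()
per_champ["percentage"] = (per_champ["totdmgdealt"] / total * 100).round(1)
print(per_champ.head())
```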

Columns located within a column

I am trying to build a dataframe from a web API and can't work out how to break the nested columns out. The home and away columns contain breakdowns of their own, so they should become separate columns such as Home Wins, Home Draws, etc.
import json
import pandas as pd
import requests
url = "http://api.football-data.org/v1/soccerseasons/398/leagueTable/?matchday=38"
response = requests.get(url)
response_json = response.content
result = json.loads(response_json)
football = pd.DataFrame(result['standing'], columns=['position','teamName','playedGames','wins','draws','losses','goals',
'goalsAgainst','home','away','goalDifference','points'])
football
football.home
This shows the problem:
0 {u'wins': 12, u'losses': 1, u'draws': 6, u'goa...

I think you can use json_normalize:
import pandas as pd
import json
import requests
url = "http://api.football-data.org/v1/soccerseasons/398/leagueTable/?matchday=38"
result = json.loads(requests.get(url).text)
#print (result)
# pd.json_normalize needs pandas >= 1.0; older versions: from pandas.io.json import json_normalize
df = pd.json_normalize(result["standing"])
print (df)
_links.team.href away.draws away.goals \
0 http://api.football-data.org/v1/teams/338 6 33
1 http://api.football-data.org/v1/teams/57 7 34
2 http://api.football-data.org/v1/teams/73 7 34
3 http://api.football-data.org/v1/teams/65 7 24
4 http://api.football-data.org/v1/teams/66 4 22
5 http://api.football-data.org/v1/teams/340 6 20
6 http://api.football-data.org/v1/teams/563 7 31
7 http://api.football-data.org/v1/teams/64 4 30
8 http://api.football-data.org/v1/teams/70 5 19
9 http://api.football-data.org/v1/teams/61 5 27
10 http://api.football-data.org/v1/teams/62 9 24
11 http://api.football-data.org/v1/teams/72 5 22
12 http://api.football-data.org/v1/teams/346 3 20
13 http://api.football-data.org/v1/teams/74 8 14
14 http://api.football-data.org/v1/teams/354 6 20
15 http://api.football-data.org/v1/teams/1044 4 22
16 http://api.football-data.org/v1/teams/71 6 25
17 http://api.football-data.org/v1/teams/67 3 12
18 http://api.football-data.org/v1/teams/68 2 13
19 http://api.football-data.org/v1/teams/58 3 13
away.goalsAgainst away.losses away.wins \
0 18 2 11
1 25 4 8
2 20 3 9
3 20 5 7
4 26 8 7
5 19 6 7
6 25 5 7
7 28 7 8
8 31 8 6
9 23 7 7
10 25 5 5
11 32 10 4
12 31 10 6
13 22 7 4
14 28 8 5
15 33 9 6
16 42 10 3
17 41 14 2
18 37 14 3
19 41 15 1
crestURI draws goalDifference \
0 http://upload.wikimedia.org/wikipedia/en/6/63/... 12 32
1 http://upload.wikimedia.org/wikipedia/en/5/53/... 11 29
2 http://upload.wikimedia.org/wikipedia/de/b/b4/... 13 34
3 http://upload.wikimedia.org/wikipedia/de/f/fd/... 9 30
4 http://upload.wikimedia.org/wikipedia/de/d/da/... 9 14
5 http://upload.wikimedia.org/wikipedia/de/c/c9/... 9 18
6 http://upload.wikimedia.org/wikipedia/de/e/e0/... 14 14
7 http://upload.wikimedia.org/wikipedia/de/0/0a/... 12 13
8 http://upload.wikimedia.org/wikipedia/de/a/a3/... 9 -14
9 http://upload.wikimedia.org/wikipedia/de/5/5c/... 14 6
10 http://upload.wikimedia.org/wikipedia/de/f/f9/... 14 4
11 http://upload.wikimedia.org/wikipedia/de/a/ab/... 11 -10
12 https://upload.wikimedia.org/wikipedia/en/e/e2... 9 -10
13 http://upload.wikimedia.org/wikipedia/de/8/8b/... 13 -14
14 http://upload.wikimedia.org/wikipedia/de/b/bf/... 9 -12
15 https://upload.wikimedia.org/wikipedia/de/4/41... 9 -22
16 http://upload.wikimedia.org/wikipedia/de/6/60/... 12 -14
17 http://upload.wikimedia.org/wikipedia/de/5/56/... 10 -21
18 http://upload.wikimedia.org/wikipedia/de/8/8c/... 7 -28
19 http://upload.wikimedia.org/wikipedia/de/9/9f/... 8 -49
goals ... home.goals home.goalsAgainst home.losses home.wins \
0 68 ... 35 18 1 12
1 65 ... 31 11 3 12
2 69 ... 35 15 3 10
3 71 ... 47 21 5 12
4 49 ... 27 9 2 12
5 59 ... 39 22 5 11
6 65 ... 34 26 3 9
7 63 ... 33 22 3 8
8 41 ... 22 24 7 8
9 59 ... 32 30 5 5
10 59 ... 35 30 8 6
11 42 ... 20 20 5 8
12 40 ... 20 19 7 6
13 34 ... 20 26 8 6
14 39 ... 19 23 10 6
15 45 ... 23 34 9 5
16 48 ... 23 20 7 6
17 44 ... 32 24 5 7
18 39 ... 26 30 8 6
19 27 ... 14 35 12 2
losses playedGames points position teamName wins
0 3 38 81 1 Leicester City FC 23
1 7 38 71 2 Arsenal FC 20
2 6 38 70 3 Tottenham Hotspur FC 19
3 10 38 66 4 Manchester City FC 19
4 10 38 66 5 Manchester United FC 19
5 11 38 63 6 Southampton FC 18
6 8 38 62 7 West Ham United FC 16
7 10 38 60 8 Liverpool FC 16
8 15 38 51 9 Stoke City FC 14
9 12 38 50 10 Chelsea FC 12
10 13 38 47 11 Everton FC 11
11 15 38 47 12 Swansea City FC 12
12 17 38 45 13 Watford FC 12
13 15 38 43 14 West Bromwich Albion FC 10
14 18 38 42 15 Crystal Palace FC 11
15 18 38 42 16 AFC Bournemouth 11
16 17 38 39 17 Sunderland AFC 9
17 19 38 37 18 Newcastle United FC 9
18 22 38 34 19 Norwich City FC 9
19 27 38 17 20 Aston Villa FC 3
[20 rows x 22 columns]
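Because the v1 endpoint above may no longer be available, here is an offline sketch of the same idea. The single record is hand-built to mimic one entry of result["standing"], with values copied from Leicester's row in the output above (home draws taken from the repr in the question). The pretty() renamer is a hypothetical helper that turns home.wins into Home Wins, as the question asked; pd.json_normalize needs pandas 1.0 or later.

```python
import pandas as pd

# Hand-built record shaped like one entry of result["standing"]
standing = [{
    "position": 1,
    "teamName": "Leicester City FC",
    "points": 81,
    "home": {"wins": 12, "draws": 6, "losses": 1, "goals": 35, "goalsAgainst": 18},
    "away": {"wins": 11, "draws": 6, "losses": 2, "goals": 33, "goalsAgainst": 18},
}]

# Nested dicts become dotted columns: home.wins, home.draws, away.goals, ...
df = pd.json_normalize(standing)

def pretty(col: str) -> str:
    """Hypothetical renamer: 'home.goalsAgainst' -> 'Home GoalsAgainst'."""
    if "." not in col:
        return col
    outer, inner = col.split(".", 1)
    return f"{outer.capitalize()} {inner[0].upper()}{inner[1:]}"

df = df.rename(columns=pretty)
print(df[["teamName", "Home Wins", "Home Draws", "Away Wins"]])
```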
