Python pandas DataFrame column to rows manipulation [duplicate]

This question already has answers here:
Pandas Melt Function
(2 answers)
Closed 1 year ago.
I'm trying to transpose a few columns while keeping the other columns in place. I'm having a hard time with pivot or transpose code, as it doesn't give me the output I need.
Can anyone help?
I have this data frame:
EmpID  Goal  week 1  week 2  week 3  week 4
1      556   54      33      24      54
2      342   32      32      56      43
3      534   43      65      64      21
4      244   45      87      5       22
My expected dataframe output is:
EmpID  Goal  Weeks   Actual
1      556   week 1  54
1      556   week 2  33
1      556   week 3  24
1      556   week 4  54
and so on, until all employee IDs are listed.

Something like this:
# Python - melt DF
import numpy as np
import pandas as pd

d = {'Country Code': [1960, 1961, 1962, 1963, 1964, 1965, 1966],
     'ABW': [2.615300, 2.734390, 2.678430, 2.929920, 2.963250, 3.060540, 4.349760],
     'AFG': [0.249760, 0.218480, 0.210840, 0.217240, 0.211410, 0.209910, 0.671330],
     'ALB': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1.12214]}
df = pd.DataFrame(data=d)
print(df)

# Option 1: melt the country columns into long format
df1 = (df.melt(['Country Code'], var_name='Year', value_name='Econometric_Metric')
         .sort_values(['Country Code', 'Year'])
         .reset_index(drop=True))
print(df1)

# Option 2: set_index + stack gives the same result
df2 = (df.set_index(['Country Code'])
         .stack(dropna=False)
         .reset_index(name='Econometric_Metric')
         .rename(columns={'level_1': 'Year'}))
print(df2)
# BEFORE
   Country Code      ABW      AFG      ALB
0          1960  2.61530  0.24976      NaN
1          1961  2.73439  0.21848      NaN
2          1962  2.67843  0.21084      NaN
3          1963  2.92992  0.21724      NaN
4          1964  2.96325  0.21141      NaN
5          1965  3.06054  0.20991      NaN
6          1966  4.34976  0.67133  1.12214
# AFTER: both df1 (melt) and df2 (stack) produce:
Country Code Year Econometric_Metric
0 1960 ABW 2.6153
1 1960 AFG 0.24976
2 1960 ALB NaN
3 1961 ABW 2.73439
4 1961 AFG 0.21848
5 1961 ALB NaN
6 1962 ABW 2.67843
7 1962 AFG 0.21084
8 1962 ALB NaN
9 1963 ABW 2.92992
10 1963 AFG 0.21724
11 1963 ALB NaN
12 1964 ABW 2.96325
13 1964 AFG 0.21141
14 1964 ALB NaN
15 1965 ABW 3.06054
16 1965 AFG 0.20991
17 1965 ALB NaN
18 1966 ABW 4.34976
19 1966 AFG 0.67133
20 1966 ALB 1.12214
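Applied to the EmpID data in the question, the same melt pattern would look roughly like this (a sketch; the column names are taken from the question):
import pandas as pd

emp = pd.DataFrame({'EmpID': [1, 2, 3, 4],
                    'Goal': [556, 342, 534, 244],
                    'week 1': [54, 32, 43, 45],
                    'week 2': [33, 32, 65, 87],
                    'week 3': [24, 56, 64, 5],
                    'week 4': [54, 43, 21, 22]})

# Keep EmpID and Goal as identifiers and melt the week columns into rows
out = (emp.melt(id_vars=['EmpID', 'Goal'], var_name='Weeks', value_name='Actual')
          .sort_values(['EmpID', 'Weeks'])
          .reset_index(drop=True))
print(out)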
Also, take a look at the link below for more info.
https://www.dataindependent.com/pandas/pandas-melt/

Related

python pandas add new column with values grouped count

I want to add a new column with the number of times the points were over 700, per year, counting only years after 2014.
import numpy as np
import pandas as pd

ipl_data = {'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Year')
df.loc[(df['Points'] > 700) & (df['Year'] > 2014), 'High_points'] = df['Points']
#df['Point_per_year_gr_700'] = df.groupby(by='Year')['Points'].transform('count')
df['Point_per_year_gr_700'] = grouped['Points'].agg(np.size)  # this line does not give the expected result
The end dataframe should look like this, but I can't get 'Point_per_year_gr_700' right:
Year Points Point_per_year_gr_700 High_points
0 2014 876 NaN
1 2015 789 3 789.0
2 2014 863 NaN
3 2015 673 NaN
4 2014 741 NaN
5 2015 812 3 812.0
6 2016 756 1 756.0
7 2017 788 1 788.0
8 2016 694 NaN
9 2014 701 NaN
10 2015 804 3 804.0
11 2017 690 NaN
Use where to mask values to NaN where your condition isn't met. This builds the High_points column, and the same mask excludes the rows that shouldn't count when you group by year to find how many rows satisfy the condition in each year.
df['High_points'] = df['Points'].where(df['Year'].gt(2014) & df['Points'].gt(700))
df['ppy_gt700'] = (df.where(df['High_points'].notnull())
                     .groupby('Year')['Year'].transform('size'))
Year Points High_points ppy_gt700
0 2014 876 NaN NaN
1 2015 789 789.0 3.0
2 2014 863 NaN NaN
3 2015 673 NaN NaN
4 2014 741 NaN NaN
5 2015 812 812.0 3.0
6 2016 756 756.0 1.0
7 2017 788 788.0 1.0
8 2016 694 NaN NaN
9 2014 701 NaN NaN
10 2015 804 804.0 3.0
11 2017 690 NaN NaN
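An equivalent way to get the per-year count, filled only on the qualifying rows, is a sketch like the following (using transform('count'); this is an alternative I am suggesting, not part of the original answer):
# Count non-null High_points per Year and write the count only on the qualifying rows
df.loc[df['High_points'].notna(), 'ppy_gt700'] = (
    df.groupby('Year')['High_points'].transform('count'))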

Set "Year" column to individual columns to create a panel

I am trying to reshape the following dataframe into panel-data form by moving the "Year" column so that each year becomes an individual column.
Out[34]:
Award Year 0
State
Alabama 2003 89
Alabama 2004 92
Alabama 2005 108
Alabama 2006 81
Alabama 2007 71
... ...
Wyoming 2011 4
Wyoming 2012 2
Wyoming 2013 1
Wyoming 2014 4
Wyoming 2015 3
[648 rows x 2 columns]
I want each year to be an individual column. Here is an example:
Out[48]:
State 2003 2004 2005 2006
0 NewYork 10 10 10 10
1 Alabama 15 15 15 15
2 Washington 20 20 20 20
I have read up on stack/unstack, but I don't think I want a multilevel index as a result. I have been looking through the documentation (to_frame, etc.) but I can't see what I am looking for.
If anyone can help that would be great!
Use set_index with append=True, then select the column '0' and use unstack to reshape:
df = df.set_index('Award Year', append=True)['0'].unstack()
Result:
Award Year 2003 2004 2005 2006 2007 2011 2012 2013 2014 2015
State
Alabama 89.0 92.0 108.0 81.0 71.0 NaN NaN NaN NaN NaN
Wyoming NaN NaN NaN NaN NaN 4.0 2.0 1.0 4.0 3.0
pivot_table can also help:
df2 = pd.pivot_table(df, values='0', columns='Award Year', index=['State'])
df2
Result:
Award Year 2003 2004 2005 2006 2007 2011 2012 2013 2014 2015
State
Alabama 89.0 92.0 108.0 81.0 71.0 NaN NaN NaN NaN NaN
Wyoming NaN NaN NaN NaN NaN 4.0 2.0 1.0 4.0 3.0
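If you also want State back as a regular column instead of the index (the question mentions not wanting an index-heavy result), a small follow-up sketch:
df2 = df2.reset_index()
df2.columns.name = None  # drop the leftover 'Award Year' label on the columns axis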

Merge 2 data frames based on 2 conditions in pandas

How to replace the NaN values with the values from the first df:
country sex year cancer
0 Albania female 2000 32
1 Albania male 2000 58
2 Antigua female 2000 2
3 Antigua male 2000 5
4 Argen female 2000 591
5 Argen male 2000 2061
in the second df:
country year sex cancer
0 Albania 1985 female NaN
1 Albania 1985 male NaN
2 Albania 1986 female NaN
3 Albania 1986 male NaN
4 Albania 1987 female 25.0
5 Antigua 1992 male NaN
6 Antigua 1985 female NaN
the final should look like:
country year sex cancer
0 Albania 1985 female 32
1 Albania 1985 male 58
2 Albania 1986 female 32
3 Albania 1986 male 58
4 Albania 1987 female 25
5 Antigua 1992 male 5
6 Antigua 1985 female 2
The two important conditions (keys) are country and sex.
I ended up using fillna:
df2.set_index(['country','sex'], inplace=True)
# fillna aligns on the (country, sex) index, pulling matching values from df1
df2['cancer'] = df2['cancer'].fillna(df1.set_index(['country','sex']).cancer)
df2.reset_index(inplace=True)
df2
Out[745]:
country sex year cancer
0 Albania female 1985 32.0
1 Albania male 1985 58.0
2 Albania female 1986 32.0
3 Albania male 1986 58.0
4 Albania female 1987 25.0
5 Antigua male 1992 5.0
6 Antigua female 1985 2.0
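An alternative sketch using merge on the two keys, starting from the original df1 and df2 as shown in the question (the '_ref' suffix is just an illustrative name, not from the original answer):
# Left-merge df1's cancer values onto df2 by (country, sex), then fill the gaps
merged = df2.merge(df1[['country', 'sex', 'cancer']],
                   on=['country', 'sex'], how='left', suffixes=('', '_ref'))
merged['cancer'] = merged['cancer'].fillna(merged['cancer_ref'])
merged = merged.drop(columns='cancer_ref')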

Transpose and widen Data

My panda data frame looks like as follows:
Country Code 1960 1961 1962 1963 1964 1965 1966 1967 1968 ... 2015
ABW 2.615300 2.734390 2.678430 2.929920 2.963250 3.060540 ... 4.349760
AFG 0.249760 0.218480 0.210840 0.217240 0.211410 0.209910 ... 0.671330
ALB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.12214
...
How can I transpose it so that it looks like the following?
Country_Code Year Econometric_Metric
ABW 1960 2.615300
ABW 1961 2.734390
ABW 1962 2.678430
...
ABW 2015 4.349760
AFG 1960 0.249760
AFG 1961 0.218480
AFG 1962 0.210840
...
AFG 2015 0.671330
ALB 1960 NaN
ALB 1961 NaN
ALB 1962 NaN
ALB 2015 1.12214
...
Thanks.
I think you need melt with sort_values:
df = (df.melt(['Country Code'], var_name='Year', value_name='Econometric_Metric')
        .sort_values(['Country Code','Year'])
        .reset_index(drop=True))
Or set_index with stack:
df = (df.set_index(['Country Code'])
        .stack(dropna=False)
        .reset_index(name='Econometric_Metric')
        .rename(columns={'level_1':'Year'}))
print(df.head(10))
Country Code Year Econometric_Metric
0 ABW 1960 2.61530
1 ABW 1961 2.73439
2 ABW 1962 2.67843
3 ABW 1963 2.92992
4 ABW 1964 2.96325
5 ABW 1965 3.06054
6 ABW 1966 NaN
7 ABW 1967 NaN
8 ABW 1968 NaN
9 ABW 2015 4.34976

Pandas merge 2 data frames based on 2 keys

I am trying to merge 2 pandas df's:
ISO3 Year Calories
AFG 1960 2300
AFG 1961 2323
...
USA 2005 2800
USA 2006 2828
and
ISO3 Year GDP
AFG 1980 3600
AFG 1981 3636
...
USA 2049 10000
USA 2050 10100
I have tried pd.merge(df1,df2,on=['ISO3','Year'],how=outer) and many other variants, but for some reason it does not work. Any help?
pd.merge(df1, df2, on=['ISO3', 'Year'], how='outer') should work just fine. Can you modify the example, or post a concrete example of df1, df2 which is not yielding the correct result?
In [58]: df1 = pd.read_table('data', sep='\s+')
In [59]: df1
Out[59]:
ISO3 Year Calories
0 AFG 1960 2300
1 AFG 1961 2323
2 USA 2005 2800
3 USA 2006 2828
In [60]: df2 = pd.read_table('data2', sep='\s+')
In [61]: df2
Out[61]:
ISO3 Year GDP
0 AFG 1980 3600
1 AFG 1981 3636
2 USA 2049 10000
3 USA 2050 10100
In [62]: pd.merge(df1, df2, on=['ISO3', 'Year'], how='outer')
Out[62]:
ISO3 Year Calories GDP
0 AFG 1960 2300 NaN
1 AFG 1961 2323 NaN
2 USA 2005 2800 NaN
3 USA 2006 2828 NaN
4 AFG 1980 NaN 3600
5 AFG 1981 NaN 3636
6 USA 2049 NaN 10000
7 USA 2050 NaN 10100
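One common reason a key-based merge appears not to work is a dtype mismatch on the key columns (for example, Year stored as strings in one frame and as integers in the other). That is only a guess about the asker's data, but it is cheap to check and fix:
# Compare the key dtypes; if they differ, cast them to a common type before merging
print(df1.dtypes)
print(df2.dtypes)
df1['Year'] = df1['Year'].astype(int)
df2['Year'] = df2['Year'].astype(int)
merged = pd.merge(df1, df2, on=['ISO3', 'Year'], how='outer')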
