Order columns in pandas dataframe - python

I have a pandas DataFrame as below:
import pandas as pd
import numpy as np

df = pd.DataFrame({'CATEGORY': [1, 1, 2, 2],
                   'GROUP': ['A', 'A', 'B', 'B'],
                   'XYZ': [3000, 2500, 3000, 3000],
                   'VAL': [3000, 2500, 3000, 3000],
                   'A_CLASS': [3000, 2500, 3000, 3000],
                   'B_CAL': [3000, 4500, 3000, 1000],
                   'C_CLASS': [3000, 2500, 3000, 3000],
                   'A_CAL': [3000, 2500, 3000, 3000],
                   'B_CLASS': [3000, 4500, 3000, 500],
                   'C_CAL': [3000, 2500, 3000, 3000],
                   'ABC': [3000, 2500, 3000, 3000]})
df
   CATEGORY GROUP   XYZ   VAL  A_CLASS  B_CAL  C_CLASS  A_CAL  B_CLASS  C_CAL   ABC
0         1     A  3000  3000     3000   3000     3000   3000     3000   3000  3000
1         1     A  2500  2500     2500   4500     2500   2500     4500   2500  2500
2         2     B  3000  3000     3000   3000     3000   3000     3000   3000  3000
3         2     B  3000  3000     3000   1000     3000   3000      500   3000  3000
I want the columns in the following order in my final dataframe:
GROUP, CATEGORY, all columns with suffix "_CAL", all columns with suffix "_CLASS", all other fields
My expected output:
  GROUP  CATEGORY  B_CAL  A_CAL  C_CAL  A_CLASS  C_CLASS  B_CLASS   XYZ   VAL   ABC
0     A         1   3000   3000   3000     3000     3000     3000  3000  3000  3000
1     A         1   4500   2500   2500     2500     2500     4500  2500  2500  2500
2     B         2   3000   3000   3000     3000     3000     3000  3000  3000  3000
3     B         2   1000   3000   3000     3000     3000      500  3000  3000  3000

Fun with sorted. The boolean tuple key puts _CAL columns first (False sorts before True), then _CLASS, then everything else; since Index.difference returns an alphabetically sorted index and sorted is stable, each group stays alphabetical:
first = ['GROUP', 'CATEGORY']
cols = sorted(df.columns.difference(first),
              key=lambda x: (not x.endswith('_CAL'), not x.endswith('_CLASS')))
df[first + cols]
  GROUP  CATEGORY  A_CAL  B_CAL  C_CAL  A_CLASS  B_CLASS  C_CLASS   ABC   VAL   XYZ
0     A         1   3000   3000   3000     3000     3000     3000  3000  3000  3000
1     A         1   2500   4500   2500     2500     4500     2500  2500  2500  2500
2     B         2   3000   3000   3000     3000     3000     3000  3000  3000  3000
3     B         2   3000   1000   3000     3000      500     3000  3000  3000  3000
For more details, here's a similar question with a detailed explanation.

You just need to play with strings:
cols = df.columns
cols_sorted = ["GROUP", "CATEGORY"] + \
              [col for col in cols if col.endswith('_CAL')] + \
              [col for col in cols if col.endswith('_CLASS')]
cols_sorted += sorted([col for col in cols if col not in cols_sorted])
df = df[cols_sorted]
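A third option, not from the original answers: df.filter can collect each suffix group with a regex. A minimal sketch, assuming the _CAL/_CLASS suffixes are the only pattern that matters:
first = ['GROUP', 'CATEGORY']
cal = df.filter(regex=r'_CAL$').columns.tolist()    # all *_CAL columns
cls = df.filter(regex=r'_CLASS$').columns.tolist()  # all *_CLASS columns
rest = [c for c in df.columns if c not in first + cal + cls]
df = df[first + cal + cls + rest]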


How can I plot standard error bars with seaborn relplot?

I am studying different variables and I want to plot the results with the standard error.
I use the filter function because, depending on what I want to analyse, I am interested in plotting just one mineral, or just one material, etc. I mention this because it is important for the error bars. With seaborn it does not seem possible to plot error bars from precomputed values (I used the raw data and passed ci='' to the seaborn function, but it does not work), so I have calculated the mean and standard error in Excel and I plot that directly. The table is the result of the averages and standard errors that I use in the script.
If I add ci to the seaborn call, it does not do anything, so I want to add the error bars externally in a second step. But even with ax.errorbar() I can't plot the standard error.
import os
import io
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter

Results = pd.read_excel('results.xlsx', sheet_name='Sheet1', usecols="A:J")
df = pd.DataFrame(Results)
RR_filtered = Results[(Results['Mineral '] == 'IC60') | (Results['Mineral '] == 'MinFree')]
R_filtered = RR_filtered[(Results['Material'] == 'A')]

palette = ["#fdae61", "#abd9e9"]
sns.set_palette(palette)

ax1 = sns.relplot(data=R_filtered, x="Impeller speed (rpm)", y="Result", col="Media size ",
                  hue="Mineral content (g/g fibre)", palette=palette,
                  size="Media size ", sizes=(50, 200))

R2_filtered = RR_filtered[(Results['Material'] == 'B')]
ax2 = sns.relplot(data=R2_filtered, x="Impeller speed (rpm)", y="Result", col="Media size ",
                  hue="Mineral content (g/g fibre)", palette=palette,
                  size="Media size ", sizes=(50, 200))
plt.show()
Data (posted as an image in the original question):
Media size Material Impeller speed (rpm) Energy input (kWh/t) Mineral Mineral content (g/g fibre) Result ster
1.7 A 400 3000 IC60 4 3.42980002276166 0.21806853183829
1.7 A 650 3000 IC60 4 5.6349292302978 0.63877270588513
1.7 A 900 3000 IC60 4 6.1386616444364 0.150420705145224
1.7 A 1150 3000 IC60 4 5.02677117937851 1.05459146256349
1.7 A 1400 3000 IC60 4 3.0654271029038 0.917937247698497
3 A 400 3000 IC60 4 8.06973541574516 2.07869756201064
3 A 650 3000 IC60 4 4.69110601906018 1.21725878149246
3 A 900 3000 IC60 4 10.2119514553564 1.80680816945106
3 A 1150 3000 IC60 4 7.3271067522139 0.438931805677489
3 A 1400 3000 IC60 4 4.86901883487513 2.04826541508181
1.7 A 400 3000 MinFree 0 1.30614274245145 0.341512517371074
1.7 A 650 3000 MinFree 0 0.80632268273782 0.311762840996982
1.7 A 900 3000 MinFree 0 1.35958635068886 0.360649049944933
1.7 A 1150 3000 MinFree 0 1.38784671261469 0.00524838126778526
1.7 A 1400 3000 MinFree 0 1.12365621425779 0.561737044169193
3 A 400 3000 MinFree 0 4.61104587078813 0.147526557483362
3 A 650 3000 MinFree 0 4.40934493149759 0.985706944001226
3 A 900 3000 MinFree 0 5.06333415444978 0.00165055503033251
3 A 1150 3000 MinFree 0 3.85940865344646 0.731238210429852
3 A 1400 3000 MinFree 0 3.75572328102963 0.275897272330075
3 A 400 3000 GIC 4 6.05239906571977 0.0646300937591957
3 A 650 3000 GIC 4 7.9023202316634 0.458062146361444
3 A 900 3000 GIC 4 6.97774277141699 0.171777036954104
3 A 1150 3000 GIC 4 11.0705742735252 1.3960974547215
3 A 1400 3000 GIC 4 9.37948091546579 0.0650589433632627
1.7 A 869 3000 IC60 4 2.39416757908564 0.394947207603093
3 A 859 3000 IC60 4 10.2373958352881 1.55162686552938
1.7 A 885 3000 BHX 4 87.7569689333017 10.2502550323564
3 A 918 3000 BHX 4 104.135074642339 4.77467275433362
1.7 B 400 3000 MinFree 0 1.87573877068556 0.34648345153664
1.7 B 650 3000 MinFree 0 1.99555403904079 0.482200923313764
1.7 B 900 3000 MinFree 0 2.54989484285768 0.398071770532481
1.7 B 1150 3000 MinFree 0 3.67636872311402 0.662270521850053
1.7 B 1400 3000 MinFree 0 3.5664978541551 0.164453275639932
3 B 400 3000 MinFree 0 2.62948341485392 0.0209463845730038
3 B 650 3000 MinFree 0 3.0066638279753 0.305024483713006
3 B 900 3000 MinFree 0 2.79255446831386 0.472851866083359
3 B 1150 3000 MinFree 0 5.64970870330824 0.251859240942665
3 B 1400 3000 MinFree 0 7.40595580787647 0.629256778750272
1.7 B 400 3000 IC60 4 0.38040036521839 0.231869270120922
1.7 B 650 3000 IC60 4 0.515922221163329 0.434661621954815
1.7 B 900 3000 IC60 4 3.06358032815653 0.959408177590503
1.7 B 1150 3000 IC60 4 4.04800689693192 0.255594912271896
1.7 B 1400 3000 IC60 4 3.69967975589305 0.469944383688801
3 B 400 3000 IC60 4 1.35706340378197 0.134829945730943
3 B 650 3000 IC60 4 1.91317966458018 1.77106692180411
3 B 900 3000 IC60 4 0.874227487043329 0.493348110823194
3 B 1150 3000 IC60 4 2.71732337235447 0.0703901684702626
3 B 1400 3000 IC60 4 4.96743231003956 0.45853815499614
3 B 400 3000 GIC 4 0.325743752029247 0.325743752029247
3 B 650 3000 GIC 4 3.12776074994155 0.452049425276085
3 B 900 3000 GIC 4 3.25564762321322 0.319567445434468
3 B 1150 3000 GIC 4 5.99730462724499 1.03439035936441
3 B 1400 3000 GIC 4 7.51312624370307 0.38399627585515
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
Sample DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Media size': [1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.7, 3.0, 1.7, 3.0, 1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
'Material': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'Impeller speed (rpm)': [400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 869, 859, 885, 918, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400],
'Energy input (kWh/t)': [3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000],
'Mineral': ['IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'GIC', 'GIC', 'GIC', 'GIC', 'GIC', 'IC60', 'IC60', 'BHX', 'BHX', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'GIC', 'GIC', 'GIC', 'GIC', 'GIC'],
'Mineral content (g/g fibre)': [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4], 'Result': [3.42980002276166, 5.6349292302978, 6.1386616444364, 5.02677117937851, 3.0654271029038, 8.06973541574516, 4.69110601906018, 10.2119514553564, 7.3271067522139, 4.86901883487513, 1.30614274245145, 0.80632268273782, 1.35958635068886, 1.38784671261469, 1.12365621425779, 4.61104587078813, 4.40934493149759, 5.06333415444978, 3.85940865344646, 3.75572328102963, 6.05239906571977, 7.9023202316634, 6.97774277141699, 11.0705742735252, 9.37948091546579, 2.39416757908564, 10.2373958352881, 87.7569689333017, 104.135074642339, 1.87573877068556, 1.99555403904079, 2.54989484285768, 3.67636872311402, 3.5664978541551, 2.62948341485392, 3.0066638279753, 2.79255446831386, 5.64970870330824, 7.40595580787647, 0.38040036521839, 0.515922221163329, 3.06358032815653, 4.04800689693192, 3.69967975589305, 1.35706340378197, 1.91317966458018, 0.874227487043329, 2.71732337235447, 4.96743231003956, 0.325743752029247, 3.12776074994155, 3.25564762321322, 5.99730462724499, 7.51312624370307],
'ster': [0.21806853183829, 0.63877270588513, 0.150420705145224, 1.05459146256349, 0.917937247698497, 2.07869756201064, 1.21725878149246, 1.80680816945106, 0.438931805677489, 2.04826541508181, 0.341512517371074, 0.311762840996982, 0.360649049944933, 0.0052483812677852, 0.561737044169193, 0.147526557483362, 0.985706944001226, 0.0016505550303325, 0.731238210429852, 0.275897272330075, 0.0646300937591957, 0.458062146361444, 0.171777036954104, 1.3960974547215, 0.0650589433632627, 0.394947207603093, 1.55162686552938, 10.2502550323564, 4.77467275433362, 0.34648345153664, 0.482200923313764, 0.398071770532481, 0.662270521850053, 0.164453275639932, 0.0209463845730038, 0.305024483713006, 0.472851866083359, 0.251859240942665, 0.629256778750272, 0.231869270120922, 0.434661621954815, 0.959408177590503, 0.255594912271896, 0.469944383688801, 0.134829945730943, 1.77106692180411, 0.493348110823194, 0.0703901684702626, 0.45853815499614, 0.325743752029247, 0.452049425276085, 0.319567445434468, 1.03439035936441, 0.38399627585515]}
df = pd.DataFrame(data)
Map plt.errorbar onto sns.relplot
# filter the dataframe by Mineral
filtered = df[(df['Mineral'] == 'IC60') | (df['Mineral'] == 'MinFree')]

# plot the filtered dataframe
g = sns.relplot(data=filtered, x="Impeller speed (rpm)", y="Result", col="Media size",
                row='Material', hue="Mineral content (g/g fibre)",
                size="Media size", sizes=(50, 200))

# add the errorbars: FacetGrid.map forwards the named columns of each facet's
# data subset to plt.errorbar as x, y and yerr
g.map(plt.errorbar, "Impeller speed (rpm)", "Result", "ster",
      marker="none", color='r', ls='none')
Specify color for each group of errorbars
plt.errorbar only accepts one value for color. In order to match the colors to a palette, the data for each facet needs to be selected, and the proper color for that group passed to the color parameter.
Note that error bars smaller than the scatter markers can't be seen.
# create a palette dictionary for the unique values in the hue column
palette = dict(zip(filtered['Mineral content (g/g fibre)'].unique(), ["#fdae61", "#abd9e9"]))

# plot the filtered dataframe
g = sns.relplot(data=filtered, x="Impeller speed (rpm)", y="Result", col="Media size",
                row='Material', hue="Mineral content (g/g fibre)",
                size="Media size", sizes=(50, 200), palette=palette)

# iterate through each facet of the facetgrid
for (material, media), ax in g.axes_dict.items():
    # select the data for the facet
    data = filtered[filtered['Material'].eq(material) & filtered['Media size'].eq(media)]
    # select the data for each hue group
    for group, selected in data.groupby('Mineral content (g/g fibre)'):
        # plot the errorbar with the correct color for each group
        ax.errorbar(data=selected, x="Impeller speed (rpm)", y="Result", yerr="ster",
                    marker="none", color=palette[group], ls='none')

How To Categorize a List

Having this list:
list_price = ['1800', '5060', '6300', '6800', '10800', '3000', '7100']
how do I categorize the list into the buckets (1000, 2000, 3000, 4000, 5000, 6000, 7000, 000)?
example:
2000: 1800
7000:6800, 6300
And count them: 2000 (1), 7000 (2), if possible using pandas as an example.
Using rounding to the upper thousand:
list_price = ['1800', '5060', '6300', '6800', '10800', '3000', '7100']

out = (pd.Series(list_price).astype(int)
         .sub(1).floordiv(1000)
         .add(1).mul(1000)
         .value_counts()
      )
output:
7000     2
2000     1
6000     1
11000    1
3000     1
8000     1
dtype: int64
Intermediate without value_counts:
0     2000
1     6000
2     7000
3     7000
4    11000
5     3000
6     8000
dtype: int64
I assumed 000 at the end of the categories is 10000. pd.cut needs numeric input, so convert the strings first. Try:
s = pd.Series(list_price).astype(int)
cut = pd.cut(s, bins=(1000, 2000, 3000, 4000, 5000, 6000, 7000, 10000))
s.groupby(cut).count()
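Which gives the counts per bin; note that 10800 falls outside the last bin and is dropped:
(1000, 2000]     1
(2000, 3000]     1
(3000, 4000]     0
(4000, 5000]     0
(5000, 6000]     1
(6000, 7000]     2
(7000, 10000]    1
dtype: int64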

Appending to lists in a dataframe via a for loop in Python, why is it being appended to each row?

I have a dataframe with area and price columns and have created a new column of empty lists called compList.
I am using a for loop to populate the compList for each row with the prices of any other houses with the same area value.
The result I am looking for is for data['compList'] to be [] for all rows except the first and last, which both have an area of 1500; their compList values should each have one value, 31000 and 30000 respectively. Instead I am getting [30000, 31000] for every compList value.
What is wrong with my code? I've been racking my brain for 2 hours trying to figure this out. Your help would be greatly appreciated.
import pandas as pd
import numpy as np
import collections
reqArea = 1200
area = [1500, 500, 1000, 2000, 2500, 1500]
price = [30000, 10000, 20000, 40000, 50000, 31000]
data = pd.DataFrame(list(zip(area,price)), columns = ['area','price'])
data['compList'] = [[]]*len(data['area'])
At this stage my dataframe looks like this:
area price compList
0 1500 30000 []
1 500 10000 []
2 1000 20000 []
3 2000 40000 []
4 2500 50000 []
5 1500 31000 []
Then I process it.
for i in range(len(data['area'])):
    sameArea = []
    sameArea = np.where(data['area'] == data['area'][i])[0]
    if len(sameArea) > 1:
        for j in range(len(sameArea)):
            if sameArea[j] != i:
                data['compList'][i].append(data['price'][sameArea[j]])
    else:
        pass
At the end my dataframe looks like this:
area price compList
0 1500 30000 [31000, 30000]
1 500 10000 [31000, 30000]
2 1000 20000 [31000, 30000]
3 2000 40000 [31000, 30000]
4 2500 50000 [31000, 30000]
5 1500 31000 [31000, 30000]
[[]]*n creates n references to the same list object. When you call data['compList'][i].append(data['price'][sameArea[j]]), you are appending to that one shared list, so every element of your compList column (they are all the same object) appears to change. Try this:
reqArea = 1200
area = [1500, 500, 1000, 2000, 2500, 1500]
price = [30000, 10000, 20000, 40000, 50000, 31000]
data = pd.DataFrame(list(zip(area,price)), columns = ['area','price'])
data['compList'] = np.empty((len(data), 0)).tolist()
Output using the rest of your code is:
area price compList
0 1500 30000 [31000]
1 500 10000 []
2 1000 20000 []
3 2000 40000 []
4 2500 50000 []
5 1500 31000 [30000]
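As a side note (my addition, not part of the original answer), a plain list comprehension avoids the shared-reference trap just as well, and the whole loop can be replaced by a single comprehension over the rows:
# each iteration creates a brand-new list, so no rows share state
data['compList'] = [[] for _ in range(len(data))]

# or skip the pre-allocation and the for-loop entirely:
# for every row, collect the prices of all *other* rows with the same area
data['compList'] = [
    data.loc[(data['area'] == a) & (data.index != i), 'price'].tolist()
    for i, a in data['area'].items()
]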

Python - Running Average If number is greater than 0

I have a column in my dataframe comprised of numbers. I'd like another column in the dataframe that takes a running average of the values greater than 0, ideally done in numpy without iteration (the data is huge).
Vals Output
-350
1000 1000
1300 1150
1600 1300
1100 1250
1000 1200
450 1075
1900 1192.857143
-2000 1192.857143
-3150 1192.857143
1000 1168.75
-900 1168.75
800 1127.777778
8550 1870
Code:
import pandas as pd

vals = [-350, 1000, 1300, 1600, 1100, 1000, 450,
        1900, -2000, -3150, 1000, -900, 800, 8550]
df = pd.DataFrame({'Vals': vals})
Option 1
expanding and mean
df.assign(Out=df.loc[df.Vals.gt(0)].Vals.expanding().mean()).ffill()
If you have other columns in your DataFrame that have NaN values, this method will ffill those too, so if that is a concern, you may want to consider using something like this:
df['Out'] = df.loc[df.Vals.gt(0)].Vals.expanding().mean()
df['Out'] = df.Out.ffill()
Which will only fill in the Out column.
Option 2
mask:
df.assign(Out=df.mask(df.Vals.lt(0)).Vals.expanding().mean())
Both of these result in:
Vals Out
0 -350 NaN
1 1000 1000.000000
2 1300 1150.000000
3 1600 1300.000000
4 1100 1250.000000
5 1000 1200.000000
6 450 1075.000000
7 1900 1192.857143
8 -2000 1192.857143
9 -3150 1192.857143
10 1000 1168.750000
11 -900 1168.750000
12 800 1127.777778
13 8550 1870.000000
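Since the question asks for a NumPy route without iteration, here is a minimal sketch of the same running mean built from cumulative sums (my construction, not part of the answer above):
import numpy as np

v = df['Vals'].to_numpy(dtype=float)
pos = v > 0                              # values that enter the average
csum = np.cumsum(np.where(pos, v, 0.0))  # running sum of positive values
cnt = np.cumsum(pos)                     # running count of positive values
df['Out'] = np.where(cnt > 0, csum / np.maximum(cnt, 1), np.nan)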

Python summarize row into column (Pandas pivot table)

I have a list of persons with their respective earnings by company, like this:
Company_code Person Date Earning1 Earning2
1 Jonh 2014-01 100 200
2 Jonh 2014-01 300 400
1 Jonh 2014-02 500 600
1 Peter 2014-01 300 400
1 Peter 2014-02 500 600
And I would like to summarize it into this:
Company_code Person 2014-01_E1 2014-01_E2 2014-02_E1 2014-02_E2
1            Jonh          100        200        500        600
2            Jonh          300        400
1            Peter         300        400        500        600
I had the same problem doing this with SQL, which I solved with this code:
with t(Company_code, Person, Dt, Earning1, Earning2) as (
  select 1, 'Jonh',  to_date('2014-01-01', 'YYYY-MM-DD'), 100, 200 from dual union all
  select 2, 'Jonh',  to_date('2014-01-01', 'YYYY-MM-DD'), 300, 400 from dual union all
  select 1, 'Jonh',  to_date('2014-02-01', 'YYYY-MM-DD'), 500, 600 from dual union all
  select 1, 'Peter', to_date('2014-01-01', 'YYYY-MM-DD'), 300, 400 from dual union all
  select 1, 'Peter', to_date('2014-02-01', 'YYYY-MM-DD'), 500, 600 from dual
)
select *
from t
pivot (
    sum(Earning1) e1
  , sum(Earning2) e2
  for dt in (
      to_date('2014-01-01', 'YYYY-MM-DD') "2014-01"
    , to_date('2014-02-01', 'YYYY-MM-DD') "2014-02"
  )
)
COMPANY_CODE PERSON 2014-01_E1 2014-01_E2 2014-02_E1 2014-02_E2
----------------------------------------------------------------------
2 Jonh 300 400 - -
1 Peter 300 400 500 600
1 Jonh 100 200 500 600
How can this be achieved in python? I'm trying with Pandas pivot_table:
pd.pivot_table(df, columns=['COMPANY_CODE', 'PERSON', 'DATE'], aggfunc=np.sum)
but this just transposes the table ... any clues?
Using user1827356's suggestion (the old rows=/cols= keywords from early pandas are index=/columns= today):
df2 = pd.pivot_table(df, index=['Company_code', 'Person'], columns=['Date'], aggfunc='sum')
print(df2)
# Earning1 Earning2
# Date 2014-01 2014-02 2014-01 2014-02
# Company_code Person
# 1 Jonh 100 500 200 600
# Peter 300 500 400 600
# 2 Jonh 300 NaN 400 NaN
You can flatten the hierarchical columns like this:
columns = ['{}_E{}'.format(date, earning.replace('Earning', ''))
           for earning, date in df2.columns.tolist()]
df2.columns = columns
print(df2)
# 2014-01_E1 2014-02_E1 2014-01_E2 2014-02_E2
# Company_code Person
# 1 Jonh 100 500 200 600
# Peter 300 500 400 600
# 2 Jonh 300 NaN 400 NaN
Here's the nicest way to do it, using unstack.
df = pd.DataFrame({
'company_code': [1, 2, 1, 1, 1],
'person': ['Jonh', 'Jonh', 'Jonh', 'Peter', 'Peter'],
'earning2': [200, 400, 600, 400, 600],
'earning1': [100, 300, 500, 300, 500],
'date': ['2014-01', '2014-01', '2014-02', '2014-01', '2014-02']
})
df = df.set_index(['date', 'company_code', 'person'])
df.unstack('date')
Resulting in:
earning1 earning2
date 2014-01 2014-02 2014-01 2014-02
company_code person
1 Jonh 100.0 500.0 200.0 600.0
1 Peter 300.0 500.0 400.0 600.0
2 Jonh 300.0 NaN 400.0 NaN
Setting the index to ['date', 'company_code', 'person'] is a good idea anyway, since that's really what your DataFrame contains: two different earnings categories (1 and 2) each described by a date, a company code and a person.
It's good practice to always work out what the 'real' data in your DataFrame is, and which columns are meta-data, and index accordingly.
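If you also want the flat 2014-01_E1-style column names from the unstack result, the renaming trick from the first answer carries over; a short sketch (the variable name flat is mine):
flat = df.unstack('date')
flat.columns = ['{}_E{}'.format(date, earning.replace('earning', ''))
                for earning, date in flat.columns]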
