Combining one column's values into another - python

I have the following dataframe:
import pandas as pd
array = {'test_ID': [10, 13, 10, 13, 16],
'test_date': ['2010-09-05', '2010-10-23', '2011-09-12', '2010-05-05', '2010-06-01'],
'Value1': [40, 56, 23, 78, 67],
'Value2': [25, 0, 68, 0, 0]}
df = pd.DataFrame(array)
df
test_ID test_date Value1 Value2
0 10 2010-09-05 40 25
1 13 2010-10-23 56 0
2 10 2011-09-12 23 68
3 13 2010-05-05 78 0
4 16 2010-06-01 67 0
I would like to delete column 'Value2' and fold its values into column 'Value1' as extra rows, but only when Value2 != 0 (those new rows get test_ID = 99, as in the expected output below).
The expected output is:
test_ID test_date Value1
0 10 2010-09-05 40
1 99 2010-09-05 25
2 13 2010-10-23 56
3 10 2011-09-12 23
4 99 2011-09-12 68
5 13 2010-05-05 78
6 16 2010-06-01 67

Use DataFrame.set_index with DataFrame.stack to reshape, remove the 0 values, convert back to a 3-column DataFrame with Series.reset_index, and set test_ID to 99 for the rows that came from Value2:
s = df.set_index(['test_ID','test_date']).stack()
df = s[s.ne(0)].reset_index(name='Value1')
df['test_ID'] = df['test_ID'].mask(df.pop('level_2').eq('Value2'), 99)
print (df)
test_ID test_date Value1
0 10 2010-09-05 40
1 99 2010-09-05 25
2 13 2010-10-23 56
3 10 2011-09-12 23
4 99 2011-09-12 68
5 13 2010-05-05 78
6 16 2010-06-01 67
Another solution uses DataFrame.melt and removes the 0 rows with DataFrame.loc:
df = (df.melt(['test_ID','test_date'], value_name='Value1', ignore_index=False)
.assign(test_ID = lambda x: x['test_ID'].mask(x.pop('variable').eq('Value2'), 99))
.sort_index()
.loc[lambda x: x['Value1'].ne(0)]
.reset_index(drop=True))
print (df)
test_ID test_date Value1
0 10 2010-09-05 40
1 99 2010-09-05 25
2 13 2010-10-23 56
3 10 2011-09-12 23
4 99 2011-09-12 68
5 13 2010-05-05 78
6 16 2010-06-01 67

Here is a simple solution that filters on non-zero values.
df = pd.DataFrame(array)
# keep only the rows where Value2 is non-zero, as an independent copy
filtered_rows = df.loc[df["Value2"] != 0].copy()
# turn those rows into the extra rows: Value1 takes the Value2 value, test_ID becomes 99
filtered_rows.loc[:, 'Value1'] = filtered_rows.loc[:, 'Value2']
filtered_rows.loc[:, 'test_ID'] = 99
# append them next to their originals and drop Value2
df = pd.concat([df, filtered_rows]).sort_index().drop(['Value2'], axis=1)
This gives us the expected data (the index keeps the original row labels; see the note after the output):
test_ID test_date Value1
0 10 2010-09-05 40
0 99 2010-09-05 25
1 13 2010-10-23 56
2 10 2011-09-12 23
2 99 2011-09-12 68
3 13 2010-05-05 78
4 16 2010-06-01 67
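A small follow-up to the snippet above: concat keeps the original row labels (0, 0, 1, 2, 2, ...), so if you want the clean 0..6 index from the expected output, reset it at the end.
# Optional: renumber the rows so the index matches the expected output
df = df.reset_index(drop=True)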

Related

How do you lookup in range

I have 2 data frames and I would like to return the values in a range (-1, 0, +1). One of the data frames contains the Ids that I would like to look up and the other data frame contains the Ids and values. For example, I want to look up 99, 55, 117 in the other data frame and return 100 99 98, 56 55 54, 118 117 116; that is, the values at -1 and +1 around each Id I look up. There is a better example below.
df = pd.DataFrame([[99],[55],[117]],columns = ['Id'])
df2 = pd.DataFrame([[100,1,2,4,5,6,8],
[87,1,6,20,22,23,34],
[99,1,12,13,34,45,46],
[64,1,10,14,29,32,33],
[55,1,22,13,23,33,35],
[66,1,6,7,8,9,10],
[77,1,2,3,5,6,8],
[811,1,2,5,6,8,10],
[118,1,7,8,22,44,56],
[117,1,66,44,47,87,91]],
columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])
I would like my result to be something like this below.
results = pd.DataFrame([[87,1,6,20,22,23,34],
[99,1,12,13,34,45,46],
[64,1,10,14,29,32,33],
[64,1,10,14,29,32,33],
[55,1,22,13,23,33,35],
[66,1,6,7,8,9,10],
[118,1,7,8,22,44,56],
[117,1,66,44,47,87,91]],
columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])
import pandas as pd
import numpy as np
# indexes of the rows in df2 whose Id appears in df
ind = df2[df2['Id'].isin(df['Id'])].index
# for every match take the previous, the matching and the next index, then flatten
aaa = np.array([[ind[i]-1,ind[i],ind[i]+1] for i in range(len(ind))]).ravel()
# drop indexes that fall outside the bounds of df2
aaa = aaa[(aaa <= df2.index.values[-1]) & (aaa >= 0)]
df_test = df2.loc[aaa, :].reset_index().drop(['index'], axis=1)
print(df_test)
Output
Id Num1 Num2 Num3 Num4 Num5 Num6
0 87 1 6 20 22 23 34
1 99 1 12 13 34 45 46
2 64 1 10 14 29 32 33
3 64 1 10 14 29 32 33
4 55 1 22 13 23 33 35
5 66 1 6 7 8 9 10
6 118 1 7 8 22 44 56
7 117 1 66 44 47 87 91
Here, ind holds the indexes of the rows in df2 that contain the required Ids.
aaa builds the [index-1, index, index+1] ranges for those indexes; the lists are wrapped in np.array and ravel() concatenates them. aaa is then overwritten, removing the elements that fall below 0 or above the maximum index of df2.
The rows are finally selected with loc.
Update 17.12.2022
If you need duplicate rows:
df = pd.DataFrame([[99], [55], [117], [117]], columns=['Id'])
lim_ind = df2.index[-1]
def my_func(i):
    # previous, matching and next index for every row of df2 whose Id equals i
    a = df2[df2['Id'].isin([i])].index.values
    a = np.array([a - 1, a, a + 1]).ravel()
    a = a[(a >= 0) & (a <= lim_ind)]
    return a
qqq = [my_func(i) for i in df['Id']]
fff = np.array([df2.loc[qqq[i]].values for i in range(len(qqq))], dtype=object)
fff = np.vstack(fff)
result = pd.DataFrame(fff, columns=df2.columns)
print(result)
Output
Id Num1 Num2 Num3 Num4 Num5 Num6
0 87 1 6 20 22 23 34
1 99 1 12 13 34 45 46
2 64 1 10 14 29 32 33
3 64 1 10 14 29 32 33
4 55 1 22 13 23 33 35
5 66 1 6 7 8 9 10
6 118 1 7 8 22 44 56
7 117 1 66 44 47 87 91
8 118 1 7 8 22 44 56
9 117 1 66 44 47 87 91

Data frame segmentation and dropping

I have the following DataFrame in pandas:
A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90],
B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
I want to create a new column C that takes values from column A based on a condition on column B. If there is no ''txt'' between two consecutive ''BW'' values, those values go into column C. But if there is ''txt'' between two consecutive ''BW'', I want to drop all those values. So the expected output should look like:
A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90],
B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
C = [1,10,23, BW, 24,24,55, BW, nan, nan, nan, nan, nan, nan, BW, 43,BW]
I have no clue how to do it. Any help is much appreciated.
EDIT:
Updated answer which was missing the values of BW in the final df.
import pandas as pd
import numpy as np
BW = 999
txt = -999
A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90]
B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
df = pd.DataFrame({'A': A, 'B': B})
# label each run of rows that sits between consecutive BW markers with a group number
df = df.assign(group = (df[~df['B'].between(BW,BW)].index.to_series().diff() > 1).cumsum())
# blank out A for the group that contains txt, keep A everywhere else
df['C'] = np.where(df.group == df[df.B == txt].group.values[0], np.nan, df.A)
# put the BW marker rows back into C
df['C'] = np.where(df['B'] == BW, df['B'], df['C'])
df['C'] = df['C'].astype('Int64')
df = df.drop('group', axis=1)
In [435]: df
Out[435]:
A B C
0 1 24 1
1 10 23 10
2 23 29 23
3 45 999 999 <-- BW
4 24 49 24
5 24 59 24
6 55 72 55
7 67 999 999 <-- BW
8 73 9 <NA>
9 26 183 <NA>
10 13 17 <NA>
11 96 -999 <NA> <-- txt is in the middle of BW
12 53 2 <NA>
13 23 49 <NA>
14 24 999 999 <-- BW
15 43 479 43
16 90 999 999 <-- BW
You can achieve it like so. Assuming BW and txt are specific values, I just filled them with some arbitrary numbers to differentiate them.
In [277]: BW = 999
In [278]: txt = -999
In [293]: A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90]
...: B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
In [300]: df = pd.DataFrame({'A': A, 'B': B})
In [301]: df
Out[301]:
A B
0 1 24
1 10 23
2 23 29
3 45 999
4 24 49
5 24 59
6 55 72
7 67 999
8 73 9
9 26 183
10 13 17
11 96 -999
12 53 2
13 23 49
14 24 999
15 43 479
16 90 999
First let's split the values into groups: each group contains the values of B that fall between one BW value and the next BW.
In [321]: df = df.assign(group = (df[~df['B'].between(BW,BW)].index.to_series().diff() > 1).cumsum())
In [322]: df
Out[322]:
A B group
0 1 24 0.00000000
1 10 23 0.00000000
2 23 29 0.00000000
3 45 999 NaN
4 24 49 1.00000000
5 24 59 1.00000000
6 55 72 1.00000000
7 67 999 NaN
8 73 9 2.00000000
9 26 183 2.00000000
10 13 17 2.00000000
11 96 -999 2.00000000
12 53 2 2.00000000
13 23 49 2.00000000
14 24 999 NaN
15 43 479 3.00000000
16 90 999 NaN
Next with the use of np.where() we can replace the values depending on the condition that you set.
In [360]: df['C'] = np.where(df.group == df[df.B == txt].group.values[0], np.nan, df.B)
In [432]: df
Out[432]:
A B group C
0 1 24 0.00000000 24.00000000
1 10 23 0.00000000 23.00000000
2 23 29 0.00000000 29.00000000
3 45 999 NaN 999.00000000
4 24 49 1.00000000 49.00000000
5 24 59 1.00000000 59.00000000
6 55 72 1.00000000 72.00000000
7 67 999 NaN 999.00000000
8 73 9 2.00000000 NaN
9 26 183 2.00000000 NaN
10 13 17 2.00000000 NaN
11 96 -999 2.00000000 NaN
12 53 2 2.00000000 NaN
13 23 49 2.00000000 NaN
14 24 999 NaN 999.00000000
15 43 479 3.00000000 479.00000000
16 90 999 NaN 999.00000000
Here we need to set C back to the values of B wherever B is equal to BW.
In [488]: df['C'] = np.where(df['B'] == BW, df['B'], df['C'])
In [489]: df
Out[489]:
A B group C
0 1 24 0.00000000 24.00000000
1 10 23 0.00000000 23.00000000
2 23 29 0.00000000 29.00000000
3 45 999 NaN 999.00000000
4 24 49 1.00000000 49.00000000
5 24 59 1.00000000 59.00000000
6 55 72 1.00000000 72.00000000
7 67 999 NaN 999.00000000
8 73 9 2.00000000 NaN
9 26 183 2.00000000 NaN
10 13 17 2.00000000 NaN
11 96 -999 2.00000000 NaN
12 53 2 2.00000000 NaN
13 23 49 2.00000000 NaN
14 24 999 NaN 999.00000000
15 43 479 3.00000000 479.00000000
16 90 999 NaN 999.00000000
Lastly, convert the float column to int and drop the group column, which we do not need anymore. If you want to keep the NaN values as np.nan, skip the conversion to Int64.
In [396]: df.C = df.C.astype('Int64')
In [397]: df
Out[397]:
A B group C
0 1 24 0.00000000 24
1 10 23 0.00000000 23
2 23 29 0.00000000 29
3 45 999 NaN 999
4 24 49 1.00000000 49
5 24 59 1.00000000 59
6 55 72 1.00000000 72
7 67 999 NaN 999
8 73 9 2.00000000 <NA>
9 26 183 2.00000000 <NA>
10 13 17 2.00000000 <NA>
11 96 -999 2.00000000 <NA>
12 53 2 2.00000000 <NA>
13 23 49 2.00000000 <NA>
14 24 999 NaN 999
15 43 479 3.00000000 479
16 90 999 NaN 999
In [398]: df = df.drop('group', axis=1)
In [435]: df
Out[435]:
A B C
0 1 24 24
1 10 23 23
2 23 29 29
3 45 999 999
4 24 49 49
5 24 59 59
6 55 72 72
7 67 999 999
8 73 9 <NA>
9 26 183 <NA>
10 13 17 <NA>
11 96 -999 <NA>
12 53 2 <NA>
13 23 49 <NA>
14 24 999 999
15 43 479 479
16 90 999 999
I don't know if this is the most efficient way to do it, but you can create a new column called mask by mapping the values in column B as follows: 'BW' to True, 'txt' to False and all other values to np.nan.
Then forward fill the NaN in mask, backward fill the NaN in mask, and logically combine the results (True whenever either the forward-filled or the backward-filled column is False). This creates a column called final_mask in which all of the values between consecutive BW values that contain a txt are set to True.
You can then use .apply to select the value of column A only when the final_mask is False and column B isn't 'BW', select column B if final_mask is False and column B is 'BW', and np.nan otherwise.
import numpy as np
import pandas as pd
A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90]
B = [24,23,29, 'BW',49,59,72, 'BW',9,183,17, 'txt',2,49,'BW',479,'BW']
df = pd.DataFrame({'A':A,'B':B})
df["mask"] = df["B"].apply(lambda x: True if x == 'BW' else False if x == 'txt' else np.nan)
df["ffill"] = df["mask"].fillna(method="ffill")
df["bfill"] = df["mask"].fillna(method="bfill")
df["final_mask"] = (df["ffill"] == False) | (df["bfill"] == False)
df["C"] = df.apply(lambda x: x['A'] if (
(x['final_mask'] == False) & (x['B'] != 'BW'))
else x['B'] if ((x['final_mask'] == False) & (x['B'] == 'BW'))
else np.nan, axis=1
)
>>> df
A B mask ffill bfill final_mask C
0 1 24 NaN NaN True False 1
1 10 23 NaN NaN True False 10
2 23 29 NaN NaN True False 23
3 45 BW True True True False BW
4 24 49 NaN True True False 24
5 24 59 NaN True True False 24
6 55 72 NaN True True False 55
7 67 BW True True True False BW
8 73 9 NaN True False True NaN
9 26 183 NaN True False True NaN
10 13 17 NaN True False True NaN
11 96 txt False False False True NaN
12 53 2 NaN False True True NaN
13 23 49 NaN False True True NaN
14 24 BW True True True False BW
15 43 479 NaN True True False 43
16 90 BW True True True False BW
Dropping the columns we created along the way:
df.drop(columns=['mask','ffill','bfill','final_mask'])
A B C
0 1 24 1
1 10 23 10
2 23 29 23
3 45 BW BW
4 24 49 24
5 24 59 24
6 55 72 55
7 67 BW BW
8 73 9 NaN
9 26 183 NaN
10 13 17 NaN
11 96 txt NaN
12 53 2 NaN
13 23 49 NaN
14 24 BW BW
15 43 479 43
16 90 BW BW
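Not from the answers above, but a compact alternative sketch of the same idea: a cumulative count of the BW markers builds the groups, and transform('any') flags the groups that contain txt (BW and txt are again assumed to be the numeric stand-ins used earlier).
import numpy as np
import pandas as pd

BW, txt = 999, -999
A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90]
B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
df = pd.DataFrame({'A': A, 'B': B})

grp = df['B'].eq(BW).cumsum()                         # group id increases after every BW marker
bad = df['B'].eq(txt).groupby(grp).transform('any')   # True for groups that contain txt
df['C'] = np.where(df['B'].eq(BW), df['B'], np.where(bad, np.nan, df['A']))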

Add/Update/Merge original DataFrame into a grouped DataFrame

How can I merge, update, join, concat, or filter the original DF correctly so that I can have the complete 78 columns?
I have a DataFrame with 22 rows and 78 columns. An internet-friendly version of the file can be found here. This a sample:
item_no code group gross_weight net_weight value ... ... +70 columns more
1 7417.85.24.25 0 18 17 13018.74
2 1414.19.00.62 1 35 33 0.11
3 7815.80.99.96 0 49 48 1.86
4 1414.19.00.62 1 30 27 2.7
5 5867.21.36.92 1 31 24 94
6 9227.71.84.12 1 24 17 56.4
7 1414.19.00.62 0 42 35 0.56
8 4465.58.84.31 0 50 42 0.94
9 1596.09.32.64 1 20 13 0.75
10 2194.64.27.41 1 38 33 1.13
11 1596.09.32.64 1 53 46 1.9
12 1596.09.32.64 1 18 15 10.44
13 1596.09.32.64 1 35 33 15.36
14 4835.09.81.44 1 55 47 10.44
15 5698.44.72.13 1 51 49 15.36
16 5698.44.72.13 1 49 45 2.15
17 5698.44.72.13 0 41 33 16
18 3815.79.80.69 1 25 21 4
19 3815.79.80.69 1 35 30 2.4
20 4853.40.53.94 1 53 46 3.12
21 4853.40.53.94 1 50 47 3.98
22 4853.40.53.94 1 16 13 6.53
The group column tells me that I should group all equal values in the code column and sum the values in the columns 'gross_weight', 'net_weight', 'value', and 'item_quantity'. I also have to modify 2 other columns as shown below:
#Group DF
grouped_df = df.groupby(['group', 'code'], as_index=False).agg({'item_quantity':'sum', 'gross_weight':'sum','net_weight':'sum', 'value':'sum'}).copy()
#Total items should be equal to the length of the DF
grouped_df['total_items'] = len(grouped_df)
#Item No.
grouped_df['item_no'] = [x+1 for x in range(len(grouped_df))]
This is the result:
group code item_quantity gross_weight net_weight value total_items item_no
0 0 1414.19.00.62 75.0 42 35 0.56 14 1
1 0 4465.58.84.31 125.0 50 42 0.94 14 2
2 0 5698.44.72.13 200.0 41 33 16.0 14 3
3 0 7417.85.24.25 1940.2 18 17 13018.74 14 4
4 0 7815.80.99.96 200.0 49 48 1.86 14 5
5 1 1414.19.00.62 275.0 65 60 2.81 14 6
6 1 1596.09.32.64 515.0 126 107 28.45 14 7
7 1 2194.64.27.41 151.0 38 33 1.13 14 8
8 1 3815.79.80.69 400.0 60 51 6.4 14 9
9 1 4835.09.81.44 87.0 55 47 10.44 14 10
10 1 4853.40.53.94 406.0 119 106 13.63 14 11
11 1 5698.44.72.13 328.0 100 94 17.51 14 12
12 1 5867.21.36.92 1000.0 31 24 94.0 14 13
13 1 9227.71.84.12 600.0 24 17 56.4 14 14
All of the columns in the grouped DF exist in the original DF but some have different values.
How can I merge, update, join, concat, or filter the original DF correctly so that I can have the complete 78 columns?
The objective DataFrame is the grouped DF.
The columns in the original DF that already exist in the Grouped DF should be omitted.
I should be able to take the first value of the columns in the original DF that aren't in the Grouped DF.
The column code does not have unique values.
The column part_number in the complete file does not have unique values.
I tried:
pd.merge(how='left') after creating a unique ID; it duplicates existing columns instead of updating or overwriting values.
join, concat, update: they do not yield the expected results.
.agg(lambda x: x.iloc[0]) adds all the columns, but I don't know how to combine it with the current .agg({'item_quantity':'sum', 'gross_weight':'sum','net_weight':'sum', 'value':'sum'})
I know that .agg({'column_name':'first'}) returns the first value, but I don't know how to make it work for over 70 columns automatically.
You can achieve this by dynamically creating the aggregation dictionary with a dict comprehension like this:
df.groupby(['group', 'code'], as_index=False).agg({col : 'sum' for col in df.columns[3:]})
If item_no is your index, then change df.columns[3:] to df.columns[2:]
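If you also want the behaviour described in the question (sum only the four numeric columns and keep the first value of every other column), here is a hedged sketch along the same lines; the column names are assumed from the sample shown above:
sum_cols = ['item_quantity', 'gross_weight', 'net_weight', 'value']
agg_map = {col: ('sum' if col in sum_cols else 'first')
           for col in df.columns if col not in ('group', 'code')}
grouped_df = df.groupby(['group', 'code'], as_index=False).agg(agg_map)
grouped_df['total_items'] = len(grouped_df)
grouped_df['item_no'] = [x + 1 for x in range(len(grouped_df))]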

Overwrite some rows in pandas dataframe with ones from another dataframe based on index

I have a pandas dataframe, df1.
I want to overwrite its values with values in df2, where the index and column name match.
I've found a few answers on this site, but nothing that quite does what I want.
df1
A B C
0 33 44 54
1 11 32 54
2 43 55 12
3 43 23 34
df2
A
0 5555
output
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
You can use combine_first, converting to integer if necessary:
df = df2.combine_first(df1).astype(int)
print (df)
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
If you need to restrict to the intersection of index and columns between both DataFrames:
df2 = pd.DataFrame({'A':[5555, 2222],
                    'D':[3333, 4444]}, index=[0, 10])
idx = df2.index.intersection(df1.index)
cols = df2.columns.intersection(df1.columns)
df = df2.loc[idx, cols].combine_first(df1).astype(int)
print (df)
A B C
0 5555 44 54
1 11 32 54
2 43 55 12
3 43 23 34
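As a side note, DataFrame.update does a similar in-place overwrite of the cells whose index and column both match; it never adds rows or columns to df1, so the intersection step is not needed. A minimal sketch with the df1 and df2 from the top of the question:
df1.update(df2)          # overwrites matching cells of df1 in place, keeps df1's shape
df1 = df1.astype(int)    # optional: the updated column may have been upcast to float
print (df1)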

Combine duplicated columns within a DataFrame

If I have a dataframe with columns that share the same name, is there a way to combine the columns that have the same name with some sort of function (e.g. sum)?
For instance with:
In [186]:
df["NY-WEB01"].head()
Out[186]:
NY-WEB01 NY-WEB01
DateTime
2012-10-18 16:00:00 5.6 2.8
2012-10-18 17:00:00 18.6 12.0
2012-10-18 18:00:00 18.4 12.0
2012-10-18 19:00:00 18.2 12.0
2012-10-18 20:00:00 19.2 12.0
How might I collapse the NY-WEB01 columns (there are a bunch of duplicate columns, not just NY-WEB01) by summing each row where the column name is the same?
I believe this does what you are after:
df.groupby(lambda x:x, axis=1).sum()
Alternatively, between 3% and 15% faster depending on the length of the df:
df.groupby(df.columns, axis=1).sum()
EDIT: To extend this beyond sums, use .agg() (short for .aggregate()):
df.groupby(df.columns, axis=1).agg(numpy.max)
pandas >= 0.20: df.groupby(level=0, axis=1)
You don't need a lambda here, nor do you explicitly have to query df.columns; groupby accepts a level argument you can specify in conjunction with the axis argument. This is cleaner, IMO.
# Setup
np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('AABBB'))
df
A A B B B
0 44 47 0 3 3
1 39 9 19 21 36
2 23 6 24 24 12
3 1 38 39 23 46
4 24 17 37 25 13
df.groupby(level=0, axis=1).sum()
A B
0 91 6
1 48 76
2 29 60
3 39 108
4 41 75
Handling MultiIndex columns
Another case to consider is when dealing with MultiIndex columns. Consider
df.columns = pd.MultiIndex.from_arrays([['one']*3 + ['two']*2, df.columns])
df
one two
A A B B B
0 44 47 0 3 3
1 39 9 19 21 36
2 23 6 24 24 12
3 1 38 39 23 46
4 24 17 37 25 13
To perform aggregation across the upper levels, use
df.groupby(level=1, axis=1).sum()
A B
0 91 6
1 48 76
2 29 60
3 39 108
4 41 75
or, to aggregate within each upper level separately (grouping by both levels), use
df.groupby(level=[0, 1], axis=1).sum()
one two
A B B
0 91 0 6
1 48 19 57
2 29 24 36
3 39 39 69
4 41 37 38
Alternate Interpretation: Dropping Duplicate Columns
If you came here looking to find out how to simply drop duplicate columns (without performing any aggregation), use Index.duplicated:
df.loc[:,~df.columns.duplicated()]
A B
0 44 0
1 39 19
2 23 24
3 1 39
4 24 37
Or, to keep the last ones, specify keep='last' (default is 'first'),
df.loc[:,~df.columns.duplicated(keep='last')]
A B
0 47 3
1 9 36
2 6 12
3 38 46
4 17 13
The groupby alternatives for the two solutions above are df.groupby(level=0, axis=1).first(), and ... .last(), respectively.
Here is a possibly simpler solution for common aggregation functions like sum, mean, median, max, min and std: just use the parameters axis=1 (to work along columns) and level:
#coldspeed samples
np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('AABBB'))
print (df)
print (df.sum(axis=1, level=0))
A B
0 91 6
1 48 76
2 29 60
3 39 108
4 41 75
df.columns = pd.MultiIndex.from_arrays([['one']*3 + ['two']*2, df.columns])
print (df.sum(axis=1, level=1))
A B
0 91 6
1 48 76
2 29 60
3 39 108
4 41 75
print (df.sum(axis=1, level=[0,1]))
one two
A B B
0 91 0 6
1 48 19 57
2 29 24 36
3 39 39 69
4 41 37 38
It works similarly for the index; just use axis=0 instead of axis=1:
np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('ABCDE'), index=list('aabbc'))
print (df)
A B C D E
a 44 47 0 3 3
a 39 9 19 21 36
b 23 6 24 24 12
b 1 38 39 23 46
c 24 17 37 25 13
print (df.min(axis=0, level=0))
A B C D E
a 39 9 0 3 3
b 1 6 24 23 12
c 24 17 37 25 13
df.index = pd.MultiIndex.from_arrays([['bar']*3 + ['foo']*2, df.index])
print (df.mean(axis=0, level=1))
A B C D E
a 41.5 28.0 9.5 12.0 19.5
b 12.0 22.0 31.5 23.5 29.0
c 24.0 17.0 37.0 25.0 13.0
print (df.max(axis=0, level=[0,1]))
A B C D E
bar a 44 47 19 21 36
b 23 6 24 24 12
foo b 1 38 39 23 46
c 24 17 37 25 13
If you need to use other functions like first, last, size or count, it is necessary to use coldspeed's groupby answer above.
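For example, a minimal sketch on the same sample data, assuming a pandas version where groupby(..., axis=1) is still supported, as in the answers above:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('AABBB'))
# 'first' and 'count' have no axis/level shortcut, so group the duplicated column names explicitly
print (df.groupby(level=0, axis=1).first())
print (df.groupby(level=0, axis=1).agg('count'))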
