I created a pandas DataFrame with two-level (MultiIndex) columns like this:
import numpy as np
import pandas as pd

A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1 , 0.3 , 0.5 , np.nan, 0.6 , 0.4 ],
              [0.4 , 0.3 , 0.3 , np.nan, 0.4 , 0.3 ],
              [1.2 , 1.3 , 1.1 , np.nan, 1.5 , 1.  ],
              [0.4 , 0.3 , 0.4 , 0.8   , 0.1 , 0.2 ],
              [0.2 , 0.3 , 0.3 , 0.3   , 0.5 , 0.6 ],
              [1.  , 1.2 , 1.  , 0.9   , 1.2 , 1.  ],
              [0.4 , 0.7 , 0.5 , 0.4   , 0.6 , 0.6 ],
              [0.6 , 0.5 , 0.3 , 0.3   , 0.3 , 0.5 ],
              [1.2 , 1.5 , 1.3 , 0.97  , 1.5 , 1.  ]])
df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
df = df.dropna(axis=0, how='any')
The final DataFrame looks like this:
ECFP FCFP RDK5
R tau RMSEc R tau RMSEc R tau RMSEc
0 0.1 0.4 1.2 0.4 0.2 1.0 0.4 0.6 1.2
1 0.3 0.3 1.3 0.3 0.3 1.2 0.7 0.5 1.5
2 0.5 0.3 1.1 0.4 0.3 1.0 0.5 0.3 1.3
4 0.6 0.4 1.5 0.1 0.5 1.2 0.6 0.3 1.5
5 0.4 0.3 1.0 0.2 0.6 1.0 0.6 0.5 1.0
How can I get the correlation matrix only between 'R' values for all types of data ('ECFP', 'FCFP', 'RDK5')?
Use pd.IndexSlice:
In [53]: df.loc[:, pd.IndexSlice[:, 'R']]
Out[53]:
ECFP FCFP RDK5
R R R
0 0.1 0.4 0.4
1 0.3 0.3 0.7
2 0.5 0.4 0.5
4 0.6 0.1 0.6
5 0.4 0.2 0.6
By using slice(None):
df.loc[:,(slice(None),'R')]
Out[375]:
ECFP FCFP RDK5
R R R
0 0.1 0.4 0.4
1 0.3 0.3 0.7
2 0.5 0.4 0.5
4 0.6 0.1 0.6
5 0.4 0.2 0.6
Both answers work, but first I must lexsort the columns, otherwise I get this error:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
The solution is to sort the column index first (sortlevel has since been removed from pandas; sort_index does the same job):
df = df.sort_index(axis=1)
print("Correlation matrix of Pearson's R values among all feature vector types:")
df.loc[:, pd.IndexSlice[:, 'R']].corr()
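If you want to avoid the lexsort requirement altogether, a cross-section with DataFrame.xs selects one level of the column MultiIndex without needing sorted columns (a minimal alternative sketch, same data as above):
# take the 'R' cross-section of column level 1; no lexsort needed
df.xs('R', axis=1, level=1).corr()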
I have 2 DataFrames as shown below:
dff = pd.DataFrame([[0.4, 0.2, 0.4], [0.1, 0.3, 0.6], [0.3, 0.2, 0.5], [0.3,0.3,0.4]], columns=['WA', 'WB','WC'])
WA WB WC
0 0.4 0.2 0.4
1 0.1 0.3 0.6
2 0.3 0.2 0.5
3 0.3 0.3 0.4
dff2 = pd.DataFrame([[0.5, 0.2, 0.4]], columns = ['stv_A', 'stv_B', 'stv_c'])
   stv_A  stv_B  stv_c
0 0.5 0.2 0.4
Is there any way to append dff2, which consists of only one row, to every single row in dff? The resulting DataFrame should thus have 6 columns and 4 rows.
You can use:
dff[dff2.columns] = dff2.squeeze()
print(dff)
# Output
WA WB WC stv_A stv_B stv_c
0 0.4 0.2 0.4 0.5 0.2 0.4
1 0.1 0.3 0.6 0.5 0.2 0.4
2 0.3 0.2 0.5 0.5 0.2 0.4
3 0.3 0.3 0.4 0.5 0.2 0.4
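This works because dff2.squeeze() collapses the one-row DataFrame into a Series indexed by dff2's column labels:
dff2.squeeze()
# stv_A    0.5
# stv_B    0.2
# stv_c    0.4
# Name: 0, dtype: float64
so the assignment fills each new column of dff with the matching scalar.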
Pandas does the broadcasting for you when you assign a scalar as a column:
import pandas as pd
dff = pd.DataFrame([[0.4, 0.2, 0.4], [0.1, 0.3, 0.6], [0.3, 0.2, 0.5], [0.3,0.3,0.4]], columns=['WA', 'WB','WC'])
dff2 = pd.DataFrame([[0.5, 0.2, 0.4]], columns = ['stv_A', 'stv_B', 'stv_c'])
for col in dff2.columns:
    dff[col] = dff2[col][0]  # pass a scalar; pandas broadcasts it to every row
print(dff)
Output:
WA WB WC stv_A stv_B stv_c
0 0.4 0.2 0.4 0.5 0.2 0.4
1 0.1 0.3 0.6 0.5 0.2 0.4
2 0.3 0.2 0.5 0.5 0.2 0.4
3 0.3 0.3 0.4 0.5 0.2 0.4
You can first repeat the single row of dff2 len(dff) times (several methods work), then concat the repeated DataFrame to dff:
df = pd.concat([dff, pd.concat([dff2]*len(dff)).reset_index(drop=True)], axis=1)
print(df)
WA WB WC stv_A stv_B stv_c
0 0.4 0.2 0.4 0.5 0.2 0.4
1 0.1 0.3 0.6 0.5 0.2 0.4
2 0.3 0.2 0.5 0.5 0.2 0.4
3 0.3 0.3 0.4 0.5 0.2 0.4
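Assuming pandas >= 1.2, a cross merge expresses the same idea in one call, pairing every row of dff with the single row of dff2 (a sketch of an alternative, not a method from the answers above):
df = dff.merge(dff2, how='cross')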
I have a dataframe df
A B C
0.1 0.3 0.5
0.2 0.4 0.6
0.3 0.5 0.7
0.4 0.6 0.8
0.5 0.7 0.9
For each row, I would like to add the corresponding value from DataFrame df1 to each element:
X
0.1
0.2
0.3
0.4
0.5
Such that the final result would be
A B C
0.2 0.4 0.6
0.4 0.6 0.8
0.6 0.8 1.0
0.8 1.0 1.2
1.0 1.2 1.4
I have tried using df_new = df.sum(df1, axis=0), but got the following error: TypeError: stat_func() got multiple values for argument 'axis'. I would be open to numpy solutions as well.
You can use np.add:
df = np.add(df, df1.to_numpy())
print(df)
Prints:
A B C
0 0.2 0.4 0.6
1 0.4 0.6 0.8
2 0.6 0.8 1.0
3 0.8 1.0 1.2
4 1.0 1.2 1.4
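np.add here just relies on numpy broadcasting: df1.to_numpy() has shape (5, 1), which is stretched across the three columns of df. Assuming that shape, the plain operator does the same thing:
df = df + df1.to_numpy()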
import pandas as pd
df = pd.DataFrame([[0.1,0.3, 0.5],
[0.2, 0.4, 0.6],
[0.3, 0.5, 0.7],
[0.4, 0.6, 0.8],
[0.5, 0.7, 0.9]],
columns=['A', 'B', 'C'])
df1 = [0.1, 0.2, 0.3, 0.4, 0.5]
# In one Pandas instruction
df = df.add(pd.Series(df1), axis=0)
Result:
A B C
0 0.2 0.4 0.6
1 0.4 0.6 0.8
2 0.6 0.8 1.0
3 0.8 1.0 1.2
4 1.0 1.2 1.4
Try concat with .stack() and .sum() (using df and df1 as named in the question):
df_new = pd.concat([df.stack(), df1.stack()], axis=1).bfill().sum(axis=1).unstack(1).drop('X', axis=1)
A B C
0 0.2 0.4 0.6
1 0.4 0.6 0.8
2 0.6 0.8 1.0
3 0.8 1.0 1.2
4 1.0 1.2 1.4
df = pd.DataFrame([[0.1, 0.3, 0.5],
                   [0.2, 0.4, 0.6],
                   [0.3, 0.5, 0.7],
                   [0.4, 0.6, 0.8],
                   [0.5, 0.7, 0.9]],
                  columns=['A', 'B', 'C'])
df["X"] = [0.1, 0.2, 0.3, 0.4, 0.5]
columns_to_add = df.columns[:-1]
for col in columns_to_add:
    df[col] += df['X']  # this is where addition or any other operation can be performed
df = df.drop('X', axis=1)
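For completeness: since the question's df1 is a DataFrame with a single column X, the most direct pandas spelling aligns that column against the rows of df (assuming df1 really is that one-column DataFrame):
df_new = df.add(df1['X'], axis=0)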
I have a dataframe that looks like this
a b c d
0 0.6 -0.4 0.2 0.7
1 0.8 0.2 -0.2 0.3
2 -0.1 0.5 0.5 -0.4
3 0.8 -0.6 -0.7 -0.2
And I wish to create column 'e' such that it displays the column number of the first instance in each row where the value is less than 0
So the goal result will look like this
a b c d e
0 0.6 -0.4 0.2 0.7 2
1 0.8 0.2 -0.2 0.3 3
2 -0.1 0.5 0.5 -0.4 1
3 0.8 -0.6 -0.7 -0.2 2
I can do this in Excel using a MATCH(True) type function but am struggling to make progress in Pandas.
Thanks for any help
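For reference, the answers below can be reproduced with df built from the sample data above:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.6, 0.8, -0.1, 0.8],
                   'b': [-0.4, 0.2, 0.5, -0.6],
                   'c': [0.2, -0.2, 0.5, -0.7],
                   'd': [0.7, 0.3, -0.4, -0.2]})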
You can use np.argmax:
# where the values are less than 0
a = df.values < 0
# if the row is all non-negative, return 0
df['e'] = np.where(a.any(1), np.argmax(a,axis=1)+1, 0)
Output:
a b c d e
0 0.6 -0.4 0.2 0.7 2
1 0.8 0.2 -0.2 0.3 3
2 -0.1 0.5 0.5 -0.4 1
3 0.8 -0.6 -0.7 -0.2 2
Something like idxmin with np.sign:
import numpy as np
df['e'] = df.columns.get_indexer(np.sign(df).idxmin(axis=1)) + 1
df
a b c d e
0 0.6 -0.4 0.2 0.7 2
1 0.8 0.2 -0.2 0.3 3
2 -0.1 0.5 0.5 -0.4 1
3 0.8 -0.6 -0.7 -0.2 2
Get the first max, combined with get_indexer_for to get the column numbers:
df["e"] = df.columns.get_indexer_for(df.lt(0, axis=1).idxmax(axis=1).array) + 1
df
a b c d e
0 0.6 -0.4 0.2 0.7 2
1 0.8 0.2 -0.2 0.3 3
2 -0.1 0.5 0.5 -0.4 1
3 0.8 -0.6 -0.7 -0.2 2
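Note that the two idxmin/idxmax answers assume every row contains at least one negative value (true for the sample); a row with none would silently get e = 1. If such rows can occur, one hedged guard is to overwrite them afterwards (assuming you want 0 there, as in the np.argmax answer):
df['e'] = df.columns.get_indexer_for(df.lt(0).idxmax(axis=1).array) + 1
df.loc[~df.lt(0).any(axis=1), 'e'] = 0  # rows with no negatives get 0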
I have a data frame as shown below
B_ID no_show Session slot_num walkin ns_w c_ns_w c_walkin
1 0.4 S1 1 0.2 0.2 0.2 0.2
2 0.3 S1 2 0.5 -0.2 0.2 0.7
3 0.8 S1 3 0.5 0.3 0.5 1.2
4 0.3 S1 4 0.8 -0.5 0.0 2.0
5 0.6 S1 5 0.4 0.2 0.2 2.4
6 0.8 S1 6 0.2 0.6 0.8 2.6
7 0.9 S1 7 0.1 0.8 1.4 2.7
8 0.4 S1 8 0.5 -0.1 1.3 3.2
9 0.6 S1 9 0.1 0.5 1.8 3.3
12 0.9 S2 1 0.9 0.0 0.0 0.9
13 0.5 S2 2 0.4 0.1 0.1 1.3
14 0.3 S2 3 0.1 0.2 0.3 1.4
15 0.7 S2 4 0.4 0.3 0.6 1.8
20 0.7 S2 5 0.1 0.6 1.2 1.9
16 0.6 S2 6 0.3 0.3 1.5 2.2
17 0.8 S2 7 0.5 0.3 1.8 2.7
19 0.3 S2 8 0.8 -0.5 1.3 3.5
where:
df['ns_w'] = df['no_show'] - df['walkin']
c_ns_w is the cumulative sum of ns_w within each Session:
df['c_ns_w'] = df.groupby(['Session'])['ns_w'].cumsum()
c_walkin is the cumulative sum of walkin within each Session:
df['c_walkin'] = df.groupby(['Session'])['walkin'].cumsum()
From the above I would like to calculate two new columns, u_c_walkin and u_c_ns_w.
Whenever u_c_walkin > 0.9, create a new row with no_show = 0, walkin = 0 and all other values the same as the row above, where B_ID = walkin1, walkin2, etc., and subtract 1 from the subsequent u_c_walkin values.
At the same time, whenever u_c_ns_w > 0.8, add a new row with B_ID = overbook1, overbook2, etc., with no_show = 0.5, walkin = 0, ns_w = 0.5 and all other values the same as the row above, and subtract 0.5 from the subsequent u_c_ns_w values.
Expected output:
B_ID no_show Session slot_num walkin ns_w c_ns_w c_walkin u_c_walkin u_c_ns_w
1 0.4 S1 1 0.2 0.2 0.2 0.2 0.2 0.2
2 0.3 S1 2 0.5 -0.2 0.2 0.7 0.7 0.2
3 0.8 S1 3 0.5 0.3 0.5 1.2 1.2 0.5
walkin1 0.0 S1 3 0.0 0.3 0.5 1.2 0.2 0.5
4 0.3 S1 4 0.8 -0.5 0.0 2.0 1.0 0.0
walkin2 0.0 S1 4 0.0 -0.5 0.0 2.0 0.0 0.0
5 0.6 S1 5 0.4 0.2 0.2 2.4 0.4 0.2
6 0.8 S1 6 0.2 0.6 0.8 2.6 0.6 0.8
7 0.9 S1 7 0.1 0.8 1.4 2.7 0.7 1.4
overbook1 0.5 S1 7 0.0 0.5 1.4 2.7 0.7 0.9
8 0.4 S1 8 0.5 -0.1 1.3 3.2 1.2 0.8
walkin3 0.0 S1 8 0.0 -0.1 1.3 3.2 0.2 0.8
9 0.6 S1 9 0.1 0.5 1.8 3.3 0.1 1.3
overbook2 0.5 S1 9 0.0 0.5 1.8 3.3 0.1 0.8
12 0.9 S2 1 0.9 0.0 0.0 0.9 0.9 0.0
13 0.5 S2 2 0.4 0.1 0.1 1.3 1.3 0.1
walkin1 0.0 S2 2 0.0 0.1 0.1 1.3 0.3 0.1
14 0.3 S2 3 0.1 0.2 0.3 1.4 0.4 0.3
15 0.7 S2 4 0.4 0.3 0.6 1.8 0.8 0.6
20 0.7 S2 5 0.1 0.6 1.2 1.9 0.9 1.2
overbook1 0.5 S2 5 0.0 0.5 1.2 1.9 0.9 0.7
16 0.6 S2 6 0.3 0.3 1.5 2.2 1.2 1.0
walkin2 0.0 S2 6 0.3 0.3 1.5 2.2 0.2 1.0
overbook2 0.5 S2 6 0.0 0.5 1.5 2.2 0.2 0.5
17 0.8 S2 7 0.5 0.3 1.8 2.7 0.7 0.8
19 0.3 S2 8 0.8 -0.5 1.3 3.5 1.5 0.3
walkin3 0.0 S2 8 0.8 -0.5 1.3 3.5 0.5 0.3
I tried the code below to create the walkin rows, but I am not able to create the overbook rows.
def create_u_columns(ser):
    arr_ns = ser.to_numpy()
    # array of insert positions for later
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 1
    for i in range(len(arr_ns)):
        if arr_ns[i] > 0.9:
            # subtract 1 from the cumulative values that follow
            arr_ns[i+1:] -= 1
            # mark this row for a later insert
            arr_idx[i] = walkin_id
            walkin_id += 1
    # return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx': arr_idx}, index=ser.index)

df[['u_c_walkin', 'mask_idx']] = df.groupby(['Session'])['c_walkin'].apply(create_u_columns)
# select the rows
df_toAdd = df.loc[df['mask_idx'].astype(bool), :].copy()
# replace the values as wanted
df_toAdd['no_show'] = 0
df_toAdd['walkin'] = 0
df_toAdd['B_ID'] = 'walkin' + df_toAdd['mask_idx'].astype(str)
df_toAdd['u_c_walkin'] -= 1
# add 0.5 to the index so each new row sorts just after its original
df_toAdd.index += 0.5
new_df = (pd.concat([df, df_toAdd]).sort_index()
            .reset_index(drop=True).drop('mask_idx', axis=1))
Here you can modify the function to do both checks at the same time. Please verify that these are exactly the conditions you want to apply for the walkin and overbook rows.
def create_columns(dfg):
    arr_walkin = dfg['c_walkin'].to_numpy()
    arr_ns = dfg['c_ns_w'].to_numpy()
    # arrays marking where to insert rows later
    arr_idx_walkin = np.zeros(len(arr_walkin), dtype=int)
    arr_idx_ns = np.zeros(len(arr_ns), dtype=int)
    walkin_id = 1
    overbook_id = 1
    for i in range(len(arr_ns)):
        # condition on c_walkin
        if arr_walkin[i] > 0.9:
            # subtract 1 from the cumulative walkins that follow
            arr_walkin[i+1:] -= 1
            # mark this row for a later walkin insert
            arr_idx_walkin[i] = walkin_id
            walkin_id += 1
        # condition on c_ns_w
        if arr_ns[i] > 0.8:
            # subtract 0.5 from the cumulative no-shows that follow
            arr_ns[i+1:] -= 0.5
            # mark this row for a later overbook insert
            arr_idx_ns[i] = overbook_id
            overbook_id += 1
    # return a dataframe with all four columns
    return pd.DataFrame({'u_c_walkin': arr_walkin,
                         'u_c_ns_w': arr_ns,
                         'mask_idx_walkin': arr_idx_walkin,
                         'mask_idx_ns': arr_idx_ns}, index=dfg.index)

df[['u_c_walkin', 'u_c_ns_w', 'mask_idx_walkin', 'mask_idx_ns']] = \
    df.groupby(['Session'])[['c_walkin', 'c_ns_w']].apply(create_columns)

# select the rows for walkin
df_walkin = df.loc[df['mask_idx_walkin'].astype(bool), :].copy()
# replace the values as wanted
df_walkin['no_show'] = 0
df_walkin['walkin'] = 0
df_walkin['B_ID'] = 'walkin' + df_walkin['mask_idx_walkin'].astype(str)
df_walkin['u_c_walkin'] -= 1
# add 0.2 to the index: walkin rows sort right after their originals
df_walkin.index += 0.2

# select the rows for ns_w
df_ns = df.loc[df['mask_idx_ns'].astype(bool), :].copy()
# replace the values as wanted
df_ns['no_show'] = 0.5
df_ns['walkin'] = 0
df_ns['ns_w'] = 0.5
df_ns['B_ID'] = 'overbook' + df_ns['mask_idx_ns'].astype(str)
df_ns['u_c_ns_w'] -= 0.5
# add 0.4 to the index: overbook rows sort after the walkin rows
df_ns.index += 0.4

new_df = (pd.concat([df, df_walkin, df_ns]).sort_index()
            .reset_index(drop=True).drop(['mask_idx_walkin', 'mask_idx_ns'], axis=1))
and you get:
print(new_df)
B_ID no_show Session slot_num walkin ns_w c_ns_w c_walkin \
0 1 0.4 S1 1 0.2 0.2 0.2 0.2
1 2 0.3 S1 2 0.5 -0.2 0.2 0.7
2 3 0.8 S1 3 0.5 0.3 0.5 1.2
3 walkin1 0.0 S1 3 0.0 0.3 0.5 1.2
4 4 0.3 S1 4 0.8 -0.5 0.0 2.0
5 walkin2 0.0 S1 4 0.0 -0.5 0.0 2.0
6 5 0.6 S1 5 0.4 0.2 0.2 2.4
7 6 0.8 S1 6 0.2 0.6 0.8 2.6
8 7 0.9 S1 7 0.1 0.8 1.4 2.7
9 overbook1 0.5 S1 7 0.0 0.5 1.4 2.7
10 8 0.4 S1 8 0.5 -0.1 1.3 3.2
11 walkin3 0.0 S1 8 0.0 -0.1 1.3 3.2
12 9 0.6 S1 9 0.1 0.5 1.8 3.3
13 overbook2 0.5 S1 9 0.0 0.5 1.8 3.3
14 12 0.9 S2 1 0.9 0.0 0.0 0.9
15 13 0.5 S2 2 0.4 0.1 0.1 1.3
16 walkin1 0.0 S2 2 0.0 0.1 0.1 1.3
17 14 0.3 S2 3 0.1 0.2 0.3 1.4
18 15 0.7 S2 4 0.4 0.3 0.6 1.8
19 20 0.7 S2 5 0.1 0.6 1.2 1.9
20 overbook1 0.5 S2 5 0.0 0.5 1.2 1.9
21 16 0.6 S2 6 0.3 0.3 1.5 2.2
22 walkin2 0.0 S2 6 0.0 0.3 1.5 2.2
23 overbook2 0.5 S2 6 0.0 0.5 1.5 2.2
24 17 0.8 S2 7 0.5 0.3 1.8 2.7
25 19 0.3 S2 8 0.8 -0.5 1.3 3.5
26 walkin3 0.0 S2 8 0.0 -0.5 1.3 3.5
u_c_walkin u_c_ns_w
0 0.2 0.2
1 0.7 0.2
2 1.2 0.5
3 0.2 0.5
4 1.0 0.0
5 0.0 0.0
6 0.4 0.2
7 0.6 0.8
8 0.7 1.4
9 0.7 0.9
10 1.2 0.8
11 0.2 0.8
12 0.3 1.3
13 0.3 0.8
14 0.9 0.0
15 1.3 0.1
16 0.3 0.1
17 0.4 0.3
18 0.8 0.6
19 0.9 1.2
20 0.9 0.7
21 1.2 1.0
22 0.2 1.0
23 1.2 0.5
24 0.7 0.8
25 1.5 0.3
26 0.5 0.3
Assume that we have the following pandas Series resulting from an apply function applied on a DataFrame after a groupby.
<class 'pandas.core.series.Series'>
0 (1, 0, [0.2, 0.2, 0.2], [0.2, 0.2, 0.2])
1 (2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1])
2 (1, 0, [0.4, 0.4, 0.4], [0.4, 0.4, 0.4])
3 (1, 0, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
4 (3, 14000, [0.8, 0.8, 0.8], [0.6, 0.6, 0.6])
dtype: object
Can we convert this into a DataFrame when sigList = ['sig1', 'sig2', 'sig3'] is given?
Length Distance sig1Max sig2Max sig3Max sig1Min sig2Min sig3Min
1 0 0.2 0.2 0.2 0.2 0.2 0.2
2 1000 0.6 0.7 0.5 0.1 0.3 0.1
1 0 0.4 0.4 0.4 0.4 0.4 0.4
1 0 0.5 0.5 0.5 0.5 0.5 0.5
3 14000 0.8 0.8 0.8 0.6 0.6 0.6
Thanks in advance
Do it the old-fashioned (and fast) way, using a list comprehension (note the trailing space inside the first string literal; without it the two halves would fuse into 'sig2Maxsig3Max'):
columns = ("Length Distance sig1Max sig2Max "
           "sig3Max sig1Min sig2Min sig3Min").split()
df = pd.DataFrame([[a, b, *c, *d] for a, b, c, d in series.values], columns=columns)
print(df)
print(df)
Length Distance sig1Max sig2Max sig3Max sig1Min sig2Min sig3Min
0 1 0 0.2 0.2 0.2 0.2 0.2 0.2
1 2 1000 0.6 0.7 0.5 0.1 0.3 0.1
2 1 0 0.4 0.4 0.4 0.4 0.4 0.4
3 1 0 0.5 0.5 0.5 0.5 0.5 0.5
4 3 14000 0.8 0.8 0.8 0.6 0.6 0.6
Or, perhaps you meant, do it a little more dynamically:
sigList = ['sig1', 'sig2', 'sig3']
columns = ['Length', 'Distance']
columns.extend(f'{s}{lbl}' for lbl in ('Max', 'Min') for s in sigList)
df = pd.DataFrame([[a, b, *c, *d] for a, b, c, d in series.values], columns=columns)
print(df)
Length Distance sig1Max sig2Max sig3Max sig1Min sig2Min sig3Min
0 1 0 0.2 0.2 0.2 0.2 0.2 0.2
1 2 1000 0.6 0.7 0.5 0.1 0.3 0.1
2 1 0 0.4 0.4 0.4 0.4 0.4 0.4
3 1 0 0.5 0.5 0.5 0.5 0.5 0.5
4 3 14000 0.8 0.8 0.8 0.6 0.6 0.6
You may check:
newdf = pd.DataFrame(s.tolist())
newdf = pd.concat([newdf[[0, 1]],
                   pd.DataFrame(newdf[2].tolist()),
                   pd.DataFrame(newdf[3].tolist())], axis=1)
newdf.columns = [
"Length", "Distance", "sig1Max", "sig2Max", "sig3Max", "sig1Min", "sig2Min", "sig3Min"
]
newdf
Out[163]:
Length Distance sig1Max ... sig1Min sig2Min sig3Min
0 1 0 0.2 ... 0.2 0.2 0.2
1 2 1000 0.6 ... 0.1 0.3 0.1
2 1 0 0.4 ... 0.4 0.4 0.4
3 1 0 0.5 ... 0.5 0.5 0.5
4 3 14000 0.8 ... 0.6 0.6 0.6
[5 rows x 8 columns]
You can flatten each element and then convert each to a Series itself. Converting each element to a Series turns the main Series (s in the example below) into a DataFrame. Then just set the column names as you wish.
For example:
import pandas as pd
# load in your data
s = pd.Series([
(1, 0, [0.2, 0.2, 0.2], [0.2, 0.2, 0.2]),
(2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1]),
(1, 0, [0.4, 0.4, 0.4], [0.4, 0.4, 0.4]),
(1, 0, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
(3, 14000, [0.8, 0.8, 0.8], [0.6, 0.6, 0.6]),
])
def flatten(x):
    # note this is not very robust, but works for this case
    return [x[0], x[1], *x[2], *x[3]]
df = s.apply(flatten).apply(pd.Series)
df.columns = [
"Length", "Distance", "sig1Max", "sig2Max", "sig3Max", "sig1Min", "sig2Min", "sig3Min"
]
Then you have df as:
Length Distance sig1Max sig2Max sig3Max sig1Min sig2Min sig3Min
0 1.0 0.0 0.2 0.2 0.2 0.2 0.2 0.2
1 2.0 1000.0 0.6 0.7 0.5 0.1 0.3 0.1
2 1.0 0.0 0.4 0.4 0.4 0.4 0.4 0.4
3 1.0 0.0 0.5 0.5 0.5 0.5 0.5 0.5
4 3.0 14000.0 0.8 0.8 0.8 0.6 0.6 0.6
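Note that apply(pd.Series) upcasts everything to float64, which is why Length and Distance show as 1.0 and 0.0 above. If you want the integer columns back, a cast restores them (assuming those two columns should be integers):
df = df.astype({"Length": int, "Distance": int})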