I have the following data frame
df = pd.DataFrame( {'Code Similarity & Clone Detection': {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 1.0}, 'Code Navigation & Understanding': {0: 0.0, 1: 0.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.0}, 'Security': {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}, 'ANN': {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 1.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}, 'CNN': {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}, 'RNN': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}, 'LSTM': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 0.0, 6: 1.0, 7: 0.0, 8: 1.0, 9: 1.0}} )
I want to convert this data frame into a new one with three columns: the first column, called "SE", holds the headers of the first 4 columns of df; the second column, called "DL", holds the headers of the remaining columns; the third column, called "count", counts the occurrences of each SE and DL value that come together. The following figure shows the required new shape.
Use:
#create MultiIndex by all combinations
mux = pd.MultiIndex.from_product([df.columns[:4], df.columns[4:]])
#repeat by first and second level with transpose
df1 = df.reindex(mux, axis=1, level=0).T
df2 = df.reindex(mux, axis=1, level=1).T
#sum together per columns, per MultiIndex
df = (df1.add(df2)
         .sum(axis=1)
         .groupby(level=[0, 1]).sum()  # Series.sum(level=...) was removed in pandas 2.0
         .astype(int)
         .rename_axis(['SE', 'DL'])
         .reset_index(name='count'))
print (df.head(10))
SE DL count
0 Code Similarity & Clone Detection ANN 5
1 Code Similarity & Clone Detection CNN 5
2 Code Similarity & Clone Detection RNN 3
3 Code Similarity & Clone Detection LSTM 7
4 Code Similarity & Clone Detection attention mechanism 9
5 Code Similarity & Clone Detection Autoencoder 7
6 Code Similarity & Clone Detection GNN 6
7 Code Similarity & Clone Detection Other_DL 4
8 Code Navigation & Understanding ANN 8
9 Code Navigation & Understanding CNN 8
EDIT: If you need to count only rows where both columns equal 1, use:
# in the real data, change 3 to 4 to select the first 4 columns
mux = pd.MultiIndex.from_product([df.columns[:3], df.columns[3:]])
#repeat by first and second level with transpose
s1 = df.reindex(mux, axis=1, level=0).T.stack()
s2 = df.reindex(mux, axis=1, level=1).T.stack()
df = (s1[s1 == 1].eq(s2[s2 == 1])
        .groupby(level=[0, 1]).sum()  # Series.sum(level=...) was removed in pandas 2.0
        .rename_axis(['SE', 'DL'])
        .sort_index(level=1)
        .reset_index(name='count'))
print (df)
SE DL count
0 Code Navigation & Understanding ANN 2
1 Code Similarity & Clone Detection ANN 0
2 Security ANN 2
3 Code Navigation & Understanding CNN 0
4 Code Similarity & Clone Detection CNN 0
5 Security CNN 3
6 Code Navigation & Understanding LSTM 2
7 Code Similarity & Clone Detection LSTM 1
8 Security LSTM 2
9 Code Navigation & Understanding RNN 0
10 Code Similarity & Clone Detection RNN 0
11 Security RNN 1
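Since the columns are 0/1 indicators, the counts in the EDIT can also be reproduced with a plain matrix product: the dot product of an SE column with a DL column is exactly the number of rows where both are 1. A minimal sketch on the sample data (3 SE columns, so `iloc[:, :3]`):

```python
import pandas as pd

df = pd.DataFrame({
    'Code Similarity & Clone Detection': [0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    'Code Navigation & Understanding':   [0, 0, 1, 0, 0, 1, 1, 1, 1, 0],
    'Security':                          [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    'ANN':  [1, 1, 1, 0, 0, 1, 0, 0, 0, 0],
    'CNN':  [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    'RNN':  [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    'LSTM': [0, 1, 0, 0, 1, 0, 1, 0, 1, 1],
}, dtype=float)

# (SE x rows) @ (rows x DL) -> co-occurrence counts per SE/DL pair
counts = df.iloc[:, :3].T.dot(df.iloc[:, 3:])

# reshape the matrix to the long SE/DL/count form
out = (counts.stack()
             .astype(int)
             .rename_axis(['SE', 'DL'])
             .reset_index(name='count'))
```

This gives the same numbers as the stack/eq approach above, e.g. Security/CNN is 3.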
Say I have the following DataFrame() where I have repeated observations per individual (column id_ind). Hence, the first two rows belong to the first individual, the third and fourth rows belong to the second individual, and so forth...
import pandas as pd
X = pd.DataFrame.from_dict({'x1_1': {0: -0.1766214634108258, 1: 1.645852185286492, 2: -0.13348860101031038, 3: 1.9681043689968933, 4: -1.7004428240831382, 5: 1.4580091413853749, 6: 0.06504113741068565, 7: -1.2168493676768384, 8: -0.3071304478616376, 9: 0.07121332925591593}, 'x1_2': {0: -2.4207773498298844, 1: -1.0828751040719462, 2: 2.73533787008624, 3: 1.5979611987152071, 4: 0.08835542172064115, 5: 1.2209786277076156, 6: -0.44205979195950784, 7: -0.692872860268244, 8: 0.0375521181289943, 9: 0.4656030062266639}, 'x1_3': {0: -1.548320898226322, 1: 0.8457342014424675, 2: -0.21250514722879738, 3: 0.5292389938329516, 4: -2.593946520223666, 5: -0.6188958526077123, 6: 1.6949245117526974, 7: -1.0271341091035742, 8: 0.637561891142571, 9: -0.7717170035055559}, 'x2_1': {0: 0.3797245517345564, 1: -2.2364391598508835, 2: 0.6205947900678905, 3: 0.6623865847688559, 4: 1.562036259999875, 5: -0.13081282910947759, 6: 0.03914373833251773, 7: -0.995761652421108, 8: 1.0649494418154162, 9: 1.3744782478849122}, 'x2_2': {0: -0.5052556836786106, 1: 1.1464291788297152, 2: -0.5662380273138174, 3: 0.6875729143723538, 4: 0.04653136473130827, 5: -0.012885303852347407, 6: 1.5893672346098884, 7: 0.5464286050059511, 8: -0.10430829457707284, 9: -0.5441755265313813}, 'x2_3': {0: -0.9762973303149007, 1: -0.983731467806563, 2: 1.465827578266328, 3: 0.5325950414202745, 4: -1.4452121324204903, 5: 0.8148816373643869, 6: 0.470791989780882, 7: -0.17951636294180473, 8: 0.7351814781280054, 9: -0.28776723200679066}, 'x3_1': {0: 0.12751822396637064, 1: -0.21926633684030983, 2: 0.15758799357206943, 3: 0.5885412224632464, 4: 0.11916562911189271, 5: -1.6436210334529249, 6: -0.12444368631987467, 7: 1.4618564171802453, 8: 0.6847234328916137, 9: -0.23177118858569187}, 'x3_2': {0: -0.6452955690715819, 1: 1.052094761527654, 2: 0.20190339195326157, 3: 0.6839430295237913, 4: -0.2607691613858866, 5: 0.3315513026670213, 6: 0.015901139336566113, 7: 0.15243420084881903, 8: -0.7604225072161022, 9: -0.4387652927008854}, 
'x3_3': {0: -1.067058994377549, 1: 0.8026914180717286, 2: -1.9868531745912268, 3: -0.5057770735303253, 4: -1.6589569342151713, 5: 0.358172252880764, 6: 1.9238983803281329, 7: 2.2518318810978246, 8: -1.2781475121874357, 9: -0.7103081175166167}})
Y = pd.DataFrame.from_dict({'CHOICE': {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 3.0, 5: 2.0, 6: 1.0, 7: 1.0, 8: 2.0, 9: 2.0}})
Z = pd.DataFrame.from_dict({'z1': {0: 2.4196730570917233, 1: 2.4196730570917233, 2: 2.822802255159467, 3: 2.822802255159467, 4: 2.073171091633643, 5: 2.073171091633643, 6: 2.044165101485163, 7: 2.044165101485163, 8: 2.4001241292606275, 9: 2.4001241292606275}, 'z2': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.0, 9: 0.0}, 'z3': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0, 5: 2.0, 6: 2.0, 7: 2.0, 8: 3.0, 9: 3.0}})
id = pd.DataFrame.from_dict({'id_choice': {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0}, 'id_ind': {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 4.0, 7: 4.0, 8: 5.0, 9: 5.0}} )
# Create a dataframe with all the data
data = pd.concat([id, X, Z, Y], axis=1)
print(data.head(4))
# id_choice id_ind x1_1 x1_2 x1_3 x2_1 x2_2 \
# 0 1.0 1.0 -0.176621 -2.420777 -1.548321 0.379725 -0.505256
# 1 2.0 1.0 1.645852 -1.082875 0.845734 -2.236439 1.146429
# 2 3.0 2.0 -0.133489 2.735338 -0.212505 0.620595 -0.566238
# 3 4.0 2.0 1.968104 1.597961 0.529239 0.662387 0.687573
#
# x2_3 x3_1 x3_2 x3_3 z1 z2 z3 CHOICE
# 0 -0.976297 0.127518 -0.645296 -1.067059 2.419673 0.0 1.0 1.0
# 1 -0.983731 -0.219266 1.052095 0.802691 2.419673 0.0 1.0 1.0
# 2 1.465828 0.157588 0.201903 -1.986853 2.822802 0.0 1.0 2.0
# 3 0.532595 0.588541 0.683943 -0.505777 2.822802 0.0 1.0 2.0
I want to perform two operations.
First, I want to convert the DataFrame data into a dictionary of DataFrame()s where the keys are the individual identifiers (in this particular case, numbers ranging from 1.0 to 5.0). I've done this below as suggested here. Unfortunately, I am getting a dictionary of nested lists and not a dictionary of DataFrame()s.
# Create a dictionary with the data for each individual
data_dict = data.set_index('id_ind').groupby('id_ind').apply(lambda x : x.to_numpy().tolist()).to_dict()
print(data_dict.keys())
# dict_keys([1.0, 2.0, 3.0, 4.0, 5.0])
print(data_dict[1.0])
#[[1.0, -0.1766214634108258, -2.4207773498298844, -1.548320898226322, 0.3797245517345564, -0.5052556836786106, -0.9762973303149007, 0.12751822396637064, -0.6452955690715819, -1.067058994377549, 2.4196730570917233, 0.0, 1.0, 1.0], [2.0, 1.645852185286492, -1.0828751040719462, 0.8457342014424675, -2.2364391598508835, 1.1464291788297152, -0.983731467806563, -0.21926633684030983, 1.052094761527654, 0.8026914180717286, 2.4196730570917233, 0.0, 1.0, 1.0]]
Second, I want to recover the original DataFrame data by reversing the previous operation. The naive approach is as follows; however, it does not, of course, produce the expected result.
# Naive approach
res = pd.DataFrame.from_dict(data_dict, orient='index')
print(res)
# 0 1
#1.0 [1.0, -0.1766214634108258, -2.4207773498298844... [2.0, 1.645852185286492, -1.0828751040719462, ...
#2.0 [3.0, -0.13348860101031038, 2.73533787008624, ... [4.0, 1.9681043689968933, 1.5979611987152071, ...
#3.0 [5.0, -1.7004428240831382, 0.08835542172064115... [6.0, 1.4580091413853749, 1.2209786277076156, ...
#4.0 [7.0, 0.06504113741068565, -0.4420597919595078... [8.0, -1.2168493676768384, -0.692872860268244,...
#5.0 [9.0, -0.3071304478616376, 0.0375521181289943,... [10.0, 0.07121332925591593, 0.4656030062266639...
This solution was inspired by @mozway's comments.
# Create a dictionary with the data for each individual
data_dict = dict(list(data.groupby('id_ind')))
# Convert the dictionary into a dataframe
res = pd.concat(data_dict, axis=0).reset_index(drop=True)
print(res.head(4))
# id_choice id_ind x1_1 x1_2 x1_3 x2_1 x2_2 \
#0 1.0 1.0 -0.176621 -2.420777 -1.548321 0.379725 -0.505256
#1 2.0 1.0 1.645852 -1.082875 0.845734 -2.236439 1.146429
#2 3.0 2.0 -0.133489 2.735338 -0.212505 0.620595 -0.566238
#3 4.0 2.0 1.968104 1.597961 0.529239 0.662387 0.687573
#
# x2_3 x3_1 x3_2 x3_3 z1 z2 z3 CHOICE
#0 -0.976297 0.127518 -0.645296 -1.067059 2.419673 0.0 1.0 1.0
#1 -0.983731 -0.219266 1.052095 0.802691 2.419673 0.0 1.0 1.0
#2 1.465828 0.157588 0.201903 -1.986853 2.822802 0.0 1.0 2.0
#3 0.532595 0.588541 0.683943 -0.505777 2.822802 0.0 1.0 2.0
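The round trip can be verified on a tiny stand-in frame (the column names here are made up for illustration): `dict(list(groupby))` keeps one DataFrame per individual, and concatenating the dict values restores the original row order because dicts preserve insertion order.

```python
import pandas as pd

# tiny stand-in for `data`: two rows per individual
data = pd.DataFrame({'id_ind': [1.0, 1.0, 2.0, 2.0],
                     'x': [0.1, 0.2, 0.3, 0.4]})

# dict of DataFrames keyed by the individual id
data_dict = dict(list(data.groupby('id_ind')))

# reverse the split: concatenate the per-individual frames
res = pd.concat(data_dict.values(), axis=0).reset_index(drop=True)
```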
I have this dataset and I'm trying to compute the mean of "AC_POWER" for every hour, but it isn't working properly. The dataset has 20-22 values every 15 minutes. I want to get something like this:
DATE AC_POWER
'15-05-2020 00:00' 400
'15-05-2020 01:00' 500
'15-05-2020 02:00' 500
'15-05-2020 03:00' 500
How to solve this?
import pandas as pd
df = pd.read_csv('dataset.csv')
df = df.reset_index()
df['DATE_TIME'] = df['DATE_TIME'].astype('datetime64[ns]')
df = df.resample('H', on='DATE_TIME').mean()
>>> df.head(10).to_dict()
{'AC_POWER': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'DAILY_YIELD': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'DATE_TIME': {0: '15-05-2020 00:00', 1: '15-05-2020 00:00', 2: '15-05-2020 00:00', 3: '15-05-2020 00:00',
4: '15-05-2020 00:00', 5: '15-05-2020 00:00', 6: '15-05-2020 00:00', 7: '15-05-2020 00:00',
8: '15-05-2020 00:00', 9: '15-05-2020 00:00'},
'DC_POWER': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'PLANT_ID': {0: 4135001, 1: 4135001, 2: 4135001, 3: 4135001, 4: 4135001, 5: 4135001,
6: 4135001, 7: 4135001, 8: 4135001, 9: 4135001},
'SOURCE_KEY': {0: '1BY6WEcLGh8j5v7', 1: '1IF53ai7Xc0U56Y', 2: '3PZuoBAID5Wc2HD', 3: '7JYdWkrLSPkdwr4',
4: 'McdE0feGgRqW7Ca', 5: 'VHMLBKoKgIrUVDU', 6: 'WRmjgnKYAwPKWDb', 7: 'ZnxXDlPa8U1GXgE',
8: 'ZoEaEvLYb1n2sOq', 9: 'adLQvlD726eNBSB'},
'TOTAL_YIELD': {0: 6259559.0, 1: 6183645.0, 2: 6987759.0, 3: 7602960.0, 4: 7158964.0,
5: 7206408.0, 6: 7028673.0, 7: 6522172.0, 8: 7098099.0, 9: 6271355.0}}
EDIT: I tried with a different dataset and the same code I've posted and it worked!
You need to set your date as the index first; the following does this and computes the mean over 15-minute windows:
df.set_index('DATE_TIME').resample('15T').mean()
Also, make sure your date vector is correctly formatted.
I think you're looking for DataFrame.resample:
df.resample(rule='H', on='DATE_TIME')['AC_POWER'].mean()
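One likely culprit: timestamps like '15-05-2020 00:00' are day-first, and converting with astype lets pandas guess the format, which can silently swap day and month. Parsing with an explicit format avoids that. A sketch with made-up power values:

```python
import pandas as pd

df = pd.DataFrame({
    'DATE_TIME': ['15-05-2020 00:00', '15-05-2020 00:15',
                  '15-05-2020 00:30', '15-05-2020 01:00'],
    'AC_POWER': [400.0, 500.0, 600.0, 300.0],
})

# give pandas the exact day-first format instead of letting astype guess
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], format='%d-%m-%Y %H:%M')

# hourly mean of AC_POWER
hourly = df.resample('H', on='DATE_TIME')['AC_POWER'].mean()
```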
This question already has answers here:
How do I create variable variables?
(17 answers)
Closed 1 year ago.
I have a list with nested dictionaries as list elements. There are 12 such dictionaries with 107 key-value pairs each, as in the snapshot of the first dictionary below:
big_list = [{0: 0.9065934065934067, 1: 0.0, 2: 0.14285714285714288,
             3: 0.03663003663003663, 4: 0.0, 5: 0.0,
             6: 0.053113553113553126, 7: 0.03663003663003663,
             8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0,
             12: 0.01098901098901099,
             # keys 13 through 106 are all 0.0
             **{k: 0.0 for k in range(13, 107)}},
I want to construct a loop through which I can extract the values in this way:
first_dict[key1][value]
second_dict[key1][value]
third_dict[key1][value]
...
twelfth_dict[key1][value]
...
first_dict[key107][value]
...
twelfth_dict[key107][value]
and so on for every key in every dictionary, and then find the average value of every key across dictionaries, i.e., the average value of key 1, key 2, through key 106. I know the zeros may complicate things a bit but they're needed for the task. Please let me know if this is possible; I'm happy to elaborate further if needed. Thanks.
In case the dictionary keys are ascending, you could iterate over the keys first:
big_list = [{
0: 0.9065934065934067,
1: 0.0,
2: 0.14285714285714288,
3: 0.03663003663003663,
4: 0.0,
5: 0.0
}, {
0: 0.9065934065934067,
1: 0.0,
2: 0.14285714285714288,
3: 0.03663003663003663,
4: 0.0,
5: 0.0,
6: 1.0
}, {
0: 0.9065934065934067,
1: 0.0,
2: 0.14285714285714288,
3: 0.03663003663003663,
4: 0.0,
5: 0.0,
6: 1.0,
7: 2.2
}]
longestDict = max(len(d.keys()) for d in big_list)  # longest dictionary, in case they are not equal sized
for key in range(longestDict):
    print(f"Key: {key}")
    for dct in big_list:
        print(f"\t{dct.get(key, None)}")
Out:
Key: 0
	0.9065934065934067
	0.9065934065934067
	0.9065934065934067
Key: 1
	0.0
	0.0
	0.0
Key: 2
	0.14285714285714288
	0.14285714285714288
	0.14285714285714288
...
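To get the averages the question actually asks for (rather than just printing), the same loop can accumulate a sum and a count per key; keys missing from shorter dictionaries are then simply skipped instead of being treated as zero. A sketch with a hypothetical two-dictionary list:

```python
from collections import defaultdict

# hypothetical stand-in for the 12 dictionaries
big_list = [{0: 1.0, 1: 2.0},
            {0: 3.0, 1: 4.0, 2: 5.0}]

sums = defaultdict(float)
counts = defaultdict(int)
for dct in big_list:
    for key, value in dct.items():
        sums[key] += value
        counts[key] += 1

# per-key mean across all dictionaries that contain the key
averages = {key: sums[key] / counts[key] for key in sums}
```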
I would like to sum certain rows based on a condition in a different row.
So I have columns for points
{'secondBoxer1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'secondBoxer2': {0: 0.0, 1: 0.0, 2: 10.0, 3: 0.0, 4: 0.0},
'secondBoxer3': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'secondBoxer4': {0: 15.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'secondBoxer5': {0: 15.0, 1: 53.57142857142857, 2: 0.0, 3: 0.0, 4: 0.0},
'secondBoxer6': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'secondBoxer7': {0: 0.0, 1: 0.0, 2: 0.0, 3: 50.0, 4: 0.0},
'secondBoxer8': {0: 0.0, 1: 0.0, 2: 0.0, 3: 37.142857142857146, 4: 0.0}}
and columns with the outcome of each fight
{'outcome1': {0: 'win ', 1: 'win ', 2: 'win ', 3: 'draw ', 4: 'win '},
'outcome2': {0: 'win ', 1: 'win ', 2: 'win ', 3: 'win ', 4: 'win '},
'outcome3': {0: 'win ', 1: 'win ', 2: 'win ', 3: 'win ', 4: 'scheduled '},
'outcome4': {0: 'win ', 1: 'win ', 2: 'nan', 3: 'loss ', 4: 'nan'},
'outcome5': {0: 'win ', 1: 'draw ', 2: 'nan', 3: 'win ', 4: 'nan'},
'outcome6': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'loss ', 4: 'nan'},
'outcome7': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'loss ', 4: 'nan'},
'outcome8': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'win ', 4: 'nan'}}
I would like to sum the points in the first set of columns (the points columns) in cases where the outcome equals a win.
I have written this code, where opp_names is the list of columns with the points and outcome_cols is a list of columns with the outcomes
data[opp_names].sum(axis=1).where(data[outcome_cols] == 'win')
The problem is that this code returns a total sum of points that is not conditional on the outcome.
In your case, use mask (here d is your first dict and d1 is your second dict):
pd.DataFrame(d).mask(pd.DataFrame(d1).ne('win ').to_numpy()).sum(1)
Out[164]:
0 30.000000
1 0.000000
2 10.000000
3 37.142857
4 0.000000
dtype: float64
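The same mask idea, spelled out on a small runnable subset of the data (two points columns, two outcome columns) so the row sums can be checked by hand: points whose matching outcome is not 'win ' become NaN, and sum then ignores them.

```python
import pandas as pd

points = pd.DataFrame({'secondBoxer4': [15.0, 0.0, 0.0],
                       'secondBoxer5': [15.0, 53.571, 0.0]})
outcomes = pd.DataFrame({'outcome4': ['win ', 'win ', 'nan'],
                         'outcome5': ['win ', 'draw ', 'nan']})

# hide points whose positionally matching outcome is not 'win ',
# then sum what remains per row (NaNs are skipped)
won_points = points.mask(outcomes.ne('win ').to_numpy()).sum(axis=1)
```

Note the trailing space in 'win ' — the outcome strings in the data carry it, so the comparison must too.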
I have a pandas multiindex dataframe that I'm trying to output as a nested dictionary.
# create the dataset
data = {'clump_thickness': {(0, 0): 274.0, (0, 1): 19.0, (1, 0): 67.0, (1, 1): 12.0, (2, 0): 83.0, (2, 1): 45.0, (3, 0): 16.0, (3, 1): 40.0, (4, 0): 4.0, (4, 1): 54.0, (5, 0): 0.0, (5, 1): 69.0, (6, 0): 0.0, (6, 1): 0.0, (7, 0): 0.0, (7, 1): 0.0, (8, 0): 0.0, (8, 1): 0.0, (9, 0): 0.0, (9, 1): 0.0}}
df = pd.DataFrame(data)
df.head()
# clump_thickness
# 0 0 274.0
# 1 19.0
# 1 0 67.0
# 1 12.0
# 2 0 83.0
df is the dataframe that I want to output as a nested dictionary. The output I'm looking for is in the form:
{"0":
{
"0":274,
"1":19
},
"1":{
"0":67,
"1":12
},
"2":{
"0":83,
"1":45
},
"3":{
"0":16,
"1":40
},
"4":{
"0":4,
"1":54
},
"5":{
"0":0,
"1":69
}
}
Here the first index forms the keys of the outer most dictionary. For each key we have a dictionary stored whose keys are the values in the second index.
When I do df.to_dict(), instead of nesting, the multiindex is returned as a tuple. How do I achieve this?
This works for me:
d = {l: df.xs(l)['clump_thickness'].to_dict() for l in df.index.levels[0]}
Another solution, similar to DataFrame with MultiIndex to dict, but here it is necessary to filter the column to get a Series:
d = df.groupby(level=0).apply(lambda df: df.xs(df.name).clump_thickness.to_dict()).to_dict()
print (d)
{0: {0: 274.0, 1: 19.0},
1: {0: 67.0, 1: 12.0},
2: {0: 83.0, 1: 45.0},
3: {0: 16.0, 1: 40.0},
4: {0: 4.0, 1: 54.0},
5: {0: 0.0, 1: 69.0},
6: {0: 0.0, 1: 0.0},
7: {0: 0.0, 1: 0.0},
8: {0: 0.0, 1: 0.0},
9: {0: 0.0, 1: 0.0}}
Or unstack first:
df.unstack().clump_thickness.apply(lambda x: x.to_dict(), axis=1).to_dict()
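Another option, assuming a two-level index as in the question, is a dict comprehension over groupby, dropping the outer level from each group before converting:

```python
import pandas as pd

# first two outer groups of the question's data
data = {'clump_thickness': {(0, 0): 274.0, (0, 1): 19.0,
                            (1, 0): 67.0, (1, 1): 12.0}}
df = pd.DataFrame(data)

# one inner dict per outer index value
d = {key: grp.droplevel(0)['clump_thickness'].to_dict()
     for key, grp in df.groupby(level=0)}
```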