I have the following DataFrame:
      completeness  homogeneity  label_f1_score  label_precision  label_recall  mean_bbox_iou   mean_iou  px_accuracy  px_f1_score     px_iou  px_precision  px_recall     t_eval    v_score
mean     0.1               1         0.92             0.92            0.92         0.729377     0.784934    0.843802     0.898138    0.774729     0.998674    0.832576    1.10854    0.1
std      0.0707107         0         0.0447214        0.0447214       0.0447214    0.0574177    0.0313196   0.0341158    0.0224574   0.0299977    0.000432499  0.0327758   0.0588322  0.0707107
What I would like to obtain is a Series composed of completeness_mean, completeness_std, homogeneity_mean, homogeneity_std, ..., i.e. a label {column}_{index} for every cell.
Does Pandas have a function for this or do I have to iterate over all cells myself to build the desired result?
EDIT: I mean a Series with {column}_{index} as index and the corresponding values from the table.
(I believe this is not a duplicate of the other SO questions related to wide-to-long reshaping.)
IIUC, unstack and flatten the index:
df2 = df.unstack()
df2.index = df2.index.map('_'.join)
output:
completeness_mean 0.100000
completeness_std 0.070711
homogeneity_mean 1.000000
homogeneity_std 0.000000
label_f1_score_mean 0.920000
label_f1_score_std 0.044721
label_precision_mean 0.920000
label_precision_std 0.044721
label_recall_mean 0.920000
label_recall_std 0.044721
mean_bbox_iou_mean 0.729377
mean_bbox_iou_std 0.057418
mean_iou_mean 0.784934
mean_iou_std 0.031320
px_accuracy_mean 0.843802
px_accuracy_std 0.034116
px_f1_score_mean 0.898138
px_f1_score_std 0.022457
px_iou_mean 0.774729
px_iou_std 0.029998
px_precision_mean 0.998674
px_precision_std 0.000432
px_recall_mean 0.832576
px_recall_std 0.032776
t_eval_mean 1.108540
t_eval_std 0.058832
v_score_mean 0.100000
v_score_std 0.070711
dtype: float64
or with stack for a different order:
df2 = df.stack()
df2.index = df2.swaplevel().index.map('_'.join)
output:
completeness_mean 0.100000
homogeneity_mean 1.000000
label_f1_score_mean 0.920000
label_precision_mean 0.920000
label_recall_mean 0.920000
mean_bbox_iou_mean 0.729377
mean_iou_mean 0.784934
px_accuracy_mean 0.843802
px_f1_score_mean 0.898138
px_iou_mean 0.774729
px_precision_mean 0.998674
px_recall_mean 0.832576
t_eval_mean 1.108540
v_score_mean 0.100000
completeness_std 0.070711
homogeneity_std 0.000000
label_f1_score_std 0.044721
label_precision_std 0.044721
label_recall_std 0.044721
mean_bbox_iou_std 0.057418
mean_iou_std 0.031320
px_accuracy_std 0.034116
px_f1_score_std 0.022457
px_iou_std 0.029998
px_precision_std 0.000432
px_recall_std 0.032776
t_eval_std 0.058832
v_score_std 0.070711
dtype: float64
Is this what you're looking for?
pd.merge(df.columns.to_frame(), df.index.to_frame(), how='cross').apply('_'.join, axis=1)
# OR
pd.Series(df.unstack().index.map('_'.join))
Output:
0 completeness_mean
1 completeness_std
2 homogeneity_mean
3 homogeneity_std
4 label_f1_score_mean
5 label_f1_score_std
6 label_precision_mean
7 label_precision_std
8 label_recall_mean
9 label_recall_std
10 mean_bbox_iou_mean
11 mean_bbox_iou_std
12 mean_iou_mean
13 mean_iou_std
14 px_accuracy_mean
15 px_accuracy_std
16 px_f1_score_mean
17 px_f1_score_std
18 px_iou_mean
19 px_iou_std
20 px_precision_mean
21 px_precision_std
22 px_recall_mean
23 px_recall_std
24 t_eval_mean
25 t_eval_std
26 v_score_mean
27 v_score_std
dtype: object
Related
I am using make_interp_spline to do curve fitting. I'd like the curve to keep constant values beyond the min and max of the data. I thought I could use bc_type='clamped' to do that, but the result I got is not correct.
Here is what I have
the data is
df_krow
Sw Krw Krow
0 0.247000 0.000000 1.000000
1 0.281562 0.000006 0.850997
2 0.316125 0.000098 0.716177
3 0.350688 0.000494 0.595057
4 0.385250 0.001563 0.487139
5 0.419813 0.003815 0.391906
6 0.454375 0.007910 0.308816
7 0.488938 0.014655 0.237305
8 0.523500 0.025000 0.176777
9 0.558063 0.040045 0.126603
10 0.592625 0.061035 0.086115
11 0.627188 0.089362 0.054592
12 0.661750 0.126562 0.031250
13 0.696313 0.174323 0.015223
14 0.730875 0.234473 0.005524
15 0.765437 0.308990 0.000977
16 0.800000 0.400000 0.000000
sw_loc_list=[0.2,0.4,0.6,0.8,0.9,1.0]
from scipy.interpolate import make_interp_spline
krw_loc_list=make_interp_spline(df_krow['Sw'],df_krow['Krw'],k=3,bc_type='clamped')(sw_loc_list)
After running the above, you can see that when Sw is below 0.247 or above 0.8, Krw goes negative. The result I'd like to get is Krw = 0 when Sw < 0.247 and Krw = 0.4 when Sw > 0.8. How can I do that? Thanks
print(krw_loc_list)
[-4.21675500e-05 2.34342851e-03 6.64037797e-02 4.00000000e-01
-2.73531731e+00 -1.91966201e+01]
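One way to get the desired constant extrapolation (a sketch, not something stated in the post: bc_type='clamped' only zeroes the end derivatives of the fit, it does not control extrapolation) is to clip the query points to the fitted Sw range before evaluating the spline, so any Sw below 0.247 evaluates at Sw = 0.247 (where Krw = 0) and any Sw above 0.8 evaluates at Sw = 0.8 (where Krw = 0.4):

```python
import numpy as np

# Clamp the query points to the fitted Sw domain so the spline is never
# asked to extrapolate; out-of-range points then take the endpoint values.
sw_loc = np.array([0.2, 0.4, 0.6, 0.8, 0.9, 1.0])
sw_clamped = np.clip(sw_loc, 0.247, 0.8)   # 0.2 -> 0.247; 0.9, 1.0 -> 0.8
# krw_loc_list = make_interp_spline(df_krow['Sw'], df_krow['Krw'], k=3)(sw_clamped)
print(sw_clamped)
```

This keeps the interior of the fit unchanged and only affects points outside [0.247, 0.8].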
I have a dataframe as shown below:
8964_real 8964_imag 8965_real 8965_imag 8966_real 8966_imag 8967_real ... 8984_imag 8985_real 8985_imag 8986_real 8986_imag 8987_real 8987_imag
0 112.653120 0.000000 117.104887 0.000000 127.593406 0.000000 129.522106 ... 0.000000 125.423552 0.000000 127.888477 0.000000 136.160979 0.000000
1 -0.315831 16.363974 -2.083329 22.443628 -2.166950 15.026253 0.110502 ... -26.613220 8.454297 -35.000742 11.871405 -24.914035 7.448329 -16.370041
2 -1.863497 10.672129 -6.152232 15.980813 -5.679352 18.976117 -5.775777 ... -11.131600 -18.990022 -9.520732 -11.947319 -4.641286 -17.104710 -5.691642
3 -6.749938 14.870590 -12.222749 15.012352 -10.501423 9.345518 -9.103459 ... -2.860546 -29.862724 -5.237663 -28.791194 -5.685985 -24.565608 -10.385683
4 -2.991405 -10.332938 -4.097638 -10.204587 -12.056221 -5.684882 -12.861357 ... 0.821902 -8.787235 -1.521650 -3.798446 -2.390519 -6.527762 -1.145998
I have to convert the above dataframe so that the values in the "_real" columns come under one column and the values in the "_imag" columns come under another column.
That is, in total there should be two columns at the end, one for real and the other for imag. What would be the most efficient way to do it?
I referred to this link, but it is good for one column, and I need two.
Another idea I had was to use a regex to select the columns containing "real" and proceed as in that link (and similarly for imag), but that felt a bit roundabout.
Any help appreciated.
EDIT:
For example, real should be like
real
112.653120
-0.315831
-1.863497
-6.749938
-2.991405
---------
117.104887
-2.083329
-6.152232
-12.222749
-4.097638
---------
127.593406
-2.166950
-5.679352
-10.501423
-12.056221
I have added a dashed line between blocks to make it clear
Create a MultiIndex by splitting the column names, so the data can be reshaped with DataFrame.stack:
df.columns = df.columns.str.split('_', expand=True)
print (df.head(10))
8964 8965 8966 \
real imag real imag real imag
0 112.653120 0.000000 117.104887 0.000000 127.593406 0.000000
1 -0.315831 16.363974 -2.083329 22.443628 -2.166950 15.026253
2 -1.863497 10.672129 -6.152232 15.980813 -5.679352 18.976117
3 -6.749938 14.870590 -12.222749 15.012352 -10.501423 9.345518
4 -2.991405 -10.332938 -4.097638 -10.204587 -12.056221 -5.684882
8967 8984 8985 8986 \
real imag real imag real imag
0 129.522106 0.000000 125.423552 0.000000 127.888477 0.000000
1 0.110502 -26.613220 8.454297 -35.000742 11.871405 -24.914035
2 -5.775777 -11.131600 -18.990022 -9.520732 -11.947319 -4.641286
3 -9.103459 -2.860546 -29.862724 -5.237663 -28.791194 -5.685985
4 -12.861357 0.821902 -8.787235 -1.521650 -3.798446 -2.390519
8987
real imag
0 136.160979 0.000000
1 7.448329 -16.370041
2 -17.104710 -5.691642
3 -24.565608 -10.385683
4 -6.527762 -1.145998
df = df.stack(0).reset_index(level=0, drop=True).rename_axis('a').reset_index()
print (df.head(10))
a imag real
0 8964 0.000000 112.653120
1 8965 0.000000 117.104887
2 8966 0.000000 127.593406
3 8967 NaN 129.522106
4 8984 0.000000 NaN
5 8985 0.000000 125.423552
6 8986 0.000000 127.888477
7 8987 0.000000 136.160979
8 8964 16.363974 -0.315831
9 8965 22.443628 -2.083329
EDIT: For the new structure of the data, it is possible to reshape the values with ravel:
import numpy as np

a = df.filter(like='real')
b = df.filter(like='imag')
c = a.columns.str.replace('_real', '').astype(int)
print (c)
Int64Index([8964, 8965, 8966, 8967, 8985, 8986, 8987], dtype='int64')
df = pd.DataFrame({'r':a.T.to_numpy().ravel(), 'i':b.T.to_numpy().ravel()},
index=np.tile(c, len(df)))
print (df.head(10))
r i
8964 112.653120 0.000000
8965 -0.315831 16.363974
8966 -1.863497 10.672129
8967 -6.749938 14.870590
8985 -2.991405 -10.332938
8986 117.104887 0.000000
8987 -2.083329 22.443628
8964 -6.152232 15.980813
8965 -12.222749 15.012352
8966 -4.097638 -10.204587
I have a dataframe:
LF RF LR RR
11 22 33 44
23 43 23 12
33 23 12 43
What I want to accomplish is a calculation: identify which column within each row has the lowest value, and compute that value as a percentage of the mean of the remaining columns.
For example:
Identify the min value in row 1, which is 11, in column LF. The mean of the remaining columns is (22+33+44)/3 = 33. Then we calculate the ratio 11/33 = 0.333.
Expected output:
LF RF LR RR Min_Col dif(%)
11 22 33 44 LF 0.333
23 43 23 12 RR 0.404
33 23 12 43 LR 0.364
a proper way of writing the equation would be:
(min_value)/(sum_rest_of_cols/3)
Note: I need a column that indicates for each row which column has the lowest value. (This is a program to identify problems, so within the error message we want to be able to tell the user which column is causing them.)
EDITED:
My code (df_inter is the original df, which I am slicing with .loc to get only the columns needed for this calculation):
df_exc = df_inter.loc[:,['LF_Strut_Pressure', 'RF_Strut_Pressure', 'LR_Strut_Pressure' ,'RR_Strut_Pressure']]
df_exc['dif(%)'] = df_exc.min(1) * 3 / (df_exc.sum(1) - df_inter.min(1))
df_exc['Min_Col'] = df_exc.iloc[:, :-1].idxmin(1)
print(df_exc)
My Output:
LF_Strut RF_Strut LR_Strut RR_Strut dif(%) Min_Col
truck_id
EX7057 0.000000 0.000000 0.000000 0.000000 0.0000 LF_Strut
EX7105 0.000000 0.000000 0.000000 0.000000 0.0000 LF_Strut
EX7106 0.000000 0.000000 0.000000 0.000000 0.0000 LF_Strut
EX7107 0.000000 0.000000 0.000000 0.000000 0.0000 LF_Strut
TD6510 36588.000000 36587.000 36587.00000 36587.00 0.8204 RF_Strut
TD6511 36986.000000 36989.000 36987.00000 36989.00 0.8220 LF_Strut
TD6512 27704.000000 27705.000 27702.00000 27705.00 0.7757 LR_Strut
The problem is: when doing the calculation for TD6510, ( 36587 / ( (36587 + 36587 + 36588) / 3 ) ) = 0.9999999..., not 0.8204. I tried replicating where 0.8204 came from, but was unsuccessful. Thanks for all the help and support.
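A possible explanation for the 0.8204 discrepancy (my guess, not confirmed in the post): the denominator subtracts df_inter.min(1), the row minimum over all of df_inter's columns, instead of df_exc.min(1), the minimum over just the four strut columns; if df_inter holds a smaller value elsewhere in the row, the denominator grows and the ratio shrinks. A sketch of the corrected line, with a toy single-row frame standing in for df_exc (column names are illustrative):

```python
import pandas as pd

# Toy stand-in for df_exc with the TD6510 values from the question
df_exc = pd.DataFrame({'LF': [36588.0], 'RF': [36587.0],
                       'LR': [36587.0], 'RR': [36587.0]})

# Hypothetical fix: take the row minimum from df_exc itself, not from df_inter
df_exc['dif(%)'] = df_exc.min(axis=1) * 3 / (df_exc.sum(axis=1) - df_exc.min(axis=1))
print(df_exc['dif(%)'].iloc[0])  # ~0.99999, as hand-computed above
```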
Compute the ratio with min and sum, then get the min column with idxmin:
df['dif(%)']=df.min(1)*3/(df.sum(1)-df.min(1))
df['Min_Col']=df.iloc[:,:-1].idxmin(1)
df
LF RF LR RR dif(%) Min_Col
0 11 22 33 44 0.333333 LF
1 23 43 23 12 0.404494 RR
2 33 23 12 43 0.363636 LR
I wrote the text in a file called "textfile.txt". This should be useful:
import pandas as pd
df = pd.read_csv('textfile.txt', sep=' ')
df['min'] = df[['LF','RF','LR','RR']].min(axis=1)
df['sum_3'] = df[['LF','RF','LR','RR']].sum(axis=1)- df['min']
df['sum_3_div3'] = df['sum_3']/3
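The snippet above stops short of the two columns the question actually asked for; completing it under the same column names (the final two lines are my addition, not part of that answer):

```python
import pandas as pd

# Sample data from the question, instead of reading from textfile.txt
df = pd.DataFrame({'LF': [11, 23, 33], 'RF': [22, 43, 23],
                   'LR': [33, 23, 12], 'RR': [44, 12, 43]})
df['min'] = df[['LF', 'RF', 'LR', 'RR']].min(axis=1)
df['sum_3'] = df[['LF', 'RF', 'LR', 'RR']].sum(axis=1) - df['min']
df['sum_3_div3'] = df['sum_3'] / 3

# Completing the calculation: the ratio and the name of the minimum column
df['dif(%)'] = df['min'] / df['sum_3_div3']
df['Min_Col'] = df[['LF', 'RF', 'LR', 'RR']].idxmin(axis=1)
```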
You can just do the usual calculation; the min column is given by idxmin:
# find the mins in each row
mins = df.min(axis=1)
# compute mean of the other values
other_means = (df.sum(1) - mins).div(df.shape[1]-1)
(mins / other_means) * 100
Output:
0 33.333333
1 40.449438
2 36.363636
dtype: float64
Using idxmin and df.mask() with df.isin() and df.min():
final = df.assign(Min_Col=df.idxmin(1),
Diff=df.min(1).div(df.mask(df.isin(df.min(1))).mean(1)))
print(final)
LF RF LR RR Min_Col Diff
0 11 22 33 44 LF 0.333333
1 23 43 23 12 RR 0.404494
2 33 23 12 43 LR 0.363636
I am running a loop that appends three fields. predictfinal is a list, though it need not be one.
predictfinal.append(y_hat_orig[0])
predictfinal.append(mape)
predictfinal.append(length)
At the end, predictfinal is a long flat list. But I really want to turn the list into a DataFrame where each row has 3 columns. However, the list does not distinguish between the 3 columns; it's just a long list with commas in between. Somehow I need to slice predictfinal into 3 columns and build a DataFrame from the current unstructured list. Any help on how?
predictfinal
Out[88]:
[1433.0459967608983,
1.6407741379111223,
23,
1433.6389125340916,
1.6474721044455922,
22,
1433.867408791692,
1.6756763089082383,
21,
1433.8484984008207,
1.6457581105556003,
20,
1433.6340460965778,
1.6380908467895527,
19,
1437.0294365907992,
1.6147672264908473,
18,
1439.7485102740507,
1.5010415925555876,
17,
1440.950406295299,
1.433891246672529,
16,
1434.837060644701,
1.5252803314930383,
15,
1434.9716303636983,
1.6125952442799232,
14,
1441.3153523102953,
3.2633984339696185,
13,
1435.6932462859334,
3.2703435261200497,
12,
1419.9057834496082,
1.9100005818319687,
11,
1426.0739741342488,
1.947684057178654,
10]
Based on https://stackoverflow.com/a/48347320/6926444
We can achieve it by using zip() and iter(). The code below consumes three elements at a time.
res = pd.DataFrame(list(zip(*([iter(data)] * 3))), columns=['a', 'b', 'c'])
Result:
a b c
0 1433.045997 1.640774 23
1 1433.638913 1.647472 22
2 1433.867409 1.675676 21
3 1433.848498 1.645758 20
4 1433.634046 1.638091 19
5 1437.029437 1.614767 18
6 1439.748510 1.501042 17
7 1440.950406 1.433891 16
8 1434.837061 1.525280 15
9 1434.971630 1.612595 14
10 1441.315352 3.263398 13
11 1435.693246 3.270344 12
12 1419.905783 1.910001 11
13 1426.073974 1.947684 10
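As a side note, the shared-iterator trick above can be seen in isolation with plain Python: zip pulls from the same iterator once per slot, so each output tuple consumes three consecutive items.

```python
# Flat list with 3 fields per logical record (values shortened for clarity)
data = [1433.05, 1.64, 23, 1433.64, 1.65, 22]

it = iter(data)               # one iterator shared by all three zip slots
rows = list(zip(it, it, it))  # each tuple pulls three consecutive items
print(rows)                   # [(1433.05, 1.64, 23), (1433.64, 1.65, 22)]
```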
You could do:
import numpy as np

pd.DataFrame(np.array(predictfinal).reshape(-1, 3), columns=['origin', 'mape', 'length'])
Output:
origin mape length
0 1433.045997 1.640774 23.0
1 1433.638913 1.647472 22.0
2 1433.867409 1.675676 21.0
3 1433.848498 1.645758 20.0
4 1433.634046 1.638091 19.0
5 1437.029437 1.614767 18.0
6 1439.748510 1.501042 17.0
7 1440.950406 1.433891 16.0
8 1434.837061 1.525280 15.0
9 1434.971630 1.612595 14.0
10 1441.315352 3.263398 13.0
11 1435.693246 3.270344 12.0
12 1419.905783 1.910001 11.0
13 1426.073974 1.947684 10.0
Or you can also modify your loop:
predictfinal = []
for i in some_list:
    predictfinal.append([y_hat_orig[0], mape, length])
# output dataframe
pd.DataFrame(predictfinal, columns=['origin', 'mape', 'length'])
Here is a pandas solution
s = pd.Series(l)
s.index = pd.MultiIndex.from_product([range(len(l) // 3), ['origin', 'map', 'len']])
s = s.unstack()
Out[268]:
len map origin
0 23.0 1.640774 1433.045997
1 22.0 1.647472 1433.638913
2 21.0 1.675676 1433.867409
3 20.0 1.645758 1433.848498
4 19.0 1.638091 1433.634046
5 18.0 1.614767 1437.029437
6 17.0 1.501042 1439.748510
7 16.0 1.433891 1440.950406
8 15.0 1.525280 1434.837061
9 14.0 1.612595 1434.971630
10 13.0 3.263398 1441.315352
11 12.0 3.270344 1435.693246
12 11.0 1.910001 1419.905783
13 10.0 1.947684 1426.073974
I have a pandas dataframe like the following:
df = pd.DataFrame({'x': np.random.rand(61800), 'y':np.random.rand(61800), 'z':np.random.rand(61800)})
I need to aggregate my dataset to get the following result:
extract = df.assign(count=np.repeat(range(10),10)).groupby('count',as_index=False).agg(['mean','min', 'max'])
But if I use np.repeat(range(150), 150) I receive this error:
This doesn't work because the .assign you're performing needs to have enough values to fit the original dataframe:
In [81]: df = pd.DataFrame({'x': np.random.rand(61800), 'y':np.random.rand(61800), 'z':np.random.rand(61800)})
In [82]: df.assign(count=np.repeat(range(10),10))
ValueError: Length of values does not match length of index
In this case, everything works fine if we do 10 groups repeated 6,180 times:
In [83]: df.assign(count=np.repeat(range(10),6180))
Out[83]:
x y z count
0 0.781364 0.996545 0.756592 0
1 0.609127 0.981688 0.626721 0
2 0.547029 0.167678 0.198857 0
3 0.184405 0.484623 0.219722 0
4 0.451698 0.535085 0.045942 0
... ... ... ... ...
61795 0.783192 0.969306 0.974836 9
61796 0.890720 0.286384 0.744779 9
61797 0.512688 0.945516 0.907192 9
61798 0.526564 0.165620 0.766733 9
61799 0.683092 0.976219 0.524048 9
[61800 rows x 4 columns]
In [84]: extract = df.assign(count=np.repeat(range(10),6180)).groupby('count',as_index=False).agg(['mean','min', 'max'])
In [85]: extract
Out[85]:
x y z
mean min max mean min max mean min max
count
0 0.502338 0.000230 0.999546 0.501603 0.000263 0.999842 0.503807 0.000113 0.999826
1 0.500392 0.000059 0.999979 0.499935 0.000012 0.999767 0.500114 0.000230 0.999811
2 0.498377 0.000023 0.999832 0.496921 0.000003 0.999475 0.502887 0.000028 0.999828
3 0.504970 0.000637 0.999680 0.500943 0.000256 0.999902 0.497370 0.000257 0.999969
4 0.501195 0.000290 0.999992 0.498617 0.000149 0.999779 0.497895 0.000022 0.999877
5 0.499476 0.000186 0.999956 0.503227 0.000308 0.999907 0.504688 0.000100 0.999756
6 0.495488 0.000378 0.999606 0.499893 0.000119 0.999740 0.495924 0.000031 0.999556
7 0.498443 0.000005 0.999417 0.495728 0.000262 0.999972 0.501255 0.000087 0.999978
8 0.494110 0.000014 0.999888 0.495197 0.000074 0.999970 0.493215 0.000166 0.999718
9 0.496333 0.000365 0.999307 0.502074 0.000110 0.999856 0.499164 0.000035 0.999927
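To get the 150 groups the asker wanted (my extrapolation from the answer, not stated in it), the repeat count just has to be the dataframe length divided by the number of groups, i.e. 61800 // 150 = 412:

```python
import numpy as np

n_rows, n_groups = 61800, 150
reps = n_rows // n_groups                 # 412 rows per group
labels = np.repeat(range(n_groups), reps)
print(len(labels))                        # 61800, matches the dataframe length
# df.assign(count=labels).groupby('count', as_index=False).agg(['mean', 'min', 'max'])
```

This only divides evenly because 150 divides 61800; otherwise the last group would need padding or trimming.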