I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
A 25-07-19 6 435 416 820
A 26-07-19 8 590 455 342
A 27-07-19 6 763 476 753
A 02-08-19 6 717 211 454
A 03-08-19 6 152 442 475
A 05-08-19 6 564 340 302
A 07-08-19 6 105 929 633
A 08-08-19 6 948 366 586
B 07-08-19 4 509 690 406
B 08-08-19 2 413 725 414
B 12-08-19 2 170 702 912
B 13-08-19 3 851 616 477
B 14-08-19 9 475 447 555
B 15-08-19 1 412 403 708
B 17-08-19 2 299 537 321
B 18-08-19 4 310 119 125
I want to show the mean value of the last n days (based on the Date column), excluding the value of the current day.
I'm using this code; what should I do to fix it?
n = 4
cols = list(df.filter(regex='Var').columns)
df = df.set_index('Date')
df[cols] = (df.groupby('ID').rolling(window=f'{n}D')[cols].mean()
.reset_index(0,drop=True).add_suffix(f'_{n}'))
df.reset_index(inplace=True)
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Var_4 456_Var_4 789_Var_4
A 16-07-19 3 777 250 810 NaN NaN NaN
A 17-07-19 9 637 121 529 777.000000 250.000000 810.0
A 20-07-19 2 295 272 490 707.000000 185.500000 669.5
A 21-07-19 3 778 600 544 466.000000 196.500000 509.5
A 22-07-19 6 741 792 907 536.500000 436.000000 517.0
A 25-07-19 6 435 416 820 759.500000 696.000000 725.5
A 26-07-19 8 590 455 342 588.000000 604.000000 863.5
A 27-07-19 6 763 476 753 512.500000 435.500000 581.0
A 02-08-19 6 717 211 454 NaN NaN NaN
A 03-08-19 6 152 442 475 717.000000 211.000000 454.0
A 05-08-19 6 564 340 302 434.500000 326.500000 464.5
A 07-08-19 6 105 929 633 358.000000 391.000000 388.5
A 08-08-19 6 948 366 586 334.500000 634.500000 467.5
B 07-08-19 4 509 690 406 NaN NaN NaN
B 08-08-19 2 413 725 414 509.000000 690.000000 406.0
B 12-08-19 2 170 702 912 413.000000 725.000000 414.0
B 13-08-19 3 851 616 477 291.500000 713.500000 663.0
B 14-08-19 9 475 447 555 510.500000 659.000000 694.5
B 15-08-19 1 412 403 708 498.666667 588.333333 648.0
B 17-08-19 2 299 537 321 579.333333 488.666667 580.0
B 18-08-19 4 310 119 125 395.333333 462.333333 528.0
Note: the dataframe in the question has changed, and the answer below uses n = 5.
I adapted unutbu's solution to work with rolling windows: the rolling sum is recovered as count * mean, the current value is subtracted, and the result is divided by count - 1:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
n = 5
cols = df.filter(regex='Var').columns
df = df.set_index('Date')
df_ = df.set_index('ID', append=True).swaplevel(1, 0)
df1 = df.groupby('ID').rolling(window=f'{n}D')[cols].count()  # values per window
df2 = df.groupby('ID').rolling(window=f'{n}D')[cols].mean()   # mean per window
df3 = (df1.mul(df2)                 # rolling sum = count * mean
          .sub(df_[cols])           # remove the current day's value
          .div(df1[cols].sub(1))    # mean of the remaining count - 1 values
          .add_suffix(f'_{n}'))
df4 = df_.join(df3)
print(df4)
X 123_Var 456_Var 789_Var 123_Var_5 456_Var_5 789_Var_5
ID Date
A 2019-07-16 3 777 250 810 NaN NaN NaN
2019-07-17 9 637 121 529 777.000000 250.000000 810.0
2019-07-20 2 295 272 490 707.000000 185.500000 669.5
2019-07-21 3 778 600 544 466.000000 196.500000 509.5
2019-07-22 6 741 792 907 536.500000 436.000000 517.0
2019-07-25 6 435 416 820 759.500000 696.000000 725.5
2019-07-26 8 590 455 342 588.000000 604.000000 863.5
2019-07-27 6 763 476 753 512.500000 435.500000 581.0
2019-08-02 6 717 211 454 NaN NaN NaN
2019-08-03 6 152 442 475 717.000000 211.000000 454.0
2019-08-05 6 564 340 302 434.500000 326.500000 464.5
2019-08-07 6 105 929 633 358.000000 391.000000 388.5
2019-08-08 6 948 366 586 334.500000 634.500000 467.5
B 2019-08-07 4 509 690 406 NaN NaN NaN
2019-08-08 2 413 725 414 509.000000 690.000000 406.0
2019-08-12 2 170 702 912 413.000000 725.000000 414.0
2019-08-13 3 851 616 477 170.000000 702.000000 912.0
2019-08-14 9 475 447 555 510.500000 659.000000 694.5
2019-08-15 1 412 403 708 498.666667 588.333333 648.0
2019-08-17 2 299 537 321 579.333333 488.666667 580.0
2019-08-18 4 310 119 125 395.333333 462.333333 528.0
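For reference, the identity behind df3: if a window holds n values with mean m and the current row's value is x, the mean of the other n - 1 values is (n * m - x) / (n - 1). A quick sanity check against group B's 13-08 row of the n = 4 expected output above (413 and 170 are the prior values in that window, 851 the current one):
window = [413, 170, 851]
n, m, x = len(window), sum(window) / len(window), 851
print((n * m - x) / (n - 1))   # 291.5, matching 123_Var_4 for B on 13-08-19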
I have a multi-index dataframe (df):
contract A B Total
sex Male Female Male Female TotalMale TotalFemale
grade2
B1 948 467 408 835 1356 1302
B2 184 863 515 359 699 1222
B3 241 351 907 360 1148 711
B4 472 175 809 555 1281 730
B5 740 563 606 601 1346 1164
B6 435 780 295 392 730 1172
Total 3020 3199 3540 3102 6560 6301
I am trying to drop all indexes so my output is:
0 1 2 3 4 5
0 948 467 408 835 1356 1302
1 184 863 515 359 699 1222
2 241 351 907 360 1148 711
3 472 175 809 555 1281 730
4 740 563 606 601 1346 1164
5 435 780 295 392 730 1172
6 3020 3199 3540 3102 6560 6301
I have tried:
df = df.reset_index()
and
df = df.reset_index(drop=True)
without success.
Try building a new DataFrame:
df = pd.DataFrame(df.to_numpy())
You can use set_axis for the columns:
df.set_axis(range(df.shape[1]), axis=1).reset_index(drop=True)
If you need to use it in a pipeline, combine it with pipe:
(df
.pipe(lambda d: d.set_axis(range(d.shape[1]), axis=1))
.reset_index(drop=True)
)
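As a quick sanity check, both answers produce the same flat frame; here is a comparison on a small hypothetical two-level frame (names invented for illustration):
import pandas as pd
cols = pd.MultiIndex.from_product([['A', 'B'], ['Male', 'Female']])
d = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=['B1', 'B2'], columns=cols)
flat1 = pd.DataFrame(d.to_numpy())
flat2 = d.set_axis(range(d.shape[1]), axis=1).reset_index(drop=True)
print(flat1.equals(flat2))   # True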
Say that I have a df in the following format:
year 2016 2017 2018 2019 2020 min max avg
month
2021-01-01 284 288 311 383 476 284 476 357.4
2021-02-01 301 315 330 388 441 301 441 359.6
2021-03-01 303 331 341 400 475 303 475 375.4
2021-04-01 283 300 339 419 492 283 492 372.6
2021-05-01 287 288 346 420 445 287 445 359.7
2021-06-01 283 292 340 424 446 283 446 359.1
2021-07-01 294 296 360 444 452 294 452 370.3
2021-08-01 294 315 381 445 451 294 451 375.9
2021-09-01 288 331 405 464 459 288 464 385.6
2021-10-01 327 349 424 457 453 327 457 399.1
2021-11-01 316 351 413 469 471 316 471 401.0
2021-12-01 259 329 384 467 465 259 467 375.7
and I would like to get the difference of the 2020 column by using df['delta'] = df['2020'].diff()
This will obviously return NaN for the first value in the column. How can I make it so that the first diff is computed as the difference between the FIRST value of 2020 and the LAST value of 2019?
If you want it only for 2020:
df["delta"] = pd.concat([df["2019"], df["2020"]]).diff().tail(len(df))
Prints:
year 2016 2017 2018 2019 2020 min max avg delta
0 2021-01-01 284 288 311 383 476 284 476 357.4 9.0
1 2021-02-01 301 315 330 388 441 301 441 359.6 -35.0
2 2021-03-01 303 331 341 400 475 303 475 375.4 34.0
3 2021-04-01 283 300 339 419 492 283 492 372.6 17.0
4 2021-05-01 287 288 346 420 445 287 445 359.7 -47.0
5 2021-06-01 283 292 340 424 446 283 446 359.1 1.0
6 2021-07-01 294 296 360 444 452 294 452 370.3 6.0
7 2021-08-01 294 315 381 445 451 294 451 375.9 -1.0
8 2021-09-01 288 331 405 464 459 288 464 385.6 8.0
9 2021-10-01 327 349 424 457 453 327 457 399.1 -6.0
10 2021-11-01 316 351 413 469 471 316 471 401.0 18.0
11 2021-12-01 259 329 384 467 465 259 467 375.7 -6.0
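To unpack the one-liner: pd.concat stacks the 2019 column on top of the 2020 column, diff then computes the first 2020 difference against the last 2019 value (476 - 467 = 9.0 above), and tail(len(df)) keeps only the 2020 half, which still carries the original index and therefore aligns on assignment. A commented equivalent:
s = pd.concat([df["2019"], df["2020"]])   # 2019 values stacked above 2020
s = s.diff()                              # first 2020 diff uses the last 2019 value
df["delta"] = s.tail(len(df))             # keep the 2020 half; index aligns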
You can try unstack, then do the diff; notice the first item in 2016 will still be NaN:
out = df.drop(['min', 'max', 'avg'], axis=1).unstack().diff().unstack(0)
2016 2017 2018 2019 2020
2021-01-01 NaN 29.0 -18.0 -1.0 9.0
2021-02-01 17.0 27.0 19.0 5.0 -35.0
2021-03-01 2.0 16.0 11.0 12.0 34.0
2021-04-01 -20.0 -31.0 -2.0 19.0 17.0
2021-05-01 4.0 -12.0 7.0 1.0 -47.0
2021-06-01 -4.0 4.0 -6.0 4.0 1.0
2021-07-01 11.0 4.0 20.0 20.0 6.0
2021-08-01 0.0 19.0 21.0 1.0 -1.0
2021-09-01 -6.0 16.0 24.0 19.0 8.0
2021-10-01 39.0 18.0 19.0 -7.0 -6.0
2021-11-01 -11.0 2.0 -11.0 12.0 18.0
2021-12-01 -57.0 -22.0 -29.0 -2.0 -6.0
I am trying to create dataframes from this "master" dataframe based on the unique entries in row 2.
DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
1 DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
2 UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
3
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
7 4/1/2020 872 568 505 652 366 982 159 131 218 961 52 85 679 923
8 5/1/2020 93 58 864 682 346 19 293 19 206 500 793 962 630 413
9 6/1/2020 696 262 833 418 876 695 900 781 179 138 143 526 9 866
10 7/1/2020 810 58 579 244 81 858 362 440 186 425 55 920 345 596
11 8/1/2020 834 609 618 214 547 834 301 875 783 216 834 609 550 274
12 9/1/2020 687 935 976 380 885 246 339 904 627 460 659 352 361 793
13 10/1/2020 596 300 810 248 475 718 350 574 825 804 245 209 212 925
14 11/1/2020 584 984 711 879 916 107 277 412 122 683 151 811 129 4
15 12/1/2020 616 515 101 743 650 526 475 991 796 227 880 692 734 799
16 1/1/2021 106 441 305 964 452 249 282 486 374 620 652 793 115 697
17 2/1/2021 969 504 936 678 67 42 985 791 709 689 520 503 102 731
18 3/1/2021 823 169 412 177 783 601 613 251 533 463 13 127 516 15
19 4/1/2021 348 588 140 966 143 576 419 611 128 830 68 209 952 935
20 5/1/2021 96 711 651 121 708 360 159 229 552 951 79 665 709 165
21 6/1/2021 805 657 729 629 249 547 581 583 236 828 636 248 412 535
22 7/1/2021 286 320 908 765 336 286 148 168 821 567 63 908 248 320
23 8/1/2021 707 975 565 699 47 712 700 439 497 106 288 105 872 158
24 9/1/2021 346 523 142 181 904 266 28 740 125 64 287 707 553 437
25 10/1/2021 245 42 773 591 492 512 846 487 983 180 372 306 785 691
26 11/1/2021 785 577 448 489 425 205 672 358 868 637 104 422 873 919
So the output will look something like this:
df_unit1
DATE PROP1 PROP2
1 DAYS MEAN MEAN
2 UNIT1 UNIT1
3
4 1/1/2020 677 972
5 2/1/2020 515 430
6 3/1/2020 253 174
7 4/1/2020 872 679
8 5/1/2020 93 630
9 6/1/2020 696 9
10 7/1/2020 810 345
11 8/1/2020 834 550
12 9/1/2020 687 361
13 10/1/2020 596 212
14 11/1/2020 584 129
15 12/1/2020 616 734
16 1/1/2021 106 115
17 2/1/2021 969 102
18 3/1/2021 823 516
19 4/1/2021 348 952
20 5/1/2021 96 709
21 6/1/2021 805 412
22 7/1/2021 286 248
23 8/1/2021 707 872
24 9/1/2021 346 553
25 10/1/2021 245 785
26 11/1/2021 785 873
df_unit2
DATE PROP1 PROP2
1 DAYS MEAN MEAN
2 UNIT2 UNIT2
3
4 1/1/2020 92 733
5 2/1/2020 11 272
6 3/1/2020 295 602
7 4/1/2020 568 923
8 5/1/2020 58 413
9 6/1/2020 262 866
10 7/1/2020 58 596
11 8/1/2020 609 274
12 9/1/2020 935 793
13 10/1/2020 300 925
14 11/1/2020 984 4
15 12/1/2020 515 799
16 1/1/2021 441 697
17 2/1/2021 504 731
18 3/1/2021 169 15
19 4/1/2021 588 935
20 5/1/2021 711 165
21 6/1/2021 657 535
22 7/1/2021 320 320
23 8/1/2021 975 158
24 9/1/2021 523 437
25 10/1/2021 42 691
26 11/1/2021 577 919
I have extracted the unique units from row 2:
unitName = pd.Series(pd.Series(df.loc[2, :]).unique(), name="Unit Names")
unitName = unitName.tolist()
Next I was planning to loop through this list of unique units and create a dataframe for each unit:
for unit in unitName:
    df_unit = df.iloc[[df.iloc[2:, :].str.match(unit)], :]
    print(df_unit)
I am getting an error that 'DataFrame' object has no attribute 'str' (the .str accessor exists on a Series, not on a DataFrame). My plan was to match all the cells in row 2 that match a given unit and then extract the entire column for each matched cell.
This response has two parts:
Solution 1: Strip columns based on common name in dataframe
With the assumption that your dataframe columns look as follows:
['DATE DAYS', 'PROP1 MEAN UNIT1', 'PROP1 MEAN UNIT2', 'PROP1 MEAN UNIT3', 'PROP1 MEAN UNIT4', 'PROP1 MEAN UNIT5', 'PROP1 MEAN UNIT6', 'PROP2 MEAN UNIT7', 'PROP2 MEAN UNIT8', 'PROP2 MEAN UNIT3', 'PROP2 MEAN UNIT4', 'PROP2 MEAN UNIT11', 'PROP2 MEAN UNIT12', 'PROP2 MEAN UNIT1', 'PROP2 MEAN UNIT2']
and the first few records of your dataframe look like this:
DATE DAYS PROP1 MEAN UNIT1 ... PROP2 MEAN UNIT1 PROP2 MEAN UNIT2
0 1/1/2020 677 ... 972 733
1 2/1/2020 515 ... 430 272
2 3/1/2020 253 ... 174 602
3 4/1/2020 872 ... 679 923
4 5/1/2020 93 ... 630 413
5 6/1/2020 696 ... 9 866
6 7/1/2020 810 ... 345 596
The following lines of code should give you what you want:
cols = df.columns.tolist()
units = sorted(set(x[x.rfind('UNIT'):] for x in cols[1:]))  # unique unit names
s_units = sorted(cols[1:], key=lambda x: x.split()[2])      # data columns grouped by unit
for i in units:
    unit_sublist = ['DATE DAYS'] + [j for j in s_units if j[-6:].strip() == i]
    print('df_' + i.lower())
    print(df[unit_sublist])
I got the following:
df_unit1
DATE DAYS PROP1 MEAN UNIT1 PROP2 MEAN UNIT1
0 1/1/2020 677 972
1 2/1/2020 515 430
2 3/1/2020 253 174
3 4/1/2020 872 679
4 5/1/2020 93 630
5 6/1/2020 696 9
6 7/1/2020 810 345
df_unit11
DATE DAYS PROP2 MEAN UNIT11
0 1/1/2020 586
1 2/1/2020 123
2 3/1/2020 823
3 4/1/2020 52
4 5/1/2020 793
5 6/1/2020 143
6 7/1/2020 55
df_unit12
DATE DAYS PROP2 MEAN UNIT12
0 1/1/2020 576
1 2/1/2020 36
2 3/1/2020 822
3 4/1/2020 85
4 5/1/2020 962
5 6/1/2020 526
6 7/1/2020 920
df_unit2
DATE DAYS PROP1 MEAN UNIT2 PROP2 MEAN UNIT2
0 1/1/2020 92 733
1 2/1/2020 11 272
2 3/1/2020 295 602
3 4/1/2020 568 923
4 5/1/2020 58 413
5 6/1/2020 262 866
6 7/1/2020 58 596
df_unit3
DATE DAYS PROP1 MEAN UNIT3 PROP2 MEAN UNIT3
0 1/1/2020 342 69
1 2/1/2020 86 441
2 3/1/2020 644 680
3 4/1/2020 505 218
4 5/1/2020 864 206
5 6/1/2020 833 179
6 7/1/2020 579 186
df_unit4
DATE DAYS PROP1 MEAN UNIT4 PROP2 MEAN UNIT4
0 1/1/2020 432 621
1 2/1/2020 754 11
2 3/1/2020 401 729
3 4/1/2020 652 961
4 5/1/2020 682 500
5 6/1/2020 418 138
6 7/1/2020 244 425
df_unit5
DATE DAYS PROP1 MEAN UNIT5
0 1/1/2020 878
1 2/1/2020 219
2 3/1/2020 574
3 4/1/2020 366
4 5/1/2020 346
5 6/1/2020 876
6 7/1/2020 81
df_unit6
DATE DAYS PROP1 MEAN UNIT6
0 1/1/2020 831
1 2/1/2020 818
2 3/1/2020 184
3 4/1/2020 982
4 5/1/2020 19
5 6/1/2020 695
6 7/1/2020 858
df_unit7
DATE DAYS PROP2 MEAN UNIT7
0 1/1/2020 293
1 2/1/2020 822
2 3/1/2020 354
3 4/1/2020 159
4 5/1/2020 293
5 6/1/2020 900
6 7/1/2020 362
df_unit8
DATE DAYS PROP2 MEAN UNIT8
0 1/1/2020 88
1 2/1/2020 280
2 3/1/2020 12
3 4/1/2020 131
4 5/1/2020 19
5 6/1/2020 781
6 7/1/2020 440
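If you would rather keep the frames than print them, one variation (a sketch under the same column-name assumption) collects them in a dict keyed by unit; matching on the last whitespace-separated token also sidesteps any UNIT1 / UNIT11 prefix confusion:
frames = {u: df[['DATE DAYS'] + [c for c in cols[1:] if c.split()[-1] == u]]
          for u in units}
print(frames['UNIT1'])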
Solution 2: Create column names based on the first 3 rows in the source data
Let us assume the first 6 rows of your dataframe look like this.
DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
Then you can write the code below to create the dataframe.
data = '''DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
7 4/1/2020 872 568 505 652 366 982 159 131 218 961 52 85 679 923
8 5/1/2020 93 58 864 682 346 19 293 19 206 500 793 962 630 413
9 6/1/2020 696 262 833 418 876 695 900 781 179 138 143 526 9 866
10 7/1/2020 810 58 579 244 81 858 362 440 186 425 55 920 345 596
11 8/1/2020 834 609 618 214 547 834 301 875 783 216 834 609 550 274
12 9/1/2020 687 935 976 380 885 246 339 904 627 460 659 352 361 793
13 10/1/2020 596 300 810 248 475 718 350 574 825 804 245 209 212 925
14 11/1/2020 584 984 711 879 916 107 277 412 122 683 151 811 129 4
15 12/1/2020 616 515 101 743 650 526 475 991 796 227 880 692 734 799
16 1/1/2021 106 441 305 964 452 249 282 486 374 620 652 793 115 697
17 2/1/2021 969 504 936 678 67 42 985 791 709 689 520 503 102 731
18 3/1/2021 823 169 412 177 783 601 613 251 533 463 13 127 516 15
19 4/1/2021 348 588 140 966 143 576 419 611 128 830 68 209 952 935
20 5/1/2021 96 711 651 121 708 360 159 229 552 951 79 665 709 165
21 6/1/2021 805 657 729 629 249 547 581 583 236 828 636 248 412 535
22 7/1/2021 286 320 908 765 336 286 148 168 821 567 63 908 248 320
23 8/1/2021 707 975 565 699 47 712 700 439 497 106 288 105 872 158
24 9/1/2021 346 523 142 181 904 266 28 740 125 64 287 707 553 437
25 10/1/2021 245 42 773 591 492 512 846 487 983 180 372 306 785 691
26 11/1/2021 785 577 448 489 425 205 672 358 868 637 104 422 873 919'''
data_list = data.split('\n')
data_line1 = data_list[0].split()          # PROP row
data_line2 = data_list[1].split()          # MEAN row
data_line3 = [''] + data_list[2].split()   # UNIT row (no entry under DATE)
data_header = [' '.join([data_line1[i], data_line2[i], data_line3[i]])
               for i in range(len(data_line1))]
data_header[0] = data_header[0][:-1]       # trim the trailing space on 'DATE DAYS'
new_data = data_list[3:]
import pandas as pd
df = pd.DataFrame(data=None, columns=data_header)
for i in range(len(new_data)):             # one frame row per data line
    df.loc[i] = new_data[i].split()[1:]    # drop the leading row number
print(df)
Here is what worked for me.
# Assign unique column names to the dataframe
df.columns = range(df.shape[1])
# Get all the unique units in the dataframe
unitName = pd.Series(df.loc[2, :].unique(), name="Unit Names")
# Convert them to a list to loop through
unitName = unitName.tolist()
for var in unitName:
    # look for an exact match for the unit in row 2 and
    # extract every column with a match
    df_item = df[df.columns[df.loc[2].str.fullmatch(var)]]
    print(df_item)
I have a dataset with an uneven sample frequency, as seen in this subset:
time date x y id nn1 nn2
0 2019-09-17 08:43:06 234 236 4909 22.02271554554524 38.2099463490856
0 2019-09-17 08:43:06 251 222 4911 22.02271554554524 46.57252408878007
1 2019-09-17 08:43:07 231 244 4909 30.4138126514911 41.617304093369626
1 2019-09-17 08:43:07 252 222 4911 30.4138126514911 46.57252408878007
1 2019-09-17 08:43:07 207 210 4900 41.617304093369626 46.57252408878007
2 2019-09-17 08:43:08 234 250 4909 33.28663395418648 48.82622246293481
2 2019-09-17 08:43:08 206 210 4900 47.53945729601885 48.82622246293481
3 2019-09-17 08:43:09 252 222 4911 38.28837943815329 47.53945729601885
3 2019-09-17 08:43:09 206 210 4900 40.718546143004666 47.53945729601885
3 2019-09-17 08:43:09 223 247 4909 38.28837943815329 40.718546143004666
4 2019-09-17 08:43:10 206 210 4900 35.4682957019364 47.53945729601885
4 2019-09-17 08:43:10 229 237 4909 27.459060435491963 35.4682957019364
4 2019-09-17 08:43:10 252 222 4911 27.459060435491963 47.53945729601885
5 2019-09-17 08:43:12 226 241 4909 30.805843601498726 38.01315561749642
5 2019-09-17 08:43:12 251 223 4911 30.805843601498726 44.94441010848846
5 2019-09-17 08:43:12 209 207 4900 38.01315561749642 44.94441010848846
6 2019-09-17 08:43:13 251 222 4911 34.20526275297414 44.598206241955516
6 2019-09-17 08:43:13 224 243 4909 34.20526275297414 39.0
6 2019-09-17 08:43:13 209 207 4900 39.0 44.598206241955516
7 2019-09-17 08:43:14 251 222 4911 33.421549934136806 45.5411901469428
7 2019-09-17 08:43:14 225 243 4909 33.421549934136806 39.81205847478876
8 2019-09-17 08:43:15 225 245 4909 34.713109915419565 41.23105625617661
8 2019-09-17 08:43:15 209 207 4900 41.23105625617661 44.598206241955516
8 2019-09-17 08:43:15 251 222 4911 34.713109915419565 44.598206241955516
9 2019-09-17 08:43:16 209 207 4900 37.20215047547655 48.46648326421054
9 2019-09-17 08:43:16 254 225 4911 25.942243542145693 48.46648326421054
10 2019-09-17 08:43:18 206 207 4900 41.182520563948 67.26812023536856
10 2019-09-17 08:43:18 242 227 4909 30.805843601498726 41.182520563948
10 2019-09-17 08:43:18 272 220 4911 30.805843601498726 67.26812023536856
I want to reshape the data set into even 0.25-second intervals (increasing the sample frequency to 4 fps) and fill the NaN values with the average values of the given second. I'm failing at the interpolating and reshaping; can anyone help? Also, the id has to stay the same. I deeply appreciate it!
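A minimal sketch of one possible approach, assuming the frame is named df, the numeric columns are those in the subset above, and linear interpolation (or a per-second mean fill) is acceptable:
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
num_cols = ['x', 'y', 'nn1', 'nn2']

def upsample(g):
    g = g.resample('250ms').mean()   # even 0.25 s grid within each id
    return g.interpolate()           # or fill gaps from a per-second mean instead

out = (df.set_index('date')
         .groupby('id')[num_cols]
         .apply(upsample)
         .reset_index())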
I am trying to concat two dataframes:
DataFrame 1 'AB1'
AB_BH AB_CA
Date
2007-01-05 305 324
2007-01-12 427 435
2007-01-19 481 460
2007-01-26 491 506
2007-02-02 459 503
2007-02-09 459 493
2007-02-16 450 486
DataFrame 2 'ABFluid'
Obj Total Rigs
Date
2007-01-03 312
2007-01-09 412
2007-01-16 446
2007-01-23 468
2007-01-30 456
2007-02-06 465
2007-02-14 456
2007-02-20 435
2007-02-27 440
Using the following code:
rigdata = pd.concat([AB1, ABFluid['Total Rigs']], axis=1)
Which results in this:
AB_BH AB_CA Total Rigs
Date
2007-01-03 NaN NaN 312
2007-01-05 305 324 NaN
2007-01-09 NaN NaN 412
2007-01-12 427 435 NaN
2007-01-16 NaN NaN 446
2007-01-19 481 460 NaN
2007-01-23 NaN NaN 468
2007-01-26 491 506 NaN
But I am looking to force the 'Total Rigs' dataframe to have the same dates as the AB1 frame like this:
AB_BH AB_CA Total Rigs
Date
2007-01-03 305 324 312
2007-01-12 427 435 412
2007-01-19 481 460 446
2007-01-26 491 506 468
Which is just aligning them column-wise and re-indexing the dates.
Any suggestions?
You could do ABFluid.index = AB1.index before the concat, to make the second DataFrame have the same index as the first.
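A sketch of that suggestion; note it assumes both frames have the same number of rows, so the longer ABFluid would need to be trimmed first:
ABFluid = ABFluid.iloc[:len(AB1)]   # drop the extra trailing rows
ABFluid.index = AB1.index           # adopt AB1's dates
rigdata = pd.concat([AB1, ABFluid['Total Rigs']], axis=1)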