How to select MultiIndex rows? - python

Let's say I have a MultiIndex dataframe like the one below.
ROW_ID HADM_ID ICUSTAY_ID
SUBJECT_ID CHARTTIME
23 2157-10-21 12:05:00 1 124321 234044.0
2157-10-21 14:00:00 30 124321 234044.0
2157-10-21 19:00:00 77 124321 234044.0
2157-10-22 00:00:00 148 124321 234044.0
2157-10-22 04:00:00 197 124321 234044.0
2157-10-22 08:00:00 226 124321 234044.0
2157-10-22 16:00:00 320 124321 234044.0
34 2191-02-23 08:00:00 367 144319 290505.0
2191-02-23 12:00:00 450 144319 290505.0
2191-02-23 15:00:00 476 144319 290505.0
2191-02-23 20:00:00 511 144319 290505.0
2191-02-24 00:00:00 538 144319 290505.0
2191-02-24 04:00:00 567 144319 290505.0
2191-02-24 07:00:00 608 144319 290505.0
2191-02-24 12:00:00 648 144319 290505.0
36 2134-05-12 07:00:00 685 165660 241249.0
2134-05-12 12:00:00 787 165660 241249.0
2134-05-12 16:00:00 855 165660 241249.0
2134-05-12 20:00:00 924 165660 241249.0
2134-05-13 00:00:00 988 165660 241249.0
SUBJECT_ID and CHARTTIME form the MultiIndex. Now I want to get, for every SUBJECT_ID, the row with its first CHARTTIME. So the expected output is:
ROW_ID HADM_ID ICUSTAY_ID
SUBJECT_ID CHARTTIME
23 2157-10-21 12:05:00 1 124321 234044.0
34 2191-02-23 08:00:00 367 144319 290505.0
36 2134-05-12 07:00:00 685 165660 241249.0
I have tried iloc and xs, but they did not work. Any help would be appreciated.

If you want to group by an index level, you must pass the level parameter instead of the by parameter:
df = df.reset_index('CHARTTIME')
df = df.groupby(level=['SUBJECT_ID']).first().set_index('CHARTTIME', append=True)
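A possible alternative, sketched here on a small made-up frame that only mimics the layout in the question: sort the index and take the first row of each SUBJECT_ID group with head(1). This keeps the MultiIndex intact, so no reset_index/set_index round trip is needed. The sample values below are illustrative, not the asker's full data.

```python
import pandas as pd

# Small illustrative frame with the same index layout as in the question
idx = pd.MultiIndex.from_tuples(
    [
        (23, pd.Timestamp("2157-10-21 12:05:00")),
        (23, pd.Timestamp("2157-10-21 14:00:00")),
        (34, pd.Timestamp("2191-02-23 08:00:00")),
        (34, pd.Timestamp("2191-02-23 12:00:00")),
    ],
    names=["SUBJECT_ID", "CHARTTIME"],
)
df = pd.DataFrame({"ROW_ID": [1, 30, 367, 450]}, index=idx)

# Sort so the earliest CHARTTIME comes first within each SUBJECT_ID,
# then keep the first row of every group; the MultiIndex is preserved
first_rows = df.sort_index().groupby(level="SUBJECT_ID").head(1)
print(first_rows)
```

Unlike first(), which takes the first non-null value per column, head(1) returns the actual first row of each group.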

Related

Aggregate time series data on weekly basis

I have a dataframe that consists of 3 years of data and two columns remaining useful life and predicted remaining useful life.
I am aggregating rul and pred_rul of 3 years data for each machineID for the maximum date they have. The original dataframe looks like this-
rul pred_diff machineID datetime
10476749 870 312.207825 408 2021-05-25 00:00:00
11452943 68 288.517578 447 2023-03-01 12:00:00
12693829 381 273.159698 493 2021-09-16 16:00:00
3413787 331 291.326416 133 2022-10-26 12:00:00
464093 77 341.506195 19 2023-10-10 16:00:00
... ... ... ... ...
11677555 537 310.586090 456 2022-04-07 00:00:00
2334804 551 289.307129 92 2021-09-04 20:00:00
5508311 35 293.721771 214 2023-01-06 04:00:00
12319704 348 322.199219 479 2021-11-11 20:00:00
4777501 87 278.089417 186 2021-06-29 12:00:00
1287421 rows × 4 columns
And I am aggregating it based on this code-
y_test_grp = y_test.groupby('machineID').agg({'datetime':'max', 'rul':'mean', 'pred_diff':'mean'})[['datetime','rul', 'pred_diff']].reset_index()
which gives the following output-
machineID datetime rul pred_diff
0 1 2023-10-03 20:00:00 286.817681 266.419401
1 2 2023-11-14 00:00:00 225.561953 263.372531
2 3 2023-10-25 00:00:00 304.736237 256.933351
3 4 2023-01-13 12:00:00 204.084899 252.476066
4 5 2023-09-07 00:00:00 208.702431 252.487156
... ... ... ... ...
495 496 2023-10-11 00:00:00 302.445285 298.836798
496 497 2023-08-26 04:00:00 281.601613 263.479885
497 498 2023-11-28 04:00:00 292.593906 263.985034
498 499 2023-06-29 20:00:00 260.887529 263.494844
499 500 2023-11-08 20:00:00 160.223614 257.326034
500 rows × 4 columns
Since this is grouped by machineID, it gives just 500 rows, which is too few. I want to aggregate rul and pred_rul on a weekly basis so that for each machineID I get 52 weeks * 3 years = 156 rows. I cannot work out which function to use to take 7 days as the interval and aggregate rul and pred_rul over it.
You can use Grouper:
y_test.groupby(['machineID', pd.Grouper(key='datetime', freq='7D')]).mean()
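As a rough illustration of the Grouper approach, here is a minimal sketch on a tiny made-up frame that uses the column names from the question (machineID, datetime, rul, pred_diff). The values are invented for the example. Note that only the machine/week combinations that actually contain data appear in the result, so the row count per machine may be lower than 156.

```python
import pandas as pd

# Tiny invented sample with the question's column names
df = pd.DataFrame({
    "machineID": [1, 1, 1, 2, 2],
    "datetime": pd.to_datetime(
        ["2021-01-01", "2021-01-03", "2021-01-09", "2021-01-02", "2021-01-20"]
    ),
    "rul": [10.0, 20.0, 30.0, 40.0, 50.0],
    "pred_diff": [1.0, 3.0, 5.0, 7.0, 9.0],
})

# Group by machine and by 7-day bins of the datetime column,
# then average the two value columns within each bin
weekly = (
    df.groupby(["machineID", pd.Grouper(key="datetime", freq="7D")])[["rul", "pred_diff"]]
      .mean()
      .reset_index()
)
print(weekly)
```

With freq='7D' the bins start from the earliest day in the data; a frequency such as 'W' would instead align the bins to calendar weeks.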

Formatting a file with hours and date in the same column

Our electricity provider seems to think it is fun to make the CSV files they provide difficult to read.
The file contains precise electricity consumption every 30 minutes, but the hours and the dates are in the SAME column. Example:
[EDIT: here is the raw version of the CSV file, my bad]
;
"Récapitulatif de mes puissances atteintes en W";
;
"Date et heure de relève par le distributeur";"Puissance atteinte (W)"
;
"19/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816
;
"18/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816
How can I obtain a nicely formatted file like this:
2022-11-19 00:00:00 2098
2022-11-19 23:30:00 218
2022-11-19 23:00:00 606
etc.
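Okay, I have a crude brute-force solution for you. Don't take it as a coding recommendation, just as something that gets the job done:
import itertools
# every day/month combination of 2022, assuming zero-padded dates like 01/11/2022
dList = [f"{f:02d}/{s:02d}/2022" for f, s in itertools.product(range(1, 32), range(1, 13))]
I assume you have a text file with that data, so I'm just going to use that:
file = 'yourfilename.txt'
# make sure you're running the program in the same directory as the .txt file
with open(file, "r") as f:
    lines = f.readlines()
# drop newlines, quotes and the trailing semicolon on date lines
lines = [line.strip().replace('"', '').rstrip(';') for line in lines]
curD = None
# open the output once, so earlier rows are not overwritten
with open('output.txt', 'w') as g:
    for i in lines:
        if i in dList:
            # a date line: remember it for the rows that follow
            curD = i
        elif curD is not None and ';' in i:
            # a "time;value" line: write it with the current date
            time, value = i.split(';')
            g.write(f'{curD} {time} {value}\n')
Everything is written to a file called output.txt in the same directory; the file is created if it does not exist yet.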
Try:
import pandas as pd
current_date = None
all_data = []
with open("your_file.txt", "r") as f_in:
    # skip first 5 rows (header)
    for _ in range(5):
        next(f_in)
    for row in map(str.strip, f_in):
        row = row.replace('"', "")
        if row == "":
            continue
        if "/" in row:
            current_date = row
        else:
            all_data.append([current_date, *row.split(";")])
df = pd.DataFrame(all_data, columns=["Date", "Time", "Value"])
print(df)
Prints:
Date Time Value
0 19/11/2022; 00:00:00 4494
1 19/11/2022; 23:30:00 1174
2 19/11/2022; 23:00:00 1130
3 19/11/2022; 01:30:00 216
4 19/11/2022; 01:00:00 2672
5 19/11/2022; 00:30:00 2816
6 18/11/2022; 00:00:00 4494
7 18/11/2022; 23:30:00 1174
8 18/11/2022; 23:00:00 1130
9 18/11/2022; 01:30:00 216
10 18/11/2022; 01:00:00 2672
11 18/11/2022; 00:30:00 2816
Using pandas operations, it would look like the following:
data.csv
19/11/2022
00:00:00 2098
23:30:00 218
23:00:00 606
01:30:00 216
01:00:00 2672
00:30:00 2816
18/11/2022
00:00:00 1994
23:30:00 260
23:00:00 732
01:30:00 200
01:00:00 1378
00:30:00 2520
17/11/2022
00:00:00 1830
23:30:00 96
23:00:00 122
01:30:00 694
01:00:00 2950
00:30:00 3062
16/11/2022
00:00:00 2420
23:30:00 678
23:00:00 644
Implementation
import pandas as pd
df = pd.read_csv('data.csv', header=None)
df['amount'] = df[0].apply(lambda item:item.split(' ')[-1] if item.find(':')>0 else None)
df['time'] = df[0].apply(lambda item:item.split(' ')[0] if item.find(':')>0 else None)
df['date'] = df[0].apply(lambda item:item if item.find('/')>0 else None)
df['date'] = df['date'].ffill()
df = df.dropna(subset=['amount'], how='any')
df = df.drop(0, axis=1)
print(df)
output
amount time date
1 2098 00:00:00 19/11/2022
2 218 23:30:00 19/11/2022
3 606 23:00:00 19/11/2022
4 216 01:30:00 19/11/2022
5 2672 01:00:00 19/11/2022
6 2816 00:30:00 19/11/2022
8 1994 00:00:00 18/11/2022
9 260 23:30:00 18/11/2022
10 732 23:00:00 18/11/2022
11 200 01:30:00 18/11/2022
12 1378 01:00:00 18/11/2022
13 2520 00:30:00 18/11/2022
15 1830 00:00:00 17/11/2022
16 96 23:30:00 17/11/2022
17 122 23:00:00 17/11/2022
18 694 01:30:00 17/11/2022
19 2950 01:00:00 17/11/2022
20 3062 00:30:00 17/11/2022
22 2420 00:00:00 16/11/2022
23 678 23:30:00 16/11/2022
24 644 23:00:00 16/11/2022
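To get from separate date and time columns to the single timestamp column the question asks for, one option is to concatenate the two strings and parse them with an explicit day-first format. The sketch below assumes a frame shaped like the output above (string columns amount, time, date); the three rows are copied from it for illustration, and the timestamp column name is made up here.

```python
import pandas as pd

# A few rows shaped like the output above (all columns are strings)
df = pd.DataFrame({
    "amount": ["2098", "218", "1994"],
    "time": ["00:00:00", "23:30:00", "00:00:00"],
    "date": ["19/11/2022", "19/11/2022", "18/11/2022"],
})

# Join date and time, then parse with an explicit day/month/year format
df["timestamp"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M:%S"
)
df["amount"] = df["amount"].astype(int)

result = df[["timestamp", "amount"]]
print(result)
```

Passing the format explicitly avoids any ambiguity between day-first and month-first dates.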

Why do I get this plot with matplotlib.pyplot when I add the date to the x axis?

First data frame:
date time open high low close volume avg
0 2021-05-23 00:00:00 37458.51 38270.64 31111.01 34655.25 217136.046593 NaN
1 2021-05-24 00:00:00 34681.44 39920.00 34031.00 38796.29 161630.893971 NaN
2 2021-05-25 00:00:00 38810.99 39791.77 36419.62 38324.72 111996.228404 NaN
3 2021-05-26 00:00:00 38324.72 40841.00 37800.44 39241.91 104780.773396 NaN
4 2021-05-27 00:00:00 39241.92 40411.14 37134.27 38529.98 86547.158794 NaN
5 2021-05-28 00:00:00 38529.99 38877.83 34684.00 35663.49 135377.629720 NaN
6 2021-05-29 00:00:00 35661.79 37338.58 33632.76 34605.15 112663.092689 NaN
7 2021-05-30 00:00:00 34605.15 36488.00 33379.00 35641.27 73535.386967 NaN
8 2021-05-31 00:00:00 35641.26 37499.00 34153.84 37253.81 94160.735289 NaN
9 2021-01-06 00:00:00 37253.82 37894.81 35666.00 36693.09 81234.663770 NaN
10 2021-02-06 00:00:00 36694.85 38225.00 35920.00 37568.68 67587.372495 NaN
11 2021-03-06 00:00:00 37568.68 39476.00 37170.00 39246.79 75889.106011 NaN
12 2021-04-06 00:00:00 39246.78 39289.07 35555.15 36829.00 91317.799245 NaN
13 2021-05-06 00:00:00 36829.15 37925.00 34800.00 35513.20 70459.621490 NaN
14 2021-06-06 00:00:00 35516.07 36480.00 35222.00 35796.31 47650.206637 NaN
15 2021-07-06 00:00:00 35796.31 36900.00 33300.00 33552.79 77574.952573 NaN
16 2021-08-06 00:00:00 33556.96 34068.01 31000.00 33380.81 123251.189037 NaN
17 2021-09-06 00:00:00 33380.80 37534.79 32396.82 37388.05 136607.597517 NaN
18 2021-10-06 00:00:00 37388.05 38491.00 35782.00 36675.72 109527.284943 NaN
19 2021-11-06 00:00:00 36677.83 37680.40 35936.77 37331.98 78466.005300 NaN
20 2021-12-06 00:00:00 37331.98 37463.63 34600.36 35546.11 87717.549990 NaN
21 2021-06-13 00:00:00 35546.12 39380.00 34757.00 39020.57 86921.025555 NaN
22 2021-06-14 00:00:00 39020.56 41064.05 38730.00 40516.29 108522.391949 NaN
23 2021-06-15 00:00:00 40516.28 41330.00 39506.40 40144.04 80679.622838 NaN
24 2021-06-16 00:00:00 40143.80 40527.14 38116.01 38349.01 87771.976937 NaN
25 2021-06-17 00:00:00 38349.00 39559.88 37365.00 38092.97 79541.307119 NaN
26 2021-06-18 00:00:00 38092.97 38202.84 35129.29 35819.84 95228.042935 NaN
27 2021-06-19 00:00:00 35820.48 36457.00 34803.52 35483.72 68712.449461 NaN
28 2021-06-20 00:00:00 35483.72 36137.72 33336.00 35600.16 89878.170850 NaN
29 2021-06-21 00:00:00 35600.17 35750.00 31251.23 31608.93 168778.873159 NaN
30 2021-06-22 00:00:00 31614.12 33298.78 28805.00 32509.56 204208.179762 NaN
31 2021-06-23 00:00:00 32509.56 34881.00 31683.00 33678.07 126966.100563 NaN
32 2021-06-24 00:00:00 33675.07 35298.00 32286.57 34663.09 86625.804260 NaN
33 2021-06-25 00:00:00 34663.08 35500.00 31275.00 31584.45 116061.130356 NaN
34 2021-06-26 00:00:00 31576.09 32730.00 30151.00 32283.65 107820.375287 NaN
35 2021-06-27 00:00:00 32283.65 34749.00 31973.45 34700.34 96613.244211 NaN
36 2021-06-28 00:00:00 34702.49 35297.71 33862.72 34494.89 82222.267819 NaN
37 2021-06-29 00:00:00 34494.89 36600.00 34225.43 35911.73 90788.796220 NaN
38 2021-06-30 00:00:00 35911.72 36100.00 34017.55 35045.00 77152.197634 NaN
39 2021-01-07 00:00:00 35045.00 35057.57 32711.00 33504.69 71708.266112 15.362372
40 2021-02-07 00:00:00 33502.33 33977.04 32699.00 33786.55 56172.181378 15.386331
41 2021-03-07 00:00:00 33786.54 34945.61 33316.73 34669.13 43044.578641 15.154877
42 2021-04-07 00:00:00 34669.12 35967.85 34357.15 35286.51 43703.475789 14.677524
43 2021-05-07 00:00:00 35288.13 35293.78 33125.55 33690.14 64123.874245 14.486827
44 2021-06-07 00:00:00 33690.15 35118.88 33532.00 34220.01 58210.596349 14.305665
45 2021-07-07 00:00:00 34220.02 35059.09 33777.77 33862.12 53807.521675 14.133561
46 2021-08-07 00:00:00 33862.11 33929.64 32077.00 32875.71 70136.480320 14.336865
47 2021-09-07 00:00:00 32875.71 34100.00 32261.07 33815.81 47153.939899 14.479159
48 2021-10-07 00:00:00 33815.81 34262.00 33004.78 33502.87 34761.175468 14.564313
49 2021-11-07 00:00:00 33502.87 34666.00 33306.47 34258.99 31572.647448 14.517866
50 2021-12-07 00:00:00 34259.00 34678.43 32658.34 33086.63 48181.403762 14.627892
51 2021-07-13 00:00:00 33086.94 33340.00 32202.25 32729.77 41126.361008 14.839689
52 2021-07-14 00:00:00 32729.12 33114.03 31550.00 32820.02 46777.823484 15.192346
53 2021-07-15 00:00:00 32820.03 33185.25 31133.00 31880.00 51639.576353 15.623083
54 2021-07-16 00:00:00 31874.49 32249.18 31020.00 31383.87 48499.864154 16.058731
55 2021-07-17 00:00:00 31383.86 31955.92 31164.31 31520.07 34012.242132 16.472596
56 2021-07-18 00:00:00 31520.07 32435.00 31108.97 31778.56 35923.716186 16.669426
57 2021-07-19 00:00:00 31778.57 31899.00 30407.44 30839.65 47340.468499 17.041150
58 2021-07-20 00:00:00 30839.65 31063.07 29278.00 29790.35 61034.049017 17.671053
59 2021-07-21 00:00:00 29790.34 32858.00 29482.61 32144.51 82796.265128 17.564616
60 2021-07-22 00:00:00 32144.51 32591.35 31708.00 32287.83 46148.092433 17.463500
61 2021-07-23 00:00:00 32287.58 33650.00 31924.32 33634.09 50112.863626 16.984139
62 2021-07-24 00:00:00 33634.10 34500.00 33401.14 34258.14 47977.550138 16.242346
63 2021-07-25 00:00:00 34261.51 35398.00 33851.12 35381.02 47852.928313 15.607586
64 2021-07-26 00:00:00 35381.02 40550.00 35205.78 37237.60 152452.512724 16.219395
65 2021-07-27 00:00:00 37241.33 39542.61 36383.00 39457.87 88397.267015 16.800613
66 2021-07-28 00:00:00 39456.61 40900.00 38772.00 40019.56 101344.528441 17.599907
67 2021-07-29 00:00:00 40019.57 40640.00 39200.00 40016.48 53998.439283 18.359237
68 2021-07-30 00:00:00 40018.49 42316.71 38313.23 42206.37 73602.784805 19.368676
69 2021-07-31 00:00:00 42206.36 42448.00 41000.15 41461.83 44849.791012 20.349200
70 2021-01-08 00:00:00 41461.84 42599.00 39422.01 39845.44 53953.186326 20.714136
71 2021-02-08 00:00:00 39850.27 40480.01 38690.00 39147.82 50837.351954 20.816480
72 2021-03-08 00:00:00 39146.86 39780.00 37642.03 38207.05 57117.435853 20.578895
73 2021-04-08 00:00:00 38207.04 39969.66 37508.56 39723.18 52329.352430 20.396351
74 2021-05-08 00:00:00 39723.17 41350.00 37332.70 40862.46 84343.755621 20.526294
75 2021-06-08 00:00:00 40862.46 43392.43 39853.86 42836.87 75753.941347 21.042989
76 2021-07-08 00:00:00 42836.87 44700.00 42446.41 44572.54 73396.740808 21.756471
77 2021-08-08 00:00:00 44572.54 45310.00 43261.00 43794.37 69329.092698 22.533424
78 2021-09-08 00:00:00 43794.36 46454.15 42779.00 46253.40 74587.884845 23.450453
79 2021-10-08 00:00:00 46248.87 46700.00 44589.46 45584.99 53814.643421 24.359303
80 2021-11-08 00:00:00 45585.00 46743.47 45341.14 45511.00 52734.901977 25.229618
81 2021-12-08 00:00:00 45510.67 46218.12 43770.00 44399.00 55266.108781 25.471002
82 2021-08-13 00:00:00 44400.06 47886.00 44217.39 47800.00 48239.370431 25.995794
83 2021-08-14 00:00:00 47799.99 48144.00 45971.03 47068.51 46114.359022 26.537795
84 2021-08-15 00:00:00 47068.50 47372.27 45500.00 46973.82 42110.711334 26.878796
85 2021-08-16 00:00:00 46973.82 48053.83 45660.00 45901.29 52480.574014 27.326937
86 2021-08-17 00:00:00 45901.30 47160.00 44376.00 44695.95 57039.341629 27.285215
87 2021-08-18 00:00:00 44695.95 46000.00 44203.28 44705.29 54099.415985 27.184539
88 2021-08-19 00:00:00 44699.37 47033.00 43927.70 46760.62 53411.753920 27.302916
89 2021-08-20 00:00:00 46760.62 49382.99 46622.99 49322.47 56850.352228 27.840242
90 2021-08-21 00:00:00 49322.47 49757.04 48222.00 48821.87 46745.136584 28.412062
91 2021-08-22 00:00:00 48821.88 49500.00 48050.00 49239.22 37007.887795 28.889153
92 2021-08-23 00:00:00 49239.22 50500.00 49029.00 49488.85 52462.541954 29.512800
93 2021-08-24 00:00:00 49488.85 49860.00 47600.00 47674.01 51014.594748 29.565824
94 2021-08-25 00:00:00 47674.01 49264.30 47126.28 48973.32 44655.830342 29.446836
95 2021-08-26 00:00:00 48973.32 49352.84 46250.00 46843.87 49371.277774 29.028026
96 2021-08-27 00:00:00 46843.86 49149.93 46348.00 49069.90 42068.104965 28.630156
97 2021-08-28 00:00:00 49069.90 49299.00 48346.88 48895.35 26681.063786 28.287626
98 2021-08-29 00:00:00 48895.35 49632.27 47762.54 48767.83 32652.283473 27.744622
99 2021-08-30 00:00:00 48767.84 48888.61 46853.00 46982.91 40288.350830 26.903998
100 2021-08-31 00:00:00 46982.91 48246.11 46700.00 47100.89 48645.527370 26.051605
101 2021-01-09 00:00:00 47100.89 49156.00 46512.00 48810.52 49904.655280 25.499838
102 2021-02-09 00:00:00 48810.51 50450.13 48584.06 49246.64 54410.770538 25.311075
103 2021-03-09 00:00:00 49246.63 51000.00 48316.84 49999.14 59025.644157 25.265214
104 2021-04-09 00:00:00 49998.00 50535.69 49370.00 49915.64 34664.659590 25.221647
105 2021-05-09 00:00:00 49917.54 51900.00 49450.00 51756.88 40544.835873 25.504286
106 2021-06-09 00:00:00 51756.88 52780.00 50969.33 52663.90 49249.667081 25.962876
107 2021-07-09 00:00:00 52666.20 52920.00 42843.05 46863.73 123048.802719 25.276717
108 2021-08-09 00:00:00 46868.57 47340.99 44412.02 46048.31 65069.315200 24.624866
109 2021-09-09 00:00:00 46048.31 47399.97 45513.08 46395.14 50651.660020 23.989928
110 2021-10-09 00:00:00 46395.14 47033.00 44132.29 44850.91 49048.266180 23.670387
111 2021-11-09 00:00:00 44842.20 45987.93 44722.22 45173.69 30440.408100 23.366822
112 2021-12-09 00:00:00 45173.68 46460.00 44742.06 46025.24 32094.280520 22.938381
113 2021-09-13 00:00:00 46025.23 46880.00 43370.00 44940.73 65429.150560 22.820722
114 2021-09-14 00:00:00 44940.72 47250.00 44594.44 47111.52 44855.850990 22.594896
115 2021-09-15 00:00:00 47103.28 48500.00 46682.32 48121.41 43204.711740 22.007531
116 2021-09-16 00:00:00 48121.40 48557.00 47021.10 47737.82 40725.088950 21.432816
117 2021-09-17 00:00:00 47737.81 48150.00 46699.56 47299.98 34461.927760 20.965565
118 2021-09-18 00:00:00 47299.98 48843.20 47035.56 48292.74 30906.470380 20.306487
119 2021-09-19 00:00:00 48292.75 48372.83 46829.18 47241.75 29847.243490 19.735184
120 2021-09-20 00:00:00 47241.75 47347.25 42500.00 43015.62 78003.524443 20.139851
121 2021-09-21 00:00:00 43016.64 43639.00 39600.00 40734.38 84534.080485 20.985744
122 2021-09-22 00:00:00 40734.09 44000.55 40565.39 43543.61 58349.055420 21.676235
123 2021-09-23 00:00:00 43546.37 44978.00 43069.09 44865.26 48699.576550 22.029837
124 2021-09-24 00:00:00 44865.26 45200.00 40675.00 42810.57 84113.426292 22.735109
125 2021-09-25 00:00:00 42810.58 42966.84 41646.28 42670.64 33594.571890 23.405118
126 2021-09-26 00:00:00 42670.63 43950.00 40750.00 43160.90 49879.997650 23.734984
127 2021-09-27 00:00:00 43160.90 44350.00 42098.00 42147.35 39776.843830 23.925323
128 2021-09-28 00:00:00 42147.35 42787.38 40888.00 41026.54 43372.262400 24.312088
129 2021-09-29 00:00:00 41025.01 42590.00 40753.88 41524.28 33511.534870 24.702028
130 2021-09-30 00:00:00 41524.29 44141.37 41410.17 43824.10 46381.227810 24.581907
131 2021-01-10 00:00:00 43820.01 48495.00 43283.03 48141.61 66244.874920 23.367632
132 2021-02-10 00:00:00 48141.60 48336.59 47430.18 47634.90 30508.981310 22.214071
133 2021-03-10 00:00:00 47634.89 49228.08 47088.00 48200.01 30825.056010 21.285226
134 2021-04-10 00:00:00 48200.01 49536.12 46891.00 49224.94 46796.493720 20.470586
135 2021-05-10 00:00:00 49224.93 51886.30 49022.40 51471.99 52125.667930 20.178783
136 2021-06-10 00:00:00 51471.99 55750.00 50382.41 55315.00 79877.545181 20.539207
137 2021-07-10 00:00:00 55315.00 55332.31 53357.00 53785.22 54917.377660 20.881611
138 2021-08-10 00:00:00 53785.22 56100.00 53617.61 53951.43 46160.257850 21.322501
139 2021-09-10 00:00:00 53955.67 55489.00 53661.67 54949.72 55177.080130 21.741347
140 2021-10-10 00:00:00 54949.72 56561.31 54080.00 54659.00 89237.836128 22.304343
141 2021-11-10 00:00:00 54659.01 57839.04 54415.06 57471.35 52933.165751 23.025557
142 2021-12-10 00:00:00 57471.35 57680.00 53879.00 55996.93 53471.285500 23.546775
143 2021-10-13 00:00:00 55996.91 57777.00 54167.19 57367.00 55808.444920 24.057061
144 2021-10-14 00:00:00 57370.83 58532.54 56818.05 57347.94 43053.336781 24.660876
145 2021-10-15 00:00:00 57347.94 62933.00 56850.00 61672.42 82512.908022 25.811065
146 2021-10-16 00:00:00 61672.42 62378.42 60150.00 60875.57 35467.880960 26.903744
147 2021-10-17 00:00:00 60875.57 61718.39 58963.00 61528.33 39099.241240 27.563757
148 2021-10-18 00:00:00 61528.32 62695.78 59844.45 62009.84 51798.448440 28.318027
149 2021-10-19 00:00:00 62005.60 64486.00 61322.22 64280.59 53628.107744 29.251726
150 2021-10-20 00:00:00 64280.59 67000.00 63481.40 66001.41 51428.934856 30.405550
151 2021-10-21 00:00:00 66001.40 66639.74 62000.00 62193.15 68538.645370 31.054053
152 2021-10-22 00:00:00 62193.15 63732.39 60000.00 60688.22 52119.358860 31.117531
153 2021-10-23 00:00:00 60688.23 61747.64 59562.15 61286.75 27626.936780 31.062358
154 2021-10-24 00:00:00 61286.75 61500.00 59510.63 60852.22 31226.576760 30.995921
155 2021-10-25 00:00:00 60852.22 63710.63 60650.00 63078.78 36853.838060 31.244720
156 2021-10-26 00:00:00 63078.78 63293.48 59817.55 60328.81 40217.500830 31.249961
157 2021-10-27 00:00:00 60328.81 61496.00 58000.00 58413.44 62124.490160 30.779004
158 2021-10-28 00:00:00 58413.44 62499.00 57820.00 60575.89 61056.353010 30.489479
159 2021-10-29 00:00:00 60575.90 62980.00 60174.81 62253.71 43973.904140 30.289382
160 2021-10-30 00:00:00 62253.70 62359.25 60673.00 61859.19 31478.125660 30.099291
161 2021-10-31 00:00:00 61859.19 62405.30 59945.36 61299.80 39267.637940 29.713720
162 2021-01-11 00:00:00 61299.81 62437.74 59405.00 60911.11 44687.666720 29.196216
163 2021-02-11 00:00:00 60911.12 64270.00 60624.68 63219.99 46368.284100 29.031364
164 2021-03-11 00:00:00 63220.57 63500.00 60382.76 62896.48 43336.090490 28.804634
165 2021-04-11 00:00:00 62896.49 63086.31 60677.01 61395.01 35930.933140 28.589242
166 2021-05-11 00:00:00 61395.01 62595.72 60721.00 60937.12 31604.487490 28.384619
167 2021-06-11 00:00:00 60940.18 61560.49 60050.00 61470.61 25590.574080 27.973716
168 2021-07-11 00:00:00 61470.62 63286.35 61322.78 63273.59 25515.688300 27.926901
169 2021-08-11 00:00:00 63273.58 67789.00 63273.58 67525.83 54442.094554 28.579845
170 2021-09-11 00:00:00 67525.82 68524.25 66222.40 66947.66 44661.378068 29.294016
171 2021-10-11 00:00:00 66947.67 69000.00 62822.90 64882.43 65171.504046 29.014734
172 2021-11-11 00:00:00 64882.42 65600.07 64100.00 64774.26 37237.980580 28.749416
173 2021-12-11 00:00:00 64774.25 65450.70 62278.00 64122.23 44490.108160 28.041179
174 2021-11-13 00:00:00 64122.22 65000.00 63360.22 64380.00 22504.973830 27.368353
175 2021-11-14 00:00:00 64380.01 65550.51 63576.27 65519.10 25705.073470 26.832078
176 2021-11-15 00:00:00 65519.11 66401.82 63400.00 63606.74 37829.371240 26.479925
177 2021-11-16 00:00:00 63606.73 63617.31 58574.07 60058.87 77455.156090 25.267463
178 2021-11-17 00:00:00 60058.87 60840.23 58373.00 60344.87 46289.384910 24.154719
179 2021-11-18 00:00:00 60344.86 60976.00 56474.26 56891.62 62146.999310 23.454728
180 2021-11-19 00:00:00 56891.62 58320.00 55600.00 58052.24 50715.887260 22.944550
181 2021-11-20 00:00:00 58057.10 59845.00 57353.00 59707.51 33811.590100 22.122892
182 2021-11-21 00:00:00 59707.52 60029.76 58486.65 58622.02 31902.227850 21.302202
183 2021-11-22 00:00:00 58617.70 59444.00 55610.00 56247.18 51724.320470 21.040602
184 2021-11-23 00:00:00 56243.83 58009.99 55317.00 57541.27 49917.850170 20.840946
185 2021-11-24 00:00:00 57541.26 57735.00 55837.00 57138.29 39612.049640 20.651273
186 2021-11-25 00:00:00 57138.29 59398.90 57000.00 58960.36 42153.515220 20.071560
187 2021-11-26 00:00:00 58960.37 59150.00 53500.00 53726.53 65927.870660 20.117912
188 2021-11-27 00:00:00 53723.72 55280.00 53610.00 54721.03 29716.999570 20.161946
189 2021-11-28 00:00:00 54716.47 57445.05 53256.64 57274.88 36163.713700 19.704241
190 2021-11-29 00:00:00 57274.89 58865.97 56666.67 57776.25 40125.280090 18.969898
191 2021-11-30 00:00:00 57776.25 59176.99 55875.55 56950.56 49161.051940 18.417868
192 2021-01-12 00:00:00 56950.56 59053.55 56458.01 57184.07 44956.636560 17.893439
193 2021-02-12 00:00:00 57184.07 57375.47 55777.77 56480.34 37574.059760 17.525876
194 2021-03-12 00:00:00 56484.26 57600.00 51680.00 53601.05 58927.690270 17.858850
195 2021-04-12 00:00:00 53601.05 53859.10 42000.30 49152.47 114203.373748 19.217441
196 2021-05-12 00:00:00 49152.46 49699.05 47727.21 49396.33 45580.820120 20.508102
197 2021-06-12 00:00:00 49396.32 50891.11 47100.00 50441.92 58571.215750 21.472003
198 2021-07-12 00:00:00 50441.91 51936.33 50039.74 50588.95 38253.468770 22.161968
199 2021-08-12 00:00:00 50588.95 51200.00 48600.00 50471.19 38425.924660 22.962218
200 2021-09-12 00:00:00 50471.19 50797.76 47320.00 47545.59 37692.686650 23.846688
201 2021-10-12 00:00:00 47535.90 50125.00 46852.00 47140.54 44233.573910 24.732127
202 2021-11-12 00:00:00 47140.54 49485.71 46751.00 49389.99 28889.193580 25.583369
203 2021-12-12 00:00:00 49389.99 50777.00 48638.00 50053.90 26017.934210 26.077754
204 2021-12-13 00:00:00 50053.90 50189.97 45672.75 46702.75 50869.520930 26.859770
205 2021-12-14 00:00:00 46702.76 48700.41 46290.00 48343.28 39955.984450 27.602685
206 2021-12-15 00:00:00 48336.95 49500.00 46547.00 48864.98 51629.181000 28.109255
207 2021-12-16 00:00:00 48864.98 49436.43 47511.00 47632.38 31949.867390 28.590496
208 2021-12-17 00:00:00 47632.38 47995.96 45456.00 46131.20 43104.488700 29.278437
209 2021-12-18 00:00:00 46133.83 47392.37 45500.00 46834.48 25020.052710 29.931981
210 2021-12-19 00:00:00 46834.47 48300.01 46406.91 46681.23 29305.706650 30.303705
211 2021-12-20 00:00:00 46681.24 47537.57 45558.85 46914.16 35848.506090 30.761072
212 2021-12-21 00:00:00 46914.17 49328.96 46630.00 48889.88 37713.929240 30.715132
213 2021-12-22 00:00:00 48887.59 49576.13 48421.87 48588.16 27004.202200 30.607162
214 2021-12-23 00:00:00 48588.17 51375.00 47920.42 50838.81 35192.540460 30.051098
215 2021-12-24 00:00:00 50838.82 51810.00 50384.43 50820.00 31661.949460 29.417439
When I run the code below, it works well. But I need the date on the x axis.
test['avg'].plot(legend=True,figsize=(12,5))
plt.grid(True)
plt.xlabel('ADX')
plt.ylabel('date')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()
Correct plot:
But when I choose date for the x axis, I get a bad plot. The code is below:
df.set_index('date',drop=True, inplace=True)
Modified data
test['avg'].plot(legend=True,figsize=(12,5))
plt.grid(True)
plt.xlabel('ADX')
plt.ylabel('date')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()
Bad plot:
Also, why do I get NaN values for ADX in TA-Lib?
Can you help me with this problem?
The problem does appear to be in the source file: the column names are not tab-separated. Once this is fixed, the plotting works fine.
The NaN issue also comes from the source file; the average was not calculated for the first several rows.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# read the tab-separated file
test = pd.read_csv(r"modified_data.dat", sep='\t')
date = test['date']
avg = test['avg']
fig, ax = plt.subplots(figsize=(20,10))
ax.plot(date, avg)
ax.tick_params(rotation=30, width = 2)
# show only every fifth date on the x axis
plt.xticks(np.arange(0, len(date)+1, 5))
plt.show()
Output looks like this:

Append category to column if date range is between start and end date

I'm sure this is simple, but I can't wrap my head around it. Essentially I have two dataframes, a large df that contains process data every six hours and a smaller df that contains a condition number, a start date and an end date. I need to fill the condition column of the large dataframe with the condition number that corresponds to the date range, or else leave it blank if the dates do not fall between any date range in the small df. So my two frames would look like this:
Large df
Date P1 P2
7/1/2019 11:00 102 240
7/1/2019 17:00 102 247
7/1/2019 23:00 100 219
7/2/2019 5:00 107 213
7/2/2019 11:00 100 226
7/2/2019 17:00 104 239
7/2/2019 23:00 110 240
7/3/2019 5:00 110 232
7/3/2019 11:00 102 215
7/3/2019 17:00 103 219
7/3/2019 23:00 107 243
7/4/2019 5:00 107 246
7/4/2019 11:00 103 219
7/4/2019 17:00 105 220
7/4/2019 23:00 107 220
7/5/2019 5:00 107 227
7/5/2019 11:00 108 208
7/5/2019 17:00 110 248
7/5/2019 23:00 107 235
Small df
Condition Start Time End Time
A 7/1/2019 11:00 7/2/2019 5:00
B 7/3/2019 5:00 7/3/2019 23:00
C 7/4/2019 23:00 7/5/2019 17:00
And I need the result to look like this:
Date P1 P2 Cond
7/1/2019 11:00 102 240 A
7/1/2019 17:00 102 247 A
7/1/2019 23:00 100 219 A
7/2/2019 5:00 107 213 A
7/2/2019 11:00 100 226
7/2/2019 17:00 104 239
7/2/2019 23:00 110 240
7/3/2019 5:00 110 232 B
7/3/2019 11:00 102 215 B
7/3/2019 17:00 103 219 B
7/3/2019 23:00 107 243 B
7/4/2019 5:00 107 246
7/4/2019 11:00 103 219
7/4/2019 17:00 105 220
7/4/2019 23:00 107 220 C
7/5/2019 5:00 107 227 C
7/5/2019 11:00 108 208 C
7/5/2019 17:00 110 248 C
7/5/2019 23:00 107 235
You need:
for i, row in sdf.iterrows():
    df.loc[df['Date'].between(row['Start Time'], row['End Time']), 'Cond'] = row['Condition']
Output:
Date P1 P2 Cond
0 2019-07-01 11:00:00 102 240 A
1 2019-07-01 17:00:00 102 247 A
2 2019-07-01 23:00:00 100 219 A
3 2019-07-02 05:00:00 107 213 A
4 2019-07-02 11:00:00 100 226 NaN
5 2019-07-02 17:00:00 104 239 NaN
6 2019-07-02 23:00:00 110 240 NaN
7 2019-07-03 05:00:00 110 232 B
8 2019-07-03 11:00:00 102 215 B
9 2019-07-03 17:00:00 103 219 B
10 2019-07-03 23:00:00 107 243 B
11 2019-07-04 05:00:00 107 246 NaN
12 2019-07-04 11:00:00 103 219 NaN
13 2019-07-04 17:00:00 105 220 NaN
14 2019-07-04 23:00:00 107 220 C
15 2019-07-05 05:00:00 107 227 C
16 2019-07-05 11:00:00 108 208 C
17 2019-07-05 17:00:00 110 248 C
18 2019-07-05 23:00:00 107 235 NaN
You may try pd.IntervalIndex and map as follows:
inx = pd.IntervalIndex.from_arrays(df2['Start Time'], df2['End Time'], closed='both')
df2.index = inx
df1['cond'] = df1.Date.map(df2.Condition)
Out[423]:
Date P1 P2 cond
0 2019-07-01 11:00:00 102 240 A
1 2019-07-01 17:00:00 102 247 A
2 2019-07-01 23:00:00 100 219 A
3 2019-07-02 05:00:00 107 213 A
4 2019-07-02 11:00:00 100 226 NaN
5 2019-07-02 17:00:00 104 239 NaN
6 2019-07-02 23:00:00 110 240 NaN
7 2019-07-03 05:00:00 110 232 B
8 2019-07-03 11:00:00 102 215 B
9 2019-07-03 17:00:00 103 219 B
10 2019-07-03 23:00:00 107 243 B
11 2019-07-04 05:00:00 107 246 NaN
12 2019-07-04 11:00:00 103 219 NaN
13 2019-07-04 17:00:00 105 220 NaN
14 2019-07-04 23:00:00 107 220 C
15 2019-07-05 05:00:00 107 227 C
16 2019-07-05 11:00:00 108 208 C
17 2019-07-05 17:00:00 110 248 C
18 2019-07-05 23:00:00 107 235 NaN
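For reference, here is a self-contained sketch of the IntervalIndex lookup on a few rows taken from the question. The frames are rebuilt by hand, and the helper name lookup is introduced only for this example. It assumes the intervals in the small frame do not overlap.

```python
import pandas as pd

# A few rows of the large frame from the question
df1 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-07-01 11:00", "2019-07-02 05:00",
        "2019-07-02 11:00", "2019-07-03 05:00",
    ]),
    "P1": [102, 107, 100, 110],
})

# Two of the condition ranges from the small frame
df2 = pd.DataFrame({
    "Condition": ["A", "B"],
    "Start Time": pd.to_datetime(["2019-07-01 11:00", "2019-07-03 05:00"]),
    "End Time": pd.to_datetime(["2019-07-02 05:00", "2019-07-03 23:00"]),
})

# Closed on both sides so that the start and end timestamps themselves match
intervals = pd.IntervalIndex.from_arrays(
    df2["Start Time"], df2["End Time"], closed="both"
)
lookup = pd.Series(df2["Condition"].to_numpy(), index=intervals)

# Each Date is looked up in the interval that contains it; no match gives NaN
df1["Cond"] = df1["Date"].map(lookup)
print(df1)
```

Compared with looping over the small frame, this does the lookup in one vectorised call.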
You could do something like the following:
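import io
import pandas as pd
# s1 and s2 hold the two tables above as text
df1 = pd.read_csv(io.StringIO(s1), sep=r'\s\s+', engine='python',
                  converters={'Date': pd.to_datetime})
df2 = pd.read_csv(io.StringIO(s2), sep=r'\s\s+', engine='python',
                  converters={'Start Time': pd.to_datetime, 'End Time': pd.to_datetime})
# one row per boundary: Condition, 'Start Time' or 'End Time', timestamp
df2 = df2.set_index('Condition').stack().reset_index()
# attach the most recent boundary at or before each Date
df = pd.merge_asof(df1, df2, left_on='Date', right_on=0, direction='backward')
# a Date later than an 'End Time' boundary lies outside the range
df.loc[(df['level_1'].eq('End Time')) & (df['Date'] > df[0]), 'Condition'] = ''
print(df.iloc[:, :-2])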
Date P1 P2 Condition
0 2019-07-01 11:00:00 102 240 A
1 2019-07-01 17:00:00 102 247 A
2 2019-07-01 23:00:00 100 219 A
3 2019-07-02 05:00:00 107 213 A
4 2019-07-02 11:00:00 100 226
5 2019-07-02 17:00:00 104 239
6 2019-07-02 23:00:00 110 240
7 2019-07-03 05:00:00 110 232 B
8 2019-07-03 11:00:00 102 215 B
9 2019-07-03 17:00:00 103 219 B
10 2019-07-03 23:00:00 107 243 B
11 2019-07-04 05:00:00 107 246
12 2019-07-04 11:00:00 103 219
13 2019-07-04 17:00:00 105 220
14 2019-07-04 23:00:00 107 220 C
15 2019-07-05 05:00:00 107 227 C
16 2019-07-05 11:00:00 108 208 C
17 2019-07-05 17:00:00 110 248 C
18 2019-07-05 23:00:00 107 235
df1.insert(3, "Cond", [None] * len(df1))
for i in range(len(df2)):
    # rows whose Date lies within the i-th range (both ends inclusive)
    in_range = (df1["Date"] >= df2["Start Time"].loc[i]) & (df1["Date"] <= df2["End Time"].loc[i])
    df1.loc[in_range, "Cond"] = df2["Condition"].loc[i]

How can I slice a dataframe by timestamp, when timestamp isn't classified as index?

How can I split my pandas dataframe by using the timestamp on it?
I get the following prices when I call df30m:
Timestamp Open High Low Close Volume
0 2016-05-01 19:30:00 449.80 450.13 449.80 449.90 74.1760
1 2016-05-01 20:00:00 449.90 450.27 449.90 450.07 63.5840
2 2016-05-01 20:30:00 450.12 451.00 450.02 450.51 64.1080
3 2016-05-01 21:00:00 450.51 452.05 450.50 451.22 75.7390
4 2016-05-01 21:30:00 451.21 451.64 450.81 450.87 71.1190
5 2016-05-01 22:00:00 450.87 452.05 450.87 451.07 73.8430
6 2016-05-01 22:30:00 451.09 451.70 450.91 450.91 68.1490
7 2016-05-01 23:00:00 450.91 450.98 449.97 450.61 84.5430
8 2016-05-01 23:30:00 450.61 451.50 450.55 451.45 111.2370
9 2016-05-02 00:00:00 451.47 452.31 450.69 451.19 190.0750
10 2016-05-02 00:30:00 451.20 451.68 450.45 450.82 186.0930
11 2016-05-02 01:00:00 450.83 451.64 450.65 450.73 112.4630
12 2016-05-02 01:30:00 450.73 451.10 450.31 450.56 137.7530
13 2016-05-02 02:00:00 450.56 452.01 449.98 450.27 151.6140
14 2016-05-02 02:30:00 450.27 451.30 450.23 451.11 99.5490
15 2016-05-02 03:00:00 451.29 451.29 450.17 450.33 178.9860
16 2016-05-02 03:30:00 450.44 451.20 450.44 450.75 65.1480
17 2016-05-02 04:00:00 450.79 451.20 450.75 451.00 78.0430
18 2016-05-02 04:30:00 451.00 451.11 450.85 451.11 64.7250
19 2016-05-02 05:00:00 451.11 451.64 451.00 451.12 73.4840
20 2016-05-02 05:30:00 451.12 451.83 450.67 451.33 94.1950
21 2016-05-02 06:00:00 451.35 451.37 450.17 450.18 227.7480
22 2016-05-02 06:30:00 450.18 450.43 450.17 450.17 83.0270
23 2016-05-02 07:00:00 450.17 450.43 448.90 449.41 170.4950
24 2016-05-02 07:30:00 449.38 450.00 448.56 448.56 243.0420
25 2016-05-02 08:00:00 448.67 448.67 446.21 448.00 525.7090
26 2016-05-02 08:30:00 448.12 448.49 445.00 445.00 673.5810
27 2016-05-02 09:00:00 445.00 445.51 440.11 444.20 1392.9049
28 2016-05-02 09:30:00 444.24 444.36 440.11 442.00 438.6860
29 2016-05-02 10:00:00 441.91 443.20 440.05 442.24 400.5850
... ... ... ... ... ... ...
1651 2016-06-05 05:00:00 578.74 579.00 577.92 578.39 93.6980
1652 2016-06-05 05:30:00 578.40 578.48 574.52 575.26 98.1580
1653 2016-06-05 06:00:00 575.24 576.02 572.47 574.06 126.8620
1654 2016-06-05 06:30:00 574.06 576.35 574.06 576.34 125.4120
1655 2016-06-05 07:00:00 576.34 576.34 574.73 575.83 34.8070
1656 2016-06-05 07:30:00 575.84 576.27 574.91 575.58 74.8180
1657 2016-06-05 08:00:00 575.58 578.57 575.58 578.36 123.2560
1658 2016-06-05 08:30:00 578.23 578.47 576.18 577.25 43.6590
1659 2016-06-05 09:00:00 577.20 578.85 576.70 577.27 95.3900
1660 2016-06-05 09:30:00 577.36 578.18 576.70 576.70 51.0250
1661 2016-06-05 10:00:00 576.70 576.70 574.55 575.39 101.0590
1662 2016-06-05 10:30:00 575.41 576.44 575.18 576.44 86.4340
1663 2016-06-05 11:00:00 576.50 577.89 576.50 577.80 113.0600
1664 2016-06-05 11:30:00 577.80 578.10 576.03 576.98 57.5050
1665 2016-06-05 12:00:00 576.98 577.55 576.59 577.54 56.1070
1666 2016-06-05 12:30:00 577.54 583.00 570.93 572.82 872.8200
1667 2016-06-05 13:00:00 572.94 573.19 569.64 572.50 310.0020
1668 2016-06-05 13:30:00 572.50 574.37 572.50 574.09 59.3410
1669 2016-06-05 14:00:00 574.09 574.19 571.51 572.98 155.4310
1670 2016-06-05 14:30:00 572.98 573.57 572.02 573.47 76.9270
1671 2016-06-05 15:00:00 573.62 575.10 572.97 573.37 59.1430
1672 2016-06-05 15:30:00 573.37 574.39 573.37 574.38 77.3270
1673 2016-06-05 16:00:00 574.39 575.59 574.38 575.59 52.0150
1674 2016-06-05 16:30:00 575.00 575.59 574.50 575.00 66.9300
1675 2016-06-05 17:00:00 575.00 576.83 574.38 576.60 50.2990
1676 2016-06-05 17:30:00 576.60 577.50 575.50 576.86 104.5200
1677 2016-06-05 18:00:00 576.86 577.21 575.44 575.80 55.3270
1678 2016-06-05 18:30:00 575.77 575.80 574.52 574.77 78.7760
1679 2016-06-05 19:00:00 574.73 575.18 572.52 574.47 126.4300
1680 2016-06-05 19:30:00 574.49 574.87 573.80 574.32 10.4930
As you can see, it contains the last 35 days of prices in 30-minute intervals.
I want to work with this price history over different time windows.
As a first example, I would like to fetch only the data from the last day.
How can I filter this dataframe to show only the last day?
This is what I've tried:
import datetime
d0 = datetime.datetime.today()
d1 = datetime.datetime.today() - datetime.timedelta(days=1)
print(d0)
>>> 2016-06-05 17:10:37.633824
print(d1)
>>> 2016-06-04 17:10:37.633967
df_1d = df30m['Timestamp'] > d1
print(df_1d)
This returns a pandas Series of True/False values instead of the filtered rows:
0 False
1 False
2 False
3 False
4 False
...
1676 True
1677 True
1678 True
1679 True
1680 True
I've also tried the between_time() method:
df_1d = df30m.between_time(d0, d1)
But I got the following error message:
TypeError: Index must be DatetimeIndex
Please, can anyone show me a pythonic way to slice my dataframe?
You can use loc with a boolean mask to select the rows. Do you know whether your timestamps are datetime.datetime objects or pandas Timestamps?
df30m.loc[(df30m.Timestamp <= d0) & (df30m.Timestamp >= d1)]
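A minimal sketch of that mask-based filter, assuming the Timestamp column already has a datetime64 dtype (if it holds strings, pd.to_datetime would be needed first). The frame below is a made-up stand-in for df30m, and the window is anchored on the newest row rather than on the wall clock, which is an assumption about what "last 1 day" should mean.

```python
import pandas as pd

# Hypothetical stand-in for df30m: 30-minute bars covering three days.
df30m = pd.DataFrame({
    "Timestamp": pd.date_range("2016-06-03 00:00", periods=144,
                               freq=pd.Timedelta(minutes=30)),
})
df30m["Close"] = range(len(df30m))  # placeholder values, not real prices

# Anchor the window on the newest timestamp in the data (an assumption).
end = df30m["Timestamp"].max()
start = end - pd.Timedelta(days=1)

mask = df30m["Timestamp"] > start   # boolean Series, like df_1d in the question
last_day = df30m.loc[mask]          # keep only the rows where the mask is True

print(last_day.head())
```

The boolean Series from the question is exactly such a mask; passing it to loc (or to df30m[...]) is the step that turns it into the filtered rows.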
Alternatively, set the Timestamp column as the index and then slice by label (the end points are included):
df30m.set_index('Timestamp', inplace=True)
df30m.loc[d1:d0]
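The index-based approach can be sketched as follows, again on a made-up stand-in for df30m. It also shows why between_time() raised the TypeError in the question: that method requires a DatetimeIndex and selects by time of day across all dates, not by a date range.

```python
import pandas as pd

# Hypothetical stand-in for df30m: 30-minute bars covering three days.
idx = pd.date_range("2016-06-03 00:00", periods=144, freq=pd.Timedelta(minutes=30))
df30m = pd.DataFrame({"Timestamp": idx, "Close": range(144)})

# A sorted DatetimeIndex allows slicing by timestamp labels.
indexed = df30m.set_index("Timestamp").sort_index()

end = indexed.index.max()
start = end - pd.Timedelta(days=1)
last_day = indexed.loc[start:end]   # label slice, both end points included

# between_time works once the index is a DatetimeIndex, but it filters
# by clock time on every day rather than by a date window.
morning = indexed.between_time("09:00", "10:00")

print(len(last_day), len(morning))
```

Using set_index without inplace=True keeps the original frame untouched, which is convenient when both the column-based and index-based forms are needed.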
