df1
slot Time Location User
56 2017-10-26 22:15:00 89 1
2 2017-10-27 00:30:00 54 1
20 2017-10-28 05:00:00 64 1
24 2017-10-29 06:00:00 2 1
91 2017-11-01 22:45:00 78 1
62 2017-11-02 15:30:00 99 1
91 2017-11-02 22:45:00 34 1
47 2017-10-26 20:15:00 465 2
1 2017-10-27 00:10:00 67 2
20 2017-10-28 05:00:00 5746 2
28 2017-10-29 07:00:00 36 2
91 2017-11-01 22:45:00 786 2
58 2017-11-02 14:30:00 477 2
95 2017-11-02 23:45:00 7322 2
df2
slot
2
91
62
58
I need the output df3 as
slot Time Location User
2 2017-10-27 00:30:00 54 1
91 2017-11-01 22:45:00 78 1
91 2017-11-02 22:45:00 34 1
91 2017-11-01 22:45:00 786 2
62 2017-11-02 15:30:00 99 1
58 2017-11-02 14:30:00 477 2
If these were CSV files, we could join them from the shell:
join file1 file2 > file3
But how can we do the same for these dataframes in a Jupyter notebook?
Try isin:
df1[df1.slot.isin(df2.slot)]
Output:
slot Time Location User
1 2 2017-10-27 00:30:00 54 1
4 91 2017-11-01 22:45:00 78 1
5 62 2017-11-02 15:30:00 99 1
6 91 2017-11-02 22:45:00 34 1
11 91 2017-11-01 22:45:00 786 2
12 58 2017-11-02 14:30:00 477 2
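For the record, an inner merge gives the same rows (a sketch; it assumes slot is a regular column in both frames, and it would duplicate rows if df2 contained repeated slots):
df3 = df1.merge(df2, on='slot')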
First data frame:
date time open high low close volume avg
0 2021-05-23 00:00:00 37458.51 38270.64 31111.01 34655.25 217136.046593 NaN
1 2021-05-24 00:00:00 34681.44 39920.00 34031.00 38796.29 161630.893971 NaN
2 2021-05-25 00:00:00 38810.99 39791.77 36419.62 38324.72 111996.228404 NaN
3 2021-05-26 00:00:00 38324.72 40841.00 37800.44 39241.91 104780.773396 NaN
4 2021-05-27 00:00:00 39241.92 40411.14 37134.27 38529.98 86547.158794 NaN
5 2021-05-28 00:00:00 38529.99 38877.83 34684.00 35663.49 135377.629720 NaN
6 2021-05-29 00:00:00 35661.79 37338.58 33632.76 34605.15 112663.092689 NaN
7 2021-05-30 00:00:00 34605.15 36488.00 33379.00 35641.27 73535.386967 NaN
8 2021-05-31 00:00:00 35641.26 37499.00 34153.84 37253.81 94160.735289 NaN
9 2021-01-06 00:00:00 37253.82 37894.81 35666.00 36693.09 81234.663770 NaN
10 2021-02-06 00:00:00 36694.85 38225.00 35920.00 37568.68 67587.372495 NaN
11 2021-03-06 00:00:00 37568.68 39476.00 37170.00 39246.79 75889.106011 NaN
12 2021-04-06 00:00:00 39246.78 39289.07 35555.15 36829.00 91317.799245 NaN
13 2021-05-06 00:00:00 36829.15 37925.00 34800.00 35513.20 70459.621490 NaN
14 2021-06-06 00:00:00 35516.07 36480.00 35222.00 35796.31 47650.206637 NaN
15 2021-07-06 00:00:00 35796.31 36900.00 33300.00 33552.79 77574.952573 NaN
16 2021-08-06 00:00:00 33556.96 34068.01 31000.00 33380.81 123251.189037 NaN
17 2021-09-06 00:00:00 33380.80 37534.79 32396.82 37388.05 136607.597517 NaN
18 2021-10-06 00:00:00 37388.05 38491.00 35782.00 36675.72 109527.284943 NaN
19 2021-11-06 00:00:00 36677.83 37680.40 35936.77 37331.98 78466.005300 NaN
20 2021-12-06 00:00:00 37331.98 37463.63 34600.36 35546.11 87717.549990 NaN
21 2021-06-13 00:00:00 35546.12 39380.00 34757.00 39020.57 86921.025555 NaN
22 2021-06-14 00:00:00 39020.56 41064.05 38730.00 40516.29 108522.391949 NaN
23 2021-06-15 00:00:00 40516.28 41330.00 39506.40 40144.04 80679.622838 NaN
24 2021-06-16 00:00:00 40143.80 40527.14 38116.01 38349.01 87771.976937 NaN
25 2021-06-17 00:00:00 38349.00 39559.88 37365.00 38092.97 79541.307119 NaN
26 2021-06-18 00:00:00 38092.97 38202.84 35129.29 35819.84 95228.042935 NaN
27 2021-06-19 00:00:00 35820.48 36457.00 34803.52 35483.72 68712.449461 NaN
28 2021-06-20 00:00:00 35483.72 36137.72 33336.00 35600.16 89878.170850 NaN
29 2021-06-21 00:00:00 35600.17 35750.00 31251.23 31608.93 168778.873159 NaN
30 2021-06-22 00:00:00 31614.12 33298.78 28805.00 32509.56 204208.179762 NaN
31 2021-06-23 00:00:00 32509.56 34881.00 31683.00 33678.07 126966.100563 NaN
32 2021-06-24 00:00:00 33675.07 35298.00 32286.57 34663.09 86625.804260 NaN
33 2021-06-25 00:00:00 34663.08 35500.00 31275.00 31584.45 116061.130356 NaN
34 2021-06-26 00:00:00 31576.09 32730.00 30151.00 32283.65 107820.375287 NaN
35 2021-06-27 00:00:00 32283.65 34749.00 31973.45 34700.34 96613.244211 NaN
36 2021-06-28 00:00:00 34702.49 35297.71 33862.72 34494.89 82222.267819 NaN
37 2021-06-29 00:00:00 34494.89 36600.00 34225.43 35911.73 90788.796220 NaN
38 2021-06-30 00:00:00 35911.72 36100.00 34017.55 35045.00 77152.197634 NaN
39 2021-01-07 00:00:00 35045.00 35057.57 32711.00 33504.69 71708.266112 15.362372
40 2021-02-07 00:00:00 33502.33 33977.04 32699.00 33786.55 56172.181378 15.386331
41 2021-03-07 00:00:00 33786.54 34945.61 33316.73 34669.13 43044.578641 15.154877
42 2021-04-07 00:00:00 34669.12 35967.85 34357.15 35286.51 43703.475789 14.677524
43 2021-05-07 00:00:00 35288.13 35293.78 33125.55 33690.14 64123.874245 14.486827
44 2021-06-07 00:00:00 33690.15 35118.88 33532.00 34220.01 58210.596349 14.305665
45 2021-07-07 00:00:00 34220.02 35059.09 33777.77 33862.12 53807.521675 14.133561
46 2021-08-07 00:00:00 33862.11 33929.64 32077.00 32875.71 70136.480320 14.336865
47 2021-09-07 00:00:00 32875.71 34100.00 32261.07 33815.81 47153.939899 14.479159
48 2021-10-07 00:00:00 33815.81 34262.00 33004.78 33502.87 34761.175468 14.564313
49 2021-11-07 00:00:00 33502.87 34666.00 33306.47 34258.99 31572.647448 14.517866
50 2021-12-07 00:00:00 34259.00 34678.43 32658.34 33086.63 48181.403762 14.627892
51 2021-07-13 00:00:00 33086.94 33340.00 32202.25 32729.77 41126.361008 14.839689
52 2021-07-14 00:00:00 32729.12 33114.03 31550.00 32820.02 46777.823484 15.192346
53 2021-07-15 00:00:00 32820.03 33185.25 31133.00 31880.00 51639.576353 15.623083
54 2021-07-16 00:00:00 31874.49 32249.18 31020.00 31383.87 48499.864154 16.058731
55 2021-07-17 00:00:00 31383.86 31955.92 31164.31 31520.07 34012.242132 16.472596
56 2021-07-18 00:00:00 31520.07 32435.00 31108.97 31778.56 35923.716186 16.669426
57 2021-07-19 00:00:00 31778.57 31899.00 30407.44 30839.65 47340.468499 17.041150
58 2021-07-20 00:00:00 30839.65 31063.07 29278.00 29790.35 61034.049017 17.671053
59 2021-07-21 00:00:00 29790.34 32858.00 29482.61 32144.51 82796.265128 17.564616
60 2021-07-22 00:00:00 32144.51 32591.35 31708.00 32287.83 46148.092433 17.463500
61 2021-07-23 00:00:00 32287.58 33650.00 31924.32 33634.09 50112.863626 16.984139
62 2021-07-24 00:00:00 33634.10 34500.00 33401.14 34258.14 47977.550138 16.242346
63 2021-07-25 00:00:00 34261.51 35398.00 33851.12 35381.02 47852.928313 15.607586
64 2021-07-26 00:00:00 35381.02 40550.00 35205.78 37237.60 152452.512724 16.219395
65 2021-07-27 00:00:00 37241.33 39542.61 36383.00 39457.87 88397.267015 16.800613
66 2021-07-28 00:00:00 39456.61 40900.00 38772.00 40019.56 101344.528441 17.599907
67 2021-07-29 00:00:00 40019.57 40640.00 39200.00 40016.48 53998.439283 18.359237
68 2021-07-30 00:00:00 40018.49 42316.71 38313.23 42206.37 73602.784805 19.368676
69 2021-07-31 00:00:00 42206.36 42448.00 41000.15 41461.83 44849.791012 20.349200
70 2021-01-08 00:00:00 41461.84 42599.00 39422.01 39845.44 53953.186326 20.714136
71 2021-02-08 00:00:00 39850.27 40480.01 38690.00 39147.82 50837.351954 20.816480
72 2021-03-08 00:00:00 39146.86 39780.00 37642.03 38207.05 57117.435853 20.578895
73 2021-04-08 00:00:00 38207.04 39969.66 37508.56 39723.18 52329.352430 20.396351
74 2021-05-08 00:00:00 39723.17 41350.00 37332.70 40862.46 84343.755621 20.526294
75 2021-06-08 00:00:00 40862.46 43392.43 39853.86 42836.87 75753.941347 21.042989
76 2021-07-08 00:00:00 42836.87 44700.00 42446.41 44572.54 73396.740808 21.756471
77 2021-08-08 00:00:00 44572.54 45310.00 43261.00 43794.37 69329.092698 22.533424
78 2021-09-08 00:00:00 43794.36 46454.15 42779.00 46253.40 74587.884845 23.450453
79 2021-10-08 00:00:00 46248.87 46700.00 44589.46 45584.99 53814.643421 24.359303
80 2021-11-08 00:00:00 45585.00 46743.47 45341.14 45511.00 52734.901977 25.229618
81 2021-12-08 00:00:00 45510.67 46218.12 43770.00 44399.00 55266.108781 25.471002
82 2021-08-13 00:00:00 44400.06 47886.00 44217.39 47800.00 48239.370431 25.995794
83 2021-08-14 00:00:00 47799.99 48144.00 45971.03 47068.51 46114.359022 26.537795
84 2021-08-15 00:00:00 47068.50 47372.27 45500.00 46973.82 42110.711334 26.878796
85 2021-08-16 00:00:00 46973.82 48053.83 45660.00 45901.29 52480.574014 27.326937
86 2021-08-17 00:00:00 45901.30 47160.00 44376.00 44695.95 57039.341629 27.285215
87 2021-08-18 00:00:00 44695.95 46000.00 44203.28 44705.29 54099.415985 27.184539
88 2021-08-19 00:00:00 44699.37 47033.00 43927.70 46760.62 53411.753920 27.302916
89 2021-08-20 00:00:00 46760.62 49382.99 46622.99 49322.47 56850.352228 27.840242
90 2021-08-21 00:00:00 49322.47 49757.04 48222.00 48821.87 46745.136584 28.412062
91 2021-08-22 00:00:00 48821.88 49500.00 48050.00 49239.22 37007.887795 28.889153
92 2021-08-23 00:00:00 49239.22 50500.00 49029.00 49488.85 52462.541954 29.512800
93 2021-08-24 00:00:00 49488.85 49860.00 47600.00 47674.01 51014.594748 29.565824
94 2021-08-25 00:00:00 47674.01 49264.30 47126.28 48973.32 44655.830342 29.446836
95 2021-08-26 00:00:00 48973.32 49352.84 46250.00 46843.87 49371.277774 29.028026
96 2021-08-27 00:00:00 46843.86 49149.93 46348.00 49069.90 42068.104965 28.630156
97 2021-08-28 00:00:00 49069.90 49299.00 48346.88 48895.35 26681.063786 28.287626
98 2021-08-29 00:00:00 48895.35 49632.27 47762.54 48767.83 32652.283473 27.744622
99 2021-08-30 00:00:00 48767.84 48888.61 46853.00 46982.91 40288.350830 26.903998
100 2021-08-31 00:00:00 46982.91 48246.11 46700.00 47100.89 48645.527370 26.051605
101 2021-01-09 00:00:00 47100.89 49156.00 46512.00 48810.52 49904.655280 25.499838
102 2021-02-09 00:00:00 48810.51 50450.13 48584.06 49246.64 54410.770538 25.311075
103 2021-03-09 00:00:00 49246.63 51000.00 48316.84 49999.14 59025.644157 25.265214
104 2021-04-09 00:00:00 49998.00 50535.69 49370.00 49915.64 34664.659590 25.221647
105 2021-05-09 00:00:00 49917.54 51900.00 49450.00 51756.88 40544.835873 25.504286
106 2021-06-09 00:00:00 51756.88 52780.00 50969.33 52663.90 49249.667081 25.962876
107 2021-07-09 00:00:00 52666.20 52920.00 42843.05 46863.73 123048.802719 25.276717
108 2021-08-09 00:00:00 46868.57 47340.99 44412.02 46048.31 65069.315200 24.624866
109 2021-09-09 00:00:00 46048.31 47399.97 45513.08 46395.14 50651.660020 23.989928
110 2021-10-09 00:00:00 46395.14 47033.00 44132.29 44850.91 49048.266180 23.670387
111 2021-11-09 00:00:00 44842.20 45987.93 44722.22 45173.69 30440.408100 23.366822
112 2021-12-09 00:00:00 45173.68 46460.00 44742.06 46025.24 32094.280520 22.938381
113 2021-09-13 00:00:00 46025.23 46880.00 43370.00 44940.73 65429.150560 22.820722
114 2021-09-14 00:00:00 44940.72 47250.00 44594.44 47111.52 44855.850990 22.594896
115 2021-09-15 00:00:00 47103.28 48500.00 46682.32 48121.41 43204.711740 22.007531
116 2021-09-16 00:00:00 48121.40 48557.00 47021.10 47737.82 40725.088950 21.432816
117 2021-09-17 00:00:00 47737.81 48150.00 46699.56 47299.98 34461.927760 20.965565
118 2021-09-18 00:00:00 47299.98 48843.20 47035.56 48292.74 30906.470380 20.306487
119 2021-09-19 00:00:00 48292.75 48372.83 46829.18 47241.75 29847.243490 19.735184
120 2021-09-20 00:00:00 47241.75 47347.25 42500.00 43015.62 78003.524443 20.139851
121 2021-09-21 00:00:00 43016.64 43639.00 39600.00 40734.38 84534.080485 20.985744
122 2021-09-22 00:00:00 40734.09 44000.55 40565.39 43543.61 58349.055420 21.676235
123 2021-09-23 00:00:00 43546.37 44978.00 43069.09 44865.26 48699.576550 22.029837
124 2021-09-24 00:00:00 44865.26 45200.00 40675.00 42810.57 84113.426292 22.735109
125 2021-09-25 00:00:00 42810.58 42966.84 41646.28 42670.64 33594.571890 23.405118
126 2021-09-26 00:00:00 42670.63 43950.00 40750.00 43160.90 49879.997650 23.734984
127 2021-09-27 00:00:00 43160.90 44350.00 42098.00 42147.35 39776.843830 23.925323
128 2021-09-28 00:00:00 42147.35 42787.38 40888.00 41026.54 43372.262400 24.312088
129 2021-09-29 00:00:00 41025.01 42590.00 40753.88 41524.28 33511.534870 24.702028
130 2021-09-30 00:00:00 41524.29 44141.37 41410.17 43824.10 46381.227810 24.581907
131 2021-01-10 00:00:00 43820.01 48495.00 43283.03 48141.61 66244.874920 23.367632
132 2021-02-10 00:00:00 48141.60 48336.59 47430.18 47634.90 30508.981310 22.214071
133 2021-03-10 00:00:00 47634.89 49228.08 47088.00 48200.01 30825.056010 21.285226
134 2021-04-10 00:00:00 48200.01 49536.12 46891.00 49224.94 46796.493720 20.470586
135 2021-05-10 00:00:00 49224.93 51886.30 49022.40 51471.99 52125.667930 20.178783
136 2021-06-10 00:00:00 51471.99 55750.00 50382.41 55315.00 79877.545181 20.539207
137 2021-07-10 00:00:00 55315.00 55332.31 53357.00 53785.22 54917.377660 20.881611
138 2021-08-10 00:00:00 53785.22 56100.00 53617.61 53951.43 46160.257850 21.322501
139 2021-09-10 00:00:00 53955.67 55489.00 53661.67 54949.72 55177.080130 21.741347
140 2021-10-10 00:00:00 54949.72 56561.31 54080.00 54659.00 89237.836128 22.304343
141 2021-11-10 00:00:00 54659.01 57839.04 54415.06 57471.35 52933.165751 23.025557
142 2021-12-10 00:00:00 57471.35 57680.00 53879.00 55996.93 53471.285500 23.546775
143 2021-10-13 00:00:00 55996.91 57777.00 54167.19 57367.00 55808.444920 24.057061
144 2021-10-14 00:00:00 57370.83 58532.54 56818.05 57347.94 43053.336781 24.660876
145 2021-10-15 00:00:00 57347.94 62933.00 56850.00 61672.42 82512.908022 25.811065
146 2021-10-16 00:00:00 61672.42 62378.42 60150.00 60875.57 35467.880960 26.903744
147 2021-10-17 00:00:00 60875.57 61718.39 58963.00 61528.33 39099.241240 27.563757
148 2021-10-18 00:00:00 61528.32 62695.78 59844.45 62009.84 51798.448440 28.318027
149 2021-10-19 00:00:00 62005.60 64486.00 61322.22 64280.59 53628.107744 29.251726
150 2021-10-20 00:00:00 64280.59 67000.00 63481.40 66001.41 51428.934856 30.405550
151 2021-10-21 00:00:00 66001.40 66639.74 62000.00 62193.15 68538.645370 31.054053
152 2021-10-22 00:00:00 62193.15 63732.39 60000.00 60688.22 52119.358860 31.117531
153 2021-10-23 00:00:00 60688.23 61747.64 59562.15 61286.75 27626.936780 31.062358
154 2021-10-24 00:00:00 61286.75 61500.00 59510.63 60852.22 31226.576760 30.995921
155 2021-10-25 00:00:00 60852.22 63710.63 60650.00 63078.78 36853.838060 31.244720
156 2021-10-26 00:00:00 63078.78 63293.48 59817.55 60328.81 40217.500830 31.249961
157 2021-10-27 00:00:00 60328.81 61496.00 58000.00 58413.44 62124.490160 30.779004
158 2021-10-28 00:00:00 58413.44 62499.00 57820.00 60575.89 61056.353010 30.489479
159 2021-10-29 00:00:00 60575.90 62980.00 60174.81 62253.71 43973.904140 30.289382
160 2021-10-30 00:00:00 62253.70 62359.25 60673.00 61859.19 31478.125660 30.099291
161 2021-10-31 00:00:00 61859.19 62405.30 59945.36 61299.80 39267.637940 29.713720
162 2021-01-11 00:00:00 61299.81 62437.74 59405.00 60911.11 44687.666720 29.196216
163 2021-02-11 00:00:00 60911.12 64270.00 60624.68 63219.99 46368.284100 29.031364
164 2021-03-11 00:00:00 63220.57 63500.00 60382.76 62896.48 43336.090490 28.804634
165 2021-04-11 00:00:00 62896.49 63086.31 60677.01 61395.01 35930.933140 28.589242
166 2021-05-11 00:00:00 61395.01 62595.72 60721.00 60937.12 31604.487490 28.384619
167 2021-06-11 00:00:00 60940.18 61560.49 60050.00 61470.61 25590.574080 27.973716
168 2021-07-11 00:00:00 61470.62 63286.35 61322.78 63273.59 25515.688300 27.926901
169 2021-08-11 00:00:00 63273.58 67789.00 63273.58 67525.83 54442.094554 28.579845
170 2021-09-11 00:00:00 67525.82 68524.25 66222.40 66947.66 44661.378068 29.294016
171 2021-10-11 00:00:00 66947.67 69000.00 62822.90 64882.43 65171.504046 29.014734
172 2021-11-11 00:00:00 64882.42 65600.07 64100.00 64774.26 37237.980580 28.749416
173 2021-12-11 00:00:00 64774.25 65450.70 62278.00 64122.23 44490.108160 28.041179
174 2021-11-13 00:00:00 64122.22 65000.00 63360.22 64380.00 22504.973830 27.368353
175 2021-11-14 00:00:00 64380.01 65550.51 63576.27 65519.10 25705.073470 26.832078
176 2021-11-15 00:00:00 65519.11 66401.82 63400.00 63606.74 37829.371240 26.479925
177 2021-11-16 00:00:00 63606.73 63617.31 58574.07 60058.87 77455.156090 25.267463
178 2021-11-17 00:00:00 60058.87 60840.23 58373.00 60344.87 46289.384910 24.154719
179 2021-11-18 00:00:00 60344.86 60976.00 56474.26 56891.62 62146.999310 23.454728
180 2021-11-19 00:00:00 56891.62 58320.00 55600.00 58052.24 50715.887260 22.944550
181 2021-11-20 00:00:00 58057.10 59845.00 57353.00 59707.51 33811.590100 22.122892
182 2021-11-21 00:00:00 59707.52 60029.76 58486.65 58622.02 31902.227850 21.302202
183 2021-11-22 00:00:00 58617.70 59444.00 55610.00 56247.18 51724.320470 21.040602
184 2021-11-23 00:00:00 56243.83 58009.99 55317.00 57541.27 49917.850170 20.840946
185 2021-11-24 00:00:00 57541.26 57735.00 55837.00 57138.29 39612.049640 20.651273
186 2021-11-25 00:00:00 57138.29 59398.90 57000.00 58960.36 42153.515220 20.071560
187 2021-11-26 00:00:00 58960.37 59150.00 53500.00 53726.53 65927.870660 20.117912
188 2021-11-27 00:00:00 53723.72 55280.00 53610.00 54721.03 29716.999570 20.161946
189 2021-11-28 00:00:00 54716.47 57445.05 53256.64 57274.88 36163.713700 19.704241
190 2021-11-29 00:00:00 57274.89 58865.97 56666.67 57776.25 40125.280090 18.969898
191 2021-11-30 00:00:00 57776.25 59176.99 55875.55 56950.56 49161.051940 18.417868
192 2021-01-12 00:00:00 56950.56 59053.55 56458.01 57184.07 44956.636560 17.893439
193 2021-02-12 00:00:00 57184.07 57375.47 55777.77 56480.34 37574.059760 17.525876
194 2021-03-12 00:00:00 56484.26 57600.00 51680.00 53601.05 58927.690270 17.858850
195 2021-04-12 00:00:00 53601.05 53859.10 42000.30 49152.47 114203.373748 19.217441
196 2021-05-12 00:00:00 49152.46 49699.05 47727.21 49396.33 45580.820120 20.508102
197 2021-06-12 00:00:00 49396.32 50891.11 47100.00 50441.92 58571.215750 21.472003
198 2021-07-12 00:00:00 50441.91 51936.33 50039.74 50588.95 38253.468770 22.161968
199 2021-08-12 00:00:00 50588.95 51200.00 48600.00 50471.19 38425.924660 22.962218
200 2021-09-12 00:00:00 50471.19 50797.76 47320.00 47545.59 37692.686650 23.846688
201 2021-10-12 00:00:00 47535.90 50125.00 46852.00 47140.54 44233.573910 24.732127
202 2021-11-12 00:00:00 47140.54 49485.71 46751.00 49389.99 28889.193580 25.583369
203 2021-12-12 00:00:00 49389.99 50777.00 48638.00 50053.90 26017.934210 26.077754
204 2021-12-13 00:00:00 50053.90 50189.97 45672.75 46702.75 50869.520930 26.859770
205 2021-12-14 00:00:00 46702.76 48700.41 46290.00 48343.28 39955.984450 27.602685
206 2021-12-15 00:00:00 48336.95 49500.00 46547.00 48864.98 51629.181000 28.109255
207 2021-12-16 00:00:00 48864.98 49436.43 47511.00 47632.38 31949.867390 28.590496
208 2021-12-17 00:00:00 47632.38 47995.96 45456.00 46131.20 43104.488700 29.278437
209 2021-12-18 00:00:00 46133.83 47392.37 45500.00 46834.48 25020.052710 29.931981
210 2021-12-19 00:00:00 46834.47 48300.01 46406.91 46681.23 29305.706650 30.303705
211 2021-12-20 00:00:00 46681.24 47537.57 45558.85 46914.16 35848.506090 30.761072
212 2021-12-21 00:00:00 46914.17 49328.96 46630.00 48889.88 37713.929240 30.715132
213 2021-12-22 00:00:00 48887.59 49576.13 48421.87 48588.16 27004.202200 30.607162
214 2021-12-23 00:00:00 48588.17 51375.00 47920.42 50838.81 35192.540460 30.051098
215 2021-12-24 00:00:00 50838.82 51810.00 50384.43 50820.00 31661.949460 29.417439
The code below runs fine, but I need the date on the x-axis:
test['avg'].plot(legend=True,figsize=(12,5))
plt.grid(True)
plt.xlabel('date')
plt.ylabel('ADX')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()
Correct plot:
But when I choose the date column for the x-axis, I get a bad plot. The code is below:
df.set_index('date',drop=True, inplace=True)
Modified data
test['avg'].plot(legend=True,figsize=(12,5))
plt.grid(True)
plt.xlabel('date')
plt.ylabel('ADX')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()
Bad plot:
Also, why do I get NaN values for the ADX from TA-Lib?
Can you help me with this problem?
It does appear to be a problem with the source file: the column names are not tab-separated. Once this is fixed, the plotting works fine.
The NaN issue also comes from the source file; the average was simply not calculated for the first several rows (ADX-style indicators need a warm-up window before they produce values).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
test = pd.read_csv(r"modified_data.dat", sep='\t')
date = test['date']
avg = test['avg']
fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(date, avg)
ax.tick_params(rotation=30, width=2)        # tilt the date labels
plt.xticks(np.arange(0, len(date) + 1, 5))  # show every 5th date
plt.show()
Output looks like this:
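If the date column parses cleanly, another option is to let pandas own a real datetime index, which makes the tick handling automatic. A minimal sketch, assuming the same tab-separated file and unambiguous date strings:
import matplotlib.pyplot as plt
import pandas as pd

test = pd.read_csv(r"modified_data.dat", sep='\t', parse_dates=['date'])
test = test.set_index('date')  # note: set_index returns a new frame
test['avg'].plot(legend=True, figsize=(12, 5))
plt.grid(True)
plt.xlabel('date')
plt.ylabel('ADX')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()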
I have a dataframe of daily stock data, which is indexed by a datetimeindex.
There are multiple stock entries, thus there are duplicate datetimeindex values.
I am looking for a way to:
1. Group the dataframe by the stock symbol
2. Resample the prices for each symbol group into monthly price frequency data
3. Perform a pct_change calculation on each symbol group's monthly prices
4. Store the result as a new column 'monthly_return' in the original dataframe
I have been able to manage the first three operations. Storing the result in the original dataframe is where I'm having some trouble.
To illustrate this, I created a toy dataset that includes a 'dummy' index column (idx), which I use later on (in the third code block) to help construct the desired output.
import random
import pandas as pd
import numpy as np
PER = 62  # number of daily periods; assumed here, since it was undefined in the original (the sample output runs 2018-01-01 through 2018-03-03)
datelist = pd.date_range(pd.Timestamp(2018, 1, 1), periods=PER).to_pydatetime().tolist() * 2
ids = [random.choice(['A', 'B']) for i in range(len(datelist))]
prices = random.sample(range(200), len(datelist))
idx = range(len(datelist))
df1 = pd.DataFrame(data=list(zip(idx, ids, prices)), index=datelist, columns='idx label prices'.split())
print(df1.head(10))
df1
idx label prices
2018-01-01 0 B 40
2018-01-02 1 A 190
2018-01-03 2 A 159
2018-01-04 3 A 25
2018-01-05 4 A 89
2018-01-06 5 B 164
...
2018-01-31 30 A 102
2018-02-01 31 A 117
2018-02-02 32 A 120
2018-02-03 33 B 75
2018-02-04 34 B 170
...
Desired Output
idx label prices monthly_return
2018-01-01 0 B 40 0.000000
2018-01-02 1 A 190 0.000000
2018-01-03 2 A 159 0.000000
2018-01-04 3 A 25 0.000000
2018-01-05 4 A 89 0.000000
2018-01-06 5 B 164 0.000000
...
2018-01-31 30 A 102 -0.098039
2018-02-01 31 A 117 0.000000
2018-02-02 32 A 120 0.000000
...
2018-02-26 56 B 152 0.000000
2018-02-27 57 B 2 0.000000
2018-02-28 58 B 49 -0.040816
2018-03-01 59 B 188 0.000000
...
2018-01-28 89 A 88 0.000000
2018-01-29 90 A 26 0.000000
2018-01-30 91 B 128 0.000000
2018-01-31 92 A 144 -0.098039
...
2018-02-26 118 A 92 0.000000
2018-02-27 119 B 111 0.000000
2018-02-28 120 B 34 -0.040816
...
What I have tried so far is:
dfX = df1.copy(deep=True)
dfX = df1.groupby('label').resample('M')['prices'].last().pct_change(1).shift(-1)
print(dfX)
Which outputs:
label
A 2018-01-31 -0.067961
2018-02-28 -0.364583
2018-03-31 0.081967
B 2018-01-31 1.636364
2018-02-28 -0.557471
2018-03-31 NaN
This is quite close to what I want, however I am only getting pct_change values on end-of-month dates, which is awkward to store back into the original dataframe (df1) as a new column.
Something like this doesn't work:
dfX = df1.copy(deep=True)
dfX['monthly_return'] = df1.groupby('label').resample('M')['prices'].last().pct_change(1).shift(-1)
As it yields the error:
TypeError: incompatible index of inserted column with frame index
I have considered 'upsampling' the monthly_return data back into a daily series, but this would likely hit the same error, since the original dataset can be missing dates (such as weekends). Resetting the index to clear the error would still be a problem, because the grouped dfX does not have the same number of rows as the daily-frequency df1.
I have a hunch that this can be done by using multi-indexing and dataframe merging however I am unsure how to go about doing so.
This generates my desired output, but it isn't as clean a solution as I was hoping for.
df1 is generated the same as before (code given in question):
idx label prices
2018-01-01 0 A 145
2018-01-02 1 B 86
2018-01-03 2 B 141
...
2018-01-25 86 B 12
2018-01-26 87 B 71
2018-01-27 88 B 186
2018-01-28 89 B 151
2018-01-29 90 A 161
2018-01-30 91 B 143
2018-01-31 92 B 88
...
Then:
def fun(x):
    dates = x.date                      # remember the original row dates
    x = x.set_index('date', drop=True)
    # month-end returns align only on month-end dates; all other rows get NaN
    x['monthly_return'] = x.resample('M').last()['prices'].pct_change(1).shift(-1)
    x = x.reindex(dates)                # restore the original rows
    return x
dfX = df1.copy(deep=True)
dfX.reset_index(inplace=True)
dfX.columns = 'date idx label prices'.split()
dfX = dfX.groupby('label').apply(fun).droplevel(level='label')
print(dfX)
Which outputs the desired result (unsorted):
idx label prices monthly_return
date
2018-01-01 0 A 145 NaN
2018-01-06 5 A 77 NaN
2018-01-08 7 A 48 NaN
2018-01-09 8 A 31 NaN
2018-01-11 10 A 20 NaN
2018-01-12 11 A 27 NaN
2018-01-14 13 A 109 NaN
2018-01-15 14 A 166 NaN
2018-01-17 16 A 130 NaN
2018-01-18 17 A 139 NaN
2018-01-19 18 A 191 NaN
2018-01-21 20 A 164 NaN
2018-01-22 21 A 112 NaN
2018-01-23 22 A 167 NaN
2018-01-25 24 A 140 NaN
2018-01-26 25 A 42 NaN
2018-01-30 29 A 107 NaN
2018-02-04 34 A 9 NaN
2018-02-07 37 A 84 NaN
2018-02-08 38 A 23 NaN
2018-02-10 40 A 30 NaN
2018-02-12 42 A 89 NaN
2018-02-15 45 A 79 NaN
2018-02-16 46 A 115 NaN
2018-02-19 49 A 197 NaN
2018-02-21 51 A 11 NaN
2018-02-26 56 A 111 NaN
2018-02-27 57 A 126 NaN
2018-03-01 59 A 135 NaN
2018-03-03 61 A 28 NaN
2018-01-01 62 A 120 NaN
2018-01-03 64 A 170 NaN
2018-01-05 66 A 45 NaN
2018-01-07 68 A 173 NaN
2018-01-08 69 A 158 NaN
2018-01-09 70 A 63 NaN
2018-01-11 72 A 62 NaN
2018-01-12 73 A 168 NaN
2018-01-14 75 A 169 NaN
2018-01-15 76 A 142 NaN
2018-01-17 78 A 83 NaN
2018-01-18 79 A 96 NaN
2018-01-21 82 A 25 NaN
2018-01-22 83 A 90 NaN
2018-01-23 84 A 59 NaN
2018-01-29 90 A 161 NaN
2018-02-01 93 A 150 NaN
2018-02-04 96 A 85 NaN
2018-02-06 98 A 124 NaN
2018-02-14 106 A 195 NaN
2018-02-16 108 A 136 NaN
2018-02-17 109 A 134 NaN
2018-02-18 110 A 183 NaN
2018-02-19 111 A 32 NaN
2018-02-24 116 A 102 NaN
2018-02-25 117 A 72 NaN
2018-02-27 119 A 38 NaN
2018-03-02 122 A 137 NaN
2018-03-03 123 A 171 NaN
2018-01-02 1 B 86 NaN
2018-01-03 2 B 141 NaN
2018-01-04 3 B 189 NaN
2018-01-05 4 B 60 NaN
2018-01-07 6 B 1 NaN
2018-01-10 9 B 87 NaN
2018-01-13 12 B 44 NaN
2018-01-16 15 B 147 NaN
2018-01-20 19 B 92 NaN
2018-01-24 23 B 81 NaN
2018-01-27 26 B 190 NaN
2018-01-28 27 B 24 NaN
2018-01-29 28 B 116 NaN
2018-01-31 30 B 98 1.181818
2018-02-01 31 B 121 NaN
2018-02-02 32 B 110 NaN
2018-02-03 33 B 66 NaN
2018-02-05 35 B 4 NaN
2018-02-06 36 B 13 NaN
2018-02-09 39 B 114 NaN
2018-02-11 41 B 16 NaN
2018-02-13 43 B 174 NaN
2018-02-14 44 B 78 NaN
2018-02-17 47 B 144 NaN
2018-02-18 48 B 14 NaN
2018-02-20 50 B 133 NaN
2018-02-22 52 B 156 NaN
2018-02-23 53 B 159 NaN
2018-02-24 54 B 177 NaN
2018-02-25 55 B 43 NaN
2018-02-28 58 B 19 -0.338542
2018-03-02 60 B 127 NaN
2018-01-02 63 B 2 NaN
2018-01-04 65 B 97 NaN
2018-01-06 67 B 8 NaN
2018-01-10 71 B 54 NaN
2018-01-13 74 B 106 NaN
2018-01-16 77 B 74 NaN
2018-01-19 80 B 188 NaN
2018-01-20 81 B 172 NaN
2018-01-24 85 B 51 NaN
2018-01-25 86 B 12 NaN
2018-01-26 87 B 71 NaN
2018-01-27 88 B 186 NaN
2018-01-28 89 B 151 NaN
2018-01-30 91 B 143 NaN
2018-01-31 92 B 88 1.181818
2018-02-02 94 B 75 NaN
2018-02-03 95 B 103 NaN
2018-02-05 97 B 82 NaN
2018-02-07 99 B 128 NaN
2018-02-08 100 B 123 NaN
2018-02-09 101 B 52 NaN
2018-02-10 102 B 18 NaN
2018-02-11 103 B 21 NaN
2018-02-12 104 B 50 NaN
2018-02-13 105 B 64 NaN
2018-02-15 107 B 185 NaN
2018-02-20 112 B 125 NaN
2018-02-21 113 B 108 NaN
2018-02-22 114 B 132 NaN
2018-02-23 115 B 180 NaN
2018-02-26 118 B 67 NaN
2018-02-28 120 B 192 -0.338542
2018-03-01 121 B 58 NaN
Perhaps there is a more concise and pythonic way of doing this.
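A somewhat more concise sketch of the same idea (my untested variant, assuming rows exist on the calendar month-end dates, as they do in this toy data): compute the per-label month-end returns as a Series keyed by (label, date), then merge it back onto the daily rows.
def month_end_returns(g):
    # month-end closing prices for one label, then period-over-period return
    s = g.set_index('date')['prices'].resample('M').last()
    return s.pct_change().shift(-1)

dfX = df1.rename_axis('date').reset_index()
m = (dfX.groupby('label')
        .apply(month_end_returns)
        .rename('monthly_return')
        .reset_index())
dfX = dfX.merge(m, on=['label', 'date'], how='left')
Because pct_change and shift run inside each group here, the returns never bleed across labels the way they can when called on the stacked groupby().resample() result.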
How do I convert column B into a transition matrix in Python?
The size of the matrix is 19, which is the number of unique values in column B.
There are 432 rows in total in the dataset.
time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
...
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088
The matrix should contain the number of transitions between the values, for example:

B       ...  1088  1288  ...
...
1088           8     2
...

Each cell holds the number of transitions between its row and column values.
I used your data to create a DataFrame with only column B, but it should also work with all columns.
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row[29:].strip()) for row in text.split('\n') if 'B' not in row]  # skip the header and slice the fixed-width B column
df = pd.DataFrame({'B': B})
I get the unique values in the column, to use later when creating the matrix:
numbers = sorted(df['B'].unique())
print(numbers)
[225, 275, 750, 816, 834, 998, 1088, 1285, 1288]
I create a shifted column C, so I have both values of each transition in every row:
df['C'] = df['B'].shift(-1)
print(df)
B C
0 816 816.0
1 816 998.0
2 998 750.0
3 750 998.0
I group by ['B', 'C'] so I can count pairs
groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)
{(225, 1288.0): 2, (275, 225.0): 2, (750, 998.0): 2, (816, 275.0): 2, (816, 816.0): 2, (816, 998.0): 2, (834, 1285.0): 2, (998, 750.0): 2, (998, 816.0): 2, (998, 834.0): 2, (998, 998.0): 12, (1088, 1088.0): 14, (1088, 1285.0): 2, (1285, 998.0): 2, (1285, 1088.0): 2, (1285, 1285.0): 6, (1285, 1288.0): 2, (1288, 1088.0): 2, (1288, 1285.0): 2}
Now I can create the matrix. Using numbers and counts, I create a column/Series (with the correct index) and add it to the matrix:
matrix = pd.DataFrame()
for x in numbers:
    matrix[x] = pd.Series([counts.get((x, y), 0) for y in numbers], index=numbers)
print(matrix)
Result
225 275 750 816 834 998 1088 1285 1288
225 0 2 0 0 0 0 0 0 0
275 0 0 0 2 0 0 0 0 0
750 0 0 0 0 0 2 0 0 0
816 0 0 0 2 0 2 0 0 0
834 0 0 0 0 0 2 0 0 0
998 0 0 2 2 0 12 0 2 0
1088 0 0 0 0 0 0 14 2 2
1285 0 0 0 0 2 0 2 6 2
1288 2 0 0 0 0 0 0 2 0
Full example
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row[29:].strip()) for row in text.split('\n') if 'B' not in row]
df = pd.DataFrame({'B': B})
numbers = sorted(df['B'].unique())
print(numbers)
df['C'] = df['B'].shift(-1)
print(df)
groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)
matrix = pd.DataFrame()
for x in numbers:
    matrix[str(x)] = pd.Series([counts.get((x, y), 0) for y in numbers], index=numbers)
print(matrix)
EDIT:
The dictionary comprehension
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups}  # don't count (816,816)
written as a normal for loop:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]:  # don't count (816,816)
        counts[pair] = len(group)
    else:
        counts[pair] = 0
Negate the value when it is bigger than 10:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]:  # don't count (816,816)
        count = len(group)
        if count > 10:
            counts[pair] = -count
        else:
            counts[pair] = count
    else:
        counts[pair] = 0
EDIT: counting (A,B) and (B,A) together, so the matrix comes out symmetric:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]:  # don't count (816,816)
        # counts[(A,B)] = len((A,B)) + len((B,A))
        if pair not in counts:
            counts[pair] = len(group)                # put first value
        else:
            counts[pair] += len(group)               # add second value
        # counts[(B,A)] = len((A,B)) + len((B,A))
        if (pair[1], pair[0]) not in counts:
            counts[(pair[1], pair[0])] = len(group)  # put first value
        else:
            counts[(pair[1], pair[0])] += len(group) # add second value
    else:
        counts[pair] = 0  # (816,816) gives 0
# counts[(A,B)] == counts[(B,A)]
counts_2 = {}
for pair, count in counts.items():
    if count > 10:
        counts_2[pair] = -count
    else:
        counts_2[pair] = count
matrix = pd.DataFrame()
for x in numbers:
    matrix[str(x)] = pd.Series([counts_2.get((x, y), 0) for y in numbers], index=numbers)
print(matrix)
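As a side note, once the plain directed matrix exists, the symmetric version could also be sketched in two lines, assuming the matrix was built with identical row and column labels (the full example above uses str(x) for the columns, so the labels would need to match first):
import numpy as np
sym = matrix + matrix.T          # counts[(a,b)] + counts[(b,a)]
np.fill_diagonal(sym.values, 0)  # zero the self-transitions, as in the loop above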
An alternative, pandas-based approach. Note I've used shift(1), so each row's B is the number transitioned to and the shifted column holds the previous value:
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row[29:].strip()) for row in text.split('\n') if 'B' not in row]
df = pd.DataFrame({'B': B})
# alternative approach
df['C'] = df['B'].shift(1)  # C holds the previous value, so each row pairs a value with its predecessor
df['counts'] = 1            # add an arbitrary counts column for the group by
# group the combinations together, then unstack to get the matrix
trans_matrix = df.groupby(['B', 'C']).count().unstack()
# make the column labels a bit neater
trans_matrix.columns = trans_matrix.columns.droplevel()
The result is:
I think this is correct, i.e. the one time you observe 225, it then transitions to 1288. To turn it into a probability transition matrix, you would divide each row by its row sum.
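For reference, pd.crosstab can build the same kind of matrix in one call, and can row-normalize directly (a sketch using the same df; shift(-1) makes each row a current -> next pair):
trans = pd.crosstab(df['B'], df['B'].shift(-1))                          # raw counts
trans_prob = pd.crosstab(df['B'], df['B'].shift(-1), normalize='index') # row probabilities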
I'm trying to fill the missing time slots in a CSV file that has the date and time as a string.
My input from a csv file is:
A B C
56 2017-10-26 22:15:00 89
2 2017-10-27 00:30:00 54
20 2017-10-28 05:00:00 64
24 2017-10-29 06:00:00 2
91 2017-11-01 22:45:00 78
62 2017-11-02 15:30:00 99
91 2017-11-02 22:45:00 34
Output should be
A B C
0 2017-10-26 00:00:00 89
1 2017-10-26 00:15:00 89
...
56 2017-10-26 22:15:00 89
...
96 2017-10-26 23:45:00 89
0 2017-10-27 00:00:00 54
1 2017-10-27 00:15:00 54
2 2017-10-27 00:30:00 54
...
20 2017-10-28 05:00:00 64
21 2017-10-28 05:15:00 64
...
24 2017-10-29 06:00:00 2
...
91 2017-11-01 22:45:00 78
...
62 2017-11-02 15:30:00 99
...
91 2017-11-02 22:45:00 34
The output range is 15-minute time slots for the days between 2017-10-26 and 2017-11-02, and each day has 96 slots.
Using resample to get 15-minute intervals and bfill to fill the missing values:
df = df.set_index(pd.to_datetime(df.pop('B')))   # datetime index from column B
df.loc[df.index.min().normalize()] = None        # seed a midnight row so the grid starts at 00:00
df = df.resample('15min').max().bfill()          # one row per 15-minute slot, back-filled
df['A'] = 4*df.index.hour + df.index.minute//15  # recompute the slot number
print(df)
Output:
A C
B
2017-10-26 00:00:00 0 89.0
2017-10-26 00:15:00 1 89.0
2017-10-26 00:30:00 2 89.0
... .. ...
2017-11-02 22:15:00 89 34.0
2017-11-02 22:30:00 90 34.0
2017-11-02 22:45:00 91 34.0
You need to resample your data and fill the missing values by propagating the last known value. Pandas can do that. Assuming you loaded your CSV with pandas.read_csv and obtained a dataframe (let's call it df) whose index is the date column (df.set_index('B')), then:
df.resample(rule='15min').ffill()
The rule parameter defines the new frequency (the 15-minute alias is '15min' or '15T'; '15M' would mean months), and the call to .ffill() means "forward fill", i.e., replace missing data with the previous known value.
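Put together as a minimal end-to-end sketch (the file name is a placeholder, and the column names come from the question's sample):
import pandas as pd

df = pd.read_csv('input.csv', parse_dates=['B'])  # 'input.csv' is hypothetical
df = df.set_index('B')
filled = df.resample('15min').ffill()  # one row per 15-minute slot, values carried forward
print(filled)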
I have these data:
val1 val2 val3
dt
2017-12-15 00:00:00 81 90 79
2017-12-15 00:01:00 67 85 80
2017-12-15 00:02:00 4 41 37
2017-12-15 00:03:00 61 68 29
2017-12-15 00:04:00 49 6 56
2017-12-15 00:05:00 94 13 93
2017-12-15 00:06:00 91 3 75
2017-12-15 00:07:00 94 81 7
2017-12-15 00:08:00 55 59 33
2017-12-15 00:09:00 97 89 26
2017-12-15 00:10:00 17 75 88
2017-12-15 00:11:00 39 40 96
2017-12-15 00:12:00 61 20 70
2017-12-15 00:13:00 62 31 93
2017-12-15 00:14:00 7 26 29
I would like to find the 3 max values for each 5-minute period.
The max values can be in any column (val1, val2, val3) and must be searched among the 15 values available in each 5-minute window (5 rows × 3 columns).
At the moment I can only find the largest in a single column.
Is it possible to search for nlargest in multiple columns?
This is the code to generate the data and to search for the max for val1:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_ref = datetime(2017, 12, 15, 0,0,0)
dtime = pd.date_range(date_ref, freq='1min', periods=15)
np.random.seed(seed=1115)
data1 = np.random.randint(1, high=100, size=len(dtime))
data2 = np.random.randint(1, high=100, size=len(dtime))
data3 = np.random.randint(1, high=100, size=len(dtime))
df = pd.DataFrame({'dt': dtime, 'val1': data1, 'val2': data2, 'val3': data3})
df.set_index('dt', inplace=True)
print(df)
group = df.groupby(pd.Grouper(freq='5min'))
max_only_for_val1 = (pd.DataFrame(group["val1"].nlargest(3))
                       .reset_index(level=1, drop=True))
print(max_only_for_val1)
This is the output:
val1
dt
2017-12-15 00:00:00 81
2017-12-15 00:00:00 67
2017-12-15 00:00:00 61
2017-12-15 00:05:00 97
2017-12-15 00:05:00 94
2017-12-15 00:05:00 94
2017-12-15 00:10:00 62
2017-12-15 00:10:00 61
2017-12-15 00:10:00 39
Since it doesn't matter where your values come from, let's reshape your data a bit.
df = df.reset_index().melt('dt').drop(columns='variable')
df.head(10)
dt value
0 2017-12-15 00:00:00 81
1 2017-12-15 00:01:00 67
2 2017-12-15 00:02:00 4
3 2017-12-15 00:03:00 61
4 2017-12-15 00:04:00 49
5 2017-12-15 00:05:00 94
6 2017-12-15 00:06:00 91
7 2017-12-15 00:07:00 94
8 2017-12-15 00:08:00 55
9 2017-12-15 00:09:00 97
Now, call groupby + apply -
def get_max3(x):
    return x.sort_values(ascending=False).head(3)
df = (df.groupby(pd.Grouper(key='dt', freq='5min'))['value']
        .apply(get_max3)
        .reset_index(0)
        .reset_index(drop=True))
dt value
0 2017-12-15 00:00:00 90
1 2017-12-15 00:00:00 85
2 2017-12-15 00:00:00 81
3 2017-12-15 00:05:00 97
4 2017-12-15 00:05:00 94
5 2017-12-15 00:05:00 94
6 2017-12-15 00:10:00 96
7 2017-12-15 00:10:00 93
8 2017-12-15 00:10:00 88
An alternative definition for get_max3 using numpy.sort -
def get_max3(x):
    return np.sort(x.values)[:-4:-1]  # last three of the ascending sort, i.e. the top 3 in descending order
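For what it's worth, the nlargest pattern from the question also works directly on the melted frame, a sketch mirroring the single-column code above:
top3 = (df.groupby(pd.Grouper(key='dt', freq='5min'))['value']
          .nlargest(3)
          .reset_index(level=1, drop=True))
print(top3)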