DateTime adjustment in pandas - python

I have a dataframe with thousands of rows, one column of which is a datetime:
I would like to snap each time to the nearest half hour, roughly 00 ± 15 -> 00 and 30 ± 15 -> 30.
More precisely, minutes in the range 46 to 15 should change to 00 and minutes in the range 16 to 45 should change to 30, with the hour adjusted where needed (for example, 00:59 becomes 01:00).
datetime
2022/11/15 00:29
2022/11/15 00:29
2022/11/15 00:29
2022/11/15 00:59
2022/11/15 00:59
2022/11/15 00:59
2022/11/15 01:35
2022/11/15 01:35
2022/11/15 01:35
2022/11/15 02:01
2022/11/15 02:01
2022/11/15 02:01
2022/11/15 02:45
2022/11/15 02:45
2022/11/15 02:45
2022/11/15 02:48
2022/11/15 02:48
2022/11/15 02:48
After adjustment, it would become
datetime
2022/11/15 00:30
2022/11/15 00:30
2022/11/15 00:30
2022/11/15 01:00
2022/11/15 01:00
2022/11/15 01:00
2022/11/15 01:30
2022/11/15 01:30
2022/11/15 01:30
2022/11/15 02:00
2022/11/15 02:00
2022/11/15 02:00
2022/11/15 02:30
2022/11/15 02:30
2022/11/15 02:30
2022/11/15 03:00
2022/11/15 03:00
2022/11/15 03:00

Use Series.dt.ceil to round up to 15 minutes, then Series.dt.floor to round down to 30 minutes:
df['datetime'] = pd.to_datetime(df['datetime']).dt.ceil('15Min').dt.floor('30Min')
print (df)
datetime
0 2022-11-15 00:30:00
1 2022-11-15 00:30:00
2 2022-11-15 00:30:00
3 2022-11-15 01:00:00
4 2022-11-15 01:00:00
5 2022-11-15 01:00:00
6 2022-11-15 01:30:00
7 2022-11-15 01:30:00
8 2022-11-15 01:30:00
9 2022-11-15 02:00:00
10 2022-11-15 02:00:00
11 2022-11-15 02:00:00
12 2022-11-15 02:30:00
13 2022-11-15 02:30:00
14 2022-11-15 02:30:00
15 2022-11-15 03:00:00
16 2022-11-15 03:00:00
17 2022-11-15 03:00:00
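The ceil-then-floor trick can be checked on one timestamp per case from the question. This is a minimal sketch, assuming the column holds strings in the `YYYY/MM/DD HH:MM` layout shown above; the `adjusted` column name is made up for illustration.

```python
import pandas as pd

# One sample per distinct timestamp in the question.
df = pd.DataFrame({"datetime": [
    "2022/11/15 00:29", "2022/11/15 00:59", "2022/11/15 01:35",
    "2022/11/15 02:01", "2022/11/15 02:45", "2022/11/15 02:48",
]})

# Parse with an explicit format so the strings are not guessed at.
ts = pd.to_datetime(df["datetime"], format="%Y/%m/%d %H:%M")

# Round up to the next quarter hour, then down to the half hour:
# minutes 16-45 land on :30, minutes 46-59 and 00-15 land on :00.
df["adjusted"] = ts.dt.ceil("15min").dt.floor("30min")
print(df)
```

Rounding up first is what pushes minute 45 to :30 and minute 15 to :00, matching the ranges stated in the question.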

Related

Splitting Dataframe time into morning and evening

I have a df that looks like this (shortened):
DateTime Value Date Time
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00
8 2022-09-18 14:00:00 7.9 18/09/2022 14:00
9 2022-09-18 15:00:00 7.8 18/09/2022 15:00
10 2022-09-18 16:00:00 7.6 18/09/2022 16:00
11 2022-09-18 17:00:00 6.8 18/09/2022 17:00
12 2022-09-18 18:00:00 6.4 18/09/2022 18:00
13 2022-09-18 19:00:00 5.7 18/09/2022 19:00
14 2022-09-18 20:00:00 4.8 18/09/2022 20:00
15 2022-09-18 21:00:00 5.4 18/09/2022 21:00
16 2022-09-18 22:00:00 4.7 18/09/2022 22:00
17 2022-09-18 23:00:00 4.3 18/09/2022 23:00
18 2022-09-19 00:00:00 4.1 19/09/2022 00:00
19 2022-09-19 01:00:00 4.4 19/09/2022 01:00
22 2022-09-19 04:00:00 3.5 19/09/2022 04:00
23 2022-09-19 05:00:00 2.8 19/09/2022 05:00
24 2022-09-19 06:00:00 3.8 19/09/2022 06:00
I want to create a new column that splits the rows between day and night like this:
00:00 - 05:00 night,
06:00 - 18:00 day,
19:00 - 23:00 night
But apparently one can't use the same label twice? How can I solve this problem? Here is my code:
df['period'] = pd.cut(pd.to_datetime(df.DateTime).dt.hour,
bins=[0, 5, 17, 23],
labels=['night', 'morning', 'night'],
include_lowest=True)
It's returning
ValueError: labels must be unique if ordered=True; pass ordered=False for duplicate labels
If I understood correctly, you want the new column to say 'night' when the time is between 00:00 - 05:00 or 19:00 - 23:00, and 'day' otherwise. Here is code for that:
df['day/night'] = df['Time'].apply(lambda x: 'night' if '00:00' <= x <= '05:00' or '19:00' <= x <= '23:00' else 'day')
Alternatively, you can add the ordered=False parameter to your own pd.cut call.
input ->
df = pd.DataFrame(columns=['DateTime', 'Value', 'Date', 'Time'], data=[
['2022-09-18 06:00:00', 5.4, '18/09/2022', '06:00'],
['2022-09-18 07:00:00', 6.0, '18/09/2022', '07:00'],
['2022-09-18 08:00:00', 6.5, '18/09/2022', '08:00'],
['2022-09-18 09:00:00', 6.7, '18/09/2022', '09:00'],
['2022-09-18 14:00:00', 7.9, '18/09/2022', '14:00'],
['2022-09-18 15:00:00', 7.8, '18/09/2022', '15:00'],
['2022-09-18 16:00:00', 7.6, '18/09/2022', '16:00'],
['2022-09-18 17:00:00', 6.8, '18/09/2022', '17:00'],
['2022-09-18 18:00:00', 6.4, '18/09/2022', '18:00'],
['2022-09-18 19:00:00', 5.7, '18/09/2022', '19:00'],
['2022-09-18 20:00:00', 4.8, '18/09/2022', '20:00'],
['2022-09-18 21:00:00', 5.4, '18/09/2022', '21:00'],
['2022-09-18 22:00:00', 4.7, '18/09/2022', '22:00'],
['2022-09-18 23:00:00', 4.3, '18/09/2022', '23:00'],
['2022-09-19 00:00:00', 4.1, '19/09/2022', '00:00'],
['2022-09-19 01:00:00', 4.4, '19/09/2022', '01:00'],
['2022-09-19 04:00:00', 3.5, '19/09/2022', '04:00'],
['2022-09-19 05:00:00', 2.8, '19/09/2022', '05:00'],
['2022-09-19 06:00:00', 3.8, '19/09/2022', '06:00']])
output ->
DateTime Value Date Time day/night
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00 day
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00 day
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00 day
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00 day
4 2022-09-18 14:00:00 7.9 18/09/2022 14:00 day
5 2022-09-18 15:00:00 7.8 18/09/2022 15:00 day
6 2022-09-18 16:00:00 7.6 18/09/2022 16:00 day
7 2022-09-18 17:00:00 6.8 18/09/2022 17:00 day
8 2022-09-18 18:00:00 6.4 18/09/2022 18:00 day
9 2022-09-18 19:00:00 5.7 18/09/2022 19:00 night
10 2022-09-18 20:00:00 4.8 18/09/2022 20:00 night
11 2022-09-18 21:00:00 5.4 18/09/2022 21:00 night
12 2022-09-18 22:00:00 4.7 18/09/2022 22:00 night
13 2022-09-18 23:00:00 4.3 18/09/2022 23:00 night
14 2022-09-19 00:00:00 4.1 19/09/2022 00:00 night
15 2022-09-19 01:00:00 4.4 19/09/2022 01:00 night
16 2022-09-19 04:00:00 3.5 19/09/2022 04:00 night
17 2022-09-19 05:00:00 2.8 19/09/2022 05:00 night
18 2022-09-19 06:00:00 3.8 19/09/2022 06:00 day
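The string comparison above works because the Time column is zero-padded HH:MM text. A minimal sketch of a vectorized alternative that compares the hour of the parsed DateTime column instead; the five-row frame is made up for illustration, and `np.where` is swapped in for `apply`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample covering both boundaries of the day range.
df = pd.DataFrame({"DateTime": [
    "2022-09-18 05:00:00", "2022-09-18 06:00:00", "2022-09-18 18:00:00",
    "2022-09-18 19:00:00", "2022-09-19 00:00:00",
]})

# Extract the hour as an integer (0-23).
hour = pd.to_datetime(df["DateTime"]).dt.hour

# between() is inclusive on both ends, so 06:00-18:00 is 'day'.
df["day/night"] = np.where(hour.between(6, 18), "day", "night")
print(df)
```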
You have two options.
Either you don't care about the order and you can set ordered=False as parameter of cut:
df['period'] = pd.cut(pd.to_datetime(df.DateTime).dt.hour,
bins=[0, 5, 17, 23],
labels=['night', 'morning', 'night'],
ordered=False,
include_lowest=True)
Or you care to have night and morning ordered, in which case you can further convert to ordered Categorical:
df['period'] = pd.Categorical(df['period'], categories=['night', 'morning'], ordered=True)
output:
DateTime Value Date Time period
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00 morning
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00 morning
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00 morning
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00 morning
8 2022-09-18 14:00:00 7.9 18/09/2022 14:00 morning
9 2022-09-18 15:00:00 7.8 18/09/2022 15:00 morning
10 2022-09-18 16:00:00 7.6 18/09/2022 16:00 morning
11 2022-09-18 17:00:00 6.8 18/09/2022 17:00 morning
12 2022-09-18 18:00:00 6.4 18/09/2022 18:00 night
13 2022-09-18 19:00:00 5.7 18/09/2022 19:00 night
14 2022-09-18 20:00:00 4.8 18/09/2022 20:00 night
15 2022-09-18 21:00:00 5.4 18/09/2022 21:00 night
16 2022-09-18 22:00:00 4.7 18/09/2022 22:00 night
17 2022-09-18 23:00:00 4.3 18/09/2022 23:00 night
18 2022-09-19 00:00:00 4.1 19/09/2022 00:00 night
19 2022-09-19 01:00:00 4.4 19/09/2022 01:00 night
22 2022-09-19 04:00:00 3.5 19/09/2022 04:00 night
23 2022-09-19 05:00:00 2.8 19/09/2022 05:00 night
24 2022-09-19 06:00:00 3.8 19/09/2022 06:00 morning
column:
df['period']
0 morning
1 morning
2 morning
...
23 night
24 morning
Name: period, dtype: category
Categories (2, object): ['morning', 'night']
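Both options can be put together in one self-contained sketch. Two things here are assumptions rather than part of the answer: the bin edges are moved to `[0, 5, 18, 23]` so that 18:00 counts as day, as the question's ranges state, and the label `day` is used instead of `morning`.

```python
import pandas as pd

# Hypothetical hours covering each bin boundary.
hours = pd.Series(pd.to_datetime([
    "2022-09-18 05:00", "2022-09-18 06:00", "2022-09-18 18:00",
    "2022-09-18 19:00", "2022-09-19 00:00",
])).dt.hour

# ordered=False allows the duplicate 'night' label;
# include_lowest=True puts hour 0 into the first bin.
period = pd.cut(hours, bins=[0, 5, 18, 23],
                labels=["night", "day", "night"],
                ordered=False, include_lowest=True)

# Optional second step: impose an explicit order night < day.
ordered_period = pd.Categorical(period, categories=["night", "day"], ordered=True)
print(period.tolist())
```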

pandas merge/rearrange/sum single dataframe

I have following dataframe:
latitude longitude d1 d2 ar merge_time
0 15 10.0 12/1/1981 0:00 12/4/1981 3:00 2.317681391 1981-12-04 04:00:00
1 15 10.1 12/1/1981 0:00 12/1/1981 3:00 2.293604127 1981-12-01 04:00:00
2 15 10.2 12/1/1981 0:00 12/1/1981 2:00 2.264552161 1981-12-01 03:00:00
3 15 10.3 12/1/1981 0:00 12/4/1981 2:00 2.278556423 1981-12-04 03:00:00
4 15 10.1 12/1/1981 4:00 12/1/1981 22:00 2.168275766 1981-12-01 23:00:00
5 15 10.2 12/1/1981 3:00 12/1/1981 21:00 2.114636628 1981-12-01 22:00:00
6 15 10.4 12/1/1981 0:00 12/2/1981 17:00 1.384415903 1981-12-02 18:00:00
7 15 10.1 12/2/1981 8:00 12/2/1981 11:00 2.293604127 1981-12-01 12:00:00
I want to group and rearrange above dataframe (value of column ar) based on following criteria:
1. Values latitude and longitude are equal and
2. Values d2 and merge_time are equal within each group from 1
Here is desired output:
latitude longitude d1 d2 ar
15 10 12/1/1981 0:00 12/4/1981 3:00 2.317681391
15 10.1 12/1/1981 0:00 12/1/1981 22:00 4.461879893
15 10.2 12/1/1981 0:00 12/1/1981 21:00 4.379188789
15 10.3 12/1/1981 0:00 12/4/1981 2:00 2.278556423
15 10.4 12/1/1981 0:00 12/2/1981 17:00 1.384415903
15 10.1 12/2/1981 8:00 12/2/1981 11:00 2.293604127
How can I achieve this?
Any help is appreciated.
After you expressed your requirements in the comments, the approach is:
1. group by location (longitude & latitude)
2. find rows within this grouping that are contiguous in time
3. group and aggregate these contiguous sections
import io
import pandas as pd
df = pd.read_csv(io.StringIO(""" latitude longitude d1 d2 ar merge_time
0 15 10.0 12/1/1981 0:00 12/4/1981 3:00 2.317681391 1981-12-04 04:00:00
1 15 10.1 12/1/1981 0:00 12/1/1981 3:00 2.293604127 1981-12-01 04:00:00
2 15 10.2 12/1/1981 0:00 12/1/1981 2:00 2.264552161 1981-12-01 03:00:00
3 15 10.3 12/1/1981 0:00 12/4/1981 2:00 2.278556423 1981-12-04 03:00:00
4 15 10.1 12/1/1981 4:00 12/1/1981 22:00 2.168275766 1981-12-01 23:00:00
5 15 10.2 12/1/1981 3:00 12/1/1981 21:00 2.114636628 1981-12-01 22:00:00
6 15 10.4 12/1/1981 0:00 12/2/1981 17:00 1.384415903 1981-12-02 18:00:00
7 15 10.1 12/2/1981 8:00 12/2/1981 11:00 2.293604127 1981-12-01 12:00:00"""), sep=r"\s\s+", engine="python")
df = df.assign(**{c:pd.to_datetime(df[c]) for c in ["d1","d2","merge_time"]})
df.groupby(["latitude", "longitude"]).apply(
lambda d: d.groupby(
(d["d1"] != (d["d2"].shift() + pd.Timedelta("1h"))).cumsum(), as_index=False
).agg({"d1": "min", "d2": "max", "ar": "sum"})
).droplevel(2,0).reset_index()
output
   latitude  longitude                   d1                   d2       ar
0        15         10  1981-12-01 00:00:00  1981-12-04 03:00:00  2.31768
1        15       10.1  1981-12-01 00:00:00  1981-12-01 22:00:00  4.46188
2        15       10.1  1981-12-02 08:00:00  1981-12-02 11:00:00   2.2936
3        15       10.2  1981-12-01 00:00:00  1981-12-01 21:00:00  4.37919
4        15       10.3  1981-12-01 00:00:00  1981-12-04 02:00:00  2.27856
5        15       10.4  1981-12-01 00:00:00  1981-12-02 17:00:00  1.38442
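The key step is the "new run starts here" flag followed by `cumsum()`. A minimal sketch of that idiom on a single location, using the three longitude 10.1 rows from the question; the names `starts_new_run`, `run_id` and `merged` are made up for illustration.

```python
import pandas as pd

# The three rows for longitude 10.1 from the question.
df = pd.DataFrame({
    "d1": pd.to_datetime(["1981-12-01 00:00", "1981-12-01 04:00", "1981-12-02 08:00"]),
    "d2": pd.to_datetime(["1981-12-01 03:00", "1981-12-01 22:00", "1981-12-02 11:00"]),
    "ar": [2.293604127, 2.168275766, 2.293604127],
})

# A row continues the previous one when its d1 is exactly one hour
# after the previous d2; otherwise it starts a new run. The first row
# compares against NaT, which is never equal, so it always starts a run.
starts_new_run = df["d1"] != df["d2"].shift() + pd.Timedelta(hours=1)

# The cumulative sum turns the flags into run identifiers: 1, 1, 2.
run_id = starts_new_run.cumsum()

# Aggregate each contiguous run into a single row.
merged = (df.groupby(run_id)
            .agg(d1=("d1", "min"), d2=("d2", "max"), ar=("ar", "sum"))
            .reset_index(drop=True))
print(merged)
```

In the full answer the same run identifier is computed inside each latitude/longitude group, so runs never span two locations.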

Convert 'object' column to datetime

I currently have the following dataframe (with seven days, one day displayed below). Hours run from 01:00 to 24:00. How do I convert the HourEnding column to datetime format and combine it with the date_time column (which is already in datetime format)?
HourEnding LMP date_time
0 01:00 165.27 2021-02-20
1 02:00 155.89 2021-02-20
2 03:00 154.50 2021-02-20
3 04:00 153.44 2021-02-20
4 05:00 210.15 2021-02-20
5 06:00 298.90 2021-02-20
6 07:00 152.71 2021-02-20
7 08:00 204.61 2021-02-20
8 09:00 155.77 2021-02-20
9 10:00 90.64 2021-02-20
10 11:00 57.17 2021-02-20
11 12:00 43.74 2021-02-20
12 13:00 33.42 2021-02-20
13 14:00 5.05 2021-02-20
14 15:00 1.43 2021-02-20
15 16:00 0.99 2021-02-20
16 17:00 0.94 2021-02-20
17 18:00 12.13 2021-02-20
18 19:00 18.90 2021-02-20
19 20:00 19.04 2021-02-20
20 21:00 16.42 2021-02-20
21 22:00 14.47 2021-02-20
22 23:00 44.55 2021-02-20
23 24:00 40.51 2021-02-20
So far I've tried
df['time'] = pd.to_datetime(df['HourEnding'])
but that seems to fail because of the 24:00.
Similarly
df['time'] = pd.to_timedelta('HourEnding', 'h', errors = 'coerce')
yields a column of NaTs.
As you mentioned in the comments, hour 24 corresponds to midnight of the same day. I would simply start by replacing "24" with "00":
df['HourEnding'] = df.HourEnding.str.replace('24:00', '00:00')
Then, convert date_time to string:
df['date_time'] = df.date_time.astype(str)
Create a new column that concatenates date_time and HourEnding, and parse it as datetime:
df['date_and_hour'] = df.date_time + " " + df.HourEnding
df['date_and_hour'] = pd.to_datetime(df.date_and_hour)
Which gives you this:
>>> df
HourEnding LMP date_time date_and_hour
0 01:00 165.27 2021-02-20 2021-02-20 01:00:00
1 02:00 155.89 2021-02-20 2021-02-20 02:00:00
2 03:00 154.50 2021-02-20 2021-02-20 03:00:00
3 04:00 153.44 2021-02-20 2021-02-20 04:00:00
4 05:00 210.15 2021-02-20 2021-02-20 05:00:00
5 06:00 298.90 2021-02-20 2021-02-20 06:00:00
6 07:00 152.71 2021-02-20 2021-02-20 07:00:00
7 08:00 204.61 2021-02-20 2021-02-20 08:00:00
8 09:00 155.77 2021-02-20 2021-02-20 09:00:00
9 10:00 90.64 2021-02-20 2021-02-20 10:00:00
10 11:00 57.17 2021-02-20 2021-02-20 11:00:00
11 12:00 43.74 2021-02-20 2021-02-20 12:00:00
12 13:00 33.42 2021-02-20 2021-02-20 13:00:00
13 14:00 5.05 2021-02-20 2021-02-20 14:00:00
14 15:00 1.43 2021-02-20 2021-02-20 15:00:00
15 16:00 0.99 2021-02-20 2021-02-20 16:00:00
16 17:00 0.94 2021-02-20 2021-02-20 17:00:00
17 18:00 12.13 2021-02-20 2021-02-20 18:00:00
18 19:00 18.90 2021-02-20 2021-02-20 19:00:00
19 20:00 19.04 2021-02-20 2021-02-20 20:00:00
20 21:00 16.42 2021-02-20 2021-02-20 21:00:00
21 22:00 14.47 2021-02-20 2021-02-20 22:00:00
22 23:00 44.55 2021-02-20 2021-02-20 23:00:00
23 00:00 40.51 2021-02-20 2021-02-20 00:00:00
>>> df.dtypes
HourEnding object
LMP float64
date_time object
date_and_hour datetime64[ns]
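As the dtypes show, the steps above leave date_time as a string column. A minimal sketch of a variant that keeps date_time as datetime64 by adding the hour as a timedelta; it keeps the same assumption that 24:00 means midnight of the same day, and the three-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical three-row sample in the question's layout.
df = pd.DataFrame({
    "HourEnding": ["01:00", "23:00", "24:00"],
    "date_time": pd.to_datetime(["2021-02-20"] * 3),
})

# Map 24:00 to 00:00 as a literal (non-regex) replacement.
hours = df["HourEnding"].str.replace("24:00", "00:00", regex=False)

# Append seconds so the text parses as HH:MM:SS, then add to the date.
df["date_and_hour"] = df["date_time"] + pd.to_timedelta(hours + ":00")
print(df.dtypes)
```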
Convert both columns to strings, then join them into a new 'datetime' column, and finally convert the 'datetime' column to datetime.
EDIT: To deal with the 1-24 hour problem, build a function to split the string and subtract 1 from each of the hours and then join:
def subtract_hour(t):
    t = t.split(':')
    t[0] = str(int(t[0]) - 1)
    if len(t[0]) < 2:
        t[0] = '0' + t[0]
    return ':'.join(t)
Then you can apply this to your hour column (e.g., df['hour'] = df['hour'].apply(subtract_hour)) and proceed with joining columns and then parsing using pd.to_datetime.
EDIT 2: You just want to change '24' to '00', my bad.
def mod_midnight(t):
    t = t.split(':')
    if t[0] == '24':
        t[0] = '00'
    return ':'.join(t)

Upsampling hourly data to 5 minute data in pandas

I have the following data:
MTU (CET) Day-ahead Price [EUR/MWh]
0 09.10.2017 00:00 - 09.10.2017 01:00 43.13
1 09.10.2017 01:00 - 09.10.2017 02:00 34.80
2 09.10.2017 02:00 - 09.10.2017 03:00 33.31
3 09.10.2017 03:00 - 09.10.2017 04:00 32.24
.......
22 09.10.2017 22:00 - 09.10.2017 23:00 49.06
23 09.10.2017 23:00 - 10.10.2017 00:00 38.46
From which I would like to have data for every 5 minutes.
By using:
price = pd.read_csv(price_data)
price_x = price.set_index(pd.DatetimeIndex(price['MTU (CET)'].str[:-19]))
price2 = price_x.resample('300s').ffill()
I get the following data:
2017-09-10 00:00:00 43.13
2017-09-10 00:05:00 43.13
2017-09-10 00:10:00 43.13
...
2017-09-10 22:45:00 49.06
2017-09-10 22:50:00 49.06
2017-09-10 22:55:00 49.06
2017-09-10 23:00:00 38.46
However, for the minutes between 23:00 and 00:00 the price should also be 38.46. Does anyone know how to help?
You need to manually add a final row for the next hour, filled with the data from the last row selected by iloc:
price_x = price.set_index(pd.DatetimeIndex(price['MTU (CET)'].str[:-19]))
price_x.loc[price_x.index[-1] + pd.Timedelta(1, unit='h')] = price_x.iloc[-1]
print (price_x.tail(3))
Day-ahead Price [EUR/MWh]
MTU (CET)
2017-09-10 22:00:00 49.06
2017-09-10 23:00:00 38.46
2017-09-11 00:00:00 38.46
price2 = price_x.resample('300s').ffill()
print (price2.tail(20))
Day-ahead Price [EUR/MWh]
MTU (CET)
2017-09-10 22:25:00 49.06
2017-09-10 22:30:00 49.06
2017-09-10 22:35:00 49.06
2017-09-10 22:40:00 49.06
2017-09-10 22:45:00 49.06
2017-09-10 22:50:00 49.06
2017-09-10 22:55:00 49.06
2017-09-10 23:00:00 38.46
2017-09-10 23:05:00 38.46
2017-09-10 23:10:00 38.46
2017-09-10 23:15:00 38.46
2017-09-10 23:20:00 38.46
2017-09-10 23:25:00 38.46
2017-09-10 23:30:00 38.46
2017-09-10 23:35:00 38.46
2017-09-10 23:40:00 38.46
2017-09-10 23:45:00 38.46
2017-09-10 23:50:00 38.46
2017-09-10 23:55:00 38.46
2017-09-11 00:00:00 38.46
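The whole approach can be sketched end to end on two rows. Two assumptions are made here: the dates are day-first (the row `09.10.2017 23:00 - 10.10.2017 00:00` suggests so, whereas the outputs above were parsed month-first), and the sentinel row is dropped again after resampling so the result ends at 23:55. `pd.concat` and `ffill()` are swapped in for the `loc` assignment and `pad()`.

```python
import pandas as pd

# Last two hourly rows from the question.
price = pd.DataFrame({
    "MTU (CET)": ["09.10.2017 22:00 - 09.10.2017 23:00",
                  "09.10.2017 23:00 - 10.10.2017 00:00"],
    "Day-ahead Price [EUR/MWh]": [49.06, 38.46],
})

# Parse the interval start (first 16 characters) as day-first.
start = pd.to_datetime(price["MTU (CET)"].str[:16], format="%d.%m.%Y %H:%M")
s = price.set_index(start)["Day-ahead Price [EUR/MWh]"]

# Append a sentinel one hour after the last start so the final hour
# has an end point for the resampler to fill up to.
sentinel = pd.Series([s.iloc[-1]], index=[s.index[-1] + pd.Timedelta(hours=1)])
s = pd.concat([s, sentinel])

# Forward-fill onto a 5-minute grid, then drop the sentinel row.
five_min = s.resample("5min").ffill().iloc[:-1]
print(five_min.tail(3))
```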

How can I slice a dataframe by timestamp, when timestamp isn't classified as index?

How can I split my pandas dataframe by using the timestamp on it?
I got the following prices when I call df30m:
Timestamp Open High Low Close Volume
0 2016-05-01 19:30:00 449.80 450.13 449.80 449.90 74.1760
1 2016-05-01 20:00:00 449.90 450.27 449.90 450.07 63.5840
2 2016-05-01 20:30:00 450.12 451.00 450.02 450.51 64.1080
3 2016-05-01 21:00:00 450.51 452.05 450.50 451.22 75.7390
4 2016-05-01 21:30:00 451.21 451.64 450.81 450.87 71.1190
5 2016-05-01 22:00:00 450.87 452.05 450.87 451.07 73.8430
6 2016-05-01 22:30:00 451.09 451.70 450.91 450.91 68.1490
7 2016-05-01 23:00:00 450.91 450.98 449.97 450.61 84.5430
8 2016-05-01 23:30:00 450.61 451.50 450.55 451.45 111.2370
9 2016-05-02 00:00:00 451.47 452.31 450.69 451.19 190.0750
10 2016-05-02 00:30:00 451.20 451.68 450.45 450.82 186.0930
11 2016-05-02 01:00:00 450.83 451.64 450.65 450.73 112.4630
12 2016-05-02 01:30:00 450.73 451.10 450.31 450.56 137.7530
13 2016-05-02 02:00:00 450.56 452.01 449.98 450.27 151.6140
14 2016-05-02 02:30:00 450.27 451.30 450.23 451.11 99.5490
15 2016-05-02 03:00:00 451.29 451.29 450.17 450.33 178.9860
16 2016-05-02 03:30:00 450.44 451.20 450.44 450.75 65.1480
17 2016-05-02 04:00:00 450.79 451.20 450.75 451.00 78.0430
18 2016-05-02 04:30:00 451.00 451.11 450.85 451.11 64.7250
19 2016-05-02 05:00:00 451.11 451.64 451.00 451.12 73.4840
20 2016-05-02 05:30:00 451.12 451.83 450.67 451.33 94.1950
21 2016-05-02 06:00:00 451.35 451.37 450.17 450.18 227.7480
22 2016-05-02 06:30:00 450.18 450.43 450.17 450.17 83.0270
23 2016-05-02 07:00:00 450.17 450.43 448.90 449.41 170.4950
24 2016-05-02 07:30:00 449.38 450.00 448.56 448.56 243.0420
25 2016-05-02 08:00:00 448.67 448.67 446.21 448.00 525.7090
26 2016-05-02 08:30:00 448.12 448.49 445.00 445.00 673.5810
27 2016-05-02 09:00:00 445.00 445.51 440.11 444.20 1392.9049
28 2016-05-02 09:30:00 444.24 444.36 440.11 442.00 438.6860
29 2016-05-02 10:00:00 441.91 443.20 440.05 442.24 400.5850
... ... ... ... ... ... ...
1651 2016-06-05 05:00:00 578.74 579.00 577.92 578.39 93.6980
1652 2016-06-05 05:30:00 578.40 578.48 574.52 575.26 98.1580
1653 2016-06-05 06:00:00 575.24 576.02 572.47 574.06 126.8620
1654 2016-06-05 06:30:00 574.06 576.35 574.06 576.34 125.4120
1655 2016-06-05 07:00:00 576.34 576.34 574.73 575.83 34.8070
1656 2016-06-05 07:30:00 575.84 576.27 574.91 575.58 74.8180
1657 2016-06-05 08:00:00 575.58 578.57 575.58 578.36 123.2560
1658 2016-06-05 08:30:00 578.23 578.47 576.18 577.25 43.6590
1659 2016-06-05 09:00:00 577.20 578.85 576.70 577.27 95.3900
1660 2016-06-05 09:30:00 577.36 578.18 576.70 576.70 51.0250
1661 2016-06-05 10:00:00 576.70 576.70 574.55 575.39 101.0590
1662 2016-06-05 10:30:00 575.41 576.44 575.18 576.44 86.4340
1663 2016-06-05 11:00:00 576.50 577.89 576.50 577.80 113.0600
1664 2016-06-05 11:30:00 577.80 578.10 576.03 576.98 57.5050
1665 2016-06-05 12:00:00 576.98 577.55 576.59 577.54 56.1070
1666 2016-06-05 12:30:00 577.54 583.00 570.93 572.82 872.8200
1667 2016-06-05 13:00:00 572.94 573.19 569.64 572.50 310.0020
1668 2016-06-05 13:30:00 572.50 574.37 572.50 574.09 59.3410
1669 2016-06-05 14:00:00 574.09 574.19 571.51 572.98 155.4310
1670 2016-06-05 14:30:00 572.98 573.57 572.02 573.47 76.9270
1671 2016-06-05 15:00:00 573.62 575.10 572.97 573.37 59.1430
1672 2016-06-05 15:30:00 573.37 574.39 573.37 574.38 77.3270
1673 2016-06-05 16:00:00 574.39 575.59 574.38 575.59 52.0150
1674 2016-06-05 16:30:00 575.00 575.59 574.50 575.00 66.9300
1675 2016-06-05 17:00:00 575.00 576.83 574.38 576.60 50.2990
1676 2016-06-05 17:30:00 576.60 577.50 575.50 576.86 104.5200
1677 2016-06-05 18:00:00 576.86 577.21 575.44 575.80 55.3270
1678 2016-06-05 18:30:00 575.77 575.80 574.52 574.77 78.7760
1679 2016-06-05 19:00:00 574.73 575.18 572.52 574.47 126.4300
1680 2016-06-05 19:30:00 574.49 574.87 573.80 574.32 10.4930
As you can see, it contains the last 35 days grouped by intervals of 30 min.
I wanna manipulate this price history in different time windows.
So, as a beginner example, I would like to fetch only the info from the last 1 day.
How can I filter this dataframe to show the info from the last 1 day?
This is what I've tried:
import datetime
d0 = datetime.datetime.today()
d1 = datetime.datetime.today() - datetime.timedelta(days=1)
print(d0)
>>> 2016-06-05 17:10:37.633824
print(d1)
>>> 2016-06-04 17:10:37.633967
df_1d = df30m['Timestamp'] > d1
print(df_1d)
This returns me a pandas series filled with True or False
0 False
1 False
2 False
3 False
4 False
...
1676 True
1677 True
1678 True
1679 True
1680 True
I've also tried to use the between_time() method.
df_1d = df30m.between_time(d0, d1)
But I got the following error message:
TypeError: Index must be DatetimeIndex
Please, can anyone show me a pythonic way to slice my dataframe?
You can use loc with a boolean mask to select the rows. Do you know whether your timestamps are datetime.datetime objects or pandas Timestamps?
df30m.loc[(df30m.Timestamp <= d0) & (df30m.Timestamp >= d1)]
You can set the index to the Timestamp column and then index as follows:
df30m.set_index('Timestamp', inplace=True)
df30m[d1:d0]
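Both suggestions can be tried on a small synthetic frame. This is a minimal sketch: the data is generated rather than taken from the question, the `Close` values are placeholders, and the window is measured back from the latest timestamp in the data instead of from `datetime.today()` so the result is reproducible.

```python
import pandas as pd

# Synthetic 30-minute bars over two days (placeholder Close values).
df30m = pd.DataFrame({
    "Timestamp": pd.date_range("2016-06-03 19:30", "2016-06-05 19:30", freq="30min"),
})
df30m["Close"] = range(len(df30m))

# Window: the 24 hours ending at the latest bar.
end = df30m["Timestamp"].max()
start = end - pd.Timedelta(days=1)

# Option 1: boolean mask on the column; between() includes both ends.
last_day = df30m[df30m["Timestamp"].between(start, end)]

# Option 2: make Timestamp the index and slice by label.
indexed = df30m.set_index("Timestamp").sort_index()
last_day_indexed = indexed.loc[start:end]
print(len(last_day), len(last_day_indexed))
```

The boolean Series from the question's own attempt is exactly the mask used in option 1; passing it back into `df30m[...]` is the missing step.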
