Convert/use timedelta's time to datetime - python

I have a dataframe with two columns: 1 timedelta 'Time', and 1 datetime 'DateTime'.
My timedelta column simply contains a regular time of day; it never exceeds 24 hours. It's not being used as a timedelta, just as a time.
It's just the way it comes when pandas gets the data from my database.
I want a new column 'NewDateTime', with the date from the datetime and the time from the timedelta.
So I have this:
Time DateTime
1 09:01:00 2018-01-01 10:10:10
2 21:43:00 2018-01-01 11:11:11
3 03:20:00 2018-01-01 12:12:12
And I want this:
Time DateTime NewDateTime
1 09:01:00 2018-01-01 10:10:10 2018-01-01 09:01:00
2 21:43:00 2018-01-01 11:11:11 2018-01-01 21:43:00
3 03:20:00 2018-01-01 12:12:12 2018-01-01 03:20:00
At first I tried to set the DateTime column's hours, minutes and seconds to 0.
Then I planned to add the timedelta to the datetime.
But when I tried to do:
df['NewDateTime'] = df['DateTime'].dt.replace(hour=0, minute=0, second=0)
I get AttributeError: 'DatetimeProperties' object has no attribute 'replace'

Use Series.dt.floor to remove the times:
df['NewDateTime'] = df['DateTime'].dt.floor('D') + pd.to_timedelta(df['Time'])
# if necessary, convert the times to strings first
#df['NewDateTime'] = df['DateTime'].dt.floor('D') + pd.to_timedelta(df['Time'].astype(str))
print (df)
Time DateTime NewDateTime
1 09:01:00 2018-01-01 10:10:10 2018-01-01 09:01:00
2 21:43:00 2018-01-01 11:11:11 2018-01-01 21:43:00
3 03:20:00 2018-01-01 12:12:12 2018-01-01 03:20:00
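For reference, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt by hand from the question's data, and it assumes 'Time' already has a timedelta dtype, so no extra conversion is needed:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed dtypes: timedelta64 and datetime64).
df = pd.DataFrame({
    "Time": pd.to_timedelta(["09:01:00", "21:43:00", "03:20:00"]),
    "DateTime": pd.to_datetime([
        "2018-01-01 10:10:10",
        "2018-01-01 11:11:11",
        "2018-01-01 12:12:12",
    ]),
})

# Floor each timestamp to midnight, then add the time of day back as an offset.
df["NewDateTime"] = df["DateTime"].dt.floor("D") + df["Time"]
print(df)
```

Flooring to 'D' drops the time part but keeps the datetime64 dtype, which is why the timedelta can be added directly.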

Related

DataFrame Pandas datetime error: hour must be in 0..23

I have the following time series and I want to convert it to datetime in a DataFrame using "pd.to_datetime". I am getting the following error: "hour must be in 0..23: 2017/ 01/01 24:00:00". How can I get around this error?
DateTime
0 2017/ 01/01 01:00:00
1 2017/ 01/01 02:00:00
2 2017/ 01/01 03:00:00
3 2017/ 01/01 04:00:00
...
22 2017/ 01/01 23:00:00
23 2017/ 01/01 24:00:00
Given:
DateTime
0 2017/01/01 01:00:00
1 2017/01/01 02:00:00
2 2017/01/01 03:00:00
3 2017/01/01 04:00:00
4 2017/01/01 23:00:00
5 2017/01/01 24:00:00
As the error says, 24:00:00 isn't a valid time. Depending on what it actually means, we can salvage it like this:
# Split up your Date and Time Values into separate Columns:
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
# Convert them separately, one as datetime, the other as timedelta.
df.Date = pd.to_datetime(df.Date)
df.Time = pd.to_timedelta(df.Time)
# Fix your DateTime Column, Drop the helper Columns:
df.DateTime = df.Date + df.Time
df = df.drop(['Date', 'Time'], axis=1)
print(df)
print(df.dtypes)
Output:
DateTime
0 2017-01-01 01:00:00
1 2017-01-01 02:00:00
2 2017-01-01 03:00:00
3 2017-01-01 04:00:00
4 2017-01-01 23:00:00
5 2017-01-02 00:00:00
DateTime datetime64[ns]
dtype: object
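Alternatively, if the rows with an invalid time can be discarded, pass errors='coerce' so that they become NaT. The format string has to match the data:
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%Y/%m/%d %H:%M:%S', errors='coerce')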
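To make the difference between the two answers concrete, here is a small sketch on two rows of the sample data. The variable names are made up for illustration, and it assumes the strings have no stray space inside the date:

```python
import pandas as pd

raw = pd.Series(["2017/01/01 23:00:00", "2017/01/01 24:00:00"])

# Option 1: coerce. The 24:00:00 row cannot be parsed and becomes NaT.
coerced = pd.to_datetime(raw, format="%Y/%m/%d %H:%M:%S", errors="coerce")

# Option 2: parse the date and the time separately, so 24:00:00 is treated
# as a 24-hour offset and rolls over to midnight of the next day.
parts = raw.str.split(expand=True)
rolled = pd.to_datetime(parts[0], format="%Y/%m/%d") + pd.to_timedelta(parts[1])

print(coerced)
print(rolled)
```

Which one is right depends on what 24:00:00 means in the source system: a bad record, or the end of the day.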

Pandas: Remove duplicate dates but keeping the last

(not a duplicate question)
I have the following datasets:
GMT TIME, Value
2018-01-01 00:00:00, 1.2030
2018-01-01 00:01:00, 1.2000
2018-01-01 00:02:00, 1.2030
2018-01-01 00:03:00, 1.2030
.... , ....
2018-12-31 23:59:59, 1.2030
I am trying to find a way to remove the following:
hh:mm:ss from the datetime
After removing the time (hh:mm:ss) part, there will be duplicate dates, such as multiple 2018-01-01 entries. For each date I need to drop the duplicates and keep only the last row before the next date starts: the last 2018-01-01 row, then the last 2018-01-02 row, and so on.
How can I do it with Pandas?
Suppose you have data:
GMT TIME Value
0 2018-01-01 00:00:00 1.203
1 2018-01-01 00:01:00 1.200
2 2018-01-01 00:02:00 1.203
3 2018-01-01 00:03:00 1.203
4 2018-01-02 00:03:00 1.203
5 2018-01-03 00:03:00 1.203
6 2018-01-04 00:03:00 1.203
7 2018-12-31 23:59:59 1.203
Use pandas.to_datetime and .dt.date together with pandas.DataFrame.groupby:
import pandas as pd
df['GMT TIME'] = pd.to_datetime(df['GMT TIME']).dt.date
df.groupby(df['GMT TIME']).last()
Output:
Value
GMT TIME
2018-01-01 1.203
2018-01-02 1.203
2018-01-03 1.203
2018-01-04 1.203
2018-12-31 1.203
Or use pandas.DataFrame.drop_duplicates:
df['GMT TIME'] = pd.to_datetime(df['GMT TIME']).dt.date
df.drop_duplicates('GMT TIME', keep='last')
Output:
GMT TIME Value
3 2018-01-01 1.203
4 2018-01-02 1.203
5 2018-01-03 1.203
6 2018-01-04 1.203
7 2018-12-31 1.203
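Using duplicated (here 'GMT TIME' must still be a datetime column, so the conversion to date stays commented out):
#df['GMT TIME'] = pd.to_datetime(df['GMT TIME']).dt.date
df[~df['GMT TIME'].dt.date.iloc[::-1].duplicated()]
Or using groupby with tail:
df.groupby(df['GMT TIME'].dt.date).tail(1)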
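A self-contained sketch that checks two of these approaches against each other. The sample values are invented for illustration, the rows are assumed to be sorted by time already, and .dt.normalize() is swapped in for .dt.date here only to keep a datetime64 dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "GMT TIME": pd.to_datetime([
        "2018-01-01 00:00:00",
        "2018-01-01 00:01:00",
        "2018-01-01 00:02:00",
        "2018-01-02 00:03:00",
        "2018-12-31 23:59:59",
    ]),
    "Value": [1.203, 1.200, 1.205, 1.210, 1.203],
})

# Midnight of each day; used only as the grouping key.
dates = df["GMT TIME"].dt.normalize()

# Keep the last row of each date, two equivalent ways.
last_per_day = df.loc[~dates.duplicated(keep="last")]
via_tail = df.groupby(dates).tail(1)

print(last_per_day)
```

Both variants keep the original row labels and the full timestamp; assign `dates` back to the column if only the date should remain.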

Error rounding time to previous 15 min - Python

I've developed a crude method to round timestamps to the previous 15 mins. For instance, if the timestamp is 8:10:00, it gets rounded to 8:00:00.
However, when it goes over 15 mins it rounds to the previous hour. For instance, if the timestamp was 8:20:00, it gets rounded to 7:00:00 for some reason? I'll list the two examples below.
Correct Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
'Time' : ['8:00:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            -timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
08:00:00
Incorrect Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
'Time' : ['8:20:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            -timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
07:00:00
I don't understand what I'm doing wrong?
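The problem is this part of the return expression:
- timedelta(hours=t.minute//15)
If minute is 20, then t.minute // 15 equals 1, so you're subtracting one hour after the minutes have already been set to 0. That is how 8:20:00 becomes 7:00:00.
Try this instead:
return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15), hour=t.hour)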
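Put together as a complete function, the fix could look like the sketch below. The function name is hypothetical, and the redundant hour=t.hour argument is dropped since replace() leaves unspecified fields unchanged:

```python
import pandas as pd


def floor_to_quarter_hour(t):
    """Round a timestamp down to the previous 15-minute mark."""
    # Integer-divide the minutes by 15 and multiply back: 20 -> 15, 10 -> 0.
    return t.replace(minute=t.minute // 15 * 15, second=0, microsecond=0)


ts = pd.Timestamp("2018-01-01 08:20:00")
rounded = floor_to_quarter_hour(ts)
print(rounded.time())  # 08:15:00
```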
Use .dt.floor('15min') to round down to 15-minute intervals.
import pandas as pd
df = pd.DataFrame({'Time': pd.date_range('2018-01-01', freq='13.141min', periods=13)})
df['prev_15'] = df.Time.dt.floor('15min')
Output:
Time prev_15
0 2018-01-01 00:00:00.000 2018-01-01 00:00:00
1 2018-01-01 00:13:08.460 2018-01-01 00:00:00
2 2018-01-01 00:26:16.920 2018-01-01 00:15:00
3 2018-01-01 00:39:25.380 2018-01-01 00:30:00
4 2018-01-01 00:52:33.840 2018-01-01 00:45:00
5 2018-01-01 01:05:42.300 2018-01-01 01:00:00
6 2018-01-01 01:18:50.760 2018-01-01 01:15:00
7 2018-01-01 01:31:59.220 2018-01-01 01:30:00
8 2018-01-01 01:45:07.680 2018-01-01 01:45:00
9 2018-01-01 01:58:16.140 2018-01-01 01:45:00
10 2018-01-01 02:11:24.600 2018-01-01 02:00:00
11 2018-01-01 02:24:33.060 2018-01-01 02:15:00
12 2018-01-01 02:37:41.520 2018-01-01 02:30:00
There are also .dt.round() and .dt.ceil() if you need the nearest 15-minute mark or the following 15-minute interval, respectively.
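A quick side-by-side of the three methods on the two timestamps from the question; the frame below is just an illustration:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018-01-01 08:10:00", "2018-01-01 08:20:00"]))

out = pd.DataFrame({
    "floor": s.dt.floor("15min"),  # previous 15-minute mark
    "round": s.dt.round("15min"),  # nearest 15-minute mark
    "ceil": s.dt.ceil("15min"),    # next 15-minute mark
})
print(out)
```

Here 08:10 floors to 08:00 but rounds to 08:15, while 08:20 floors and rounds to 08:15 and ceils to 08:30.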

Pandas Series of hour values to Series of dates

I have a time series covering January of 1979 with 6 hours time deltas. Time format is in continuous hour range:
1
7
13
18
25
31
.
.
.
739
Is it possible to convert these ints to dates? For instance:
1979/01/01 - 1:00
1979/01/01 - 7:00
1979/01/01 - 13:00
1979/01/01 - 18:00
1979/01/02 - 1:00
Thank you so much!
Setup
df = pd.DataFrame({'hour': [1,7,13,18,25,31]})
Use pd.to_datetime with the unit flag, and set the origin flag to the beginning of your desired year.
pd.to_datetime(df.hour, unit='h', origin='1979-01-01')
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
5 1979-01-02 07:00:00
Name: hour, dtype: datetime64[ns]
Here is another way:
import pandas as pd
s = pd.Series([1,7,13])
s = pd.to_datetime(s*1e9*60*60+ pd.Timestamp(1979,1,1).value)
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
dtype: datetime64[ns]
Could also just do this:
import pandas as pd
from datetime import datetime, timedelta
s = pd.Series([1,7,13,18,25])
s = s.apply(lambda h: datetime(1979, 1, 1) + timedelta(hours=h))
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
dtype: datetime64[ns]
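One more vectorized variant, not taken from the answers above: treat the integers as hour offsets with pd.to_timedelta and add them to the start of the year. This is a sketch assuming the hours count from 1979-01-01 00:00:

```python
import pandas as pd

hours = pd.Series([1, 7, 13, 18, 25])

# Convert the integers to hour offsets and add them to the origin timestamp.
stamps = pd.Timestamp("1979-01-01") + pd.to_timedelta(hours, unit="h")
print(stamps)
```

This avoids the row-by-row apply and should give the same values as the to_datetime call with unit and origin.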

Pandas .resample() or .asfreq() fill forward times

I'm trying to resample a dataframe with a time series from 1-hour increments to 15-minute. Both .resample() and .asfreq() do almost exactly what I want, but I'm having a hard time filling the last three intervals.
I could add an extra hour at the end, resample, and then drop that last hour, but it feels hacky.
Current code:
df = pd.DataFrame({'date':pd.date_range('2018-01-01 00:00', '2018-01-01 01:00', freq = '1H'), 'num':5})
df = df.set_index('date').asfreq('15T', method = 'ffill', how = 'end').reset_index()
Current output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
Desired output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
5 2018-01-01 01:15:00 5
6 2018-01-01 01:30:00 5
7 2018-01-01 01:45:00 5
Thoughts?
Not sure about asfreq, but reindex works well:
df.set_index('date').reindex(
    pd.date_range(
        df.date.min(),
        df.date.max() + pd.Timedelta('1h'),
        freq='15min',
        inclusive='left'  # pandas >= 1.4; older versions use closed='left'
    ),
    method='ffill'
)
num
2018-01-01 00:00:00 5
2018-01-01 00:15:00 5
2018-01-01 00:30:00 5
2018-01-01 00:45:00 5
2018-01-01 01:00:00 5
2018-01-01 01:15:00 5
2018-01-01 01:30:00 5
2018-01-01 01:45:00 5
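A runnable sketch of the same idea that sidesteps the closed/inclusive argument entirely by ending the new index 45 minutes after the last timestamp. That offset is an assumption tied to going from 1-hour to 15-minute steps:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2018-01-01 00:00", "2018-01-01 01:00", freq="60min"),
    "num": 5,
})

# Target index: 15-minute steps covering the whole of the last hour.
new_index = pd.date_range(
    df["date"].min(),
    df["date"].max() + pd.Timedelta(minutes=45),
    freq="15min",
)

# Forward-fill each hourly value into the 15-minute slots that follow it.
out = df.set_index("date").reindex(new_index, method="ffill")
print(out)
```

Call .rename_axis('date').reset_index() on the result if the timestamps should go back into a regular column.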
