Pandas read format %D:%H:%M:%S with python - python

Currently I am reading in a data frame with the timestamp from film 00(days):00(hours clocks over at 24 to day):00(min):00(sec)
pandas reads time formats HH:MM:SS and YYYY:MM:DD HH:MM:SS fine.
Though is there a way of having pandas read the duration of time such as the DD:HH:MM:SS.
Alternatively using timedelta how would I go about getting the DD into HH in the data frame so that pandas can make it "1 day HH:MM:SS" for example
Data sample
00:00:00:00
00:07:33:57
02:07:02:13
00:00:13:11
00:00:10:11
00:00:00:00
00:06:20:06
01:12:13:25
Expected output for last sample
36:13:25
Thanks

If you want timedelta objects, a simple way is to replace the first colon with days :
df['timedelta'] = pd.to_timedelta(df['col'].str.replace(':', 'days ', n=1))
output:
col timedelta
0 00:00:00:00 0 days 00:00:00
1 00:07:33:57 0 days 07:33:57
2 02:07:02:13 2 days 07:02:13
3 00:00:13:11 0 days 00:13:11
4 00:00:10:11 0 days 00:10:11
5 00:00:00:00 0 days 00:00:00
6 00:06:20:06 0 days 06:20:06
7 01:12:13:25 1 days 12:13:25
>>> df.dtypes
col object
timedelta timedelta64[ns]
dtype: object
From there it's also relatively easy to combine the days and hours as string:
c = df['timedelta'].dt.components
df['str_format'] = ((c['hours']+c['days']*24).astype(str)
+df['col'].str.split('(?=:)', n=2).str[-1]).str.zfill(8)
output:
col timedelta str_format
0 00:00:00:00 0 days 00:00:00 00:00:00
1 00:07:33:57 0 days 07:33:57 07:33:57
2 02:07:02:13 2 days 07:02:13 55:02:13
3 00:00:13:11 0 days 00:13:11 00:13:11
4 00:00:10:11 0 days 00:10:11 00:10:11
5 00:00:00:00 0 days 00:00:00 00:00:00
6 00:06:20:06 0 days 06:20:06 06:20:06
7 01:12:13:25 1 days 12:13:25 36:13:25

Convert days separately, add to times and last call custom function:
def f(x):
ts = x.total_seconds()
hours, remainder = divmod(ts, 3600)
minutes, seconds = divmod(remainder, 60)
return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
d = pd.to_timedelta(df['col'].str[:2].astype(int), unit='d')
td = pd.to_timedelta(df['col'].str[3:])
df['col'] = d.add(td).apply(f)
print (df)
col
0 0:00:00
1 7:33:57
2 55:02:13
3 0:13:11
4 0:10:11
5 0:00:00
6 6:20:06
7 36:13:25

Related

Convert multiple time format object as datetime format

I have a dataframe with a list of time value as object and needed to convert them to datetime, the issue is, they are not on the same format so when I try:
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M:%S')
it gives me an error
ValueError: time data '3:22' does not match format '%H:%M:%S' (match)
or if use this code
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M')
I get this error
ValueError: unconverted data remains: :58
These are the values on my data
Total call time
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
**45:48**
1:41:40
5:08:37
**3:22**
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58
times = """\
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
45:48
1:41:40
5:08:37
3:22
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58""".split()
import pandas as pd
df = pd.DataFrame(times, columns=['elapsed'])
def pad(s):
if len(s) == 4:
return '00:0'+s
elif len(s) == 5:
return '00:'+s
return s
print(pd.to_timedelta(df['elapsed'].apply(pad)))
Output:
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 00:03:22
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58
Name: elapsed, dtype: timedelta64[ns]
Alternatively to grovina's answer ... instead of using apply you can directly use the dt accessor.
Here's a sample:
>>> data = [['2017-12-01'], ['2017-12-
30'],['2018-01-01']]
>>> df = pd.DataFrame(data=data,
columns=['date'])
>>> df
date
0 2017-12-01
1 2017-12-30
2 2018-01-01
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: object
Note how df.date is an object? Let's turn it into a date like you want
>>> df.date = pd.to_datetime(df.date)
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: datetime64[ns]
The format you want is for string formatting. I don't think you'll be able to convert the actual datetime64 to look like that format. For now, let's make a newly formatted string version of your date in a separate column
>>> df['new_formatted_date'] =
df.date.dt.strftime('%d/%m/%y %H:%M')
>>> df.new_formatted_date
0 01/12/17 00:00
1 30/12/17 00:00
2 01/01/18 00:00
Name: new_formatted_date, dtype: object
Finally, since the df.date column is now of date datetime64... you can use the dt accessor right on it. No need to use apply
>>> df['month'] = df.date.dt.month
>>> df['day'] = df.date.dt.day
>>> df['year'] = df.date.dt.year
>>> df['hour'] = df.date.dt.hour
>>> df['minute'] = df.date.dt.minute
>>> df
date new_formatted_date month day
year hour minute
0 2017-12-01 01/12/17 00:00 12
1 2017 0 0
1 2017-12-30 30/12/17 00:00 12
30 2017 0 0
2 2018-01-01 01/01/18 00:00
Another idea is test if double : and if not added :00 with converting to timedeltas by to_timedelta, also is test if number before first : is less like 23 - then is parsing like HH:MM, if is greater is parising like MM:SS:
m1 = df['Total call time'].str.count(':').ne(2)
m2 = df['Total call time'].str.extract('^(\d+):', expand=False).astype(float).gt(23)
s = np.select([m1 & m2, m1 & ~m2],
['00:' + df['Total call time'], df['Total call time']+ ':00'],
df['Total call time'] )
df['Total call time'] = pd.to_timedelta(s)
print (df)
Total call time
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 03:22:00
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58

Convert string hours to minute pd.eval

I want to convert all rows of my DataFrame that contains hours and minutes into minutes only.
I have a dataframe that looks like this:
df=
time
0 8h30
1 14h07
2 08h30
3 7h50
4 8h0
5 8h15
6 6h15
I'm using the following method to convert:
df['time'] = pd.eval(
df['time'].replace(['h'], ['*60+'], regex=True))
Output
SyntaxError: invalid syntax
I think the error comes from the format of the hour, maybe pd.evalcant accept 08h30 or 8h0, how to solve this probleme ?
Pandas can already handle such strings if the units are included in the string. While 14h07 can't be parse (why assume 07 is minutes?), 14h07 can be converted to a Timedelta :
>>> pd.to_timedelta("14h07m")
Timedelta('0 days 14:07:00')
Given this dataframe :
d1 = pd.DataFrame(['8h30m', '14h07m', '08h30m', '8h0m'],
columns=['time'])
You can convert the time series into a Timedelta series with pd.to_timedelta :
>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d1
time tm
0 8h30m 0 days 08:30:00
1 14h07m 0 days 14:07:00
2 08h30m 0 days 08:30:00
3 8h0m 0 days 08:00:00
To handle the missing minutes unit in the original data, just append m:
d1['tm'] = pd.to_timedelta(d1['time'] + 'm')
Once you have a Timedelta you can calculate hours and minutes.
The components of the values can be retrieved with Timedelta.components
>>> d1.tm.dt.components.hours
0 8
1 14
2 8
3 8
Name: hours, dtype: int64
To get the total minutes, seconds or hours, change the frequency to minutes:
>>> d1.tm.astype('timedelta64[m]')
0 510.0
1 847.0
2 510.0
3 480.0
Name: tm, dtype: float64
Bringing all the operations together :
>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d2 = (d1.assign(h=d1.tm.dt.components.hours,
... m=d1.tm.dt.components.minutes,
... total_minutes=d1.tm.astype('timedelta64[m]')))
>>>
>>> d2
time tm h m total_minutes
0 8h30m 0 days 08:30:00 8 30 510.0
1 14h07m 0 days 14:07:00 14 7 847.0
2 08h30m 0 days 08:30:00 8 30 510.0
3 8h0m 0 days 08:00:00 8 0 480.0
To avoid having to trim leading zeros, an alternative approach:
df[['h', 'm']] = df['time'].str.split('h', expand=True).astype(int)
df['total_min'] = df['h']*60 + df['m']
Result:
time h m total_min
0 8h30 8 30 510
1 14h07 14 7 847
2 08h30 8 30 510
3 7h50 7 50 470
4 8h0 8 0 480
5 8h15 8 15 495
6 6h15 6 15 375
Just to give an alternative approach with kind of the same elements as above you could do:
df = pd.DataFrame(data=["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"],
columns=["time"])
First split you column on the "h"
hm = df["time"].str.split("h", expand=True)
Then combine the columns again, but zeropad time hours and minutes in order to make valid time strings:
df2 = hm[0].str.strip().str.zfill(2) + hm[1].str.strip().str.zfill(2)
Then convert the string column with proper values to a date time column:
df3 = pd.to_datetime(df2, format="%H%M")
Finally, calculate the number of minutes by subtrackting a zero time (to make deltatimes) and divide by the minutes deltatime:
zerotime= pd.to_datetime("0000", format="%H%M")
df['minutes'] = (df3 - zerotime) / pd.Timedelta(minutes=1)
The results look like:
time minutes
0 8h30 510.0
1 14h07 847.0
2 08h30 510.0
3 7h50 470.0
4 8h0 480.0
5 8h15 495.0
6 6h15 375.0

Map a column to return hours, minutes, seconds without days

I am mapping a function to a Dataframe column that integers (relating to seconds) to return hours, minutes and seconds into a new column
.map(lambda row: datetime.timedelta(seconds=row))
However it returns
Sec Time
0 0 days 00:00:00
1 0 days 00:00:01
2 0 days 00:00:02
3 0 days 00:00:03
Im expecting to get the same without the "0 days" component

Pandas and DateTime TypeError: cannot compare a TimedeltaIndex with type float

I have a pandas DataFrame Series time differences that looks like::
print(delta_t)
1 0 days 00:00:59
3 0 days 00:04:22
6 0 days 00:00:56
8 0 days 00:01:21
19 0 days 00:01:09
22 0 days 00:00:36
...
(the full DataFrame had a bunch of NaNs which I dropped).
I'd like to know which delta_t's are less than 1 day, 1 hour, 1 minute,
so I tried:
delta_t_lt1day = delta_t[np.where(delta_t < 30.)]
but then got a:
TypeError: cannot compare a TimedeltaIndex with type float
Little help?!?!
Assuming your Series is in timedelta format, you can skip the np.where, and index using something like this, where you compare your actual values to other timedeltas, using the appropriate units:
delta_t_lt1day = delta_t[delta_t < pd.Timedelta(1,'D')]
delta_t_lt1hour = delta_t[delta_t < pd.Timedelta(1,'h')]
delta_t_lt1minute = delta_t[delta_t < pd.Timedelta(1,'m')]
You'll get the following series:
>>> delta_t_lt1day
0
1 00:00:59
3 00:04:22
6 00:00:56
8 00:01:21
19 00:01:09
22 00:00:36
Name: 1, dtype: timedelta64[ns]
>>> delta_t_lt1hour
0
1 00:00:59
3 00:04:22
6 00:00:56
8 00:01:21
19 00:01:09
22 00:00:36
Name: 1, dtype: timedelta64[ns]
>>> delta_t_lt1minute
0
1 00:00:59
6 00:00:56
22 00:00:36
Name: 1, dtype: timedelta64[ns]
You could use the TimeDelta class:
import pandas as pd
deltas = pd.to_timedelta(['0 days 00:00:59',
'0 days 00:04:22',
'0 days 00:00:56',
'0 days 00:01:21',
'0 days 00:01:09',
'0 days 00:31:09',
'0 days 00:00:36'])
for e in deltas[deltas < pd.Timedelta(value=30, unit='m')]:
print(e)
Output
0 days 00:00:59
0 days 00:04:22
0 days 00:00:56
0 days 00:01:21
0 days 00:01:09
0 days 00:00:36
Note that this filter outs '0 days 00:31:09' as expected. The expression pd.Timedelta(value=30, unit='m') creates a time delta of 30 minutes.

how to convert a python date in to an integer

Struggling with something that should be easy:
today = '26/8/2018'
start = '1/8/2018'
diff = today - start
diff gives us 26 days
how do I take the integer value of this datetime? i.e. 26?
basically, im trying to calc a daycount fraction, (diff / 365) * 10,000 say, but it wont work.
My actual values I have are:
0 304.548
1 371.397
2 350.466
3 -3574.36
4 255.452
and im trying to multiply them by:
duration
0 13 days
1 2 days
2 1 days
3 20 days
4 7 days
But I get:
0 TimedeltaIndex(['3959 days 02:57:32.054794', ...
1 TimedeltaIndex([ '4828 days 03:56:42.739725', ...
2 TimedeltaIndex([ '4556 days 01:18:54.246575', ...
3 TimedeltaIndex(['-46467 days +08:52:36.164383'...
4 TimedeltaIndex(['3320 days 21:02:27.945204', ...
desired output is
0 3959.124 as an integer (304.548*13), not as a daycount
Perhaps something like this might work:
In [1]: import datetime
In [4]: diff = datetime.datetime.today() - datetime.datetime(year=2018, month=8, day=1)
In [5]: diff.days
Out[5]: 25
Then you can do something like:
In [10]: diff.days / 365 * 10000
Out[10]: 684.931506849315

Categories