Manipulating values not matching HH:MM:SS format - python

I have a dataframe that looks like the following:
arrival departure
0 23:55:00 23:57:00
1 23:57:00 23:59:00
2 23:59:00 24:01:00
3 24:01:00 24:03:00
4 24:03:00 24:05:00
I am working with data that cover a whole day and part of the day after. Data are (most of the time) in the HH:MM:SS format. However some time values are higher than 23:59:59 and go up to 27:00:00.
I would like to get the time difference between departure and arrival columns.
I tried using datetime to do that but I guess something went wrong:
FMT = '%H:%M:%S'
delta = datetime.strptime(df['departure'], FMT) - datetime.strptime(df['arrival'], FMT)
Which raises the following error:
ValueError: time data '24:01:00' does not match format '%H:%M:%S'
Is there a way to get the time difference between these two columns even though their values do not always fit the HH:MM:SS format?

You could use timedelta from datetime:
import datetime
delta1 = datetime.timedelta(hours=23, minutes=59, seconds=0)
delta2 = datetime.timedelta(hours=24, minutes=1, seconds=0)
delta = delta2 - delta1
>>> delta # or delta.total_seconds() for a float number of seconds
datetime.timedelta(seconds=120)
This gives you the delta as a timedelta; total_seconds() returns it in seconds. Full example:
import datetime
arrival = "24:01:00"
departure = "24:03:00"
def get_time_from_string(t):
    # Map "HH:MM:SS" onto timedelta keyword arguments; hours may exceed 23.
    return dict(zip(["hours", "minutes", "seconds"], map(int, t.split(":"))))
delta1 = datetime.timedelta(**get_time_from_string(arrival))
delta2 = datetime.timedelta(**get_time_from_string(departure))
delta = delta2 - delta1
print(delta.total_seconds())
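For the dataframe in the question, the same idea can be applied column-wise. The following is a minimal sketch, assuming pandas is available and that both columns hold strings; pd.to_timedelta reads "24:01:00" as a duration of 24 hours and 1 minute rather than a clock time, so values past midnight do not raise. The delta_seconds column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "arrival":   ["23:55:00", "23:57:00", "23:59:00", "24:01:00", "24:03:00"],
    "departure": ["23:57:00", "23:59:00", "24:01:00", "24:03:00", "24:05:00"],
})

# Parse both columns as durations since midnight of the first day
arrival = pd.to_timedelta(df["arrival"])
departure = pd.to_timedelta(df["departure"])

# Difference per row, expressed in seconds
df["delta_seconds"] = (departure - arrival).dt.total_seconds()
print(df)
```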

Related

How to get datetime difference in hour format between two dates columns on pandas? [duplicate]

How do I calculate the difference in time in minutes for the following timestamp in Python?
2010-01-01 17:31:22
2010-01-03 17:31:22
minutes_diff = (datetime_end - datetime_start).total_seconds() / 60.0
RSabet's answer doesn't work in cases where the dates don't have the same exact time.
Original problem:
from datetime import datetime
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 17:31:22', fmt)
d2 = datetime.strptime('2010-01-03 17:31:22', fmt)
daysDiff = (d2-d1).days
print(daysDiff)
> 2
# Convert days to minutes
minutesDiff = daysDiff * 24 * 60
print(minutesDiff)
> 2880
d2-d1 gives you a datetime.timedelta, and its days attribute only holds the whole days in the timedelta. In this case that works fine, but consider the following:
from datetime import datetime
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 16:31:22', fmt)
d2 = datetime.strptime('2010-01-03 20:15:14', fmt)
daysDiff = (d2-d1).days
print(daysDiff)
> 2
# Convert days to minutes
minutesDiff = daysDiff * 24 * 60
print(minutesDiff)
> 2880 # that is wrong
It still gives the same answer because days is still 2; the hours, minutes and seconds of the timedelta are ignored.
A better approach would be to convert the dates to a common format and then do the calculation. The easiest way to do this is to convert them to Unix timestamps. Here is the code to do that.
from datetime import datetime
import time
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 17:31:22', fmt)
d2 = datetime.strptime('2010-01-03 20:15:14', fmt)
# Convert to Unix timestamp
d1_ts = time.mktime(d1.timetuple())
d2_ts = time.mktime(d2.timetuple())
# They are now in seconds, subtract and then divide by 60 to get minutes.
print(int(d2_ts - d1_ts) // 60)
> 3043 # Much better
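The same figure can also be reached without Unix timestamps, and without any dependence on the local timezone, by asking the timedelta for its total_seconds(). A small sketch using the dates from the example above:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 17:31:22', fmt)
d2 = datetime.strptime('2010-01-03 20:15:14', fmt)
diff = d2 - d1

# .days keeps only whole days, so the extra 2:43:52 is lost
minutes_from_days = diff.days * 24 * 60

# total_seconds() covers the whole interval (days + seconds + microseconds)
minutes_total = int(diff.total_seconds() // 60)

print(minutes_from_days)  # 2880
print(minutes_total)      # 3043
```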
In case someone doesn't realize it, one way to do this would be to combine Christophe and RSabet's answers:
from datetime import datetime
import time
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 17:31:22', fmt)
d2 = datetime.strptime('2010-01-03 20:15:14', fmt)
diff = d2 -d1
diff_minutes = (diff.days * 24 * 60) + (diff.seconds // 60)
print(diff_minutes)
> 3043
To calculate with a different pair of dates, note that diff.seconds alone drops the whole days, so use total_seconds():
from datetime import datetime
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 16:31:22', fmt)
d2 = datetime.strptime('2010-01-03 20:15:14', fmt)
diff = d2-d1
diff_minutes = diff.total_seconds() / 60
Use datetime.strptime() to parse into datetime instances, and then compute the difference, and finally convert the difference into minutes.
The result depends on the timezone that corresponds to the input time strings.
The simplest case is when both dates use the same UTC offset:
#!/usr/bin/env python3
from datetime import datetime, timedelta
time_format = "%Y-%m-%d %H:%M:%S"
dt1 = datetime.strptime("2010-01-01 17:31:22", time_format)
dt2 = datetime.strptime("2010-01-03 17:31:22", time_format)
print((dt2 - dt1) // timedelta(minutes=1)) # minutes
If your Python version doesn't support td // timedelta, replace it with int(td.total_seconds() // 60).
If the input time is in a local timezone that may have different UTC offsets at different times (e.g., it observes daylight saving time), then you should make dt1 and dt2 into aware datetime objects before finding the difference, to take into account possible changes in the UTC offset.
The portable way to make aware local datetime objects is to use pytz timezones:
#!/usr/bin/env python
from datetime import timedelta
import tzlocal # $ pip install tzlocal
local_tz = tzlocal.get_localzone() # get pytz timezone
aware_dt1, aware_dt2 = map(local_tz.localize, [dt1, dt2])
td = aware_dt2 - aware_dt1 # elapsed time
If either dt1 or dt2 correspond to an ambiguous time then the default is_dst=False is used to disambiguate. You could set is_dst=None to raise an exception for ambiguous or non-existent local times instead.
If you can't install 3rd party modules, then time.mktime() from Ken Cochrane's answer could be used; it can find the correct UTC offset on some platforms for some dates in some timezones. If you don't need a consistent (but perhaps wrong) result, it is much better than doing dt2 - dt1 with naive datetime objects, which always fails if the corresponding UTC offsets are different.
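To illustrate why the UTC offset matters, here is a minimal sketch using only the standard library. The two offsets are hard-coded assumptions (US Eastern time just before and just after the daylight saving change of 14 March 2010); in real code they would come from a timezone database rather than being typed in by hand.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for illustration: EST (UTC-5) before the change, EDT (UTC-4) after it
est = timezone(timedelta(hours=-5))
edt = timezone(timedelta(hours=-4))

naive1 = datetime(2010, 3, 13, 12, 0, 0)
naive2 = datetime(2010, 3, 14, 12, 0, 0)

# Naive subtraction ignores the offset change and reports a full 24 hours
naive_minutes = (naive2 - naive1) // timedelta(minutes=1)

# Aware subtraction accounts for the offsets: only 23 hours actually elapsed
aware1 = naive1.replace(tzinfo=est)
aware2 = naive2.replace(tzinfo=edt)
aware_minutes = (aware2 - aware1) // timedelta(minutes=1)

print(naive_minutes)  # 1440
print(aware_minutes)  # 1380
```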
If you are trying to find the difference between timestamps that are in pandas columns, the answer is fairly simple.
If you need it in days or seconds, then:
# For difference in days:
df['diff_in_days']=(df['timestamp2'] - df['timestamp1']).dt.days
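# For difference in seconds (total_seconds() includes whole days; .dt.seconds would only give the sub-day remainder)
df['diff_in_seconds']=(df['timestamp2'] - df['timestamp1']).dt.total_seconds()
Minutes are trickier: dt.minute works only on the datetime64[ns] dtype, whereas the column generated by subtracting two datetimes has the timedelta64[ns] dtype, so accessing dt.minute on it raises an error such as: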
AttributeError: 'TimedeltaProperties' object has no attribute 'm8'
So, as mentioned by many above, to get the actual value of the difference in minutes you have to do:
df['diff_in_min']=df['diff_in_seconds']/60
But if you just want the minutes component (0-59) of the difference, extract it from the timedelta's components:
# split the timedelta into days/hours/minutes/... and take the minutes part
df['diff_in_min']=(df['timestamp2']-df['timestamp1']).dt.components.minutes
You can also read the documentation at https://docs.python.org/3.4/library/datetime.html
In section 8.1.2 you'll see that the only read-only instance attributes of timedelta are days, seconds and microseconds. This settles why there is no minutes attribute to use directly.
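A quick way to check this with a plain datetime.timedelta; the interval used here is arbitrary:

```python
from datetime import timedelta

td = timedelta(days=2, hours=2, minutes=43, seconds=52)

# Only days, seconds and microseconds are stored; hours and minutes are folded into seconds
print(td.days, td.seconds, td.microseconds)  # 2 9832 0
print(hasattr(td, 'minutes'))                # False

# Whole minutes therefore have to be derived
whole_minutes = td // timedelta(minutes=1)
print(whole_minutes)                         # 3043
```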
Another way to get the difference between dates:
import dateutil.parser
import datetime
timeDifference = current_date - dateutil.parser.parse(last_sent_date)
time_difference_in_minutes = (int(timeDifference.days) * 24 * 60) + int((timeDifference.seconds) / 60)
Thanks
As was kind of said already, you need to use datetime.datetime's strptime method:
from datetime import datetime
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 17:31:22', fmt)
d2 = datetime.strptime('2010-01-03 17:31:22', fmt)
daysDiff = (d2-d1).days
# convert days to minutes
minutesDiff = daysDiff * 24 * 60
print(minutesDiff)
There is also a sneaky way with pandas:
pd.to_timedelta(x) - pd.to_timedelta(y)
You can solve it using divmod:
minutes = divmod((end_date - start_date).total_seconds(), 60)[0]
from datetime import datetime
fmt = '%Y-%m-%d %H:%M:%S'
d1 = datetime.strptime('2010-01-01 17:31:22', fmt)
d2 = datetime.strptime('2010-01-03 17:31:22', fmt)
print((d2-d1).days * 24 * 60)

Timestamp string to seconds in Dataframe

I have a large dataframe containing a Timestamp column like the one shown below:
Timestamp
16T122109960
16T122109965
16T122109970
16T122109975
[73853 rows x 1 columns]
I need to convert this into seconds (formatted like 12.523) elapsed since the first timestamp in the column, using something like this:
start_time = log_file['Timestamp'][0]
log_file['Timestamp'] = log_file.Timestamp.apply(lambda x: x - start_time)
But first I need to parse the timestamps into seconds as quickly as possible. I've tried using a regex to split the timestamp into hours, minutes, seconds and milliseconds and then multiplying and dividing appropriately, but was given a memory error. Is there a function within datetime or dateutil that would help?
The method I have used at the moment is below:
import re
def regex_time(time):
    # Split into date, 'T', hours, minutes, seconds and milliseconds
    parts = re.split(r"(\d*)(T)(\d{2})(\d{2})(\d{2})(\d{3})", time)
    date, delim, hours, minutes, seconds, mills = parts[1:-1]
    seconds = int(seconds)
    seconds += int(mills) / 1000
    seconds += int(minutes) * 60
    seconds += int(hours) * 3600
    return seconds
df['Timestamp'] = df.Timestamp.apply(regex_time)
You could try to convert the timestamp to datetime format and then extract the seconds in the format you want.
Here is a code sample showing how it works:
from datetime import datetime
timestamp = 1545730073
dt_object = datetime.fromtimestamp(timestamp)
seconds = dt_object.strftime("%S.%f")
print(seconds)
Output:
53.000000
You can also apply it to the dataframe you are using, for instance:
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'timestamp':[1545730073]})
df['datetime'] = df['timestamp'].apply(lambda x: datetime.fromtimestamp(x))
df['seconds'] = df['datetime'].apply(lambda x: x.strftime("%S.%f"))
And it will return a DataFrame containing:
timestamp datetime seconds
0 1545730073 2018-12-25 10:27:53 53.000000
You could parse the string with a strptime-style format, subtract the start_time as a pd.Timestamp and use the total_seconds() of the resulting timedelta:
import pandas as pd
df = pd.DataFrame({'Timestamp': ['16T122109960','16T122109965','16T122109970','16T122109975']})
start_time = pd.Timestamp('1900-01-01')
df['totalseconds'] = (pd.to_datetime(df['Timestamp'], format='%dT%H%M%S%f')-start_time).dt.total_seconds()
df['totalseconds']
# 0 1340469.960
# 1 1340469.965
# 2 1340469.970
# 3 1340469.975
# Name: totalseconds, dtype: float64
To use the first entry of the 'Timestamp' column as reference time start_time, use
start_time = pd.to_datetime(df['Timestamp'].iloc[0], format='%dT%H%M%S%f')
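If pandas is not an option, the same format string works with the standard library. A rough sketch, assuming every value follows the day, 'T', HHMMSS, milliseconds layout shown in the question (the variable names are made up):

```python
from datetime import datetime

raw = ['16T122109960', '16T122109965', '16T122109970', '16T122109975']
fmt = '%dT%H%M%S%f'

# strptime fills in a default year and month (1900-01), which cancels out on subtraction
parsed = [datetime.strptime(value, fmt) for value in raw]
start = parsed[0]

# Seconds elapsed since the first timestamp
elapsed = [(p - start).total_seconds() for p in parsed]
print(elapsed)  # [0.0, 0.005, 0.01, 0.015]
```

Because only the day of the month is recorded, a log that runs past the end of a month would need extra handling.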

pandas get integer seconds when doing a time difference instead of a x days mm:ss:hh format

I have two date-hour columns, A and B, of the type shown below. Both columns are in a pandas DataFrame.
yyyy-mm-dd hh:mm:ss
I create
df['difference'] = df['A'] - df['B']
I get a format like
0 days 00:01:13
I would prefer to have a column which contains the seconds in integer. For instance, I need to get 73 in my above example instead of 1min13.
How to do that?
We can use total_seconds (append .astype(int) if you need integers rather than floats):
(df['A'] - df['B']).dt.total_seconds()
from datetime import datetime, timedelta
# Specified date
date1 = datetime.strptime('2015-01-01 01:00:00', '%Y-%m-%d %H:%M:%S')
old_time = date1
print(old_time)
new_time = old_time - timedelta(hours=2, minutes=10)
print(new_time)

Subtract hours and minutes from time

In my task what I need to do is to subtract some hours and minutes like (0:20,1:10...any value) from time (2:34 PM) and on the output side I need to display the time after subtraction.
time and hh:mm value are hardcoded
Ex:
my_time = 1:05 AM
duration = 0:50
so the output should be 12:15 AM
The output should be exact, in AM/PM format, and applicable for all values
(0:00 < Duration > 6:00) .
I don't care about seconds, I only need the hours and minutes.
from datetime import datetime, timedelta
d = datetime.today() - timedelta(hours=0, minutes=50)
d.strftime('%I:%M %p')  # %I is the 12-hour clock, which pairs with %p (AM/PM)
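Since the time and duration in the question are hardcoded strings rather than the current time, here is a small sketch that parses both and formats the result with the 12-hour clock. The helper name subtract_duration is hypothetical, and the AM/PM text assumes an English (or default C) locale.

```python
from datetime import datetime, timedelta

def subtract_duration(clock_time, duration):
    """Subtract an 'H:MM' duration from an 'H:MM AM/PM' clock time."""
    start = datetime.strptime(clock_time, '%I:%M %p')
    hours, minutes = map(int, duration.split(':'))
    result = start - timedelta(hours=hours, minutes=minutes)
    # %I is the 12-hour clock; lstrip drops the leading zero ("01:24" -> "1:24")
    return result.strftime('%I:%M %p').lstrip('0')

print(subtract_duration('1:05 AM', '0:50'))  # 12:15 AM
print(subtract_duration('2:34 PM', '1:10'))  # 1:24 PM
```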
This worked for me:
from datetime import datetime
s1 = '10:04:00'
s2 = '11:03:11' # for example
fmt = '%H:%M:%S'
delta = datetime.strptime(s2, fmt) - datetime.strptime(s1, fmt)
print(delta)
from datetime import datetime
d1 = datetime.strptime("01:05:00.000", "%H:%M:%S.%f")
d2 = datetime.strptime("00:50:00.000", "%H:%M:%S.%f")
print (d1-d2)

Subtract date-time in Python

I need to calculate the difference between two times (and, if it exceeds 24 hours, the number of days).
Like:
from datetime import datetime
from time import strftime
s1 = '24:11:2014:14:28:42'
s2 = datetime.now().strftime("%d:%m:%Y:%H:%M:%S")
FMT = '%d:%m:%Y:%H:%M:%S'
timedelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
print (timedelta)
But this is not detecting more than 24 hours. I found this code, which can detect the days:
from datetime import datetime
date_format = "%d/%m/%Y %H%M%S"
a = datetime.strptime('22/10/2014 090000', date_format)
b = datetime.strptime('25/11/2014 100000', date_format)
delta = b - a
print (delta.days)
What I want in return is something like "2 days 03:35:00" (days followed by HH:MM:SS).
The timedelta you are getting from b - a already has all the information you need; have a look at https://docs.python.org/2/library/datetime.html#datetime.timedelta.
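For instance, the days and the leftover seconds can be combined into the requested layout. A minimal sketch; the helper name format_delta is made up, and it assumes a non-negative timedelta:

```python
from datetime import datetime

def format_delta(delta):
    """Render a non-negative timedelta as 'D days HH:MM:SS'."""
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{delta.days} days {hours:02d}:{minutes:02d}:{seconds:02d}"

date_format = "%d/%m/%Y %H%M%S"
a = datetime.strptime('22/10/2014 090000', date_format)
b = datetime.strptime('25/11/2014 100000', date_format)
print(format_delta(b - a))  # 34 days 01:00:00
```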
