I have a dataset with two columns: Actual Time and Promised Time (representing the actual and promised start times of some process).
For example:
import pandas as pd
example_df = pd.DataFrame(columns = ['Actual Time', 'Promised Time'],
data = [
('2016-6-10 9:00', '2016-6-10 9:00'),
('2016-6-15 8:52', '2016-6-15 9:52'),
('2016-6-19 8:54', '2016-6-19 9:02')]).applymap(pd.Timestamp)
So as we can see, sometimes Actual Time = Promised Time, but there are also cases where Actual Time < Promised Time.
I defined a column that shows the difference between these two columns (example_df['Actual Time']-example_df['Promised Time']), but the problem is that for the third row it returned -1 day +23:52:00 instead of - 00:08:00.
Sample:
print (df)
Actual Time Promised Time
0 2016-6-10 9:00 2016-6-10 9:00
1 2016-6-15 10:52 2016-6-15 9:52 <- changed datetimes
2 2016-6-19 8:54 2016-6-19 9:02
def format_timedelta(x):
    ts = x.total_seconds()
    if ts >= 0:
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
    else:
        hours, remainder = divmod(-ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('-{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
First create datetimes:
df['Actual Time'] = pd.to_datetime(df['Actual Time'])
df['Promised Time'] = pd.to_datetime(df['Promised Time'])
And then timedeltas:
df['diff'] = (df['Actual Time'] - df['Promised Time'])
Converting the timedeltas to seconds with Series.dt.total_seconds handles negative values correctly:
df['diff1'] = df['diff'].dt.total_seconds()
If you want negative timedeltas as strings, use a custom function, because strftime is not yet implemented for timedeltas:
df['diff2'] = df['diff'].apply(format_timedelta)
print (df)
Actual Time Promised Time diff diff1 diff2
0 2016-06-10 09:00:00 2016-06-10 09:00:00 00:00:00 0.0 0:00:00
1 2016-06-15 10:52:00 2016-06-15 09:52:00 01:00:00 3600.0 1:00:00
2 2016-06-19 08:54:00 2016-06-19 09:02:00 -1 days +23:52:00 -480.0 -0:08:00
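For larger frames, the same sign-aware formatting can be done without apply. This is only a sketch: format_signed is a hypothetical helper name, and it assumes the Series contains no NaT values (the integer cast would fail on them).

```python
import pandas as pd

def format_signed(td: pd.Series) -> pd.Series:
    """Format a timedelta Series as [-]H:MM:SS strings (assumes no NaT values)."""
    secs = td.dt.total_seconds()
    # keep the sign separately so the arithmetic below runs on magnitudes
    sign = secs.lt(0).map({True: '-', False: ''})
    total = secs.abs().astype('int64')
    hours = (total // 3600).astype(str)
    minutes = (total % 3600 // 60).astype(str).str.zfill(2)
    seconds = (total % 60).astype(str).str.zfill(2)
    return sign + hours + ':' + minutes + ':' + seconds

# same three differences as in the sample: 0 s, +1 h, -8 min
diffs = pd.Series(pd.to_timedelta([0, 3600, -480], unit='s'))
formatted = format_signed(diffs)
print(formatted.tolist())
```

Fractional seconds are truncated here, which matches the int() casts in format_timedelta.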
I assume your columns are already datetime dtype. If you only need the size of the difference and not its sign, abs works just fine.
Without abs
df['Actual Time'] - df['Promised Time']
Out[526]:
0 00:00:00
1 -1 days +23:00:00
2 -1 days +23:52:00
dtype: timedelta64[ns]
With abs
abs(df['Promised Time'] - df['Actual Time'])
Out[529]:
0 00:00:00
1 01:00:00
2 00:08:00
dtype: timedelta64[ns]
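Note that abs throws away the direction of the difference. If you still need to know which rows started early, one option (a sketch; the column names gap and early are made up for illustration) is to keep the sign in a separate boolean column:

```python
import pandas as pd

df = pd.DataFrame({
    'Actual Time': pd.to_datetime(['2016-06-10 09:00', '2016-06-15 08:52', '2016-06-19 08:54']),
    'Promised Time': pd.to_datetime(['2016-06-10 09:00', '2016-06-15 09:52', '2016-06-19 09:02']),
})

diff = df['Actual Time'] - df['Promised Time']
df['gap'] = diff.abs()                  # magnitude only, prints as 00:08:00
df['early'] = diff < pd.Timedelta(0)    # True where the actual start beat the promise
print(df[['gap', 'early']])
```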
The result of the subtraction has a timedelta dtype, which is stored with nanosecond resolution by default.
You need to convert the result to your desired unit:
import pandas as pd
df=pd.DataFrame(data={
'Actual Time':['2016-6-10 9:00','2016-6-15 8:52','2016-6-19 8:54'],
'Promised Time':['2016-6-10 9:00','2016-6-15 9:52','2016-6-19 9:02']
},dtype='datetime64[ns]')
# convert the timedelta to the unit you want, here minutes as a float
# (astype('timedelta64[m]') works on older pandas, but pandas 2.x only accepts s, ms, us and ns there)
df['diff']=(df['Actual Time']-df['Promised Time']).dt.total_seconds() / 60
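Another way to pick the unit, sketched here on a small stand-alone Series, is to divide the timedeltas by a pd.Timedelta of one unit. The result is a plain float column and the sign is preserved:

```python
import pandas as pd

# the three differences from the question: 0, -1 hour, -8 minutes
diff = pd.Series(pd.to_timedelta([0, -3600, -480], unit='s'))

minutes = diff / pd.Timedelta(minutes=1)   # float minutes
hours = diff / pd.Timedelta(hours=1)       # float hours
print(minutes.tolist())
```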
I'm looking to convert a UNIX timestamp object to pandas date time. I'm importing the timestamps from a separate source, which displays a date time of 21-01-22 00:01 for the first timepoint and 21-01-22 00:15 for the second time point. Yet my conversion is 10 hours behind these two. Is this related to the +1000 at the end of each string?
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.
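If the offset can differ from row to row, one way around both the apply and the private attribute is to split the offset into sign, hours and minutes in the regex and build the timedelta yourself. This is a sketch that assumes the offset always has the form +HHMM or -HHMM; the named groups are my own choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['/Date(1642687260000+1000)/', '/Date(1642688100000+1000)/', None],
})

# named groups: epoch milliseconds, then sign, hours and minutes of the offset
parts = df['Time'].str.extract(r'(?P<unix>\d+)(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})')

utc = pd.to_datetime(pd.to_numeric(parts['unix']), unit='ms')   # NaT where nothing matched
minutes = pd.to_numeric(parts['hh']) * 60 + pd.to_numeric(parts['mm'])
minutes = minutes.where(parts['sign'] != '-', -minutes)         # negate for "-HHMM" offsets

df['datetime'] = utc + pd.to_timedelta(minutes, unit='min')
print(df['datetime'])
```

The result is still timezone-naive local time, the same as in the answer above.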
Any ideas on how I can manipulate my current date-time data to make it suitable for use when converting the datatype to time?
For example:
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'])
The current format for the data is mm/dd 00:00:00
an example of the column in the dataframe can be seen below.
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
For the condition where the hour is denoted as 24, you have two choices. First, you can simply reset the hour to 00. Second, you can reset the hour to 00 and also add 1 to the date.
In either case the first step is detecting the condition, which can be done with a simple find statement: tx.find(' 24:')
Having detected the condition, the first case is a simple matter of resetting the hour to 00 and proceeding with formatting the field. In the second case, however, adding 1 to the day is a little more complicated because you can roll over into the next month.
Here is the approach I would use:
Given a df of form:
Date Time
0 01/01 00:00:00
1 01/01 00:24:00
2 01/01 24:00:00
3 01/31 24:00:00
The First Case
def parseDate(tx):
    # reset an hour of 24 to 00 and keep the date unchanged
    if tx.find(' 24:') >= 0:
        tx = tx.replace(' 24:', ' 00:', 1)
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(parseDate)
Produces the following:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-01 00:00:00
3 1900-01-31 00:00:00
For the second case, I used relativedelta from the dateutil library and slightly modified my parseDate function as shown below:
from dateutil.relativedelta import relativedelta
def parseDate2(tx):
    if tx.find(' 24:') >= 0:
        # parse with the hour reset to 00, then roll forward by one day
        tk = pd.to_datetime(tx.replace(' 24:', ' 00:', 1), format='%m/%d %H:%M:%S')
        return tk + relativedelta(hours=+24)
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(parseDate2)
Yields:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-02 00:00:00
3 1900-02-01 00:00:00
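The second case can also be sketched without apply. Two assumptions here that are not in the original: the year 2021 is a placeholder for whatever year the data actually belongs to, and the strings use a single space before the hour. Prepending a year also avoids the 1900 default, which would reject 02/29 because 1900 is not a leap year.

```python
import pandas as pd

s = pd.Series(['01/01 00:00:00', '01/01 00:24:00', '01/01 24:00:00', '01/31 24:00:00'])

is_24 = s.str.contains(' 24:', regex=False)           # rows using the "hour 24" convention
cleaned = s.str.replace(' 24:', ' 00:', regex=False)  # make them parseable

year = 2021  # hypothetical; use the year your data was recorded in
parsed = pd.to_datetime(str(year) + '/' + cleaned, format='%Y/%m/%d %H:%M:%S')

# roll the flagged rows forward one day; month ends are handled by the datetime arithmetic
parsed = parsed + pd.to_timedelta(is_24.astype(int), unit='D')
print(parsed)
```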
To access the components of the datetime (namely the time), you can use:
# These are now in a usable format
seconds = df1['Date/Time'].dt.second
minutes = df1['Date/Time'].dt.minute
hours = df1['Date/Time'].dt.hour
If need be, you can extract the time of day as its own Series with:
df1['Date/Time'].dt.time
I have a dataset - below
Create Complete
0 2005-01-02 01:15:00 2005-01-05 14:05:00
1 2005-01-06 00:00:00 open
I want to get the difference in minutes between the two using the below code. However as the 'complete' column also contains a string value, how can I get pandas to ign
df['diff_mins'] = df.Create - df.Complete
You can use pd.to_datetime, for example:
import pandas as pd
df = pd.DataFrame([
['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
['2005-01-06 00:00:00', 'open']],
columns=('Create', 'Complete')
)
and then (errors='coerce' turns values that cannot be parsed, such as 'open', into NaT):
df['diff_mins'] = (
pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')
)
To get the value in hours, apply a simple lambda function, lambda x: x.total_seconds() / 60 / 60:
df['diff_mins_hours'] = (
pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')
).apply(lambda x: x.total_seconds() / 60 / 60)
This gives you:
print(df)
Create Complete diff_mins diff_mins_hours
0 2005-01-02 01:15:00 2005-01-05 14:05:00 -4 days +11:10:00 -84.833333
1 2005-01-06 00:00:00 open NaT NaN
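Since the question asked for minutes, here is a minimal sketch of the same idea using Series.dt.total_seconds instead of apply. It keeps the Create minus Complete order from above, so the values are negative when Complete is later.

```python
import pandas as pd

df = pd.DataFrame(
    [['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
     ['2005-01-06 00:00:00', 'open']],
    columns=('Create', 'Complete'),
)

delta = pd.to_datetime(df['Create']) - pd.to_datetime(df['Complete'], errors='coerce')
df['diff_mins'] = delta.dt.total_seconds() / 60   # NaN where Complete was not a date
print(df)
```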
I tried to do it using map. It should look something like this:
import datetime
def get_diff_mins(elem_a, elem_b):
    if elem_b == 'open':
        elem_b = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    a = elem_a.replace(' ', '-').replace(':', '-').split('-')
    b = elem_b.replace(' ', '-').replace(':', '-').split('-')
    # Rough conversion of [year, month, day, hour, minute, second] to minutes:
    # every month is counted as 30 days and seconds are dropped
    f = [60*24*30*12, 60*24*30, 60*24, 60, 1, 0]
    mins_a = sum(int(v) * w for v, w in zip(a, f))
    mins_b = sum(int(v) * w for v, w in zip(b, f))
    return mins_a - mins_b
# map returns an iterator in Python 3, so turn it into a list before assigning
df['diff_mins'] = list(map(get_diff_mins, df.Create, df.Complete))
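The idea of treating 'open' as "still running, so measure up to now" can also be kept with real datetimes, which avoids the 30-day-month approximation. A sketch: minutes_open_as_now is a hypothetical helper, and now is passed in explicitly (fixed here so the result is reproducible; in practice you would pass pd.Timestamp.now()).

```python
import pandas as pd

def minutes_open_as_now(df, now):
    """Create minus Complete in minutes, treating unparseable Complete values as `now`."""
    create = pd.to_datetime(df['Create'])
    complete = pd.to_datetime(df['Complete'], errors='coerce').fillna(now)
    return (create - complete).dt.total_seconds() / 60

df = pd.DataFrame(
    [['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
     ['2005-01-06 00:00:00', 'open']],
    columns=('Create', 'Complete'),
)
now = pd.Timestamp('2005-01-07 00:00:00')   # fixed stand-in for pd.Timestamp.now()
df['diff_mins'] = minutes_open_as_now(df, now)
print(df)
```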
I have a pandas dataframe with time periods in the second column. Every period represents 30 minutes and it goes all the way up to 48 periods (24 hours). Is there some way to change the integers representing the periods into a time format and concatenate it with the date column for a full datetime? E.g. 1 becomes 00:30, 2 becomes 01:00, 3 becomes 01:30 and so on.
You can cast the DATE column to datetime and add a timedelta of 30 minutes multiplied by PERIOD.
import pandas as pd
df = pd.DataFrame({'DATE':['2015-01-03', '2015-01-03', '2015-01-03'],
'PERIOD':[1,2,3]})
df['DATETIME'] = pd.to_datetime(df['DATE']) + df['PERIOD'] * pd.Timedelta(30, unit='min')
# df
# DATE PERIOD DATETIME
# 0 2015-01-03 1 2015-01-03 00:30:00
# 1 2015-01-03 2 2015-01-03 01:00:00
# 2 2015-01-03 3 2015-01-03 01:30:00
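One thing worth checking with this mapping is the last period of the day: period 48 lands on 00:00 of the following date. A small sketch of the edge case, using pd.to_timedelta as an equivalent way to build the offsets:

```python
import pandas as pd

df = pd.DataFrame({'DATE': ['2015-01-03', '2015-01-03'],
                   'PERIOD': [47, 48]})

# 30 minutes per period, measured from midnight of DATE
df['DATETIME'] = pd.to_datetime(df['DATE']) + pd.to_timedelta(df['PERIOD'] * 30, unit='min')
print(df)
```

If your periods are meant to be labelled by their start rather than their end, subtracting 1 from PERIOD before multiplying would keep all 48 periods on the same date. That is an assumption about your data, not something stated in the question.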
I have a dataframe which contains 3 columns for date and time: date, depart time and arrive time. I want to make two datetime columns (depart time and arrive time) using pandas, so I use the to_datetime function.
Since the date column is based only on the depart time, there are some cases where the depart time is around 23:00 and the arrive time is after 24:00 but the date stays the same. For instance:
depart datetime: 01/12/2017 23:58:00 arrive time 01/12/2017 00:30:00
How can I write a function that will update the day to the day after if the arrive time is after midnight? (In the example it should be arrive time 02/12/2017.)
thanks
I think you can check whether the difference is below a zero Timedelta and use mask to add one day to those rows:
print (df)
depart time arrive time
0 01/12/2017 23:58:00 01/12/2017 00:30:00
1 01/12/2017 00:30:00 01/12/2017 23:58:00
df['depart time'] = pd.to_datetime(df['depart time'], dayfirst=True)
df['arrive time'] = pd.to_datetime(df['arrive time'], dayfirst=True)
m = (df['arrive time'] - df['depart time']) < pd.Timedelta(0)
An alternative condition that gives the same mask for this data:
m = (df['depart time'] - df['arrive time']).dt.days != -1
print (m)
0 True
1 False
dtype: bool
df['arrive time'] = df['arrive time'].mask(m, df['arrive time'] + pd.Timedelta(1, unit='d'))
print (df)
depart time arrive time
0 2017-12-01 23:58:00 2017-12-02 00:30:00
1 2017-12-01 00:30:00 2017-12-01 23:58:00
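Since the question asked for a function, the steps above can be wrapped up roughly like this. It is a sketch: roll_overnight_arrivals is a made-up name, and it assumes every trip is shorter than 24 hours, so an arrival earlier than the departure can only mean the next day.

```python
import pandas as pd

def roll_overnight_arrivals(df, depart_col='depart time', arrive_col='arrive time'):
    """Return a copy where arrivals that precede their departure are moved to the next day."""
    out = df.copy()
    out[depart_col] = pd.to_datetime(out[depart_col], dayfirst=True)
    out[arrive_col] = pd.to_datetime(out[arrive_col], dayfirst=True)
    overnight = out[arrive_col] < out[depart_col]
    out.loc[overnight, arrive_col] += pd.Timedelta(days=1)
    return out

df = pd.DataFrame({
    'depart time': ['01/12/2017 23:58:00', '01/12/2017 00:30:00'],
    'arrive time': ['01/12/2017 00:30:00', '01/12/2017 23:58:00'],
})
fixed = roll_overnight_arrivals(df)
print(fixed)
```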