Frequency of hours between two datetime series in pandas - python

I am trying to capture the frequency of hours between two timestamps in a dataframe. For example, one row of data has '2022-01-01 00:35:00' and '2022-01-01 05:29:47'. I would like for frequency to be attributed to Hours 0, 1, 2, 3, 4, and 5.
Start Time           End Time
2022-01-01 00:35:00  2022-01-01 05:29:47
2022-01-01 00:55:00  2022-01-01 05:00:17
2022-01-01 01:35:00  2022-01-01 06:26:00
2022-01-01 02:29:00  2022-01-01 04:25:17
I have been trying to capture the time delta between the two but have not been able to figure out counting the frequency of hours.

You can extract the hours and then calculate the delta:
import datetime
df['start_hour'] = [datetime.datetime.strptime(i, "%Y-%m-%d %H:%M:%S").hour for i in df['Start Time']]
df['end_hour'] = [datetime.datetime.strptime(i, "%Y-%m-%d %H:%M:%S").hour for i in df['End Time']]
df['delta'] = df['end_hour'] - df['start_hour']

Try this:
# Assumes both columns are already datetimes (e.g. via pd.to_datetime) and fall on the same day.
df['freq'] = df.apply(lambda x: list(range(x['Start Time'].hour, x['End Time'].hour + 1)), axis=1)
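To go from the per-row hour lists to an overall frequency per hour, one option is to explode the lists and count the values. The following is a minimal sketch using the sample data from the question; it assumes the start and end of each row fall on the same calendar day, and the names `hours` and `hour_counts` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Start Time": ["2022-01-01 00:35:00", "2022-01-01 00:55:00",
                   "2022-01-01 01:35:00", "2022-01-01 02:29:00"],
    "End Time": ["2022-01-01 05:29:47", "2022-01-01 05:00:17",
                 "2022-01-01 06:26:00", "2022-01-01 04:25:17"],
})
df["Start Time"] = pd.to_datetime(df["Start Time"])
df["End Time"] = pd.to_datetime(df["End Time"])

# For each row, list every clock hour it touches (start and end hour inclusive).
hours = pd.Series(
    [list(range(s.hour, e.hour + 1))
     for s, e in zip(df["Start Time"], df["End Time"])],
    index=df.index,
)

# One list element per output row, then count how often each hour occurs.
hour_counts = hours.explode().astype(int).value_counts().sort_index()
print(hour_counts)
```

With this data, hours 2, 3 and 4 are each counted four times, because every row covers them.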

Related

How to convert this column into 12h format?

I have imported a dataset into a Pandas dataframe, but I can't quite figure out how I could convert the start time to a 12h clock (e.g. 4 pm)?
The variable columns are as follows:
start
2022-01-01 00:07:52.943
2022-01-01 00:09:31.745
2022-01-01 00:14:37.187
Thank you.
You can use:
df['start'] = pd.to_datetime(df['start'])
df['start'] = df['start'].dt.date.astype(str) + ' ' + df['start'].dt.strftime('%I:%M %p')
OUTPUT
start
0 2022-01-01 12:07 AM
1 2022-01-01 12:09 AM
2 2022-01-01 12:14 AM
If dates are datetime values:
df['start'].dt.strftime('%I %p')
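As a small sketch of the 12-hour conversion, the snippet below formats only the hour and the AM/PM marker and strips the leading zero, so that 16:09 becomes "4 PM". The second timestamp and the column name `start_12h` are made up for illustration, and the exact AM/PM text produced by `%p` depends on the locale.

```python
import pandas as pd

df = pd.DataFrame({
    "start": ["2022-01-01 00:07:52.943", "2022-01-01 16:09:31.745"],
})
df["start"] = pd.to_datetime(df["start"])

# %I is the 12-hour clock hour (01-12), %p the AM/PM marker.
# lstrip("0") turns "04 PM" into "4 PM" and leaves "12 AM" untouched.
df["start_12h"] = df["start"].dt.strftime("%I %p").str.lstrip("0")
print(df)
```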

Convert column of unix objects to datetime - python

I'm looking to convert a UNIX timestamp object to pandas date time. I'm importing the timestamps from a separate source, which displays a date time of 21-01-22 00:01 for the first timepoint and 21-01-22 00:15 for the second time point. Yet my conversion is 10 hours behind these two. Is this related to the +1000 at the end of each string?
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.
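If you would rather avoid both the apply and the private attribute, another possibility is to parse the offset digits yourself and build a timedelta from them. This is only a sketch: it assumes the offset always has the form sign, two hour digits, two minute digits (as in `+1000`), and the group names `unix`, `sign`, `hh` and `mm` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["/Date(1642687260000+1000)/", "/Date(1642688100000+1000)/", None],
})

# Split each string into Unix milliseconds, offset sign, offset hours, offset minutes.
parts = df["Time"].str.extract(
    r"(?P<unix>\d+)(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})"
)

# Unix time is UTC; missing rows become NaT.
utc = pd.to_datetime(pd.to_numeric(parts["unix"]), unit="ms")

# Build the offset as a signed number of minutes, then as a timedelta.
sign = parts["sign"].map({"+": 1, "-": -1})
minutes = sign * (pd.to_numeric(parts["hh"]) * 60 + pd.to_numeric(parts["mm"]))
offset = pd.to_timedelta(minutes, unit="min")

df["datetime"] = utc + offset
print(df["datetime"])
```

Because the offset is computed per row, this version does not require the offset to be constant across the series.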

Is there any function to calculate the duration in minutes between two datetime values?

This is my dataframe.
Start_hour End_date
23:58:00 00:26:00
23:56:00 00:01:00
23:18:00 23:36:00
How can I get in a new column the difference (in minutes) between these two columns?
>>> from datetime import datetime
>>>
>>> before = datetime.now()
>>> print('wait for more than 1 minute')
wait for more than 1 minute
>>> after = datetime.now()
>>> td = after - before
>>>
>>> td
datetime.timedelta(seconds=98, microseconds=389121)
>>> td.total_seconds()
98.389121
>>> td.total_seconds() / 60
1.6398186833333335
Then you can round it or use it as-is.
You can do something like this:
import pandas as pd
df = pd.DataFrame({
'Start_hour': ['23:58:00', '23:56:00', '23:18:00'],
'End_date': ['00:26:00', '00:01:00', '23:36:00']}
)
df['Start_hour'] = pd.to_datetime(df['Start_hour'])
df['End_date'] = pd.to_datetime(df['End_date'])
df['diff'] = df.apply(
lambda row: (row['End_date']-row['Start_hour']).seconds / 60,
axis=1
)
print(df)
Start_hour End_date diff
0 2021-03-29 23:58:00 2021-03-29 00:26:00 28.0
1 2021-03-29 23:56:00 2021-03-29 00:01:00 5.0
2 2021-03-29 23:18:00 2021-03-29 23:36:00 18.0
You can also rearrange your dates as string again if you like:
df['Start_hour'] = df['Start_hour'].apply(lambda x: x.strftime('%H:%M:%S'))
df['End_date'] = df['End_date'].apply(lambda x: x.strftime('%H:%M:%S'))
print(df)
Output:
Start_hour End_date diff
0 23:58:00 00:26:00 28.0
1 23:56:00 00:01:00 5.0
2 23:18:00 23:36:00 18.0
Short answer:
from datetime import timedelta
df['interval'] = df['End_date'] - df['Start_hour']
df.loc[df['End_date'] < df['Start_hour'], 'interval'] += timedelta(hours=24)
Why so:
You are probably trying to handle the case where Start_hour and End_date sometimes fall on different days, which is why you can't simply subtract one from the other.
If your time window never exceeds 24 hours, you can use modular arithmetic to handle the 23:59:59 - 00:00:00 boundary:
if End_date < Start_hour, End_date must belong to the next day
so if End_date - Start_hour < 0, add 24 hours to End_date to get the actual difference
The final formula is:
if rec['Start_hour'] < rec['End_date']:
    offset = timedelta(0)  # same day: nothing to add
else:
    offset = timedelta(hours=24)  # End_date is on the next day
rec['delta'] = offset + rec['End_date'] - rec['Start_hour']
To do the same with a pandas.DataFrame, the code has to operate on whole columns, which gives the snippet from the beginning of the answer.
from datetime import datetime, timedelta
import pandas as pd
df = pd.DataFrame([
    {'Start_hour': datetime(1, 1, 1, 23, 58, 0), 'End_date': datetime(1, 1, 1, 0, 26, 0)},
    {'Start_hour': datetime(1, 1, 1, 23, 58, 0), 'End_date': datetime(1, 1, 1, 23, 59, 0)},
])
# ...
df['interval'] = df['End_date'] - df['Start_hour']
# use .loc rather than chained indexing so the assignment reaches df itself
df.loc[df['End_date'] < df['Start_hour'], 'interval'] += timedelta(hours=24)
> df
Start_hour End_date interval
0 0001-01-01 23:58:00 0001-01-01 00:26:00 0 days 00:28:00
1 0001-01-01 23:58:00 0001-01-01 23:59:00 0 days 00:01:00
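The "add 24 hours when the end is earlier than the start" rule can also be applied directly to the time strings from the question by parsing them as timedeltas rather than datetimes. This is a minimal sketch that assumes no interval is longer than 24 hours; the column name `diff_minutes` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Start_hour": ["23:58:00", "23:56:00", "23:18:00"],
    "End_date": ["00:26:00", "00:01:00", "23:36:00"],
})

# Parse "HH:MM:SS" strings as time-of-day offsets from midnight.
start = pd.to_timedelta(df["Start_hour"])
end = pd.to_timedelta(df["End_date"])

delta = end - start
# A negative difference means the end time falls on the next day: add 24 hours.
delta = delta.where(delta >= pd.Timedelta(0), delta + pd.Timedelta(days=1))

df["diff_minutes"] = delta.dt.total_seconds() / 60
print(df)
```

Parsing as timedeltas avoids attaching an arbitrary date to each value, which `pd.to_datetime` does for time-only strings.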

Negative time duration in Pandas

I have a dataset with two columns: Actual Time and Promised Time (representing the actual and promised start times of some process).
For example:
import pandas as pd
example_df = pd.DataFrame(columns = ['Actual Time', 'Promised Time'],
data = [
('2016-6-10 9:00', '2016-6-10 9:00'),
('2016-6-15 8:52', '2016-6-15 9:52'),
('2016-6-19 8:54', '2016-6-19 9:02')]).applymap(pd.Timestamp)
So as we can see, sometimes Actual Time = Promised Time, but there are also cases where Actual Time < Promised Time.
I defined a column that shows the difference between these two columns (example_df['Actual Time']-example_df['Promised Time']), but the problem is that for the third row it returned -1 day +23:52:00 instead of - 00:08:00.
Sample:
print (df)
Actual Time Promised Time
0 2016-6-10 9:00 2016-6-10 9:00
1 2016-6-15 10:52 2016-6-15 9:52 <- changed datetimes
2 2016-6-19 8:54 2016-6-19 9:02
def format_timedelta(x):
    ts = x.total_seconds()
    if ts >= 0:
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
    else:
        hours, remainder = divmod(-ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('-{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
First create datetimes:
df['Actual Time'] = pd.to_datetime(df['Actual Time'])
df['Promised Time'] = pd.to_datetime(df['Promised Time'])
And then timedeltas:
df['diff'] = (df['Actual Time'] - df['Promised Time'])
Converting negative timedeltas to seconds with Series.dt.total_seconds works nicely:
df['diff1'] = df['diff'].dt.total_seconds()
But if you want negative timedeltas as strings, you need a custom function, because strftime is not yet implemented for timedeltas:
df['diff2'] = df['diff'].apply(format_timedelta)
print (df)
Actual Time Promised Time diff diff1 diff2
0 2016-06-10 09:00:00 2016-06-10 09:00:00 00:00:00 0.0 0:00:00
1 2016-06-15 10:52:00 2016-06-15 09:52:00 01:00:00 3600.0 1:00:00
2 2016-06-19 08:54:00 2016-06-19 09:02:00 -1 days +23:52:00 -480.0 -0:08:00
I assume your dataframe columns are already of datetime dtype. In that case abs works just fine.
Without abs
df['Actual Time'] - df['Promised Time']
Out[526]:
0 00:00:00
1 -1 days +23:00:00
2 -1 days +23:52:00
dtype: timedelta64[ns]
With abs
abs(df['Promised Time'] - df['Actual Time'])
Out[529]:
0 00:00:00
1 01:00:00
2 00:08:00
dtype: timedelta64[ns]
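Note that abs discards the sign, so an early start and a late start look the same. If the sign matters, one option is to format the absolute value and prepend a minus sign where the difference is negative. The sketch below does this with column operations on the question's data; the column name `diff_str` is illustrative, and it assumes there are no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Actual Time": pd.to_datetime(
        ["2016-06-10 09:00", "2016-06-15 08:52", "2016-06-19 08:54"]),
    "Promised Time": pd.to_datetime(
        ["2016-06-10 09:00", "2016-06-15 09:52", "2016-06-19 09:02"]),
})

diff = df["Actual Time"] - df["Promised Time"]

# "-" for negative differences, empty string otherwise.
sign = pd.Series(np.where(diff < pd.Timedelta(0), "-", ""), index=diff.index)

# Work on the absolute number of whole seconds and split it into h:mm:ss.
secs = diff.abs().dt.total_seconds().astype(int)
hh = (secs // 3600).astype(str)
mm = (secs % 3600 // 60).astype(str).str.zfill(2)
ss = (secs % 60).astype(str).str.zfill(2)

df["diff_str"] = sign + hh + ":" + mm + ":" + ss
print(df["diff_str"])
```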
The result of the subtraction is a timedelta, which is stored with nanosecond resolution by default.
You need to convert the result to your desired unit:
import pandas as pd
df=pd.DataFrame(data={
'Actual Time':['2016-6-10 9:00','2016-6-15 8:52','2016-6-19 8:54'],
'Promised Time':['2016-6-10 9:00','2016-6-15 9:52','2016-6-19 9:02']
},dtype='datetime64[ns]')
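# convert the timedelta to the unit you want, here signed minutes as floats;
# older pandas versions also accepted .astype('timedelta64[m]'), which recent versions (2.x) reject
df['diff']=(df['Actual Time']-df['Promised Time']).dt.total_seconds() / 60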

Adding 1 day to selective time period of date column

I have a table with the columns 'Date', 'Time', and 'Costs'.
I want to select rows where the time is greater than 12:00:00, then add 1 day to the 'Date' column of the selected rows.
How should I go about doing it?
So far I have:
df[df['Time']>'12:00:00']['Date'] = df[df['Time']>'12:00:00']['Date'].astype('datetime64[ns]') + timedelta(days=1)
I am a beginner in learning coding and any suggestions would be really helpful! Thanks.
First convert the Date column with to_datetime if it does not already hold datetimes. Then convert the Time column to strings (in case it holds Python time objects), parse those as datetimes, extract the hour with Series.dt.hour, and add 1 day where the condition holds:
df = pd.DataFrame({'Date':['2015-01-02','2016-05-08'],
'Time':['10:00:00','15:00:00']})
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-08 15:00:00
df['Date'] = pd.to_datetime(df['Date'])
mask = pd.to_datetime(df['Time'].astype(str)).dt.hour > 12
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-09 15:00:00
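One caveat: `dt.hour > 12` only selects times from 13:00:00 onward, so a time such as 12:30:00 is not shifted even though it is later than 12:00:00. If the cutoff should be exactly 12:00:00, a possible variant is to compare the times as timedeltas. This is a sketch under that assumption; the third row is made-up data added to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-02", "2016-05-08", "2016-05-10"],
    "Time": ["10:00:00", "15:00:00", "12:30:00"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Interpret "HH:MM:SS" as time elapsed since midnight and compare with 12 hours.
mask = pd.to_timedelta(df["Time"].astype(str)) > pd.Timedelta(hours=12)

# Shift only the selected rows by one day.
df.loc[mask, "Date"] += pd.Timedelta(days=1)
print(df)
```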
