Converting integers representing time periods to time in Pandas - python

I have a pandas dataframe with time periods in the second column. Every period represents 30 minutes and it goes all the way up to 48 periods (24 hours). Is there some way to change the integers representing the periods into a time format and concatenate it with the date column for a full datetime? E.g. 1 becomes 00:30, 2 becomes 01:00, 3 becomes 01:30 and so on.

You can cast the DATE column to datetime and add a timedelta of 30 minutes multiplied by PERIOD.
import pandas as pd
df = pd.DataFrame({'DATE':['2015-01-03', '2015-01-03', '2015-01-03'],
'PERIOD':[1,2,3]})
df['DATETIME'] = pd.to_datetime(df['DATE']) + df['PERIOD'] * pd.Timedelta(30, unit='min')
# df
# DATE PERIOD DATETIME
# 0 2015-01-03 1 2015-01-03 00:30:00
# 1 2015-01-03 2 2015-01-03 01:00:00
# 2 2015-01-03 3 2015-01-03 01:30:00
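To check the last period of the day, here is a minimal sketch that runs the same expression on periods 1 and 48. The sample frame is made up for illustration. Because each period is labelled by its end time, period 48 lands on midnight of the following day.

```python
import pandas as pd

# Hypothetical sample: first and last half-hour periods of one day
df = pd.DataFrame({'DATE': ['2015-01-03', '2015-01-03'],
                   'PERIOD': [1, 48]})

# Same technique as above: date + PERIOD * 30 minutes
df['DATETIME'] = pd.to_datetime(df['DATE']) + df['PERIOD'] * pd.Timedelta(30, unit='min')

print(df['DATETIME'].tolist())
```

If you would rather label each period by its start time, subtracting 1 from PERIOD before multiplying should do it.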

Related

Calculate the time difference between two hh:mm columns in a pandas dataframe

I am reading some data from a CSV file in which two columns are in hh:mm format. Here is an example:
Start End
11:15 15:00
22:30 2:00
In the above example, the End in the 2nd row falls on the next day. I am trying to get the time difference between these two columns in the most efficient way, as the dataset is huge. Is there a good pythonic way of doing this? Also, since there is no date and some Ends fall on the next day, I get a wrong result when I calculate the difference.
>>> import pandas as pd
>>> df = pd.read_csv(file_path)
>>> pd.to_datetime(df['End'])-pd.to_datetime(df['Start'])
0 0 days 03:45:00
1 0 days 03:00:00
2 -1 days +03:30:00
You can use the technique (a + x) % x with a timedelta of 24 hours (equivalently, 1 day):
adding timedelta(hours=24) makes every value positive;
taking % timedelta(hours=24) brings the values that are now above 24 hours back below 24 hours.
from datetime import timedelta
df['duration'] = (pd.to_datetime(df['End']) - pd.to_datetime(df['Start']) + timedelta(hours=24)) \
    % timedelta(hours=24)
Gives
Start End duration
0 11:15 15:00 0 days 03:45:00
1 22:30 2:00 0 days 03:30:00
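For reference, here is a self-contained sketch of the same modulo trick on the two rows from the question. It uses pd.Timedelta instead of datetime.timedelta and parses with an explicit format, both of which are my choices rather than part of the answer above. The extra minutes column is only there to make the result easy to check.

```python
import pandas as pd

# The two rows from the question
df = pd.DataFrame({'Start': ['11:15', '22:30'], 'End': ['15:00', '2:00']})

day = pd.Timedelta(hours=24)

# An explicit format gives both columns the same dummy date
start = pd.to_datetime(df['Start'], format='%H:%M')
end = pd.to_datetime(df['End'], format='%H:%M')

# (a + x) % x: shift negative differences into the range [0, 24h)
df['duration'] = (end - start + day) % day

# Duration as a plain number of minutes
df['minutes'] = df['duration'].dt.total_seconds() / 60
print(df)
```

Note that this assumes no interval is 24 hours or longer, since such durations would wrap around as well.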

Dealing with time objects in pandas python

I am working with a Pandas Series that contains (Date/Time) Strings of the form:
"2020-04-01 09:29:21"-"2020-04-01 09:53:17"-"2020-04-13 09:55:55"-.....).
The format is : "yyyy-mm-dd H:M:s".
I am only interested in the hour and minute components and I am looking for a way to divide the data into 30 minute buckets and count the values in each bucket.
An example of my end result:
Range count
9:00-9:30 7
9:30-10:00 25
10:00-10:30 35.......
You need to resample first and then group by the time of day. Let us create a series and set its index to a DatetimeIndex, otherwise resample won't work:
import numpy as np
import pandas as pd
# random data
np.random.seed(0)
serie = pd.Series(
    np.random.choice(pd.date_range(
        '2020-01-01', freq='7min22s', periods=10000), 1000)
)
serie.index = serie
Resample into 30-minute bins, then group by the time of day:
res = serie.resample('30min').count()
results = res.groupby(res.index.time).sum()
# Change the index to match the requested format
results.index = results.index.astype(str) + ' - ' + \
    np.roll(results.index.astype(str), -1)
results.head()
# 00:00:00 - 00:30:00 19
# 00:30:00 - 01:00:00 25
# 01:00:00 - 01:30:00 19
# 01:30:00 - 02:00:00 28
# 02:00:00 - 02:30:00 22
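Another way to get the same kind of count, sketched here under the assumption that empty buckets may be left out of the result, is to floor each timestamp to its 30-minute bucket and group on the time of day. The sample values are made up, partly from the strings in the question.

```python
import pandas as pd

# Hypothetical sample timestamps in the format from the question
s = pd.to_datetime(pd.Series([
    '2020-04-01 09:29:21',
    '2020-04-01 09:53:17',
    '2020-04-13 09:55:55',
    '2020-04-14 10:05:00',
]))

# Floor to the start of the 30-minute bucket, then drop the date part
bucket_start = s.dt.floor('30min').dt.time

# Count how many timestamps fall in each bucket, across all days
counts = s.groupby(bucket_start).count()
print(counts)
```

Unlike the resample approach, this one only lists buckets that contain at least one value.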

String dates into unixtime in a pandas dataframe

I have a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
Every row in the column is a string. What is the easiest way to convert these dates into integer Unix time?
One idea is to convert the column to datetimes and to timedeltas:
import numpy as np
import pandas as pd
df['dates'] = pd.to_datetime(df['Date'] + '-2020', format='%d-%b-%Y', errors='coerce')
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
                            r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
# keep only rows that parsed as either a date or a time
df = df[df['dates'].notna() | df['times'].notna()].copy()
# Series.append was removed in pandas 2.0; pd.concat does the same job
df['all'] = pd.concat([df['dates'].dropna().astype(np.int64),
                       df['times'].dropna().astype(np.int64)])
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
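The values in the all column above are nanosecond counts. If whole seconds since the epoch are wanted instead, one option is to subtract the epoch and floor-divide by one second. This is a small sketch on the two date rows only; treating seconds as the wanted unit is my assumption.

```python
import pandas as pd

# The two parsed dates from the example above
ts = pd.Series(pd.to_datetime(['2020-02-09', '2020-02-13']))

# Subtract the Unix epoch and floor-divide by one second to get integer seconds
unix_seconds = (ts - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
print(unix_seconds.tolist())
```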

Negative time duration in Pandas

I have a dataset with two columns: Actual Time and Promised Time (representing the actual and promised start times of some process).
For example:
import pandas as pd
example_df = pd.DataFrame(columns = ['Actual Time', 'Promised Time'],
data = [
('2016-6-10 9:00', '2016-6-10 9:00'),
('2016-6-15 8:52', '2016-6-15 9:52'),
('2016-6-19 8:54', '2016-6-19 9:02')]).applymap(pd.Timestamp)
So as we can see, sometimes Actual Time = Promised Time, but there are also cases where Actual Time < Promised Time.
I defined a column that shows the difference between these two columns (example_df['Actual Time']-example_df['Promised Time']), but the problem is that for the third row it returned -1 day +23:52:00 instead of - 00:08:00.
Sample:
print (df)
Actual Time Promised Time
0 2016-6-10 9:00 2016-6-10 9:00
1 2016-6-15 10:52 2016-6-15 9:52 <- changed datetimes
2 2016-6-19 8:54 2016-6-19 9:02
def format_timedelta(x):
    ts = x.total_seconds()
    if ts >= 0:
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
    else:
        hours, remainder = divmod(-ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('-{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
First create datetimes:
df['Actual Time'] = pd.to_datetime(df['Actual Time'])
df['Promised Time'] = pd.to_datetime(df['Promised Time'])
And then timedeltas:
df['diff'] = (df['Actual Time'] - df['Promised Time'])
Converting negative timedeltas to seconds with Series.dt.total_seconds works nicely:
df['diff1'] = df['diff'].dt.total_seconds()
But if you want negative timedeltas as strings, you need a custom function, because strftime is not yet implemented for timedeltas:
df['diff2'] = df['diff'].apply(format_timedelta)
print (df)
Actual Time Promised Time diff diff1 diff2
0 2016-06-10 09:00:00 2016-06-10 09:00:00 00:00:00 0.0 0:00:00
1 2016-06-15 10:52:00 2016-06-15 09:52:00 01:00:00 3600.0 1:00:00
2 2016-06-19 08:54:00 2016-06-19 09:02:00 -1 days +23:52:00 -480.0 -0:08:00
I assume your dataframe columns already have a datetime dtype. If you only need the size of the difference and not its sign, abs works just fine.
Without abs
df['Actual Time'] - df['Promised Time']
Out[526]:
0 00:00:00
1 -1 days +23:00:00
2 -1 days +23:52:00
dtype: timedelta64[ns]
With abs
abs(df['Promised Time'] - df['Actual Time'])
Out[529]:
0 00:00:00
1 01:00:00
2 00:08:00
dtype: timedelta64[ns]
The difference is a timedelta, which pandas stores at nanosecond resolution by default.
You need to convert the result to the unit you want:
import pandas as pd
df=pd.DataFrame(data={
    'Actual Time':['2016-6-10 9:00','2016-6-15 8:52','2016-6-19 8:54'],
    'Promised Time':['2016-6-10 9:00','2016-6-15 9:52','2016-6-19 9:02']
},dtype='datetime64[ns]')
# divide by a Timedelta of the unit you want; the result is a float number of minutes
# (older pandas allowed .astype('timedelta64[m]') here, but pandas 2.0+ only accepts 's', 'ms', 'us', 'ns')
df['diff']=(df['Actual Time']-df['Promised Time']) / pd.Timedelta(minutes=1)

Python Pandas sizeof times

I am working in a dataframe in Pandas that looks like this.
Identifier datetime
0 AL011851 00:00:00
1 AL011851 06:00:00
2 Al011851 12:00:00
This is my code so far:
import pandas as pd
hurricane_df = pd.read_csv("hurdat2.csv",parse_dates=['datetime'])
hurricane_df['datetime'] = pd.to_timedelta(hurricane_df['datetime'].dt.strftime('%H:%M:%S'))
hurricane_df
grouped = hurricane_df.groupby('datetime').size()
grouped
What I did was convert the datetime column to a timedelta to get the hours. I want to get the size of the datetime column but I want just hours like 1:00, 2:00, 3:00, etc. but I get minute intervals as well like 1:15 and 2:45.
Any way to just display the hour?
Thank you.
You can use pandas.Timestamp.round with Series.dt shortcut:
df['datetime'] = df['datetime'].dt.round('h')
So
... datetime
01:15:00
02:45:00
becomes
... datetime
01:00:00
03:00:00
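To tie this back to the question's groupby, here is a small sketch on made-up timedelta values, the type the question ends up with after pd.to_timedelta. It also shows dt.floor, in case you want to truncate to the hour instead of rounding to the nearest one.

```python
import pandas as pd

# Hypothetical times of day stored as timedeltas, as in the question
td = pd.Series(pd.to_timedelta(['01:15:00', '02:45:00', '03:00:00']))

rounded = td.dt.round('h')  # nearest hour
floored = td.dt.floor('h')  # start of the hour

# Number of rows per (rounded) hour
sizes = rounded.groupby(rounded).size()
print(sizes)
```

Which one fits depends on whether 02:45 should count towards 02:00 or 03:00.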
df = pd.DataFrame({'Identifier':['AL011851','AL011851','AL011851'],'datetime': ["2018-12-08 16:35:23","2018-12-08 14:20:45", "2018-12-08 11:45:00"]})
df['datetime'] = pd.to_datetime(df['datetime'])
df
Identifier datetime
0 AL011851 2018-12-08 16:35:23
1 AL011851 2018-12-08 14:20:45
2 AL011851 2018-12-08 11:45:00
from datetime import timedelta
# Rounds to nearest hour
def roundHour(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            + timedelta(hours=t.minute//30))
df.datetime=df.datetime.map(lambda t: roundHour(t)) # Step 1: Round to nearest hour
df.datetime=df.datetime.map(lambda t: t.strftime('%H:%M')) # Step 2: Remove seconds
df
Identifier datetime
0 AL011851 17:00
1 AL011851 14:00
2 AL011851 12:00
