Finding time slots between a start time and an end time in Python

We have a CSV file containing predefined time slots.
Given a start time and an end time provided by the user, we want the time slots that fall between them.
For example:
start time =11:00:00
end time=19:00:00
output- slot_no 2,3,4,5

I think you need boolean indexing with loc and between to select the Slot_no column. Both time columns are converted with to_timedelta, and the midnight end time is replaced with 24:00:00:
import pandas as pd
df = pd.DataFrame(
    {'Slot_no':[1,2,3,4,5,6,7],
     'start_time':['0:01:00','8:01:00','10:01:01','12:01:00','14:01:00','18:01:01','20:01:00'],
     'end_time':['8:00:00','10:00:00','12:00:00','14:00:00','18:00:00','20:00:00','0:00:00']})
# reindex_axis was removed in pandas 1.0; select the columns in the desired order instead
df = df[['Slot_no','start_time','end_time']]
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'].replace('0:00:00', '24:00:00'))
print (df)
Slot_no start_time end_time
0 1 00:01:00 0 days 08:00:00
1 2 08:01:00 0 days 10:00:00
2 3 10:01:01 0 days 12:00:00
3 4 12:01:00 0 days 14:00:00
4 5 14:01:00 0 days 18:00:00
5 6 18:01:01 0 days 20:00:00
6 7 20:01:00 1 days 00:00:00
start = pd.to_timedelta('11:00:00')
end = pd.to_timedelta('19:00:00')
mask = df['start_time'].between(start, end) | df['end_time'].between(start, end)
s = df.loc[mask, 'Slot_no']
print (s)
2 3
3 4
4 5
5 6
Name: Slot_no, dtype: int64
L = df.loc[mask, 'Slot_no'].tolist()
print (L)
[3, 4, 5, 6]
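One caveat with the between-based mask: a slot that completely contains the requested window (for example 15:00:00 to 16:00:00 inside slot 5) has neither its start nor its end inside the window, so it is not selected. A minimal sketch of an interval-overlap test that also covers this case is shown here. The names slots and overlapping_slots are made up for illustration, and the midnight end time is written directly as 24:00:00.

```python
import pandas as pd

# Same slot table as above, with the midnight end already written as 24:00:00.
slots = pd.DataFrame({
    "Slot_no": [1, 2, 3, 4, 5, 6, 7],
    "start_time": ["0:01:00", "8:01:00", "10:01:01", "12:01:00",
                   "14:01:00", "18:01:01", "20:01:00"],
    "end_time": ["8:00:00", "10:00:00", "12:00:00", "14:00:00",
                 "18:00:00", "20:00:00", "24:00:00"],
})
slots["start_time"] = pd.to_timedelta(slots["start_time"])
slots["end_time"] = pd.to_timedelta(slots["end_time"])


def overlapping_slots(frame, start, end):
    """Return Slot_no values whose [start_time, end_time] overlaps [start, end]."""
    start, end = pd.to_timedelta(start), pd.to_timedelta(end)
    # Two closed intervals overlap when each one starts before the other ends.
    mask = (frame["start_time"] <= end) & (frame["end_time"] >= start)
    return frame.loc[mask, "Slot_no"].tolist()


print(overlapping_slots(slots, "11:00:00", "19:00:00"))
print(overlapping_slots(slots, "15:00:00", "16:00:00"))
```

For the 11:00:00 to 19:00:00 window this selects the same slots as the between-based mask; the two only differ when the window lies strictly inside a single slot.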

Related

Convert multiple time format object as datetime format

I have a dataframe with a column of time values stored as objects and need to convert them to datetime. The issue is that they are not all in the same format, so when I try:
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M:%S')
it gives me an error
ValueError: time data '3:22' does not match format '%H:%M:%S' (match)
or if use this code
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M')
I get this error
ValueError: unconverted data remains: :58
These are the values on my data
Total call time
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
**45:48**
1:41:40
5:08:37
**3:22**
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58
times = """\
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
45:48
1:41:40
5:08:37
3:22
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58""".split()
import pandas as pd
df = pd.DataFrame(times, columns=['elapsed'])
def pad(s):
    # 'M:SS' -> '00:0M:SS', 'MM:SS' -> '00:MM:SS'; full 'H:MM:SS' values pass through
    if len(s) == 4:
        return '00:0'+s
    elif len(s) == 5:
        return '00:'+s
    return s
print(pd.to_timedelta(df['elapsed'].apply(pad)))
Output:
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 00:03:22
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58
Name: elapsed, dtype: timedelta64[ns]
As an alternative to grovina's answer: instead of using apply, you can use the dt accessor directly.
Here's a sample:
>>> data = [['2017-12-01'], ['2017-12-30'], ['2018-01-01']]
>>> df = pd.DataFrame(data=data, columns=['date'])
>>> df
date
0 2017-12-01
1 2017-12-30
2 2018-01-01
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: object
Note how df.date is an object? Let's turn it into a date like you want
>>> df.date = pd.to_datetime(df.date)
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: datetime64[ns]
The format you want is for string formatting. I don't think you'll be able to convert the actual datetime64 to look like that format. For now, let's make a newly formatted string version of your date in a separate column
>>> df['new_formatted_date'] = df.date.dt.strftime('%d/%m/%y %H:%M')
>>> df.new_formatted_date
0 01/12/17 00:00
1 30/12/17 00:00
2 01/01/18 00:00
Name: new_formatted_date, dtype: object
Finally, since the df.date column is now of dtype datetime64, you can use the dt accessor directly on it. No need to use apply
>>> df['month'] = df.date.dt.month
>>> df['day'] = df.date.dt.day
>>> df['year'] = df.date.dt.year
>>> df['hour'] = df.date.dt.hour
>>> df['minute'] = df.date.dt.minute
>>> df
        date new_formatted_date  month  day  year  hour  minute
0 2017-12-01     01/12/17 00:00     12    1  2017     0       0
1 2017-12-30     30/12/17 00:00     12   30  2017     0       0
2 2018-01-01     01/01/18 00:00      1    1  2018     0       0
Another idea is to test whether the value contains two colons. If it does not, check the number before the first colon: if it is 23 or less, the value is parsed as HH:MM (append :00); if it is greater, it is parsed as MM:SS (prepend 00:). The result is then converted to timedeltas with to_timedelta:
import numpy as np
m1 = df['Total call time'].str.count(':').ne(2)
m2 = df['Total call time'].str.extract(r'^(\d+):', expand=False).astype(float).gt(23)
s = np.select([m1 & m2, m1 & ~m2],
              ['00:' + df['Total call time'], df['Total call time'] + ':00'],
              df['Total call time'])
df['Total call time'] = pd.to_timedelta(s)
print (df)
Total call time
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 03:22:00
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58
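Note that the two answers read the short value 3:22 differently: the padding answer treats it as 3 minutes 22 seconds, while the answer above treats it as 3 hours 22 minutes. As a rough sketch of making that rule explicit, here is the same decision written in plain Python with the standard library. The function name parse_call_time is hypothetical, and the threshold of 23 is simply the one used in the answer above.

```python
from datetime import timedelta


def parse_call_time(text):
    """Parse 'H:MM:SS', 'HH:MM' or 'MM:SS' into a timedelta.

    Two-field values are ambiguous; following the rule above, a first
    field greater than 23 is read as minutes (MM:SS), otherwise as hours (HH:MM).
    """
    parts = [int(p) for p in text.strip().split(":")]
    if len(parts) == 3:
        hours, minutes, seconds = parts
    elif len(parts) == 2 and parts[0] > 23:
        hours, minutes, seconds = 0, parts[0], parts[1]
    elif len(parts) == 2:
        hours, minutes, seconds = parts[0], parts[1], 0
    else:
        raise ValueError(f"unexpected time value: {text!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


print(parse_call_time("2:04:07"))
print(parse_call_time("45:48"))
print(parse_call_time("3:22"))
```

Which reading of a two-field value is right depends on the data source, so the rule is worth confirming before applying it to a whole column.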

Convert string hours to minute pd.eval

I want to convert all rows of my DataFrame that contains hours and minutes into minutes only.
I have a dataframe that looks like this:
df=
time
0 8h30
1 14h07
2 08h30
3 7h50
4 8h0
5 8h15
6 6h15
I'm using the following method to convert:
df['time'] = pd.eval(
df['time'].replace(['h'], ['*60+'], regex=True))
Output
SyntaxError: invalid syntax
I think the error comes from the format of the hours; maybe pd.eval can't accept 08h30 or 8h0. How can I solve this problem?
Pandas can already handle such strings if the units are included in the string. While 14h07 can't be parsed (why assume 07 is minutes?), 14h07m can be converted to a Timedelta:
>>> pd.to_timedelta("14h07m")
Timedelta('0 days 14:07:00')
Given this dataframe :
d1 = pd.DataFrame(['8h30m', '14h07m', '08h30m', '8h0m'],
columns=['time'])
You can convert the time series into a Timedelta series with pd.to_timedelta :
>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d1
time tm
0 8h30m 0 days 08:30:00
1 14h07m 0 days 14:07:00
2 08h30m 0 days 08:30:00
3 8h0m 0 days 08:00:00
To handle the missing minutes unit in the original data, just append m:
d1['tm'] = pd.to_timedelta(d1['time'] + 'm')
Once you have a Timedelta you can calculate hours and minutes.
The components of the values can be retrieved with Timedelta.components
>>> d1.tm.dt.components.hours
0 8
1 14
2 8
3 8
Name: hours, dtype: int64
To get the total duration in minutes, divide by a one-minute Timedelta (older pandas versions also accepted astype('timedelta64[m]'), but recent versions reject that unit):
>>> d1.tm / pd.Timedelta(minutes=1)
0 510.0
1 847.0
2 510.0
3 480.0
Name: tm, dtype: float64
Bringing all the operations together :
>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d2 = (d1.assign(h=d1.tm.dt.components.hours,
...                 m=d1.tm.dt.components.minutes,
...                 total_minutes=d1.tm / pd.Timedelta(minutes=1)))
>>>
>>> d2
time tm h m total_minutes
0 8h30m 0 days 08:30:00 8 30 510.0
1 14h07m 0 days 14:07:00 14 7 847.0
2 08h30m 0 days 08:30:00 8 30 510.0
3 8h0m 0 days 08:00:00 8 0 480.0
To avoid having to trim leading zeros, an alternative approach:
df[['h', 'm']] = df['time'].str.split('h', expand=True).astype(int)
df['total_min'] = df['h']*60 + df['m']
Result:
time h m total_min
0 8h30 8 30 510
1 14h07 14 7 847
2 08h30 8 30 510
3 7h50 7 50 470
4 8h0 8 0 480
5 8h15 8 15 495
6 6h15 6 15 375
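A likely reason for the SyntaxError in the question is that replacing h with *60+ produces expressions such as 08*60+30 or 14*60+07, and Python does not allow integer literals with leading zeros, whereas int() accepts them. The sketch below illustrates this with the standard library only; the helper names parses_as_expression and to_minutes are made up for the illustration, and it is an assumption that pd.eval fails for the same reason.

```python
import ast


def parses_as_expression(text):
    """Return True if 'XhY' becomes a valid Python expression after h -> *60+."""
    try:
        ast.parse(text.replace("h", "*60+"), mode="eval")
        return True
    except SyntaxError:
        return False


def to_minutes(text):
    """Convert 'XhY' to minutes; int() tolerates leading zeros and whitespace."""
    hours, minutes = text.strip().split("h")
    return int(hours) * 60 + int(minutes)


for value in ["8h30", "14h07", "08h30", "8h0"]:
    print(value, parses_as_expression(value), to_minutes(value))
```

Only the values with a leading zero in either field fail to parse as expressions, while to_minutes handles all of them.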
Just to give an alternative approach with kind of the same elements as above you could do:
df = pd.DataFrame(data=["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"],
columns=["time"])
First split your column on the "h":
hm = df["time"].str.split("h", expand=True)
Then combine the columns again, zero-padding the hours and minutes to make valid time strings:
df2 = hm[0].str.strip().str.zfill(2) + hm[1].str.strip().str.zfill(2)
Then convert the string column to a datetime column:
df3 = pd.to_datetime(df2, format="%H%M")
Finally, calculate the number of minutes by subtracting a zero time (to get timedeltas) and dividing by a one-minute timedelta:
zerotime = pd.to_datetime("0000", format="%H%M")
df['minutes'] = (df3 - zerotime) / pd.Timedelta(minutes=1)
The results look like:
time minutes
0 8h30 510.0
1 14h07 847.0
2 08h30 510.0
3 7h50 470.0
4 8h0 480.0
5 8h15 495.0
6 6h15 375.0

Pandas read format %D:%H:%M:%S with python

Currently I am reading a data frame with timestamps from film in the format 00(days):00(hours, rolling over into a day at 24):00(min):00(sec).
pandas reads the time formats HH:MM:SS and YYYY:MM:DD HH:MM:SS fine.
Is there a way to have pandas read a duration such as DD:HH:MM:SS?
Alternatively, using timedelta, how would I fold the DD into HH in the data frame so that pandas can make it "1 day HH:MM:SS", for example?
Data sample
00:00:00:00
00:07:33:57
02:07:02:13
00:00:13:11
00:00:10:11
00:00:00:00
00:06:20:06
01:12:13:25
Expected output for last sample
36:13:25
Thanks
If you want timedelta objects, a simple way is to replace the first colon with days :
df['timedelta'] = pd.to_timedelta(df['col'].str.replace(':', 'days ', n=1))
output:
col timedelta
0 00:00:00:00 0 days 00:00:00
1 00:07:33:57 0 days 07:33:57
2 02:07:02:13 2 days 07:02:13
3 00:00:13:11 0 days 00:13:11
4 00:00:10:11 0 days 00:10:11
5 00:00:00:00 0 days 00:00:00
6 00:06:20:06 0 days 06:20:06
7 01:12:13:25 1 days 12:13:25
>>> df.dtypes
col object
timedelta timedelta64[ns]
dtype: object
From there it's also relatively easy to combine the days and hours as string:
c = df['timedelta'].dt.components
df['str_format'] = ((c['hours']+c['days']*24).astype(str)
+df['col'].str.split('(?=:)', n=2).str[-1]).str.zfill(8)
output:
col timedelta str_format
0 00:00:00:00 0 days 00:00:00 00:00:00
1 00:07:33:57 0 days 07:33:57 07:33:57
2 02:07:02:13 2 days 07:02:13 55:02:13
3 00:00:13:11 0 days 00:13:11 00:13:11
4 00:00:10:11 0 days 00:10:11 00:10:11
5 00:00:00:00 0 days 00:00:00 00:00:00
6 00:06:20:06 0 days 06:20:06 06:20:06
7 01:12:13:25 1 days 12:13:25 36:13:25
Convert the days separately, add them to the times, and finally apply a custom function:
def f(x):
    # format a Timedelta as H:MM:SS, with hours allowed to exceed 24
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
d = pd.to_timedelta(df['col'].str[:2].astype(int), unit='d')
td = pd.to_timedelta(df['col'].str[3:])
df['col'] = d.add(td).apply(f)
print (df)
col
0 0:00:00
1 7:33:57
2 55:02:13
3 0:13:11
4 0:10:11
5 0:00:00
6 6:20:06
7 36:13:25
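For a single value, the same days-into-hours folding can be sketched without pandas. This is only an illustration using the standard library; the function name dhms_to_hms is hypothetical, and it assumes every value has exactly four colon-separated integer fields.

```python
from datetime import timedelta


def dhms_to_hms(text):
    """Turn 'DD:HH:MM:SS' into 'HH:MM:SS', with the days folded into the hours."""
    days, hours, minutes, seconds = (int(part) for part in text.split(":"))
    total = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    total_hours, remainder = divmod(int(total.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    # Hours are zero-padded to two digits but may grow beyond two digits.
    return f"{total_hours:02d}:{minutes:02d}:{seconds:02d}"


for sample in ["00:07:33:57", "02:07:02:13", "01:12:13:25"]:
    print(sample, "->", dhms_to_hms(sample))
```

The function could also be passed to Series.apply on the original string column if a vectorised solution is not required.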

Pandas elements per week between start date and end date

I'm starting from a dataframe that has a start date and an end date, for instance:
ID START END A
0 2014-04-09 2014-04-15 5
1 2018-06-05 2018-07-01 8
2 2018-06-05 2018-07-01 7
And I'm trying to find, for each week, how many elements were started but not ended at that point.
For instance, in the DF above:
Week-Monday N
2014-04-07 1
2014-04-14 1
2014-04-21 0
...
2018-06-04 2
...
Something like the below doesn't quite work, since it only resamples on end date:
df = df.resample("W-Mon", on="END").sum()
I don't know how to integrate both conditions: that the occurrences be after the start date, yet before the end date.
You can start from here:
import pandas as pd
df = pd.DataFrame({'ID':[0,1,2],
'START':['2014-04-09', '2018-06-05', '2018-06-05'],
'END':['2014-04-15', '2018-07-01', '2018-07-01'],
'A':[5,8,7]})
1- Find the week number for each START and each END, and find Week-Monday.
import datetime
from datetime import timedelta
df.loc[:,'startWeek'] = df.START.apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').isocalendar()[1])
df.loc[:,'endWeek'] = df.END.apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').isocalendar()[1])
df.loc[:, 'Week-Monday'] = df.START.apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d')- timedelta(days=datetime.datetime.strptime(x,'%Y-%m-%d').weekday()))
2- Check if they are the same, if yes, then ended during the same week.
def endedNotSameWeek(row):
    # 1 if the element ended in a different ISO week than it started, else 0
    if row['startWeek']!=row['endWeek']:
        return 1
    return 0
df.loc[:,'NotSameWeek'] = df.apply(endedNotSameWeek, axis=1)
print(df)
Output:
ID START END A startWeek endWeek Week-Monday NotSameWeek
0 0 2014-04-09 2014-04-15 5 15 16 2014-04-07 1
1 1 2018-06-05 2018-07-01 8 23 26 2018-06-04 1
2 2 2018-06-05 2018-07-01 7 23 26 2018-06-04 1
3- Group by each Week-Monday to get the number of cases that did not end during the same week.
# dict-based renaming in agg was removed in pandas 1.0; use named aggregation instead
df.groupby('Week-Monday')['NotSameWeek'].agg(N='sum').reset_index()
Week-Monday N
0 2014-04-07 1
1 2018-06-04 2
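The grouped result above only has rows for weeks in which something started. If the goal is a count for every week, as in the expected output of the question, one possible sketch is to build the list of Mondays and count, for each week, the rows whose START to END interval overlaps it. The names mondays and open_per_week are invented here, and treating "started but not ended" as "overlaps the week" is an assumption based on the expected output.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "START": pd.to_datetime(["2014-04-09", "2018-06-05", "2018-06-05"]),
    "END": pd.to_datetime(["2014-04-15", "2018-07-01", "2018-07-01"]),
    "A": [5, 8, 7],
})

# Monday of the week containing the earliest START.
first = df["START"].min()
first_monday = first - pd.Timedelta(days=first.weekday())

# One entry per week, from that Monday up to the latest END.
mondays = pd.date_range(first_monday, df["END"].max(), freq="7D")

one_week = pd.Timedelta(days=7)
open_per_week = pd.Series(
    # A row counts for a week if it started before the week ended
    # and had not ended before the week began.
    [int(((df["START"] < monday + one_week) & (df["END"] >= monday)).sum())
     for monday in mondays],
    index=mondays,
    name="N",
)

print(open_per_week.head(3))
print(open_per_week[pd.Timestamp("2018-06-04")])
```

This loops over weeks in Python, which is fine for a few hundred weeks; a larger dataset might call for a vectorised approach instead.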

Week of a month pandas

I'm trying to get the week of a month; some months might have four weeks, some might have five.
For each date I would like to know which week it belongs to. I'm mostly interested in the last week of the month.
data = pd.DataFrame(pd.date_range(' 1/ 1/ 2000', periods = 100, freq ='D'))
0 2000-01-01
1 2000-01-02
2 2000-01-03
3 2000-01-04
4 2000-01-05
5 2000-01-06
6 2000-01-07
See this answer and decide which week of month you want.
There's nothing built-in, so you'll need to calculate it with apply. For example, for an easy 'how many 7 day periods have passed' measure.
data['wom'] = data[0].apply(lambda d: (d.day-1) // 7 + 1)
For a more complicated (calendar-based) measure, use the function from that answer.
import datetime
import calendar
def week_of_month(tgtdate):
    # Timestamp.to_datetime() was removed from pandas; to_pydatetime() is the replacement
    tgtdate = tgtdate.to_pydatetime()
    days_this_month = calendar.mdays[tgtdate.month]
    for i in range(1, days_this_month):
        d = datetime.datetime(tgtdate.year, tgtdate.month, i)
        if d.day - d.weekday() > 0:
            startdate = d
            break
    # now we can use the modulo 7 approach
    return (tgtdate - startdate).days // 7 + 1
data['calendar_wom'] = data[0].apply(week_of_month)
I've used the code below when dealing with dataframes that have a datetime index.
import pandas as pd
import math
def add_week_of_month(df):
    # day of month divided by 7, rounded up: days 1-7 -> 1, 8-14 -> 2, ...
    df['week_in_month'] = pd.to_numeric(df.index.day/7)
    df['week_in_month'] = df['week_in_month'].apply(lambda x: math.ceil(x))
    return df
If you run this example:
df = pd.DataFrame({'count':['a','b','c','d','e']},
                  index = ['2018-01-01', '2018-01-08','2018-01-31','2018-02-01','2018-02-28'])
df.index = pd.to_datetime(df.index)
df = add_week_of_month(df)
you should get the following dataframe
count week_in_month
2018-01-01 a 1
2018-01-08 b 2
2018-01-31 c 5
2018-02-01 d 1
2018-02-28 e 4
TL;DR
import pandas as pd
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
df = pd.DataFrame(pd.date_range(' 1/ 1/ 2000', periods = 100, freq ='D'), columns=['Date'])
weekinmonth(df['Date'])
0 1
1 1
2 2
3 2
4 2
..
95 2
96 2
97 2
98 2
99 2
Name: Date, Length: 100, dtype: int64
Explanation
At first, calculate first day in month (from this answer: How floor a date to the first date of that month?):
df = pd.DataFrame(pd.date_range(' 1/ 1/ 2000', periods = 100, freq ='D'), columns=['Date'])
df['MonthFirstDay'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='d')
df
Date MonthFirstDay
0 2000-01-01 2000-01-01
1 2000-01-02 2000-01-01
2 2000-01-03 2000-01-01
3 2000-01-04 2000-01-01
4 2000-01-05 2000-01-01
.. ... ...
95 2000-04-05 2000-04-01
96 2000-04-06 2000-04-01
97 2000-04-07 2000-04-01
98 2000-04-08 2000-04-01
99 2000-04-09 2000-04-01
[100 rows x 2 columns]
Obtain weekday from first day:
df['FirstWeekday'] = df['MonthFirstDay'].dt.weekday
df
Date MonthFirstDay FirstWeekday
0 2000-01-01 2000-01-01 5
1 2000-01-02 2000-01-01 5
2 2000-01-03 2000-01-01 5
3 2000-01-04 2000-01-01 5
4 2000-01-05 2000-01-01 5
.. ... ... ...
95 2000-04-05 2000-04-01 5
96 2000-04-06 2000-04-01 5
97 2000-04-07 2000-04-01 5
98 2000-04-08 2000-04-01 5
99 2000-04-09 2000-04-01 5
[100 rows x 3 columns]
Now the week number in the month can be calculated with integer division by the 7 days of a week:
Get the day of the month with df['Date'].dt.day and subtract 1 so that it starts at 0: df['Date'].dt.day-1.
Add the weekday number of the month's first day to account for where in the week the month starts: + df['FirstWeekday'].
Integer-divide by the 7 days in a week and add 1 so that week numbers start from 1: // 7 + 1.
The whole calculation:
df['WeekInMonth'] = (df['Date'].dt.day-1 + df['FirstWeekday']) // 7 + 1
df
Date MonthFirstDay FirstWeekday WeekInMonth
0 2000-01-01 2000-01-01 5 1
1 2000-01-02 2000-01-01 5 1
2 2000-01-03 2000-01-01 5 2
3 2000-01-04 2000-01-01 5 2
4 2000-01-05 2000-01-01 5 2
.. ... ... ... ...
95 2000-04-05 2000-04-01 5 2
96 2000-04-06 2000-04-01 5 2
97 2000-04-07 2000-04-01 5 2
98 2000-04-08 2000-04-01 5 2
99 2000-04-09 2000-04-01 5 2
[100 rows x 4 columns]
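To see how this calendar-aligned definition differs from the simpler "how many 7 day periods have passed" measure mentioned earlier, here is a small standard-library sketch that applies both formulas to single dates. The function names are made up for this comparison.

```python
from datetime import date


def week_of_month_simple(d):
    """Count 7-day blocks from the 1st: days 1-7 -> 1, 8-14 -> 2, and so on."""
    return (d.day - 1) // 7 + 1


def week_of_month_calendar(d):
    """Count Monday-based calendar rows, shifting by the weekday of the 1st."""
    first_weekday = d.replace(day=1).weekday()  # Monday == 0
    return (d.day - 1 + first_weekday) // 7 + 1


for d in [date(2000, 1, 1), date(2000, 1, 3), date(2000, 1, 31), date(2018, 1, 31)]:
    print(d, week_of_month_simple(d), week_of_month_calendar(d))
```

With the calendar-aligned version a month can span up to six week rows (January 2000 starts on a Saturday), while the simple version never exceeds five.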
This seems to do the trick for me
import numpy as np
df_dates = pd.DataFrame({'date':pd.bdate_range(df['date'].min(),df['date'].max())})
# dt.weekday counts from Monday=0, so 2 selects Wednesdays (use 1 for Tuesdays)
df_dates_tues = df_dates[df_dates['date'].dt.weekday==2].copy()
df_dates_tues['week']=np.mod(df_dates_tues['date'].dt.strftime('%W').astype(int),4)
You can get it by subtracting the week of the first day of the month from the current week, but extra logic is needed to handle the first and last weeks of the year:
def get_week(s):
    # Series.dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it
    week = s.dt.isocalendar().week.astype(int)
    prev_week = (s - pd.to_timedelta(7, unit='d')).dt.isocalendar().week.astype(int)
    return (
        week
        .where((s.dt.month != 1) | (week < 50), 0)
        .where((s.dt.month != 12) | (week > 1), prev_week + 1)
    )
def get_week_of_month(s):
    first_day_of_month = s - pd.to_timedelta(s.dt.day - 1, unit='d')
    first_week_of_month = get_week(first_day_of_month)
    current_week = get_week(s)
    return current_week - first_week_of_month
My logic for getting the week of the month depends on the week of the year.
First, calculate the week of the year in a data frame.
If the month is 1, return the week of the year; otherwise, get the maximum week of the year of the previous month.
If the current week of the year equals the maximum week of the previous month, return the difference between them plus 1.
Otherwise, return the difference between the current week of the year and the maximum week of the previous month.
I hope this addresses the limitations of the approaches above; the function below implements it. Here temp is the data frame for which the week of the year is calculated using dt.weekofyear
def weekofmonth(dt1):
    if dt1.month == 1:
        return (dt1.weekofyear)
    else:
        pmth = dt1.month - 1
        year = dt1.year
        # Series.dt.weekofyear was removed in pandas 2.0; dt.isocalendar().week replaces it
        pmmaxweek = temp[(temp['timestamp_utc'].dt.month == pmth) & (temp['timestamp_utc'].dt.year == year)]['timestamp_utc'].dt.isocalendar().week.max()
        if dt1.weekofyear == pmmaxweek:
            return (dt1.weekofyear - pmmaxweek + 1)
        else:
            return (dt1.weekofyear - pmmaxweek)
import pandas as pd
def week_of_month(dt: pd.Timestamp):
    # days 1-7 -> week 1, 8-14 -> week 2, ..., 29-31 -> week 5
    return (dt.day - 1) // 7 + 1
df["what_you_need"] = df["dt_col_name"].apply(week_of_month)
This gives you a week number from 1 to 5; days after the 28th count as the 5th week.
