This is my dataframe.
Start_hour End_date
23:58:00 00:26:00
23:56:00 00:01:00
23:18:00 23:36:00
How can I get in a new column the difference (in minutes) between these two columns?
>>> from datetime import datetime
>>>
>>> before = datetime.now()
>>> print('wait for more than 1 minute')
wait for more than 1 minute
>>> after = datetime.now()
>>> td = after - before
>>>
>>> td
datetime.timedelta(seconds=98, microseconds=389121)
>>> td.total_seconds()
98.389121
>>> td.total_seconds() / 60
1.6398186833333335
Then you can round it or use it as-is.
You can do something like this:
import pandas as pd
df = pd.DataFrame({
'Start_hour': ['23:58:00', '23:56:00', '23:18:00'],
'End_date': ['00:26:00', '00:01:00', '23:36:00']}
)
df['Start_hour'] = pd.to_datetime(df['Start_hour'])
df['End_date'] = pd.to_datetime(df['End_date'])
df['diff'] = df.apply(
lambda row: (row['End_date']-row['Start_hour']).seconds / 60,
axis=1
)
print(df)
Start_hour End_date diff
0 2021-03-29 23:58:00 2021-03-29 00:26:00 28.0
1 2021-03-29 23:56:00 2021-03-29 00:01:00 5.0
2 2021-03-29 23:18:00 2021-03-29 23:36:00 18.0
You can also convert the datetime columns back to strings if you like:
df['Start_hour'] = df['Start_hour'].apply(lambda x: x.strftime('%H:%M:%S'))
df['End_date'] = df['End_date'].apply(lambda x: x.strftime('%H:%M:%S'))
print(df)
Output:
Start_hour End_date diff
0 23:58:00 00:26:00 28.0
1 23:56:00 00:01:00 5.0
2 23:18:00 23:36:00 18.0
Short answer:
from datetime import timedelta
df['interval'] = df['End_date'] - df['Start_hour']
df.loc[df['End_date'] < df['Start_hour'], 'interval'] += timedelta(hours=24)
Why so:
You are probably trying to handle the case where your Start_hour and End_date values sometimes fall on different days, which is why you can't simply subtract one from the other.
If your time window never exceeds 24 hours, you can use some modular arithmetic to deal with the 23:59:59 - 00:00:00 boundary:
if End_date < Start_hour, End_date always belongs to the next day
this implies that if End_date - Start_hour < 0, we should add 24 hours to End_date to find the actual difference
The final formula is:
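if rec['Start_hour'] < rec['End_date']:
    offset = timedelta(0)  # same day: no correction (a plain 0 cannot be added to a datetime)
else:
    offset = timedelta(hours=24)  # End_date falls on the next day
rec['delta'] = offset + rec['End_date'] - rec['Start_hour']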
To do the same with pandas.DataFrame we need to change code accordingly. And
that's how we get the snippet from the beginning of the answer.
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame([
    {'Start_hour': datetime(1, 1, 1, 23, 58, 0), 'End_date': datetime(1, 1, 1, 0, 26, 0)},
    {'Start_hour': datetime(1, 1, 1, 23, 58, 0), 'End_date': datetime(1, 1, 1, 23, 59, 0)},
])
# ...
df['interval'] = df['End_date'] - df['Start_hour']
# add a day wherever the end time is earlier than the start time
df.loc[df['End_date'] < df['Start_hour'], 'interval'] += timedelta(hours=24)
> df
Start_hour End_date interval
0 0001-01-01 23:58:00 0001-01-01 00:26:00 0 days 00:28:00
1 0001-01-01 23:58:00 0001-01-01 23:59:00 0 days 00:01:00
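The modular-arithmetic idea described above can also be sketched with the standard library alone. This is only a minimal illustration, assuming both values are HH:MM:SS strings and the real interval is shorter than 24 hours; minutes_between is a hypothetical helper name, not something from the answer.

```python
from datetime import datetime, timedelta

def minutes_between(start, end):
    """Minutes from start to end (HH:MM:SS strings), wrapping past midnight."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # Taking the difference modulo one day maps a negative result
    # (end earlier than start) onto the following day.
    return (delta % timedelta(days=1)).total_seconds() / 60

print(minutes_between("23:58:00", "00:26:00"))  # 28.0
print(minutes_between("23:56:00", "00:01:00"))  # 5.0
print(minutes_between("23:18:00", "23:36:00"))  # 18.0
```

The modulo replaces the explicit "add 24 hours if negative" branch, so the same expression covers both the same-day and the next-day case.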
I did some Googling and figured out how to generate all Friday dates in a year.
# get all Fridays in a year
from datetime import date, timedelta
def allfridays(year):
    d = date(year, 1, 1) # January 1st
    d += timedelta(days = (4 - d.weekday()) % 7) # first Friday (Monday=0, Friday=4)
    while d.year == year:
        yield d
        d += timedelta(days = 7)
for d in allfridays(2022):
    print(d)
Result:
2022-01-07
2022-01-14
2022-01-21
etc.
2022-12-16
2022-12-23
2022-12-30
Now, I'm trying to figure out how to loop through a range of rolling dates, so like 2022-01-07 + 60 days, then 2022-01-14 + 60 days, then 2022-01-21 + 60 days.
step #1:
start = '2022-01-07'
end = '2022-03-08'
step #2:
start = '2022-01-14'
end = '2022-03-15'
Ideally, I want to pass in the start and end date loop, into another loop, which looks like this...
price_data = []
for ticker in tickers:
    try:
        prices = wb.DataReader(ticker, start = start.strftime('%m/%d/%Y'), end = end.strftime('%m/%d/%Y'), data_source='yahoo')[['Adj Close']]
        price_data.append(prices.assign(ticker=ticker)[['ticker', 'Adj Close']])
    except Exception:
        # report tickers that failed to download
        print(ticker)
df = pd.concat(price_data)
Since you are already using pandas, you can try it this way:
import pandas as pd
year = 2022
dates = pd.date_range(start=f'{year}-01-01',end=f'{year}-12-31',freq='W-FRI')
df = pd.DataFrame({'my_dates':dates, 'sixty_ahead':dates + pd.Timedelta(days=60)})
print(df.head())
'''
my_dates sixty_ahead
0 2022-01-07 2022-03-08
1 2022-01-14 2022-03-15
2 2022-01-21 2022-03-22
3 2022-01-28 2022-03-29
4 2022-02-04 2022-04-05
'''
First, we have to figure out how to get the first Friday of a given year. Next, we will calculate the start, end days.
import datetime
FRIDAY = 4 # Based on Monday=0
WEEK = datetime.timedelta(days=7)
def first_friday(year):
    """Return the first Friday of the year."""
    the_date = datetime.date(year, 1, 1)
    while the_date.weekday() != FRIDAY:
        the_date = the_date + datetime.timedelta(days=1)
    return the_date
def friday_ranges(year, days_count):
    """
    Generate date ranges that starts on first Friday of `year` and
    lasts for `days_count`.
    """
    DURATION = datetime.timedelta(days=days_count)
    start_date = first_friday(year)
    end_date = start_date + DURATION
    while end_date.year == year:
        yield start_date, end_date
        start_date += WEEK
        end_date = start_date + DURATION
for start_date, end_date in friday_ranges(year=2022, days_count=60):
    # Do what you want with start_date and end_date
    print((start_date, end_date))
Sample output:
(datetime.date(2022, 1, 7), datetime.date(2022, 3, 8))
(datetime.date(2022, 1, 14), datetime.date(2022, 3, 15))
(datetime.date(2022, 1, 21), datetime.date(2022, 3, 22))
...
(datetime.date(2022, 10, 21), datetime.date(2022, 12, 20))
(datetime.date(2022, 10, 28), datetime.date(2022, 12, 27))
Notes
The algorithm for first Friday is simple: Start with Jan 1, then keep advancing the day until Friday
I made an assumption that the end date must fall into the specified year. If that is not the case, you can adjust the condition in the while loop
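As a side note to the first point, the day-by-day search can be replaced by a single modular step. The sketch below is an alternative, not the method used in the answer; first_friday_closed_form is a hypothetical name chosen for illustration.

```python
import datetime

FRIDAY = 4  # weekday() counts from Monday=0

def first_friday_closed_form(year):
    """Return the first Friday of `year` without looping."""
    jan1 = datetime.date(year, 1, 1)
    # Days to add so that the weekday becomes Friday (0 if Jan 1 is a Friday)
    days_ahead = (FRIDAY - jan1.weekday()) % 7
    return jan1 + datetime.timedelta(days=days_ahead)

print(first_friday_closed_form(2022))  # 2022-01-07
```

Both versions should return the same date; the loop is easier to read, while the modular form avoids up to six iterations.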
This might work. You can add a condition, such as where the loop should end, inside the lambda function.
from datetime import date, timedelta
def allfridays(year):
    d = date(year, 1, 1) # January 1st
    d += timedelta(days = (4 - d.weekday()) % 7) # first Friday (Monday=0, Friday=4)
    while d.year == year:
        yield d
        d += timedelta(days = 7)
list_dates = []
for d in allfridays(2022):
    list_dates.append(d)
add_days = map(lambda x: x+timedelta(days = 60),list_dates)
print(list(add_days))
Oh my, I totally missed this before. The solution below works just fine.
import pandas as pd
# get all Fridays in a year
from datetime import date, timedelta
def allfridays(year):
    d = date(year, 1, 1) # January 1st
    d += timedelta(days = (4 - d.weekday()) % 7) # first Friday (Monday=0, Friday=4)
    while d.year == year:
        yield d
        d += timedelta(days = 7)
lst=[]
for d in allfridays(2022):
    lst.append(d)
df = pd.DataFrame(lst)
print(type(df))
df.columns = ['my_dates']
df['sixty_ahead'] = df['my_dates'] + timedelta(days=60)
df
Result:
my_dates sixty_ahead
0 2022-01-07 2022-03-08
1 2022-01-14 2022-03-15
2 2022-01-21 2022-03-22
etc.
49 2022-12-16 2023-02-14
50 2022-12-23 2023-02-21
51 2022-12-30 2023-02-28
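To feed each start/end pair into the per-ticker loop from the question, one option is to nest the two loops. The sketch below is only an illustration under assumptions: fetch_prices is a hypothetical placeholder standing in for the real download call, friday_windows is a hypothetical helper, and the ticker symbols are made up.

```python
from datetime import date, timedelta

def friday_windows(year, days=60):
    """Yield (start, end) pairs: every Friday of `year` and the date `days` later."""
    d = date(year, 1, 1)
    d += timedelta(days=(4 - d.weekday()) % 7)  # first Friday (Monday=0)
    while d.year == year:
        yield d, d + timedelta(days=days)
        d += timedelta(days=7)

def fetch_prices(ticker, start, end):
    """Hypothetical placeholder for the real price download."""
    return f"{ticker}: {start:%m/%d/%Y} -> {end:%m/%d/%Y}"

tickers = ["AAA", "BBB"]  # hypothetical symbols
results = []
for start, end in friday_windows(2022):
    for ticker in tickers:
        results.append(fetch_prices(ticker, start, end))

print(results[0])  # AAA: 01/07/2022 -> 03/08/2022
```

In real code, the body of fetch_prices would be replaced by the existing try/except download block, with start and end taken from the outer loop.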
I have a dataframe:
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
I would like to convert the time based on conditions: if the hour is less than 9, I want to set it to 9 and if the hour is more than 17, I need to set it to 17.
I tried this approach:
df['time'] = np.where(((df['time'].dt.hour < 9) & (df['time'].dt.hour != 0)), dt.time(9, 00))
I am getting an error: Can only use .dt accessor with datetimelike values.
Can anyone please help me with this? Thanks.
Here's a way to do what your question asks:
df.time = pd.to_datetime(df.time)
df.loc[df.time.dt.hour < 9, 'time'] = (df.time.astype('int64') + (9 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[df.time.dt.hour > 17, 'time'] = (df.time.astype('int64') + (17 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
Input:
time
0 2022-06-06 08:45:00
1 2022-06-06 09:30:00
2 2022-06-06 18:00:00
3 2022-06-06 15:00:00
Output:
time
0 2022-06-06 09:45:00
1 2022-06-06 09:30:00
2 2022-06-06 17:00:00
3 2022-06-06 15:00:00
UPDATE:
Here's alternative code to try to address OP's error as described in the comments:
import pandas as pd
import datetime
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print('', 'df loaded as strings:', df, sep='\n')
df.time = pd.to_datetime(df.time, format='%H:%M:%S')
print('', 'df converted to datetime by pd.to_datetime():', df, sep='\n')
df.loc[df.time.dt.hour < 9, 'time'] = (df.time.astype('int64') + (9 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[df.time.dt.hour > 17, 'time'] = (df.time.astype('int64') + (17 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.time = [time.time() for time in pd.to_datetime(df.time)]
print('', 'df with time column adjusted to have hour between 9 and 17, converted to type "time":', df, sep='\n')
Output:
df loaded as strings:
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
df converted to datetime by pd.to_datetime():
time
0 1900-01-01 08:45:00
1 1900-01-01 09:30:00
2 1900-01-01 18:00:00
3 1900-01-01 15:00:00
df with time column adjusted to have hour between 9 and 17, converted to type "time":
time
0 09:45:00
1 09:30:00
2 17:00:00
3 15:00:00
UPDATE #2:
To not just change the hour for out-of-window times, but to simply apply 9:00 and 17:00 as min and max times, respectively (see OP's comment on this), you can do this:
df.loc[df['time'].dt.hour < 9, 'time'] = pd.to_datetime(pd.DataFrame({
'year':df['time'].dt.year, 'month':df['time'].dt.month, 'day':df['time'].dt.day,
'hour':[9]*len(df.index)}))
df.loc[df['time'].dt.hour > 17, 'time'] = pd.to_datetime(pd.DataFrame({
'year':df['time'].dt.year, 'month':df['time'].dt.month, 'day':df['time'].dt.day,
'hour':[17]*len(df.index)}))
df['time'] = [time.time() for time in pd.to_datetime(df['time'])]
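The min/max behaviour described in this update can also be sketched in plain Python, which may make the intent easier to see. This is an illustration rather than the code from the answer: clamp_time is a hypothetical helper, and it assumes the values are HH:MM:SS strings.

```python
from datetime import time

def clamp_time(value, earliest=time(9, 0), latest=time(17, 0)):
    """Clamp an HH:MM:SS string into the [earliest, latest] window."""
    t = time.fromisoformat(value)
    # max() lifts early times to `earliest`, min() caps late times at `latest`
    return min(max(t, earliest), latest).strftime("%H:%M:%S")

times = ['08:45:00', '09:30:00', '18:00:00', '15:00:00']
clamped = [clamp_time(t) for t in times]
print(clamped)  # ['09:00:00', '09:30:00', '17:00:00', '15:00:00']
```

With a DataFrame holding strings, the same function could presumably be applied with df['time'].map(clamp_time).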
Since your 'time' column contains strings, they can be kept as strings, with new string values assigned where appropriate. To filter for your criteria it is convenient to: create a datetime Series from the 'time' column, create boolean Series by comparing the datetime Series with your criteria, and use the boolean Series to select the rows that need to be changed.
Your data:
import numpy as np
import pandas as pd
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print(df.to_string())
>>>
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
Convert to datetime, make boolean Series with your criteria
dts = pd.to_datetime(df['time'])
lt_nine = dts.dt.hour < 9
gt_seventeen = (dts.dt.hour >= 17)
print(lt_nine)
print(gt_seventeen)
>>>
0 True
1 False
2 False
3 False
Name: time, dtype: bool
0 False
1 False
2 True
3 False
Name: time, dtype: bool
Use the boolean series to assign a new value:
df.loc[lt_nine,'time'] = '09:00:00'
df.loc[gt_seventeen,'time'] = '17:00:00'
print(df.to_string())
>>>
time
0 09:00:00
1 09:30:00
2 17:00:00
3 15:00:00
Or just stick with strings altogether and create the boolean Series using regex patterns and .str.match.
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00','07:22:00','22:02:06']}
dg = pd.DataFrame(data)
print(dg.to_string())
>>>
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
4 07:22:00
5 22:02:06
# regex patterns
pattern_lt_nine = '^00|01|02|03|04|05|06|07|08'
pattern_gt_seventeen = '^17|18|19|20|21|22|23'
Make boolean Series and assign new values
gt_seventeen = dg['time'].str.match(pattern_gt_seventeen)
lt_nine = dg['time'].str.match(pattern_lt_nine)
dg.loc[lt_nine,'time'] = '09:00:00'
dg.loc[gt_seventeen,'time'] = '17:00:00'
print(dg.to_string())
>>>
time
0 09:00:00
1 09:30:00
2 17:00:00
3 15:00:00
4 09:00:00
5 17:00:00
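The hour alternatives in the patterns above can be written more compactly with character classes. The following sketch uses only the standard re module to show the idea; the pattern names and the clamp_by_pattern helper are hypothetical, and the input is assumed to be zero-padded HH:MM:SS strings.

```python
import re

# Hours 00-08, anchored at the start and followed by the colon
pattern_lt_nine = re.compile(r"^0[0-8]:")
# Hours 17-23
pattern_ge_seventeen = re.compile(r"^(?:1[7-9]|2[0-3]):")

def clamp_by_pattern(value):
    """Return '09:00:00' or '17:00:00' for out-of-window times, else the input."""
    if pattern_lt_nine.match(value):
        return '09:00:00'
    if pattern_ge_seventeen.match(value):
        return '17:00:00'
    return value

times = ['08:45:00', '09:30:00', '18:00:00', '15:00:00', '07:22:00', '22:02:06']
print([clamp_by_pattern(t) for t in times])
# ['09:00:00', '09:30:00', '17:00:00', '15:00:00', '09:00:00', '17:00:00']
```

Including the colon in the pattern ties the match to the hour field, so it does not depend on the anchoring behaviour of whichever matching function is used.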
Time series / date functionality
Working with text data
I have the following dataset:
date event next_event duration_Minutes
2021-09-09 22:30:00 1 2021-09-09 23:00:00 30
2021-09-09 23:00:00 2 2021-09-09 23:10:00 10
2021-09-09 23:10:00 1 2021-09-09 23:50:00 40
2021-09-09 23:50:00 4 2021-09-10 00:50:00 60
2021-09-10 00:50:00 4 2021-09-12 00:50:00 2880
The main problem is that I would like to split the multi-day events into separate events in the following way. I would like to have the event duration from 2021-09-09 23:50:00 until 2021-09-10 00:00:00 and then the duration from 2021-09-10 00:00:00 to 2021-09-10 00:50:00, and so on. This would be useful because afterwards I need to group the events by day and calculate the duration of each event per day, so I would like to fix the situations in which the day changes between events.
I would like to obtain something like this:
date event next_event duration_Minutes
2021-09-09 22:30:00 1 2021-09-09 23:00:00 30
2021-09-09 23:00:00 2 2021-09-09 23:10:00 10
2021-09-09 23:10:00 1 2021-09-09 23:50:00 40
2021-09-09 23:50:00 4 2021-09-10 00:00:00 10
2021-09-10 00:00:00 4 2021-09-10 00:50:00 50
2021-09-10 00:50:00 4 2021-09-11 00:00:00 1390
2021-09-11 00:00:00 4 2021-09-12 00:00:00 1440
2021-09-12 00:00:00 4 2021-09-12 00:50:00 50
It should be able to handle situations in which we don't have an event for an entire day or more like in the example.
My current solution for now is:
first_record_hour_ts = df.index.floor('H')[0]
last_record_hour_ts = df.index.floor('H')[-1]
# Create a series from the first to the last date containing Nan
df_to_join = pd.Series(np.nan, index=pd.date_range(first_record_hour_ts, last_record_hour_ts, freq='H'))
df_to_join = pd.DataFrame(df_to_join)
# Concatenate with current status dataframe
df = pd.concat([df, df_to_join[~df_to_join.index.isin(df.index)]]).sort_index()
# Forward fill the NaNs
df.fillna(method='ffill', inplace=True)
df['next_event'] = df.index.shift(-1)
# Calculate the delta between the 2 status
df['duration'] = df['next_event'] - df.index
# Convert into minutes
df['duration_Minutes'] = df['duration'].apply(lambda x: x.total_seconds() // 60)
This doesn't solve the problem exactly, but I think it may meet my goal, which is being able to group by event and by day at the end.
Ok, the code below looks a bit long -- and there's certainly a better/more efficient/shorter way of doing this. But I think it's pretty reasonably simple to follow along.
split_datetime_span_by_day below takes two dates: start_date and end_date. In your case, it would be date and next_event from your source data.
The function then checks whether that time period (start -> end) spans over midnight. If it doesn't, it returns the start date, the end date, and the time period in seconds. If it does span over midnight, it creates a new segment (start -> midnight), and then calls itself again (i.e. recurses), and the process continues until the time period does not span over midnight.
Just a note: the returned segment list is made up of tuples of (start, end, nmb_seconds). I'm returning the number of seconds, not the number of minutes as in your question, because I didn't know how you wanted to round the seconds (up, down, etc.). That's left as an exercise for the reader :-)
from datetime import datetime, timedelta
def split_datetime_span_by_day(start_date, end_date, split_segments=None):
    assert start_date < end_date # sanity check
    # when is the next midnight after start_date?
    # adapted from https://ispycode.com/Blog/python/2016-07/Get-Midnight-Today
    start_next_midnight = datetime.combine(start_date, datetime.min.time()) + timedelta(days=1)
    if split_segments is None:
        split_segments = []
    if end_date < start_next_midnight:
        # end date is before next midnight, no split necessary
        return split_segments + [(
            start_date,
            end_date,
            (end_date - start_date).total_seconds()
        )]
    # otherwise, split at next midnight...
    split_segments += [(
        start_date,
        start_next_midnight,
        (start_next_midnight - start_date).total_seconds()
    )]
    if (end_date - start_next_midnight).total_seconds() > 0:
        # ...and recurse to get next segment
        return split_datetime_span_by_day(
            start_date=start_next_midnight,
            end_date=end_date,
            split_segments=split_segments
        )
    else:
        # case where start_next_midnight == end_date i.e. end_date is midnight
        # don't split & create a 0 second segment
        return split_segments
# test case:
start_date = datetime.strptime('2021-09-12 00:00:00', '%Y-%m-%d %H:%M:%S')
end_date = datetime.strptime('2021-09-14 01:00:00', '%Y-%m-%d %H:%M:%S')
print(split_datetime_span_by_day(start_date=start_date, end_date=end_date))
# returned values:
# [
# (datetime.datetime(2021, 9, 12, 0, 0), datetime.datetime(2021, 9, 13, 0, 0), 86400.0),
# (datetime.datetime(2021, 9, 13, 0, 0), datetime.datetime(2021, 9, 14, 0, 0), 86400.0),
# (datetime.datetime(2021, 9, 14, 0, 0), datetime.datetime(2021, 9, 14, 1, 0), 3600.0)
# ]
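Since the stated goal is to total the durations per day, here is a rough sketch of how split segments could be aggregated. It uses an iterative splitter instead of the recursive function above, purely to keep the example self-contained; split_by_day and minutes_per_day are hypothetical names, and the two events are taken from the sample data in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_by_day(start, end):
    """Yield (segment_start, segment_end) pairs that never cross midnight."""
    while start < end:
        next_midnight = datetime.combine(start.date(), datetime.min.time()) + timedelta(days=1)
        segment_end = min(end, next_midnight)
        yield start, segment_end
        start = segment_end

# (date, next_event, event) rows from the question's sample data
events = [
    (datetime(2021, 9, 9, 23, 50), datetime(2021, 9, 10, 0, 50), 4),
    (datetime(2021, 9, 10, 0, 50), datetime(2021, 9, 12, 0, 50), 4),
]

minutes_per_day = defaultdict(float)
for start, end, event in events:
    for seg_start, seg_end in split_by_day(start, end):
        minutes = (seg_end - seg_start).total_seconds() / 60
        minutes_per_day[(seg_start.date(), event)] += minutes

for (day, event), minutes in sorted(minutes_per_day.items()):
    print(day, event, minutes)
```

Each key is a (day, event) pair, so the dictionary already has the shape of a group-by on day and event.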
I have a full year of data every minute:
dayofyear hourofday minuteofhour
1 0 0
.
.
365 23 57
365 23 58
365 23 59
I converted the dayofyear to a date:
df['date']=pd.to_datetime(df['dayofyear'], unit='D', origin=pd.Timestamp('2009-12-31'))
dayofyear hourofday minuteofhour date
1 0 0 2010-01-01
1 0 1 2010-01-01
1 0 2 2010-01-01
1 0 3 2010-01-01
1 0 4 2010-01-01
How can I combine the hourofday and minuteofhour with date in order to create a proper timestamp?
Like this maybe: '2010-12-30 19:00:00'
So that I can perform other time-filtering/subsetting etc in pandas later.
Convert the hourofday and minuteofhour columns into a Timedelta, then add it to the date column:
df['timestamp'] = df['date'] + pd.to_timedelta(df['hourofday'].astype('str') + ':' + df['minuteofhour'].astype('str') + ':00')
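A variation on the same idea skips the string concatenation and converts each numeric column with an explicit unit. This is a sketch on a small made-up frame, assuming the column names from the question.

```python
import pandas as pd

# Small illustrative frame using the question's column names
df = pd.DataFrame({
    "dayofyear": [1, 1, 365],
    "hourofday": [0, 19, 23],
    "minuteofhour": [0, 5, 57],
})

df["date"] = pd.to_datetime(df["dayofyear"], unit="D", origin=pd.Timestamp("2009-12-31"))

# Each numeric column becomes a timedelta in its own unit, then all are added
df["timestamp"] = (
    df["date"]
    + pd.to_timedelta(df["hourofday"], unit="h")
    + pd.to_timedelta(df["minuteofhour"], unit="min")
)
print(df)
```

Working with numeric units avoids building and re-parsing a string for every row.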
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame({
'dayofyear': (365, ),
'hourofday': (23, ),
'minuteofhour': (57, ),
})
def parse_dt(x):
    dt = datetime(2010, 1, 1) + timedelta(int(x['dayofyear']) - 1)
    dt = dt.replace(hour=x['hourofday'], minute=x['minuteofhour'])
    x['dt'] = dt
    return x
df = df.apply(parse_dt, axis=1)
print(df)
# dayofyear hourofday minuteofhour dt
#0 365 23 57 2010-12-31 23:57:00
Hope this helps
Basically, I want my script to pause between 4 and 5 AM. The only way to do this I've come up with so far is this:
seconds_into_day = time.time() % (60*60*24)
if 60*60*4 < seconds_into_day < 60*60*5:
    sleep(time_left_till_5am)
Any "proper" way to do this? Aka some built-in function/lib for calculating time; rather than just using seconds all the time?
You want datetime
The datetime module supplies classes for manipulating dates and times in both simple and complex ways
If you use date.hour from datetime.now() you'll get the current hour:
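datetimenow = datetime.now()
if datetimenow.hour in range(4, 5):
    sleep(time_left_till_5am)
You can calculate time_left_till_5am as (59 - datetimenow.minute) * 60 + (60 - datetimenow.second): the current minute is already partly elapsed, so count the 59 - minute whole minutes that remain plus the 60 - second seconds left in the current one.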
Python has a built-in datetime library: http://docs.python.org/library/datetime.html
This should probably get you what you're after:
import datetime as dt
from time import sleep
now = dt.datetime.now()
if now.hour >= 4 and now.hour < 5:
    # whole minutes left after the current one, plus the seconds left in the current minute
    sleep((59 - now.minute)*60 + (60 - now.second))
OK, the above works, but here's the purer, less error-prone solution (and what I was originally thinking of but suddenly forgot how to do):
import datetime as dt
from time import sleep
now = dt.datetime.now()
pause = dt.datetime(now.year, now.month, now.day, 4)
start = dt.datetime(now.year, now.month, now.day, 5)
if now >= pause and now < start:
    sleep((start - now).seconds)
That's where my original "timedelta" comment came from -- what you get from subtracting two datetime objects is a timedelta object (which in this case we pull the 'seconds' attribute from).
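The same check can be wrapped in a small function so that it is testable without actually sleeping. This is a sketch, not the answer's code: seconds_until_resume is a hypothetical name, and it uses total_seconds() instead of the seconds attribute so fractional seconds are kept.

```python
import datetime as dt

def seconds_until_resume(now, pause_hour=4, resume_hour=5):
    """Seconds left in the pause window, or 0.0 if `now` is outside it."""
    pause = now.replace(hour=pause_hour, minute=0, second=0, microsecond=0)
    resume = now.replace(hour=resume_hour, minute=0, second=0, microsecond=0)
    if pause <= now < resume:
        return (resume - now).total_seconds()
    return 0.0

print(seconds_until_resume(dt.datetime(2010, 7, 5, 4, 30)))  # 1800.0
print(seconds_until_resume(dt.datetime(2010, 7, 5, 5, 0)))   # 0.0
```

The result can be passed straight to time.sleep(), which accepts a float.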
The following code covers the more general case where a script needs to pause during any fixed window of less than 24 hours duration. Example: must sleep between 11:00 PM and 01:00 AM.
import datetime as dt
def sleep_duration(sleep_from, sleep_to, now=None):
    # sleep_* are datetime.time objects
    # now is a datetime.datetime object
    if now is None:
        now = dt.datetime.now()
    duration = 0
    lo = dt.datetime.combine(now, sleep_from)
    hi = dt.datetime.combine(now, sleep_to)
    if lo <= now < hi:
        duration = (hi - now).seconds
    elif hi < lo:
        if now >= lo:
            duration = (hi + dt.timedelta(hours=24) - now).seconds
        elif now < hi:
            duration = (hi - now).seconds
    return duration
tests = [
(4, 5, 3, 30),
(4, 5, 4, 0),
(4, 5, 4, 30),
(4, 5, 5, 0),
(4, 5, 5, 30),
(23, 1, 0, 0),
(23, 1, 0, 30),
(23, 1, 0, 59),
(23, 1, 1, 0),
(23, 1, 1, 30),
(23, 1, 22, 30),
(23, 1, 22, 59),
(23, 1, 23, 0),
(23, 1, 23, 1),
(23, 1, 23, 59),
]
for hfrom, hto, hnow, mnow in tests:
    sfrom = dt.time(hfrom)
    sto = dt.time(hto)
    dnow = dt.datetime(2010, 7, 5, hnow, mnow)
    print(sfrom, sto, dnow, sleep_duration(sfrom, sto, dnow))
and here's the output:
04:00:00 05:00:00 2010-07-05 03:30:00 0
04:00:00 05:00:00 2010-07-05 04:00:00 3600
04:00:00 05:00:00 2010-07-05 04:30:00 1800
04:00:00 05:00:00 2010-07-05 05:00:00 0
04:00:00 05:00:00 2010-07-05 05:30:00 0
23:00:00 01:00:00 2010-07-05 00:00:00 3600
23:00:00 01:00:00 2010-07-05 00:30:00 1800
23:00:00 01:00:00 2010-07-05 00:59:00 60
23:00:00 01:00:00 2010-07-05 01:00:00 0
23:00:00 01:00:00 2010-07-05 01:30:00 0
23:00:00 01:00:00 2010-07-05 22:30:00 0
23:00:00 01:00:00 2010-07-05 22:59:00 0
23:00:00 01:00:00 2010-07-05 23:00:00 7200
23:00:00 01:00:00 2010-07-05 23:01:00 7140
23:00:00 01:00:00 2010-07-05 23:59:00 3660
When dealing with dates and times in Python I still prefer mxDateTime over Python's datetime module. Although the built-in module has improved greatly over the years, it is still rather awkward and lacking in comparison. If you're interested, mxDateTime is free to download and use, and it makes life much easier when dealing with datetime math.
import mx.DateTime as dt
from time import sleep
now = dt.now()
if 4 <= now.hour < 5:
    stop = dt.RelativeDateTime(hour=5, minute=0, second=0)
    secs_remaining = ((now + stop) - now).seconds
    sleep(secs_remaining)