Sorting scheduled events python - python

So I have a list of events that are sort of like alarms. They're defined by their start and end time (in hours and minutes), a range of days (i.e. 1-3, which is Sunday through Wed.), and a range of months (i.e. 1-3, January through March). The format of that data is largely unchangeable. I don't necessarily need to sort the list, but I do need to find the next upcoming event based on the current time. There are so many different ways to do this and so many corner cases. This is my pseudocode:
now = time()
diff = []
# Start difference between now and start times
for s in schedule  # assuming appending to diff
    diff.minutes = s.minutes - now.minutes
    diff.hours = s.hours - now.hours
    diff.days = s.days - now.days
    diff.months = s.months - now.months
    for d in diff
        if d < 0
            d = period + d
            # period is the maximum period of the attribute, i.e. minutes is 60, hours is 24
# repeat for event end times
So now I have a list of tuples of differences in minutes, hours, days, and months. Each tuple already takes into account whether the current time is past the start time but before the end time. So let's say it's August, the start month of the event is July, and the end month is September; then diff.months == 0.
Now this specific corner case is giving me trouble:
Let's say a schedule runs from 0:00 to 23:59 on Thursdays in August, and it's Friday the 27th. Running my algorithm, the difference in months would be 0, when in reality the event won't run again until next August, so it should be 12. And I'm stuck. I think the month is the only problem, because it is the only attribute that depends on the actual date within the month (versus just the day of the week). Is my algorithm OK, so that I can just deal with this special case? Or is there something better out there for this?
This is the data I'm working with
map['start_time']=''
map['end_time']=''
map['start_moy']=''
map['end_moy']=''
map['start_dow']=''
map['end_dow']=''
The schedule getAllSchedules method just returns a list to all of the schedules. I can change the schedule class but I'm not sure what difference I can make there. I can't add/change the format of the schedules I'm given

Convert the items from the schedule into datetime objects. Then you can simply sort them:
from datetime import datetime
events = sorted(datetime(s.year, s.month, s.day, s.hour, s.minute) for s in schedule)

Since your resolution is in minutes, and assuming that you don't have many events, then I'd simply scan all the events every minute.
Filter your events so that you have a new list where the event range match the current month and day.
Then for each of those events declare that they are active or inactive according to whether the current time matches the event's range.
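A minimal sketch of that per-minute check, assuming the schedule fields from the question have already been parsed: `start_moy`/`end_moy` as integers 1-12, `start_dow`/`end_dow` as integers with 0 = Sunday through 6 = Saturday, and `start_time`/`end_time` as `datetime.time` objects. The parsing, the day numbering, and the helper names `in_range` and `is_active` are assumptions for illustration, not part of the original data format.

```python
from datetime import datetime, time


def in_range(value, start, end):
    """True if value is in the inclusive range start..end.

    Ranges may wrap around, e.g. months 11..2 mean November through February.
    """
    if start <= end:
        return start <= value <= end
    return value >= start or value <= end


def is_active(event, now):
    """True if `now` falls inside the event's month, weekday and time ranges."""
    dow = int(now.strftime('%w'))  # 0 = Sunday ... 6 = Saturday (assumed numbering)
    return (in_range(now.month, event['start_moy'], event['end_moy'])
            and in_range(dow, event['start_dow'], event['end_dow'])
            and event['start_time'] <= now.time() <= event['end_time'])


# Hypothetical event: all day on Thursdays in August
thursdays_in_august = {
    'start_time': time(0, 0), 'end_time': time(23, 59),
    'start_moy': 8, 'end_moy': 8,
    'start_dow': 4, 'end_dow': 4,
}
print(is_active(thursdays_in_august, datetime(2021, 8, 5, 12, 0)))  # a Thursday
print(is_active(thursdays_in_august, datetime(2021, 8, 6, 12, 0)))  # a Friday
```

Calling `is_active(event, datetime.now())` for every event once a minute avoids the difference arithmetic entirely, at the cost of not knowing in advance when the next event starts.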

The primary issue seems to be with the fact that you're using the day of the week, instead of explicit days of the month.
While your cited edge case is one example, does this issue not crop up with all events scheduled in any month outside of the current one?
I think the most robust approach here would be to do the work to get your scheduled events into datetime format, then use @gnibbler's suggestion of sorting the datetime objects.
Once you have determined that the last event for the current month has already passed, calculate the distance to the next month the event occurs in (be it + 1 year, or just + 1 month), then construct a datetime object with that information:
first_of_month = datetime.date(calculated_year, calculated_month, 1)
By using the first day of the month, you can then use:
day_of_week = first_of_month.strftime('%w')
To give you what day of the week the first of that month falls on, which you can then use to calculate how many days to add to get to the first, second, third, etc. instance of a given day of the week, for that month. Once you have that day, you can construct a valid datetime object and do whatever comparisons you wish with now().
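The offset calculation described above could be sketched as follows. The function name `nth_weekday_of_month` and its parameters are made up for illustration; `weekday` uses the same numbering as `strftime('%w')`, i.e. 0 = Sunday through 6 = Saturday.

```python
import datetime


def nth_weekday_of_month(year, month, weekday, n=1):
    """Return the n-th occurrence of `weekday` in the given month.

    weekday: 0 = Sunday ... 6 = Saturday, as returned by strftime('%w').
    Raises ValueError if the month has no n-th occurrence of that weekday.
    """
    first_of_month = datetime.date(year, month, 1)
    first_dow = int(first_of_month.strftime('%w'))
    # Days to add to the 1st to reach the first matching weekday
    offset = (weekday - first_dow) % 7
    return datetime.date(year, month, 1 + offset + 7 * (n - 1))


print(nth_weekday_of_month(2021, 8, 4))       # first Thursday of August 2021
print(nth_weekday_of_month(2021, 7, 1, n=2))  # second Monday of July 2021
```

Combining the returned date with the event's start time via `datetime.datetime.combine` gives a full datetime that can be compared with `now()`.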

I couldn't figure out how to do it using only datetimes. But I found a module and used this. It's perfect
http://labix.org/python-dateutil

Related

How to test if a time variable is in a series or not in python

Background: Sometimes we need to take a date that is one month after the original timestamp. Since not all days are trading days, some adjustments must be made.
I extracted the index of stock close price, getting a time series with lots of timestamps of trading days.
trading_day_glossory = stock_close_full.index
Now, given a datetime variable date, the following function should return a day variable that is a trading day. But it does not: the if condition is never true, so the loop keeps adding days until the date runs up to 9999:99:99 and an error is reported.
def return_trading_day(day, trading_day_glossory):
    while True:
        day = day + relativedelta(days=1)
        if day in trading_day_glossory:
            break
    return day
I reckon that comparing a timestamp with a datetime is problematic, so I rewrote the first part of my function in this way:
trading_day_glossory = stock_close_full.index
trading_day_glossory = trading_day_glossory.to_pydatetime()
# Error message: OverflowError: date value out of range
However this change makes no difference. I further tested some characteristics of the variables involved:
testing1 = trading_day_glossory[20] # returns a datetime variable say 2000-05-08 00:00:00
testing2 = day # returns a datetime variable say 2000-05-07 00:00:00
What may be the problem and what should I do?
Thanks.
I'm not quite sure what is going on, because the errors cannot be reproduced from your code and variables.
However, you can try searchsorted to find the first timestamp not earlier than a given date in a sorted time series by binary search:
trading_day_glossory.searchsorted(day)
It's way better than comparing values in a while loop.
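For illustration, here is the same binary-search idea using the standard library's `bisect` on a plain sorted list of datetimes rather than the pandas index from the question. The function name and the sample dates are made up; the list must already be sorted.

```python
from bisect import bisect_left
from datetime import datetime


def first_trading_day_on_or_after(day, trading_days):
    """Return the first entry of the sorted list `trading_days` that is not
    earlier than `day`, or None if `day` is past the last entry."""
    i = bisect_left(trading_days, day)
    return trading_days[i] if i < len(trading_days) else None


# Hypothetical sample data: Thursday, Friday, then the following Monday
trading_days = [datetime(2000, 5, 4), datetime(2000, 5, 5), datetime(2000, 5, 8)]
print(first_trading_day_on_or_after(datetime(2000, 5, 7), trading_days))
```

Returning None past the end of the data also avoids the runaway loop from the question, where the date kept growing until it overflowed.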

Python/Pandas: Find the Custom Business Quarter End of a datetime which takes holidays into account

I want to find the Business Quarter End of a datetime in python which will take care of holidays as well. These holidays may be passed as list for simplicity. I know BQuarterEnd() from pandas.tseries.offsets. As far as I know, it doesn't take holidays into account.
Example: If 2020-11-20 is passed and 2020-12-31 is a business day but a holiday as well; it should return 2020-12-30.
Thanks.
In Pandas, there are a set of Custom business days functions where you can define your own list of holidays and then the functions calculate the correct date offsets for you, taking into account the custom holiday list.
For example, we have CustomBusinessMonthEnd. Unfortunately, there is no corresponding CustomBusinessQuarterEnd function for quarter ends.
However, we can still get some workaround solution, like below:
Define your custom holiday list, e.g. :
holiday_list = ['2020-12-31']
Make use of a combination of QuarterEnd + CustomBusinessMonthEnd to get the required date for Custom Business QuarterEnd skipping the holidays:
import pandas as pd
base_date = pd.to_datetime('2020-11-20') # Base date
custom_business_quarter_end = (base_date
+ pd.offsets.QuarterEnd(n=0)
- pd.offsets.MonthBegin()
+ pd.offsets.CustomBusinessMonthEnd(holidays=holiday_list))
Firstly, we add your base date to the QuarterEnd to get the quarter end date (without considering holidays). Then, to get the Custom Business QuarterEnd skipping the holidays, we use the CustomBusinessMonthEnd passing also the holiday list as parameter for it to adjust for the holidays.
For QuarterEnd, we pass the parameter n=0 to handle the edge case where the base date is already on the quarter-end date. This prevents QuarterEnd from rolling that date over to the next quarter end. The official pandas documentation on date offsets explains how dates falling on anchor dates are handled (see the subsection starting with "For the case when n=0, ...").
We also apply MonthBegin before calling CustomBusinessMonthEnd. This avoids rolling a date that sits on the month-end anchor over to the next month. It is needed because n=0 does not prevent the rollover for CustomBusinessMonthEnd the way it does for QuarterEnd. Subtracting MonthBegin first gives the first day of the quarter-end month, i.e. 2020-12-01, and from there we get the custom business month-end date. This way the result of QuarterEnd, e.g. 2020-12-31, is not rolled over to the next month end, e.g. 2021-01-31, as it would be when calling CustomBusinessMonthEnd on it directly.
Result:
print(custom_business_quarter_end)
2020-12-30 00:00:00
You probably need a custom function. Maybe something like:
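def custom_quarter_end(date, holidays=()):
    # Normalize holidays to Timestamps so the membership test compares like with like
    holidays = [pd.Timestamp(h) for h in holidays]
    end = pd.Timestamp(date) + pd.tseries.offsets.BQuarterEnd()
    # Step back one business day at a time while the candidate is a holiday
    while end in holidays:
        end = end - pd.tseries.offsets.BDay()
    return end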
>>> custom_quarter_end("2020-11-20", ["2020-12-30", "2020-12-31"])
Timestamp('2020-12-29 00:00:00')

Python - Check every other week using if statement

I have the following if statement:
if date.weekday() != 0:
How can I change this if statement to check every other Monday (one day every other week)?
Thanks in advance for your help!
Your post doesn't mention what language you're using. That would be useful to know!
At any rate, a function like weekday probably returns a number from 0-6, which isn't going to give you enough info.
Most languages have a function that turns a date into a millisecond value, typically milliseconds since 1970-01-01 00:00 UTC.
If you take such a value and divide it by 86400000 (number of milliseconds in a day), you get a day number. Day 0 (January 1, 1970) happens to be a Thursday.
In JavaScript, this day number can be obtained from a Date object like this:
let dayNum = Math.floor(someDate.getTime() / 86400000)
From this, dayNum % 7 gives you a number from 0-6 for Thursday through Wednesday. You can add an offset to change what 0 means. (dayNum + 4) % 7 produces 0-6 for Sunday through Saturday.
You want a 14-day cycle for every other week, so dayNum % 14 is what you need, and you then just have to decide which value from 0 to 13 represents your target dates.
Take any sample date that is one of your desired dates, compute the remainder value you get for that date, and then test for that value to be able to match any qualifying date.
Two caveats:
For dates before 1970, the % operator might produce negative numbers. Python is one of the few languages that doesn't do this.
The timezone offset from UTC might be important to your task. And if that's important, Daylight Saving Time changes and/or historical changes in UTC offset can complicate matters.
Python update:
Let's say this coming Monday (July 5, 2021) is one of the every-other-week days that you want. This:
date(2021, 7, 5).toordinal() % 14
...produces a value of 8. Every other Monday before or after will also be 8. The toordinal() method goes straight to a day number, without having to fuss with milliseconds, 86400000, or having to worry about timezone offsets.
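Put together as a small check, this might look like the sketch below. The names `REFERENCE_MONDAY`, `TARGET_REMAINDER` and `is_target_monday` are made up for illustration; the reference date is the sample Monday used above.

```python
from datetime import date

# Any date known to be one of the desired every-other-week Mondays
REFERENCE_MONDAY = date(2021, 7, 5)
TARGET_REMAINDER = REFERENCE_MONDAY.toordinal() % 14


def is_target_monday(d):
    """True if d falls on the same day of the 14-day cycle as the reference."""
    return d.toordinal() % 14 == TARGET_REMAINDER


print(is_target_monday(date(2021, 7, 12)))  # the Monday in between
print(is_target_monday(date(2021, 7, 19)))  # two weeks after the reference
```

Because the remainder identifies a single day in the 14-day cycle, a separate `weekday()` test is not needed: only Mondays that are an even number of weeks from the reference can match.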

Date range calculating only 8737 hours in a year?

I am using the pandas date_range function to create a list of hourly timestamps for a calendar year. The code I use to do this looks like:
year = 2018
times = list(pd.date_range('{}-01-01'.format(year), '{}-12-31'.format(year), freq='H'))
I expect the length of times to be 8760 (the number of hours in a year). But when I view the length of the times vector, it is only 8737. Why????
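pd.date_range includes both endpoints by default, but a date string without a time is read as midnight at the start of that day. So your range runs from {}-01-01 00:00 through {}-12-31 00:00: you get the single midnight value of December 31 and none of its remaining 23 hours, which gives 364 × 24 + 1 = 8737 timestamps.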
So, you need to include the last day of the year, but omit the "celebratory" New Year Hour:
>>> year = 2018
>>> times = list(pd.date_range('{}-01-01'.format(year), '{}-01-01'.format(year+1), freq='H'))
>>> times = times[:-1]
>>> len(times)
8760
You need to include the New Year's Day, {}-01-01, so that you get New Year's Eve, {}-12-31. But then you get the midnight hour since that's what starts the day. Hence the need to eliminate the last entry in the list: times = times[:-1], so that you're ending at 11:00pm on 12-31.
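The arithmetic can be checked with plain datetime objects, independent of pandas. This is only a sketch to confirm the counts; `hourly_stamps` is a made-up helper that counts hourly timestamps with both endpoints included.

```python
from datetime import datetime, timedelta


def hourly_stamps(start, end):
    """Number of hourly timestamps from start to end, both endpoints included."""
    return (end - start) // timedelta(hours=1) + 1


year = 2018
# Ending at midnight on Dec 31 misses the last 23 hours of the year
to_dec_31 = hourly_stamps(datetime(year, 1, 1), datetime(year, 12, 31))
# Ending at midnight on Jan 1 of the next year includes one hour too many
to_jan_1 = hourly_stamps(datetime(year, 1, 1), datetime(year + 1, 1, 1))

print(to_dec_31)     # 8737
print(to_jan_1 - 1)  # 8760
```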

Why does W-DAY behave confusingly in Pandas?

The behaviour of freq = "W-SUN" etc. seems confusing and inconsistent. For example, pd.date_range(pd.Timestamp('2019-07-09'), pd.Timestamp('2019-11-11'), freq='W-SUN') produces a sequence of Sundays, but pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SUN').to_timestamp() produces a Monday. What is going on here?
This has come up because I have an index of dates that I want to round to some frequency, while also generating a date_range with the same frequency and phase. It seems like index.to_period(freq).to_timestamp() and pd.date_range(start, end, freq=freq) should work for this, but it doesn't when freq is "W-DAY".
This is a little counter-intuitive, but here's the logic. When you use .to_period(), Pandas calculates the period of time within which the date you supplied falls. It does this by finding the next day that matches your specified frequency and extending the period backwards to include your chosen day. In other words, the period is end-inclusive, not start-inclusive.
To find the Sunday-anchored week for a given Tuesday, it finds the next Sunday after that Tuesday and adds the previous six days. When you convert to timestamp, however, it selects the first day of that period, which in this case will be a Monday. If you asked for the Sunday-anchored period of a Sunday, it would give you that day plus the previous six days, not the following six days.
If you want your period to start rather than end on a particular day of the week, just set the frequency string to the day prior. In your case, pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SAT').to_timestamp() should do the trick.
Some hopefully helpful demonstrations:
pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SUN') gives:
PeriodIndex(['2019-07-08/2019-07-14'], dtype='period[W-SUN]', freq='W-SUN')
Note that this period ends on a Sunday. When you run pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SUN').to_timestamp() it gives you the first day of this period:
DatetimeIndex(['2019-07-08'], dtype='datetime64[ns]', freq=None)
You can observe how the days are chosen by running:
for f in ['W-SUN', 'W-MON', 'W-TUE', 'W-WED', 'W-THU', 'W-FRI', 'W-SAT']:
    print(f, pd.Index([pd.Timestamp('2019-07-09')]).to_period(f))
Which gives:
PeriodIndex(['2019-07-08/2019-07-14'], dtype='period[W-SUN]', freq='W-SUN')
PeriodIndex(['2019-07-09/2019-07-15'], dtype='period[W-MON]', freq='W-MON')
PeriodIndex(['2019-07-03/2019-07-09'], dtype='period[W-TUE]', freq='W-TUE')
PeriodIndex(['2019-07-04/2019-07-10'], dtype='period[W-WED]', freq='W-WED')
PeriodIndex(['2019-07-05/2019-07-11'], dtype='period[W-THU]', freq='W-THU')
PeriodIndex(['2019-07-06/2019-07-12'], dtype='period[W-FRI]', freq='W-FRI')
PeriodIndex(['2019-07-07/2019-07-13'], dtype='period[W-SAT]', freq='W-SAT')
Note that the start of the chosen period jumps in the middle, but the logic remains consistent.
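To make the end-inclusive rule concrete, here is a plain-datetime imitation of it. This is not how Pandas implements periods internally; `ANCHORS` and `anchored_week` are made-up names, and the sketch only mirrors the rule described above: find the next anchor day on or after the date, then go back six days.

```python
from datetime import date, timedelta

# Monday = 0 ... Sunday = 6, matching date.weekday()
ANCHORS = {'MON': 0, 'TUE': 1, 'WED': 2, 'THU': 3, 'FRI': 4, 'SAT': 5, 'SUN': 6}


def anchored_week(day, anchor):
    """Return (start, end) of the 7-day period containing `day` that ends
    on the given anchor weekday."""
    days_until_end = (ANCHORS[anchor] - day.weekday()) % 7
    end = day + timedelta(days=days_until_end)
    start = end - timedelta(days=6)
    return start, end


for anchor in ['SUN', 'MON', 'TUE', 'SAT']:
    start, end = anchored_week(date(2019, 7, 9), anchor)
    print(f'W-{anchor}', start, end)
```

For the Tuesday 2019-07-09, this gives the same start and end dates as the W-SUN, W-MON, W-TUE and W-SAT periods listed above.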
