I am using the pandas date_range function to create a list of hourly timestamps for a calendar year. My code looks like this:
import pandas as pd
year = 2018
times = list(pd.date_range('{}-01-01'.format(year), '{}-12-31'.format(year), freq='H'))
I expect the length of times to be 8760 (the number of hours in a non-leap year), but when I check the length of the times list, it is only 8737. Why?
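By default, pd.date_range includes both endpoints, so the end boundary is not being dropped. The problem is that '{}-12-31'.format(year) is parsed as midnight at the start of December 31. The range therefore stops at 2018-12-31 00:00 and misses the last 23 hours of the year: 364 * 24 + 1 = 8737 timestamps.
So the range has to run through the whole last day of the year while leaving out the "celebratory" New Year hour. One way is to end at New Year's Day of the following year and drop the final timestamp: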
>>> year = 2018
>>> times = list(pd.date_range('{}-01-01'.format(year), '{}-01-01'.format(year+1), freq='H'))
>>> times = times[:-1]
>>> len(times)
8760
Ending at New Year's Day of the following year, {}-01-01, gives you all of New Year's Eve, {}-12-31. However, it also includes midnight on January 1, because that timestamp starts the new day. That is why the last entry is removed with times = times[:-1], so the list ends at 11:00 pm on 12-31.
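Another way to get the same 8760 timestamps, with nothing to drop afterwards, is to pass the final hour explicitly as the end point. Because both endpoints are included, ending at 23:00 on December 31 covers the full year. The helper name hourly_timestamps is hypothetical. The sketch uses the lowercase "h" alias because recent pandas versions deprecate "H".

```python
import pandas as pd

def hourly_timestamps(year: int) -> pd.DatetimeIndex:
    # Both endpoints are included, so stop at 23:00 on Dec 31 instead of midnight
    return pd.date_range(f"{year}-01-01 00:00", f"{year}-12-31 23:00", freq="h")

times = list(hourly_timestamps(2018))
print(len(times))  # 8760
```

This also handles leap years without special-casing. For example, hourly_timestamps(2020) has 8784 entries.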
I found this solution (see the linked article), but it is still not exactly what I was looking for: it calculates all the days between two dates, including weekends. Is there a solution that excludes weekends from the calculation?
So, what I did is take that solution and expand it this way:
import datetime
crtD = datetime.datetime.strptime(df.loc[x, 'createDate'], '%m/%d/%Y')  # start date; df is the DataFrame, x the row label from the loop
tdyD = datetime.datetime.today()  # end date
dayx = tdyD - crtD  # timedelta between start and end date, weekends included
wkds = dayx.days + 1  # keeps whole days only (drops the time part) and adds 1 day
wkns = round(wkds / 7)  # divides the number of days by seven and rounds to the nearest integer
networkdays = int(wkds - wkns) - 1  # approximate: subtracts one day per week, then 1 more
print(networkdays)
I embedded these lines of code in a for loop. Hope this helps. If you have a solution that also excludes holidays, please post it here.
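The arithmetic above is only an approximation. It subtracts roughly one day per week, but a week has two weekend days, so the result drifts as the range grows. NumPy's busday_count counts Monday to Friday days exactly and accepts a holidays list, which also covers the holiday request. It works on whole arrays, so the for loop can be removed. The sketch below assumes the start dates are '%m/%d/%Y' strings, as in the question. The helper name networkdays and the holiday date in the usage line are illustrative.

```python
import numpy as np
import pandas as pd

def networkdays(start_dates, end_date, holidays=()):
    # Hypothetical helper: weekdays from each start date through end_date, inclusive
    starts = pd.to_datetime(start_dates, format="%m/%d/%Y").values.astype("datetime64[D]")
    # busday_count counts the half-open range [begin, end), so add one day to the end
    end = np.datetime64(pd.Timestamp(end_date).date()) + np.timedelta64(1, "D")
    return np.busday_count(starts, end, holidays=list(holidays))

# Mon 1 July 2019 through Fri 12 July 2019, with an illustrative holiday on 4 July
print(networkdays(["07/01/2019"], "2019-07-12", holidays=["2019-07-04"]))  # [9]
```

With a DataFrame, networkdays(df['createDate'], datetime.datetime.today()) would replace the per-row loop.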
I have time series data for solar radiation with 15-minute time steps covering one month (1 June to 30 June). My aim is to simulate a single day from all 30 days by averaging the values at each time instant. For example, I start with 30 different values at each of 11:00, 11:15, 11:30, 11:45 and so on. I want to average those 30 values so that I have a single value at 11:00, a single value at 11:15, and so on.
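You can extract the time of day, rounded down to its 15-minute slot, into a separate column and group by it:
data['Slot'] = data['Date'].dt.floor('15min').dt.time
data.groupby('Slot').mean()
where Date is your date column in datetime format. Grouping on the minute alone (0, 15, 30, 45) would average 11:15 together with 12:15 and every other quarter past the hour, so the hour must be part of the grouping key.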
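A self-contained version with synthetic data shows the grouping producing one value per 15-minute slot. The data is three days of hypothetical readings, not real radiation values. The timestamps are the index, and the value column is named radiation. Both of those choices are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

def average_day(df: pd.DataFrame, value_col: str) -> pd.Series:
    # Group rows by clock time (e.g. 11:15) regardless of the date, then average
    return df.groupby(df.index.time)[value_col].mean()

# Synthetic data: 3 days x 96 quarter-hours; day 1 reads 1.0, day 2 reads 2.0, day 3 reads 3.0
idx = pd.date_range("2019-06-01", periods=3 * 96, freq="15min")
df = pd.DataFrame({"radiation": np.repeat([1.0, 2.0, 3.0], 96)}, index=idx)

profile = average_day(df, "radiation")
print(len(profile))  # 96 slots, each the mean of the three days
```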
The behaviour of freq="W-SUN" and similar aliases seems confusing and inconsistent. For example, pd.date_range(pd.Timestamp('2019-07-09'), pd.Timestamp('2019-11-11'), freq='W-SUN') produces a sequence of Sundays, but pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SUN').to_timestamp() produces a Monday. What is going on here?
This has come up because I have an index of dates that I want to round to some frequency, while also generating a date_range with the same frequency and phase. It seems like index.to_period(freq).to_timestamp() and pd.date_range(start, end, freq=freq) should work for this, but it doesn't when freq is "W-DAY".
This is a little counter-intuitive, but here's the logic. When you use .to_period(), pandas calculates the period of time within which the date you supplied falls. It does this by finding the next day, on or after your date, that matches the specified frequency, and then extending the period backwards from there to include your chosen day. In other words, a weekly period is anchored on its end day, not its start day.
To find the Sunday-anchored week for a given Tuesday, it finds the next Sunday after that Tuesday and adds the previous six days. When you convert to timestamp, however, it selects the first day of that period, which in this case will be a Monday. If you asked for the Sunday-anchored period of a Sunday, it would give you that day plus the previous six days, not the following six days.
If you want your period to start rather than end on a particular day of the week, just set the frequency string to the day prior. In your case, pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SAT').to_timestamp() should do the trick.
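The "day prior" rule can be wrapped in a small helper that takes the desired start day and works out the matching W-XXX alias. This is a sketch: week_start and DAYS are hypothetical names, and the day codes are the ones pandas uses in its weekly aliases.

```python
import pandas as pd

# Weekday codes used in pandas "W-XXX" aliases, Monday first
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

def week_start(index: pd.DatetimeIndex, start_day: str) -> pd.DatetimeIndex:
    # A "W-XXX" period ends on XXX, so a week that starts on start_day
    # must be anchored on the day before it
    end_day = DAYS[(DAYS.index(start_day) - 1) % 7]
    return index.to_period(f"W-{end_day}").to_timestamp()

idx = pd.DatetimeIndex(["2019-07-09"])
print(week_start(idx, "SUN"))  # uses W-SAT, giving Sunday 2019-07-07
```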
Some hopefully helpful demonstrations:
pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SUN') gives:
PeriodIndex(['2019-07-08/2019-07-14'], dtype='period[W-SUN]', freq='W-SUN')
Note that this period ends on a Sunday. When you run pd.Index([pd.Timestamp('2019-07-09')]).to_period('W-SUN').to_timestamp() it gives you the first day of this period:
DatetimeIndex(['2019-07-08'], dtype='datetime64[ns]', freq=None)
You can observe how the days are chosen by running:
for f in ['W-SUN', 'W-MON', 'W-TUE', 'W-WED', 'W-THU', 'W-FRI', 'W-SAT']:
    print(pd.Index([pd.Timestamp('2019-07-09')]).to_period(f))
Which gives:
PeriodIndex(['2019-07-08/2019-07-14'], dtype='period[W-SUN]', freq='W-SUN')
PeriodIndex(['2019-07-09/2019-07-15'], dtype='period[W-MON]', freq='W-MON')
PeriodIndex(['2019-07-03/2019-07-09'], dtype='period[W-TUE]', freq='W-TUE')
PeriodIndex(['2019-07-04/2019-07-10'], dtype='period[W-WED]', freq='W-WED')
PeriodIndex(['2019-07-05/2019-07-11'], dtype='period[W-THU]', freq='W-THU')
PeriodIndex(['2019-07-06/2019-07-12'], dtype='period[W-FRI]', freq='W-FRI')
PeriodIndex(['2019-07-07/2019-07-13'], dtype='period[W-SAT]', freq='W-SAT')
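Note that the start of the chosen period jumps back a week between W-MON and W-TUE because 2019-07-09 is itself a Tuesday. The logic stays consistent: each period ends on the first matching day on or after the given date.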
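For the original goal, rounding an index so the labels coincide with pd.date_range(..., freq='W-SUN') output, one option is to take each period's end_time rather than calling to_timestamp(). date_range with W-SUN yields the Sundays, which are the period ends. The helper name round_to_anchor is hypothetical.

```python
import pandas as pd

def round_to_anchor(index: pd.DatetimeIndex, freq: str = "W-SUN") -> pd.DatetimeIndex:
    # end_time of a W-SUN period is the closing Sunday at 23:59:59.999...;
    # normalize() resets it to midnight so the labels match date_range output
    return index.to_period(freq).end_time.normalize()

idx = pd.DatetimeIndex(["2019-07-09", "2019-07-14", "2019-07-15"])
labels = round_to_anchor(idx)
rng = pd.date_range("2019-07-09", "2019-07-31", freq="W-SUN")
print(labels.isin(rng).all())  # True
```

With this approach, a Sunday maps to itself, and any other day maps forward to the Sunday that closes its week.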
I have 3 giant lists (30,000,000+ elements long) named year, month, and day that correspond to the year, month, and day of the month, respectively. The three lists together represent a long list of dates. I need to get the day of the year for each date.
My first thought was to convert the 3 lists into a list of datetime objects, and then use the method outlined in this thread:
day_of_year = datetime.now().timetuple().tm_yday
However, this appears to take far too long:
import datetime
def f():
    DOY = [datetime.datetime(y, m, d).timetuple().tm_yday for y, m, d in zip(year, month, day)]
%timeit f()
1 loop, best of 3: 54.2 s per loop
I also tried the method outlined in the second answer here to quickly get an array of datetime64 values, but I'm unsure how to then get the day of the year from it.
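One vectorized option assumes year, month and day are integer sequences of equal length. It builds datetime64[D] values with NumPy arithmetic instead of creating Python datetime objects, then subtracts January 1 of each date's year. The function name day_of_year is hypothetical. No timings are claimed here; the point is only that it avoids the per-element Python loop.

```python
import numpy as np

def day_of_year(year, month, day):
    # Integers become offsets from the datetime64 epoch (1970, January, day 1)
    y = np.asarray(year, dtype=np.int64) - 1970
    m = np.asarray(month, dtype=np.int64) - 1
    d = np.asarray(day, dtype=np.int64) - 1
    # First of each month at month resolution, then add the day offset at day resolution
    months = y.astype("datetime64[Y]").astype("datetime64[M]") + m.astype("timedelta64[M]")
    dates = months.astype("datetime64[D]") + d.astype("timedelta64[D]")
    # Days since Jan 1 of the same year; +1 makes Jan 1 day 1, like tm_yday
    jan1 = dates.astype("datetime64[Y]").astype("datetime64[D]")
    return (dates - jan1).astype(np.int64) + 1

print(day_of_year([2018, 2020, 2020], [1, 3, 12], [1, 1, 31]))  # [  1  61 366]
```

pandas offers a similar route: call pd.to_datetime on a DataFrame with year, month and day columns, then use .dt.dayofyear.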
So I have a list of events that are sort of like alarms. They're defined by their start and end time (in hours and minutes), a range of days (e.g. 1-3, which is Sunday through Wednesday), and a range of months (e.g. 1-3, January through March). The format of that data is largely unchangeable. I don't necessarily need to sort the list, but I do need to find the next upcoming event based on the current time. There are so many different ways to do this and so many corner cases. This is my pseudocode:
now = time()
diff = []
# Start difference between now and start times
for s in schedule:  # assuming appending to diff
    diff.minutes = s.minutes - now.minutes
    diff.hours = s.hours - now.hours
    diff.days = s.days - now.days
    diff.months = s.months - now.months
for d in diff:
    if d < 0:
        d = period + d
        # period is the maximum period of the attribute, i.e. 60 for minutes, 24 for hours
# repeat for event end times
So now I have a list of tuples of differences in minutes, hours, days, and months. These tuples already account for whether the current time is past the start time but before the end time. For example, if it's August, the event's start month is July, and its end month is September, then diff.months == 0.
Now this specific corner case is giving me trouble:
Let's say a schedule runs from 00:00 to 23:59 on Thursdays in August, and it's Friday the 27th. Running my algorithm, the difference in months would be 0, but in reality the event won't run again until next August, so it should be 12. And I'm stuck. I think the month is the only problem, because it is the only attribute whose next occurrence depends on the specific date within the month (rather than just the day of the week). Is my algorithm OK, so that I can just handle this special case separately? Or is there something better out there for this?
This is the data I'm working with
map['start_time']=''
map['end_time']=''
map['start_moy']=''
map['end_moy']=''
map['start_dow']=''
map['end_dow']=''
The schedule's getAllSchedules method just returns a list of all the schedules. I can change the schedule class, but I'm not sure what difference that would make. I can't add to or change the format of the schedules I'm given.
Convert the items from the schedule into datetime objects. Then you can simply sort them:
from datetime import datetime
events = sorted(datetime(s.year, s.month, s.day, s.hour, s.minute) for s in schedule)
Since your resolution is in minutes, and assuming that you don't have many events, I'd simply scan all the events every minute.
Filter your events into a new list that contains only the events whose range matches the current month and day.
Then for each of those events declare that they are active or inactive according to whether the current time matches the event's range.
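The filter-then-check idea can be sketched as follows, extended with a day-by-day scan to find the next start. Every detail here is an assumption layered on the question. Schedule is a hypothetical dataclass that mirrors the map fields. Days of the week use the question's apparent numbering of 1 = Sunday. Ranges are inclusive and may wrap around the end of the year. Events are assumed not to cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional

@dataclass
class Schedule:
    # Hypothetical stand-in for one entry of the map shown in the question
    start_time: time
    end_time: time   # assumed >= start_time (no events crossing midnight)
    start_moy: int   # month of year, 1 = January
    end_moy: int
    start_dow: int   # day of week, assumed 1 = Sunday ... 7 = Saturday
    end_dow: int

def in_range(value: int, lo: int, hi: int) -> bool:
    # Inclusive range; lo > hi means it wraps, e.g. months 11-2 = Nov..Feb
    return lo <= value <= hi if lo <= hi else (value >= lo or value <= hi)

def is_active(s: Schedule, now: datetime) -> bool:
    dow = now.isoweekday() % 7 + 1  # isoweekday(): Mon=1..Sun=7 -> Sun=1..Sat=7
    return (in_range(now.month, s.start_moy, s.end_moy)
            and in_range(dow, s.start_dow, s.end_dow)
            and s.start_time <= now.time() <= s.end_time)

def next_start(s: Schedule, now: datetime, horizon_days: int = 400) -> Optional[datetime]:
    # Try each day's start time from today forward until one falls inside the schedule
    for offset in range(horizon_days):
        start = datetime.combine((now + timedelta(days=offset)).date(), s.start_time)
        if start >= now and is_active(s, start):
            return start
    return None

# Thursdays in August, all day; "now" is Friday 27 August 2021
thursdays_in_august = Schedule(time(0, 0), time(23, 59), 8, 8, 5, 5)
print(next_start(thursdays_in_august, datetime(2021, 8, 27, 12, 0)))  # 2022-08-04 00:00:00
```

Across the whole list, the next upcoming event is the smallest non-None next_start value. Scanning day by day keeps the month edge case from the question out of the arithmetic entirely, at the cost of up to a few hundred iterations per event.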
The primary issue seems to be with the fact that you're using the day of the week, instead of explicit days of the month.
While your cited edge case is one example, does this issue not crop up with all events scheduled in any month outside of the current one?
I think the most robust approach here would be to do the work of getting your scheduled events into datetime format, then use @gnibbler's suggestion of sorting the datetime objects.
Once you have determined that the last event for the current month has already passed, calculate the distance to the next month the event occurs in (be it + 1 year, or just + 1 month), then construct a datetime object with that information:
first_of_month = datetime.date(calculated_year, calculated_month, 1)
By using the first day of the month, you can then use:
day_of_week = first_of_month.strftime('%w')
This tells you which day of the week the first of that month falls on (as a string from '0' to '6', where '0' is Sunday). You can then use it to calculate how many days to add to reach the first, second, third, etc. instance of a given weekday in that month. Once you have that day, you can construct a valid datetime object and do whatever comparisons you wish with now().
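Instead of parsing the string from strftime('%w'), the same calculation can use date.weekday(), which returns an integer with Monday = 0. In this sketch, nth_weekday and its parameters are hypothetical names.

```python
import datetime

def nth_weekday(year: int, month: int, weekday: int, n: int) -> datetime.date:
    # weekday follows date.weekday(): Monday = 0 ... Sunday = 6; n = 1 is the first occurrence
    first_of_month = datetime.date(year, month, 1)
    # Days from the 1st to the first occurrence of the requested weekday
    offset = (weekday - first_of_month.weekday()) % 7
    day = first_of_month + datetime.timedelta(days=offset + 7 * (n - 1))
    if day.month != month:
        raise ValueError("this month has fewer than n occurrences of that weekday")
    return day

print(nth_weekday(2022, 8, 3, 1))  # first Thursday of August 2022: 2022-08-04
```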
I couldn't figure out how to do it using only datetime objects, but I found a module, python-dateutil, and used it. It's perfect:
http://labix.org/python-dateutil
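The answer doesn't say which part of python-dateutil was used, but its rrule recurrence rules are one natural fit. A rule restricted by month and weekday can be asked directly for the next occurrence after the current time. In this sketch, next_occurrence is a hypothetical helper, and the example is the Thursdays-in-August case from the question.

```python
from datetime import datetime
from dateutil.rrule import rrule, DAILY, TH

def next_occurrence(now, months, weekdays, hour, minute):
    # One candidate per matching day at hour:minute; rrule never yields times before dtstart
    rule = rrule(DAILY, dtstart=now, bymonth=months, byweekday=weekdays,
                 byhour=hour, byminute=minute, bysecond=0)
    return rule.after(now, inc=True)

# Thursdays in August at 00:00, asked on Friday 27 August 2021
print(next_occurrence(datetime(2021, 8, 27, 12, 0), (8,), (TH,), 0, 0))  # 2022-08-04 00:00:00
```

The rule handles the month rollover that caused trouble in the question, so no per-field difference arithmetic is needed.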