The working code below calculates month-start and month-end date ranges, but it depends on the Pandas library, which I want to get rid of.
import pandas as pd
from pandas.tseries.offsets import MonthEnd
dates=pd.date_range("2019-11","2020-02",freq='MS').strftime("%Y%m%d").tolist()
#print dates : ['20191101','20191201','20200101','20200201']
df=(pd.to_datetime(dates,format="%Y%m%d") + MonthEnd(1)).strftime("%Y%m%d").tolist()
#print df : ['20191130','20191231','20200131','20200229']
How can I rewrite this code without using Pandas?
I don't want to use the Pandas library because I am triggering my job through Oozie and we don't have Pandas installed on all our nodes.
Pandas offers some nice datetime functionality that the standard library's datetime module does not have (such as frequencies or MonthEnd). You have to reproduce these yourself.
import datetime as DT
def next_first_of_the_month(dt):
    """return a new datetime where the month has been increased by 1 and
    the day is always the first
    """
    new_month = dt.month + 1
    if new_month == 13:
        new_year = dt.year + 1
        new_month = 1
    else:
        new_year = dt.year
    return DT.datetime(new_year, new_month, day=1)
start, stop = [DT.datetime.strptime(dd, "%Y-%m") for dd in ("2019-11", "2020-02")]
dates = [start]
cd = next_first_of_the_month(start)
while cd <= stop:
    dates.append(cd)
    cd = next_first_of_the_month(cd)
str_dates = [d.strftime("%Y%m%d") for d in dates]
print(str_dates)
# prints: ['20191101', '20191201', '20200101', '20200201']
end_dates = [next_first_of_the_month(d) - DT.timedelta(days=1) for d in dates]
str_end_dates = [d.strftime("%Y%m%d") for d in end_dates]
print(str_end_dates)
# prints ['20191130', '20191231', '20200131', '20200229']
Here I used a function that returns a datetime for the first day of the month following the input datetime. Sadly, timedelta does not work with months, and adding 30 days is of course not feasible (not all months have 30 days).
Then a while loop builds the sequence of first days of the month up to the stop date.
And to get the end of each month, again take the next first day of the month for each datetime in your list and subtract a day.
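As a possible alternative to the "next first minus one day" step, the standard library's calendar.monthrange reports the number of days in a month, which is also the day number of the month's last day. The helper name month_end below is made up for illustration; this is only a sketch, not part of the answer above.

```python
import calendar
from datetime import date

def month_end(year, month):
    """Return the last day of the given month (hypothetical helper name)."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)

months = [(2019, 11), (2019, 12), (2020, 1), (2020, 2)]
ends = [month_end(y, m).strftime("%Y%m%d") for y, m in months]
print(ends)
# prints ['20191130', '20191231', '20200131', '20200229']
```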
Related
start = "Nov20"
end = "Jan21"
# Expected output:
["Nov20", "Dec20", "Jan21"]
What I've tried so far is the following but am looking for more elegant way.
from calendar import month_abbr
from time import strptime
def get_range(a, b):
    start = strptime(a[:3], '%b').tm_mon
    end = strptime(b[:3], '%b').tm_mon
    dates = []
    for m in month_abbr[start:]:
        dates.append(m+a[-2:])
    for mm in month_abbr[1:end + 1]:
        dates.append(mm+b[-2:])
    print(dates)
get_range('Nov20', 'Jan21')
Note: I don't want to use pandas, as it doesn't make sense to import such a library just to generate dates.
The date range may span different years so one way is to loop from the start date to end date and increment the month by 1 until end date is reached.
Try this:
from datetime import datetime
def get_range(a, b):
    start = datetime.strptime(a, '%b%y')
    end = datetime.strptime(b, '%b%y')
    dates = []
    while start <= end:
        dates.append(start.strftime('%b%y'))
        if start.month == 12:
            start = start.replace(month=1, year=start.year+1)
        else:
            start = start.replace(month=start.month+1)
    return dates
dates = get_range("Nov20", "Jan21")
print(dates)
Output:
['Nov20', 'Dec20', 'Jan21']
You can use timedelta to step one month (31 days) forward, but make sure you stay on the 1st of the month, otherwise the days might add up and eventually skip a month.
from datetime import datetime
from datetime import timedelta
def get_range(a, b):
    start = datetime.strptime(a, '%b%y')
    end = datetime.strptime(b, '%b%y')
    dates = []
    while start <= end:
        dates.append(start.strftime('%b%y'))
        start = (start + timedelta(days=31)).replace(day=1) # go to 1st of next month
    return dates
dates = get_range("Jan20", "Jan21")
print(dates)
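Another way to step through months, sketched here under the assumption that only the year and month of each endpoint matter, is to number months as a single integer (year * 12 + month - 1) and recover the year and month with divmod. The generator name month_starts is hypothetical.

```python
from datetime import date

def month_starts(start, end):
    """Yield the first day of each month from start's month through end's month."""
    first = start.year * 12 + (start.month - 1)
    last = end.year * 12 + (end.month - 1)
    for index in range(first, last + 1):
        # divmod splits the running month count back into year and 0-based month
        year, month0 = divmod(index, 12)
        yield date(year, month0 + 1, 1)

starts = list(month_starts(date(2020, 11, 1), date(2021, 1, 1)))
# the abbreviated month names produced by %b depend on the current locale
labels = [d.strftime("%b%y") for d in starts]
print(labels)
```

Because the loop runs over a plain integer range, there is no year rollover branch to maintain.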
I've written this function to get the last Thursday of the month
def last_thurs_date(date):
    month=date.dt.month
    year=date.dt.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year)+'-0'+ str(month)+'-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
But it's not working with the lambda function.
datelist['Date'].map(lambda x: last_thurs_date(x))
Where datelist is
datelist = pd.DataFrame(pd.date_range(start = pd.to_datetime('01-01-2014',format='%d-%m-%Y')
, end = pd.to_datetime('06-03-2019',format='%d-%m-%Y'),freq='D').tolist()).rename(columns={0:'Date'})
datelist['Date']=pd.to_datetime(datelist['Date'])
Jpp already added the solution, but just to add a slightly more readable formatted string - see this awesome website.
import calendar
def last_thurs_date(date):
    year, month = date.year, date.month
    cal = calendar.monthcalendar(year, month)
    # each row is a week starting on Monday (the default), so column
    # calendar.THURSDAY (3) holds the Thursdays; days outside the month are 0,
    # so max() picks the last Thursday however many rows the month has
    last_thurs_date = max(week[calendar.THURSDAY] for week in cal)
    return f'{year}-{month:02d}-{last_thurs_date}'
I also added a bit of logic: your version returned 2019-02-0 because the fifth calendar row for February 2019 holds 0 in the column it reads.
Scalar datetime objects don't have a dt accessor, series do: see pd.Series.dt. If you remove this, your function works fine. The key is understanding that pd.Series.apply passes scalars to your custom function via a loop, not an entire series.
def last_thurs_date(date):
    month = date.month
    year = date.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year)+'-0'+ str(month)+'-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
You can rewrite your logic more succinctly via f-strings (Python 3.6+) and a ternary statement:
def last_thurs_date(date):
    month = date.month
    year = date.year
    last_thurs_date = calendar.monthcalendar(year, month)[4][4]
    return f'{year}{"-0" if month < 10 else "-"}{month}-{last_thurs_date}'
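One caveat with indexing calendar.monthcalendar at a fixed position: with the default Monday-first week, column 3 is Thursday and column 4 is Friday, and a month can span four, five or six rows, so a hard-coded [4][4] can return the wrong weekday, a 0, or raise IndexError. A sketch that avoids the grid entirely walks back from the last day of the month; the function name last_weekday_of_month is made up for illustration.

```python
import calendar
from datetime import date, timedelta

def last_weekday_of_month(year, month, weekday=calendar.THURSDAY):
    """Return the last date in the month that falls on `weekday` (Monday=0)."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # number of days to step back from month end to reach the wanted weekday
    offset = (last_day.weekday() - weekday) % 7
    return last_day - timedelta(days=offset)

print(last_weekday_of_month(2019, 2).isoformat())
# prints 2019-02-28
```

Passing a pandas Timestamp's year and month works the same way, since only two integers are needed.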
I know that a lot of time has passed since the date of this post, but I think it would be worth adding another option if someone came across this thread
Even though I use pandas every day at work, in this case my suggestion would be to just use the dateutil library. The solution is a simple one-liner, without unnecessary combinations.
from dateutil.rrule import rrule, MONTHLY, FR, SA
from datetime import datetime as dt
import pandas as pd
# monthly options expiration dates calculated for 2022
monthly_options = list(rrule(MONTHLY, count=12, byweekday=FR, bysetpos=3, dtstart=dt(2022,1,1)))
# last Saturday of the month
last_saturdays = list(rrule(MONTHLY, count=12, byweekday=SA, bysetpos=-1, dtstart=dt(2022,1,1)))
and then of course:
pd.DataFrame({'LAST_ST':last_saturdays}) #or whatever you need
This question is answered by Calculate Last Friday of Month in Pandas.
That answer can be adapted by selecting the appropriate day of the week; here freq='W-FRI' is used.
I think the easiest way is to create a pandas.DataFrame using pandas.date_range and specifying freq='W-FRI'.
W-FRI is Weekly Fridays
pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')
Creates all the Fridays in the date range between the min and max of the dates in df
Use a .groupby on year and month, and select .last(), to get the last Friday of every month for every year in the date range.
Because this method finds all the Fridays for every month in the range and then chooses .last() for each month, there's not an issue with trying to figure out which week of the month has the last Friday.
With this, use pandas: Boolean Indexing to find values in the Date column of the dataframe that are in last_fridays_in_daterange.
Use the .isin method to determine containment.
pandas: DateOffset objects
import pandas as pd
# test data: given a dataframe with a datetime column
df = pd.DataFrame({'Date': pd.date_range(start=pd.to_datetime('2014-01-01'), end=pd.to_datetime('2020-08-31'), freq='D')})
# create a dataframe with all Fridays in the date range between the min and max of df.Date
fridays = pd.DataFrame({'datetime': pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')})
# use groupby and last to get the last Friday of each month into a list
last_fridays_in_daterange = fridays.groupby([fridays.datetime.dt.year, fridays.datetime.dt.month]).last()['datetime'].tolist()
# find the data for the last Friday of the month
df[df.Date.isin(last_fridays_in_daterange)]
I have a dataframe (df) with a start_date column and an add_days column (=10). I want to create target_date (= start_date + add_days), excluding weekends and holidays (the holidays are in a dataframe).
I did some research and tried this.
from datetime import date, timedelta
import datetime as dt
df["star_date"] = pd.to_datetime(df["star_date"])
Holidays['Date_holi'] = pd.to_datetime(Holidays['Date_holi'])
def date_by_adding_business_days(from_date, add_days, holidays):
    business_days_to_add = add_days
    current_date = from_date
    while business_days_to_add > 0:
        current_date += datetime.timedelta(days=1)
        weekday = current_date.weekday()
        if weekday >= 5: # sunday = 6
            continue
        if current_date in holidays:
            continue
        business_days_to_add -= 1
    return current_date
#demo:
base["Target_date"]=date_by_adding_business_days(df["start_date"], 10, Holidays['Date_holi'])
but I get this error:
AttributeError: 'Series' object has no attribute 'weekday'
Thank you for your help.
The comments by ALollz are very valid; customizing your date during creation to only keep what is defined as business day for your problem would be optimal.
However, I assume that you cannot define the business day beforehand and that you need to solve the problem with the data frame constructed as is.
Here is one possible solution:
import pandas as pd
import numpy as np
from datetime import timedelta
# Goal is to offset a start date by N business days (weekday + not a holiday)
# Here we fake the dataset as it was not provided
num_row = 1000
df = pd.DataFrame()
df['start_date'] = pd.date_range(start='1/1/1979', periods=num_row, freq='D')
df['add_days'] = pd.Series([10]*num_row)
# Define what is a week day
week_day = [0,1,2,3,4] # Monday to Friday
# Define what is a holiday with month and day without year (you can add more)
holidays = ['10-30','12-24']
def add_days_to_business_day(df, week_day, holidays, increment=10):
    '''
    modify the dataframe to increment only the days that are part of a weekday
    and not part of a pre-defined holiday
    >>> add_days_to_business_day(df, [0,1,2,3,4], ['10-31','12-31'])
    this will increment by 10 the days from Monday to Friday excluding Halloween and new year-eve
    '''
    # Increment everything that is in a business day
    df.loc[df['start_date'].dt.dayofweek.isin(week_day),'target_date'] = df['start_date'] + timedelta(days=increment)
    # Remove every increment done on a holiday
    df.loc[df['start_date'].dt.strftime('%m-%d').isin(holidays), 'target_date'] = np.datetime64('NaT')
add_days_to_business_day(df, week_day, holidays)
df
Note: I'm not using the 'add_days' column since it's just a repeated value. Instead, my function takes an increment parameter, which adds N days (with a default of N = 10).
Hope it helps!
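If the goal is to count N business days forward while skipping weekends and holidays, the loop from the question can work once it receives one scalar date at a time rather than a whole Series (the AttributeError comes from calling .weekday() on a Series). A minimal standard-library sketch follows; the name add_business_days and the sample holiday are assumptions for illustration.

```python
from datetime import date, timedelta

def add_business_days(start, days, holidays=()):
    """Return the date `days` business days after `start`.

    Weekends (Saturday=5, Sunday=6) and any date in `holidays` are skipped.
    """
    holiday_set = set(holidays)  # set membership is a fast lookup
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() >= 5 or current in holiday_set:
            continue  # not a business day, do not count it
        remaining -= 1
    return current

# hypothetical holiday used only for this example
print(add_business_days(date(2019, 4, 15), 10, [date(2019, 4, 22)]))
# prints 2019-04-30
```

In the question's setting this could be applied row by row, for example with Series.apply, provided the start dates and the holidays are of the same type so the membership test compares like with like.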
My Date should always fall on 8th or 22nd that comes off the input date.
For Example:
If the input date is 20190415 then the output date should be 20190422 as that's the nearest date and if input date is 20190424 then the output date should be 20190508.
Example1:
input_date = 20190415
Expected output_date = 20190422
Example2:
input_date = 20190424
Expected output_date = 20190508
Example3:
input_date = 20190506
Expected output_date = 20190508
Example4:
input_date = 20191223
Expected output_date = 20200108
How do we achieve this using Python?
You can check if the day is greater than 22, and if so you set it to the 8th of the next month. If it's between 8 and 22 you set it to the 22nd of the same month, and if it's below 8 you set it to the 8th of the month. There are probably more elegant ways to do it using date math, but this will work for your scenario.
Use the datetime module to find out what the "next month" is. timedelta has no month unit, so one way to do it is to add a timedelta of 31 days to the first of the current month, which always lands in the following month, and then change the day on that date object to the 8th. Here's a quick example of how that might look:
from datetime import date, timedelta
input_date = date(2019, 12, 23)
if input_date.day > 22:
    # start from the 1st so that adding 31 days always lands in the next month
    output_date = date(input_date.year, input_date.month, 1) + timedelta(days=31)
    output_date = output_date.replace(day = 8)
You can read a lot more about the details of how the datetime module works on the official documentation. It's kind of a long read, but I actually have that page bookmarked because I always have to go back and reference how to actually use the module :)
Considering the input as string, next date can be calculated using timedelta, check out the below code:
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

input_date = "20190424"
parsed = datetime.strptime(input_date, "%Y%m%d")
if 8 <= parsed.day < 22:
    # on or after the 8th but before the 22nd: move forward to the 22nd
    output = parsed + timedelta(days=22 - parsed.day)
elif parsed.day < 8:
    # before the 8th: move forward to the 8th
    output = parsed + timedelta(days=8 - parsed.day)
else:
    # on or after the 22nd: the 8th of the following month
    output = (parsed + relativedelta(months=+1)).replace(day=8)
print(output.strftime("%Y%m%d"))
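The same rule can be sketched with the standard library only. The question does not say what should happen when the input already falls on the 8th or the 22nd; this sketch assumes such a date maps to itself, and the function names are hypothetical.

```python
from datetime import date, datetime

def next_8th_or_22nd(d):
    """Return the nearest 8th or 22nd on or after d (assumption: 'on' counts)."""
    if d.day <= 8:
        return d.replace(day=8)
    if d.day <= 22:
        return d.replace(day=22)
    # past the 22nd: roll over to the 8th of the next month
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, 8)

def shift(text):
    """Apply the rule to a YYYYMMDD string and return a YYYYMMDD string."""
    parsed = datetime.strptime(text, "%Y%m%d").date()
    return next_8th_or_22nd(parsed).strftime("%Y%m%d")

for example in ("20190415", "20190424", "20190506", "20191223"):
    print(example, "->", shift(example))
```

The four inputs are the examples from the question; the year rollover in December is handled by the explicit branch rather than by date arithmetic.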
I have a list of dates as generated by:
from dateutil import parser
from datetime import date, timedelta
d1 = parser.parse("2015-11-25")
d2 = parser.parse("2016-02-06")
delta = (d2-d1).days
date_list = [d1 + timedelta(days=x) for x in range(0, delta+1)]
In this list there are 6 days in November 2015, 31 days in December 2015, 31 days in January 2016 and 6 days in February 2016. December 2015 and January 2016 are "full" months, i.e. the date list has all days in those months.
How can I get this information programmatically in Python, in order to produce a list such as:
[(2015,11,6,False),(2015,12,31,True),(2016,1,31,True),(2016,2,6,False)]
Found a neat short solution:
from dateutil import parser
from datetime import date, timedelta
from collections import Counter
from calendar import monthrange
d1 = parser.parse("2015-11-25")
d2 = parser.parse("2016-02-06")
delta = (d2-d1).days
date_list = [d1 + timedelta(days=x) for x in range(0, delta+1)]
month_year_list = [(d.year, d.month) for d in date_list]
result = [(k[0], k[1], v, monthrange(k[0], k[1])[1] == v)
          for k, v in Counter(month_year_list).items()]
print(result)
Walk the list and accumulate the number of days for each year/month combination:
from collections import defaultdict
days_in_year_month = defaultdict(int)
for each_date in date_list:
    days_in_year_month[(each_date.year, each_date.month)] += 1
Next output the tuples with each year, month, count and T/F:
import calendar
result = []
for year, month in days_in_year_month:
    days_in_ym = days_in_year_month[(year, month)]
    # a month is complete when the count equals the number of days in that month
    is_complete = days_in_ym == calendar.monthrange(year, month)[1]
    result.append((year, month, days_in_ym, is_complete))
So:
I learned about monthrange here: How do we determine the number of days for a given month in python
My solution sucks because it will do a total of 3 loops: the initial loop from your list comprehension, plus the two loops I added. Since you're walking the days in order for your list comprehension, this could be much optimized to run in a single loop.
I didn't test it :)
The previously mentioned solutions seem OK, but I believe this one is more efficient, since they have to build a list containing every day in the range. For a small date difference this isn't a problem, but as the difference grows, the list becomes much larger.
I want to give another approach that is more intuitive: all months that lie between the two dates are full, and the months containing the two dates themselves are treated as not full.
I try to leverage that information, so the loop only iterates over the months between the dates.
The code:
from dateutil import parser
from calendar import monthrange
d1 = parser.parse("2015-11-25")
d2 = parser.parse("2016-02-06")
# needed to calculate amount of months between the dates
m1 = d1.year * 12 + (d1.month- 1)
m2 = d2.year * 12 + (d2.month - 1)
result = []
# append first month since this will not be full
result.append((d1.year,d1.month,monthrange(d1.year, d1.month)[1]-d1.day+1,False))
current_month = d1.month
current_year = d1.year
# loop through the months and years that follow d1.
for _ in range(m2 - m1 - 1):
    if current_month+1 > 12:
        current_month = 1
        current_year += 1
    else:
        current_month += 1
    result.append((current_year,current_month,monthrange(current_year, current_month)[1],True))
# append last month since this will not be full either.
result.append((d2.year,d2.month,d2.day,False))
print(result)
Keep in mind that this code is only an example; it doesn't handle, for instance, the case where both dates fall in the same month.
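A variant that also covers the same-month case, and that reports an endpoint month as full when the range happens to cover all of it, could look like the sketch below. It assumes d1 <= d2 and uses plain datetime.date objects; the function name month_coverage is made up for illustration.

```python
from calendar import monthrange
from datetime import date

def month_coverage(d1, d2):
    """Return (year, month, days_covered, is_full) for each month in d1..d2 inclusive."""
    result = []
    year, month = d1.year, d1.month
    while (year, month) <= (d2.year, d2.month):
        days_in_month = monthrange(year, month)[1]
        # the range starts mid-month only in d1's month and ends early only in d2's month
        first = d1.day if (year, month) == (d1.year, d1.month) else 1
        last = d2.day if (year, month) == (d2.year, d2.month) else days_in_month
        covered = last - first + 1
        result.append((year, month, covered, covered == days_in_month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result

print(month_coverage(date(2015, 11, 25), date(2016, 2, 6)))
# prints [(2015, 11, 6, False), (2015, 12, 31, True), (2016, 1, 31, True), (2016, 2, 6, False)]
```

Like the approach above, the loop runs once per month rather than once per day.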