Pandas select last friday of each month [duplicate] - python

I've written this function to get the last Thursday of the month
def last_thurs_date(date):
    month=date.dt.month
    year=date.dt.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year)+'-0'+ str(month)+'-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
But it's not working with the lambda function:
datelist['Date'].map(lambda x: last_thurs_date(x))
where datelist is:
datelist = pd.DataFrame(pd.date_range(start=pd.to_datetime('01-01-2014', format='%d-%m-%Y'),
                                      end=pd.to_datetime('06-03-2019', format='%d-%m-%Y'),
                                      freq='D').tolist()).rename(columns={0: 'Date'})
datelist['Date'] = pd.to_datetime(datelist['Date'])

Jpp has already posted the solution, but here is a slightly more readable version using a formatted string (see this awesome website).
import calendar
def last_thurs_date(date):
    year, month = date.year, date.month
    cal = calendar.monthcalendar(year, month)
    # monthcalendar weeks start on Monday, so column 4 is Friday (Thursday would be column 3).
    # Take the Friday of the last week (row); if that cell is 0 the month ends before
    # Friday, so take the week before it. Indexing with -1/-2 also works for 4-row
    # Februaries, where cal[4] does not exist.
    last_thurs_date = cal[-1][4] if cal[-1][4] > 0 else cal[-2][4]
    return f'{year}-{month:02d}-{last_thurs_date}'
I also added a bit of logic: for February 2019, for example, you got 2019-02-0, because the fifth row of that month's calendar has no Friday.
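The same result can be computed without indexing into the calendar grid: take the month's last day from calendar.monthrange, look up its weekday, and step back to the wanted weekday. A minimal sketch; last_weekday_of_month is a hypothetical name, and weekdays are Monday-based (calendar.FRIDAY == 4), as in monthcalendar:

```python
import calendar
from datetime import date


def last_weekday_of_month(year: int, month: int, weekday: int = calendar.FRIDAY) -> date:
    """Return the last date in year/month that falls on `weekday` (Monday == 0)."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, last_day = calendar.monthrange(year, month)
    # how many days the month's last day lies after the wanted weekday (0..6)
    days_back = (calendar.weekday(year, month, last_day) - weekday) % 7
    return date(year, month, last_day - days_back)


if __name__ == "__main__":
    print(last_weekday_of_month(2019, 2))                     # 2019-02-22 (last Friday)
    print(last_weekday_of_month(2019, 2, calendar.THURSDAY))  # 2019-02-28 (last Thursday)
```

Because Series.map hands each Timestamp to the function individually, something like datelist['Date'].map(lambda d: last_weekday_of_month(d.year, d.month)) should work on the question's data.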

Scalar datetime objects don't have a dt accessor; Series do (see pd.Series.dt). If you remove .dt, your function works fine. The key is understanding that pd.Series.map (like pd.Series.apply) calls your custom function once per element, passing a scalar Timestamp rather than the entire Series.
import calendar
def last_thurs_date(date):
    month = date.month
    year = date.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year)+'-0'+ str(month)+'-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
You can rewrite your logic more succinctly via f-strings (Python 3.6+) and a conditional expression:
def last_thurs_date(date):
    month = date.month
    year = date.year
    last_thurs_date = calendar.monthcalendar(year, month)[4][4]
    return f'{year}{"-0" if month < 10 else "-"}{month}-{last_thurs_date}'
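Zero-padding can also be left to the format specification ({month:02d}) or, when a full date object exists anyway, to datetime.date.isoformat(). A small sketch; format_ymd is a hypothetical helper name:

```python
from datetime import date


def format_ymd(year: int, month: int, day: int) -> str:
    # :02d pads single-digit values with a leading zero, e.g. 2 -> "02"
    return f"{year}-{month:02d}-{day:02d}"


if __name__ == "__main__":
    print(format_ymd(2019, 2, 28))        # 2019-02-28
    print(date(2019, 2, 28).isoformat())  # 2019-02-28
```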

I know a lot of time has passed since this post, but I think it is worth adding another option for anyone who comes across this thread.
Even though I use pandas every day at work, in this case my suggestion would be to just use the dateutil library. The solution is a simple one-liner, without unnecessary combinations.
from dateutil.rrule import rrule, MONTHLY, FR, SA
from datetime import datetime as dt
import pandas as pd
# monthly options expiration dates (third Friday of each month) calculated for 2022
monthly_options = list(rrule(MONTHLY, count=12, byweekday=FR, bysetpos=3, dtstart=dt(2022, 1, 1)))
# last Saturday of each month
last_saturdays = list(rrule(MONTHLY, count=12, byweekday=SA, bysetpos=-1, dtstart=dt(2022, 1, 1)))
and then of course:
pd.DataFrame({'LAST_ST': last_saturdays})  # or whatever you need
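If installing dateutil is not an option, the bysetpos idea (the n-th matching weekday within each month, negative positions counting from the end) can be approximated with the standard calendar module. This is a sketch, not a replacement for rrule; nth_weekday and monthly_nth_weekday are hypothetical names, and n is assumed to be nonzero:

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """n-th `weekday` (Monday == 0) of the month; n=-1 means the last one."""
    _, ndays = calendar.monthrange(year, month)
    matches = [d for d in range(1, ndays + 1)
               if calendar.weekday(year, month, d) == weekday]
    # positive n counts from the start (1-based), negative n from the end
    return date(year, month, matches[n - 1] if n > 0 else matches[n])


def monthly_nth_weekday(start_year: int, start_month: int, count: int,
                        weekday: int, n: int) -> list:
    """One date per month for `count` consecutive months."""
    result = []
    year, month = start_year, start_month
    for _ in range(count):
        result.append(nth_weekday(year, month, weekday, n))
        # advance one month, rolling December over into January
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


if __name__ == "__main__":
    third_fridays = monthly_nth_weekday(2022, 1, 12, calendar.FRIDAY, 3)
    last_saturdays = monthly_nth_weekday(2022, 1, 12, calendar.SATURDAY, -1)
    last_fridays = monthly_nth_weekday(2022, 1, 12, calendar.FRIDAY, -1)
    print(third_fridays[0], last_saturdays[0], last_fridays[0])  # 2022-01-21 2022-01-29 2022-01-28
```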

This question is answered in Calculate Last Friday of Month in Pandas.
It can be adapted by selecting the appropriate day of the week, here freq='W-FRI'.
I think the easiest way is to create a pandas.DataFrame using pandas.date_range and specifying freq='W-FRI'.
W-FRI means weekly, on Fridays.
pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')
Creates all the Fridays in the date range between the min and max of the dates in df
Use a .groupby on year and month, and select .last(), to get the last Friday of every month for every year in the date range.
Because this method finds all the Fridays for every month in the range and then chooses .last() for each month, there's not an issue with trying to figure out which week of the month has the last Friday.
With this, use pandas: Boolean Indexing to find values in the Date column of the dataframe that are in last_fridays_in_daterange.
Use the .isin method to determine containment.
pandas: DateOffset objects
import pandas as pd
# test data: given a dataframe with a datetime column
df = pd.DataFrame({'Date': pd.date_range(start=pd.to_datetime('2014-01-01'), end=pd.to_datetime('2020-08-31'), freq='D')})
# create a DataFrame with all Fridays in the date range between the min and max of df.Date
fridays = pd.DataFrame({'datetime': pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')})
# use groupby and last to get the last Friday of each month into a list
last_fridays_in_daterange = fridays.groupby([fridays.datetime.dt.year, fridays.datetime.dt.month]).last()['datetime'].tolist()
# find the data for the last Friday of the month
df[df.Date.isin(last_fridays_in_daterange)]
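The same "generate every Friday, keep the last one per month" approach can be sketched with only the standard library, which may help when pandas is not available. Assumptions: the bounds are datetime.date objects and the range is inclusive; last_fridays_between is a hypothetical name:

```python
import calendar
from datetime import date, timedelta


def last_fridays_between(start: date, end: date) -> list:
    """Last Friday of each month that has a Friday in [start, end]."""
    # move forward to the first Friday on or after `start`
    friday = start + timedelta(days=(calendar.FRIDAY - start.weekday()) % 7)
    last_per_month = {}
    while friday <= end:
        # later Fridays overwrite earlier ones, so the last one per month wins
        last_per_month[(friday.year, friday.month)] = friday
        friday += timedelta(days=7)
    return sorted(last_per_month.values())


if __name__ == "__main__":
    print(last_fridays_between(date(2014, 1, 1), date(2014, 3, 31)))
```

As with the pandas version, if the range ends mid-month, the value kept for that month is simply the last Friday inside the range. Membership can then be tested with a set, the plain-Python counterpart of .isin.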

Related

Deriving special period based on date from file - Python

I am new to scripting and need some help writing this code the right way. I have a csv file with a date column, and based on that date I need to create a new column named period, which is a combination of year and month.
If the day of the month is between 1 and 25, the month is the date's own month.
If the day is greater than 25, the month is the next month.
Sample file:
Date
10/21/2021
10/26/2021
01/26/2021
Expected results:
Date        Period (year+month)
10/21/2021  202110
10/26/2021  202111
01/26/2021  202102
Two ways I can think of.
Convert the incoming string into a date object and get the values you need from there. See Converting string into datetime
Use split("/") to split the date string into a list of three values and use those to do your calculations.
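A minimal sketch of the first approach, parsing each string with datetime.strptime and rolling days after the 25th into the next month. The MM/DD/YYYY format is taken from the sample file; period_for is a hypothetical name:

```python
from datetime import datetime


def period_for(date_string: str, cutoff_day: int = 25) -> str:
    """Return 'YYYYMM'; days after `cutoff_day` count toward the next month."""
    d = datetime.strptime(date_string, "%m/%d/%Y")
    year, month = d.year, d.month
    if d.day > cutoff_day:
        # roll into the next month, carrying December over into the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"{year:04d}{month:02d}"


if __name__ == "__main__":
    for s in ["10/21/2021", "10/26/2021", "01/26/2021"]:
        print(s, period_for(s))
```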
Good question.
I've included the code that I wrote to do this, below. The process we will follow is:
Load the data from a csv
Define a function that will calculate the period for each date
Apply the function to our data and store the result as a new column
import pandas as pd
# Step 1
# read in the data from a csv, parsing dates and store the data in a DataFrame
data = pd.read_csv("filepath.csv", parse_dates=["Date"])
# Create day, month and year columns in our DataFrame
data['day'] = data['Date'].dt.day
data['month'] = data['Date'].dt.month
data['year'] = data['Date'].dt.year
# Step 2
# Define a function that will get our periods from a given date
def get_period(date):
    day = date.day
    month = date.month
    year = date.year
    if day > 25:
        if month == 12:  # if December, increment year and change month to Jan.
            year += 1
            month = 1
        else:
            month += 1
    # convert our year and month into strings that we can concatenate easily
    year_string = str(year).zfill(4)
    month_string = str(month).zfill(2)
    period = year_string + month_string  # concat the strings together
    return period
# Step 3
# Apply our custom function (get_period) to each row of the DataFrame;
# each row carries the day, month and year columns created in Step 1
data['period'] = data.apply(get_period, axis=1)

Dates/months calculation

My working code below calculates date/month ranges, but it uses the Pandas library, which I want to get rid of.
import pandas as pd
from pandas.tseries.offsets import MonthEnd
dates=pd.date_range("2019-11","2020-02",freq='MS').strftime("%Y%m%d").tolist()
#print dates : ['20191101','20191201','20200101','20200201']
df=(pd.to_datetime(dates,format="%Y%m%d") + MonthEnd(1)).strftime("%Y%m%d").tolist()
#print df : ['20191130','20191231','20200131','20200229']
How can I rewrite this code without using Pandas?
I don't want to use Pandas library as I am triggering my job through Oozie and we don't have Pandas installed on all our nodes.
Pandas offers some nice functionalities when using datetimes which the standard library datetime module does not have (like the frequency or the MonthEnd). You have to reproduce these yourself.
import datetime as DT
def next_first_of_the_month(dt):
    """return a new datetime where the month has been increased by 1 and
    the day is always the first
    """
    new_month = dt.month + 1
    if new_month == 13:
        new_year = dt.year + 1
        new_month = 1
    else:
        new_year = dt.year
    return DT.datetime(new_year, new_month, day=1)
start, stop = [DT.datetime.strptime(dd, "%Y-%m") for dd in ("2019-11", "2020-02")]
dates = [start]
cd = next_first_of_the_month(start)
while cd <= stop:
    dates.append(cd)
    cd = next_first_of_the_month(cd)
str_dates = [d.strftime("%Y%m%d") for d in dates]
print(str_dates)
# prints: ['20191101', '20191201', '20200101', '20200201']
end_dates = [next_first_of_the_month(d) - DT.timedelta(days=1) for d in dates]
str_end_dates = [d.strftime("%Y%m%d") for d in end_dates]
print(str_end_dates)
# prints ['20191130', '20191231', '20200131', '20200229']
Here I used a function to get a datetime corresponding to the first day of the month after the input datetime. Sadly, timedelta does not work with months, and adding 30 days is of course not feasible (not all months have 30 days).
Then a while loop builds the sequence of first days of the month up to the stop date.
And to get the end of the month, again take the next first day of the month for each datetime in your list and subtract a day.
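As an alternative to subtracting a day from the next first-of-month, calendar.monthrange returns the number of days in a month directly, which is the month-end day. A sketch assuming the same "YYYY-MM" bounds as above; month_bounds is a hypothetical name:

```python
import calendar
from datetime import date


def month_bounds(start: str, stop: str) -> tuple:
    """First-day and last-day strings (YYYYMMDD) for each month from
    `start` to `stop` inclusive, both given as "YYYY-MM"."""
    year, month = map(int, start.split("-"))
    stop_year, stop_month = map(int, stop.split("-"))
    firsts, lasts = [], []
    while (year, month) <= (stop_year, stop_month):
        # monthrange(...)[1] is the number of days, i.e. the last day of the month
        last_day = calendar.monthrange(year, month)[1]
        firsts.append(date(year, month, 1).strftime("%Y%m%d"))
        lasts.append(date(year, month, last_day).strftime("%Y%m%d"))
        # advance one month, rolling December over into January
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return firsts, lasts


if __name__ == "__main__":
    print(month_bounds("2019-11", "2020-02"))
```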

How to add business days in date excluding holidays

I have a dataframe (df) with a start_date column and an add_days column (=10). I want to create target_date (= start_date + add_days), excluding weekends and holidays (the holidays are in a separate dataframe).
I did some research and tried this:
from datetime import date, timedelta
import datetime as dt
df["star_date"] = pd.to_datetime(df["star_date"])
Holidays['Date_holi'] = pd.to_datetime(Holidays['Date_holi'])
def date_by_adding_business_days(from_date, add_days, holidays):
    business_days_to_add = add_days
    current_date = from_date
    while business_days_to_add > 0:
        current_date += datetime.timedelta(days=1)
        weekday = current_date.weekday()
        if weekday >= 5: # sunday = 6
            continue
        if current_date in holidays:
            continue
        business_days_to_add -= 1
    return current_date
#demo:
base["Target_date"]=date_by_adding_business_days(df["start_date"], 10, Holidays['Date_holi'])
but I get this error:
AttributeError: 'Series' object has no attribute 'weekday'
Thank you for your help.
The comments by ALollz are very valid: customizing your data during creation so that it only keeps what your problem defines as a business day would be optimal.
However, I assume that you cannot define the business day beforehand and that you need to solve the problem with the data frame constructed as is.
Here is one possible solution:
import pandas as pd
import numpy as np
from datetime import timedelta
# Goal is to offset a start date by N business days (weekday + not a holiday)
# Here we fake the dataset as it was not provided
num_row = 1000
df = pd.DataFrame()
df['start_date'] = pd.date_range(start='1/1/1979', periods=num_row, freq='D')
df['add_days'] = pd.Series([10]*num_row)
# Define what is a week day
week_day = [0,1,2,3,4] # Monday to Friday
# Define what is a holiday with month and day without year (you can add more)
holidays = ['10-30','12-24']
def add_days_to_business_day(df, week_day, holidays, increment=10):
    '''
    modify the dataframe to increment only the days that are part of a weekday
    and not part of a pre-defined holiday
    >>> add_days_to_business_day(df, [0,1,2,3,4], ['10-31','12-31'])
    this will increment by 10 the days from Monday to Friday excluding Halloween and new year-eve
    '''
    # Increment everything that is in a business day
    df.loc[df['start_date'].dt.dayofweek.isin(week_day),'target_date'] = df['start_date'] + timedelta(days=increment)
    # Remove every increment done on a holiday
    df.loc[df['start_date'].dt.strftime('%m-%d').isin(holidays), 'target_date'] = np.datetime64('NaT')
add_days_to_business_day(df, week_day, holidays)
df
Note: I'm not using the 'add_days' column since it's just a repeated value. Instead, my function takes a parameter, increment, which increments by N days (with a default of N = 10).
Hope it helps!
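Note that the function above shifts each eligible start date by a fixed number of calendar days. If the goal is a true offset of N business days that skips weekends and holidays along the way, the loop from the question works once it receives one date at a time instead of a whole Series. A minimal sketch with the standard library; holidays is assumed to be a set of datetime.date objects, and add_business_days is a hypothetical name:

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int, holidays=frozenset()) -> date:
    """Move forward n business days (Mon-Fri, not in `holidays`) from `start`."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend
        if current.weekday() >= 5 or current in holidays:
            continue
        remaining -= 1
    return current


if __name__ == "__main__":
    holidays = {date(2021, 12, 24)}
    print(add_business_days(date(2021, 12, 23), 3, holidays))  # 2021-12-29
```

With pandas this could be applied element by element, e.g. df['start_date'].apply(lambda ts: add_business_days(ts.date(), 10, holidays)). For a vectorized alternative, NumPy's np.busday_offset also accepts a holidays argument.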

Get last day of month in python

I get an error for the month of December:
ValueError: month must be in 1..12
def last_day_of_month(ds):
    cur_ds = datetime.strptime(ds, '%Y-%m-%d')
    next_month = datetime(year=cur_ds.year, month=cur_ds.month+1, day=1)
    last_day_month = next_month - timedelta(days=1)
    return datetime.strftime(last_day_month, '%Y-%m-%d')
print last_day_of_month('2016-12-01')
In line 3, month=cur_ds.month+1 gives you month 13 for December, which is not valid. If you want to calculate the last day of a given month, you could also use monthrange from the calendar module.
>>> import calendar
>>> year, month = 2016, 12
>>> calendar.monthrange(year, month)[1]
31
You can't make a datetime with a month of 13. So you have to find a way to fix it. A simple solution is to convert the incremented month to an extra year:
# Reduce 12 to 1, 0 and all other #s to 0, #
extrayear, month = divmod(cur_ds.month, 12)
# Add 1 or 0 to existing year, add one to month (which was reduced to 0-11)
next_month = datetime(year=cur_ds.year + extrayear, month=month + 1, day=1)
You're passing in 12 as current month, then adding one to get next_month, making it 13. Check for the 12 case and set month=1 instead.
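Putting the divmod fix into the question's function gives a version that also handles December. A sketch for Python 3 (print as a function); the datetime and timedelta imports are assumed, since the question's snippet does not show them:

```python
from datetime import datetime, timedelta


def last_day_of_month(ds: str) -> str:
    cur_ds = datetime.strptime(ds, "%Y-%m-%d")
    # divmod(12, 12) == (1, 0): December rolls over into January of the next year
    extra_year, month = divmod(cur_ds.month, 12)
    next_month = datetime(year=cur_ds.year + extra_year, month=month + 1, day=1)
    last_day_month = next_month - timedelta(days=1)
    return last_day_month.strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(last_day_of_month("2016-12-01"))  # 2016-12-31
```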
This is how I did it:
from django.utils import timezone
from calendar import monthrange
from datetime import datetime
current = timezone.now()
firstdayofmonth = current.replace(day=1)
endmonth = monthrange(current.year, current.month)
lastdayofmonth = datetime(current.year, current.month, endmonth[1])

get dates of present month in a list and difference in dates

How do I go about printing these:
At any given time, I should be able to return the dates of the present month in a list.
e.g. since this is March, I want to print a list [20140301, ..., 20140331].
Find the difference in number of days from current date and the first of that month.
e.g. today is 4th March. I should be able to print the difference as 3
Thanks in advance
Here is what you can do in Python 2.7.3:
from datetime import date
import calendar
today = date.today()
first_of_month = today.replace(day=1)
_, number_of_days_in_month = calendar.monthrange(first_of_month.year, first_of_month.month)
for i in range(1, number_of_days_in_month+1):
    newdate = today.replace(day = i)
    print newdate #prints all dates in the month
#Let's calculate the difference between the 14th of this month and the 1st
randomdate = today.replace(day=14)
delta = randomdate - first_of_month
print delta.days
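A Python 3 sketch of both parts, returning the dates as YYYYMMDD strings as the question asks. The reference date is a parameter (defaulting to today) so the output can be checked; month_dates and days_since_first are hypothetical names:

```python
import calendar
from datetime import date


def month_dates(day=None) -> list:
    """All dates of the month containing `day`, formatted as YYYYMMDD."""
    day = day or date.today()
    _, ndays = calendar.monthrange(day.year, day.month)
    return [day.replace(day=i).strftime("%Y%m%d") for i in range(1, ndays + 1)]


def days_since_first(day=None) -> int:
    """Number of days between `day` and the first of its month."""
    day = day or date.today()
    return (day - day.replace(day=1)).days


if __name__ == "__main__":
    ref = date(2014, 3, 4)
    dates = month_dates(ref)
    print(dates[0], dates[-1], len(dates))  # 20140301 20140331 31
    print(days_since_first(ref))            # 3
```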
