I have a function that takes a start date and an end date and builds a dataframe from them. The columns I want in my dataframe are the month, the year, the quarter, and two more columns.
I want to build the year column based on the fiscal year, which runs July-June. So Jan 1, 2021 is in fiscal year 2021, but August 1, 2021 is in fiscal year 2022. A quarter is 3 months, so July-Sept is Q1, Oct-Dec is Q2, Jan-March is Q3 and April-June is Q4. How do I add this quarter column?
Next I want Year Context and Month Context columns: if the year is 2022 it's the current year, if it is 2021 it is 1 year ago, if it is 2020 it's 2 years ago, and so on. For the month, if the date is August 2021 it's the current month, if it is July 2021 it's 1 month ago, and so on. Similarly, I want a quarter context column.
Here is my code for all this. When I run it, it labels January as Q1, which is not correct, and I am not sure how to change it.
It's a long post, but to sum up, here's what I need help with:
Quarter column, year context column, month context column, and creating a quarter context column
import pandas as pd
from datetime import datetime

def create_date_table3(start, end):
    df = pd.DataFrame({"date": pd.date_range(start, end)})
    df = df.assign(Year=(df.date - pd.offsets.MonthBegin(7)).dt.year + 1)
    current_month = datetime.now().month
    current_quarter = df["date"].dt.quarter
    current_year = datetime.now().year if datetime.now().month <= 6 else datetime.now().year + 1
    df["Month"] = df.date.dt.month
    df["Year_Month"] = df.Year.astype(str) + '_' + df.Month.astype(str)
    df = df.assign(Quarter=(df.date - pd.offsets.MonthBegin(7)).dt.quarter)
    df["YearContext"] = ["Current Year" if x == current_year else str(current_year - x) + ' Yr Ago' for x in df["Year"]]
    df["MonthContext"] = ["Current Month" if x == current_month else str(current_month - x) + ' Mo Ago' for x in df["Month"]]
    return df
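One possible way to get all of these columns is plain month arithmetic instead of date offsets. The following is only a minimal sketch, assuming the July-June fiscal year described above; the function name `add_fiscal_columns`, the `today` parameter (added so the result is reproducible), and the `QuarterContext` label format are hypothetical choices, not part of the original code.

```python
import pandas as pd


def add_fiscal_columns(start, end, today=None):
    """Build a date table with fiscal columns, assuming a July-June fiscal year."""
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    df = pd.DataFrame({"date": pd.date_range(start, end)})

    month = df["date"].dt.month
    df["Month"] = month
    # July-December belong to the fiscal year that ends the following June.
    df["Year"] = df["date"].dt.year + (month >= 7).astype(int)
    # Shift months so that July -> 0, ..., June -> 11, then bucket by threes.
    df["Quarter"] = ((month - 7) % 12) // 3 + 1

    cur_year = today.year + int(today.month >= 7)
    cur_quarter = ((today.month - 7) % 12) // 3 + 1

    # Distances from "today" in fiscal years, fiscal quarters, and calendar months.
    years_ago = cur_year - df["Year"]
    quarters_ago = (cur_year * 4 + cur_quarter) - (df["Year"] * 4 + df["Quarter"])
    months_ago = (today.year * 12 + today.month) - (df["date"].dt.year * 12 + month)

    def context(diff, current_label, suffix):
        # Dates after "today" produce negative numbers; handle them separately if needed.
        return diff.map(lambda k: current_label if k == 0 else f"{k} {suffix} Ago")

    df["YearContext"] = context(years_ago, "Current Year", "Yr")
    df["QuarterContext"] = context(quarters_ago, "Current Quarter", "Qtr")
    df["MonthContext"] = context(months_ago, "Current Month", "Mo")
    return df


table = add_fiscal_columns("2020-07-01", "2021-08-31", today="2021-08-15")
print(table.tail())
```

Counting months as `year * 12 + month` keeps the month context correct across year boundaries, which a plain difference of month numbers does not.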
I need a function to count the total number of days in the 'days' column of a dataframe between a start date of 1st Jan 1995 and an end date of 31st Dec 2019, taking leap years into account as well.
Example: 1st Jan 1995 is Day 1, 1st Feb 1995 is Day 32, and so on all the way to 31st Dec 2019.
If you want to filter a pandas dataframe to a range between two dates, you can do it like this:
start_date = '1995/01/01'
end_date = '1995/02/01'
df = df[ (df['days']>=start_date) & (df['days']<=end_date) ]
len(df) then gives the number of rows in the filtered dataframe.
If instead you want the number of days between two dates, you can do it without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
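Note that this difference does not count the start day itself, so add 1 for an inclusive count. Leap years are already taken into account, because datetime subtraction follows the real calendar.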
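For the day numbering asked about in the question, a minimal sketch could look like the following. It assumes the 'days' column holds dates; the column name `day_number` is hypothetical.

```python
import pandas as pd

start = pd.Timestamp("1995-01-01")
end = pd.Timestamp("2019-12-31")

# One row per calendar day in the requested range.
df = pd.DataFrame({"days": pd.date_range(start, end)})

# Timestamp subtraction follows the real calendar, so 29 February is
# counted in leap years; the +1 makes the start date Day 1.
df["day_number"] = (df["days"] - start).dt.days + 1

total_days = (end - start).days + 1
print(df.loc[df["days"] == pd.Timestamp("1995-02-01"), "day_number"].iat[0])
print(total_days)
```

With this numbering 1st Feb 1995 comes out as Day 32, and the whole range covers 9131 days (25 years, 6 of them leap years).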
I have 2 datasets to work with:
ID  Date        Amount
1   2020-01-02  1000
1   2020-01-09  200
1   2020-01-08  400
And another dataset that gives the most frequent day of week and the most frequent week of month for each ID (there are multiple such IDs):
ID  Pref_Day_Of_Week_A  Pref_Week_Of_Month_A
1   3                   2
For ID 1, Thursday was the most frequent day of the week and the 2nd week of the month was the most frequent week of the month.
I wish to find the sum of all the amounts that took place on the most frequent day of week and in the most frequent week of month, for all IDs (hence requiring groupby):
ID  Amount_On_Pref_Day  Amount_Pref_Week
1   1200                600
I would really appreciate it if anyone could help me calculate this dataframe using pandas. For reference, I have used this function to find the week of month for a given date:
#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
The idea is to keep only the rows that match the preferred dayofweek (or week), aggregate them with sum, and finally join the two results together with concat:
#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1

df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = weekinmonth(df['Date'])

f = lambda x: x.mode().iat[0]
df1 = (df.groupby('ID', as_index=False)
         .agg(Pref_Day_Of_Week_A=('dayofweek', f),
              Pref_Week_Of_Month_A=('week', f)))

s1 = df1.rename(columns={'Pref_Day_Of_Week_A':'dayofweek'}).merge(df).groupby('ID')['Amount'].sum()
s2 = df1.rename(columns={'Pref_Week_Of_Month_A':'week'}).merge(df).groupby('ID')['Amount'].sum()
df2 = pd.concat([s1, s2], axis=1, keys=('Amount_On_Pref_Day','Amount_Pref_Week'))
print(df2)
    Amount_On_Pref_Day  Amount_Pref_Week
ID
1                 1200               600
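If the preference table from the question is already available, it can be merged in directly instead of recomputing the mode. The following is a rough sketch under that assumption; the variable names `prefs`, `merged`, `on_day` and `in_week` are hypothetical, and the sample data is the one shown in the question.

```python
import pandas as pd


def weekinmonth(dates):
    """Week number within the month (same formula as the function above)."""
    first = dates - pd.to_timedelta(dates.dt.day - 1, unit="D")
    return (dates.dt.day - 1 + first.dt.weekday) // 7 + 1


df = pd.DataFrame({
    "ID": [1, 1, 1],
    "Date": pd.to_datetime(["2020-01-02", "2020-01-09", "2020-01-08"]),
    "Amount": [1000, 200, 400],
})
prefs = pd.DataFrame({"ID": [1], "Pref_Day_Of_Week_A": [3], "Pref_Week_Of_Month_A": [2]})

# Attach each ID's preferences to every transaction row.
merged = df.merge(prefs, on="ID", how="left")
on_day = merged["Date"].dt.dayofweek == merged["Pref_Day_Of_Week_A"]
in_week = weekinmonth(merged["Date"]) == merged["Pref_Week_Of_Month_A"]

# Zero out the amounts that do not match, then sum per ID.
result = (
    merged.assign(
        Amount_On_Pref_Day=merged["Amount"].where(on_day, 0),
        Amount_Pref_Week=merged["Amount"].where(in_week, 0),
    )
    .groupby("ID")[["Amount_On_Pref_Day", "Amount_Pref_Week"]]
    .sum()
)
print(result)
```

Masking with `where` keeps IDs that have no matching rows in the result (with a sum of 0), whereas filtering before the groupby would drop them.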
I have a Pandas dataframe, which looks like below
I want to create a new column, which tells the exact date from the information from all the above columns. The code should look something like this:
df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])
I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.
month = {"Jan":"01", "Feb":"02", "March":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}
week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}
With these dictionaries, the transformation that I used on a custom dataframe was:
rows = [["Dec", 5, "Wednesday", "1995"],
        ["Jan", 3, "Wednesday", "2013"]]
df = pd.DataFrame(rows, columns=["Month","Week","Weekday","Year"])
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-"
              + (df["Week"].apply(lambda x: (x - 1)*7) + df["Weekday"].map(week).apply(int)).apply(str)
             ).astype('datetime64[ns]')
However, you have to be careful. Some of the data that you posted as an example produces dates that exceed the valid range for the month. For example, for
row = ["Oct",5,"Friday","2018"]
the computed date is 2018-10-33. I recommend adding some logic to filter your data in order to avoid this kind of problem.
Let's approach it in 3 steps as follows:
1. Get the date of the month start, Month_Start, from Year and Month
2. Calculate the date offset, DateOffset, relative to Month_Start from WeekOfMonth and DayOfWeek
3. Get the actual date, Date, from Month_Start and DateOffset
Here's the code:
import time

df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")
df['DateOffset'] = (df['WeekOfMonth'] - 1) * 7 + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday) - df['Month_Start'].dt.dayofweek
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')
Output:
  Month  WeekOfMonth  DayOfWeek  Year Month_Start  DateOffset       Date
0   Dec            5  Wednesday  1995  1995-12-01          26 1995-12-27
1   Jan            3  Wednesday  2013  2013-01-01          15 2013-01-16
2   Oct            5     Friday  2018  2018-10-01          32 2018-11-02
3   Jun            2   Saturday  1980  1980-06-01           6 1980-06-07
4   Jan            5     Monday  1976  1976-01-01          25 1976-01-26
The Date column now contains the dates derived from the information from other columns.
You can remove the working interim columns, if you like, as follows:
df = df.drop(['Month_Start', 'DateOffset'], axis=1)
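Because a week/weekday combination may not exist in a given month (row 2 above lands in November), it can help to flag such rows. Here is a self-contained sketch of the same three steps with that check added. It assumes English month abbreviations; the `weekday_number` dictionary and the `InMonth` column are hypothetical additions.

```python
import pandas as pd

df = pd.DataFrame(
    [["Dec", 5, "Wednesday", 1995],
     ["Jan", 3, "Wednesday", 2013],
     ["Oct", 5, "Friday", 2018]],
    columns=["Month", "WeekOfMonth", "DayOfWeek", "Year"],
)

# Monday=0 ... Sunday=6, matching dt.dayofweek; a plain dict avoids
# locale-dependent weekday parsing.
weekday_number = {
    name: i
    for i, name in enumerate(
        ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    )
}

# Step 1: first day of each month (the %b format assumes English abbreviations).
month_start = pd.to_datetime(df["Year"].astype(str) + df["Month"] + "01", format="%Y%b%d")

# Step 2: offset in days from the month start.
offset = (
    (df["WeekOfMonth"] - 1) * 7
    + df["DayOfWeek"].map(weekday_number)
    - month_start.dt.dayofweek
)

# Step 3: actual date, plus a flag for rows that spill into another month.
df["Date"] = month_start + pd.to_timedelta(offset, unit="D")
df["InMonth"] = df["Date"].dt.month == month_start.dt.month
print(df)
```

Rows where `InMonth` is False describe a weekday that does not occur in that week of the month, so they can be filtered out or inspected separately.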
I have DataFrame like below:
date = pd.DataFrame({'inputDates':['2015-01-07', '2015-12-02',
                                   '2005-01-03', '2016-11-13',
                                   '2020-06-03']})
And for each of these dates I need to get:
number of day in month - for example 07.01.2015 is the 7th day in month
number of week in year - for example 07.01.2015 is 1st week in year
number of month in year - for example 07.01.2015 is 1st month in year
number of day in year - for example 07.01.2015 is the 7th day in year
number of quarter in year - for example 07.01.2015 is the 1st quarter in year
Try the following (see the pandas documentation on the .dt accessor for more):
date['inputDates'] = pd.to_datetime(date['inputDates'])
# day in month
date.inputDates.dt.day
# week in year
date.inputDates.dt.isocalendar().week
# month in year
date.inputDates.dt.month
# day in year
date.inputDates.dt.dayofyear
# quarter
date.inputDates.dt.to_period('Q-DEC').dt.quarter
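To keep all of these values side by side, the accessors can be collected into one dataframe. This is a small sketch; the output column names are hypothetical.

```python
import pandas as pd

date = pd.DataFrame({"inputDates": ["2015-01-07", "2015-12-02",
                                    "2005-01-03", "2016-11-13",
                                    "2020-06-03"]})
d = pd.to_datetime(date["inputDates"])

features = pd.DataFrame({
    "inputDates": d,
    "day_of_month": d.dt.day,
    # ISO 8601 week number; weeks start on Monday.
    "week_of_year": d.dt.isocalendar().week.astype(int),
    "month": d.dt.month,
    "day_of_year": d.dt.dayofyear,
    "quarter": d.dt.quarter,
})
print(features)
```

Note that `isocalendar()` uses ISO week numbering, under which 2015-01-07 falls in week 2 rather than week 1, because ISO week 1 of 2015 ended on 4 January.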
data.CSV
ID  Activity Month  Activity Date
0   04/2019         04-01-2019
1   05/2019         05-13-2019
2   05/2019         05-25-2019
3   06/2019         06-10-2019
4   06/2019         06-19-2019
5   07/2019         07-15-2019
6   07/2019         07-18-2019
7   07/2019         07-29-2019
8   08/2019         06-03-2019
9   08/2019         06-15-2019
10  08/2019         06-20-2019
MY PLAN
Read csv:
df = pd.read_csv('data.CSV')
Convert to datetime:
df['Activity Date'] = pd.to_datetime(df['Activity Date'], format='%m-%d-%Y')
Groupby the Activity Month column:
grouped = df.groupby(['Activity Month'])['Activity Date'].count()
print(grouped)
Activity Month
04/2019 15532
05/2019 13924
06/2019 12822
07/2019 14067
08/2019 10939
Name: Activity Date, dtype: int64
While the date is grouped, perform business day calculation:
This is the part where I'm not sure what to do; I'm already lost.
CODE I USED TO CALCULATE BUSINESS DAYS
import calendar
import datetime
x = datetime.date(2019, 4, 1)
cal = calendar.Calendar()
working_days = len([x for x in cal.itermonthdays2(x.year, x.month) if x[0] !=0 and x[1] < 5])
print ("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
OUTPUT THAT I WANTED
Total business days for month (4) is 22 days
Total business days for month (5) is 23 days
Total business days for month (6) is 20 days
Total business days for month (7) is 23 days
Total business days for month (8) is 22 days
I'm not entirely clear on the problem statement here, but if you want to calculate the number of business days for each Activity Month, you can wrap your calculation in a function and apply that function over your Activity Month column (apply is basically a for loop over each value in the column).
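import calendar
import datetime

grouped = df.groupby(['Activity Month'])['Activity Date'].count().reset_index()

def get_business_days(activity_month):
    # 'Activity Month' values look like '04/2019' (MM/YYYY)
    month, year = (int(part) for part in activity_month.split('/'))
    first_day = datetime.date(year, month, 1)
    cal = calendar.Calendar()
    # itermonthdays2 yields (day, weekday); day 0 is padding, weekday < 5 is Mon-Fri
    working_days = len([d for d in cal.itermonthdays2(first_day.year, first_day.month) if d[0] != 0 and d[1] < 5])
    return ("Total business days for month (" + str(first_day.month) + ") is " + str(working_days) + " days")

grouped['Activity Month'].apply(get_business_days)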
The output is a Series that has your text output.
0 Total business days for month (4) is 22 days
1 Total business days for month (5) is 23 days
2 Total business days for month (6) is 20 days
3 Total business days for month (7) is 23 days
4 Total business days for month (8) is 22 days
But, it's a bad idea to store repeated information in every cell. It'd be preferable to simply return working_days instead of having it embedded in a string.
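A variant that returns plain integers could look like the sketch below. The function name `business_days_in_month` and the `Business Days` column are hypothetical, and public holidays are not excluded.

```python
import calendar

import pandas as pd


def business_days_in_month(activity_month):
    """Count Monday-Friday days in a month given as 'MM/YYYY'."""
    month, year = (int(part) for part in activity_month.split("/"))
    return sum(
        1
        for day, weekday in calendar.Calendar().itermonthdays2(year, month)
        # day == 0 marks padding days from neighbouring months.
        if day != 0 and weekday < 5
    )


months = pd.Series(["04/2019", "05/2019", "06/2019", "07/2019", "08/2019"],
                   name="Activity Month")
summary = pd.DataFrame({
    "Activity Month": months,
    "Business Days": months.map(business_days_in_month),
})
print(summary)
```

Keeping the count numeric means it can be used in further calculations, and the display string can be built only when printing.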