data.CSV
ID Activity Month Activity Date
0 04/2019 04-01-2019
1 05/2019 05-13-2019
2 05/2019 05-25-2019
3 06/2019 06-10-2019
4 06/2019 06-19-2019
5 07/2019 07-15-2019
6 07/2019 07-18-2019
7 07/2019 07-29-2019
8 08/2019 06-03-2019
9 08/2019 06-15-2019
10 08/2019 06-20-2019
MY PLAN
Read csv:
df = pd.read_csv('data.CSV')
Convert to datetime:
df['Activity Date'] = pd.to_datetime(df['Activity Date'], dayfirst=True)
Groupby the Activity Month column:
grouped = df.groupby(['Activity Month'])['Activity Date'].count()
print(grouped)
Activity Month
04/2019 15532
05/2019 13924
06/2019 12822
07/2019 14067
08/2019 10939
Name: Activity Date, dtype: int64
While the date is grouped, perform business day calculation:
This is the part I'm not sure how to do; I'm lost already.
CODE I USED TO CALCULATE BUSINESS DAYS
import calendar
import datetime
x = datetime.date(2019, 4, 1)
cal = calendar.Calendar()
working_days = len([x for x in cal.itermonthdays2(x.year, x.month) if x[0] !=0 and x[1] < 5])
print ("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
OUTPUT THAT I WANTED
Total business days for month (4) is 22 days
Total business days for month (5) is 23 days
Total business days for month (6) is 20 days
Total business days for month (7) is 23 days
Total business days for month (8) is 22 days
I'm not entirely clear on the problem statement here, but if you want to calculate the number of business days for each Activity Month, you can wrap your calculation in a function and apply that function over your Activity Month column (apply is essentially a loop that calls the function on each value of the column).
import calendar
import datetime
grouped = df.groupby(['Activity Month'])['Activity Date'].count().reset_index()
def get_business_days(x):
    # x is an 'MM/YYYY' string; build the first day of that month
    x = datetime.date(int(x.split('/')[1]), int(x.split('/')[0]), 1)
    cal = calendar.Calendar()
    # itermonthdays2 yields (day, weekday); day 0 is padding, weekday < 5 is Mon-Fri
    working_days = len([d for d in cal.itermonthdays2(x.year, x.month) if d[0] != 0 and d[1] < 5])
    return ("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
grouped['Activity Month'].apply(get_business_days)
The output is a Series that has your text output.
0 Total business days for month (4) is 22 days
1 Total business days for month (5) is 23 days
2 Total business days for month (6) is 20 days
3 Total business days for month (7) is 23 days
4 Total business days for month (8) is 22 days
But, it's a bad idea to store repeated information in every cell. It'd be preferable to simply return working_days instead of having it embedded in a string.
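A minimal sketch of that suggestion, assuming the month labels are 'MM/YYYY' strings as in the question. The function name count_business_days and the column name Business Days are made up for illustration, and public holidays are not taken into account.

```python
import calendar

import pandas as pd


def count_business_days(activity_month: str) -> int:
    """Count Monday-Friday days in a month given as 'MM/YYYY' (holidays ignored)."""
    month, year = (int(part) for part in activity_month.split("/"))
    cal = calendar.Calendar()
    # itermonthdays2 yields (day_of_month, weekday); day 0 pads out partial weeks.
    return sum(
        1
        for day, weekday in cal.itermonthdays2(year, month)
        if day != 0 and weekday < 5
    )


# Stand-in for the grouped frame from the answer above.
grouped = pd.DataFrame(
    {"Activity Month": ["04/2019", "05/2019", "06/2019", "07/2019", "08/2019"]}
)
grouped["Business Days"] = grouped["Activity Month"].apply(count_business_days)
print(grouped)
```

Keeping the count as an integer column means it can still be summed, compared, or formatted into a message later.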
Related
I have a function that takes a start date and an end date and builds a dataframe from them. The columns I want in my dataframe are the month, the year, the quarter, and two more columns.
I want to build the year column based on the fiscal year, which runs July-June. So if the date is Jan 1, 2021, it is fiscal year 2021, but August 1, 2021 is fiscal year 2022. A quarter is 3 months, so July-Sept is Q1, Oct-Dec is Q2, Jan-March is Q3 and April-June is Q4. How do I add this quarter column?
Next, I want year context and month context columns: if the year is 2022 it's the current year, if it is 2021 it is 1 year ago, if it is 2020 it's 2 years ago, and so on. For month, if the date is August 2021 it's the current month, if it is July 2021 it's 1 month ago, and so on. Similarly, I want a quarter context column.
Here is my code for all this. When I run it, January comes out as Q1, which is not correct, and I am not sure how to change it.
It's a long post, but to sum up, here's what I need help with:
Quarter column, year context column, month context column, creating quarter context column
def create_date_table3(start, end):
    df = pd.DataFrame({"date": pd.date_range(start, end)})
    df = df.assign(Year=(df.date - pd.offsets.MonthBegin(7)).dt.year + 1)
    current_month = datetime.now().month
    current_quarter = df["date"].dt.quarter
    current_year = datetime.now().year if datetime.now().month <= 6 else datetime.now().year + 1
    df["Month"] = df.date.dt.month
    df["Year_Month"] = df.Year.astype(str) + '_' + df.Month.astype(str)
    df = df.assign(Quarter=(df.date - pd.offsets.MonthBegin(7)).dt.quarter)
    df["YearContext"] = ["Current Year" if x == current_year else str(current_year - x) + ' Yr Ago' for x in df["Year"]]
    df["MonthContext"] = ["Current Month" if x == current_month else str(current_month - x) + ' Mo Ago' for x in df["Month"]]
    return df
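One possible way to get the July-June quarter and the context columns described above, sketched with plain month arithmetic rather than date offsets. The helper name add_fiscal_columns, the today parameter, and the numeric MonthsAgo / QuartersAgo / YearsAgo columns are assumptions made for illustration; they are not part of the code in the question.

```python
import pandas as pd


def add_fiscal_columns(df, today):
    """Hypothetical helper: July-June fiscal columns, with contexts relative to `today`."""
    out = df.copy()
    month = out["date"].dt.month
    year = out["date"].dt.year

    # The fiscal year is named after the calendar year in which it ends (June).
    out["FiscalYear"] = year + (month >= 7).astype(int)
    # Shift months so that July -> 0 ... June -> 11, then bucket into groups of three.
    out["Quarter"] = (month - 7) % 12 // 3 + 1

    today = pd.Timestamp(today)
    current_fy = today.year + int(today.month >= 7)
    current_q = (today.month - 7) % 12 // 3 + 1

    # 0 means "current"; positive values mean that many periods ago.
    out["MonthsAgo"] = (today.year - year) * 12 + (today.month - month)
    out["QuartersAgo"] = (current_fy - out["FiscalYear"]) * 4 + (current_q - out["Quarter"])
    out["YearsAgo"] = current_fy - out["FiscalYear"]
    return out


dates = pd.DataFrame(
    {"date": pd.to_datetime(["2021-01-01", "2021-07-15", "2021-08-01"])}
)
result = add_fiscal_columns(dates, today="2021-08-20")
print(result)
```

Counting months across the year boundary (year difference times 12 plus month difference) avoids the wrap-around that a bare current_month - x produces. The numeric columns can be turned into labels such as "Current Month" or "1 Mo Ago" afterwards.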
I have a df
date
2021-03-12
2021-03-17
...
2022-05-21
2022-08-17
I am trying to add a column year_week, but my year week starts at 2021-06-28, the Monday of the week that contains the first day of July.
I tried:
df['date'] = pd.to_datetime(df['date'])
df['year_week'] = (df['date'] - timedelta(days=datetime(2021, 6, 24).timetuple()
.tm_yday)).dt.isocalendar().week
I played around with the timedelta days values so that the 2021-06-28 has a value of 1.
But then I got problems with previous & dates exceeding my start date + 1 year:
2021-03-12 has a value of 38
2022-08-17 has a value of 8
So it looks like the valid period is from 2021-06-28 + 1 year.
date year_week
2021-03-12 38 # LY38
2021-03-17 39 # LY39
2021-06-28 1 # correct
...
2022-05-21 47 # correct
2022-08-17 8 # NY8
Is there a way to get around this? As I am aggregating the data by year week, I get incorrect results due to the past and upcoming dates. I would like negative values (or something like LY38) for the days before 2021-06-28, denoting that it's a year week of the last year, and correspondingly year weeks of 52+ (or NY8) denoting that this is the 8th week of the next year.
Here is one way; I added two dates more than a year away from the start date to cover the general case. Take the isocalendar of the difference between the date column and the day of year of your specific date. Then select the scenario based on how the resulting year differs from the year of your specific date, using np.select to build the different result formats.
import numpy as np
import pandas as pd
# dummy dataframe
df = pd.DataFrame(
    {'date': ['2020-03-12', '2021-03-12', '2021-03-17', '2021-06-28',
              '2022-05-21', '2022-08-17', '2023-08-17']
    }
)
# define start date
d = pd.to_datetime('2021-6-24')
# subtract the start date's day of year from each date
s = (pd.to_datetime(df['date']) - pd.Timedelta(days=d.day_of_year)
     ).dt.isocalendar()
# get the difference in year
m = (s['year'].astype('int32') - d.year)
# all condition of result depending on year difference
conds = [m.eq(0), m.eq(-1), m.eq(1), m.lt(-1), m.gt(1)]
choices = ['', 'LY','NY',(m+1).astype(str)+'LY', '+'+(m-1).astype(str)+'NY']
# create the column
df['res'] = np.select(conds, choices) + s['week'].astype(str)
print(df)
date res
0 2020-03-12 -1LY38
1 2021-03-12 LY38
2 2021-03-17 LY39
3 2021-06-28 1
4 2022-05-21 47
5 2022-08-17 NY8
6 2023-08-17 +1NY8
I think pandas period_range can be of some help:
pd.Series(pd.period_range("6/28/2017", freq="W", periods=n_weeks))  # n_weeks: the number of weeks you want
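Another option for the year_week question, sketched here as an assumption rather than taken from either answer: skip the ISO calendar entirely and count whole weeks from the start date. Dates before the start then get zero or negative numbers and dates more than a year later keep counting past 52, which matches the behaviour the question asks for.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": pd.to_datetime(["2021-03-12", "2021-06-28", "2022-05-21", "2022-08-17"])}
)

# First day of week 1, taken from the question.
start = pd.Timestamp("2021-06-28")

# Floor division keeps weeks aligned on the start date's weekday, also for
# dates before the start (the week just before the start is week 0).
df["year_week"] = (df["date"] - start).dt.days // 7 + 1
print(df)
```

With this numbering, 2022-05-21 is still week 47 as in the question, and 2022-08-17 becomes week 60 instead of wrapping back to 8.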
I have 2 datasets to work with:
ID Date Amount
1 2020-01-02 1000
1 2020-01-09 200
1 2020-01-08 400
And another dataset that gives the most frequent day of week and the most frequent week of month for each ID (there are multiple such IDs):
ID Pref_Day_Of_Week_A Pref_Week_Of_Month_A
1 3 2
For ID 1, Thursday was the most frequent day of the week and the 2nd week of the month was the most frequent week of the month.
I wish to find the sum of all the amounts that fall on the most frequent day of week and in the most frequent week of month, for all IDs (hence requiring groupby):
ID Amount_On_Pref_Day Amount_Pref_Week
1 1200 600
I would really appreciate it if anyone could help me calculating this dataframe using pandas. For reference, I have used this function to find the week of month for a given date:
#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
The idea is to keep only the rows that match the preferred dayofweek and week, aggregate with sum, and finally join the two results together with concat:
import pandas as pd
#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = weekinmonth(df['Date'])
# most frequent value per ID (first one if there are ties)
f = lambda x: x.mode().iat[0]
df1 = (df.groupby('ID', as_index=False).agg(Pref_Day_Of_Week_A=('dayofweek',f),
                                            Pref_Week_Of_Month_A=('week',f)))
# merging on the renamed column keeps only rows matching the preferred value
s1 = df1.rename(columns={'Pref_Day_Of_Week_A':'dayofweek'}).merge(df).groupby('ID')['Amount'].sum()
s2 = df1.rename(columns={'Pref_Week_Of_Month_A':'week'}).merge(df).groupby('ID')['Amount'].sum()
df2 = pd.concat([s1, s2], axis=1, keys=('Amount_On_Pref_Day','Amount_Pref_Week'))
print (df2)
Amount_On_Pref_Day Amount_Pref_Week
ID
1 1200 600
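If the preferences should come from the second dataset in the question rather than being recomputed, a variant along these lines might work. The frame names tx and prefs and the masking approach are assumptions for illustration; weekinmonth is the same helper as above.

```python
import pandas as pd


def weekinmonth(dates):
    """Week number within the month, same logic as the helper above."""
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit="D")
    return (dates.dt.day - 1 + firstday_in_month.dt.weekday) // 7 + 1


# Sample data copied from the question.
tx = pd.DataFrame(
    {
        "ID": [1, 1, 1],
        "Date": pd.to_datetime(["2020-01-02", "2020-01-09", "2020-01-08"]),
        "Amount": [1000, 200, 400],
    }
)
prefs = pd.DataFrame(
    {"ID": [1], "Pref_Day_Of_Week_A": [3], "Pref_Week_Of_Month_A": [2]}
)

# Attach each ID's preferences to its transactions.
merged = tx.merge(prefs, on="ID")
on_day = merged["Date"].dt.dayofweek.eq(merged["Pref_Day_Of_Week_A"])
in_week = weekinmonth(merged["Date"]).eq(merged["Pref_Week_Of_Month_A"])

# Zero out non-matching rows, then sum per ID.
result = pd.DataFrame(
    {
        "Amount_On_Pref_Day": merged["Amount"].where(on_day, 0).groupby(merged["ID"]).sum(),
        "Amount_Pref_Week": merged["Amount"].where(in_week, 0).groupby(merged["ID"]).sum(),
    }
)
print(result)
```

Zeroing the non-matching rows instead of filtering them out keeps IDs that have no transaction on their preferred day or week in the result, with a sum of 0.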
I have a Pandas dataframe, which looks like below
I want to create a new column, which tells the exact date from the information from all the above columns. The code should look something like this:
df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])
I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.
month = {"Jan":"01", "Feb":"02", "March":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}
week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}
With these dictionaries, the transformation I used on a custom dataframe was:
rows = [["Dec",5,"Wednesday", "1995"],
["Jan",3,"Wednesday","2013"]]
df = pd.DataFrame(rows, columns=["Month","Week","Weekday","Year"])
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-" + (df["Week"].apply(lambda x: (x - 1)*7) + df["Weekday"].map(week).apply(int) ).apply(str)).astype('datetime64[ns]')
However, you have to be careful. With some of the data you posted as an example, there are dates that exceed the valid range for the month. For example, for
row = ["Oct",5,"Friday","2018"]
The date produced is 2018-10-33. I recommend using some logic to filter your data in order to avoid this kind of problem.
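One way such filtering could look, as a small sketch: build the date strings as above, then parse them with errors="coerce" so that impossible dates become NaT instead of raising. The sample strings below are the values the transformation above would produce for the three example rows.

```python
import pandas as pd

# Date strings as produced by the dictionary-based transformation;
# "2018-10-33" is the out-of-range case.
raw = pd.Series(["1995-12-31", "2013-01-17", "2018-10-33"])

# errors="coerce" turns anything that cannot be parsed into NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Boolean mask of rows with a usable date.
valid = parsed.notna()
print(parsed[valid])
```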
Let's approach it in 3 steps:
1. Get the date of the month start, Month_Start, from Year and Month
2. Calculate the date offset, DateOffset, relative to Month_Start from WeekOfMonth and DayOfWeek
3. Get the actual date, Date, from Month_Start and DateOffset
Here's the code:
import time
df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")
df['DateOffset'] = (df['WeekOfMonth'] - 1) * 7 + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday) - df['Month_Start'].dt.dayofweek
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')
Output:
Month WeekOfMonth DayOfWeek Year Month_Start DateOffset Date
0 Dec 5 Wednesday 1995 1995-12-01 26 1995-12-27
1 Jan 3 Wednesday 2013 2013-01-01 15 2013-01-16
2 Oct 5 Friday 2018 2018-10-01 32 2018-11-02
3 Jun 2 Saturday 1980 1980-06-01 6 1980-06-07
4 Jan 5 Monday 1976 1976-01-01 25 1976-01-26
The Date column now contains the dates derived from the information from other columns.
You can remove the working interim columns, if you like, as follows:
df = df.drop(['Month_Start', 'DateOffset'], axis=1)
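For reference, here is a self-contained sketch of the same three steps that also flags rows whose computed date lands outside the stated month (as happens for Oct / week 5 / Friday / 2018 in the output above). The MONTHS and WEEKDAYS dictionaries and the Overflow column are assumptions added for illustration; explicit dictionaries are used instead of %b and %A so the result does not depend on the locale.

```python
import pandas as pd

# Only the months needed for the sample rows; extend as required.
MONTHS = {"Jan": 1, "Jun": 6, "Oct": 10, "Dec": 12}
WEEKDAYS = {
    "Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
    "Friday": 4, "Saturday": 5, "Sunday": 6,
}

df = pd.DataFrame(
    [["Dec", 5, "Wednesday", 1995],
     ["Jan", 3, "Wednesday", 2013],
     ["Oct", 5, "Friday", 2018]],
    columns=["Month", "WeekOfMonth", "DayOfWeek", "Year"],
)

# Step 1: first day of each month, assembled from year/month/day components.
month_start = pd.to_datetime(
    pd.DataFrame({"year": df["Year"], "month": df["Month"].map(MONTHS), "day": 1})
)

# Step 2: offset in days from the month start (weeks start on Monday).
offset = (
    (df["WeekOfMonth"] - 1) * 7
    + df["DayOfWeek"].map(WEEKDAYS)
    - month_start.dt.dayofweek
)

# Step 3: actual date, plus a flag for dates that fall outside the stated month.
df["Date"] = month_start + pd.to_timedelta(offset, unit="D")
df["Overflow"] = df["Date"].dt.month != month_start.dt.month
print(df)
```

Rows with Overflow set to True describe a week and weekday combination that does not exist in that month, so they probably deserve a manual check.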
I have a column with many dates: sample of the said list below
Dates
1 2019-02-01
2 2018-03-10
3 2019-08-01
4 2020-02-07
I would like to be able to input a date from any year and get its week number.
However, the fiscal year starts on Aug 1 of any given year.
I tried just shifting the date to Jan 1, but the shift is different every year due to leap years.
data['Dates'] = pd.to_datetime(data['Dates'])
data['Week'] = (data['Dates'] - timedelta(days=215)).week
print(data)
How can I get a result similar to the one below?
Dates Week
1 2019-02-01 27
2 2018-03-10 32
3 2019-08-01 1
4 2020-02-07 28
Note: the weeks are probably incorrect.
The other answer ignores the fiscal year part of the OP. I am leaving the fiscal year start date calc to the reader but this will calculate the week number (where Monday is the start of the week) from an arbitrary start date.
from dateutil import relativedelta
from datetime import date, datetime, timedelta
NEXT_MONDAY = relativedelta.relativedelta(weekday=relativedelta.MO)
LAST_MONDAY = relativedelta.relativedelta(weekday=relativedelta.MO(-1))
ONE_WEEK = timedelta(weeks=1)
def week_in_fiscal_year(d: date, fiscal_year_start: date) -> int:
    # MO matches the same day when it is already a Monday, so step one day
    # forward first; otherwise a fiscal year starting on a Monday would begin at week 2
    fy_week_2_monday = fiscal_year_start + timedelta(days=1) + NEXT_MONDAY
    if d < fy_week_2_monday:
        return 1
    else:
        # Monday on or before d
        cur_week_monday = d + LAST_MONDAY
        return int((cur_week_monday - fy_week_2_monday) / ONE_WEEK) + 2
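For the part left to the reader, here is a standard-library sketch that derives the fiscal year start from the date itself, assuming the fiscal year begins on Aug 1 and weeks start on Monday, with week 1 being the (possibly partial) week that contains Aug 1. The function names and the start_month parameter are made up for illustration.

```python
from datetime import date, timedelta


def fiscal_year_start(d: date, start_month: int = 8) -> date:
    """First day of the fiscal year containing d (assumed to begin on day 1 of start_month)."""
    year = d.year if d.month >= start_month else d.year - 1
    return date(year, start_month, 1)


def fiscal_week(d: date, start_month: int = 8) -> int:
    """Week number within the fiscal year; weeks start on Monday."""
    start = fiscal_year_start(d, start_month)
    # Monday on or before the fiscal year start, so week 1 may be a partial week.
    first_monday = start - timedelta(days=start.weekday())
    return (d - first_monday).days // 7 + 1


for d in [date(2019, 2, 1), date(2018, 3, 10), date(2019, 8, 1), date(2020, 2, 7)]:
    print(d, fiscal_week(d))
```

Because the start date is recomputed for each input, leap years need no special handling. On a pandas column, something like data['Dates'].dt.date.map(fiscal_week) should apply it row by row.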
adapted from this post
Convert it to a datetime, then call datetime.date(2010, 6, 16).strftime("%V")
You can also use isocalendar, which returns a tuple rather than the string above: datetime.date(2010, 6, 16).isocalendar()[1]
How to get week number in Python?
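A short illustration of the isocalendar call mentioned above, using only the standard library. The second date is an extra example added here to show that the ISO year can differ from the calendar year near January 1, which matters when grouping by week.

```python
import datetime

d = datetime.date(2010, 6, 16)
# isocalendar() gives (ISO year, ISO week number, ISO weekday with Monday = 1).
iso_year, iso_week, iso_weekday = d.isocalendar()
print(iso_year, iso_week, iso_weekday)

# Near the turn of the year, the ISO year and the calendar year can disagree.
boundary = tuple(datetime.date(2021, 1, 1).isocalendar())
print(boundary)
```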