Calculate the number of definite months in a period - python

The inputs are a start year, start month, end year, and end month (for example, May 2022 to June 2024). I need to count how many times each specific month occurs in this period (for example, how many Januarys, Marches, or Decembers it contains). How can I achieve this using Python?

Use date_range with DatetimeIndex.month_name and Index.value_counts:
import pandas as pd
s = pd.date_range('2022-05-01','2024-06-01', freq='MS').month_name().value_counts()
print (s)
June 3
May 3
April 2
March 2
July 2
December 2
November 2
October 2
February 2
January 2
September 2
August 2
dtype: int64
Finally, select a value by its index label in the Series s:
print (s['January'])
2
print (s['March'])
2
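If the inputs arrive as separate year and month integers, as in the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name month_counts and its parameter names are made up for illustration.

```python
import pandas as pd

def month_counts(start_year, start_month, end_year, end_month):
    """Count how often each calendar month occurs between two year-month pairs (inclusive)."""
    start = pd.Timestamp(year=start_year, month=start_month, day=1)
    end = pd.Timestamp(year=end_year, month=end_month, day=1)
    # freq="MS" yields the first day of every month from start to end
    return pd.date_range(start, end, freq="MS").month_name().value_counts()

counts = month_counts(2022, 5, 2024, 6)
print(counts["May"], counts["January"])
```

For May 2022 to June 2024 this gives 3 for May and 2 for January, matching the output above.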

pandas date_range is a good helper here
import pandas as pd
ym = pd.date_range('2022-05-01','2024-06-01', freq='MS').strftime("%Y-%b").to_list()
print(ym)
def count_month(ym_list, month):
    # count the 'YYYY-Mon' strings that contain the given month abbreviation
    return sum(month in s for s in ym_list)
print(count_month(ym, "May"))
print(count_month(ym, "Jan"))
and the output is
['2022-May', '2022-Jun', '2022-Jul', '2022-Aug', '2022-Sep', '2022-Oct', '2022-Nov', '2022-Dec', '2023-Jan', '2023-Feb', '2023-Mar', '2023-Apr', '2023-May', '2023-Jun', '2023-Jul', '2023-Aug', '2023-Sep', '2023-Oct', '2023-Nov', '2023-Dec', '2024-Jan', '2024-Feb', '2024-Mar', '2024-Apr', '2024-May', '2024-Jun']
3
2
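If pandas is not available, the same count can probably be done with the standard library alone. The sketch below is an assumption-level alternative, not part of the answers above; month_counts_stdlib is a hypothetical name. It turns each year-month pair into a running month index and counts the month names with collections.Counter.

```python
import calendar
from collections import Counter

def month_counts_stdlib(start_year, start_month, end_year, end_month):
    """Count month names in an inclusive year-month range without pandas."""
    # map (year, month) to a single running index so the range is easy to iterate
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    # i % 12 + 1 recovers the month number (1..12) from the running index
    return Counter(calendar.month_name[i % 12 + 1] for i in range(first, last + 1))

counts_stdlib = month_counts_stdlib(2022, 5, 2024, 6)
print(counts_stdlib["May"], counts_stdlib["January"])
```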

Related

Calculate the Number of Users at the Start of the Month

I have a table which looks like this:
ID  Start Date  End Date
1   01/01/2022  29/01/2022
2   03/01/2022
3   15/01/2022
4   01/02/2022  01/03/2022
5   01/03/2022  01/05/2022
6   01/04/2022
So, for every row I have the start date of the contract with the user and the end date. If the contract is still active, there is no end date.
I'm trying to get a table that looks like this:
Feb  Mar  Apr  Jun
3    3    4    3
Which counts the number of active users on the first day of the month.
What is the most efficient way to calculate this?
At the moment, the only idea that has come to mind is to use a scaffold table containing the dates I'm interested in (the first day of every month) and build the new table from that.
Is there a better way to solve this? I would love to find a more efficient approach, since I need to repeat exactly the same calculation for the number of users at the start of each week.
This might help:
import pandas as pd

# initializing dataframe
df = pd.DataFrame({'start':['01/01/2022','03/01/2022','15/01/2022','01/02/2022','01/03/2022','01/04/2022'],
                   'end':['29/01/2022','','','01/03/2022','01/05/2022','']})
# cleaning datetime (the empty ones are replaced with the max exit)
df['start'] = pd.to_datetime(df['start'], format='%d/%m/%Y')
df['end'] = pd.to_datetime(df['end'], format='%d/%m/%Y', errors='coerce')
df['end'] = df['end'].fillna(df['end'].max())
dt_range = pd.date_range(start=df.start.min(), end=df.end.max(), freq='MS')
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for dat in dt_range:
    # a user is active on `dat` if the contract started on or before it and ends on or after it
    rows.append({'month': dat.strftime('%B - %Y'),
                 'number': len(df[(df.start <= dat) & (df.end >= dat)])})
df2 = pd.DataFrame(rows, columns=['month', 'number'])
Output:
month number
0 January - 2022 1
1 February - 2022 3
2 March - 2022 4
3 April - 2022 4
4 May - 2022 4
Or, if you want the format as in your question:
df2.T
month January - 2022 February - 2022 March - 2022 April - 2022 May - 2022
number 1 3 4 4 4
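Since the question also asks about repeating the calculation per week, one possible way to avoid the Python loop is NumPy broadcasting over an arbitrary grid of dates. This is a sketch under assumptions: active_on is a hypothetical helper name, and a missing end date is treated as "still active" rather than being filled with the latest end date.

```python
import pandas as pd

def active_on(df, dates):
    """Count rows whose [start, end] interval covers each date in `dates`."""
    start = df["start"].to_numpy()[:, None]
    end = df["end"].to_numpy()[:, None]
    # a missing end date (NaT) means the contract is still running
    open_ended = df["end"].isna().to_numpy()[:, None]
    grid = dates.to_numpy()[None, :]
    # broadcasting builds a (contracts x dates) boolean matrix in one step
    active = (start <= grid) & (open_ended | (end >= grid))
    return pd.Series(active.sum(axis=0), index=dates)

df = pd.DataFrame({
    "start": pd.to_datetime(["01/01/2022", "03/01/2022", "15/01/2022",
                             "01/02/2022", "01/03/2022", "01/04/2022"], format="%d/%m/%Y"),
    "end": pd.to_datetime(["29/01/2022", None, None,
                           "01/03/2022", "01/05/2022", None], format="%d/%m/%Y"),
})
monthly = active_on(df, pd.date_range("2022-01-01", "2022-05-01", freq="MS"))
weekly = active_on(df, pd.date_range("2022-01-01", "2022-05-01", freq="W-MON"))
print(monthly)
```

Only the date grid changes between the monthly and weekly counts; the matrix needs memory proportional to contracts times dates, so very large inputs may need chunking.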

Updating Year in One Column Based on Month of Another Column that accounts for New Year

I have created this DataFrame:
agency coupon vintage Cbal Month CPR year month Month_Predicted_DT
0 FHLG 1.5 2021 70.090310 November 5.418937 2022 11 2022-11-01
1 FHLG 1.5 2021 70.090310 December 5.549916 2022 12 2022-12-01
2 FHLG 1.5 2021 70.090310 January 5.238943 2022 1 2022-01-01
3 FHLG 1.5 2020 52.414637 November 5.514456 2022 11 2022-11-01
4 FHLG 1.5 2020 52.414637 December 5.550490 2022 12 2022-12-01
5 FHLG 1.5 2020 52.414637 January 5.182304 2022 1 2022-01-01
Created from this original df:
agency coupon year Cbal November December January
0 FHLG 1.5 2021 70.090310 5.418937 5.549916 5.238943
1 FHLG 1.5 2020 52.414637 5.514456 5.550490 5.182304
2 FHLG 2.0 2022 44.598755 3.346706 3.715995 3.902644
3 FHLG 2.0 2021 472.209165 5.802857 5.899596 5.627774
4 FHLG 2.0 2020 269.761452 7.090993 7.091404 6.567561
Using this code:
import pandas as pd
from datetime import date

citi = pd.read_excel("Downloads/CITI_2022_05_22(5_22).xlsx")
#Extracting just the relevant months (M, M+1, M+2)
M = citi.columns[-6]
M_1 = citi.columns[-4]
M_2 = citi.columns[-2]
#Extracting just the relevant columns
cols = ['agency-term','coupon','year','Cbal',M,M_1,M_2]
citi = citi[cols]
#Reshaping from wide to long (citi_new must exist before columns are added to it)
citi_new = citi.set_index(cols[0:4]).stack().reset_index()
citi_new.rename(columns={"level_4": "Month", 0 : "CPR", "year" : "vintage"}, inplace = True)
todays_date = date.today()
current_year = todays_date.year
citi_new['year'] = current_year
citi_new['month'] = pd.to_datetime(citi_new.Month, format="%B").dt.month
citi_new['Month_Predicted_DT'] = pd.to_datetime(citi_new[['year', 'month']].assign(DAY=1))
For reference M is the current month, and M_1 and M_2 are month+1 and month+2.
My main question is that my solution for creating the 'Month_Predicted_DT' column only works if the months in question do not cross into the new year. If M == November or M == December, then the year in Month_Predicted_DT is not correct for January and/or February. For example, Month_Predicted_DT for January rows should be 2023-01-01, not 2022-01-01. The same would be true if M was December: then I would want the rows for January and February to be 2023-01-01 and 2023-02-01, respectively.
I have tried to come up with a workaround using df.iterrows or np.where but just can't really get a working solution.
You could try adding 12 months to any date that falls before the first day of the current month:
#get first day of the current month
start = pd.Timestamp.today().normalize().replace(day=1)
#convert month column to timestamps
dates = pd.to_datetime(df["Month"]+f"{start.year}", format="%B%Y")
#offset the year if the date is not in the next 3 months
df["Month_Predicted_DT"] = dates.where(dates>=start,dates+pd.DateOffset(months=12))
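To check the rollover without depending on today's date, the same logic can be run against a fixed start. This is a minimal sketch: predicted_dates is a hypothetical helper, and the start date 2022-11-01 is chosen only to reproduce the November case from the question.

```python
import pandas as pd

def predicted_dates(months: pd.Series, start: pd.Timestamp) -> pd.Series:
    """Attach a year to month names, rolling months before `start` into the next year."""
    # parse e.g. "January2022" as the first day of that month
    dates = pd.to_datetime(months + str(start.year), format="%B%Y")
    # keep dates on or after `start`; push earlier ones forward by 12 months
    return dates.where(dates >= start, dates + pd.DateOffset(months=12))

months = pd.Series(["November", "December", "January"])
out = predicted_dates(months, pd.Timestamp("2022-11-01"))
print(out)
```

With this fixed start, November and December stay in 2022 and January moves to 2023-01-01.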

Calculating calendar weeks from fiscal weeks

So I am really new to this and struggling with something, which I feel should be quite simple.
I have a Pandas Dataframe containing two columns: Fiscal Week (str) and Amount sold (int).
   Fiscal Week  Amount sold
0  2019031      24
1  2019041      47
2  2019221      34
3  2019231      46
4  2019241      35
My problem is the fiscal week column. It contains strings which describe the fiscal year and week. The fiscal year for this purpose starts on October 1st and ends on September 30th. So basically, 2019031 is the Monday (the 1 at the end) of the third week of October 2019, and 2019221 would be in the 2nd week of March 2020.
The issue is that I want to turn this data into timeseries later. But I can't do that with the data in string format - I need it to be in date time format.
I actually added the 1s at the end of all these strings using
df['Fiscal Week']= df['Fiscal Week'].map('{}1'.format)
so that I can then turn it into a proper date:
df['Fiscal Week'] = pd.to_datetime(df['Fiscal Week'], format="%Y%W%w")
as I couldn't figure out how to do it with just the weeks and no day defined.
This, of course, returns the following:
   Fiscal Week  Amount sold
0  2019-01-21   24
1  2019-01-28   47
2  2019-06-03   34
3  2019-06-10   46
4  2019-06-17   35
As expected, this is clearly not what I need, as according to the definition of the fiscal year week 1 is not January at all but rather October.
Is there some simple solution to get the dates to what they are actually supposed to be?
Ideally I would like the final format to be e.g. 2019-03 for the first entry. So basically exactly like the string but in some kind of date format, that I can then work with later on. Alternatively, calendar weeks would also be fine.
Assuming you have a data frame with fiscal dates of the form 'YYYYWW', where YYYY is the calendar year in which the fiscal year starts and WW is the number of weeks into the fiscal year (that is, before the extra 1 is appended), you can convert to calendar dates as follows:
import pandas as pd

def getCalendarDate(fy_date: str):
    f_year = fy_date[0:4]
    f_week = fy_date[4:]
    # the fiscal year starts on October 1st of f_year
    fys = pd.to_datetime(f'{f_year}/10/01', format='%Y/%m/%d')
    return fys + pd.to_timedelta(int(f_week), "W")
You can then use this function to create the column of calendar dates as follows:
df['Calendar Date'] = list(getCalendarDate(x) for x in df['Fiscal Week'].to_list())
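For larger frames, the same arithmetic can likely be done on the whole column at once. The following is a sketch, not the answer's own method: fiscal_to_calendar is a hypothetical name, and it assumes the fiscal year starts on October 1st and that characters 5 and 6 of each string hold the week number.

```python
import pandas as pd

def fiscal_to_calendar(fiscal: pd.Series) -> pd.Series:
    """Vectorised conversion of 'YYYYWW' strings to calendar dates."""
    # October 1st of the year in which the fiscal year starts
    year_start = pd.to_datetime(fiscal.str[:4] + "-10-01", format="%Y-%m-%d")
    # only the two week digits are read, so a trailing day digit is ignored
    weeks = fiscal.str[4:6].astype(int)
    # fiscal-year start plus WW weeks, expressed in days
    return year_start + pd.to_timedelta(weeks * 7, unit="D")

sample = pd.Series(["201903", "201922"])
result = fiscal_to_calendar(sample)
print(result)
```

Because the slice stops after the week digits, strings such as 2019031 from the question are read as week 03. If week 01 should map to October 1st itself, subtract one week; which convention applies is an assumption to confirm against the fiscal calendar in use.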

Sort pandas dataframe rows according to a string value column

I have the following dataframe :
month price
0 April 102.478015
1 August 94.868053
2 December 97.278205
3 February 100.114510
4 January 99.419109
5 July 93.402928
6 June 96.114224
7 March 101.297762
8 May 102.905340
9 November 97.952169
10 October 95.606478
11 September 94.226803
I would like to have the months in calendar order (January in the first row through December in the 12th row). How can I do this?
If necessary, you can copy this dataframe and then execute
pd.read_clipboard(sep=r'\s\s+')
to load the dataframe into your Jupyter notebook.
Convert the values to an ordered categorical, so that DataFrame.sort_values can be used:
cats = ['January','February','March','April','May','June',
        'July','August','September','October','November','December']
df['month'] = pd.CategoricalIndex(df['month'], ordered=True, categories=cats)
#alternative
#df['month'] = pd.Categorical(df['month'], ordered=True, categories=cats)
df = df.sort_values('month')
print (df)
month price
4 January 99.419109
3 February 100.114510
7 March 101.297762
0 April 102.478015
8 May 102.905340
6 June 96.114224
5 July 93.402928
1 August 94.868053
11 September 94.226803
10 October 95.606478
9 November 97.952169
2 December 97.278205
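If the column should stay as plain strings, a possible alternative is the key argument of sort_values (available since pandas 1.1). This is a sketch rather than part of the answer above; it uses a shortened copy of the question's data and assumes English month names.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["April", "August", "December", "February", "January", "July"],
    "price": [102.478015, 94.868053, 97.278205, 100.114510, 99.419109, 93.402928],
})
# key receives the whole column and must return a sortable Series of the same length;
# parsing with "%B" turns each month name into a date whose .dt.month is 1..12
ordered = df.sort_values("month", key=lambda s: pd.to_datetime(s, format="%B").dt.month)
print(ordered)
```

The categorical approach is probably preferable when the order is needed repeatedly (grouping, plotting), since the ordering is then stored with the column.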

Look Up in Python DataFrames?

I have a dataframe df1:
Month
1
3
March
April
2
4
5
I have another dataframe df2:
Month Name
1 January
2 February
3 March
4 April
5 May
If I want to replace the integer values of df1 with the corresponding name from df2, what kind of lookup function can I use?
I want to end up with this as my df1:
Month
January
March
March
April
February
May
Use replace with a dictionary built from df2:
df1.replace(dict(zip(df2.Month.astype(str),df2.Name)))
Out[76]:
Month
0 January
1 March
2 March
3 April
4 February
5 April
6 May
You can use pd.Series.map and then fillna. Just be careful to map either strings to strings or, as here, numeric to numeric:
month_name = df2.set_index('Month')['Name']
df1['Month'] = pd.to_numeric(df1['Month'], errors='coerce').map(month_name)\
                 .fillna(df1['Month'])
print(df1)
Month
0 January
1 March
2 March
3 April
4 February
5 April
6 May
You can also use pd.Series.replace, but this is often inefficient.
One alternative is to use map with a function:
def repl(x, lookup=dict(zip(df2.Month.astype(str), df2.Name))):
    # fall back to the original value when it is not a key in the lookup
    return lookup.get(x, x)
df['Month'] = df['Month'].map(repl)
print(df)
Output
Month
0 January
1 February
2 March
3 April
4 May
Use map with a Series; you just need to make sure the dtypes match:
mapper = df2.set_index(df2['Month'].astype(str))['Name']
df1['Month'].map(mapper).fillna(df1['Month'])
Output:
0 January
1 March
2 March
3 April
4 February
5 April
6 May
Name: Month, dtype: object
