I am trying to convert calendar dates to financial years. I have a dataframe like the one below. Each ID has multiple records, and some months may be missing: for example, ID 1 has no record for month 3 (March).
df:
ID Callender_Date
1 01-01-2022
1 01-02-2022
1 01-04-2022
1 01-05-2022
1 01-05-2022
2 01-01-2022
2 01-07-2023
The financial year runs from July to June. For example, FY 2022 means:
July 2021 - 1st month in the financial year
August 2021 - 2nd month in the financial year
September 2021 - 3rd month in the financial year
October 2021 - 4th month in the financial year
November 2021 - 5th month in the financial year
December 2021 - 6th month in the financial year
January 2022 - 7th month in the financial year
February 2022 - 8th month in the financial year
March 2022 - 9th month in the financial year
April 2022 - 10th month in the financial year
May 2022 - 11th month in the financial year
June 2022 - 12th month in the financial year
Expected output, with the calendar date converted to financial year and fiscal month:
ID Callender_Date Financial_Year Fiscal_Month
1 01-01-2022 2022 7
1 01-02-2022 2022 8
1 01-04-2022 2022 10
1 01-05-2022 2022 11
1 01-06-2022 2022 12
2 01-01-2021 2021 7
2 01-07-2021 2022 1
I tried the code below, which I found in another question:
df['Callender_Date '] = df['Callender_Date '].asfreq('J-July') - 1
Try:
# convert the column to datetime (if not already):
df['Callender_Date'] = pd.to_datetime(df['Callender_Date'], dayfirst=True)
# 'Q-JUN' means quarterly periods in a fiscal year that ends in June,
# so July 2021 to June 2022 all get qyear 2022
df['Financial_Year'] = df['Callender_Date'].dt.to_period('Q-JUN').dt.qyear
# shifting by 6 months maps July -> 1, August -> 2, ..., June -> 12
df['Fiscal_Month'] = (df['Callender_Date'] + pd.DateOffset(months=6)).dt.month
print(df)
Prints:
ID Callender_Date Financial_Year Fiscal_Month
0 1 2022-01-01 2022 7
1 1 2022-02-01 2022 8
2 1 2022-04-01 2022 10
3 1 2022-05-01 2022 11
4 1 2022-06-01 2022 12
5 2 2021-01-01 2021 7
6 2 2021-07-01 2022 1
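As a cross-check, the same two columns can be derived with plain integer arithmetic on the year and month, with no period frequency involved. This is only a sketch; the sample rows are taken from the expected output in the question, and the July start is hard-coded as an assumption.

```python
import pandas as pd

# Sample rows copied from the expected output in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2],
    "Callender_Date": ["01-01-2022", "01-02-2022", "01-04-2022", "01-05-2022",
                       "01-06-2022", "01-01-2021", "01-07-2021"],
})
df["Callender_Date"] = pd.to_datetime(df["Callender_Date"], format="%d-%m-%Y")

month = df["Callender_Date"].dt.month
year = df["Callender_Date"].dt.year

# July-December belong to the financial year that ends the following June.
df["Financial_Year"] = year + (month >= 7).astype(int)
# July -> 1, August -> 2, ..., June -> 12.
df["Fiscal_Month"] = (month - 7) % 12 + 1

print(df)
```

If the financial year started in a different month, the two occurrences of 7 would be the only values to change.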
I have a table
Year Week Sales
2021 47 56
2021 48 5
2021 49 4
2021 50 6
2021 51 7
2021 52 10
2022 1 2
2021 2 3
I want to get all data starting from week 49 of 2021. However, if I use the following slice:
table[(table.Year >= 2021) & (table.Week >= 49)]
I only get the rows with week >= 49, for every year from 2021 on. How can I also include weeks 1-48 of 2022 in the slice without creating a new column? That is, how do I get all data from the table starting from year 2021, week 49 (2021: weeks 49-52, 2022: weeks 1-52, 2023: weeks 1-52, etc.)?
You're missing an OR. IIUC, you want the data from week 49 of 2021 onward. As a logical expression: the year is greater than 2021, OR the year is 2021 and the week is greater than or equal to 49:
out = table[((table.Year == 2021) & (table.Week >= 49)) | (table.Year > 2021)]
Output:
Year Week Sales
2 2021 49 4
3 2021 50 6
4 2021 51 7
5 2021 52 10
6 2022 1 2
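Another way to express the same condition, still without adding a column to the table, is to compare a combined year-week key. The sketch below assumes week numbers never exceed 99; from_week is a hypothetical helper name used only for illustration.

```python
import pandas as pd

table = pd.DataFrame({
    "Year": [2021, 2021, 2021, 2021, 2021, 2021, 2022, 2021],
    "Week": [47, 48, 49, 50, 51, 52, 1, 2],
    "Sales": [56, 5, 4, 6, 7, 10, 2, 3],
})

def from_week(frame, year, week):
    """Return the rows at or after (year, week). Hypothetical helper."""
    # Year * 100 + Week sorts the same way as the (Year, Week) pair,
    # as long as Week is below 100. The key is a temporary Series,
    # not a new column in the frame.
    key = frame["Year"] * 100 + frame["Week"]
    return frame[key >= year * 100 + week]

out = from_week(table, 2021, 49)
print(out)
```

The explicit OR condition is easier to read for a one-off filter; the combined key is convenient when the start point is a parameter.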
My dataframe contains zipcodes, months and the number of purchases up until that month.
However, some months are missing for some zipcodes. As you can see in the example below, the months March and April are not recorded for zipcode '2400'.
Zipcode Date Cumulative purchases
0 9999 December 2018 2
1 9999 January 2019 2
2 9999 February 2019 2
3 9999 March 2019 3
4 9999 April 2019 4
5 2400 December 2018 2
6 2400 January 2019 3
7 2400 February 2019 4
etc
I would like to add records for these missing months, repeating the last known cumulative purchases.
Ideally, it would look like this:
Zipcode Date Cumulative purchases
0 9999 December 2018 2
1 9999 January 2019 2
2 9999 February 2019 2
3 9999 March 2019 3
4 9999 April 2019 4
5 2400 December 2018 2
6 2400 January 2019 3
7 2400 February 2019 4
8 2400 March 2019 4
9 2400 April 2019 4
etc
You could use the complete function from pyjanitor to expose the missing Zipcode/Date combinations, then forward fill:
# pip install pyjanitor
import pandas as pd
import janitor as jn
df.complete('Zipcode', 'Date').ffill()
Zipcode Date Cumulative purchases
0 9999 December 2018 2.0
1 9999 January 2019 2.0
2 9999 February 2019 2.0
3 9999 March 2019 3.0
4 9999 April 2019 4.0
5 2400 December 2018 2.0
6 2400 January 2019 3.0
7 2400 February 2019 4.0
8 2400 March 2019 4.0
9 2400 April 2019 4.0
Here is a slightly modified version of the previous answer: reset_index is removed, the result is reshaped with Series.unstack, the missing months up to until are added with DataFrame.reindex, missing values are forward filled, and the result is reshaped back with DataFrame.stack:
df['Date'] = pd.to_datetime(df['Date'])
df = (df.set_index('Date')
.groupby('Zipcode', sort=False)
.resample('MS')['Purchase'].sum()
.groupby(level=0)
.cumsum()
.unstack()
)
until = pd.to_datetime('2019-04')
df = (df.reindex(pd.date_range(df.columns.min(), until, freq='MS', name='Date'), axis=1)
.ffill(axis=1)
.stack()
.astype(int)
.reset_index(name='Cumulative purchases'))
df['Date'] = df['Date'].dt.strftime('%B %Y')
print (df)
Zipcode Date Cumulative purchases
0 9999 December 2018 2
1 9999 January 2019 2
2 9999 February 2019 2
3 9999 March 2019 3
4 9999 April 2019 4
5 2400 December 2018 2
6 2400 January 2019 3
7 2400 February 2019 4
8 2400 March 2019 4
9 2400 April 2019 4
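When the dataframe already holds the cumulative values, as in the sample above, a plain-pandas variant is to reindex on every Zipcode/month combination and forward fill within each zipcode. This is a sketch under the assumption that every zipcode has a record for the first month; the names months, full_index and filled are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Zipcode": [9999] * 5 + [2400] * 3,
    "Date": ["December 2018", "January 2019", "February 2019", "March 2019",
             "April 2019", "December 2018", "January 2019", "February 2019"],
    "Cumulative purchases": [2, 2, 2, 3, 4, 2, 3, 4],
})
df["Date"] = pd.to_datetime(df["Date"], format="%B %Y")

# Every month between the earliest and latest date in the data.
months = pd.date_range(df["Date"].min(), df["Date"].max(), freq="MS")
# All Zipcode x month combinations, zipcodes in order of first appearance.
full_index = pd.MultiIndex.from_product(
    [df["Zipcode"].unique(), months], names=["Zipcode", "Date"]
)

filled = (
    df.set_index(["Zipcode", "Date"])
      .reindex(full_index)                     # missing months become NaN
      .groupby(level="Zipcode", sort=False)
      .ffill()                                 # repeat the last known value per zipcode
      .astype(int)
      .reset_index()
)
filled["Date"] = filled["Date"].dt.strftime("%B %Y")
print(filled)
```

Forward filling inside the groupby keeps a value from one zipcode from leaking into the next.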
I have a dataset similar to the following:
SNAPSHOT_DATE DEPLOYMENT_TYPE FORECAST_YEAR TOTAL_WIDGETS
1/1/20 1 2020 206457
1/1/20 1 2021 70571
1/1/20 1 2022 46918
1/1/20 1 2023 36492
1/1/20 1 2024 0
1/1/20 1 2025 0
2/1/20 1 2020 207177
2/1/20 1 2021 71947
2/1/20 1 2022 46918
2/1/20 1 2023 36492
2/1/20 1 2024 0
2/1/20 1 2025 0
3/1/20 1 2020 242758
3/1/20 1 2021 102739
3/1/20 1 2022 43174
3/1/20 1 2023 32956
3/1/20 1 2024 0
3/1/20 1 2025 0
1/1/20 2 2020 286616
1/1/20 2 2021 134276
1/1/20 2 2022 87674
1/1/20 2 2023 240
1/1/20 2 2024 0
1/1/20 2 2025 0
2/1/20 2 2020 308145
2/1/20 2 2021 132996
2/1/20 2 2022 87674
2/1/20 2 2023 240
2/1/20 2 2024 0
2/1/20 2 2025 0
3/1/20 2 2020 218761
3/1/20 2 2021 178594
3/1/20 2 2022 87674
3/1/20 2 2023 240
3/1/20 2 2024 0
3/1/20 2 2025 0
For each deployment type, I want to plot Total Widgets on the y axis and the months (Jan 1 '20 - Dec 1 '20) on the x axis, with a separate line for each forecast year 2020-2025. How can I best accomplish this? My first thought was to filter each deployment type by date range and forecast year like this:
forecastchanges_widgets2020 = data.loc[((data['DEPLOYMENT_TYPE'] =='1') & (data['Date'] >= '2020-01-01') & (data['Date'] <= '2020-12-01')) & (data['FORECAST_YEAR'] =='2020')]
and plot each line, but that would mean repeating the filter for every year within each deployment type. Is there a better way to achieve the desired plot?
This question and its answers do not match my requirements, because I need to separate each deployment type into its own plot and then plot the 'TOTAL_WIDGETS' value for each year across the month dates on the x axis.
For this case, sns.relplot will work.
seaborn is a high-level API for matplotlib.
Given your dataframe data:
The sample data only contains rows where the snapshot year is 2020. For the full dataset, there will be a row of plots for each year in 'Snapshot_Year'.
Since the x-axis will be different for each row of plots, facet_kws={'sharex': False} is used, so xlim can scale based on the date range for the year.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# convert SNAPSHOT_DATE to a datetime dtype
data.SNAPSHOT_DATE = pd.to_datetime(data.SNAPSHOT_DATE)
# add the snapshot year as a new column
data.insert(1, 'Snapshot_Year', data.SNAPSHOT_DATE.dt.year)
# plot the data: one column of plots per deployment type, one line per forecast year
g = sns.relplot(data=data, col='DEPLOYMENT_TYPE', row='Snapshot_Year', x='SNAPSHOT_DATE', y='TOTAL_WIDGETS',
                hue='FORECAST_YEAR', kind='line', facet_kws={'sharex': False})
g.set_xticklabels(rotation=90)
plt.tight_layout()
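The reshaping that relplot does internally can also be done by hand, which avoids repeating a filter per year. The sketch below uses only a subset of the sample rows (forecast years 2020 and 2021) and builds one wide table per deployment type, with snapshot dates as rows and forecast years as columns; the dict name wide is an illustration, not part of the answer above.

```python
import pandas as pd

# Subset of the sample data: forecast years 2020 and 2021 only.
data = pd.DataFrame({
    "SNAPSHOT_DATE": ["1/1/20", "1/1/20", "2/1/20", "2/1/20", "3/1/20", "3/1/20"] * 2,
    "DEPLOYMENT_TYPE": [1] * 6 + [2] * 6,
    "FORECAST_YEAR": [2020, 2021] * 6,
    "TOTAL_WIDGETS": [206457, 70571, 207177, 71947, 242758, 102739,
                      286616, 134276, 308145, 132996, 218761, 178594],
})
data["SNAPSHOT_DATE"] = pd.to_datetime(data["SNAPSHOT_DATE"], format="%m/%d/%y")

# One wide frame per deployment type: rows are snapshot dates,
# columns are forecast years, values are TOTAL_WIDGETS.
wide = {
    dep: grp.pivot(index="SNAPSHOT_DATE", columns="FORECAST_YEAR", values="TOTAL_WIDGETS")
    for dep, grp in data.groupby("DEPLOYMENT_TYPE")
}

for dep, frame in wide.items():
    print(f"DEPLOYMENT_TYPE {dep}")
    print(frame)
    # With matplotlib installed, frame.plot() would draw one line per forecast year.
```

Each wide frame has exactly the shape a line plot expects, so one plot call per deployment type is enough.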
I want to remove certain keywords or substrings from a column of a pandas dataframe.
The dataframe df looks like this:
YEAR WEEK
2019 WK-01
2019 WK-02
2019 WK-03
2019 WK-14
2019 WK-25
2020 WK-06
2020 WK-07
I would like to remove WK- and the leading 0 from the WEEK column so that my output looks like this:
YEAR WEEK
2019 1
2019 2
2019 3
2019 14
2019 25
2020 6
2020 7
You can try:
df['WEEK'] = df['WEEK'].str.extract(r'(\d+)$', expand=False).astype(int)
Output:
YEAR WEEK
0 2019 1
1 2019 2
2 2019 3
3 2019 14
4 2019 25
5 2020 6
6 2020 7
Shave off the first three characters and convert to int.
df['WEEK'] = df['WEEK'].str[3:].astype(int)
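The two approaches can be compared side by side on the sample data. This is a small sketch; by_regex and by_slice are illustrative names. The regex version only relies on the value ending in digits, while the slice version assumes the prefix is always exactly three characters long.

```python
import pandas as pd

df = pd.DataFrame({
    "YEAR": [2019, 2019, 2019, 2019, 2019, 2020, 2020],
    "WEEK": ["WK-01", "WK-02", "WK-03", "WK-14", "WK-25", "WK-06", "WK-07"],
})

# Approach 1: capture the trailing digits; int() drops the leading zero.
by_regex = df["WEEK"].str.extract(r"(\d+)$", expand=False).astype(int)
# Approach 2: drop the fixed 3-character prefix "WK-".
by_slice = df["WEEK"].str[3:].astype(int)

print(by_regex.tolist())
print(by_regex.equals(by_slice))
```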
I have a dataset that looks like below:
month year value
1 2019 20
2 2019 13
3 2019 10
4 2019 20
5 2019 13
6 2019 10
7 2019 20
8 2019 13
9 2019 10
10 2019 20
11 2019 13
12 2019 10
1 2020 20
2 2020 13
3 2020 10
4 2020 40
Please assume that each month and year occurs multiple times and that there are many more columns. I want to create multiple dataframes, one per 6-month window, without any aggregation.
Each partition should contain the data matching one of the criteria below. Please help me do this with pandas. I know the naive way is to select each dataframe manually with conditions, but I guess there is a more efficient way to do this in one go.
month 1-6 year 2019
month 2-7 year 2019
month 3-8 year 2019
month 4-9 year 2019
month 5-10 year 2019
month 6-11 year 2019
month 7-12 year 2019
month 8-1 year 2019,2020
month 9-2 year 2019,2020
month 10-3 year 2019,2020
month 11-4 year 2019,2020
What I have tried so far:
for i, j in zip(range(1,12), range(6,13)):
    print(i, j)  # this is for 2019
I can take i and j, plug them in as the month bounds, and repeat the same for 2020 as well. But there must be a better way that makes it easy to create a list of dataframes.
With a datetime index and pd.Grouper, you can proceed as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(12, 3),
                  index=pd.date_range(pd.Timestamp.now(), periods=12),
                  )
# group the rows into consecutive 6-month bins
df_grouped = df.groupby(pd.Grouper(freq="6M"))
# one dataframe per bin
[df_grouped.get_group(x) for x in df_grouped.groups]
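Note that pd.Grouper produces consecutive, non-overlapping bins, whereas the question lists overlapping windows (1-6, 2-7, and so on). One way to get those is to turn year and month into a single running month number and slice on it. This is only a sketch on the sample data; ordinal, window and frames are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "month": list(range(1, 13)) + [1, 2, 3, 4],
    "year": [2019] * 12 + [2020] * 4,
    "value": [20, 13, 10] * 5 + [40],
})

# Running month number, so that December 2019 is followed by January 2020.
ordinal = df["year"] * 12 + (df["month"] - 1)
window = 6

first, last = int(ordinal.min()), int(ordinal.max())
# One dataframe per full 6-month window, sliding by one month; no aggregation.
frames = [
    df[(ordinal >= start) & (ordinal < start + window)]
    for start in range(first, last - window + 2)
]

print(len(frames))
print(frames[-1])
```

Because the mask is built from the columns rather than from row positions, repeated month/year rows and extra columns are carried along unchanged.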