I have a dataset that looks like this:
month year value
1 2019 20
2 2019 13
3 2019 10
4 2019 20
5 2019 13
6 2019 10
7 2019 20
8 2019 13
9 2019 10
10 2019 20
11 2019 13
12 2019 10
1 2020 20
2 2020 13
3 2020 10
4 2020 40
Assume that each month and year combination occurs multiple times and that there are many more columns. I want to create multiple dataframes, one per 6-month window, without any aggregation.
The partitioned data should follow the criteria below. Please help me do this with pandas. I know the naive way is to select each dataframe manually with conditions, but I guess there is a more efficient way to do this in one go.
month 1-6 year 2019
month 2-7 year 2019
month 3-8 year 2019
month 4-9 year 2019
month 5-10 year 2019
month 6-11 year 2019
month 7-12 year 2019
month 8-1 year 2019,2020
month 9-2 year 2019,2020
month 10-3 year 2019,2020
month 11-4 year 2019,2020
What I have tried so far:
for i, j in zip(range(1,12), range(6,13)):
    print(i,j) # this is for 2019
I can take these i and j values, plug them in as months, and repeat the same for 2020. But there must be a better way that makes it easy to create a list of dataframes.
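With a datetime index and pd.Grouper, you can proceed as follows:
import numpy as np
import pandas as pd

# 12 monthly rows, so that the 6-month bins are visible (daily rows would span only 12 days)
df = pd.DataFrame(np.random.randn(12, 3),
                  index=pd.date_range("2019-01-01", periods=12, freq="MS"))
# bins are consecutive, not overlapping; pandas >= 2.2 prefers the alias "6ME"
df_grouped = df.groupby(pd.Grouper(freq="6M"))
[df_grouped.get_group(x) for x in df_grouped.groups]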
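Note that pd.Grouper produces consecutive, non-overlapping bins, whereas the question asks for overlapping windows that slide forward one month at a time. A minimal sketch of one way to build those, assuming the month and year columns from the question are integers; month_idx and frames are names made up for this illustration:

```python
import pandas as pd

# Sample data shaped like the question's (one row per month here; duplicates work too).
df = pd.DataFrame({
    "month": list(range(1, 13)) + [1, 2, 3, 4],
    "year": [2019] * 12 + [2020] * 4,
    "value": [20, 13, 10] * 5 + [40],
})

# Hypothetical helper: a running month counter, so windows can cross the year boundary.
month_idx = df["year"] * 12 + (df["month"] - 1)

window = 6
first, last = int(month_idx.min()), int(month_idx.max())

# One unaggregated DataFrame per full 6-month window, sliding by one month.
frames = [
    df[month_idx.between(start, start + window - 1)]
    for start in range(first, last - window + 2)
]

print(len(frames))
print(frames[-1])
```

Because the selection is a boolean mask, every matching row and every other column is kept, so this should carry over to data with repeated months and more columns.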
I am trying to convert calendar years to financial years. I have a dataframe as below. Each ID has multiple records, and some months may be missing; for example, month 3 is missing between the second and third rows.
df:
ID Callender_Date
1 01-01-2022
1 01-02-2022
1 01-04-2022
1 01-05-2022
1 01-05-2022
2 01-01-2022
2 01-07-2023
The financial year runs from July to June, e.g. FY 2022 means:
July 2021 - 1st month of the financial year
August 2021 - 2nd month of the financial year
September 2021 - 3rd month of the financial year
October 2021 - 4th month of the financial year
November 2021 - 5th month of the financial year
December 2021 - 6th month of the financial year
January 2022 - 7th month of the financial year
February 2022 - 8th month of the financial year
March 2022 - 9th month of the financial year
April 2022 - 10th month of the financial year
May 2022 - 11th month of the financial year
June 2022 - 12th month of the financial year
Expected output, converting calendar year to financial year:
ID Callender_Date Financial_Year Fiscal_Month
1 01-01-2022 2022 7
1 01-02-2022 2022 8
1 01-04-2022 2022 10
1 01-05-2022 2022 11
1 01-06-2022 2022 12
2 01-01-2021 2021 7
2 01-07-2021 2022 1
I tried the code below, which I found in another question:
df['Callender_Date '] = df['Callender_Date '].asfreq('J-July') - 1
Try:
# convert the column to datetime (if not already):
df['Callender_Date'] = pd.to_datetime(df['Callender_Date'], dayfirst=True)
# 'Q-JUN' means the fiscal year ends in June, so July 2021 falls in FY 2022
df['Financial_Year'] = df['Callender_Date'].dt.to_period('Q-JUN').dt.qyear
# shifting by 6 months maps July -> 1, ..., June -> 12
df['Fiscal_Month'] = (df['Callender_Date'] + pd.DateOffset(months=6)).dt.month
print(df)
Prints:
ID Callender_Date Financial_Year Fiscal_Month
0 1 2022-01-01 2022 7
1 1 2022-02-01 2022 8
2 1 2022-04-01 2022 10
3 1 2022-05-01 2022 11
4 1 2022-06-01 2022 12
5 2 2021-01-01 2021 7
6 2 2021-07-01 2022 1
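The same mapping can also be written with plain integer arithmetic, which may be easier to adapt to other fiscal-year start months. A sketch, assuming a July start and the column names used in the question; FY_START is a name introduced only for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2],
    "Callender_Date": ["01-01-2022", "01-02-2022", "01-04-2022", "01-05-2022",
                       "01-06-2022", "01-01-2021", "01-07-2021"],
})
df["Callender_Date"] = pd.to_datetime(df["Callender_Date"], format="%d-%m-%Y")

FY_START = 7  # assumed: the financial year starts in July

month = df["Callender_Date"].dt.month
year = df["Callender_Date"].dt.year

# Months from July onward belong to the financial year that ends the following June.
df["Financial_Year"] = year + (month >= FY_START).astype(int)
# July -> 1, August -> 2, ..., June -> 12
df["Fiscal_Month"] = (month - FY_START) % 12 + 1

print(df)
```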
I have a table
Year Week Sales
2021 47 56
2021 48 5
2021 49 4
2021 50 6
2021 51 7
2021 52 10
2022 1 2
2021 2 3
I want to get all data from week 49 of 2021 onward. However, if I use the following slice:
table[(table.Year >= 2021) & (table.Week >= 49)]
I get only the weeks >= 49 for every year from 2021 on. How can I include weeks 1-48 of 2022 in the slice without creating a new column? That is, how do I get all data from the table starting from year 2021, week 49 (2021: weeks 49-52, 2022: weeks 1-52, 2023: weeks 1-52, etc.)?
You're missing an OR. IIUC, you want the data from week 49 of 2021 onward. As a logical expression: the year is greater than 2021, OR the year is 2021 and the week is at least 49:
out = table[((table.Year == 2021) & (table.Week >= 49)) | (table.Year > 2021)]
Output:
Year Week Sales
2 2021 49 4
3 2021 50 6
4 2021 51 7
5 2021 52 10
6 2022 1 2
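An equivalent way to express the same filter is to compare a combined year-week key computed inline, so no column is added to the table. A sketch, assuming week numbers never exceed 99:

```python
import pandas as pd

table = pd.DataFrame({
    "Year":  [2021, 2021, 2021, 2021, 2021, 2021, 2022, 2021],
    "Week":  [47, 48, 49, 50, 51, 52, 1, 2],
    "Sales": [56, 5, 4, 6, 7, 10, 2, 3],
})

# Year * 100 + Week orders rows chronologically, e.g. 2021 week 49 -> 202149.
out = table[table.Year * 100 + table.Week >= 2021 * 100 + 49]
print(out)
```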
I have a table of sales from different companies.
company_name sales year
A 200 2019
A 100 2018
A 30 2017
B 15 2019
B 30 2018
B 45 2017
Now I want to add the previous year's sales to the same row, like this:
company_name sales year previous_sales
A 200 2019 100
A 100 2018 30
A 30 2017 NaN
B 15 2019 30
B 30 2018 45
B 45 2017 NaN
I tried the following code, but it did not give the right result:
df["previous_sales"] = df.groupby(['company_name', 'year'])['sales'].shift()
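Grouping by both company_name and year puts every row in its own group, so shift() has nothing to shift and returns only NaN. One possible fix, sketched under the assumption that each company has exactly one row per year with no gaps, is to group by company_name alone after sorting by year:

```python
import pandas as pd

df = pd.DataFrame({
    "company_name": ["A", "A", "A", "B", "B", "B"],
    "sales": [200, 100, 30, 15, 30, 45],
    "year": [2019, 2018, 2017, 2019, 2018, 2017],
})

# Sort so that, within each company, the previous year is the previous row,
# then shift within the company. The result is aligned back on the original index.
df["previous_sales"] = (
    df.sort_values(["company_name", "year"])
      .groupby("company_name")["sales"]
      .shift()
)
print(df)
```

If years can be missing for some companies, shift() would pair a row with the wrong year; merging the frame with a copy whose year is incremented by one would be safer in that case.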
I want to remove certain keywords or substrings from a column of a pandas dataframe.
The dataframe df looks like this:
YEAR WEEK
2019 WK-01
2019 WK-02
2019 WK-03
2019 WK-14
2019 WK-25
2020 WK-06
2020 WK-07
I would like to remove WK- and the leading 0 from the WEEK column so that my output looks like this:
YEAR WEEK
2019 1
2019 2
2019 3
2019 14
2019 25
2020 6
2020 7
You can try:
# raw string avoids the invalid-escape warning; expand=False returns a Series
df['WEEK'] = df['WEEK'].str.extract(r'(\d+)$', expand=False).astype(int)
Output:
YEAR WEEK
0 2019 1
1 2019 2
2 2019 3
3 2019 14
4 2019 25
5 2020 6
6 2020 7
Shave off the first three characters and convert to int.
df['WEEK'] = df['WEEK'].str[3:].astype(int)
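The two answers differ in what they assume: slicing relies on a fixed three-character prefix, while the regex only relies on the value ending in digits. A quick check on the sample data, as a sketch (by_regex and by_slice are names used only here):

```python
import pandas as pd

df = pd.DataFrame({
    "YEAR": [2019, 2019, 2019, 2019, 2019, 2020, 2020],
    "WEEK": ["WK-01", "WK-02", "WK-03", "WK-14", "WK-25", "WK-06", "WK-07"],
})

# Regex: capture the trailing digits; the int conversion drops the leading zero.
by_regex = df["WEEK"].str.extract(r"(\d+)$", expand=False).astype(int)
# Slice: drop the fixed "WK-" prefix, then convert.
by_slice = df["WEEK"].str[3:].astype(int)

df["WEEK"] = by_regex
print(df)
```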
I have a pandas dataframe like the following:
Customer Id year
0 1510220024 2017
1 1510270013 2017
2 1511160047 2017
3 1512100014 2017
4 1603180006 2017
5 1605030030 2017
6 1605160013 2017
7 1606060008 2017
8 1510220024 2018
9 1606270014 2017
10 1608080011 2017
11 1608090002 2017
12 1511160047 2018
13 1606270014 2018
And I want to build the following matrix from the above dataframe:
2017 2018
2017 11 3
2018 3 3
This matrix says that there were 11 customers in total in 2017 and that three of them also appeared in 2018, and so on. In reality, I have 7 years of data, so it would be a 7x7 matrix. I have been struggling with this for a while but can't get it right.
merge + crosstab:
# self-merge pairs every year a customer appears in with every other such year
m = df.merge(df, on='Customer Id')
pd.crosstab(m.year_x, m.year_y)
year_y 2017 2018
year_x
2017 11 3
2018 3 3
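A self-contained sketch of the same approach on the sample data. The drop_duplicates call is an addition, not part of the answer above; it guards against the same customer appearing twice in one year, which would otherwise inflate the counts:

```python
import pandas as pd

df = pd.DataFrame({
    "Customer Id": [1510220024, 1510270013, 1511160047, 1512100014, 1603180006,
                    1605030030, 1605160013, 1606060008, 1510220024, 1606270014,
                    1608080011, 1608090002, 1511160047, 1606270014],
    "year": [2017] * 8 + [2018, 2017, 2017, 2017, 2018, 2018],
})

# Keep one row per (customer, year) pair before pairing years with each other.
pairs = df.drop_duplicates()
m = pairs.merge(pairs, on="Customer Id")

# Rows and columns are years; each cell counts customers present in both years.
matrix = pd.crosstab(m.year_x, m.year_y)
print(matrix)
```

The diagonal holds the number of distinct customers per year, and the matrix is symmetric, so the same code should extend to 7 years without changes.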