I have a table:
Year Week Sales
2021 47 56
2021 48 5
2021 49 4
2021 50 6
2021 51 7
2021 52 10
2022 1 2
2021 2 3
I want to get all data starting from week 49 of 2021. However, if I use the following slice:
table[(table.Year >= 2021) & (table.Week >= 49)]
I get the data for every week >= 49 of every year from 2021 onward. How can I include weeks 1-48 of 2022 in the slice without creating a new column? In other words, how do I get all rows from week 49 of 2021 onward (2021: weeks 49-52, 2022: weeks 1-52, 2023: weeks 1-52, etc.)?
You're missing an OR. If I understand correctly, you want data from week 49 of 2021 onward. As a logical expression: the year is greater than 2021, OR the year is 2021 and the week is greater than or equal to 49:
out = table[((table.Year == 2021) & (table.Week >= 49)) | (table.Year > 2021)]
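A minimal runnable sketch of the OR filter, built from the table in the question. The second mask is an alternative I'm adding, not part of the answer: it encodes (Year, Week) as one sortable integer on the fly (2021, 49 becomes 202149), assumes weeks never exceed 99, and, like the OR version, adds no column to the frame.

```python
import pandas as pd

# Sample data copied from the question
table = pd.DataFrame({
    "Year": [2021, 2021, 2021, 2021, 2021, 2021, 2022, 2021],
    "Week": [47, 48, 49, 50, 51, 52, 1, 2],
    "Sales": [56, 5, 4, 6, 7, 10, 2, 3],
})

# (year is 2021 AND week >= 49) OR (year after 2021)
mask = ((table.Year == 2021) & (table.Week >= 49)) | (table.Year > 2021)
out = table[mask]

# Alternative: compare a combined year/week number computed on the fly.
# Assumes Week < 100; nothing is stored back into `table`.
alt = table[table.Year * 100 + table.Week >= 202149]

print(out)
```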
Output:
Year Week Sales
2 2021 49 4
3 2021 50 6
4 2021 51 7
5 2021 52 10
6 2022 1 2
I'm trying to convert calendar dates to financial years. I have a dataframe as below. Each ID has multiple records, and some months may be missing (e.g. month 3 is skipped after the second row).
df:
ID Callender_Date
1 01-01-2022
1 01-02-2022
1 01-04-2022
1 01-05-2022
1 01-05-2022
2 01-01-2022
2 01-07-2023
The financial year runs from July to June. For example, FY 2022 means:
July 2021: 1st month of the financial year
August 2021: 2nd month of the financial year
September 2021: 3rd month of the financial year
October 2021: 4th month of the financial year
November 2021: 5th month of the financial year
December 2021: 6th month of the financial year
January 2022: 7th month of the financial year
February 2022: 8th month of the financial year
March 2022: 9th month of the financial year
April 2022: 10th month of the financial year
May 2022: 11th month of the financial year
June 2022: 12th month of the financial year
Expected output (calendar dates converted to financial year and fiscal month):
ID Callender_Date Financial_Year Fiscal_Month
1 01-01-2022 2022 7
1 01-02-2022 2022 8
1 01-04-2022 2022 10
1 01-05-2022 2022 11
1 01-06-2022 2022 12
2 01-01-2021 2021 7
2 01-07-2021 2022 1
I tried the code below, which I found in another question:
df['Callender_Date '] = df['Callender_Date '].asfreq('J-July') - 1
Try:
import pandas as pd

# convert the column to datetime (if not already):
df['Callender_Date'] = pd.to_datetime(df['Callender_Date'], dayfirst=True)
# 'Q-JUN' = quarters of a fiscal year ending in June, so qyear is the FY label
df['Financial_Year'] = df['Callender_Date'].dt.to_period('Q-JUN').dt.qyear
# shifting by 6 months maps July -> 1, ..., June -> 12
df['Fiscal_Month'] = (df['Callender_Date'] + pd.DateOffset(months=6)).dt.month
print(df)
Prints:
ID Callender_Date Financial_Year Fiscal_Month
0 1 2022-01-01 2022 7
1 1 2022-02-01 2022 8
2 1 2022-04-01 2022 10
3 1 2022-05-01 2022 11
4 1 2022-06-01 2022 12
5 2 2021-01-01 2021 7
6 2 2021-07-01 2022 1
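If you'd rather avoid Period frequencies, both columns can be computed with plain integer arithmetic. A minimal sketch, using the dates from the question's expected output and assuming a July-June financial year labelled by the calendar year in which it ends. As a cross-check, the Financial_Year column should agree with `.dt.to_period('Q-JUN').dt.qyear` for such a year.

```python
import pandas as pd

# Dates from the question's expected output (day-first strings)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2],
    "Callender_Date": ["01-01-2022", "01-02-2022", "01-04-2022", "01-05-2022",
                       "01-06-2022", "01-01-2021", "01-07-2021"],
})
df["Callender_Date"] = pd.to_datetime(df["Callender_Date"], dayfirst=True)

month = df["Callender_Date"].dt.month
year = df["Callender_Date"].dt.year

# July-December belong to the FY that ends in the *next* calendar year
df["Financial_Year"] = year + (month >= 7).astype(int)

# Rotate months so July -> 1, August -> 2, ..., June -> 12
df["Fiscal_Month"] = (month - 7) % 12 + 1

print(df)
```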
I have a table of sales for different companies:
company_name sales year
A 200 2019
A 100 2018
A 30 2017
B 15 2019
B 30 2018
B 45 2017
Now I want to add the previous year's sales to each row, like this:
company_name sales year previous_sales
A 200 2019 100
A 100 2018 30
A 30 2017 NaN
B 15 2019 30
B 30 2018 45
B 45 2017 NaN
I tried the following code, but it doesn't give the right result:
df["previous_sales"] = df.groupby(['company_name', 'year'])['sales'].shift()
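The attempt groups by both company_name and year, which puts every row in a group of its own, so shift() has nothing to shift and returns NaN everywhere. A minimal sketch that groups by company only, assuming each company has one row per year with no gaps between years (shift takes the previous row in year order, not strictly year - 1):

```python
import pandas as pd

df = pd.DataFrame({
    "company_name": ["A", "A", "A", "B", "B", "B"],
    "sales": [200, 100, 30, 15, 30, 45],
    "year": [2019, 2018, 2017, 2019, 2018, 2017],
})

# Sort by year so shift(1) looks at the earlier year within each company.
# Assignment aligns on the index, so df keeps its original row order.
df["previous_sales"] = (
    df.sort_values("year")
      .groupby("company_name")["sales"]
      .shift(1)
)
print(df)
```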
I have a dataframe which looks like this:
0 1 2
0 April 0.002745 ADANIPORTS.NS
1 July 0.005239 ASIANPAINT.NS
2 April 0.003347 AXISBANK.NS
3 April 0.004469 BAJAJ-AUTO.NS
4 June 0.006045 BAJFINANCE.NS
5 June 0.005176 BAJAJFINSV.NS
6 April 0.003321 BHARTIARTL.NS
7 November 0.003469 INFRATEL.NS
8 April 0.002667 BPCL.NS
9 April 0.003864 BRITANNIA.NS
10 April 0.005570 CIPLA.NS
11 October 0.000925 COALINDIA.NS
12 April 0.003666 DRREDDY.NS
13 April 0.002836 EICHERMOT.NS
14 April 0.003793 GAIL.NS
15 April 0.003850 GRASIM.NS
16 April 0.002858 HCLTECH.NS
17 December 0.005666 HDFC.NS
18 April 0.003484 HDFCBANK.NS
19 April 0.004173 HEROMOTOCO.NS
20 April 0.006395 HINDALCO.NS
21 June 0.001844 HINDUNILVR.NS
22 October 0.004620 ICICIBANK.NS
23 April 0.004020 INDUSINDBK.NS
24 January 0.002496 INFY.NS
25 September 0.001835 IOC.NS
26 May 0.002290 ITC.NS
27 April 0.005910 JSWSTEEL.NS
28 April 0.003570 KOTAKBANK.NS
29 May 0.003346 LT.NS
30 April 0.006131 M&M.NS
31 April 0.003912 MARUTI.NS
32 March 0.003596 NESTLEIND.NS
33 April 0.002180 NTPC.NS
34 April 0.003209 ONGC.NS
35 June 0.001796 POWERGRID.NS
36 April 0.004182 RELIANCE.NS
37 April 0.004246 SHREECEM.NS
38 October 0.004836 SBIN.NS
39 April 0.002596 SUNPHARMA.NS
40 April 0.004235 TCS.NS
41 April 0.006729 TATAMOTORS.NS
42 October 0.003395 TATASTEEL.NS
43 August 0.002440 TECHM.NS
44 June 0.003481 TITAN.NS
45 April 0.003749 ULTRACEMCO.NS
46 April 0.005854 UPL.NS
47 April 0.004991 VEDL.NS
48 July 0.001627 WIPRO.NS
49 April 0.003728 ZEEL.NS
How can I create a MultiIndex DataFrame grouped by column 0? When I do:
new.groupby([0])
Out[315]: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0A938BB0>
I am not able to group all the months together.
How to groupby and create a multiindex dataframe
Based on your info, I'd suggest the following:
# rename columns to give them meaningful names
new = new.rename(columns={0:'Month',1:'Price', 2:'Ticker'})
new.groupby(['Month','Ticker'])['Price'].sum()
Note: you should convert 'Month' to a datetime (or an ordered categorical); otherwise the months will sort alphabetically rather than chronologically.
Also, the pandas documentation on groupby is quite thorough.
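A minimal sketch of the ordered-categorical option, using a few rows from the question's frame. The `months` list and `observed=True` are my additions: the categorical makes the Month level sort in calendar order, and `observed=True` keeps only month/ticker pairs that actually occur. The result is a Series with a two-level MultiIndex.

```python
import pandas as pd

# A few rows from the question's frame (columns 0, 1, 2)
new = pd.DataFrame([
    ["April", 0.002745, "ADANIPORTS.NS"],
    ["July", 0.005239, "ASIANPAINT.NS"],
    ["April", 0.003347, "AXISBANK.NS"],
    ["June", 0.006045, "BAJFINANCE.NS"],
])
new = new.rename(columns={0: "Month", 1: "Price", 2: "Ticker"})

# Ordered categorical: groupby sorts by calendar order, not alphabetically
months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
new["Month"] = pd.Categorical(new["Month"], categories=months, ordered=True)

# Two grouping keys -> MultiIndex (Month, Ticker)
grouped = new.groupby(["Month", "Ticker"], observed=True)["Price"].sum()
print(grouped)
```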
I want to remove certain keywords or a substring from a column in a pandas DataFrame.
The dataframe df looks like this:
YEAR WEEK
2019 WK-01
2019 WK-02
2019 WK-03
2019 WK-14
2019 WK-25
2020 WK-06
2020 WK-07
I would like to remove "WK-" and the leading 0 from the WEEK column so that my output looks like this:
YEAR WEEK
2019 1
2019 2
2019 3
2019 14
2019 25
2020 6
2020 7
You can try:
df['WEEK'] = df['WEEK'].str.extract(r'(\d+)$', expand=False).astype(int)
Output:
YEAR WEEK
0 2019 1
1 2019 2
2 2019 3
3 2019 14
4 2019 25
5 2020 6
6 2020 7
Shave off the first three characters and convert to int.
df['WEEK'] = df['WEEK'].str[3:].astype(int)
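Both answers can be checked against the sample data. This sketch uses a raw string for the regex, `\d+` rather than `\d*` (an empty match would make astype(int) fail), and `expand=False` so str.extract returns a Series instead of a one-column DataFrame. The slicing version assumes every value starts with the three-character prefix "WK-".

```python
import pandas as pd

df = pd.DataFrame({
    "YEAR": [2019, 2019, 2019, 2019, 2019, 2020, 2020],
    "WEEK": ["WK-01", "WK-02", "WK-03", "WK-14", "WK-25", "WK-06", "WK-07"],
})

# Regex: capture the trailing digits, then "01" -> 1
by_regex = df["WEEK"].str.extract(r"(\d+)$", expand=False).astype(int)

# Slicing: drop the fixed "WK-" prefix
by_slice = df["WEEK"].str[3:].astype(int)

assert by_regex.tolist() == by_slice.tolist()
df["WEEK"] = by_regex
print(df)
```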
I have a dataset that looks like this:
month year value
1 2019 20
2 2019 13
3 2019 10
4 2019 20
5 2019 13
6 2019 10
7 2019 20
8 2019 13
9 2019 10
10 2019 20
11 2019 13
12 2019 10
1 2020 20
2 2020 13
3 2020 10
4 2020 40
Assume that each month/year combination occurs multiple times and that there are many more columns. I want to create multiple dataframes, one per 6-month window. I don't want any aggregation.
Each partition should contain the data matching the criteria below. I know the naive way is to select each dataframe manually with conditions, but I suspect there is a more efficient way to do this in one go.
month 1-6 year 2019
month 2-7 year 2019
month 3-8 year 2019
month 4-9 year 2019
month 5-10 year 2019
month 6-11 year 2019
month 7-12 year 2019
month 8-1 year 2019,2020
month 9-2 year 2019,2020
month 10-3 year 2019,2020
month 11-4 year 2019,2020
What I have tried so far:
for i, j in zip(range(1,12), range(6,13)):
print(i,j) # this is for 2019
I can plug i and j into the month filter and repeat the same for 2020, but there must be a better way to build a list of dataframes.
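With a datetime index and pd.Grouper, you can split the frame into consecutive, non-overlapping 6-month groups as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(12, 3),
                  index=pd.date_range("2019-01-01", periods=12, freq="MS"),  # monthly index
                  )
df_grouped = df.groupby(pd.Grouper(freq="6MS"))  # 6-month bins anchored at month starts
[group for _, group in df_grouped]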
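pd.Grouper produces consecutive, non-overlapping bins, whereas the question lists overlapping windows that slide by one month. A minimal sketch for the sliding version, assuming the month and year columns shown above: each row is mapped to a running month count (year * 12 + month - 1) and one boolean mask is built per window, so duplicate month/year rows and extra columns are carried along without aggregation. The names `period` and `window_size` are hypothetical, chosen for illustration.

```python
import pandas as pd

# Sample data from the question (in practice months may repeat and
# there may be many more columns)
df = pd.DataFrame({
    "month": list(range(1, 13)) + [1, 2, 3, 4],
    "year": [2019] * 12 + [2020] * 4,
    "value": [20, 13, 10, 20, 13, 10, 20, 13, 10, 20, 13, 10, 20, 13, 10, 40],
})

window_size = 6  # months per window (hypothetical name)

# Running month count, so Dec 2019 -> Jan 2020 is a step of 1
period = df["year"] * 12 + (df["month"] - 1)
first, last = int(period.min()), int(period.max())

# One filtered frame per start month; only windows that fit inside the data
windows = [
    df[(period >= start) & (period < start + window_size)]
    for start in range(first, last - window_size + 2)
]

for w in windows:
    print(w["month"].tolist())
```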