I have a table that is updated manually every week using Excel. I would like to automate this process using Python/pandas. I want to update the report week number (which indicates how many times we have reported on that month so far for a given quarter) based on the financial week and month. Obviously we are now in September, but I will show you the first week to give you an idea of how it's updated. The finance weeks for 2021 run from 01/04/2021 (the first Monday of the year) to 12/27/2021 (the last Monday of the year).
This script is to be run weekly, so the next time it is run (01/04/2021 --> 01/11/2021) one week is added and the "Report Week" should increase by 1 too, unless that would take it above 13. If "Report Week" would be greater than 13, then we stop updating that month and add the next month. So in this case we drop December and start reporting March, and its Report Week becomes 1, as this is the first time we are reporting on it.
Month      Finance Week  Report Week
December   01/04/2021    13
January    01/04/2021    8
February   01/04/2021    4
January    01/11/2021    9
February   01/11/2021    5
March      01/11/2021    1
When January hits Report Week 13, we will stop updating that month, move on to April and give it a Report Week of 1, and so on for every month.
I am not sure what is the best way to go about this. I read here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html that one should not update when iterating a df so I'm not sure what to do.
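One way to avoid iterrows entirely is to update whole columns at once and then drop and append rows. The following is only a minimal sketch under stated assumptions: the helper name advance_one_week is hypothetical, "Finance Week" is assumed to be a datetime column with the same value in every row, and the newest month is assumed to be the last row of the table.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def advance_one_week(report: pd.DataFrame, max_report_week: int = 13) -> pd.DataFrame:
    """Return next week's table (hypothetical helper, not from the question)."""
    nxt = report.copy()
    # Whole-column updates: no row-by-row iteration is needed
    nxt["Finance Week"] = nxt["Finance Week"] + pd.Timedelta(weeks=1)
    nxt["Report Week"] = nxt["Report Week"] + 1

    # Months that would go past the cap are dropped ...
    expired = nxt["Report Week"] > max_report_week
    n_new = int(expired.sum())
    nxt = nxt[~expired].reset_index(drop=True)
    if n_new == 0:
        return nxt

    # ... and replaced by the month(s) following the last one in the table.
    # Assumption: the newest month is the last row of the input.
    last = MONTHS.index(report["Month"].iloc[-1])
    new_rows = pd.DataFrame({
        "Month": [MONTHS[(last + k) % 12] for k in range(1, n_new + 1)],
        "Finance Week": report["Finance Week"].iloc[0] + pd.Timedelta(weeks=1),
        "Report Week": 1,
    })
    return pd.concat([nxt, new_rows], ignore_index=True)


week1 = pd.DataFrame({
    "Month": ["December", "January", "February"],
    "Finance Week": pd.to_datetime(["01/04/2021"] * 3, format="%m/%d/%Y"),
    "Report Week": [13, 8, 4],
})
week2 = advance_one_week(week1)
print(week2)
```

Run on the first-week rows from the table above, this should give January, February and March for 01/11/2021. Each weekly run would feed the previous result back into the same function.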
In Impala, I am trying to get all the week start dates and week end dates between 08/01/2021 (Aug 1st 2021) and 12/31/2022 (December 31st 2021).
Can anyone help with this?
So I am really new to this and struggling with something, which I feel should be quite simple.
I have a Pandas Dataframe containing two columns: Fiscal Week (str) and Amount sold (int).
  Fiscal Week  Amount sold
0     2019031           24
1     2019041           47
2     2019221           34
3     2019231           46
4     2019241           35
My problem is the fiscal week column. It contains strings which describe the fiscal year and week. The fiscal year for this purpose starts on October 1st and ends on September 30th. So basically, 2019031 is the Monday (the 1 at the end) of the third week of October 2019, and 2019221 would be the 2nd week of March 2020.
The issue is that I want to turn this data into timeseries later. But I can't do that with the data in string format - I need it to be in date time format.
I actually added the 1s at the end of all these strings using
df['Fiscal Week']= df['Fiscal Week'].map('{}1'.format)
so that I can then turn it into a proper date:
df['Fiscal Week'] = pd.to_datetime(df['Fiscal Week'], format="%Y%W%w")
as I couldn't figure out how to do it with just the weeks and no day defined.
This, of course, returns the following:
  Fiscal Week  Amount sold
0  2019-01-21           24
1  2019-01-28           47
2  2019-06-03           34
3  2019-06-10           46
4  2019-06-17           35
As expected, this is clearly not what I need, as according to the definition of the fiscal year week 1 is not January at all but rather October.
Is there some simple solution to get the dates to what they are actually supposed to be?
Ideally I would like the final format to be e.g. 2019-03 for the first entry. So basically exactly like the string but in some kind of date format, that I can then work with later on. Alternatively, calendar weeks would also be fine.
Assuming you have a data frame with fiscal dates of the form 'YYYYWW', where YYYY = the calendar year in which the fiscal year starts and WW = the number of weeks into the fiscal year, you can convert to calendar dates as follows:
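import pandas as pd

def getCalendarDate(fy_date: str):
    f_year = fy_date[0:4]  # calendar year in which the fiscal year starts
    f_week = fy_date[4:]   # number of weeks into the fiscal year
    # The fiscal year starts on October 1st
    fys = pd.to_datetime(f'{f_year}/10/01', format='%Y/%m/%d')
    return fys + pd.to_timedelta(int(f_week), "W")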
You can then use this function to create the column of calendar dates as follows:
df['Calendar Date'] = df['Fiscal Week'].map(getCalendarDate)
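For larger frames, the same conversion can be done on whole columns instead of one string at a time. This is a rough sketch, assuming the first four characters are the starting calendar year and characters five and six are the week count; the sample values are the question's strings without the appended 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Fiscal Week": ["201903", "201904", "201922"],
    "Amount sold": [24, 47, 34],
})

# Slice by position, so a trailing weekday digit (e.g. '2019031') is ignored
fiscal_year = df["Fiscal Week"].str[:4]
weeks_in = df["Fiscal Week"].str[4:6].astype(int)

# The fiscal year starts on October 1st of the given calendar year
fiscal_start = pd.to_datetime(fiscal_year + "-10-01", format="%Y-%m-%d")

# Same arithmetic as getCalendarDate: start date plus WW whole weeks
df["Calendar Date"] = fiscal_start + pd.to_timedelta(weeks_in * 7, unit="D")
print(df)
```

If week 01 is meant to begin on October 1st itself rather than one week later, subtracting 1 from weeks_in before the multiplication would shift every date back by a week; which convention applies depends on how the fiscal weeks were numbered.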
Let's say we have the date 2019-11-19, which is a Tuesday. I want to go 3 business days back from that Tuesday, i.e. I want to get 2019-11-15, since the 16th and 17th are Saturday and Sunday respectively. To achieve this I have the following code:
from datetime import datetime, timedelta

dt_tue = datetime.strptime('2019-11-19', '%Y-%m-%d')
bd_3 = dt_tue - timedelta(days=3)  # <--- 3 business days prior
for i in range(bd_3.day, dt_tue.day + 1):
    dt_in = datetime(dt_tue.year, dt_tue.month, i)
    if dt_in.weekday() > 5:
        bd_3 = dt_tue - timedelta(4)
The above code generates bd_3 as 15th Nov 2019 which is Friday, and this is correct.
I want to handle holidays (as provided in a dataframe) in the above code. For example, if any day in the range (including bd_3 and dt_tue) falls on a holiday, then bd_3 should be 14th Nov. The exception is a holiday that falls on a Saturday or Sunday, in which case bd_3 should stay at 15th Nov.
Can anybody shed some light on this? Assume the holiday dataframe looks like this:
Date               Holiday_name            Day
January 1, 2019    New Year's Day          Tuesday
January 21, 2019   Martin Luther King Day  Monday
February 18, 2019  Presidents' Day*        Monday
May 27, 2019       Memorial Day            Monday
Since you're busy looping over all the days anyway, I suggest just doing a simple back-up and check each day as you go, something like:
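from datetime import datetime, timedelta
import pandas as pd

# Holiday dates as a set (assumes df['Date'] holds strings like 'January 1, 2019')
holidays = set(pd.to_datetime(df['Date']).dt.date)
def is_day_off(day):
    # Weekends and listed holidays do not count as business days
    return day.weekday() >= 5 or day.date() in holidays
dt_tue = datetime.strptime('2019-11-19', '%Y-%m-%d')
current_day = dt_tue
days_before = 0
while days_before < 2:
    # Skip weekends and holidays (without counting as a business day)
    while is_day_off(current_day):
        current_day -= timedelta(days=1)
    # Step back a business day
    current_day -= timedelta(days=1)
    days_before += 1
# Make sure the result itself is a business day
while is_day_off(current_day):
    current_day -= timedelta(days=1)
bd_3 = current_day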
You may need to tweak that a bit as I'm not 100% sure how your holidays dataframe is formatted.
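As an alternative to a hand-written loop, NumPy's business-day helper can do the counting. This is a sketch, not a drop-in answer: business_days_back is a hypothetical wrapper, the holiday strings are copied from the question's table, and "3 business days back" is taken to include the starting day, as in the question's example (Tuesday the 19th gives Friday the 15th).

```python
import numpy as np
import pandas as pd

# Holiday table as in the question (only the Date column matters here)
holidays_df = pd.DataFrame({
    "Date": ["January 1, 2019", "January 21, 2019",
             "February 18, 2019", "May 27, 2019"],
})
holidays = (pd.to_datetime(holidays_df["Date"], format="%B %d, %Y")
            .to_numpy()
            .astype("datetime64[D]"))


def business_days_back(day, n, holidays):
    """Hypothetical helper: the n-th business day counting back from `day`, inclusive."""
    start = np.datetime64(day, "D")
    # roll='backward' first moves a weekend/holiday start to the previous
    # business day; the offset then steps back the remaining n - 1 days.
    return np.busday_offset(start, -(n - 1), roll="backward", holidays=holidays)


bd_3 = business_days_back("2019-11-19", 3, holidays)
print(bd_3)

# Pretend 2019-11-15 is a holiday to see the result move to the 14th
extra = np.append(holidays, np.datetime64("2019-11-15"))
print(business_days_back("2019-11-19", 3, extra))
```

Holidays that fall on a Saturday or Sunday have no effect here, because those days are already excluded by the default Monday-to-Friday week mask.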
I have a dataframe that contains 11 years' worth of max and min temperature data (2005 to 2015). I am trying to find the highest and lowest temperature for each day of the year over that period.
I removed the 2 leap days from the data, i.e. 2008-02-29 and 2012-02-29, but when I apply dayofyear to the data, it returns 366 rows and I can't work out why.
I've broken down the steps and tested each part. The leap day dates are definitely not in the dataframe when I apply dayofyear
After I've removed the leap days and checked using this:
dfmax['2008-02-26':'2008-03-02']
29th Feb is not there.
The next step is to aggregate the data by day of year to get the highest temperature:
maxtemp = dfmax.groupby(dfmax.index.dayofyear).aggregate(max)
and from
maxtemp.info()
I get this :
Int64Index: 366 entries, 1 to 366
I expected 365 entries. What am I doing wrong?
The dayofyear attribute in pandas has nothing to do with which dates are actually present in your index. It is an integer assigned according to the position of that day in the calendar.
In other words, December 31 of 2008 is ALWAYS 366 regardless of the rest of the index. Therefore, if you are looking at 2008 (leap year) and you remove the last day of Feb, you're only deleting number 60 from the set, not resetting the count.
As per the documentation:
This attribute returns the day of the year on which the particular
date occurs. The return value ranges between 1 to 365 for regular
years and 1 to 366 for leap years.
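The effect can be reproduced on synthetic data. The sketch below uses a made-up "temp" column, drops the leap days as in the question, and compares grouping by dayofyear with grouping by (month, day), which is one possible way to get 365 groups.

```python
import pandas as pd

# Daily index for 2005-2015 with both leap days removed, as in the question
idx = pd.date_range("2005-01-01", "2015-12-31", freq="D")
idx = idx[~((idx.month == 2) & (idx.day == 29))]
dfmax = pd.DataFrame({"temp": range(len(idx))}, index=idx)  # placeholder values

# dayofyear still reaches 366 in the leap years 2008 and 2012
by_doy = dfmax.groupby(dfmax.index.dayofyear).max()
print(len(by_doy))

# Grouping by calendar month and day lines the years up instead
by_month_day = dfmax.groupby([dfmax.index.month, dfmax.index.day]).max()
print(len(by_month_day))
```

With dayofyear, every date after February 28 in a leap year is shifted by one relative to the other years, so the groups mix neighbouring calendar days; the (month, day) grouping avoids that.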
I have some data with a date column. The command below goes through the DataFrame and counts all the events that happened during each week.
df['date'].groupby(df.date.dt.to_period("W")).agg('count')
The result is something like:
2018-04-16/2018-04-22 40
2018-04-23/2018-04-29 18
The weeks start on Monday and end on Sunday.
I want the week to start on Sunday and end on Saturday. So, the data should be
2018-04-15/2018-04-21 40
2018-04-22/2018-04-28 18
Use:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Date':np.random.choice(pd.date_range('2018-04-10',periods=365, freq='D'),1000)})
df.groupby(df['Date'].dt.to_period('W-SAT')).agg('count')
Output:
                       Date
Date
2018-04-08/2018-04-14    12
2018-04-15/2018-04-21    19
2018-04-22/2018-04-28    21
2018-04-29/2018-05-05    16
2018-05-06/2018-05-12    21
Use an anchored offset. Excerpt from the linked table:
W-SUN weekly frequency (Sundays). Same as ‘W’
W-MON weekly frequency (Mondays)
W-TUE weekly frequency (Tuesdays)
W-WED weekly frequency (Wednesdays)
W-THU weekly frequency (Thursdays)
W-FRI weekly frequency (Fridays)
W-SAT weekly frequency (Saturdays)
Since you want the week to end on Saturday, W-SAT should suffice.
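A small deterministic check of the anchored frequency, using four made-up dates chosen to sit on the boundaries of the two weeks from the question:

```python
import pandas as pd

# 2018-04-15 and 2018-04-22 are Sundays; 2018-04-21 and 2018-04-28 are Saturdays
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-04-15", "2018-04-21", "2018-04-22", "2018-04-28"]),
})

# 'W-SAT' periods end on Saturday, so each one runs Sunday through Saturday
counts = df["date"].groupby(df["date"].dt.to_period("W-SAT")).agg("count")
print(counts)
```

Each Sunday falls in the same period as the following Saturday, which is the grouping asked for in the question.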