I have two time series, df1
day cnt
2020-03-01 135006282
2020-03-02 145184482
2020-03-03 146361872
2020-03-04 147702306
2020-03-05 148242336
and df2:
day cnt
2017-03-01 149104078
2017-03-02 149781629
2017-03-03 151963252
2017-03-04 147384922
2017-03-05 143466746
The problem is that the sensors I'm measuring are sensitive to the day of the week, so on Sunday, for instance, they will produce less cnt. Now I need to compare the time series over 2 different years, 2017 and 2020, but to do that I have to align (March, in this case) to the matching day of the week, and plot them accordingly. How do I "shift" the data to make the series comparable?
The ISO calendar is a representation of a date as a tuple (year, weeknumber, weekday). In pandas these are available through the dt accessor as year, isocalendar().week and weekday (the older weekofyear accessor was removed in pandas 2.0). So assuming that the day column actually contains Timestamps (convert it first with to_datetime if it does not), you could do:
df1['Y'] = df1.day.dt.year
df1['W'] = df1.day.dt.isocalendar().week  # dt.weekofyear in pandas < 2.0
df1['D'] = df1.day.dt.weekday
Then you could align the dataframes on the W and D columns
March 2017 started on a Wednesday; March 2020 started on a Sunday. So:
delete the last 3 days of March 2017, and
delete the first Sunday, Monday and Tuesday from 2020.
This way you have comparable days.
df1['cnt2020'] = df1['cnt']
df2['cnt2017'] = df2['cnt']
df1 = df1.iloc[3:, 2]   # drop the first 3 days (Sun, Mon, Tue) of 2020
df2 = df2.iloc[:-3, 2]  # drop the last 3 days of March 2017
Since you don't want to plot the date, but want the months to align, make a new dataframe with both columns and an index column. This way you will have 3 columns: index (0-27), 2017 and 2020. The index will represent the position within the month.
new_df = pd.concat([df1,df2], axis=1)
If you also want to plot the days of the week on the x axis, get the day of the week from each date and change the x tick labels accordingly.
Sorry for the written step-by-step; if it all sounds confusing, I can type the whole code later for you.
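Putting the steps above together, here is a minimal runnable sketch. The cnt values are made-up placeholders (a simple range) standing in for the real sensor counts:

```python
import pandas as pd

# March 2017 starts on a Wednesday, March 2020 on a Sunday.
df1 = pd.DataFrame({'day': pd.date_range('2020-03-01', '2020-03-31'),
                    'cnt': range(31)})
df2 = pd.DataFrame({'day': pd.date_range('2017-03-01', '2017-03-31'),
                    'cnt': range(31)})

# Drop the first Sunday-Tuesday of 2020 and the last 3 days of 2017,
# so both series start on a Wednesday and have equal length (28 days).
cnt2020 = df1.iloc[3:]['cnt'].reset_index(drop=True).rename('cnt2020')
cnt2017 = df2.iloc[:-3]['cnt'].reset_index(drop=True).rename('cnt2017')

# The shared 0-27 index now represents the position within the month.
new_df = pd.concat([cnt2017, cnt2020], axis=1)
print(new_df.head())
```

Plotting new_df then draws both years over the same weekday-aligned axis.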
So I am really new to this and struggling with something, which I feel should be quite simple.
I have a Pandas Dataframe containing two columns: Fiscal Week (str) and Amount sold (int).
   Fiscal Week  Amount sold
0      2019031           24
1      2019041           47
2      2019221           34
3      2019231           46
4      2019241           35
My problem is the fiscal week column. It contains strings which describe the fiscal year and week. The fiscal year for this purpose starts on October 1st and ends on September 30th. So basically, 2019031 is the Monday (the 1 at the end) of the third week of October 2019. And 2019221 would be the 2nd week of March 2020.
The issue is that I want to turn this data into timeseries later. But I can't do that with the data in string format - I need it to be in date time format.
I actually added the 1s at the end of all these strings using
df['Fiscal Week']= df['Fiscal Week'].map('{}1'.format)
so that I can then turn it into a proper date:
df['Fiscal Week'] = pd.to_datetime(df['Fiscal Week'], format="%Y%W%w")
as I couldn't figure out how to do it with just the weeks and no day defined.
This, of course, returns the following:
  Fiscal Week  Amount sold
0  2019-01-21           24
1  2019-01-28           47
2  2019-06-03           34
3  2019-06-10           46
4  2019-06-17           35
As expected, this is clearly not what I need, as according to the definition of the fiscal year week 1 is not January at all but rather October.
Is there some simple solution to get the dates to what they are actually supposed to be?
Ideally I would like the final format to be e.g. 2019-03 for the first entry. So basically exactly like the string but in some kind of date format, that I can then work with later on. Alternatively, calendar weeks would also be fine.
Assuming you have a data frame with fiscal dates of the form 'YYYYWW', where YYYY is the calendar year of the start of the fiscal year and WW is the number of weeks into the year, you can convert to calendar dates as follows:
def getCalendarDate(fy_date: str):
    f_year = fy_date[0:4]
    f_week = fy_date[4:6]  # [4:6] also ignores the trailing weekday digit, if present
    fys = pd.to_datetime(f'{f_year}/10/01', format='%Y/%m/%d')
    return fys + pd.to_timedelta(int(f_week), "W")
You can then use this function to create the column of calendar dates as follows:
df['Calendar Date'] = [getCalendarDate(x) for x in df['Fiscal Week']]
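A quick check of this conversion on two of the sample fiscal weeks from the question (with the trailing weekday digit the asker appended, which the `[4:6]` slice ignores):

```python
import pandas as pd

def getCalendarDate(fy_date: str):
    f_year = fy_date[0:4]
    f_week = fy_date[4:6]  # ignore any trailing weekday digit
    fys = pd.to_datetime(f'{f_year}/10/01', format='%Y/%m/%d')
    return fys + pd.to_timedelta(int(f_week), "W")

df = pd.DataFrame({'Fiscal Week': ['2019031', '2019221'],
                   'Amount sold': [24, 34]})
df['Calendar Date'] = [getCalendarDate(x) for x in df['Fiscal Week']]
print(df)
```

Week 3 of fiscal 2019 lands in October 2019 and week 22 lands in early March 2020, matching the asker's description of the encoding.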
I am trying to create a new dataframe from an existing one by conditioning on holiday dates. The train dataframe already exists, and I want to create train_holiday from it by matching the day and month values of the holiday dataframe. My goal is similar to below:
date values
2015-02-01 10
2015-02-02 20
2015-02-03 30
2015-02-04 40
2015-02-05 50
2015-02-06 60
date
2012-02-02
2012-02-05
Now, the first one is the existing data, and the second dataframe shows the holidays. I want to create a new dataframe from the first one that only contains the 2015 holidays, similar to below:
date values
2015-02-02 20
2015-02-05 50
I tried
train_holiday = train.loc[train["date"].dt.day== holidays["date"].dt.day]
but it gives an error. Could you please help me with this?
In your problem you care only about the month and day components, and one way to extract them is dt.strftime() (ref). Apply that extraction to both date columns and use .isin() to keep the month-day values in df1 that match those in df2.
df1[
df1['date'].dt.strftime('%m%d').isin(
df2['date'].dt.strftime('%m%d')
)
]
Make sure both date columns are in date-time format so that .dt can work. For example,
df1['date'] = pd.to_datetime(df1['date'])
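Here is the whole approach as a self-contained sketch, using the sample data from the question (df1 is the train data, df2 the holidays):

```python
import pandas as pd

df1 = pd.DataFrame({'date': pd.to_datetime(
                        ['2015-02-01', '2015-02-02', '2015-02-03',
                         '2015-02-04', '2015-02-05', '2015-02-06']),
                    'values': [10, 20, 30, 40, 50, 60]})
df2 = pd.DataFrame({'date': pd.to_datetime(['2012-02-02', '2012-02-05'])})

# Compare only month-day strings ('%m%d'), so the year is ignored.
train_holiday = df1[df1['date'].dt.strftime('%m%d')
                    .isin(df2['date'].dt.strftime('%m%d'))]
print(train_holiday)
```

This keeps the 2015-02-02 and 2015-02-05 rows, matching the expected output above.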
I have a dataframe with a column that contains the date of the first Monday of every week between an arbitrary start date and now. I wish to generate a new column that has 2-week jumps but is the same length as the original column, so it would contain repeated values. For example, this would be the result for the month of October, where the column weekly exists and bi-weekly is the target:
data = {'weekly': ['2018-10-08', '2018-10-15', '2018-10-22', '2018-10-29'],
        'bi-weekly': ['2018-10-08', '2018-10-08',
                      '2018-10-22', '2018-10-22']}
df = pd.DataFrame(data)
At the moment I am stuck with pd.date_range(start, end, freq='14D'), but this does not contain any repeated values, which I need to be able to groupby.
IIUC
df.groupby(np.arange(len(df))//2).weekly.transform('first')
Out[487]:
0 2018-10-08
1 2018-10-08
2 2018-10-22
3 2018-10-22
Name: weekly, dtype: datetime64[ns]
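Assigning the result back gives the bi-weekly column directly; a runnable sketch on the question's sample month:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'weekly': pd.to_datetime(
    ['2018-10-08', '2018-10-15', '2018-10-22', '2018-10-29'])})

# np.arange(len(df)) // 2 labels consecutive rows in pairs (0, 0, 1, 1, ...);
# transform('first') broadcasts the first weekly date of each pair over both rows.
df['bi-weekly'] = df.groupby(np.arange(len(df)) // 2).weekly.transform('first')
print(df)
```

The same pair labels can also be used directly as groupby keys for any later aggregation.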
How can I make a plot counting the elements that fall in a specific time group? I have a dataframe like this
Incident_Number Submit_Date Description
001 05/04/2017 12:00:45 Problem1
002 05/05/2017 13:00:00 Problem2
003 05/05/2017 14:00:00 Problem3
004 07/05/2017 19:00:00 Problem4
005 07/06/2017 08:00:00 Problem5
How could I make a line plot that shows the total incidents by month, date, weekday, or year? I tried grouping by, but that takes many lines: first extracting the month, year, and date, and then converting back to datetime to visualize. Any ideas?
Thanks for your help
Start with converting Submit_Date into a datetime (if it is not a datetime yet) and making it the index:
df['Submit_Date'] = pd.to_datetime(df['Submit_Date'])
df.set_index('Submit_Date', inplace = True)
Now you can resample your data at any frequency and plot it. For example, resample by 1 month to get monthly counts (in pandas >= 2.2 the preferred alias is '1ME'):
df.resample('1M').count()['Description'].plot()
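End to end on the sample data (assuming the month/day/year timestamp format shown in the question), a sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Incident_Number': ['001', '002', '003', '004', '005'],
    'Submit_Date': ['05/04/2017 12:00:45', '05/05/2017 13:00:00',
                    '05/05/2017 14:00:00', '07/05/2017 19:00:00',
                    '07/06/2017 08:00:00'],
    'Description': ['Problem1', 'Problem2', 'Problem3', 'Problem4', 'Problem5'],
})
df['Submit_Date'] = pd.to_datetime(df['Submit_Date'], format='%m/%d/%Y %H:%M:%S')
df.set_index('Submit_Date', inplace=True)

# Monthly incident counts; 'ME' is the alias in pandas >= 2.2.
monthly = df.resample('M').count()['Description']
print(monthly)
# monthly.plot() would draw the line chart.
```

Swapping the resample rule ('D', 'W', 'Y', ...) gives the daily, weekly, or yearly counts without any manual extraction of date parts.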
I have a column named Date Opened; it's a date field with dtype datetime64[ns]. What I am trying to do is run through all the dates in the Date Opened column of my dataframe and then somehow create a new column with specific dates. My dates look like this:
'2012-05-16'
I was wondering if there is any way to run through the dates and bring back only dates in the months of Jan, Feb, Mar, then Apr, May, Jun, then Jul, Aug, Sep, and finally Oct, Nov, Dec, putting them into a separate column that I can filter on by quarter. So Jan, Feb, Mar would be Q1, the next set of three would be Q2, and so on. The years aren't all the same, which is why I want to group and filter by quarter.
'2012-01-03', '2013-02-03', '2012-03-12'
'2012-01-10', '2013-02-07', '2012-03-13'
'2012-01-13', '2013-02-15', '2012-03-18'
'2012-01-16', '2013-02-19', '2012-03-20'
'2012-01-22', '2013-02-20', '2012-03-21'
'2012-01-23', '2013-02-21', '2012-03-25'
'2012-01-28', '2013-02-28', '2012-03-27'
I have tried using datetime and group them but I can't seem to get them in their own column and I don't want the dates to be reliant on year, I want to just pull the dates in by month (quarter) so, no matter what year it is, they still just bring them in according to the quarter that they fall under.
You can create an additional column with the numeric quarter of each date using the dt.quarter attribute and then filter based on that.
In [16]: s = pd.to_datetime(pd.Series(['2016-12-14', '2014-03-12']))

In [17]: df = pd.DataFrame({'Date Opened': s, 'foo': ['test', 'bar']})
In [18]: df
Out[18]:
Date Opened foo
0 2016-12-14 test
1 2014-03-12 bar
In [19]: df['quarter'] = df['Date Opened'].dt.quarter
In [20]: df
Out[20]:
Date Opened foo quarter
0 2016-12-14 test 4
1 2014-03-12 bar 1
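With the quarter column in place, filtering by quarter regardless of year is a single boolean mask; a sketch on the same two sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Date Opened': pd.to_datetime(['2016-12-14', '2014-03-12']),
    'foo': ['test', 'bar'],
})
df['quarter'] = df['Date Opened'].dt.quarter

# Keep only Q1 rows (Jan, Feb, Mar), whatever the year.
q1 = df[df['quarter'] == 1]
print(q1)
```

The same mask with 2, 3, or 4 selects the other quarters, and df.groupby('quarter') aggregates across years.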