I want to produce a dataframe that splits by day (the day-of-month) but then orders the groups by date. At the moment the code below splits them into dates, e.g. 1-11, 2-11, but 30-10 and 31-10 come after all my November dates.
ResultSet2 = ResultProxy2.fetchall()
df2 = pd.DataFrame(ResultSet2)
resultsrecovery = [group[1] for group in df2.groupby(["day"])]
The current code output:
I basically want the grouped dataframes for the 30th and 31st of October to come before all the ones in November.
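One way to get the October groups first is to parse the full date before grouping, then sort chronologically. A minimal sketch with made-up data (the "date" and "value" column names are assumptions; your frame presumably has a full date column alongside "day"):

```python
import pandas as pd

# Hypothetical day-first date strings, deliberately out of order
df2 = pd.DataFrame({
    "date": ["01-11-2021", "31-10-2021", "02-11-2021", "30-10-2021"],
    "value": [1, 2, 3, 4],
})
# Parse the strings into real datetimes, then sort chronologically
df2["date"] = pd.to_datetime(df2["date"], dayfirst=True)
df2 = df2.sort_values("date")
# sort=False keeps the groups in the order they appear, i.e. chronological
groups = [g for _, g in df2.groupby("date", sort=False)]
```

With this, `groups[0]` is the 30-10 frame and the November frames follow.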
So I am really new to this and struggling with something, which I feel should be quite simple.
I have a pandas DataFrame containing two columns: Fiscal Week (str) and Amount sold (int).
   Fiscal Week  Amount sold
0      2019031           24
1      2019041           47
2      2019221           34
3      2019231           46
4      2019241           35
My problem is the fiscal week column. It contains strings which describe the fiscal year and week. The fiscal year for this purpose starts on October 1st and ends on September 30th. So basically, 2019031 is the Monday (the 1 at the end) of the third week of October 2019. And 2019221 would be the 2nd week of March 2020.
The issue is that I want to turn this data into timeseries later. But I can't do that with the data in string format - I need it to be in date time format.
I actually added the 1s at the end of all these strings using
df['Fiscal Week']= df['Fiscal Week'].map('{}1'.format)
so that I can then turn it into a proper date:
df['Fiscal Week'] = pd.to_datetime(df['Fiscal Week'], format="%Y%W%w")
as I couldn't figure out how to do it with just the weeks and no day defined.
This, of course, returns the following:
  Fiscal Week  Amount sold
0  2019-01-21           24
1  2019-01-28           47
2  2019-06-03           34
3  2019-06-10           46
4  2019-06-17           35
As expected, this is clearly not what I need, as according to the definition of the fiscal year week 1 is not January at all but rather October.
Is there some simple solution to get the dates to what they are actually supposed to be?
Ideally I would like the final format to be e.g. 2019-03 for the first entry. So basically exactly like the string but in some kind of date format, that I can then work with later on. Alternatively, calendar weeks would also be fine.
Assuming you have a data frame with fiscal dates of the form 'YYYYWW', where YYYY is the calendar year in which the fiscal year starts and WW is the number of weeks into the fiscal year, you can convert to calendar dates as follows:
def getCalendarDate(fy_date: str):
    f_year = fy_date[0:4]
    f_week = fy_date[4:]
    # the fiscal year starts on October 1st
    fys = pd.to_datetime(f'{f_year}/10/01', format='%Y/%m/%d')
    return fys + pd.to_timedelta(int(f_week), "W")
You can then use this function to create the column of calendar dates as follows:
df['Calendar Date'] = df['Fiscal Week'].apply(getCalendarDate)
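A quick sanity check of the function above, using 'YYYYWW' strings as assumed (i.e. without the trailing day digit): week 3 of fiscal 2019 lands three weeks after 2019-10-01.

```python
import pandas as pd

def getCalendarDate(fy_date: str):
    f_year = fy_date[0:4]
    f_week = fy_date[4:]
    # the fiscal year starts on October 1st
    fys = pd.to_datetime(f'{f_year}/10/01', format='%Y/%m/%d')
    return fys + pd.to_timedelta(int(f_week), "W")

print(getCalendarDate('201903'))  # 2019-10-22 00:00:00
```

Note that if your strings still carry the trailing day digit ('2019031'), you would need to strip it first, since `fy_date[4:]` would otherwise read '031' as week 31.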
Hi all,
The image shows the data frame I am working on to learn Python.
From this dataframe, I am trying to find the rows that fall on the last day of December for each year. My objective is to keep the record highlighted in yellow in the data frame and remove the white rows.
For example, for the year 2010, I just want to keep the 3rd record and remove rows 1 to 2.
As for the year 2011, I want to remove rows 4 to 7 and keep row 8.
Below is the code I have written. I intend to use loop to find the records I want to keep and remove the rest.
To retain records using month values, I managed to meet my objective by keeping Dec and removing the Jan to Nov records.
However, for days (the last three lines of code), I realized that the last day in the data frame does not always fall on the 31st, so my initial logic cannot remove the right rows.
Could anyone suggest a better way to find the last day of the month in the data frame and remove the rest?
Thanks
amalgamate = pd.read_excel("amalgamate.xlsx")
## Create last 3 columns to segregate Year, Month and Day.
amalgamate["Date"] = pd.to_datetime(amalgamate["Date"], errors="raise", format="%Y-%m-%d")
amalgamate["Year"] = amalgamate["Date"].dt.year
amalgamate["Month"] = amalgamate["Date"].dt.month
amalgamate["Day"] = amalgamate["Date"].dt.day
listofMonth = amalgamate.Month.unique()
listofDay = amalgamate.Day.unique()
# Loop through the records and remove records that are not Dec for each year
for eachmonth in listofMonth:
    if eachmonth != 12:
        amalgamate = amalgamate[amalgamate.Month != eachmonth]
# Loop through the records and remove records that are not 31 for each month
for eachday in listofDay:
    if eachday != 31:
        amalgamate = amalgamate[amalgamate.Day != eachday]
Here is a one-liner that will filter the last days of the months by grouping by Date with pd.Grouper set to one month, then getting the row with the latest Date from each group:
df.loc[df.groupby(pd.Grouper(key='Date', freq='1M')).Date.idxmax()]
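A self-contained sketch with made-up data (note that pd.Grouper creates a bin for every calendar month in the range, so this assumes no wholly empty months, where idxmax would fail):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-12-20", "2010-12-28", "2011-01-05", "2011-01-15"]),
    "val": [1, 2, 3, 4],
})
# One row per calendar month: the row whose Date is that month's maximum
last_rows = df.loc[df.groupby(pd.Grouper(key="Date", freq="1M")).Date.idxmax()]
print(last_rows)
```

Here `last_rows` keeps only 2010-12-28 and 2011-01-15.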
As you mentioned at the beginning of the question that you want to find the last day of Dec for each year, you can group the dates by year and get the last entry within a group by GroupBy.last(), as follows:
df.groupby(df['Date'].dt.year, as_index=False).last()
If you further want to find the last day of a month (as you mentioned at the end of the question), you can group the dates by year and month and get the last entry within a group by GroupBy.last(), as follows:
df.groupby([df['Date'].dt.year, df['Date'].dt.month], as_index=False).last()
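A runnable sketch of the per-year variant with made-up data. Note that `.last()` takes each group's final row in frame order, so sort by Date first; the grouping key is renamed here (an illustrative choice) to avoid clashing with the existing `Date` column:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-12-28", "2010-11-01", "2011-11-20", "2011-10-30"]),
    "val": [1, 2, 3, 4],
})
# Sort so that the last row in each group is also the latest date
df = df.sort_values("Date")
last_per_year = df.groupby(df["Date"].dt.year.rename("Year")).last().reset_index()
print(last_per_year)
```

This keeps 2010-12-28 for 2010 and 2011-11-20 for 2011.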
You can use pandas groupby to find the last (i.e., max) month and last day per year, then merge dataframes to filter only the rows that have the last month and day. Just as you don't need to assume that the last day of Dec in your data is 31, you don't have to assume that the last month in the year in your data is Dec. There are multiple ways to do it, and you could do the steps below in a different order. Here's one that I think may be easiest to follow:
import pandas as pd

row1list = [2010, 12, 28]
row2list = [2010, 12, 20]
row3list = [2011, 11, 20]
row4list = [2011, 11, 15]
row5list = [2011, 10, 30]
df = pd.DataFrame([row1list, row2list, row3list, row4list, row5list], columns=['year', 'month', 'day'])
# find last day for every combo of year, month
df_last_day_per_year_month = df.groupby(['year', 'month'], as_index=False).agg({'day': 'max'})
# find last month for every year, using only the rows with max day per year, month
df_last_month_per_year = df_last_day_per_year_month.groupby('year', as_index=False).agg({'month': 'max'})
# keep only the last month by comparing month values to last month per year
df_last_month_per_year = df_last_month_per_year.rename(columns={'month':'last_month'})
df_last_day_per_year_month = df_last_day_per_year_month.merge(df_last_month_per_year, on='year', how='left')
df_last_day_per_year_month = df_last_day_per_year_month[df_last_day_per_year_month['month'] == df_last_day_per_year_month['last_month']]
# don't need 'last_month' column anymore so delete it
del df_last_day_per_year_month['last_month']
# inner merge to filter original df to keep only the dates that are max month, day per year
df = df.merge(df_last_day_per_year_month, on=['year', 'month', 'day'], how='inner')
print(df)
# year month day
# 0 2010 12 28
# 1 2011 11 20
This question already has answers here:
How do I get the day of week given a date?
(30 answers)
Given a 4 digit year, return the number of the day on which January 1st falls. 0 is Sunday, …, 6 is Saturday
You can use the datetime library:
import datetime

year = 2021  # any 4-digit year
d = datetime.date(year, 1, 1)
weekday = d.isoweekday() % 7  # 0 = Sunday, ..., 6 = Saturday
The datetime library starts all weeks on Monday (isoweekday() returns 1 for Monday through 7 for Sunday), so you need the % 7 to map Sunday to 0.
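Wrapped up as a small function (the helper name is just for illustration):

```python
import datetime

def jan1_weekday(year: int) -> int:
    """Day of week for January 1st: 0 = Sunday, ..., 6 = Saturday."""
    # isoweekday(): Monday = 1 ... Sunday = 7, so % 7 maps Sunday to 0
    return datetime.date(year, 1, 1).isoweekday() % 7

print(jan1_weekday(2017))  # 0 (January 1st 2017 was a Sunday)
```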
I am new to Pandas timeseries and dataframes and struggle getting this simple task done.
I have a dataset "data" (1-dimensional float32-Numpy array) for each day from 1/1/2004 - 12/31/2008. The dates are stored as a list of datetime objects "dates".
Basically, I would like to calculate a complete "standard year" - the average value of each day of all years (1-365).
I started from this similar (?) question (Getting the average of a certain hour on weekdays over several years in a pandas dataframe), but could not get to the desired result: a time series of 365 "average" days, e.g. the average of all four 1st of January's, 2nd of January's ...
A small example script:
import numpy as np
import pandas as pd
import datetime
startdate = datetime.datetime(2004, 1, 1)
enddate = datetime.datetime(2008, 1, 1)
days = (enddate + datetime.timedelta(days=1) - startdate).days
data = np.random.random(days)
dates = [startdate + datetime.timedelta(days=x) for x in range(0, days)]
ts = pd.Series(data, dates)
test = ts.groupby(lambda x: (x.year, x.day)).mean()
Group by the month and day, rather than the year and day:
test = ts.groupby([ts.index.month, ts.index.day]).mean()
yields
1   1     0.499264
    2     0.449357
    3     0.498883
...
12  17    0.408180
    18    0.317682
    19    0.467238
...
    29    0.413721
    30    0.399180
    31    0.828423
Length: 366, dtype: float64
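Putting the fix together as a self-contained script (random data, so the averages differ from the values above). Grouping by (month, day) rather than dayofyear keeps Feb 29 as its own entry instead of shifting every day after it in leap years, which is why there are 366 rows rather than 365:

```python
import numpy as np
import pandas as pd

# Daily data from 2004-01-01 through 2008-01-01, as in the question
dates = pd.date_range("2004-01-01", "2008-01-01", freq="D")
ts = pd.Series(np.random.random(len(dates)), index=dates)

# Average across years for every (month, day) pair
standard_year = ts.groupby([ts.index.month, ts.index.day]).mean()
print(len(standard_year))  # 366
```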