I need to group by the year and month of the Date column so that I get a clearer graph and a smaller sample size.
df.groupby(df['Date'])['¢/kWh'].mean()
This has helped me reduce the sample size and improve the presentation, but I still need to group further by month and year.
Try:
df.groupby(df['Date'].astype(str).str[0:7])['¢/kWh'].mean()
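If 'Date' is already a datetime dtype, an equivalent sketch using dt.to_period should give the same monthly grouping without the string slicing (the sample data here is made up):

import pandas as pd

# Made-up frame; in practice use your own df with 'Date' and '¢/kWh' columns
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-05', '2021-01-20', '2021-02-03']),
    '¢/kWh': [10.0, 12.0, 11.5],
})

# Group by year-month periods (e.g. 2021-01) and average the ¢/kWh values
monthly_mean = df.groupby(df['Date'].dt.to_period('M'))['¢/kWh'].mean()
print(monthly_mean)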
Related
Is it possible to use .resample() to take the last observation in each month of a weekly time series, creating a monthly time series from the weekly one? I don't want to sum or average anything, just take the last observation of each month.
Thank you.
Based on what you want and what the documentation describes, you could try the following:
data[COLUMN].resample('M').last()
Try it out and update us!
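For context, a minimal self-contained sketch with made-up weekly data (the names here are only illustrative):

import pandas as pd

# Made-up weekly series; replace with your own data
weekly = pd.Series(range(10), index=pd.date_range('2021-01-03', periods=10, freq='W'))

# Keep the last weekly observation that falls inside each month
monthly = weekly.resample('M').last()
print(monthly)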
References
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
Is the 'week' field a week of the year, a date, or something else?
If it's a datetime column, use .dt.to_period('M') on your current date column to create a new 'month' column, then take the max date within each month to get the date to sample (if you only want the LAST date in each month?)
Like max(df['MyDateField'])
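A rough sketch of that idea, with placeholder column names and made-up data:

import pandas as pd

# Placeholder weekly data; 'MyDateField' stands in for your date column
df = pd.DataFrame({
    'MyDateField': pd.date_range('2021-01-03', periods=10, freq='W'),
    'value': range(10),
})

# Tag each row with its month, then find the last (max) date in every month
df['month'] = df['MyDateField'].dt.to_period('M')
last_dates = df.groupby('month')['MyDateField'].max()

# Keep only the rows that fall on those last-in-month dates
monthly = df[df['MyDateField'].isin(last_dates)]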
Someone else is posting as I type this, so may have a better answer :)
Could anyone teach me how to calculate the time duration for each person? I'm not good at describing the question, but as the picture below shows, I want to select the latest date for each person and then subtract the oldest date to get the time duration. Thanks.
I want to get the duration.
If you need to subtract each date from the maximal date per customer, use GroupBy.transform with Series.sub, and if necessary convert the timedeltas to days with Series.dt.days:
df['date'] = pd.to_datetime(df['date'])
df['dur'] = df.groupby('customer')['date'].transform('max').sub(df['date']).dt.days
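If you instead want a single duration per customer (latest date minus earliest date, as described in the question), a small variation on the code above should work:

# One value per customer: days between the earliest and latest date
dur_per_customer = df.groupby('customer')['date'].agg(lambda s: (s.max() - s.min()).days)
print(dur_per_customer)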
If there is a variable in an xarray dataset with a time dimension holding daily values over some multi-year time span,
2017-01-01 ... 2018-12-31, then it is possible to group the data by month, or by the day of the year, using
.groupby("time.month") or .groupby("time.dayofyear")
Is there a way to efficiently group the data by the day of the month, for example if I wanted to calculate the mean value on the 21st of each month?
See the xarray docs on the DateTimeAccessor helper object. For more info, you can also check out the xarray docs on Working with Time Series Data: Datetime Components, which in turn refers to the pandas docs on date/time components.
You're looking for day. Unfortunately, both pandas and xarray simply describe .dt.day as referring to "the days of the datetime", which isn't particularly helpful. But if you take a look at Python's native datetime.date.day definition, you'll see the more specific:
date.day
Between 1 and the number of days in the given month of the given year.
So, simply:
da.groupby("time.day")
should do the trick!
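A minimal sketch, assuming da is a DataArray with a daily time coordinate (the random data is only illustrative):

import numpy as np
import pandas as pd
import xarray as xr

# Illustrative daily DataArray spanning two years
time = pd.date_range("2017-01-01", "2018-12-31", freq="D")
da = xr.DataArray(np.random.rand(time.size), coords={"time": time}, dims="time")

# Mean value for each day of the month (1..31), e.g. the 21st of every month
day_mean = da.groupby("time.day").mean()
print(day_mean.sel(day=21))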
I'm not sure, but maybe you can do it like this:
import datetime
x = datetime.datetime.now()
day = x.strftime("%d")
month = x.strftime("%m")
year = x.strftime("%Y")
.groupby(month) or .groupby(year)
Assuming that I have a series made of daily values:
dates = pd.date_range('1/1/2004', periods=365, freq="D")
ts = pd.Series(np.random.randint(0,101, 365), index=dates)
I need to use .groupby or .reduce with a fixed schema of dates.
Using ts.resample('8d') isn't an option, since the dates must not fluctuate within the month, and the last chunk of each month needs to be flexible to handle the different month lengths and leap years.
A list of dates can be obtained through:
g = dates[dates.day.isin([1,8,16,24])]
How can I group or reduce my data to this specific schema so I can compute the sum, max, and min in a more elegant and efficient way than:
for i in range(0, len(g)-1):
    ts.loc[(g[i] < ts.index) & (ts.index < g[i+1])]
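One possible sketch of that grouping, using the g defined above to label each day with its schema interval (just an idea, not necessarily the most efficient):

# Label every day with the most recent schema date (1st/8th/16th/24th) at or before it
labels = g[g.searchsorted(ts.index, side='right') - 1]

# Aggregate the daily values inside each schema interval
agg = ts.groupby(labels).agg(['sum', 'max', 'min'])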
Well, from a calendar point of view, you can group them into calendar weeks, day of week, months and so on.
If that is something you would be interested in, you could do that easily with pandas, for example:
import pandas as pd

df['week'] = df['date'].dt.isocalendar().week  # create a week-of-year column (Series.dt.week was removed in recent pandas)
df.groupby(['week'])['values'].sum()  # sum values by week
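Applied to the ts from the question above, that could look roughly like this (first reshaping the series into a frame with 'date' and 'values' columns):

# Rebuild the question's series as a frame with 'date' and 'values' columns
df = ts.rename('values').rename_axis('date').reset_index()

df['week'] = df['date'].dt.isocalendar().week  # ISO week number of each date
weekly_sum = df.groupby('week')['values'].sum()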
I have a dataframe with one column that is a datetime series object, and another column with data associated with every date. The years range from 2005 to 2014. I want to group the same calendar dates across years together, i.e., every 1st of January falling in 2005-2014 must be grouped together irrespective of the year, and similarly for all 365 days of the year. So the output should have 365 days. How can I do that?
Assuming your DataFrame has a column Date, you can make it the index of the DataFrame and then use strftime to convert it to a format with only day and month (like "%m-%d"), followed by groupby with the appropriate aggregation (I just used mean):
df = df.set_index('Date')
df.index = df.index.strftime("%m-%d")
dfAggregated = df.groupby(level=0).mean()
Please note that the output will have 366 days due to leap years. You might want to filter out the data associated with Feb 29th or merge it into Feb 28th/March 1st, depending on the specific use case of your application.
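For instance, a quick way to drop the leap-day row after aggregating (following the code above):

dfAggregated = dfAggregated.drop('02-29', errors='ignore')  # remove the Feb 29th row if present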