Is it possible to use .resample() to build a monthly time series from a weekly one by taking the last observation in each month? I don't want to sum or average anything, just take the last observation of each month.
Thank you.
Based on what you want and what the documentation describes, you could try the following:
data[COLUMN].resample('M').last()
Try it out and update us!
References
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
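A minimal sketch of taking the last weekly observation in each month, assuming the series has a DatetimeIndex. The names `weekly` and `monthly` and the sample values are made up for illustration. It passes a `pd.offsets.MonthEnd()` object as the rule instead of the string alias, since newer pandas versions spell the month-end alias 'ME' rather than 'M'.

```python
import pandas as pd

# Hypothetical weekly series: ten Sundays starting 2022-01-02.
idx = pd.date_range("2022-01-02", periods=10, freq="7D")
weekly = pd.Series(range(10), index=idx)

# Bin by calendar month and keep the last observation in each bin.
# No summing or averaging happens here.
monthly = weekly.resample(pd.offsets.MonthEnd()).last()
print(monthly)
```

Note that the resulting index is labelled with the month-end date, not with the date of the last weekly observation itself.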
Is the 'week' field a week of the year, a date, or something else?
If it's a datetime, use .dt.to_period('M') on your current date column to create a new 'month' column, then take the max date within each month to find the date to sample (if you only want the LAST date in each month).
Like max(df['MyDateField'])
Someone else is posting as I type this, so may have a better answer :)
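The to_period idea above could be sketched roughly like this, assuming the dates live in a regular column rather than in the index. The `value` column and the sample rows are invented for the example; `MyDateField` is the placeholder name used above.

```python
import pandas as pd

# Hypothetical weekly data with the dates in a column.
df = pd.DataFrame({
    "MyDateField": pd.to_datetime([
        "2022-01-02", "2022-01-16", "2022-01-30",
        "2022-02-06", "2022-02-27", "2022-03-06",
    ]),
    "value": [10, 20, 30, 40, 50, 60],
})

# Tag every row with its calendar month.
df["month"] = df["MyDateField"].dt.to_period("M")

# idxmax returns the row label of the latest date within each month;
# .loc then pulls out exactly those rows.
last_rows = df.loc[df.groupby("month")["MyDateField"].idxmax()]
print(last_rows)
```

Unlike resample, this keeps the actual date of the last observation in each month.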
Related
Could anyone teach me how to calculate the time duration for each person? I'm not good at describing the question because my English isn't great, but as the picture below shows, I want to select the latest date for each person and then subtract the oldest date to get the time duration. Thanks.
I want to get the duration.
If you need to subtract each date from the latest date per customer, use GroupBy.transform with Series.sub, then, if necessary, convert the timedeltas to days with Series.dt.days:
df['date'] = pd.to_datetime(df['date'])
df['dur'] = df.groupby('customer')['date'].transform('max').sub(df['date']).dt.days
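If what you want is a single duration per person (latest date minus earliest date), a sketch along these lines may be closer to the question. The sample rows are invented, and `span` and `duration_days` are names chosen just for this example.

```python
import pandas as pd

# Hypothetical data: several dated rows per customer.
df = pd.DataFrame({
    "customer": ["a", "a", "a", "b", "b"],
    "date": ["2021-01-01", "2021-01-10", "2021-03-01",
             "2021-05-05", "2021-05-20"],
})
df["date"] = pd.to_datetime(df["date"])

# One row per customer with the earliest and latest date.
span = df.groupby("customer")["date"].agg(["min", "max"])

# Latest minus earliest, expressed in whole days.
span["duration_days"] = (span["max"] - span["min"]).dt.days
print(span)
```

This returns one row per customer, whereas the transform version above keeps one value per original row.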
I have a table which contains information on the number of changes done on a particular day. I want to add a text field to it in the format YYYY-WW (e. g. 2022-01) which indicates the week number of the day. I need this information to determine in what week the total number of changes was the highest.
How can I determine the week number in Python?
Below is the code based on this answer:
week_nr = day.isocalendar().week
year = day.isocalendar().year
week_nr_txt = "{:4d}-{:02d}".format(year, week_nr)
At first glance it seems to work, but I am not sure that week_nr_txt will contain the year-week tuple according to the ISO 8601 standard.
Will it?
If not, how do I need to change my code in order to avoid any week-related errors (see the example below)?
Example of a week-related error: In year y1 there are 53 weeks and the last week spills over into the year y1+1.
The correct year-week tuple is y1-53. But I am afraid that my code above will result in y2-53 (y2=y1+1) which is wrong.
Thanks. I'll try to give an answer. You can easily use Python's datetime module like this:
from datetime import datetime
date = datetime(year, month, day)
# Format the datetime object using the ISO 8601 year (%G) and ISO week number (%V):
date.strftime('%G-%V')
This gives you the ISO year and the week number for that day.
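To check the year-boundary case from the question, here is a small sketch using only the standard library. `iso_year_week` is a hypothetical helper name. It indexes the result of isocalendar() by position so it also runs on Python versions where the result is a plain tuple.

```python
from datetime import date

def iso_year_week(day):
    """Return 'YYYY-WW' using the ISO 8601 year and week number."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return "{:04d}-{:02d}".format(iso[0], iso[1])

# 1 January 2021 still belongs to week 53 of ISO year 2020.
print(iso_year_week(date(2021, 1, 1)))
# 30 December 2019 already belongs to week 1 of ISO year 2020.
print(iso_year_week(date(2019, 12, 30)))
```

Both the year and the week come from isocalendar(), so they always match. The mismatch the question worries about only appears when the calendar year (day.year or %Y) is combined with the ISO week number.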
If there was a variable in an xarray dataset with a time dimension with daily values over some multiyear time span
2017-01-01 ... 2018-12-31, then it is possible to group the data by month, or by the day of the year, using
.groupby("time.month") or .groupby("time.dayofyear")
Is there a way to efficiently group the data by the day of the month, for example if I wanted to calculate the mean value on the 21st of each month?
See the xarray docs on the DateTimeAccessor helper object. For more info, you can also check out the xarray docs on Working with Time Series Data: Datetime Components, which in turn refers to the pandas docs on date/time components.
You're looking for day. Unfortunately, both pandas and xarray simply describe .dt.day as referring to "the days of the datetime", which isn't particularly helpful. But if you take a look at Python's native datetime.date.day definition, you'll see the more specific:
date.day
Between 1 and the number of days in the given month of the given year.
So, simply
da.groupby("time.day")
Should do the trick!
I'm not sure, but maybe you can do it like this:
import datetime
x = datetime.datetime.now()
day = x.strftime("%d")
month = x.strftime("%m")
year = x.strftime("%Y")
.groupby(month) or .groupby(year)
I need to group by the year and month of the Date column so that I get a clearer graph and a smaller sample size.
df.groupby(df2['Date'])['¢/kWh'].mean()
This has given me a smaller sample size and a better presentation. However, I need to group further by month and year.
Try:
df.groupby(df2['Date'].astype(str).str[0:7])['¢/kWh'].mean()
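An alternative sketch, assuming the Date column is (or can be converted to) a datetime dtype: group on a monthly period instead of slicing strings. The sample rows are invented, and `monthly_mean` is just an illustrative name.

```python
import pandas as pd

# Hypothetical price data with a datetime column.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", "2021-01-20",
                            "2021-02-03", "2021-02-17"]),
    "¢/kWh": [10.0, 12.0, 8.0, 9.0],
})

# to_period("M") maps every date to its year-month, e.g. 2021-01.
monthly_mean = df.groupby(df["Date"].dt.to_period("M"))["¢/kWh"].mean()
print(monthly_mean)
```

The result is indexed by year-month periods, which sort chronologically and plot cleanly.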
I want to divide the date range from 1st July to 1st August on a weekly basis, but I want it to start from the 1st day of the month.
I am using pd.date_range('2015-07-01', '2015-08-01', freq='W' )
But I am getting
DatetimeIndex(['2015-07-05', '2015-07-12', '2015-07-19', '2015-07-26'], dtype='datetime64[ns]', freq='W-SUN')
I want this to be done from 2015-07-01. I know I can use timedelta or find the start day of the month and use W-WED. But is there any other shortcut to do the same using date_range of pandas?
I have checked http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases, but could not come up with anything useful.
Any help is appreciated. Thanks in advance.
I would suggest using a frequency of 7 days instead of a week, so that you start at the first day of the month rather than the first day of the week.
pd.date_range('2015-07-01', '2015-08-01', freq='7D')
EDIT
To clarify, it is not strictly the first day of the month, but the first day you provide. In your example those two are the same.
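For reference, a small sketch of what the 7-day frequency produces for this range, next to the W-WED variant mentioned in the question (1 July 2015 was a Wednesday). The variable names are chosen just for this example.

```python
import pandas as pd

# Step every 7 days starting from the given start date.
weekly_from_first = pd.date_range("2015-07-01", "2015-08-01", freq="7D")
starts = [d.strftime("%Y-%m-%d") for d in weekly_from_first]
print(starts)

# Anchoring the weekly frequency on Wednesday gives the same dates here,
# but only because the start date happens to fall on a Wednesday.
weekly_wed = pd.date_range("2015-07-01", "2015-08-01", freq="W-WED")
print(weekly_from_first.equals(weekly_wed))
```

The 7-day version does not depend on which weekday the start date falls on, which is why it is the more general of the two.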