I have a table which contains information on the number of changes done on a particular day. I want to add a text field to it in the format YYYY-WW (e.g. 2022-01) which indicates the week number of the day. I need this information to determine in which week the total number of changes was the highest.
How can I determine the week number in Python?
Below is the code based on this answer:
from datetime import date
day = date(2022, 1, 5)  # example: one day from the table
week_nr = day.isocalendar().week
year = day.isocalendar().year
week_nr_txt = "{:4d}-{:02d}".format(year, week_nr)
At first glance it seems to work, but I am not sure that week_nr_txt will contain the year-week pair according to the ISO 8601 standard.
Will it?
If not, how do I need to change my code to avoid week-related errors (see the example below)?
Example of a week-related error: In year y1 there are 53 weeks and the last week spills over into the year y1+1.
The correct year-week tuple is y1-53. But I am afraid that my code above will result in y2-53 (y2=y1+1) which is wrong.
Here is my attempt at an answer. You can use Python's datetime module like this:
from datetime import datetime
date = datetime(year, month, day)
# Then format the datetime object as year-week:
date.strftime('%Y-%U')
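This gives the calendar year and the week number of the date. Note that %U counts weeks starting on Sunday, so '%Y-%U' is not the ISO 8601 year-week the question asks about; for that, use date.strftime('%G-%V') (Python 3.6+), which agrees with isocalendar().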
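To check the edge case from the question directly, here is a minimal sketch. The helper name iso_year_week is only illustrative, not from the original posts. It relies on isocalendar(), which returns the ISO year rather than the Gregorian year, so a 53-week year that spills into January is reported as y1-53, not y2-53. The loop prints the ISO label, the ISO strftime directives and the '%Y-%U' format side by side for two dates around New Year.

```python
from datetime import date

def iso_year_week(day: date) -> str:
    """Return the ISO 8601 year-week label, e.g. '2020-53'."""
    # isocalendar() yields (ISO year, ISO week, ISO weekday). The ISO year is
    # the year containing the Thursday of that week, so it can differ from
    # day.year for a few days around New Year.
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year:04d}-{iso_week:02d}"

# Dates around New Year where the ISO year and the Gregorian year disagree
for d in (date(2021, 1, 3), date(2019, 12, 30)):
    # ISO label, strftime's ISO directives, and the Sunday-based %Y-%U format
    print(d, iso_year_week(d), d.strftime("%G-%V"), d.strftime("%Y-%U"))
```

Here 2021-01-03 (a Sunday) belongs to ISO week 2020-53, while 2019-12-30 (a Monday) already belongs to ISO week 2020-01.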
Related
Is it possible to use .resample() to take the last observation in a month of a weekly time series to create a monthly time series from the weekly time series? I don't want to sum or average anything, just take the last observation of each month
Thank you.
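Based on what you want and what the documentation describes, you could try the following (resample only groups the data; .last() then picks the last observation in each month):
data[COLUMN].resample('M').last()  # use 'ME' instead of 'M' on pandas >= 2.2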
Try it out and update us!
References
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
Is the 'week' field a week-of-year number, a date, or something else?
If it's a datetime (with the datetime library imported), use .dt.to_period('M') on your current date column to create a new 'month' column, then take the max date for each month to get the date to sample (assuming you only want the LAST date in each month).
Like df.groupby('month')['MyDateField'].max()
Someone else is posting as I type this, so may have a better answer :)
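A minimal sketch of both suggestions on hypothetical weekly data. It buckets the observations with to_period("M") and groupby instead of resample("M"); this is a swapped-in but, for this purpose, equivalent way to group by calendar month that avoids the 'M'/'ME' alias change in newer pandas (one difference: resample would also emit NaN rows for empty months, groupby simply omits them).

```python
import pandas as pd

# Hypothetical weekly series: 12 Sundays from 2023-01-01 to 2023-03-19
idx = pd.date_range("2023-01-01", periods=12, freq="W-SUN")
weekly = pd.Series(range(12), index=idx)

# Bucket each observation by calendar month and keep the last value per month
months = weekly.index.to_period("M")
monthly_last = weekly.groupby(months).last()

# The dates of those last observations (the "max date per month" approach)
last_dates = weekly.index.to_series().groupby(months).max()

print(monthly_last)
print(last_dates)
```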
Background: Sometimes we need to take a date which is one month after the original timestamp. Since not all days are trading days, some adjustments must be made.
I extracted the index of the stock close prices, which gives a time series with the timestamps of all trading days.
trading_day_glossory = stock_close_full.index
Now, given a datetime variable day, the following function should return the next trading day. But it does not: the if condition is never triggered, and eventually the date runs up to the year 9999 and an error is reported.
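from dateutil.relativedelta import relativedelta

def return_trading_day(day, trading_day_glossory):
    # Step forward one day at a time until the date is a trading day
    while True:
        day = day + relativedelta(days=1)
        if day in trading_day_glossory:
            break
    return day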
I reckon that comparing a timestamp with a datetime is problematic, so I rewrote the first part of my function in this way:
trading_day_glossory = stock_close_full.index
trading_day_glossory = trading_day_glossory.to_pydatetime()
# Error message: OverflowError: date value out of range
However, this change made no difference. I further tested some properties of the variables involved:
testing1 = trading_day_glossory[20] # returns a datetime variable say 2000-05-08 00:00:00
testing2 = day # returns a datetime variable say 2000-05-07 00:00:00
What may be the problem and what should I do?
Thanks.
Not quite sure what is going on, because the errors cannot be reproduced from your code and variables.
However, you can use searchsorted, which does a binary search on a sorted time series and returns the position of the first timestamp not earlier than a given date:
trading_day_glossory[trading_day_glossory.searchsorted(day)]
This is much faster than comparing values one by one in a while loop.
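A minimal sketch of the searchsorted approach, under a few assumptions: next_trading_day is a hypothetical helper name, pd.bdate_range (weekdays only, no holidays) stands in for stock_close_full.index, and pd.DateOffset(months=1) is used as the pandas counterpart of relativedelta(months=1). Unlike the original loop, which starts from the next day, it returns the first trading day on or after the given date, and it returns None instead of running off the end of the calendar.

```python
import pandas as pd

def next_trading_day(day, trading_days):
    """Return the first entry of the sorted DatetimeIndex `trading_days`
    that is on or after `day`, or None if `day` is past the last one."""
    # Binary search: position where `day` would be inserted to keep the order,
    # i.e. the index of the first trading day >= day.
    pos = trading_days.searchsorted(pd.Timestamp(day))
    if pos == len(trading_days):
        return None
    return trading_days[pos]

# Stand-in for stock_close_full.index: weekdays only, no holidays
trading_days = pd.bdate_range("2000-05-01", "2000-06-30")

start = pd.Timestamp("2000-05-10")
one_month_later = start + pd.DateOffset(months=1)  # 2000-06-10, a Saturday
target = next_trading_day(one_month_later, trading_days)
print(target)
```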
If a variable in an xarray dataset has a time dimension with daily values over a multi-year span such as 2017-01-01 ... 2018-12-31, it is possible to group the data by month, or by day of the year, using
.groupby("time.month") or .groupby("time.dayofyear")
Is there a way to efficiently group the data by the day of the month, for example if I wanted to calculate the mean value on the 21st of each month?
See the xarray docs on the DateTimeAccessor helper object. For more info, you can also check out the xarray docs on Working with Time Series Data: Datetime Components, which in turn refers to the pandas docs on date/time components.
You're looking for day. Unfortunately, both pandas and xarray simply describe .dt.day as "the days of the datetime", which isn't particularly helpful. But if you look at Python's native datetime.date.day definition, you'll see the more specific:
date.day
Between 1 and the number of days in the given month of the given year.
So, simply
da.groupby("time.day").mean()
should do the trick!
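Since the xarray docs defer to pandas for these date components, here is the same day-of-month grouping sketched in plain pandas on hypothetical daily data (xarray itself is not used here). Grouping on the index's day component gives 31 groups, and the entry for 21 is the mean over every 21st of the month.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data over the span from the question
time = pd.date_range("2017-01-01", "2018-12-31", freq="D")
values = pd.Series(np.arange(len(time), dtype=float), index=time)

# Group by day of month (1..31) and average each group
mean_by_day = values.groupby(values.index.day).mean()

print(mean_by_day.loc[21])  # mean over all 21sts in 2017-2018
```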
I'm not sure, but maybe you can do something like this:
import datetime
x = datetime.datetime.now()
day = x.strftime("%d")
month = x.strftime("%m")
year = x.strftime("%Y")
# then use these values as grouping keys, e.g. .groupby(month) or .groupby(year)
I'm trying to write a bit of code to check if a document has been updated this week, and if not to read in the data and update it. I need to be able to check if the last modified date/time of the document occurred in this week or not (Monday-Sunday).
I know this code gives me the last modified time of the file as a float number of seconds since the epoch:
os.path.getmtime('path')
And I know I can use time.ctime to get that as a string date:
time.ctime(os.path.getmtime('path'))
But I'm not sure how to check whether that date falls in the current week. I also don't know if it's easier to convert to a datetime object rather than using ctime for this?
You can use datetime.isocalendar and compare the week attribute, basically:
import os
from datetime import datetime
filepath = 'path'  # placeholder: path to your document
t_file = datetime.fromtimestamp(os.path.getmtime(filepath))
t_now = datetime.now()
print(t_file.isocalendar().week == t_now.isocalendar().week)
# or print(t_file.isocalendar()[1]== t_now.isocalendar()[1])
# to compare the year as well, use e.g.
print(t_file.isocalendar()[:2] == t_now.isocalendar()[:2])
An ISO year consists of 52 or 53 full weeks, where each week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year.
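The comparison above can be wrapped in a small reusable check. This is a sketch: modified_this_week is a hypothetical helper name, and the optional now parameter exists only to make the function easier to test. It compares the (ISO year, ISO week) pair so that the same week number in a different year does not count as a match. The usage part creates a temporary file, then back-dates its modification time by 30 days with os.utime.

```python
import os
import tempfile
import time
from datetime import datetime

def modified_this_week(path, now=None):
    """True if the file's mtime falls in the same ISO (Monday-Sunday) week as `now`."""
    now = now or datetime.now()
    modified = datetime.fromtimestamp(os.path.getmtime(path))
    # (ISO year, ISO week): comparing both avoids matching week N of another year
    return modified.isocalendar()[:2] == now.isocalendar()[:2]

# Usage: a freshly created temp file, then the same file back-dated by 30 days
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
fresh = modified_this_week(path)
month_ago = time.time() - 30 * 24 * 3600
os.utime(path, (month_ago, month_ago))
stale = modified_this_week(path)
os.remove(path)
print(fresh, stale)
```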
I converted a list of dates into week numbers and years using:
dfweek['weeknum'] = df['Date'].dt.strftime('%U_%Y')
This would output: 34_2019
34 being the 34th week of 2019
How can I sort the data by this string in chronological order? Currently the order comes out as:
00_2018
00_2019
01_2018
01_2019
I tried converting back to datetime by:
dfweek['weeknum1'] = pd.to_datetime(dfweek['weeknum'], format = '%W_%Y')
This kept returning the error: ValueError: Cannot use '%W' or '%U' without day and year
I tried adding the weekday (%w), just to see what happens:
dfweek['weeknum'] = df['Date'].dt.strftime('%U_%Y_%w')
dfweek['weeknum1'] = pd.to_datetime(dfweek['weeknum'], format = '%W_%Y_%w')
but it just spits back the original date without the week number
My desired output would be:
00_2018
01_2018
02_2018
...
51_2019
52_2019
You can use the following for the sorting:
dfweek = dfweek.assign(weeknum1= df['Date'].dt.strftime('%Y_%U')).sort_values('weeknum1')
Here, we create a temporary column weeknum1 in the format '2018_00' (year first) and sort on it. Because the year comes first, the rows are ordered by year and then by week number, as required.
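A small self-contained check of the sorting answer on hypothetical dates: the week-first label is kept for display, while the rows are sorted by the year-first string. This works because both fields are zero-padded (%Y is four digits, %U two), so plain string order matches chronological order.

```python
import pandas as pd

# Hypothetical dates, deliberately out of order and spanning two years
df = pd.DataFrame({"Date": pd.to_datetime(
    ["2019-08-20", "2018-01-03", "2019-01-02", "2018-01-10"])})

# Display label as in the question: week first, then year
df["weeknum"] = df["Date"].dt.strftime("%U_%Y")
# Sort key: year first; both parts are zero-padded, so string order is chronological
df["weeknum1"] = df["Date"].dt.strftime("%Y_%U")
df = df.sort_values("weeknum1")

print(df["weeknum"].tolist())
```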