Pandas: Sorting by week number and year string - python

I had a list of dates that I converted into week numbers and years using:
dfweek['weeknum'] = df['Date'].dt.strftime('%U_%Y')
This would output: 34_2019
34 being the 34th week of 2019
How would I go about sorting data by this string in chronological order since the order comes out:
00_2018
00_2019
01_2018
01_2019
I tried converting back to datetime by:
dfweek['weeknum1'] = pd.to_datetime(dfweek['weeknum'], format = '%W_%Y')
This kept returning the error: ValueError: Cannot use '%W' or '%U' without day and year
Tried adding a day in the form of %w just to see what happens
dfweek['weeknum'] = df['Date'].dt.strftime('%U_%Y_%w')
dfweek['weeknum1'] = pd.to_datetime(dfweek['weeknum'], format = '%W_%Y_%w')
but it just spits back the original date without the week number
My desired output would be
00_2018
01_2018
02_2018
...
51_2019
52_2019

You can use the following for the sorting:
dfweek = dfweek.assign(weeknum1= df['Date'].dt.strftime('%Y_%U')).sort_values('weeknum1')
This adds a temporary column weeknum1 in year-first format (e.g. '2018_00') and sorts on it. Because the year comes first and the week number is zero-padded, plain string order matches chronological order.
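A minimal sketch of that idea, assuming a small made-up Date column in place of the asker's data. The names sort_key, by_date and by_string are illustrative only. The second variant covers the case where only the 'WW_YYYY' strings are left and the original dates are gone.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df['Date'] column.
df = pd.DataFrame({"Date": pd.to_datetime(
    ["2019-01-08", "2018-01-02", "2019-01-01", "2018-01-09"])})
dfweek = pd.DataFrame({"weeknum": df["Date"].dt.strftime("%U_%Y")})

# Option 1: build a year-first key from the original dates, sort, drop the key.
by_date = (dfweek.assign(sort_key=df["Date"].dt.strftime("%Y_%U"))
                 .sort_values("sort_key")
                 .drop(columns="sort_key"))

# Option 2: only the 'WW_YYYY' strings are available, so reorder them
# into 'YYYY_WW' inside the sort key (needs pandas >= 1.1 for key=).
by_string = dfweek.sort_values(
    "weeknum", key=lambda s: s.str[3:] + "_" + s.str[:2])

print(by_date["weeknum"].tolist())
# ['00_2018', '01_2018', '00_2019', '01_2019']
```

Both variants keep the displayed label in the original 'WW_YYYY' form and only use the year-first string for ordering.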

Related

Standardizing date format in Python

I have a dataset and I realized the values in the date column are in two different formats.
I tried resolving this issue using date parser but it confuses the day of the month with the month.
For example :
'27/02/21 13:40' is converted correctly to '2021-02-27 13:40:00', BUT '01/03/21 15:09' is converted to '2021-01-03 15:09:00' (instead of March 1st, it is transformed into January 3rd).
I really don't understand why in the first case the conversion is correct, while in the second it's not. Both dates are in the same column and have the same format.
The dataset preview (images not reproduced here) showed one date that was converted correctly and one that was not.
These are the steps I followed:
I converted the date columns to a list of strings.
I created a date parser function (posted as an image).
I previewed my converted list and noticed that not all dates had been converted in the same way: one date was converted correctly, while in another the month and day of the month were switched.
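A likely cause is that a guessing parser tries month-first and only falls back to day-first when the first number cannot be a month (27 cannot, 01 can). A minimal sketch of avoiding the guess, assuming every value really is day/month/two-digit-year with hours and minutes, is to pass an explicit format to pd.to_datetime. The names raw and parsed are illustrative.

```python
import pandas as pd

# The two example strings from the question.
raw = pd.Series(["27/02/21 13:40", "01/03/21 15:09"])

# An explicit format removes any day/month guessing.
parsed = pd.to_datetime(raw, format="%d/%m/%y %H:%M")

print(parsed.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
# ['2021-02-27 13:40:00', '2021-03-01 15:09:00']
```

If the column truly mixes several layouts, each layout would need its own format string (or a separate pass), which this sketch does not attempt.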

Get month-day pair without a year from pandas date time

I am trying to use this, but eventually, I get the same year-month-day format where my year changed to default "1900". I want to get only month-day pairs if it is possible.
df['date'] = pd.to_datetime(df['date'], format="%m-%d")
If you convert anything to a datetime, it will always carry a year, i.e. to_datetime always yields a datetime with a year (1900 by default when the format has none).
To drop the year, you need to store the value as a string, e.g. by running the inverse of your example on the parsed column:
df['date'] = df['date'].dt.strftime("%m-%d")

How to get a year-week text in Python?

I have a table which contains information on the number of changes done on a particular day. I want to add a text field to it in the format YYYY-WW (e. g. 2022-01) which indicates the week number of the day. I need this information to determine in what week the total number of changes was the highest.
How can I determine the week number in Python?
Below is the code based on this answer:
week_nr = day.isocalendar().week
year = day.isocalendar().year
week_nr_txt = "{:4d}-{:02d}".format(year, week_nr)
At first glance it seems to work, but I am not sure that week_nr_txt will contain the year-week pair according to the ISO 8601 standard.
Will it?
If not how do I need to change my code in order to avoid any week-related errors (example see below)?
Example of a week-related error: In year y1 there are 53 weeks and the last week spills over into the year y1+1.
The correct year-week tuple is y1-53. But I am afraid that my code above will result in y2-53 (y2=y1+1) which is wrong.
Thanks. Here is my attempt at an answer. You can use the datetime module like this:
from datetime import datetime
date = datetime(year, month, day)
# Then format the datetime object like this:
date.strftime('%Y-%U')
This gives you the year and the week number for each date, which you can use to find the week with the most changes.
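One caveat worth checking: %U is the Sunday-based week of the calendar year, not the ISO 8601 week, while isocalendar() returns the ISO year and ISO week together, so the pair stays consistent at year boundaries. The sketch below compares both on dates near New Year. The helper name iso_year_week is made up for illustration.

```python
from datetime import date

def iso_year_week(day):
    """Return 'YYYY-WW' using the ISO 8601 year and week of `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return "{:04d}-{:02d}".format(iso_year, iso_week)

# 1 Jan 2021 still belongs to ISO week 53 of ISO year 2020.
print(iso_year_week(date(2021, 1, 1)))      # 2020-53
# 30 Dec 2019 already belongs to ISO week 1 of ISO year 2020.
print(iso_year_week(date(2019, 12, 30)))    # 2020-01
# %Y-%U uses the calendar year and a Sunday-based week count instead.
print(date(2021, 1, 1).strftime("%Y-%U"))   # 2021-00
```

So the code in the question, which takes both year and week from isocalendar(), does not produce the feared y2-53 label.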

Convert pandas iso week number to regular week number

I have a dataframe of downsampled Open/High/Low/Last/Change/Volume values for a security over ten years.
I'm trying to get the weekly count of samples i.e. how many samples did my downsampling method, in this case a Volume bar, sample per week over the entire dataset so that I can plot it and compare to other downsampling methods.
So far I've tried creating a series in the df called 'Year-Week' following the answers prescribed here and here.
The problem with these answers is that my EOY dates such as '1997-12-30' get transformed to '1997-01' because of the ISO calendar system used as described in this answer, which breaks my results when I apply the value_counts method.
My code is the following:
volumeBar['Year/Week'] = (pd.Series(volumeBar.index).dt.year.astype(str) + "/" + pd.Series(volumeBar.index).dt.week.astype(str)).values
So my question is: as it stands, the following sample DatetimeIndex
Date
1997-12-22
1997-12-29
1997-12-30
becomes
Year/Week
1997/52
1997/1
1997/1
How could I get the following expected result?
Year/Week
1997/52
1997/52
1997/52
Please keep in mind that I cannot manually correct this behavior because of the size of the dataset and the erratic nature of these results, which is due to the way the ISO calendar works.
Many thanks in advance!
You can use the function get_years_week below to get years and weeks without ISO week numbering. It counts weeks in 7-day blocks starting from January 1st of each year.
import pandas as pd
a = {'Date': ['1997-11-29', '1997-12-22',
              '1997-12-29',
              '1997-12-30']}
data = pd.DataFrame(a)
data['Date'] = pd.to_datetime(data['Date'])
# Function for getting weeks and years
def get_years_week(data):
    # Get year from date
    data['year'] = data['Date'].dt.year
    # Week number for the whole column at once:
    # (days since January 1st) // 7 + 1
    data['week'] = (data['Date'].dt.dayofyear - 1) // 7 + 1
    # Create column for year and week
    data['year/week'] = data['year'].astype(str) + '/' + data['week'].astype(str)
    return data
data = get_years_week(data)
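For comparison, here is a hedged sketch of two other labelings on the asker's sample dates. The mismatch in the question comes from pairing the calendar year with the ISO week. Taking both from isocalendar() (available on a DatetimeIndex in pandas >= 1.1) keeps them consistent, and strftime('%Y/%U') gives a non-ISO, Sunday-based week that never rolls into the next year. The variable names are illustrative.

```python
import pandas as pd

idx = pd.DatetimeIndex(["1997-12-22", "1997-12-29", "1997-12-30"])

# ISO year and ISO week taken together: late December may belong to
# week 1 of the following ISO year, but the label is self-consistent.
iso = idx.isocalendar()
iso_label = (iso["year"].astype(str) + "/" + iso["week"].astype(str)).tolist()

# Calendar year with Sunday-based week number (00-53), never crosses years.
cal_label = idx.strftime("%Y/%U").tolist()

print(iso_label)  # ['1997/52', '1998/1', '1998/1']
print(cal_label)  # ['1997/51', '1997/52', '1997/52']
```

Either label can be passed to value_counts; which one is appropriate depends on whether weeks should be allowed to straddle the year boundary.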

Date to text and vice versa in excel [duplicate]

I have seen that excel identifies dates with specific serial numbers. For example :
09/07/2018 = 43290
10/07/2018 = 43291
I know that we use the DATEVALUE, VALUE and TEXT functions to convert between these types.
But what is the logic behind this conversion? Why 43290 for 09/07/2018?
Also, if I have a list of these dates in the number format in a dataframe (Python), how can I convert this number to the date format?
Similarly with time, I see decimal values in place of a regular time format. What is the logic behind these time conversions?
The following question that has been given in the comments is informative, but does not answer my question of the logic behind the conversion between Date and Text format :
convert numerical representation of date (excel format) to python date and time, then split them into two seperate dataframe columns in pandas
It is simply the number of days (or fraction of days, if talking about date and time) since January 1st 1900:
The DATEVALUE function converts a date that is stored as text to a
serial number that Excel recognizes as a date. For example, the
formula =DATEVALUE("1/1/2008") returns 39448, the serial number of the
date 1/1/2008. Remember, though, that your computer's system date
setting may cause the results of a DATEVALUE function to vary from
this example
...
Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.
from DATEVALUE docs
if I have a list of these dates in the number format in a dataframe
(Python), how can I convert this number to the date format?
Since we know this number represents the number of days since 1/1/1900 it can be easily converted to a date:
from datetime import datetime, timedelta
day_number = 43290
print(datetime(1900, 1, 1) + timedelta(days=day_number - 2))
# 2018-07-09 00:00:00
# Subtract 2: one because 1/1/1900 is "day 1", not "day 0", and one
# because Excel also counts a nonexistent 29 February 1900.
However pd.read_excel should be able to handle this automatically.
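If the serial numbers are already in a dataframe column, a minimal sketch is to let pd.to_datetime count days from an origin of 1899-12-30, which folds the two-day offset into the origin. This assumes the default 1900 date system and serials of 61 or more (dates after February 1900). The column name serial and the sample values are made up. The fractional part is the time of day, so 0.5 means noon.

```python
import pandas as pd

# Hypothetical column of Excel serial numbers; .5 is half a day (12:00).
df = pd.DataFrame({"serial": [43290, 43291.5]})

# Days since 1899-12-30 reproduce Excel's 1900-system dates.
df["date"] = pd.to_datetime(df["serial"], unit="D", origin="1899-12-30")

print(df["date"].iloc[0])  # 2018-07-09 00:00:00
```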
