Python: how to parse an object index in a DataFrame into its date, time, and time zone?
The format is "YYYY-MM-DD HH:MM:SS-HH:MM",
where the trailing "HH:MM" is the time zone offset.
Example:
Midnight Jan 1st, 2020 in Mountain Time:
2020-01-01 00:00:00-07:00
I'm trying to convert this into seven columns in the data frame:
YYYY, MM, DD, HH, MM, SS, TZ
Use pd.to_datetime to parse a string column into a datetime Series:
datetimes = pd.to_datetime(column)
Once you have this, you can access the components of each datetime with the .dt accessor:
final = pd.DataFrame({
"year": datetimes.dt.year,
"month": datetimes.dt.month,
"day": datetimes.dt.day,
"hour": datetimes.dt.hour,
"minute": datetimes.dt.minute,
"second": datetimes.dt.second,
"timezone": datetimes.dt.tz,
})
See the pandas user guide section on date/time functionality for more info.
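The question asks about an index rather than a column. A minimal sketch, assuming the timestamps are stored as strings in the index (the DataFrame and its `value` column are made up for illustration): parse the index with pd.to_datetime and read the components directly off the resulting DatetimeIndex, which has no .dt accessor. The offset is formatted with strftime("%z") so it comes out as text such as -0700, whichever tz object pandas uses internally. If the strings carry different offsets (for example across a daylight-saving change), pandas cannot build a single tz-aware index from them; passing utc=True to pd.to_datetime is the usual workaround.

```python
import pandas as pd

# Hypothetical data: timestamps stored as strings in the index (object dtype)
df = pd.DataFrame(
    {"value": [1, 2]},
    index=["2020-01-01 00:00:00-07:00", "2020-01-01 06:30:15-07:00"],
)

# Parse the object index into a tz-aware DatetimeIndex
idx = pd.to_datetime(df.index)

# A DatetimeIndex exposes the components directly, without .dt
parts = pd.DataFrame(
    {
        "year": idx.year,
        "month": idx.month,
        "day": idx.day,
        "hour": idx.hour,
        "minute": idx.minute,
        "second": idx.second,
        # UTC offset as text, e.g. "-0700"
        "tz": [ts.strftime("%z") for ts in idx],
    },
    index=df.index,
)
print(parts)
```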
df
Date
0 2022-05-01 01:10:04+07:00
1 2022-05-02 05:09:10+07:00
2 2022-05-02 11:22:05+07:00
3 2022-05-02 10:00:30+07:00
df['Date'] = pd.to_datetime(df['Date'])
df['tz']= df['Date'].dt.tz
df['year']= df['Date'].dt.year
df['month']= df['Date'].dt.month
df['month_n']= df['Date'].dt.month_name()
df['day']= df['Date'].dt.day
df['day_n']= df['Date'].dt.day_name()
df['h']= df['Date'].dt.hour
df['mn']= df['Date'].dt.minute
df['s']= df['Date'].dt.second
df['T']= df['Date'].dt.time
df['D']= df['Date'].dt.date
Date tz year month month_n day day_n h mn s T D
0 2022-05-01 01:10:04+07:00 pytz.FixedOffset(420) 2022 5 May 1 Sunday 1 10 4 01:10:04 2022-05-01
1 2022-05-02 05:09:10+07:00 pytz.FixedOffset(420) 2022 5 May 2 Monday 5 9 10 05:09:10 2022-05-02
2 2022-05-02 11:22:05+07:00 pytz.FixedOffset(420) 2022 5 May 2 Monday 11 22 5 11:22:05 2022-05-02
3 2022-05-02 10:00:30+07:00 pytz.FixedOffset(420) 2022 5 May 2 Monday 10 0 30 10:00:30 2022-05-02
Related
I am working on a DataFrame of baseball game dates and their attendance so I can create a calendar heatmap.
Date Attendance
1 Apr 7 44723.0
2 Apr 8 42719.0
3 Apr 9 36139.0
4 Apr 10 41253.0
5 Apr 11 20480.0
I've tried different solutions that I've come across...
- df['Date'] = df['Date'].astype('datetime64[ns]')
- df['Date'] = pd.to_datetime(df['Date'])
but I get the error:
'Out of bounds nanosecond timestamp: 1-04-07 00:00:00'
Looking at my data, I don't even have a date that matches that timestamp. I also looked at other posts on this site, and one potential problem is that my dates are not zero-padded. Could that be the cause?
You can convert to datetime if you supply a format, e.g.:
df
Out[33]:
Date Attendance
1 Apr 7 44723.0
2 Apr 8 42719.0
3 Apr 9 36139.0
4 Apr 10 41253.0
5 Apr 11 20480.0
pd.to_datetime(df["Date"], format="%b %d")
Out[35]:
1 1900-04-07
2 1900-04-08
3 1900-04-09
4 1900-04-10
5 1900-04-11
Name: Date, dtype: datetime64[ns]
If you're unhappy with the base year 1900, you can add a date offset, for example:
df["datetime"] = pd.to_datetime(df["Date"], format="%b %d")
df["datetime"] += pd.tseries.offsets.DateOffset(years=100)
df["datetime"]
1 2000-04-07
2 2000-04-08
3 2000-04-09
4 2000-04-10
5 2000-04-11
Name: datetime, dtype: datetime64[ns]
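Alternatively, if you know which season the games belong to, you can put the year into the string before parsing, so there is no 1900 placeholder to correct afterwards. A sketch, assuming the season is 2021 (the year is not in the original data) and using a few of the rows above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": ["Apr 7", "Apr 8", "Apr 10"], "Attendance": [44723.0, 42719.0, 41253.0]}
)

season = 2021  # assumption: the season year is known; it is not in the data
# Prepend the year so the format includes %Y and no placeholder year is needed
df["datetime"] = pd.to_datetime(str(season) + " " + df["Date"], format="%Y %b %d")
print(df)
```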
I have two different sets of dates:
Set 1:
01/05/2022 - 31/12/2022
01/01/2023 - 31/12/2023
Set 2:
01/05/2022 - 30/09/2022
01/10/2022 - 31/12/2022
01/01/2023 - 31/12/2023
I want to check whether each set of dates above is contiguous over the following date range:
Date 1 = 01/05/2022
Date 2 = 31/12/2023
Please suggest a solution.
It seems easier to me to use pandas to check whether the dates fall within the range.
Your dates are in day/month/year order; in my experience, year/month/day is more common, so I convert them.
I converted the variables 'Date_1' and 'Date_2' to datetime objects, and split the date ranges into two lists, start and finish. Then I built a DataFrame from these lists and filtered it by the date range. For clarity, I added one extra row, 2023-01-01 to 2025-12-31; it is filtered out because it does not satisfy the condition. Note that this filter only checks that each range lies inside Date_1..Date_2; it does not check for gaps or overlaps between the ranges.
import pandas as pd
from datetime import datetime
Date_1 = '01/05/2022'
Date_2 = '31/12/2023'
Date_1 = datetime.strptime(Date_1, "%d/%m/%Y")
Date_2 = datetime.strptime(Date_2, "%d/%m/%Y")
start = [datetime.strptime(i, "%d/%m/%Y") for i in ['01/05/2022', '01/01/2023', '01/05/2022', '01/10/2022', '01/01/2023', '01/01/2023']]
finish = [datetime.strptime(i, "%d/%m/%Y") for i in ['31/12/2022', '31/12/2023', '30/09/2022', '31/12/2022', '31/12/2023', '31/12/2025']]
df = pd.DataFrame({'start': start, 'finish': finish})
print(df)
print(df[(df['start'] >= Date_1) & (df['finish'] <= Date_2)])
Output of print(df):
start finish
0 2022-05-01 2022-12-31
1 2023-01-01 2023-12-31
2 2022-05-01 2022-09-30
3 2022-10-01 2022-12-31
4 2023-01-01 2023-12-31
5 2023-01-01 2025-12-31
Output of print(df[(df['start'] >= Date_1) & (df['finish'] <= Date_2)]):
start finish
0 2022-05-01 2022-12-31
1 2023-01-01 2023-12-31
2 2022-05-01 2022-09-30
3 2022-10-01 2022-12-31
4 2023-01-01 2023-12-31
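To check contiguity itself, one option is to sort each set by start date and require that every range begins exactly one day after the previous one ends, that the first range starts on Date_1, and that the last one ends on Date_2. A minimal sketch, assuming inclusive, day-level end dates in the same dd/mm/yyyy format; the function name is_contiguous is hypothetical:

```python
import pandas as pd

FMT = "%d/%m/%Y"


def is_contiguous(ranges, range_start, range_end):
    """Check that inclusive (start, finish) date ranges cover
    range_start..range_end exactly, with no gaps or overlaps."""
    df = pd.DataFrame(ranges, columns=["start", "finish"])
    df["start"] = pd.to_datetime(df["start"], format=FMT)
    df["finish"] = pd.to_datetime(df["finish"], format=FMT)
    df = df.sort_values("start").reset_index(drop=True)

    # Each range must begin the day after the previous one ends
    expected_start = df["finish"].shift() + pd.Timedelta(days=1)
    no_gaps = (df["start"].iloc[1:] == expected_start.iloc[1:]).all()

    # The whole set must start and end exactly on the target range
    starts_ok = df["start"].iloc[0] == pd.to_datetime(range_start, format=FMT)
    ends_ok = df["finish"].iloc[-1] == pd.to_datetime(range_end, format=FMT)
    return bool(no_gaps and starts_ok and ends_ok)


set_1 = [("01/05/2022", "31/12/2022"), ("01/01/2023", "31/12/2023")]
set_2 = [("01/05/2022", "30/09/2022"), ("01/10/2022", "31/12/2022"), ("01/01/2023", "31/12/2023")]
print(is_contiguous(set_1, "01/05/2022", "31/12/2023"))
print(is_contiguous(set_2, "01/05/2022", "31/12/2023"))
```

Requiring an exact one-day step rejects both gaps and overlaps; if your end dates are exclusive, compare each start to the previous finish directly instead.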
I am trying to convert dates in multiple formats into YYYY-MM-DD and then merge them into one column, ignoring the nulls, but I end up with TypeError: cannot add DatetimeArray and DatetimeArray
import datetime
import pandas as pd
data = [['Apr 2021'], ['Jan 1'], ['Fri'], ['Jan 18']]
df = pd.DataFrame(data, columns=['date'])
# convert "Month Day", e.g. "Jan 1"
df['date1'] = pd.to_datetime('2021 ' + df['date'], errors='coerce', format='%Y %b %d')
# convert "Month Year", e.g. "Apr 2021"
df['date2'] = pd.to_datetime(df['date'], errors='coerce')
# convert "Fri" to this coming Friday
today = datetime.date.today()
friday = today + datetime.timedelta((4 - today.weekday()) % 7)
this_friday = friday.strftime('%Y-%m-%d')
df['date3'] = df['date'].map({'Fri': this_friday})
df['date3'] = pd.to_datetime(df['date3'])
# this line raises the TypeError
df['dateFinal'] = df['date1'] + df['date2'] + df['date3']
I checked the dtypes and they are all datetime, so I don't know why this fails. My approach is not efficient either; feel free to suggest a better way.
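IIUC (if I understand correctly): adding two datetime columns is not defined in pandas (only a timedelta can be added to a datetime), which is why the sum raises a TypeError. What you want is the first non-null date in each row.
Try bfill() on axis=1: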
df['dateFinal'] = df[['date1','date2','date3']].bfill(axis=1).iloc[:,0]
OR
via ffill() on axis=1:
df['dateFinal'] = df[['date1','date2','date3']].ffill(axis=1).iloc[:,-1]
OR
via stack()+to_numpy() (this relies on stack() dropping the NaT values and on each row having exactly one non-null date):
df['dateFinal'] = df[['date1','date2','date3']].stack().to_numpy()
output of df:
date date1 date2 date3 dateFinal
0 Apr 2021 NaT 2021-04-01 NaT 2021-04-01
1 Jan 1 2021-01-01 NaT NaT 2021-01-01
2 Fri NaT NaT 2021-08-13 2021-08-13
3 Jan 18 2021-01-18 NaT NaT 2021-01-18
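Another option, if you prefer to spell out the priority order, is to chain Series.combine_first, which fills the missing values of one Series from another. A sketch on hand-built columns that mirror the output above (the values are copied from that output rather than recomputed, since date3 depends on today's date):

```python
import pandas as pd

# Columns copied from the output above; None becomes NaT
df = pd.DataFrame({
    "date1": pd.to_datetime([None, "2021-01-01", None, "2021-01-18"]),
    "date2": pd.to_datetime(["2021-04-01", None, None, None]),
    "date3": pd.to_datetime([None, None, "2021-08-13", None]),
})

# Take date1 where present, otherwise date2, otherwise date3
df["dateFinal"] = df["date1"].combine_first(df["date2"]).combine_first(df["date3"])
print(df)
```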
I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
where 'date' is Week/Year.
I tried the following code:
df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
but it returns this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
However, the first day of the first week of 2020 is 2019-12-29, taking Sunday as the first day of the week. How can I get 2019-12-29 as the first day of the first week of 2020, instead of 2020-01-05?
From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My original answer doesn't work for input 1/2023, and using ISO 8601 week dates doesn't work for 1/2021, so I've edited this answer to use a custom function.
Here is a way with a custom function:
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
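def get_first_day(date):
    # date is "week/year"; compute the Sunday of %U week 0 and week 1 for that year
    year = date.split('/')[1]
    date0 = datetime.strptime('0/' + year + '/0', "%U/%Y/%w")
    date1 = datetime.strptime('1/' + year + '/0', "%U/%Y/%w")
    date = datetime.strptime(date + '/0', "%U/%Y/%w")
    # weeks 0 and 1 coincide only when the year starts on a Sunday;
    # otherwise shift back one week so week 1 is the week containing Jan 1
    return date if date0 == date1 else date - timedelta(weeks=1)
df['new_date'] = df['date'].apply(get_first_day)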
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08
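If the apply call becomes slow on a large frame, the same rule can be computed column-wise. Week 1 here is the Sunday-to-Saturday week that contains 1 January, so its Sunday is 1 January minus (weekday + 1) % 7 days, where pandas numbers Monday as 0 and Sunday as 6. This is a sketch of that arithmetic, not the function above; it should reproduce the output table above:

```python
import pandas as pd

df = pd.DataFrame({"date": ["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})

# Split "week/year" into integer columns
parts = df["date"].str.split("/", expand=True).astype(int)
week, year = parts[0], parts[1]

# Sunday on or before 1 January (pandas weekday: Monday=0 ... Sunday=6)
jan1 = pd.to_datetime(year.astype(str) + "-01-01")
week1_sunday = jan1 - pd.to_timedelta((jan1.dt.weekday + 1) % 7, unit="D")

# Week n starts (n - 1) weeks after week 1
df["new_date"] = week1_sunday + pd.to_timedelta((week - 1) * 7, unit="D")
print(df)
```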
You'll want to use the ISO week parsing directives, e.g.:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
You can also shift back by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]
I have a pandas DataFrame with a date index in the following format: datetime.date(2018, 12, 31).
Each datetime represents the fiscal year end, i.e. 31/12/2018, 31/12/2017, 31/12/2016 etc.
However, for some companies the fiscal year end may be 30/11/2018 or 31/10/2018 and etc. instead of the last date of each year.
Is there a quick way to change the non-standard dates to the last day of each year?
i.e. from 30/11/2018 to 31/12/2018, from 31/10/2018 to 31/12/2018, and so on.
import pandas as pd
df = pd.DataFrame({'datetime': ['2019-01-02','2019-02-01', '2019-04-01', '2019-06-01', '2019-11-30','2019-12-30'],
'data': [1,2,3,4,5,6]})
df['datetime'] = pd.to_datetime(df['datetime'])
df['quarter'] = df['datetime'] + pd.tseries.offsets.QuarterEnd(n=0)
df
datetime data quarter
0 2019-01-02 1 2019-03-31
1 2019-02-01 2 2019-03-31
2 2019-04-01 3 2019-06-30
3 2019-06-01 4 2019-06-30
4 2019-11-30 5 2019-12-31
5 2019-12-30 6 2019-12-31
The datetime column holds arbitrary sample dates. Adding the QuarterEnd(n=0) offset rolls each date forward to the end of its quarter (a date already on a quarter end stays as it is), which standardizes them.
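Since the question asks for the end of the year rather than the quarter, the same idea works with the YearEnd offset. A sketch on made-up data, assuming the fiscal-year-end dates are datetime.date objects in the index as described; converting the object index to a DatetimeIndex first lets the offset be applied in one vectorized step:

```python
import datetime

import pandas as pd

# Hypothetical data: fiscal-year-end dates as datetime.date objects in the index
df = pd.DataFrame(
    {"revenue": [10, 20, 30]},
    index=[datetime.date(2018, 12, 31), datetime.date(2018, 11, 30), datetime.date(2018, 10, 31)],
)

# Convert to a DatetimeIndex, then roll each date forward to 31 December;
# YearEnd(0) leaves dates that are already on 31 December unchanged
df.index = pd.to_datetime(df.index) + pd.offsets.YearEnd(0)
print(df)
```

If you need plain datetime.date values back afterwards, df.index.date returns them.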