I have a pandas dataframe with datetimes as the index, in the format datetime.date(2018, 12, 31).
Each datetime represents a fiscal year end, i.e. 31/12/2018, 31/12/2017, 31/12/2016, etc.
However, for some companies the fiscal year end may be 30/11/2018, 31/10/2018, etc. instead of the last date of the year.
Is there a quick way to change the non-standardized datetimes to the last date of each year,
i.e. from 30/11/2018 to 31/12/2018, from 31/10/2018 to 31/12/2018, and so on?
import pandas as pd

df = pd.DataFrame({'datetime': ['2019-01-02', '2019-02-01', '2019-04-01',
                                '2019-06-01', '2019-11-30', '2019-12-30'],
                   'data': [1, 2, 3, 4, 5, 6]})
df['datetime'] = pd.to_datetime(df['datetime'])
df['quarter'] = df['datetime'] + pd.tseries.offsets.QuarterEnd(n=0)
df
     datetime  data    quarter
0  2019-01-02     1 2019-03-31
1  2019-02-01     2 2019-03-31
2  2019-04-01     3 2019-06-30
3  2019-06-01     4 2019-06-30
4  2019-11-30     5 2019-12-31
5  2019-12-30     6 2019-12-31
We start with a datetime column of arbitrary dates. Adding the QuarterEnd(n=0) offset rolls each date forward to its quarter end (a date already on a quarter end is left unchanged), standardizing the dates.
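For the year ends the question actually asks about, the same pattern works with YearEnd(n=0), which rolls any date forward to 31 December of its year (a quick sketch; the year_end column name is just for illustration):
df['year_end'] = df['datetime'] + pd.tseries.offsets.YearEnd(n=0)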
Related
Sheet 1 has a column 'Date' with 10 years' worth of dates. These dates are trading days for the Australian stock market. I'm looking to remove all dates that are not the 15th trading day of each month (not necessarily the 15th day of the month). This code works for the first 12 months of the first year, but it stops after that.
Code:
df = pd.read_csv(r'C:\Users\\Desktop\Sheet1.csv')
df['Date'] = pd.to_datetime(df['Date'])
df['month'] = df['Date'].dt.month
df['trading_day'] = df.groupby(['month']).cumcount() + 1
df = df[df['trading_day'] == 15]
df.drop(['month', 'trading_day'], axis=1, inplace=True)
df.to_excel("Sheet2.xlsx", index=False)
Current output:
Date NAV
2009-06-22 00:00:00 $50.7731
2009-07-21 00:00:00 $52.2194
2009-08-21 00:00:00 $55.5233
2009-09-21 00:00:00 $61.1116
2009-10-21 00:00:00 $62.6512
2009-11-20 00:00:00 $60.9736
2009-12-21 00:00:00 $60.2841
2010-01-22 00:00:00 $61.2418
2010-02-19 00:00:00 $59.8768
2010-03-19 00:00:00 $63.4521
2010-04-23 00:00:00 $63.1672
2010-05-21 00:00:00 $55.8651
You also need to group by year to compute the cumcount:
df['trading_day'] = df.groupby([df['Date'].dt.year, 'month']).cumcount() + 1
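As a minimal sketch (with synthetic weekday dates, not the real ASX trading calendar), you can check that the combined grouping restarts the count every month in every year:
import pandas as pd

# synthetic trading days: weekdays spanning two calendar years
df = pd.DataFrame({'Date': pd.bdate_range('2009-06-01', '2010-12-31')})
df['trading_day'] = df.groupby([df['Date'].dt.year, df['Date'].dt.month]).cumcount() + 1
print(df[df['trading_day'] == 15])  # one row per month, in both years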
I have the following date column that I would like to transform to a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 in 2018 and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day to the datetime format
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w:
df1["date"] = pd.to_datetime('0' + df1.date, format='%w%W-%Y')
I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
Here 'date' is formatted as Week/Year.
So I tried the following code:
from datetime import datetime

df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
But it returns this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
But the first day of the first week of 2020 is 2019-12-29, taking Sunday as the first day of the week. How can I get 2019-12-29 as the first day of the first week of 2020, rather than 2020-01-05?
From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My original answer doesn't work for the input 1/2023, and using ISO 8601 date values doesn't work for 1/2021, so I've edited this answer to add a custom function.
Here is a way with a custom function:
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
def get_first_day(date):
    # when the year starts on a Sunday, weeks 0 and 1 parse to the same date,
    # so no shift is needed; otherwise step back one week
    date0 = datetime.strptime('0/' + date.split('/')[1] + '/0', "%U/%Y/%w")
    date1 = datetime.strptime('1/' + date.split('/')[1] + '/0', "%U/%Y/%w")
    date = datetime.strptime(date + '/0', "%U/%Y/%w")
    return date if date0 == date1 else date - timedelta(weeks=1)

df['new_date'] = df['date'].apply(get_first_day)
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08
You'll want to use the ISO week parsing directives, e.g.:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
you can also shift by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]
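Note that the ISO directives must be used together: strptime rejects %V combined with the regular year %Y, so pair it with the ISO year %G and an ISO weekday directive such as %u.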
I've got a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
Every row's type is string. What is the easiest way to convert these dates into integer Unix time?
One idea is to convert the column to both datetimes and timedeltas:
import numpy as np
import pandas as pd

# rows like '9-Feb' become dates (assuming the year 2020); the rest become NaT
df['dates'] = pd.to_datetime(df['Date'] + '-2020', format='%d-%b-%Y', errors='coerce')
# rows like '3 mins' or '2 hours' become timedeltas
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
                            r'(\d+)\s+hours': r'\1:00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
# keep rows that parsed as either a date or a time
df = df[df['dates'].notna() | df['times'].notna()]
# Series.append was removed in pandas 2.0, so combine with pd.concat
df['all'] = pd.concat([df['dates'].dropna().astype(np.int64),
                       df['times'].dropna().astype(np.int64)])
print(df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
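Note that astype(np.int64) gives nanoseconds since the Unix epoch for the dates (and nanoseconds of duration for the timedeltas); if you want conventional Unix seconds, floor-divide by 10**9 (the unix_seconds column name is just for illustration):
df['unix_seconds'] = df['all'] // 10**9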
I'm working with a pandas dataframe; one of my columns is a date (YYYYMMDD) and another is a time (HH:MM). I would like to combine the two columns into a single timestamp or datetime64 column, to later use as an index (for a time series). Here is the situation:
Do you have any ideas? The classic pandas.to_datetime() seems to work only when each column contains a single component (only hours, only days, only years, etc.).
Setup
df
Out[1735]:
id date hour other
0 1820 20140423 19:00:00 8
1 4814 20140424 08:20:00 22
Solution
import datetime as dt
#convert date and hour to str, concatenate them and then convert them to datetime format.
df['new_date'] = df[['date','hour']].astype(str).apply(lambda x: dt.datetime.strptime(x.date + x.hour, '%Y%m%d%H:%M:%S'), axis=1)
df
Out[1756]:
id date hour other new_date
0 1820 20140423 19:00:00 8 2014-04-23 19:00:00
1 4814 20140424 08:20:00 22 2014-04-24 08:20:00
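As a vectorized alternative (a sketch assuming the same column layout), you can concatenate the strings and parse them in a single pd.to_datetime call, avoiding the per-row apply:
df['new_date'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['hour'],
                                format='%Y%m%d %H:%M:%S')
df = df.set_index('new_date')  # use the combined timestamp as the time-series index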