How to add a year to a column of dates in pandas - python

I am attempting to add a year to a column of dates in a pandas dataframe, but when I use pd.to_timedelta I get additional hours & minutes. I know I could take the updated time and truncate the hours, but I feel like there must be a way to add a year precisely. My attempt as follows:
import pandas as pd
dates = pd.DataFrame({'date':['20170101','20170102','20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')
dates['date2'] = dates['date'] + pd.to_timedelta(1, unit='y')
dates
yields:
Out[1]:
date date2
0 2017-01-01 2018-01-01 05:49:12
1 2017-01-02 2018-01-02 05:49:12
2 2017-01-03 2018-01-03 05:49:12
How can I add a year without adding 05:49:12 HH:mm:ss?

In [99]: dates['date'] + pd.offsets.DateOffset(years=1)
Out[99]:
0 2018-01-01
1 2018-01-02
2 2018-01-03
Name: date, dtype: datetime64[ns]
leap year check:
In [100]: pd.to_datetime(['2011-02-28', '2012-02-29']) + pd.offsets.DateOffset(years=1)
Out[100]: DatetimeIndex(['2012-02-28', '2013-02-28'], dtype='datetime64[ns]', freq=None)
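For reference, the extra 05:49:12 in the question is consistent with a "year" being treated as the average Gregorian year of 365.2425 days, since 0.2425 days is 5 h 49 min 12 s. A minimal sketch of the calendar-based alternative applied to a DataFrame column follows; the leap-day row is an added illustration and is not part of the question's data.

```python
import pandas as pd

# Sample frame; the 2016-02-29 row is an illustrative addition.
dates = pd.DataFrame({'date': ['20170101', '20170102', '20160229']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')

# DateOffset(years=1) shifts the calendar year and keeps the time of day.
# A day that does not exist in the target year (Feb 29) rolls back to Feb 28.
dates['date2'] = dates['date'] + pd.DateOffset(years=1)
print(dates)
```

Because the offset works on calendar fields rather than on a fixed duration, no hours or minutes are introduced.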

You can normalize via pd.Series.dt.normalize:
dates['date2'] = (dates['date'] + pd.to_timedelta(1, unit='y')).dt.normalize()
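A small sketch of what normalize does, under one assumption: as far as I know, recent pandas releases no longer accept unit='y', so the one-year timedelta from the question is replaced here by an explicit stand-in of 365 days 05:49:12 (the name one_year is only illustrative).

```python
import pandas as pd

dates = pd.DataFrame(
    {'date': pd.to_datetime(['20170101', '20170102'], format='%Y%m%d')}
)

# Assumed stand-in for the question's "one year" timedelta.
one_year = pd.Timedelta(days=365, hours=5, minutes=49, seconds=12)

shifted = dates['date'] + one_year        # carries a 05:49:12 time of day
dates['date2'] = shifted.dt.normalize()   # time of day reset to midnight
print(dates)
```

Note that normalize only removes the time of day; the shift is still a fixed duration, so it can land on the wrong calendar day when a leap year is involved.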

Or convert the datetime values in the new column to plain date objects, which drops the time of day:
dates['date2'] = dates['date2'].apply(lambda a: a.date())

Edit: This works if you don't care about leap years, etc. Otherwise see jp_data_analysis's answer.
You can use 365 and unit='d':
pd.to_timedelta(365, unit='d')
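To illustrate the leap-year caveat, here is a hedged sketch comparing a fixed 365-day shift with a calendar-year offset. The sample dates are illustrative, and pd.Timedelta(days=365) is used as an equivalent of the to_timedelta call above.

```python
import pandas as pd

start = pd.to_datetime(pd.Series(['2016-01-01', '2017-01-01']))

fixed = start + pd.Timedelta(days=365)        # always exactly 365 days later
calendar = start + pd.DateOffset(years=1)     # same month and day, next year

print(pd.DataFrame({'start': start, 'fixed': fixed, 'calendar': calendar}))
```

Starting in the leap year 2016, the fixed shift lands on 2016-12-31, one day short of the calendar result.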

You can access the components of a date (year, month and day) using code of the form dataframe["column"].dt.component.
For example, the month component is dataframe["column"].dt.month, and the year component is dataframe["column"].dt.year.
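A minimal sketch of that accessor pattern, with illustrative column names. It also shows one possible way to rebuild a date one year later from the components, which is an added illustration rather than something this answer proposes.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2017-01-01', '2017-06-15'])})

# Pull out the individual components through the .dt accessor.
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

# Reassemble a date from year/month/day columns with the year bumped by one.
parts = pd.DataFrame({'year': df['year'] + 1, 'month': df['month'], 'day': df['day']})
df['next_year'] = pd.to_datetime(parts)
print(df)
```

Rebuilding from components does not handle Feb 29 gracefully, because the resulting day may not exist in the target year.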

Related

How do I adjust the dates of a column in pandas according to a threshold?

I have a data frame with a datetime column like so:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
I want to know if there is a way to adjust the dates with this condition:
If the day of the date is before 15, then change the date to the end of last month.
If the day of the date is 15 or after, then change the date to the end of the current month.
My desired output would look something like this:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Using np.where and Josh's suggestion of MonthEnd, this can be simplified a bit.
Given:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
Doing:
import numpy as np
from pandas.tseries.offsets import MonthEnd
# Where the day is less than 15,
# give the month end of the previous month.
# Otherwise,
# give the month end of the current month.
df.dates = np.where(df.dates.dt.day.lt(15),
                    df.dates.add(MonthEnd(-1)),
                    df.dates.add(MonthEnd(0)))
print(df)
# Output:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Easy with MonthEnd
Let's set up the data:
dates = pd.Series({0: '2017-09-19', 1: '2017-08-28', 2: '2017-07-13'})
dates = pd.to_datetime(dates)
Then:
from pandas.tseries.offsets import MonthEnd
pre, post = dates.dt.day < 15, dates.dt.day >= 15
dates.loc[pre] = dates.loc[pre] + MonthEnd(-1)
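dates.loc[post] = dates.loc[post] + MonthEnd(0)
Explanation: create the masks (pre and post) first, then use them to roll each date to the end of the previous or the current month, as appropriate. MonthEnd(0) is used for the current month because it leaves a date that already falls on a month end unchanged, whereas MonthEnd(1) would move such a date (for example 2017-08-31) to the end of the following month.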
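The choice of offset matters for dates that already sit on a month end. A short sketch of how the three offsets behave, using illustrative dates (2017-08-31 is added to show the edge case):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

dates = pd.to_datetime(pd.Series(['2017-09-19', '2017-08-31', '2017-07-13']))

current_end = dates + MonthEnd(0)     # rolls forward; a month-end date stays put
next_end = dates + MonthEnd(1)        # moves to the next month end after the date
previous_end = dates + MonthEnd(-1)   # moves to the last month end before the date

print(pd.DataFrame({'dates': dates, 'MonthEnd(0)': current_end,
                    'MonthEnd(1)': next_end, 'MonthEnd(-1)': previous_end}))
```

For 2017-08-31, MonthEnd(0) returns 2017-08-31 while MonthEnd(1) returns 2017-09-30, so MonthEnd(0) is the safer choice when "end of the current month" is wanted.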

Python - Converting a column with weekly data to a datetime object

I have the following date column that I would like to transform to a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 in 2018 and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day to the datetime format
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w. Prefixing each string with '0' selects Sunday as that day:
df1["date"] = pd.to_datetime( '0'+df1.date, format='%w%W-%Y')
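If a different weekday is wanted as the representative date of each week, the digit supplied for %w can be changed. The sketch below is a variation on the answers above, not taken from them: it appends '-1' (Monday) with an explicit separator, and the column name monday is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['1-2018', '2-2018', '3-2018', '4-2018']})

# %w uses 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
# %W counts weeks starting on Monday, so week 1 begins on the first Monday.
df1['monday'] = pd.to_datetime(df1['date'] + '-1', format='%W-%Y-%w')
print(df1)
```

With a separator between the fields, the format string stays unambiguous for two-digit week numbers as well.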

Pandas DateTime for Month

I have month column with values formatted as: 2019M01
To find the seasonality I need this formatted into Pandas DateTime format.
How to format 2019M01 into datetime so that I can use it for my seasonality plotting?
Thanks.
Use to_datetime with format parameter:
print (df)
date
0 2019M01
1 2019M03
2 2019M04
df['date'] = pd.to_datetime(df['date'], format='%YM%m')
print (df)
date
0 2019-01-01
1 2019-03-01
2 2019-04-01
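For the seasonality use case, a possible next step is to derive a month number or a monthly period from the parsed column. This is only a sketch; the value column and the new column names are hypothetical.

```python
import pandas as pd

# 'value' is a made-up measurement column for illustration.
df = pd.DataFrame({'date': ['2019M01', '2019M03', '2019M04'],
                   'value': [10, 30, 40]})

# The literal 'M' in the format matches the 'M' in strings such as 2019M01.
df['date'] = pd.to_datetime(df['date'], format='%YM%m')

df['month'] = df['date'].dt.month            # 1..12, handy for grouping by season
df['period'] = df['date'].dt.to_period('M')  # monthly period such as 2019-01
print(df)
```

Grouping by the month column (for example df.groupby('month')['value'].mean()) then gives one value per calendar month across years.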

Adding 1 day to selective time period of date column

I have a table where it has a column 'Date', 'Time', 'Costs'.
I want to select rows where the time is greater than 12:00:00, then add 1 day to 'Date' column of the selected rows.
How should I go about in doing it?
So far I have:
df[df['Time']>'12:00:00']['Date'] = df[df['Time']>'12:00:00']['Date'].astype('datetime64[ns]') + timedelta(days=1)
I am a beginner in learning coding and any suggestions would be really helpful! Thanks.
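First convert the Date column with to_datetime if it does not already hold datetimes. Then convert the Time column to strings (in case it holds Python time objects), parse it to datetimes, extract the hour with Series.dt.hour, build a mask from the comparison, and add 1 day to the rows where the mask is True: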
df = pd.DataFrame({'Date':['2015-01-02','2016-05-08'],
'Time':['10:00:00','15:00:00']})
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-08 15:00:00
df['Date'] = pd.to_datetime(df['Date'])
mask = pd.to_datetime(df['Time'].astype(str)).dt.hour > 12
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-09 15:00:00
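One caveat: comparing the hour with 12 treats times such as 12:30:00 as not being after noon. If a strict "later than 12:00:00" test is wanted, a possible variation (an assumption about the intent, not part of the answer above) is to parse the times as durations since midnight; the 12:30:00 row is added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-01-02', '2016-05-08', '2016-05-08'],
                   'Time': ['10:00:00', '12:30:00', '15:00:00']})
df['Date'] = pd.to_datetime(df['Date'])

# Parse HH:MM:SS strings as timedeltas and compare them with 12 hours.
after_noon = pd.to_timedelta(df['Time'].astype(str)) > pd.Timedelta(hours=12)

# Add one day only to the rows whose time is strictly after noon.
df.loc[after_noon, 'Date'] += pd.Timedelta(days=1)
print(df)
```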

Pandas DatetimeIndex and to_datetime discrepancies when calculate (format) the same date

I've got a simple task of creating consecutive days and doing some calculations on them.
I did it using:
date = pd.DatetimeIndex(start='2019-01-01', end='2019-01-10',freq='D')
df = pd.DataFrame([date, date.week, date.dayofweek], index=['Date','Week', 'DOW']).T
df
and now I want to calculate back the date from week and day of week using:
df['Date2'] = pd.to_datetime('2019' + df['Week'].map(str) + df['DOW'].map(str), format='%Y%W%w')
The result I get is:
As I understand it, DatetimeIndex uses a different method of calculating the week number: 1 Jan 2019 should be Week=0 and dow=2, which is what I get when I run pd.to_datetime('201902', format='%Y%W%w') : Timestamp('2019-01-01 00:00:00')
Similar questions were asked here and here, but in both of them the discrepancy came from different time zones, which I don't use here.
Thanks for help!
According to the documentation https://github.com/d3/d3-time-format#api-reference,
it appears %W is Monday-based week whereas %w is Sunday-based weekday.
I ran the code below to get back the expected result:
date = pd.DatetimeIndex(start='2019-01-01', end='2019-01-10',freq='D')
df = pd.DataFrame([date, date.week, date.weekday_name, date.dayofweek], index=['Date','Week', 'Weekday', 'DOW']).T
df['Week'] = df['Week'] - 1
df['Date2'] = pd.to_datetime('2019' + df['Week'].map(str) + df['Weekday'].map(str), format='%Y%W%A', box=True)
Notice that 2018-12-31 is in the first week of year 2019
Date Week Weekday DOW Date2
0 2018-12-31 00:00:00 0 Monday 0 2018-12-31
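The snippets above rely on keywords and attributes (DatetimeIndex(start=...), .week, .weekday_name, box=) that come from older pandas and may not be available in current releases. A hedged alternative sketch that avoids the mismatch altogether is to generate the week and weekday numbers with the same strftime directives that are later used for parsing, so both directions share one convention:

```python
import pandas as pd

dates = pd.Series(pd.date_range(start='2019-01-01', end='2019-01-10', freq='D'))

# Produce the fields with the directives that will be used to parse them back.
week = dates.dt.strftime('%W')   # Monday-based week of year, '00' before the first Monday
dow = dates.dt.strftime('%w')    # weekday number, 0 = Sunday ... 6 = Saturday

# Round trip: year + week + weekday back to a date.
rebuilt = pd.to_datetime('2019' + week + dow, format='%Y%W%w')
print(pd.DataFrame({'Date': dates, 'Week': week, 'DOW': dow, 'Date2': rebuilt}))
```

This works within a single calendar year; ISO week numbers, which can assign late December or early January days to a neighbouring year, follow different rules and should not be mixed with %W.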
