pandas - add 1 month to a pd.Timestamp [duplicate]

This question already has answers here:
Add months to a date in Pandas
(4 answers)
How can I get pandas Timestamp offset by certain amount of months?
(1 answer)
Closed 4 years ago.
I have multiple df, and they are indexed with timestamps for consecutive months. For example:
1996-01-01 01:00:00
1996-02-01 01:00:00
1996-03-01 01:00:00
1996-04-01 01:00:00
1996-05-01 01:00:00
1996-06-01 01:00:00
I'm trying to create a function where I can add an arbitrary number of rows onto the df, continuing on from whatever the last month happens to be. I tried to solve this by using:
df.iloc[-1].name + pd.Timedelta(1, unit='M')
in a for loop, but this only seems to add about 30 days instead of incrementing the month by 1. Is there a more reliable way to fetch a pd.Timestamp and add 1 month?
Thank you
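pd.DateOffset shifts by whole calendar months rather than a fixed ~30-day Timedelta, so it keeps the day-of-month and time intact. A minimal sketch (the extend_monthly helper and the sample df are made up for illustration):
import pandas as pd

# extend a monthly-indexed df by n empty rows, one calendar month apart
def extend_monthly(df, n):
    last = df.index[-1]  # same timestamp as df.iloc[-1].name
    new_index = pd.DatetimeIndex([last + pd.DateOffset(months=i)
                                  for i in range(1, n + 1)])
    extra = pd.DataFrame(index=new_index, columns=df.columns)
    return pd.concat([df, extra])

df = pd.DataFrame({'x': range(3)},
                  index=pd.to_datetime(['1996-01-01 01:00:00',
                                        '1996-02-01 01:00:00',
                                        '1996-03-01 01:00:00']))
print(extend_monthly(df, 3).index[-1])  # 1996-06-01 01:00:00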

Related

how to add data in index column in panda dataframe [duplicate]

This question already has answers here:
Offset date for a Pandas DataFrame date index
(3 answers)
Closed 1 year ago.
I have the dataframe below, where the first column has no header, and I need to add 14 days to each value in that column. How can I do that?
L122.Y 5121.Y 110.Y
2021-08-30 14:00:00 0.0 0.0 35.778441
2021-08-30 15:00:00 0.0 0.0 35.741066
2021-08-30 16:00:00 0.0 0.0 35.737846
The first column is the index; you can check with:
print (df.index)
If you need to convert it to a DatetimeIndex and add the days, use:
df.index = pd.to_datetime(df.index) + pd.Timedelta('14 days')
If it is already a DatetimeIndex:
df.index += pd.Timedelta('14 days')
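A quick check on the sample data (only one value column rebuilt for the sketch, with the timestamps initially held as strings in the index):
import pandas as pd

df = pd.DataFrame({'110.Y': [35.778441, 35.741066, 35.737846]},
                  index=['2021-08-30 14:00:00',
                         '2021-08-30 15:00:00',
                         '2021-08-30 16:00:00'])
df.index = pd.to_datetime(df.index) + pd.Timedelta('14 days')
print(df.index[0])  # 2021-09-13 14:00:00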

Python Timedelta[M] adds incomplete days

I have a table that has a column Months_since_Start_fin_year and a Date column. I need to add the number of months in the first column to the date in the second column.
DateTable['Date']=DateTable['First_month']+DateTable['Months_since_Start_fin_year'].astype("timedelta64[M]")
This works OK for month 0, but month 1 already has a different time, and from month 2 onwards the date is wrong.
[Image of the output table: the early months have the correct date, but month 2, where I would expect June 1st, shows May 31st.]
It must be adding incomplete months, but I'm not sure how to fix it.
I have also tried
DateTable['Date']=DateTable['First_month']+relativedelta(months=DateTable['Months_since_Start_fin_year'])
but I get a type error that says
TypeError: cannot convert the series to <class 'int'>
My Months_since_Start_fin_year is type int32 and my First_month variable is datetime64[ns]
The problem with adding months as an offset to a date is that not all months are equally long (28-31 days). So you need pd.DateOffset which handles that ambiguity for you. .astype("timedelta64[M]") on the other hand only gives you the average days per month within a year (30 days 10:29:06).
Ex:
import pandas as pd
# a synthetic example since you didn't provide a mre
df = pd.DataFrame({'start_date': 7*['2017-04-01'],
                   'month_offset': range(7)})
# make sure we have datetime dtype
df['start_date'] = pd.to_datetime(df['start_date'])
# add month offset
df['new_date'] = df.apply(lambda row: row['start_date'] +
                          pd.DateOffset(months=row['month_offset']),
                          axis=1)
which would give you e.g.
df
start_date month_offset new_date
0 2017-04-01 0 2017-04-01
1 2017-04-01 1 2017-05-01
2 2017-04-01 2 2017-06-01
3 2017-04-01 3 2017-07-01
4 2017-04-01 4 2017-08-01
5 2017-04-01 5 2017-09-01
6 2017-04-01 6 2017-10-01
You can find similar examples here on SO, e.g. Add months to a date in Pandas. I only modified the answer there by using an apply to be able to take the months offset from one of the DataFrame's columns.
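If the row-wise apply feels heavy, the same result can be built with a plain list comprehension over the two columns; a small sketch reusing the df defined above:
df['new_date'] = [start + pd.DateOffset(months=months)
                  for start, months in zip(df['start_date'], df['month_offset'])]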

Datetime difference between two columns without counting the full 24 hours [duplicate]

This question already has answers here:
Differance between two days excluding weekends in hours
(3 answers)
Closed 2 years ago.
Here are two really useful questions for datetime comparison in Python:
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes
Determine the difference between two DateTimes, only counting opening hours
I have a dataframe in python with two columns:
A B
10:00:00 01.01.2019 12:00:00 02.01.2019
And I have opening hours, which are the only hours that should count in the calculation, so not the full 24 hours and possibly not every day. My business is open from 10:00:00 to 18:00:00, Monday to Friday. How can I adjust:
df_time['td'] = df_time['B']-df_time['A']
so that the outcome would be 10 hours in this case?
Not a full answer, but I would (see the sketch below):
count the open time left in the first day: 18:00:00 - df['A'].dt.time
count the open time already passed in the last day: df['B'].dt.time - 10:00:00
count the business days in between and multiply by 8.
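A sketch of that idea, assuming opening hours of 10:00-18:00 Monday to Friday and that both A and B fall inside opening hours on business days (df_time and the column names follow the question; numpy's busday_count handles the weekend rule):
import numpy as np
import pandas as pd

OPEN, CLOSE = pd.Timedelta(hours=10), pd.Timedelta(hours=18)
HOURS_PER_DAY = (CLOSE - OPEN) / pd.Timedelta(hours=1)  # 8.0

df_time = pd.DataFrame({'A': [pd.Timestamp('2019-01-01 10:00:00')],
                        'B': [pd.Timestamp('2019-01-02 12:00:00')]})

# open hours left on A's day (until 18:00) and already passed on B's day (since 10:00)
first = (df_time['A'].dt.normalize() + CLOSE - df_time['A']) / pd.Timedelta(hours=1)
last = (df_time['B'] - (df_time['B'].dt.normalize() + OPEN)) / pd.Timedelta(hours=1)

# full business days strictly between the two dates (Mon-Fri only)
between = np.busday_count(df_time['A'].values.astype('datetime64[D]') + 1,
                          df_time['B'].values.astype('datetime64[D]'))

df_time['td'] = first + last + between * HOURS_PER_DAY
print(df_time['td'])  # 0    10.0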

Computing time between two dates and returning number of days [duplicate]

This question already has answers here:
Pandas Timedelta in Days
(5 answers)
Closed 3 years ago.
Given two columns in a dataframe that are date time objects:
Checkin Checkout
2018-09-13 19:55:00 2018-09-16 13:08:00
I'd like to compute the time difference in days and have it output as an integer to a new column. So far, I've done this but the output also includes seconds.
delta = df['Checkin'] - df['Checkout']
print(delta)
The output however ends up being:
2 days 17:13:00
and is output as a DT object. I'd like it to just output as 2 and as an integer in a new column.
How would I go about doing that?
You need dt.days:
(df['Checkin'] - df['Checkout']).dt.days
Output:
0 -3
dtype: int64
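The negative value comes from subtracting in the order the question used (Checkin minus Checkout); swap the operands if you want a positive count:
(df['Checkout'] - df['Checkin']).dt.days  # 0    2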

How to work around Out of bounds nanosecond [duplicate]

This question already has an answer here:
Filling missing date values with the least possible date in Pandas dataframe
(1 answer)
Closed 3 years ago.
LastLogin LastPurchased
2018-08-21 00:28:04.081677 0001-01-01 00:00:00
2018-08-21 00:28:58.209522 2018-08-20 00:28:58.209522
I need the difference in days, (df['LastLogin'] - df['LastPurchased']).dt.days, but there are some '0001-01-01 00:00:00' values in LastPurchased. Anything I try in order to change 0001-01-01 to a date within the pandas bounds results in Out of bounds nanosecond timestamp: 1-01-01 00:00:00. Is there any other way?
LastLogin LastPurchased Days
2018-08-21 00:28:04.081677 1999-01-01 00:00:00 6935
2018-08-21 00:28:58.209522 2018-08-20 00:28:58.209522 1
Pandas requires that the year in your datetime be greater than 1677 and less than 2262 (approximately - see pandas/_libs/tslibs/src/datetime/np_datetime.c for the exact bounds). Otherwise, the given date is outside the range that can be represented by nanosecond-resolution 64-bit integers:
>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')
>>> pd.Timestamp.max - pd.Timestamp.min
datetime.timedelta(213503, 84873, 709550)
It's up to you how you want to handle this. Consider what you are ultimately trying to indicate by subtracting the date 0001-01-01. I'll assume that means a user has logged in but never purchased.
To coerce LastPurchased to either a valid Pandas Timestamp or pd.NaT ("not a time"), you can use
df['LastPurchased'] = pd.to_datetime(df['LastPurchased'], errors='coerce')
This will give NaT as the difference in those spots:
>>> pd.Timestamp(2018, 1, 1) - pd.NaT
NaT
You can use that NaT as a "sentinel" and check for it with pd.isna().
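A minimal sketch on the sample rows, assuming the default nanosecond resolution:
import pandas as pd

df = pd.DataFrame({
    'LastLogin': ['2018-08-21 00:28:04.081677', '2018-08-21 00:28:58.209522'],
    'LastPurchased': ['0001-01-01 00:00:00', '2018-08-20 00:28:58.209522'],
})
df['LastLogin'] = pd.to_datetime(df['LastLogin'])
# the out-of-bounds date becomes NaT instead of raising
df['LastPurchased'] = pd.to_datetime(df['LastPurchased'], errors='coerce')
df['Days'] = (df['LastLogin'] - df['LastPurchased']).dt.days
print(df['Days'])
# 0    NaN
# 1    1.0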
