In Pandas, I would like to add a number of days to a date/datetime (two columns).
Example:
dates = pd.Series(pd.date_range("20180101 00:00", "20180104 00:00"))
0 2018-01-01
1 2018-01-02
2 2018-01-03
3 2018-01-04
dtype: datetime64[ns]
days = pd.Series(np.arange(4)).astype('float')
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
What I have tried (and the associated error I get):
dates + days
TypeError: cannot operate on a series without a rhs of a series/ndarray of type datetime64[ns] or a timedelta
dates + days.astype('int')
TypeError: incompatible type for a datetime/timedelta operation [__add__]
dates + pd.DateOffset(days=days)
TypeError: DatetimeIndex cannot perform the operation +
dates + np.timedelta64(days.values)
ValueError: Could not convert object to NumPy timedelta
dates + pd.offsets.Day(days)
TypeError: cannot convert the series to
dates + pd.datetools.timedelta(days=days)
TypeError: unsupported type for timedelta days component: Series
At last I found two methods:
dates + pd.to_timedelta(days, unit='D')
or
dates + pd.TimedeltaIndex(days, unit='D')
Which both produce:
0 2018-01-01
1 2018-01-03
2 2018-01-05
3 2018-01-07
dtype: datetime64[ns]
(It took ages to find the relevant documentation: https://pandas.pydata.org/pandas-docs/stable/timedeltas.html#to-timedelta)
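For the two-column case mentioned in the question, here is a minimal sketch. The DataFrame and its column names (date, days, shifted) are made up for illustration and are not from the original post. It also shows that fractional days and missing values are handled by pd.to_timedelta:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the column names are illustrative only.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=4),
    "days": [0.0, 1.5, np.nan, 3.0],
})

# to_timedelta interprets the floats as a number of days;
# NaN becomes NaT, so the sum for that row is NaT as well.
df["shifted"] = df["date"] + pd.to_timedelta(df["days"], unit="D")
print(df)
```

The key idea is that the numeric column has to become a timedelta64 column before the addition; pandas does not guess the unit for plain numbers.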
I have dates in a dataframe from 2015-01-01 00:00:00 to 2022-09-05 23:59:00.
I want to extract all the data between 04-01 00:00:00 and 10-31 23:59:59 of each year.
I tried:
filtered_df = df.loc[(df['date'] >= '04-01 00:00:00')
& (df['date'] < '10-31 23:59:59')]
filtered_df.head()
It says:
TypeError: Invalid comparison between dtype=datetime64[ns] and str
although when I type df.dtypes, date is
datetime64[ns]
The result should jump from 2015-10-31 23:59:59 straight to 2016-04-01 00:00:00,
with no dates outside that April-to-October window in between.
Is that even possible?
df[(df.date.dt.month>=4) & (df.date.dt.month<=10)]
The key point is the .dt accessor, which exposes datetime-specific attributes and methods (much as .str exposes string-specific methods).
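A small self-contained sketch of that month filter, assuming a made-up frame with one row per month (the data is hypothetical; only the date column name comes from the question). Series.between is inclusive on both ends by default, so it is equivalent to the two comparisons above:

```python
import pandas as pd

# Hypothetical data: first day of each month, January 2015 to December 2016.
df = pd.DataFrame({"date": pd.date_range("2015-01-01", periods=24, freq="MS")})

# Keep April (4) through October (10) of every year.
in_season = df["date"].dt.month.between(4, 10)
filtered_df = df[in_season]
print(filtered_df)
```

Because the filter looks only at the month number, it works across any number of years without comparing against year-less strings.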
I have a pandas Series object with dates as the index and a company's share price as the values. I would like to slice the data so that, given a date such as 10.01.2022, I get a slice from 3 days before to 5 days after that date. Is that easily done? Or do I have to convert it, add/subtract those numbers from that date, and convert back? I'm a bit lost in all that datetime, strptime, to_datetime,...
Something like this:
date = "10.01.2022"
share_price = [date - 3 : date + 5]
Thank you
You can use .loc[]. Both ends will be inclusive.
Example:
s = pd.Series([1,2,3,4,5,6],
index = pd.to_datetime([
'07.01.2022', '09.01.2022', '10.01.2022',
'12.01.2022', '15.01.2022', '16.01.2022'
], dayfirst=True))
date = pd.to_datetime("10.01.2022", dayfirst=True)
s:
2022-01-07 1
2022-01-09 2
2022-01-10 3
2022-01-12 4
2022-01-15 5
2022-01-16 6
dtype: int64
date:
Timestamp('2022-01-10 00:00:00')
s.loc[date - pd.Timedelta('3d') : date + pd.Timedelta('5d')]
2022-01-07 1
2022-01-09 2
2022-01-10 3
2022-01-12 4
2022-01-15 5
dtype: int64
Edit:
To add business days:
from pandas.tseries.offsets import BDay
s.loc[date - BDay(3) : date + BDay(5)]
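If "3 previous dates" is meant as 3 rows rather than 3 calendar days, a positional slice may be closer to what is wanted. This is only a sketch under that assumption, and it also assumes the index is sorted, unique and contains the date:

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.to_datetime(
        ["07.01.2022", "09.01.2022", "10.01.2022",
         "12.01.2022", "15.01.2022", "16.01.2022"],
        dayfirst=True,
    ),
)
date = pd.to_datetime("10.01.2022", dayfirst=True)

# Integer position of the date in the index (requires an exact match).
pos = s.index.get_loc(date)

# Up to 3 rows before and 5 rows after; max() keeps the start from
# going negative, and iloc clips the end automatically.
window = s.iloc[max(pos - 3, 0): pos + 5 + 1]
print(window)
```

Unlike .loc with Timedelta offsets, this counts rows, so gaps such as weekends or missing dates do not shrink the window.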
I have a dataframe df and its first column is timedelta64
df.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0 686 non-null timedelta64[ns]
1 686 non-null object
2 686 non-null object
3 686 non-null object
4 686 non-null object
5 686 non-null object
If I print(df[0][2]), for example, it gives me 0 days 05:01:11. However, I don't want the 0 days field. I only want 05:01:11 to be printed. Could someone teach me how to do this? Thanks so much!
It is possible by:
df['duration1'] = df['duration'].astype(str).str[-18:-10]
But this solution is not general: if the input is 3 days 05:01:11, it removes the 3 days too.
So it works correctly only for timedeltas shorter than one day.
A more general solution is to create a custom format:
N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) -
np.random.choice(rng, size=N)) })
df['duration1'] = df['duration'].astype(str).str[-18:-10]
def f(x):
ts = x.total_seconds()
hours, remainder = divmod(ts, 3600)
minutes, seconds = divmod(remainder, 60)
return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
df['duration2'] = df['duration'].apply(f)
print (df)
duration duration1 duration2
0 2 days 06:00:00 06:00:00 54:00:00
1 2 days 19:30:00 19:30:00 67:30:00
2 1 days 03:00:00 03:00:00 27:00:00
3 0 days 00:00:00 00:00:00 0:00:00
4 4 days 12:00:00 12:00:00 108:00:00
5 1 days 03:00:00 03:00:00 27:00:00
6 0 days 13:30:00 13:30:00 13:30:00
7 1 days 16:30:00 16:30:00 40:30:00
8 0 days 00:00:00 00:00:00 0:00:00
9 1 days 16:30:00 16:30:00 40:30:00
Here's a short and robust version using apply():
df['timediff_string'] = df['timediff'].apply(
lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
if not pd.isnull(x) else ''
)
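This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT). Note that components.hours does not include whole days, so a duration of one day or more loses its days part in the output.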
If the timediff column does not contain pandas Timedelta objects, you can convert it:
df['timediff'] = pd.to_timedelta(df['timediff'])
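The same components are also available for a whole column through .dt.components, which avoids the row-by-row lambda. A minimal sketch, assuming the timediff column name from above, made-up sample values, and no NaT in the column (with NaT the component columns become floats and need extra handling). Here whole days are folded into the hours:

```python
import pandas as pd

# Hypothetical sample data without missing values.
df = pd.DataFrame(
    {"timediff": pd.to_timedelta(["0 days 05:01:11", "3 days 05:01:11"])}
)

# One integer column per component: days, hours, minutes, seconds, ...
comp = df["timediff"].dt.components

# Fold whole days into the hour count so nothing is dropped.
total_hours = comp["days"] * 24 + comp["hours"]

df["timediff_string"] = (
    total_hours.astype(str).str.zfill(2)
    + ":" + comp["minutes"].astype(str).str.zfill(2)
    + ":" + comp["seconds"].astype(str).str.zfill(2)
)
print(df)
```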
datetime.timedelta already formats the way you'd like. The crux of this issue is that pandas internally converts to numpy.timedelta64.
import pandas as pd
from datetime import timedelta
time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)
times = pd.Series([time_1, time_2])
# Times are converted to NumPy timedelta64 values.
print(times)
# Convert to string after converting to datetime.timedelta.
times = times.apply(
lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))
print(times)
So, convert to a datetime.timedelta and then to str (to prevent conversion back to numpy.timedelta64) before printing. The output is:
3 days, 0:56:40
0:56:40
0 3 days 00:56:40
1 0 days 00:56:40
dtype: timedelta64[ns]
0 3 days, 0:56:40
1 0:56:40
dtype: object
I came here looking for answers to the same question, so I felt I should add further clarification. : )
You can convert it into a Python timedelta, then to str and finally back to a Series:
pd.Series(df["duration"].dt.to_pytimedelta().astype(str), name="start_time")
Given OP is ok with an object column (a little verbose):
def splitter(td):
td = str(td).split(' ')[-1:][0]
return td
df['split'] = df['timediff'].apply(splitter)
Basically we're taking the timedelta column, transforming the contents to a string, then splitting the string (creates a list) and taking the last item of that list, which would be the hh:mm:ss component.
Note that specifying ' ' for what to split by is redundant here.
Alternative one liner:
df['split2'] = df['timediff'].astype('str').str.split().str[-1]
which is very similar, but not very pretty IMHO. Also, the output includes milliseconds, which is not the case in the first solution. I'm not sure what the reason for that is (please comment if you do). If your data is big it might be worthwhile to time these different approaches.
If you want to remove all zero components (not only days), you can do it like this:
def pd_td_fmt(td):
import pandas as pd
abbr = {'days': 'd', 'hours': 'h', 'minutes': 'min', 'seconds': 's', 'milliseconds': 'ms', 'microseconds': 'us',
'nanoseconds': 'ns'}
fmt = lambda td:"".join(f"{v}{abbr[k]}" for k, v in td.components._asdict().items() if v != 0)
if isinstance(td, pd.Timedelta):
return fmt(td)
elif isinstance(td,pd.TimedeltaIndex):
return td.map(fmt)
else:
raise ValueError
If you can be sure that your timedelta is less than a day, this might work. To do it in as few lines as possible, I convert the timedelta to a datetime by adding the Unix epoch (0) and then use the .dt.strftime method of the resulting datetime column to format it. Use '%H:%M:%S' instead of '%M:%S' if you also want the hours.
df['duration1'] = (df['duration'] + pd.to_datetime(0)).dt.strftime('%M:%S')
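A runnable sketch of the epoch trick with the hours included, assuming made-up sample durations that are all shorter than one day (the duration column name follows the answer above):

```python
import pandas as pd

# Hypothetical durations, all under 24 hours.
df = pd.DataFrame({"duration": pd.to_timedelta(["05:01:11", "00:56:40"])})

# Timestamp(0) is 1970-01-01 00:00:00, so the sum is a datetime whose
# time-of-day part equals the duration.
epoch = pd.Timestamp(0)
df["duration1"] = (df["duration"] + epoch).dt.strftime("%H:%M:%S")
print(df)
```

For durations of a day or more the hours wrap around past 24, which is why the restriction to short timedeltas matters.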
I am trying to get a new date by adding a number of days to an existing date, but I am getting the following error:
TypeError: unsupported type for timedelta days component: Series
My sample data and code are provided below:
data={'Date':['8/24/2020','8/26/2020','9/20/2020','10/26/2020','5/26/2020','4/26/2020'],
'Days':[23,34,56,78,65,54]}
df=pd.DataFrame(data,columns=['Date','Days'])
df.Date=pd.to_datetime(df.Date)
df.newdate=df.Date+timedelta(df.Days)
Use pd.to_timedelta
import pandas as pd
df['new_date'] = df.Date + pd.to_timedelta(df.Days, unit='d')
0 2020-09-16
1 2020-09-29
2 2020-11-15
3 2021-01-12
4 2020-07-30
5 2020-06-19
Name: new_date, dtype: datetime64[ns]
Try the following code:
def get_newdate(row):
return row['Date'] + pd.Timedelta(days = row['Days'])
Call the above function as follows:
df.Date = pd.to_datetime(df.Date)
df['newDate'] = df.apply(get_newdate, axis = 1)
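As a quick check that the two answers agree, here is a sketch on a shortened copy of the sample data. The int(...) cast and the variable names are my own additions, not part of either answer:

```python
import pandas as pd

data = {"Date": ["8/24/2020", "8/26/2020", "9/20/2020"], "Days": [23, 34, 56]}
df = pd.DataFrame(data)
df["Date"] = pd.to_datetime(df["Date"])

# Vectorized: convert the whole Days column to timedeltas at once.
vectorized = df["Date"] + pd.to_timedelta(df["Days"], unit="D")

# Row-wise: build one Timedelta per row (cast to int to be safe).
row_wise = df.apply(
    lambda row: row["Date"] + pd.Timedelta(days=int(row["Days"])), axis=1
)

print((vectorized == row_wise).all())
```

Both give the same dates; the vectorized form is usually the better default on large frames because it avoids a Python-level call per row.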
I have a pandas df that contains a Column of timestamps. Some of the timestamps are after midnight. These are in 24hr time. I'm trying to add 12hrs to these times so it's consistent.
import pandas as pd
import datetime as dt
import numpy as np
d = ({
'time' : ['9:00:00','10:00:00','11:00:00','12:00:00','01:00:00','02:00:00'],
})
df = pd.DataFrame(data=d)
I have used the following code from another question ("Convert incomplete 12h datetime-like strings into appropriate datetime type"), but I can't get it to include all the values. The dates are also not necessary.
ts = pd.to_datetime(df.time, format = '%H:%M:%S')
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
twelve = ts.dt.time == dt.time(0,0,0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset.fillna(method='ffill', inplace=True)
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print(ts)
Output:
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('O')
My intended Output is
time
0 9:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
I think you need to change the last line of the code: use add with fill_value=0 to replace the missing values in ts, and then use .dt.time for Python time objects or .dt.strftime for strings:
ts = ts.add(offset, fill_value=0).dt.time
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'datetime.time'>
1 <class 'datetime.time'>
2 <class 'datetime.time'>
3 <class 'datetime.time'>
4 <class 'datetime.time'>
5 <class 'datetime.time'>
dtype: object
Or, for strings (use this instead of the .dt.time line above, not after it):
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
dtype: object