Subtract/add days to Pandas Timestamp - python

How do I subtract/add days (integer) to a Pandas Timestamp object?
For example, my atomics and datatypes are (lifted from Pycharm):
startDate = {Timestamp} 2008-09-20 00:00:00
dayDistance = {int} 124
The code as pulled from the Internet returns None:
from datetime import timedelta
newDate = startDate - timedelta(days=dayDistance)
I am expecting an object of type Timestamp so it is compatible with the rest of the code downstream from here.

pandas has its own Timedelta data type:
start_date = pd.Timestamp("2008-09-20 00:00:00")
dayDistance = 124
new_date = start_date - pd.Timedelta(dayDistance, unit="d")
But Python's built-in timedelta works too:
from datetime import timedelta
new_date = start_date - timedelta(days=dayDistance)
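The same arithmetic works for addition as well. A minimal sketch, assuming the values from the question (the names day_distance, earlier and later are only illustrative):

```python
import pandas as pd

start_date = pd.Timestamp("2008-09-20 00:00:00")
day_distance = 124  # integer number of days, as in the question

# Subtracting or adding a Timedelta to a Timestamp yields another Timestamp
earlier = start_date - pd.Timedelta(days=day_distance)
later = start_date + pd.Timedelta(days=day_distance)

print(type(earlier).__name__, earlier)
print(type(later).__name__, later)
```

Both results are still pandas Timestamp objects, so they should stay compatible with downstream code that expects that type.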

Actually, it works fine in my test, but you may use pd.Timedelta instead:
import pandas as pd
from datetime import timedelta
ts = pd.Timestamp.now()
print(f"{type(ts)} {ts}")
dt_td_ts = ts - timedelta(days=10)
pd_td_ts = ts - pd.Timedelta(10, unit='d')
print(f"With datetime timedelta: ({type(dt_td_ts)}) {dt_td_ts}")
print(f"With pandas timedelta: ({type(pd_td_ts)}) {pd_td_ts}")
returns
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2022-05-15 22:20:04.596195
With datetime timedelta: (<class 'pandas._libs.tslibs.timestamps.Timestamp'>) 2022-05-05 22:20:04.596195
With pandas timedelta: (<class 'pandas._libs.tslibs.timestamps.Timestamp'>) 2022-05-05 22:20:04.596195
As you can see, the type is Timestamp as expected, even when using datetime.timedelta.

Related

How to convert datetime into GMT +7 in Python?

I have a dataframe that looks like that:
conversation__created_at
0 2020-10-15T03:39:42.766773+00:00
1 2020-10-14T11:24:33.831177+00:00
2 2020-10-14T08:29:44.192258+00:00
3 2020-10-14T01:42:06.674313+00:00
4 2020-10-13T12:57:04.218184+00:00
How to convert it into GMT +7?
I assume you have a pandas series because the data you posted looks like one.
Then you can use tz_convert, i.e.
import pandas as pd
pd.to_datetime('2020-10-15T03:39:42.766773+00:00').tz_convert('Etc/GMT-7')
Note that the POSIX-style "Etc/GMT" names have inverted signs: 'Etc/GMT-7' is UTC+7, while 'Etc/GMT+7' is UTC-7.
As pointed out in the comments, the values are ISO 8601 strings (note the T between date and time), so the series has to be parsed with pd.to_datetime first and then converted to the target timezone:
pd.to_datetime(series).dt.tz_convert('Etc/GMT-7')
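For a whole column, a minimal sketch using the sample values from the question might look like this. It assumes the target is UTC+7, which in the POSIX-style "Etc" naming is spelled 'Etc/GMT-7' (the sign is inverted); the variable names series and converted are illustrative.

```python
from datetime import timedelta

import pandas as pd

# Sample values copied from the question
series = pd.Series(
    [
        "2020-10-15T03:39:42.766773+00:00",
        "2020-10-14T11:24:33.831177+00:00",
        "2020-10-14T08:29:44.192258+00:00",
        "2020-10-14T01:42:06.674313+00:00",
        "2020-10-13T12:57:04.218184+00:00",
    ],
    name="conversation__created_at",
)

# Parse the ISO 8601 strings (tz-aware, UTC), then shift to UTC+7
converted = pd.to_datetime(series).dt.tz_convert("Etc/GMT-7")
print(converted)
```

A named zone such as 'Asia/Bangkok' would be another option if the data belongs to a specific region.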
You can also use only the datetime library:
from datetime import datetime, timedelta, timezone
d = datetime.fromisoformat("2020-10-15T03:39:42.766773+00:00")
tz = timezone(timedelta(hours=7))
new_time = d.astimezone(tz)
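You can use pytz to attach a timezone to a datetime instance. Note that replace(tzinfo=...) only labels a naive datetime without shifting the clock time, and that "Etc/GMT-7" is the POSIX-style name for UTC+7 ("Etc/GMT+7" would give UTC-7).
For example:
from pytz import timezone
from datetime import datetime
date = datetime.now()
print(date)
tz = timezone("Etc/GMT-7")
date = date.replace(tzinfo=tz)
print(date)
Output:
2020-10-26 10:33:25.934699
2020-10-26 10:33:25.934699+07:00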
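You can also apply the conversion to the DataFrame column with pytz.timezone:
import pandas as pd
from pytz import timezone
def myDate(x):
    # "Etc/GMT-7" is UTC+7; astimezone converts the already tz-aware value
    tz = timezone("Etc/GMT-7")
    return x.astimezone(tz)
# Parse the strings first, then convert each element of the column
df['conversation__created_at'] = pd.to_datetime(df['conversation__created_at']).apply(lambda x: myDate(x.to_pydatetime()))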

Python: get time difference between 2 time columns of dataframe and filter them

My time.csv is like:
end_date start_date
2017-01-01 17:00:00 2017-01-01 16:30:00
2017-01-03 17:05:00 2016-01-03 21:05:00
I want to add another column duration that contains the difference in hours. Here is what I have so far:
import pandas as pd
from datetime import datetime, timedelta
df_time = pd.read_csv('time.csv')
df_time["duration"] = (datetime.strptime(df_time["end_date"], '%Y-%m-%d %H:%M:%S') - \
datetime.strptime(df_time["start_date"], '%Y-%m-%d %H:%M:%S'))/ \
timedelta(hours = 1)
print(df_time["duration"].head())
But I got the following error
TypeError: strptime() argument 1 must be str, not Series
How do I convert Series to str so that the parse function works?
Secondly, how do I truncate the top 1% longest of duration?
As @Quang Hoang said, you can convert the date columns from strings to Timestamps with pd.to_datetime; after that, the duration is a simple subtraction.
import pandas as pd
time_data = pd.read_csv("time.csv")
time_data['end_date'] = pd.to_datetime(time_data['end_date'])
time_data['start_date'] = pd.to_datetime(time_data['start_date'])
# Subtracting two datetime columns gives a Timedelta column
time_data['duration'] = time_data['end_date'] - time_data['start_date']
Hope it helps :)
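The question also asked for the duration in hours and for a way to drop the longest 1%. A minimal sketch, assuming "truncate the top 1%" means removing rows above the 99th percentile; the DataFrame is built inline from the two sample rows instead of reading time.csv, and the names cutoff and trimmed are illustrative:

```python
import pandas as pd

# Sample rows copied from the question
time_data = pd.DataFrame(
    {
        "end_date": ["2017-01-01 17:00:00", "2017-01-03 17:05:00"],
        "start_date": ["2017-01-01 16:30:00", "2016-01-03 21:05:00"],
    }
)
time_data["end_date"] = pd.to_datetime(time_data["end_date"])
time_data["start_date"] = pd.to_datetime(time_data["start_date"])

# Dividing a Timedelta column by one hour gives the duration as a float in hours
time_data["duration"] = (
    time_data["end_date"] - time_data["start_date"]
) / pd.Timedelta(hours=1)

# Keep only rows at or below the 99th percentile of duration
cutoff = time_data["duration"].quantile(0.99)
trimmed = time_data[time_data["duration"] <= cutoff]
print(trimmed)
```

With only two rows the cutoff removes the longer one; on a real dataset it would drop roughly the top 1% of durations.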

pandas get integer seconds when doing a time difference instead of a x days mm:ss:hh format

I have two date-hour columns, A and B, of the type shown below. Both columns are in a pandas DataFrame.
yyyy-mm-dd hh:mm:ss
I create
df['difference'] = df['A'] - df['B']
I get a format like
0 days 00:01:13
I would prefer to have a column which contains the seconds in integer. For instance, I need to get 73 in my above example instead of 1min13.
How to do that?
We can use total_seconds
(df['A'] - df['B']).dt.total_seconds()
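total_seconds returns floats, so a cast is needed to get integers. A small sketch with made-up dates that reproduce the 1 min 13 s gap from the question:

```python
import pandas as pd

# Hypothetical values chosen so that A - B is 0 days 00:01:13
df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2021-03-01 10:01:13"]),
        "B": pd.to_datetime(["2021-03-01 10:00:00"]),
    }
)

# total_seconds() gives 73.0; astype(int) turns it into the integer 73
df["difference"] = (df["A"] - df["B"]).dt.total_seconds().astype(int)
print(df["difference"])
```

The cast assumes there are no missing values in the column; with NaT present, total_seconds yields NaN and the cast to int would fail.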
With plain datetime objects, you can subtract a timedelta in the same way:
from datetime import datetime, timedelta
# Specified date
old_time = datetime.strptime('2015-01-01 01:00:00', '%Y-%m-%d %H:%M:%S')
print(old_time)
# Go back 2 hours and 10 minutes
new_time = old_time - timedelta(hours=2, minutes=10)
print(new_time)

Iterate over and perform function on python datetime objects

Being inexperienced with Python, I am unable to deal properly with datetime objects when I want to iterate over them. I imported timestamps from a csv file and parsed them into datetime objects. Now I am unable to perform functions on them, because I get one error after another. Please see my code: what causes the "TypeError: 'datetime.datetime' object is not iterable"?
If there is no simple solution to my problem, can someone tell me how to save the datetime objects into a list?
The function of my code is inspired by this post, which works on date time objects from a list: Getting the closest date to a given date
Thanks in advance.
from datetime import datetime, timedelta
import pandas as pd
from dateutil.parser import parse
csvFile = pd.read_csv('myFile.csv')
column = csvFile['timestamp']
column = column.str.slice(0, 19, 1)
dt1 = datetime.strptime(column[1], '%Y-%m-%d %H:%M:%S')
print("dt1", dt1) #output: dt1 2010-12-30 15:06:00
dt2 = datetime.strptime(column[2], '%Y-%m-%d %H:%M:%S')
print("dt2", dt2) #output: dt2 2010-12-30 16:34:00
dt3 = dt1 - dt2
print("dt3", dt3) #output: dt3 -1 day, 22:32:00
#parsing the timestamps as datetime objects works:
for row in range(len(column)):
    timestamp = datetime.strptime(column[row], '%Y-%m-%d %H:%M:%S')
    print("timestamp", timestamp) #output (excerpt): timestamp 2010-12-30 14:32:00 timestamp 2010-12-30 15:06:00
Here the error occurs:
base_date = dt1
def func(x):
    d = x[0]
    delta = d - base_date if d > base_date else timedelta.max
    return delta
min(timestamp, key = func)
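timestamp is a single datetime.datetime instance, and min() needs an iterable, so you should put it into a list or a tuple.
After the correction it should look like this: min([timestamp], key = func). Note that func then receives the datetime itself, so it should use d = x rather than d = x[0]. To compare all timestamps, collect them in a list inside the loop and pass that list to min().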
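To save the parsed datetime objects in a list and then pick the closest later date, something like the following sketch could work. The sample strings are taken from the output printed in the question; the names raw, timestamps and closest are illustrative.

```python
from datetime import datetime, timedelta

# Sample strings taken from the question's printed output
raw = ["2010-12-30 14:32:00", "2010-12-30 15:06:00", "2010-12-30 16:34:00"]

# Parse every string and keep the datetime objects in a list
timestamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]

base_date = timestamps[1]  # 2010-12-30 15:06:00


def func(d):
    # Distance to base_date for later dates; earlier or equal dates rank last
    return d - base_date if d > base_date else timedelta.max


# min() iterates over the list and calls func on each datetime
closest = min(timestamps, key=func)
print(closest)
```

Because min() receives a list here, the "object is not iterable" error no longer occurs.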

Pandas corrupting datetime object

I am trying to create timezone aware date column in a pandas DataFrame. When I run the code below, the resulting pandas column does not have the same datetime as the one I inputted. What am I doing wrong here?
I am using python 3.6.2 and pandas 0.20.3
from datetime import datetime
import pandas as pd
import pytz
date_string = "12/14/2016 12:00"
timezone = pytz.timezone("US/Pacific")
input_datetime = datetime.strptime(date_string, "%m/%d/%Y %H:%M").replace(tzinfo=timezone)
df = pd.DataFrame({"datetime":[input_datetime]})
If I run that code, df['datetime'][0].minute returns 53 while input_datetime.minute returns 0.
When I don't replace the tzinfo I do not have a problem.
If you first convert your input_datetime you can call the minutes (or years etc) of your dataframe with .dt.minute
input_datetime = pd.to_datetime(datetime.strptime(date_string,
"%m/%d/%Y %H:%M")).replace(tzinfo=timezone)
df = pd.DataFrame({"datetime":[input_datetime]})
df['datetime'].dt.minute
You can use pandas .dt and tz_localize:
from datetime import datetime
import pandas as pd
date_string = "12/14/2016 12:00"
df = pd.DataFrame({'datetime':[datetime.strptime(date_string, "%m/%d/%Y %H:%M")]})
df['datetime'].dt.tz_localize('US/Pacific')
Output:
0 2016-12-14 12:00:00-08:00
Name: datetime, dtype: datetime64[ns, US/Pacific]
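As far as I can tell, the 53 minutes come from replace(tzinfo=...) picking the pytz zone's earliest recorded offset (local mean time, 7:53 behind UTC for US/Pacific) instead of the offset valid on that date. If you want to stay with pytz, a sketch using its localize method, which chooses the correct offset for the given date, would be:

```python
from datetime import datetime, timedelta

import pandas as pd
import pytz

date_string = "12/14/2016 12:00"
timezone = pytz.timezone("US/Pacific")

# localize() attaches the zone with the offset that applies on this date (PST, -08:00)
naive = datetime.strptime(date_string, "%m/%d/%Y %H:%M")
input_datetime = timezone.localize(naive)

df = pd.DataFrame({"datetime": [input_datetime]})
print(df["datetime"][0].minute)
print(input_datetime.utcoffset())
```

The minute in the DataFrame should then match the minute of input_datetime.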
