I have a Pandas dataframe with a "Datetime" column that contains a, well, a datetime :)
ColA Datetime ColB
---- ------------------- ----
1 2021-01-01 05:02:22 SomeVal
2 2021-01-01 01:01:22 SomeOtherVal
I want to create a new Date column that has only two rules:
1. If the "time" element of datetime is between 00:00:00 and 02:00:00 then make Date the "date" element of Datetime - 1 (the previous day)
2. Otherwise make Date the "date" element of Datetime as is
To achieve this, I'm going to have to run a check on the Datetime column. How would that look? Also, bonus points if I don't need to iterate the dataframe in order to achieve this.
Convert the values to datetimes, and where the time is at or before 02:00:00, subtract one day using Series.mask:
import pandas as pd
from datetime import time

df['Datetime'] = pd.to_datetime(df['Datetime'])
# Where the time of day is at or before 02:00:00, use the previous day instead
df['Datetime'] = df['Datetime'].mask(df['Datetime'].dt.time <= time(2, 0, 0),
                                     df['Datetime'] - pd.Timedelta('1 day'))
print (df)
ColA Datetime ColB
0 1 2021-01-01 05:02:22 SomeVal
1 2 2020-12-31 01:01:22 SomeOtherVal
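The answer above overwrites Datetime in place, while the question asked for a separate Date column. A minimal sketch of that variant, assuming the sample data from the question and treating the 02:00:00 boundary as inclusive (the names early and shifted are only illustrative):

```python
import pandas as pd
from datetime import time

# Sample data copied from the question
df = pd.DataFrame({
    "ColA": [1, 2],
    "Datetime": ["2021-01-01 05:02:22", "2021-01-01 01:01:22"],
    "ColB": ["SomeVal", "SomeOtherVal"],
})
df["Datetime"] = pd.to_datetime(df["Datetime"])

# True where the time of day is at or before 02:00:00
early = df["Datetime"].dt.time <= time(2, 0, 0)

# Shift only the early rows back by one day, leaving Datetime itself untouched
shifted = df["Datetime"].mask(early, df["Datetime"] - pd.Timedelta(days=1))

# Keep only the date part (midnight timestamps, still datetime64)
df["Date"] = shifted.dt.normalize()
print(df)
```

If plain datetime.date objects are preferred over midnight timestamps, shifted.dt.date can be used in place of shifted.dt.normalize(), at the cost of an object-dtype column.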
I have a column with timestamps (strings) which look like the following:
2017-10-25T09:57:00.319Z
2017-10-25T09:59:00.319Z
2017-10-27T11:03:00.319Z
Tbh I do not know the meaning of Z but I guess it is not that important.
How to convert the above strings into correct timestamp to calculate the difference/delta (e.g. in seconds or minutes)?
I want to have a column that lists the delta between each timestamp and the previous one.
You can use pd.to_datetime() to convert the string to datetime format. Then get the time difference/delta by .diff(). Finally, convert the timedelta to seconds by .dt.total_seconds(), as follows:
(Assuming your column of strings is named Date):
df['Date'] = pd.to_datetime(df['Date'])
df['TimeDelta'] = df['Date'].diff().dt.total_seconds()
Result:
Time delta in seconds:
print(df)
Date TimeDelta
0 2017-10-25 09:57:00.319000+00:00 NaN
1 2017-10-25 09:59:00.319000+00:00 120.0
2 2017-10-27 11:03:00.319000+00:00 176640.0
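On the side question about the trailing Z: in ISO 8601 it denotes UTC, which is why the parsed values above carry a +00:00 offset. A short sketch, assuming the same three strings, that checks this and also expresses the deltas in minutes (the DeltaMinutes column name is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Date": [
    "2017-10-25T09:57:00.319Z",
    "2017-10-25T09:59:00.319Z",
    "2017-10-27T11:03:00.319Z",
]})

# The trailing "Z" is parsed as UTC, so the column becomes timezone-aware
df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].dt.tz)

# Difference to the previous row, converted from seconds to minutes
df["DeltaMinutes"] = df["Date"].diff().dt.total_seconds() / 60
print(df)
```

The first row has no predecessor, so its delta is NaN.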
id  date
0   2021-18-01
1   2021-17-01
How can I keep rows if the date column has a value not equal to today's date (17th Jan)?
df[df.date != datetime.datetime.today().date()]
Expected Output
id  date
0   2021-18-01
Try parsing the dates with an explicit format (the day comes before the month here), then filter:
df.date = pd.to_datetime(df.date, format="%Y-%d-%m")
df[df.date!=str(datetime.datetime.today().date())]
date
1 2021-01-17
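For a reproducible check, the same idea can be sketched with the reference day passed in explicitly rather than read from the clock. The helper name rows_not_on is hypothetical and only used for this illustration; the data is the question's, in its year-day-month layout:

```python
import pandas as pd

# Dates from the question, written as year-day-month
df = pd.DataFrame({"id": [0, 1], "date": ["2021-18-01", "2021-17-01"]})
df["date"] = pd.to_datetime(df["date"], format="%Y-%d-%m")


def rows_not_on(frame, day):
    """Return the rows whose 'date' is on a different calendar day than `day`."""
    day = pd.Timestamp(day).normalize()
    return frame[frame["date"].dt.normalize() != day]


# Pretend "today" is 17 January 2021, as in the question
result = rows_not_on(df, "2021-01-17")
print(result)

# For the live case, pass the current time instead:
# rows_not_on(df, pd.Timestamp.now())
```

Normalizing both sides to midnight keeps the comparison between datetime64 values, so no string or datetime.date conversion is needed.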
Setup:
Here we populate a dataframe and convert the data to correct datatypes
df = pd.DataFrame([dict(id=0, date='2021-01-17'),
dict(id=1, date='2021-01-18'),
dict(id=2, date='2021-01-19')])
df = df.set_index('id')
df.date = pd.to_datetime(df.date)
The result would look like this:
id  date
0   2021-01-17
1   2021-01-18
2   2021-01-19
Filtering:
df.loc[df.date.dt.date != pd.Timestamp.now().date()]
The result would look like this (in my timezone it is already January 18th):
id  date
0   2021-01-17
2   2021-01-19
Explanation
We use the .loc indexer to filter the dataframe with an array of booleans.
To make the comparison valid, we take the date() part of the current timestamp on the right-hand side, and on the left-hand side we use the .dt accessor (DatetimeProperties) to get the date of each underlying datetime value.
I have a table with the columns 'Date', 'Time', and 'Costs'.
I want to select rows where the time is greater than 12:00:00, then add 1 day to the 'Date' column of the selected rows.
How should I go about doing it?
So far I have:
df[df['Time']>'12:00:00']['Date'] = df[df['Time']>'12:00:00']['Date'].astype('datetime64[ns]') + timedelta(days=1)
I am a beginner in learning coding and any suggestions would be really helpful! Thanks.
First use to_datetime on the Date column if it does not already hold datetimes. Then convert the Time column to strings (in case it holds Python time objects), parse it to datetimes, and extract the hour with Series.dt.hour. Finally, compare the hour and add 1 day to the rows that match the condition:
df = pd.DataFrame({'Date':['2015-01-02','2016-05-08'],
'Time':['10:00:00','15:00:00']})
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-08 15:00:00
df['Date'] = pd.to_datetime(df['Date'])
mask = pd.to_datetime(df['Time'].astype(str)).dt.hour > 12
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-09 15:00:00
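Note that comparing dt.hour > 12 skips times such as 12:30:00, whose hour is exactly 12. If the intent is strictly "later than 12:00:00", one possible variant is to compare the time of day as a timedelta. This is a sketch under that assumption; the third sample row is added here only to show the edge case:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-02", "2016-05-08", "2016-05-08"],
    "Time": ["10:00:00", "15:00:00", "12:30:00"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Parse "HH:MM:SS" as an offset from midnight and compare it with 12 hours
after_noon = pd.to_timedelta(df["Time"].astype(str)) > pd.Timedelta(hours=12)

# Assign through .loc so the original frame is modified, not a temporary copy
df.loc[after_noon, "Date"] += pd.Timedelta(days=1)
print(df)
```

Using a single .loc assignment also avoids the chained indexing in the question's attempt, which writes to a temporary copy rather than to df.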
I have a DataFrame with two columns, ["StartDate", "duration"].
The elements in the StartDate column are of datetime type, and the durations are ints.
Something like:
StartDate Duration
08:16:05 20
07:16:01 20
I expect to get:
EndDate
08:16:25
07:16:21
That is, simply add the seconds to the time.
I've been looking into ideas such as timedelta types, and I know datetimes support adding timedeltas, but so far I can't find how to do it with DataFrames in a vectorized fashion (I know it would be possible to iterate over all the rows and perform the operation one by one).
consider this df
StartDate duration
0 01/01/2017 135
1 01/02/2017 235
You can get the datetime column like this
df['EndDate'] = pd.to_datetime(df['StartDate']) + pd.to_timedelta(df['duration'], unit='s')
df.drop(['StartDate', 'duration'], axis=1, inplace=True)
You get
EndDate
0 2017-01-01 00:02:15
1 2017-01-02 00:03:55
EDIT: with the sample dataframe that you posted
df['EndDate'] = pd.to_timedelta(df['StartDate']) + pd.to_timedelta(df['Duration'], unit='s')
df.StartDate = df.apply(lambda x: pd.to_datetime(x.StartDate) + pd.Timedelta(seconds=int(x.duration)), axis=1)
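To get the output in exactly the HH:MM:SS form the question asks for, a vectorized sketch could look like the following. It assumes StartDate holds time-of-day strings (the question says datetime type, so this may need adapting) and formats the result back to text:

```python
import pandas as pd

df = pd.DataFrame({"StartDate": ["08:16:05", "07:16:01"], "Duration": [20, 20]})

# Parse the time of day; pandas fills in a placeholder date (1900-01-01)
start = pd.to_datetime(df["StartDate"], format="%H:%M:%S")

# Add the durations, interpreted as seconds, to the whole column at once
end = start + pd.to_timedelta(df["Duration"], unit="s")

# Format back to HH:MM:SS, dropping the placeholder date
df["EndDate"] = end.dt.strftime("%H:%M:%S")
print(df)
```

No row-wise apply is needed, since both the parsing and the addition operate on whole columns.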
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:
a object
date object
I want to convert the date column to dates without the time, but:
df['date']=df['date'].astype('datetime64[s]')
returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:
a object
date datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are OK with ending up with strings (an object column again), you want:
df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
To end up with datetime.date objects instead:
df['date'] = [x.date() for x in df.date]
df.date.iloc[0]
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
pd.to_datetime(df['date']).dt.date
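The two outcomes the question lists (a bare date, or a datetime at midnight) can both be reached without a Python-level loop. A small sketch, assuming the single-row frame from the question; the column names date_only and date_midnight are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": ["1"], "date": ["2014-06-29 00:00:00"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: plain datetime.date objects (the column becomes object dtype)
df["date_only"] = df["date"].dt.date

# Option 2: stay in datetime64 but reset the time part to midnight
df["date_midnight"] = df["date"].dt.normalize()

print(df.dtypes)
print(df)
```

Staying in datetime64 via normalize() keeps the .dt accessor and vectorized date arithmetic available, whereas .dt.date is mainly useful for display or for comparing against datetime.date values.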