I have a timestamp column in a dataframe as below, and I want to create another column called Day of Week from it. How can I do it?
Input:
Pickup date/time
07/05/2018 09:28:00
14/05/2018 17:00:00
15/05/2018 17:00:00
15/05/2018 17:00:00
23/06/2018 17:00:00
29/06/2018 17:00:00
Expected Output:
Pickup date/time Day of Week
07/05/2018 09:28:00 Monday
14/05/2018 17:00:00 Monday
15/05/2018 17:00:00 Tuesday
15/05/2018 17:00:00 Tuesday
23/06/2018 17:00:00 Saturday
29/06/2018 17:00:00 Friday
You can use weekday_name
df['date/time'] = pd.to_datetime(df['date/time'], format = '%d/%m/%Y %H:%M:%S')
df['Day of Week'] = df['date/time'].dt.weekday_name
You get
date/time Day of Week
0 2018-05-07 09:28:00 Monday
1 2018-05-14 17:00:00 Monday
2 2018-05-15 17:00:00 Tuesday
3 2018-05-15 17:00:00 Tuesday
4 2018-06-23 17:00:00 Saturday
5 2018-06-29 17:00:00 Friday
Edit:
For the newer versions of Pandas, use day_name(),
df['Day of Week'] = df['date/time'].dt.day_name()
pandas>=0.23.0: pandas.Timestamp.day_name()
df['Day of Week'] = df['date/time'].dt.day_name()
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.day_name.html
pandas>=0.18.1,<0.23.0: pandas.Timestamp.weekday_name (an attribute, not a method)
Deprecated since version 0.23.0
https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.Timestamp.weekday_name.html
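As a minimal sketch of the two variants above in one place, the snippet below parses a few of the question's day-first timestamps and picks whichever accessor the installed pandas provides. The `parsed` variable and the `hasattr` check are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Pickup date/time": [
        "07/05/2018 09:28:00",
        "14/05/2018 17:00:00",
        "23/06/2018 17:00:00",
    ]
})

# The strings are day-first, so spell the format out explicitly.
parsed = pd.to_datetime(df["Pickup date/time"], format="%d/%m/%Y %H:%M:%S")

if hasattr(parsed.dt, "day_name"):
    # pandas >= 0.23.0
    df["Day of Week"] = parsed.dt.day_name()
else:
    # older pandas, where weekday_name is an attribute
    df["Day of Week"] = parsed.dt.weekday_name

print(df)
```

Keeping the parsed values in a separate variable leaves the original string column untouched, which may or may not be what you want.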
I have a dataframe as follows:
period
1651622400000.00000
1651536000000.00000
1651449600000.00000
1651363200000.00000
1651276800000.00000
1651190400000.00000
1651104000000.00000
1651017600000.00000
I have converted it into human readable datetime as:
df['period'] = pd.to_datetime(df['period'], unit='ms')
and this outputs:
2022-04-04 00:00:00
2022-04-05 00:00:00
2022-04-06 00:00:00
2022-04-07 00:00:00
2022-04-08 00:00:00
2022-04-09 00:00:00
2022-04-10 00:00:00
2022-04-11 00:00:00
2022-04-12 00:00:00
The hours, minutes, and seconds are all set to 0.
I checked this on https://www.epochconverter.com/ and it gives
GMT: Monday, April 4, 2022 12:00:00 AM
Your time zone: Monday, April 4, 2022 5:45:00 AM GMT+05:45
How do I get h, m, and s as well?
https://www.epochconverter.com/ also shows the time converted to your local timezone.
If you need timezone-aware values in the column, use Series.dt.tz_localize and then Series.dt.tz_convert:
df['period'] = (pd.to_datetime(df['period'], unit='ms')
.dt.tz_localize('GMT')
.dt.tz_convert('Asia/Kathmandu'))
print (df)
period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45
There is no problem with your code or with pandas. And I don't think the timezone is an issue here either (as the other answer says). April 4, 2022 12:00:00 AM is the exact same time and date as 2022-04-04 00:00:00, just in one case you use AM... You could specify timezones as jezrael writes or with utc=True (check the docs) but I guess that's not your problem.
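For reference, a minimal sketch of the utc=True route mentioned above, assuming the period column holds epoch milliseconds and that Asia/Kathmandu is the intended zone (inferred from the GMT+05:45 offset in the question). The column names period_utc, period_local and local_hms are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"period": [1651622400000.0, 1651536000000.0]})

# utc=True returns timezone-aware UTC timestamps directly,
# so no separate tz_localize step is needed.
utc = pd.to_datetime(df["period"], unit="ms", utc=True)
local = utc.dt.tz_convert("Asia/Kathmandu")

df["period_utc"] = utc
df["period_local"] = local
# Hours, minutes and seconds of the local time as plain strings.
df["local_hms"] = local.dt.strftime("%H:%M:%S")

print(df)
```

The UTC values are still exactly midnight; only the converted local values carry a non-zero time of day.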
I have a csv file with a datetime column. I use pandas.read_csv(file, index_col="Date", parse_dates=True) to read the csv. The datetime column has a 30 min frequency/resolution, so the first time of a given date is 00:30:00, but the last time is not what I want:
As you can see, time 00:00:00 of a given date (here 2015-12-01) is interpreted as next day.
I couldn't find a way to resolve this. In this example, I want:
2015-12-02 00:00:00 to be interpreted as 2015-12-01 24:00:00 or something that refers to the correct date.
Does anyone know how to do this in pandas?
Edit:
So what I want is that when I get the date for the time 00:00:00, it gives me the date of the previous day (so it treats the time as 23:59:59):
I want this:
2015-12-01 23:00:00 Tuesday 2015-12-01
2015-12-01 23:30:00 Tuesday 2015-12-01
2015-12-02 00:00:00 Wednesday 2015-12-02
2015-12-02 00:30:00 Wednesday 2015-12-02
be this:
2015-12-01 23:00:00 Tuesday 2015-12-01
2015-12-01 23:30:00 Tuesday 2015-12-01
2015-12-01 23:59:59 Tuesday 2015-12-01
2015-12-02 00:30:00 Wednesday 2015-12-02
It is actually quite simple if you can use strings. If the time is '00:00:00', subtract one day, convert to string, replace '00:00:00' with '24:00:00'.
import datetime
import pandas as pd
s = pd.Series(['2015-12-01 23:00:00', '2015-12-01 00:00:00'])
s = pd.to_datetime(s)
s.where(s.dt.time != datetime.time(0),
((s-pd.to_timedelta('1day'))
.dt.strftime('%Y-%m-%d %H:%M:%S')
.str.replace('00:00:00', '24:00:00')
)
)
Output:
0 2015-12-01 23:00:00
1 2015-11-30 24:00:00
Or, for your edit:
df['col1'] = pd.to_datetime(df['col1'])
df['col1'] = df['col1'].where(df['col1'].dt.time != datetime.time(0),
(df['col1']-pd.to_timedelta('1s'))
)
df['col2'] = df['col1'].dt.day_name()
df['col3'] = df['col1'].dt.date
output:
col1 col2 col3
0 2015-12-01 23:00:00 Tuesday 2015-12-01
1 2015-12-01 23:30:00 Tuesday 2015-12-01
2 2015-12-01 23:59:59 Tuesday 2015-12-01
3 2015-12-02 00:30:00 Wednesday 2015-12-02
(A bit late but) You can use dt.normalize to find values to modify and subtract one second then change other columns according to DateTime column.
Input data:
>>> df
DateTime DayOfWeek Date
0 2015-12-01 23:00:00 Tuesday 2015-12-01
1 2015-12-01 23:30:00 Tuesday 2015-12-01
2 2015-12-02 00:00:00 Wednesday 2015-12-02
3 2015-12-02 00:30:00 Wednesday 2015-12-02
>>> df.dtypes
DateTime datetime64[ns]
DayOfWeek object
Date datetime64[ns]
new = df.loc[df['DateTime'].eq(df['DateTime'].dt.normalize()), ['DateTime']] \
.sub(pd.DateOffset(seconds=1))
new = new.assign(DayOfWeek=new['DateTime'].dt.day_name(),
Date=new['DateTime'].dt.normalize())
df.update(new)
Output result:
>>> df
DateTime DayOfWeek Date
0 2015-12-01 23:00:00 Tuesday 2015-12-01
1 2015-12-01 23:30:00 Tuesday 2015-12-01
2 2015-12-01 23:59:59 Tuesday 2015-12-01
3 2015-12-02 00:30:00 Wednesday 2015-12-02
4 2021-08-30 23:59:59 Monday 2021-08-30
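A possible alternative, sketched under the assumption that the timestamps have at most one-second resolution: if you only need the day name and date labels and would rather keep the original DateTime values, you can subtract one second from every row when deriving the labels. Only rows sitting exactly on midnight end up on a different calendar date. The `shifted` name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"DateTime": pd.to_datetime([
    "2015-12-01 23:30:00",
    "2015-12-02 00:00:00",
    "2015-12-02 00:30:00",
])})

# Shifting every timestamp back by one second only changes the calendar
# date of rows that sit exactly on midnight.
shifted = df["DateTime"] - pd.Timedelta(seconds=1)

# Derive the labels from the shifted values; DateTime itself is unchanged.
df["DayOfWeek"] = shifted.dt.day_name()
df["Date"] = shifted.dt.normalize()

print(df)
```

This avoids the mask and the update step, at the cost of not rewriting the midnight timestamp to 23:59:59.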
My DataFrame:
start_trade week_day
0 2021-01-16 09:30:00 Saturday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-31 12:35:00 Sunday
There are no trades on the exchange on Saturday and Sunday. Therefore, if my trading signal falls on the weekend, I want to open a trade on Friday 23:50.
Expected output:
start_trade week_day
0 2021-01-15 23:50:00 Friday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-29 23:50:00 Friday
How to do it?
You can do this with to_timedelta, which shifts the date back to the Friday of the same week, and then set the time with Timedelta. Apply it only to the weekend rows by using a mask:
#for week ends dates
mask = df['start_trade'].dt.weekday.isin([5,6])
df.loc[mask, 'start_trade'] = (df['start_trade'].dt.normalize() # to get midnight
- pd.to_timedelta(df['start_trade'].dt.weekday-4, unit='D') # to get the friday date
+ pd.Timedelta(hours=23, minutes=50)) # set 23:50 for time
df.loc[mask, 'week_day'] = 'Friday'
print(df)
start_trade week_day
0 2021-01-15 23:50:00 Friday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-29 23:50:00 Friday
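A self-contained variant of the same idea, as a rough sketch: it rebuilds a few of the question's rows and recomputes week_day from the adjusted timestamps with day_name() instead of hard-coding 'Friday'. The `friday_close` name and the use of Series.where are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"start_trade": pd.to_datetime([
    "2021-01-16 09:30:00",   # Saturday
    "2021-01-19 14:30:00",   # Tuesday
    "2021-01-31 12:35:00",   # Sunday
])})

weekday = df["start_trade"].dt.weekday        # Monday=0 ... Sunday=6
mask = weekday >= 5                           # Saturday or Sunday

# Midnight of the same day, minus 1 (Sat) or 2 (Sun) days, plus 23:50.
friday_close = (df["start_trade"].dt.normalize()
                - pd.to_timedelta(weekday - 4, unit="D")
                + pd.Timedelta(hours=23, minutes=50))

# Keep weekday rows as they are, replace weekend rows with Friday 23:50.
df["start_trade"] = df["start_trade"].where(~mask, friday_close)
df["week_day"] = df["start_trade"].dt.day_name()

print(df)
```

Deriving week_day from the timestamp keeps the two columns consistent if the rule ever changes.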
Try:
weekend = df['week_day'].isin(['Saturday', 'Sunday'])
df.loc[weekend, 'week_day'] = 'Friday'
Or np.where along with str.contains, and | operator:
df['week_day'] = np.where(df['week_day'].str.contains(r'Saturday|Sunday'),'Friday',df['week_day'])
I'm currently building a program to track logins and logouts. The data is exported as strings like "5:00AM", and I use the following code to convert them from string to datetime64[ns]:
df = pd.DataFrame({ 'LoginTime' : ["10:00PM", "5:00AM", "11:00PM","7:00AM"],
'Logout Time' : ["6:00AM","2:00PM", "5:00AM", "5:00PM"]})
for c in df.columns:
    if c == 'LoginTime':
        df['LoginTime'] = pd.to_datetime(df['LoginTime'], format='%I:%M%p')
    elif c == 'Logout Time':
        df['Logout Time'] = pd.to_datetime(df['Logout Time'], format='%I:%M%p')
The output result is the following:
LoginTime Logout Time
0 1900-01-01 22:00:00 1900-01-01 06:00:00
1 1900-01-01 05:00:00 1900-01-01 14:00:00
2 1900-01-01 23:00:00 1900-01-01 05:00:00
3 1900-01-01 07:00:00 1900-01-01 17:00:00
LoginTime datetime64[ns]
Logout Time datetime64[ns]
The code works as expected and converts the strings to datetimes. However, I noticed the format is 1/1/1900 10:00:00 PM. Is there a way to get only the time, like 10:00:00 PM, without affecting the datetime64[ns] data type? I need it to build validation for the logins and logouts.
Thanks in advance
Try using dt.strftime after converting it to datetime format:
df = pd.DataFrame({'LoginTime':['1/1/1900 10:00:00 PM', '1/2/2018 05:00:00 AM']})
LoginTime
0 1/1/1900 10:00:00 PM
1 1/2/2018 05:00:00 AM
df['LoginTime'] = pd.to_datetime(df['LoginTime']).dt.strftime('%I:%M %p')
LoginTime
0 10:00 PM
1 05:00 AM
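Note that dt.strftime returns strings, so the column is no longer a datetime after this step. One possible arrangement, sketched here with made-up column names ("LoginTime display", "LoginTime time"), is to keep the parsed datetime column for validation and add separate columns for display:

```python
import pandas as pd

df = pd.DataFrame({"LoginTime": ["10:00PM", "5:00AM"]})

# Parsed values keep a datetime dtype (with the placeholder date 1900-01-01).
parsed = pd.to_datetime(df["LoginTime"], format="%I:%M%p")
df["LoginTime"] = parsed

# Text for display only; this column holds plain strings.
df["LoginTime display"] = parsed.dt.strftime("%I:%M:%S %p")

# datetime.time objects, if only the time of day is needed for comparisons.
df["LoginTime time"] = parsed.dt.time

print(df.dtypes)
print(df)
```

Comparisons and arithmetic can then use the datetime column, while reports use the display column.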
I have raw data like this and want to find the difference between the two times in minutes. The data is in a data frame.
source:
start time end time
0 08:30:00 17:30:00
1 11:00:00 17:30:00
2 08:00:00 21:30:00
3 19:30:00 22:00:00
4 19:00:00 00:00:00
5 08:30:00 15:30:00
Need a output like this:
duration
540mint
798mint
162mint
1140mint
420mint
Your expected output seems to be incorrect. That aside, we can use base R's difftime:
transform(
df,
duration = difftime(
strptime(end.time, format = "%H:%M:%S"),
strptime(start.time, format = "%H:%M:%S"),
units = "mins"))
# start.time end.time duration
#0 08:30:00 17:30:00 540 mins
#1 11:00:00 17:30:00 390 mins
#2 08:00:00 21:30:00 810 mins
#3 19:30:00 22:00:00 150 mins
#4 19:00:00 00:00:00 -1140 mins
#5 08:30:00 15:30:00 420 mins
or as a difftime vector
with(df, difftime(
strptime(end.time, format = "%H:%M:%S"),
strptime(start.time, format = "%H:%M:%S"),
units = "mins"))
#Time differences in mins
#[1] 540 390 810 150 -1140 420
Sample data
df <- read.table(text =
" 'start time' 'end time'
0 08:30:00 17:30:00
1 11:00:00 17:30:00
2 08:00:00 21:30:00
3 19:30:00 22:00:00
4 19:00:00 00:00:00
5 08:30:00 15:30:00", header = T, row.names = 1)
import pandas as pd
df = pd.DataFrame({'start time':['08:30:00','11:00:00','08:00:00','19:30:00','19:00:00','08:30:00'],'end time':['17:30:00','17:30:00','21:30:00','22:00:00','00:00:00','15:30:00']},columns=['start time','end time'])
df
Out[355]:
start time end time
0 08:30:00 17:30:00
1 11:00:00 17:30:00
2 08:00:00 21:30:00
3 19:30:00 22:00:00
4 19:00:00 00:00:00
5 08:30:00 15:30:00
(pd.to_datetime(df['end time']) - pd.to_datetime(df['start time'])).dt.seconds/60
Out[356]:
0 540.0
1 390.0
2 810.0
3 150.0
4 300.0
5 420.0
dtype: float64
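The .dt.seconds attribute wraps negative differences implicitly, which is why row 4 shows 300 rather than -1140. If you prefer to make that wrap explicit, here is a sketch that assumes an end time earlier than the start time always means the interval crosses midnight. It parses the strings with pd.to_timedelta and takes the difference modulo one day; the `minutes` name and the `duration` column are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "start time": ["08:30:00", "11:00:00", "08:00:00", "19:30:00", "19:00:00", "08:30:00"],
    "end time":   ["17:30:00", "17:30:00", "21:30:00", "22:00:00", "00:00:00", "15:30:00"],
})

# HH:MM:SS strings parse directly as offsets from midnight.
start = pd.to_timedelta(df["start time"])
end = pd.to_timedelta(df["end time"])

# Raw difference in minutes; negative when the end is "before" the start.
minutes = (end - start).dt.total_seconds() / 60

# Wrap negative values into the next day (1440 minutes per day).
df["duration"] = (minutes % (24 * 60)).astype(int)

print(df)
```

Dropping the modulo step gives the signed difference instead, matching the -1140 shown in the R output.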
Yes, definitely datetime is what you need here. Specifically, the strptime function, which parses a string into a time object.
from datetime import datetime
s1 = '10:33:26'
s2 = '11:15:49' # for example
FMT = '%H:%M:%S'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
That gets you a timedelta object that contains the difference between the two times. You can do whatever you want with that, e.g. converting it to seconds or adding it to another datetime.
This will return a negative result if the end time is earlier than the start time, for example s1 = 12:00:00 and s2 = 05:00:00. If you want the code to assume the interval crosses midnight in this case (i.e. it should assume the end time is never earlier than the start time), you can add the following lines to the above code:
if tdelta.days < 0:
    tdelta = timedelta(days=0,
                       seconds=tdelta.seconds, microseconds=tdelta.microseconds)
(of course you need to include from datetime import timedelta somewhere). Thanks to J.F. Sebastian for pointing out this use case.
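Putting the pieces above together, a minimal runnable sketch might look like this. The function name `elapsed` is hypothetical, and the midnight handling assumes, as described above, that the end time is never really earlier than the start time.

```python
from datetime import datetime, timedelta

FMT = '%H:%M:%S'

def elapsed(start, end):
    """Return end - start as a timedelta, assuming at most one midnight crossing."""
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    if tdelta.days < 0:
        # A negative result means the interval crossed midnight;
        # drop the -1 day component and keep the remainder.
        tdelta = timedelta(days=0,
                           seconds=tdelta.seconds,
                           microseconds=tdelta.microseconds)
    return tdelta

same_day = elapsed('10:33:26', '11:15:49')
overnight = elapsed('12:00:00', '05:00:00')

print(same_day, same_day.total_seconds() / 60)
print(overnight, overnight.total_seconds() / 60)
```

total_seconds() is the usual way to turn the result into a single number, for example minutes as shown in the print calls.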