Parse date from two columns pandas - python

I have a set of data that looks like this (3 columns). The date and time are in 1 column and the timezone is in another column.
location,time,zone
EASTERN HILLSBOROUGH,1/27/2015 12:00,EST-5
EASTERN HILLSBOROUGH,1/24/2015 7:00,EST-5
EASTERN HILLSBOROUGH,1/27/2015 6:00,EST-5
EASTERN HILLSBOROUGH,2/14/2015 8:00,EST-5
EASTERN HILLSBOROUGH,2/7/2015 22:00,EST-5
EASTERN HILLSBOROUGH,2/2/2015 2:00,EST-5
I'm using pandas in order to parse the date and time with its respective timezone. In read_csv I can do parse_dates = [[1,2]] which, according to the docs, combines the columns into 1 and parses them.
So now the new data looks like this (2 columns)
location,time_zone
EASTERN HILLSBOROUGH,1/27/2015 12:00 EST-5
EASTERN HILLSBOROUGH,1/24/2015 7:00 EST-5
EASTERN HILLSBOROUGH,1/27/2015 6:00 EST-5
EASTERN HILLSBOROUGH,2/14/2015 8:00 EST-5
EASTERN HILLSBOROUGH,2/7/2015 22:00 EST-5
EASTERN HILLSBOROUGH,2/2/2015 2:00 EST-5
However, if I type df['time_zone'].dtype I get dtype('O') which isn't a datetimelike because I can't use the dt accessor with it.
How else can I parse those two columns properly?

Not sure if this is what you want, but you could just read in (without any datetime parsing) and then use to_datetime (note that new variable time_zone is 5 hours later than time).
df['time_zone'] = pd.to_datetime( df.time + df.zone )
location time zone time_zone
0 EASTERN HILLSBOROUGH 1/27/2015 12:00 EST-5 2015-01-27 17:00:00
1 EASTERN HILLSBOROUGH 1/24/2015 7:00 EST-5 2015-01-24 12:00:00
2 EASTERN HILLSBOROUGH 1/27/2015 6:00 EST-5 2015-01-27 11:00:00
3 EASTERN HILLSBOROUGH 2/14/2015 8:00 EST-5 2015-02-14 13:00:00
4 EASTERN HILLSBOROUGH 2/7/2015 22:00 EST-5 2015-02-08 03:00:00
5 EASTERN HILLSBOROUGH 2/2/2015 2:00 EST-5 2015-02-02 07:00:00
df.info()
location 6 non-null object
time 6 non-null object
zone 6 non-null object
time_zone 6 non-null datetime64[ns]

Per the pytz module:
The preferred way of dealing with times is to always work in UTC,
converting to localtime only when generating output to be read by
humans.
I don't believe your timezones are standard, which makes the conversion a little more tricky. We should, however, be able to strip the timezone offset and add it to the UTC time using datetime.timedelta. This is a hack, and I wish I knew a better way.
I assume all times are recorded in their local timezones, so 1/27/2015 12:00 EST-5 would be 1/27/2015 17:00 UTC.
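import datetime as dt
import pandas as pd
from pytz import utc

df = pd.read_csv('times.csv')
# expand=False returns a Series of offset strings rather than a one-column DataFrame
df['UTC_time'] = [utc.localize(t) - dt.timedelta(hours=int(h))
                  for t, h in zip(pd.to_datetime(df.time),
                                  df.zone.str.extract(r'(-?\d+)', expand=False))]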
>>> df
location time zone UTC_time
0 EASTERN HILLSBOROUGH 1/27/2015 12:00 EST-5 2015-01-27 17:00:00+00:00
1 EASTERN HILLSBOROUGH 1/24/2015 7:00 EST-5 2015-01-24 12:00:00+00:00
2 EASTERN HILLSBOROUGH 1/27/2015 6:00 EST-5 2015-01-27 11:00:00+00:00
3 EASTERN HILLSBOROUGH 2/14/2015 8:00 EST-5 2015-02-14 13:00:00+00:00
4 EASTERN HILLSBOROUGH 2/7/2015 22:00 EST-5 2015-02-08 03:00:00+00:00
5 EASTERN HILLSBOROUGH 2/2/2015 2:00 EST-5 2015-02-02 07:00:00+00:00
Examining a single timestamp, you'll notice the timezone is set to UTC:
>>> df.UTC_time.iat[0]
Timestamp('2015-01-27 17:00:00+0000', tz='UTC')
>>> df.UTC_time.iat[0].tzname()
'UTC'
To display them in a different time zone:
fmt = '%Y-%m-%d %H:%M:%S %Z%z'
>>> [t.astimezone('EST').strftime(fmt) for t in df.UTC_time]
['2015-01-27 12:00:00 EST-0500',
'2015-01-24 07:00:00 EST-0500',
'2015-01-27 06:00:00 EST-0500',
'2015-02-14 08:00:00 EST-0500',
'2015-02-07 22:00:00 EST-0500',
'2015-02-02 02:00:00 EST-0500']
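The same display step can probably be done on the whole column at once with the .dt accessor instead of a list comprehension. This is only a sketch: the two timestamps are copied from the output above, and "America/New_York" is an assumed stand-in for the fixed 'EST' zone.

```python
import pandas as pd

# Two of the UTC timestamps from the output above, as a tz-aware Series.
utc_times = pd.Series(
    pd.to_datetime(["2015-01-27 17:00:00", "2015-02-08 03:00:00"])
).dt.tz_localize("UTC")

# "America/New_York" is an assumption; it observes EST (UTC-5) in January and February.
eastern = utc_times.dt.tz_convert("America/New_York")
formatted = eastern.dt.strftime("%Y-%m-%d %H:%M:%S %Z%z").tolist()
print(formatted)
```

Note that a named zone like this one switches to daylight time in summer, whereas a fixed 'EST' offset does not.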
Here is a test. Let's change the timezones in df and see if alternative solutions still work:
df['zone'] = ['EST-5', 'CST-6', 'MST-7', 'GST10', 'PST-8', 'AKST-9']
df['UTC_time'] = [utc.localize(t) - dt.timedelta(hours=int(h))
                  for t, h in zip(pd.to_datetime(df.time),
                                  df.zone.str.extract(r'(-?\d+)', expand=False))]
>>> df
location time zone UTC_time
0 EASTERN HILLSBOROUGH 1/27/2015 12:00 EST-5 2015-01-27 17:00:00+00:00
1 EASTERN HILLSBOROUGH 1/24/2015 7:00 CST-6 2015-01-24 13:00:00+00:00
2 EASTERN HILLSBOROUGH 1/27/2015 6:00 MST-7 2015-01-27 13:00:00+00:00
3 EASTERN HILLSBOROUGH 2/14/2015 8:00 GST10 2015-02-13 22:00:00+00:00
4 EASTERN HILLSBOROUGH 2/7/2015 22:00 PST-8 2015-02-08 06:00:00+00:00
5 EASTERN HILLSBOROUGH 2/2/2015 2:00 AKST-9 2015-02-02 11:00:00+00:00
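If the list comprehension feels heavy, the same offset arithmetic might be expressed with vectorized pandas operations. The sketch below assumes the zone column always ends in a whole-hour offset (as in 'EST-5' or 'GST10') and uses a small hand-built frame instead of times.csv.

```python
import pandas as pd

# Hand-built sample with the same column names as the question (not read from times.csv).
df = pd.DataFrame({
    "time": ["1/27/2015 12:00", "2/14/2015 8:00"],
    "zone": ["EST-5", "GST10"],
})

# Assumption: the offset is the signed integer at the end of the zone string.
offset_hours = df["zone"].str.extract(r"(-?\d+)$", expand=False).astype(int)

# Parse the naive local times, shift them by the offset, then mark the result as UTC.
local = pd.to_datetime(df["time"], format="%m/%d/%Y %H:%M")
df["UTC_time"] = (local - pd.to_timedelta(offset_hours, unit="h")).dt.tz_localize("UTC")
print(df)
```

The result is a datetime64[ns, UTC] column, so the .dt accessor from the question works on it.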
Check the python docs for more details about working with time.
Here is a good SO article on the subject.
How to make an unaware datetime timezone aware in python
And here is a link to the tz database timezones.

Related

python dataframe convert epoch to readable datetime hour minutes seconds as zero

I have a dataframe as follows:
period
1651622400000.00000
1651536000000.00000
1651449600000.00000
1651363200000.00000
1651276800000.00000
1651190400000.00000
1651104000000.00000
1651017600000.00000
I have converted it into human readable datetime as:
df['period'] = pd.to_datetime(df['period'], unit='ms')
and this outputs:
2022-04-04 00:00:00
2022-04-05 00:00:00
2022-04-06 00:00:00
2022-04-07 00:00:00
2022-04-08 00:00:00
2022-04-09 00:00:00
2022-04-10 00:00:00
2022-04-11 00:00:00
2022-04-12 00:00:00
The hours, minutes, and seconds are all turned to 0.
I checked this on https://www.epochconverter.com/ and this gives
GMT: Monday, April 4, 2022 12:00:00 AM
Your time zone: Monday, April 4, 2022 5:45:00 AM GMT+05:45
How do I get h, m, and s as well?
https://www.epochconverter.com/ applies your local timezone to the result, which is why it shows 5:45 AM.
If you need a timezone on the column, use Series.dt.tz_localize and then Series.dt.tz_convert:
df['period'] = (pd.to_datetime(df['period'], unit='ms')
                  .dt.tz_localize('GMT')
                  .dt.tz_convert('Asia/Kathmandu'))
print (df)
period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45
There is no problem with your code or with pandas. And I don't think the timezone is an issue here either (as the other answer says). April 4, 2022 12:00:00 AM is the exact same time and date as 2022-04-04 00:00:00, just in one case you use AM... You could specify timezones as jezrael writes or with utc=True (check the docs) but I guess that's not your problem.
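To see both points side by side, here is a small sketch using two of the epoch values from the question. It uses the utc=True option mentioned above; 'Asia/Kathmandu' is taken from the other answer.

```python
import pandas as pd

# Two epoch values (milliseconds) from the question.
period = pd.Series([1651622400000.0, 1651536000000.0])

# utc=True yields a tz-aware column; these instants are exactly midnight UTC.
as_utc = pd.to_datetime(period, unit="ms", utc=True)

# The same instants shown on the Kathmandu wall clock (UTC+05:45).
as_local = as_utc.dt.tz_convert("Asia/Kathmandu")

print(as_utc)
print(as_local)
```

The hours and minutes only become non-zero once the values are displayed in a zone other than UTC; the underlying instants do not change.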

Extract hour and minutes from timestamp but keep it in datetime format

I have a dataframe looking like this
open Start show Einde show
5 NaN 11:30 NaN
6 16:00 18:00 19:45
7 14:30 16:30 18:15
8 NaN NaN NaN
9 18:45 20:45 22:30
These hours are in string format and I would like to transform them to datetime format.
Whenever I try to use pd.to_datetime(evs['open'], errors='coerce') (to change one of the columns), it changes the hours to a full datetime like this: 2020-04-03 16:00:00, with today's date. I would like to have just the hour, but still in datetime format so I can add minutes etc.
Now when I use dt.hour to access the hour, it returns a string, not a value in HH:MM format.
Can someone help me out please? I'm reading in a CSV through Pandas read_csv but when I use the date parser I get the same problem. Ideally this would get fixed in the read_csv section instead of separately but at this point I'll take anything.
Thanks!
As Chris commented, it is not possible to convert just the hours and minutes into datetime format. But you can use timedeltas to solve your problem.
import datetime
import pandas as pd
def to_timedelta(date):
    date = pd.to_datetime(date)
    try:
        date_start = datetime.datetime(date.year, date.month, date.day, 0, 0)
    except TypeError:
        return pd.NaT  # to keep dtype of series; Alternative: pd.Timedelta(0)
    return date - date_start

df['open'] = df['open'].apply(to_timedelta)
df['open']
Output:
5 NaT
6 16:00:00
7 14:30:00
8 NaT
9 18:45:00
Name: open, dtype: timedelta64[ns]
Now you can use datetime.timedelta to add/subtract minutes, hours or whatever:
df['open'] + datetime.timedelta(minutes=15)
Output:
5 NaT
6 16:15:00
7 14:45:00
8 NaT
9 19:00:00
Name: open, dtype: timedelta64[ns]
Also, it is pretty easy to get back to full datetimes:
df['open'] + datetime.datetime(2020, 4, 4)
Output:
5 NaT
6 2020-04-04 16:00:00
7 2020-04-04 14:30:00
8 NaT
9 2020-04-04 18:45:00
Name: open, dtype: datetime64[ns]
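A shorter route to the same timedelta column may be pd.to_timedelta itself. The sketch below assumes every non-missing value has the form HH:MM, and appends ':00' because the parser expects an hh:mm:ss string.

```python
import pandas as pd

# The 'open' column from the question, with missing values as None.
opens = pd.Series([None, "16:00", "14:30", None, "18:45"], index=[5, 6, 7, 8, 9])

# Appending the seconds leaves missing entries missing; to_timedelta turns them into NaT.
as_delta = pd.to_timedelta(opens + ":00")

# Timedeltas support arithmetic, and adding a date gives full datetimes back.
plus_quarter = as_delta + pd.Timedelta(minutes=15)
on_date = pd.Timestamp("2020-04-04") + as_delta
print(plus_quarter)
print(on_date)
```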

is there a way to combine (concat) one column's different values?

I am creating a dictionary for the days from 22nd to 29th January. However, one day has two different values in the same column, Last Update: '1/25/2020 10:00 PM' and '1/25/2020 12:00 PM'. 25th January is a Saturday, so I want both values to be combined as Saturday.
To illustrate the column:
Last Update
0 1/22/2020 12:00
1 1/22/2020 12:00
2 1/22/2020 12:00
3 1/22/2020 12:00
4 1/22/2020 12:00
...
363 1/29/2020 21:00
364 1/29/2020 21:00
365 1/29/2020 21:00
366 1/29/2020 21:00
367 1/29/2020 21:00
This is how far I have come:
day_map = {'1/22/2020 12:00': 'Wednesday', '1/23/20 12:00 PM': 'Thursday',
'1/24/2020 12:00 PM': 'Friday', .?.?.
You just need to convert the column to datetime and use the pandas .dt accessor. In this case (dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0):
df["Last Update"] = pd.to_datetime(df["Last Update"])
df["Last Update"].dt.day_name()
# returns
0 Wednesday
1 Wednesday
2 Wednesday
3 Wednesday
4 Wednesday
Name: Last Update, dtype: object
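Once the weekday is derived from the date, different times on the same day collapse to one label without a hand-written dictionary. A minimal sketch, assuming the 24-hour format shown in the column printout (the sample rows and the 'Day' column name are made up for illustration):

```python
import pandas as pd

# Made-up rows in the format shown in the question; two of them fall on 25 January.
df = pd.DataFrame({"Last Update": [
    "1/22/2020 12:00",
    "1/25/2020 12:00",
    "1/25/2020 22:00",
    "1/29/2020 21:00",
]})

stamps = pd.to_datetime(df["Last Update"], format="%m/%d/%Y %H:%M")
df["Day"] = stamps.dt.day_name()  # 'Day' is a hypothetical column name

# Both 25 January rows end up under the same weekday.
rows_per_day = df.groupby("Day").size()
print(df)
print(rows_per_day)
```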

How to convert pandas dataframe date column in format of 'dd/mm/yyyy %H:%M' to 'yyyy/mm/dd %H:%M'

I have a dataframe with dates in the format 'dd/mm/yyyy %H:%M'.
Date Price
29/10/2018 19:30 163.09
29/10/2018 20:00 211.95
29/10/2018 20:30 205.86
29/10/2018 21:00 201.39
29/10/2018 21:30 126.68
29/10/2018 22:00 112.36
29/10/2018 22:30 120.94
I want this dataframe in the format 'yyyy/mm/dd %H:%M', as follows.
Date Price
2018/10/29 19:30 163.09
2018/10/29 20:00 211.95
2018/10/29 20:30 205.86
2018/10/29 21:00 201.39
2018/10/29 21:30 126.68
2018/10/29 22:00 112.36
2018/10/29 22:30 120.94
I tried
df['Date'] = pd.to_datetime(df['Date']) but it gives result as following which is not something I am looking for
Date Price
2018-10-29 19:30:00 163.09
2018-10-29 20:00:00 211.95
2018-10-29 20:30:00 205.86
2018-10-29 21:00:00 201.39
Use strftime to convert datetimes to strings in the format you want:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y/%m/%d %H:%M')
print (df)
Date Price
0 2018/10/29 19:30 163.09
1 2018/10/29 20:00 211.95
2 2018/10/29 20:30 205.86
3 2018/10/29 21:00 201.39
4 2018/10/29 21:30 126.68
5 2018/10/29 22:00 112.36
6 2018/10/29 22:30 120.94
print (type(df.loc[0, 'Date']))
<class 'str'>
print (df['Date'].dtype)
object
So if you want to keep working with datetime-like functions, use only to_datetime; the values are then displayed as YYYY-MM-DD HH:MM:SS:
df['Date'] = pd.to_datetime(df['Date'])
print (df)
Date Price
0 2018-10-29 19:30:00 163.09
1 2018-10-29 20:00:00 211.95
2 2018-10-29 20:30:00 205.86
3 2018-10-29 21:00:00 201.39
4 2018-10-29 21:30:00 126.68
5 2018-10-29 22:00:00 112.36
6 2018-10-29 22:30:00 120.94
print (type(df.loc[0, 'Date']))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
print (df['Date'].dtype)
datetime64[ns]
Pandas stores datetime as integers
When you say it gives result as following, you are only seeing a string representation of these underlying integers. You should not misconstrue this as how Pandas stores your data or, indeed, how the data will be represented when you export it to another format.
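A quick way to convince yourself of this is to look at a single value. The sketch below uses one timestamp from the question; Timestamp.value exposes the underlying integer (nanoseconds since 1970-01-01).

```python
import pandas as pd

stamp = pd.Timestamp("2018-10-29 19:30")

# The stored value is an integer count of nanoseconds since the epoch.
nanos = stamp.value
print(nanos)

# The same integer can be rendered in whatever textual form is needed.
default_text = str(stamp)
custom_text = stamp.strftime("%Y/%m/%d %H:%M")
print(default_text, "|", custom_text)
```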
Convert to object dtype
You can use pd.Series.dt.strftime to convert your series to a series of strings. This will have object dtype, which represents a sequence of pointers:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y/%m/%d %H:%M')
You will lose all vectorisation benefits, so you should aim to perform this operation only if necessary and as late as possible.
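One way to follow that advice is to parse with an explicit day-first format, keep the datetime64 column for all calculations, and build the string version only for output. This is a sketch; the second row is a made-up ambiguous date added to show why the explicit format matters, and 'Date_text' is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["29/10/2018 19:30", "01/11/2018 20:00"],  # second row is made up
    "Price": [163.09, 211.95],
})

# An explicit format avoids day/month guessing for dates such as 01/11/2018.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M")

# Keep the datetime column for calculations; format only when writing output.
df["Date_text"] = df["Date"].dt.strftime("%Y/%m/%d %H:%M")
print(df)
print(df.dtypes)
```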

How can i extract day of week from timestamp in pandas

I have a timestamp column in a dataframe as below, and I want to create another column called day of week from that. How can I do it?
Input:
Pickup date/time
07/05/2018 09:28:00
14/05/2018 17:00:00
15/05/2018 17:00:00
15/05/2018 17:00:00
23/06/2018 17:00:00
29/06/2018 17:00:00
Expected Output:
Pickup date/time Day of Week
07/05/2018 09:28:00 Monday
14/05/2018 17:00:00 Monday
15/05/2018 17:00:00 Tuesday
15/05/2018 17:00:00 Tuesday
23/06/2018 17:00:00 Saturday
29/06/2018 17:00:00 Friday
You can use weekday_name (available in pandas before 1.0):
df['date/time'] = pd.to_datetime(df['date/time'], format = '%d/%m/%Y %H:%M:%S')
df['Day of Week'] = df['date/time'].dt.weekday_name
You get
date/time Day of Week
0 2018-05-07 09:28:00 Monday
1 2018-05-14 17:00:00 Monday
2 2018-05-15 17:00:00 Tuesday
3 2018-05-15 17:00:00 Tuesday
4 2018-06-23 17:00:00 Saturday
5 2018-06-29 17:00:00 Friday
Edit:
For the newer versions of Pandas, use day_name(),
df['Day of Week'] = df['date/time'].dt.day_name()
pandas>=0.23.0: pandas.Timestamp.day_name()
df['Day of Week'] = df['date/time'].dt.day_name()
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.day_name.html
pandas>=0.18.1,<0.23.0: pandas.Timestamp.weekday_name (a property, not a method)
Deprecated since version 0.23.0
https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.Timestamp.weekday_name.html
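The difference between the column-level and the single-value call can be sketched as follows, using two rows from the question. A Series needs the .dt accessor, while a single Timestamp has day_name() directly.

```python
import pandas as pd

pickups = pd.Series(["07/05/2018 09:28:00", "23/06/2018 17:00:00"])
stamps = pd.to_datetime(pickups, format="%d/%m/%Y %H:%M:%S")

# Whole column: go through the .dt accessor.
names = stamps.dt.day_name()

# Single value: call day_name() on the Timestamp itself.
first_name = stamps.iloc[0].day_name()
print(names.tolist(), first_name)
```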
