I get daily reports which include a timestamp column and a UTC Offset column. Using pandas, I can convert the int Timestamp into a datetime64 type. I unfortunately can't figure out how to use the offset.
Since the 'UTC Offset' column comes in as a string I have tried converting it to an int to help, but can't figure out how to use it. I tried using pd.offsets.Hour, but that can't use the column of offsets.
df = pd.read_csv(filename, encoding='utf-8', delimiter=r'\t',engine='python')
df['Timestamp'] = pd.to_datetime(df[r'Stream Timestamp'],utc=True, unit='s')
print(df['Timestamp'][:3])
0   2019-05-01 14:21:37+00:00
1   2019-05-01 15:50:12+00:00
Name: Timestamp, dtype: datetime64[ns, UTC]
0 -06:00
1 +01:00
2 -04:00
Name: UTC Offset, dtype: object
df[r"UTC Offset"] = df[r"UTC Offset"].astype(int)
Ideally, I want to do something like this:
df[r'Adjusted'] = df[r'Timestamp'] + pd.offsets.Hour(df[r'UTC Offset'])
However, I can't figure out how best to reference the column of offsets. I'm a little new to datetime in general, but any help would be appreciated!
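For what it's worth, pd.offsets.Hour can't take a whole column, but pd.to_timedelta can. A minimal sketch of one way to compute the 'Adjusted' column (column names taken from the question, toy data assumed):

```python
import pandas as pd

# toy stand-in for the parsed report
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2019-05-01 14:21:37',
                                 '2019-05-01 15:50:12'], utc=True),
    'UTC Offset': ['-06:00', '+01:00'],
})

# split the sign off, since to_timedelta wants plain 'hh:mm:ss' strings
sign = df['UTC Offset'].str[0].map({'+': 1, '-': -1})
hours = pd.to_timedelta(df['UTC Offset'].str[1:] + ':00')

# shift each UTC timestamp by its own offset
df['Adjusted'] = df['Timestamp'] + sign * hours
print(df['Adjusted'])
```

Note the result holds local wall-clock times but is still labeled UTC; if that label is misleading, drop it with df['Adjusted'].dt.tz_localize(None).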
Maybe not the prettiest, but since the timestamp is read from the CSV as an object (a string), you can shave off the old offset and splice in the offset column as a string. For this to work, all of the timestamps must carry an offset when read in; if they don't, consider checking whether the string contains a + or - and going from there.
Then convert to datetime. I included the format parameter in pd.to_datetime just so it would be faster, but you do not need it if your dataset is small. I am actually surprised at how hard it is to find information on pandas time zones, but maybe check out tzinfo?
I included the intermediate steps in different columns for ease of understanding, but you of course need not do this.
df = pd.DataFrame({'timestamp_str': ['2019-05-01 14:21:37+00:00',
'2019-05-01 15:50:12+00:00',
'2019-05-01 15:50:12+00:00'],
'utc_offset': ['-06:00','+01:00','-04:00']})
df['timestamp_str_combine'] = df['timestamp_str'].str[:-6] + df['utc_offset']
df['timestamp'] = pd.to_datetime(df['timestamp_str_combine'],
                                 format="%Y-%m-%d %H:%M:%S%z", utc=True)
df.info()
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
timestamp_str 3 non-null object
utc_offset 3 non-null object
timestamp_str_combine 3 non-null object
timestamp 3 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), object(3)
memory usage: 176.0+ bytes
I prefer to go through Python's datetime module, as it's easier to work out the offset there than with pandas' native time types.
To get the timezone offset:
from datetime import datetime, timedelta
from tzlocal import get_localzone  # pip install tzlocal

millis = 1288483950000
ts = millis * 1e-3
local_dt = datetime.fromtimestamp(ts, get_localzone())
utc_offset = local_dt.utcoffset()
hours_offset = utc_offset / timedelta(hours=1)
Then apply the offset:
df['dt'] = pd.to_datetime(df['timestamp'],infer_datetime_format=True,errors='ignore')
df['dtos'] = df['dt'] + timedelta(hours=hours_offset)
Related
I want to change a datetime (2014-12-23 00:00:00) into Unix time. I tried it with the datetime function but it didn't work. I have the datetime stamps in an array.
Zeit =np.array(Jahresgang1.ix[ :,'Zeitstempel'])
t = pd.to_datetime(Zeit, unit='s')
unixtime = pd.DataFrame(t)
print unixtime
Thanks a lot
I think you can subtract the date 1970-1-1 to create a timedelta and then access the attribute total_seconds (assuming import datetime as dt):
In [130]:
s = pd.Series(pd.datetime(2012,1,1))
s
Out[130]:
0 2012-01-01
dtype: datetime64[ns]
In [158]:
(s - dt.datetime(1970,1,1)).dt.total_seconds()
Out[158]:
0 1325376000
dtype: float64
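The same idea as a plain runnable snippet, with pd.to_datetime in place of the long-deprecated pd.datetime:

```python
import datetime as dt
import pandas as pd

s = pd.Series(pd.to_datetime(['2012-01-01']))
# subtract the epoch to get a timedelta, then take total seconds
unix_seconds = (s - dt.datetime(1970, 1, 1)).dt.total_seconds()
print(unix_seconds)
```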
To emphasize EdChum's first comment, you can directly get Unix time like this:
import pandas as pd
s = pd.to_datetime(["2014-12-23 00:00:00"])
unix = s.astype("int64")
print(unix)
# Int64Index([1419292800000000000], dtype='int64')
or for a pd.Timestamp:
print(pd.to_datetime("2014-12-23 00:00:00").value)
# 1419292800000000000
Notes
the output precision is nanoseconds - if you want another, divide appropriately, e.g. by 10⁹ to get seconds, 10⁶ to get milliseconds etc.
this assumes the input date/time to be UTC, unless a time zone / UTC offset is specified
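To illustrate the precision note, a quick sketch dividing the nanosecond value down to seconds and milliseconds:

```python
import pandas as pd

ns = pd.to_datetime(["2014-12-23 00:00:00"]).astype("int64")
seconds = ns // 10**9  # nanoseconds -> seconds
millis = ns // 10**6   # nanoseconds -> milliseconds
print(seconds[0], millis[0])
```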
I have a dataframe with 3 columns, created from a Postgres table.
How can I do a conversion from timestamptz to timestamp please?
I did
df['StartTime'] = df["StartTime"].apply(lambda x: x.tz_localize(None))
example of data in the StartTime :
2013-09-27 14:19:46.825000+02:00
2014-02-07 10:52:25.392000+01:00
Thank you,
To give a more thorough answer: the point here is that in your example, you have timestamps with mixed UTC offsets. Without setting any keywords, pandas will convert the strings to datetime but leave the Series' elements as native Python datetime objects, not pandas (numpy) datetime64. That makes it kind of hard to use built-in methods like tz_localize. But you can work your way around it. Ex:
import pandas as pd
# exemplary Series
StartTime = pd.Series(["2013-09-27 14:19:46.825000+02:00", "2014-02-07 10:52:25.392000+01:00"])
# make sure we have datetime Series
StartTime = pd.to_datetime(StartTime)
# notice the dtype:
print(type(StartTime.iloc[0]))
# <class 'datetime.datetime'>
# we also cannot use dt accessor:
# print(StartTime.dt.date)
# >>> AttributeError: Can only use .dt accessor with datetimelike values
# ...but we can use replace method of datetime object and remove tz info:
StartTime = StartTime.apply(lambda t: t.replace(tzinfo=None))
# now we have
StartTime
0 2013-09-27 14:19:46.825
1 2014-02-07 10:52:25.392
dtype: datetime64[ns]
# and can use e.g.
StartTime.dt.date
# 0 2013-09-27
# 1 2014-02-07
# dtype: object
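Alternatively, if you don't need to keep the original local wall times, you can normalize everything to UTC up front; this keeps the fast datetime64 dtype throughout (note this shifts the wall-clock values to UTC, a deliberate trade-off, unlike the apply above):

```python
import pandas as pd

StartTime = pd.Series(["2013-09-27 14:19:46.825000+02:00",
                       "2014-02-07 10:52:25.392000+01:00"])
# utc=True collapses the mixed offsets into one UTC-aware datetime64 series;
# tz_localize(None) then drops the tz info, leaving naive UTC wall times
StartTime = pd.to_datetime(StartTime, utc=True).dt.tz_localize(None)
print(StartTime)
```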
From an online API I gather a series of data points, each with a value and an ISO timestamp. Unfortunately I need to loop over them, so I store them in a temporary dict and then create a pandas dataframe from that and set the index to the timestamp column (simplified example):
from datetime import datetime
import pandas
input_data = [
'2019-09-16T06:44:01+02:00',
'2019-11-11T09:13:01+01:00',
]
data = []
for timestamp in input_data:
_date = datetime.fromisoformat(timestamp)
data.append({'time': _date})
pd_data = pandas.DataFrame(data).set_index('time')
As long as all timestamps are in the same timezone and DST/non-DST state, everything works fine and I get a DataFrame with a DatetimeIndex, which I can work on later.
However, once two different time offsets appear in one dataset (above example), I only get a plain Index in my dataframe, which does not support any time-based methods.
Is there any way to make pandas accept timezone-aware, differing date as index?
A minor correction of the question's wording, which I think is important. What you have are UTC offsets - DST/no-DST would require more information than that, i.e. a time zone. Here, this matters since you can parse timestamps with UTC offsets (even different ones) to UTC easily:
import pandas as pd
input_data = [
'2019-09-16T06:44:01+02:00',
'2019-11-11T09:13:01+01:00',
]
dti = pd.to_datetime(input_data, utc=True)
# dti
# DatetimeIndex(['2019-09-16 04:44:01+00:00', '2019-11-11 08:13:01+00:00'], dtype='datetime64[ns, UTC]', freq=None)
I prefer to work with UTC so I'd be fine with that. If however you need date/time in a certain time zone, you can convert e.g. like
dti = dti.tz_convert('Europe/Berlin')
# dti
# DatetimeIndex(['2019-09-16 06:44:01+02:00', '2019-11-11 09:13:01+01:00'], dtype='datetime64[ns, Europe/Berlin]', freq=None)
A pandas datetime column also requires the offset to be the same; a column with different offsets will not be converted to a datetime dtype.
I suggest, do not convert the data to a datetime until it's in pandas.
Separate the time offset, and treat it as a timedelta
to_timedelta requires a format of 'hh:mm:ss' so add ':00' to the end of the offset
See Pandas: Time deltas for all the available timedelta operations
pandas.Series.dt.tz_convert
pandas.Series.dt.tz_localize
Convert to a specific TZ with:
If a datetime is not datetime64[ns, UTC] dtype, then first use .dt.tz_localize('UTC') before .dt.tz_convert('US/Pacific')
Otherwise df.datetime_utc.dt.tz_convert('US/Pacific')
import pandas as pd
# sample data
input_data = ['2019-09-16T06:44:01+02:00', '2019-11-11T09:13:01+01:00']
# dataframe
df = pd.DataFrame(input_data, columns=['datetime'])
# separate the offset from the datetime and convert it to a timedelta
df['offset'] = pd.to_timedelta(df.datetime.str[-6:] + ':00')
# if desired, create a str with the separated datetime
# converting this to a datetime will lead to AmbiguousTimeError because of overlapping datetimes at 2AM, per the OP
df['datetime_str'] = df.datetime.str[:-6]
# convert the datetime column to a datetime format without the offset
df['datetime_utc'] = pd.to_datetime(df.datetime, utc=True)
# display(df)
datetime offset datetime_str datetime_utc
0 2019-09-16T06:44:01+02:00 0 days 02:00:00 2019-09-16 06:44:01 2019-09-16 04:44:01+00:00
1 2019-11-11T09:13:01+01:00 0 days 01:00:00 2019-11-11 09:13:01 2019-11-11 08:13:01+00:00
print(df.info())
[out]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 2 non-null object
1 offset 2 non-null timedelta64[ns]
2 datetime_str 2 non-null object
3 datetime_utc 2 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), object(2), timedelta64[ns](1)
memory usage: 192.0+ bytes
# convert to local timezone
df.datetime_utc.dt.tz_convert('US/Pacific')
[out]:
0 2019-09-15 21:44:01-07:00
1 2019-11-11 00:13:01-08:00
Name: datetime_utc, dtype: datetime64[ns, US/Pacific]
Other Resources
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes.
Talk Python to Me: Episode #271: Unlock the mysteries of time, Python's datetime that is!
Real Python: Using Python datetime to Work With Dates and Times
The dateutil module provides powerful extensions to the standard datetime module.
I have two dataframes (see here), which contain dates and times.
The details for the first data frame are:
Date object
Time object
Channel1 float64
Channel2 float64
Channel3 float64
Channel4 float64
Channel5 float64
dtype: object
The details for the second data frame are:
Date object
Time object
Mean float64
STD float64
Min float64
Max float64
dtype: object
I am trying to convert the times to a DateTime object so that I can then do a calculation to make the time relative to the first time instance (i.e. the earliest time would become 0, and then all others would be seconds after the start).
When I try (from here):
df['Time'] = df['Time'].apply(pd.Timestamp)
I get this error:
TypeError: Cannot convert input [15:35:45] of type <class 'datetime.time'> to Timestamp
When I try (from here):
df['Time'] = pd.to_datetime(df['Time'])
but it gives me this error:
TypeError: <class 'datetime.time'> is not convertible to datetime
Any suggestions would be appreciated.
The reason you are getting the error
TypeError: <class 'datetime.time'> is not convertible to datetime
is literally what it says: your df['Time'] contains datetime.time objects, which cannot be converted to datetime.datetime or Timestamp objects, both of which also require a date component.
The solution is to combine df['Date'] and df['Time'] and then, pass it to pd.to_datetime. See below code sample:
df = pd.DataFrame({'Date': ['3/11/2000', '3/12/2000', '3/13/2000'],
'Time': ['15:35:45', '18:35:45', '05:35:45']})
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
Output
Date Time datetime
0 3/11/2000 15:35:45 2000-03-11 15:35:45
1 3/12/2000 18:35:45 2000-03-12 18:35:45
2 3/13/2000 05:35:45 2000-03-13 05:35:45
In the end my solution was different for the two dataframes which I had.
For the first dataframe, the solution which combines the Date column with the Time column worked well:
df['Date Time'] = df['Date'] + ' ' + df['Time']
After the two columns are combined, the following code turns the result into a datetime object (note that the format='%d/%m/%Y %H:%M:%S' part is required because otherwise pandas confuses the day/month order and uses US formatting, i.e. it reads 11/12/2018 as 12th of November instead of 11th of December):
df['Date Time'] = pd.to_datetime(df['Date Time'], format='%d/%m/%Y %H:%M:%S')
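To see the day/month ambiguity the format string guards against, compare (toy date assumed):

```python
import pandas as pd

# default parsing is month-first: 11/12/2018 is read as 12-Nov
print(pd.to_datetime('11/12/2018'))
# an explicit day-first format reads the same string as 11-Dec
print(pd.to_datetime('11/12/2018', format='%d/%m/%Y'))
```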
For my second dataframe, I went back earlier in my data-processing pipeline and found an option which saves the date and time to a single column directly. After that, the following code converted it to a datetime object:
df['Date Time'] = df['Date Time'].apply(pd.Timestamp)
I have a SQL table that contains data of the mySQL time type as follows:
time_of_day
-----------
12:34:56
I then use pandas to read the table in:
df = pd.read_sql('select * from time_of_day', engine)
Looking at df.dtypes yields:
time_of_day timedelta64[ns]
My main issue is that, when writing my df to a csv file, the data comes out all messed up, instead of essentially looking like my SQL table:
time_of_day
0 days 12:34:56.000000000
I'd like to instead (obviously) store this record as a time, but I can't find anything in the pandas docs that talk about a time dtype.
Does pandas lack this functionality intentionally? Is there a way to solve my problem without requiring janky data casting?
Seems like this should be elementary, but I'm confounded.
Pandas does not support a time dtype series
Pandas (and NumPy) do not have a time dtype. Since you wish to avoid Pandas timedelta, you have 3 options: Pandas datetime, Python datetime.time, or Python str. Below they are presented in order of preference. Let's assume you start with the following dataframe:
df = pd.DataFrame({'time': pd.to_timedelta(['12:34:56', '05:12:45', '15:15:06'])})
print(df['time'].dtype) # timedelta64[ns]
Pandas datetime series
You can use Pandas datetime series and include an arbitrary date component, e.g. today's date. Underlying such a series are integers, which makes this solution the most efficient and adaptable.
The default date, if unspecified, is 1-Jan-1970:
df['time'] = pd.to_datetime(df['time'])
print(df)
# time
# 0 1970-01-01 12:34:56
# 1 1970-01-01 05:12:45
# 2 1970-01-01 15:15:06
You can also specify a date, such as today:
df['time'] = pd.Timestamp('today').normalize() + df['time']
print(df)
# time
# 0 2019-01-02 12:34:56
# 1 2019-01-02 05:12:45
# 2 2019-01-02 15:15:06
Pandas object series of Python datetime.time values
The Python datetime module from the standard library supports datetime.time objects. You can convert your series to an object dtype series containing pointers to a sequence of datetime.time objects. Operations will no longer be vectorised, but each underlying value will be represented internally by a number.
df['time'] = pd.to_datetime(df['time']).dt.time
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'datetime.time'>
Pandas object series of Python str values
Converting to strings is only recommended for presentation purposes that are not supported by other types, e.g. Pandas datetime or Python datetime.time. For example:
df['time'] = pd.to_datetime(df['time']).dt.strftime('%H:%M:%S')
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'str'>
It's a hack, but you can pull out the components to create a string and convert that string to a datetime.time(h, m, s) object:
from datetime import datetime

def convert(td):
    time = [str(td.components.hours), str(td.components.minutes),
            str(td.components.seconds)]
    return datetime.strptime(':'.join(time), '%H:%M:%S').time()

df['time'] = df['time'].apply(convert)
I found a solution, but I feel like it's got to be more elegant than this:
def convert(x):
return pd.to_datetime(x).strftime('%H:%M:%S')
df['time_of_day'] = df['time_of_day'].apply(convert)
df['time_of_day'] = pd.to_datetime(df['time_of_day']).apply(lambda x: x.time())
Adapted this code
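For the record, the round trip can be collapsed by anchoring the timedelta to the epoch; a sketch of the same idea as the earlier .dt.time conversion, with toy data:

```python
import pandas as pd

df = pd.DataFrame({'time_of_day': pd.to_timedelta(['12:34:56', '01:02:03'])})
# add the timedelta to the epoch, then keep only the time-of-day component
df['time_of_day'] = (pd.Timestamp(0) + df['time_of_day']).dt.time
print(df['time_of_day'])
```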