From the official documentation of pandas.to_datetime we can say,
unit : string, default ‘ns’
unit of the arg (D,s,ms,us,ns) denote the unit, which is an integer or
float number. This will be based off the origin. Example, with
unit=’ms’ and origin=’unix’ (the default), this would calculate the
number of milliseconds to the unix epoch start.
So when I try it this way,
import pandas as pd
df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
df_unix_sec = pd.to_datetime(df['time'], unit='ms', origin='unix')
print(df)
print(df_unix_sec)
time
0 2019-01-15 13:25:43
0 2019-01-15 13:25:43
Name: time, dtype: datetime64[ns]
The output does not change for the second one. It always shows the datetime value, not the number of milliseconds since the unix epoch start. Why is that? Am I missing something?
I think you misunderstood what the argument is for. The purpose of origin='unix' is to convert an integer timestamp to datetime, not the other way.
pd.to_datetime(1.547559e+09, unit='s', origin='unix')
# Timestamp('2019-01-15 13:30:00')
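To make the direction of the conversion concrete, here is a minimal sketch. The variable names are only illustrative, and it assumes the integers are counts of seconds or milliseconds since 1970-01-01 UTC:

```python
import pandas as pd

# Integer epoch values in two different units (illustrative inputs).
epoch_seconds = 1547559000
epoch_millis = 1547559000000

# unit= tells pandas how to interpret the number; origin='unix' is the default.
ts_from_s = pd.to_datetime(epoch_seconds, unit='s', origin='unix')
ts_from_ms = pd.to_datetime(epoch_millis, unit='ms')

# Going back the other way needs a different tool, e.g. Timestamp.timestamp().
round_trip = int(ts_from_s.timestamp())

print(ts_from_s, ts_from_ms, round_trip)
```

Both calls produce the same Timestamp; passing something that is already a datetime simply returns it unchanged, which is what the question observed.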
Here are some options:
Option 1: integer division
Conversely, you can get the timestamp by converting to integer (to get nanoseconds) and dividing by 10**9.
pd.to_datetime(['2019-01-15 13:30:00']).astype(int) / 10**9
# Float64Index([1547559000.0], dtype='float64')
Pros:
super fast
Cons:
makes assumptions about how pandas internally stores dates
Option 2: recommended by pandas
Pandas docs recommend using the following method:
# create test data
dates = pd.to_datetime(['2019-01-15 13:30:00'])
# calculate unix datetime
(dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
[out]:
Int64Index([1547559000], dtype='int64')
Pros:
"idiomatic", recommended by the library
Cons:
unwieldy
not as performant as integer division
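If the expression feels unwieldy, it can be wrapped once. The sketch below uses a hypothetical helper name, to_unix, which is not part of pandas, and assumes the inputs are timezone-naive values meant as UTC:

```python
import pandas as pd

def to_unix(values, unit='1s'):
    """Hypothetical helper: whole `unit` steps elapsed since 1970-01-01."""
    deltas = pd.to_datetime(values) - pd.Timestamp('1970-01-01')
    # Floor-dividing by a Timedelta yields plain integers.
    return deltas // pd.Timedelta(unit)

dates = ['2019-01-15 13:30:00', '2019-01-15 13:30:01']
seconds = to_unix(dates)
millis = to_unix(dates, '1ms')

print(list(seconds))
print(list(millis))
```

Changing the divisor is all it takes to switch between seconds, milliseconds, and other units.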
Option 3: pd.Timestamp
If you have a single date string, you can use pd.Timestamp as shown in the other answer:
pd.Timestamp('2019-01-15 13:30:00').timestamp()
# 1547559000.0
If you have to coerce multiple datetimes (where pd.to_datetime is your only option), you can initialize and map:
pd.to_datetime(['2019-01-15 13:30:00']).map(pd.Timestamp.timestamp)
# Float64Index([1547559000.0], dtype='float64')
Pros:
best method for a single datetime string
easy to remember
Cons:
not as performant as integer division
You can use the timestamp() method, which returns the POSIX timestamp as a float:
pd.Timestamp('2021-04-01').timestamp()
[Out]:
1617235200.0
pd.Timestamp('2021-04-01 00:02:35.234').timestamp()
[Out]:
1617235355.234
The value attribute of the pandas Timestamp holds the unix epoch. This value is in nanoseconds, so you can convert to us or ms by dividing by 1e3 or 1e6 respectively. Check the code below.
import pandas as pd
date_1 = pd.to_datetime('2020-07-18 18:50:00')
print(date_1.value)
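As a rough sketch of those conversions, assuming value is reported in nanoseconds as described above (integer division is used here so the results stay exact integers):

```python
import pandas as pd

date_1 = pd.to_datetime('2020-07-18 18:50:00')

ns = date_1.value   # nanoseconds since the unix epoch
us = ns // 10**3    # microseconds
ms = ns // 10**6    # milliseconds
s = ns // 10**9     # seconds

print(ns, us, ms, s)
```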
When you calculate the difference between two datetimes, the dtype of the difference is timedelta64[ns] by default (ns in brackets). By changing [ns] into [ms], [s], [m] etc as you cast the output to a new timedelta64 object, you can convert the difference into milliseconds, seconds, minutes etc.
For example, to find the number of seconds passed since Unix epoch, subtract datetimes and change dtype.
df_unix_sec = (df['time'] - pd.Timestamp('1970-01-01')).astype('timedelta64[s]')
N.B. Oftentimes, the differences are very large numbers, so if you want them as integers, use astype('int64') (NOT astype(int)).
df_unix_sec = (df['time'] - pd.Timestamp('1970-01-01')).astype('timedelta64[s]').astype('int64')
For OP's example, this would yield,
0 1547558743
Name: time, dtype: int64
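Since the question asked for milliseconds, an alternative sketch is to floor-divide the difference by a one-millisecond Timedelta, which returns integers directly without any dtype casts. The name df_unix_ms is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})

# Elapsed time since the epoch, counted in whole milliseconds.
df_unix_ms = (df['time'] - pd.Timestamp('1970-01-01')) // pd.Timedelta('1ms')

print(df_unix_ms)
```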
If you access a single datetime64 value from the dataframe, pandas will most likely return a Timestamp object, which is how pandas represents individual datetime64 values.
You can use the pd.Timestamp.to_datetime64() method of the Timestamp object to convert it to a numpy.datetime64 object with ns precision.
I would like to create a column in a pandas data frame that is an integer representation of the number of days in a timedelta column. Is it possible to use 'datetime.days' or do I need to do something more manual?
timedelta column
7 days, 23:29:00
day integer column
7
The Series class has a pandas.Series.dt accessor object with several
useful datetime attributes, including dt.days. Access this attribute via:
timedelta_series.dt.days
You can also get the seconds and microseconds attributes in the same way.
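A small sketch of the accessor on made-up data (the series contents are illustrative). Note that dt.seconds is only the leftover seconds within the day, while dt.total_seconds() covers the whole duration:

```python
import pandas as pd

timedelta_series = pd.Series(pd.to_timedelta(['7 days 23:29:00', '0 days 05:00:30']))

days = timedelta_series.dt.days              # whole days only
seconds = timedelta_series.dt.seconds        # remaining seconds after removing whole days
total = timedelta_series.dt.total_seconds()  # entire duration in seconds, as float

print(days.tolist(), seconds.tolist(), total.tolist())
```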
You could do this, where td is your series of timedeltas. The division converts the nanosecond deltas into day deltas, and the conversion to int drops to whole days.
import numpy as np
(td / np.timedelta64(1, 'D')).astype(int)
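One difference worth knowing: the cast to int truncates toward zero, whereas dt.days floors, so the two approaches disagree for negative timedeltas. A minimal sketch with illustrative values:

```python
import numpy as np
import pandas as pd

td = pd.Series([
    pd.Timedelta(days=7, hours=23, minutes=29),
    pd.Timedelta(hours=-12),  # stored as "-1 days +12:00:00"
])

by_division = (td / np.timedelta64(1, 'D')).astype(int)  # -0.5 truncates to 0
by_accessor = td.dt.days                                 # -0.5 floors to -1

print(by_division.tolist(), by_accessor.tolist())
```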
Timedelta objects have read-only instance attributes .days, .seconds, and .microseconds.
If the question isn't just "how to access an integer form of the timedelta?" but "how to convert the timedelta column in the dataframe to an int?", the answer might be a little different. In addition to the .dt.days accessor, you need either df.astype or pd.to_numeric.
Either of these options should help:
df['tdColumn'] = pd.to_numeric(df['tdColumn'].dt.days, downcast='integer')
or
df['tdColumn'] = df['tdColumn'].dt.days.astype('int16')
The simplest way to do this is:
df["DateColumn"] = (df["DateColumn"]).dt.days
A great way to do this is
dif_in_days = dif.days
(where dif is the difference between dates)
I was having trouble manipulating time-series data provided to me for a project. The data contains the number of flight bookings made on a website per second over a duration of 30 minutes. Here is a part of the column containing the timestamp
>>> df['Date_time']
0 7/14/2017 2:14:14 PM
1 7/14/2017 2:14:37 PM
2 7/14/2017 2:14:38 PM
I wanted to do
>>> df.set_index('Date_time')
and use the datetime and timedelta methods provided by pandas to generate the timestamp to be used as index to access and modify any value in any cell.
Something like
>>> td=datetime(year=2017,month=7,day=14,hour=2,minute=14,second=36)
>>> td1=dt.timedelta(minutes=1,seconds=58)
>>> ti1=td1+td
>>> df.at[ti1,'column_name']=65000
But the timestamp generated is of the form
>>> print(ti1)
2017-07-14 02:16:34
This cannot be used directly as an index in my case, since it does not match the format of the strings in the column. Is there a workaround for this without writing additional methods myself?
I want to do it this way because it gives me a greater level of control over the data than looking up the default numerical index of each row I want to update, and so I expect it to be more efficient.
Can you check the dtype of the 'Date_time' column and confirm for me that it is string (object)?
df.dtypes
If so, you should be able to cast the values to pd.Timestamp by using the following.
df['timestamp'] = df['Date_time'].apply(pd.Timestamp)
When we call .dtypes now, we should have a 'timestamp' field of type datetime64[ns], which allows us to use builtin pandas methods more easily.
I would suggest it is prudent to index the dataframe by the timestamp too, achieved by setting the index equal to that column.
df.set_index('timestamp', inplace=True)
We should now be able to use some more useful methods such as
df.loc[timestamp_to_check, :]
df.loc[start_time_stamp : end_timestamp, : ]
df.asof(timestamp_to_check)
to lookup values from the DataFrame based upon passing a datetime.datetime / pd.Timestamp / np.datetime64 into the above. Note that you will need to cast any string (object) 'lookups' to one of the above types in order to make use of the above correctly.
I prefer to use pd.Timestamp() - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html to handle datetime conversion from strings unless I am explicitly certain of what format the datetime string is always going to be in.
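Putting the pieces together, here is a minimal sketch on made-up data. The 'bookings' column and its values are hypothetical, and the explicit format string assumes every entry looks like the three sample rows from the question. Note the use of hour=14, since the sample times are PM:

```python
import pandas as pd

df = pd.DataFrame({
    'Date_time': ['7/14/2017 2:14:14 PM', '7/14/2017 2:14:37 PM', '7/14/2017 2:14:38 PM'],
    'bookings': [10, 12, 9],  # hypothetical values
})

# Parse the strings into real datetimes and index by them.
df['timestamp'] = pd.to_datetime(df['Date_time'], format='%m/%d/%Y %I:%M:%S %p')
df = df.set_index('timestamp')

# Build the lookup key with Timestamp/Timedelta arithmetic.
start = pd.Timestamp(year=2017, month=7, day=14, hour=14, minute=14, second=14)
target = start + pd.Timedelta(seconds=23)

# Update a single cell by its datetime label.
df.at[target, 'bookings'] = 65000

print(df)
```

Once the index holds datetimes rather than strings, the key generated by datetime arithmetic matches the index labels directly.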