numpy datetime64 from unix utc seconds - python

Note: I think datetime64 is doing the right thing. So I'll just leave the post up in case it's useful.
As of numpy 1.7.0, seconds passed in to a np.datetime64 are interpreted as being in a local timezone. Is there a clean and fast way to import a unix utc seconds to np.datetime64? I've got arrays with 50M of these and it seems that there should be a way to tell np.datetime64 that my seconds value is UTC, no?
datetime.datetime.utcfromtimestamp(1338624706)
datetime.datetime(2012, 6, 2, 8, 11, 46) # this is the time I'm looking for
np.datetime64(1338624706, 's')
numpy.datetime64('2012-06-02T01:11:46-0700') # Darn you ISO! Off by 7 hours
dt64 = np.datetime64(1338624706, 's')
dt64.astype(datetime.datetime)
datetime.datetime(2012, 6, 2, 8, 11, 46) # Wait, did it do the right thing?
# This seems like the best option at the moment,
# but requires building datetime.datetime objects:
dt64 = np.datetime64(datetime.datetime.utcfromtimestamp(1338624706))
numpy.datetime64('2012-06-02T01:11:46.000000-0700') # Show this
dt64.astype(datetime.datetime)
datetime.datetime(2012, 6, 2, 8, 11, 46) # Looks like it worked
I really do not want to resort to string operations. I would be nice to be able to convert an array of unix utc ints or floats straight to the correct dt64.
https://stackoverflow.com/a/13704307/417578 implies that numpy 1.8.0 might do what I want, but is there something that will work in 1.7.0?

Heres another way in pandas
(which handles the quirks in different versions of numpy datetime64 correctly,
so this works in numpy 1.6.2) - I think u might need current master for this (0.11-dev)
# obviously replace this by your utc seconds
# need to convert to the default in pandas of datetime64[ns]
z = pd.Series([(1338624706 + i)*1e9 for i in range(50)],dtype='datetime64[ns]')
In [35]: z.head()
Out[35]:
0 2012-06-02 08:11:46
1 2012-06-02 08:11:47
2 2012-06-02 08:11:48
3 2012-06-02 08:11:49
4 2012-06-02 08:11:50
Dtype: datetime64[ns]
# turn it into a DatetimeIndex and localize
lidx = pd.DatetimeIndex(z).tz_localize('UTC')
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-06-02 08:11:46, ..., 2012-06-02 08:12:35]
Length: 50, Freq: None, Timezone: UTC
# now you have a nice object to say convert timezones
In [44]: lidx.tz_convert('US/Eastern')
Out[44]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-06-02 04:11:46, ..., 2012-06-02 04:12:35]
Length: 50, Freq: None, Timezone: US/Eastern

Perhaps I misunderstand the question, but isn't the timezone just a display issue?
utc_time = datetime.datetime.utcnow()
print utc_time
dt64 = np.datetime64(utc_time)
print dt64
print dt64.astype(datetime.datetime)
2013-02-24 17:30:53.586297
2013-02-24T11:30:53.586297-0600
2013-02-24 17:30:53.586297
The time hasn't been 'changed' in any way:
some_time = datetime.datetime.utcfromtimestamp(1338624706)
dt64 = np.datetime64(1338624706,'s')
print dt64.astype(int64)
1338624706
This is as of numpy 1.7.

Related

Python: How to Drop Hours from a Datetime? [duplicate]

I have a date string and want to convert it to the date type:
I have tried to use datetime.datetime.strptime with the format that I want but it is returning the time with the conversion.
when = alldates[int(daypos[0])]
print when, type(when)
then = datetime.datetime.strptime(when, '%Y-%m-%d')
print then, type(then)
This is what the output returns:
2013-05-07 <type 'str'>
2013-05-07 00:00:00 <type 'datetime.datetime'>
I need to remove the time: 00:00:00.
print then.date()
What you want is a datetime.date object. What you have is a datetime.datetime object. You can either change the object when you print as per above, or do the following when creating the object:
then = datetime.datetime.strptime(when, '%Y-%m-%d').date()
If you need the result to be timezone-aware, you can use the replace() method of datetime objects. This preserves timezone, so you can do
>>> from django.utils import timezone
>>> now = timezone.now()
>>> now
datetime.datetime(2018, 8, 30, 14, 15, 43, 726252, tzinfo=<UTC>)
>>> now.replace(hour=0, minute=0, second=0, microsecond=0)
datetime.datetime(2018, 8, 30, 0, 0, tzinfo=<UTC>)
Note that this returns a new datetime object -- now remains unchanged.
>>> print then.date(), type(then.date())
2013-05-07 <type 'datetime.date'>
To convert a string into a date, the easiest way AFAIK is the dateutil module:
import dateutil.parser
datetime_object = dateutil.parser.parse("2013-05-07")
It can also handle time zones:
print(dateutil.parser.parse("2013-05-07"))
>>> datetime.datetime(2013, 5, 7, 1, 12, 12, tzinfo=tzutc())
If you have a datetime object, say:
import pytz
import datetime
now = datetime.datetime.now(pytz.UTC)
and you want chop off the time part, then I think it is easier to construct a new object instead of "substracting the time part". It is shorter and more bullet proof:
date_part datetime.datetime(now.year, now.month, now.day, tzinfo=now.tzinfo)
It also keeps the time zone information, it is easier to read and understand than a timedelta substraction, and you also have the option to give a different time zone in the same step (which makes sense, since you will have zero time part anyway).
For me, I needed to KEEP a timetime object because I was using UTC and it's a bit of a pain. So, this is what I ended up doing:
date = datetime.datetime.utcnow()
start_of_day = date - datetime.timedelta(
hours=date.hour,
minutes=date.minute,
seconds=date.second,
microseconds=date.microsecond
)
end_of_day = start_of_day + datetime.timedelta(
hours=23,
minutes=59,
seconds=59
)
Example output:
>>> date
datetime.datetime(2016, 10, 14, 17, 21, 5, 511600)
>>> start_of_day
datetime.datetime(2016, 10, 14, 0, 0)
>>> end_of_day
datetime.datetime(2016, 10, 14, 23, 59, 59)
If you specifically want a datetime and not a date but want the time zero'd out you could combine date with datetime.min.time()
Example:
datetime.datetime.combine(datetime.datetime.today().date(),
datetime.datetime.min.time())
You can use simply pd.to_datetime(then) and pandas will convert the date elements into ISO date format- [YYYY-MM-DD].
You can pass this as map/apply to use it in a dataframe/series too.
You can usee the following code:
week_start = str(datetime.today() - timedelta(days=datetime.today().weekday() % 7)).split(' ')[0]

Convert the 'datetime.date' to a datetime with 'pd.Timestamp' [duplicate]

How do I convert a numpy.datetime64 object to a datetime.datetime (or Timestamp)?
In the following code, I create a datetime, timestamp and datetime64 objects.
import datetime
import numpy as np
import pandas as pd
dt = datetime.datetime(2012, 5, 1)
# A strange way to extract a Timestamp object, there's surely a better way?
ts = pd.DatetimeIndex([dt])[0]
dt64 = np.datetime64(dt)
In [7]: dt
Out[7]: datetime.datetime(2012, 5, 1, 0, 0)
In [8]: ts
Out[8]: <Timestamp: 2012-05-01 00:00:00>
In [9]: dt64
Out[9]: numpy.datetime64('2012-05-01T01:00:00.000000+0100')
Note: it's easy to get the datetime from the Timestamp:
In [10]: ts.to_datetime()
Out[10]: datetime.datetime(2012, 5, 1, 0, 0)
But how do we extract the datetime or Timestamp from a numpy.datetime64 (dt64)?
.
Update: a somewhat nasty example in my dataset (perhaps the motivating example) seems to be:
dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100')
which should be datetime.datetime(2002, 6, 28, 1, 0), and not a long (!) (1025222400000000000L)...
You can just use the pd.Timestamp constructor. The following diagram may be useful for this and related questions.
Welcome to hell.
You can just pass a datetime64 object to pandas.Timestamp:
In [16]: Timestamp(numpy.datetime64('2012-05-01T01:00:00.000000'))
Out[16]: <Timestamp: 2012-05-01 01:00:00>
I noticed that this doesn't work right though in NumPy 1.6.1:
numpy.datetime64('2012-05-01T01:00:00.000000+0100')
Also, pandas.to_datetime can be used (this is off of the dev version, haven't checked v0.9.1):
In [24]: pandas.to_datetime('2012-05-01T01:00:00.000000+0100')
Out[24]: datetime.datetime(2012, 5, 1, 1, 0, tzinfo=tzoffset(None, 3600))
To convert numpy.datetime64 to datetime object that represents time in UTC on numpy-1.8:
>>> from datetime import datetime
>>> import numpy as np
>>> dt = datetime.utcnow()
>>> dt
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> dt64 = np.datetime64(dt)
>>> ts = (dt64 - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
>>> ts
1354650685.3624549
>>> datetime.utcfromtimestamp(ts)
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> np.__version__
'1.8.0.dev-7b75899'
The above example assumes that a naive datetime object is interpreted by np.datetime64 as time in UTC.
To convert datetime to np.datetime64 and back (numpy-1.6):
>>> np.datetime64(datetime.utcnow()).astype(datetime)
datetime.datetime(2012, 12, 4, 13, 34, 52, 827542)
It works both on a single np.datetime64 object and a numpy array of np.datetime64.
Think of np.datetime64 the same way you would about np.int8, np.int16, etc and apply the same methods to convert between Python objects such as int, datetime and corresponding numpy objects.
Your "nasty example" works correctly:
>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
datetime.datetime(2002, 6, 28, 0, 0)
>>> numpy.__version__
'1.6.2' # current version available via pip install numpy
I can reproduce the long value on numpy-1.8.0 installed as:
pip install git+https://github.com/numpy/numpy.git#egg=numpy-dev
The same example:
>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
1025222400000000000L
>>> numpy.__version__
'1.8.0.dev-7b75899'
It returns long because for numpy.datetime64 type .astype(datetime) is equivalent to .astype(object) that returns Python integer (long) on numpy-1.8.
To get datetime object you could:
>>> dt64.dtype
dtype('<M8[ns]')
>>> ns = 1e-9 # number of seconds in a nanosecond
>>> datetime.utcfromtimestamp(dt64.astype(int) * ns)
datetime.datetime(2002, 6, 28, 0, 0)
To get datetime64 that uses seconds directly:
>>> dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100', 's')
>>> dt64.dtype
dtype('<M8[s]')
>>> datetime.utcfromtimestamp(dt64.astype(int))
datetime.datetime(2002, 6, 28, 0, 0)
The numpy docs say that the datetime API is experimental and may change in future numpy versions.
I think there could be a more consolidated effort in an answer to better explain the relationship between Python's datetime module, numpy's datetime64/timedelta64 and pandas' Timestamp/Timedelta objects.
The datetime standard library of Python
The datetime standard library has four main objects
time - only time, measured in hours, minutes, seconds and microseconds
date - only year, month and day
datetime - All components of time and date
timedelta - An amount of time with maximum unit of days
Create these four objects
>>> import datetime
>>> datetime.time(hour=4, minute=3, second=10, microsecond=7199)
datetime.time(4, 3, 10, 7199)
>>> datetime.date(year=2017, month=10, day=24)
datetime.date(2017, 10, 24)
>>> datetime.datetime(year=2017, month=10, day=24, hour=4, minute=3, second=10, microsecond=7199)
datetime.datetime(2017, 10, 24, 4, 3, 10, 7199)
>>> datetime.timedelta(days=3, minutes = 55)
datetime.timedelta(3, 3300)
>>> # add timedelta to datetime
>>> datetime.timedelta(days=3, minutes = 55) + \
datetime.datetime(year=2017, month=10, day=24, hour=4, minute=3, second=10, microsecond=7199)
datetime.datetime(2017, 10, 27, 4, 58, 10, 7199)
NumPy's datetime64 and timedelta64 objects
NumPy has no separate date and time objects, just a single datetime64 object to represent a single moment in time. The datetime module's datetime object has microsecond precision (one-millionth of a second). NumPy's datetime64 object allows you to set its precision from hours all the way to attoseconds (10 ^ -18). It's constructor is more flexible and can take a variety of inputs.
Construct NumPy's datetime64 and timedelta64 objects
Pass an integer with a string for the units. See all units here. It gets converted to that many units after the UNIX epoch: Jan 1, 1970
>>> np.datetime64(5, 'ns')
numpy.datetime64('1970-01-01T00:00:00.000000005')
>>> np.datetime64(1508887504, 's')
numpy.datetime64('2017-10-24T23:25:04')
You can also use strings as long as they are in ISO 8601 format.
>>> np.datetime64('2017-10-24')
numpy.datetime64('2017-10-24')
Timedeltas have a single unit
>>> np.timedelta64(5, 'D') # 5 days
>>> np.timedelta64(10, 'h') 10 hours
Can also create them by subtracting two datetime64 objects
>>> np.datetime64('2017-10-24T05:30:45.67') - np.datetime64('2017-10-22T12:35:40.123')
numpy.timedelta64(147305547,'ms')
Pandas Timestamp and Timedelta build much more functionality on top of NumPy
A pandas Timestamp is a moment in time very similar to a datetime but with much more functionality. You can construct them with either pd.Timestamp or pd.to_datetime.
>>> pd.Timestamp(1239.1238934) #defaults to nanoseconds
Timestamp('1970-01-01 00:00:00.000001239')
>>> pd.Timestamp(1239.1238934, unit='D') # change units
Timestamp('1973-05-24 02:58:24.355200')
>>> pd.Timestamp('2017-10-24 05') # partial strings work
Timestamp('2017-10-24 05:00:00')
pd.to_datetime works very similarly (with a few more options) and can convert a list of strings into Timestamps.
>>> pd.to_datetime('2017-10-24 05')
Timestamp('2017-10-24 05:00:00')
>>> pd.to_datetime(['2017-1-1', '2017-1-2'])
DatetimeIndex(['2017-01-01', '2017-01-02'], dtype='datetime64[ns]', freq=None)
Converting Python datetime to datetime64 and Timestamp
>>> dt = datetime.datetime(year=2017, month=10, day=24, hour=4,
minute=3, second=10, microsecond=7199)
>>> np.datetime64(dt)
numpy.datetime64('2017-10-24T04:03:10.007199')
>>> pd.Timestamp(dt) # or pd.to_datetime(dt)
Timestamp('2017-10-24 04:03:10.007199')
Converting numpy datetime64 to datetime and Timestamp
>>> dt64 = np.datetime64('2017-10-24 05:34:20.123456')
>>> unix_epoch = np.datetime64(0, 's')
>>> one_second = np.timedelta64(1, 's')
>>> seconds_since_epoch = (dt64 - unix_epoch) / one_second
>>> seconds_since_epoch
1508823260.123456
>>> datetime.datetime.utcfromtimestamp(seconds_since_epoch)
>>> datetime.datetime(2017, 10, 24, 5, 34, 20, 123456)
Convert to Timestamp
>>> pd.Timestamp(dt64)
Timestamp('2017-10-24 05:34:20.123456')
Convert from Timestamp to datetime and datetime64
This is quite easy as pandas timestamps are very powerful
>>> ts = pd.Timestamp('2017-10-24 04:24:33.654321')
>>> ts.to_pydatetime() # Python's datetime
datetime.datetime(2017, 10, 24, 4, 24, 33, 654321)
>>> ts.to_datetime64()
numpy.datetime64('2017-10-24T04:24:33.654321000')
>>> dt64.tolist()
datetime.datetime(2012, 5, 1, 0, 0)
For DatetimeIndex, the tolist returns a list of datetime objects. For a single datetime64 object it returns a single datetime object.
One option is to use str, and then to_datetime (or similar):
In [11]: str(dt64)
Out[11]: '2012-05-01T01:00:00.000000+0100'
In [12]: pd.to_datetime(str(dt64))
Out[12]: datetime.datetime(2012, 5, 1, 1, 0, tzinfo=tzoffset(None, 3600))
Note: it is not equal to dt because it's become "offset-aware":
In [13]: pd.to_datetime(str(dt64)).replace(tzinfo=None)
Out[13]: datetime.datetime(2012, 5, 1, 1, 0)
This seems inelegant.
.
Update: this can deal with the "nasty example":
In [21]: dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100')
In [22]: pd.to_datetime(str(dt64)).replace(tzinfo=None)
Out[22]: datetime.datetime(2002, 6, 28, 1, 0)
If you want to convert an entire pandas series of datetimes to regular python datetimes, you can also use .to_pydatetime().
pd.date_range('20110101','20110102',freq='H').to_pydatetime()
> [datetime.datetime(2011, 1, 1, 0, 0) datetime.datetime(2011, 1, 1, 1, 0)
datetime.datetime(2011, 1, 1, 2, 0) datetime.datetime(2011, 1, 1, 3, 0)
....
It also supports timezones:
pd.date_range('20110101','20110102',freq='H').tz_localize('UTC').tz_convert('Australia/Sydney').to_pydatetime()
[ datetime.datetime(2011, 1, 1, 11, 0, tzinfo=<DstTzInfo 'Australia/Sydney' EST+11:00:00 DST>)
datetime.datetime(2011, 1, 1, 12, 0, tzinfo=<DstTzInfo 'Australia/Sydney' EST+11:00:00 DST>)
....
NOTE: If you are operating on a Pandas Series you cannot call to_pydatetime() on the entire series. You will need to call .to_pydatetime() on each individual datetime64 using a list comprehension or something similar:
datetimes = [val.to_pydatetime() for val in df.problem_datetime_column]
This post has been up for 4 years and I still struggled with this conversion problem - so the issue is still active in 2017 in some sense. I was somewhat shocked that the numpy documentation does not readily offer a simple conversion algorithm but that's another story.
I have come across another way to do the conversion that only involves modules numpy and datetime, it does not require pandas to be imported which seems to me to be a lot of code to import for such a simple conversion. I noticed that datetime64.astype(datetime.datetime) will return a datetime.datetime object if the original datetime64 is in micro-second units while other units return an integer timestamp. I use module xarray for data I/O from Netcdf files which uses the datetime64 in nanosecond units making the conversion fail unless you first convert to micro-second units. Here is the example conversion code,
import numpy as np
import datetime
def convert_datetime64_to_datetime( usert: np.datetime64 )->datetime.datetime:
t = np.datetime64( usert, 'us').astype(datetime.datetime)
return t
Its only tested on my machine, which is Python 3.6 with a recent 2017 Anaconda distribution. I have only looked at scalar conversion and have not checked array based conversions although I'm guessing it will be good. Nor have I looked at the numpy datetime64 source code to see if the operation makes sense or not.
import numpy as np
import pandas as pd
def np64toDate(np64):
return pd.to_datetime(str(np64)).replace(tzinfo=None).to_datetime()
use this function to get pythons native datetime object
I've come back to this answer more times than I can count, so I decided to throw together a quick little class, which converts a Numpy datetime64 value to Python datetime value. I hope it helps others out there.
from datetime import datetime
import pandas as pd
class NumpyConverter(object):
#classmethod
def to_datetime(cls, dt64, tzinfo=None):
"""
Converts a Numpy datetime64 to a Python datetime.
:param dt64: A Numpy datetime64 variable
:type dt64: numpy.datetime64
:param tzinfo: The timezone the date / time value is in
:type tzinfo: pytz.timezone
:return: A Python datetime variable
:rtype: datetime
"""
ts = pd.to_datetime(dt64)
if tzinfo is not None:
return datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second, tzinfo=tzinfo)
return datetime(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
I'm gonna keep this in my tool bag, something tells me I'll need it again.
I did like this
import pandas as pd
# Custom function to convert Pandas Datetime to Timestamp
def toTimestamp(data):
return data.timestamp()
# Read a csv file
df = pd.read_csv("friends.csv")
# Replace the "birthdate" column by:
# 1. Transform to datetime
# 2. Apply the custom function to the column just converted
df["birthdate"] = pd.to_datetime(df["birthdate"]).apply(toTimestamp)
Some solutions work well for me but numpy will deprecate some parameters.
The solution that work better for me is to read the date as a pandas datetime and excract explicitly the year, month and day of a pandas object.
The following code works for the most common situation.
def format_dates(dates):
dt = pd.to_datetime(dates)
try: return [datetime.date(x.year, x.month, x.day) for x in dt]
except TypeError: return datetime.date(dt.year, dt.month, dt.day)
Only way I managed to convert a column 'date' in pandas dataframe containing time info to numpy array was as following: (dataframe is read from csv file "csvIn.csv")
import pandas as pd
import numpy as np
df = pd.read_csv("csvIn.csv")
df["date"] = pd.to_datetime(df["date"])
timestamps = np.array([np.datetime64(value) for dummy, value in df["date"].items()])
indeed, all of these datetime types can be difficult, and potentially problematic (must keep careful track of timezone information). here's what i have done, though i admit that i am concerned that at least part of it is "not by design". also, this can be made a bit more compact as needed.
starting with a numpy.datetime64 dt_a:
dt_a
numpy.datetime64('2015-04-24T23:11:26.270000-0700')
dt_a1 = dt_a.tolist() # yields a datetime object in UTC, but without tzinfo
dt_a1
datetime.datetime(2015, 4, 25, 6, 11, 26, 270000)
# now, make your "aware" datetime:
dt_a2=datetime.datetime(*list(dt_a1.timetuple()[:6]) + [dt_a1.microsecond], tzinfo=pytz.timezone('UTC'))
... and of course, that can be compressed into one line as needed.

Display Python datetime without time

I have a date string and want to convert it to the date type:
I have tried to use datetime.datetime.strptime with the format that I want but it is returning the time with the conversion.
when = alldates[int(daypos[0])]
print when, type(when)
then = datetime.datetime.strptime(when, '%Y-%m-%d')
print then, type(then)
This is what the output returns:
2013-05-07 <type 'str'>
2013-05-07 00:00:00 <type 'datetime.datetime'>
I need to remove the time: 00:00:00.
print then.date()
What you want is a datetime.date object. What you have is a datetime.datetime object. You can either change the object when you print as per above, or do the following when creating the object:
then = datetime.datetime.strptime(when, '%Y-%m-%d').date()
If you need the result to be timezone-aware, you can use the replace() method of datetime objects. This preserves timezone, so you can do
>>> from django.utils import timezone
>>> now = timezone.now()
>>> now
datetime.datetime(2018, 8, 30, 14, 15, 43, 726252, tzinfo=<UTC>)
>>> now.replace(hour=0, minute=0, second=0, microsecond=0)
datetime.datetime(2018, 8, 30, 0, 0, tzinfo=<UTC>)
Note that this returns a new datetime object -- now remains unchanged.
>>> print then.date(), type(then.date())
2013-05-07 <type 'datetime.date'>
To convert a string into a date, the easiest way AFAIK is the dateutil module:
import dateutil.parser
datetime_object = dateutil.parser.parse("2013-05-07")
It can also handle time zones:
print(dateutil.parser.parse("2013-05-07"))
>>> datetime.datetime(2013, 5, 7, 1, 12, 12, tzinfo=tzutc())
If you have a datetime object, say:
import pytz
import datetime
now = datetime.datetime.now(pytz.UTC)
and you want chop off the time part, then I think it is easier to construct a new object instead of "substracting the time part". It is shorter and more bullet proof:
date_part datetime.datetime(now.year, now.month, now.day, tzinfo=now.tzinfo)
It also keeps the time zone information, it is easier to read and understand than a timedelta substraction, and you also have the option to give a different time zone in the same step (which makes sense, since you will have zero time part anyway).
For me, I needed to KEEP a timetime object because I was using UTC and it's a bit of a pain. So, this is what I ended up doing:
date = datetime.datetime.utcnow()
start_of_day = date - datetime.timedelta(
hours=date.hour,
minutes=date.minute,
seconds=date.second,
microseconds=date.microsecond
)
end_of_day = start_of_day + datetime.timedelta(
hours=23,
minutes=59,
seconds=59
)
Example output:
>>> date
datetime.datetime(2016, 10, 14, 17, 21, 5, 511600)
>>> start_of_day
datetime.datetime(2016, 10, 14, 0, 0)
>>> end_of_day
datetime.datetime(2016, 10, 14, 23, 59, 59)
If you specifically want a datetime and not a date but want the time zero'd out you could combine date with datetime.min.time()
Example:
datetime.datetime.combine(datetime.datetime.today().date(),
datetime.datetime.min.time())
You can use simply pd.to_datetime(then) and pandas will convert the date elements into ISO date format- [YYYY-MM-DD].
You can pass this as map/apply to use it in a dataframe/series too.
You can usee the following code:
week_start = str(datetime.today() - timedelta(days=datetime.today().weekday() % 7)).split(' ')[0]

Using pandas to convert string timestamps

I've been attempting to use pandas.to_datetime to convert between timestamp formats in my code base, however when provided with a string input sometimes pandas does not seem to extract the UTC offset correctly:
Here is a correct conversion, the UTC offset is correctly captured as reflected in the Timestamp object:
In[76]: pd.to_datetime('2014-04-09T15:29:59.999993-0500', utc=True)
Out[76]: Timestamp('2014-04-09 20:29:59.999993+0000', tz='UTC')
Here is an alternate string representation which is still a valid ISO 8601 datetime strings but the UTC offset of -0500 seems to be ignored:
In[77]: pd.to_datetime('2014-04-09T152959.999993-0500', utc=True)
Out[77]: Timestamp('2014-04-09 15:29:59.999993+0000', tz='UTC')
On the other hand the dateutil package handles things fine:
In[78]: dateutil.parser.parse('2014-04-09T152959.999993-0500')
Out[78]: datetime.datetime(2014, 4, 9, 15, 29, 59, 999993, tzinfo=tzoffset(None, -18000))
I could certainly use dateutil but is there some reason that pandas.to_datetime does not handle different ISO date time strings correctly. Am I doing something wrong here?
Using Python 2.7.6 and pandas 0.13.1
Using pandas 0.14.0: both calls to pd.to_datetime return the correct, timezone-aware Timestamp:
In [72]: pd.__version__
Out[72]: '0.14.0'
In [69]: pd.to_datetime('2014-04-09T152959.999993-0500', utc=True)
Out[69]: Timestamp('2014-04-09 20:29:59.999993+0000', tz='UTC')
In [70]: pd.to_datetime('2014-04-09T15:29:59.999993-0500', utc=True)
Out[70]: Timestamp('2014-04-09 20:29:59.999993+0000', tz='UTC')
In [71]: dateutil.parser.parse('2014-04-09T152959.999993-0500').astimezone(pytz.utc)
Out[71]: datetime.datetime(2014, 4, 9, 20, 29, 59, 999993, tzinfo=<UTC>)

Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while preserving its timezone?
An example:
In [82]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10, freq='s', tz="Europe/Brussels")
In [83]: t
Out[83]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
I could remove the timezone by setting it to None, but then the result is converted to UTC (12 o'clock became 10):
In [86]: t.tz = None
In [87]: t
Out[87]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 10:00:00, ..., 2013-05-18 10:00:09]
Length: 10, Freq: S, Timezone: None
Is there another way I can convert a DateTimeIndex to timezone naive, but while preserving the timezone it was set in?
Some context on the reason I am asking this: I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on).
But for some reason, I have to deal with a timezone-aware timeseries in my local timezone (Europe/Brussels). As all my other data are timezone naive (but represented in my local timezone), I want to convert this timeseries to naive to further work with it, but it also has to be represented in my local timezone (so just remove the timezone info, without converting the user-visible time to UTC).
I know the time is actually internal stored as UTC and only converted to another timezone when you represent it, so there has to be some kind of conversion when I want to "delocalize" it. For example, with the python datetime module you can "remove" the timezone like this:
In [119]: d = pd.Timestamp("2013-05-18 12:00:00", tz="Europe/Brussels")
In [120]: d
Out[120]: <Timestamp: 2013-05-18 12:00:00+0200 CEST, tz=Europe/Brussels>
In [121]: d.replace(tzinfo=None)
Out[121]: <Timestamp: 2013-05-18 12:00:00>
So, based on this, I could do the following, but I suppose this will not be very efficient when working with a larger timeseries:
In [124]: t
Out[124]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
In [125]: pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
Out[125]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: None, Timezone: None
To answer my own question, this functionality has been added to pandas in the meantime. Starting from pandas 0.15.0, you can use tz_localize(None) to remove the timezone resulting in local time.
See the whatsnew entry: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#timezone-handling-improvements
So with my example from above:
In [4]: t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H',
tz= "Europe/Brussels")
In [5]: t
Out[5]: DatetimeIndex(['2013-05-18 12:00:00+02:00', '2013-05-18 13:00:00+02:00'],
dtype='datetime64[ns, Europe/Brussels]', freq='H')
using tz_localize(None) removes the timezone information resulting in naive local time:
In [6]: t.tz_localize(None)
Out[6]: DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'],
dtype='datetime64[ns]', freq='H')
Further, you can also use tz_convert(None) to remove the timezone information but converting to UTC, so yielding naive UTC time:
In [7]: t.tz_convert(None)
Out[7]: DatetimeIndex(['2013-05-18 10:00:00', '2013-05-18 11:00:00'],
dtype='datetime64[ns]', freq='H')
This is much more performant than the datetime.replace solution:
In [31]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq='H',
tz="Europe/Brussels")
In [32]: %timeit t.tz_localize(None)
1000 loops, best of 3: 233 µs per loop
In [33]: %timeit pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
10 loops, best of 3: 99.7 ms per loop
Because I always struggle to remember, a quick summary of what each of these do:
>>> pd.Timestamp.now() # naive local time
Timestamp('2019-10-07 10:30:19.428748')
>>> pd.Timestamp.utcnow() # tz aware UTC
Timestamp('2019-10-07 08:30:19.428748+0000', tz='UTC')
>>> pd.Timestamp.now(tz='Europe/Brussels') # tz aware local time
Timestamp('2019-10-07 10:30:19.428748+0200', tz='Europe/Brussels')
>>> pd.Timestamp.now(tz='Europe/Brussels').tz_localize(None) # naive local time
Timestamp('2019-10-07 10:30:19.428748')
>>> pd.Timestamp.now(tz='Europe/Brussels').tz_convert(None) # naive UTC
Timestamp('2019-10-07 08:30:19.428748')
>>> pd.Timestamp.utcnow().tz_localize(None) # naive UTC
Timestamp('2019-10-07 08:30:19.428748')
>>> pd.Timestamp.utcnow().tz_convert(None) # naive UTC
Timestamp('2019-10-07 08:30:19.428748')
I think you can't achieve what you want in a more efficient manner than you proposed.
The underlying problem is that the timestamps (as you seem aware) are made up of two parts. The data that represents the UTC time, and the timezone, tz_info. The timezone information is used only for display purposes when printing the timezone to the screen. At display time, the data is offset appropriately and +01:00 (or similar) is added to the string. Stripping off the tz_info value (using tz_convert(tz=None)) doesn't doesn't actually change the data that represents the naive part of the timestamp.
So, the only way to do what you want is to modify the underlying data (pandas doesn't allow this... DatetimeIndex are immutable -- see the help on DatetimeIndex), or to create a new set of timestamp objects and wrap them in a new DatetimeIndex. Your solution does the latter:
pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
For reference, here is the replace method of Timestamp (see tslib.pyx):
def replace(self, **kwds):
return Timestamp(datetime.replace(self, **kwds),
offset=self.offset)
You can refer to the docs on datetime.datetime to see that datetime.datetime.replace also creates a new object.
If you can, your best bet for efficiency is to modify the source of the data so that it (incorrectly) reports the timestamps without their timezone. You mentioned:
I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on)
I'd be curious what extra hassle you are referring to. I recommend as a general rule for all software development, keep your timestamp 'naive values' in UTC. There is little worse than looking at two different int64 values wondering which timezone they belong to. If you always, always, always use UTC for the internal storage, then you will avoid countless headaches. My mantra is Timezones are for human I/O only.
The accepted solution does not work when there are multiple different timezones in a Series. It throws ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
The solution is to use the apply method.
Please see the examples below:
# Let's have a series `a` with different multiple timezones.
> a
0 2019-10-04 16:30:00+02:00
1 2019-10-07 16:00:00-04:00
2 2019-09-24 08:30:00-07:00
Name: localized, dtype: object
> a.iloc[0]
Timestamp('2019-10-04 16:30:00+0200', tz='Europe/Amsterdam')
# trying the accepted solution
> a.dt.tz_localize(None)
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
# Make it tz-naive. This is the solution:
> a.apply(lambda x:x.tz_localize(None))
0 2019-10-04 16:30:00
1 2019-10-07 16:00:00
2 2019-09-24 08:30:00
Name: localized, dtype: datetime64[ns]
# a.tz_convert() also does not work with multiple timezones, but this works:
> a.apply(lambda x:x.tz_convert('America/Los_Angeles'))
0 2019-10-04 07:30:00-07:00
1 2019-10-07 13:00:00-07:00
2 2019-09-24 08:30:00-07:00
Name: localized, dtype: datetime64[ns, America/Los_Angeles]
Setting the tz attribute of the index explicitly seems to work:
ts_utc = ts.tz_convert("UTC")
ts_utc.index.tz = None
Late contribution but just came across something similar in Python datetime and pandas give different timestamps for the same date.
If you have timezone-aware datetime in pandas, technically, tz_localize(None) changes the POSIX timestamp (that is used internally) as if the local time from the timestamp was UTC. Local in this context means local in the specified timezone. Ex:
import pandas as pd
t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H', tz="US/Central")
# DatetimeIndex(['2013-05-18 12:00:00-05:00', '2013-05-18 13:00:00-05:00'], dtype='datetime64[ns, US/Central]', freq='H')
t_loc = t.tz_localize(None)
# DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'], dtype='datetime64[ns]', freq='H')
# offset in seconds according to timezone:
(t_loc.values-t.values)//1e9
# array([-18000, -18000], dtype='timedelta64[ns]')
Note that this will leave you with strange things during DST transitions, e.g.
t = pd.date_range(start="2020-03-08 01:00:00", periods=2, freq='H', tz="US/Central")
(t.values[1]-t.values[0])//1e9
# numpy.timedelta64(3600,'ns')
t_loc = t.tz_localize(None)
(t_loc.values[1]-t_loc.values[0])//1e9
# numpy.timedelta64(7200,'ns')
In contrast, tz_convert(None) does not modify the internal timestamp, it just removes the tzinfo.
t_utc = t.tz_convert(None)
(t_utc.values-t.values)//1e9
# array([0, 0], dtype='timedelta64[ns]')
My bottom line would be: stick with timezone-aware datetime if you can or only use t.tz_convert(None) which doesn't modify the underlying POSIX timestamp. Just keep in mind that you're practically working with UTC then.
(Python 3.8.2 x64 on Windows 10, pandas v1.0.5.)
Building on D.A.'s suggestion that "the only way to do what you want is to modify the underlying data" and using numpy to modify the underlying data...
This works for me, and is pretty fast:
def tz_to_naive(datetime_index):
"""Converts a tz-aware DatetimeIndex into a tz-naive DatetimeIndex,
effectively baking the timezone into the internal representation.
Parameters
----------
datetime_index : pandas.DatetimeIndex, tz-aware
Returns
-------
pandas.DatetimeIndex, tz-naive
"""
# Calculate timezone offset relative to UTC
timestamp = datetime_index[0]
tz_offset = (timestamp.replace(tzinfo=None) -
timestamp.tz_convert('UTC').replace(tzinfo=None))
tz_offset_td64 = np.timedelta64(tz_offset)
# Now convert to naive DatetimeIndex
return pd.DatetimeIndex(datetime_index.values + tz_offset_td64)
The most important thing is add tzinfo when you define a datetime object.
from datetime import datetime, timezone
from tzinfo_examples import HOUR, Eastern
u0 = datetime(2016, 3, 13, 5, tzinfo=timezone.utc)
for i in range(4):
u = u0 + i*HOUR
t = u.astimezone(Eastern)
print(u.time(), 'UTC =', t.time(), t.tzname())
How I handled this problem with a 15-min frequency datetimeindex in europe.
If you are in the situation where you have a timezone aware (Europe/Amsterdam in my case) index and want to convert it into a timezone naive index by transforming everything into local time, you will have dst problems, namely
there will be 1 hour missing on the last sunday of march (when europe switches to summer time)
there will be 1 hour duplicate on the last sunday of october (when europe switches to summer time)
Here is how you can handle it:
# make index tz naive
df.index = df.index.tz_localize(None)
# handle dst
if df.index[0].month == 3:
# last sunday of march, one hour is lost
df = df.resample("15min").pad()
if df.index[0].month == 10:
# in october, one hour is added
df = df[~df.index.duplicated(keep='last')]
Note: in my case, I run the above code on a df that contains only a single month, hence I do df.index[0].month to find out the month. If yours contains more months, you should probably index it differently to know when to do DST.
It consists of resampling from the last valid value in march, to avoid losing the 1 hour (in my case, all my data is in 15 min intervals, hence i resample like that. Resample for whatever your interval is). And for october, I drop duplicates.

Categories