What is the best method in Python to convert a string to a given format? My problem is that I have scraped dates in the following format: Dec 13, 2019 6:01 am
Ideally I want to analyse the scraped data in Excel, but unfortunately Excel cannot read this date format.
Do you think it is best to do that in Python or in Excel?
Thanks
You can definitely do this with Python, using either the standard library or the dateparser package.
>>> import dateparser
>>> dateparser.parse('Dec 13, 2019 6:01 am')
datetime.datetime(2019, 12, 13, 6, 1)
Or directly to ISO format:
>>> dateparser.parse('Dec 13, 2019 6:01 am').isoformat()
'2019-12-13T06:01:00'
Another thing to look out for when working with time programmatically is time zones; that's where bugs are very likely to appear. There's a very sweet package for working with datetime data in Python called pendulum; I cannot stress enough how convenient it is. Its API is designed to be compatible with Python's standard library datetime, so you can usually just do import pendulum as dt instead of import datetime as dt and it will work.
It also has a great parser tool with support for time zones:
>>> import pendulum
>>> dt = pendulum.parse('1975-05-21T22:00:00')
>>> print(dt)
'1975-05-21T22:00:00+00:00'
# You can pass a tz keyword to specify the timezone
>>> dt = pendulum.parse('1975-05-21T22:00:00', tz='Europe/Paris')
>>> print(dt)
'1975-05-21T22:00:00+01:00'
# Not ISO 8601 compliant but common
>>> dt = pendulum.parse('1975-05-21 22:00:00')
By passing the tz keyword argument you can parse and specify time zone at the same time.
You can use strptime() to convert a string to a datetime object.
>>> from datetime import datetime
>>> utc_time = datetime.strptime("Dec 13, 2019 6:01 am", "%b %d, %Y %I:%M %p")
>>> utc_time.strftime("%d-%m-%Y %H:%M")
'13-12-2019 06:01'
You can use Python's built-in datetime library.
See: https://docs.python.org/3.6/library/datetime.html
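For the Excel use case in the question, one option is to normalise every scraped string in Python before exporting. The following is a minimal sketch using only the standard library. The function name to_excel_friendly and the output layout are assumptions made for illustration, and the %b and %p directives assume an English locale.

```python
from datetime import datetime

SCRAPED_FORMAT = "%b %d, %Y %I:%M %p"   # e.g. "Dec 13, 2019 6:01 am"
EXCEL_FORMAT = "%Y-%m-%d %H:%M:%S"      # assumed target layout; adjust as needed

def to_excel_friendly(raw):
    """Parse one scraped date string and re-emit it as 'YYYY-MM-DD HH:MM:SS'."""
    return datetime.strptime(raw.strip(), SCRAPED_FORMAT).strftime(EXCEL_FORMAT)

# The second value is made up to show a "pm" time.
scraped = ["Dec 13, 2019 6:01 am", "Jan 2, 2020 11:45 pm"]
converted = [to_excel_friendly(s) for s in scraped]
print(converted)
```

Writing the converted strings to a CSV file is usually enough for a spreadsheet to treat the column as dates, though that depends on the regional settings of the machine opening it.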
I have to convert a timezone-aware string like "2012-11-01T04:16:13-04:00" to a Python datetime object.
I saw the dateutil module which has a parse function, but I don't really want to use it as it adds a dependency.
So how can I do it? I have tried something like the following, but with no luck.
datetime.datetime.strptime("2012-11-01T04:16:13-04:00", "%Y-%m-%dT%H:%M:%S%Z")
As of Python 3.7, datetime.datetime.fromisoformat() can handle your format:
>>> import datetime
>>> datetime.datetime.fromisoformat('2012-11-01T04:16:13-04:00')
datetime.datetime(2012, 11, 1, 4, 16, 13, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
In older Python versions you can't, not without a whole lot of painstaking manual timezone defining.
Historically, Python did not include a timezone database, because it would be outdated too quickly. Instead, Python relied on external libraries, which can have a far faster release cycle, to provide properly configured timezones for you. (The zoneinfo module added in Python 3.9 reads the operating system's database or the tzdata package rather than bundling its own.)
As a side effect, this means that timezone parsing also needed an external library. If dateutil is too heavyweight for you, use iso8601 instead; it will parse your specific format just fine:
>>> import iso8601
>>> iso8601.parse_date('2012-11-01T04:16:13-04:00')
datetime.datetime(2012, 11, 1, 4, 16, 13, tzinfo=<FixedOffset '-04:00'>)
iso8601 is a whopping 4KB small. Compare that to python-dateutil's 148KB.
As of Python 3.2, Python can handle simple offset-based timezones, and %z will parse -hhmm and +hhmm timezone offsets in a timestamp. That means that for an ISO 8601 timestamp you'd have to remove the : in the timezone (from Python 3.7, %z also accepts the colon form):
>>> from datetime import datetime
>>> iso_ts = '2012-11-01T04:16:13-04:00'
>>> datetime.strptime(''.join(iso_ts.rsplit(':', 1)), '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2012, 11, 1, 4, 16, 13, tzinfo=datetime.timezone(datetime.timedelta(-1, 72000)))
The lack of proper ISO 8601 parsing is being tracked in Python issue 15873.
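The two standard-library routes above can be combined into one helper that prefers fromisoformat() and falls back to the colon-stripping trick on older interpreters. This is only a sketch; the name parse_iso_with_offset is hypothetical, and it assumes the input always ends in a ±HH:MM offset.

```python
from datetime import datetime, timedelta

def parse_iso_with_offset(ts):
    """Parse 'YYYY-MM-DDTHH:MM:SS±HH:MM' into an aware datetime."""
    try:
        # Available from Python 3.7 onwards.
        return datetime.fromisoformat(ts)
    except AttributeError:
        # Older interpreters: drop the last ':' so %z sees ±HHMM.
        head, tail = ts.rsplit(":", 1)
        return datetime.strptime(head + tail, "%Y-%m-%dT%H:%M:%S%z")

parsed = parse_iso_with_offset("2012-11-01T04:16:13-04:00")
# The offset survives parsing as a fixed four-hour difference from UTC.
print(parsed.utcoffset() == timedelta(hours=-4))
```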
Here is how to get a datetime object using the dateutil package:
from dateutil.parser import parse
get_date_obj = parse("2012-11-01T04:16:13-04:00")
print(get_date_obj)
There are two issues with the code in the original question: there should not be a : in the timezone and the format string for "timezone as an offset" is lower case %z not upper %Z.
This works for me in Python v3.6
>>> from datetime import datetime
>>> t = datetime.strptime("2012-11-01T04:16:13-0400", "%Y-%m-%dT%H:%M:%S%z")
>>> print(t)
2012-11-01 04:16:13-04:00
You can convert like this (parsing a literal Z with %z requires Python 3.7 or later):
import datetime
date = datetime.datetime.strptime('2019-3-16T5-49-52-595Z','%Y-%m-%dT%H-%M-%S-%f%z')
date_time = date.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
You can create a timezone-unaware object, then replace its tzinfo to make it a timezone-aware datetime object later.
from datetime import datetime
import pytz
unaware_time = datetime.strptime("2012-11-01 04:16:13", "%Y-%m-%d %H:%M:%S")
aware_time = unaware_time.replace(tzinfo=pytz.UTC)
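On Python 3.2 and later the same thing can be done without pytz. The sketch below also shows the difference between replace(), which only attaches a label, and astimezone(), which converts. The UTC-4 offset is an assumption made for the example, not something the string itself tells you.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.strptime("2012-11-01 04:16:13", "%Y-%m-%d %H:%M:%S")

# replace() only attaches a label; the clock reading stays 04:16:13.
labelled_utc = naive.replace(tzinfo=timezone.utc)

# If the string was really recorded at UTC-4, label it with that offset first...
utc_minus_4 = timezone(timedelta(hours=-4))
labelled_local = naive.replace(tzinfo=utc_minus_4)
# ...and let astimezone() do the actual conversion to UTC.
converted_utc = labelled_local.astimezone(timezone.utc)

print(labelled_utc.isoformat())
print(converted_utc.isoformat())
```

Which of the two is right depends entirely on what the original string meant, so it is worth deciding that before attaching any tzinfo.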
I'm new to Python, but found a way to convert
2017-05-27T07:20:18.000-04:00
to
2017-05-27T07:20:18 without downloading new utilities.
from datetime import datetime, timedelta
time_zone1 = int("2017-05-27T07:20:18.000-04:00"[-6:][:3])
# time_zone1 is now -4
item_date = datetime.strptime("2017-05-27T07:20:18.000-04:00".replace(".000", "")[:-6], "%Y-%m-%dT%H:%M:%S") + timedelta(hours=-time_zone1)
# Subtracting the offset shifts the result to UTC: 2017-05-27 11:20:18
I'm sure there are better ways to do this without slicing up the string so much, but this got the job done.
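For comparison, here is a sketch of the same conversion without string slicing, assuming Python 3.7 or later where datetime.fromisoformat() is available. It shows both possible readings of "drop the offset": keeping the local clock time, or shifting to UTC first.

```python
from datetime import datetime, timezone

stamp = "2017-05-27T07:20:18.000-04:00"
aware = datetime.fromisoformat(stamp)          # Python 3.7+

# Keep the local clock reading and just drop the offset.
wall_clock = aware.replace(tzinfo=None)

# Or shift to UTC first, then drop the offset.
utc_naive = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(wall_clock.isoformat())
print(utc_naive.isoformat())
```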
This suggestion for using dateutil by Mohideen bin Mohammed is definitely the best solution, even if it does require a small library. Having used the other approaches, I found they're prone to various forms of failure. Here's a nice function for this.
from dateutil.parser import parse
def parse_date_convert(date, fmt=None):
    if fmt is None:
        fmt = '%Y-%m-%d %H:%M:%S' # Defaults to : 2022-08-31 07:47:30
    get_date_obj = parse(str(date))
    return str(get_date_obj.strftime(fmt))
dates = ['2022-08-31T07:47:30Z','2022-08-31T07:47:29.098Z','2017-05-27T07:20:18.000-04:00','2012-11-01T04:16:13-04:00']
for date in dates:
    print(f'Before: {date} After: {parse_date_convert(date)}')
Results:
Before: 2022-08-31T07:47:30Z After: 2022-08-31 07:47:30
Before: 2022-08-31T07:47:29.098Z After: 2022-08-31 07:47:29
Before: 2017-05-27T07:20:18.000-04:00 After: 2017-05-27 07:20:18
Before: 2012-11-01T04:16:13-04:00 After: 2012-11-01 04:16:13
Having tried various alternatives, such as slicing, splitting, and replacing the T and Z like this:
dates = ['2022-08-31T07:47:30Z','2022-08-31T07:47:29.098Z','2017-05-27T07:20:18.000-04:00','2012-11-01T04:16:13-04:00']
for date in dates:
    print(f'Before: {date} After: {date.replace("T", " ").replace("Z", "")}')
You are still left with subpar results, like the ones below:
Before: 2022-08-31T07:47:30Z After: 2022-08-31 07:47:30
Before: 2022-08-31T07:47:29.098Z After: 2022-08-31 07:47:29.098
Before: 2017-05-27T07:20:18.000-04:00 After: 2017-05-27 07:20:18.000-04:00
Before: 2012-11-01T04:16:13-04:00 After: 2012-11-01 04:16:13-04:00
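If adding dateutil is not an option, a standard-library sketch can get the same clean output for these four inputs, assuming Python 3.7 or later. The helper name parse_iso_z is hypothetical; it rewrites a trailing Z by hand because fromisoformat() only accepts it from Python 3.11.

```python
from datetime import datetime

def parse_iso_z(ts):
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as +00:00."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"   # fromisoformat() before 3.11 rejects 'Z'
    return datetime.fromisoformat(ts)

dates = [
    "2022-08-31T07:47:30Z",
    "2022-08-31T07:47:29.098Z",
    "2017-05-27T07:20:18.000-04:00",
    "2012-11-01T04:16:13-04:00",
]
formatted = [parse_iso_z(d).strftime("%Y-%m-%d %H:%M:%S") for d in dates]
for before, after in zip(dates, formatted):
    print(f"Before: {before} After: {after}")
```

Unlike dateutil, this only covers the layouts fromisoformat() understands, so it is a narrower tool rather than a drop-in replacement.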
Why doesn't Python 2.7 include the Z character (Zulu or zero offset) at the end of a UTC datetime object's isoformat string, unlike JavaScript?
>>> datetime.datetime.utcnow().isoformat()
'2013-10-29T09:14:03.895210'
Whereas in JavaScript:
>>> console.log(new Date().toISOString());
2013-10-29T09:38:41.341Z
Option: isoformat()
Python's datetime does not support the military timezone suffixes, such as the 'Z' suffix for UTC. The following simple string replacement does the trick, provided the datetime is timezone-aware and in UTC:
In [1]: import datetime
In [2]: d = datetime.datetime(2014, 12, 10, 12, 0, 0, tzinfo=datetime.timezone.utc)
In [3]: str(d).replace('+00:00', 'Z')
Out[3]: '2014-12-10 12:00:00Z'
str(d) is essentially the same as d.isoformat(sep=' ')
See: Datetime, Python Standard Library
Option: strftime()
Or you could use strftime to achieve the same effect:
In [4]: d.strftime('%Y-%m-%dT%H:%M:%SZ')
Out[4]: '2014-12-10T12:00:00Z'
Note: This option works only when you know the date specified is in UTC.
See: datetime.strftime()
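Because a hard-coded Z is only correct for UTC values, it can help to guard the formatting call. The following is a sketch of one way to do that; the function name format_utc_z is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

def format_utc_z(dt):
    """Return 'YYYY-MM-DDTHH:MM:SSZ', refusing datetimes that are not UTC."""
    if dt.utcoffset() != timedelta(0):
        # Covers naive datetimes (utcoffset() is None) and non-zero offsets.
        raise ValueError("expected an aware datetime with a zero UTC offset")
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

d = datetime(2014, 12, 10, 12, 0, 0, tzinfo=timezone.utc)
stamp = format_utc_z(d)
print(stamp)
```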
Additional: Human Readable Timezone
Going further, you may be interested in displaying human-readable timezone information, using pytz with the strftime %Z directive:
In [5]: import pytz
In [6]: d = datetime.datetime(2014, 12, 10, 12, 0, 0, tzinfo=pytz.utc)
In [7]: d
Out[7]: datetime.datetime(2014, 12, 10, 12, 0, tzinfo=<UTC>)
In [8]: d.strftime('%Y-%m-%d %H:%M:%S %Z')
Out[8]: '2014-12-10 12:00:00 UTC'
Python datetime objects don't have time zone info by default, and without it, Python actually violates the ISO 8601 specification (if no time zone info is given, the time is assumed to be local time). You can use the pytz package to get some default time zones, or directly subclass tzinfo yourself:
from datetime import datetime, tzinfo, timedelta
class simple_utc(tzinfo):
    def tzname(self, dt):
        return "UTC"
    def utcoffset(self, dt):
        return timedelta(0)
Then you can manually add the time zone info to utcnow():
>>> datetime.utcnow().replace(tzinfo=simple_utc()).isoformat()
'2014-05-16T22:51:53.015001+00:00'
Note that this DOES conform to the ISO 8601 format, which allows for either Z or +00:00 as the suffix for UTC. Note that the latter actually conforms to the standard better, with how time zones are represented in general (UTC is a special case.)
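On Python 3.2 and later, the standard library already ships a UTC tzinfo as datetime.timezone.utc, so the hand-written subclass is mainly needed on Python 2. A short sketch with a fixed timestamp, chosen so the result is reproducible:

```python
from datetime import datetime, timezone

# Fixed timestamp so the result is reproducible.
d = datetime(2014, 5, 16, 22, 51, 53, 15001, tzinfo=timezone.utc)
iso = d.isoformat()
print(iso)          # 2014-05-16T22:51:53.015001+00:00
print(d.tzname())   # UTC
```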
Short answer
datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
Long answer
The reason that the "Z" is not included is because datetime.now() and even datetime.utcnow() return timezone naive datetimes, that is to say datetimes with no timezone information associated. To get a timezone aware datetime, you need to pass a timezone as an argument to datetime.now. For example:
from datetime import datetime, timezone
datetime.utcnow()
#> datetime.datetime(2020, 9, 3, 20, 58, 49, 22253)
# This is timezone naive
datetime.now(timezone.utc)
#> datetime.datetime(2020, 9, 3, 20, 58, 49, 22253, tzinfo=datetime.timezone.utc)
# This is timezone aware
Once you have a timezone aware timestamp, isoformat will include a timezone designation. Thus, you can then get an ISO 8601 timestamp via:
datetime.now(timezone.utc).isoformat()
#> '2020-09-03T20:53:07.337670+00:00'
"+00:00" is a valid ISO 8601 timezone designation for UTC. If you want to have "Z" instead of "+00:00", you have to do the replacement yourself:
datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
#> '2020-09-03T20:53:07.337670Z'
The following JavaScript and Python scripts give identical outputs. I think it's what you are looking for.
JavaScript
new Date().toISOString()
Python
from datetime import datetime
datetime.utcnow().isoformat()[:-3]+'Z'
The output they give is the UTC (Zulu) time formatted as an ISO string with millisecond precision (three fractional digits) and a trailing Z.
2019-01-19T23:20:25.459Z
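One caveat with slicing the isoformat() string: when the microsecond field is exactly zero, isoformat() omits the fractional part, so [:-3] cuts into the seconds instead. The sketch below uses fixed datetimes to show the difference, and the timespec argument (Python 3.6+) as a possible alternative.

```python
from datetime import datetime

on_the_second = datetime(2019, 1, 19, 23, 20, 25)           # microsecond == 0
with_fraction = datetime(2019, 1, 19, 23, 20, 25, 459000)

# Slicing assumes six fractional digits are always present.
sliced_ok = with_fraction.isoformat()[:-3] + "Z"
sliced_bad = on_the_second.isoformat()[:-3] + "Z"

# timespec="milliseconds" (Python 3.6+) always emits three digits.
safe = on_the_second.isoformat(timespec="milliseconds") + "Z"

print(sliced_ok)
print(sliced_bad)
print(safe)
```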
Your goal shouldn't be to add a Z character, it should be to generate a UTC "aware" datetime string in ISO 8601 format. The solution is to pass a UTC timezone object to datetime.now() instead of using datetime.utcnow():
from datetime import datetime, timezone
datetime.now(timezone.utc)
>>> datetime.datetime(2020, 1, 8, 6, 6, 24, 260810, tzinfo=datetime.timezone.utc)
datetime.now(timezone.utc).isoformat()
>>> '2020-01-08T06:07:04.492045+00:00'
That looks good, so let's see what Django and dateutil think:
from django.utils.timezone import is_aware
is_aware(datetime.now(timezone.utc))
>>> True
from dateutil.parser import isoparse
is_aware(isoparse(datetime.now(timezone.utc).isoformat()))
>>> True
Note that you need to use isoparse() from dateutil.parser because the Python documentation for datetime.fromisoformat() says it "does not support parsing arbitrary ISO 8601 strings".
Okay, the Python datetime object and the ISO 8601 string are both UTC "aware". Now let's look at what JavaScript thinks of the datetime string. Borrowing from this answer we get:
let date = '2020-01-08T06:07:04.492045+00:00';
const dateParsed = new Date(Date.parse(date))
document.write(dateParsed);
document.write("\n");
// Tue Jan 07 2020 22:07:04 GMT-0800 (Pacific Standard Time)
document.write(dateParsed.toISOString());
document.write("\n");
// 2020-01-08T06:07:04.492Z
document.write(dateParsed.toUTCString());
document.write("\n");
// Wed, 08 Jan 2020 06:07:04 GMT
Notes:
I approached this problem with a few goals:
generate a UTC "aware" datetime string in ISO 8601 format
use only Python Standard Library functions for datetime object and string creation
validate the datetime object and string with the Django timezone utility function, the dateutil parser and JavaScript functions
Note that this approach does not include a Z suffix and does not use utcnow(). But it's based on the recommendation in the Python documentation and it passes muster with both Django and JavaScript.
See also:
Stop using utcnow and utcfromtimestamp
What is the “right” JSON date format?
In Python >= 3.2 you can simply use this:
>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc).isoformat()
'2019-03-14T07:55:36.979511+00:00'
Python datetimes are a little clunky. Use arrow.
> str(arrow.utcnow())
'2014-05-17T01:18:47.944126+00:00'
Arrow has essentially the same api as datetime, but with timezones and some extra niceties that should be in the main library.
A format compatible with JavaScript can be achieved by:
arrow.utcnow().isoformat().replace("+00:00", "Z")
'2018-11-30T02:46:40.714281Z'
JavaScript's Date.parse will quietly drop microseconds from the timestamp.
I use pendulum:
import pendulum
d = pendulum.now("UTC").to_iso8601_string()
print(d)
>>> 2019-10-30T00:11:21.818265Z
There are a lot of good answers on the post, but I wanted the format to come out exactly as it does with JavaScript. This is what I'm using and it works well.
In [1]: import datetime
In [2]: now = datetime.datetime.utcnow()
In [3]: now.strftime('%Y-%m-%dT%H:%M:%S') + now.strftime('.%f')[:4] + 'Z'
Out[3]: '2018-10-16T13:18:34.856Z'
Using only standard libraries, making no assumption that the timezone is already UTC, and returning the exact format requested in the question:
dt.astimezone(timezone.utc).replace(tzinfo=None).isoformat(timespec='milliseconds') + 'Z'
This does require Python 3.6 or later though.
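Wrapped in a function, that one-liner might look like the sketch below. The name to_js_iso and the naive-input check are assumptions added for illustration; the check is there because astimezone() treats a naive datetime as local time.

```python
from datetime import datetime, timedelta, timezone

def to_js_iso(dt):
    """Format an aware datetime like JavaScript's Date.prototype.toISOString()."""
    if dt.tzinfo is None:
        # astimezone() would silently assume local time for naive input.
        raise ValueError("naive datetime: timezone is unknown")
    utc = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(timespec="milliseconds") + "Z"

plus_one = timezone(timedelta(hours=1))   # example fixed offset, UTC+1
dt = datetime(2018, 10, 16, 14, 18, 34, 856789, tzinfo=plus_one)
result = to_js_iso(dt)
print(result)
```

Note that timespec truncates rather than rounds, so 856789 microseconds becomes .856.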
>>> import arrow
>>> now = arrow.utcnow().format('YYYY-MM-DDTHH:mm:ss.SSS')
>>> now
'2018-11-28T21:34:59.235'
>>> zulu = "{}Z".format(now)
>>> zulu
'2018-11-28T21:34:59.235Z'
Or, to get it in one fell swoop:
>>> zulu = "{}Z".format(arrow.utcnow().format('YYYY-MM-DDTHH:mm:ss.SSS'))
>>> zulu
'2018-11-28T21:54:49.639Z'
By combining all the answers above, I came up with the following function:
from datetime import datetime, tzinfo, timedelta
class simple_utc(tzinfo):
    def tzname(self, dt):
        return "UTC"
    def utcoffset(self, dt):
        return timedelta(0)
def getdata(yy, mm, dd, h, m, s):
    d = datetime(yy, mm, dd, h, m, s)
    d = d.replace(tzinfo=simple_utc()).isoformat()
    d = str(d).replace('+00:00', 'Z')
    return d
print(getdata(2018, 2, 3, 15, 0, 14))
pip install python-dateutil
>>> import dateutil.parser
>>> a = "2019-06-27T02:14:49.443814497Z"
>>> dateutil.parser.parse(a)
datetime.datetime(2019, 6, 27, 2, 14, 49, 443814, tzinfo=tzutc())
I have a string field like this:
2011-09-04 23:44:30.801000
and now I need to convert it to a datetime object in Python so that I can calculate the difference between two datetime objects.
You should use datetime.datetime.strptime(), which converts a string and date format into a datetime.datetime object.
The format fields (e.g., %Y denotes four-digit year) are specified in the Python documentation.
>>> import datetime
>>> s = '2011-09-04 23:44:30.801000'
>>> format = '%Y-%m-%d %H:%M:%S.%f'
>>> date=datetime.datetime.strptime(s, format)
>>> date
datetime.datetime(2011, 9, 4, 23, 44, 30, 801000)
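Once both strings are parsed, the difference asked about in the question is plain subtraction, which yields a timedelta. A short sketch; the second timestamp is made up for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2011-09-04 23:44:30.801000", FMT)
end = datetime.strptime("2011-09-05 00:14:31.301000", FMT)   # made-up second value

delta = end - start          # a datetime.timedelta
print(delta)
print(delta.total_seconds())
```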
An alternative to datetime.datetime.strptime would be the python-dateutil library. dateutil will allow you to do the same thing without the explicit formatting step:
>>> from dateutil import parser
>>> date_obj = parser.parse('2011-09-04 23:44:30.801000')
>>> date_obj
datetime.datetime(2011, 9, 4, 23, 44, 30, 801000)
It's not a standard library module, but it is very handy for parsing date and time strings, especially if you don't have control over the format they come in.
One caveat if you install this library: version 1.5 is for Python 2 and version 2.0 is for Python 3. easy_install and pip default to installing the 2.0 version, so you have to explicitly indicate python-dateutil==1.5 if you are using Python 2.
Use datetime.datetime.strptime.
from datetime import datetime
# date string to datetime object
date_str = "2008-11-10 17:53:59"
dt_obj = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
print(repr(dt_obj))