Parse Date string to datetime with timezone - python

I have a string:
r = 'Thu Dec 17 08:56:41 CST 2020'
Here CST represents China Standard Time ('Asia/Shanghai'). I want to parse it into a datetime. I am doing something like
from dateparser import parse
r1 = parse(r)
Which is giving me r1 as:
2020-12-17 08:56:41-06:00
And I am also doing this
r2 = r1.replace(tzinfo=pytz.timezone("Asia/Shanghai"))
And this is giving me r2 as:
2020-12-17 08:50:41+08:00
There is a 6-minute lag in r2. Can someone tell me why that is? And how do I correctly convert my raw string r to the desired r2, which is:
2020-12-17 08:56:41 in Asia/Shanghai timezone
Thanks

Using dateutil.parser you can directly parse your date correctly.
Note that CST is an ambiguous timezone, so you need to specify which one you mean. You can either do this directly in the tzinfos parameter of the parse() call or you can define a dictionary that has mappings for timezones and pass this. In this dict, you can either specify the offset, e.g.
timezone_info = {
"CDT": -5 * 3600,
"CEST": 2 * 3600,
"CST": 8 * 3600
}
parser.parse(r, tzinfos=timezone_info)
or (using gettz) directly specify a timezone:
timezone_info = {
"CDT": gettz("America/Chicago"),
"CEST": gettz("Europe/Berlin"),
"CST": gettz("Asia/Shanghai")
}
parser.parse(r, tzinfos=timezone_info)
See also the dateutil.parser documentation and the answers to this SO question.
Be aware that the latter approach is tricky if you have a location with daylight saving time! Depending on the date you apply it to, gettz("America/Chicago") will have UTC-5 or UTC-6 as a result (as Chicago switches between Central Standard Time and Central Daylight Time). So depending on your input data, the second example may actually not really be correct and yield the wrong outcome! Currently, China observes China Standard Time (CST) all year, so for your use case it makes no difference (may depend on your date range though).
Overall:
from dateutil import parser
from dateutil.tz import gettz
timezone_info = {"CST": gettz("Asia/Shanghai")}
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)
print(d)
print(d.strftime('%Y-%m-%d %H:%M:%S %Z%z'))
gets you
2020-12-17 08:56:41+08:00
2020-12-17 08:56:41 CST+0800
EDIT: Printing the human-readable timezone name instead of the abbreviated one is a little more complicated with this approach, as dateutil.tz.gettz() gets you a tzfile that has no public attribute holding just the name. However, you can obtain it via the protected _filename attribute using split():
print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + "/".join(d.tzinfo._filename.split('/')[-2:]))
yields
2020-12-17 08:56:41 in Asia/Shanghai
This of course only works if you used gettz() to set the timezone in the first place.
EDIT 2: If you know that all your dates are in CST anyway, you can also ignore the timezone when parsing. This gets you naive (or unaware) datetimes, to which you can later add a human-readable timezone. With dateutil you can do this using replace() and gettz() as shown above. With a timezone() object from the pytz module, use its localize() method instead: passing a pytz timezone to replace() picks up the zone's historical local mean time offset, which is where the odd minutes in the question come from.
from dateutil import parser
from dateutil.tz import gettz
import pytz
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, ignoretz=True)
d_dateutil = d.replace(tzinfo=gettz('Asia/Shanghai'))
# pytz timezones must be attached with localize(), not replace()
d_pytz = pytz.timezone('Asia/Shanghai').localize(d)
Note that depending on which module you use to add the timezone information, the class of tzinfo differs. For the pytz object, there is a more direct way of accessing the timezone in human readable form:
print(type(d_dateutil.tzinfo))
print("/".join(d_dateutil.tzinfo._filename.split('/')[-2:]))
print(type(d_pytz.tzinfo))
print(d_pytz.tzinfo.zone)
produces
<class 'dateutil.tz.tz.tzfile'>
Asia/Shanghai
<class 'pytz.tzfile.Asia/Shanghai'>
Asia/Shanghai

from datetime import datetime
import pytz
# The datetime string you have
r = "Thu Dec 17 08:56:41 CST 2020"
# The time-zone string you want to use
offset_string = 'Asia/Shanghai'
# get the current offset of that time zone from UTC in the +HHMM form that %z expects;
# strftime('%z') also handles negative and half-hour offsets correctly
offset_num_repr = datetime.now(pytz.timezone(offset_string)).strftime('%z')
print('Numeric representation of the offset: ', offset_num_repr)
# replace the abbreviation CST with the numeric offset computed above
updated_datetime = str(r).replace('CST', offset_num_repr)
print('\t Modified datetime string: ', updated_datetime)
# Now parse your string into datetime object
r = datetime.strptime(updated_datetime, "%a %b %d %H:%M:%S %z %Y")
print('\tFinal parsed datetime object: ', r)
Should produce:
Numeric representation of the offset: +0800
Modified datetime string: Thu Dec 17 08:56:41 +0800 2020
Final parsed datetime object: 2020-12-17 08:56:41+08:00
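The same substitute-then-parse idea can also be sketched with the standard library only, assuming a fixed UTC offset per abbreviation is acceptable. The names ABBREVIATION_OFFSETS and parse_with_abbreviation are illustrative, not part of any library, and the mapping encodes the assumption that CST means China Standard Time in this data:

```python
from datetime import datetime

# Assumption: in this data, "CST" always means China Standard Time (UTC+8).
ABBREVIATION_OFFSETS = {"CST": "+0800"}

def parse_with_abbreviation(text):
    """Swap a known abbreviation for its numeric offset, then parse with %z."""
    words = [ABBREVIATION_OFFSETS.get(word, word) for word in text.split()]
    return datetime.strptime(" ".join(words), "%a %b %d %H:%M:%S %z %Y")

d = parse_with_abbreviation("Thu Dec 17 08:56:41 CST 2020")
print(d)              # 2020-12-17 08:56:41+08:00
print(d.utcoffset())  # 8:00:00
```

A fixed offset carries no daylight saving rules, so this only suits zones such as Asia/Shanghai whose offset does not change over the date range in question.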

Related

Parsing OFX datetime in Python

I'm trying to parse the datetime specified in the OFX 2.3 spec in Python. I believe it's a custom format, but feel free to let me know if it has a name. The spec states the following:
There is one format for representing dates, times, and time zones. The complete form is:
YYYYMMDDHHMMSS.XXX [gmt offset[:tz name]]
For example, “19961005132200.124[-5:EST]” represents October 5, 1996, at 1:22 and 124 milliseconds p.m., in Eastern Standard Time. This is the same as 6:22 p.m. Greenwich Mean Time (GMT).
Here is my current attempt:
from datetime import datetime
date_str = "19961005132200.124[EST]"
date = datetime.strptime(date_str, "%Y%m%d%H%M%S.%f[%Z]")
This partial example works so far, but is lacking the GMT offset portion (the -5 in [-5:EST]). I'm not sure how to specify a time zone offset of at most two digits.
Some things to note here, first (as commented):
Python built-in strptime will have a hard time here - %z won't parse a single digit offset hour, and %Z won't parse some (potentially) ambiguous time zone abbreviation.
Then, the OFX Banking Version 2.3 docs (sect. 3.2.8.2 Date and Datetime) leave some questions open to me:
Is the UTC offset optional?
Why is EST called a time zone while it's just an abbreviation?
Why in the example is the UTC offset -5 hours while on 1996-10-05, US/Eastern was at UTC-4?
What about offsets that have minutes specified, e.g. +5:30 for Asia/Calcutta?
(opinionated) Why re-invent the wheel in the first place instead of using a commonly used standard like ISO 8601?
Anyway, here's an attempt at a custom parser:
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo
def parseOFXdatetime(s, tzinfos=None, _tz=None):
    """
    parse OFX datetime string to an aware Python datetime object.
    """
    # first, treat formats that have no UTC offset specified.
    if '[' not in s:
        # just make sure default format is satisfied by filling with zeros if needed
        s = s.ljust(14, '0') + '.000' if '.' not in s else s
        return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=timezone.utc)
    # offset and tz are specified, so first get the date/time, offset and tzname components
    s, off = s.strip(']').split('[')
    off, name = off.split(':')
    s = s.ljust(14, '0') + '.000' if '.' not in s else s
    # if tzinfos are specified, map the tz name:
    if tzinfos:
        _tz = tzinfos.get(name) # this might still leave _tz as None...
    if not _tz: # ...so we derive a tz from a timedelta
        _tz = timezone(timedelta(hours=int(off)), name=name)
    return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=_tz)
# some test strings
t = ["19961005132200.124[-5:EST]", "19961005132200.124", "199610051322", "19961005",
     "199610051322[-5:EST]", "19961005[-5:EST]"]
for s in t:
    print(# normal parsing
          f'{s}\n {repr(parseOFXdatetime(s))}\n'
          # parsing with tzinfo mapping supplied; abbreviation -> timezone object
          f' {repr(parseOFXdatetime(s, tzinfos={"EST": ZoneInfo("US/Eastern")}))}\n\n')
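As a possible alternative for the complete form only, the bracketed offset can be rewritten into the +HHMM shape that %z understands, so that strptime does the rest. This is a minimal sketch, not a full OFX parser: the names OFX_FULL and parse_ofx_full are made up for illustration, only whole-hour offsets are assumed, and the zone name is ignored.

```python
import re
from datetime import datetime, timezone

# Matches only the complete form, e.g. "19961005132200.124[-5:EST]".
OFX_FULL = re.compile(r"(\d{14}\.\d{3})\[([+-]?)(\d{1,2}):(\w+)\]")

def parse_ofx_full(s):
    """Parse the complete OFX form by rewriting '[-5:EST]' as '-0500'."""
    m = OFX_FULL.fullmatch(s)
    if m is None:
        raise ValueError(f"not a complete OFX datetime: {s!r}")
    stamp, sign, hours, _name = m.groups()
    # Assumption: whole-hour offsets only; the zone name is ignored.
    offset = f"{sign or '+'}{int(hours):02d}00"
    return datetime.strptime(stamp + offset, "%Y%m%d%H%M%S.%f%z")

d = parse_ofx_full("19961005132200.124[-5:EST]")
print(d)                           # 1996-10-05 13:22:00.124000-05:00
print(d.astimezone(timezone.utc))  # 1996-10-05 18:22:00.124000+00:00
```

The second line of output agrees with the spec's own example, which equates the sample value with 6:22 p.m. GMT.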

Extract each year, month, day, year from getctime , getmtime in Python

I want to extract the year, month, day, hour and minute individually from the values below.
import os, time, os.path, datetime
date_of_created = time.ctime(os.path.getctime(folderName))
date_of_modi = time.ctime(os.path.getmtime(folderName))
Right now I can only get a string like this:
'Thu Dec 26 19:21:37 2019'
but I want to get the values separately:
2019 // Dec (could I get this as an int?) // 26
I want to extract the year, month, day, hour and minute values from both date_of_created and date_of_modi.
Is that possible in Python?
You can convert the string to a datetime object:
import os, time
from datetime import datetime
# Parse the string returned by time.ctime back into a datetime object
date_of_created = datetime.strptime(time.ctime(os.path.getctime(folderName)), "%a %b %d %H:%M:%S %Y")
print("Date created year: {} , month: {} , day: {}".format(date_of_created.year, date_of_created.month, date_of_created.day))
The time.ctime function returns the local time in string form. You might want to use the time.localtime function instead, which returns a struct_time object containing the information you are looking for. For example:
import os, time
date_created_string = time.ctime(os.path.getctime('/home/b-fg/Downloads'))
date_created_obj = time.localtime(os.path.getctime('/home/b-fg/Downloads'))
print(date_created_string) # Mon Feb 10 09:41:03 2020
print('Year: {:4d}'.format(date_created_obj.tm_year)) # Year: 2020
print('Month: {:2d}'.format(date_created_obj.tm_mon)) # Month: 2
print('Day: {:2d}'.format(date_created_obj.tm_mday)) # Day: 10
Note that these are integer values, as requested.
time.ctime([secs])
Convert a time expressed in seconds since the epoch to a string of a form: 'Sun Jun 20 23:21:05 1993' representing local time.
If that's not what you want... use something else? time.localtime will return a struct_time which has the relevant fields, or for a more modern interface use datetime.datetime.fromtimestamp which... returns a datetime object from a UNIX timestamp.
Furthermore, using os.stat would probably be more efficient, as getctime and getmtime will probably each perform a stat call internally.
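The stat suggestion above could look roughly like this: one os.stat call supplies both timestamps, and datetime.fromtimestamp turns each into an object whose fields are plain integers. The helper name created_and_modified is made up for illustration, and a temporary directory stands in for folderName from the question. As with os.path.getctime, st_ctime is the creation time on Windows but the time of the last metadata change on Unix.

```python
import os
import tempfile
from datetime import datetime

def created_and_modified(path):
    """Return (ctime, mtime) of path as naive local datetimes, using one stat call."""
    info = os.stat(path)
    return (datetime.fromtimestamp(info.st_ctime),
            datetime.fromtimestamp(info.st_mtime))

# Demonstration on a throwaway directory (a stand-in for folderName).
with tempfile.TemporaryDirectory() as folder:
    created, modified = created_and_modified(folder)

# Every field is an int, including the month.
print(created.year, created.month, created.day, created.hour, created.minute)
print(modified.year, modified.month, modified.day, modified.hour, modified.minute)
```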
You can use the datetime module, more specifically the fromtimestamp() function from the datetime module to get what you expect.
import os, time, os.path, datetime
date_of_created = datetime.datetime.fromtimestamp(os.path.getctime(my_repo))
date_of_modi = datetime.datetime.fromtimestamp(os.path.getmtime(my_repo))
print(date_of_created.strftime("%Y"))
Output will be 2020 for a repo created in 2020.
All formats are available at this link

Python Regular expression to replace date with timezone

I have a CSV file which contains dates in various timezones, but before feeding the data to tests, I want to replace all the dates with a unified value.
The date column contains values like the following:
2019-01-01 00:00:00+05:30
2018-12-31 18:30:00+00
2018-02-02 00:00:00-04:00
I want to replace them like this:
2019-01-01 00:00:00+00
2018-12-31 00:00:00+00
2018-02-02 00:00:00+00
How do I write Regex to cover all possible timezones?
I wrote:
([0-9]){4}(-:?)([0-9]){2}(-:?)([0-9]){2} ([0-9]){2}:([0-9]){2}:([0-9]){2}(+-?)([0-9]){2}:([0-9]){2}
but it fails when it encounters 2018-12-31 18:30:00+00. How can I handle this case?
Tim Biegeleisen is right: you should not be using regex for this; you should use the datetime API provided by Python. I have sourced my answer from an excellent post on this by jfs here
The below is for Python 3.3+ (since you have tagged your question with Python 3.0). Note that %z only accepts an offset written with a colon, such as +05:30, from Python 3.7 on:
import datetime
time_string = "2019-01-01 00:00:00+05:30"
# Parses an aware datetime instance from a string
dt = datetime.datetime.strptime(time_string,'%Y-%m-%d %H:%M:%S%z')
# Keeps the wall-clock time but relabels it as UTC, then takes the POSIX timestamp
timestamp = dt.replace(tzinfo=datetime.timezone.utc).timestamp()
# Converts back to an aware datetime object in UTC
dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
# Formats and prints it out.
print(dt.strftime('%Y-%m-%d %H:%M:%S %Z'))
For Python versions < 3.3, for an aware datetime:
import datetime
time_string = "2019-01-01 00:00:00+05:30"
# Parses an aware datetime instance from a string
dt = datetime.datetime.strptime(time_string,'%Y-%m-%d %H:%M:%S%z')
# Computes the POSIX timestamp as the number of seconds since the epoch
timestamp = (dt - datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)) / datetime.timedelta(seconds=1)
# Converts back to an aware datetime object in UTC
dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
# Formats and prints it out.
print(dt.strftime('%Y-%m-%d %H:%M:%S %Z'))
Terminology
An aware object is used to represent a specific moment in time that is
not open to interpretation
For our case, timezone information is known.
The best way to solve this problem is using Python's datetime module (strptime and strftime).
If you want to solve it using regex then as per python doc https://docs.python.org/2/library/re.html
you can do something like this
import re
def dashrepl(matchobj):
    # keep the date (group 1) and replace the time and offset
    return "{0} 00:00:00+00".format(matchobj.group(1))
k = r"(\d{4}(-\d{2}){2})\s(\d{2}:?){3}.[\d:]+"
ab = re.sub(k, dashrepl, "2019-01-01 00:00:00+05:30")
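For the case the question got stuck on, an offset without minutes such as +00, one option is to make the :MM part of the offset optional. The sketch below is purely textual: it keeps the date and overwrites the time and offset, as in the desired output from the question, and does not convert between time zones. The names PATTERN and to_midnight_utc are illustrative.

```python
import re

# Date, time, then an offset of +HH or -HH with an optional :MM part.
PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}[+-]\d{2}(?::\d{2})?$")

def to_midnight_utc(value):
    """Keep the date, replace the time and offset with 00:00:00+00."""
    return PATTERN.sub(r"\1 00:00:00+00", value)

samples = ["2019-01-01 00:00:00+05:30", "2018-12-31 18:30:00+00", "2018-02-02 00:00:00-04:00"]
for s in samples:
    print(to_midnight_utc(s))
# 2019-01-01 00:00:00+00
# 2018-12-31 00:00:00+00
# 2018-02-02 00:00:00+00
```

Values that do not match the pattern are returned unchanged, which makes unexpected rows easy to spot afterwards.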
You don't need regex for this, as it seems to be straightforward. You can use the snippet below:
ts = ["2019-01-01 00:00:00+05:30", "2018-12-31 18:30:00+00", "2018-02-02 00:00:00-04:00"]
l = [x.split()[0] + " 00:00:00+00" for x in ts]
OR
l = [x[:11] + "00:00:00+00" for x in ts]

Convert weekday name string into datetime

I have the following date (with object dtype) in a pandas Series: Tue 31 Jan
and I am trying to change it into: 31/01/2019
How can I achieve this? I understand more or less that pandas can convert easily when a string date is clearer (like 6/1/1930 22:00), but not in my case, when there is a weekday name.
Thank you for your help.
Concatenate the year and call pd.to_datetime with a custom format:
import pandas as pd
s = pd.Series(['Tue 31 Jan', 'Mon 20 Feb'])
pd.to_datetime(s + ' 2019', format='%a %d %b %Y')
0 2019-01-31
1 2019-02-20
dtype: datetime64[ns]
This is fine as long as all your dates follow this format. If that is not the case, this cannot be solved reliably.
More information on datetime formats at strftime.org.
Another option is using the 3rd party dateutil library:
import dateutil.parser
s.apply(dateutil.parser.parse)
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
It can be installed from PyPI (pip install python-dateutil).
Another, slower option (but more flexible) is using the 3rd party datefinder library to sniff dates from string containing random text (if this is what you need):
import datefinder
s.apply(lambda x: next(datefinder.find_dates(x)))
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
You can install it from PyPI.
Convert to a datetime object
If you wanted to use the datetime module, you could get the year by doing the following:
import datetime as dt
d = dt.datetime.strptime('Tue 31 Jan', '%a %d %b').replace(year=dt.datetime.now().year)
This is taking the date in your format, but replacing the default year 1900 with the current year in a reliable way.
This is similar to the other answers, but uses the builtin replace method as opposed to concatenating a string.
Output
To get the desired output from your new datetime object, you could perform the following:
>>> d.strftime('%d/%m/%Y')
'31/01/2018'
Here are two alternative ways to achieve the same result.
Method 1: Using datetime module
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan', '%a %d %b')
print(datetime_object) # outputs 1900-01-31 00:00:00
If the string included a year, as in Tue 31 Jan 2018, then this code would work:
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan 2018', '%a %d %b %Y')
print(datetime_object) # outputs 2018-01-31 00:00:00
To print the resulting date in a format like 31/01/2019, you can use
print(datetime_object.strftime("%d/%m/%Y")) # outputs 31/01/2018
Here are all the possible formatting options available with datetime object.
Method 2: Using dateutil.parser
This method automatically fills in the Year parameter with current year.
from dateutil import parser
string = "Tue 31 Jan"
date = parser.parse(string)
print(date) # outputs 2018-01-31 00:00:00

Restricting RSS elements by date with feedparser. [Python]

I iterate over an RSS feed like so, where _file is the feed:
d = feedparser.parse(_file)
for element in d.entries:
    print(repr(element.date))
The date output comes out like so
u'Thu, 16 Jul 2009 15:18:22 EDT'
I can't work out how to turn the above date output into something I can compare, so that I can use it to limit feed elements. So what I am asking is: how can I get an actual time out of this, so I can say "if more than 7 days old, skip this element"?
feedparser is supposed to give you a struct_time object from Python's time module. I'm guessing it doesn't recognize that date format and so is giving you the raw string.
See here on how to add support for parsing malformed timestamps:
http://pythonhosted.org/feedparser/date-parsing.html
If you manage to get it to give you the struct_time, you can read more about that here:
http://docs.python.org/library/time.html#time.struct_time
struct_time objects have everything you need. They have these members:
time.struct_time(tm_year=2010, tm_mon=2, tm_mday=4, tm_hour=23, tm_min=44, tm_sec=19, tm_wday=3, tm_yday=35, tm_isdst=0)
I generally convert the structs to seconds, like this:
import time
import calendar
struct = time.gmtime()  # a UTC struct_time, which is what calendar.timegm expects
seconds = calendar.timegm(struct)
Then you can just do regular math to see how many seconds have elapsed, or use the datetime module to do timedeltas.
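The "regular math" step might look like the sketch below. It assumes the struct_time is expressed in UTC, which is what calendar.timegm expects. The helper name is_older_than_a_week is made up, and the sample struct_time is built by hand as a stand-in for a parsed feed date.

```python
import calendar
import time
from datetime import timedelta

SEVEN_DAYS = timedelta(days=7).total_seconds()

def is_older_than_a_week(utc_struct, now=None):
    """True if a UTC struct_time lies more than seven days before now."""
    now = time.time() if now is None else now
    return now - calendar.timegm(utc_struct) > SEVEN_DAYS

# Stand-in for a parsed feed date: 16 Jul 2009 19:18:22 UTC (15:18:22 EDT).
entry_time = time.strptime("2009-07-16 19:18:22", "%Y-%m-%d %H:%M:%S")
print(is_older_than_a_week(entry_time))     # True
print(is_older_than_a_week(time.gmtime()))  # False
```

Inside the loop over d.entries, an element would then be skipped whenever the helper returns True for its parsed date.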
One way:
>>> import time
>>> t = time.strptime("Thu, 16 Jul 2009 15:18:22 EDT", "%a, %d %b %Y %H:%M:%S %Z")
>>> sevendays = 86400 * 7
>>> current = time.time()
>>> if current - time.mktime(t) > sevendays:
...     print("more than 7 days")
You can also look at the datetime module and make use of timedelta() for date calculations.
If you install the dateutil module:
import dateutil.parser as dp
import dateutil.tz as dtz
import datetime
date_string=u'Thu, 16 Jul 2009 15:18:22 EDT'
adatetime=dp.parse(date_string)
print(adatetime)
# 2009-07-16 15:18:22-04:00
now=datetime.datetime.now(dtz.tzlocal())
print(now)
# 2010-02-04 23:35:52.428766-05:00
aweekago=now-datetime.timedelta(days=7)
print(aweekago)
# 2010-01-28 23:35:52.428766-05:00
if adatetime < aweekago:
    print('old news')
If you are using Ubuntu, dateutil is provided by the python-dateutil package.
