Python Regular expression to replace date with timezone - python

I have a CSV file that contains dates in various timezones. Before feeding this data to my tests, I want to replace all the dates with a unified value.
The date column contains values like the ones below:
2019-01-01 00:00:00+05:30
2018-12-31 18:30:00+00
2018-02-02 00:00:00-04:00
I want to replace them like this:
2019-01-01 00:00:00+00
2018-12-31 00:00:00+00
2018-02-02 00:00:00+00
How do I write a regex that covers all possible timezones?
I wrote:
([0-9]){4}(-:?)([0-9]){2}(-:?)([0-9]){2} ([0-9]){2}:([0-9]){2}:([0-9]){2}(+-?)([0-9]){2}:([0-9]){2}
but it fails when it encounters 2018-12-31 18:30:00+00. How can I handle this case?

Tim Biegeleisen is right: you should not be using regex for this; use the datetime API that Python provides. I have based my answer on an excellent post on this by jfs.
The below is for Python 3.3+ (since you have tagged your question with Python 3.0)
import datetime
time_string = "2019-01-01 00:00:00+05:30"
# Parses an aware datetime from the string (a colon inside the %z offset needs Python 3.7+)
dt = datetime.datetime.strptime(time_string, '%Y-%m-%d %H:%M:%S%z')
# Relabels the value as UTC by overwriting tzinfo (the wall-clock time is kept, not converted)
timestamp = dt.replace(tzinfo=datetime.timezone.utc).timestamp()
# Converts back to a datetime object; passing tz keeps the result aware and in UTC
dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
# Formats and prints it out.
print(dt.strftime('%Y-%m-%d %H:%M:%S %Z'))
For Python versions < 3.3, with an aware datetime:
import datetime
time_string = "2019-01-01 00:00:00+05:30"
# Parses a datetime instance from a string
dt = datetime.datetime.strptime(time_string, '%Y-%m-%d %H:%M:%S%z')
# Computes the POSIX timestamp as the number of seconds since the UTC epoch
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
timestamp = (dt - epoch) / datetime.timedelta(seconds=1)
# Converts back to an aware datetime object in UTC
dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
# Formats and prints it out.
print(dt.strftime('%Y-%m-%d %H:%M:%S %Z'))
Terminology
An aware object is used to represent a specific moment in time that is
not open to interpretation
For our case, timezone information is known.
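As a hedged sketch of the datetime approach applied to all three sample values from the question: strptime's %z directive rejects an hours-only offset such as +00, so the hypothetical helper below (normalize_to_utc_midnight, a name that is not in the original) first pads such offsets to +HH:MM. It assumes Python 3.7+, where %z accepts a colon, and it assumes that "replace" means keeping the calendar date as written and relabelling it as midnight UTC, which is what the expected output in the question shows.

```python
import re
from datetime import datetime, timezone

def normalize_to_utc_midnight(value):
    """Hypothetical helper: keep the written calendar date, reset time and offset."""
    # Pad an hours-only offset ("+00") to "+00:00" so that %z can parse it.
    padded = re.sub(r"([+-]\d{2})$", r"\1:00", value)
    # Parsing validates the whole value; a malformed one raises ValueError.
    dt = datetime.strptime(padded, "%Y-%m-%d %H:%M:%S%z")
    # Overwrite the time fields and the offset; no conversion takes place.
    dt = dt.replace(hour=0, minute=0, second=0, tzinfo=timezone.utc)
    # Trim "+0000" to "+00" to match the format shown in the question.
    return dt.strftime("%Y-%m-%d %H:%M:%S%z")[:-2]

samples = [
    "2019-01-01 00:00:00+05:30",
    "2018-12-31 18:30:00+00",
    "2018-02-02 00:00:00-04:00",
]
normalized = [normalize_to_utc_midnight(s) for s in samples]
print(normalized)
```

If the goal were a true conversion to UTC instead of a relabel, dt.astimezone(timezone.utc) would be the call to use in place of replace(), and the resulting dates could differ from the ones written in the file.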

The best way to solve this problem is to use Python's datetime module (strptime and strftime).
If you want to solve it with a regex anyway, then as per the Python docs, https://docs.python.org/2/library/re.html,
you can do something like this:
import re
def dashrepl(matchobj):
    # group(1) is the date part; the time and offset are replaced with fixed values
    return "{0} 00:00:00+00".format(matchobj.group(1))
k = r"(\d{4}(-\d{2}){2})\s(\d{2}:?){3}.[\d:]+"
ab = re.sub(k, dashrepl, "2019-01-01 00:00:00+05:30")
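A minimal sketch of the same kind of substitution applied to the three sample values from the question, assuming each value holds exactly one timestamp. The pattern here is a stricter variant written for illustration (it is not the one from the answer): it anchors the match and makes the minutes part of the offset optional, so +00 and -04:00 are both covered. The names PATTERN and unify are hypothetical.

```python
import re

# Date, then a time, then a signed offset whose ":MM" part is optional.
PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}[+-]\d{2}(?::?\d{2})?$")

def unify(value):
    # Keep only the captured date and append the fixed time and offset.
    return PATTERN.sub(r"\g<1> 00:00:00+00", value)

samples = [
    "2019-01-01 00:00:00+05:30",
    "2018-12-31 18:30:00+00",
    "2018-02-02 00:00:00-04:00",
]
unified = [unify(s) for s in samples]
print(unified)
```

A value that does not match the pattern is returned unchanged, so malformed rows would pass through silently; checking PATTERN.match(value) first is one way to detect them.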

You don't need a regex for this, as the task is straightforward. You can use the snippet below:
ts = ["2019-01-01 00:00:00+05:30", "2018-12-31 18:30:00+00", "2018-02-02 00:00:00-04:00"]
l = [x.split()[0] + " 00:00:00+00" for x in ts]
OR
l = [x[:11] + "00:00:00+00" for x in ts]

Related

Converting a datetime to local time based on timezone in Python3

I have a question related to dates and time in Python.
Problem:
date = datetime.datetime.strptime(str(row[1]), "%Y-%m-%d %H:%M:%S")
localtime = date.astimezone(pytz.timezone("Europe/Brussels"))
formattedDate = localtime.strftime("%Y-%m-%d")
In the code above, str(row[1]) gives back a UTC datetime coming from a mysql database: 2022-02-28 23:00:00
I parse this as a datetime and change the timezone to Europe/Brussels.
I then format it back to a string.
Expected result:
I'd like to return the date in local time. Europe/Brussels adds one hour so I would expect that strftime returns 2022-03-01, but it keeps returning 2022-02-28.
Can somebody help?
date is a naïve datetime, without timezone, because no timezone information was in the string you parsed. Calling astimezone on a naïve datetime makes Python assume the value is in the system's local time zone, so your UTC value is read as local time; on a machine that is itself set to Brussels time, for example, nothing shifts and the date stays 2022-02-28.
This also already contains the answer: mark the datetime as UTC first, before converting it to a different timezone:
date = datetime.datetime.strptime(...).replace(tzinfo=datetime.timezone.utc)
Ah, I see!
I ended up doing this, basically the same as you mentioned:
import datetime
from dateutil import tz
from_zone = tz.gettz('UTC')
to_zone = tz.gettz('Europe/Brussels')
# row[1] is the UTC value read from the database
utcdate = datetime.datetime.strptime(str(row[1]), "%Y-%m-%d %H:%M:%S")
utcdate = utcdate.replace(tzinfo=from_zone)
localdate = utcdate.astimezone(to_zone)
formattedLocalDate = localdate.strftime("%Y%m%d")
The naïve datetime is made UTC-aware by utcdate.replace(tzinfo=from_zone).
Thanks for helping!
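For comparison, a minimal standard-library sketch of the same fix, assuming Python 3.9+ so that zoneinfo is available (on some platforms it also needs the tzdata package for the zone database). The variable names are illustrative and the input is the sample value from the question.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

raw = "2022-02-28 23:00:00"  # sample value from the question, known to be UTC
# strptime yields a naive datetime; replace() labels it as UTC without shifting it.
utc_dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
# astimezone() now has a source zone and can convert to Brussels time.
local_dt = utc_dt.astimezone(ZoneInfo("Europe/Brussels"))
formatted = local_dt.strftime("%Y-%m-%d")
print(formatted)
```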

Parsing OFX datetime in Python

I'm trying to parse the datetime specified in the OFX 2.3 spec in Python. I believe it's a custom format, but feel free to let me know if it has a name. The spec states the following:
There is one format for representing dates, times, and time zones. The complete form is:
YYYYMMDDHHMMSS.XXX [gmt offset[:tz name]]
For example, “19961005132200.124[-5:EST]” represents October 5, 1996, at 1:22 and 124 milliseconds p.m., in Eastern Standard Time. This is the same as 6:22 p.m. Greenwich Mean Time (GMT).
Here is my current attempt:
from datetime import datetime
date_str = "19961005132200.124[EST]"
date = datetime.strptime(date_str, "%Y%m%d%H%M%S.%f[%Z]")
This partial example works so far, but is lacking the GMT offset portion (the -5 in [-5:EST]). I'm not sure how to specify a time zone offset of at most two digits.
Some things to note here, first (as commented):
Python built-in strptime will have a hard time here - %z won't parse a single digit offset hour, and %Z won't parse some (potentially) ambiguous time zone abbreviation.
Then, the OFX Banking Version 2.3 docs (sect. 3.2.8.2 Date and Datetime) leave some questions open to me:
Is the UTC offset optional?
Why is EST called a time zone while it's just an abbreviation?
Why in the example is the UTC offset -5 hours while on 1996-10-05, US/Eastern was at UTC-4?
What about offsets that have minutes specified, e.g. +5:30 for Asia/Calcutta?
(opinionated) Why re-invent the wheel in the first place instead of using a commonly used standard like ISO 8601?
Anyway, here's an attempt at a custom parser:
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo
def parseOFXdatetime(s, tzinfos=None, _tz=None):
    """
    parse OFX datetime string to an aware Python datetime object.
    """
    # first, treat formats that have no UTC offset specified.
    if not '[' in s:
        # just make sure default format is satisfied by filling with zeros if needed
        s = s.ljust(14, '0') + '.000' if not '.' in s else s
        return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=timezone.utc)
    # offset and tz are specified, so first get the date/time, offset and tzname components
    s, off = s.strip(']').split('[')
    off, name = off.split(':')
    s = s.ljust(14, '0') + '.000' if not '.' in s else s
    # if tzinfos are specified, map the tz name:
    if tzinfos:
        _tz = tzinfos.get(name) # this might still leave _tz as None...
    if not _tz: # ...so we derive a tz from a timedelta
        _tz = timezone(timedelta(hours=int(off)), name=name)
    return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=_tz)
# some test strings
t = ["19961005132200.124[-5:EST]", "19961005132200.124", "199610051322", "19961005",
     "199610051322[-5:EST]", "19961005[-5:EST]"]
for s in t:
    print(# normal parsing
          f'{s}\n {repr(parseOFXdatetime(s))}\n'
          # parsing with tzinfo mapping supplied; abbreviation -> timezone object
          f' {repr(parseOFXdatetime(s, tzinfos={"EST": ZoneInfo("US/Eastern")}))}\n\n')
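One possible variation, sketched under assumptions: the parser below uses a single regular expression instead of string splitting, and it treats the offset as a number of hours that may be fractional (for example +5.5). That fractional form is an assumption made here; the spec excerpt quoted in the question only shows -5. The names OFX_RE and parse_ofx are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Date is mandatory; time, fraction and the bracketed zone are optional.
OFX_RE = re.compile(
    r"^(?P<date>\d{8})(?P<time>\d{4}(?:\d{2})?)?(?:\.(?P<frac>\d{1,6}))?"
    r"(?:\[(?P<off>[+-]?\d+(?:\.\d+)?)(?::(?P<name>\w+))?\])?$"
)

def parse_ofx(s):
    m = OFX_RE.match(s)
    if m is None:
        raise ValueError(f"not an OFX datetime: {s!r}")
    # Right-pad missing time digits with zeros, e.g. "1322" -> "132200".
    digits = m["date"] + (m["time"] or "").ljust(6, "0")
    dt = datetime.strptime(digits, "%Y%m%d%H%M%S")
    # Right-pad the fraction to microseconds, e.g. "124" -> 124000.
    dt = dt.replace(microsecond=int((m["frac"] or "0").ljust(6, "0")))
    # No bracket means an offset of zero, mirroring the parser above.
    offset = timedelta(hours=float(m["off"] or 0))
    tz = timezone(offset, m["name"]) if m["name"] else timezone(offset)
    return dt.replace(tzinfo=tz)

print(repr(parse_ofx("19961005132200.124[-5:EST]")))
```

Like the parser above, this builds a fixed-offset timezone from the number in the brackets, so it does not know about daylight saving time; mapping abbreviations to ZoneInfo objects remains a separate step.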

Datetime still showing time component

I have this line of code-
future_end_date = datetime.strptime('2020/02/29','%Y/%m/%d')
and when I print this-
2020-02-29 00:00:00
it still shows the time component even though I did strptime
This is because strptime returns datetime rather than date. Try converting it to date:
datetime.strptime('2020/02/29','%Y/%m/%d').date()
The datetime.strptime(date_string, format) function returns a datetime
object corresponding to date_string, parsed according to format.
When you print a datetime object, it is formatted as a string in the
ISO 8601 style with a space as the separator, YYYY-MM-DD HH:MM:SS
So you need to convert the datetime into a date if you only want the year, month and day -
datetime.strptime('2020/02/29','%Y/%m/%d').date()
Another possible way is to use the strftime() method, which returns a string representation of a date, time or datetime object in the format you specify.
datetime.strptime('2020/02/29','%Y/%m/%d').strftime('%Y/%m/%d')
Output of the two code snippets, respectively -
2020-02-29
2020/02/29
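To make the difference between the two approaches concrete, here is a small sketch that prints the type and the text form of each result. The variable names are illustrative.

```python
from datetime import datetime

parsed = datetime.strptime("2020/02/29", "%Y/%m/%d")
as_date = parsed.date()                # a datetime.date, with no time fields
as_text = parsed.strftime("%Y/%m/%d")  # a plain str in the chosen layout

print(type(parsed).__name__, parsed)
print(type(as_date).__name__, as_date)
print(type(as_text).__name__, as_text)
```

The date object can still be used for date arithmetic and comparisons, while the strftime result is only text.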

How can I convert text to DateTime?

I scraped a website and got the following Output:
2018-06-07T12:22:00+0200
2018-06-07T12:53:00+0200
2018-06-07T13:22:00+0200
Is there a way I can take the first one and convert it into a DateTime value?
Just parse the string into year, month, day, hour and minute integers and then create a new date time object with those variables.
Check out the datetime docs
You can convert the string form of a datetime to a datetime object using strptime, like this; here %z matches the UTC offset:
import datetime
dt = "2018-06-07T12:22:00+0200"
ndt = datetime.datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S%z")
print(ndt)
# output
# 2018-06-07 12:22:00+02:00
The following function (not mine) should help you with what you want:
df['date_column'] = pd.to_datetime(df['date_column'], format = '%d/%m/%Y %H:%M').dt.strftime('%Y%V')
You can change the format codes after the % symbols to get what you want. The format passed to to_datetime must match your values, though, so you may need to do some light cleaning of your values first, e.g. replacing 2018-06-07T12:22:00+0200 with 2018-06-07 12:22, or adjust the format string instead.
You can use datetime lib.
from datetime import datetime
datetime_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
datetime.strptime documentation
Solution here
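A short sketch that applies the strptime approach to all three scraped values from the question and converts the first one to UTC. It uses only the standard library; the variable names are illustrative.

```python
from datetime import datetime, timezone

scraped = [
    "2018-06-07T12:22:00+0200",
    "2018-06-07T12:53:00+0200",
    "2018-06-07T13:22:00+0200",
]
# %z reads the "+0200" offset, so every result is an aware datetime.
parsed = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z") for s in scraped]
# Aware datetimes can be converted safely; here the first one is moved to UTC.
first_utc = parsed[0].astimezone(timezone.utc)
print(parsed[0].isoformat(), first_utc.isoformat())
```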

Python: convert 'days since 1990' to datetime object

I have a time series that I have pulled from a netCDF file and I'm trying to convert them to a datetime format. The format of the time series is in 'days since 1990-01-01 00:00:00 +10' (+10 being GMT: +10)
time = nc_data.variables['time'][:]
time_idx = 0 # first timestamp
print time[time_idx]
9465.0
My desired output is a datetime object like so (also GMT +10):
"2015-12-01 00:00:00"
I have tried converting this using the time module without much success, although I believe I may be using it wrong (I'm still a novice in Python and programming).
import time
time_datetime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(time[time_idx]*24*60*60))
Any advice appreciated,
Cheers!
The datetime module's timedelta is probably what you're looking for.
For example:
from datetime import date, timedelta
days = 9465 # This may work for floats in general, but using integers
# is more precise (e.g. days = int(9465.0))
start = date(1990,1,1) # This is the "days since" part
delta = timedelta(days) # Create a time delta object from the number of days
offset = start + delta # Add the specified number of days to 1990
print(offset) # >>> 2015-12-01
print(type(offset)) # >>> <class 'datetime.date'>
You can then use and/or manipulate the offset object, or convert it to a string representation however you see fit.
You can use the same format string for this date object as you do for your time_datetime:
print(offset.strftime('%Y-%m-%d %H:%M:%S'))
Output:
2015-12-01 00:00:00
Instead of using a date object, you could use a datetime object instead if, for example, you were later going to add hours/minutes/seconds/timezone offsets to it.
The code would stay the same as above with the exception of two lines:
# Here, you're importing datetime instead of date
from datetime import datetime, timedelta
# Here, you're creating a datetime object instead of a date object
start = datetime(1990,1,1) # This is the "days since" part
Note: You don't state it, but the other answer suggests you might be looking for timezone-aware datetimes. If that's the case, dateutil is the way to go in Python 2, as the other answer suggests. In Python 3, you'd want to use the datetime module's tzinfo support (for example, datetime.timezone).
The num2date function from netCDF4 is the correct function to use here:
import netCDF4
ncfile = netCDF4.Dataset('./foo.nc', 'r')
time = ncfile.variables['time'] # do not cast to numpy array yet
time_convert = netCDF4.num2date(time[:], time.units, time.calendar)
This will convert the number of days since 1990-01-01 (i.e. the units of time) to Python datetime objects. If time does not have a calendar attribute, you'll need to specify the calendar, or use the default of standard.
We can do this in a couple steps. First, we are going to use the dateutil library to handle our work. It will make some of this easier.
The first step is to get a datetime object from your string (1990-01-01 00:00:00 +10). We'll do that with the following code:
from datetime import datetime
from dateutil.relativedelta import relativedelta
import dateutil.parser
days_since = '1990-01-01 00:00:00 +10'
days_since_dt = dateutil.parser.parse(days_since)
Now, our days_since_dt will look like this:
datetime.datetime(1990, 1, 1, 0, 0, tzinfo=tzoffset(None, 36000))
We'll use that in our next step, determining the new date. We'll use relativedelta from dateutil to handle this math.
new_date = days_since_dt + relativedelta(days=9465.0)
This will result in your value in new_date having a value of:
datetime.datetime(2015, 12, 1, 0, 0, tzinfo=tzoffset(None, 36000))
This method ensures that the answer you receive continues to be in GMT+10.
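The same result can also be sketched with the standard library alone, assuming Python 3 and a fixed +10 hour offset as given in the units string; a fixed offset does not model daylight saving time. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the units string "days since 1990-01-01 00:00:00 +10".
gmt_plus_10 = timezone(timedelta(hours=10))
epoch = datetime(1990, 1, 1, tzinfo=gmt_plus_10)

days = 9465.0  # value read from the netCDF time variable in the question
# timedelta accepts float days, so fractional days would carry over as hours.
result = epoch + timedelta(days=days)
print(result.isoformat())
```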
