Issue with code when using datetime and timezone - python

I have a list of strings called 'entries'. Each entry includes a date and time in a format like this: 'Mon Jun 15 17:52:03 2015'
I'm parsing the dates/times from each entry with regex and then I need to put them into python's datetime format and change the timezone to UTC (which is local time +4 hrs). Here's my code:
import re
from datetime import datetime
import pytz
local = pytz.timezone("Etc/GMT+4")
localdate = [None]*len(entries)
local_dt = [None]*len(entries)
utc_dt = [None]*len(entries)
utdate = [None]*len(entries)
for i in range(len(entries)):
    localdate[i] = datetime.strptime(
        re.search(r'\w{3}\s*?\w{3}\s*?\d{1,2}\s*?\d{1,2}:\d{2}:\d{2}\s*?\d{4}',
                  entries[i]).group(0), "%c")
    local_dt[i] = local.localize(localdate[i], is_dst=None)
    utc_dt[i] = local_dt[i].astimezone(pytz.utc)
    utdate[i] = utc_dt[i].strftime("%c")
utdate = list(map(str, utdate))
print(utdate)
It seems to work well line-by-line if I go through and print each step, but once it gets to the last step it reverts back to the original format of the dates/times rather than the python datetime format of 'yyyy-mm-dd hh:mm:ss'. Anyone know what's wrong?

tl;dr
You're formatting the datetime object into a string with utdate[i] = utc_dt[i].strftime("%c"). The %c code formats the date according to the system's localization settings, not the format you're expecting.
The standard string representation of a datetime object will generate the format you're looking for – you can get a string from str(some_datetime), or print(some_datetime) to print it to the console.
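To make the difference concrete, here is a minimal sketch using only the standard library. The timestamp and variable names are made up for illustration; the point is just that str() and an explicit format string give predictable output, while %c depends on the locale.

```python
from datetime import datetime, timezone

# An arbitrary aware datetime, used only for illustration.
utc_dt = datetime(2015, 6, 15, 21, 52, 3, tzinfo=timezone.utc)

iso_like = str(utc_dt)                           # yyyy-mm-dd hh:mm:ss plus the UTC offset
locale_style = utc_dt.strftime("%c")             # locale-dependent, so not asserted here
explicit = utc_dt.strftime("%Y-%m-%d %H:%M:%S")  # explicit format, no offset

print(iso_like)
print(locale_style)
print(explicit)
```

If the trailing offset is unwanted, the explicit format string is the simplest way to drop it.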
Timezones
Time zones are notoriously hard to keep track of, so you may want to double check which one you're using. You should know that the "Etc" timezones are labelled with the opposite sign for weird reasons: Etc/GMT+4 is four hours behind UTC (UTC-04:00). As is, your code will take an input time and give an output time that's 4 hours later, which matches "UTC is local time +4 hrs", but it is easy to get backwards. It's a different question, but using a location-based timezone instead of a fixed UTC offset may be a good idea for things like DST support.
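The direction of the conversion can be checked without a time zone database. The sketch below assumes a fixed UTC-04:00 offset, which is what the tz database name Etc/GMT+4 denotes (note the inverted sign), and models it with the standard library's datetime.timezone instead of pytz.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local zone is a fixed UTC-04:00 offset, standing in
# for pytz.timezone("Etc/GMT+4").
local_tz = timezone(timedelta(hours=-4))

local_dt = datetime(2015, 6, 15, 17, 52, 3, tzinfo=local_tz)
utc_dt = local_dt.astimezone(timezone.utc)

print(local_dt)  # 2015-06-15 17:52:03-04:00
print(utc_dt)    # 2015-06-15 21:52:03+00:00
```

The UTC wall-clock time comes out four hours later than the local one, while both objects still describe the same instant.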
Improvements
You can simplify and clarify what you're trying to do here with a few changes. It makes it a bit more "Pythonic" as well.
input_format = '%a %b %d %H:%M:%S %Y' # Change 1
converted_entries = [] # Change 2
for entry in entries: # Change 3
    local_date = datetime.strptime(entry, input_format) # Change 1 (continued)
    # Change 4
    localized_date = local.localize(local_date)
    utc_date = localized_date.astimezone(pytz.utc)
    converted_entries.append(utc_date)
utdate = list(map(str, converted_entries))
print(utdate)
Changes
1. Use a strftime/strptime format string. strptime is designed to parse strings and strftime to format them, so ordinarily regular expressions shouldn't be needed to process them first. The same goes for output formats: if a specific format is needed that isn't provided by a built-in method like datetime.isoformat, use a format string.
2. In Python there's no need to initialize a list to a given length ahead of time (or fill it with None). list_var = [] or list_var = list() will give you an empty list that grows on demand.
3. Typically it's best and simplest to just iterate over a list, rather than jump through hoops to get a loop counter. It's more readable, and ultimately less to remember.
If you do need a counter, use enumerate, e.g.: for i, entry in enumerate(entries):
4. Use scoped variables. Temporary values like localdate and local_dt can just be kept inside the for loop. Storing every intermediate value in its own list technically wastes memory, but more importantly, keeping them local makes the code simpler and more encapsulated.
If the values are needed later, then do what I've done with the converted_entries list: initialize it outside the loop, then append the value to the list each time through.
No need for counter variables:
localized_dates = []
for # omitted ...
    localized_date = local.localize(local_date)
    localized_dates.append(localized_date)
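Putting the changes together, a version of the loop might look like the sketch below. It is an illustration under assumptions, not a drop-in replacement: the function name to_utc and the sample entries are hypothetical, and a fixed UTC-04:00 offset from the standard library stands in for the pytz zone so the example runs without extra packages.

```python
from datetime import datetime, timedelta, timezone

INPUT_FORMAT = "%a %b %d %H:%M:%S %Y"

def to_utc(entries, local_tz):
    """Parse each entry as local time in local_tz and convert it to UTC."""
    converted = []
    for entry in entries:
        # replace(tzinfo=...) is fine for a fixed-offset datetime.timezone;
        # with a pytz zone you would call local_tz.localize(...) instead.
        local_dt = datetime.strptime(entry, INPUT_FORMAT).replace(tzinfo=local_tz)
        converted.append(local_dt.astimezone(timezone.utc))
    return converted

# Hypothetical sample data.
entries = ["Mon Jun 15 17:52:03 2015", "Tue Jun 16 08:00:00 2015"]
utc_dates = to_utc(entries, timezone(timedelta(hours=-4)))

# enumerate supplies a counter only where one is actually needed.
for i, utc_dt in enumerate(utc_dates):
    print(i, utc_dt)
```

Note that %a and %b are matched against the current locale's day and month names, so the sample strings assume an English (or C) locale.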
I hope that's helpful for you. The beauty of Python is that it can be pretty simple, so just embrace it 😀

Related

How to retrieve strptime model with string and datetime object available?

Suppose I have a large set of strings I want to parse into a set of datetime objects. I could use dateutil.parser and iterate through the set, but that is more computationally intensive and takes longer than parsing one string, retrieving the strptime format that was applied, and then just doing datetime.strptime(string, model) for the rest.
I wanted to create a function, a bit like the following:
def retrieve_format(datetime_object, string):
    #do some things
    return model
with the model being a string.
I have found nothing that explains the inner workings of the dateutil parser, and I believe the developers have the ability to add such a feature.
Any idea how to do it? It would save time and computing power.
Example
Suppose I have a set of strings that are all formatted the same way as this one:
myStr = '27/03/2020 - 16:20'
I could do
myDate = dateutil.parser.parse(myStr)
and get 'myDate' as being
datetime.datetime(2020, 3, 27, 16, 20)
but now I could use my function as such
>>> model = retrieve_format(myDate, myStr)
>>> print(model)
%d/%m/%Y - %H:%M
I could then do
datetime_set = set()
for formatted_string in strings:  # strings: the collection of date strings
    raw = datetime.datetime.strptime(formatted_string, model)
    datetime_set.add(raw)
to treat all the other elements very efficiently.
Okay so thanks to snakecharmerb's comment on my question, I found this comment which uses the dateinfer library. Here, just the string is needed. Installation with pip is possible
pip install pydateinfer
A working example would be the following
import dateinfer
dateinfer.infer(['27/03/2020 - 16:20', '28/03/2020 - 14:56' ])
and the output is
'%d/%m/%Y - %H:%M'
The input is always a list, even if it contains only one element.
How many elements the list needs depends on how ambiguous the strings are. For example, '04/04/2020' on its own gives no way of telling the day from the month.
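The ambiguity point can be illustrated without any third-party library. The sketch below is not how dateinfer works internally; it simply tries a hypothetical list of candidate formats with datetime.strptime and keeps those that parse every string, which shows why a second, less ambiguous sample narrows the answer.

```python
from datetime import datetime

# Hypothetical candidate list; extend it with the formats you expect to see.
CANDIDATE_FORMATS = [
    "%d/%m/%Y - %H:%M",
    "%m/%d/%Y - %H:%M",
    "%Y-%m-%d %H:%M:%S",
]

def matching_formats(strings, candidates=CANDIDATE_FORMATS):
    """Return every candidate format that parses all of the given strings."""
    matches = []
    for fmt in candidates:
        try:
            for s in strings:
                datetime.strptime(s, fmt)
        except ValueError:
            continue  # at least one string does not fit this format
        matches.append(fmt)
    return matches

print(matching_formats(["04/04/2020 - 16:20"]))
print(matching_formats(["04/04/2020 - 16:20", "27/03/2020 - 16:20"]))
```

With only the first string, both the day-first and month-first formats survive; adding a string whose day is greater than 12 leaves a single candidate.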

Python library to return date format

I need to return the date format from a string. Currently I am using parser to parse a string as a date, then replacing the year with yyyy or yy, and similarly for the other date components. Is there some function I could use that would return mm-dd-yyyy when I send 12-05-2018?
Technically, it is an impossible question. If you send in 12-05-2018, there is no way for me to know whether you are sending in a mm-dd-yyyy (Dec 5, 2018) or dd-mm-yyyy (May 12, 2018).
One approach might be to do a regex replacement of anything which matches your expected date pattern, e.g.
import re
date = "Here is a date: 12-05-2018 and here is another one: 10-31-2010"
date_masked = re.sub(r'\b\d{2}-\d{2}-\d{4}\b', 'mm-dd-yyyy', date)
print(date)
print(date_masked)
Here is a date: 12-05-2018 and here is another one: 10-31-2010
Here is a date: mm-dd-yyyy and here is another one: mm-dd-yyyy
Of course, the above script makes no effort to check whether the dates are actually valid. If you require that, you may use one of the date libraries available in Python.
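If validity does matter, one option that needs only the standard library is to let datetime.strptime vet each match before masking it. This is a sketch under assumptions: the helper name mask_valid_dates is made up, and it assumes every date is in mm-dd-yyyy order.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")

def mask_valid_dates(text, date_format="%m-%d-%Y", mask="mm-dd-yyyy"):
    """Replace only those matches that strptime accepts under date_format."""
    def replace(match):
        try:
            datetime.strptime(match.group(0), date_format)
        except ValueError:
            return match.group(0)  # not a real date, leave it untouched
        return mask
    return DATE_PATTERN.sub(replace, text)

print(mask_valid_dates("Valid: 10-31-2010, invalid: 99-99-2010"))
```

Only the first date is masked; the second matches the regex but is rejected by strptime and is left as it was.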
I don't really understand what you plan to do with the format. There are two reasons I can think of why you might want it. (1) You want at some future point to convert a normalized datetime back into the original string. If that is what you want, you would be better off just storing the normalized datetime and the original string. Or (2) you want to draw (dodgy) conclusions about the person sending the data, because different nationalities tend to use different formats. But, whatever you want it for, you can do it this way:
from dateutil import parser
def get_date_format(date_input):
    date = parser.parse(date_input)
    for date_format in ("%m-%d-%Y", "%d-%m-%Y", "%Y-%m-%d"):
        # You can extend the list above to include formats with %y in addition to %Y, etc, etc
        if date.strftime(date_format) == date_input:
            return date_format
>>> date_input = "12-05-2018"
>>> get_date_format(date_input)
'%m-%d-%Y'
You mention in a comment you are prepared to make assumptions about ambiguous dates like 12-05-2018 (could be May or December) and 05-12-18 (could be 2018 or 2005). You can pass those assumptions to dateutil.parser.parse. It accepts boolean keyword parameters dayfirst and yearfirst which it will use in ambiguous cases.
Take a look at the datetime library. There you will find the function strptime(), which is exactly what you are looking for.
Here is the documentation: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

Is it possible to extract a format string (e.g. "YY-mm-DD HH:MM:SS.sss") from a python datetime object? [duplicate]

Here's an array of datetime values:
import numpy as np
import pandas as pd
array = np.array(['2016-05-01T00:00:59.3+10:00', '2016-05-01T00:02:59.4+10:00',
                  '2016-05-01T00:03:59.4+10:00', '2016-05-01T00:13:00.1+10:00',
                  '2016-05-01T00:22:00.5+10:00', '2016-05-01T00:31:01.1+10:00'],
                 dtype=object)
pd.to_datetime is very good at inferring datetime formats.
array = pd.to_datetime(array)
print(array)
DatetimeIndex(['2016-04-30 14:00:59.300000', '2016-04-30 14:02:59.400000',
'2016-04-30 14:03:59.400000', '2016-04-30 14:13:00.100000',
'2016-04-30 14:22:00.500000', '2016-04-30 14:31:01.100000'],
dtype='datetime64[ns]', freq=None)
How can I dynamically figure out what datetime format pd.to_datetime inferred? Something like: %Y-%m-%dT... (sorry, my datetime foo is really bad).
I don't think it's possible to do this in full generality in pandas.
As mentioned in other comments and answers, the internal function _guess_datetime_format is close to being what you ask for, but it has strict criteria for what constitutes a guessable format and so it will only work for a restricted class of datetime strings.
These criteria are set out in the _guess_datetime_format function on these lines and you can also see some examples of good and bad formats in the test_parsing script.
Some of the main points are:
year, month and day must each be present and identifiable
the year must have four digits
exactly six digits must be used if using microseconds
you can't specify a timezone
This means that it will fail to guess the format for datetime strings in the question despite them being a valid ISO 8601 format:
>>> from pandas.core.tools.datetimes import _guess_datetime_format_for_array
>>> array = np.array(['2016-05-01T00:00:59.3+10:00'])
>>> _guess_datetime_format_for_array(array)
# returns None
In this case, dropping the timezone and padding the microseconds to six digits is enough to make pandas recognise the format:
>>> array = np.array(['2016-05-01T00:00:59.300000']) # six digits, no tz
>>> _guess_datetime_format_for_array(array)
'%Y-%m-%dT%H:%M:%S.%f'
This is probably as good as it gets.
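The two adjustments described above (drop the timezone, pad the fractional seconds to six digits) can be done with a small helper before handing strings to a format guesser. This is a standard-library sketch with a hypothetical function name, and it deliberately throws the offset away, so it is only suitable when the offset is not needed or is handled separately.

```python
import re
from datetime import datetime

def normalise_for_guessing(value):
    """Drop a trailing UTC offset and pad fractional seconds to six digits."""
    # Remove a trailing 'Z' or an offset such as '+10:00' or '-0500'.
    value = re.sub(r"(Z|[+-]\d{2}:?\d{2})$", "", value)
    # Pad the fractional part, if any, with zeros up to microseconds.
    match = re.search(r"\.(\d{1,6})$", value)
    if match:
        value = value[:match.start(1)] + match.group(1).ljust(6, "0")
    return value

cleaned = normalise_for_guessing("2016-05-01T00:00:59.3+10:00")
print(cleaned)
print(datetime.strptime(cleaned, "%Y-%m-%dT%H:%M:%S.%f"))
```

The cleaned string then parses with the same '%Y-%m-%dT%H:%M:%S.%f' format that the guesser returns for the padded example.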
If pd.to_datetime is not asked to infer the format of the array, or given a format string to try, it will just try and parse each string separately and hope that it is successful. Crucially, it does not need to infer a format in advance to do this.
First, pandas parses the string assuming it is (approximately) an ISO 8601 format. This begins in a call to _string_to_dts and ultimately hits the low-level parse_iso_8601_datetime function that does the hard work.
You can check if your string is able to be parsed in this way using the _test_parse_iso8601 function. For example:
from pandas._libs.tslib import _test_parse_iso8601
def is_iso8601(string):
    try:
        _test_parse_iso8601(string)
        return True
    except ValueError:
        return False
The dates in the array you give are recognised as this format:
>>> is_iso8601('2016-05-01T00:00:59.3+10:00')
True
But this doesn't deliver what the question asks for and I don't see any realistic way to recover the exact format that is recognised by the parse_iso_8601_datetime function.
If parsing the string as an ISO 8601 format fails, pandas falls back to using the parse() function from the third-party dateutil library (called by parse_datetime_string). This allows a fantastic level of parsing flexibility but, again, I don't know of any good way to extract the recognised datetime format from this function.
If both of these two parsers fail, pandas either raises an error, ignores the string or defaults to NaT (depending on what the user specifies). No further attempt is made to parse the string or guess the format of the string.
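The overall shape of that fallback chain (strict ISO 8601 parse first, a more permissive parser second, a missing value last) can be imitated with the standard library. To be clear, this is an analogy and not pandas' implementation: datetime.fromisoformat stands in for the ISO parser, a hypothetical list of strptime formats stands in for dateutil, and None plays the role of NaT.

```python
from datetime import datetime

# Hypothetical fallback formats, standing in for a more flexible parser.
FALLBACK_FORMATS = ["%d/%m/%Y", "%Y/%m/%d"]

def parse_or_none(value):
    """Try ISO 8601 first, then the fallback formats; return None on failure."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        pass
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None  # plays the role of NaT in this sketch

print(parse_or_none("2016-05-01T00:00:59.300000+10:00"))
print(parse_or_none("27/03/2020"))
print(parse_or_none("random text"))
```

As in pandas, nothing in this chain records which format succeeded, which is why the format cannot be recovered afterwards without extra bookkeeping.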
The DateInfer (PyDateInfer) library infers a date format from a sequence of date strings:
github.com/wdm0006/dateinfer
Usage from docs:
>>> import dateinfer
>>> dateinfer.infer(['Mon Jan 13 09:52:52 MST 2014', 'Tue Jan 21 15:30:00 EST 2014'])
'%a %b %d %H:%M:%S %Z %Y'
>>>
Disclaimer: I have used and then contributed to this library
You can use _guess_datetime_format from core.tools to get the format. ie
from pandas.core.tools import datetimes as tools
tools._guess_datetime_format(pd.to_datetime(array).format()[0][:10])
Output :
'%Y-%m-%d'
To know more about this method you can see here. Hope it helps.

Parse unformatted dates in Python

I have some text, taken from different websites, that I want to extract dates from. As one can imagine, the dates vary substantially in how they are formatted, and look something like:
Posted: 10/01/2014
Published on August 1st 2014
Last modified on 5th of July 2014
Posted by Dave on 10-01-14
What I want to know is if anyone knows of a Python library [or API] which would help with this (other than e.g. regex, which will be my fallback). I could probably remove the "posted on" parts relatively easily, but getting the other stuff consistent does not look easy.
My solution using dateutil
Following Lukas's suggestion, I used the dateutil package (seemed far more flexible than Arrow), using the Fuzzy entry, which basically ignores things which are not dates.
Caution on Fuzzy parsing using dateutil
The main thing to note is that, as described in the thread "Trouble in parsing date using dateutil", if the parser is unable to find a day, month, or year it fills in a default value (the current day, unless specified), and as far as I can tell no flag is reported to indicate that it used the default.
This would result in "random text" returning today's date of 2015-4-16 which could have caused problems.
Solution
Since I really want to know when it fails, rather than have the date filled in with a default value, I ended up running the parser twice with different defaults and checking whether a field took the default in both runs. If not, I assumed it had been parsed correctly.
from datetime import datetime
from dateutil.parser import parse
def extract_date(text):
    date = {}
    # Parse twice with different defaults; a field that equals the default
    # in both runs was not found in the text.
    date_1 = parse(text, fuzzy=True, default=datetime(2001, 1, 1))
    date_2 = parse(text, fuzzy=True, default=datetime(2002, 2, 2))
    if date_1.day == 1 and date_2.day == 2:
        date["day"] = "XX"
    else:
        date["day"] = date_1.day
    if date_1.month == 1 and date_2.month == 2:
        date["month"] = "XX"
    else:
        date["month"] = date_1.month
    if date_1.year == 2001 and date_2.year == 2002:
        date["year"] = "XXXX"
    else:
        date["year"] = date_1.year
    return date
print(extract_date("Posted: by dave August 1st"))
Obviously this is a bit of a botch (so if anyone has a more elegant solution, please share), but it correctly parsed the four examples I had above [assuming US format for the date 10/01/2014 rather than UK format], and returned XX appropriately when data was missing.
You could use Arrow library:
arrow.get('2013-05-05 12:30:45', ['MM/DD/YYYY', 'YYYY-MM-DD HH:mm:ss'])
Two arguments, first a str to parse and second a list of formats to try.

Elegant way to adjust date timezones in Python

I'm based in the UK, and grappling with summer time BST and timezones.
Here's my code:
import datetime
TIME_OFFSET = 1 # 0 for GMT, 1 for BST
def RFC3339_to_localHHMM(input):
    # Take an XML date (2013-04-08T22:35:00Z)
    # return e.g. 08/04 23:35
    return (datetime.datetime.strptime(input, '%Y-%m-%dT%H:%M:%SZ') +
            datetime.timedelta(hours=TIME_OFFSET)).strftime('%d/%m %H:%M')
Setting a variable like this feels very wrong, but I can't find any elegant way to achieve the above without hideous amounts of code. Am I missing something, and is there no way to (for example) read the system timezone?
To convert UTC to given timezone:
from datetime import datetime
import pytz
local_tz = pytz.timezone("Europe/London") # time zone name from Olson database
def utc_to_local(utc_dt):
    return utc_dt.replace(tzinfo=pytz.utc).astimezone(local_tz)
rfc3339s = "2013-04-08T22:35:00Z"
utc_dt = datetime.strptime(rfc3339s, '%Y-%m-%dT%H:%M:%SZ')
local_dt = utc_to_local(utc_dt)
print(local_dt.strftime('%d/%m %H:%M')) # -> 08/04 23:35
See also How to convert a python utc datetime to a local datetime using only python standard library?.
You seem to be asking a few separate questions here.
First, if you only care about your own machine's current local timezone, you don't need to know what it is. Just use the local-to-UTC functions. There are a few holes in the API, but even if you can't find the function you need, you can always just get from local to UTC or vice-versa by going through the POSIX timestamp and the fromtimestamp and utcfromtimestamp methods.
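As a rough sketch of the timestamp route, using only the standard library: the helper names below are made up, and datetime.fromtimestamp with tz=timezone.utc is used in place of utcfromtimestamp, which recent Python versions deprecate. Both helpers work on naive datetimes and ignore microseconds.

```python
import calendar
import time
from datetime import datetime, timezone

def utc_naive_to_local_naive(utc_dt):
    """Interpret a naive datetime as UTC and return naive system-local time."""
    timestamp = calendar.timegm(utc_dt.timetuple())  # seconds since the epoch
    return datetime.fromtimestamp(timestamp)         # uses the system time zone

def local_naive_to_utc_naive(local_dt):
    """Interpret a naive datetime as system-local time and return naive UTC."""
    timestamp = time.mktime(local_dt.timetuple())
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).replace(tzinfo=None)

utc_dt = datetime(2013, 4, 8, 22, 35)
local_dt = utc_naive_to_local_naive(utc_dt)
print(local_dt)  # depends on the machine's time zone setting
```

Converting the result back with local_naive_to_utc_naive(local_dt) returns the original value, except for local times that fall inside a DST changeover, where the local wall-clock time is ambiguous or does not exist.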
If you want to be able to deal with any timezone, see the top of the docs for the difference between aware and naive objects, but basically: an aware object is one that knows its timezone. So, that's what you need. The problem is that, as the docs say:
Note that no concrete tzinfo classes are supplied by the datetime module. Supporting timezones at whatever level of detail is required is up to the application.
The easiest way to support timezones is to install and use the third-party library pytz.
Meanwhile, as strftime() and strptime() Behavior sort-of explains, strptime always returns a naive object. You then have to call replace and/or astimezone (depending on whether the string was a UTC time or a local time) to get an aware object imbued with the right timezone.
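A short illustration of the naive/aware distinction, using only the standard library. The fixed +01:00 offset labelled "BST" is an assumption made so the sketch runs without a time zone database; unlike a real zone it knows nothing about when summer time starts or ends.

```python
from datetime import datetime, timedelta, timezone

parsed = datetime.strptime("2013-04-08T22:35:00Z", "%Y-%m-%dT%H:%M:%SZ")
print(parsed.tzinfo)  # None: the literal 'Z' is matched as text only

# The string is known to be UTC, so attach that fact with replace().
aware_utc = parsed.replace(tzinfo=timezone.utc)

# Assumption: a fixed +01:00 offset standing in for British Summer Time.
bst = timezone(timedelta(hours=1), "BST")
local = aware_utc.astimezone(bst)

print(local.strftime("%d/%m %H:%M"))  # 08/04 23:35
```

replace() only relabels the value, while astimezone() shifts the wall-clock time so that the instant stays the same.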
But, even with all this, you still need to know what local timezone you're in, which means you still need a constant. In other words:
import datetime
import pytz
TIMEZONE = pytz.timezone('Europe/London')
def RFC3339_to_localHHMM(input):
    # Take an XML date (2013-04-08T22:35:00Z)
    # return e.g. 08/04 23:35
    utc_naive = datetime.datetime.strptime(input, '%Y-%m-%dT%H:%M:%SZ')
    utc = utc_naive.replace(tzinfo=pytz.utc)
    bst = utc.astimezone(TIMEZONE)
    return bst.strftime('%d/%m %H:%M')
So, how do you get the OS to give you the local timezone? Well, that's different for different platforms, and Python has nothing built in to help. But there are a few different third-party libraries that do, such as dateutil. For example:
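import datetime
import dateutil.tz
def RFC3339_to_localHHMM(input):
    # Take an XML date (2013-04-08T22:35:00Z)
    # return e.g. 08/04 23:35
    utc_naive = datetime.datetime.strptime(input, '%Y-%m-%dT%H:%M:%SZ')
    # strptime returns a naive object, so mark it as UTC before converting
    utc = utc_naive.replace(tzinfo=dateutil.tz.tzutc())
    bst = utc.astimezone(dateutil.tz.tzlocal())
    return bst.strftime('%d/%m %H:%M')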
But now we've come full circle. If all you wanted was the local timezone, you didn't really need the timezone at all (at least for your simple use case). So, this is only necessary if you need to support any timezone, and also want to be able to, e.g., default to your local timezone (without having to write two copies of all of your code for the aware and naive cases).
(Also, if you're going to use dateutil in the first place, you might want to use it for more than just getting the timezone; it can basically replace everything you're doing with both datetime and pytz.)
Of course there are other options besides these libraries—search PyPI, Google, and/or the ActiveState recipes.
If you want to convert a UTC input into a local time, regardless of which timezone you're in, try this:
import time
def utctolocal(input):
    if time.localtime()[-1] == 1: st=3600
    else: st=0
    return time.localtime(time.time()-time.mktime(time.gmtime())+time.mktime(time.localtime(time.mktime(time.strptime(input, '%Y-%m-%dT%H:%M:%SZ'))))+st)
Quite long code, but what it does is it simply adds the difference between time.gmtime() and time.localtime() to the time tuple created from the input.
Here's a function I use to do what I think you want. This assumes that the input is really a gmt, or more precisely, a utc datetime object:
import calendar
import datetime
import time
def utc_to_local(utc_dt):
    '''Converts a utc datetime obj to local datetime obj.'''
    t = utc_dt.timetuple()
    secs = calendar.timegm(t)
    loc = time.localtime(secs)
    return datetime.datetime.fromtimestamp(time.mktime(loc))
Like you said, this relies on the system time zone, which may give you shaky results, as some of the comments have pointed out. It has worked perfectly for me on Windows, however.
A simple function to check whether a UTC datetime falls in BST or GMT in London (for setting TIME_OFFSET above):
import datetime
def is_BST(input_date):
    if input_date.month in range(4,9):
        return True
    if input_date.month in [11,12,1,2]:
        return False
    # Find start and end dates for current year
    current_year = input_date.year
    for day in range(25,32):
        if datetime.datetime(current_year,3,day).weekday()==6:
            BST_start = datetime.datetime(current_year,3,day,1)
        if datetime.datetime(current_year,10,day).weekday()==6:
            BST_end = datetime.datetime(current_year,10,day,1)
    if (input_date > BST_start) and (input_date < BST_end):
        return True
    return False
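The same rule can be written more compactly by computing the last Sunday directly. This sketch assumes the current UK rule (summer time runs from 01:00 UTC on the last Sunday of March to 01:00 UTC on the last Sunday of October) and a naive datetime in UTC as input; the function names are illustrative.

```python
from datetime import datetime, timedelta

def last_sunday(year, month):
    """Return 01:00 on the last Sunday of a 31-day month (March or October)."""
    day = datetime(year, month, 31, 1)
    # weekday() is 6 for Sunday; step back to the Sunday on or before the 31st.
    return day - timedelta(days=(day.weekday() + 1) % 7)

def is_bst(utc_dt):
    """True if the naive UTC datetime falls within British Summer Time."""
    start = last_sunday(utc_dt.year, 3)
    end = last_sunday(utc_dt.year, 10)
    return start <= utc_dt < end

print(is_bst(datetime(2013, 4, 8, 22, 35)))
print(is_bst(datetime(2013, 1, 1)))
```

For anything beyond a quick check, a time zone library with the Europe/London rules is the safer choice, since it also covers historical rule changes.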
