I am trying to parse the mailing list of Apache Pig. I use the following function while parsing the dates.
from datetime import datetime
def str_to_date(date_str):
    # First, remove the (UTC) type of parts at the end
    try:
        date_str = date_str[: date_str.index("(") - 1]
    except ValueError:
        pass
    # Then, try different date formats
    for date_format in [
        "%a, %d %b %Y %H:%M:%S %z",
        "%a %b %d %Y %H:%M:%S %z",
        "%a %b %d %H:%M:%S %Y %z",
        "%d %b %Y %H:%M:%S %z",
    ]:
        try:
            return datetime.strptime(date_str, date_format)
        except ValueError:
            pass
    raise ValueError("No valid date format found for {}".format(date_str))
For 201201.mbox, the following error is raised:
ValueError: No valid date format found for Fri, 20 Jan 2012 16:31:14 +0580
When I inspected the mbox, I found that it contains the line Date: Fri, 20 Jan 2012 16:31:14 +0580. It does not match any of the date formats in the function, because %z expects "a 5-character string of the form +HHMM or -HHMM, where HH is a 2-digit string giving the number of UTC offset hours, and MM is a 2-digit string giving the number of UTC offset minutes" (docs), and 80 is not a valid number of minutes.
According to the mbox, the offset of the mail date is +0580, which means plus 5 hours and 80 minutes. Isn't that wrong? Or am I missing something?
There are only 60 minutes in an hour, so MM can't be more than 59. +0580 should be +0620.
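If the goal is to keep parsing the archive despite such malformed headers, one option is to normalize the offset before calling strptime. The following is only a minimal sketch, not part of the original function: the helper name normalize_offset is hypothetical, and it assumes that carrying the excess minutes into the hours (+0580 becomes +0620) is the intended reading. The sender's mail client may just as well have mistyped a different offset, so treat the result with care.

```python
import re
from datetime import datetime, timedelta

# Trailing UTC offset of the form +HHMM or -HHMM
OFFSET_RE = re.compile(r"([+-])(\d{2})(\d{2})$")


def normalize_offset(date_str):
    """Hypothetical helper: carry offset minutes >= 60 over into the hours."""
    match = OFFSET_RE.search(date_str)
    if match is None:
        return date_str  # no trailing offset, leave the string alone
    sign = match.group(1)
    hours = int(match.group(2))
    minutes = int(match.group(3))
    # Assumption: 80 minutes means 1 hour and 20 minutes
    hours, minutes = hours + minutes // 60, minutes % 60
    return "{}{}{:02d}{:02d}".format(date_str[: match.start()], sign, hours, minutes)


fixed = normalize_offset("Fri, 20 Jan 2012 16:31:14 +0580")
parsed = datetime.strptime(fixed, "%a, %d %b %Y %H:%M:%S %z")
print(fixed)               # Fri, 20 Jan 2012 16:31:14 +0620
print(parsed.utcoffset())  # 6:20:00
```

The normalized string could then be passed to str_to_date as before.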
Related
I'm trying to convert a string to datetime and keep getting the error: ValueError: time data 'Mon, 22 Apr 2019 17:04:38 +0200 (CEST)' does not match format '%a, %d %b %Y %H:%M:%S %z %Z'
from datetime import datetime
s = "Mon, 22 Apr 2019 17:04:38 +0200 (CEST)"
d = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %z %Z')
What am I missing?
%Z is generally used when formatting a datetime as a string. In any case, it stands for the name of the time zone, not the offset; the offset is %z.
The rest of your code is valid, however:
s = "Mon, 22 Apr 2019 17:04:38 +0200"
d = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %z')
With %Z, strptime can only parse UTC, GMT and whatever local time zone names are listed in time.tzname. It can't match (CEST) because it doesn't know what time zone that is (it would also be redundant, because you already defined the time zone using the offset +0200).
To parse (CEST) from a string into a time zone object, you will need to implement your own datetime.tzinfo subclass or import an external library like pytz or pendulum.
Also, don't forget to include the parentheses () in your format string.
This code runs; however, as far as I can tell 'CEST' is only matched as literal text, so it is not kept in the resulting datetime.
from datetime import datetime
tz = 'CEST'
s = "Mon, 22 Apr 2019 17:04:38 +0200 " + tz
d = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %z ' + tz)
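Since the offset +0200 already carries the time zone information, another way to handle the trailing (CEST) is to drop it before parsing. The sketch below is an illustration rather than anything from the answers above: option 1 strips a trailing parenthesized comment with a regular expression (the pattern is my own assumption about what such comments look like), and option 2 uses email.utils.parsedate_to_datetime from the standard library, which is designed for e-mail style date headers like this one.

```python
import re
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime

s = "Mon, 22 Apr 2019 17:04:38 +0200 (CEST)"

# Option 1: remove a trailing "(...)" comment, then let %z parse the offset
stripped = re.sub(r"\s*\([^)]*\)\s*$", "", s)
d1 = datetime.strptime(stripped, "%a, %d %b %Y %H:%M:%S %z")

# Option 2: the standard library's e-mail date parser
d2 = parsedate_to_datetime(s)

print(d1.isoformat())  # 2019-04-22T17:04:38+02:00
print(d1 == d2)
```

Both results are timezone-aware datetimes built from the numeric offset; the name CEST itself is not preserved in either case.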
When using
import datetime
s = 'Sat Apr 23 2016 00:00:00 GMT+0100'
print(datetime.datetime.strptime(s, "%a %m %d %y %H:%M:%S GMT+0100"))
I get:
ValueError: time data 'Sat Apr 23 2016 00:00:00 GMT+0100' does not match format '%a %m %d %y %H:%M:%S GMT+0100'
How to parse such a string?
Note: Using dateutil.parser.parse didn't work: it produced a datetime object that I could not subtract from another datetime, i.e. d1 - d2 didn't work.
According to this reference, the format should be "%a %b %d %Y %H:%M:%S GMT+0100".
Use this format string instead: "%a %b %d %Y %H:%M:%S GMT+0100".
I made two changes:
Replaced %m (Month as a zero-padded decimal number) with %b (Month as locale’s abbreviated name)
Replaced %y (Year without century as a zero-padded decimal number) with %Y (Year with century as a decimal number)
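Hardcoding GMT+0100 in the format string only matches that exact offset and produces a naive datetime. As a hedged alternative (Python 3 only, and not what the answers above propose), the offset can be captured with %z after the literal GMT. The second input string is made up for the example. My guess, and it is only a guess, is that the subtraction problem mentioned in the question came from mixing an aware datetime with a naive one, which raises TypeError; two aware datetimes subtract fine.

```python
from datetime import datetime, timedelta

fmt = "%a %b %d %Y %H:%M:%S GMT%z"  # literal "GMT" followed by the offset

d1 = datetime.strptime("Sat Apr 23 2016 00:00:00 GMT+0100", fmt)
d2 = datetime.strptime("Sun Apr 24 2016 00:00:00 GMT+0000", fmt)  # made-up second value

print(d1.utcoffset())  # 1:00:00
print(d2 - d1)         # 1 day, 1:00:00
```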
I have found a question at this link that almost answers what I need, but not quite. What I need to know is how, using this method, I could convert a string of the format u'Saturday, Feb 27 2016' into a Python date in the format 27/02/2016.
Thanks
You have to first remove the weekday name (it's not much use anyway) and parse the rest:
datetime.datetime.strptime('Saturday, Feb 27 2016'.split(', ', 1)[1], '%b %d %Y').date()
Alternatively, use dateutil:
dateutil.parser.parse('Saturday, Feb 27 2016').date()
EDIT
My mistake, you don't need to remove the Weekday (I'd missed it in the list of options):
datetime.datetime.strptime('Saturday, Feb 27 2016', '%A, %b %d %Y').date()
You don't have to remove anything, you can parse it as is and use strftime to get the format you want:
from datetime import datetime
s = u'Saturday, Feb 27 2016'
dt = datetime.strptime(s,"%A, %b %d %Y")
print(dt)
print(dt.strftime("%d/%m/%Y"))
2016-02-27 00:00:00
27/02/2016
%A Locale’s full weekday name.
%b Locale’s abbreviated month name.
%d Day of the month as a decimal number [01,31].
%Y Year with century as a decimal number.
The full list of directives is in the Python documentation on strftime() and strptime() behavior.
I'm trying to convert a string given in "DD MM YYYY" format into a datetime object. Here's the code for the same:
import datetime
s = "23 July 2001"
d = datetime.datetime.strptime(s, "%d %m %Y")
However, I get the following error:
ValueError: time data '23 July 2001' does not match format '%d %m %Y'
What's wrong? Isn't the format of the string the same as that specified by "%d %m %Y"?
%m means "Month as a zero-padded decimal number."
Your month is July so you should use %B, which is "Month as locale’s full name."
Reference: https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
As behzad.nouri said, use d = datetime.datetime.strptime(s, "%d %B %Y").
Or make s = '23 07 2001' and d = datetime.datetime.strptime(s, "%d %m %Y")
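To make the difference between the three month directives concrete, here is a small sketch that parses the same date written three ways. It assumes an English (or C) locale, since %B and %b match locale-dependent month names; the variable names are only illustrative.

```python
from datetime import datetime

d_full = datetime.strptime("23 July 2001", "%d %B %Y")  # full month name
d_abbr = datetime.strptime("23 Jul 2001", "%d %b %Y")   # abbreviated month name
d_num = datetime.strptime("23 07 2001", "%d %m %Y")     # zero-padded month number

print(d_full.date())  # 2001-07-23
```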
I have these date strings:
Fri Oct 7 16:00:09 CEST 2011
I want to convert them to UTC. I have tried with this implementation:
def LocalToUtc(localtime):
    return datetime.strptime(localtime, "%a %m %d %H:%M:%S %Z %Y").isoformat() + 'Z'
But I get a ValueError:
ValueError: time data 'Fri Oct 7 16:00:09 CEST 2011' does not match format '%a %m %d %H:%M:%S %Z %Y'
Any ideas?
Use the parsedatetime library.
There are two problems here:
You're using "%m" instead of "%b"
The standard library can't parse "CEST"; it understands only very few time zone names.
See also here: What possible values does datetime.strptime() accept for %Z?
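If you would rather stay within the standard library, both problems can be worked around by pulling the abbreviation out of the string and looking up its offset yourself. This is a minimal sketch under stated assumptions: the TZ_OFFSETS table and the function name local_to_utc are hypothetical, the table covers only CET and CEST, and the input is assumed to have exactly the six-field layout shown in the question. Abbreviations are ambiguous in general, so a hand-written table like this only makes sense when you know which zones appear in your data.

```python
from datetime import datetime, timedelta, timezone

# Assumed lookup table (not from the original post); extend it for your data
TZ_OFFSETS = {
    "CET": timedelta(hours=1),
    "CEST": timedelta(hours=2),
}


def local_to_utc(date_str):
    # Expected layout: weekday, month, day, time, zone abbreviation, year
    parts = date_str.split()
    tz_name = parts.pop(4)
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=timezone(TZ_OFFSETS[tz_name], tz_name))
    return aware.astimezone(timezone.utc)


print(local_to_utc("Fri Oct 7 16:00:09 CEST 2011").isoformat())
# 2011-10-07T14:00:09+00:00
```

An unknown abbreviation raises KeyError here, which is deliberate: failing loudly seems safer than silently guessing an offset.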