I'm creating a web scraper and I'm running into an issue where the date the website gives me is of the form "Monday, January 1, 1991"
What's the best way to format this into a "MM-DD-YYYY" format? Should I split on the comma, pull out the month and convert it to a number, and then put the numbers together? Or is there some quicker way to do this?
Use the datetime module, using strptime to parse to a datetime object, then strftime to format as you need it:
from datetime import datetime
date = datetime.strptime("Monday, January 1, 1991", "%A, %B %d, %Y")
print(date.strftime("%m-%d-%Y"))
which outputs:
01-01-1991
For the record, any time you're considering rolling your own parser, the answer is almost always "Don't". Rolling your own parser is error-prone; if at all possible, look for an existing parser.
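As a rough sketch of how this might look inside a scraper, the parse and format steps can be wrapped in a small helper. The name `to_mm_dd_yyyy` is made up for illustration, and the sketch assumes an English locale, since %A and %B match locale-specific day and month names.

```python
from datetime import datetime

def to_mm_dd_yyyy(text):
    """Convert 'Monday, January 1, 1991' style text to 'MM-DD-YYYY'."""
    parsed = datetime.strptime(text.strip(), "%A, %B %d, %Y")
    return parsed.strftime("%m-%d-%Y")

print(to_mm_dd_yyyy("Monday, January 1, 1991"))

# Text that does not match the format raises ValueError rather than
# silently producing a wrong date.
try:
    to_mm_dd_yyyy("1 January 1991")
except ValueError as err:
    print("could not parse:", err)
```

Catching ValueError at the call site lets the scraper log or skip pages whose date layout has changed.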
Here is how the timestamp looks -
2015-07-17 06:01:51.066141+00:00
I'm looking around to convert this to unix date time.
datetime.strptime("2015-07-17 06:01:51.066141+00:00", "%Y-%m-%d %H:%M:%S.%f%z").strftime("%s")
ValueError: 'z' is a bad directive in format '%Y-%m-%d %H:%M:%S.%f%z'
throws error for me, probably because of wrong format being used.
PS: my virtualenv is on python 2.7
ideas please ?
Python 2.7's strptime() does not support the %z directive. You can either use Python 3.2+ or a third-party library such as dateutil.
For Python 2.7 use arrow:
import arrow
date_str = "2015-07-17 06:01:51.066141+00:00"
unix_time = arrow.get(date_str).timestamp
On PY3 (verified on 3.4), using only standard libs
The date string you show will not be parsed by the standard python datetime library since it has a colon in the timezone (see here). The colon can be easily removed since it's always in the same position (or use rfind to find its index starting from the right). Your simplest solution is:
import datetime
date_str = "2015-07-17 06:01:51.066141+00:00"
date_str_no_colon = date_str[:-3]+date_str[-2:] # remove last colon
dt_obj = datetime.datetime.strptime(date_str_no_colon, "%Y-%m-%d %H:%M:%S.%f%z")
unix_time = dt_obj.timestamp()
Note that arrow should still work with PY3, and is a better solution in general. You don't want to get into datetime parsing wars with python. It will win.
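If a newer interpreter is an option, a standard-library alternative is possible. The following is a minimal sketch that assumes Python 3.7 or later, where `datetime.fromisoformat` is available and reads the "+00:00" offset directly, so the colon does not have to be removed.

```python
from datetime import datetime

date_str = "2015-07-17 06:01:51.066141+00:00"

# fromisoformat (Python 3.7+) understands the "+00:00" offset as written.
dt_obj = datetime.fromisoformat(date_str)

# dt_obj is timezone-aware, so timestamp() does not depend on the local zone.
unix_time = dt_obj.timestamp()

print(dt_obj.tzinfo)
print(unix_time)
```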
The format string does not match the timestamp. You'll need to parse it by hand, use another library that supports the format (for example, dateutil.parser.parse, which parses your string directly without a format string), or produce the timestamp in another format. Before Python 3.7, the %z directive does not accept UTC offsets in the +/-HH:MM form (with a colon), even on versions that support %z.
As the source of the timestamp is django.DateTimeField maybe this question can help?
For converting to unix timestamp you seem to have to do some work since there does not seem to be a direct method for that:
(t - datetime.utcfromtimestamp(0)).total_seconds()
where t is the datetime you want to convert to a POSIX timestamp, assuming it is in UTC and has no tzinfo. If t is timezone-aware instead, attach a UTC tzinfo to the epoch datetime you subtract, as in the dateutil version below.
If you want to use dateutil.parser the complete solution would be:
(dateutil.parser.parse(timestamp) - datetime.utcfromtimestamp(0).replace(tzinfo=dateutil.tz.tzutc())).total_seconds()
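The aware-epoch subtraction can also be sketched with the standard library alone. This assumes Python 3, where `datetime.timezone` exists; `EPOCH` and `to_posix` are illustrative names, not part of any library.

```python
from datetime import datetime, timedelta, timezone

# An aware epoch: 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_posix(aware_dt):
    """Seconds since the epoch for a timezone-aware datetime."""
    return (aware_dt - EPOCH).total_seconds()

# The same instant expressed in UTC and in a +02:00 zone.
in_utc = datetime(2015, 7, 17, 6, 1, 51, 66141, tzinfo=timezone.utc)
in_plus_two = datetime(2015, 7, 17, 8, 1, 51, 66141,
                       tzinfo=timezone(timedelta(hours=2)))

# Both spellings of the instant map to the same POSIX timestamp.
print(to_posix(in_utc) == to_posix(in_plus_two))
```

Subtracting a naive datetime from an aware one raises TypeError, which is why the epoch carries a tzinfo here.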
In Python 2, strptime() has no support for UTC offsets such as +00:00.
So, you can make the conversion ignoring the timezone in the following way:
datetime.strptime("2015-07-17 06:01:51.066141", "%Y-%m-%d %H:%M:%S.%f").strftime("%s")
'1437102111'
Or, to avoid using %s, as suggested in the comments below:
from datetime import datetime
(datetime.strptime("2015-07-17 06:01:51.066141", "%Y-%m-%d %H:%M:%S.%f") - datetime(1970, 1, 1)).total_seconds()
1437112911.066141
Note that this version works on Python 2; you can also check solutions for other versions here.
Otherwise, you will have to use other libraries (django.utils or email.utils) that support timezones, or implement the timezone parsing on your own.
P.S. :
The strptime docs appear to list timezone support (%z), but in Python 2 it has not been implemented. Try:
datetime.strptime("2015-07-17 06:01:51.066141+00:00", "%Y-%m-%d %H:%M:%S.%f%z").strftime("%s")
and you will see that it is not supported. You can also verify this by reading more about strptime().
There are two parts:
to convert "2015-07-17 06:01:51.066141+00:00" into a datetime object that represents UTC time, see Convert timestamps with offset to datetime obj using strptime. Or, if you know that the UTC offset is always +00:00:
from datetime import datetime
utc_time = datetime.strptime(time_string, "%Y-%m-%d %H:%M:%S.%f+00:00")
to convert the UTC time to POSIX timestamp (unix time), see Converting datetime.date to UTC timestamp in Python:
from datetime import datetime
timestamp = (utc_time - datetime(1970, 1, 1)).total_seconds()
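Put together, the two parts might look like the sketch below. The function name `utc_string_to_posix` is hypothetical, and the approach only holds under the stated assumption that the offset is always the literal text "+00:00".

```python
from datetime import datetime

def utc_string_to_posix(time_string):
    # Part 1: parse, treating the trailing "+00:00" as literal text.
    utc_time = datetime.strptime(time_string, "%Y-%m-%d %H:%M:%S.%f+00:00")
    # Part 2: subtract the naive epoch, which is also in UTC.
    return (utc_time - datetime(1970, 1, 1)).total_seconds()

print(utc_string_to_posix("2015-07-17 06:01:51.066141+00:00"))
```

A string with any other offset, such as "+02:00", fails to match the literal and raises ValueError, so a wrong assumption surfaces as an error instead of a shifted timestamp.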
I am making a news blog for my project in which I have to display time under the news title as x minutes ago etc.
From the RSS feed of the news, I have the time stamp in the form of a string. For example:
timestamp = 'Tue, 12 Feb 2013 07:43:09 GMT'
I am trying to find the difference between this timestamp and the present time using datetime module in python. But for some reason it gives error:
ValueError: astimezone() cannot be applied to a naive datetime
I'll appreciate if somebody can point me in the right direction. Below is my attempt in Python:
from datetime import datetime
from pytz import timezone
timestamp = 'Tue, 12 Feb 2013 07:43:09 GMT'
t = datetime.strptime(timestamp, '%a, %d %b %Y %I:%M:%S %Z')
now_time = datetime.now(timezone('US/Pacific'))
# converting the timestamp to Pacific time
t_pacific = t.astimezone(timezone('US/Pacific')) # get error here
diff = t_pacific - t
Thanks!
Prakhar
Your example had a few problems (I see you've fixed them now):
First line should be:
from datetime import datetime
It looks as though you're missing a closing parenthesis on the 4th line:
now_time = datetime.now(timezone('US/Pacific')
What is timezone()? Where does that come from?
You don't really need to mess with timezones, I don't think — just use GMT (UTC). How about something more like this:
from datetime import datetime
timestamp = 'Tue, 12 Feb 2013 07:43:09 GMT'
t = datetime.strptime(timestamp, '%a, %d %b %Y %H:%M:%S %Z')
t_now = datetime.utcnow()
diff = t_now - t
The problem here is that t does not have a timezone—that's what the error message means by "naive datetime". From the docs:
There are two kinds of date and time objects: “naive” and “aware”… An aware object has sufficient knowledge of… time zone and daylight savings time information… A naive object does not…
You can verify that it's naive by doing this:
print(t.tzinfo)
The answer will be None.
As the astimezone docs say:
self must be aware (self.tzinfo must not be None, and self.utcoffset() must not return None).
Unless the format includes a %z offset (supported on Python 3), the strptime function generates a naive datetime; %Z alone does not attach a tzinfo.
You can fix this in various ways:
First convert t to a GMT datetime instead of a naive one, and then your conversion to 'US/Pacific' will work.
As the docs say, "If you merely want to attach a time zone object tz to a datetime dt without adjustment of date and time data, use dt.replace(tzinfo=tz)." Since you know the time is in UTC, just replace the empty tz with UTC, and you've got an aware time.
Convert to PST with a mechanism other than astimezone, one which will assume UTC or which allows you to specify the source timezone.
There are various alternatives out there, but you're already using pytz, so see its documentation.
Convert now_time to UTC instead of converting t to PST.
The last one is probably the simplest and best for most use cases. Since you've only got now_time in PST because you explicitly asked for it that way, all you have to do is not do that (or explicitly ask for 'GMT' instead of 'US/Pacific'). Then you can just do your date arithmetic on UTC times.
If you need to display final results in PST, it's still often better to do the arithmetic in UTC, and convert at the end. (For example, you can have two times that are an hour apart, but have the same value in Pacific, because of 1am on DST day being repeated twice; that won't be an issue if you stay in UTC all the time.)
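To make the "stay in UTC" advice concrete, here is a minimal sketch for the "x minutes ago" display. It assumes Python 3 (for `datetime.timezone`), an English locale for %a and %b, and a feed that always reports GMT; `minutes_ago` is an illustrative name.

```python
from datetime import datetime, timezone

def minutes_ago(timestamp, now=None):
    """Whole minutes between an RSS-style GMT timestamp and `now`."""
    naive = datetime.strptime(timestamp, "%a, %d %b %Y %H:%M:%S %Z")
    # strptime returned a naive datetime; the feed says GMT, so label it UTC.
    published = naive.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - published).total_seconds() // 60)

# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2013, 2, 12, 8, 0, 9, tzinfo=timezone.utc)
print(minutes_ago("Tue, 12 Feb 2013 07:43:09 GMT", now=fixed_now), "minutes ago")
```

Both datetimes are aware and in UTC, so the subtraction never touches DST rules; a conversion to Pacific time would only be needed for displaying an absolute time.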
I am attempting to convert the following date (2012-12-25T08:52:00-05:00) to a datetime object in python. However, I cannot figure out what the -05:00 part of the date is referencing. I am simply trying to perform the following:
datetime.datetime.strptime('2012-12-25T08:52:00-05:00','%Y-%m-%dT%H:%M:%S')
But this comes up with an expected 'ValueError: unconverted data remains'. I'm just trying to figure out what the last part of the date is used for so that I can convert that string to a proper datetime object in python.
Happy Holidays!
Your date appears to be in ISO 8601 format. I don't think datetime's strptime handles the timezone information at the end of the string.
You can install python-dateutil (pip install python-dateutil); its parser returns a datetime object:
import dateutil.parser
datestr = '2012-12-25T08:52:00-05:00'
dateutil.parser.parse(datestr)
>>> datetime.datetime(2012, 12, 25, 8, 52, tzinfo=tzoffset(None, -18000))
The -05:00 indicates the timezone offset from UTC, so %z is the strptime directive that corresponds to it (on Python 3.7 and later, %z accepts the colon in the offset).
If the time is UTC, the offset might be indicated using Z, e.g. 2012-12-25T08:52:00Z. Python 3.7 and later accept this with %z as well and treat it as +00:00.
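For illustration, the sketch below parses both spellings of the offset with %z. It assumes Python 3.7 or later; earlier versions reject the colon and the Z.

```python
from datetime import datetime, timedelta

fmt = "%Y-%m-%dT%H:%M:%S%z"

with_offset = datetime.strptime("2012-12-25T08:52:00-05:00", fmt)
with_z = datetime.strptime("2012-12-25T08:52:00Z", fmt)

# utcoffset() exposes the parsed offset as a timedelta.
print(with_offset.utcoffset() == timedelta(hours=-5))
print(with_z.utcoffset() == timedelta(0))
```

Both results are timezone-aware datetime objects, so they can be compared or subtracted without first converting them by hand.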
I'm very new to python and trying to build a simple web app in pieces.
I'm using the datetime library for the first time so please be patient with me.
All I'm trying to do is to get and display the current time and date so that I can cross-reference it with a target time & date later.
I'm getting some colossal errors. Any help is appreciated. Not sure what I'm doing incorrectly here to display the time formatted the way I want.
from datetime import datetime
date_string = "4:21 PM 1.24.2011"
format = "%I.%M %p %m %d, %Y"
my_date = datetime.strptime(date_string, format)
print(my_date.strftime(format))
The format of the date_string doesn't match the format you're trying to parse it with. The following format string should allow you to parse the date.
format = "%I:%M %p %m.%d.%Y"
And afterwards, if you want to print it using the other format
print(my_date.strftime("%I.%M %p %m %d, %Y"))
You're using the wrong format string. Try replacing it with "%I:%M %p %m.%d.%Y".
Here's the documentation on how to use the datetime class properly.
The problem is with your format. You need to make the format match date_string. So try this:
format = "%I:%M %p %m.%d.%Y"
That should do the trick
Also, it might be of interest to you to take a look at time.asctime
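The answers above can be combined into one short sketch that keeps the parsing format and the display format separate. The names `parse_format` and `display_format` are illustrative (they also avoid shadowing the built-in `format`), and an English locale is assumed for %p.

```python
from datetime import datetime

date_string = "4:21 PM 1.24.2011"
parse_format = "%I:%M %p %m.%d.%Y"     # mirrors date_string exactly
display_format = "%I.%M %p %m %d, %Y"  # any layout is fine for output

my_date = datetime.strptime(date_string, parse_format)
print(my_date.strftime(display_format))

# Parsing with a format that does not mirror the input raises ValueError.
try:
    datetime.strptime(date_string, display_format)
except ValueError as err:
    print("mismatch:", err)
```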
I need to convert a date from a string (entered into a url) in the form of 12/09/2008-12:40:49. Obviously, I'll need a UNIX Timestamp at the end of it, but before I get that I need the Date object first.
How do I do this? I can't find any resources that show the date in that format? Thank you.
You need the strptime method. If you're on Python 2.5 or higher, this is a method on datetime, otherwise you have to use a combination of the time and datetime modules to achieve this.
Python 2.5 up:
from datetime import datetime
dt = datetime.strptime(s, "%d/%m/%Y-%H:%M:%S")
below 2.5:
from datetime import datetime
from time import strptime
dt = datetime(*strptime(s, "%d/%m/%Y-%H:%M:%S")[0:6])
You can use the time.strptime() method to parse a date string. This will return a struct_time that you can pass to time.mktime() (when the string represents a local time) or calendar.timegm() (when the string is a UTC time) to get the number of seconds since the epoch.
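A minimal sketch of both routes follows. It assumes the string is day/month/year, as in the answer above; whether it represents UTC or local time is something only the caller knows.

```python
import calendar
import time

s = "12/09/2008-12:40:49"
parsed = time.strptime(s, "%d/%m/%Y-%H:%M:%S")  # a time.struct_time

# Interpret the fields as UTC.
utc_seconds = calendar.timegm(parsed)

# Interpret the fields as local time; the result depends on the machine's zone.
local_seconds = time.mktime(parsed)

print(utc_seconds, local_seconds)
```

The two results differ by the local UTC offset, so picking the wrong function shifts the timestamp by that many hours.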