Convert datetime to UTC in different order - python

I am trying to convert a date in the format Thu Apr 26 22:51:49 PDT 2018 to UTC in the form 2018-04-26T22:51:49Z. I don't care about the day-of-week part, so it can be dropped.
type(results[0].create_date)
returns
<class 'str'>
So far I have tried this
print (datetime.strptime((results[0].create_date), "%Y-%m-%dT%H:%M:%SZ"))
but failing with this error
ValueError: time data 'Thu Apr 26 22:51:49 PDT 2018' does not match format '%Y-%m-%dT%H:%M:%SZ'

The reason your code doesn't work is because the %Y-%m-%dT%H:%M:%SZ string pattern does not match how the date is represented in the results[0].create_date variable. datetime.strptime attempts to match the given string to the format you specify to extract a datetime object. Amending the formatter may help here, but you may have difficulty with the PDT part.
I suggest using the dateutil.parser module. You can do the following:
from dateutil import parser
x = parser.parse(results[0].create_date)
print(x)
>>> 2018-04-26 22:51:49
This returns a datetime object and you can format however you want to include the 'T' and 'Z' as you suggested in your question.
NOTE: I have done this based on your given input and desired output. You must be aware however that Thu Apr 26 22:51:49 PDT 2018 in UTC is not equal to 2018-04-26T22:51:49Z as PDT is 7 hours behind UTC.
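To make the note above concrete, here is a minimal standard-library sketch of an actual conversion to UTC. It assumes the input always has the six-field layout shown in the question, that day and month names are in English, and that the abbreviations listed in the table are the only ones that occur. TZ_OFFSETS and to_utc_iso are illustrative names, not part of any library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only these abbreviations appear in the data.
# PDT is UTC-7 and PST is UTC-8.
TZ_OFFSETS = {
    "PDT": timezone(timedelta(hours=-7)),
    "PST": timezone(timedelta(hours=-8)),
}

def to_utc_iso(text):
    """Convert 'Thu Apr 26 22:51:49 PDT 2018' to an ISO 8601 UTC string."""
    parts = text.split()
    # Fields: weekday, month, day, time, timezone abbreviation, year
    abbrev = parts.pop(4)
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=TZ_OFFSETS[abbrev])
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_utc_iso("Thu Apr 26 22:51:49 PDT 2018"))
```

With the 7-hour shift applied, the result falls on the next calendar day in UTC, which is why simply reformatting the string does not give a true UTC timestamp.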

Related

Parse Date string to datetime with timezone

I have a string:
r = 'Thu Dec 17 08:56:41 CST 2020'
Here CST represent China central time('Asia/Shanghai'). I wanted to parse it to datetime...I am doing something like
from dateparser import parse
r1 = parse(r)
Which is giving me r1 as:
2020-12-17 08:56:41-06:00
And I am also doing this
r2 = r1.replace(tzinfo=pytz.timezone("Asia/Shanghai"))
And this is giving me r2 as:
2020-12-17 08:50:41+08:00
There is a 6-minute lag in r2. Can someone tell me why that is? And how do I correctly convert my raw string to the desired r2, which is:
2020-12-17 08:56:41 in Asia/Shanghai timezone
Thanks
Using dateutil.parser you can directly parse your date correctly.
Note that CST is an ambiguous timezone, so you need to specify which one you mean. You can either do this directly in the tzinfos parameter of the parse() call or you can define a dictionary that has mappings for timezones and pass this. In this dict, you can either specify the offset, e.g.
timezone_info = {
"CDT": -5 * 3600,
"CEST": 2 * 3600,
"CST": 8 * 3600
}
parser.parse(r, tzinfos=timezone_info)
or (using gettz) directly specify a timezone:
timezone_info = {
"CDT": gettz("America/Chicago"),
"CEST": gettz("Europe/Berlin"),
"CST": gettz("Asia/Shanghai")
}
parser.parse(r, tzinfos=timezone_info)
See also the dateutil.parser documentation and the answers to this SO question.
Be aware that the latter approach is tricky if you have a location with daylight saving time! Depending on the date you apply it to, gettz("America/Chicago") will have UTC-5 or UTC-6 as a result (as Chicago switches between Central Standard Time and Central Daylight Time). So depending on your input data, the second example may actually not really be correct and yield the wrong outcome! Currently, China observes China Standard Time (CST) all year, so for your use case it makes no difference (may depend on your date range though).
Overall:
from dateutil import parser
from dateutil.tz import gettz
timezone_info = {"CST": gettz("Asia/Shanghai")}
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)
print(d)
print(d.strftime('%Y-%m-%d %H:%M:%S %Z%z'))
gets you
2020-12-17 08:56:41+08:00
2020-12-17 08:56:41 CST+0800
EDIT: Printing the human-readable timezone name instead of the abbreviated one is a little more complicated with this approach, as dateutil.tz.gettz() gets you a tzfile that has no attribute holding just the name. However, you can obtain it from the protected _filename attribute using split():
print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + "/".join(d.tzinfo._filename.split('/')[-2:]))
yields
2020-12-17 08:56:41 in Asia/Shanghai
This of course only works if you used gettz() to set the timezone in the first place.
EDIT 2: If you know that all your dates are in CST anyway, you can also ignore the timezone when parsing. This gets you naive (or unaware) datetimes, to which you can later add a human-readable timezone. With dateutil, use replace() with gettz() as shown above. With the pytz module, use the localize() method of the object returned by timezone(), because passing a pytz timezone to replace() attaches the zone's local mean time offset (+08:06 for Asia/Shanghai) instead of +08:00:
from dateutil import parser
from dateutil.tz import gettz
import pytz
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, ignoretz=True)
d_dateutil = d.replace(tzinfo=gettz('Asia/Shanghai'))
# pytz zones must be attached with localize(), not replace()
d_pytz = pytz.timezone('Asia/Shanghai').localize(d)
Note that depending on which module you use to add the timezone information, the class of tzinfo differs. For the pytz object, there is a more direct way of accessing the timezone in human readable form:
print(type(d_dateutil.tzinfo))
print("/".join(d_dateutil.tzinfo._filename.split('/')[-2:]))
print(type(d_pytz.tzinfo))
print(d_pytz.tzinfo.zone)
produces
<class 'dateutil.tz.tz.tzfile'>
Asia/Shanghai
<class 'pytz.tzfile.Asia/Shanghai'>
Asia/Shanghai
from datetime import datetime
import pytz
# The datetime string you have
r = "Thu Dec 17 08:56:41 CST 2020"
# The time-zone string you want to use
offset_string = 'Asia/Shanghai'
# convert the time zone string into offset from UTC
# a. datetime.now(pytz.timezone(offset_string)).utcoffset().total_seconds() --- returns seconds offset from UTC
# b. convert seconds into hours (decimal) --- divide by 60 twice
# c. remove the decimal point, we want the structure as: +0800
offset_num_repr = '+{:05.2f}'.format(datetime.now(pytz.timezone(offset_string)).utcoffset().total_seconds()/60/60).replace('.', '')
print('Numeric representation of the offset: ', offset_num_repr)
# replace the CST 2020 with numeric timezone offset
# a. replace it with the offset computed above
updated_datetime = str(r).replace('CST', offset_num_repr)
print('\t Modified datetime string: ', updated_datetime)
# Now parse your string into datetime object
r = datetime.strptime(updated_datetime, "%a %b %d %H:%M:%S %z %Y")
print('\tFinal parsed datetime object: ', r)
Should produce:
Numeric representation of the offset: +0800
Modified datetime string: Thu Dec 17 08:56:41 +0800 2020
Final parsed datetime object: 2020-12-17 08:56:41+08:00
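One caveat about the offset formatting step above: the '{:05.2f}' trick happens to work for +8 hours, but it yields '+-500' for an offset of -5 hours and '+0550' for +5.5 hours. A minimal sketch of a formatter that works from the timedelta directly is shown below. format_offset is a hypothetical helper name, and the UTC+8 value is hard-coded here on the assumption that CST means China Standard Time in this data.

```python
from datetime import datetime, timedelta

def format_offset(offset):
    """Render a UTC offset (a timedelta) as +HHMM or -HHMM for strptime's %z."""
    total_minutes = int(offset.total_seconds() // 60)
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}{minutes:02d}"

# Assumption: CST in this data is China Standard Time, i.e. UTC+8.
offset_text = format_offset(timedelta(hours=8))
raw = "Thu Dec 17 08:56:41 CST 2020"
parsed = datetime.strptime(raw.replace("CST", offset_text), "%a %b %d %H:%M:%S %z %Y")

print(offset_text)
print(parsed)
```

The same helper can be fed the result of utcoffset() from any timezone-aware datetime, so it slots into the approach above without changing the rest of the steps.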

Python dateparser fails when timezone is in middle

I'm trying to parse a date string using the following code:
from dateutil.parser import parse
datestring = 'Thu Jul 25 15:13:16 GMT+06:00 2019'
d = parse(datestring)
print (d)
The parsed date is:
datetime.datetime(2019, 7, 25, 15, 13, 16, tzinfo=tzoffset(None, -21600))
As you can see, instead of adding 6 hours to GMT, it actually subtracted 6 hours.
What am I doing wrong here? How can I parse a date string in this format?
There's a comment in the source: https://github.com/dateutil/dateutil/blob/cbcc0871792e7eed4a42cc62630a08ec7a78be30/dateutil/parser/_parser.py#L803.
# Check for something like GMT+3, or BRST+3. Notice
# that it doesn't mean "I am 3 hours after GMT", but
# "my time +3 is GMT". If found, we reverse the
# logic so that timezone parsing code will get it
# right.
Important parts
Notice that it doesn't mean "I am 3 hours after GMT", but "my time +3 is GMT"
If found, we reverse the logic so that timezone parsing code will get it right
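The last sentence in that comment (and the 2nd bullet point above) explains why 6 hours are subtracted: the offset is stored as -06:00. Hence, Thu Jul 25 15:13:16 GMT+06:00 2019 is read as 15:13:16 in a zone 6 hours behind GMT, which corresponds to Thu Jul 25 21:13:16 2019 GMT.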
Take a look at http://www.timebie.com/tz/timediff.php?q1=Universal%20Time&q2=GMT%20+6%20Time for more context.
dateutil.parser.parse does not shift the clock time. It keeps 15:13:16 and attaches an offset. Because GMT+06:00 is read as "local time plus 6 hours is GMT", the attached offset is -06:00, so the result is 15:13:16-06:00.
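If the string is meant in the everyday sense (the local time is 6 hours ahead of GMT), one possible workaround is to rewrite the GMT+06:00 token into a plain numeric offset before parsing, so that the sign is taken literally. The sketch below uses only the standard library. parse_gmt_offset is a hypothetical helper name, and it assumes the fixed field layout from the question and English day and month names.

```python
import re
from datetime import datetime

def parse_gmt_offset(text):
    """Parse 'Thu Jul 25 15:13:16 GMT+06:00 2019', reading GMT+06:00 as UTC+6."""
    # Turn 'GMT+06:00' (or 'UTC+06:00') into '+0600', which %z reads literally.
    cleaned = re.sub(r"\b(?:GMT|UTC)([+-])(\d{2}):?(\d{2})\b", r"\1\2\3", text)
    return datetime.strptime(cleaned, "%a %b %d %H:%M:%S %z %Y")

d = parse_gmt_offset("Thu Jul 25 15:13:16 GMT+06:00 2019")
print(d)
print(d.utcoffset())
```

Whether this is the right reading depends on the system that produced the string, so it is worth checking a known timestamp from the data source before relying on it.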

Convert weekday name string into datetime

I have the following date (as an object format) : Tue 31 Jan in a pandas Series.
and I try to change it into : 31/01/2019
Please, how can I achieve this? I understand more or less that pandas can easily convert a date string when it is clearer (like 6/1/1930 22:00), but not in my case, where there is a weekday name.
Thank you for your help.
Concatenate the year and call pd.to_datetime with a custom format:
import pandas as pd
s = pd.Series(['Tue 31 Jan', 'Mon 20 Feb',])
pd.to_datetime(s + ' 2019', format='%a %d %b %Y')
0 2019-01-31
1 2019-02-20
dtype: datetime64[ns]
This is fine as long as all your dates follow this format. If that is not the case, this cannot be solved reliably.
More information on datetime formats at strftime.org.
Another option is using the 3rd party dateutil library:
import dateutil.parser
s.apply(dateutil.parser.parse)
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
This can be installed from PyPI with pip.
Another, slower option (but more flexible) is using the 3rd party datefinder library to sniff dates from string containing random text (if this is what you need):
import datefinder
s.apply(lambda x: next(datefinder.find_dates(x)))
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
You can install it from PyPI with pip.
Convert to a datetime object
If you wanted to use the datetime module, you could get the year by doing the following:
import datetime as dt
d = dt.datetime.strptime('Tue 31 Jan', '%a %d %b').replace(year=dt.datetime.now().year)
This is taking the date in your format, but replacing the default year 1900 with the current year in a reliable way.
This is similar to the other answers, but uses the builtin replace method as opposed to concatenating a string.
Output
To get the desired output from your new datetime object, you could perform the following:
>>> d.strftime('%d/%m/%Y')
'31/01/2018'
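One edge case worth knowing about: when the string has no year, strptime first builds the date in the default year 1900, which is not a leap year, so an input such as 29 Feb raises ValueError before replace() is ever reached. A minimal sketch that sidesteps this by appending the year before parsing follows. parse_with_year is a hypothetical helper name, and the years passed in are assumptions about the data.

```python
from datetime import datetime

def parse_with_year(text, year):
    """Parse strings like 'Tue 31 Jan' by appending an explicit year first."""
    return datetime.strptime(f"{text} {year}", "%a %d %b %Y")

print(parse_with_year("Tue 31 Jan", 2019).strftime("%d/%m/%Y"))
print(parse_with_year("Sat 29 Feb", 2020).strftime("%d/%m/%Y"))
```

Note that strptime does not check that the weekday name agrees with the resulting date, so a mismatched weekday in the input is silently accepted.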
Here are two alternative ways to achieve the same result.
Method 1: Using datetime module
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan', '%a %d %b')
print(datetime_object) # outputs 1900-01-31 00:00:00
If the string includes a year, as in Tue 31 Jan 2018, then this code works:
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan 2018', '%a %d %b %Y')
print(datetime_object) # outputs 2018-01-31 00:00:00
To print the resulting date in a format like 31/01/2019, you can use:
print(datetime_object.strftime("%d/%m/%Y")) # outputs 31/01/2018
Here are all the possible formatting options available with datetime object.
Method 2: Using dateutil.parser
This method automatically fills in the Year parameter with current year.
from dateutil import parser
string = "Tue 31 Jan"
date = parser.parse(string)
print(date) # outputs 2018-01-31 00:00:00

Python Date / Time Regular Expression

I am pretty new to regular expressions and it's pretty alien to me. I am parsing an XML feed which produces a date time as follows:
Wed, 23 July 2014 19:25:52 GMT
But I want to split these up so there are as follows:
date = 23/07/2014
time = 19/25/52
Where would I start? I have looked at a couple of other questions on SO and all of them deviate a bit from what I am trying to achieve.
Use datetime.strptime to parse the date from string and then format it using the strftime method of datetime objects:
>>> from datetime import datetime
>>> dt = datetime.strptime("Wed, 23 July 2014 19:25:52 GMT", "%a, %d %B %Y %H:%M:%S %Z")
>>> dt.strftime('%d/%m/%Y')
'23/07/2014'
>>> dt.strftime('%H/%M/%S')
'19/25/52'
But if you're okay with the ISO format you can call date and time methods:
>>> str(dt.date())
'2014-07-23'
>>> str(dt.time())
'19:25:52'

Getting Python to print pretty date strings?

How can I get Python to output
'Mon Jun 04'
'Tue Jun 05'
etc, for a week of given time?
ex
today = datetime.datetime.today()
### do some magic
days = ['Tue Jun 05',...]
What do I do with 'today' to generate the results? I'm not even sure if I'm using the right module; calendar seems to share similar features.
days = [today.strftime("%a %b %d"), ...]
We use strftime to take a datetime object and format it to a string
http://docs.python.org/library/datetime.html#strftime-and-strptime-behavior
Look at the strftime method of datetime objects.
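Putting the pieces together, a week of labels can be produced by adding a timedelta for each day and formatting the result. This is a minimal sketch: week_labels is a hypothetical helper name, the start date of 4 June 2012 is an assumption chosen so that the first label matches the 'Mon Jun 04' example in the question, and the day and month names assume an English locale.

```python
from datetime import date, timedelta

def week_labels(start):
    """Return seven strings like 'Mon Jun 04', one per day starting at `start`."""
    return [(start + timedelta(days=i)).strftime("%a %b %d") for i in range(7)]

# Assumption: an illustrative start date; use date.today() for the current week.
days = week_labels(date(2012, 6, 4))
print(days)
```

%d gives the zero-padded day of the month, which is what produces '04' rather than '4'.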
