Python dateparser fails when timezone is in middle - python

I'm trying to parse a date string using the following code:
from dateutil.parser import parse
datestring = 'Thu Jul 25 15:13:16 GMT+06:00 2019'
d = parse(datestring)
print (d)
The parsed date is:
datetime.datetime(2019, 7, 25, 15, 13, 16, tzinfo=tzoffset(None, -21600))
As you can see, instead of adding 6 hours to GMT, it actually subtracted 6 hours.
What am I doing wrong here? How can I parse a date string in this format?

There's a comment in the source: https://github.com/dateutil/dateutil/blob/cbcc0871792e7eed4a42cc62630a08ec7a78be30/dateutil/parser/_parser.py#L803.
# Check for something like GMT+3, or BRST+3. Notice
# that it doesn't mean "I am 3 hours after GMT", but
# "my time +3 is GMT". If found, we reverse the
# logic so that timezone parsing code will get it
# right.
Important parts
Notice that it doesn't mean "I am 3 hours after GMT", but "my time +3 is GMT"
If found, we reverse the logic so that timezone parsing code will get it right
The last sentence in that comment (the second bullet point above) explains why 6 hours are subtracted. Hence, dateutil reads Thu Jul 25 15:13:16 GMT+06:00 2019 as 15:13:16 at UTC-06:00, i.e. Thu Jul 25 21:13:16 2019 GMT.
Take a look at http://www.timebie.com/tz/timediff.php?q1=Universal%20Time&q2=GMT%20+6%20Time for more context.
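To make the two possible readings concrete, here is a minimal sketch. It uses only the standard library's datetime.timezone, not dateutil. It attaches each candidate offset to the same wall-clock time and converts both to UTC. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# The wall-clock part of 'Thu Jul 25 15:13:16 GMT+06:00 2019'
wall = datetime(2019, 7, 25, 15, 13, 16)

# POSIX-style reading (what dateutil does): "my time + 6h is GMT" -> UTC-06:00
posix_reading = wall.replace(tzinfo=timezone(timedelta(hours=-6)))

# ISO 8601-style reading (what the question expects): 6h ahead of GMT -> UTC+06:00
iso_reading = wall.replace(tzinfo=timezone(timedelta(hours=6)))

print(posix_reading.astimezone(timezone.utc))  # 2019-07-25 21:13:16+00:00
print(iso_reading.astimezone(timezone.utc))    # 2019-07-25 09:13:16+00:00
```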

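dateutil.parser.parse does not read GMT+06:00 as an ISO-style offset. It follows the convention described in that source comment, so the input is read as 15:13:16 in a zone where local time plus 6 hours equals GMT. Naturally, it becomes 15:13:16-06:00.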
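Suppose the intended meaning is "six hours ahead of GMT" (ISO 8601 style). One option is to bypass dateutil and use datetime.strptime from the standard library. Treat "GMT" as literal text so that %z reads "+06:00" directly. This sketch makes three assumptions: every input has exactly this layout, day and month names are in English (e.g. the C locale), and you are on Python 3.7+, which is needed for the colon in the %z offset.

```python
from datetime import datetime, timezone

datestring = 'Thu Jul 25 15:13:16 GMT+06:00 2019'

# "GMT" is matched as fixed text; %z then parses "+06:00" as an offset
# east of UTC (positive), which is the reading the question expects.
d = datetime.strptime(datestring, '%a %b %d %H:%M:%S GMT%z %Y')

print(d)                           # 2019-07-25 15:13:16+06:00
print(d.astimezone(timezone.utc))  # 2019-07-25 09:13:16+00:00
```

If you would rather stay with dateutil, you could try dropping the "GMT" prefix before parsing, for example with datestring.replace('GMT', ''). According to the source comment quoted above, the sign is only reversed when a zone name precedes it. Check this against your real inputs before relying on it.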

Related

How to format a timestamp in python to a readable format?

I have a slack bot that gets schedule information and prints the start and end times of the user's schedule. I am trying to format the timestamp so that it is a more readable format. Is there a way to format this response?
Here is the code that builds the response:
co = get_co(command.split(' ')[1])
start = get_schedule(co['group'])['schedule']['schedule_layers'][0]['start']
end = get_schedule(co['group'])['schedule']['schedule_layers'][0]['end']
response = 'Start: {} \n End: {}'.format(start,end)
The current time format is 2019-06-28T15:12:49-04:00, but I want it to be something more readable like Fri Jun 28 15:12:49 2019
You can use dateparser to parse the date time string easily.
import dateparser
date = dateparser.parse('2019-06-28T15:12:49-04:00') # parses the date time string
print(date.strftime('%a %b %d %H:%M:%S %Y'))
# Fri Jun 28 15:12:49 2019
To convert timestamps in other formats, you can use the datetime module (for example datetime.strptime with a matching format string), as described here:
https://stackabuse.com/converting-strings-to-datetime-in-python/
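As a standard-library alternative to dateparser, datetime.fromisoformat (Python 3.7+) can read this ISO 8601 string directly, offset included. The helper name readable and the end value below are hypothetical, standing in for the bot's own data. The expected output assumes English day/month names (e.g. the C locale).

```python
from datetime import datetime

def readable(timestamp):
    """Turn '2019-06-28T15:12:49-04:00' into 'Fri Jun 28 15:12:49 2019'.

    The wall-clock time is kept in the string's own UTC offset.
    """
    # fromisoformat (Python 3.7+) understands the trailing -04:00 offset
    parsed = datetime.fromisoformat(timestamp)
    # %M is minutes and %S is seconds; %m would be the month number
    return parsed.strftime('%a %b %d %H:%M:%S %Y')

# Hypothetical values standing in for the schedule's 'start' and 'end' fields
start = '2019-06-28T15:12:49-04:00'
end = '2019-06-28T23:00:00-04:00'
response = 'Start: {} \n End: {}'.format(readable(start), readable(end))
print(response)
```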

Convert datetime to UTC in different order

I'm trying to convert a date in the format Thu Apr 26 22:51:49 PDT 2018 to UTC in the form 2018-04-26T22:51:49Z. I don't care about the day-of-week part; it can be excluded.
type(results[0].create_date)
returns
<class 'str'>
So far I have tried this
print (datetime.strptime((results[0].create_date), "%Y-%m-%dT%H:%M:%SZ"))
but failing with this error
ValueError: time data 'Thu Apr 26 22:51:49 PDT 2018' does not match format '%Y-%m-%dT%H:%M:%SZ'
Your code doesn't work because the %Y-%m-%dT%H:%M:%SZ pattern does not match how the date is represented in the results[0].create_date variable. datetime.strptime tries to match the given string against the format you specify in order to build a datetime object. Adjusting the format string may help here, but you may have difficulty with the PDT part.
I suggest using the dateutil.parser module. You can do the following:
from dateutil import parser
x = parser.parse(results[0].create_date)
print(x)
# 2018-04-26 22:51:49  (naive: 'PDT' is not mapped to an offset here)
This returns a datetime object and you can format however you want to include the 'T' and 'Z' as you suggested in your question.
NOTE: I have done this based on your given input and desired output. Be aware, however, that Thu Apr 26 22:51:49 PDT 2018 is not 2018-04-26T22:51:49Z in UTC, because PDT is 7 hours behind UTC.
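If you do want the true UTC instant, one standard-library-only sketch is to split off the zone abbreviation and look its offset up in a small table. The TZ_OFFSETS table and the to_utc_iso helper are hypothetical. Because abbreviations are ambiguous, the table should list only the zones your data actually uses. dateutil's tzinfos argument takes a similar mapping approach.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table of fixed UTC offsets (in hours) for the abbreviations
# your data uses. Abbreviations are ambiguous, so keep this list explicit.
TZ_OFFSETS = {'UTC': 0, 'GMT': 0, 'PST': -8, 'PDT': -7}

def to_utc_iso(text):
    """Convert e.g. 'Thu Apr 26 22:51:49 PDT 2018' to '2018-04-27T05:51:49Z'."""
    # Layout assumed: weekday, month, day, time, zone abbreviation, year
    _weekday, month, day, clock, abbr, year = text.split()
    naive = datetime.strptime(f'{month} {day} {clock} {year}', '%b %d %H:%M:%S %Y')
    # Attach the looked-up fixed offset, then convert to UTC
    local = naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[abbr])))
    return local.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

print(to_utc_iso('Thu Apr 26 22:51:49 PDT 2018'))  # 2018-04-27T05:51:49Z
```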

How to preserve timezone when converting a date time string to a datetime object [duplicate]

This question already has answers here:
Parsing date/time string with timezone abbreviated name in Python?
I have a date in this format: "Tue, 28 Feb 2017 18:30:32 GMT"
I can parse it with time.strptime("Tue, 28 Feb 2017 18:30:32 GMT", "%a, %d %b %Y %H:%M:%S %Z"), but the result does not keep track of the timezone.
I want to be able to know the timezone. How can I achieve that? Any help is much appreciated.
from dateutil import parser
parser.parse("Tue, 28 Feb 2017 18:30:32 GMT")
datetime.datetime(2017, 2, 28, 18, 30, 32, tzinfo=tzutc())
This problem is actually more involved than it might first appear. I understand that timezone names are not unique and that there are throngs of the things. However, if the number of them that you need to work with is manageable, and if your inputs are limited to that format, then this approach might be good for you.
>>> from dateutil.parser import parse
>>> tzinfos = {'GMT': 0, 'PST': -8 * 3600, 'EST': -5 * 3600}  # offsets in seconds
>>> aDate = parse("Tue, 28 Feb 2017 18:30:32 GMT", tzinfos=tzinfos)
>>> aDate.tzinfo.tzname(0)
'GMT'
>>> aDate = parse("Tue, 28 Feb 2017 18:30:32 PST", tzinfos=tzinfos)
>>> aDate.tzinfo.tzname(0)
'PST'
>>> aDate = parse("Tue, 28 Feb 2017 18:30:32 EST", tzinfos=tzinfos)
>>> aDate.tzinfo.tzname(0)
'EST'
Load the timezone abbreviations you expect into a dictionary (called tzinfos in this code), mapping each one to its offset from UTC in seconds, then pass it to parse. The abbreviation parsed from the date string is then available through aDate.tzinfo.tzname(...), as shown.
Other date items are available, as you would expect.
>>> aDate.day
28
>>> aDate.month
2
>>> aDate.year
2017
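This layout matches the RFC 2822 date format used in email and HTTP headers. For that format, the standard library's email.utils.parsedate_to_datetime (Python 3.3+) returns a timezone-aware datetime without extra dependencies. This is an alternative to dateutil, not what the answers above use. Two limitations apply. It keeps only the offset, not the abbreviation itself. It also recognises only a small set of names: GMT/UT/UTC and a few North American ones such as EST or PST.

```python
from email.utils import parsedate_to_datetime

# RFC 2822-style string -> aware datetime with a fixed UTC offset
d = parsedate_to_datetime("Tue, 28 Feb 2017 18:30:32 GMT")
print(d)              # 2017-02-28 18:30:32+00:00
print(d.utcoffset())  # 0:00:00

# A few North American abbreviations are also mapped to offsets
pst = parsedate_to_datetime("Tue, 28 Feb 2017 18:30:32 PST")
print(pst.utcoffset())  # -1 day, 16:00:00  (i.e. -8 hours)
```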

Convert 48-bits (6 octets) from DNP3 time to timestamp in python

I'm trying to convert 48 bits (6 octets) to a timestamp using Python for a small security project. I'm working with some network packets from the DNP3 protocol, and I'm trying to decode the timestamp values for each DNP3 class object.
According to the DNP3 standard, "DNP3 time (in the form of an UINT48): Absolute time value expressed as the number of milliseconds since the start of January 1, 1970".
I have the following octets which need to be converted into a datetime:
# List of DNP3 timestamps
DNP3_ts = []
# Feb 20, 2016 00:27:07.628000000 UTC
DNP3_ts.append('\xec\x58\xed\xf9\x52\x01')
# Feb 20, 2016 00:34:08.107000000 UTC
DNP3_ts.append('\x6b\xc3\xf3\xf9\x52\x01')
# Feb 20, 2016 00:42:40.460000000 UTC
DNP3_ts.append('\xcc\x94\xfb\xf9\x52\x01')
# Feb 20, 2016 00:56:47.642000000 UTC
DNP3_ts.append('\x1a\x82\x08\xfa\x52\x01')
# Feb 20, 2016 00:56:48.295000000 UTC
DNP3_ts.append('\xa7\x84\x08\xfa\x52\x01')
# Feb 20, 2016 00:58:21.036000000 UTC
DNP3_ts.append('\xec\xee\x09\xfa\x52\x01')
# Feb 20, 2016 01:17:09.147000000 UTC
DNP3_ts.append('\x9b\x25\x1b\xfa\x52\x01')
# Feb 20, 2016 01:49:05.895000000 UTC
DNP3_ts.append('\xe7\x64\x38\xfa\x52\x01')
# Feb 20, 2016 01:58:30.648000000 UTC
DNP3_ts.append('\xf8\x02\x41\xfa\x52\x01')
for ts in DNP3_ts:
print [ts]
So I need to figure out the following steps:
# 1. Converting the octets into a 48bit Integer (which can't be done in python)
# 2. Using datetime to calculate time from 01/01/1970
# 3. Convert current time to 48bits (6 octets)
If anyone can help me out with these steps it would be very much appreciated!
You can trivially combine the bytes to create a 48-bit integer with some bitwise operations. You can convert each octet to a uint8 with ord() and left shift them by a different multiple of 8 so they all occupy a different location in the 48-bit number.
DNP3 encodes the bytes in reverse (little-endian) order. To visualise this, call your octets A-F from left to right, and call the bits of A aaaaaaaa, and so on. Going from your octets to the 48-bit number, you want to achieve this order:
Octets as received: A B C D E F
48-bit number, most significant byte first: ffffffff eeeeeeee dddddddd cccccccc bbbbbbbb aaaaaaaa
Once you have the milliseconds, divide them by 1000 to get a float number in seconds and pass that to datetime.datetime.utcfromtimestamp(). You can further format this datetime object with the strftime() method. The code to achieve all this is
from datetime import datetime

def dnp3_to_datetime(octets):
    milliseconds = 0
    for i, value in enumerate(octets):
        milliseconds = milliseconds | (ord(value) << (i*8))
    date = datetime.utcfromtimestamp(milliseconds/1000.)
    return date.strftime('%b %d, %Y %H:%M:%S.%f UTC')
By calling this function for each of your DNP3 times, you get the following results.
Feb 19, 2016 14:27:07.628000 UTC
Feb 19, 2016 14:34:08.107000 UTC
Feb 19, 2016 14:42:40.460000 UTC
Feb 19, 2016 14:56:47.642000 UTC
Feb 19, 2016 14:56:48.295000 UTC
Feb 19, 2016 14:58:21.036000 UTC
Feb 19, 2016 15:17:09.147000 UTC
Feb 19, 2016 15:49:05.895000 UTC
Feb 19, 2016 15:58:30.648000 UTC
You'll notice that these results lag by exactly 10 hours. I can't figure out this discrepancy, but I don't think my approach is wrong.
In order to go from a datetime to a DNP3 time, start by converting the time to a timestamp of milliseconds. Then, by right shifting and masking 8 bits at a time you can construct the DNP3 octets.
def datetime_to_dnp3(date=None):
    if date is None:
        date = datetime.utcnow()
    delta = date - datetime(1970, 1, 1)
    # Integer arithmetic avoids float rounding errors in the milliseconds
    milliseconds = (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000
    return ''.join(chr((milliseconds >> (i*8)) & 0xff) for i in range(6))
If you call it without arguments, it'll give you the current time, but you have the option to specify any specific datetime. For example, datetime(1970, 1, 1) will return \x00\x00\x00\x00\x00\x00 and datetime(1970, 1, 1, 0, 0, 0, 1000) (one millisecond after the 1970 epoch) will return \x01\x00\x00\x00\x00\x00.
Note that, depending on the bytes in the DNP3 time, you may get odd-looking symbols if you try to print them. Don't worry: the bytes are still there; it's just Python trying to display them as characters. If you want to see the individual bytes without them interfering with each other, simply print list(DNP3_ts[i]). You may notice that it prints '\x52' as R (as with many printable ASCII characters), but they are equivalent.
To get an integer from the input bytes in Python 3:
>>> millis = int.from_bytes(b'\xec\x58\xed\xf9\x52\x01', 'little')
>>> millis
1455892027628
To interpret the integer as "milliseconds since epoch":
>>> from datetime import datetime, timedelta
>>> utc_time = datetime(1970, 1, 1) + timedelta(milliseconds=millis)
>>> str(utc_time)
'2016-02-19 14:27:07.628000'
Note: the result is different from the one provided in the comment in your question (Feb 20, 2016 00:27:07.628).
If you need to support older Python versions, to convert bytes in the little-endian order to an unsigned integer:
>>> import binascii
>>> int(binascii.hexlify(b'\xec\x58\xed\xf9\x52\x01'[::-1]), 16)
1455892027628
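Here is a Python 3 sketch of both directions, decoding and encoding. It uses int.from_bytes / int.to_bytes and an aware UTC epoch, so no float arithmetic is involved. The function names decode_dnp3_time and encode_dnp3_time are hypothetical. The code assumes the input is exactly the 6 little-endian octets described in the question, passed as bytes rather than str. The encoder expects an aware datetime.

```python
from datetime import datetime, timedelta, timezone

# DNP3 time: milliseconds since 1970-01-01 00:00 UTC
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_dnp3_time(octets):
    """Decode 6 little-endian octets (bytes) into an aware UTC datetime."""
    if len(octets) != 6:
        raise ValueError('expected exactly 6 octets')
    millis = int.from_bytes(octets, 'little')
    return EPOCH + timedelta(milliseconds=millis)

def encode_dnp3_time(moment):
    """Encode an aware datetime as 6 little-endian octets (bytes)."""
    # timedelta // timedelta yields an int, so no float rounding creeps in
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return millis.to_bytes(6, 'little')

stamp = decode_dnp3_time(b'\xec\x58\xed\xf9\x52\x01')
print(stamp)                    # 2016-02-19 14:27:07.628000+00:00
print(encode_dnp3_time(stamp))  # b'\xecX\xed\xf9R\x01'
```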

Tweet feature 'created_at' 10 digit number meaning

I have tweet data file. Each has feature as 'created_at' in the following format:
u'created_at': 1369859382
What does this 10 digit number correspond to?
Any help will be appreciated.
That could be a UNIX timestamp ...http://www.onlineconversion.com/unix_time.htm
The example you suggested is equivalent to Wed, 29 May 2013 20:29:42 GMT
Here is a useful resource for mystery date/times formats ... http://www.fmdiff.com/fm/timestamp.html?session=vc8uqio2fsg9op81ohnhbthclmsb21j3
It is the time in seconds since January 1, 1970, 00:00:00 UTC (a Unix timestamp). The number in your example is May 29, 2013, 1:29:42 PM in the PDT time zone, which is seven hours behind UTC.
>>> import datetime
>>> datetime.datetime.fromtimestamp(1369859382)
datetime.datetime(2013, 5, 29, 13, 29, 42)
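The output above depends on the machine's local time zone, because fromtimestamp returns local time by default. To get the UTC time regardless of where the code runs, pass tz=timezone.utc:

```python
from datetime import datetime, timezone

created_at = 1369859382  # seconds since 1970-01-01 00:00:00 UTC

# With tz given, the result is timezone-aware and independent of the
# local zone of the machine running the code.
utc_time = datetime.fromtimestamp(created_at, tz=timezone.utc)
print(utc_time)  # 2013-05-29 20:29:42+00:00
```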
