Python - Fuzzy in dateutil.parser

I'm trying to find dates in a string. This is what I'm doing:
from dateutil.parser import parse

def _is_date(string, fuzzy=False):
    try:
        return parse(string, fuzzy=fuzzy)
    except ValueError:
        return False
It works on some inputs:
>>> _is_date('delivered 22-jun-2022', fuzzy=True)
2022-06-22 00:00:00
>>> _is_date('04 sep, lets meet', fuzzy=True)
2022-09-04 00:00:00
However, it returns incorrect results for others:
>>> _is_date('Ive 4 kids', fuzzy=True)
2022-09-04 00:00:00
>>> _is_date('samsung galaxy m32 (black,', fuzzy=True)
2022-09-23 00:00:32
>>> _is_date('4gb ram..', fuzzy=True)
2022-09-04 00:00:00
How can I fix this? Or is there another way to approach this problem?

The fuzzy flag isn't meant to be used the way you're using it. It is meant for processing strings along the lines of "Today is 9/23/22"; for this example, parse will ignore the "Today is " and parse the date/time portion.
Via experimentation, I found that when called with fuzzy=True, parse will try to interpret any digits it finds as part of a date. Looking at the examples you expected to yield False:
'Ive 4 kids' returns a date/time of 2022-09-04 00:00:00 - the 4 was taken to be the 4th of the current month
'samsung galaxy m32 (black,' gives 2022-09-23 00:00:32 - the 32 became the number of seconds after midnight today
'4gb ram..' gives 2022-09-04 00:00:00 - again, the 4 was taken to be the 4th day of the current month
It seems you won't be able to use fuzzy the way you're hoping. You'll have to clean up the strings before you pass them to parse, probably rejecting those that don't contain a legitimate date before calling parse at all.
You might find it instructive to experiment with fuzzy_with_tokens=True instead of fuzzy. With fuzzy_with_tokens set to True, you will receive a two item tuple with a datetime object holding the resulting date in the first item and the ignored text in the second. Also, this might be a useful resource for you: https://dateutil.readthedocs.io/en/stable/parser.html
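One way to reject strings before they reach parse is a rough regular-expression pre-filter. The sketch below is only an illustration: looks_like_date, find_date and the patterns are hypothetical names and heuristics, not part of dateutil, and they assume English month names and day-first or numeric dates like the ones in the question. The parser is passed in as an argument so the filter itself needs only the standard library.

```python
import re

# Hypothetical heuristic: English month names, abbreviated or written out.
MONTH = (r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
         r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)")

PATTERNS = [
    r"\b\d{1,2}[\s./-]*" + MONTH + r"\b",   # 22-jun, 04 sep
    r"\b" + MONTH + r"[\s./-]*\d{1,2}\b",   # jun 22
    r"\b\d{1,4}[-/]\d{1,2}[-/]\d{1,4}\b",   # 9/23/22, 2022-06-22
]
DATE_HINT = re.compile("|".join(PATTERNS), re.IGNORECASE)


def looks_like_date(text):
    """Return True if the text contains something shaped like a date."""
    return DATE_HINT.search(text) is not None


def find_date(text, parse):
    """Call parse(text, fuzzy=True) only when the pre-filter accepts the text.

    `parse` is expected to behave like dateutil.parser.parse.
    """
    if not looks_like_date(text):
        return None
    try:
        return parse(text, fuzzy=True)
    except (ValueError, OverflowError):
        return None


for sample in ['delivered 22-jun-2022', '04 sep, lets meet',
               'Ive 4 kids', 'samsung galaxy m32 (black,', '4gb ram..']:
    print(sample, '->', looks_like_date(sample))
```

With dateutil installed, this would be used as find_date(s, dateutil.parser.parse). Returning None instead of False keeps "no date" distinct from a real result. The patterns are deliberately strict, so they will miss formats they do not list.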

Related

Python convert string to datetime but formatting is not very predictable

I'm extracting the execution time of a Linux process using subprocess and ps. I'd like to put it in a datetime object to perform datetime arithmetic. However, I'm a little concerned about the output ps returns for the execution time:
1-01:12:23 // 1 day, 1 hour, 12 minutes, 23 seconds
05:39:03 // 5 hours, 39 minutes, 3 seconds
15:06 // 15 minutes, 6 seconds
Notice there is no zero padding before the day, and the output doesn't include months or years, even though a process could technically run for that long.
Consequently, I'm unsure which format string to use to convert it to a timedelta, because I don't want it to break if one process has run for months and another has only run for hours.
UPDATE
Mozway has given a very smart answer. However, I'm taking a step back and wondering if I can get the execution time another way. I'm currently using ps to get the time, but it means I also have the pid. Is there something else I can do with the pid, to get the execution time in a simpler format?
(Can only use official Python libraries)
UPDATE2
It's actually colons between the hours, mins and seconds.

You should use a timedelta.
Here is a suggestion on how to convert from your string:
import datetime
s = '1-01-12-23'
out = datetime.timedelta(**dict(zip(['days', 'hours', 'minutes', 'seconds'],
                                    map(int, s.split('-')))))
Output:
datetime.timedelta(days=1, seconds=4343)
If the string can have more or fewer units, and assuming the smallest units are always present, you can take advantage of the fact that zip stops at the shortest iterable. Just reverse both inputs:
s = '12-23'
units = ['days', 'hours', 'minutes', 'seconds']
out = datetime.timedelta(**dict(zip(reversed(units),
                                    map(int, reversed(s.split('-'))))))
Output:
datetime.timedelta(seconds=743)
As a function, using re.split to handle the 1-01:23:45 format:
import datetime
import re
def to_timedelta(s):
    units = ['days', 'hours', 'minutes', 'seconds']
    return datetime.timedelta(**dict(zip(reversed(units),
                                         map(int, reversed(re.split('[-:]', s))))))
to_timedelta('1-01:12:23')
# datetime.timedelta(days=1, seconds=4343)
to_timedelta('05:39:03')
# datetime.timedelta(seconds=20343)
to_timedelta('15:06')
# datetime.timedelta(seconds=906)
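If silently accepting malformed input is a concern, a stricter variant can validate the string first. This is only a sketch, assuming ps prints the elapsed time in the form [[DD-]hh:]mm:ss, where the day count simply grows for long-running processes and no month or year field ever appears. The name parse_etime is made up for this example.

```python
import re
from datetime import timedelta

# Assumed ps elapsed-time layout: [[DD-]hh:]mm:ss
ETIME = re.compile(
    r"(?:(?:(?P<days>\d+)-)?(?P<hours>\d{1,2}):)?"
    r"(?P<minutes>\d{1,2}):(?P<seconds>\d{2})"
)


def parse_etime(s):
    """Convert a ps elapsed-time string to a timedelta, or raise ValueError."""
    match = ETIME.fullmatch(s.strip())
    if match is None:
        raise ValueError('unexpected elapsed-time format: %r' % s)
    # Only pass the fields that were actually present in the string.
    fields = {name: int(value)
              for name, value in match.groupdict().items()
              if value is not None}
    return timedelta(**fields)


print(parse_etime('1-01:12:23'))   # 1 day, 1:12:23
print(parse_etime('15:06'))        # 0:15:06
```

Unlike the zip-based version, this raises on input such as '1-2-3' or an empty string instead of returning a partial result.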

Find dates / hours from multiple string format in Python

I need to parse some strings into datetime objects in Python.
The problem is that I'm fetching my strings from multiple FTP servers, which return listings in different formats.
MLSD and MLSN are a no-go because some servers don't accept them.
Example:
lines = []
lines.append('U3SECADM 122880 23/10/20 09:22:45 *DIR histomail/')
lines.append('drwxr-xr-x 2 1007 1000 21 Oct 20 12:46 encours')
lines.append('10-13-20 02:00AM 264 CITDETL003_u_exp_histo_cmdb_mds_20201012180006_part1.zip')
lines.append('07-24-20 02:05AM <DIR> encours')
lines.append('QSYSOPR 673400 04/08/20 04:08:45 *STMF ZZED1520200804050843173818.zip')
I need to extract the datetime from each of those strings. Is there an efficient way to do this in Python?

dateutil.parser - separate hours and minutes with dot

I use the dateutil.parser.parse function to recognize a date entered by a user. Normally hours and minutes are separated by a colon, but sometimes a user enters something like 6.30pm, which is parsed as 18:00, so the minutes are silently dropped.
>>> dateutil.parser.parse('6.30pm')
datetime.datetime(2019, 5, 14, 18, 0)
Is there a way to specify the dot as a legal separator, or to raise a ValueError if the user uses the wrong separator? I want to at least show an error message to the user rather than silently process an incorrectly recognized date.

What about a little substitution prior to the parsing operation, something like:
import dateutil.parser
import re
def parse(timestr):
    # rewrite a trailing "H.MM" as "H:MM" so the minutes are not dropped
    timestr = re.sub(r"(\d{1,2})\.(\d{2})(\D*)$", r"\1:\2\3", timestr)
    return dateutil.parser.parse(timestr)
print(parse('6.30pm')) # >> 2019-05-14 18:30:00
print(parse('12:06.30')) # >> 2019-05-14 12:06:30
print(parse('2018-01-01 12:06:05.123')) # >> 2018-01-01 12:06:05.123000
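For the other option raised in the question, raising a ValueError instead of repairing the input, a similar regular expression can act as a guard in front of the parser. This is a sketch only: check_time_separator and its pattern are hypothetical, and the pattern assumes that a trailing "H.MM" not preceded by a colon or digit is a time rather than, say, a day.month date.

```python
import re

# A trailing "H.MM" or "HH.MM", optionally followed by am/pm, that is not
# part of a longer number or of a "hh:mm:ss.ff" fractional-seconds field.
DOT_TIME = re.compile(r"(?<![\d:.])\d{1,2}\.\d{2}(?!\d)\s*(?:[ap]m)?\s*$",
                      re.IGNORECASE)


def check_time_separator(timestr):
    """Return timestr unchanged, or raise ValueError for a dot-separated time."""
    if DOT_TIME.search(timestr):
        raise ValueError("use ':' between hours and minutes: %r" % timestr)
    return timestr


for sample in ['6:30pm', '2018-01-01 12:06:05.123', '6.30pm']:
    try:
        print(check_time_separator(sample), '-> ok')
    except ValueError as exc:
        print('rejected:', exc)
```

The checked string can then be handed to dateutil.parser.parse as usual.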

Convert systemtime to filetime (Python) [duplicate]

Any links for me to convert datetime to filetime using python?
Example: 13 Apr 2011 07:21:01.0874 (UTC) FILETIME=[57D8C920:01CBF9AB]
Got the above from an email header.

My answer to the duplicate question got deleted, so I'll post it here.
Surfing around, I found this link: http://cboard.cprogramming.com/windows-programming/85330-hex-time-filetime.html
After that, everything became simple:
>>> import struct
>>> ft = "57D8C920:01CBF9AB"
>>> # switch parts: the header lists the low 32 bits first, then the high 32 bits
>>> h2, h1 = [int(h, base=16) for h in ft.split(':')]
>>> # rebuild the 64-bit value
>>> ft_dec = struct.unpack('>Q', struct.pack('>LL', h1, h2))[0]
>>> ft_dec
129471528618740000
>>> # use function from iceaway's comment
>>> print(filetime_to_dt(ft_dec))
2011-04-13 07:21:01
Tuning it up is up to you.
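The filetime_to_dt helper is not shown in this answer, so here is a self-contained sketch of the whole conversion using only the standard library. It relies on the FILETIME definition (100-nanosecond intervals since 1601-01-01 00:00:00 UTC). The function name header_filetime_to_datetime is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def header_filetime_to_datetime(value):
    """Convert a 'LOW:HIGH' hex pair such as '57D8C920:01CBF9AB' to UTC."""
    low, high = (int(part, 16) for part in value.split(':'))
    ticks = (high << 32) | low
    # timedelta has microsecond resolution, so the last 100 ns digit is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


print(header_filetime_to_datetime('57D8C920:01CBF9AB'))
# 2011-04-13 07:21:01.874000+00:00
```

Counting from 1601 directly avoids the separate Unix-epoch constant, and the result agrees with the 07:21:01.0874 (UTC) value in the email header.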

Well, here is the solution I ended up with:
import struct
from datetime import datetime
parm3 = 0x57D8C920; parm4 = 0x01CBF9AB
# Int32x32To64
ft_dec = struct.unpack('>Q', struct.pack('>LL', parm4, parm3))[0]
EPOCH_AS_FILETIME = 116444736000000000; HUNDREDS_OF_NANOSECONDS = 10000000
# integer division drops the sub-second remainder
dt = datetime.fromtimestamp((ft_dec - EPOCH_AS_FILETIME) // HUNDREDS_OF_NANOSECONDS)
print(dt)
Output will be:
2011-04-13 09:21:01 (GMT +1)
compared with the value from the email header:
13 Apr 2011 07:21:01.0874 (UTC)
This is based on David Buxton's 'filetimes.py'. Note that there is a difference in the hours, because I changed two things:
fromtimestamp() somehow fits better than utcfromtimestamp(), since I'm dealing with file times here.
FAT time resolution is 2 seconds, so I don't care about the 100 ns remainder that gets dropped.
(Actually, since the resolution is 2 seconds, there is normally no remainder when dividing by HUNDREDS_OF_NANOSECONDS.)
Besides the order in which the parameters are passed, note that struct.pack('>LL', ...) is for unsigned 32-bit ints.
If you have signed ints, simply change it to struct.pack('>ll', ...) for signed 32-bit ints.
(See the struct.pack documentation for more info.)

Convert snmp octet string to human readable date format

Using the pysnmp framework, I get some values by doing an SNMP walk. Unfortunately, for the OID
1.3.6.1.2.1.69.1.5.8.1.2 (DOCS-CABLE-DEVICE-MIB)
I get a weird result which I can't print correctly here, since it contains ASCII control characters like BEL and ACK.
When doing a repr I get:
OctetString('\x07\xd8\t\x17\x03\x184\x00')
But the output should look like:
2008-9-23,3:24:52.0
The format is called "DateAndTime". How can I translate the OctetString output to a human-readable date/time?

The format is specified by the DateAndTime textual convention in SNMPv2-TC (RFC 2579):
A date-time specification.

field  octets  contents             range
-----  ------  --------             -----
  1     1-2    year*                0..65536
  2      3     month                1..12
  3      4     day                  1..31
  4      5     hour                 0..23
  5      6     minutes              0..59
  6      7     seconds              0..60
               (use 60 for leap-second)
  7      8     deci-seconds         0..9
  8      9     direction from UTC   '+' / '-'
  9     10     hours from UTC*      0..13
 10     11     minutes from UTC     0..59

* Notes:
- the value of year is in network-byte order
- daylight saving time in New Zealand is +13

For example, Tuesday May 26, 1992 at 1:30:15 PM EDT would be displayed as:

1992-5-26,13:30:15.0,-4:0

Note that if only local time is known, then timezone information (fields 8-10) is not present.
To decode your sample data, you can use this quick-and-dirty one-liner (on Python 3 the octets must be a bytes object):
>>> import struct, datetime
>>> s = b'\x07\xd8\t\x17\x03\x184\x00'
>>> datetime.datetime(*struct.unpack('>HBBBBBB', s))
datetime.datetime(2008, 9, 23, 3, 24, 52)
The example above is far from perfect: it does not account for size (this object has variable size) and is missing timezone information. Also note that field 7 is deci-seconds (0..9), while the seventh argument of datetime.datetime is microseconds (0 <= x < 1000000). A correct implementation is left as an exercise for the reader.
[update]
8 years later, let's try to fix this answer (am I lazy or what?):
import struct, datetime
def decode_snmp_date(octetstr: bytes) -> datetime.datetime:
    size = len(octetstr)
    if size == 8:
        (year, month, day, hour, minutes,
         seconds, deci_seconds,
         ) = struct.unpack('>HBBBBBB', octetstr)
        return datetime.datetime(
            year, month, day, hour, minutes, seconds,
            deci_seconds * 100_000, tzinfo=datetime.timezone.utc)
    elif size == 11:
        (year, month, day, hour, minutes,
         seconds, deci_seconds, direction,
         hours_from_utc, minutes_from_utc,
         ) = struct.unpack('>HBBBBBBcBB', octetstr)
        offset = datetime.timedelta(
            hours=hours_from_utc, minutes=minutes_from_utc)
        if direction == b'-':
            offset = -offset
        # the octets hold local time, so the offset is attached as the timezone
        return datetime.datetime(
            year, month, day, hour, minutes, seconds,
            deci_seconds * 100_000, tzinfo=datetime.timezone(offset))
    raise ValueError("The provided OCTETSTR is not a valid SNMP date")
I'm not sure I got the timezone offset right but I don't have sample data to test, feel free to amend the answer or ping me in the comments.
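If the goal is only the textual form the question asks for (2008-9-23,3:24:52.0), the unpacked fields can be formatted directly without building a datetime. This is a sketch based on the field layout quoted above. format_snmp_date is a made-up name, and the octets are assumed to arrive as a bytes object.

```python
import struct


def format_snmp_date(octets: bytes) -> str:
    """Render DateAndTime octets in the display form used by the specification."""
    if len(octets) == 8:
        (year, month, day, hour, minute,
         second, deci) = struct.unpack('>HBBBBBB', octets)
        tz = ''
    elif len(octets) == 11:
        (year, month, day, hour, minute, second, deci,
         direction, tz_hours, tz_minutes) = struct.unpack('>HBBBBBBcBB', octets)
        tz = ',%s%d:%d' % (direction.decode('ascii'), tz_hours, tz_minutes)
    else:
        raise ValueError('DateAndTime must be 8 or 11 octets long')
    return '%d-%d-%d,%d:%d:%d.%d%s' % (
        year, month, day, hour, minute, second, deci, tz)


print(format_snmp_date(b'\x07\xd8\t\x17\x03\x184\x00'))
# 2008-9-23,3:24:52.0
print(format_snmp_date(b'\x07\xc8\x05\x1a\x0d\x1e\x0f\x00-\x04\x00'))
# 1992-5-26,13:30:15.0,-4:0
```

The second call encodes the specification's own example (May 26, 1992 at 1:30:15 PM EDT), which makes it a convenient check for the 11-octet branch.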

@Paulo Scardine: This was the best answer I found online when working to resolve a very similar problem. It still took me a little while to resolve my issue even with this answer, so I wanted to post a follow-up answer that may add more clarity (specifically on the issue of the date having different possible lengths).
The following piece of code connects to a server, grabs the system time, and outputs it as a tuple to illustrate the method.
import netsnmp
import struct
oid = netsnmp.Varbind('hrSystemDate.0')
resp = netsnmp.snmpget(oid, Version=1, DestHost='<ip>', Community='public')
# written for Python 2, where str is a byte string
oct = str(resp[0])
# hrSystemDate can be either 8 or 11 octets in length.
oct_len = len(oct)
fmt_mapping = {8: '>HBBBBBB', 11: '>HBBBBBBcBB'}
if oct_len in fmt_mapping:
    t = struct.unpack(fmt_mapping[oct_len], oct)
    print('date tuple: %s' % (repr(t)))
else:
    print('invalid date format')
I hope this helps other people who are having similar issues trying to work with this type of data.

Shameless plug here: the Pycopia SNMP and SMI modules handle this object correctly, and others as well.
Pycopia is installed from source, and don't forget the mibs file if you try it.
