String checking on multiple types

I have a variable containing a string (extracted from an XML feed). The string can represent an integer, a date, or plain text, and I need to convert it from a string to the matching data type. I am doing it the way shown below, but it is a little ugly, so I am asking whether there is a better technique. If I were checking for more types, I would end up with deeply nested try/except blocks.
def normalize_availability(self, value):
    """
    Normalize the availability date.
    """
    try:
        val = int(value)
    except ValueError:
        try:
            val = datetime.datetime.strptime(value, '%Y-%m-%d')
        except (ValueError, TypeError):
            # Here could be another try - except block if more types needed
            val = value
    return val
Thanks!

Use a handy helper function.
def tryconvert(value, default, *types):
    """Converts value to one of the given types. The first type that succeeds is
    used, so the types should be specified from most-picky to least-picky (e.g.
    int before float). The default is returned if all types fail to convert
    the value. The types needn't actually be types (any callable that takes a
    single argument and returns a value will work)."""
    value = value.strip()
    for t in types:
        try:
            return t(value)
        except (ValueError, TypeError):
            pass
    return default
Then write a function for parsing the date/time:
def parsedatetime(value, format="%Y-%m-%d"):
    return datetime.datetime.strptime(value, format)
Now put 'em together:
value = tryconvert(value, None, parsedatetime, int)
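As a rough, self-contained sketch of how the helper above behaves, the snippet below runs it over a few made-up sample strings (the sample values are assumptions, not data from the question). It passes the original string as the default, so text that matches no converter comes back unchanged rather than as None.

```python
import datetime


def tryconvert(value, default, *types):
    """Return value converted by the first callable in types that succeeds."""
    value = value.strip()
    for t in types:
        try:
            return t(value)
        except (ValueError, TypeError):
            pass
    return default


def parsedatetime(value, format="%Y-%m-%d"):
    return datetime.datetime.strptime(value, format)


# Hypothetical sample values, as they might arrive from the XML feed.
samples = ["42", " 2014-03-01 ", "in stock"]

# Using the original string as the default keeps unconverted text as-is.
results = [tryconvert(s, s, int, parsedatetime) for s in samples]
print(results)
```

Whether to try int or the date parser first does not matter for these formats, since a string such as "2014-03-01" is rejected by int() and a plain run of digits is rejected by the date format.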

The right way would be to know from the XML what type each value should be. This would prevent things that happen to be numeric strings from ending up as an int, etc. But assuming that isn't possible:
For the int type I prefer
if value.isdigit():
    val = int(value)
For the date, the only other way I can think of would be splitting it and looking at the parts, which would be messier than just letting strptime raise an exception.

def normalize_availability(value):
    """
    Normalize the availability date.
    """
    val = value
    try:
        val = datetime.datetime.strptime(value, '%Y-%m-%d')
    except (ValueError):
        if value.strip(" -+").isdigit():
            val = int(value)
    return val
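One caveat with the isdigit() checks above: stripping signs from both ends lets a string like "5-" pass the check even though int() rejects it, and isdigit() is also true for some non-ASCII characters such as '²' that int() cannot convert. The following is a minimal sketch of a stricter check; the helper name to_int_or_none is made up for illustration.

```python
def to_int_or_none(text):
    """Return int(text) if text is an optionally signed run of ASCII digits, else None."""
    stripped = text.strip()
    # Peel off at most one leading sign.
    digits = stripped[1:] if stripped[:1] in ("+", "-") else stripped
    # isascii() guards against characters such as '²' that pass isdigit()
    # but are rejected by int().
    if digits.isascii() and digits.isdigit():
        return int(stripped)
    return None


print(to_int_or_none("-12"))
print(to_int_or_none("5-"))
```

Simply wrapping int(value) in try/except ValueError, as in the question, sidesteps these edge cases as well.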

Related

How to get the type of what in the string in python

I wrote a function that takes an argument, which should be a string. The function should work out the data type of the string's contents. For example:
def input_type(value):
    pass
print(input_type("1"))
Here the function should report that the type is an integer.
I need this for the string, float, and integer types.

For this specific question one could do this:-
def input_type(value):
    if isinstance(value, str):
        try:
            int(value)
            return 'int'
        except ValueError:
            pass
        try:
            float(value)
            return 'float'
        except ValueError:
            pass
        return 'string'
print(input_type('1'))
print(input_type('1.5'))
print(input_type('abc'))
print(input_type(999))
This will output:-
int
float
string
None

What is the difference between ds.get() and ds.get_item() in pydicom

Does anyone know what is the difference in Pydicom between the two methods FileDataset.get() and FileDataset.get_item()?
Thanks!

Neither of these is used often in user code. Dataset.get is the equivalent of Python's dict.get; it allows you to ask for an item in the dictionary, but return a default if that item does not exist in the Dataset. The more usual way to get an item from a Dataset is to use the dot notation, e.g.
dataset.PatientName
or to get the DataElement object via the tag number, e.g.
dataset[0x100010]
Dataset.get_item is a lower-level routine, primarily used when there is something wrong with some incoming data, and it needs to be corrected before the "raw data element" value is converted into python standard types (int, float, string types, etc).
When used with a keyword, Dataset.get() returns a value, not a DataElement instance. Dataset.get_item always returns either a DataElement instance, or a RawDataElement instance.

I imagine your answer is in the source for those two functions. It looks like get() handles strings as well as DataElement tags as input.
def get(self, key, default=None):
    """Extend dict.get() to handle DICOM DataElement keywords.

    Parameters
    ----------
    key : str or pydicom.tag.Tag
        The element keyword or Tag or the class attribute name to get.
    default : obj or None
        If the DataElement or class attribute is not present, return
        `default` (default None).

    Returns
    -------
    value
        If `key` is the keyword for a DataElement in the Dataset then
        return the DataElement's value.
    pydicom.dataelem.DataElement
        If `key` is a tag for a DataElement in the Dataset then return the
        DataElement instance.
    value
        If `key` is a class attribute then return its value.
    """
    if isinstance(key, (str, compat.text_type)):
        try:
            return getattr(self, key)
        except AttributeError:
            return default
    else:
        # is not a string, try to make it into a tag and then hand it
        # off to the underlying dict
        if not isinstance(key, BaseTag):
            try:
                key = Tag(key)
            except Exception:
                raise TypeError("Dataset.get key must be a string or tag")
    try:
        return_val = self.__getitem__(key)
    except KeyError:
        return_val = default
    return return_val

def get_item(self, key):
    """Return the raw data element if possible.

    It will be raw if the user has never accessed the value, or set their
    own value. Note if the data element is a deferred-read element,
    then it is read and converted before being returned.

    Parameters
    ----------
    key
        The DICOM (group, element) tag in any form accepted by
        pydicom.tag.Tag such as [0x0010, 0x0010], (0x10, 0x10), 0x00100010,
        etc. May also be a slice made up of DICOM tags.

    Returns
    -------
    pydicom.dataelem.DataElement
    """
    if isinstance(key, slice):
        return self._dataset_slice(key)
    if isinstance(key, BaseTag):
        tag = key
    else:
        tag = Tag(key)
    data_elem = dict.__getitem__(self, tag)
    # If a deferred read, return using __getitem__ to read and convert it
    if isinstance(data_elem, tuple) and data_elem.value is None:
        return self[key]
    return data_elem

Python datetime pattern matching

I'm trying to identify if a string can be cast as a date, according to a list of different formats. Thus, the whole list has to be looped over. If a match is found, that match should be returned. If all attempts return errors, that error should be returned. I'm not quite sure how to do this, my approach can be seen below.
_DEFAULT_PATTERNS = ["%d.%m.%Y", "%y-%m-%d"]
try:
    if format == 'default':
        for p in _DEFAULT_PATTERNS:
            try:
                value = datetime.strptime(value, p).date()
            except:
                continue
except Exception:
    return ERROR
return value

Your first choice would be to use dateutil.parser. If, however, the parser does not meet your needs, here's a version of your code, tidied up:
def parseDate(value):
    PATTERNS = ("%d.%m.%Y", "%y-%m-%d")
    for p in PATTERNS:
        try:
            return datetime.strptime(value, p).date()
        except ValueError:
            continue
    return False  # No match found
Alternatively, raise an exception if the match is not found (instead of returning False). This will make your function more similar to strptime:
raise ValueError
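Put together, the raising variant might look like the sketch below. The snake_case name parse_date, the patterns parameter, and the error message are choices made here for illustration, not part of the answer above.

```python
from datetime import datetime

PATTERNS = ("%d.%m.%Y", "%y-%m-%d")


def parse_date(value, patterns=PATTERNS):
    """Return the date parsed with the first matching pattern, else raise ValueError."""
    for p in patterns:
        try:
            return datetime.strptime(value, p).date()
        except ValueError:
            continue
    # No pattern matched: fail the same way strptime itself would.
    raise ValueError(f"{value!r} does not match any of {patterns!r}")


print(parse_date("24.12.2019"))
print(parse_date("19-12-24"))
```

Raising keeps the return type unambiguous: the caller always gets a date back and handles failure in one except clause, instead of having to test the result for False.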

Try something like this:
from datetime import datetime
_DEFAULT_PATTERNS = ["%d.%m.%Y", "%y-%m-%d"]
def is_castable_to_date(value):
    for p in _DEFAULT_PATTERNS:
        try:
            datetime.strptime(value, p)
            return True
        except (ValueError, TypeError):
            # This pattern did not match (or value is not a string); try the next one
            pass
    return False
print(is_castable_to_date("12-12-12"))
print(is_castable_to_date("12.12.12"))
print(is_castable_to_date("12/12/12"))

Conditional string representation based on variable type

I would like to create a string representation of a datetime object that could contain a None value. So far, I came up with a solution, but I was looking at a better/cleaner way of doing it.
Let's say I have the following two variables:
import datetime as dt
a = None
b = dt.datetime(2017, 11, 30)
def str_format(str):
    return '{:%Y-%m-%d}'.format(str)
The following would return a formatted string:
str_format(b)
'2017-11-30'
But the following would return an error:
str_format(a)
TypeError: unsupported format string passed to NoneType.__format__
So far I came up with the following solution:
def str_format(str):
    if isinstance(str, type(None)) is False:
        return '{:%Y-%m-%d}'.format(str)
    else:
        return '{}'.format(str)
str_format(a)
'None'
str_format(b)
'2017-11-30'
However, I was looking at a more efficient/cleaner way of writing the function.

Oftentimes these types of things are wrapped in a try/except:
def str_format(str):
    try:
        return '{:%Y-%m-%d}'.format(str)
    except TypeError:
        # unrecognized type, return blank or whatever you want to return
        return ''
The answer on this question explains why you typically use try/except instead of a conditional check fairly well.

Your function is overly complex. None is a singleton, so the Pythonic way of testing against it is simply "is None".
Just do it in one line with a ternary expression:
def str_format(s):
    return str(s) if s is None else '{:%Y-%m-%d}'.format(s)
Or, to return a default date (e.g. 1/1/2010) if None is passed:
def str_format(s):
    return '{:%Y-%m-%d}'.format(s or dt.datetime(2010, 1, 1))
As a side note, don't use str as a variable name, as it is the Python string type.
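If the text shown for a missing date should be configurable, one possible variation is sketched below. The placeholder parameter is an assumption added for illustration; it is not part of the answers above.

```python
import datetime as dt


def str_format(value, placeholder="None"):
    """Format a datetime as YYYY-MM-DD, or return placeholder when value is None."""
    if value is None:
        return placeholder
    return "{:%Y-%m-%d}".format(value)


print(str_format(dt.datetime(2017, 11, 30)))
print(str_format(None))
print(str_format(None, placeholder=""))
```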

Is there a Python equivalent to C#'s DateTime.TryParse()?

Is there an equivalent to C#'s DateTime.TryParse() in Python?
I'm referring to the fact that it avoids throwing an exception, not the fact that it guesses the format.

If you don't want the exception, catch the exception.
try:
    d = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
except ValueError:
    d = None
In the zen of Python, explicit is better than implicit. strptime always returns a datetime parsed in the exact format specified. This makes sense, because you have to define the behavior in case of failure; maybe what you really want is:
except ValueError:
    d = datetime.datetime.now()
or
except ValueError:
    d = datetime.datetime.fromtimestamp(0)
or
except ValueError:
    raise WebFramework.ServerError(404, "Invalid date")
By making it explicit, it's clear to the next person who reads it what the failover behavior is, and that it is what you need it to be.
Or maybe you're confident that the date cannot be invalid because it's coming from a database DATETIME column; in that case there won't be an exception to catch, so don't catch it.
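For readers who still want the C# calling shape, a thin wrapper can return a success flag alongside the value. This is only a sketch; the name try_parse and the (ok, value) tuple convention are choices made here, not anything provided by the standard library.

```python
import datetime


def try_parse(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (True, parsed datetime) on success, or (False, None) on failure."""
    try:
        return True, datetime.datetime.strptime(text, fmt)
    except (ValueError, TypeError):
        # ValueError: text does not match fmt; TypeError: text is not a string.
        return False, None


ok, d = try_parse("2011-02-30 10:00:00")  # February 30 does not exist
if not ok:
    d = datetime.datetime(1970, 1, 1)  # the caller picks the fallback explicitly
print(ok, d)
```

The fallback still lives at the call site, so the failover behavior stays as explicit as in the try/except versions above.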

We want to try/except multiple datetime formats fmt1,fmt2,...,fmtn and suppress/handle the exceptions (from strptime) for all those that mismatch (and in particular, avoid needing a yukky n-deep indented ladder of try/except clauses). I found two elegant ways; the second is best in general. (This is a big problem on real-world data, where multiple, mismatched, incomplete, inconsistent and multilanguage/region date formats are often mixed freely in one dataset.)
1) Individually try applying each format and handle each individual strptime() fail as a return-value of None, so you can chain fn calls...
To start off, adapting from @OrWeis' answer for compactness:
def try_strptime_single_format(s, fmt):
    try:
        return datetime.datetime.strptime(s, fmt)
    except ValueError:
        return None
Now you can invoke as try_strptime_single_format(s, fmt1) or try_strptime_single_format(s, fmt2) or try_strptime_single_format(s, fmt3) ... But we can improve that to:
2) Apply multiple possible formats (either pass in as an argument or use sensible defaults), iterate over those, catch and handle any errors internally:
Cleaner, simpler and more OO-friendly is to generalize this to make the formats parameter either a single string or a list, then iterate over that... so your invocation reduces to try_strptime(s, [fmt1, fmt2, fmt3, ...])
def try_strptime(s, fmts=('%d-%b-%y', '%m/%d/%Y')):
    if isinstance(fmts, str):
        fmts = [fmts]  # allow a single format string as well as a list
    for fmt in fmts:
        try:
            return datetime.datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None  # or reraise the ValueError if no format matched, if you prefer
(As a sidebar, note that ...finally is not the droid we want, since it would execute after each loop pass i.e. on each candidate format, not once at the end of the loop.)
I find implementation 2) is cleaner and better. In particular the function/method can store a list of default formats, which makes it more failsafe and less exception-happy on real-world data. (We could even infer which default formats to apply based on other columns, e.g. first try German date formats on German data, Arabic on Arabic, weblog datetime formats on weblog data etc.)
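The idea of choosing default formats from other information about the record could be sketched roughly as follows. The region codes, the format lists, and the function name try_strptime_for_region are all hypothetical and only illustrate the shape of the approach.

```python
import datetime

# Hypothetical mapping from a region hint to the formats worth trying first.
FORMATS_BY_REGION = {
    "de": ("%d.%m.%Y", "%d.%m.%y"),
    "us": ("%m/%d/%Y", "%m/%d/%y"),
}
FALLBACK_FORMATS = ("%Y-%m-%d",)


def try_strptime_for_region(s, region=None):
    """Try the region's formats first, then the fallbacks; return None if none match."""
    fmts = FORMATS_BY_REGION.get(region, ()) + FALLBACK_FORMATS
    for fmt in fmts:
        try:
            return datetime.datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None


print(try_strptime_for_region("03.04.2020", "de"))
print(try_strptime_for_region("03/04/2020", "us"))
```

Scoping the candidate formats by region also avoids silently misreading ambiguous strings, since a day-first and a month-first format are never tried on the same value.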

No, what you're asking for is not idiomatic Python, and so there generally won't be functions that discard errors like that in the standard library. The relevant standard library modules are documented here:
http://docs.python.org/library/datetime.html
http://docs.python.org/library/time.html
The parsing functions all raise exceptions on invalid input.
However, as the other answers have stated, it wouldn't be terribly difficult to construct one for your application (your question was phrased "in Python" rather than "in the Python standard library" so it's not clear if assistance writing such a function "in Python" is answering the question or not).

Here's an equivalent function implementation
import datetime
def try_strptime(s, format):
    """
    @param s the string to parse
    @param format the format to attempt parsing of the given string
    @return the parsed datetime or None on failure to parse
    @see datetime.datetime.strptime
    """
    try:
        date = datetime.datetime.strptime(s, format)
    except ValueError:
        date = None
    return date

Brute force is also an option:
def TryParse(datestring, offset):
    nu = datetime.datetime.now()
    retval = nu
    formats = ["%d-%m-%Y","%Y-%m-%d","%d-%m-%y","%y-%m-%d"]
    if datestring is None:
        retval = datetime.datetime(nu.year,nu.month,nu.day,0,0,0) - datetime.timedelta(offset,0,0,0,0,0,0)
    elif datestring == '':
        retval = datetime.datetime(nu.year,nu.month,nu.day,0,0,0) - datetime.timedelta(offset,0,0,0,0,0,0)
    else:
        success = False
        for aformat in formats:
            try:
                retval = datetime.datetime.strptime(datestring,aformat)
                success = True
                break
            except ValueError:
                pass
        if not success:
            retval = datetime.datetime(nu.year,nu.month,nu.day,0,0,0) - datetime.timedelta(offset,0,0,0,0,0,0)
    return retval

Use time.strptime to parse dates from strings.
Documentation: http://docs.python.org/library/time.html#time.strptime
Examples from: http://pleac.sourceforge.net/pleac_python/datesandtimes.html
#-----------------------------
# Parsing Dates and Times from Strings
time.strptime("Tue Jun 16 20:18:03 1981")
# (1981, 6, 16, 20, 18, 3, 1, 167, -1)
time.strptime("16/6/1981", "%d/%m/%Y")
# (1981, 6, 16, 0, 0, 0, 1, 167, -1)
# strptime() can use any of the formatting codes from time.strftime()
# The easiest way to convert this to a datetime seems to be:
now = datetime.datetime(*time.strptime("16/6/1981", "%d/%m/%Y")[0:6])
# the '*' operator unpacks the tuple, producing the argument list;
# the slice [0:6] keeps year, month, day, hour, minute and second.
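As a small check of the conversion above, the sketch below compares it with datetime.datetime.strptime, which parses the string into a datetime in one step. The variable names are made up for illustration.

```python
import datetime
import time

parsed = time.strptime("16/6/1981", "%d/%m/%Y")

# Route 1: unpack year, month, day, hour, minute, second from the struct_time.
via_time = datetime.datetime(*parsed[:6])

# Route 2: let datetime parse the string directly.
direct = datetime.datetime.strptime("16/6/1981", "%d/%m/%Y")

print(via_time == direct)
```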

I agree that TryParse is a very useful function in C#. Unfortunately, there is no direct equivalent in Python (or maybe I am not aware of one). I believe you want to check whether a string is a date without worrying about the date format. My recommendation is to use the pandas to_datetime function:
import pandas as pd
def is_valid_date(str_to_validate: str) -> bool:
    try:
        if pd.to_datetime(str_to_validate):
            return True
        else:
            return False
    except ValueError:
        return False

How about strptime?
http://docs.python.org/library/time.html#time.strptime
It will throw a ValueError if it is unable to parse the string based on the format that is provided.
Edit:
Since the question was edited to include the bit about exceptions after I answered it, I wanted to add a note about that.
As was pointed out in other answers, if you don't want your program to raise an exception, you can simply catch it and handle it:
try:
    date = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
except ValueError:
    date = None
That's a Pythonic way to do what you want.
