Enum vs String as a parameter in a function - python

I noticed that many libraries nowadays seem to prefer the use of strings over enum-type variables for parameters.
Where people would previously use enums, e.g. dateutil.rrule.FR for a Friday, it seems that this has shifted towards using string (e.g. 'FRI').
Same in numpy (or pandas for that matter), where searchsorted, for example, uses strings (e.g. side='left' or side='right') rather than a defined enum. For the avoidance of doubt, even before Python 3.4 this could easily have been implemented as an enum, like so:
class SIDE:
    RIGHT = 0
    LEFT = 1
And the advantages of enum-type variables are clear: you can't misspell them without raising an error, they offer proper IDE support, etc.
So why use strings at all, instead of sticking to enum types? Doesn't this make the programs much more prone to user errors? It's not like enums create an overhead - if anything they should be slightly more efficient. So when and why did this paradigm shift happen?
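For reference, since Python 3.4 the standard library's enum module provides a real enum type. A minimal sketch of the question's SIDE constants written with it (the class name Side is only illustrative):

```python
from enum import Enum


class Side(Enum):
    # Members are singletons; the integer values mirror the question's example.
    RIGHT = 0
    LEFT = 1


# Misspelling a member name fails immediately with AttributeError.
try:
    Side.RIHGT
except AttributeError:
    print("misspelled member rejected")

# The full set of valid choices is discoverable at runtime (and by IDEs).
print([member.name for member in Side])  # ['RIGHT', 'LEFT']
print(Side(1))                           # Side.LEFT (lookup by value)
```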

I think enums are safer especially for larger systems with multiple developers.
As soon as the need arises to change the value of such an enum, looking up and replacing a string in many places is not my idea of fun :-)
The most important criterion IMHO is the usage: for use within a module or even a package, a string seems fine; in a public API I'd prefer enums.
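A common compromise for a public API is an enum whose members are also strings: callers can pass either Side.RIGHT or 'right', and the set of values is still defined in one place, so changing it does not mean hunting down string literals. A sketch, assuming a hypothetical search function (only the parameter handling is shown):

```python
from enum import Enum


class Side(str, Enum):
    # Mixing in str makes each member a real string that compares equal to its value.
    LEFT = "left"
    RIGHT = "right"


def search(values, target, side=Side.LEFT):
    """Hypothetical function: accepts Side.LEFT or the plain string 'left'."""
    side = Side(side)  # normalizes strings to members; ValueError for anything else
    # A real implementation would search here; the sketch just returns the side.
    return side


print(search([1, 2, 3], 2, side="right") is Side.RIGHT)  # True
print(Side.LEFT == "left")                               # True
```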

[update]
As of today (2019), Python has dataclasses (added in 3.7); combined with optional type annotations and static type checkers like mypy, I think this is a solved problem.
As for efficiency, attribute lookup is somewhat expensive in Python compared to most other languages, so I guess some libraries may still choose to avoid it for performance reasons.
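The static-analysis route can also be applied to string parameters directly with typing.Literal (Python 3.8+): mypy reports a call such as searchsorted_side("foo"), and typing.get_args lets the same annotation drive a runtime check. A sketch; searchsorted_side and SideName are hypothetical names:

```python
from typing import Literal, get_args

# The only accepted strings; a static checker flags any other literal at call sites.
SideName = Literal["left", "right"]


def searchsorted_side(side: SideName = "left") -> str:
    """Hypothetical helper that also validates `side` at runtime."""
    allowed = get_args(SideName)  # ('left', 'right')
    if side not in allowed:
        raise ValueError(f"side must be one of {allowed}, got {side!r}")
    return side


print(searchsorted_side("right"))  # right
```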
[original answer]
IMHO it is a matter of taste. Some people like this style:
def searchsorted(a, v, side='left', sorter=None):
    ...
    assert side in ('left', 'right'), "Invalid side '{}'".format(side)
    ...

numpy.searchsorted(a, v, side='right')
Yes, if you call searchsorted with side='foo' you only get an AssertionError at runtime, but at least the bug will be pretty easy to spot by looking at the traceback.
While other people may prefer (for the advantages you highlighted):
numpy.searchsorted(a, v, side=numpy.CONSTANTS.SIDE.RIGHT)
I favor the first because I think seldom used constants are not worth the namespace cruft. You may disagree, and people may align with either side due to other concerns.
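A note on the first style: assert statements are removed when Python runs with the -O flag, so a check meant to protect a public API is usually written as an explicit raise. A sketch; validate_side and VALID_SIDES are hypothetical names:

```python
VALID_SIDES = ("left", "right")


def validate_side(side):
    """Return `side` unchanged if it is valid, otherwise raise ValueError."""
    # Unlike `assert`, an explicit raise is kept when Python runs with -O.
    if side not in VALID_SIDES:
        raise ValueError(f"Invalid side {side!r}; expected one of {VALID_SIDES}")
    return side


print(validate_side("left"))  # left
```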
If you really care, nothing prevents you from defining your own "enums":
class SIDE(object):
    RIGHT = 'right'
    LEFT = 'left'

numpy.searchsorted(a, v, side=SIDE.RIGHT)
I don't think it is worth it, but again, it is a matter of taste.
[update]
Stefan made a fair point:
As soon as the need arises to change the value of such an enum, looking up and replacing a string in many places is not my idea of fun :-)
I can see how painful this can be in a language without named parameters - using the example you have to search for the string 'right' and get a lot of false positives. In Python you can narrow it down searching for side='right'.
Of course if you are dealing with an interface that already has a defined set of enums/constants (like an external C library) then yes, by all means mimic the existing conventions.

I understand this question has already been answered, but one thing has not been addressed at all: to use the value stored in a Python Enum member, you have to access it explicitly through .value.
>>> from enum import Enum
>>> class Test(Enum):
...     WORD='word'
...     ANOTHER='another'
...
>>> str(Test.WORD.value)
'word'
>>> str(Test.WORD)
'Test.WORD'
One simple solution to this problem is to provide an implementation of __str__():
>>> class Test(Enum):
...     WORD='word'
...     ANOTHER='another'
...     def __str__(self):
...         return self.value
...
>>> Test.WORD
<Test.WORD: 'word'>
>>> str(Test.WORD)
'word'
Yes, adding .value is not a huge deal, but it is an inconvenience nonetheless. Using regular strings requires zero extra effort: no extra classes and no redefinition of default methods. With enums, by contrast, you still need an explicit conversion to a string value in many places where a plain str would just work.
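If several enums need this behaviour, the __str__ override can live in a member-less base class that the concrete enums inherit from (an Enum subclass can itself be subclassed as long as it defines no members). A sketch; the base class name ValueStrEnum is hypothetical:

```python
from enum import Enum


class ValueStrEnum(Enum):
    """Member-less base: str() of any member is its value."""

    def __str__(self):
        return str(self.value)


class Test(ValueStrEnum):
    WORD = "word"
    ANOTHER = "another"


class Color(ValueStrEnum):
    RED = "red"


print(str(Test.WORD), str(Color.RED))  # word red
print(repr(Test.WORD))                 # <Test.WORD: 'word'> (repr is unchanged)
```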

I prefer strings for debugging reasons. Compare an object like
side=1, opt_type=0, order_type=6
to
side='BUY', opt_type='PUT', order_type='FILL_OR_KILL'
I also like "enums" whose values are strings:
class Side(object):
    BUY = 'BUY'
    SELL = 'SELL'
    SHORT = 'SHORT'
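If you use real Enum members instead, debug output can stay just as readable, because an enum's repr shows both the member name and its value. A sketch using a hypothetical Order dataclass; the field names loosely follow the example above:

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    BUY = "BUY"
    SELL = "SELL"


class OrderType(Enum):
    LIMIT = "LIMIT"
    FILL_OR_KILL = "FILL_OR_KILL"


@dataclass
class Order:
    # Hypothetical order record; the generated repr uses repr() of each field.
    side: Side
    order_type: OrderType
    quantity: int


order = Order(Side.BUY, OrderType.FILL_OR_KILL, 10)
print(order)
# Order(side=<Side.BUY: 'BUY'>, order_type=<OrderType.FILL_OR_KILL: 'FILL_OR_KILL'>, quantity=10)
```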

Strictly speaking Python does not have enums - or at least it didn't prior to v3.4
https://docs.python.org/3/library/enum.html
I prefer to think of your example as programmer defined constants.
In argparse, one set of constants has string values. While the code uses the constant names, users more often use the strings, e.g.
argparse.ZERO_OR_MORE = '*'
argparse.OPTIONAL = '?'
numpy is one of the older 3rd party packages (at least its roots, like Numeric, are). String values are more common than enums. In fact, I can't offhand think of any enums (as you define them).
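To illustrate the argparse point: the module-level constants are plain strings, so passing the constant or the literal string to add_argument is equivalent. A small sketch (the option names are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser()
# argparse.OPTIONAL is the string "?" and argparse.ZERO_OR_MORE is "*",
# so these calls could equally use the literal strings.
parser.add_argument("--level", nargs=argparse.OPTIONAL, const="default")
parser.add_argument("--files", nargs=argparse.ZERO_OR_MORE)

args = parser.parse_args(["--level", "--files", "a.txt", "b.txt"])
print(args.level, args.files)  # default ['a.txt', 'b.txt']
```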

Related

What's the use of values in an Enum in Python?

I've been working with enums lately and I don't really get their utility in some cases. I hope my question is not too trivial or too stupid; I would really love to better understand the logic behind this Python structure. One common use I found online, and in some pieces of code I have been working on lately, is values that are strings, for example:
from enum import Enum

class Days(Enum):
    MONDAY = 'monday'
    TUESDAY = 'tuesday'
    ...
    SUNDAY = 'sunday'
And from my humble perspective, the values seem redundant: if I print the value of a member, I get the following:
print(Days.MONDAY.value)
# prints: monday
I totally understand the utility when the values are numbers and represent a hierarchical structure, for example:
class Levels(Enum):
    HIGH = 10
    MID = 5
    LOW = 0
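In which you can do stuff like this (members of a plain Enum are not orderable themselves, so the comparison goes through .value; with IntEnum you could compare the members directly):
Levels.HIGH.value > Levels.LOW.value
# True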
But in a lot of examples and actual code I see the first approach, the one with MONDAY = 'monday', i.e. where the values are strings instead of numbers, and in this case I really don't understand the utility of having a key that is pretty much equal to the value.
If anyone can help me understand, or show some utilities I would really love to understand new stuff.
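One caveat on the Levels example: members of a plain Enum do not support ordering, so Levels.HIGH > Levels.LOW raises TypeError. Comparing .value works, or the class can derive from IntEnum, whose members are ints. A sketch of both (IntLevels is a hypothetical name):

```python
from enum import Enum, IntEnum


class Levels(Enum):
    HIGH = 10
    MID = 5
    LOW = 0


class IntLevels(IntEnum):
    HIGH = 10
    MID = 5
    LOW = 0


# Plain Enum: compare the underlying values explicitly.
print(Levels.HIGH.value > Levels.LOW.value)  # True

# IntEnum members are ints, so they can be ordered and sorted directly.
print(IntLevels.HIGH > IntLevels.LOW)  # True

try:
    Levels.HIGH > Levels.LOW
except TypeError:
    print("plain Enum members are not orderable")
```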
The members (should) always be named in all uppercase (per the very first note on the enum docs), so if you want them to have "names" that are in some other casing, you can assign strings with arbitrary case, which may be more human-friendly when it comes time to display the values to the user.
You can also convert from said values to the enum constant easily, with the Enum's constructor, e.g.:
Days('monday') is Days.MONDAY # This is True
so if the data from the user (or database or whatever) has specific values, you can easily convert them to their logical Enum equivalents this way.
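A sketch of that conversion at a system boundary, using a trimmed-down Days enum. The parse_day helper is hypothetical, and stripping and lower-casing the input is an assumption about how messy the incoming data is:

```python
from enum import Enum


class Days(Enum):
    MONDAY = "monday"
    TUESDAY = "tuesday"
    SUNDAY = "sunday"  # other days omitted for brevity


def parse_day(raw):
    """Hypothetical helper: map a raw string to a Days member, or None if unknown."""
    try:
        # Days(value) looks a member up by its value, not by its name.
        return Days(raw.strip().lower())
    except ValueError:
        return None


day = parse_day(" Monday ")
print(day)        # Days.MONDAY
print(day.value)  # monday (what you would write back to storage)
```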
If the values really aren't meaningful, you can just assign auto() to all of them and not think about the values.
Just in case you're asking "why not use the strings themselves?", the general advantage to enums is guaranteed uniqueness and exhaustiveness for efficient checks and self-documenting code. If you use the strings directly, you have to use == checks (str has no guarantee that equal values are the same object unless you explicitly intern them, so is checks can't be used), and people can pass in strings that don't actually come from the expected set of strings. With Enums there is a central definition of all possible values (and therefore all other values are not possible), and since the values are all guaranteed singletons, when you have a member of that Enum, you can use is/is not testing for cheap identity testing (in CPython at least, it's literally just a pointer comparison) without relying on value equality tests, ==/!=, that invoke the more expensive rich comparison machinery. This even works when you make aliases for the same enum member, e.g.:
class Foo(Enum):
    SPAM = 1
    EGGS = 2
    ALSO_SPAM = 1
which seamlessly makes Foo.ALSO_SPAM the same object as Foo.SPAM, so Foo.SPAM is Foo.ALSO_SPAM is true, allowing two aliases with different names to be used interchangeably.
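The alias behaviour can be checked directly. Iterating over the enum skips aliases, while __members__ lists them; if aliases are never intended, the enum.unique decorator turns a duplicate value into an error. A sketch (Strict is a hypothetical name):

```python
from enum import Enum, unique


class Foo(Enum):
    SPAM = 1
    EGGS = 2
    ALSO_SPAM = 1  # same value as SPAM, so it becomes an alias, not a new member


print(Foo.ALSO_SPAM is Foo.SPAM)  # True
print([m.name for m in Foo])      # ['SPAM', 'EGGS'] (iteration skips aliases)
print(list(Foo.__members__))      # ['SPAM', 'EGGS', 'ALSO_SPAM']

# @unique rejects duplicate values when the class is created.
duplicates_rejected = False
try:
    @unique
    class Strict(Enum):
        A = 1
        B = 1
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)  # True
```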
It's one approach to adding type safety to your code. Even if Days looks like a collection of strings, it's not a subclass/subtype/special-case of str.
>>> isinstance(Days.MONDAY, Days)
True
>>> isinstance('monday', Days)
False
>>> isinstance(Days.MONDAY, str)
False
In addition to @ShadowRanger's answer: the primary reason why Enum has both a name and a value field is so that meaningful constants can have easy-to-use names (constants such as database values or various codes: HTTP, error, response, etc.); in other words, so that Enum can provide an easy interface with other systems.
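The standard library has a ready-made instance of this pattern: http.HTTPStatus (Python 3.5+) is an IntEnum whose names are readable constants and whose values are the numeric codes exchanged with other systems. A short sketch:

```python
from http import HTTPStatus

status = HTTPStatus(404)          # member looked up from a raw numeric code
print(status.name, status.value)  # NOT_FOUND 404
print(status.phrase)              # Not Found
print(status == 404)              # True: IntEnum members compare equal to ints
```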

Python number base class OR how to determine a value is a number

Python seems to lack a base class for "all numbers", e.g. int, float, complex, long (in Python 2). This is unfortunate and a bit inconvenient to me.
I'm writing a function for matching data types onto each other (a tree matching / searching algorithm). For this I'd like to test the input values for being lists, dicts, strings, or "numbers". Each of the four cases is handled separately, but I do not want to distinguish between int, long, float, or complex (though the latter will probably not appear). This would be rather easy to achieve if all number types were derived from a base type number, but unfortunately they are not, AFAICS.
This enforcement of explicitness makes it obvious that the unusual complex is included, which typically raises questions I would rather not think about. My design says "all number types" rather than that explicit list. I also do not want to explicitly list all possible number types coming from other libraries like numpy or similar.
First question, rather a theoretical one: Why didn't the designers make all number types inherit a common number base class? Is there a good reason for this, maybe a theory about it which lets it seem not recommended? Or would it make sense to propose this idea for later versions of Python?
Second question, the more practical one: is there a recommended way of checking a value for being a number, maybe a library call I'm not aware of? The straightforward version, isinstance(x, (int, float, complex, long)), looks like a kludge to me, isn't compatible with Python 3, which no longer has a long type, and doesn't include library-based number types like numpy.int32.
There is actually a base class for those types you listed.
If you're not looking at numpy types, a good starting point would be numbers.Complex:
>>> import numbers
>>> isinstance(1+9j, numbers.Complex)
True
>>> isinstance(1L, numbers.Complex)  # Python 2 only; Python 3 has no long literal
True
>>> isinstance(1., numbers.Complex)
True
>>> isinstance(1, numbers.Complex)
True
It gets a bit messier when you start to include those from numpy, however, the numbers.Complex abstract base class already handles a good number of the mentioned cases.
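A dependency-free sketch of the ABC approach: numbers.Number is the broadest class in the hierarchy and also covers decimal.Decimal, which is registered only as a Number (not as Complex). Excluding bool is an assumption about what should count as a number; drop that clause if flags should pass. NumPy registers its scalar types with these ABCs as well, but numpy is left out here.

```python
import numbers
from decimal import Decimal
from fractions import Fraction


def is_number(x):
    """True for instances of numbers.Number, except bool."""
    # bool is a subclass of int, so it is excluded explicitly here.
    return isinstance(x, numbers.Number) and not isinstance(x, bool)


samples = [1, 2.5, 3j, Fraction(1, 3), Decimal("1.1"), "1", [1], True]
print([is_number(s) for s in samples])
# [True, True, True, True, True, False, False, False]

# Decimal is a Number but not a Complex, so a numbers.Complex check rejects it.
print(isinstance(Decimal("1.1"), numbers.Complex))  # False
```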
Sorry, a bit late to the party. But this might be helpful for future readers of this page. I suggest you use Python's "duck typing" character and its EAFP ("Easier to Ask Forgiveness than Permission") philosophy: in other words, just try using the object in question as a number. Write something like this:
def isnumber(thing):
    try:
        thing + 0
        return True
    except TypeError:
        return False
It should work for any type of number, including user-defined classes.
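A quick check of the duck-typed isnumber from this answer, with a hypothetical Money class standing in for a user-defined type. Note the trade-off: anything that accepts `+ 0` passes, including bool (True + 0 is 1) and any non-numeric object whose __add__ happens to accept 0.

```python
def isnumber(thing):
    try:
        thing + 0
        return True
    except TypeError:
        return False


class Money:
    """Hypothetical user-defined type that supports adding plain numbers."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other)


print(isnumber(3), isnumber(2.5), isnumber(Money(100)))  # True True True
print(isnumber("3"), isnumber([3]), isnumber(None))      # False False False
```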

__repr__ for (large) composite objects

I would like to have informative representations for my composite objects (i.e., objects composed of other, potentially composite, objects). However, because my code fundamentally deals with high-precision numbers (please don't ask me why I don't just use doubles), I end up with representations like you see here: http://pastebin.com/jpLgAfxC. Would it be better to just stick with the default __repr__?
Whether to have a verbose repr depends on what you want to accomplish. For complex or composite objects, I know which I'd prefer of the following:
Point(x=1.12, y=2.2, z=-1.9)
<__main__.Point object at 0x103011890>
They both tell me what type the object is, but only the first is clear about all of the (relevant) values involved, and avoids low-level information that is only relevant on the rarest of occasions.
I like to see the real values. But, yours is a special case, given that your values are so frightfully humongous:
72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73237863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851
that they cannot be useful for most development or debugging purposes. I'm sure there are times you need the full serialization (to send to and from files, for example), but those have to be fairly rare, no? I can't imagine you really remember all 309 digits, or can tell on visual inspection whether the above number is the same as the one below:
72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73327863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851
They're not the same. But unless you're Spock or The Terminator, you wouldn't know that from a quick glance. (And actually, I've made it easier here, length-wrapping to avoid having to horizontally scroll.)
So I would recommend (massively) shortening their representation, to make the output more tractable. This is like printing out the entire chapter text every time you want to print a Chapter object. Overkill.
Instead, try something much shorter and easier to work with. Truncation and/or ellipsis are useful. e.g.
72401...59851
7240131710...
You can use the object id as well. If your high-precision type is HP, then:
HP(0x103011890)
At least then you will be able to tell them apart. One ugliness of using object ids, however, is that objects can be logically equivalent, but if you create multiple objects with the same logical value, they'd have different ids, thus appear different when they are not. You can get around that by creating your own short hash function. There's a bit of an art to hashing, but for reprs, even something simple would work. E.g.:
import binascii, struct

def shorthash(s):
    """
    Given a Python value, produce a short alphanumeric hash that
    helps identify it for debugging purposes. A riff on
    http://stackoverflow.com/a/2511059/240490
    Enhanced to remove trailing boilerplate, and to work
    on either Python 2 or Python 3.
    """
    # 'q' (signed 64-bit) rather than 'l', whose size is platform-dependent
    # (4 bytes on Windows, too small for a 64-bit hash()).
    hashbytes = binascii.b2a_base64(struct.pack('q', hash(s)))
    return hashbytes.decode('utf-8').rstrip().rstrip("=")
Then define your repr in the high-precision class:
def __repr__(self):
    clsname = self.__class__.__name__
    return '{0}({1})'.format(clsname, shorthash(self.value))
Where self.value is whatever local attribute, property, or method creates the multi-hundred-digit value. If you're subclassing int, this could be just self.
This gets you to:
HP(Tea+5MY0WwA)
The two massive, almost identical numbers above? Using this scheme, they render out to:
HP(XhkG0358Fx4)
HP(27CdIG5elhQ)
Which are obviously different. You can combine this with a bit of a value representation. E.g. a few alternatives:
HP(~7.24013e308 # XhkG0358Fx4)
HP(dig='72401...59851', ndigits=309, hash='XhkG0358Fx4')
You'll find these shorter values more useful in debugging contexts. You can, of course, keep around a method or property (e.g. .value, .digits, or .alldigits) for those case in which you need every last bit, but define the common case as something more easily consumed.
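Putting the pieces of this answer together, here is a sketch of a hypothetical HP type (subclassing int purely for illustration) whose repr shows truncated digits, the digit count, and a short hash, similar to the second alternative above. It packs the hash with struct format 'q' (a signed 64-bit integer) instead of 'l', whose size varies by platform; the hash strings it produces will not match the ones quoted in the answer.

```python
import binascii
import struct


def shorthash(value):
    """Return a short base64 tag derived from hash(value)."""
    # 'q' is a fixed-size signed 64-bit integer, wide enough for hash() on
    # both 32-bit and 64-bit builds.
    packed = struct.pack("q", hash(value))
    return binascii.b2a_base64(packed).decode("ascii").rstrip().rstrip("=")


class HP(int):
    """Hypothetical high-precision integer with a compact, comparable repr."""

    def __repr__(self):
        digits = str(abs(int(self)))
        # Keep only the first and last five digits of long numbers.
        shown = digits if len(digits) <= 10 else f"{digits[:5]}...{digits[-5:]}"
        sign = "-" if self < 0 else ""
        return (
            f"{type(self).__name__}(dig='{sign}{shown}', "
            f"ndigits={len(digits)}, hash='{shorthash(int(self))}')"
        )


a = HP(10**308)
b = HP(10**308 + 10**150)  # same shown digits and length, different hash
print(repr(a))
print(repr(b))
```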
Thank you to Demian for the pointer to https://docs.python.org/2/reference/datamodel.html#object.__repr__, specifically:
This is typically used for debugging, so it is important that the
representation is information-rich and unambiguous.
http://pastebin.com/jpLgAfxC is probably the best possible __repr__ in this case.

No need for enums

I read a post recently where someone mentioned that there is no need for using enums in python. I'm interested in whether this is true or not.
For example, I use an enum to represent modem control signals:
class Signals:
    CTS = "CTS"
    DSR = "DSR"
    ...
Isn't it better that I use if signal == Signals.CTS: than if signal == "CTS":, or am I missing something?
Signals.CTS does seem better than "CTS". But Signals is not an enum, it's a class with specific fields. The claim, as I've heard it, is that you don't need a separate enum language construct, as you can do things like you've done in the question, or perhaps:
CTS, DSR, XXX, YYY, ZZZ = range(5)
If you have that in a signals module, it can be imported and used in a similar fashion, e.g., if signal == signals.CTS:. This is done in several modules in the standard library, including the re and os modules.
In your exact example, I guess it would be okay to use defined constants, as a mistyped constant name raises an error when the constant is not found, whereas a typo in a string does not.
I guess there is a solution using object orientation that is at least as good.
BTW: if "CTS": will always be True, since only empty strings are interpreted as False.
It depends on whether you use the values of Signal.CTS, Signal.DSR as data, for example if you send these strings to an actual modem. If so, it is a good idea to have aliases defined as you did, because external interfaces tend to change or be less uniform than you would expect. Otherwise, if you never use the symbols' values, you can skip that layer of abstraction and use strings directly.
The only rule is not to mix internal symbols and external data.
If you want to have meaningful string constants (CTS = "CTS", etc.), you can simply do:
for constant_name in ('CTS', 'DSR'):  # All constant names go here
    globals()[constant_name] = constant_name
This defines variables CTS and DSR with the values you want. (Reference on the use of globals(): Programmatically creating variables in Python.)
Directly defining your constants at the top level of a module is done in many standard library modules (like for instance the re and os modules [re.IGNORECASE, etc.]), so this approach is quite clean.
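As an alternative to injecting names into globals(), the enum module's functional API (Python 3.4+) can build the same name-equals-value constants inside a single namespace. A sketch:

```python
from enum import Enum

# Each (name, value) pair becomes a member; here every value equals its name.
Signals = Enum("Signals", [(name, name) for name in ("CTS", "DSR")])

print(Signals.CTS.value)  # CTS
print(Signals("DSR"))     # Signals.DSR (lookup by value, e.g. data read from a device)
```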
I think there's a lot more you can put into enumerations (arbitrary values, bitwise operations, descriptions containing whitespace).
Please read the very short post below, check out the enum class it offers, and judge for yourself.
Python Enum Post
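For the bitwise-operations point, the standard library's enum.Flag (Python 3.6+) supports combining members with | and testing them with & or in. A sketch with hypothetical modem control lines:

```python
from enum import Flag, auto


class ModemLine(Flag):
    """Hypothetical modem control lines; each member gets its own bit."""

    CTS = auto()
    DSR = auto()
    DCD = auto()
    RI = auto()


active = ModemLine.CTS | ModemLine.DSR  # two lines asserted at once
print(ModemLine.CTS in active)          # True
print(ModemLine.RI in active)           # False
print(bool(active & ModemLine.DCD))     # False: no bits in common
```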
Do we need Enums in Python? Do we need an html module, a database module, or a bool type?
I would classify Enums as a nice-to-have, not a must-have.
However, part of the reason Enums finally showed up (in Python 3.4) is that they are such a nice-to-have that many folks reimplemented enums by hand. With so many private and public versions of enumerations, interoperability becomes an issue, standard usage becomes an issue, etc.
So to answer your question: No, we don't need an Enum type. But we now have one anyway. There's even a back-ported version.

Python code readability

I have programming experience with statically typed languages. Now that I'm writing code in Python, I have difficulties with its readability. Let's say I have a class Host:
class Host(object):
    def __init__(self, name, network_interface):
        self.name = name
        self.network_interface = network_interface
I can't tell from this definition what "network_interface" should be. Is it a string, like "eth0", or an instance of a class NetworkInterface? The only solution I can think of is documenting the code with a docstring, something like this:
class Host(object):
    ''' Attributes:
    #name: a string
    #network_interface: an instance of class NetworkInterface'''
Or maybe there are naming conventions for things like that?
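Type hints (PEP 484, Python 3.5+) offer another way to record this, and checkers such as mypy can verify them. A sketch; the NetworkInterface dataclass here is a hypothetical stand-in for the question's class:

```python
from dataclasses import dataclass


@dataclass
class NetworkInterface:
    """Hypothetical stand-in for the question's NetworkInterface class."""

    name: str


class Host:
    def __init__(self, name: str, network_interface: NetworkInterface) -> None:
        # The annotations document the expected types; Python does not enforce
        # them at runtime, but static checkers such as mypy check call sites.
        self.name = name
        self.network_interface = network_interface


host = Host("my_host", NetworkInterface("eth0"))
print(host.network_interface.name)  # eth0
```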
Using dynamic languages will teach you something about static languages: all the help you got from the static language that you now miss in the dynamic language, it wasn't all that helpful.
To use your example, in a static language, you'd know that the parameter was a string, and in Python you don't. So in Python you write a docstring. And while you're writing it, you realize you had more to say about it than, "it's a string". You need to say what data is in the string, and what format it should have, and what the default is, and something about error conditions.
And then you realize you should have written all that down for your static language as well. Sure, Java would force you to know that it was a string, but there are all these other details that need to be specified, and you have to do that work manually in any language.
The docstring conventions are at PEP 257.
The example there uses this format for specifying arguments; you can add the types if they matter:
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0: return complex_zero
    ...
There was also a rejected PEP for docstrings for attributes ( rather than constructor arguments ).
The most pythonic solution is to document with examples. If possible, state what operations an object must support to be acceptable, rather than a specific type.
class Host(object):
    def __init__(self, name, network_interface):
        """Initialise host with given name and network_interface.
        network_interface -- must support the same operations as NetworkInterface
        >>> network_interface = NetworkInterface()
        >>> host = Host("my_host", network_interface)
        """
        ...
At this point, hook your source up to doctest to make sure your doc examples continue to work in future.
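A sketch of wiring such docstring examples up to doctest. In a real module you would typically just call doctest.testmod() under `if __name__ == "__main__":`; here the one docstring is parsed and run explicitly so the example is self-contained. The docstring sits on the class rather than on __init__, and NetworkInterface is a hypothetical minimal class:

```python
import doctest


class NetworkInterface:
    """Hypothetical minimal interface class used by the example."""

    def __init__(self, name="eth0"):
        self.name = name


class Host:
    """A named host attached to a network interface.

    network_interface -- must support the same operations as NetworkInterface

    >>> network_interface = NetworkInterface()
    >>> host = Host("my_host", network_interface)
    >>> host.network_interface.name
    'eth0'
    """

    def __init__(self, name, network_interface):
        self.name = name
        self.network_interface = network_interface


# Parse the docstring's examples and run them with explicit globals.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    Host.__doc__,
    {"Host": Host, "NetworkInterface": NetworkInterface},
    "Host",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)  # 0 3
```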
Personally, I have found pylint very useful for validating my code.
If you follow pylint's suggestions almost automatically, your code becomes more readable,
you improve your Python writing skills, and you respect naming conventions. You can also define your own naming conventions and so on. It's especially useful for a Python beginner.
I suggest you use it.
Python, though not as overtly typed as C or Java, is still typed and will throw exceptions if you're doing things with types that simply do not play nice together.
To that end, if you're concerned about your code being used correctly, maintained correctly, etc. simply use docstrings, comments, or even more explicit variable names to indicate what the type should be.
Better yet, include code that will allow it to handle whichever type it may be passed, as long as it yields a usable result.
One benefit of static typing is that types are a form of documentation. When programming in Python, you can document more flexibly and fluently. Of course, in your example you want to say that network_interface should implement NetworkInterface, but in many cases the type is obvious from the context, the variable name, or convention, and in these cases omitting the obvious produces more readable code. It is common to describe the meaning of a parameter and let the type be implied.
For example:
def Bar(foo, count):
    """Bar the foo the given number of times."""
    ...
This describes the function tersely and precisely. What foo and bar mean will be obvious from context, and that count is a (positive) integer is implicit.
For your example, I'd just mention the type in the docstring:
"""Create a named host on the given NetworkInterface."""
This is shorter, more readable, and contains more information than a listing of the types.
