In Python, when implementing a sequence type, I often (relatively speaking) find myself writing code like this:
class FooSequence(collections.abc.Sequence):
    # Snip other methods
    def __getitem__(self, key):
        if isinstance(key, int):
            # Get a single item
        elif isinstance(key, slice):
            # Get a whole slice
        else:
            raise TypeError('Index must be int, not {}'.format(type(key).__name__))
The code checks the type of its argument explicitly with isinstance(). This is regarded as an antipattern within the Python community. How do I avoid it?
I cannot use functools.singledispatch, because that's quite deliberately incompatible with methods (it will attempt to dispatch on self, which is entirely useless since we're already dispatching on self via OOP polymorphism). It works with @staticmethod, but what if I need to get stuff out of self?
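Since Python 3.8, functools.singledispatchmethod removes this particular objection: it dispatches on the first argument after self. A minimal sketch under that assumption; the _data backing list is hypothetical:
import collections.abc
from functools import singledispatchmethod  # Python 3.8+

class FooSequence(collections.abc.Sequence):
    def __init__(self, data):
        self._data = list(data)  # hypothetical backing store

    def __len__(self):
        return len(self._data)

    @singledispatchmethod
    def __getitem__(self, key):
        raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))

    @__getitem__.register
    def _(self, key: int):
        return self._data[key]  # single item

    @__getitem__.register
    def _(self, key: slice):
        return self._data[key]  # whole slice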
Casting to int() and then catching the TypeError, checking for a slice, and possibly re-raising is still ugly, though perhaps slightly less so.
It might be cleaner to convert integers into one-element slices and handle both situations with the same code, but that has its own problems (return 0 or [0]?).
As much as it seems odd, I suspect that the way you have it is the best way to go about things. Patterns generally exist to encompass common use cases, but that doesn't mean that they should be taken as gospel when following them makes life more difficult. The main reason that PEP 443 gives for balking at explicit typechecking is that it is "brittle and closed to extension". However, that mainly applies to custom functions that take a number of different types at any time. From the Python docs on __getitem__:
For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.
The Python documentation explicitly states the two types that should be accepted, and what to do if an item that is not of those two types is provided. Given that the types are provided by the documentation itself, it's unlikely to change (doing so would break far more implementations than just yours), so it's likely not worth the trouble to go out of your way to code against Python itself potentially changing.
If you're set on avoiding explicit typechecking, I would point you toward this SO answer. It contains a concise implementation of a @methdispatch decorator (not my name, but I'll roll with it) that lets @singledispatch work with methods by forcing it to check args[1] (arg) rather than args[0] (self). Using that should allow you to use custom single dispatch with your __getitem__ method.
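For reference, the decorator in that answer is essentially the following sketch (reconstructed, so treat the details as approximate):
from functools import singledispatch, update_wrapper

def methdispatch(func):
    # Wrap singledispatch so dispatch happens on args[1] rather than args[0] (self).
    dispatcher = singledispatch(func)
    def wrapper(*args, **kwargs):
        return dispatcher.dispatch(args[1].__class__)(*args, **kwargs)
    wrapper.register = dispatcher.register
    update_wrapper(wrapper, func)
    return wrapper
Implementations registered with @__getitem__.register(int) and @__getitem__.register(slice) then receive self normally.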
Whether or not you consider either of these "pythonic" is up to you, but remember that while The Zen of Python notes that "Special cases aren't special enough to break the rules", it then immediately notes that "practicality beats purity". In this case, just checking for the two types that the documentation explicitly states are the only things __getitem__ should support seems like the practical way to me.
The antipattern is for code to do explicit type checking, which means using the type() function. Why? Because a subclass of the target type would then no longer work. For instance, __getitem__ can accept an int, but checking for it with type() means an int subclass -- which would work fine -- is rejected merely because type() does not return int.
When a type-check is necessary, isinstance is the appropriate way to do it as it does not exclude subclasses.
When writing __dunder__ methods, type checking is necessary and expected -- using isinstance().
In other words, your code is perfectly Pythonic, and its only problem is the error message (it doesn't mention slices).
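For concreteness, here is a minimal runnable version with a corrected error message; the list-backed _data store is an assumption for illustration:
import collections.abc

class FooSequence(collections.abc.Sequence):
    def __init__(self, data):
        self._data = list(data)  # assumed backing store

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._data[key]              # single item
        elif isinstance(key, slice):
            return type(self)(self._data[key])  # new sequence for a slice
        else:
            raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))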
I'm not aware of a way to avoid doing it once. That's just the tradeoff of using a dynamically-typed language in this way. However, that doesn't mean you have to do it over and over again. I would solve it once by creating an abstract class with split out method names, then inherit from that class instead of directly from Sequence, like:
class UnannoyingSequence(collections.abc.Sequence):
    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        elif isinstance(key, slice):
            return self.getslice(key)
        else:
            raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))

    # default implementation in terms of getitem
    def getslice(self, key):
        # Get a whole slice, item by item
        return [self.getitem(i) for i in range(*key.indices(len(self)))]
class FooSequence(UnannoyingSequence):
    def getitem(self, key):
        # Get a single item

    # optional efficient, type-specific implementation not in terms of getitem
    def getslice(self, key):
        # Get a whole slice
This cleans up FooSequence enough that I might even do it this way if I only had the one derived class. I'm sort of surprised the standard library doesn't already work that way.
To stay pythonic, you have to work with the semantics rather than the type of the objects. So if you have some parameter as an accessor to a sequence, just use it like that. Use the abstraction for a parameter as long as possible. If you expect a set of user identifiers, do not expect a set, but rather some data structure with an add method. If you expect some text, do not expect a unicode object, but rather some container of characters featuring encode and decode methods.
I assume in general you want to do something like "use the behavior of the base implementation unless some special value is provided". If you want to implement __getitem__, you can use a case distinction where something different happens if one special value is provided. I'd use the following pattern:
class FooSequence(collections.abc.Sequence):
    # Snip other methods
    def __getitem__(self, key):
        try:
            if key == SPECIAL_VALUE:
                return SOMETHING_SPECIAL
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))
If you want to distinguish between a single value (in Perl terminology, a "scalar") and a sequence (in Java terminology, a "collection"), then it is pythonically fine to determine whether an iterator is implemented. You can either use a try-except pattern or hasattr, as I do here:
>>> a = 42
>>> b = [1, 3, 5, 7]
>>> c = slice(1, 42)
>>> hasattr(a, "__iter__")
False
>>> hasattr(b, "__iter__")
True
>>> hasattr(c, "__iter__")
False
Applied to our example:
class FooSequence(collections.abc.Sequence):
    # Snip other methods
    def __getitem__(self, key):
        try:
            if hasattr(key, "__iter__"):
                return map(lambda x: WHATEVER(x), key)
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))
Dynamic programming languages like Python and Ruby use duck typing. And a duck is an animal that walks like a duck, swims like a duck and quacks like a duck -- not because somebody calls it a "duck".
How do I check if an object is of a given type, or if it inherits from a given type?
How do I check if the object o is of type str?
Use isinstance to check if o is an instance of str or any subclass of str:
if isinstance(o, str):
To check if the type of o is exactly str, excluding subclasses of str:
if type(o) is str:
See Built-in Functions in the Python Library Reference for relevant information.
Checking for strings in Python 2
For Python 2, this is a better way to check if o is a string:
if isinstance(o, basestring):
because this will also catch Unicode strings. unicode is not a subclass of str; both str and unicode are subclasses of basestring. In Python 3, basestring no longer exists since there's a strict separation of strings (str) and binary data (bytes).
Alternatively, isinstance accepts a tuple of classes. This will return True if o is an instance of any subclass of any of (str, unicode):
if isinstance(o, (str, unicode)):
The most Pythonic way to check the type of an object is... not to check it.
Since Python encourages Duck Typing, you should just try...except to use the object's methods the way you want to use them. So if your function is looking for a writable file object, don't check that it's a subclass of file, just try to use its .write() method!
Of course, sometimes these nice abstractions break down and isinstance(obj, cls) is what you need. But use sparingly.
isinstance(o, str) will return True if o is a str or is of a type that inherits from str.
type(o) is str will return True if and only if o is a str. It will return False if o is of a type that inherits from str.
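A quick demonstration of the difference, using a hypothetical str subclass:
class Markup(str):
    """A hypothetical subclass of str."""

m = Markup('<b>hi</b>')
print(isinstance(m, str))  # True: subclasses count as instances
print(type(m) is str)      # False: the exact type is Markup, not str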
After the question was asked and answered, type hints were added to Python. Type hints in Python allow types to be checked, but in a very different way from statically typed languages. They associate the expected types of arguments with functions as runtime-accessible data, and this allows types to be checked. Example of type hint syntax:
def foo(i: int):
    return i

foo(5)
foo('oops')
In this case we want an error to be triggered for foo('oops') since the annotated type of the argument is int. The added type hint does not cause an error to occur when the script is run normally. However, it adds attributes to the function describing the expected types that other programs can query and use to check for type errors.
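For example, the hints are visible on the function object itself:
import typing

def foo(i: int):
    return i

print(foo.__annotations__)         # {'i': <class 'int'>}
print(typing.get_type_hints(foo))  # same information, with forward references resolved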
One of these other programs that can be used to find the type error is mypy:
mypy script.py
script.py:12: error: Argument 1 to "foo" has incompatible type "str"; expected "int"
(You might need to install mypy from your package manager. I don't think it comes with CPython but seems to have some level of "officialness".)
Type checking this way is different from type checking in statically typed compiled languages. Because types are dynamic in Python, type checking must be done at runtime, which imposes a cost -- even on correct programs -- if we insist that it happen at every chance. Explicit type checks may also be more restrictive than needed and cause unnecessary errors (e.g. does the argument really need to be of exactly list type or is anything iterable sufficient?).
The upside of explicit type checking is that it can catch errors earlier and give clearer error messages than duck typing. The exact requirements of a duck type can only be expressed with external documentation (hopefully it's thorough and accurate) and errors from incompatible types can occur far from where they originate.
Python's type hints are meant to offer a compromise where types can be specified and checked but there is no additional cost during usual code execution.
The typing package offers generic types that can be used in type hints to express needed behaviors without requiring particular types. For example, it includes Iterable and Callable for hints that specify the need for any type with those behaviors.
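A small sketch of that idea (the function and names are made up for illustration):
from typing import Callable, Iterable, List

def apply_all(funcs: Iterable[Callable[[int], int]], x: int) -> List[int]:
    # any iterable of int-to-int callables satisfies the hint; no concrete type required
    return [f(x) for f in funcs]

print(apply_all([abs, lambda n: n * 2], -3))  # [3, -6]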
While type hints are the most Pythonic way to check types, it's often even more Pythonic to not check types at all and rely on duck typing. Type hints are relatively new and the jury is still out on when they're the most Pythonic solution. A relatively uncontroversial but very general comparison: Type hints provide a form of documentation that can be enforced, allow code to generate earlier and easier to understand errors, can catch errors that duck typing can't, and can be checked statically (in an unusual sense but it's still outside of runtime). On the other hand, duck typing has been the Pythonic way for a long time, doesn't impose the cognitive overhead of static typing, is less verbose, and will accept all viable types and then some.
In Python 3.10, you can use | in isinstance:
>>> isinstance('1223', int | str)
True
>>> isinstance('abcd', int | str)
True
isinstance(o, str)
You can check the type of a variable using the __name__ attribute of its type.
Ex:
>>> a = [1,2,3,4]
>>> b = 1
>>> type(a).__name__
'list'
>>> type(a).__name__ == 'list'
True
>>> type(b).__name__ == 'list'
False
>>> type(b).__name__
'int'
For more complex type validations I like typeguard's approach of validating based on Python type hint annotations:
from typeguard import check_type
from typing import List

try:
    check_type('mylist', [1, 2], List[int])
except TypeError as e:
    print(e)
You can perform very complex validations in a very clean and readable fashion.
check_type('foo', [1, 3.14], List[Union[int, float]])
# vs
isinstance(foo, list) and all(isinstance(a, (int, float)) for a in foo)
I think the cool thing about using a dynamic language like Python is you really shouldn't have to check something like that.
I would just call the required methods on your object and catch an AttributeError. Later on this will allow you to call your methods with other (seemingly unrelated) objects to accomplish different tasks, such as mocking an object for testing.
I've used this a lot when getting data off the web with urllib2.urlopen(), which returns a file-like object. This in turn can be passed to almost any method that reads from a file, because it implements the same read() method as a real file.
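A sketch of that pattern (the URL and filename are placeholders; Python 2, to match urllib2):
import urllib2  # Python 2

def head(f, lines=3):
    # relies only on readline() behavior, not on any particular file type
    return [f.readline() for _ in range(lines)]

print head(open('local.txt'))                       # a real file
print head(urllib2.urlopen('http://example.com/'))  # a file-like HTTP response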
But I'm sure there is a time and place for using isinstance(), otherwise it probably wouldn't be there :)
The accepted answer addresses the question in that it provides answers to the questions asked.
Q: What is the best way to check whether a given object is of a given type? How about checking whether the object inherits from a given type?
A: Use isinstance, issubclass, type to check based on types.
As other answers and comments are quick to point out, however, there's a lot more to the idea of "type-checking" than that in Python. Since the addition of Python 3 and type hints, much has changed as well. Below, I go over some of the difficulties with type checking, duck typing, and exception handling. For those who think type checking isn't what is needed (it usually isn't, but we're here), I also point out how type hints can be used instead.
Type Checking
Type checking is not always an appropriate thing to do in Python. Consider the following example:
def sum(nums):
    """Expect an iterable of integers and return the sum."""
    result = 0
    for n in nums:
        result += n
    return result
To check if the input is an iterable of integers, we run into a major issue. The only way to check that every element is an integer would be to loop through and check each one. But if we loop through the entire iterator, then there will be nothing left for the intended code. We have two options in this kind of situation.
1. Check as we loop.
2. Check beforehand, but store everything as we check.
Option 1 has the downside of complicating our code, especially if we need to perform similar checks in many places. It forces us to move type checking from the top of the function to everywhere we use the iterable in our code.
Option 2 has the obvious downside that it destroys the entire purpose of iterators. The entire point is to not store the data because we shouldn't need to.
One might also think that if checking all of the elements is too much, then perhaps we can just check whether the input itself is of an iterable type. But there isn't actually any iterable base class. Any type implementing __iter__ is iterable.
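That said, collections.abc.Iterable comes close: its __subclasshook__ recognizes any class that defines __iter__, without requiring inheritance or registration (though it says nothing about the element types):
from collections.abc import Iterable

class Countdown:
    def __iter__(self):
        return iter(range(3, 0, -1))

print(isinstance(Countdown(), Iterable))  # True, purely because __iter__ is defined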
Exception Handling and Duck Typing
An alternative approach would be to forgo type checking altogether and focus on exception handling and duck typing instead. That is to say, wrap your code in a try-except block and catch any errors that occur. Alternatively, don't do anything and let exceptions rise naturally from your code.
Here's one way to go about catching an exception.
def sum(nums):
    """Try to catch exceptions?"""
    try:
        result = 0
        for n in nums:
            result += n
        return result
    except TypeError as e:
        print(e)
Compared to the options before, this is certainly better. We're checking as we run the code. If there's a TypeError anywhere, we'll know. We don't have to place a check everywhere that we loop through the input. And we don't have to store the input as we iterate over it.
Furthermore, this approach enables duck typing. Rather than checking for specific types, we have moved to checking for specific behaviors, looking for when the input fails to behave as expected (in this case, looping through nums and being able to add n).
However, the exact reasons which make exception handling nice can also be their downfall.
A float isn't an int, but it satisfies the behavioral requirements to work.
It is also bad practice to wrap the entire code with a try-except block.
At first these may not seem like issues, but here's some reasons that may change your mind.
A user can no longer expect our function to return an int as intended. This may break code elsewhere.
Since exceptions can come from a wide variety of sources, using the try-except on the whole code block may end up catching exceptions you didn't intend to. We only wanted to check if nums was iterable and had integer elements.
Ideally we'd like to catch exceptions our code generates and raise, in their place, more informative exceptions. It's not fun when an exception is raised from someone else's code with no explanation other than a line you didn't write and that some TypeError occurred.
In order to fix the exception handling in response to the above points, our code would then become this... abomination.
def sum(nums):
    """
    Try to catch all of our exceptions only.
    Re-raise them with more specific details.
    """
    result = 0

    try:
        iter(nums)
    except TypeError as e:
        raise TypeError("nums must be iterable")

    for n in nums:
        try:
            result += int(n)
        except TypeError as e:
            raise TypeError("stopped mid iteration since a non-integer was found")

    return result
You can kinda see where this is going. The more we try to "properly" check things, the worse our code is looking. Compared to the original code, this isn't readable at all.
We could argue perhaps this is a bit extreme. But on the other hand, this is only a very simple example. In practice, your code is probably much more complicated than this.
Type Hints
We've seen what happens when we try to modify our small example to "enable type checking". Rather than focusing on trying to force specific types, type hinting allows for a way to make types clear to users.
from typing import Iterable

def sum(nums: Iterable[int]) -> int:
    result = 0
    for n in nums:
        result += n
    return result
Here are some advantages to using type-hints.
The code actually looks good now!
Static type analysis may be performed by your editor if you use type hints!
They are stored on the function/class, making them dynamically usable e.g. typeguard and dataclasses.
They show up for functions when using help(...).
No need to sanity check if your input type is right based on a description or worse lack thereof.
You can "type" hint based on structure e.g. "does it have this attribute?" without requiring subclassing by the user.
The downside to type hinting?
Type hints are nothing more than syntax and special text on their own; they aren't the same as type checking.
In other words, it doesn't actually answer the question because it doesn't provide type checking. Regardless, however, if you are here for type checking, then you should be type hinting as well. Of course, if you've come to the conclusion that type checking isn't actually necessary but you want some semblance of typing, then type hints are for you.
To Hugo:
You probably mean list rather than array, but that points to the whole problem with type checking - you don't want to know if the object in question is a list, you want to know if it's some kind of sequence or if it's a single object. So try to use it like a sequence.
Say you want to add the object to an existing sequence, or if it's a sequence of objects, add them all
try:
    my_sequence.extend(o)
except TypeError:
    my_sequence.append(o)
One trick with this is if you are working with strings and/or sequences of strings - that's tricky, as a string is often thought of as a single object, but it's also a sequence of characters. Worse than that, it's really a sequence of single-length strings.
I usually choose to design my API so that it only accepts either a single value or a sequence - it makes things easier. It's not hard to put a [ ] around your single value when you pass it in if need be.
(Though this can cause errors with strings, as they do look like (are) sequences.)
If you have to check for the type str or int, please use isinstance. As others have already mentioned, the reason is that it also includes subclasses. One important example of subclasses, from my perspective, is Enums with a data type, like IntEnum or StrEnum, which are a pretty nice way to define related constants. However, it is kind of annoying if libraries do not accept those as the underlying types.
Example:
import enum

class MyEnum(str, enum.Enum):
    A = "a"
    B = "b"

print(f"is string: {isinstance(MyEnum.A, str)}")    # True
print(f"is string: {type(MyEnum.A) == str}")        # False!!!
print(f"is string: {type(MyEnum.A.value) == str}")  # True
In Python, you can use the built-in isinstance() function to check if an object is of a given type, or if it inherits from a given type.
To check if the object o is of type str, you would use the following code:
if isinstance(o, str):
    # o is of type str
You can also use the type() function to check the object type.
if type(o) == str:
    # o is of type str
You can also check if the object is a subclass of a particular class using the issubclass() function.
if issubclass(type(o), str):
    # o is a subclass of str
A simple way to check type is to compare it with something whose type you know.
>>> a = 1
>>> type(a) == type(1)
True
>>> b = 'abc'
>>> type(b) == type('')
True
I think the best way is to type your variables well. You can do this by using the typing library.
Example:
from typing import NewType

UserId = NewType('UserId', int)
some_id = UserId(524313)
See https://docs.python.org/3/library/typing.html.
I want to write a function that accepts a parameter which can be either a sequence or a single value. The type of value is str, int, etc., but I don't want it to be restricted to a hardcoded list.
In other words, I want to know if the parameter X is a sequence or something I have to convert to a sequence to avoid special-casing later. I could do
type(X) in (list, tuple)
but there may be other sequence types I'm not aware of, and no common base class.
-N.
Edit: See my "answer" below for why most of these answers don't help me. Maybe you have something better to suggest.
As of 2.6, use abstract base classes.
>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance(0, collections.Sequence)
False
Furthermore, ABCs can be customized to account for exceptions, such as not considering strings to be sequences. Here's an example:
import abc
import collections

class Atomic(object):
    __metaclass__ = abc.ABCMeta

    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(basestring)
After registration the Atomic class can be used with isinstance and issubclass:
assert isinstance("hello", Atomic) == True
This is still much better than a hard-coded list, because you only need to register the exceptions to the rule, and external users of the code can register their own.
Note that in Python 3 the syntax for specifying metaclasses changed and the basestring abstract superclass was removed, which requires something like the following to be used instead:
class Atomic(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(str)
If desired, it's possible to write code which is compatible with both Python 2.6+ and 3.x, but doing so requires a slightly more complicated technique which dynamically creates the needed abstract base class, thereby avoiding syntax errors due to the metaclass syntax difference. This is essentially the same as what Benjamin Peterson's six module's with_metaclass() function does.
class _AtomicBase(object):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

class Atomic(abc.ABCMeta("NewMeta", (_AtomicBase,), {})):
    pass

try:
    unicode = unicode
except NameError:  # 'unicode' is undefined, assume Python >= 3
    Atomic.register(str)    # str includes unicode in Py3, make both Atomic
    Atomic.register(bytes)  # bytes will also be considered Atomic (optional)
else:
    # basestring is the abstract superclass of both str and unicode types
    Atomic.register(basestring)  # make both types of strings Atomic
In versions before 2.6, there are type checkers in the operator module.
>>> import operator
>>> operator.isSequenceType([])
True
>>> operator.isSequenceType(0)
False
The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has __getitem__, etc.) yet it's usually treated as a single item. For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?
Based on the revised question, it sounds like what you want is something more like:
def to_sequence(arg):
'''
determine whether an arg should be treated as a "unit" or a "sequence"
if it's a unit, return a 1-tuple with the arg
'''
def _multiple(x):
return hasattr(x,"__iter__")
if _multiple(arg):
return arg
else:
return (arg,)
>>> to_sequence("a string")
('a string',)
>>> to_sequence( (1,2,3) )
(1, 2, 3)
>>> to_sequence( xrange(5) )
xrange(5)
This isn't guaranteed to handle all types, but it handles the cases you mention quite well, and should do the right thing for most of the built-in types.
When using it, make sure whatever receives the output of this can handle iterables.
IMHO, the Python way is to pass the list as *list. As in:
myfunc(item)
myfunc(*items)
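For instance, a sketch of what such a myfunc might look like (the name is from the example above):
def myfunc(*items):
    # the body always sees a tuple, whether the caller passed one value or unpacked many
    for item in items:
        print(item)

myfunc(42)          # single value
myfunc(*[1, 3, 5])  # sequence, unpacked by the caller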
Sequences are described here:
https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
So sequences are not the same as iterable objects. I think a sequence must implement __getitem__, whereas iterable objects must implement __iter__. So for example strings are sequences and don't implement __iter__, and xrange objects are sequences and don't implement __getslice__.
But from what you seem to want to do, I'm not sure you want sequences, but rather iterable objects.
So go for hasattr(X, "__getitem__") if you want sequences, but rather hasattr(X, "__iter__") if you don't want strings, for example.
In cases like this, I prefer to just always take the sequence type or always take the scalar. Strings won't be the only types that would behave poorly in this setup; rather, any type that has an aggregate use and allows iteration over its parts might misbehave.
The simplest method would be to check if you can turn it into an iterator, i.e.
try:
    it = iter(X)
    # Iterable
except TypeError:
    pass  # Not iterable
If you need to ensure that it's a restartable or random-access sequence (i.e. not a generator, etc.), however, this approach won't be sufficient.
As others have noted, strings are also iterable, so if you need to exclude them (particularly important if recursing through items, as list(iter('a')) gives ['a'] again), then you may need to specifically exclude them with:
if not isinstance(X, basestring):
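Putting the two checks together, a hedged sketch (Python 2 spelling; use str instead of basestring on Python 3):
def is_nonstring_iterable(x):
    # heuristic: iterable, but not a string
    if isinstance(x, basestring):
        return False
    try:
        iter(x)
    except TypeError:
        return False
    return True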
I'm new here, so I don't know the correct way to do this. I want to follow up on the answers:
The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has __getitem__, etc.) yet it's usually treated as a single item.
For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?
Should I post this as a new question? Edit the original one?
I think what I would do is check whether the object has certain methods that indicate it is a sequence. I'm not sure if there is an official definition of what makes a sequence. The best I can think of is, it must support slicing. So you could say:
is_sequence = '__getslice__' in dir(X)
You might also check for the particular functionality you're going to be using.
As pi pointed out in the comment, one issue is that a string is a sequence, but you probably don't want to treat it as one. You could add an explicit test that the type is not str.
If strings are the problem, detect a sequence and filter out the special case of strings:
def is_iterable(x):
    if type(x) == str:
        return False
    try:
        iter(x)
        return True
    except TypeError:
        return False
You're asking the wrong question. You don't try to detect types in Python; you detect behavior.
Write another function that handles a single value. (let's call it _use_single_val).
Write one function that handles a sequence parameter. (let's call it _use_sequence).
Write a third parent function that calls the two above. (call it use_seq_or_val). Surround each call with an exception handler to catch an invalid parameter (i.e. not single value or sequence).
Write unit tests to pass correct & incorrect parameters to the parent function to make sure it catches the exceptions properly.
def _use_single_val(v):
    print v + 1  # this will fail if v is not a value type

def _use_sequence(s):
    print s[0]  # this will fail if s is not indexable

def use_seq_or_val(item):
    try:
        _use_single_val(item)
        return
    except TypeError:
        pass
    try:
        _use_sequence(item)
        return
    except TypeError:
        pass
    raise TypeError, "item not a single value or sequence"
EDIT: Revised to handle the "sequence or single value" asked about in the question.
Revised answer:
I don't know if your idea of "sequence" matches what the Python manuals call a "Sequence Type", but in case it does, you should look for the __contains__ method. That is the method Python uses to implement the check "if something in object:"
if hasattr(X, '__contains__'):
    print "X is a sequence"
My original answer:
I would check if the object that you received implements an iterator interface:
if hasattr(X, '__iter__'):
    print "X is a sequence"
For me, that's the closest match to your definition of sequence since that would allow you to do something like:
for each in X:
    print each
You could pass your parameter to the built-in len() function and check whether this causes an error. As others said, the string type requires special handling.
According to the documentation the len function can accept a sequence (string, list, tuple) or a dictionary.
You could check that an object is a string with the following code:
x.__class__ == "".__class__
Is there a way to implement __getitem__ in a way that supports integer and slice indices without manually checking the type of the argument?
I see a lot of examples of this form, but it seems very hacky to me.
def __getitem__(self, key):
    if isinstance(key, int):
        # do integery foo here
    if isinstance(key, slice):
        # do slicey bar here
On a related note, why does this problem exist in the first place? Sometimes receiving an int and sometimes a slice is weird design. Calling foo[4] should call foo.__getitem__(slice(4,5,1)) or similar.
You could use exception handling; assume key is a slice object and call the indices() method on it. If that fails it must've been an integer:
def __getitem__(self, key):
    try:
        return [self.somelist[i] * 5 for i in range(*key.indices(self.length))]
    except AttributeError:
        # not a slice object (no `indices` attribute)
        return self.somelist[key] * 5
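For reference, slice.indices() normalizes a slice against a given length, which is what makes the list comprehension above work:
s = slice(1, 10, 2)
print(s.indices(5))                # (1, 5, 2): start/stop/step clamped to length 5
print(list(range(*s.indices(5))))  # [1, 3]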
Most use-cases for custom containers don't need to support slicing, and historically, the __getitem__ method only ever had to handle integers (for sequences, that is); the __getslice__() method was there to handle slicing instead. When __getslice__ was deprecated, for backwards compatibility and for simpler APIs it was easier to have __getitem__ handle both integers and slice objects.
And that is ignoring the fact that outside sequences, key doesn't have to be an integer. Custom classes are free to support any key type they like.
Is there any way to prevent side effects in Python? For example, the following function has a side effect; is there any keyword or any other way to have Python complain about it?
def func_with_side_affect(a):
    a.append('foo')
Python is really not set up to enforce prevention of side-effects. As some others have mentioned, you can try to deepcopy the data or use immutable types, but these still have corner cases that are tricky to catch, and it's just a ton more effort than it's worth.
Using a functional style in Python normally involves the programmer simply designing their functions to be functional. In other words, whenever you write a function, you write it in such a way that it doesn't mutate the arguments.
If you're calling someone else's function, then you have to make sure the data you are passing in either cannot be mutated, or you have to keep around a safe, untouched copy of the data yourself, that you keep away from that untrusted function.
No, but with your example, you could use immutable types and pass a tuple as the a argument. Side effects cannot affect immutable types; for example, you cannot append to a tuple, you can only create another tuple by extending the given one.
UPD: But still, your function could change objects which are referenced by your immutable object (as was pointed out in comments), write to files, and do some other IO.
Sorry, really late to the party. You can use the effect library to isolate side effects in your Python code. As others have said, in Python you have to explicitly write functional-style code, but this library really encourages it.
About the only way to enforce that would be to overwrite the function specification to deepcopy any arguments before they are passed to the original function. You could do that with a function decorator.
That way, the function has no way to actually change the originally passed arguments. This however has the "side effect" of a considerable slowdown, as the deepcopy operation is rather costly in terms of memory (and garbage-collection) usage as well as CPU consumption.
I'd rather recommend you properly test your code to ensure that no accidental changes happen or use a language that uses full copy-by-value semantics (or has only immutable variables).
As another workaround, you could make your passed objects basically immutable by adding this to your classes:
"""An immutable class with a single attribute 'value'."""
def __setattr__(self, *args):
raise TypeError("can't modify immutable instance")
__delattr__ = __setattr__
def __init__(self, value):
# we can no longer use self.value = value to store the instance data
# so we must explicitly call the superclass
super(Immutable, self).__setattr__('value', value)
(Code copied from the Wikipedia article about Immutable object)
Since any Python code can do IO, any Python code could launch intercontinental ballistic missiles (and I'd consider launching ICBMs to be a fairly catastrophic side effect for most purposes).
The only way to avoid side effects is to not use Python code in the first place but rather data - i.e. you end up creating a domain specific language which disallows side effects, and a Python interpreter which executes programs of that language.
You'll have to make a copy of the list first. Something like this:
def func_without_side_affect(a):
    b = a[:]
    b.append('foo')
    return b
This shorter version might work for you too:
def func_without_side_affect(a):
    return a[:] + ['foo']
If you have nested lists or other things like that, you'll probably want to look at copy.deepcopy to make the copy instead of the [:] slice operator.
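A sketch of the deepcopy variant:
import copy

def func_without_side_affect(a):
    b = copy.deepcopy(a)  # copies nested structures too
    b.append('foo')
    return b

orig = [[1, 2], [3]]
print(func_without_side_affect(orig))  # [[1, 2], [3], 'foo']
print(orig)                            # [[1, 2], [3]] -- unchanged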
It would be very difficult to do for the general case, but for some practical cases you could do something like this:
from copy import deepcopy
from itertools import izip

def call_function_checking_for_modification(f, *args, **kwargs):
    myargs = [deepcopy(x) for x in args]
    mykwargs = dict((x, deepcopy(kwargs[x])) for x in kwargs)
    retval = f(*args, **kwargs)
    for arg, myarg in izip(args, myargs):
        if arg != myarg:
            raise ValueError, 'Argument was modified during function call!'
    for kwkey in kwargs:
        if kwargs[kwkey] != mykwargs[kwkey]:
            raise ValueError, 'Argument was modified during function call!'
    return retval
But, obviously, there are a few issues with this. For trivial things (i.e. all the inputs are simple types), this isn't very useful anyway - those will likely be immutable, and in any case they are easier (well, relatively) to detect than complex types.
For complex types though, the deepcopy will be expensive, and there's no guarantee that the == operator will actually work correctly. (And a simple copy isn't good enough... imagine a list where one element changes value: a simple copy will just store a reference, and so the original value will change too.)
In general, though, this is not that useful, since if you are already worried about side effects when calling these functions, you can just guard against them more intelligently (by storing your own copy if needed, auditing the destination function, etc.), and if it's your own function you are worried about causing side effects, you will have audited it to make sure.
Something like the above could be wrapped in a decorator though; with the expensive parts gated by a global variable (if _debug == True:, something like that), it could maybe be useful in projects where lots of people are editing the same code, I guess...
Edit: This only works for environments where a more 'strict' form of 'side effects' is expected. In many programming languages, you can make side effects much more explicit - in C++ for instance, everything is by value unless explicitly a pointer or reference, and even then you can declare incoming references as const so that they can't be modified. There, 'side effects' can throw errors at compile time. (Of course there are ways to get some anyway.)
The above enforces that any modified values are in the return value/tuple. If you are on Python 3 (I'm not yet), I think you could use annotations in the function declaration itself to specify attributes of function arguments, including whether they may be modified, and include that in the above function to explicitly allow some arguments to be mutable.
Note that I think you could probably also do something like this:
class ImmutableObject(object):
    def __init__(self, inobj):
        self._inited = False
        self._inobj = inobj
        self._inited = True
    def __repr__(self):
        return self._inobj.__repr__()
    def __str__(self):
        return self._inobj.__str__()
    def __getitem__(self, key):
        return ImmutableObject(self._inobj.__getitem__(key))
    def __iter__(self):
        # delegate to the wrapped object
        return self._inobj.__iter__()
    def __setitem__(self, key, value):
        raise AttributeError, 'Object is read-only'
    def __getattr__(self, key):
        x = getattr(self._inobj, key)
        if callable(x):
            return x
        else:
            return ImmutableObject(x)
    def __setattr__(self, attr, value):
        if attr not in ['_inobj', '_inited'] and self._inited == True:
            raise AttributeError, 'Object is read-only'
        object.__setattr__(self, attr, value)
(Probably not a complete implementation, haven't tested much, but a start). Works like this:
a = [1,2,3]
b = [a,3,4,5]
c = ImmutableObject(b)
print c
[[1, 2, 3], 3, 4, 5]
c[0][1:] = [7,8]
AttributeError: Object is read-only
It would let you protect a specific object from modification if you didn't trust the downstream function, while still being relatively lightweight. Still requires explicit wrapping of the object though. You could probably build a decorator to do this semi-automatically though for all arguments. Make sure to skip the ones that are callable.