mypy - callable with derived classes gives error - python

from typing import Callable

class BaseClass:
    p: int

class DerivedClass(BaseClass):
    q: int

def p(q: Callable[[BaseClass], str]) -> None:
    return None

def r(derived: DerivedClass) -> str:
    return ""

p(r)
Expected behavior:
No error from mypy
Actual behavior:
Argument 1 to "p" has incompatible type "Callable[[DerivedClass], str]";
expected "Callable[[BaseClass], str]"

Let's talk about type variance. Under typical subtyping rules, if we have a type DerivedClass that is a subtype of a type BaseClass, then every instance of DerivedClass is an instance of BaseClass. Simple enough, right? But now the complexity arises when we have generic type arguments.
Let's suppose that we have a class that gets a value and returns it. I don't know how it gets it; maybe it queries a database, maybe it reads the file system, maybe it just makes one up. But it gets a value.
class Getter:
    def get_value(self):
        # Some deep magic ...
        ...
Now let's assume that, when we construct the Getter, we know what type it should be querying at compile-time. We can use a type variable to annotate this.
from typing import Generic, TypeVar

T = TypeVar("T")

class Getter(Generic[T]):
    def get_value(self) -> T:
        ...
Now Getter is a generic type: we can have a Getter[int], which gets an integer, and a Getter[str], which gets a string.
But here's a question. If I have a Getter[int], is that a valid Getter[object]? Surely, if I can get a value as an int, it's easy enough to upcast it, right?
my_getter_int: Getter[int] = ...
my_getter_obj: Getter[object] = my_getter_int
But Python won't allow this. See, Getter was declared to be invariant in its type argument. That's a fancy way of saying that, even though int is a subtype of object, Getter[int] and Getter[object] have no relationship.
But, like I said, surely they should have a relationship, right? Well, yes. If your type is only used in positive position (glossing over some details, that means roughly that it only appears as the return value of methods or as the type of read-only properties), then we can make it covariant.
T_co = TypeVar("T_co", covariant=True)

class Getter(Generic[T_co]):
    def get_value(self) -> T_co:
        ...
By convention, in Python, we denote covariant type arguments using names that end in _co. But the thing that actually makes it covariant here is the covariant=True keyword argument.
Now, with this version of Getter, Getter[int] is actually a subtype of Getter[object]. In general, if A is a subtype of B, then Getter[A] is a subtype of Getter[B]. Covariance preserves subtyping.
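To see the difference, here is a minimal, self-contained sketch of the covariant version; unlike the invariant Getter earlier, mypy now accepts the upcast:

from typing import Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)

class Getter(Generic[T_co]):
    def get_value(self) -> T_co:
        ...

# both lines now pass type checking:
my_getter_int: Getter[int] = Getter()
my_getter_obj: Getter[object] = my_getter_int  # OK: covariance permits the upcast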
Okay, that's covariance. Now consider the opposite. Let's say we have a setter which sets some value in a database.
class Setter:
    def set_value(self, value):
        ...
Same assumptions as before. Suppose we know what the type is in advance. Now we write
T = TypeVar("T")

class Setter(Generic[T]):
    def set_value(self, value: T) -> None:
        ...
Okay, great. Now, if I have a value my_setter : Setter[int], is that a Setter[object]? Well, my_setter can always take an integer value, whereas a Setter[object] is guaranteed to be able to take any object. my_setter can't guarantee that, so it's actually not. If we try to make T covariant in this example, we'll get
error: Cannot use a covariant type variable as a parameter
Because it's actually not a valid relationship. In fact, in this case, we get the opposite relationship. If we have a my_setter : Setter[object], then that's a guarantee that we can pass it any object at all, so certainly we can pass it an integer, hence we have a Setter[int]. This is called contravariance.
T_contra = TypeVar("T_contra", contravariant=True)

class Setter(Generic[T_contra]):
    def set_value(self, value: T_contra) -> None:
        ...
We can make our type contravariant if it only appears in negative position, which (again, oversimplifying a bit) generally means that it appears as arguments to functions, but not as a return value. Now, Setter[object] is a subtype of Setter[int]. It's backwards. In general, if A is a subtype of B, then Setter[B] is a subtype of Setter[A]. Contravariance reverses the subtyping relationship.
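Here is the mirror-image sketch for contravariance (self-contained, same assumptions as above); note that the accepted assignment now runs in the opposite direction:

from typing import Generic, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)

class Setter(Generic[T_contra]):
    def set_value(self, value: T_contra) -> None:
        ...

my_setter_obj: Setter[object] = Setter()
my_setter_int: Setter[int] = my_setter_obj  # OK: contravariance reverses the direction
my_setter_int.set_value(42)                 # safe: an int is an object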
Now, back to your example. You have a Callable[[DerivedClass], str] and want to know if it's a valid Callable[[BaseClass], str].
Applying our principles from before, we have a type Callable[[T], S] (I'm assuming only one argument for simplicity's sake, but in reality this works in Python for any number of arguments) and want to ask if T and S are covariant, contravariant, or invariant.
Well, what is a Callable? It's a function. It has one thing we can do: call it with a T and get an S. So it's pretty clear that T is only used as an argument and S as a result. Things only used as arguments are contravariant, and those used as results are covariant, so in reality it's more correct to write
Callable[[T_contra], S_co]
Arguments to Callable are contravariant, which means that if DerivedClass is a subtype of BaseClass, then Callable[[BaseClass], str] is a subtype of Callable[[DerivedClass], str], the opposite relationship to the one you suggested. You need a function that can accept any BaseClass. A function with a BaseClass argument would suffice, and so would a function with an object argument, or any type which is a supertype of BaseClass, but subtypes are insufficient because they're too specific for your contract.
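Concretely, a sketch of what mypy does accept for p, reusing the definitions from the question (the extra function t is mine, for illustration):

def s(base: BaseClass) -> str:   # exact match: fine
    return ""

def t(anything: object) -> str:  # supertype of BaseClass: also fine
    return ""

p(s)
p(t)  # OK: Callable[[object], str] is a subtype of Callable[[BaseClass], str]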

MyPy objects to your call of p with r as its argument because given only the type signatures, it can't be sure the function won't be called with a non-DerivedClass instance.
For instance, given the same type annotations, p could be implemented like this:
def p(q: Callable[[BaseClass], str]) -> None:
    obj = BaseClass()
    q(obj)
This will break p(r) if r has an implementation that depends on the derived attributes of its argument:
def r(derived: DerivedClass) -> str:
    return str(derived.q)

Related

Type Narrowing of Class Attributes in Python (TypeGuard) without Subclassing

Consider I have a Python class with attributes (i.e. a dataclass, pydantic, attrs, django model, ...), one of which is a union, i.e. None or a state.
Now I have a complex checking function that checks some values.
If I use this checking function, I want to tell the type checker, that some of my class attributes are narrowed.
For instance see this simplified example:
import dataclasses
from typing import TypeGuard

@dataclasses.dataclass
class SomeDataClass:
    state: tuple[int, int] | None
    name: str
    # Assume many more data attributes

class SomeDataClassWithSetState(SomeDataClass):
    state: tuple[int, int]

def complex_check(data: SomeDataClass) -> TypeGuard[SomeDataClassWithSetState]:
    # Assume some complex checks here, for simplicity it is only:
    return data.state is not None and data.name.startswith("SPECIAL")

def get_sum(data: SomeDataClass) -> int:
    if complex_check(data):
        return data.state[0] + data.state[1]
    return 0
As seen, it is possible to do this with subclasses, which for various reasons is not an option for me:
- it introduces a lot of duplication
- some of the libraries used for dataclasses do not play well with being subclassed without side conditions
- there could be some metaclass or __subclasses__ magic that handles all subclasses specially, e.g. creating a database for each dataclass
So is there a way to type-narrow an attribute (or attributes) of a class without introducing an entirely new class, as proposed here?
TL;DR: You cannot narrow the type of an attribute. You can only narrow the type of an object.
As I already mentioned in my comment, for typing.TypeGuard to be useful it relies on two distinct types T and S. Then, depending on the returned bool, the type guard function tells the type checker to assume the object to be either T or S.
You say you don't want another class/subclass alongside SomeDataClass for various (vaguely valid) reasons. But if you don't have another type, then TypeGuard is useless. So that is not the route to take here.
I understand that you want to reduce type-safety checks like if obj.state is None because you may need to access the state attribute in multiple different places in your code. You must have some place in your code where you create/mutate a SomeDataClass instance in a way that ensures its state attribute is not None. One solution then is to have a getter for that attribute that performs the type-safety check and only ever returns the narrower type or raises an error. I typically do this via @property for improved readability. Example:
from dataclasses import dataclass

@dataclass
class SomeDataClass:
    name: str
    optional_state: tuple[int, int] | None = None

    @property
    def state(self) -> tuple[int, int]:
        if self.optional_state is None:
            raise RuntimeError("or some other appropriate exception")
        return self.optional_state

def set_state(obj: SomeDataClass, value: tuple[int, int]) -> None:
    obj.optional_state = value

if __name__ == "__main__":
    foo = SomeDataClass(optional_state=(1, 2), name="foo")
    bar = SomeDataClass(name="bar")
    baz = SomeDataClass(name="baz")
    set_state(bar, (2, 3))
    print(foo.state)
    print(bar.state)
    try:
        print(baz.state)
    except RuntimeError:
        print("baz has no state")
I realize you mean there are many more checks happening in complex_check, but either that function doesn't change the type of data or it does. If the type remains the same, you need to introduce type-safety for attributes like state in some other place, which is why I suggest a getter method.
Another option is obviously to have a separate class, which is what is typically done with FastAPI/Pydantic/SQLModel for example and use clever inheritance to reduce code duplication. You mentioned this may cause problems because of subclassing magic. Well, if it does, use the other approach, but I can't think of an example that would cause the problems you mentioned. Maybe you can be more specific and show a case where subclassing would lead to problems.
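For illustration, a minimal sketch of that separate-class route with plain dataclasses (the names Draft and Validated are hypothetical); the narrow class restates only the narrowed field, and everything else is inherited once:

from dataclasses import dataclass

@dataclass
class Draft:
    name: str
    state: tuple[int, int] | None = None
    # the many shared attributes live here, once

@dataclass
class Validated(Draft):
    state: tuple[int, int] = (0, 0)  # narrowed: no longer optional

def validate(d: Draft) -> Validated:
    if d.state is None:
        raise ValueError("state must be set")
    return Validated(name=d.name, state=d.state)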

Use list of derived class as list of base class in Python

I have a function which takes a list of a base class as argument, and I have a variable which is a list of a derived class. Using this variable as the argument gives mypy error: Argument 1 to "do_stuff" has incompatible type "List[DerivedClass]"; expected "List[BaseClass]".
from typing import List, TypedDict

class BaseClass(TypedDict):
    base_field: str

class DerivedClass(BaseClass):
    derived_field: str

def do_stuff(data: List[BaseClass]) -> None:
    pass

foo: List[DerivedClass] = [{'base_field': 'foo', 'derived_field': 'bar'}]
do_stuff(foo)
If the argument and variable are instead BaseClass and DerivedClass respectively, i.e. not lists, mypy understands that the variable can be implicitly cast to the base class. But for lists it doesn't work. How can I solve this, preferably without # type: ignore?
It depends on what exactly do_stuff is doing, but nine times out of ten the best solution is to use Sequence instead of List:
from typing import Sequence

def do_stuff(data: Sequence[BaseClass]) -> None:
    pass
The reason you can't use List[BaseClass] here is that do_stuff would be allowed to add BaseClass instances to data, which would in turn break foo in the caller. Sequence doesn't imply mutability, so do_stuff is not allowed (static-typing-wise) to modify a Sequence parameter, which prevents that issue. (Put differently, Sequence is covariant and List is invariant. Most mutable generics are invariant because of exactly this issue.)
If do_stuff does need to mutate data, you'll need to rethink the typing -- should it be allowed to add a BaseClass to it? If not, maybe do_stuff should take a List[DerivedClass]. If so, you need to declare foo as a List[BaseClass] to account for that possibility.
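To make the danger concrete, here is a sketch of a do_stuff body that the List[BaseClass] annotation would permit, and that would silently corrupt the caller's list (reusing the TypedDicts from the question):

def do_stuff(data: List[BaseClass]) -> None:
    # perfectly legal for a List[BaseClass]...
    data.append({'base_field': 'plain'})

foo: List[DerivedClass] = [{'base_field': 'foo', 'derived_field': 'bar'}]
# if do_stuff(foo) were allowed, foo would now hold an item without
# 'derived_field', violating foo's declared element type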
You can try using a TypeVar bound to BaseClass. In code, that looks like
from typing import TypeVar

T = TypeVar('T', bound=BaseClass)

def do_stuff(data: list[T]) -> None:
    pass
This also makes mypy happy.
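A quick usage sketch (reusing the TypedDicts from the question): T is solved per call site, so both of these calls type-check:

base_items: list[BaseClass] = [{'base_field': 'a'}]
derived_items: list[DerivedClass] = [{'base_field': 'b', 'derived_field': 'c'}]

do_stuff(base_items)     # T is solved to BaseClass
do_stuff(derived_items)  # T is solved to DerivedClass

Unlike the Sequence version, do_stuff may still mutate the list here, but it can only insert values of type T, which keeps the caller's element type intact.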

What is the static type of self?

I want to constrain a method parameter to be of the same type as the class it's called on (see the end for an example). While trying to do that, I've come across this behaviour that I'm struggling to get my head around.
The following doesn't type check
class A:
    def foo(self) -> None:
        pass

A.foo(1)
with
error: Argument 1 to "foo" of "A" has incompatible type "int"; expected "A"
as I'd expect, since I'd have thought A.foo should only take an A. If however I add a self type
from typing import TypeVar

Self = TypeVar("Self")

class A:
    def foo(self: Self) -> None:
        pass

A.foo(1)
it does type check. I would have expected it to fail, telling me I need to pass an A not an int. This suggests to me that the type checker usually infers the type A for self, and adding a Self type overrides that, I'm guessing to object. This fits with the error
from typing import TypeVar

Self = TypeVar("Self")

class A:
    def bar(self) -> int:
        return 0

    def foo(self: Self) -> None:
        self.bar()
error: "Self" has no attribute "bar"
which I can fix by setting a bound: Self = TypeVar("Self", bound='A')
Am I right that this means self is not constrained, in e.g. the same way I'd expect this to be constrained in Scala?
I guess this only has an impact if I specify the type of self to be anything but the class it's defined on, intentionally or otherwise. I'm also interested to know what the impact is of overriding self to be another type, and indeed whether it even makes sense with how Python resolves and calls methods.
Context
I want to do things like
class A:
    def foo(self: Self, bar: List[Self]) -> Self:
        ...
but I was expecting Self to be constrained to be an A, and was surprised that it wasn't.
Two things:
self is only half-magic.
The self arg has the magical property that, if you call an attribute of an object as a function, and that function has self as its first arg, then the object itself will be prepended to the explicit args as the self.
I guess any good static analyzer would take as implicit that self has the class in question as its type, which is what you're seeing in your first example.
TypeVar is for polymorphism.
And I think that's what you're trying to do? In your third example, Self can be any type, depending on context. In the context of A.foo(1), Self is int, so self.bar() fails.
It may be possible to write an instance method that can also be called, like a static method, on values that aren't instances of the class, with parametric type restrictions, but it's probably not a good idea for any application in the wild. Just name the parameter something else and declare the method static.
If you omit a type hint on self, the type checker will automatically assume it has whatever the type of the containing class is.
This means that:
class A:
    def foo(self) -> None: pass
...is equivalent to doing:
class A:
    def foo(self: A) -> None: pass
If you want self to be something else, you should set a custom type hint.
Regarding this code snippet:
from typing import TypeVar

Self = TypeVar("Self")

class A:
    def foo(self: Self) -> None:
        pass

A.foo(1)
Using a TypeVar only once in a function signature is either malformed or redundant, depending on your perspective.
But this is kind of unrelated to the main thrust of your question. We can repair your code snippet by instead doing:
from typing import TypeVar

Self = TypeVar("Self")

class A:
    def foo(self: Self) -> Self:
        return self

A.foo(1)
...which exhibits the same behaviors you noticed.
But regardless of which of the two code snippets we look at, I believe the type checker will indeed assume self has the same type as whatever the upper bound of Self is while type checking the body of foo. In this case, the upper bound is object, as you suspected.
We get this behavior whether or not we're doing anything fancy with self. For example, we'd get the exact same behavior by just doing:
def foo(x: Self) -> Self:
    return x
...and so forth. From the perspective of the type checker, there's nothing special about the self parameter, except that we set a default type for it if it's missing a type hint instead of just using Any.
error: "Self" has no attribute "bar"
which I can fix by setting a bound: Self = TypeVar("Self", bound='A')
Am I right that this means self is not constrained, in e.g. the same way I'd expect this to be constrained in Scala?
I'm unfamiliar with how this is constrained in Scala, but it is indeed the case that if you chose to override the default type of self, you are responsible for setting your own constraints and bounds as appropriate.
To put it another way, once a TypeVar is defined, its meaning won't be changed when you try using it in a function definition. This is the rule for TypeVars/functions in general. And since mostly there's nothing special about self, the same rule also applies there.
(Though type checkers such as mypy will also try some basic sanity checks on whatever constraints you end up picking, to ensure you don't end up with a method that's impossible to call. For example, mypy will complain if you try setting the bound of Self to int.)
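For instance, a sketch of a bound that mypy rejects outright (the exact error wording may vary by version):

from typing import TypeVar

Bad = TypeVar("Bad", bound=int)

class A:
    # error: The erased type of self "builtins.int" is not a supertype of its class "A"
    def foo(self: Bad) -> None:
        pass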
Note that doing things like:
from typing import TypeVar, List

Self = TypeVar('Self', bound='A')

class A:
    def foo(self: Self, bar: List[Self]) -> Self:
        ...

class B(A): pass

x = A().foo([A(), A()])
y = B().foo([B(), B()])
reveal_type(x)  # Revealed type is 'A'
reveal_type(y)  # Revealed type is 'B'
...is explicitly supported by PEP 484. The mypy docs also have a few examples.

Python: overwrite __int__

A working solution, which returns integers, was given in Overload int() in Python.
However, it only works for returning an int, not a float or, say, a list:
class Test:
    def __init__(self, mylist):
        self.mylist = mylist

    def __int__(self):
        return list(map(int, self.mylist))

t = Test([1.5, 6.1])
t.__int__()  # [1, 6]
int(t)
Thus t.__int__() works, but int(t) gives TypeError: __int__ returned non-int (type list).
So, is there a way to fully overwrite int, maybe with __getattribute__ or a metaclass?
The __int__, __float__, ... special methods and various others do not overwrite their respective type such as int, float, etc. These methods serve as hooks that allow the types to ask for an appropriate value. The types will still enforce that a proper type is provided.
If required, one can actually overwrite int and similar on the builtins module. This can be done anywhere, and has global effect.
import builtins

# store the original ``int`` type as a default argument
def weakint(x, base=None, _real_int=builtins.int):
    """A weakly typed ``int`` whose return type may be another type"""
    if base is None:
        try:
            return type(x).__int__(x)
        except AttributeError:
            return _real_int(x)
    return _real_int(x, base)

# overwrite the original ``int`` type with the weaker one
builtins.int = weakint
Note that replacing builtin types may violate assumptions of code about these types, e.g. that type(int(x)) is int holds true. Only do so if absolutely required.
This is an example of how to replace int(...). It will break various features of int being a type, e.g. checking inheritance, unless the replacement is a carefully crafted type. A full replacement requires emulating the initial int type, e.g. via custom subclassing checks, and will not be completely possible for some builtin operations.
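For example, a quick sketch of the fallout, continuing from the snippet above; after the swap, int is a plain function, so type-level uses break immediately:

builtins.int = weakint

int("10")           # still works: returns 10
isinstance(5, int)  # TypeError: isinstance() arg 2 must be a type (int is now a function)
class MyInt(int):   # TypeError: a function is not a valid base class
    pass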
From the docs of __int__:
Called to implement the built-in functions complex(), int() and float(). Should return a value of the appropriate type.
Here your method returns a list and not an int; this works when calling it explicitly, but not via int(), which checks the type of what __int__ returns.
Here is a working example of what it could be, even if the usage is not very pertinent:
class Test:
    def __init__(self, mylist):
        self.mylist = mylist

    def __int__(self):
        return int(sum(self.mylist))

How to make Mypy deal with subclasses in functions as expected

I have the following code:
from typing import Callable

MyCallable = Callable[[object], int]
MyCallableSubclass = Callable[['MyObject'], int]

def get_id(obj: object) -> int:
    return id(obj)

def get_id_subclass(obj: 'MyObject') -> int:
    return id(obj)

def run_mycallable_function_on_object(obj: object, func: MyCallable) -> int:
    return func(obj)

class MyObject(object):
    '''Object that is a direct subclass of `object`'''
    pass

my_object = MyObject()

# works just fine
run_mycallable_function_on_object(my_object, get_id)

# Does not work (it runs, but mypy raises the following error:)
# Argument 2 to "run_mycallable_function_on_object" has incompatible type
# "Callable[[MyObject], int]"; expected "Callable[[object], int]"
run_mycallable_function_on_object(my_object, get_id_subclass)
Since MyObject inherits from object, why doesn't MyCallableSubclass work in every place that MyCallable does?
I've read a bit about the Liskov substitution principle, and also consulted the Mypy docs about covariance and contravariance. However, even in the docs themselves, they give a very similar example where they say
Callable is an example of type that behaves contravariant in types of arguments, namely Callable[[Employee], int] is a subtype of Callable[[Manager], int].
So then why is using Callable[[MyObject], int] instead of Callable[[object], int] throwing an error in Mypy?
Overall I have two questions:
Why is this happening?
How do I fix it?
As I was writing this question, I realized the answer to my problem, so I figured I'd still ask the question and answer it to save people some time with similar questions later.
What's going on?
Notice that last example from the Mypy docs:
Callable is an example of type that behaves contravariant in types of arguments, namely Callable[[Employee], int] is a subtype of Callable[[Manager], int].
Here, Manager subclasses from Employee. That is, if something is expecting a function that can take in managers, it's alright if the function it gets overgeneralizes and can take in any employee, because it will definitely take in managers.
However, in our case, MyObject subclasses from object. So, if something is expecting a function that can take in objects, then it's not okay if the function it gets overspecifies and can only take in MyObjects.
Why? Imagine a class called NotMyObject that inherits from object, but doesn't inherit from MyObject. If a function should be able to take any object, it should be able to take in both NotMyObjects and MyObjects. However, the specific function can only take in MyObjects, so it won't work for this case.
How can I fix it?
Mypy is correct here. You need to annotate the parameter with the more specific function type (MyCallableSubclass); otherwise either your code could have bugs, or you are typing it incorrectly.
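Concretely, a sketch of the fix, reusing the names from the question; with the parameter's own type narrowed to match, both callables are accepted:

def run_mycallable_function_on_object(obj: 'MyObject', func: MyCallableSubclass) -> int:
    return func(obj)

# both calls now type-check: by contravariance, Callable[[object], int]
# (get_id) is a subtype of Callable[['MyObject'], int]
run_mycallable_function_on_object(my_object, get_id)
run_mycallable_function_on_object(my_object, get_id_subclass)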
