Efficient arithmetic special methods in Cython

According to the Cython documentation on arithmetic special methods (operator overloads), because of the way they're implemented, I can't rely on self being the object whose special method is being called.
Evidently, this has two consequences:
I can't specify a static type in the method declaration. For example, if I have a class Foo which can only be multiplied by, say, an int, then I can't have def __mul__(self, int op) without seeing TypeErrors (sometimes).
In order to decide what to do, I have to check the types of the operands, presumably using isinstance() to handle subclasses, which seems farcically expensive in an operator.
Is there any good way to handle this while retaining the convenience of operator syntax? My whole reason for switching my classes to Cython extension types is to improve efficiency, but as they rely heavily on the arithmetic methods, based on the above it seems like I'm actually going to make them worse.

If I understand the docs and my test results correctly, you actually can have a fast __mul__(self, int op) on a Foo, but you can only use it as Foo() * 4, not 4 * Foo(). The latter would require an __rmul__, which is not supported, so it always raises TypeError.
The fact that the second argument is typed int means that Cython does the typecheck for you, so you can be sure that the left argument is really self.
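To make that concrete, here is a minimal sketch of the pattern (an illustration only; Foo and val are made-up names, and the exact binary-operator semantics vary across Cython versions):

cdef class Foo:
    cdef public int val

    def __init__(self, int val):
        self.val = val

    def __mul__(self, int op):
        # Cython type-checks `op` on entry: if the right-hand operand is
        # not an int, a TypeError is raised before the body runs, so the
        # left argument is guaranteed to be the Foo instance.
        return Foo(self.val * op)

# Foo(3) * 4 works and returns Foo(12);
# 4 * Foo(3) raises TypeError, since __rmul__ is unavailable here.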

Related

How to implement @dataclass to define arithmetic operations in Python?

I'm learning Python on my own and I found a task that requires using the @dataclass decorator to create a class with basic arithmetic operations.
from dataclasses import dataclass
from numbers import Number

@dataclass
class MyClass:
    x: float
    y: float

    def __add__(self, other):
        match other:
            case Number():
                return MyClass(float(other) + self.x, self.y)
            case MyClass(ot_x, ot_y):
                return MyClass(self.x + ot_x, self.y + ot_y)

    __radd__ = __add__
I have implemented the addition operation. But I also need the operations of subtraction __sub__, multiplication __mul__, division __truediv__, negation __neg__, and also __mod__ and __pow__, and I couldn't implement these. The main thing for me is to use the match/case construction. Maybe there are simpler ways to do this.
I will be glad of your help.
If you're trying to make a complete numeric type, I strongly suggest checking out the implementation of the fractions.Fraction type in the fractions source code. The class was intentionally designed as a model for how you'd overload all the pairs of operators needed to implement a numeric type at the Python layer (it's explicitly pointed out in the numbers module's guide to type implementers).
The critical parts for minimizing boilerplate begin with the definition of the _operator_fallbacks utility function within the class (which is used to take a single implementation of the operation and the paired operator module function representing it, and generate the associated __op__ and __rop__ operators, being type strict for the former and relaxed for the latter, matching the intended behavior of each operator based on whether it's the first chance or last chance to implement the method).
It's far too much code to include here, but to show how you'd implement addition using it, I'll adapt your code to call it (you'd likely use a slightly different implementation of _operator_fallbacks, but the idea is the same):
import operator
from dataclasses import dataclass

# Optional, but if you want to act like built-in numeric types, you
# should be immutable, and using slots (if you can rely on Python 3.10+)
# dramatically reduces per-instance memory overhead.
# Pre-3.10, since x and y don't have defaults, you could define __slots__ manually.
@dataclass(frozen=True, slots=True)
class MyClass:
    x: float
    y: float

    # _operator_fallbacks defined here.
    # When it receives a non-MyClass, it constructs a MyClass from it, e.g.
    # to match your original code it would construct it as MyClass(float(val), 0),
    # and then invoke the monomorphic operator, e.g. the _add passed to it below.
    # Or, if the type did not make sense to convert to MyClass, but it made sense
    # to convert the MyClass instance to the other type, it would do so, then use
    # the provided fallback operator to perform the computation.
    # For completely incompatible types, it just returns NotImplemented.

    def _add(a, b):
        """a + b"""
        # _operator_fallbacks has already coerced the types appropriately
        return MyClass(a.x + b.x, a.y + b.y)

    __add__, __radd__ = _operator_fallbacks(_add, operator.add)
By putting the ugliness of type-checking and coercion in common code found in _operator_fallbacks, and putting only the real work of addition in _add, it avoids a lot of per-operator boilerplate (as you can see here, _operator_fallbacks will be a page of code to make the forward and reverse functions and return them, but each new operator is only a few lines: defining the monomorphic operator and calling _operator_fallbacks to generate the __op__/__rop__ pair).
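It is too long to reproduce in full here, but to make the pattern concrete, here is a minimal sketch of what such a helper could look like, loosely following fractions.Fraction (the coercion rules and names like forward/reverse are illustrative assumptions, not the stdlib code):

import operator
from dataclasses import dataclass
from numbers import Number

@dataclass(frozen=True, slots=True)
class MyClass:
    x: float
    y: float

    def _operator_fallbacks(monomorphic_operator, fallback_operator):
        # Build the __op__/__rop__ pair from one implementation that
        # assumes both operands are already MyClass instances.
        def forward(a, b):
            if isinstance(b, MyClass):
                return monomorphic_operator(a, b)
            elif isinstance(b, Number):
                # Coerce plain numbers, mirroring the original __add__
                return monomorphic_operator(a, MyClass(float(b), 0.0))
            return NotImplemented
        forward.__name__ = '__' + fallback_operator.__name__ + '__'

        def reverse(b, a):
            # Called as b.__rop__(a), i.e. b is the MyClass instance
            # and a is the left-hand operand that gave up first.
            if isinstance(a, Number):
                return monomorphic_operator(MyClass(float(a), 0.0), b)
            return NotImplemented
        reverse.__name__ = '__r' + fallback_operator.__name__ + '__'

        return forward, reverse

    def _add(a, b):
        """a + b"""
        return MyClass(a.x + b.x, a.y + b.y)

    __add__, __radd__ = _operator_fallbacks(_add, operator.add)

    def _sub(a, b):
        """a - b"""
        return MyClass(a.x - b.x, a.y - b.y)

    __sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)

Each further operator (__mul__, __truediv__, __mod__, __pow__) then costs only a few lines: one monomorphic function plus one _operator_fallbacks call. __neg__ is unary, so it needs no pair and can be defined directly.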

Is this an example of python function overload?

I know python does not allow us to overload functions. However, does it have inbuilt overloaded methods?
Consider this:
setattr(object_name,'variable', 'value')
setattr(class_name,'method','function')
The first statement dynamically adds variables to objects at run time, while the second one attaches external functions to classes at run time.
The same function does different things based on its arguments. Is this function overload?
The function setattr(foo, 'bar', baz) is always the same as foo.bar = baz, regardless of the type of foo. There is no overloading here.
In Python 3, limited overloading is possible with functools.singledispatch, but setattr is not implemented with that.
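As a quick illustration of singledispatch (describe is a made-up example function, not part of the stdlib):

from functools import singledispatch

@singledispatch
def describe(obj):
    return 'something else'   # fallback for unregistered types

@describe.register
def _(obj: int):
    return 'an int'

@describe.register
def _(obj: str):
    return 'a string'

print(describe(3))      # an int
print(describe('hi'))   # a string
print(describe(3.5))    # something else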
A far more interesting example, in my opinion, is type(). type() does two entirely different things depending on how you call it:
If called with a single argument, it returns the type of that argument.
If called with three arguments (of the correct types), it dynamically creates a new class.
Nevertheless, type() is not overloaded. Why not? Because it is implemented as one function that counts how many arguments it got and then decides what to do. In pure Python, this is done with the variadic *args syntax, but type() is implemented in C, so it looks rather different. It's doing the same thing, though.
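You can see both behaviours directly:

class Base:
    pass

# One argument: report the type of the object.
print(type(42))   # <class 'int'>

# Three arguments: dynamically create a new class.
Derived = type('Derived', (Base,), {'greet': lambda self: 'hello'})
print(Derived().greet())   # hello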
Python, in some sense, doesn't need a function overloading capability when other languages do. Consider the following example in C:
int add(int x, int y) {
    return x + y;
}
If you wish to extend the notion to include things that are not integers, you would need to write another function:
float add(float x, float y) {
    return x + y;
}
In Python, all you need is:
def add(x, y):
    return x + y
It works fine for both, and it isn't considered function overloading. You can also handle different cases of variable types using methods like isinstance. The major issue, as pointed out by this question, is the number of types. But in your case you pass the same number of types, and even so, there are ways around this without function overloading.
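For example, a single function can branch on isinstance where the behaviour genuinely differs (a made-up illustration):

def add(x, y):
    # One function instead of several overloads: branch only where
    # the types actually need different handling.
    if isinstance(x, str) and not isinstance(y, str):
        return x + str(y)
    return x + y

print(add(2, 3))           # 5
print(add(2.5, 0.5))       # 3.0
print(add('total: ', 3))   # total: 3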
Overloading methods is tricky in Python. However, you can get similar behaviour by accepting dicts, lists, or primitive values as optional keyword arguments.
I have tried something like this for my use cases; it may help illustrate how to emulate overloading.
Let's take an example: a single method that different callers use in different ways.
def add_bullet(sprite=None, start=None, headto=None, speed=None, acceleration=None):
    ...
Callers then pass whichever arguments apply:
add_bullet(sprite='test', start=True, headto={'lat': 10.6666, 'long': 10.6666}, acceleration=10.6)
or
add_bullet(sprite='test', start=True, headto={'lat': 10.6666, 'long': 10.6666}, speed=['10', '20', '30'])
So one method handles lists, dictionaries, and primitive values without true method overloading.
Try it out in your own code.

Can some operators in Python not be overloaded properly?

I am studying Scott Meyers' More Effective C++. Item 7 advises to never overload && and ||, because their short-circuit behavior cannot be replicated when the operators are turned into function calls (or is this no longer the case?).
As operators can also be overloaded in Python, I am curious whether this situation exists there as well. Is there any operator in Python (2.x, 3.x) that, when overridden, cannot be given its original meaning?
Here is an example of 'original meaning'
class MyInt {
public:
    int val;
    MyInt(int v) : val(v) {}
    MyInt operator+(MyInt &m) {
        return MyInt(this->val + m.val);
    }
};
Exactly the same rationale applies to Python. You shouldn't (and can't) overload and and or, because their short-circuiting behavior cannot be expressed in terms of functions. not isn't permitted either - I guess this is because there's no guarantee that it will be invoked at all.
As pointed out in the comments, the proposal to allow the overloading of logical and and or was officially rejected.
The assignment operator can also not be overloaded.
class Thing: ...
thing = Thing()
thing = 'something else'
There is nothing you can override in Thing to change the behavior of the = operator.
(You can overload property assignment though.)
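For example, __setattr__ intercepts attribute assignment on an instance, but nothing intercepts rebinding the name itself:

class Thing:
    def __setattr__(self, name, value):
        # Called for every attribute assignment on an instance
        print(f'setting {name} = {value!r}')
        super().__setattr__(name, value)

thing = Thing()
thing.value = 10           # prints: setting value = 10
thing = 'something else'   # no hook fires; the name is simply rebound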
In Python, all object methods that represent operators are treated "equal": their precedences are described in the language model, and there is no conflict with overriding any.
But both C++'s "&&" and "||" - in Python, and and or - are not available as object methods to start with; they check the objects' truthfulness, which is defined by __bool__. If __bool__ is not implemented, Python checks for a __len__ method and tests whether its output is zero, in which case the object's truth value is False; in all other cases its truth value is True. That does away with any semantic problems that would arise from combining overriding with the short-circuiting behavior.
Note one can override & and | by implementing __and__ and __or__ with no problems.
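A minimal sketch showing both mechanisms (the Flags class is made up for illustration): truthiness under and/or is driven by __bool__, while & and | can be overloaded directly:

class Flags:
    def __init__(self, bits):
        self.bits = bits
    def __bool__(self):
        # Consulted by `and`, `or`, `not` and `if` for truth-testing
        return self.bits != 0
    def __and__(self, other):
        return Flags(self.bits & other.bits)
    def __or__(self, other):
        return Flags(self.bits | other.bits)

a, b = Flags(0b1010), Flags(0b0110)
print((a & b).bits)    # 2, via __and__
print((a or b) is a)   # True: `or` short-circuits on a's truthiness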
As for the other operators, although not directly related, one should just take care with __getattribute__ - the method called when retrieving any attribute from an object (we don't normally think of it as an operator) - including calls from within itself. __getattr__ also exists, and is only invoked at the end of the attribute search chain, when an attribute is not found.

Is there a tool to statically analyse Python code to determine if mutable parameters are changed (detect side effects)

Is there a Python static analysis tool which can detect when function parameters are mutated, therefore causing a side-effect?
that is
def foo(x):
    x.append("x at the end")
will change the calling scope x when x is a list.
Can this reliably be detected? I'm asking because such a tool would make it easier to comply with pure functional approaches.
I suppose a decorator could be used to warn about it (for development) but this wouldn't be as reliable as static analysis.
Your foo function will mutate its argument if it's called with a list—but if it's called with something different, it might raise an exception, or do something that doesn't mutate it.
Similarly, you can write a type that mutates itself every time you call len on it, and then a function that just prints the length of its argument would be mutating its argument.
It's even worse if you use an operator like +=, which will call the (generally-mutating) __iadd__ method on types that have it, like list, but will call the (non-mutating) __add__ method on types that don't, like tuple. So, what are you going to do in those cases?
For that matter, even a for loop over an argument is mutating if you pass in an iterator, but (usually) not if you pass in a sequence.
If you just want to make a list of frequently-mutating method names and operators and search for those, that wouldn't be too hard to write as an AST visitor. But that's going to give you a lot of both false negatives and false positives.
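A minimal sketch of that AST approach (the method list is illustrative, not exhaustive, and it has exactly the false positives and negatives described above):

import ast

MUTATING_METHODS = {'append', 'extend', 'insert', 'remove', 'pop',
                    'clear', 'sort', 'update', 'add'}

class MutationFinder(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        # Names of the function's positional parameters
        params = {a.arg for a in node.args.args}
        for sub in ast.walk(node):
            # Flag calls like x.append(...) where x is a parameter
            if (isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Attribute)
                    and isinstance(sub.func.value, ast.Name)
                    and sub.func.value.id in params
                    and sub.func.attr in MUTATING_METHODS):
                print(f'{node.name}: line {sub.lineno} may mutate '
                      f'parameter {sub.func.value.id!r}')
        self.generic_visit(node)

source = '''
def foo(x):
    x.append("x at the end")
'''
MutationFinder().visit(ast.parse(source))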
This is exactly the kind of problem that static typing was designed to solve. Python doesn't have static typing built in, but it's possible to build it on top of Python.
First, if you're using Python 3.x, you can use annotations to store the types of the parameters. For example:
def foo(x: MutableSequence) -> None:
    x.append("x at the end")
Now you know, from the fact that it takes a MutableSequence (or a list) rather than a Sequence, that it intends to mutate its parameter. And, even if it doesn't do so now, some future version might well do so, so you should trust its annotations anyway.
And now you can solve your problem the same way you would in Haskell or ML: your pure functional code takes a Sequence and it calls functions with that Sequence, and you just need to ensure that none of those functions is defined to take a MutableSequence, right?
That last part is the hard part. Python doesn't stop me from writing this:
def foo(x: Sequence) -> None:
    x.append("x at the end")
For that, you need a static type checker. Guido has been pushing to standardize annotations to allow the mypy static checker to become a semi-official part of Python. It's not completely finished yet, and it's not as powerful a type system as typical typed functional languages, but it will handle most Python code well enough for what you're looking for. But mypy isn't the only static type checker available; there are others if you search.
Anyway, with a type checker, that foo function would fail with an error explaining that Sequence has no such method append. And if, on the other hand, foo were properly defined as taking a MutableSequence, your functional code that calls it with a Sequence would fail with an error explaining that Sequence is not a subtype of MutableSequence.
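Concretely, with hypothetical annotated functions like these, a checker such as mypy accepts the first and rejects the second:

from typing import MutableSequence, Sequence

def mutating(x: MutableSequence) -> None:
    x.append("x at the end")   # fine: MutableSequence has append

def pure(x: Sequence) -> None:
    x.append("x at the end")   # error: "Sequence" has no attribute "append"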

How do you set a conditional in python based on datatypes?

This question seems mind-boggling simple, yet I can't figure it out. I know you can check datatypes in python, but how can you set a conditional based on the datatype? For instance, if I have to write a code that sorts through a dictionary/list and adds up all the integers, how do I isolate the search to look for only integers?
I guess a quick example would look something like this:
y = []
for x in somelist:
    if type(x) == <type 'int'>: ### <--- pseudo-code line
        y.append(x)
print sum(int(z) for z in y)
So for line 3, how would I set such a conditional?
How about,
if isinstance(x, int):
but a cleaner way would simply be
sum(z for z in y if isinstance(z, int))
TLDR:
Use if isinstance(x, int): unless you have a reason not to.
Use if type(x) is int: if you need exact type equality and nothing else.
Use try: ix = int(x) if you are fine with converting to the target type.
There is a really big "it depends" to type-checking in Python. There are many ways to deal with types, and all have their pros and cons. With Python3, several more have emerged.
Explicit type equality
Types are first-class objects, and you can treat them like any other value.
So if you want the type of something to be equal to int, just test for it:
if type(x) is int:
This is the most restrictive type of testing: it requires exact type equality. Often, this is not what you want:
It rules out substitute types: a float would not be valid, even though it behaves like an int for many purposes.
It rules out subclasses and abstract types: a pretty-printing int subclass or enum would be rejected, even though they are logically Integers.
This severely limits portability: Python2 Strings can be either str or unicode, and Integers can be either int or long.
Note that explicit type equality has its uses for low-level operations:
Some types cannot be subclassed, such as slice. An explicit check is, well, more explicit here.
Some low-level operations, such as serialisation or C-APIs, require specific types.
Variants
A comparison can also be performed against the __class__ attribute:
if x.__class__ is int:
Note if a class defines a __class__ property, this is not the same as type(x).
When there are several classes to check for, using a dict to dispatch actions is more extensible and can be faster (≥5-10 types) than explicit checks.
This is especially useful for conversions and serialisation:
dispatch_dict = {float: round, str: int, int: lambda x: x}

def convert(x):
    converter = dispatch_dict[type(x)]  # look up a callable based on the type
    return converter(x)
Instance check on explicit types
The idiomatic type test uses the isinstance builtin:
if isinstance(x, int):
This check is both exact and performant. This is most often what people want for checking types:
It handles subtypes properly. A pretty-printing int subclass would still pass this test.
It allows checking multiple types at once. In Python2, doing isinstance(x, (int, long)) gets you all builtin integers.
Most importantly, the downsides are negligible most of the time:
It still accepts funky subclasses that behave in weird ways. Since anything can be made to behave in weird ways, this is futile to guard against.
It can easily be too restrictive: many people check for isinstance(x, list) when any sequence (e.g. tuple) or even iterable (e.g. a generator) would do as well. This is more of a concern for general purpose libraries than scripts or applications.
Variant
If you already have a type, issubclass behaves the same:
if issubclass(x_type, int):
Instance check on abstract type
Python has a concept of abstract base classes. Loosely speaking, these express the meaning of types, not their hierarchy:
if isinstance(x, numbers.Real): # accept anything you can sum up like a number
In other words, type(x) does not necessarily inherit from numbers.Real but must behave like it.
Still, this is a very complex and difficult concept:
It is often overkill if you are looking for basic types. An Integer is simply an int most of the time.
People coming from other languages often confuse its concepts.
Distinguishing it from e.g. C++, the emphasis is on the "abstract" in abstract base class, as opposed to the "base class" part.
ABCs can be used like Java interfaces, but may still have concrete functionality.
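A tiny sketch of that last point (Summable is a made-up ABC): the abstract method is the interface, while total is concrete functionality built on it:

from abc import ABC, abstractmethod

class Summable(ABC):
    @abstractmethod
    def values(self):
        """Return the values to sum."""

    def total(self):
        # Concrete (mixin) functionality built on the abstract method
        return sum(self.values())

class Pair(Summable):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def values(self):
        return (self.x, self.y)

print(Pair(2, 3).total())   # 5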
However, it is incredibly useful for generic libraries and abstractions.
Many functions/algorithms do not need explicit types, just their behaviour.
If you just need to look up things by key, dict restricts you to a specific in-memory type. By contrast, collections.abc.Mapping also includes database wrappers, large disk-backed dictionaries, lazy containers, ... - and dict.
It allows expressing partial type constraints.
There is no strict base type implementing iteration. But if you check objects against collections.abc.Iterable, they all work in a for loop (see the sketch after this list).
It allows creating separate, optimised implementations that appear as the same abstract type.
While it is usually not needed for throwaway scripts, I would highly recommend using this for anything that lives beyond a few python releases.
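For instance, a function checked against collections.abc.Iterable accepts lists, tuples, generators, dict views, and so on alike:

from collections.abc import Iterable

def total(items):
    # Any iterable of numbers will do; no concrete container type required
    if not isinstance(items, Iterable):
        raise TypeError('need an iterable of numbers')
    return sum(items)

print(total([1, 2, 3]))                 # list
print(total(x * x for x in range(4)))   # generator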
Tentative conversion
The idiomatic way of handling types is not to test them, but to assume they are compatible. If you already expect some wrong types in your input, simply skip everything that is not compatible:
try:
    ix = int(x)
except (ValueError, TypeError):
    continue  # not compatible with int, try the next one
else:
    a.append(ix)
This is not actually a type check, but usually serves the same intention.
It guarantees you have the expected type in your output.
It has some limited leeway in converting wrong types, e.g. specialising float to int.
It works without you knowing which types conform to int.
The major downside is that it is an explicit transformation.
You can silently accept "wrong" values, e.g. converting a str containing a numeric literal.
It needlessly converts even types that would be good enough, e.g. float to int when you just need numbers.
Conversion is an effective tool for some specific use cases. It works best if you know roughly what your input is, and must make guarantees about your output.
Function dispatch
Sometimes the goal of type checking is just to select an appropriate function. In this case, function dispatch such as functools.singledispatch allows specialising function implementations for specific types:
from functools import singledispatch

@singledispatch
def append_int(value, sequence):
    return

@append_int.register
def _(value: int, sequence):
    sequence.append(value)
This is a combination of isinstance and dict dispatch. It is most useful for larger applications:
It keeps the site of usage small, regardless of the number of dispatched types.
It allows registering specialisations for additional types later, even in other modules.
Still, it doesn't come without its downsides:
Single and multiple dispatch originate in functional and strongly typed languages, so many Python programmers are not familiar with them.
Dispatches require separate functions, and are therefore not suitable to be defined at the site of usage.
Creating the functions and "warming up" the dispatch cache takes notable runtime overhead. Dispatch functions should be defined once and re-used often.
Even a warmed up dispatch table is slower than a hand-written if/else or dict lookup.
Controlling the input
The best course of action is to ensure you never have to check for type in the first place. This is a bit of a meta-topic, as it depends strongly on the use case.
Here, the source of somelist should never have put non-numbers into it.
You can simply use type and the equality operator like this:
if type(x) == int:
For example, declare a variable x of type int:
x = 2
if type(x) == type(1) or isinstance(x, int):
    # do something
Both work fine.
Easy - use types. (Note: this is Python 2; types.IntType was removed in Python 3.)
import types
k = 5
if type(k) == types.IntType:
    print "int"
Here's a quick dir(types):
['BooleanType', 'BufferType', 'BuiltinFunctionType', 'BuiltinMethodType', 'ClassType', 'CodeType', 'ComplexType', 'DictProxyType', 'DictType', 'DictionaryType', 'EllipsisType', 'FileType', 'FloatType', 'FrameType', 'FunctionType', 'GeneratorType', 'GetSetDescriptorType', 'InstanceType', 'IntType', 'LambdaType', 'ListType', 'LongType', 'MemberDescriptorType', 'MethodType', 'ModuleType', 'NoneType', 'NotImplementedType', 'ObjectType', 'SliceType', 'StringType', 'StringTypes', 'TracebackType', 'TupleType', 'TypeType', 'UnboundMethodType', 'UnicodeType', 'XRangeType', '__builtins__', '__doc__', '__file__', '__name__', '__package__']
You can use the type function on both sides of the operator. Like this:
if type(x) == type(1):
