I need to represent immutable vectors in Python ("vectors" as in linear algebra, not as in programming). The tuple seems like an obvious choice.
The trouble is when I need to implement things like addition and scalar multiplication. If a and b are vectors, and c is a number, the best I can think of is this:
tuple(map(lambda x,y: x + y, a, b)) # add vectors 'a' and 'b'
tuple(map(lambda x: x * c, a)) # multiply vector 'a' by scalar 'c'
which seems inelegant; there should be a clearer, simpler way to get this done -- not to mention avoiding the call to tuple, since map returns a list.
Is there a better option?
NumPy supports various algebraic operations with its arrays.
Immutable types are pretty rare in Python and third-party extensions thereof; the OP rightly claims "there are enough uses for linear algebra that it doesn't seem likely I have to roll my own" -- but all the existing types I know that do linear algebra are mutable! So, as the OP is adamant on immutability, there is nothing for it but the roll-your-own route.
Not that there's all that much rolling involved, e.g. if you specifically need 2-d vectors:
import math
class ImmutableVector(object):
    __slots__ = ('_d',)
    def __init__(self, x, y):
        # bypass our own __setattr__, which always raises
        object.__setattr__(self, '_d', (x, y))
    def __setattr__(self, n, v):
        raise ValueError("Can't alter instance of %s" % type(self))
    @property
    def x(self):
        return self._d[0]
    @property
    def y(self):
        return self._d[1]
    def __eq__(self, other):
        return self._d == other._d
    def __ne__(self, other):
        return self._d != other._d
    def __hash__(self):
        return hash(self._d)
    def __add__(self, other):
        return type(self)(self.x+other.x, self.y+other.y)
    def __mul__(self, scalar):
        return type(self)(self.x*scalar, self.y*scalar)
    def __repr__(self):
        return '%s(%s, %s)' % (type(self).__name__, self.x, self.y)
    def __abs__(self):
        return math.hypot(self.x, self.y)
I "threw in for free" a few extras such as .x and .y R/O properties, nice string representation, usability in sets or as keys in dicts (why else would one want immutability?-), low memory footprint, abs(v) to give v's vector-length -- I'm sure you can think of other "wouldn't-it-be-cool-if" methods and operators, depending on your application field, and they'll be just as easy. If you need other dimensionalities it won't be much harder, though a tad less readable since the .x, .y notation doesn't apply any more;-) (but I'd use genexps, not map).
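As a rough illustration of the "other dimensionalities" remark, here is a minimal sketch of the same idea with generator expressions instead of .x and .y. The class name ImmutableVectorN and its exact API are assumptions made for this example, not part of the answer above.

```python
import math

class ImmutableVectorN(object):
    """Hypothetical n-dimensional variant; the name and API are illustrative."""
    __slots__ = ('_d',)

    def __init__(self, *components):
        # Bypass our own __setattr__, which always raises.
        object.__setattr__(self, '_d', tuple(components))

    def __setattr__(self, name, value):
        raise ValueError("Can't alter instance of %s" % type(self).__name__)

    def __eq__(self, other):
        return self._d == other._d

    def __ne__(self, other):
        return self._d != other._d

    def __hash__(self):
        return hash(self._d)

    def __add__(self, other):
        # Component-wise sum, built with a generator expression.
        return type(self)(*(a + b for a, b in zip(self._d, other._d)))

    def __mul__(self, scalar):
        return type(self)(*(a * scalar for a in self._d))

    def __abs__(self):
        # Euclidean length.
        return math.sqrt(sum(a * a for a in self._d))

    def __repr__(self):
        return '%s(%s)' % (type(self).__name__,
                           ', '.join(repr(a) for a in self._d))

v = ImmutableVectorN(1, 2, 2)
w = ImmutableVectorN(3, 0, 4)
print(v + w)   # ImmutableVectorN(4, 2, 6)
print(v * 2)   # ImmutableVectorN(2, 4, 4)
print(abs(v))  # 3.0
```

Like the 2-d version, instances are hashable, so they can be used in sets or as dict keys.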
By inheriting from tuple, you can make a nice Vector class pretty easily. Here's enough code to provide addition of vectors, and multiplication of a vector by a scalar. It gives you arbitrary length vectors, and can work with complex numbers, ints, or floats.
class Vector(tuple):
    def __add__(self, a):
        # TODO: check lengths are compatible.
        return Vector(x + y for x, y in zip(self, a))
    def __mul__(self, c):
        return Vector(x * c for x in self)
    def __rmul__(self, c):
        return Vector(c * x for x in self)
a = Vector((1, 2, 3))
b = Vector((2, 3, 4))
print a + b
print 3 * a
print a * 3
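The TODO in the class above could be filled in along these lines. This is only a sketch: raising ValueError on a length mismatch is an assumption about the desired behaviour (zip would otherwise silently truncate to the shorter operand).

```python
class Vector(tuple):
    def __add__(self, other):
        # Assumed policy: refuse to add vectors of different lengths
        # instead of letting zip() silently drop the extra components.
        if len(self) != len(other):
            raise ValueError("length mismatch: %d != %d"
                             % (len(self), len(other)))
        return Vector(x + y for x, y in zip(self, other))

    def __mul__(self, c):
        return Vector(x * c for x in self)

    def __rmul__(self, c):
        return Vector(c * x for x in self)

a = Vector((1, 2, 3))
b = Vector((2, 3, 4))
print(a + b)  # (3, 5, 7)
print(3 * a)  # (3, 6, 9)
```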
Although using a library like NumPy seems to be the resolution for the OP, I think there is still some value in a simple solution that does not require additional libraries and that lets you stay immutable by working with iterables.
Using the itertools and operator modules:
imap(add, a, b) # returns iterable to sum of a and b vectors
This implementation is simple. It uses neither lambda nor any list-to-tuple conversion, since it is iterator based.
from itertools import imap
from operator import add
vec1 = (1, 2, 3)
vec2 = (10, 20, 30)
result = imap(add, vec1, vec2)
print(tuple(result))
Yields:
(11, 22, 33)
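Note that itertools.imap exists only in Python 2. In Python 3 the built-in map is already lazy, so a sketch of the same idea there would look like this:

```python
from operator import add

vec1 = (1, 2, 3)
vec2 = (10, 20, 30)

# In Python 3, map returns an iterator: nothing is computed
# until the result is consumed (here, by tuple()).
result = map(add, vec1, vec2)
print(tuple(result))  # (11, 22, 33)
```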
Why not create your own class, making use of 2 Cartesian point member variables? (sorry if the syntax is a little off, my python is rusty)
class point:
    def __init__(self,x,y):
        self.x=x
        self.y=y
    #etc
    def add(self,p):
        return point(self.x + p.x, self.y + p.y)
class vector:
    def __init__(self,a,b):
        self.pointA=a
        self.pointB=b
    #etc
    def add(self,v):
        # point defines add(), not __add__, so call it explicitly
        return vector(self.pointA.add(v.pointA), self.pointB.add(v.pointB))
For occasional use, a Python 3 solution without repeating lambdas is possible by using the standard operator module:
from operator import add, mul
a = (1, 2, 3)
b = (4, 5, 6)
print(tuple(map(add, a , b)))
print(tuple(map(mul, a , b)))
which prints:
(5, 7, 9)
(4, 10, 18)
For serious linear algebra computations, using numpy vectors is the canonical solution:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a+b)
print(a*b)
which prints:
[5 7 9]
[ 4 10 18]
Since pretty much all of the sequence manipulation functions return lists, that's pretty much what you're going to have to do.
Related
I have been using python 3.7 for a few months, but I recently had to shift to python 2.7. Since I am developing scientific code, I heavily rely on the infix operator @ to multiply nd-arrays. This operator was introduced with python 3.5 (see here), so I cannot use it with my new setup.
The obvious solution is to replace all M1 @ M2 by numpy.matmul(M1, M2), which severely limits my code's readability.
I saw this hack, which consists of defining an Infix class that allows creating custom operators by overloading the __or__ and __ror__ methods. My question is: how could I use this trick to make an infix |at| operator that works just like @?
What I tried is:
import numpy as np
class Infix:
    def __init__(self, function):
        self.function = function
    def __ror__(self, other):
        return Infix(lambda x, self=self, other=other: self.function(other, x))
    def __or__(self, other):
        return self.function(other)
    def __call__(self, value1, value2):
        return self.function(value1, value2)
# Matrix multiplication
at = Infix(lambda x,y: np.matmul(x,y))
M1 = np.ones((2,3))
M2 = np.ones((3,4))
print(M1 |at| M2)
When I execute this code, I get :
ValueError: operands could not be broadcast together with shapes (2,3) (3,4)
I think I have an idea of what is not working. When I only look at M1|at, I can see that it is a 2*3 array of functions:
array([[<__main__.Infix object at 0x7faa1c0d6da0>,
<__main__.Infix object at 0x7faa1c0d6860>,
<__main__.Infix object at 0x7faa1c0d6828>],
[<__main__.Infix object at 0x7faa1c0d6f60>,
<__main__.Infix object at 0x7faa1c0d61d0>,
<__main__.Infix object at 0x7faa1c0d64e0>]], dtype=object)
This is not what I expected, since I would like my code to consider this 2d-array as a whole, and not element-wise...
Does anybody have a clue of what I should do?
PS: I also considered using this answer, but I have to avoid the use of external modules.
I found the solution to my problem here.
As suggested in the comments, the ideal fix would either be to use Python 3.x or to use numpy.matmul, but this code seems to work, and even has the right precedence :
import numpy as np
class Infix(np.ndarray):
    def __new__(cls, function):
        obj = np.ndarray.__new__(cls, 0)
        obj.function = function
        return obj
    def __array_finalize__(self, obj):
        if obj is None: return
        self.function = getattr(obj, 'function', None)
    def __rmul__(self, other):
        return Infix(lambda x, self=self, other=other: self.function(other, x))
    def __mul__(self, other):
        return self.function(other)
    def __call__(self, value1, value2):
        return self.function(value1, value2)
at = Infix(np.matmul)
M1 = np.ones((2,3))
M2 = np.ones((3,4))
M3 = np.ones((2,4))
print(M1 *at* M2)
print(M3 + M1 *at* M2)
I was thinking about how to use super to make a pipeline in python. I have a series of transformations I must do to a stream, and I thought that a good way to do it was something in the lines of:
class MyBase(object):
    def transformData(self, x):
        return x
class FirstStage(MyBase):
    def transformData(self, x):
        y = super(FirstStage, self).transformData(x)
        return self.__transformation(y)
    def __transformation(self, x):
        return x * x
class SecondStage(FirstStage):
    def transformData(self, x):
        y = super(SecondStage, self).transformData(x)
        return self.__transformation(y)
    def __transformation(self, x):
        return x + 1
It works as I intended, but there's a potential repetition. If I have N stages, I'll have N identical transformData methods where the only thing I change is the name of the current class.
Is there a way to remove this boilerplate? I tried a few things but the results only proved to me that I hadn't understood perfectly how super worked.
What I wanted was to define only the method __transformation and naturally inherit a transformData method that would go up in MRO, call that class' transformData method and then call the current class' __transformation on the result. Is it possible or do I have to define a new identical transformData for each child class?
I agree that this is a poor way of implementing a pipeline. That can be done with much simpler (and clearer) schemes. I thought of this as the least modification I could make to an existing model to get a pipeline out of the existing classes without modifying the code too much. I agree this is not the best way to do it. It would be a trick, and tricks should be avoided. I also thought of it as a way of better understanding how super works.
Buuuut. Out of curiosity... is it possible to do it in the above scheme without the transformData repetition? This is a genuine doubt. Is there a trick to inherit transformData in a way that the super call in it is changed to be called on the current class?
It would be a tremendously unclear, unreadable, smart-ass trickery. I know. But is it possible?
I don't think using inheritance for a pipeline is the right way to go.
Instead, consider something like this -- here with "simple" examples and a parametrized one (a class using the __call__ magic method, but returning a closured function would do too, or even "JITing" one by way of eval).
def two_power(x):
    return x * x
def add_one(x):
    return x + 1
class CustomTransform(object):
    def __init__(self, multiplier):
        self.multiplier = multiplier
    def __call__(self, value):
        return value * self.multiplier
def transform(data, pipeline):
    for datum in data:
        for transform in pipeline:
            datum = transform(datum)
        yield datum
pipe = (two_power, two_power, add_one, CustomTransform(1.25))
print list(transform([1, 2, 4, 8], pipe))
would output
[2.5, 21.25, 321.25, 5121.25]
The problem is that using inheritance here is rather weird in terms of OOP. And do you really need to define the whole chain of transformations when defining classes?
But it's better to forget OOP here, the task is not for OOP. Just define functions for transformations:
def get_pipeline(*functions):
    def pipeline(x):
        for f in functions:
            x = f(x)
        return x
    return pipeline
p = get_pipeline(lambda x: x * 2, lambda x: x + 1)
print p(5)
An even shorter version is here:
def get_pipeline(*fs):
    return lambda v: reduce(lambda x, f: f(x), fs, v)
p = get_pipeline(lambda x: x * 2, lambda x: x + 1)
print p(5)
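The short version relies on reduce being a built-in, which is only true in Python 2. A sketch of the same one-liner for Python 3, where reduce lives in functools:

```python
from functools import reduce

def get_pipeline(*fs):
    # Feed the value through each function in turn, left to right.
    return lambda v: reduce(lambda x, f: f(x), fs, v)

p = get_pipeline(lambda x: x * 2, lambda x: x + 1)
print(p(5))  # 11
```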
And here is an OOP solution. It is rather clumsy if compared to the previous one:
class Transform(object):
    def __init__(self, prev=None):
        self.prev_transform = prev
    def transformation(self, x):
        raise Exception("Not implemented")
    def transformData(self, x):
        if self.prev_transform:
            x = self.prev_transform.transformData(x)
        return self.transformation(x)
class TransformAdd1(Transform):
    def transformation(self, x):
        return x + 1
class TransformMul2(Transform):
    def transformation(self, x):
        return x * 2
t = TransformAdd1(TransformMul2())
print t.transformData(1) # 1 * 2 + 1
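As for the curiosity part of the question (keeping the inheritance scheme but writing transformData only once), one possible trick is to walk the MRO in the base class instead of chaining super calls. This is a sketch, not a recommendation; it assumes each stage defines a plain method named transformation (without the double underscore, so no name mangling gets in the way).

```python
class MyBase(object):
    def transformData(self, x):
        # Walk the MRO from the most basic class to the most derived one
        # and apply each class's *own* transformation, if it defines one.
        for cls in reversed(type(self).__mro__):
            step = vars(cls).get('transformation')
            if step is not None:
                x = step(self, x)
        return x

class FirstStage(MyBase):
    def transformation(self, x):
        return x * x

class SecondStage(FirstStage):
    def transformation(self, x):
        return x + 1

print(FirstStage().transformData(3))   # 9
print(SecondStage().transformData(3))  # 10
```

Looking the method up in vars(cls) rather than with getattr matters here: getattr would also find inherited implementations, so a stage that defines nothing of its own would re-apply its parent's step.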
I have a custom data type, say: mytime, which represent hours and minutes, such as 29:45, it is 29 hours and 45 minutes.
I want to use max built-in function to find the item in a list of lists, whose sum of its elements is the greatest, where all lists contain values of mytime type.
x = [[a, b], [c, d]]
a,b,c,d are of mytime type.
max(x, key=sum)
won't work here, because a,b,c,d, are not integers.
If I type a + b at python command line, I get the sum of these two time values, result is of mytime type, without any errors.
How do I use max function here?
Let's say your class looks like this:
class mytime(object):
    def __init__(self, h, m):
        self.h = h
        self.m = m
    def __add__(self, other):
        return mytime(self.h + other.h, self.m + other.m)
    def __repr__(self):
        return '%i:%i' % (self.h, self.m)
and you use it like this:
a = mytime(10, 10)
b = mytime(2, 22)
print a + b
and it will work as expected:
12:32
Problem:
What you want to do is:
l = [a, b]
print sum(l)
but it will fail:
TypeError: unsupported operand type(s) for +: 'int' and 'mytime'
The problem is that the sum function will start with 0 and will add up all values of the list. It will try to evaluate
0 + mytime(10, 10)
which will fail.
Solution:
The solution to your problem is implementing the __radd__ method, which stands for "reverse add" and is called when the addition can't be resolved in the "forward" direction. For example, x + y is evaluated as x.__add__(y) if possible, but if that method doesn't exist or returns NotImplemented, Python tries y.__radd__(x).
So you can add the following method to your class:
    def __radd__(self, other):
        return mytime(self.h, self.m)
and the sum function will work for you (in this implementation ignoring the other value, which is probably fine in your case).
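Putting the pieces together for the original question, max(x, key=sum) also needs the sums to be comparable. Below is a minimal sketch under some assumptions: the class is a hypothetical stand-in named MyTime, overflowing minutes are carried into hours, and ordering is done on total minutes via functools.total_ordering. None of that comes from the asker's actual class.

```python
import functools

@functools.total_ordering
class MyTime(object):
    """Hypothetical stand-in for the asker's mytime type."""

    def __init__(self, h, m):
        # Assumption: carry overflowing minutes into hours (29:75 -> 30:15).
        self.h, self.m = divmod(60 * h + m, 60)

    def total_minutes(self):
        return 60 * self.h + self.m

    def __add__(self, other):
        if not isinstance(other, MyTime):
            return NotImplemented
        return MyTime(self.h + other.h, self.m + other.m)

    def __radd__(self, other):
        # sum() starts from the int 0, so treat 0 as the identity element.
        if other == 0:
            return self
        return NotImplemented

    def __eq__(self, other):
        if not isinstance(other, MyTime):
            return NotImplemented
        return self.total_minutes() == other.total_minutes()

    def __lt__(self, other):
        if not isinstance(other, MyTime):
            return NotImplemented
        return self.total_minutes() < other.total_minutes()

    def __repr__(self):
        return '%d:%02d' % (self.h, self.m)

x = [[MyTime(10, 10), MyTime(2, 22)], [MyTime(29, 45), MyTime(0, 30)]]
best = max(x, key=sum)
print(best)       # [29:45, 0:30]
print(sum(best))  # 30:15
```

An alternative that avoids __radd__ altogether is to pass an explicit start value, e.g. sum(items, MyTime(0, 0)).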
You can write your own sum function:
def my_sum(item):
    return sum(60 * e[0] + e[1] for e in item)
x = [[(2,0), (3,0)], [(9, 0), (4, 0)]]
print max(x, key=my_sum)
I have represented your mytime data structure as tuples (with hours and minutes) so you may need to adjust my_sum to your data structure. The only requirement is that the hours and minutes of a mytime can be filled in for e[0] and e[1] respectively.
The above code returns the greatest element (in this case [(9, 0), (4, 0)]).
Are you sure using a + b works? All sum does is repeatedly apply + to the elements, starting from 0 (it's the same as reduce(operator.add, sequence, 0) with a special case that rejects strings)... So if a + b works, and 0 + a is handled as well (for example through __radd__), then max(x, key=sum) should just work -- as long as the type supports comparison operators - eg __gt__, __eq__, __lt__
Example
You need to have __gt__ defined for max to work...
class mydate(object):
    def __init__(self, num):
        self.num = num
    def __add__(self, other): # make sure sum works
        return mydate(self.num + other.num)
    def __radd__(self, other): # sum starts from 0, so handle 0 + mydate
        return self if other == 0 else NotImplemented
    def __gt__(self, other): # make sure max can do > comparison
        return self.num > other.num
    def __repr__(self):
        return 'date: {}'.format(self.num)
x = mydate(3)
y = mydate(5)
z = mydate(2)
print max([[x, y], [z]], key=sum) # prints [date: 3, date: 5]
I'd like to be able to write something like this in python:
a = (1, 2)
b = (3, 4)
c = a + b # c would be (4, 6)
d = 3 * b # d would be (9, 12)
I realize that you can overload operators to work with custom classes, but is there a way to overload operators to work with pairs?
Of course, such solutions as
c = tuple([x+y for x, y in zip(a, b)])
do work, but, performance aside, they aren't quite as pretty as overloading the + operator.
One can of course define add and mul functions such as
def add((x1, y1), (x2, y2)):
    return (x1 + x2, y1 + y2)
def mul(a, (x, y)):
    return (a * x, a * y)
but still being able to write q * b + r instead of add(mul(q, b), r) would be nicer.
Ideas?
EDIT: On a side note, I realize that since + currently maps to tuple concatenation, it might be unwise to redefine it, even if it's possible. The question still holds for - for example =)
In contrast to Ruby, you can't change the behaviour of built-in types in Python. All you can do is create a new type derived from a built-in type. Literals will still create the built-in type, though.
Probably the best you can get is
class T(tuple):
    def __add__(self, other):
        return T(x + y for x, y in zip(self, other))
    def __rmul__(self, other):
        return T(other * x for x in self)
a = T((1, 2))
b = T((3, 4))
c = a + b # c would be (4, 6)
d = 3 * b # d would be (9, 12)
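Since the question's edit asks about - as well, the same subclassing approach extends naturally. The following is only a sketch of that extension; the __sub__ and __mul__ methods are additions for illustration, and note that overriding __add__ and __mul__ replaces the usual tuple concatenation and repetition for this type.

```python
class T(tuple):
    def __add__(self, other):
        return T(x + y for x, y in zip(self, other))

    def __sub__(self, other):
        return T(x - y for x, y in zip(self, other))

    def __mul__(self, scalar):
        return T(x * scalar for x in self)

    def __rmul__(self, scalar):
        return T(scalar * x for x in self)

b = T((3, 4))
r = T((1, 1))
q = 2
print(q * b + r)  # (7, 9)
print(b - r)      # (2, 3)
```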
You can inherit a class from tuple and overload its __add__ method. Here's a very simplistic example:
class mytuple(tuple):
    def __add__(self, other):
        assert len(self) == len(other)
        return tuple([x + y for x, y in zip(self, other)])
mt = mytuple((5, 6))
print mt + (2, 3) # prints (7, 9)
I wouldn't recommend this approach though, because tuples weren't really designed for this purpose. If you want to perform numeric computations, just use numpy.
You cannot modify types defined in C, so you would need to create all new types for this. Or you could just use NumPy, which already has types that support this.
There is the famous infix operator hack that would allow you to do something like this:
x = Infix(lambda a,b:tuple([x+y for x, y in zip(a, b)]))
y = Infix(lambda a,b:tuple([a*y for y in b]))
c = a |x| b # c would be (4, 6)
d = 3 |y| b # d would be (9, 12)
That would hide the generator expressions and be applicable to tuples of all lengths, at the expense of "weird" pseudo-operators |x| and |y|.
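The snippet above assumes an Infix class is already defined. For completeness, here is a minimal self-contained sketch of the hack; the operator names vadd and smul are made up for this example.

```python
class Infix(object):
    def __init__(self, function):
        self.function = function

    def __ror__(self, left):
        # Handles `left | op`: remember the left operand in a new Infix.
        return Infix(lambda right: self.function(left, right))

    def __or__(self, right):
        # Handles `partial | right`: apply the remembered function.
        return self.function(right)

vadd = Infix(lambda a, b: tuple(x + y for x, y in zip(a, b)))
smul = Infix(lambda k, b: tuple(k * y for y in b))

a = (1, 2)
b = (3, 4)
print(a |vadd| b)  # (4, 6)
print(3 |smul| b)  # (9, 12)
```

This works because neither tuple nor int knows how to apply | to an Infix instance, so Python falls back to Infix.__ror__.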
Using python complex numbers is definitely one way to do it, if not extremely pretty.
a = 1 + 2j
b = 3 + 4j
c = a + b # c would be 4 + 6j
d = 3 * b # d would be 9 + 12j
That saves the definition of an extra class.
Also, expanding on previous answers,
class T(tuple):
    def __add__((x, y), (x1, y1)):
        return T((x+x1, y+y1))
    def __rmul__((x, y), other):
        return T((other * x, other * y))
would improve performance, at the cost of restricting the implementation to pairs.
Write your own class and implement __mul__, __add__ etc.
You can use numpy.array to get all you need.
Before I start, I'm already aware that object immutability in Python is often a bad idea, however, I believe that in my case it would be appropriate.
Let's say I'm working with a coordinate system in my code, such that each coordinate uses a struct of X, Y, Z. I've already overloaded subtraction, addition, etc. methods to do what I want. My current problem is the assignment operator, which I've read cannot be overloaded. Problem is when I have the following, I do not want A to point to the same point as B, I want the two to be independent, in case I need to overwrite a coordinate of one but not the other later:
B = Point(1,2,3)
A = B
I'm aware that I can use deepcopy, but that seems like a hack, especially since I could have a list of points that I might need to take a slice of (in which case it would again have a slice of point references, not points). I've also considered using tuples, but my points have member methods I need, and a very large portion of my code already uses the structs.
My idea was to modify Point to be immutable, since it's really only 3 floats of data, and from doing some research __new__() seems like the right method to override for this. I'm not sure how to achieve this though; would it be something like this, or am I way off?
    def __new__(self):
        return Point(self.x, self.y, self.z)
EDIT:
My bad, I realized after reading katrielalex's post that I can't modify a parameter of immutable object once it has been defined, in which case it's not a problem that both A and B point to the same data since a reassignment would require creation of a new point. I'd say that katrielalex's and vonPetrushev's posts achieve what I want, I think I'll go with vonPetrushev's solution since I don't need to rewrite all my current code to use tuples (the extra set of parentheses and not being able to reference coordinates as point.x)
In conjunction with katrielalex's suggestion, making the Point a named tuple would be good as well. Here I've just replaced the tuple parent with namedtuple('Point', 'x y z') - and that's enough for it to work.
>>> from collections import namedtuple
>>> class Point(namedtuple('Point', 'x y z')):
...     def __add__(self, other):
...         return Point._make(i + j for i, j in zip(self, other))
...
...     def __mul__(self, other):
...         return sum(i * j for i, j in zip(self, other))
...
...     def __sub__(self, other):
...         return Point._make(i - j for i, j in zip(self, other))
...
...     @property
...     def mod(self):
...         from math import sqrt
...         return sqrt(sum(i*i for i in self))
...
Then you can have:
>>> Point(1, 2, 3)
Point(x=1, y=2, z=3)
>>> Point(x=1, y=2, z=3).mod
3.7416573867739413
>>> Point(x=1, y=2, z=3) * Point(0, 0, 1)
3
>>> Point._make((1, 2, 3))
Point(x=1, y=2, z=3)
(Thanks to katrielalex for suggesting to extend the namedtuple rather than copying the code produced.)
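With an immutable Point, the original worry about A = B goes away: "overwriting" a coordinate necessarily builds a new object. A short sketch using the namedtuple method _replace on a plain namedtuple (no extra methods, to keep it small):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y z')

b = Point(1, 2, 3)
a = b                 # both names refer to the same immutable object
a = a._replace(x=10)  # builds a new Point; b is untouched
print(a)  # Point(x=10, y=2, z=3)
print(b)  # Point(x=1, y=2, z=3)
```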
You can make Point a subclass of tuple -- remember, the built-in types (at least in recent Pythons) are just more classes. This will give you the desired immutability.
However, I'm slightly confused about your suggested use case:
in case I need to overwrite a coordinate of one but not the other later:
That doesn't make sense if Points are immutable...
>>> class Point(tuple):
...     def __add__(self, other):
...         return Point((i + j for i, j in zip(self, other)))
...
...     def __mul__(self, other):
...         return sum(i * j for i, j in zip(self, other))
...
...     def __sub__(self, other):
...         return Point((i - j for i, j in zip(self, other)))
...
...     @property
...     def mod(self):
...         from math import sqrt
...         return sqrt(sum(i*i for i in self))
...
>>> a = Point((1,2,3))
>>> b = Point((4,5,6))
>>> a + b
(5, 7, 9)
>>> b - a
(3, 3, 3)
>>> a * b
32
>>> a.mod
3.7416573867739413
>>> a[0] = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Point' object does not support item assignment
try this:
class Point(object):
    def __init__(self, x, y, z):
        self._x=x
        self._y=y
        self._z=z
    def __getattr__(self, key):
        try:
            key={'x':'_x','y':'_y','z':'_z'}[key]
        except KeyError:
            raise AttributeError
        else:
            return self.__dict__[key]
    def __setattr__(self, key, value):
        if key in ['_x','_y','_z']:
            object.__setattr__(self, key, value)
        else:
            raise TypeError("'Point' object does not support item assignment")
So, you can construct a Point object, but not change its attributes.
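One caveat worth checking with this approach: only the public names x, y and z are protected, while the underscore-prefixed attributes remain writable. A small sketch of what is and isn't blocked (the class is repeated so the snippet runs on its own):

```python
class Point(object):
    def __init__(self, x, y, z):
        self._x = x
        self._y = y
        self._z = z

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for x, y, z.
        try:
            key = {'x': '_x', 'y': '_y', 'z': '_z'}[key]
        except KeyError:
            raise AttributeError(key)
        return self.__dict__[key]

    def __setattr__(self, key, value):
        if key in ('_x', '_y', '_z'):
            object.__setattr__(self, key, value)
        else:
            raise TypeError("'Point' object does not support item assignment")

p = Point(1, 2, 3)
try:
    p.x = 10          # blocked by __setattr__
except TypeError as exc:
    print(exc)

p._x = 10             # not blocked: the private name is still writable
print(p.x)            # 10
```

So the immutability here is a convention enforced on the public names rather than a hard guarantee.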