Prototype pattern in Python

I have the following implementation of the prototype pattern in Python 2.7:
def clone(instance):
    x = object.__new__(type(instance))
    x.__dict__ = dict(instance.__dict__)
    return x
This clearly doesn't work for non-new-style (old-style) classes and for built-ins like dict.
Is there a way, cleanly and staying within Python 2, to extend this to mutable built-in types like the sequence and mapping types?

I think you can use the copy module.
The copy.deepcopy function creates a 1:1 copy of an object:
>>> import copy
>>> x = [1,2,3]
>>> z = copy.deepcopy(x)
>>> x[0] = 3
>>> x
[3, 2, 3]
>>> z
[1, 2, 3]
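For completeness, the copy module also has copy.copy, which makes a shallow copy: nested mutable objects are shared rather than duplicated. A quick sketch of the difference:
>>> import copy
>>> nested = [[1, 2], [3, 4]]
>>> shallow = copy.copy(nested)
>>> deep = copy.deepcopy(nested)
>>> nested[0].append(99)
>>> shallow[0]  # the inner list is shared with the original
[1, 2, 99]
>>> deep[0]     # fully independent
[1, 2]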

This is from the book Python in Practice by Mark Summerfield.
You can create a copy of an object in several ways:
import copy
import sys

class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

def make_object(Class, *args, **kwargs):
    return Class(*args, **kwargs)

point1 = Point(1, 2)
point2 = eval("{}({}, {})".format("Point", 2, 4))  # Risky
point3 = getattr(sys.modules[__name__], "Point")(3, 6)
point4 = globals()["Point"](4, 8)
point5 = make_object(Point, 5, 10)
point6 = copy.deepcopy(point5)
point6.x = 6
point6.y = 12
point7 = point1.__class__(7, 14)  # Could have used any of point1 to point6
A schema and example of this pattern's use can be found on tutorialspoint; it's written for Java, but the principle is comprehensible.

What you're trying to do is misguided.
If you just want a copy, use copy or deepcopy.
If you want JavaScript-style objects, create a root class that you can clone—or, better, rethink your code in Python instead of JavaScript.
If you want fully-functional JavaScript-style cloning even for builtins, you can't have that.
Also, keep in mind that the main motivations for using the prototype pattern in non-prototype-based languages like Java and C++ are (a) to avoid the cost of newing up an object and (b) to allow adding different methods or attributes to different instances. For the former, you're not avoiding the cost, and it doesn't matter anyway in Python. For the latter, it's already easy to add methods and attributes to instances in Python, and cloning doesn't make it any easier in any way.
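To illustrate point (b), here is a small sketch (the Thing class is hypothetical) showing that ordinary Python instances already accept per-instance attributes and methods, no cloning required:
import types

class Thing(object):
    pass

a = Thing()
b = Thing()
a.color = "red"  # attribute on this instance only
b.shout = types.MethodType(lambda self: "hi!", b)  # method bound to this instance only
print(a.color)              # red
print(b.shout())            # hi!
print(hasattr(b, "color"))  # False: a's attribute didn't leak to b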
I'm aware that there is something different going on with numeric types, compared with other built-in types…
No, the distinction here isn't numbers vs. sequences, it's immutable types vs. mutable:
>>> id(tuple())
4298170448
>>> id(tuple())
4298170448
>>> id(tuple(()))
4298170448
>>> id(tuple([]))
4298170448
Also, it has nothing to do with the int or tuple constructor being fancy. If the value isn't cached anywhere, even a repeated literal will get a new instance each time:
>>> id(20000)
4439747152
>>> id(20000)
4439747216
Small integers, the empty tuple, and the values of the unassignable magic constants are pre-cached at startup. Short strings generally get interned. But the exact details are implementation-dependent.
So, how does this affect you?
Well, cloning immutable types is pointless. By definition, it's an unchangeable copy of the same thing, so what good can that do?
Meanwhile, types that don't use __dict__ for their storage can't be cloned in this way (whether they're builtin types, types that use slots, types that generate their attributes dynamically, …). In some cases you'll get an error, in others just the wrong behavior.
So, if by "sequence, mapping and numeric types" you include builtins like int and list, stdlib types like Decimal and deque, or common third-party types like gmpy.mpz or blist.sorteddict, then this mechanism will not work.
On top of that, even if you could clone builtin classes, you can't add new attributes to them:
>>> a = []
>>> a.foo = 3
AttributeError: 'list' object has no attribute 'foo'
So, if you got this to work, again, it wouldn't be useful anyway.
Meanwhile, calling object.__new__ instead of type(instance).__new__ can cause a variety of problems for different classes. Some of them, like __dict__, will give you an error telling you so, but you can't count on that in every case.
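A slightly safer variant, still limited to classes that genuinely keep all their state in __dict__, would at least go through the instance's own class. This is only a sketch of that one fix, not a general solution:
def clone(instance):
    cls = type(instance)
    # use the class's own __new__ so class-specific allocation is respected
    # (a custom __new__ that requires arguments would still break this)
    x = cls.__new__(cls)
    # still assumes all state lives in __dict__; fails for __slots__, builtins, etc.
    x.__dict__ = dict(instance.__dict__)
    return x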
There are other, less serious, problems with the idea. For example:
>>> class Foo(object):
...     def __init__(self):
...         self.__private = 3
...
>>> foo = Foo()
>>> foo._Foo__private  # "private" names are mangled to _ClassName__name
3
>>> bar = clone(foo)
>>> bar.__private
AttributeError: 'Foo' object has no attribute '__private'
>>> bar._Foo__private
3


Is it bad form to override the "dot" operator in Python?

Usually, a period in Python denotes attribute access (class membership):
class A:
    a = 1

>>> A.a
1
Sometimes the language doesn't seem quite flexible enough to completely express an idea from domains outside of computer science though. Consider the following example which (fairly brittly for brevity) uses the same operator to seem like something completely different.
class Vector:
    def __init__(self, data):
        self.data = list(data)

    def dot(self, x):
        return sum([a*b for a, b in zip(self.data, x.data)])

    def __getattr__(self, x):
        if x == 'Vector':
            return lambda p: self.dot(Vector(p))
        return self.dot(globals()[x])
Here we've taken over __getattr__() so that in many scenarios where Python would attempt to find an attribute from our vector it instead computes the mathematical dot product.
>>> v = Vector([1, 2])
>>> v.Vector([3, 4])
11
>>> v.v
5
If such behavior is kept restricted in scope to the domain of interest, is there anything wrong with such a design pattern?
It's a bad idea.
Why? Because the "dot operator", as you call it, isn't really an operator to begin with. That's because the "operand" on the right-hand side is interpreted as a string, not as an expression. This may seem insignificant to you, but it has plenty of problematic consequences:
Python programmers are used to foo.bar meaning "Take the bar attribute of the foo object". Turning the dot into a dot product operator breaks this expectation and will confuse people who read your code. It's unintuitive.
It's ambiguous, because you cannot know if the user is trying to calculate a dot product or access an attribute. Consider:
>>> data = Vector([1, 2])
>>> v.data # dot product or accessing the data attribute?
Keep in mind that methods are attributes, too:
>>> dot = Vector([1, 2])
>>> v.dot # dot product or accessing the dot method?
Because the right-hand operand is interpreted as a string, you have to jump through a whole bunch of hoops to turn that string into something useful - as you've tried to do with globals()[x], which looks up a variable in the global scope. The problem is that - in certain situations - it's completely impossible to access a variable just by its name. No matter what you do, you will never be able to access a variable that no longer exists because it's already been garbage collected:
def func():
    v2 = Vector([1, 2])
    def closure_func():
        return v.v2  # this will never work because v2 is already dead!
    return closure_func

closure_func = func()
result = closure_func()
Because the right-hand operand is a string, you cannot use arbitrary expressions on the right-hand side. You're limited to variables; trying to use anything else on the right-hand side will throw some kind of exception. And to make it worse, it won't even throw the appropriate TypeError like other operators would:
>>> [] + 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
>>> v.1
File "<stdin>", line 1
v.1
^
SyntaxError: invalid syntax
Unlike real operators, the "dot operator" can only be implemented in the left-hand operand. All other operators can be implemented in either one of two corresponding dundermethods, for example __add__ and __radd__ for the + operator. Example:
>>> class Incrementer:
...     def __radd__(self, other):
...         return other + 1
...
>>> 2 + Incrementer()
3
This isn't possible with your dot product:
>>> my_v = MyCustomVector()
>>> v.my_v
AttributeError: 'MyCustomVector' object has no attribute 'data'
Bottom line: Implementing a dot method in your Vector class is the way to go. Since the dot isn't a real operator, trying to turn it into one is bound to backfire.
I would not recommend it. What do you mean by "the language isn't flexible enough to express an idea"? In your example, v.dot(u) is expressive and has the desired effect. This is, by the way, exactly how numpy does it.
If you want to use vectors, there's a special method that hasn't been mentioned so far: __matmul__ (added in Python 3.5). It comes with its corresponding in-place and reflected methods __imatmul__ and __rmatmul__. The operator is @:
a @ b
# corresponds to
a.__matmul__(b)
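For reference, a minimal runnable sketch of a Vector supporting @ (Python 3.5+):
class Vector:
    def __init__(self, data):
        self.data = list(data)

    def __matmul__(self, other):
        # dot product: sum of pairwise products
        return sum(a * b for a, b in zip(self.data, other.data))

v = Vector([1, 2])
w = Vector([3, 4])
print(v @ w)  # 11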

Overhead of creating classes in Python: Exact same code using class twice as slow as native DS?

I created a Stack class as an exercise in Python, using all list functions. For example, Stack.push() is just list.append(), Stack.pop() is list.pop() and Stack.isEmpty() is just list == [].
I was using my Stack class to implement a decimal to binary converter, and what I noticed is that even though the two functions are completely equivalent beyond the wrapping of my Stack class for push(), pop() and isEmpty(), the implementation using the Stack class is twice as slow as the implementation using Python's list.
Is that because there's always an inherent overhead to using classes in Python? And if so, where does the overhead come from technically speaking ("under the hood")? Finally, if the overhead is so significant, isn't it better not to use classes unless you absolutely have to?
def dectobin1(num):
    s = Stack()
    while num > 0:
        s.push(num % 2)
        num = num // 2
    binnum = ''
    while not s.isEmpty():
        binnum = binnum + str(s.pop())
    return binnum

def dectobin2(num):
    l = []
    while num > 0:
        l.append(num % 2)
        num = num // 2
    binnum = ''
    while not l == []:
        binnum = binnum + str(l.pop())
    return binnum
from timeit import Timer

t1 = Timer('dectobin1(255)', 'from __main__ import dectobin1')
print(t1.timeit(number=1000))
0.0211110115051
t2 = Timer('dectobin2(255)', 'from __main__ import dectobin2')
print(t2.timeit(number=1000))
0.0094211101532
First off, a warning: function calls are rarely what limits your speed. This is often an unnecessary micro-optimisation. Only do it if it is what actually limits your performance. Do some good profiling first and check whether there might be a better way to optimise.
Make sure you don't sacrifice legibility for this tiny performance tweak!
Classes in Python are a little bit of a hack.
The way it works is that each object has a __dict__ field (a dict) which contains all attributes the object contains. Also each object has a __class__ object which again contains a __dict__ field (again a dict) which contains all class attributes.
So for example have a look at this:
>>> class X():  # I know this is an old-style class declaration, but it causes far less clutter for this demonstration
...     def y(self):
...         pass
...
>>> x = X()
>>> x.__class__.__dict__
{'y': <function y at 0x6ffffe29938>, '__module__': '__main__', '__doc__': None}
If you define a function dynamically (so not in the class declaration but after the object creation) the function does not go to the x.__class__.__dict__ but instead to x.__dict__.
There are also two dicts that hold the variables accessible from the current function: globals() and locals(), which contain all global and local variables respectively.
So now let's say you have an object x of class X, with functions y and z that were declared in the class declaration, plus a second function z that was defined dynamically on the instance (shadowing the class's z). Let's say object x is defined in global space.
Also, for comparison, there are two functions: flocal(), which was defined in local space, and fglobal(), which was defined in global space.
Now I will show what happens if you call each of these functions:
flocal():
    locals()["flocal"]()

fglobal():
    locals()["fglobal"] -> not found
    globals()["fglobal"]()

x.y():
    locals()["x"] -> not found
    globals()["x"].__dict__["y"] -> not found, because y is in class space
    globals()["x"].__class__.__dict__["y"]()

x.z():
    locals()["x"] -> not found
    globals()["x"].__dict__["z"]() -> found in object dict, ignoring the z() in class space
So as you can see, class-space methods take more time to look up, and object-space methods are slow as well. The fastest option is a local function.
But you can get around that without sacrificing classes. Let's say x.y() is called quite a lot and needs to be optimised:
class X():
    def y(self):
        pass

x = X()

for i in range(100000):
    x.y()  # slow

y = x.y  # move the method lookup outside of the loop
for i in range(100000):
    y()  # faster
Similar things happen with member variables of objects. They are also slower than local variables. The effect also adds up, if you call a function or use a member variable that is in an object that is a member variable of a different object. So for example
a.b.c.d.e.f()
would be a fair bit slower as each dot needs another dictionary lookup.
An official Python performance guide recommends avoiding dots in performance-critical parts of the code:
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
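As a rough, machine-dependent illustration of that advice, here is a timeit comparison between repeating the dotted lookup and hoisting it into a local name:
from timeit import timeit

setup = """
class X(object):
    def y(self):
        pass
x = X()
y = x.y
"""

# method looked up on every call vs. lookup hoisted into a local name
print(timeit('x.y()', setup=setup, number=10**6))
print(timeit('y()', setup=setup, number=10**6))  # usually measurably faster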
There is an inherent overhead using functions (where methods on an instance are just wrappers around functions to pass in self).
A function call requires the current function information (a frame) to be stored on a stack (the Python call stack), and a new frame to be created for the function being called. That all takes time and memory:
>>> from timeit import timeit
>>> def f(): pass
...
>>> timeit(f, number=10**7)
0.8021022859902587
There is also a (smaller) cost of looking up the attribute (methods are attributes too), and creating the method object (each attribute lookup for a method name causes a new method object to be created):
>>> class Foo:
...     bar = None
...     def baz(self): pass
...
...
>>> timeit('instance.bar', 'from __main__ import Foo; instance = Foo()', number=10**7)
0.238075322995428
>>> timeit('instance.baz', 'from __main__ import Foo; instance = Foo()', number=10**7)
0.3402297169959638
So the sum cost of attribute lookup, method object creation and call stack operations add up to the extra time requirements you observed.
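If you want to keep the Stack class but shave off most of the wrapper cost, one option (sketched here with illustrative names) is to bind the list's bound methods directly onto the instance in __init__, so push and pop skip the extra Python-level frame entirely:
class Stack(object):
    def __init__(self):
        self._items = []
        # bound list methods assigned as instance attributes:
        # calling them skips the wrapper function's extra frame
        self.push = self._items.append
        self.pop = self._items.pop

    def isEmpty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(0)
print(s.pop())      # 0
print(s.isEmpty())  # False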

Assign results of function call in one line in python

How can I assign the results of a function call to multiple variables, when the results are accessed by name (not index-able), in Python?
For example (tested in Python 3):
import random

# foo, as defined somewhere else where we can't or don't want to change it
def foo():
    t = random.randint(1, 100)
    # return a dummy class instead of just "return t, t+1",
    # because otherwise we could subscript or just write A, B = foo()
    class Cat(object):
        x = t
        y = t + 1
    return Cat()

# METHOD 1
# clearly wrong; B should be 1 more than A, but here they point to fields of different objects
A, B = foo().x, foo().y
print(A, B)

# METHOD 2
# correct, but requires two lines and an implicit variable
t = foo()
A, B = t.x, t.y
del t  # don't really want t lying around
print(A, B)

# METHOD 3
# correct and one line, but an obfuscated mess
A, B = [(t.x, t.y) for t in (foo(),)][0]
print(A, B)
print(t)  # this will raise an exception, but unless you know your Python cold it might not be obvious before running

# METHOD 4
# conforms to the suggestions in the links below without modifying the initial function foo or class Cat,
# but while all subsequent calls are pretty, we have to use an otherwise meaningless shell function
def get_foo():
    t = foo()
    return t.x, t.y

A, B = get_foo()
What we don't want to do
If the results were indexable (if Cat extended tuple/list, if we had used a namedtuple, etc.), we could simply write A, B = foo(), as indicated in the comment above the Cat class. That's what's recommended here, for example.
Let's assume we have a good reason not to allow that. Maybe we like the clarity of assigning from the attribute names (if they're more meaningful than x and y), or maybe the object is not primarily a container. Maybe the fields are properties, so access actually involves a method call. We don't have to assume any of those to answer this question though; the Cat class can be taken at face value.
That question already deals with how to design functions/classes the best way possible; if the function's expected return values are already well defined and don't involve tuple-like access, what is the best way to receive multiple values when it returns?
I would strongly recommend either using multiple statements, or just keeping the result object without unpacking its attributes. That said, you can use operator.attrgetter for this:
from operator import attrgetter
a, b, c = attrgetter('a', 'b', 'c')(foo())
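Applied to the question's foo/Cat example, that one-liner reads as follows; both values now come from the same Cat instance, so B is A + 1 as intended:
from operator import attrgetter

A, B = attrgetter('x', 'y')(foo())
print(A, B)  # B == A + 1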

How to create a new unknown or dynamic/expando object in Python

In Python, how can we create a new object without having a predefined class, and later dynamically add properties to it?
example:
dynamic_object = Dynamic()
dynamic_object.dynamic_property_a = "abc"
dynamic_object.dynamic_property_b = "abcdefg"
What is the best way to do it?
EDIT: Because many people advised in comments that I might not need this:
The thing is that I have a function that serializes an object's properties. For that reason, I don't want to create an object of the expected class due to some constructor restrictions, but instead create a similar one, say like a mock, add any "custom" properties I need, then feed it back to the function.
Just define your own class to do it:
class Expando(object):
    pass
ex = Expando()
ex.foo = 17
ex.bar = "Hello"
If you take the metaclassing approach from Martijn's answer, Ned's answer can be rewritten more briefly (it's obviously less readable, but does the same thing):
obj = type('Expando', (object,), {})()
obj.foo = 71
obj.bar = 'World'
Or, doing the same thing by passing the attributes in the dict argument:
obj = type('Expando', (object,), {'foo': 71, 'bar': 'World'})()
For Python 3, passing object to bases argument is not necessary (see type documentation).
But for simple cases instantiation doesn't have any benefit, so it's fine to just do:
ns = type('Expando', (object,), {'foo': 71, 'bar': 'World'})
At the same time, personally I prefer a plain class (i.e. without instantiation) for ad-hoc test configuration cases as simplest and readable:
class ns:
    foo = 71
    bar = 'World'
Update
In Python 3.3+ there is exactly what OP asks for, types.SimpleNamespace. It's just:
A simple object subclass that provides attribute access to its namespace, as well as a meaningful repr.
Unlike object, with SimpleNamespace you can add and remove attributes. If a SimpleNamespace object is initialized with keyword arguments, those are directly added to the underlying namespace.
import types
obj = types.SimpleNamespace()
obj.a = 123
print(obj.a) # 123
print(repr(obj)) # namespace(a=123)
However, in stdlib of both Python 2 and Python 3 there's argparse.Namespace, which has the same purpose:
Simple object for storing attributes.
Implements equality by attribute names and values, and provides a simple string representation.
import argparse
obj = argparse.Namespace()
obj.a = 123
print(obj.a) # 123
print(repr(obj)) # Namespace(a=123)
Note that both can be initialised with keyword arguments:
types.SimpleNamespace(a='foo', b=123)
argparse.Namespace(a='foo', b=123)
Using an object just to hold values isn't the most Pythonic style of programming. It's common in programming languages that don't have good associative containers, but in Python, you can use a dictionary:
my_dict = {} # empty dict instance
my_dict["foo"] = "bar"
my_dict["num"] = 42
You can also use a "dictionary literal" to define the dictionary's contents all at once:
my_dict = {"foo":"bar", "num":42}
Or, if your keys are all legal identifiers (and they will be, if you were planning on them being attribute names), you can use the dict constructor with keyword arguments as key-value pairs:
my_dict = dict(foo="bar", num=42) # note, no quotation marks needed around keys
Filling out a dictionary is in fact what Python is doing behind the scenes when you do use an object, such as in Ned Batchelder's answer. The attributes of his ex object get stored in a dictionary, ex.__dict__, which should end up being equal to an equivalent dict created directly.
Unless attribute syntax (e.g. ex.foo) is absolutely necessary, you may as well skip the object entirely and use a dictionary directly.
Use the collections.namedtuple() class factory to create a custom class for your return value:
from collections import namedtuple

retval = namedtuple('Expando', ('dynamic_property_a', 'dynamic_property_b'))('abc', 'abcdefg')
The returned value can be used both as a tuple and by attribute access:
print retval[0] # prints 'abc'
print retval.dynamic_property_b # prints 'abcdefg'
One way that I found is by creating a lambda. It can have side effects and comes with some properties that are not wanted. Just posting it for interest:
dynamic_object = lambda: expando  # the body never runs unless called, so the undefined name is harmless
dynamic_object.dynamic_property_a = "abc"
dynamic_object.dynamic_property_b = "abcdefg"
I define a dictionary first because it's easy to define. Then I use namedtuple to convert it to an object:
from collections import namedtuple

def dict_to_obj(d):  # parameter renamed so it doesn't shadow the built-in dict
    return namedtuple("ObjectName", d.keys())(*d.values())

my_dict = {
    'name': 'The mighty object',
    'description': 'Yep! Thats me',
    'prop3': 1234
}
my_obj = dict_to_obj(my_dict)
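The result supports both attribute access and, since it is a namedtuple, indexing; note the field order follows the dict's insertion order, which is only guaranteed in Python 3.7+:
print(my_obj.name)  # The mighty object
print(my_obj[2])    # 1234 (third field, 'prop3')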
Ned Batchelder's answer is the best. I just wanted to record a slightly different answer here, which avoids the use of the class keyword (in case that's useful for instructive reasons, demonstration of closure, etc.)
Instead of the class statement, a function that returns an inner function works, because function objects accept arbitrary attributes:
def Expando():
    def inst():
        pass  # the body is irrelevant; we only use the function object as an attribute holder
    return inst
ex = Expando()
ex.foo = 17
ex.bar = "Hello"

Most pythonic way of ensuring a list of objects contains only unique items

I have a list of objects (Foo). A Foo object has several attributes. An instance of a Foo object is equivalent (equal) to another instance of a Foo object iff (if and only if) all the attributes are equal.
I have the following code:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            print 'DEBUG: self:', self.__dict__
            print 'DEBUG: other:', other.__dict__
            return self.__dict__ == other.__dict__
        else:
            print 'DEBUG: ATTEMPT TO COMPARE DIFFERENT CLASSES:', self.__class__, 'compared to:', other.__class__
            return False
import copy
f1 = Foo(1)
f2 = Foo(2)
f3 = Foo(3)
f4 = Foo(4)
f5 = copy.deepcopy(f3) # overkill here (I know), but needed for my real code
f_list = [f1,f2,f3,f4,f5]
# Surely, there must be a better way? (this doesn't work BTW!)
new_foo_list = list(set(f_list))
I often used this little (anti?) 'pattern' above (converting to set and back), when dealing with simple types (int, float, string - and surprisingly datetime.datetime types), but it has come a cropper with the more involved data type - like Foo above.
So, how could I change the list f_list above into a list of unique items, without having to loop through each item and check whether it already exists in some temporary cache, etc.?
What is the most pythonic way to do this?
First, I want to emphasize that using set is certainly not an anti-pattern. sets eliminate duplicates in O(n) time, which is the best you can do, and way better than the naive O(n^2) solution of comparing every item to every other item. It's even better than sorting -- and indeed, it seems your data structure might not even have a natural order, in which case sorting doesn't make a lot of sense.
The problem with using a set in this case is that you have to define a custom __hash__ method. Others have said this. But whether or not you can do so easily is an open question -- it depends on details about your actual class that you haven't told us. For example, if any attributes of a Foo object above are not hashable, then creating a custom hash function is going to be difficult, because you'll have to not only write a custom hash for Foo objects, you'll also have to write custom hashes for every other type of object!
So you need to tell us more about what kinds of attributes your class has if you want a conclusive answer. But I can offer some speculation.
Assuming that a hash function could be written for Foo objects, but also assuming that Foo objects are mutable and so really shouldn't have a __hash__ method, as Niklas B. points out, here is one workable approach. Create a function freeze that, given a mutable instance of Foo, returns an immutable collection of the data in Foo. So for example, say Foo has a dict and a list in it; freeze returns a tuple containing a tuple of tuples (representing the dict) and another tuple (representing the list). The function freeze should have the following property:
freeze(a) == freeze(b)
If and only if
a == b
Now pass your list through the following code:
dupe_free = dict((freeze(x), x) for x in dupe_list).values()
Now you have a dupe free list in O(n) time. (Indeed, after adding this suggestion, I saw that fraxel suggested something similar; but I think using a custom function -- or even a method -- (x.freeze(), x) -- is the better way to go, rather than relying on __dict__ as he does, which can be unreliable. The same goes for your custom __eq__ method, IMO -- __dict__ is not always a safe shortcut for various reasons I can't get into here.)
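As a minimal sketch, here is one way such a freeze function might look, assuming attribute values are limited to dicts, lists/tuples, sets, and hashable leaves; any other type would need its own case, and a real Foo.freeze() would pick out exactly the attributes that matter for equality:
def freeze(obj):
    """Recursively turn mutable containers into hashable equivalents."""
    if isinstance(obj, dict):
        # dict keys are unique, so sorting the pairs never compares the values
        return tuple(sorted((k, freeze(v)) for k, v in obj.items()))
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(v) for v in obj)
    if isinstance(obj, set):
        return frozenset(freeze(v) for v in obj)
    return obj  # assumed to be a hashable leaf (int, str, ...)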
Another approach would be to use only immutable objects in the first place! For example, you could use namedtuples. Here's an example stolen from the python docs:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
Have you tried using a set (or frozenset)? It's explicitly for holding a unique set of items.
You'll need to create an appropriate __hash__ method, though. set (and frozenset) use the __hash__ method to hash objects; __eq__ is only used on a collision, AFAIK. Accordingly, you'll want to use a hash like hash(frozenset(self.__dict__.items())).
According to the documentation, you need to define __hash__() and __eq__() for your custom class to work correctly with a set or frozenset, as both are implemented using hash tables in CPython.
If you implement __hash__, keep in mind that if a == b, then hash(a) must equal hash(b). Rather than comparing the whole __dict__s, I suggest the following more straightforward implementation for your simple class:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid

    def __eq__(self, other):
        return isinstance(other, self.__class__) and other.myid == self.myid

    def __hash__(self):
        return hash(self.myid)
If your object contains mutable attributes, you simply shouldn't put it inside a set or use it as a dictionary key.
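With __eq__ and __hash__ defined this way, the set round-trip from the question behaves as hoped; a quick check:
f1, f2, f3 = Foo(1), Foo(2), Foo(3)
f5 = Foo(3)  # equal to f3, so it is a duplicate
unique = list(set([f1, f2, f3, f5]))
print(len(unique))  # 3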
Here is an alternative method, just make a dictionary keyed by __dict__.items() for the instances:
f_list = [f1,f2,f3,f4,f5]
f_dict = dict([(tuple(i.__dict__.items()), i) for i in f_list])
print f_dict
print f_dict.values()
#output:
{(('myid', 1),): <__main__.Foo object at 0xb75e190c>,
(('myid', 2),): <__main__.Foo object at 0xb75e184c>,
(('myid', 3),): <__main__.Foo object at 0xb75e1f6c>,
(('myid', 4),): <__main__.Foo object at 0xb75e1cec>}
[<__main__.Foo object at 0xb75e190c>,
<__main__.Foo object at 0xb75e184c>,
<__main__.Foo object at 0xb75e1f6c>,
<__main__.Foo object at 0xb75e1cec>]
This way you just let the dictionary take care of the uniqueness based on attributes, and can easily retrieve the objects by getting the values.
If you are allowed to, you can use a set: http://docs.python.org/library/sets.html
lst = [1, 2, 3, 3, 45, 4, 45, 6]  # renamed from list so it doesn't shadow the built-in
print set(lst)
set([1, 2, 3, 4, 6, 45])
x = set(lst)
print x
set([1, 2, 3, 4, 6, 45])
