Python: item.method() and function(item)

What is the logic behind making some operations methods that are called on the item, while others are functions that take the item as an argument?
For example:
L=[1,4,3]
print len(L) #function(item)
L.sort() #item.method()
I thought maybe the functions that modify the item need to be prefixed while the ones that return information about the item use it as an argument, but I'm not too sure.
Edit:
What I'm trying to ask is why does python not have L.len()? What is the difference between the nature of the two kinds of functions? Or was it randomly chosen that some operations will be methods while some will be functions?

One of the principles behind Python is that there should be only one obvious way to do it. In particular, to get the length of a sequence (list / tuple / xrange / string ...), you always use len, regardless of the sequence type.
However, in-place sorting is not supported on all of those sequence types, which makes it more suitable as a method.
a = [0,1,2]
b = (0,1,2)
c = xrange(3)
d = "abc"
print len(a), len(b), len(c), len(d) # Ok
a.sort() # Ok
b.sort() # AttributeError: 'tuple' object has no attribute 'sort'
c.sort() # AttributeError: 'xrange' object has no attribute 'sort'
d.sort() # AttributeError: 'str' object has no attribute 'sort'

Something that may help you understand a bit better: http://www.tutorialspoint.com/python/python_classes_objects.htm
What you describe as item.method() is actually a method call; the method is defined by the class that item belongs to. It helps to form a solid understanding of functions, classes, objects and methods in Python.
Conceptually, when you call L.sort(), the sort() method of the list type receives an argument, conventionally called self, that represents the instance it was called on, in this case L, and it applies the sorting logic to L itself, in place. By comparison, the standalone sorted function takes an iterable (a list, for example) as its required argument and returns a new sorted list.
Code example:
my_list = [2, 1, 3]
# .sort() is a list method that applies the sorting logic to a
# specific instance of list, in this case, my_list
my_list.sort()
# sorted is a built-in function that is more generic; it can be
# used on any Python iterable, including lists
sorted(my_list)

There's a bigger difference between methods and functions than just their syntax.
def foo():
    print "function!"

class ExampleClass(object):
    def foo(self):
        print "method!"
In this example, I defined a function foo and a class ExampleClass with one method, foo.
Let's try to use them:
>>> foo()
function!
>>> e = ExampleClass()
>>> e.foo()
method!
>>> l = [3,4,5]
>>> l.foo()
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    l.foo()
AttributeError: 'list' object has no attribute 'foo'
Even though both have the same name, Python knows that if you write foo(), you're calling a function, so it checks whether a function with that name is defined.
And if you write a.foo(), it knows you're calling a method, so it checks whether a method foo is defined for objects of a's type, and if there is one, it calls it. In the last example, we try that with a list, and it gives us an error because lists don't have a foo method defined.
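Incidentally, len itself is wired to a method under the hood: len(x) calls the special method x.__len__(), which list defines. A short sketch with a made-up class to illustrate:
class Box(object):
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)

b = Box([1, 4, 3])
print(len(b))        # 3: len() delegates to b.__len__()
print(b.__len__())   # 3: the same call, made directly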

Related

Why does the Method Resolution Order affect the behavior of my code?

I was looking into how the order in which you declare classes to inherit from affects Method Resolution Order (Detailed Here By Raymond Hettinger). I personally was using this to elegantly create an Ordered Counter via this code:
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

counts = OrderedCounter([1, 2, 3, 1])
print(*counts.items())
# (1, 2) (2, 1) (3, 1)
I was trying to understand why the following didn't work similarly:
class OrderedCounter(OrderedDict, Counter):
    pass

counts = OrderedCounter([1, 2, 3, 1])
print(*counts.items())
# TypeError: 'int' object is not iterable
I understand that, on a fundamental level, this is because in the second example the OrderedCounter object uses the OrderedDict.__init__() function, which according to the documentation only accepts "[items]", whereas in the first example the Counter.__init__() function is used, which according to the documentation accepts "[iterable-or-mapping]" and can therefore take the list as input.
I wanted to further understand this interaction specifically though so I went to look at the actual source. When I looked at the OrderedDict.__init__() function I noticed that after some error handling it made a call to self.update(*args, **kwds). However, the code simply has the line update = MutableMapping.update which I can't find much documentation on.
I guess I would just like a more concrete answer as to why the second code block doesn't work.
Note: For context, I have a decent amount of programming experience but I'm new to python and OOP in Python
TLDR: How/Why does the Method Resolution Order interfere with the second code block?
In your second example, class OrderedCounter(OrderedDict, Counter), the object looks in OrderedDict first, which uses the update method from MutableMapping.
MutableMapping is an abstract base class in collections.abc. If you look at its update method, you can see that when the other argument is not a Mapping, it tries to iterate over other, unpacking a key and a value on each iteration:
for key, value in other:
    self[key] = value
If other is a sequence of tuples it would work.
>>> other = ((1,2),(3,4))
>>> for key, value in other:
    print(key, value)

1 2
3 4
But if other is a sequence of single items it will throw the error when it tries to unpack a single value into two names/variables.
>>> other = (1,2,3,4)
>>> for key, value in other:
    print(key, value)

Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    for key, value in other:
TypeError: cannot unpack non-iterable int object
Whereas collections.Counter's update method calls a different function if other is not a Mapping:
else:
    _count_elements(self, iterable)
_count_elements creates a key the first time it sees an element (starting from a default count of zero) and adds one to the count for every occurrence.
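As a quick illustration of that code path, using only the public Counter API rather than _count_elements directly:
from collections import Counter

c = Counter()
c.update([1, 2, 3, 1])   # not a Mapping, so each element's count is incremented
print(c)                 # Counter({1: 2, 2: 1, 3: 1})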
As you probably discovered, if a class inherits from two classes it looks in the first class to find an attribute; if it isn't there, it looks in the second class.
>>> class A:
    def __init__(self):
        pass
    def f(self):
        print('class A')

>>> class B:
    def __init__(self):
        pass
    def f(self):
        print('class B')

>>> class C(A, B):
    pass

>>> c = C()
>>> c.f()
class A
>>> class D(B, A):
    pass

>>> d = D()
>>> d.f()
class B
In the MRO, children precede their parents, and the order of appearance in __bases__ is respected.
In the first example, Counter is a subclass of dict. When OrderedDict is provided along with Counter, the MRO effectively slots OrderedDict in as Counter's dict, so Counter's logic runs on top of ordered storage and the code works seamlessly.
In the second example, OrderedDict is again a subclass of dict. With Counter listed after OrderedDict, it is OrderedDict's __init__ and update that run, so the counting logic never gets a chance, which is counter-intuitive (pun intended). Hence the error!
I hope this layman explanation helps. Just think about it for a moment.
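You can see the difference directly by printing the MRO of the two variants (the class names below are just labels for the two orderings):
from collections import Counter, OrderedDict

class OrderedCounter1(Counter, OrderedDict):
    pass

class OrderedCounter2(OrderedDict, Counter):
    pass

print([cls.__name__ for cls in OrderedCounter1.__mro__])
# ['OrderedCounter1', 'Counter', 'OrderedDict', 'dict', 'object']
print([cls.__name__ for cls in OrderedCounter2.__mro__])
# ['OrderedCounter2', 'OrderedDict', 'Counter', 'dict', 'object']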

How to call method on multiple objects with similar names and same type?

How might I go about simplifying this code, preferably putting it into a loop or one line?
object1.callMethod()
object2.callMethod()
object3.callMethod()
difObject1.callDifMethod()
difObject2.callDifMethod()
difObject3.callDifMethod()
I know that I can put the objects in a list and iterate through that; however, that would still require two separate loops and two separate lists. Is there any way I can make a single list [1, 2, 3] and use it to distinguish between the three different objects of each type, since the numbers appear in the object names as well?
getattr(object, method_name)()
If all of the method and object names are generally semantic, you can use getattr to reference the method based on a string variable, and then call it with ().
objects = [object1, object2, object3]
for object in objects:
    getattr(object, method_name)()
If you want to run the objects/method in parallel, use zip.
objects = [object1, object2, object3]
methods = ['method1name', 'method2name', 'method3name']
for object, method in zip(objects, methods):
    getattr(object, method)()
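As a self-contained illustration (the classes and method names below are invented stand-ins):
class Greeter:
    def say_hello(self):
        print("hello")

class Parter:
    def say_bye(self):
        print("bye")

objects = [Greeter(), Greeter(), Parter()]
methods = ['say_hello', 'say_hello', 'say_bye']
for obj, name in zip(objects, methods):
    getattr(obj, name)()   # prints hello, hello, bye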
You could use a dictionary approach:
methods = {cls1.method1: [cls1_obj1, cls1_obj2, cls1_obj3],
           cls1.method2: [cls1_obj4, cls1_obj5, cls1_obj6],
           cls2.method1: [cls2_obj1, cls2_obj2]}

for method, objs in methods.items():
    for obj in objs:
        method(obj)
This assumes you are using instance methods, though. For a static or class method you'll need to adjust what is passed to the method.
I'm not sure there's anything elegant that doesn't involve predefining some combination of lists or dicts and looping over it: since you have to be explicit about which object runs which method, that mapping has to be defined somewhere either way.
Ideally, if you have multiple similar objects of the same class, you might opt to instantiate them in a list from the get go:
# Instead of this
object1 = Foo(1)
object2 = Foo(2)
object3 = Foo(3)
...
# do this
foos = [Foo(i) for i in range(3)]
# Or this
bars = {name: Bar(name) for name in list_of_names}
Then it becomes trivial to manipulate them in group:
for foo in foos:
    foo.foo_method()

for bar in bars.values():
    bar.bar_method()
While still easy to reference the object on its own:
foos[index].foo_method()
bars[key].bar_method()
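To make that concrete, here is a minimal runnable sketch; Foo and foo_method are placeholder names:
class Foo:
    def __init__(self, n):
        self.n = n
    def foo_method(self):
        print("foo_method on Foo(%d)" % self.n)

foos = [Foo(i) for i in range(1, 4)]

for foo in foos:         # call the method on every instance
    foo.foo_method()

foos[0].foo_method()     # still easy to address a single instance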
You could use the eval function:
>>> for i in range(1, 4):
    eval("object%d" % i).callMethod()
    eval("difObject%d" % i).callDifMethod()

Difference between a common method vs. an operator on a Python data type such as list [duplicate]

My question:
It seems that __getattr__ is not called for indexing operations, i.e. I can't use __getattr__ on a class A to provide A[...]. Is there a reason for this? Or a way to get around it so that __getattr__ can provide that functionality without having to explicitly define __getitem__, __setitem__, etc. on A?
Minimal Example:
Let's say I define two nearly identical classes, Explicit and Implicit. Each creates a little list self._arr on initiation, and each defines a __getattr__ that just passes all attribute requests to self._arr. The only difference is that Explicit also defines __getitem__ (by just passing it on to self._arr).
# Passes all attribute requests on to a list it contains
class Explicit():
    def __init__(self):
        self._arr = [1, 2, 3, 4]
    def __getattr__(self, attr):
        print('called __getattr_')
        return getattr(self._arr, attr)
    def __getitem__(self, item):
        return self._arr[item]
# Same as above but __getitem__ not defined
class Implicit():
    def __init__(self):
        self._arr = [1, 2, 3, 4]
    def __getattr__(self, attr):
        print('called __getattr_')
        return getattr(self._arr, attr)
This works as expected:
>>> e=Explicit()
>>> print(e.copy())
called __getattr_
[1, 2, 3, 4]
>>> print(hasattr(e,'__getitem__'))
True
>>> print(e[0])
1
But this doesn't:
>>> i=Implicit()
>>> print(i.copy())
called __getattr_
[1, 2, 3, 4]
>>> print(hasattr(i,'__getitem__'))
called __getattr_
True
>>> print(i.__getitem__(0))
called __getattr_
1
>>> print(i[0])
TypeError: 'Implicit' object does not support indexing
Python bypasses __getattr__, __getattribute__, and the instance dict when looking up "special" methods for implementing language mechanics. (For the most part, special methods are ones with two underscores on each side of the name.) If you were expecting i[0] to invoke i.__getitem__(0), which would in turn invoke i.__getattr__('__getitem__')(0), that's why that didn't happen.
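A short sketch of the consequence, reusing the Implicit class from the question:
i = Implicit()

# i[0] is resolved roughly as type(i).__getitem__(i, 0): the lookup happens
# on the class itself and skips __getattr__, so indexing fails.
try:
    print(i[0])
except TypeError as exc:
    print(exc)               # e.g. 'Implicit' object is not subscriptable

# Plain attribute access does go through __getattr__, so this works:
print(i.__getitem__(0))      # prints 'called __getattr_' and then 1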

Most pythonic way of ensuring a list of objects contains only unique items

I have a list of objects (Foo). A Foo object has several attributes. An instance of a Foo object is equivalent (equal) to another instance of a Foo object iff (if and only if) all the attributes are equal.
I have the following code:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            print 'DEBUG: self:', self.__dict__
            print 'DEBUG: other:', other.__dict__
            return self.__dict__ == other.__dict__
        else:
            print 'DEBUG: ATTEMPT TO COMPARE DIFFERENT CLASSES:', self.__class__, 'compared to:', other.__class__
            return False
import copy
f1 = Foo(1)
f2 = Foo(2)
f3 = Foo(3)
f4 = Foo(4)
f5 = copy.deepcopy(f3) # overkill here (I know), but needed for my real code
f_list = [f1,f2,f3,f4,f5]
# Surely, there must be a better way? (this doesn't work BTW!)
new_foo_list = list(set(f_list))
I often used this little (anti?) 'pattern' above (converting to set and back), when dealing with simple types (int, float, string - and surprisingly datetime.datetime types), but it has come a cropper with the more involved data type - like Foo above.
So, how could I change the list f_list above into a list of unique items, without having to loop through each item and check whether it already exists in some temporary cache, etc.?
What is the most pythonic way to do this?
First, I want to emphasize that using set is certainly not an anti-pattern. sets eliminate duplicates in O(n) time (on average), which is the best you can do, and far better than the naive O(n^2) solution of comparing every item to every other item. It's even better than sorting; indeed, it seems your data structure might not even have a natural order, in which case sorting doesn't make much sense.
The problem with using a set in this case is that you have to define a custom __hash__ method. Others have said this. But whether or not you can do so easily is an open question -- it depends on details about your actual class that you haven't told us. For example, if any attributes of a Foo object above are not hashable, then creating a custom hash function is going to be difficult, because you'll have to not only write a custom hash for Foo objects, you'll also have to write custom hashes for every other type of object!
So you need to tell us more about what kinds of attributes your class has if you want a conclusive answer. But I can offer some speculation.
Assuming that a hash function could be written for Foo objects, but also assuming that Foo objects are mutable and so really shouldn't have a __hash__ method (as Niklas B. points out), here is one workable approach. Create a function freeze that, given a mutable instance of Foo, returns an immutable collection of the data in Foo. For example, say Foo holds a dict and a list; freeze returns a tuple containing a tuple of tuples (representing the dict) and another tuple (representing the list). The function freeze should have the following property:
freeze(a) == freeze(b)
If and only if
a == b
Now pass your list through the following code:
dupe_free = dict((freeze(x), x) for x in dupe_list).values()
Now you have a dupe-free list in O(n) time. (Indeed, after adding this suggestion, I saw that fraxel suggested something similar; but I think using a custom function, or even a method (x.freeze(), x), is the better way to go, rather than relying on __dict__ as he does, which can be unreliable. The same goes for your custom __eq__ method, IMO: __dict__ is not always a safe shortcut, for various reasons I can't get into here.)
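For concreteness, here is a minimal sketch of freeze, assuming (purely for illustration) that Foo stores its dict in an attribute called mapping and its list in an attribute called items_list, and that dupe_list is the list from the snippet above:
def freeze(foo):
    # both pieces become immutable, hashable tuples
    return (tuple(sorted(foo.mapping.items())), tuple(foo.items_list))

dupe_free = list(dict((freeze(x), x) for x in dupe_list).values())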
Another approach would be to use only immutable objects in the first place! For example, you could use namedtuples. Here's an example stolen from the python docs:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
Have you tried using a set (or frozenset)? It's explicitly for holding a unique set of items.
You'll need to create an appropriate __hash__ method, though. set (and frozenset) use the __hash__ method to hash objects; __eq__ is only used on a collision, AFAIK. Accordingly, you'll want to use a hash like hash(frozenset(self.__dict__.items())).
According to the documentation, you need to define __hash__() and __eq__() for your custom class to work correctly with a set or frozenset, as both are implemented using hash tables in CPython.
If you implement __hash__, keep in mind that if a == b, then hash(a) must equal hash(b). Rather than comparing the whole __dict__s, I suggest the following more straightforward implementation for your simple class:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid
    def __eq__(self, other):
        return isinstance(other, self.__class__) and other.myid == self.myid
    def __hash__(self):
        return hash(self.myid)
If your object contains mutable attributes, you simply shouldn't put it inside a set or use it as a dictionary key.
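With __eq__ and __hash__ defined as above, the set round-trip from the question works as intended; for example:
import copy

f3 = Foo(3)
f_list = [Foo(1), Foo(2), f3, Foo(4), copy.deepcopy(f3)]
print(len(set(f_list)))   # 4: the deep copy of f3 is treated as a duplicate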
Here is an alternative method: just make a dictionary keyed by __dict__.items() for the instances:
f_list = [f1,f2,f3,f4,f5]
f_dict = dict([(tuple(i.__dict__.items()), i) for i in f_list])
print f_dict
print f_dict.values()
#output:
{(('myid', 1),): <__main__.Foo object at 0xb75e190c>,
(('myid', 2),): <__main__.Foo object at 0xb75e184c>,
(('myid', 3),): <__main__.Foo object at 0xb75e1f6c>,
(('myid', 4),): <__main__.Foo object at 0xb75e1cec>}
[<__main__.Foo object at 0xb75e190c>,
<__main__.Foo object at 0xb75e184c>,
<__main__.Foo object at 0xb75e1f6c>,
<__main__.Foo object at 0xb75e1cec>]
This way you just let the dictionary take care of the uniqueness based on attributes, and can easily retrieve the objects by getting the values.
If you are allowed to, you can use a set: http://docs.python.org/library/sets.html
list = [1,2,3,3,45,4,45,6]
print set(list)
set([1, 2, 3, 4, 6, 45])
x = set(list)
print x
set([1, 2, 3, 4, 6, 45])

Subclassing and overriding a generator function in python

I need to override a method of a parent class, which is a generator, and am wondering the correct way to do this. Is there anything wrong with the following, or a more efficient way?
class A:
    def gen(self):
        yield 1
        yield 2

class B(A):
    def gen(self):
        yield 3
        for n in super().gen():
            yield n
For Python 3.3 and up, the best, most general way to do this is:
class A:
    def gen(self):
        yield 1
        yield 2

class B(A):
    def gen(self):
        yield 3
        yield from super().gen()
This uses the yield from syntax (new in 3.3) for delegating to a subgenerator. It's better than the other solutions because it actually hands control to the generator it delegates to: if that generator supports .send and .throw to pass values and exceptions in, delegation means the inner generator actually receives them. Explicitly looping and yielding one value at a time would receive those values in the gen wrapper rather than in the generator actually producing the values, and the same problem applies to other solutions such as itertools.chain.
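A small sketch of that difference (the class names here are invented for illustration): a value passed in with .send() reaches the inner generator only when yield from is used.
class Inner:
    def gen(self):
        received = yield 1
        print("inner received:", received)
        yield 2

class Delegating(Inner):
    def gen(self):
        yield 0
        yield from super().gen()        # .send() is forwarded to Inner.gen

class Looping(Inner):
    def gen(self):
        yield 0
        for value in super().gen():     # .send() stops at this loop
            yield value

for cls in (Delegating, Looping):
    g = cls().gen()
    next(g)            # 0
    next(g)            # 1, produced by Inner.gen
    g.send("hello")    # Delegating prints "inner received: hello",
                       # Looping prints "inner received: None"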
What you have looks fine, but is not the only approach. What's important about a generator function is that it returns an iterable object. Your subclass could thus instead directly create an iterable, for example:
import itertools

class B(A):
    def gen(self):
        return itertools.chain([3], super().gen())
The better approach is going to depend on exactly what you're doing; the above looks needlessly complex, but I wouldn't want to generalize from such a simple example.
To call a method of the parent class from a subclass, you use the built-in super().
New Source Code:
class B(A):
    def gen(self):
        yield 3
        for n in super().gen():
            yield n
This:
b = B()
for i in b.gen():
    print(i)
produces the output:
3
1
2
In the first iteration your generator yields 3; for the following iterations it simply continues as the superclass normally would.
This question provides a really good and lengthy explanation of generators, iterators and the yield keyword:
What does the "yield" keyword do in Python?
Your code is correct.
Or rather, I don't see a problem in it and it apparently runs correctly.
The only thing I can think of is the following.
Post-scriptum: for new-style classes, see the other answers that use super(). super() only works for new-style classes, so this answer can be useful, at least, for classic-style classes.
When the interpreter reaches the instruction for n in A.gen(self):, it must find the function A.gen.
The notation A.gen doesn't mean that the object A.gen is INSIDE the object A.
The object A.gen is SOMEWHERE in memory, and the interpreter knows where to find it by obtaining the needed information (an address) from A.__dict__['gen'], where A.__dict__ is the namespace of A.
So, finding the function object A.gen in memory requires a lookup in A.__dict__.
But to perform this lookup, the interpreter must first find the object A itself.
So, when it reaches the instruction for n in A.gen(self):, it first checks whether the identifier A is among the local identifiers, that is to say it searches for the string 'A' in the local namespace of the function.
Since it is not there, the interpreter goes outside the function and searches for this identifier at the module level, in the global namespace (which is globals()).
At this point, the global namespace may have hundreds or thousands of attribute names among which to perform the lookup for 'A'.
However, A has very few attributes: its __dict__'s keys are only '__module__', 'gen' and '__doc__' (to see that, run print A.__dict__).
So it would be a pity for the quick search for the string 'gen' in A.__dict__ to be preceded by a search among hundreds of items in the dictionary-namespace globals() at module level.
That's why I suggest another way to let the interpreter find the function A.gen:
class A:
    def gen(self):
        yield 1
        yield 2

class BB(A):
    def gen(self):
        yield 3
        for n in self.__class__.__bases__[0].gen(self):
            yield n

bb = BB()
print list(bb.gen()) # prints [3, 1, 2]
self.__class__ is the class from which the instance was created, that is to say BB.
self.__class__.__bases__ is a tuple containing the base classes of BB.
Presently there is only one element in this tuple, so self.__class__.__bases__[0] is A.
__class__ and __bases__ are names of special attributes that aren't listed in __dict__.
In fact __class__, __bases__ and __dict__ are special attributes of a similar nature; they are Python-provided attributes, see:
http://www.cafepy.com/article/python_attributes_and_methods/python_attributes_and_methods.html
Well, what I mean, in the end, is that there are few elements in self.__class__ and in self.__class__.__bases__, so it is reasonable to think that the successive lookups in these objects to finally reach A.gen will be faster than a lookup for 'gen' in the global namespace, in case that namespace contains hundreds of elements.
Maybe that's trying to do too much optimization, maybe not.
This answer is mainly meant to give information on the underlying mechanisms involved, which I personally find interesting to know.
Edit:
You can obtain the same result as your code with a more concise instruction:
class A:
    def gen(self):
        yield 1
        yield 2

class Bu(A):
    def gen(self):
        yield 3
        for n in A.gen(self):
            yield n

b = Bu()
print 'list(b.gen()) ==', list(b.gen())

from itertools import chain
w = chain(iter((3,)), xrange(1, 3))
print 'list(w) ==', list(w)
produces
list(b.gen()) == [3, 1, 2]
list(w) == [3, 1, 2]
If A.gen() may also contain a return statement, then you also need to make sure your override returns a value. This is easiest done as follows:
class A:
    def gen(self):
        yield 1
        return 2

class B(A):
    def gen(self):
        yield 3
        ret = yield from super().gen()
        return ret
This gives:
>>> i = A().gen()
>>> next(i)
1
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: 2
>>> i = B().gen()
>>> next(i)
3
>>> next(i)
1
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: 2
Without an explicit return statement, the last line would be StopIteration instead of StopIteration: 2.
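The return value travels in the StopIteration exception's value attribute, which you can observe directly; a quick sketch using the B class above:
g = B().gen()
yielded = []
try:
    while True:
        yielded.append(next(g))
except StopIteration as stop:
    print(yielded, "returned:", stop.value)   # [3, 1] returned: 2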
