numpy.r_ is not a function. What is it? - python

According to the numpy/scipy doc on numpy.r_ here, it is "not a function, so takes no parameters".
If it is not a function, what is the proper term for "functions" such as numpy.r_?

I would argue that for all practical purposes r_ is a function, but one implemented by a clever hack using different syntax. Mike already explained how r_ is in reality not a function, but a class instance of RClass, which has __getitem__ implemented, so that you can use it as r_[1]. The cosmetic difference is that you use square brackets instead of parentheses, so you are not doing a function call, but are actually indexing the object. Although this is technically true, for all practical purposes it works just like a function call, but one that allows some extra syntax not allowed by a normal function.
The motivation for creating r_ probably comes from Matlab's syntax, which lets you construct arrays very compactly, like x = [1:10, 15, 20:10:100]. To achieve the same in numpy, you would have to do x = np.hstack((np.arange(1,11), 15, np.arange(20,110,10))). Using colons to create ranges is not allowed in Python, but they do exist in the form of slice notation for indexing into a list, like L[3:5], and even A[2:10, 20:30] for multi-dimensional arrays. Under the hood, this indexing notation is transformed into a call to the object's __getitem__ method, with the colon notation turned into slice objects:
In [13]: class C(object):
    ...:     def __getitem__(self, x):
    ...:         print x

In [14]: c = C()

In [15]: c[1:11, 15, 20:110:10]
(slice(1, 11, None), 15, slice(20, 110, 10))
The r_ object 'abuses' this fact to create a 'function' that accepts slice notation, which also does some additional things like concatenating everything together and returning the result, so that you can write x = np.r_[1:11, 15, 20:110:10]. The "Not a function, so takes no parameters" in the documentation is slightly misleading ...
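To make the mechanism concrete, here is a hedged sketch of a tiny r_-like object (the _Concatenator class and the c_ name are made up for illustration; the real RClass does far more, such as imaginary step counts, string directives and matrix building):

import numpy as np

class _Concatenator(object):
    """Toy version of numpy's RClass: indexing the instance builds a 1-D array."""
    def __getitem__(self, key):
        if not isinstance(key, tuple):      # a single item like c_[1:5] arrives un-tupled
            key = (key,)
        pieces = []
        for item in key:
            if isinstance(item, slice):
                # turn slice notation into an actual range of values
                start = 0 if item.start is None else item.start
                step = 1 if item.step is None else item.step
                pieces.append(np.arange(start, item.stop, step))
            else:
                pieces.append([item])        # scalars are wrapped so they concatenate
        return np.concatenate(pieces)

c_ = _Concatenator()    # the module exposes a ready-made instance, just like numpy.r_

x = c_[1:11, 15, 20:110:10]   # same values as np.r_[1:11, 15, 20:110:10]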

It's a class instance (aka an object):
In [2]: numpy.r_
Out[2]: <numpy.lib.index_tricks.RClass at 0x1923710>
A class is a construct used to define a distinct type, and as such it can be instantiated. Each instance can have attributes (member/instance variables) and methods.
One of the methods a class can define is __getitem__, which is called whenever you append [something, something, ..., something] to the name of an instance. In the case of the numpy.r_ instance, the method returns a numpy array.
Take the following class for example:
class myClass(object):
    def __getitem__(self, i):
        return i * 2
Look at these outputs for the above class:
In [1]: a = myClass()
In [2]: a[3]
Out[2]: 6
In [3]: a[3,4]
Out[3]: (3, 4, 3, 4)
I am calling the __getitem__ method of myClass (via the square brackets), and it returns the index multiplied by two (an integer in the first case, a tuple in the second). It is not the class/instance behaving as a function; it is the __getitem__ method of the myClass instance which is being called.
On a final note, you will notice that to instantiate myClass I had to do a = myClass(), whereas to get an instance of RClass you just use numpy.r_. This is because numpy instantiates RClass and binds the instance to the name numpy.r_ itself (there is a line in the numpy source code that does exactly this). In my opinion this is rather ugly and confusing!

Related

Have a result be both a class and destructurable as a tuple

I came across a method in Python that returns a class, but can be destructured as if it's a tuple.
How can you define a result of a function to be both an instance of a class AND use destructure assignment as if it's a tuple?
An example where you see this behavior:
import scipy.stats as stats
res = stats.ttest_ind(data1, data2)
print(type(res)) # <class 'scipy.stats.stats.Ttest_indResult'>
# One way to assign values is by directly accessing the instance's properties.
p = res.pvalue
t = res.statistic
# A second way is to treat the result as a tuple, and assign to variables directly. But how is this working?
# We saw above that the type of the result is NOT a tuple but a class. How would Python know the order of the properties here? (It's not like we're destructuring based on named properties)
t, p = stats.ttest_ind(data1, data2)
It's a named tuple, which is basically an extension of the built-in tuple type in Python.
To unpack a data type with a, b = some_object, the object on the right side needs to be iterable. A list or tuple works, obviously, but you can make your own class iterable by implementing an __iter__ method.
For example, the following class would behave consistently with the interface you've shown the Ttest_indResult class to have (though it's probably implemented very differently):
class MyClass:
    def __init__(self, statistic, pvalue):
        self.statistic = statistic   # these attributes are accessible by name
        self.pvalue = pvalue

    def __iter__(self):              # but you can also iterate to get the same values
        yield self.statistic
        yield self.pvalue
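As noted above, the scipy result is a named tuple, which gives you both behaviours for free. A minimal sketch of that approach with collections.namedtuple (the TTestResult name and the numbers here are made up):

from collections import namedtuple

# A hypothetical result type with the same two fields.
TTestResult = namedtuple("TTestResult", ["statistic", "pvalue"])

res = TTestResult(statistic=2.3, pvalue=0.021)

# Attribute access works ...
print(res.pvalue)   # 0.021

# ... and so does tuple unpacking, because a namedtuple is a tuple subclass.
t, p = res
print(t)            # 2.3
print(p)            # 0.021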

Difference between common method VS operator in Python data type as list [duplicate]

My question:
It seems that __getattr__ is not called for indexing operations, ie I can't use __getattr__ on a class A to provide A[...]. Is there a reason for this? Or a way to get around it so that __getattr__ can provide that functionality without having to explicitly define __getitem__, __setitem__, etc on A?
Minimal Example:
Let's say I define two nearly identical classes, Explicit and Implicit. Each creates a little list self._arr on initiation, and each defines a __getattr__ that just passes all attribute requests to self._arr. The only difference is that Explicit also defines __getitem__ (by just passing it on to self._arr).
# Passes all attribute requests on to a list it contains
class Explicit():
    def __init__(self):
        self._arr = [1, 2, 3, 4]

    def __getattr__(self, attr):
        print('called __getattr_')
        return getattr(self._arr, attr)

    def __getitem__(self, item):
        return self._arr[item]

# Same as above but __getitem__ not defined
class Implicit():
    def __init__(self):
        self._arr = [1, 2, 3, 4]

    def __getattr__(self, attr):
        print('called __getattr_')
        return getattr(self._arr, attr)
This works as expected:
>>> e=Explicit()
>>> print(e.copy())
called __getattr_
[1, 2, 3, 4]
>>> print(hasattr(e,'__getitem__'))
True
>>> print(e[0])
1
But this doesn't:
>>> i=Implicit()
>>> print(i.copy())
called __getattr_
[1, 2, 3, 4]
>>> print(hasattr(i,'__getitem__'))
called __getattr_
True
>>> print(i.__getitem__(0))
called __getattr_
1
>>> print(i[0])
TypeError: 'Implicit' object does not support indexing
Python bypasses __getattr__, __getattribute__, and the instance dict when looking up "special" methods for implementing language mechanics. (For the most part, special methods are ones with two underscores on each side of the name.) If you were expecting i[0] to invoke i.__getitem__(0), which would in turn invoke i.__getattr__('__getitem__')(0), that's why that didn't happen.
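The special methods therefore have to exist on the class itself, but they don't have to be written out one by one; they can be generated. A hedged sketch of that idea (delegate_dunders and ListWrapper are made-up names, and this only covers a handful of methods that take positional arguments):

def delegate_dunders(*names):
    """Class decorator (a sketch) that adds forwarding special methods.

    It assumes the wrapped object is stored in self._arr.
    """
    def make_forwarder(name):
        def forwarder(self, *args):
            return getattr(self._arr, name)(*args)
        forwarder.__name__ = name
        return forwarder

    def decorate(cls):
        for name in names:
            setattr(cls, name, make_forwarder(name))
        return cls
    return decorate


@delegate_dunders("__getitem__", "__setitem__", "__len__", "__contains__")
class ListWrapper(object):
    def __init__(self):
        self._arr = [1, 2, 3, 4]

    def __getattr__(self, attr):
        return getattr(self._arr, attr)


w = ListWrapper()
print(w[0])     # 1
print(len(w))   # 4
print(3 in w)   # True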

Python: item.method() and function(item)

What is the logic for picking some methods to be prefixed with the items they are used with, but some are functions that need items as the arguments?
For example:
L=[1,4,3]
print len(L) #function(item)
L.sort() #item.method()
I thought maybe the functions that modify the item need to be prefixed while the ones that return information about the item use it as an argument, but I'm not too sure.
Edit:
What I'm trying to ask is why does python not have L.len()? What is the difference between the nature of the two kinds of functions? Or was it randomly chosen that some operations will be methods while some will be functions?
One of the principles behind Python is that there should be only one obvious way to do it. In particular, to get the length of a sequence (list / tuple / xrange / string ...), you always use len, regardless of the sequence type.
However, sorting is not supported on all of those sequence types. This makes it more suitable to being a method.
a = [0,1,2]
b = (0,1,2)
c = xrange(3)
d = "abc"
print len(a), len(b), len(c), len(d) # Ok
a.sort() # Ok
b.sort() # AttributeError: 'tuple' object has no attribute 'sort'
c.sort() # AttributeError: 'xrange' object has no attribute 'sort'
d.sort() # AttributeError: 'str' object has no attribute 'sort'
Something that may help you understand a bit better: http://www.tutorialspoint.com/python/python_classes_objects.htm
What you describe as item.function() is actually a method, which is defined by the class that said item belongs to. You need to form a comprehensive understanding of function, class, object, method and maybe more in Python.
Conceptually speaking, when you call L.sort(), the sort() method of the list type receives an argument, conventionally called self, that refers to the instance it was called on, in this case L, and it applies the sorting logic to L in place. By comparison, the standalone sorted function takes an iterable (a list, for example) as its required argument and returns a new sorted list.
Code example:
my_list = [2, 1, 3]
# .sort() is a list method that applies the sorting logic to a
# specific instance of list, in this case, my_list
my_list.sort()
# sorted is a more generic built-in function that works on any
# Python iterable, including lists, and returns a new sorted list
sorted(my_list)
There's a bigger difference between methods and functions than just their syntax.
def foo():
    print "function!"

class ExampleClass(object):
    def foo(self):
        print "method!"
In this example, I defined a function foo and a class ExampleClass with one method, foo.
Let's try to use them:
>>> foo()
function!
>>> e = ExampleClass()
>>> e.foo()
method!
>>> l = [3,4,5]
>>> l.foo()
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
l.foo()
AttributeError: 'list' object has no attribute 'foo'
>>>
Even though both have the same name, Python knows that if you do foo(), you're calling a function, so it'll check if there's any function defined with that name.
And if you do a.foo(), it knows you're calling a method, so it'll check if there's a method foo defined for objects of the type a has, and if there is, it will call it. In the last example, we try that with a list and it gives us an error because lists don't have a foo method defined.
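A small illustration of the other side of this: the built-in functions are not magic, they dispatch to special methods, so your own class can support len() too (Playlist is a made-up example):

class Playlist(object):
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):            # len() dispatches to this special method
        return len(self._tracks)

p = Playlist(["intro", "verse", "chorus"])
print(len(p))                     # 3 -- the generic built-in works on our type as well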

Prototype pattern in Python

I have the following implementation of the prototype pattern in Python 2.7:
def clone(instance):
    x = object.__new__(type(instance))
    x.__dict__ = dict(instance.__dict__)
    return x
This clearly doesn't work for non new-style classes (old style classes?) and built-ins like dict.
Is there a way, cleanly and staying within Python 2 to extend this to mutable built in types like sequence and mapping types?
I think that you can use the copy module.
The deepcopy function creates a 1:1 copy of an object:
>>> import copy
>>> x = [1,2,3]
>>> z = copy.deepcopy(x)
>>> x[0] = 3
>>> x
[3, 2, 3]
>>> z
[1, 2, 3]
This is from the book Python in Practice by Mark Summerfield.
You can create a copy of an object in several ways:
import copy
import sys

class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

def make_object(Class, *args, **kwargs):
    return Class(*args, **kwargs)

point1 = Point(1, 2)
point2 = eval("{}({}, {})".format("Point", 2, 4))  # Risky
point3 = getattr(sys.modules[__name__], "Point")(3, 6)
point4 = globals()["Point"](4, 8)
point5 = make_object(Point, 5, 10)
point6 = copy.deepcopy(point5)
point6.x = 6
point6.y = 12
point7 = point1.__class__(7, 14)  # Could have used any of point1 to point6
A schema and example code can be found on tutorialspoint; it is written for Java, but the principle carries over.
What you're trying to do is misguided.
If you want just want a copy, just use copy or deepcopy.
If you want JavaScript-style objects, create a root class that you can clone—or, better, rethink your code in Python instead of JavaScript.
If you want fully-functional JavaScript-style cloning even for builtins, you can't have that.
Also, keep in mind that the main motivations for using the prototype pattern in non-prototype-based languages like Java and C++ are (a) to avoid the cost of newing up an object and (b) to allow adding different methods or attributes to different instances. For the former, you're not avoiding the cost, and it doesn't matter anyway in Python. For the latter, it's already easy to add methods and attributes to instances in Python, and cloning doesn't make it any easier in any way.
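To see why that second motivation carries little weight in Python, here is a quick hedged sketch of adding attributes and methods to a single instance (Creature and all the names are invented):

import types

class Creature(object):           # a made-up class, just for illustration
    def __init__(self, name):
        self.name = name

a = Creature("slime")
b = Creature("dragon")

# Per-instance attributes need no cloning machinery at all.
b.hit_points = 500

# Per-instance methods are also easy: bind a plain function to one instance.
def breathe_fire(self):
    return self.name + " breathes fire!"

b.breathe_fire = types.MethodType(breathe_fire, b)

print(b.breathe_fire())           # dragon breathes fire!
print(hasattr(a, "hit_points"))   # False -- other instances are unaffected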
I'm aware that there is something different going on with numeric types, compared with other built in types…
No, the distinction here isn't numbers vs. sequences, it's immutable types vs. mutable:
>>> id(tuple())
4298170448
>>> id(tuple())
4298170448
>>> id(tuple(()))
4298170448
>>> id(tuple([]))
4298170448
Also, it has nothing to do with the int or tuple constructor being fancy. If the value isn't cached anywhere, even a repeated literal will get a new instance each time:
>>> id(20000)
4439747152
>>> id(20000)
4439747216
Small integers, the empty tuple, and the values of the unassignable magic constants are pre-cached at startup. Short strings generally get interned. But the exact details are implementation-dependent.
So, how does this affect you?
Well, cloning immutable types is pointless. By definition, it's an unchangeable copy of the same thing, so what good can that do?
Meanwhile, types that don't use __dict__ for their storage can't be cloned in this way (whether they're builtin types, types that use slots, types that generate their attributes dynamically, …). In some cases you'll get an error, in others just the wrong behavior.
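For example, a class that uses __slots__ has no per-instance __dict__ at all, so the clone function from the question fails at the first attribute access (Slotted is a made-up class):

class Slotted(object):
    __slots__ = ("x",)            # no per-instance __dict__ is created

    def __init__(self, x):
        self.x = x

s = Slotted(1)
clone(s)   # AttributeError: 'Slotted' object has no attribute '__dict__'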
So, if by "sequence, mapping and numeric types" you include builtins like int and list, stdlib types like Decimal and deque, or common third-party types like gmpy.mpz or blist.sorteddict, then this mechanism will not work.
On top of that, even if you could clone builtin classes, you can't add new attributes to them:
>>> a = []
>>> a.foo = 3
AttributeError: 'list' object has no attribute 'foo'
So, if you got this to work, again, it wouldn't be useful anyway.
Meanwhile, calling object.__new__ instead of type(instance).__new__ can cause a variety of problems for different classes. Some of them, like dict, will give you an error telling you so, but you can't count on that in every case.
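If you only want to dodge that particular problem, here is a sketch of a variant that asks the instance's own class for the new object; it still assumes a parameterless __new__ and state stored in __dict__, so every other caveat above still applies:

def clone(instance):
    cls = type(instance)
    new = cls.__new__(cls)                   # use the class's own __new__, not object.__new__
    new.__dict__.update(instance.__dict__)   # copy the attributes rather than aliasing them
    return new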
There are other, less serious, problems with the idea. For example:
>>> class Foo(object):
...     def __init__(self):
...         self.__private = 3
>>> foo = Foo()
>>> foo.__private
3
>>> bar = clone(foo)
>>> bar.__private
AttributeError: 'Foo' object has no attribute '__private'
>>> bar._Foo__private
3

Most pythonic way of ensuring a list of objects contains only unique items

I have a list of objects (Foo). A Foo object has several attributes. An instance of a Foo object is equivalent (equal) to another instance of a Foo object iff (if and only if) all the attributes are equal.
I have the following code:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            print 'DEBUG: self:', self.__dict__
            print 'DEBUG: other:', other.__dict__
            return self.__dict__ == other.__dict__
        else:
            print 'DEBUG: ATTEMPT TO COMPARE DIFFERENT CLASSES:', self.__class__, 'compared to:', other.__class__
            return False
import copy
f1 = Foo(1)
f2 = Foo(2)
f3 = Foo(3)
f4 = Foo(4)
f5 = copy.deepcopy(f3) # overkill here (I know), but needed for my real code
f_list = [f1,f2,f3,f4,f5]
# Surely, there must be a better way? (this dosen't work BTW!)
new_foo_list = list(set(f_list))
I often used this little (anti?) 'pattern' above (converting to set and back), when dealing with simple types (int, float, string - and surprisingly datetime.datetime types), but it has come a cropper with the more involved data type - like Foo above.
So, how could I change the list f_list above into a list of unique items, without having to loop through each item and check whether it already exists in some temporary cache, etc.?
What is the most pythonic way to do this?
First, I want to emphasize that using set is certainly not an anti-pattern. sets eliminate duplicates in O(n) time, which is the best you can do, and way better than the naive O(n^2) solution of comparing every item to every other item. It's even better than sorting -- and indeed, it seems your data structure might not even have a natural order, in which case sorting doesn't make a lot of sense.
The problem with using a set in this case is that you have to define a custom __hash__ method. Others have said this. But whether or not you can do so easily is an open question -- it depends on details about your actual class that you haven't told us. For example, if any attributes of a Foo object above are not hashable, then creating a custom hash function is going to be difficult, because you'll have to not only write a custom hash for Foo objects, you'll also have to write custom hashes for every other type of object!
So you need to tell us more about what kinds of attributes your class has if you want a conclusive answer. But I can offer some speculation.
Assuming that a hash function could be written for Foo objects, but also assuming that Foo objects are mutable and so really shouldn't have a __hash__ method, as Niklas B. points out, here is one workable approach. Create a function freeze that, given a mutable instance of Foo, returns an immutable collection of the data in Foo. So for example, say Foo has a dict and a list in it; freeze returns a tuple containing a tuple of tuples (representing the dict) and another tuple (representing the list). The function freeze should have the following property:
freeze(a) == freeze(b)
If and only if
a == b
Now pass your list through the following code:
dupe_free = dict((freeze(x), x) for x in dupe_list).values()
Now you have a dupe free list in O(n) time. (Indeed, after adding this suggestion, I saw that fraxel suggested something similar; but I think using a custom function -- or even a method -- (x.freeze(), x) -- is the better way to go, rather than relying on __dict__ as he does, which can be unreliable. The same goes for your custom __eq__ method, IMO -- __dict__ is not always a safe shortcut for various reasons I can't get into here.)
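A concrete sketch of that freeze idea, for a hypothetical mutable Foo that holds a dict and a list (the attribute names here are invented):

class Foo(object):
    def __init__(self, options, tags):
        self.options = dict(options)   # mutable, so Foo itself should stay unhashable
        self.tags = list(tags)

def freeze(foo):
    # Build an immutable, hashable snapshot; sorting the dict items makes
    # equal dicts freeze to the same tuple.
    return (tuple(sorted(foo.options.items())), tuple(foo.tags))

dupe_list = [Foo({"a": 1}, [1, 2]), Foo({"a": 1}, [1, 2]), Foo({"a": 2}, [3])]
dupe_free = list(dict((freeze(x), x) for x in dupe_list).values())
print(len(dupe_free))   # 2 -- the two equivalent Foos collapse into one entry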
Another approach would be to use only immutable objects in the first place! For example, you could use namedtuples. Here's an example stolen from the python docs:
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
Have you tried using a set (or frozenset)? It's explicitly for holding a unique set of items.
You'll need to create an appropriate __hash__ method, though. set (and frozenset) use the __hash__ method to hash objects; __eq__ is only used on a collision, AFAIK. Accordingly, you'll want to use a hash like hash(frozenset(self.__dict__.items())).
According to the documentation, you need to define __hash__() and __eq__() for your custom class to work correctly with a set or frozenset, as both are implemented using hash tables in CPython.
If you implement __hash__, keep in mind that if a == b, then hash(a) must equal hash(b). Rather than comparing the whole __dict__s, I suggest the following more straightforward implementation for your simple class:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid

    def __eq__(self, other):
        return isinstance(other, self.__class__) and other.myid == self.myid

    def __hash__(self):
        return hash(self.myid)
If your object contains mutable attributes, you simply shouldn't put it inside a set or use it as a dictionary key.
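With that __eq__/__hash__ pair in place, the set round-trip from the question works as intended:

f_list = [Foo(1), Foo(2), Foo(3), Foo(4), Foo(3)]
new_foo_list = list(set(f_list))
print(len(new_foo_list))   # 4 -- the duplicate Foo(3) is gone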
Here is an alternative method, just make a dictionary keyed by __dict__.items() for the instances:
f_list = [f1,f2,f3,f4,f5]
f_dict = dict([(tuple(i.__dict__.items()), i) for i in f_list])
print f_dict
print f_dict.values()
#output:
{(('myid', 1),): <__main__.Foo object at 0xb75e190c>,
(('myid', 2),): <__main__.Foo object at 0xb75e184c>,
(('myid', 3),): <__main__.Foo object at 0xb75e1f6c>,
(('myid', 4),): <__main__.Foo object at 0xb75e1cec>}
[<__main__.Foo object at 0xb75e190c>,
<__main__.Foo object at 0xb75e184c>,
<__main__.Foo object at 0xb75e1f6c>,
<__main__.Foo object at 0xb75e1cec>]
This way you just let the dictionary take care of the uniqueness based on attributes, and can easily retrieve the objects by getting the values.
If you are allowed to, you can use a set: http://docs.python.org/library/sets.html
lst = [1, 2, 3, 3, 45, 4, 45, 6]
print set(lst)
set([1, 2, 3, 4, 6, 45])
x = set(lst)
print x
set([1, 2, 3, 4, 6, 45])
