I am new to ML and pandas.
I was going through a jupyter file of a linear regression program.
There I saw
dataframe.head()
dataframe.describe()
dataframe.shape
Why do the first two have parentheses () while shape doesn't?
I tried to run dataframe.shape() and it gave the error TypeError: 'tuple' object is not callable.
Here's a link to the documentation but it didn't help:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
dataframe.shape looks like a function, and functions should have ().
How can I know when a function will not have ()?
Classes can have attributes and methods. Attributes are accessed as .word, while methods are called as .word() or .word(arg1, arg2, etc.).
You won't know in advance whether something you want to call is a method or attribute, but you will if you read the documentation for that class. In that documentation, shape is listed under attributes not under methods, so you can infer from that classification how to use it (i.e. without parentheses). Here's the doc link for pandas dataframes: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
Make a habit of reading the documentation and it will save you a lot of headache!
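If the documentation isn't at hand, the built-in callable() gives a quick answer from inside the interpreter. This is only a minimal sketch: Table is a hypothetical stand-in class made up for illustration (it is not pandas), with one data attribute and one method.

```python
class Table:
    """Hypothetical stand-in for a DataFrame-like class (not pandas)."""

    def __init__(self, rows):
        self.rows = rows
        # A plain data attribute: read it without parentheses.
        self.shape = (len(rows), len(rows[0]) if rows else 0)

    def head(self, n=5):
        # A method: it has to be called with parentheses to run.
        return self.rows[:n]


table = Table([[1, 2], [3, 4], [5, 6]])

# callable() reports whether putting () after the object would work.
for name in ("shape", "head"):
    value = getattr(table, name)
    kind = "method, use ()" if callable(value) else "attribute, no ()"
    print(name, "->", kind)
```

The same check should work on a real dataframe, e.g. callable(dataframe.shape) versus callable(dataframe.head).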
This is a Python question and not limited to pandas. In Python, putting parentheses after an object "calls" that object.
Suppose you have some object a (this doesn't have to be a function, though functions ARE callable). Then we "call" a by putting parentheses after it.
a()
If a is "callable" it will do something. Otherwise, you'll get an error.
I'll define a stripped down class and show what happens:
class A():
    pass

a = A()
a()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-353-d08e164f0b26> in <module>
3
4 a = A()
----> 5 a()
TypeError: 'A' object is not callable
However, I can make a callable by defining a method __call__
class A():
    def __call__(self):
        return "Hello World!"

a = A()
a()
'Hello World!'
So to answer your question: df.shape is a tuple, and tuples don't have a __call__ method. This is common for most data attributes of class instances and is definitely true for df.shape.
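The error from the question can be reproduced without pandas. In this sketch, Frame is a hypothetical class invented for the example; the only thing it borrows from the question is that shape evaluates to a tuple.

```python
class Frame:
    """Hypothetical minimal class; it only mimics the shape attribute."""

    def __init__(self, n_rows, n_cols):
        self._dims = (n_rows, n_cols)

    @property
    def shape(self):
        # Accessing frame.shape already runs this code and returns a tuple.
        return self._dims


frame = Frame(3, 2)
print(frame.shape)            # the tuple itself
print(callable(frame.shape))  # tuples do not define __call__

try:
    frame.shape()             # same as calling (3, 2)(), so it fails
except TypeError as exc:
    message = str(exc)
    print(message)
```

The parentheses are applied to whatever frame.shape evaluates to, which is why the error message talks about a tuple rather than about shape.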
Related
I recently spent way too long debugging a piece of code, only to realize that the issue was I did not include a () after a command. What is the logic behind which commands require a () and which do not?
For example:
import pandas as pd
col1=['a','b','c','d','e']
col2=[1,2,3,4,5]
df=pd.DataFrame(list(zip(col1,col2)),columns=['col1','col2'])
df.columns
Returns Index(['col1', 'col2'], dtype='object') as expected. If we use .columns() we get an error.
Other commands it is the opposite:
df.isna()
Returns:
col1 col2
0 False False
1 False False
2 False False
3 False False
4 False False
but df.isna returns:
<bound method DataFrame.isna of col1 col2
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5>
Which, while not throwing an error, is clearly not what we're looking for.
What's the logic behind which commands use a () and which do not?
I use pandas as an example here, but I think this is relevant to python more generally.
Because methods need parentheses to be called (that is also where their arguments go), while data attributes do not. That's why it's my_list.append(<item>) but df.columns.
If you refer to a function without the parentheses, like list.append, what you get back is the function object itself. Printing it shows a description of what it is, not the result of what it does.
As for classes, calling a class with parentheses creates an instance of that class, while referring to a class without parentheses points to the class itself. If you were to execute print(SomeClass) you'd get <class '__main__.SomeClass'>, which is a description of what it is, the same kind of response you'd get if you referred to a function without parentheses.
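A short sketch of that difference, using a made-up function greet and an empty class SomeClass (both names are just for illustration):

```python
class SomeClass:
    pass


def greet():
    return "hello"


# Without parentheses: the name just refers to the object.
same_function = greet
same_class = SomeClass

# With parentheses: the object is called.
result = greet()          # runs the function body
instance = SomeClass()    # creates a new instance

print(same_function is greet)            # the very same function object
print(result)
print(isinstance(instance, SomeClass))
```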
What's the logic behind which commands use a () and which do not?
An object needs to have a __call__ method associated with it for it to be called as a function using ():
class Test:
    def __call__(self, arg):
        print("Called with", arg)

t = Test()  # The Test class object uses __call__ to create instances
t(5)        # Then this line prints "Called with 5"
So, the difference is that the Index object returned by df.columns doesn't have a __call__ method defined, while methods such as df.isna and the Index and DataFrame classes themselves do.
TL;DR you just kinda have to know
Nominally, the parens are needed to call a function instead of just returning an object.
foo.bar # get the bar object
foo.bar() # call the bar object
Callable objects have a __call__ method. When python sees the (), it knows to call __call__. This is done at the C level.
In addition, Python has the concept of a property. It's a method that is accessed like a regular data attribute, without parentheses.
class Foo:
    def __init__(self):
        self._foo = "foo"

    @property
    def foo(self):
        return "I am " + self._foo

    @foo.setter
    def foo(self, val):
        assert isinstance(val, str)
        self._foo = val + " you bet"

f = Foo()
f.foo = "Hello"    # calls setter
print(f.foo)       # calls getter
Similarly, when python sees array notation foo[1] it will call an object's __getitem__ or __setitem__ methods and the object is free to overload that call in any way it sees fit.
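As a rough illustration of the square-bracket hooks, here is a hypothetical Squares class (invented for this example) that computes values on demand and remembers anything assigned to it:

```python
class Squares:
    """Hypothetical container that computes values instead of storing them."""

    def __init__(self):
        self.overrides = {}

    def __getitem__(self, index):
        # Runs for squares[index]
        return self.overrides.get(index, index * index)

    def __setitem__(self, index, value):
        # Runs for squares[index] = value
        self.overrides[index] = value


squares = Squares()
before = squares[4]   # calls __getitem__(4)
squares[4] = -1       # calls __setitem__(4, -1)
after = squares[4]
print(before, after)
```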
Finally, the object itself can intercept attribute access with __getattr__, __getattribute__ and __setattr__ methods, leaving everything up in the air. In fact, python doesn't really know what getting and setting attributes means. It is calling these methods. Most objects just use the default versions inherited from object. If the class is implemented in C, there is no end to what could be going on in the background.
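A small sketch of attribute interception, assuming a made-up class called Fallback. It only defines __getattr__, which Python calls when the normal lookup finds nothing:

```python
class Fallback:
    """Hypothetical class that answers any unknown attribute name."""

    def __init__(self):
        self.real = "stored normally"

    def __getattr__(self, name):
        # Only reached when the normal attribute lookup fails.
        return "made up on the fly: " + name


obj = Fallback()
print(obj.real)       # found in the instance, __getattr__ is not involved
print(obj.anything)   # not found anywhere, so __getattr__ supplies a value
```

From the outside, obj.anything looks exactly like an ordinary attribute, which is why you sometimes can't tell what is going on without the documentation.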
Python is a dynamic language and many packages add abstractions to make it easier (?) to use their services. The downside is that you may spend more time with help text and documentation than one may like.
Object method vs Object attribute.
Objects have methods and attributes.
Methods require parentheses to call them, even if the method does not take arguments.
Attributes, on the other hand, are like variables that point to objects as the program progresses. You access them by name alone (without parentheses). Of course, you may have to qualify both methods and attributes with the object's name as required.
I created some test code, but I can't really understand why it works.
Shouldn't moo be defined before we can use it?
#!/usr/bin/python3
class Test():
    def __init__(self):
        self.printer = None

    def foo(self):
        self.printer = self.moo
        self.printer()

    def moo(self):
        print("Y u printing?")

test = Test()
test.foo()
Output:
$ python test.py
Y u printing?
I know that the rule is define earlier, not higher, but in this case it's neither of those.
There's really nothing to be confused about here.
We have a function that says "when you call foo with a self parameter, look up moo in self's namespace, assign that value to printer in self's namespace, look up printer in self's namespace, and call that value".1
Unless/until you call that function, it doesn't matter whether or not anyone anywhere has an attribute named moo.
When you do call that method, whatever you pass as the self had better have a moo attribute or you're going to get an AttributeError. But this is no different from looking up an attribute on any object. If you write def spam(n): return n.bit_length() as a global function, when you call that function, whatever you pass as the n had better have a bit_length attribute or you're going to get an AttributeError.
So, we're calling it as test.foo(), so we're passing test as self. If you know how attribute lookup works (and there are already plenty of questions and answers on SO about that), you can trace this through. Slightly oversimplified:
Does test.__dict__ have a 'moo'? No.
Does type(test).__dict__ have a 'moo'? Yes. So we're done.
Again, this is the same way we check if 3 has a bit_length() method; there's no extra magic here.
That's really all there is to it.
In particular, notice that test.__dict__ does not have a 'moo'. Methods don't get created at construction time (__new__) any more than they get created at initialization time (__init__). The instance doesn't have any methods in it, because it doesn't have to; they can be looked up on the type.2
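The two lookup steps can be checked directly in Python 3. This sketch reuses the Test class from the question; the variable names in_instance, in_class and printer_stored are just for illustration.

```python
class Test:
    def __init__(self):
        self.printer = None

    def foo(self):
        self.printer = self.moo
        self.printer()

    def moo(self):
        print("Y u printing?")


test = Test()

in_instance = "moo" in test.__dict__        # the instance's own namespace
in_class = "moo" in type(test).__dict__     # the class's namespace
print(in_instance, in_class)

test.foo()
# foo stored something under "printer" on the instance, but never "moo".
printer_stored = "printer" in test.__dict__
print(printer_stored, "moo" in test.__dict__)
```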
Sure, we could get into descriptors, and method resolution order, and object.__getattribute__, and how class and def statements are compiled and executed, and special method lookup to see if there's a custom __getattribute__, and so on, but you don't need any of that to understand this question.
1. If you're confused by this, it's probably because you're thinking in terms of semi-OO languages like C++ and its descendants, where a class has to specify all of its instances' attributes and methods, so the compiler can look at this->moo(), work out that this has a static type of Foo, work out that moo is the third method defined on Foo, and compile it into something like this->vptr[2](). If that's what you're expecting, forget all of it. In Python, methods are just attributes, and attributes are just looked up, by name, on demand.
2. If you're going to ask "then why is a bound method not the same thing as a function?", the answer is descriptors. Briefly: when an attribute is found on the type, Python calls the value's __get__ method, passing it the instance, and function objects' __get__ methods return method objects. So, if you want to refer specifically to bound method objects, then they get created every time a method is looked up. In particular, the bound method object does not exist yet when we call foo; it gets created by looking up self.moo inside foo.
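For the curious, the descriptor step in footnote 2 can be sketched in Python 3 like this. The class is a cut-down version of Test, and plain_function and bound are names chosen for the example:

```python
class Test:
    def moo(self):
        return "Y u printing?"


test = Test()

plain_function = Test.__dict__["moo"]        # what the class actually stores
bound = plain_function.__get__(test, Test)   # what attribute lookup does for you

print(type(plain_function).__name__)
print(type(bound).__name__)
print(bound.__self__ is test)   # the instance is attached to the method object
print(test.moo == test.moo)     # separate lookups give equal bound methods
print(bound())
```

Calling bound() behaves like test.moo(), because test.moo goes through the same __get__ step behind the scenes.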
While all that @scharette says is likely true (I don't know enough of Python internals to agree with confidence :) ), I'd like to propose an alternative explanation as to why one can instantiate Test and call foo():
The method's body is not executed until you actually call it. It does not matter if foo() contains references to undefined attributes, it will be parsed fine. As long as you create moo before you call foo, you're ok.
Try entering a truncated Test class in your interpreter:
class Test():
    def __init__(self):
        self.printer = None

    def foo(self):
        self.printer = self.moo
        self.printer()
No moo, so we get this:
>>> test = Test()
>>> test.foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in foo
AttributeError: 'Test' object has no attribute 'moo'
Let's add moo to the class now:
>>> def moo(self):
...     print("Y u printing?")
...
>>> Test.moo = moo
>>> test1 = Test()
>>> test1.foo()
Y u printing?
>>>
Alternatively, you can add moo directly to the instance:
>>> def moo():
...     print("Y u printing?")
...
>>> test.moo = moo
>>> test.foo()
Y u printing?
The only difference is that the instance's moo does not take a self, because a function stored directly on an instance is not turned into a bound method.
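If you do want a per-instance function that receives self, one option is to bind it yourself with types.MethodType from the standard library. A minimal sketch; moo_plain and moo_with_self are made-up names:

```python
import types


class Test:
    pass


def moo_plain():
    return "no self needed"


def moo_with_self(self):
    return "bound to " + type(self).__name__


test = Test()

# Stored on the instance: looked up as-is, nothing is passed automatically.
test.moo = moo_plain
first_result = test.moo()
print(first_result)

# To get method behaviour on a single instance, bind the function explicitly.
test.moo = types.MethodType(moo_with_self, test)
second_result = test.moo()
print(second_result)
```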
I was sure that there'd be an answer to this question somewhere on stack overflow, but I haven't been able to find one; most of them are in regards to passing functions, and not methods, as arguments to functions.
I'm currently working with Python 2.7.5 and I'm trying to define a function like this:
def func(object, method):
    object.method()
that when called like so:
some_object_instance = SomeObject()
func(some_object_instance, SomeObject.some_object_method)
using a class defined like this:
class SomeObject:
    def some_object_method(self):
        # do something
        pass
is basically equivalent to doing this:
some_object_instance.some_object_method()
I, however, keep getting an attribute error--something along the lines of
'SomeObject' has no attribute 'method'
I was under the impression that I could legally pass methods as arguments and have them evaluate correctly when used in the aforementioned manner. What am I missing?
That's not the way method calling works. The foo.bar syntax looks for a method named bar on the foo object. If you already have the method, just call it:
def func(object, method):
    method(object)

func(some_object_instance, SomeObject.some_object_method)
SomeObject.some_object_method is what's called an "unbound method": it's a method object without a self bound into it, so you have to explicitly pass the self to it.
This might make more sense with a concrete example:
>>> s = 'ABC'
>>> s_lower = s.lower # bound method
>>> s_lower()
'abc'
>>> str_lower = str.lower # unbound method
>>> str_lower(s)
'abc'
By comparison, some_object_instance.some_object_method is a "bound method": you can just call it as-is, because some_object_instance is already "bound in" as the self argument:
def func2(method):
    method()

func2(some_object_instance.some_object_method)
Unbound methods aren't explained in detail in the tutorial; they're covered in the section on bound methods. So you have to go to the reference manual for documentation (in "The standard type hierarchy", https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy, way down in the subsection "User-defined methods"), which can be a little bit daunting for novices.
Personally, I didn't really get this until I learned how it worked under the covers. About a year ago, I wrote a blog post How methods work to try to explain it to someone else (but in Python 3.x terms, which is slightly different); it may help. To really get it, you have to get through the Descriptor HOWTO, which may take a few read-throughs and a lot of playing around in the interactive interpreter before it really clicks, but hopefully you can understand the basic concepts behind methods before getting to that point.
Since you are passing an unbound method to the function, you need to call it as:
method(object)
Or, better, pass the name of the method as a string and then use getattr:
getattr(object, method)()
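Putting the three options side by side may help. This is a sketch written so that it also runs on Python 3 (where SomeObject.some_object_method is simply a plain function); the helper names call_unbound, call_bound and call_by_name are made up for the example.

```python
class SomeObject:
    def some_object_method(self):
        return "did something"


def call_unbound(obj, method):
    # method came from the class, so the instance is passed explicitly.
    return method(obj)


def call_bound(method):
    # method came from an instance, so self is already attached.
    return method()


def call_by_name(obj, method_name):
    # Look the method up on the instance by its name, then call it.
    return getattr(obj, method_name)()


instance = SomeObject()
r1 = call_unbound(instance, SomeObject.some_object_method)
r2 = call_bound(instance.some_object_method)
r3 = call_by_name(instance, "some_object_method")
print(r1, r2, r3)
```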
Something rather odd is happening in an interaction between bound methods, inheritance, and getattr that I am failing to understand.
I have a directory setup like:
/a
__init__.py
start_module.py
/b
__init__.py
imported_module.py
imported_module.py contains a number of class objects one of which is of this form:
class Foo(some_parent_class):
    def bar(self):
        return [1,2,3]
A function in start_module.py uses inspect to get a list of strings representing the classes in imported_module.py. "Foo" is the first of those strings. The goal is to run bar in start_module.py using that string and getattr.*
To do this I use code in start_module of the form:
for class_name in class_name_list:
    instance = getattr(b.imported_module, class_name)()
    function = getattr(instance, "bar")
    for doodad in [x for x in function()]:
        print doodad
Which does successfully start to iterate over the list comprehension, but on the first string, "bar", I get a bizarre error. Despite bar being a bound method, and so as far as I understand expecting an instance of Foo as an argument, I am told:
TypeError: bar() takes no arguments (1 given)
This makes it seem like my call to function() is passing the Foo instance, but the code is not expecting to receive it.
I really have no idea what is going on here and couldn't parse out an explanation through looking on Google and Stack Overflow. Is the double getattr causing some weird interaction? Is my understanding of class objects in Python too hazy? I'd love to hear your thoughts.
*To avoid the anti-pattern, the real end objective is to have start_module.py automatically have access to all methods of name bar across a variety of classes similar to Foo in imported_module.py. I am doing this in the hopes of avoiding making my successors maintain a list for what could be a very large number of Foo-resembling classes.
Answered below: I think the biggest takeaways here are that inspect is very useful, and that if there is a common cause for the bug you are experiencing, make absolutely sure you've ruled that out before moving on to search for other possibilities. In this case I overlooked the fact that the module I was looking at that had correct code might not be the one being imported due to recent edits to the file structure.
Since the sample code you posted is wrong, I'm guessing that you have another module with the Foo class somewhere - maybe bar is defined like this
class Foo(object):
    def bar():  # <-- missing self parameter
        return [1,2,3]
This does give that error message
>>> Foo().bar()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bar() takes no arguments (1 given)
Instance methods have a self parameter that is passed automatically: it is just the class instance on which you are calling the method. You don't need to pass it yourself, but the method definition must declare it.
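A small sketch of both halves of that statement. The method baz is added here purely for comparison, and the exact wording of the error differs between Python versions, so the code only prints whatever message it gets:

```python
class Foo(object):
    def bar():            # deliberately missing the self parameter
        return [1, 2, 3]

    def baz(self):
        return [1, 2, 3]


foo = Foo()

try:
    foo.bar()             # Python still passes foo as the first argument
except TypeError as exc:
    error_text = str(exc)
    print(error_text)

# These two calls are equivalent: the instance becomes self.
print(foo.baz() == Foo.baz(foo))
```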
I'm not able to reproduce the error you're getting. Here's my attempt at a short, self-contained compilable example, run from the Python shell:
>>> class Foo(object):
        def bar(self):
            print("Foo.bar!")
>>> import __main__ as mod
>>> cls = getattr(mod, "Foo")
>>> inst = cls()
>>> func = getattr(inst, "bar")
>>> func()
Foo.bar!
Perhaps you can try adapting your inspect based code to an example like this one and see where it is going wrong.
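For instance, something along these lines might be a starting point. It is only a sketch: the module here is a stand-in built in memory with types.ModuleType so the example is self-contained, and the second class Baz is made up to show that several classes are picked up.

```python
import inspect
import types

# Hypothetical stand-in for imported_module, built in memory for the demo.
imported_module = types.ModuleType("imported_module")


class Foo(object):
    def bar(self):
        return [1, 2, 3]


class Baz(object):
    def bar(self):
        return [4, 5]


imported_module.Foo = Foo
imported_module.Baz = Baz

results = {}
# getmembers returns (name, value) pairs; isclass keeps only the classes.
for class_name, cls in inspect.getmembers(imported_module, inspect.isclass):
    method = getattr(cls(), "bar", None)   # bound method, or None if absent
    if callable(method):
        results[class_name] = method()

print(results)
```

With a module loaded from disk, inspect.getsourcefile(b.imported_module) should also tell you which file was actually imported, which is worth checking when the code you are reading looks correct.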
I know a ton has been written on this subject. I cannot, however, absorb much of it. Perhaps because I'm a complete novice teaching myself without the benefit of any training in computer science. Regardless, maybe if some of you big brains chime in on this specific example, you'll help other beginners like me.
So, I've written the following function which works just fine when I call it (as a module?) as its own file called 'funky.py':
I type the following into my terminal:
python funky.py
and it runs fine.
def load_deck():
    suite = ('Spades', 'Hearts')
    rank = ('2', '3')
    full_deck = {}
    i = 0
    for s in suite:
        for r in rank:
            full_deck[i] = "%s of %s" % (r, s)
            i += 1
    return full_deck

print load_deck()
When I put the same function in a class, however, I get an error.
Here's my code for 'classy.py':
class GAME():
    def load_deck():
        suite = ('Spades', 'Hearts')
        rank = ('2', '3')
        full_deck = {}
        i = 0
        for s in suite:
            for r in rank:
                full_deck[i] = "%s of %s" % (r, s)
                i += 1
        return full_deck

MyGame = GAME()
print MyGame.load_deck()
I get the following error:
Traceback (most recent call last):
File "classy.py", line 15, in <module>
print MyGame.load_deck()
TypeError: load_deck() takes no arguments (1 given)
So, I changed the definition line to the following and it works fine:
def load_deck(self):
What is it about putting a function in a class that demands the use of 'self'? I understand that 'self' is just a convention. So, why is any argument needed at all? Do functions behave differently when they are called from within a class?
Also, and this is almost more important, why does my class work without the benefit of using __init__? What would using __init__ do for my class?
Basically, if someone has the time to explain this to me like I'm a 6-year-old, it would help. Thanks in advance for any help.
Defining a function in a class definition invokes some magic that turns it into a method descriptor. When you access foo.method it will automatically create a bound method, and calling that passes the object instance as the first parameter. You can avoid this by using the @staticmethod decorator.
__init__ is simply a method called when an instance of your class is created, to do optional setup. __new__ is what actually creates the object.
Here are some examples
>>> class Foo(object):
        def bar(*args, **kwargs):
            print args, kwargs
>>> foo = Foo()
>>> foo.bar
<bound method Foo.bar of <__main__.Foo object at 0x01C9FEB0>>
>>> Foo.bar
<unbound method Foo.bar>
>>> foo.bar()
(<__main__.Foo object at 0x01C9FEB0>,) {}
>>> Foo.bar()
Traceback (most recent call last):
File "<pyshell#29>", line 1, in <module>
Foo.bar()
TypeError: unbound method bar() must be called with Foo instance as first argument (got nothing instead)
>>> Foo.bar(foo)
(<__main__.Foo object at 0x01C9FEB0>,) {}
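The __new__ / __init__ order and the effect of @staticmethod can be seen in a short sketch like the one below. It is written to run on Python 3 as well; the calls list and the helper method are made up for the example.

```python
class Foo(object):
    calls = []

    def __new__(cls, *args, **kwargs):
        Foo.calls.append("__new__")      # creates the object
        return super(Foo, cls).__new__(cls)

    def __init__(self, name):
        Foo.calls.append("__init__")     # sets up the object that now exists
        self.name = name

    @staticmethod
    def helper():
        # No instance is passed in, whichever way the function is reached.
        return "no self here"


foo = Foo("example")
print(Foo.calls)
print(foo.helper(), Foo.helper())
```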
So, why is any argument needed at all?
To access attributes on the current instance of the class.
Say you have a class with two methods, load_deck and shuffle. At the end of load_deck you want to shuffle the deck (by calling the shuffle method)
In Python you'd do something like this:
import random

class Game(object):
    def shuffle(self, deck):
        random.shuffle(deck)  # shuffles in place and returns None
        return deck

    def load_deck(self):
        # ...
        return self.shuffle(full_deck)
Compare this to the roughly-equivalent C++ code:
class Game {
    shuffle(deck) {
        return random.shuffle(deck);
    }
    load_deck() {
        // ...
        return shuffle(full_deck)
    }
}
On the shuffle(full_deck) line, it first looks for a local variable called shuffle. This doesn't exist, so next it checks one level higher and finds an instance method called shuffle (if that didn't exist either, it would check for a global variable with the right name).
This is okay, but it's not clear if shuffle refers to some local variable, or the instance method. To address this ambiguity, instance-methods or instance-attributes can also be accessed via this:
...
load_deck() {
    // ...
    return this->shuffle(full_deck)
}
this is almost identical to Python's self, except it's not passed as an argument.
Why is it useful to have self as an argument? The FAQ lists several good reasons - these can be summarised by a line in "The Zen of Python":
Explicit is better than implicit.
This is backed up by a post in The History of Python blog,
I decided to give up on the idea of implicit references to instance variables. Languages like C++ let you write this->foo to explicitly reference the instance variable foo (in case there’s a separate local variable foo). Thus, I decided to make such explicit references the only way to reference instance variables. In addition, I decided that rather than making the current object ("this") a special keyword, I would simply make "this" (or its equivalent) the first named argument to a method. Instance variables would just always be referenced as attributes of that argument.
With explicit references, there is no need to have a special syntax for method definitions nor do you have to worry about complicated semantics concerning variable lookup. Instead, one simply defines a function whose first argument corresponds to the instance, which by convention is named "self."
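To see how literally "a function whose first argument corresponds to the instance" can be taken, here is a sketch in which an ordinary function is attached to a class after the fact. The describe function and the two-card deck are made up for the example.

```python
def describe(self):
    # An ordinary function; "self" is just its first parameter.
    return "Game with %d cards" % len(self.deck)


class Game(object):
    def __init__(self):
        self.deck = ["2 of Spades", "3 of Spades"]


# Attaching the function to the class makes it usable as a method.
Game.describe = describe

game = Game()
via_instance = game.describe()       # game is passed as self automatically
via_class = Game.describe(game)      # the same call, written out explicitly
print(via_instance)
```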
If you don't intend to use self you should probably declare the method to be a staticmethod.
class Game:
    @staticmethod
    def load_deck():
        ...
This undoes the automatic default packing that ordinarily happens to turn a function in a class scope into a method taking the instance as an argument.
Passing arguments you don't use is disconcerting to others trying to read your code.
Most classes have members. Yours doesn't, so all of its methods should be static. As your project develops, you will probably find data that should be accessible to all of the functions in it, and you will put those in self, and pass it around to all of them.
In this context, where the application itself is your primary object, __init__ is just the function that would initialize all of those shared values.
This is the first step toward an object-oriented style, wherein smaller pieces of data get used as objects themselves. But this is a normal stage in moving from straight scripting to OO programming.
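As a rough idea of what that next stage could look like for the card example, here is a sketch in which __init__ sets up the shared state and the other methods reach it through self. The seed parameter and the rng attribute are assumptions added so the shuffle can be made repeatable; they are not part of the original code.

```python
import random


class Game(object):
    def __init__(self, seed=None):
        # Shared state lives on self so every method can reach it.
        self.rng = random.Random(seed)
        self.deck = []

    def load_deck(self):
        suits = ("Spades", "Hearts")
        ranks = ("2", "3")
        self.deck = ["%s of %s" % (r, s) for s in suits for r in ranks]
        self.shuffle()
        return self.deck

    def shuffle(self):
        self.rng.shuffle(self.deck)   # shuffles the list in place


my_game = Game(seed=0)
deck = my_game.load_deck()
print(deck)
```

Here load_deck no longer has to hand the deck to shuffle, because both methods look at the same self.deck.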