I see that it's not considered Pythonic to use isinstance(), and people suggest using e.g. hasattr() instead.
I wonder what the best way is to document the proper use of a function that uses hasattr().
Example:
I get stock data from different websites (e.g. Yahoo Finance, Google Finance), and there are classes GoogleFinanceData and YahooFinanceData which both have a method get_stock(date).
There is also a function which compares the value of two stocks:
def compare_stocks(stock1, stock2, date):
    if hasattr(stock1, 'get_stock') and hasattr(stock2, 'get_stock'):
        if stock1.get_stock(date) < stock2.get_stock(date):
            print("stock1 < stock2")
        else:
            print("stock1 > stock2")
The function is used like this:
compare_stocks(GoogleFinanceData('Microsoft'),YahooFinanceData('Apple'),'2012-03-14')
It is NOT used like this:
compare_stocks('Tree',123,'bla')
The question is: How do I let people know which classes they can use for stock1 and stock2? Am I supposed to write a docstring like "stock1 and stock2 ought to have a method get_stock" and people have to look through the source themselves? Or do I put all right classes into one module and reference that file in the docstring?
If all you ever do is call the function with *FinanceData instances, I'd not even bother with testing for the get_stock method; it's an error to pass in anything else and the function should just break if someone passes in strings.
In other words, just document your function as expecting a get_stock() method, and not test at all. Duck typing is for code that needs to accept distinctly different types of input, not for code that only works for one specific type.
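A sketch of that documentation-only approach might look like this (the docstring wording is mine, and I've made the else branch print ">=" since equal values fall into it):

```python
def compare_stocks(stock1, stock2, date):
    """Compare the value of two stocks on a given date.

    Args:
        stock1, stock2: any objects exposing a get_stock(date) method
            that returns a comparable value (e.g. GoogleFinanceData,
            YahooFinanceData).
        date: a date string in whatever format the providers accept.
    """
    if stock1.get_stock(date) < stock2.get_stock(date):
        print("stock1 < stock2")
    else:
        print("stock1 >= stock2")
```

Passing a string or an int still fails, but it fails loudly with an AttributeError at the point of misuse, which is exactly the behaviour being recommended.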
I don't see what's unpythonic about the use of isinstance(). I would create a base class and refer to the base class's documentation.
def compare_stocks(stock1, stock2, date):
    """ Compares stock data of two FinanceData objects at a certain time. """
    if isinstance(stock1, FinanceData) and isinstance(stock2, FinanceData):
        return 'comparison'

class FinanceData(object):
    def get_stock(self, date):
        """ Returns stock data in format XX, expects parameter date in format YY """
        raise NotImplementedError

class GoogleFinanceData(FinanceData):
    def get_stock(self, date):
        """ Implements FinanceData.get_stock() """
        return 'important data'
As you can see, I don't use duck typing here, but since you've asked this question with regard to documentation, I think this is the cleaner approach for readability. Whenever another developer sees the compare_stocks function or a get_stock method, they know exactly where to look for further information about functionality, data structure, or implementation details.
Do what you suggest: put in the docstring that the passed arguments should have a get_stock method, since that is all your function requires. Listing classes is a bad idea, because the code may well be used with derived or other classes when it suits someone.
Related
I have a function like the following:
def do_something(thing):
    pass

def foo(everything, which_things):
    """Do stuff to things.

    Args:
        everything: Dict of things indexed by thing identifiers.
        which_things: Iterable of thing identifiers. Something is only
            done to these things.
    """
    for thing_identifier in which_things:
        do_something(everything[thing_identifier])
But I want to extend it so that a caller can do_something with everything passed in everything, without having to provide a list of identifiers. (As motivation, suppose everything were an opaque container whose keys are accessible to library internals but not to library users; foo can access the keys, but the caller can't. Another motivation is error prevention: a constant with obvious semantics keeps a caller from mistakenly passing in the wrong set of identifiers.) So one thought is to have a constant USE_EVERYTHING that can be passed in, like so:
def foo(everything, which_things):
    """Do stuff to things.

    Args:
        everything: Dict of things indexed by thing identifiers.
        which_things: Iterable of thing identifiers. Something is only
            done to these things. Alternatively pass USE_EVERYTHING to
            do something to everything.
    """
    if which_things == USE_EVERYTHING:
        which_things = everything.keys()
    for thing_identifier in which_things:
        do_something(everything[thing_identifier])
What are some advantages and limitations of this approach? How can I define a USE_EVERYTHING constant so that it is unique and specific to this function?
My first thought is to give it its own type, like so:
class UseEverythingType:
    pass

USE_EVERYTHING = UseEverythingType()
This would be in a package only exporting USE_EVERYTHING, discouraging creating any other UseEverythingType objects. But I worry that I'm not considering all aspects of Python's object model -- could two instances of USE_EVERYTHING somehow compare unequal?
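One way to sidestep the equality worry entirely is to compare with "is" rather than "==": identity is guaranteed unique to the one instance the module exports, so no object with a creative __eq__ can match it. A sketch, with foo reduced to returning the selected things:

```python
class UseEverythingType:
    """Sentinel type; the module should create exactly one instance."""
    def __repr__(self):
        return "USE_EVERYTHING"

USE_EVERYTHING = UseEverythingType()

def foo(everything, which_things):
    # 'is' compares object identity, so nothing a caller constructs
    # can accidentally (or maliciously) compare equal to the sentinel.
    if which_things is USE_EVERYTHING:
        which_things = everything.keys()
    return [everything[k] for k in which_things]
```

The __repr__ is optional, but it makes error messages and debugger output self-describing.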
I am just getting started with OOP, so I apologise in advance if my question is as obvious as 2+2. :)
Basically I created a class that adds attributes and methods to a pandas data frame. That's because I sometimes need to do complex but repetitive tasks like merging with a bunch of other tables, dropping duplicates, etc., so it's pleasant to be able to do all of that in one go with a predefined method. For example, I can create an object like this:
mysupertable = MySupperTable(original_dataframe)
And then do:
mysupertable.complex_operation()
Where original_dataframe is the original pandas data frame (or object) that is defined as an attribute of the class. Now, this is all good and well, but if I want to print (or just access) that original data frame I have to do something like
print(mysupertable.original_dataframe)
Is there a way to have that happening "by default" so if I just do:
print(mysupertable)
it will print the original data frame, rather than the memory location?
I know there are the __str__ and __repr__ methods that can be implemented in a class to return default string representations of an object. I was just wondering if there was a similar magic method (or something else) to default to showing a particular attribute. I tried looking this up, but I think I'm somehow not finding the right words to describe what I want to do, because I can't seem to find an answer.
Thank you!
Cheers
In your MySupperTable class, do:
class MySupperTable:
    # ... other stuff in the class

    def __str__(self) -> str:
        return str(self.original_dataframe)
That will make it so that when a MySupperTable is converted to a str, it will convert its original_dataframe to a str and return that.
When you pass an object to print(), it will print the object's string representation, which under the hood is retrieved by calling object.__str__(). You can give a custom definition to this method the way you would define any other method.
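As a minimal, self-contained illustration of that mechanism (the Wrapper class here is made up, standing in for MySupperTable):

```python
class Wrapper:
    def __init__(self, data):
        self.data = data

    def __str__(self):
        # print(obj) and str(obj) both end up calling this method
        return str(self.data)

w = Wrapper([1, 2, 3])
print(w)   # prints [1, 2, 3], not a memory address
```

Without the __str__ definition, print(w) would fall back to the default representation, something like <__main__.Wrapper object at 0x...>.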
I have a function which needs to behave differently depending on the type of the parameter it takes. My first impulse was to include some calls to isinstance, but I keep seeing answers on Stack Overflow saying that this is bad form and unpythonic, without much reason why. For the latter, I suppose it has something to do with duck typing, but what's the big deal about checking whether your arguments are of a specific type? Isn't it better to play it safe?
Consult this great post
My opinion on the matter is this:
If you are restricting your code, don't do it.
If you are using it to direct your code, then limit it to very specific cases.
Good Example: (this is okay)
def write_to_file(var, content, close=True):
    if isinstance(var, str):
        var = open(var, 'w')
    var.write(content)
    if close:
        var.close()
Bad Example: (this is bad)
def write_to_file(var, content, close=True):
    if not isinstance(var, file):
        raise Exception('expected a file')
    var.write(content)
    if close:
        var.close()
Using isinstance limits the objects which you can pass to your function. For example:
def add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    else:
        raise ValueError
Now you might try to call it:
add(1.0,2)
expecting to get 3 but instead you get an error because 1.0 isn't an integer. Clearly, using isinstance here prevented our function from being as useful as it could be. Ultimately, if our objects taste like a duck when we roast them, we don't care what type they were to begin with just as long as they work.
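Dropping the check restores that flexibility; anything that supports + simply works (a sketch):

```python
def add(a, b):
    # No isinstance check: ints, floats, strings, lists... all work.
    return a + b

print(add(1.0, 2))     # 3.0
print(add('a', 'b'))   # ab
print(add([1], [2]))   # [1, 2]
```

Types that genuinely can't be added together still raise a TypeError on their own, so the explicit check bought nothing.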
However, there are situations where the opposite is true:
def read(f):
    if isinstance(f, basestring):
        with open(f) as fin:
            return fin.read()
    else:
        return f.read()
The point is, you need to decide the API that you want your function to have. Cases where your function should behave differently based on the type exist, but are rare (checking for strings to open files is one of the more common uses that I know of).
Because doing so explicitly prevents duck-typing.
Here's an example. The csv module allows me to write data to a file in CSV format. For that reason, the function accepts a file as a parameter. But what if I didn't want to write to an actual file, but to something like a StringIO object? That's a perfectly good use of it, since StringIO implements the necessary read and write methods. But if csv was explicitly checking for an actual object of type file, that would be forbidden.
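For instance, in Python 3 the csv module will happily write to an in-memory io.StringIO, because all it needs is a write() method:

```python
import csv
import io

buffer = io.StringIO()          # file-like, but purely in memory
writer = csv.writer(buffer)
writer.writerow(['symbol', 'price'])
writer.writerow(['AAPL', 600])

print(buffer.getvalue())        # the CSV text, never touching disk
```

If csv.writer demanded a real file object via a type check, this common testing and in-memory-processing pattern would be impossible.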
Generally, Python takes the view that we should allow things as much as possible - it's the same reasoning behind the lack of real private variables in classes.
Sometimes the use of isinstance just reimplements polymorphic dispatch. Look at str(...): it calls object.__str__(), which is implemented by each type individually. By implementing __str__ you can reuse any code that depends on str(), extending that one object instead of having to manipulate the built-in str(...).
Basically this is the culminating point of OOP. You want polymorphic behaviour; you do not want to spell out types.
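A sketch of that dispatch: once the new type implements __str__, any code that calls str() handles it without a single isinstance (the Stock class here is illustrative):

```python
class Stock:
    def __init__(self, price):
        self.price = price

    def __str__(self):
        # str(obj) and print(obj) both dispatch to this method
        return f"Stock(price={self.price})"

print(Stock(10))   # prints Stock(price=10)
```

The alternative, a central function full of isinstance branches, would need editing every time a new type appears; with dispatch, each type carries its own behaviour.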
There are valid reasons to use it though.
Is it common in Python to keep testing for type values when working in a OOP fashion?
class Foo:
    def __init__(self, barObject):
        self.setBarObject(barObject)

    def setBarObject(self, barObject):
        if isinstance(barObject, Bar):
            self.bar = barObject
        else:
            # throw exception, log, etc.
            raise TypeError('expected a Bar instance')

class Bar:
    pass
Or I can use a more loose approach, like:
class Foo:
    def __init__(self, barObject):
        self.bar = barObject

class Bar:
    pass
Nope; in fact, it's overwhelmingly common not to test for type values, as in your second approach. The idea is that a client of your code (i.e. some other programmer who uses your class) should be able to pass any kind of object that has all the appropriate methods or properties. If it doesn't happen to be an instance of some particular class, that's fine; your code never needs to know the difference. This is called duck typing, because of the adage "if it quacks like a duck and flies like a duck, it might as well be a duck" (well, that's not the actual adage, but I got the gist of it, I think).
One place you'll see this a lot is in the standard library, with any functions that handle file input or output. Instead of requiring an actual file object, they'll take anything that implements the read() or readline() method (depending on the function), or write() for writing. In fact you'll often see this in the documentation, e.g. with tokenize.generate_tokens, which I just happened to be looking at earlier today:
The generate_tokens() generator requires one argument, readline, which must be a callable object which provides the same interface as the readline() method of built-in file objects (see section File Objects). Each call to the function should return one line of input as a string.
This allows you to use a StringIO object (like an in-memory file), or something wackier like a dialog box, in place of a real file.
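A short demonstration with an in-memory source instead of a real file, using Python 3's tokenize (where generate_tokens also just takes a readline callable):

```python
import io
import tokenize

source = io.StringIO("x = 1 + 2\n")
# generate_tokens() accepts any readline-like callable, not just a file's:
names = [tok.string for tok in tokenize.generate_tokens(source.readline)
         if tok.type == tokenize.NAME]
print(names)   # ['x']
```

Because only the readline interface matters, the same code would work with a socket wrapper, a dialog box, or any other line source.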
In your own code, just access whatever properties of an object you need, and if it's the wrong kind of object, one of the properties you need won't be there and it'll throw an exception.
I think that it's good practice to check input for type. It's reasonable to assume that if you asked a user to give one data type they might give you another, so you should code to defend against this.
However, it seems like a waste of time (both in writing and in running the program) to check the type of data that the program generates itself, independent of user input; as in a strongly-typed language, such checks aren't an important defence against programmer error.
So basically, check input but nothing else so that code can run smoothly and users don't have to wonder why they got an exception rather than a result.
If your alternative to the type check is an else branch containing exception handling, then you should really consider duck typing one tier up: support as many objects with the required methods as possible, and do the work inside a try.
You can then except (and except as specifically as possible) that.
The final result wouldn't be unlike what you have there, but a lot more versatile and Pythonic.
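A sketch of that EAFP pattern (the function and the 100-share multiplier are illustrative; all it assumes is a get_stock-style method on the input):

```python
def total_value(holding, date):
    """Works with any object exposing get_stock(date); no type check."""
    try:
        price = holding.get_stock(date)
    except AttributeError:
        # Except as specifically as possible, then translate the failure.
        raise TypeError(f"{holding!r} does not provide get_stock()")
    return price * 100
```

Any object with a suitable get_stock works, whatever its class; unsuitable input still fails with a clear, specific error.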
Everything else that needed to be said about the actual question, whether it's common/good practice or not, has I think been answered excellently by David's answer.
I agree with some of the above answers, in that I generally never check for type from one function to another.
However, as someone else mentioned, anything accepted from a user should be checked, and for things like this I use regular expressions. The nice thing about using regular expressions to validate user input is that not only can you verify that the data is in the correct format, but you can parse the input into a more convenient form, like a string into a dictionary.
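A small sketch of that combination, validating a user-supplied date string and parsing it into a dict in one step (the pattern and function name are illustrative):

```python
import re

DATE_RE = re.compile(r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$")

def parse_date(text):
    match = DATE_RE.match(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    # groupdict() turns the named groups straight into a dictionary
    return {k: int(v) for k, v in match.groupdict().items()}

print(parse_date("2012-03-14"))   # {'year': 2012, 'month': 3, 'day': 14}
```

One regex both rejects malformed input and hands back the pieces in a convenient form, so no separate type check is needed.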
I have two functions like the following:
def fitnesscompare(x, y):
    if x.fitness > y.fitness:
        return 1
    elif x.fitness == y.fitness:
        return 0
    else:  # x.fitness < y.fitness
        return -1
that are used with 'sort' to sort on different attributes of class instances.
These are used from within other functions and methods in the program.
Can I make them visible everywhere rather than having to pass them to each object in which they are used?
Thanks
The best approach (to get the visibility you ask about) is to put this def statement in a module (say fit.py), import fit from any other module that needs access to items defined in this one, and use fit.fitnesscompare in any of those modules as needed.
What you ask, and what you really need, may actually be different...:
as I explained in another post earlier today, custom comparison functions are not the best way to customize sorting in Python (which is why in Python 3 they're not even allowed any more): rather, a custom key-extraction function will serve you much better (future-proof, more general, faster). I.e., instead of calling, say
somelist.sort(cmp=fit.fitnesscompare)
call
somelist.sort(key=fit.fitnessextract)
where
def fitnessextract(x):
    return x.fitness
or, for really blazing speed,
import operator
somelist.sort(key=operator.attrgetter('fitness'))
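For concreteness, a quick run of the key-based version (the Creature class is made up):

```python
import operator

class Creature:
    def __init__(self, name, fitness):
        self.name = name
        self.fitness = fitness

population = [Creature('a', 3), Creature('b', 1), Creature('c', 2)]
# Sort ascending by the fitness attribute, no comparison function needed:
population.sort(key=operator.attrgetter('fitness'))
print([c.name for c in population])   # ['b', 'c', 'a']
```

The same key also works with sorted(), and reverse=True gives descending order, which covers everything the cmp-style function did.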
Defining a function with def makes that function available within whatever scope you've defined it in. At module level, using def will make that function available to any other function inside that module.
Can you perhaps post an example of what is not working for you? The code you've posted appears to be unrelated to your actual problem.