Sorting with a custom method with additional parameter? - python

So I have a fitness function (returning only true or false for a given pair of arguments) which I would like to use as a key for sorting my list of possible arguments. While normally, I'd be able to do something like:
sorted(possibleArguments, key = fitnessFunction)
Here the problem is that my fitness function looks like this:
def fitnessFunction(arg1, arg2, f):
    return f(*arg1) < f(*arg2)
Of course, in the method where I want to do the sorting, the function used to calculate the fitness is known and doesn't change during the sort, but can I somehow tell Python that? Can I do something like:
sorted(possibleArguments, key = fitnessFunction(one element to be compared, the other one, function I'm currently interested in))
If so, how?

key does not take a comparison function; it converts an element of the list into a comparable value.
By the way, it's no longer possible to pass a comparison function to sort in Python 3 (and the __cmp__ method is gone from objects too), so you'd better get used to key. Comparison functions were cumbersome: you had to return 0 if equal, a negative number if lesser and a positive number if greater, a bit like strcmp does. Archaic. You could write complex comparison functions, but they could turn out to be inconsistent. I surely don't miss them.
Fortunately, you have the f() function, which is enough.
You just have to do this in your case:
sorted(possibleArguments, key = lambda x : f(*x))
The comparisons are done by the sort function itself, so there is no need for fitnessFunction.
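A minimal sketch of both routes, assuming a made-up f that multiplies the two values of each argument pair. The key version is what this answer recommends. If you really want to keep a pairwise comparison, functools.cmp_to_key can wrap it, with functools.partial fixing the extra f argument; note that cmp_to_key expects a negative, zero or positive result rather than the True/False that fitnessFunction returns, so the hypothetical compare_by_f below is adapted accordingly.

```python
from functools import cmp_to_key, partial

def f(a, b):
    # hypothetical fitness function taking the unpacked argument pair
    return a * b

possibleArguments = [(3, 4), (1, 2), (2, 5)]

# Recommended: a key function, so f is evaluated once per element
by_key = sorted(possibleArguments, key=lambda x: f(*x))

def compare_by_f(arg1, arg2, func):
    # old-style comparator: negative, zero or positive
    a, b = func(*arg1), func(*arg2)
    return (a > b) - (a < b)

# partial fixes func=f, leaving the two-argument comparator cmp_to_key expects
by_cmp = sorted(possibleArguments, key=cmp_to_key(partial(compare_by_f, func=f)))

print(by_key)  # [(1, 2), (2, 5), (3, 4)]
```

Both lists come out in the same order; the key version is simpler and avoids calling f twice per comparison.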

Related

Handle multiple returns from a function in Python

I wrote a function (testFunction) with four return values in Python:
diff1, diff2, sameCount, vennPlot
where the first 3 values (in the output tuple) were used to plot "vennPlot" inside of the function.
A similar question was asked (How can I plot output from a function which returns multiple values in Python?), but in my case I also want to know two additional things:
I will likely use this function later, and it seems I need to memorize the order of the return values so that I can extract the correct one for downstream work. Am I correct here? If so, are there better ways to refer to the elements of the returned tuple than output[1] or output[2]? (output=testFunction(...))
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the venn diagram outside of the function.)
Technically, every function returns exactly one value; that value, however, can be a tuple, a list, or some other type that contains multiple values.
That said, you can return an object that distinguishes its values by something other than their order. You can return a dict:
def testFunction(...):
    ...
    return dict(diff1=..., diff2=..., sameCount=..., venn=...)
x = testFunction(...)
print(x['diff1'])
or you can define a named tuple:
import collections
ReturnType = collections.namedtuple('ReturnType', 'diff1 diff2 sameCount venn')
def testFunction(...):
    ...
    return ReturnType(diff1=..., diff2=..., sameCount=..., venn=...)
x = testFunction(...)
print(x.diff1)  # or x[0], if you still want to use the index
To answer your first question, you can unpack tuples returned from a function like this:
diff1, diff2, samecount, vennplot = testFunction(...)
Second, there is nothing wrong with returning multiple outputs from a function, though for clarity's sake it is usually best to avoid multiple return statements within the same function where possible.
I will likely use this function later, and it seems I need to memorize the order of the return values so that I can extract the correct one for downstream work. Am I correct here?
It seems you're correct (depending on your use case).
If so, are there better ways to refer to the elements of the returned tuple than output[1] or output[2]? (output=testFunction(...))
You could use a namedtuple (see the collections.namedtuple documentation), or, if order is not important, you could just return a dictionary so that you can access the values by name.
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the venn diagram outside of the function.)
Sure. As long as it's documented, that is simply what the function does, and the programmer knows how to handle the return values.
Python supports direct unpacking into variables. So downstream, when you call the function, you can retrieve the return values into separate variables as simply as:
diff1, diff2, sameCount, vennPlot = testFunction(...)
EDIT: You can even "swallow" the ones you don't need. For example:
diff1, *stuff_in_the_middle, vennPlot = testFunction(...)
in which case stuff_in_the_middle will be a list holding the two middle values.
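A quick check of the starred unpacking, using a hypothetical testFunction that just returns four placeholder values. The starred name always receives a list, and the underscore is a common convention for values you want to ignore.

```python
def testFunction():
    # hypothetical stand-in returning four values
    return "diff1", "diff2", 7, "vennPlot"

diff1, *stuff_in_the_middle, vennPlot = testFunction()
print(stuff_in_the_middle)   # ['diff2', 7], a list rather than a tuple

# "_" marks values the caller does not need
_, _, sameCount, _ = testFunction()
```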
It is quite appropriate, AFAIK; even standard library functions return tuples.
For example, Popen.communicate() from the subprocess module returns a (stdout_data, stderr_data) tuple.

No cmp keyword for max function in python

So, when sorting a list in Python 2, the sorted function can take a cmp keyword to override the __cmp__ method of the objects being sorted.
I would expect max to have a similar keyword, but it doesn't. Does anyone know why?
And in any case, does anyone know the most pythonic way to get around this? I don't want to override __cmp__ for the classes themselves, and the other options I can think of, like sorted(L,cmp=compare)[0], seem ugly. What would be a nice way to do this?
The actual example: L=[a1,a2,...,an], where each ak is itself a list of integers, and we want the maximum, where ai<aj is meant in the lexicographical sense.
Don't use cmp. It has been removed from Python 3 in favor of key.
Since Python compares strings lexicographically, you could use key = str to find the "maximum" integer:
In [2]: max([10,9], key = str)
Out[2]: 9
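For the list-of-integer-lists case in the question, it may help to know that Python already compares lists lexicographically, so plain max(L) does the job. If an existing old-style comparison function must be reused, functools.cmp_to_key turns it into a key that max (or min, or sorted) accepts. A sketch with made-up data, where compare is a hypothetical stand-in for the question's comparator:

```python
from functools import cmp_to_key

L = [[1, 2, 3], [1, 3], [0, 9, 9, 9]]

# Lists compare lexicographically, so plain max() already answers the question
best = max(L)

def compare(a, b):
    # hypothetical old-style comparator: negative, zero or positive
    return (a > b) - (a < b)

# cmp_to_key adapts the comparator for any function that takes key=
best_via_cmp = max(L, key=cmp_to_key(compare))
```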

Delegate sort based on a condition

I have a list of objects that are pre-sorted based on some complex criteria that cannot easily be duplicated with, for example, attrgetter. I want to further sort a subset of them alphabetically, when both objects being compared have the property part_of_subset.
How do I do this without re-defining an alphabetic sort function?
def cmp(a, b):
    if a.part_of_subset and b.part_of_subset:
        # sort alphabetically -- must I duplicate alphabetic sort code?
        pass
    return 0
While you can define a comparison function for sorting, it is generally recommended to use a key function. For your application, this key function should return the same value for everything that should be left untouched, and the sort key for the rest. Example:
def my_key(a):
    if not a.part_of_subset:
        # identical keys: the stable sort leaves these elements in their current order
        return (0,)
    # subset members are ordered by their sort key
    return (1, a.sort_key)
collection.sort(key=my_key)
Note that the sorted subset will be grouped together into one block after the already sorted elements.
Edit: I updated the key function to remove the restriction that sort_key may never be None, and to make the code work in Python 3. The old version might also have given strange results if the sort keys were of different types (which does not seem too useful, but anyway).
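A runnable sketch of the key-function approach, using a hypothetical Item dataclass whose sort_key is simply its name. Subset members get the (1, sort_key) key, so they are the ones sorted alphabetically and placed after the untouched elements, while the others all share (0,) and keep their pre-sorted order because list.sort is stable.

```python
from dataclasses import dataclass

@dataclass
class Item:
    # hypothetical stand-in for the question's objects
    name: str
    part_of_subset: bool

    @property
    def sort_key(self):
        # alphabetical ordering for subset members
        return self.name

def my_key(a):
    if not a.part_of_subset:
        return (0,)            # identical keys: stable sort keeps the pre-sorted order
    return (1, a.sort_key)     # subset members sort by name and land after the rest

collection = [Item("d", False), Item("c", True), Item("a", False), Item("b", True)]
collection.sort(key=my_key)
print([item.name for item in collection])  # ['d', 'a', 'b', 'c']
```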
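You can delegate the sorting to another function under certain conditions by just saying return cmp(a, b). I'm referring to the builtin Python function cmp, not your cmp, so your own function needs a different name; otherwise the call refers to itself. Note that the builtin cmp exists only in Python 2.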

Python: Return tuple or list?

I have a method that returns either a list or a tuple. What is the most pythonic way of denoting the return type in the argument?
def names(self, section, as_type=()):
    return type(as_type)([m[0] for m in self.items(section)])
The pythonic way would be not to care about the type at all. Return a tuple, and if the calling function needs a list, then let it call list() on the result. Or vice versa, whichever makes more sense as a default type.
Even better, have it return a generator expression:
def names(self, section):
    return (m[0] for m in self.items(section))
Now the caller gets an iterable that is evaluated lazily, and can decide to iterate over it:
for name in obj.names(section):
    ...
or create a list or tuple from it from scratch. The caller never has to convert an existing list into a tuple or vice versa, so this is efficient in all cases:
mylist = list(obj.names(section))
mytuple = tuple(obj.names(section))
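A runnable sketch, assuming the class in the question behaves like configparser.ConfigParser, whose items(section) returns (name, value) pairs; the Config subclass and the sample section are hypothetical. It also shows the one catch with returning a generator: it can be consumed only once, so a caller that needs several passes should build a list or tuple first.

```python
import configparser

class Config(configparser.ConfigParser):
    # assumption: the question's class is (or acts like) a ConfigParser
    def names(self, section):
        # generator expression: names are produced lazily, one at a time
        return (m[0] for m in self.items(section))

cfg = Config()
cfg.read_string("[server]\nhost = example.org\nport = 8080\n")

names_list = list(cfg.names("server"))
names_tuple = tuple(cfg.names("server"))

gen = cfg.names("server")
first_pass = list(gen)
second_pass = list(gen)   # empty: the generator was already exhausted
```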
Return whatever the caller will want most of the time. If they will want to be able to sort, remove or delete items, etc. then use a list. If they will want to use it as a dictionary key, use a tuple. If the primary use will be iteration, return an iterator. If it doesn't matter to the caller, which it won't more often than you might think, then return whatever makes the code the most straightforward. Usually this will be a list or an iterator.
Don't provide your own way to convert the output to a given type. Python has a perfectly simple way to do this already and any programmer using your function will be familiar with it. Look at the standard Python library. Do any of those routines do this? No, because there's no reason to.
Exception: sometimes there's a way to get either an iterator or a list, even though it is easy to convert an iterator to a list. Usually this capability is provided as two separate functions or methods. You might want to follow suit sometimes, especially if you could implement the two alternatives using different algorithms that give some clear benefit to callers who want one or the other.
Keep it simple:
def names(self, section):
    """Returns a list of names."""
    return [m[0] for m in self.items(section)]
If the caller wants a tuple instead of a list, he does this:
names = tuple(obj.names(section))

custom comparison for built-in containers

In my code there are numerous equality comparisons of various containers (list, dict, etc.). The keys and values of the containers are of types float, bool, int, and str. The built-in == and != worked perfectly fine.
I just learned that the floats used in the values of the containers must be compared using a custom comparison function. I've written that function already (let's call it approxEqual(), and assume that it takes two floats and returns True if they are judged to be equal and False otherwise).
I prefer that the changes to the existing code are kept to a minimum. (New classes/functions/etc can be as complicated as necessary.)
Example:
if dict1 != dict2:
    raise DataMismatch
The dict1 != dict2 condition needs to be rewritten so that any floats used in values of dict1 and dict2 are compared using approxEqual function instead of __eq__.
The actual contents of the dictionaries come from various sources (parsing files, calculations, etc.).
Note: I asked a question earlier about how to override the built-in float's __eq__. That would have been an easy solution, but I learned that Python doesn't allow overriding built-in types' __eq__ operator. Hence this new question.
The only way to alter how built-in containers check equality is to make them contain, instead of the "original" values, wrapped values (wrapped in a class that overrides __eq__ and __ne__). This is needed if you want to change the way the containers themselves check equality: for example, for the in operator when the right-hand operand is a list, as well as in the containers' own methods such as their __eq__ (type(x).__eq__(x, y) is roughly how Python internally performs what you write as x == y).
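A sketch of the wrapped-value route, assuming approxEqual behaves like math.isclose (approx_equal and ApproxFloat are hypothetical names, and the tolerances are arbitrary). Subclassing float is one way to write the wrapper; the values must be wrapped wherever they are created (e.g. while parsing), after which unchanged comparisons such as dict1 != dict2 pick up the fuzzy rule.

```python
import math

def approx_equal(x, y):
    # hypothetical stand-in for the question's approxEqual()
    return math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12)

class ApproxFloat(float):
    """A float whose == and != use approx_equal (for container values, not keys)."""

    def __eq__(self, other):
        if isinstance(other, (int, float)):
            return approx_equal(float(self), float(other))
        return NotImplemented

    def __ne__(self, other):
        # float defines its own __ne__, so it has to be overridden explicitly too
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ sets __hash__ to None, so these objects cannot be dict keys;
    # fuzzy equality and hashing do not mix anyway.

dict1 = {"a": ApproxFloat(0.1 + 0.2), "b": "x"}
dict2 = {"a": ApproxFloat(0.3), "b": "x"}
mismatch = dict1 != dict2   # the comparison code itself is unchanged
```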
If what you're talking about is performing your own equality checks (without altering the checks performed internally by the containers themselves), then the only way is to change every cont1 == cont2 into (e.g.) same(cont1, cont2, value_same) where value_same is a function accepting two values and returning True or False like == would. That's probably too invasive WRT the criterion you specify.
If you can change the containers themselves (i.e., if the number of places where container objects are created is much smaller than the number of places where two containers are checked for equality), then using a container subclass that overrides __eq__ is best.
E.g.:
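class EqMixin(object):
    def __eq__(self, other):
        return same(self, other, value_same)
    def __ne__(self, other):
        # needed too, or != falls back to the built-in container's exact comparison
        return not self.__eq__(other)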
(with same being the function described in this answer's second paragraph) and
class EqM_list(EqMixin, list): pass
(and so forth for other container types you need), then wherever you have (e.g.)
x = list(someiter)
change it into
x = EqM_list(someiter)
and be sure to also catch other ways to create list objects, e.g. replace
x = [bah*2 for bah in buh]
with
x = EqM_list(bah*2 for bah in buh)
and
x = d.keys()
with
x = EqM_list(d.iterkeys())
and so forth.
Yeah, I know, what a bother, but it's a core principle (and practice;-) of Python that built-in types (be they containers, or value types like float) cannot themselves be changed. That's a very different philosophy from, e.g., Ruby's and JavaScript's (and I personally prefer it, but I do see how it can seem limiting at times!).
Edit: the OP's specific request seems to be (in terms of this answer) "how do I implement same" for the various container types, not how to apply it without changing == into a function call. If that's correct, then (e.g.), without using iterators for simplicity:
def samelist(a, b, samevalue):
    if len(a) != len(b):
        return False
    return all(samevalue(x, y) for x, y in zip(a, b))
def samedict(a, b, samevalue):
    if set(a) != set(b):
        return False
    return all(samevalue(a[x], b[x]) for x in a)
Note that this applies to values, as requested, NOT to keys. "Fuzzying up" the equality comparison of a dict's keys (or a set's members) is a REAL problem. Look at it this way: first, how do you guarantee with absolute certainty that samevalue(a, b) and samevalue(b, c) together imply samevalue(a, c)? This transitivity condition does not hold for most semi-sensible "fuzzy comparisons" I've ever seen, and yet it's completely indispensable for hash-table based containers (such as dicts and sets). If you get past that hurdle, the nightmare of making the hash values somehow "magically" consistent arises; and if two actually different keys in one dict "map to" equality, in this sense, with the same key in the other dict, which of the two corresponding values should be used then...? This way madness lies, if you ask me, so I hope that when you say values you do mean, exactly, values, and not keys!-)
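A self-contained sketch that combines samelist and samedict with a recursive samevalue, assuming approxEqual behaves like math.isclose (approx_equal is a hypothetical stand-in, and the tolerances are arbitrary). Floats are compared fuzzily, nested lists and dicts recurse, and everything else, including dict keys, is still compared exactly, in line with the warning about keys.

```python
import math

def approx_equal(x, y):
    # hypothetical stand-in for the question's approxEqual()
    return math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12)

def samelist(a, b, samevalue):
    if len(a) != len(b):
        return False
    return all(samevalue(x, y) for x, y in zip(a, b))

def samedict(a, b, samevalue):
    if set(a) != set(b):        # keys are still compared exactly
        return False
    return all(samevalue(a[k], b[k]) for k in a)

def samevalue(x, y):
    # floats compare fuzzily; nested lists/dicts recurse; everything else uses ==
    if isinstance(x, float) or isinstance(y, float):
        numeric = (int, float)
        return isinstance(x, numeric) and isinstance(y, numeric) and approx_equal(x, y)
    if isinstance(x, dict) and isinstance(y, dict):
        return samedict(x, y, samevalue)
    if isinstance(x, list) and isinstance(y, list):
        return samelist(x, y, samevalue)
    return x == y

dict1 = {"total": 0.1 + 0.2, "flags": [True, 3, "ok"], "nested": {"eps": 1e-20}}
dict2 = {"total": 0.3, "flags": [True, 3, "ok"], "nested": {"eps": 0.0}}
```

In the question's code, if dict1 != dict2: would then become if not samedict(dict1, dict2, samevalue):.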
