Delegate sort based on a condition - python

I have a list of objects that are pre-sorted by some complex criteria that cannot easily be duplicated with, for example, attrgetter. I want to further sort a subset of them alphabetically: two objects should be compared alphabetically only if both have the property part_of_subset.
How do I do this without redefining an alphabetic sort function?
def cmp(a, b):
    if a.part_of_subset and b.part_of_subset:
        # sort alphabetically -- must I duplicate alphabetic sort code?
        pass
    return 0

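While you can define a comparison function for sorting, a key function is generally recommended. For your application, the key function should return the same value for every element that should be left untouched, and the actual sort key for the rest. Because Python's sort is stable, elements with equal keys keep their existing order. Example:
def my_key(a):
    if not a.part_of_subset:
        # Untouched elements all share one key and keep their order.
        return 0,
    # Subset elements sort after them, by their own sort key.
    return 1, a.sort_key

collection.sort(key=my_key)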
Note that the subset that is sorted will be grouped together into one block after the already sorted elements.
Edited: To get rid of the restriction that sort_key may never be None, and to make the code work in Python 3, I updated the key function. The old version might also have led to strange results if the sort keys were of different types (which does not seem too useful, but anyway).
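As a runnable sketch of the key-function idea described above: the Item class and its name attribute are assumptions made for illustration (the question does not say what the objects look like). Elements outside the subset share the key (0,), so the stable sort leaves them in their pre-sorted order, and the subset follows as one alphabetical block.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str             # hypothetical attribute used as the sort key
    part_of_subset: bool


def my_key(item):
    # Elements outside the subset all get the same key, so the
    # stable sort keeps their existing relative order.
    if not item.part_of_subset:
        return (0,)
    # Subset elements come after that block, ordered by name.
    return (1, item.name)


items = [
    Item("zeta", False),
    Item("mango", True),
    Item("alpha", False),
    Item("apple", True),
]
items.sort(key=my_key)
names = [item.name for item in items]
print(names)  # ['zeta', 'alpha', 'apple', 'mango']
```

Returning tuples keeps the two groups from ever comparing a placeholder against a real sort key, which is why this also works when the keys are strings or None.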

You can delegate the sorting to another function under certain conditions by just saying return cmp(a, b). I'm referring to the built-in Python function cmp, not your cmp. (The built-in cmp exists only in Python 2; it was removed in Python 3.)

Related

How to elegantly iterate over a list or dictionary

I have a variable (an object from a database). In some cases it is a list, and in other cases a dictionary.
The standard for loop when the variable is a list:
for value in object_values:
    self.do_something(value)
The standard for loop when the variable is a dictionary:
for key, value in object_values.items():
    self.do_something(value)
I can use isinstance() to check the type, but then I still need two functions, or an if with two for loops. Currently an if condition calls one of two functions: one iterates over a list (e.g. iterate_list()) and the other over a dictionary (e.g. iterate_dict()).
Is there a more elegant, more Pythonic way to handle the fact that I don't know whether the variable will be a list or a dictionary?
In your case, since the data is either the items of the list or the values of the dictionary, you can use a conditional expression to pick values() or the iterable itself, depending on the type:
def iterate(self, object_values):
    for value in object_values.values() if isinstance(object_values, dict) else object_values:
        self.do_something(value)
If you pass a tuple, generator or other iterable, it falls back on "standard" iteration. If you pass a dictionary (or OrderedDict or another dict subclass), it iterates over the values.
Performance-wise, the conditional expression is evaluated only once, at the start of the iteration, so it's fine.
The isinstance check could even be replaced by hasattr(object_values, "values"), so that non-dict objects with a values member would match too.
(Note that you should be aware of the "least astonishment" principle. Some people may expect an iteration over the keys of the dictionary when calling the method.)
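The duck-typed variant mentioned above could look roughly like this. The helper name iter_values is made up for the sketch, and the callable() check is an added assumption: it guards against objects whose values attribute is not a method.

```python
def iter_values(object_values):
    """Yield dict values, or the items of any other iterable."""
    values = getattr(object_values, "values", None)
    if callable(values):
        # Mapping-like: dict, OrderedDict, or anything with values().
        yield from values()
    else:
        # List, tuple, generator, ...: plain iteration.
        yield from object_values


from_list = list(iter_values([1, 2, 3]))
from_dict = list(iter_values({"a": 1, "b": 2}))
from_gen = list(iter_values(x * 2 for x in range(3)))
print(from_list, from_dict, from_gen)  # [1, 2, 3] [1, 2] [0, 2, 4]
```

A caller can then write a single loop, for value in iter_values(object_values), regardless of which type came back from the database.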

Is there a general method for testing if the attributes of two objects are equivalent in Python?

I just wrote a testing script for a project I have in python, and I learned that checking if the values of two objects are equivalent is not as simple as foo == bar. For example, I have an object dashboard that has a pandas dataframe as an attribute, so I defined its __eq__ method as:
def __eq__(self, other):
    vals = [self.__dict__[k] == other.__dict__[k] for k in self.__dict__.keys()]
    vals = [v if isinstance(v, bool) else all(v) for v in vals]
    return all(vals)
It compares the dictionaries of each object, and if any of these comparisons yields something other than a boolean (e.g., a dataframe) it applies all() to reduce it to a single boolean. I then apply all() to this entire list of attribute comparisons to test whether or not every attribute of self and other are equivalent.
I used this __eq__ definition in several classes, and also used something similar for a comparison method in my parent Test class. I got my test to work, but I'm curious if there's a more elegant/efficient way to handle this. (Disclaimer: Testing is new to me, as well as OOP in general.)
No, there is not a general method for testing equality of objects.
This is partly because custom/arbitrary objects have widely varying interpretations of what equality means. For example, in your code you've defined equality to mean "the comparisons yield booleans and/or iterables of booleans, and all of them agree", but this rule ONLY covers the keys of self.__dict__: extra attributes on other are ignored, and an attribute missing from other raises a KeyError. Someone else might need to check that a dictionary has all the same keys but doesn't care about the values, or that all keys and all values are identical and of specific datatypes, etc. So, this is left to the user to implement.
To compare dataframes, since you mention this, there is:
DataFrame.equals(other)
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.equals.html
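One possible way to tidy up the __eq__ from the question is sketched below. Everything here is an assumption for illustration: the helper values_equal, the mixin AttrEqMixin, and the rule "use equals() when the attribute has one" are not from the question, and Frame is only a stand-in so the sketch runs without pandas (a real DataFrame also provides an equals method).

```python
def values_equal(a, b):
    # Assumption: objects exposing an equals() method (as pandas
    # DataFrames do) are compared with it instead of ==.
    if callable(getattr(a, "equals", None)):
        return bool(a.equals(b))
    return a == b


class AttrEqMixin:
    """Hypothetical mixin: equal when type and attributes match."""

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        mine, theirs = vars(self), vars(other)
        if mine.keys() != theirs.keys():
            return False
        return all(values_equal(mine[k], theirs[k]) for k in mine)


class Frame:
    """Stand-in for a DataFrame-like object (not pandas itself)."""

    def __init__(self, rows):
        self.rows = rows

    def equals(self, other):
        return isinstance(other, Frame) and self.rows == other.rows


class Dashboard(AttrEqMixin):
    def __init__(self, title, frame):
        self.title = title
        self.frame = frame


same = Dashboard("a", Frame([1, 2])) == Dashboard("a", Frame([1, 2]))
different = Dashboard("a", Frame([1, 2])) == Dashboard("a", Frame([9]))
print(same, different)  # True False
```

Comparing the key sets first avoids the KeyError the original version can raise, and putting the logic in one mixin avoids repeating it in several classes.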

Handle multiple returns from a function in Python

I wrote a function (testFunction) with four return values in Python:
diff1, diff2, sameCount, vennPlot
where the first three values (in the output tuple) were used to plot "vennPlot" inside the function.
A similar question was asked: How can I plot output from a function which returns multiple values in Python? In my case, though, I also want to know two additional things:
I will likely use this function later, and it seems I need to memorize the order of the returns so that I can extract the correct one for downstream work. Am I correct here? If so, are there better ways to refer to the tuple return than output[1] or output[2]? (output = testFunction(...))
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the Venn diagram outside the function.)
Technically, every function returns exactly one value; that value, however, can be a tuple, a list, or some other type that contains multiple values.
That said, you can return something that uses something other than just the order of values to distinguish them. You can return a dict:
def testFunction(...):
    ...
    return dict(diff1=..., diff2=..., sameCount=..., venn=...)

x = testFunction(...)
print(x['diff1'])
or you can define a named tuple:
import collections

ReturnType = collections.namedtuple('ReturnType', 'diff1 diff2 sameCount venn')

def testFunction(...):
    ...
    return ReturnType(diff1=..., diff2=..., sameCount=..., venn=...)

x = testFunction(...)
print(x.diff1)  # or x[0], if you still want to use the index
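A self-contained version of the named-tuple approach, for anyone who wants to run it. The compare function and its set arithmetic are invented for illustration (the body of the real testFunction is not shown in the question), and the plot is left out.

```python
from collections import namedtuple

# Field names follow the question; the plot field is omitted here.
Comparison = namedtuple("Comparison", "diff1 diff2 same_count")


def compare(left, right):
    # Hypothetical stand-in for testFunction.
    left, right = set(left), set(right)
    return Comparison(
        diff1=sorted(left - right),
        diff2=sorted(right - left),
        same_count=len(left & right),
    )


result = compare([1, 2, 3], [2, 3, 4, 5])
print(result.diff1)       # [1]
print(result.same_count)  # 2
print(result[1])          # [4, 5]
```

Access by name means callers no longer depend on the position of each value, while index access and tuple unpacking still work.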
To answer your first question, you can unpack tuples returned from a function as such:
diff1, diff2, samecount, vennplot = testFunction(...)
Secondly, there is nothing wrong with multiple outputs from a function, though using multiple return statements within the same function is typically best avoided if possible for clarity's sake.
"I will likely use this function later, and it seems I need to memorize the order of the returns so that I can extract the correct one for downstream work. Am I correct here?"
It seems you're correct (it depends on your use case).
"If so, are there better ways to refer to the tuple return than output[1] or output[2]? (output = testFunction(...))"
You could use a namedtuple (see the collections.namedtuple documentation)
or, if order is not important, you could just return a dictionary, so you can access the values by name.
"Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the Venn diagram outside the function.)"
Sure. As long as it's documented, it is simply what the function does, and the programmer then knows how to handle the return values.
Python supports direct unpacking into variables. So downstream, when you call the function, you can retrieve the return values into separate variables as simply as:
diff1, diff2, sameCount, vennPlot = testFunction(...)
EDIT: You can even "swallow" the ones you don't need (Python 3 only). For example:
diff1, *stuff_in_the_middle, vennPlot = testFunction(...)
in which case stuff_in_the_middle will be a list containing the two middle values.
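A small runnable check of the unpacking forms above. The function four_values and its return values are placeholders standing in for testFunction.

```python
def four_values():
    # Hypothetical stand-in for testFunction.
    return "d1", "d2", 3, "plot"


# Plain unpacking: one name per returned value.
diff1, diff2, same_count, venn_plot = four_values()

# Starred unpacking: the starred name collects the rest as a list.
first, *middle, last = four_values()

print(diff1, venn_plot)       # d1 plot
print(middle)                 # ['d2', 3]
print(type(middle).__name__)  # list
```

Plain unpacking raises a ValueError if the number of names does not match the number of returned values, which is a useful early signal when the function's return shape changes.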
It is quite appropriate AFAIK; even standard library modules return tuples.
For example, Popen.communicate() from the subprocess module returns a (stdout_data, stderr_data) tuple.

Sorting with a custom method with additional parameter?

So I have a fitness function (returning only True or False for a given pair of arguments) which I would like to use as a key for sorting my list of possible arguments. Normally, I'd be able to do something like:
sorted(possibleArguments, key = fitnessFunction)
Here the problem is that my fitness function looks like this:
def fitnessFunction(arg1, arg2, f):
    return f(*arg1) < f(*arg2)
Of course, in the method where I want to do the sorting, the function used to calculate the fitness is known and doesn't change during the sort. Can I somehow tell Python that this is the case? Can I do something like:
sorted(possibleArguments, key = fitnessFunction(one element to be compared, the other one, function I'm currently interested in))
If so, how?
key does not take a comparison function; it converts each element of the list into a value that can be compared.
BTW, in Python 3 it's no longer possible to pass a comparison function directly to sort (and the __cmp__ method is gone from objects too), so you'd better get used to key functions. Comparison functions were cumbersome: you had to return 0 if equal, a negative number if lesser and a positive number if greater, a bit like strcmp does. Archaic. You could create complex comparison functions, but they could prove unstable. I surely don't miss them.
Fortunately, you have the f() function, which is enough.
In your case you just have to do this:
sorted(possibleArguments, key=lambda x: f(*x))
The comparisons are done by the sort function itself. No need for fitnessFunction.
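A runnable sketch of the key-based sort above. The scoring function f and the sample data are assumptions for illustration. The second half shows functools.cmp_to_key from the standard library, which wraps an old-style comparison function for the rare cases where one is really needed.

```python
from functools import cmp_to_key


def f(x, y):
    # Hypothetical scoring function; any function returning
    # comparable values would do.
    return x * x + y * y


possible_arguments = [(3, 4), (1, 1), (0, 2)]

# Key function: each element is mapped to f(*element) once,
# and the sort compares those values itself.
ranked = sorted(possible_arguments, key=lambda args: f(*args))
print(ranked)  # [(1, 1), (0, 2), (3, 4)]


def compare(a, b):
    # Old-style comparator: negative, zero or positive.
    return (f(*a) > f(*b)) - (f(*a) < f(*b))


ranked_cmp = sorted(possible_arguments, key=cmp_to_key(compare))
print(ranked_cmp == ranked)  # True
```

The key version calls f once per element, while the comparator version calls it several times per comparison, so the key form is usually the cheaper of the two.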

Return multiple vars: list/tuple

I have a function which must return many values (statistics) for other functions to work with. So I thought about returning them inside a list (array). But then I wondered: should I use a list (["foo", "bar"]) or a tuple (("foo", "bar"))? What are the problems or differences when using one instead of the other?
Use a tuple. In your application, it doesn't seem like you will want or need to change the collection of results afterwards.
Though, with many return values, you might want to consider returning a dictionary with named values. That approach is more flexible and extensible, as adding a new statistic doesn't require modifying every place where you call the function.
If you do not need to edit the return value, use a tuple. The main difference is that lists can be edited.
See this: What's the difference between lists and tuples?
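To make the trade-off concrete, here is a minimal sketch with made-up statistics functions (stats_tuple and stats_dict are hypothetical names). The tuple version is fixed in size and order and cannot be modified in place; the dictionary version can gain new entries without breaking callers that look values up by name.

```python
def stats_tuple(values):
    # Fixed order: (minimum, maximum, mean).
    return min(values), max(values), sum(values) / len(values)


def stats_dict(values):
    # Named values: callers pick what they need by key.
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


low, high, mean = stats_tuple([2, 4, 6])
print(low, high, mean)  # 2 6 4.0

result = stats_tuple([2, 4, 6])
try:
    result[0] = 0  # tuples do not support item assignment
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False

named = stats_dict([2, 4, 6])
print(named["mean"])  # 4.0
```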
