Avoid flaky unit tests due to non-deterministic dict order

Avoid flaky unit tests due to non-deterministic dict order - python

Consider a function that looks similar to this one
def get_sorted_values():
return sorted(get_unsorted_values(), key=some_sort_key)
Basically it returns an unsorted list from somewhere and sorts it before.
How would I reliably unit-test such a function? Of course I can assert that the resulting list has the expected order, but that would not reliably enforce the sorting because get_unsorted_values() might deliver it sorted in the expected way by coincidence.
So there is a certain chance that the unit test will pass, even when I drop the sorted call.
One possibility is to mock the get_unsorted_values() function so that the order is always wrong but I would like to avoid mocking.
What other options are there?
P.S. The root cause for non deterministic behavior is that the values come from a dict which has the non-deterministic order.

Related

Repeatably hashing an arbitrary Python tuple

I'm writing a specialised unit testing tool that needs to save the results of tests to be compared against in the future. Thus I need to be able to consistently map parameters that were passed to each test to the test result from running the test function with those parameters for each version. I was hoping there was a way to just hash the tuple and use that hash to name the files where I store the test results.
My first impulse was just to call hash() on the tuple of parameters, but of course that won't work since hash is randomized between interpreter instances now.
I'm having a hard time coming up with a way that works for whatever arbitrary elements that might be in the tuple (I guess restricting it to a mix of ints, floats, strings, and lists\tuples of those three would be okay). Any ideas?
I've thought of using the repr of the tuple or pickling it, but repr isn't guaranteed to produce byte-for-byte same output for same input, and I don't think pickling is either (is it?)
I've seen this already, but the answers are all based on that same assumption that doesn't hold anymore and don't really translate to this problem anyway, a lot of the discussion was about making the hash not depend on the order items come up and I do want the hash to depend on order.

Not sure if I understand your question fully, but will just give it a try.
Before you do the hash, just serialize the result to a JSON string, and do the hash computing on your JSON string.
params = (1, 3, 2)
hashlib.sha224(json.dumps(params)).hexdigest()
# '5f0f7a621e6f420002d54ee28b0c169b8112ef72d8a6b60e6a25171c'
If your params is a dictionary, use sort_keys=True to ensure your keys are sorted.
params = {'b': 123, 'c': 345}
hashlib.sha224(json.dumps(params, sort_keys=True)).hexdigest()
# '2e75966ce3f1185cbfb4eccc49d5552c08cfb7502a8765fe1dce9303'

One approach for simple tests would be to disable the hash randomization entirely by setting PYTHONHASHSEED=0 in the environment that launches your script, e.g., in bash, doing:
export PYTHONHASHSEED=0

How to force dicts to be unordered (for testing)?

I've just spent half a day tracing down a bug to a dict that I forgot to sort when iterating over it. Even though that part of code is tested, the tests did not pick it up because that dict had a repeatable ordering during the test. Only when I shuffled the dict did the tests fail! (I used an intermediate random.shuffle'd list and built an OrderedDict)
This kinda scares me, because there might be similar bugs all over the place!
Is there any way to globally force all dicts to be unordered during testing?
Update: I think I at least figured out what caused the bug. As is described here, dicts with int keys are usually sorted, but might not always be. My hypothesis: In my tests, the ints are always small (order 10), and thus always in order. In the real run, however, the ints are much bigger (order 10^13), and thus not always ordered. I was able to reproduce this behaviour in an interactive session: list(foo.keys()) == sorted(foo.keys()) was always True for small keys, but not for every dict with very large keys.

As of 3.6, dicts maintain key insertion order. If you would like them to behave a specific way, you could create a class that inherits from a dict and give it the behavior you want (eg ensuring that it shuffles the key list before returning it).
Regardless of the version of python you use, be sure to check out the implementation details if you want to try to rely on a specific behavior of the dict implementation. Better yet, for portability, just code a solution that ensures they behave as you expect, either as I mentioned above, or by using shuffle over a list of the keys, or something else.

Python Style: nested vs extra function

I'm quite new to python (2.7) and have a question about what's the most Pythonic way to do something; my code (part of a class) Looks like this (a somewhat naive Version):
def calc_pump_height(self):
for i in range(len(self.primary_)):
for j in range(len(self.primary_)):
if self.connections_[i][j].sub_kind_ in [1,4]:
self.calc_spec_pump_height(i,j)
def calc_spec_pump_height(self,i,j):
pass
(obviously pass will be replaced by something else, manipulating attributes of the object of this class, without generating a return value)
I'd like to ask how I should do this: I could avoid the second function and write the extra code directly into the first function, getting rid of one function (Simple is better than complex), but creating a heavily nested function at the same time (Flat is better than nested).
I could also create some sort of list comprehension to avoid using a double Loop, eg:
def calc_pump_height(self):
ra = range(len(self.primary_))
[self.calc_spec_pump_height(i,j) for i,j in zip(ra, ra)]
(I'd have to move the if condition into the 2nd function; this would also create a null-list but I don't care about this, since calc_spec_pump_height is supposed to manipulate the object, not return something useful)
In essence: I'm iterating over a 2D list, testing each object for a certain characteristic and then do something with that object.
Which of the above methods is 'the best'? Or is there another way that I'm missing?

The key thing about functions/methods is that they should do one thing.
calc_pump_height implements two things: It finds elements in a 2D list that match some criteria, and then it calculates a value for each of those elements. It's ok for its purpose to be combining the other two operations, if that makes sense for the object's public API, but its not ok for it to implement either or both.
Finding the elements that match the criteria is a discrete step; that should be a function.
Calculating your value is clearly a discrete step; that should be a function.
I would implement the element matcher as a (private) generator, that takes the test condition as an argument, and yields all matching elements. It's just an iterator over your data structure, masked by the logical test. You can wrap that in a named public method called get_1_4_subkinds() or something that makes more sense in your domain. That generalises the code and gives you the flexibility to implement other conditions in the future. Also, your i and j are tightly coupled, so it makes sense to pass them around as a single concept. Then your code becomes:
def calc_pump_height(self):
for subkind_indices in self.get_1_4_subkinds():
self.calc_pump_spec_height(subkind_indices)

You have misunderstood “simplicity”:
write the extra code directly into the first function, getting rid of one function (Simple is better than complex)
That's not simple. Breaking complex sequences into discrete, focussed functions increases simplicity.
In that light, I would say that yes, you should definitely prefer calc_spec_pump_height as a separate function.

You can eliminate one level of nesting in your first function by using itertools.product to generate your i and j values at the same time (itertools.product(range(len(self.primary_)), repeat=2). The zip you use in the your second version won't work correctly, it will only yield identical pairs, 0,0, 1,1, 2,2, etc.
As for the overall design, you should not use a list comprehension if you don't care about the return value from the function you're calling. Use an explicit loop when it's the looping you want (rather than a list of computed values).
If there's a non-trivial amount of code that will go in calc_spec_pump_height, it makes perfect sense to make it as a separate method. If it's a one or two liner, then it might be OK to inline within calc_pump_height, but that method's loops and condition testing may be complicated enough already to justify factoring out the inner part of the algorithm.
You should usually think about splitting a big function up when it is too long to fit onto a single screen in your editor. That is about the limit of how many details (variable names, etc.) we can keep in our mind simultaneously. On the other hand, you shouldn't waste time (either your own programming time or function call overhead at run time) by factoring out every little piece of every problem. Factor part of a function out if you're using it from more than one place, or if you can't keep the details of the whole function in your head at once otherwise.
So, other than the (marginal) improvement of itertools.product and given the limited information you've provided about what calc_spec_pump_height will do, I think your code is already about as good as it can get!

A long definition of an object inside a Python unit test

I'm unit testing my application. What most of the tests do is calling a function with specific arguments and asserting the equality of the return value with an expected value.
In some tests the expected return value is a relatively big object. One of them, for example, is a dictionary which maps 5 strings to lists of tuples. It takes 40-50 repetitive lines of code to define that object, but that object is an expected value of one of the functions I'm testing. I don't want to have a 40-50 lines of code defining an expected return value inside a test function because most of my test functions consist of 3-6 lines of code. I'm looking for a best practice for such situations. What is the right way of putting lengthy definitions inside a test?
Here are the ideas I was thinking of to address the issue, ranked from the best to the worst as I see it:
Testing samples of the object: Making a few equality assertions based on a subset of the keys. This will sacrifice the thoroughness of the test for the sake of code elegance.
Defining the object in a separate module: Writing the lengthy 40-50 lines of code in a separate .py file, importing the module in the test and then make the equality assertion. This will make the test short and concise but I don't like having a separate file as a supplement to a test; the object definition is part of the test after all.
Defining the object inside the test function: This is the trivial solution which I wish to avoid. My tests are pretty simple and straightforward and the lengthy definition of that object doesn't fit.
Maybe I'm too obsessed with clean code, but I like none of the above solutions. Is there another common practice I haven't thought of?

I'd suggest using a separation of testing code and testing data. For this reason I usually create an abstract base class which contains the methods I'd like to test and create several specific test case classes to tie the methods to the data. (I use the Django framework, so all abstract test classes I put into testbase.py):
testbase.py:
class TestSomeFeature(unittest.TestCase):
test_data_A = ...
def test_A(self):
... #perform test
and now the implementations in test.py
class TestSomeFeatureWithDataXY(testbase.TestSomeFeature):
test_data_A = XY
The test data can also be externalized, e.g a JSON file:
class TestSomeFeatureWithDataXYZ(testbase.TestSomeFeature):
#property
def test_data_A(self):
return json.load("data/XYZ.json")
I hope I made my points clear enough. In your case I'd strongly opt for using data files. Django supports this out-of-the-box by using test fixtures to be loaded into the database prior executing any tests.

It really depends on what you want to test.
If you want to test that a dictionary contains certain keys with certain values, then I would suggest separate assertions to check each key. This way your test will still be valid if the dictionary is extended, and test failures should clearly identify the problem (an error message telling you that one 50-line long dictionary is not equal to a second 50 line long dictionary is not exactly clear).
If you really do want to verify that the dictionary contains only the given keys, then a single assertion might be appropriate. Define the object you are comparing against where it is most clear. If defining it in a separate file (as Constantinius's answer suggests) makes things more readable then consider doing that.
In both cases, the guiding principle is to only test the behaviour you care about. If you test behaviour you don't care about, you may find your test suite more obstructive than helpful when refactoring.

Python documentation: iterable many times?

In documenting a Python function, I find it more Pythonic to say:
def Foo(i):
"""i: An interable containing…"""
…rather than…
def Foo(i):
"""i: A list of …"""
When i really doesn't need to be a list. (Foo will happily operate on a set, tuple, etc.) The problem is generators. Generators typically only allow 1 iteration. Most functions are OK with generators or iterables that only allow a single pass, but some are not.
For those functions that cannot accept generators/things that can only be iterated once, is there a clear, consistent Python term to say "thing that can only be iterated more than once"?
The Python glossary for iterable and iterator seem to have a "once, but maybe more if you're lucky" definition.

I don't know of a standard term for this, at least not offhand, but I think "reusable iterable" would get the point across if you need a short phrase.
In practice, it's generally possible to structure your function so that you don't need to iterate over i more than once. Alternatively, you can create a list out of the iterable and then iterate over the list as many times as you want; or you can use itertools.tee to get multiple independent "copies" of the iterator. That lets you accept a generator even if you do need to use it more than once.

This is probably more a matter of style and preference than anything else, yet... I have a different take on my documentation: I always write the docstring according to the expected input in the context of the program.
Example: if I wrote a function that expect to go over keys of a dictionary and ignore its values I write:
arg : a dictionary of...
even if for e in arg: would work with other iterables. I chose to do so, because within the context of my code, I don't care if the function would still work... I care more that whoever reads the documentation understand how that function is meant to be used.
On the other hand, if I am writing a utility function that can cope with a wide spectrum of iterables by design, I go one of these two ways:
document what kind of exception will be rose under certain conditions [ex: "Raise TypeError if the iterable can't be iterated more than once"]
perform some pre-emptive argument handling that will make the function compatible with 'once-only' iterables.
In other words, I try to either make my function solid enough to handle edge cases, or to be very outspoken on its limitations.
Again: there's nothing wrong with the approach you want to take, but I consider this one of the cases in which "explicit is better than implicit": a documentation in which is mentioned "reusable iterable" is definitively accurate, but the adjective could easily be overlooked.
HTH!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Avoid flaky unit tests due to non-deterministic dict order - python

Related

Repeatably hashing an arbitrary Python tuple

How to force dicts to be unordered (for testing)?

Python Style: nested vs extra function

A long definition of an object inside a Python unit test

Python documentation: iterable many times?

Categories

Resources