pytest - best design for fuzzing with limited parameters - python

I was wondering about fuzzing in pytest and what is the best way to do that.
In the past I used the Hypothesis library to fuzz values, but it works best only when each test is run many times.
Because my system is slow I want to be able to split the tests into 2 categories: "daily_run" and "regression":
daily_run will run each test 1 time
regression will run each test X times
On each run I want to use random values. The problem is that the test parameters have a "valid" range I want to stay in when fuzzing. For example:
@pytest.mark.parametrize("month_number", [4])
def test_foo(month_number):
    # Test with that value
So in that example I get a fixed value for the month number: 4. I'll give another example before explaining what I have tried:
@pytest.mark.parametrize("month_number", [40])
def test_invalid_foo(month_number):
    # Test with that invalid value
So in the second example I test with an invalid value.
The valid range for the month number is obviously 1-12. I guess I can write some logic that picks a random value between 1 and 12 for a valid month, and a random value below 1 or above 12 for an invalid one. But that is a very verbose way to do it for month_number alone. In reality I have dozens of parameters and I don't want to write that kind of functionality for each one.
Of course I can write some generic logic to do so, and use that generic logic on every parameter, but I still wonder if there is a better way.
Also, don't forget the two categories: daily_run and regression.
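For reference, this is roughly the generic, Hypothesis-based logic I have in mind so far (the profile names, the TEST_PROFILE variable and the strategies below are just placeholders of mine):

import os
from hypothesis import given, settings, strategies as st

# Settings profiles model the two categories: "daily_run" runs each test once,
# "regression" runs each test many times.
settings.register_profile("daily_run", max_examples=1)
settings.register_profile("regression", max_examples=100)
settings.load_profile(os.getenv("TEST_PROFILE", "daily_run"))

# Strategies encode the valid/invalid ranges for one parameter.
valid_month = st.integers(min_value=1, max_value=12)
invalid_month = st.integers(max_value=0) | st.integers(min_value=13)

@given(month_number=valid_month)
def test_foo(month_number):
    ...  # test with a random valid value

@given(month_number=invalid_month)
def test_invalid_foo(month_number):
    ...  # test with a random invalid value

This runs, but it is the sort of per-parameter setup I would like to avoid repeating for dozens of parameters.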
What is the best practice for writing fuzzed tests with parameter limits?

For Python fuzzing, take a look at the new Atheris fuzzer. Like libFuzzer, it gives you a FuzzedDataProvider to feed inputs to your target. By default it feeds raw bytes, but it can also be configured to produce other types, for example:
def ConsumeIntList(count: int, bytes: int)
def ConsumeIntListInRange(count: int, min: int, max: int)
Your categories of targets are confusing at best. Usually you fuzz a harness/target for a certain amount of time and potentially reuse the folder of test cases it produces.
Or you have a set of harnesses/targets that you fuzz often and some not so often (but I doubt that would be a good idea).
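A minimal harness using that provider might look roughly like this (check_month is only a stand-in for whatever code is actually under test):

import sys
import atheris

def check_month(month_number):
    # placeholder for the real code under test
    assert 1 <= month_number <= 12

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    # fuzzed input, but constrained to the valid range
    month_number = fdp.ConsumeIntInRange(1, 12)
    check_month(month_number)

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()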

Best practices for unit testing in Python - multiple functions apply to same object

I have a bunch of functions which apply to a similar object, for example a Numpy array which represents an n-dimensional box:
# 3-D box parameterized as:
# box[0] = 3-D min coordinate
# box[1] = 3-D max coordinate
box = np.array([
    [1, 3, 0],
    [4, 5, 7]
])
Now I have a whole bunch of functions that I want to run on lists of boxes, e.g. volume, intersection, smallest_containing_box, etc. In my mind, here is the way I was hoping to set this up:
# list of test functions:
test_funcs = [volume, intersection, smallest_containing_box, ...]
# manually create a bunch of inputs and outputs
test_set_1 = {
    "input": [boxA, boxB, ...],  # where each of these is an np.array object
    "output": [
        [volA, volB, ...],           # floats I calculated manually
        intersection,                # np.array representing the correct intersection
        smallest_containing_box,     # etc.
    ],
}
# Create a bunch of these, e.g. test_set_2, test_set_3, etc. and bundle them in a list:
test_sets = [test_set_1, ...]
# Now run the set of tests over each of these:
test_results = [
    [assertEqual(func(t["input"]), expected)
     for func, expected in zip(test_funcs, t["output"])]
    for t in test_sets
]
The reason I want to structure it this way is so that I can create multiple sets of (input, answer) pairs and just run all the tests over each. Unless I'm missing something, the structure of unittest doesn't seem to work well with this approach. Instead, it seems like it wants me to create an individual TestCase object for each pair of function and input, i.e.
class TestCase1(unittest.TestCase):
    def setUp(self):
        self.input = [...]
        self.volume = [volA, volB, ...]
        self.intersection = ...
        # etc.

    def test_volume(self):
        self.assertEqual(volume(self.input), self.volume)

    def test_intersection(self):
        self.assertEqual(intersection(self.input), self.intersection)

    # etc.

# Repeat this for every test case!?
This seems like a crazy amount of boilerplate. Am I missing something?
Let me try to describe how I understand your approach: you have implemented a number of different functions that have a similarity, namely that they operate on the same types of input data. In your tests you try to make use of that similarity: you create some input data and pass that input data to all of your functions.
This test-data-centric approach is unusual. The typical unit-testing approach is code-centric. The reason is that one primary goal of unit-testing is to find bugs in the code. Different functions have (obviously) different code, and therefore the types of bugs may be different. Thus, test data is typically carefully designed to identify certain kinds of bugs in the respective code. Test design methods are approaches that methodically design test cases such that ideally all likely bugs would be detected.
I am sceptical that with your test-data-centric approach you will be equally successful in finding the bugs in your different functions: for the volume function there may be overflow scenarios (and also underflow scenarios) that don't apply to intersection or smallest_containing_box. In contrast, for intersection there will have to be empty intersections, one-point intersections, etc. Thus it seems that each of the functions probably needs specifically designed test scenarios.
Regarding the boilerplate code that seems to be the consequence of code-centric unit-testing: there are several ways to limit it. You would, agreed, have different test methods for different functions under test. But you could then use parameterized tests to avoid further code duplication. And, for the case that you still see an advantage in using (at least sometimes) common test data for the different functions: you can use factory functions that create the test data and can be called from the different test cases. For example, you could have a factory function make_unit_cube to be used from different tests, as sketched below.
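A minimal sketch of that idea, combining a factory function with unittest's subTest for the parameterization (make_unit_cube, the stand-in volume implementation and the expected values are illustrative only):

import unittest
import numpy as np

def make_unit_cube():
    # factory function: axis-aligned unit cube from (0, 0, 0) to (1, 1, 1)
    return np.array([[0, 0, 0], [1, 1, 1]])

def volume(box):
    # stand-in for the function under test
    return float(np.prod(box[1] - box[0]))

class TestVolume(unittest.TestCase):
    def test_volume_of_known_boxes(self):
        cases = [
            (make_unit_cube(), 1.0),
            (np.array([[0, 0, 0], [2, 3, 4]]), 24.0),
        ]
        for box, expected in cases:
            with self.subTest(box=box.tolist()):
                self.assertEqual(volume(box), expected)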
Try unittest.TestSuite(). That gives you an object where you can add test cases. In your case, create the suite, then loop over your lists, creating instances of TestCase which all have only a single test method. Pass the test data to the constructor and save them to properties there instead of in setUp().
The unit test runner will detect the suite when you create it in a method called suite() and run all of them.
Note: Assign a name to each TestCase instance or finding out which failed will be very hard.
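A sketch of that pattern, with a trivial stand-in function so it runs on its own (the names are placeholders; passing a name to the constructor is what makes failures readable):

import unittest

def double(values):
    # stand-in for one of the functions under test
    return [v * 2 for v in values]

class SingleFunctionCase(unittest.TestCase):
    def __init__(self, func, test_input, expected, name):
        super().__init__("run_case")  # the single test method this case runs
        self.func = func
        self.test_input = test_input
        self.expected = expected
        self._name = name

    def run_case(self):
        self.assertEqual(self.func(self.test_input), self.expected)

    def __str__(self):
        return self._name  # shows up in the runner's failure output

def suite():
    s = unittest.TestSuite()
    s.addTest(SingleFunctionCase(double, [1, 2], [2, 4], name="double_small_list"))
    return s

if __name__ == "__main__":
    unittest.TextTestRunner().run(suite())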

Testing with manual calculation or programmed calculation?

I'm using Django and I need to write a test that requires calculation.
Is it best practice to calculate the expected value manually, or is it OK to do it using the sum function (see below)?
This example is easier for me because I don't have to calculate something manually:
def test_balance(self):
    amounts = [120.82, 90.23, 89.32, 193.92]
    for amount in amounts:
        self.mockedTransaction(amount=amount)
    total = Balance.total()
    self.assertEqual(total, sum(amounts))
Or in this example I have to calculate the expected value manually:
def test_balance(self):
    self.mockedTransaction(amount=120.82)
    self.mockedTransaction(amount=90.23)
    self.mockedTransaction(amount=89.32)
    self.mockedTransaction(amount=193.92)
    total = Balance.total()
    self.assertEqual(total, 494.29)
Does your total function just use sum to get the sum of a list of numbers? If so, your test is performing the same steps as your code under test. In that case, the first test can't fail. It would be better to use a manually generated value. (If total does just wrap sum, then I wouldn't spend a lot of time worrying about it, though. That function has been thoroughly tested.)
If Balance.total() is getting its value by some other method (like a SQL query run on the database), then it's fine to compute the expected value in your test method, particularly in simple cases like summing a list of values. If the computation is very complex, then you probably want to go back to manually calculated values. Otherwise your test code might be just as difficult to debug as your code under test.

How to use a random seed value in order to unittest a PRNG in Python?

I'm still pretty new to programming and just learning how to unittest. I need to test a function that returns a random value. I've so far found answers suggesting the use of a specific seed value so that the 'random' sequence is constant and can be compared. This is what I've got so far:
This is the function I want to test:
import random

def roll():
    '''Returns a random number in the range 1 to 6, inclusive.'''
    return random.randint(1, 6)
And this is my unittest:
class Tests(unittest.TestCase):
    def test_random_roll(self):
        random.seed(900)
        seq = random.randint(1, 6)
        self.assertEqual(roll(), seq)
How do I set the corresponding seed value for the PRNG in the function so that it can be tested without writing it into the function itself? Or is this completely the wrong way to go about testing a random number generator?
Thanks
The other answers are correct as far as they go. Here I'm answering the deeper question of how to test a random number generator:
Your provided function is not really a random number generator, as its entire implementation depends on a provided random number generator. In other words, you are trusting that Python provides you with a sensible random generator. For most purposes, this is a good thing to do. If you are writing cryptographic primitives, you might want to do something else, and at that point you would want some really robust test strategies (but they will never be enough).
Testing that a function returns a specific sequence of numbers tells you virtually nothing about the correctness of your function in terms of "producing random numbers". A predefined sequence of numbers is the opposite of a random sequence.
So, what do you actually want to test? For the roll function, I think you'd like to test:
1. That given 'enough' rolls it produces all the numbers between 1 and 6, preferably in 'approximately' equal proportions.
2. That it doesn't produce anything else.
The problem with 1. is that your function is defined to be a random sequence, so there is always a non-zero chance that any hard limits you put in to define 'enough' or 'approximately equal' will occasionally fail. You could do some calculations to pick limits that make your test unlikely to fail more than, e.g., 1 in a billion times, or you could slap in a random.seed() call, which means the test will never fail if it passes once (unless the underlying implementation from Python changes).
Item 2. could be 'tested' more easily: generate some large number N of rolls and check that every one of them is within the expected range.
For all of this, however, I'd ask what value the unit tests actually are. You literally cannot write a test to check whether something is 'random' or not. To see whether the function has a reasonable source of randomness and uses it correctly, tests are useless - you have to inspect the code. Once you have done that, it's clear that your function is correct (providing Python provides a decent random number generator).
In short, this is one of those cases where unit tests provide extremely little value. I would probably just write one test (item 2 above), and leave it at that.
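A sketch of that single test (item 2), assuming roll can be imported from the module under test (the module name here is a placeholder):

import unittest
from collections import Counter

from dice import roll  # placeholder: wherever roll() is actually defined

class TestRoll(unittest.TestCase):
    def test_roll_only_produces_values_from_1_to_6(self):
        # a large sample should contain nothing outside 1..6
        counts = Counter(roll() for _ in range(10_000))
        self.assertTrue(set(counts).issubset(range(1, 7)))

if __name__ == "__main__":
    unittest.main()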
By seeding the PRNG with a known seed, you know which sequence it will produce, so you can test for that sequence:
class Tests(unittest.TestCase):
    def test_random_roll(self):
        random.seed(900)
        self.assertEqual(roll(), 6)
        self.assertEqual(roll(), 2)
        self.assertEqual(roll(), 5)

How should I use @pm.stochastic in PyMC?

Fairly simple question: how should I use @pm.stochastic? I have read some blog posts claiming that @pm.stochastic expects a negative log value:
@pm.stochastic(observed=True)
def loglike(value=data):
    # some calculations that generate a numeric result
    return -np.log(result)
I tried this recently but got really bad results. Since I also noticed that some people used np.log instead of -np.log, I gave it a try and it worked much better. What is @pm.stochastic really expecting? I'm guessing there was some confusion about the required sign because a very popular example uses something like np.log(1/(1+t_1-t_0)), which was written as -np.log(1+t_1-t_0).
Another question: what is this decorator doing with the value argument? As I understand it, we start with some proposed values for the priors that enter the likelihood, and the idea of @pm.stochastic is basically to produce a number that can be compared with the number generated in the previous iteration of the sampling process. The likelihood should receive the value argument and some values for the priors, but I'm not sure that this is all value does, because it is the only required argument and yet I can write:
@pm.stochastic(observed=True)
def loglike(value=[1]):
    data = [3, 5, 1]  # some data
    # some calculations that generate a numeric result
    return np.log(result)
And as far as I can tell, that produces the same result as before. Maybe it works this way because I added observed=True to the decorator. If I had tried this on a stochastic variable with the default observed=False, value would be changed in each iteration, trying to obtain a better likelihood.
@pm.stochastic is a decorator, so it is expecting a function. The simplest way to use it is to give it a function that includes value as one of its arguments, and returns a log-likelihood.
You should use the @pm.stochastic decorator to define a custom prior for a parameter in your model. You should use the @pm.observed decorator to define a custom likelihood for data. Both of these decorators will create a pm.Stochastic object, which takes its name from the function it decorates, and has all the familiar methods and attributes (here is a nice article on Python decorators).
Examples:
A parameter a that has a triangular distribution a priori:
@pm.stochastic
def a(value=.5):
    if 0 <= value < 1:
        return np.log(1. - value)
    else:
        return -np.inf
Here value=.5 is used as the initial value of the parameter, and changing it to value=1 raises an exception, because it is outside of the support of the distribution.
A likelihood b that has a normal distribution centered at a, with a fixed precision:
@pm.observed
def b(value=[.2, .3], mu=a):
    return pm.normal_like(value, mu, 100.)
Here value=[.2,.3] is used to represent the observed data.
I've put this together in a notebook that shows it all in action here.
Yes, the confusion is easy to have, since the function decorated with @pm.stochastic returns a log-likelihood, which is essentially the opposite of an error. So you take the negative log of your custom error function and return THAT as your log-likelihood.

Semantic Type Safety in Python

In my recent project I have the problem that some values are often misinterpreted. For instance, I calculate a wave as a sum of two waves (for which I need two amplitudes and two phase shifts), and then sample it at four points. I pass these tuples of four values to different functions, but sometimes I make the mistake of passing wave parameters instead of sample points.
These errors are hard to find, because all the calculations work without any error, but the values are totally meaningless in this context and so the results are just wrong.
What I want now is some kind of semantic type. I want to state that one function returns sample points and another function expects sample points, and that I cannot do anything that conflicts with these declarations without immediately getting an error.
Is there any way to do this in Python?
I would recommend implementing specific data types to be able to distinguish between different kinds of information with the same structure.
You can simply subclass list, for example, and then do some type checking at runtime within your functions:
class WaveParameter(list):
    pass

class Point(list):
    pass

# you can use them just like lists
point = Point([1, 2, 3, 4])
wp = WaveParameter([5, 6])

# of course all methods from list are inherited
wp.append(7)
wp.append(8)

# let's check them
print(point)
print(wp)

# type checking examples
print(isinstance(point, Point))
print(isinstance(wp, Point))
print(isinstance(point, WaveParameter))
print(isinstance(wp, WaveParameter))
So you can include this kind of type checking in your functions, to make sure the correct kind of data was passed to it:
def example_function_with_waveparameter(data):
    if not isinstance(data, WaveParameter):
        log.error("received wrong parameter type (%s instead of WaveParameter)" %
                  type(data))
    # and then do the stuff
or simply assert:
def example_function_with_waveparameter(data):
    assert isinstance(data, WaveParameter)
Python's notion of a "semantic type" is called a class, but as mentioned, Python is dynamically typed, so even using custom classes instead of tuples you won't get any compile-time error; at best you'll get runtime errors, if your classes are designed in such a way that trying to use one instead of the other fails.
Now classes are not just about data, they are about behaviour too, so if you have functions that do waveform-specific computations, these functions would probably become methods of the Waveform class, and likewise for the Point part. This might be enough to avoid logical errors like passing a "waveform" tuple to a function expecting a "point" tuple.
To make a long story short: if you want a statically typed functional language, Python is not the right tool (Haskell might be a better choice). If you really want to / have to use Python, try using classes and methods instead of tuples and functions. It still won't detect type errors at compile time, but chances are you'll have fewer type errors AND those type errors will be detected at runtime instead of producing wrong results.
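A minimal sketch of that class-based direction, assuming the wave is a sum of two sine terms (all class, function and variable names here are illustrative, not from the original code):

import math

class SamplePoints(list):
    """Values of a wave sampled at specific times."""

class Waveform:
    """Sum of two sine waves, described by two amplitudes and two phase shifts."""
    def __init__(self, a1, phi1, a2, phi2):
        self.a1, self.phi1, self.a2, self.phi2 = a1, phi1, a2, phi2

    def sample(self, times):
        # behaviour lives on the class, so callers always get SamplePoints back
        return SamplePoints(
            self.a1 * math.sin(t + self.phi1) + self.a2 * math.sin(t + self.phi2)
            for t in times
        )

def analyse(points):
    # functions that only make sense for sample points can reject anything else
    if not isinstance(points, SamplePoints):
        raise TypeError("expected SamplePoints, got %s" % type(points).__name__)
    return max(points) - min(points)

wave = Waveform(1.0, 0.0, 0.5, math.pi / 4)
print(analyse(wave.sample([0.0, 0.5, 1.0, 1.5])))  # fine
# analyse((1.0, 0.0, 0.5, math.pi / 4))            # would raise TypeError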
