Python unit test advice - python

Can I get some advice on writing a unit test for the following piece of code?
%python
import sys
import json
sys.argv = []
sys.argv.append('{"product1":{"brand":"x","type":"y"}}')
sys.argv.append('{"product1":{"brand":"z","type":"a"}}')
products = sys.argv
yy= {}
my_products = []
for n, i in enumerate(products[:]):
    xx = json.loads(i)
    for j in xx.keys():
        yy["brand"] = xx[j]['brand']
        yy["type"] = xx[j]["type"]
        my_products.append(yy)
print(my_products)

As it stands, there aren't any units to test!
A test might consist of:
- packaging your program in a script
- invoking your program from a Python unit test as a subprocess
- piping the output of your command process to a buffer
- asserting the buffer is what you expect it to be
While the above would technically give you an automated test for your code, it comes with a lot of burden:
- multiprocessing
- weak assertions, since the captured output has no types
- coarse interaction (you have to invoke a script and can't just assert on the brand/type logic)
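The subprocess-based test described above could be sketched roughly as follows. SCRIPT is a hypothetical stand-in for the packaged program (it is not the asker's code), and run_script is a helper name made up for this illustration.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the packaged script: it reads JSON arguments
# and prints the collected brand/type records as JSON.
SCRIPT = """\
import json, sys
records = [rec for arg in sys.argv[1:] for rec in json.loads(arg).values()]
print(json.dumps(records))
"""


def run_script(args):
    """Run the script in a child interpreter and return its captured stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(SCRIPT)
        path = handle.name
    try:
        completed = subprocess.run(
            [sys.executable, path] + list(args),
            capture_output=True, text=True, check=True,
        )
    finally:
        os.remove(path)
    return completed.stdout


# The assertion can only look at text, which is the "weak assertion" burden.
output = run_script(['{"product1": {"brand": "x", "type": "y"}}'])
assert json.loads(output) == [{"brand": "x", "type": "y"}]
```

Every test run pays for starting a new interpreter, which is part of why the smaller-units approach tends to be more pleasant.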
One way to address those issues could be to package your code into smaller units, i.e. create a function to encapsulate:
for j in xx.keys():
    yy["brand"] = xx[j]['brand']
    yy["type"] = xx[j]["type"]
    my_products.append(yy)
Import it, exercise it and assert on its output. Then there might be a second unit that loads each argument and applies the xx.keys() loop to build the list (which you could also encapsulate as a function).
Finally, there could be a top level that takes in the args and composes the loader and the product mapper. Since your code will be thoroughly unit tested at this point, you may get away with not having a test for your top-level script.
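A minimal sketch of that split, assuming Python 3. The names extract_product and load_products are invented for this example. Note that it builds a fresh dict per product, whereas the original snippet appends the same yy dict every time, so every list entry there ends up showing the last product.

```python
import json
import unittest


def extract_product(record):
    """Return the brand/type mapping for one parsed product record."""
    return {"brand": record["brand"], "type": record["type"]}


def load_products(raw_args):
    """Parse each JSON argument and collect one new dict per product."""
    products = []
    for raw in raw_args:
        for record in json.loads(raw).values():
            products.append(extract_product(record))
    return products


class ProductTests(unittest.TestCase):
    def test_extract_product_keeps_brand_and_type(self):
        record = {"brand": "x", "type": "y", "price": 3}
        self.assertEqual(extract_product(record), {"brand": "x", "type": "y"})

    def test_load_products_handles_several_arguments(self):
        args = [
            '{"product1": {"brand": "x", "type": "y"}}',
            '{"product1": {"brand": "z", "type": "a"}}',
        ]
        self.assertEqual(
            load_products(args),
            [{"brand": "x", "type": "y"}, {"brand": "z", "type": "a"}],
        )
```

Saved in a file, the tests can be run with python -m unittest; the top-level script then only needs to call load_products(sys.argv[1:]) and print the result.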

Related

Within a python script, check syntactic correctness of C code in str format

Necessarily within a python program and given an str variable that contains C code, I want to check fast if this code is syntactically correct, or not. Essentially, I only need to pass it through the compiler's front end.
My current implementation uses a temp file to dump the string and calls a clang process with subprocess (non-working code below to illustrate my solution). This is very slow for my needs.
src = 'int main(){printf("This is a C program\\n"); return 0;}'
with open(temp_file, 'w') as f:
    f.write(src)
cmd = ["clang", abs_path(f), flags]
subprocess.Popen(cmd)
## etc..
After looking around, I found out about the clang.cindex module (pip install clang), which I tried out. After reading a bit of the main module, lines 2763-2837 (specifically line 2828) led me to the conclusion that the following code snippet will do what I need:
import clang.cindex
......
try:
    unit = clang.cindex.TranslationUnit.from_source(temp_code_file)  ## args, etc.
    print("Compiled!")
except clang.cindex.TranslationUnitLoadError:
    print("Did not compile!")
However, it seems that even if the source file contains obvious syntax errors, no exception is raised. Does anyone know what I am missing to make this work?
More generally, any suggestions on how to do this task as fast as possible would be more than welcome. Even with clang.cindex, I cannot get away from writing my string-represented code to a temp file, which may add overhead. Writing a parser in Python could solve this but is overkill at the moment, no matter how much I need speed.
Parsing itself succeeds even if the file has syntax errors; the errors are reported as diagnostics on the translation unit rather than raised as an exception. Consider the following example:
import clang.cindex
with open('broken.c', 'w') as f:
    f.write('foo bar baz')
unit = clang.cindex.TranslationUnit.from_source('broken.c')
for d in unit.diagnostics:
    print(d.severity, d)
Run it and you will get
3 broken.c:1:1: error: unknown type name 'foo'
3 broken.c:1:8: error: expected ';' after top level declarator
The severity member of a diagnostic is an int, taking one of the values of the enum CXDiagnosticSeverity:
CXDiagnostic_Ignored = 0
CXDiagnostic_Note = 1
CXDiagnostic_Warning = 2
CXDiagnostic_Error = 3
CXDiagnostic_Fatal = 4
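Putting the severity values to use, a check could look something like the sketch below. has_errors and is_valid_c are names made up here, and "in_memory.c" is an arbitrary placeholder file name. The sketch assumes the clang package and a matching libclang are installed, and relies on the unsaved_files argument of TranslationUnit.from_source to hand the source over as a string; as far as I know that avoids the temp file, but it is worth verifying on your setup.

```python
ERROR = 3  # CXDiagnostic_Error; CXDiagnostic_Fatal is 4


def has_errors(diagnostics):
    """Return True if any diagnostic is an error or a fatal error."""
    return any(d.severity >= ERROR for d in diagnostics)


def is_valid_c(src, args=None):
    """Parse C source held in a string and report whether it is error-free.

    Assumption: the clang Python bindings and libclang are available.
    """
    # Imported lazily so has_errors() stays usable without libclang.
    import clang.cindex

    unit = clang.cindex.TranslationUnit.from_source(
        "in_memory.c",                        # placeholder name, never written to disk
        args=args,
        unsaved_files=[("in_memory.c", src)],
    )
    return not has_errors(unit.diagnostics)
```

With this, is_valid_c('foo bar baz') would be expected to return False, since both diagnostics in the output above have severity 3.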

Concurrent.futures.map initializes code from beginning

I am a fairly new programmer, with little experience in Python or in general, and I'm currently trying to parallelize a heavily CPU-bound process in my code. I'm using Anaconda to create environments and Visual Studio Code to debug.
A summary of the code is as follows:
from tkinter import filedialog
import myfuncs as mf, concurrent.futures
file_path = filedialog.askopenfilename('Ask for a file containing data')
# import data from file_path
a = input('Ask the user for input')
Next, calculations are made from these, and I reach a stage where I need to iterate over a list of lists. Each of these lists may contain up to two values, and calls are made to a separate file.
For example, the inputs are:
sub_data1 = [test1]
sub_data2 = [test1, test2]
dataset = [sub_data1, sub_data2]
At this stage I use a concurrent.futures.ProcessPoolExecutor() instance and its .map() method:
with concurrent.futures.ProcessPoolExecutor() as executor:
    sm_res = executor.map(mf.process_distr, dataset)
Inside myfuncs.py, the mf.process_distr() function works like this:
def process_distr(tests):
    sm_reg = []
    for i in range(len(tests)):
        if i==0:
            # do stuff
            sm_reg.append(result1)
        else:
            # do stuff
            sm_reg.append(result2)
    return sm_reg
The problem is that when I execute main.py, it seems that main.py starts running multiple times: it asks for user input and the file dialog pops up multiple times (as many times as there are cores).
How can I resolve this?
Edit: After reading more into it, wrapping the whole main.py code in:
if __name__ == '__main__':
did the trick. Thank you to everyone who took the time to help with my rookie problem.
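For anyone hitting the same thing: on platforms where worker processes are started by spawning a fresh interpreter (the default on Windows and macOS), each worker imports the main module, so any top-level code runs again in every worker. A minimal sketch of the guarded layout, with a made-up process_distr standing in for mf.process_distr:

```python
import concurrent.futures


def process_distr(tests):
    """Stand-in for mf.process_distr: one result per test value (its square)."""
    return [value * value for value in tests]


def main():
    # Anything interactive (file dialogs, input()) belongs in here, so it runs
    # once in the parent process and never when a worker imports this module.
    dataset = [[1], [2, 3]]
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(process_distr, dataset))
    print(results)


if __name__ == "__main__":
    main()
```

The worker function stays at module level so that it can be pickled and found by the workers; only the code that should run once moves under the guard.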

Access variables and lists from function

I am new to unit testing with Python. I would like to test some functions in my code. In particular I need to test if the outputs have specific dimensions or the same dimensions.
My Python script for unit testing looks like this:
import unittest
from func import *
class myTests(unittest.TestCase):
    def setUp(self):
        # I am not really sure whats the purpose of this function
        pass
    def test_main(self):
        # check if outputs of the function "main" are not empty:
        self.assertTrue(main, msg = 'The main() function provides no return values!')
        # check if "run['l_modeloutputs']" and "run['l_dataoutputs']", within the main() function have the same size:
        self.assertCountEqual(self, run['l_modeloutputs'], run['l_dataoutputs'], msg=None)
        # --> Doesn't work so far!
        # check if the dimensions of "props['k_iso']", within the main() function are (80,40,100):
    def tearDown(self):
        # I am also not sure of the purpose of this function
        pass
if __name__ == "__main__":
    unittest.main()
Here is the code under test:
def main(param_file):
    # Load parameter file
    run, model, sequences, hydraulics, flowtrans, elements, mg = hu.model_setup(param_file)
    # some other code
    ...
    if 'l_modeloutputs' in run:
        if hydraulics['flag_gen'] is False:
            print('No hydraulic parameters generated. No model outputs saved')
        else:
            save_models(realdir, realname, mg, run['l_modeloutputs'], flowtrans, props['k_iso'], props['ktensors'])
I need to access the parameters run['l_modeloutputs'] and run['l_dataoutputs'] of the main function from func.py. How can I pass the dimensions of these parameters to the unit testing script?
It sounds like one of two things at the moment: either your code isn't laid out in a way that is easy to test, or you are trying to test or call too much code in one go.
If your code is laid out like the following:
def main(file_name):
    with open(file_name) as file:
        ... do work ...
        results = outcome_of_work
and you are trying to test what you have got from the file_name as well as the size of results, then you may want to think about refactoring this so that you can test a smaller action. Maybe:
def main(file_name):
    # `get_file_contents` appears to be `hu.model_setup`
    # `file_contents` would be `run`
    file_contents = get_file_contents(file_name)
    results = do_work_on_file_contents(file_contents)
Of course, if you already have a similar setup then the following is also applicable. This way you can write simpler tests, as you have easy control over what goes into the test (file_name or file_contents) and can then check the outcome (file_contents or results) against expected results.
With the unittest module you would basically be creating a small function for each test:
class Test(TestCase):
    def test_get_file_contents(self):
        # ... set up example `file-like object` ...
        run = hu.model_setup(file_name)
        self.assertCountEqual(
            run['l_modeloutputs'], run['l_dataoutputs'])
        # ... repeat for other possible files ...
    def test_do_work_on_file_contents(self):
        example_input = ...  # set up input
        example_output = do_work_on_file_contents(example_input)
        assert example_output == as_expected
This can then be repeated for different sets of potential inputs, both good and edge cases.
It's probably worth looking for a more in-depth tutorial, as this is obviously only a very quick overview.
And setUp and tearDown are only needed if there is something to be done for each test you have written (i.e. if you set up an object in a particular way for several tests, this can be done in setUp, which is run before each test function).
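To make the setUp point concrete, here is a small self-contained sketch. build_outputs is a hypothetical stand-in for the code under test and does not come from the question. One thing worth knowing: assertCountEqual checks that two sequences contain the same elements regardless of order, so for a pure size comparison, comparing len() values is the more direct check.

```python
import unittest


def build_outputs(n_models):
    """Hypothetical stand-in for the code under test."""
    return {
        "l_modeloutputs": ["model_%d" % i for i in range(n_models)],
        "l_dataoutputs": ["data_%d" % i for i in range(n_models)],
        "k_iso": [[0.0] * 3 for _ in range(2)],  # nested list of shape (2, 3)
    }


class BuildOutputsTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh result.
        self.run_data = build_outputs(4)

    def test_output_lists_have_same_length(self):
        self.assertEqual(
            len(self.run_data["l_modeloutputs"]),
            len(self.run_data["l_dataoutputs"]),
        )

    def test_k_iso_shape(self):
        k_iso = self.run_data["k_iso"]
        self.assertEqual((len(k_iso), len(k_iso[0])), (2, 3))
```

If props['k_iso'] is a NumPy array in the real code, comparing its .shape attribute to the expected tuple would presumably play the same role as the nested len() calls here.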

Writing python unit tests inside the actual code

Sometimes I write small utility functions and package them as a Python package.
How small? 30 - 60 lines of Python.
My question is: do you think writing the tests inside the actual code is bad practice, or an abuse?
I can see great benefits, like usage examples inside the code itself without jumping between files (again, for really small projects).
Example:
#!/usr/bin/env python
# Actual code
def increment(number, by=1):
    return number + by
# Tests
def test_increment_positive():
    assert increment(1) == 2
def test_increment_negative():
    assert increment(-5) == -4
def test_increment_zero():
    assert increment(0) == 1
The general idea is taken from the monitoring framework Riemann, which I use; in Riemann you write your tests alongside your code.
You can write doctests inside your documentation to indicate how your function should be used:
def increment(number, by=1):
    """ Increments the given number by some other number
    >>> increment(3)
    4
    >>> increment(5,3)
    8
    """
    return number + by
From the documentation:
- To check that a module’s docstrings are up-to-date by verifying that all interactive examples still work as documented.
- To perform regression testing by verifying that interactive examples from a test file or a test object work as expected.
- To write tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of “literate testing” or “executable documentation”.
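To actually run such examples, the usual route is python -m doctest your_module.py, or calling doctest.testmod() from the module. The sketch below instead runs the examples of a single function programmatically, so the result can be inspected; check_docstring is a helper name invented for this illustration.

```python
import doctest


def increment(number, by=1):
    """Increment the given number by some other number.

    >>> increment(3)
    4
    >>> increment(5, 3)
    8
    """
    return number + by


def check_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__,                 # text containing the >>> examples
        {func.__name__: func},        # names visible to the examples
        func.__name__,                # test name used in failure reports
        None,                         # no source file
        0,                            # line offset
    )
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted


print(check_docstring(increment))
```

For a 30 to 60 line utility this keeps the usage examples next to the code while still letting a test runner execute them.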

Python something resets my random seed

My question is the exact opposite of this one.
This is an excerpt from my test file
f1 = open('seed1234','r')
f2 = open('seed7883','r')
s1 = eval(f1.read())
s2 = eval(f2.read())
f1.close()
f2.close()
####
test_sampler1.random_inst.setstate(s1)
out1 = test_sampler1.run()
self.assertEqual(out1,self.out1_regress) # this is fine and passes
test_sampler2.random_inst.setstate(s2)
out2 = test_sampler2.run()
self.assertEqual(out2,self.out2_regress) # this FAILS
Some info -
test_sampler1 and test_sampler2 are two objects of a class that performs some stochastic sampling. The class has an attribute random_inst, which is an instance of random.Random. The file seed1234 contains a TestSampler's random_inst state as returned by random.getstate() when it was given a seed of 1234, and you can guess what seed7883 is. What I did was create a TestSampler in the terminal, give it a random seed of 1234, acquire the state with rand_inst.getstate() and save it to a file. I then recreate the regression test and I always get the same output.
HOWEVER
The same procedure as above doesn't work for test_sampler2: whatever I do, I do not get the same random sequence of numbers. I am using Python's random module and I am not importing it anywhere else, but I do use numpy in some places (but not numpy.random).
The only difference between test_sampler1 and test_sampler2 is that they are created from 2 different files. I know this is a big deal and that it depends entirely on the code I wrote, but I also can't simply paste ~800 lines of code here; I am merely looking for some general idea of what I might be messing up.
What might be scrambling the state of test_sampler2's random number generator?
Solution
There were 2 separate issues with my code:
1
My script is a command-line script, and after I refactored it to use Python's optparse library I found out that I was setting the seed for my sampler using something like seed = sys.argv[1], which meant that I was setting the seed to a str, not an int. seed can take any hashable object, and I found that out the hard way. This explains why I would get 2 different sequences with the same seed: one when I ran my script from the command line with something like python sample 1234 (seed is 1234), and another from my unit_tests.py file, where I would create an object instance like test_sampler1 = TestSampler(seed=1234).
2
I have a function for discrete distribution sampling which I borrowed from here (look at the accepted answer). The code there was missing something fundamental: its result still depended on the order of its inputs. If you give it the same values and probabilities, but permuted (say values ['a','b'] with probabilities [0.1,0.9] versus values ['b','a'] with probabilities [0.9,0.1]), and the seed is set, the PRNG gives you the same random sample, say 0.3, but since the probability intervals are laid out differently, in one case you'll get a b and in the other an a. To fix it, I zipped the values and probabilities together and sorted by probability, and ta-da: I now always get the same probability intervals.
After fixing both issues the code worked as expected i.e. out2 started behaving deterministically.
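Both issues can be reproduced in a few lines. This is only a sketch, assuming Python 3: sample_discrete is a simplified stand-in written for this example, not the borrowed function, and breaking ties on the value when probabilities are equal is my own addition.

```python
import random

# Issue 1: a str seed and an int seed initialise the generator differently,
# so "1234" read from sys.argv is not the same seed as the int 1234.
assert random.Random("1234").random() != random.Random(1234).random()


def sample_discrete(rng, values, probs):
    """Draw one value; pairs are sorted so the input order does not matter."""
    pairs = sorted(zip(probs, values))  # by probability, ties broken by value
    draw = rng.random()
    cumulative = 0.0
    for prob, value in pairs:
        cumulative += prob
        if draw < cumulative:
            return value
    return pairs[-1][1]  # guard against floating-point round-off


# Issue 2: the same distribution given in a different order now yields the
# same sample for the same seed.
first = sample_discrete(random.Random(1234), ["a", "b"], [0.1, 0.9])
second = sample_discrete(random.Random(1234), ["b", "a"], [0.9, 0.1])
assert first == second
```

Converting the command-line argument with int(sys.argv[1]) before seeding is the obvious counterpart to the first assertion.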
The only thing (apart from an internal Python bug) that can change the state of a random.Random instance is calling methods on that instance. So the problem lies in something you haven't shown us. Here's a little test program:
from random import Random
r1 = Random()
r2 = Random()
for _ in range(100):
    r1.random()
for _ in range(200):
    r2.random()
r1state = r1.getstate()
r2state = r2.getstate()
with open("r1state", "w") as f:
    f.write(repr(r1state))
with open("r2state", "w") as f:
    f.write(repr(r2state))
for _ in range(100):
    with open("r1state") as f:
        r1.setstate(eval(f.read()))
    with open("r2state") as f:
        r2.setstate(eval(f.read()))
    assert r1state == r1.getstate()
    assert r2state == r2.getstate()
I haven't run that all day, but I bet I could and never see a failing assert ;-)
BTW, it's certainly more common to use pickle for this kind of thing, but it's not going to solve your real problem. The problem is not in getting or setting the state. The problem is that something you haven't yet found is calling methods on your random.Random instance(s).
While it's a major pain in the butt to do so, you could try adding print statements to random.py to find out what's doing it. There are cleverer ways to do that, but better to keep it dirt simple so that you don't end up actually debugging the debugging code.
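One possible alternative to editing random.py, offered only as a sketch and assuming Python 3: give the sampler a Random subclass that records who asked for each number. TracingRandom and suspicious_helper are names invented for this illustration. Both random() and getrandbits() are overridden and delegate to the parent class, with the intent that the generated sequence stays the same as for a plain Random.

```python
import random
import traceback


class TracingRandom(random.Random):
    """random.Random that records the caller of each number-drawing method."""

    def __init__(self, seed=None):
        self.calls = []  # must exist before the parent constructor runs
        super().__init__(seed)

    def _record(self, method):
        # The last three stack entries are [caller, overridden method, _record];
        # keep the caller's function name and line number.
        caller = traceback.extract_stack(limit=3)[0]
        self.calls.append((method, caller.name, caller.lineno))

    def random(self):
        self._record("random")
        return super().random()

    def getrandbits(self, k):
        self._record("getrandbits")
        return super().getrandbits(k)


rng = TracingRandom(1234)


def suspicious_helper():
    return rng.random()


suspicious_helper()
print(rng.calls)
```

Swapping this in for random_inst in the failing test and printing calls afterwards should show which function is consuming random numbers unexpectedly. Calls that arrive through library methods such as uniform() will name that method as the caller.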
