I've got a large number of tests written using Python unittest (Python 2.7.8) as a large TestSuite. Many of these tests invoke other programs. Sometimes these other programs dump core. When they do, I want to discover that and ensure the test fails. If some number of cores are dumped, I want to abort the entire test environment and exit rather than continuing: my total test suite has >6000 tests and if everything is dumping core it's useless (and dangerous: disk space etc.) to continue.
In order to ensure we look for coredumps after every test (so I have the best possible idea of what program/invocation dumped core) I decided to look for cores in tearDown(), which I am doing successfully. If I find a core, I can run an assert variant in tearDown() to specify that the test failed.
But I can't figure out how to just give up on all my testing completely from within tearDown() if I find too many cores. I even tried to run sys.exit("too many cores"), but unittest case.py catches every exception thrown by tearDown() except KeyboardInterrupt (if I try to raise that by hand my script hangs until I do a real ^C).
I thought about trying to call stop(), but this is a method on the result and I can't find any way to get access to the result object from within tearDown() (!).
So far my only option seems to be to invoke os._exit() which is really annoying because it keeps any results from being reported at all!
Is there really no facility in Python unittest.TestCase to tell the test environment to just stop right now, generate what results you have but don't run anything else?
Can you check how many cores have been dumped in setUp()? If so, you could just call self.skipTest('Too many cores dumped.') when things get bad.
If you don't want to look for cores in setUp(), you could probably use a class variable to hold the core dump count and check that instead.
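A rough sketch of the class-variable idea; count_cores(), the core path, and CORE_LIMIT are placeholders for whatever your environment actually uses:

    import glob
    import unittest

    CORE_LIMIT = 5  # made-up threshold for "too many cores"

    def count_cores():
        # placeholder: however you already detect core files in tearDown()
        return len(glob.glob('/path/to/cores/core.*'))

    class CoreAwareTestCase(unittest.TestCase):
        cores_seen = 0  # class variable shared across all tests

        def setUp(self):
            if CoreAwareTestCase.cores_seen >= CORE_LIMIT:
                self.skipTest('Too many cores dumped.')

        def tearDown(self):
            current = count_cores()
            if current > CoreAwareTestCase.cores_seen:
                CoreAwareTestCase.cores_seen = current
                self.fail('a core was dumped during this test')

Every remaining test still goes through the runner (so the results you already have get reported), but each one is skipped immediately once the limit is hit.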
Related
I have a pytest test suite with about 1,800 tests that takes more than 10 minutes to collect and execute. I created a cProfile of the run and found that the majority of the time, around 300 seconds, went into {built-in method builtins.compile}.
There were also some compile calls from the regular-expression package, which I tried to remove, and I saw a reduction of about 50 seconds. But the run still takes 9.5 minutes, which is huge.
What I understand so far is that the built-in compile function converts source into a code object, and that pytest internally uses this function to create and execute code objects. But 9-10 minutes is an insanely large amount of time for running 1,800 tests. I am new to pytest and Python, so I am trying to figure out the reason for this.
Could pytest be misconfigured in a way that makes it use compile to generate code objects? Or could other imported libraries use compile internally?
Could pytest be misconfigured in a way that makes it use compile to generate code objects?
Though I have never looked, I would fully expect pytest to compile files to bytecode by hand, for the simple reason that it performs assertion rewriting by default in order to instrument assert statements: when an assertion fails, rather than just showing the assertion message, pytest shows the various intermediate values. This requires compiling either way: either they compile the code to bytecode and rewrite the bytecode, or they parse the code to an AST, update the AST, and still compile it to bytecode.
It's possible to disable this behaviour (--assert=plain), but I would not expect much gain from it (though I could be wrong): pytest simply does the compilation itself instead of the interpreter doing it on its own. It has to be done one way or another for the test suite to run.
Though spending 5 minutes in compile does sound like a lot, do you have a large number of very small files or something? Rough benching indicates that compile works at about 5 usec/line on my machine (though it probably depends on code complexity). I've got 6 kLOC worth of tests, and while the test suite takes ages, that's because the tests themselves are expensive; the collection is unnoticeable.
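For reference, the rough bench I mean is just a throwaway script along these lines (the generated source is trivial one-liners, so real test code will likely be slower per line):

    import timeit

    LINES = 10000
    source = "\n".join("x_%d = %d" % (i, i) for i in range(LINES))

    # compile the same blob ten times and average the cost per line
    seconds = timeit.timeit(lambda: compile(source, "<bench>", "exec"), number=10) / 10
    print("%.2f usec/line" % (seconds / LINES * 1e6))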
Of course it's possible you're triggering some sort of edge case or issue in pytest, e.g. maybe you have an ungodly number of assert statements, which causes pytest to generate an insane amount of rewritten code? The aforementioned --assert=plain could hint at that if it makes the test suite run significantly faster.
You could also try running e.g. --collect-only to see what that yields, though I don't know whether the assertion rewriting is performed during or after the collection. FWIW on the 6kLOC test suite above I get 216 tests collected in 1.32s.
Either way, this seems like something better suited to the pytest bug tracker.
Or could other imported libraries use compile internally?
You could use a flamegraph-based profiler to record the entire stack; cProfile is, frankly, not great for this kind of question.
In py.test there must be a function that loops over all the collected tests and executes them sequentially.
Is there a way to override this function/loop?
I want to keep running the tests sequentially, as before, but run any failed tests again a second time (or a third time) to see whether they succeed on the second (third) try.
(Some background: the system under test is a GUI application which depends on many different systems, and occasionally some of them fail or do not behave as expected. With enough of these tests it becomes quite likely that at least one test will fail. Therefore I want to repeat the failed tests a couple of times. Also, that system cannot be changed.)
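For illustration, the kind of override I am imagining would look roughly like the conftest.py below. This is only an untested sketch, assuming a recent pytest that exposes the pytest_runtest_protocol hook and _pytest.runner.runtestprotocol (the pytest-rerunfailures plugin does the same job more carefully); MAX_ATTEMPTS is a made-up knob:

    # conftest.py -- sketch of taking over the per-test protocol
    from _pytest.runner import runtestprotocol

    MAX_ATTEMPTS = 3  # made-up retry budget

    def pytest_runtest_protocol(item, nextitem):
        item.ihook.pytest_runtest_logstart(nodeid=item.nodeid, location=item.location)
        for attempt in range(MAX_ATTEMPTS):
            # run setup/call/teardown without reporting anything yet
            reports = runtestprotocol(item, nextitem=nextitem, log=False)
            if not any(r.failed for r in reports):
                break  # the test passed (or was skipped), so stop retrying
        # report only the last attempt, so a flaky-then-green test counts as passed
        for report in reports:
            item.ihook.pytest_runtest_logreport(report=report)
        item.ihook.pytest_runtest_logfinish(nodeid=item.nodeid, location=item.location)
        return True  # tell pytest the protocol was handled here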
I am running a large unit test repository for a complex project.
This project has some things that don't play well with large numbers of tests:
caches (memoization) that cause objects not to be freed between tests
complex module-level objects that are singletons and might accumulate data as they are used
I am interested in each test (or at least each test suite) having its own "python-object-pool" and being able to free it afterwards.
Sort of a workaround for a Python garbage-collector problem.
I imagine a self-contained, temporary, discardable Python interpreter that can run certain code for me; afterwards I could call "interpreter.free()" and be assured it doesn't leak.
One heavyweight solution I found is to use Nose, or to implement this via subprocess each time I need an expendable interpreter to run a test. Each test then becomes "fork_and_run(conditions)" (see the sketch below) and leaks no memory in the original process.
I also saw that Nose can run a single process per test and run the tests sequentially, though people mentioned it sometimes freezes midway, which is less fun.
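For illustration, the kind of helper I have in mind is just a sketch along these lines (fork_and_run is my own invented name, not an existing API):

    import multiprocessing

    def fork_and_run(target, *args):
        # run the test body in a throwaway child process, so any caches or
        # module-level singletons it touches die with that process
        child = multiprocessing.Process(target=target, args=args)
        child.start()
        child.join()
        # multiprocessing exits with a non-zero code if `target` raised
        if child.exitcode != 0:
            raise AssertionError('test body failed in child (exit code %s)' % child.exitcode)

    # usage inside an ordinary test:
    #     def test_something(self):
    #         fork_and_run(check_conditions, conditions)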
Is there a simpler solution?
P.S.
I am not interested in going through vast amounts of other people's code and trying to make all their caches/objects/projects into perfectly memory-managed objects that can be cleaned up.
P.P.S.
Our PROD code also creates a new process for each job, which is very convenient since we don't have to mess around with "surviving forever" and other scary stories.
TL;DR
The module-reload trick I tried worked locally, but broke when used on a machine with a different Python version... (?!)
I ended up taking all the caches I had written in our code and registering them in a global cache list, then clearing them between tests.
Sadly this will break if anyone adds a cache (or a hand-rolled caching mechanism) and misses this step; the tests will start growing in memory again...
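Stripped down, the idea looks roughly like this (a simplified sketch, not our real code; the names are made up):

    # cache_registry.py -- global list of every cache we create in our code
    _REGISTERED_CACHES = []

    def register_cache(cache):
        """Every memoization dict/list we write gets passed through this."""
        _REGISTERED_CACHES.append(cache)
        return cache

    def clear_registered_caches():
        for cache in _REGISTERED_CACHES:
            cache.clear()

    # elsewhere, a cache is declared as:
    #     _results_cache = register_cache({})
    # and the test harness calls clear_registered_caches() between tests
    # (e.g. from tearDown() or an autouse fixture).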
For starters I wrote a loop that goes over the sys.modules dict and reloads (in two passes) all of my code's modules, roughly as in the sketch after the list below. This worked amazingly well - all references were freed properly - but it seems it cannot be used in production/serious code, for multiple reasons:
old Python versions break when reloading and redefining classes that use metaclasses (I still don't get how this breaks).
unit tests survive the reload and sometimes hold stale references to the old classes - especially if one class uses an instance of another class. Think super(class_name, self), where self is an instance of the previously defined class and class_name is now the redefined class with the same name.
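For reference, the reload loop itself looked roughly like this (a sketch from memory; reload() is the Python 2 builtin, importlib.reload on Python 3, and the package prefix is a placeholder):

    import sys

    def reload_my_modules(prefix='myproject.'):
        # two passes so modules that reference each other settle on the new objects
        for _ in range(2):
            for name, module in list(sys.modules.items()):
                if module is not None and name.startswith(prefix):
                    reload(module)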
I am using py.test (version 2.4, on Windows 7) with xdist to run a number of numerical regression and interface tests for a C++ library that provides a Python interface through a C module.
The number of tests has grown to ~2,000 over time, but we are running into some memory issues now. Whether using xdist or not, the memory usage of the python process running the tests seems to be ever increasing.
In single-process mode we have even seen a few bad-allocation errors, whereas with xdist the total memory usage may bring down the OS (8 processes, each using >1 GB towards the end).
Is this expected behaviour? Or has somebody else experienced the same issue when using py.test for a large number of tests? Is there something I can do in tearDown()/tearDownClass() to reduce the memory usage over time?
At the moment I cannot exclude the possibility of the problem lying somewhere inside the C/C++ code, but when running some long-running program that uses that code through the Python interface outside of py.test, I see relatively constant memory usage over time. I also do not see any excessive memory usage when using nose instead of py.test (we are using py.test as we need junit-xml reporting to work with multiple processes).
py.test's memory usage will grow with the number of tests. All tests are collected before any are executed, and for each test run a test report is stored in memory, which will be much larger for failures, so that all the information can be reported at the end. So to some extent this is expected and normal.
However, I have no hard numbers and have never investigated this closely. We did run out of memory on some CI hosts ourselves before, but just gave them more memory instead of investigating. Currently our CI hosts have 2 GB of memory and run about 3,500 tests in one run; it would probably work with half of that, but might involve more swapping. PyPy is also a project that manages to run a huge test suite with py.test, so this should certainly be possible.
If you suspect the C code of leaking memory, I recommend building a (small) test script which just exercises the extension-module API (with or without py.test) and invoking it in an infinite loop while gathering memory stats after every iteration. After a few iterations the memory should stop increasing.
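A minimal sketch of such a loop, where my_extension and do_work are placeholders for whatever part of your extension API the tests exercise (resource is Unix-only; psutil would be a portable alternative):

    import gc
    import resource

    import my_extension  # placeholder for the C module under suspicion

    iteration = 0
    while True:
        my_extension.do_work()   # placeholder: exercise the API the tests use
        gc.collect()             # drop any Python-side garbage first
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        iteration += 1
        # on Linux ru_maxrss is in kilobytes; it should plateau after a few loops
        print("iteration %d: peak RSS %d kB" % (iteration, peak_kb))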
Try using --tb=no, which should prevent pytest from accumulating tracebacks on every failure.
I have found that it's better to have your test runner launch several smaller pytest runs in multiple processes, rather than one big pytest run, because pytest accumulates every error in memory.
pytest should probably accumulate test results on disk rather than in RAM.
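A crude sketch of what I mean by smaller instances, with a made-up directory layout - one pytest process per test directory, so the accumulated reports are thrown away after each chunk:

    import glob
    import subprocess
    import sys

    exit_code = 0
    for chunk in sorted(glob.glob('tests/*/')):   # made-up layout: one dir per suite
        rc = subprocess.call([sys.executable, '-m', 'pytest', '--tb=no', chunk])
        exit_code = exit_code or rc
    sys.exit(exit_code)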
We also experience similar problems. In our case we run about 4,600 test cases.
We use pytest fixtures extensively, and we managed to save a few MB by scoping the fixtures slightly differently (narrowing several from "session" to "class" or "function"). However, we lost some test performance.
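For example, the difference in scope looks like this (a simplified sketch; load_big_dataset stands in for whatever expensive object the real fixtures build):

    import pytest

    def load_big_dataset():
        # stand-in for whatever expensive object the real fixture builds
        return list(range(10 ** 6))

    # scope="session": built once and held in memory for the whole run
    @pytest.fixture(scope="session")
    def shared_dataset():
        return load_big_dataset()

    # scope="function": rebuilt for every test and released afterwards,
    # which lowers peak memory but makes the suite slower
    @pytest.fixture
    def per_test_dataset():
        return load_big_dataset()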
I'm in the process of writing a small/medium sized GUI application with PyGObject (the new introspection based bindings for Gtk). I started out with a reasonable test suite based on nose that was able to test most of the functions of my app by simply importing the modules and calling various functions and checking the results.
However, recently I've begun to take advantage of some Gtk features like GLib.timeout_add_seconds, a fairly simple mechanism that calls the specified callback after a timer expires. The problem I'm now facing is that my code seems to work when I use the app, but the test suite is poorly encapsulated: when one test checks that it's starting with a clean state, it finds that its state has been trampled all over by a callback that was registered by a different test. Specifically, the test successfully checks that no files are loaded, then it loads some files, then checks that the files haven't been modified since loading, and the test fails!
It took me a while to figure out what was going on, but essentially one test would modify some files (which starts a timer) and then close them without saving; then another test would reopen the supposedly unmodified files and find that they had been modified, because the callback altered the files once the timer was up.
I've read about Python's reload() builtin for reloading modules in the hopes that I could make it unload and reload my app to get a fresh start, but it just doesn't seem to be working.
I'm afraid that I might have to resort to launching the app as a subprocess, tinkering with it, then ending the subprocess and relaunching it when I need to guarantee fresh state. Are there any test frameworks out there that would make this easy, particularly for pygobject code?
Would a mocking framework help you isolate the callbacks? That way, you should be able to get back to the same state as when you started. Note that a setUp()/tearDown() pattern may help you there as well, but I am assuming you are already using that.
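As a rough sketch of the mocking idea - the patch target "myapp" is a placeholder for wherever your code looks up GLib, and on Python 2 this needs the standalone mock package (unittest.mock on Python 3):

    import unittest
    import mock  # pip install mock; on Python 3: from unittest import mock

    class TimerIsolationTestCase(unittest.TestCase):
        def setUp(self):
            self.scheduled = []  # every (interval, callback, args) the app tried to register

            def fake_timeout_add_seconds(interval, callback, *args):
                self.scheduled.append((interval, callback, args))
                return 0  # stand-in for a GLib source id

            patcher = mock.patch('myapp.GLib.timeout_add_seconds',
                                 side_effect=fake_timeout_add_seconds)
            patcher.start()
            self.addCleanup(patcher.stop)  # undone even if the test errors out

A test can then fire the callbacks in self.scheduled explicitly when (and only when) it actually wants the timer behaviour, instead of having callbacks leak across tests.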