I have some unit tests that are timing-sensitive: an action is timed and an error is triggered if it takes too long. When run individually these tests pass, but when running nosetests recursively over my modules they often fail. I run tests concurrently, which is likely one reason the timing is off. Is there any way to indicate that I want these tests to be run with no interruptions?
I think your problem depends on how you implemented the timing. The solution I would personally adopt would be to set an environment variable that controls the behaviour of the tests. Candidates could be:
if WITH_TIMING == False [turn off timing altogether]
TIME_STRETCH_FACTOR = ... [apply a time-stretching multiplier in case concurrent tests are run, so that for example a time limit of 5 would become 7.5 if TIME_STRETCH_FACTOR were 1.5; see the sketch below]
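A minimal sketch of that environment-variable approach, assuming a unittest-style test; do_the_action and the 5-second limit are hypothetical stand-ins for whatever your tests actually time:

import os
import time
import unittest

# Read the control knobs from the environment; both names are the
# candidates suggested above, with permissive defaults.
WITH_TIMING = os.environ.get("WITH_TIMING", "1") != "0"
TIME_STRETCH_FACTOR = float(os.environ.get("TIME_STRETCH_FACTOR", "1.0"))

def do_the_action():
    time.sleep(0.1)  # hypothetical stand-in for the real timed action

class TimedActionTest(unittest.TestCase):
    BASE_LIMIT = 5.0  # seconds; hypothetical limit for the timed action

    def test_action_is_fast_enough(self):
        start = time.time()
        do_the_action()
        elapsed = time.time() - start
        if WITH_TIMING:  # skip the timing check entirely when disabled
            self.assertLess(elapsed, self.BASE_LIMIT * TIME_STRETCH_FACTOR)

Running the suite concurrently would then just be a matter of exporting TIME_STRETCH_FACTOR=1.5 (or WITH_TIMING=0) before invoking the test runner.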
If this is not an option, a possible (ugly) workaround would be to mock the time.time() function, making it return a constant value [this would only work if you use time.time() directly in your tests, of course]...
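For illustration, a hedged sketch of that workaround with unittest.mock; do_the_action is again a hypothetical stand-in:

import time
from unittest import mock

def do_the_action():
    pass  # hypothetical stand-in for the real timed action

# Freeze time.time() so the measured duration is always zero. This only
# helps if the code under test reads time.time() directly.
with mock.patch("time.time", return_value=1000.0):
    start = time.time()
    do_the_action()
    elapsed = time.time() - start  # always 0.0 while the patch is active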
HTH
I have a pytest test suite with about 1800 tests which takes more than 10 minutes to collect and execute. I created a cProfile of the run and found that the majority of the time, around 300 seconds, went to {built-in method builtins.compile}.
There were some other compile calls from the regular expression package which I tried to remove, and that gave a reduction of about 50 seconds, but the run still takes 9.5 minutes, which is huge.
What I understand so far is that the built-in compile method is used to convert a script into a code object, and that pytest internally uses this function for creating and executing code objects. But 9-10 minutes is an insanely large amount of time for running 1800 tests. I am new to pytest and Python, so I am trying to figure out the reason for this.
Could it be that pytest is misconfigured and that is why it uses the compile method to generate code objects? Or could the other imported libraries be using compile internally?
Could it be that pytest is misconfigured and that is why it uses the compile method to generate code objects?
Though I have never looked, I would fully expect pytest to compile files to bytecode by hand, for the simple reason that it performs assertion rewriting by default in order to instrument assert statements: when an assertion fails, rather than just showing the assertion message, pytest shows the various intermediate values. This requires compiling either way: either they compile the code to bytecode and rewrite the bytecode, or they parse the code to the AST, update the AST, and then compile that to bytecode.
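For illustration, a failing test like this hypothetical one is what the rewriting targets: with rewriting enabled the failure report includes the intermediate values of x and y, while --assert=plain only gives you a bare AssertionError.

# test_example.py -- hypothetical test used only to illustrate the point
def add(a, b):
    return a + b

def test_add():
    x = add(2, 2)
    y = 5
    # With rewriting, the report shows "assert 4 == 5" and where the
    # values came from; without it, just an AssertionError.
    assert x == y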
It's possible to disable this behaviour (--assert=plain), but I would not expect much gain from it (though I could be wrong): pytest simply does the compilation instead of the interpreter performing it on its own. It has to be done one way or another for the test suite to run.
Though taking 5 minutes does sound like a lot: do you have a large number of very small files or something? Rough benchmarking indicates that compile works at about 5 µs/line on my machine (though it probably depends on code complexity). I've got 6 kLOC worth of tests, and while that test suite takes ages it's because the tests themselves are expensive; the collection is unnoticeable.
Of course it's possible you could be triggering some sort of edge case or issue in pytest e.g. maybe you have an ungodly number of assert statements which causes pytest to generate an insane amount of rewritten code? The aforementioned --assert=plain could hint at that if it makes running the test suite significantly shorter.
You could also try running e.g. --collect-only to see what that yields, though I don't know whether the assertion rewriting is performed during or after the collection. FWIW on the 6kLOC test suite above I get 216 tests collected in 1.32s.
Either way this seems like something more suitable to the pytest bug tracker.
Or could the other imported libraries be using compile internally?
You could use a flamegraph-based profiler to record the entire stack. cProfile is, frankly, kinda shit.
In py.test there must be a function that loops over each collected test and executes them sequentially.
Is there a way to override this function/loop?
Because I want to run the tests sequentially, as before, but rerun any failed test a second time (or a third time) to see if it might succeed on the retry.
(Some background: the system under test is a GUI application which depends on many different systems, and occasionally some of them fail or do not behave as expected. When you have many such tests, it becomes quite likely that at least one test will fail. Therefore I want to repeat the failed tests a couple of times. Also, that system cannot be changed.)
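One common recipe for this is a conftest.py hook; the sketch below relies on the internal _pytest.runner.runtestprotocol helper, which may change between pytest versions, and the ready-made pytest-rerunfailures plugin packages the same idea behind a --reruns option.

# conftest.py -- sketch: rerun each failing test up to two extra times
from _pytest.runner import runtestprotocol

def pytest_runtest_protocol(item, nextitem):
    for attempt in range(3):  # first run plus up to two retries
        reports = runtestprotocol(item, nextitem=nextitem, log=False)
        if not any(report.failed for report in reports):
            break  # setup, call and teardown all passed; stop retrying
    for report in reports:
        item.ihook.pytest_runtest_logreport(report=report)
    return True  # tell pytest the runtest protocol was handled here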
I'm using the py.test module to benchmark two versions of my algorithm, and the results I'm shown are way lower than those I get when I run the program manually. The first variant is the reference algorithm, while the other improves on the reference algorithm's execution time through parallelization. For parallelization I use multiprocessing.Process. The benchmark shows ~4 s execution time for the parallel version (compared to ~90 s for the sequential one), which is great. However, when I run the parallel version manually it takes way more than 4 s (I don't even get to finish the execution: my PC overloads, all cores jump to 100% usage in htop, and I am forced to interrupt the run). And yes, I have added the if __name__ == '__main__' guard before creating the processes.
I have timed the first variant with time.time() and time.clock(), and they both show around 100 s (which is still higher than what pytest shows, but as the execution time depends on the initial random setup, this might be understandable).
I've searched the documentation but couldn't find any explanation for why this might happen. Do you have any ideas? Is py.test even a good way to benchmark a parallel program, and do you have any other suggestions?
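For an apples-to-apples comparison outside pytest, a sketch that times both variants with a wall-clock timer (time.perf_counter; time.clock() was removed in Python 3.8) could look like this; run_reference and run_parallel are hypothetical entry points standing in for the two versions of the algorithm:

import time

def run_reference():
    pass  # hypothetical: the sequential algorithm

def run_parallel():
    pass  # hypothetical: the multiprocessing.Process-based algorithm

def bench(label, fn):
    start = time.perf_counter()  # wall-clock time, comparable across variants
    fn()
    print(f"{label}: {time.perf_counter() - start:.1f} s")

if __name__ == "__main__":  # needed because run_parallel() spawns processes
    bench("reference", run_reference)
    bench("parallel", run_parallel)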
I'm using pytest to run my tests and to test my web application. My test file looks like:
def test_logins():
    # do stuff

def test_signups():
    # do stuff

def test_posting():
    # do stuff
There are about 20 of them, and many of them have parts that run in constant time or rely on external HTTP requests, so it seems like it would lead to a large increase in testing speed if I could get pytest to start up 20 different multiprocessing processes (one for each test) to run each test function. Is this possible / reasonable / recommended?
I looked into xdist, but splitting the tests so that they run based on the number of cores on my computer isn't what I want.
Also, in case it's relevant, the bulk of the tests are done using Python's requests library (although they will be moved to Selenium eventually).
I would still recommend using pytest-xdist. And, as you mentioned already, because your tests mostly do network IO, it's fine to start pytest with (many) more parallel processes than you have cores (like 20); it will still be beneficial, as the GIL will not prevent the speedup from the parallelization.
So you run it like:
py.test tests -n<number>
The additional benefit of xdist is that you can easily scale your test run to multiple machines with no effort.
For easier scaling among multiple machines, pytest-cloud can help a lot.
I am using py.test (version 2.4, on Windows 7) with xdist to run a number of numerical regression and interface tests for a C++ library that provides a Python interface through a C module.
The number of tests has grown to ~2000 over time, but we are running into some memory issues now. Whether using xdist or not, the memory usage of the Python process running the tests seems to be ever increasing.
In single-process mode we have even seen a few bad-allocation errors, whereas with xdist the total memory usage may bring down the OS (8 processes, each using >1 GB towards the end).
Is this expected behaviour? Or did somebody else experience the same issue when using py.test for a large number of tests? Is there something I can do in tearDown(Class) to reduce the memory usage over time?
At the moment I cannot exclude the possibility that the problem lies somewhere inside the C/C++ code, but when running a long-running program that uses that code through the Python interface outside of py.test, I see relatively constant memory usage over time. I also do not see any excessive memory usage when using nose instead of py.test (we are using py.test because we need junit-xml reporting to work with multiple processes).
py.test's memory usage will grow with the number of tests. Each test is collected before any are executed, and for each test run a test report is stored in memory, which is much larger for failures, so that all the information can be reported at the end. So to some extent this is expected and normal.
However, I have no hard numbers and have never closely investigated this. We did run out of memory on some CI hosts ourselves before, but just gave them more memory to solve it instead of investigating. Currently our CI hosts have 2 GB of memory and run about 3500 tests in one test run; it would probably work on half of that but might involve more swapping. PyPy is also a project that manages to run a huge test suite with py.test, so this should certainly be possible.
If you suspect the C code of leaking memory, I recommend building a (small) test script which just exercises the extension module API (with or without py.test) and invoking it in an infinite loop while gathering memory stats after every iteration. After a few iterations the memory should not increase any more.
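A hedged sketch of such a loop; my_extension.do_work is a hypothetical stand-in for your C/C++ binding, and psutil is a third-party package used here only for the memory stats:

import os
import psutil            # third-party: pip install psutil
import my_extension      # hypothetical extension module under suspicion

def main():
    proc = psutil.Process(os.getpid())
    for i in range(100000):
        my_extension.do_work()  # exercise the API surface the tests use
        rss_mb = proc.memory_info().rss / 1e6
        print(f"iteration {i}: RSS = {rss_mb:.1f} MB")  # should plateau quickly

if __name__ == "__main__":
    main()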
Try using --tb=no which should prevent pytest from accumulating stacks on every failure.
I have found that it's better to have your test runner run smaller instances of pytest in multiple processes, rather than one big pytest run, because of its accumulation in memory of every error.
pytest should probably accumulate test results on disk rather than in RAM.
We also experience similar problems. In our case we run ~4600 test cases.
We use pytest fixtures extensively, and we managed to save a few MB by scoping the fixtures slightly differently (changing several from "session" scope to "class" or "function" scope). However, test performance dropped.
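For readers unfamiliar with the trade-off, a hedged illustration of that scoping change; load_big_dataset is a hypothetical stand-in for an expensive setup step:

import pytest

def load_big_dataset():
    return list(range(10**6))  # hypothetical stand-in for expensive setup

# Before: one instance is kept alive for the whole run (fast, memory-hungry).
@pytest.fixture(scope="session")
def big_dataset_session():
    return load_big_dataset()

# After: rebuilt for each test and released afterwards -- less memory held
# across the run, but the setup cost is paid again and again.
@pytest.fixture(scope="function")
def big_dataset():
    return load_big_dataset()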