pytest takes 10 minutes, most of it in the builtins compile method - python

I have a pytest test suite with about 1800 tests that takes more than 10 minutes to collect and execute. I created a cProfile trace of the run and found that the majority of the time, around 300 seconds, went to {built-in method builtins.compile}.
There were some other compile calls coming from the regular expression package, which I removed and saw a reduction of about 50 seconds, but the run still takes 9.5 minutes, which is huge.
What I understand so far is that the built-in compile method converts a script into a code object, and that pytest uses this function internally for creating and executing code objects. But 9-10 minutes is an insanely huge amount of time for running 1800 tests. I am new to pytest and Python, so I am trying to figure out the reason for this time.
Could it be that pytest is not configured properly and that is why it uses the compile method to generate code objects? Or could the other imported libraries be using compile internally?

Could it be that pytest is not configured properly and that is why it uses the compile method to generate code objects?
Though I have never looked, I would fully expect pytest to compile files to bytecode by hand, for the simple reason that it performs assertion rewriting by default in order to instrument assert statements: when an assertion fails, rather than just show the assertion message pytest shows the various intermediate values. This requires compiling either way: either they're compiling the code to bytecode and rewriting the bytecode, or they're parsing the code to the AST, updating the AST, and still compiling to bytecode.
It's possible to disable this behaviour (--assert=plain), but I would not expect there to be much gain from it (though I could be wrong): pytest simply does that instead of the interpreter performing the compilation on its own. It has to be done one way or another for the test suite to run.
Though taking 5 minutes does sound like a lot, do you have a large number of very small files or something? Rough benching indicates that compile works at about 5usec/line on my machine (though it probably depends on code complexity). I've got 6kLOC worth of tests, and while the test suite takes ages it's because the tests themselves are expensive; the collection is unnoticeable.
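If you want to reproduce that rough benchmark on your own code, something like the following works; the file path is a placeholder for one of your own test files:

    import timeit

    # Time compile() on a real test file and report usec per source line.
    source = open("tests/test_example.py").read()   # placeholder path
    n_lines = source.count("\n") or 1
    per_call = min(timeit.repeat(lambda: compile(source, "<bench>", "exec"),
                                 repeat=5, number=100)) / 100
    print("%.1f usec/line" % (per_call / n_lines * 1e6))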
Of course it's possible you could be triggering some sort of edge case or issue in pytest e.g. maybe you have an ungodly number of assert statements which causes pytest to generate an insane amount of rewritten code? The aforementioned --assert=plain could hint at that if it makes running the test suite significantly shorter.
You could also try running e.g. --collect-only to see what that yields, though I don't know whether the assertion rewriting is performed during or after the collection. FWIW on the 6kLOC test suite above I get 216 tests collected in 1.32s.
Either way this seems like something more suitable to the pytest bug tracker.
Or could the other imported libraries be using compile internally?
You could use a flamegraph-based profiler to record the entire stack. cProfile is, frankly, kinda shit.
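That said, if you want to squeeze a bit more out of the cProfile data you already collected, pstats can at least show which callers account for the compile time. A minimal sketch, assuming the profile was dumped to a file (the filename is a placeholder):

    import pstats

    # Load a saved cProfile dump and list the callers of the built-in compile.
    stats = pstats.Stats("pytest_profile.prof")   # placeholder filename
    stats.sort_stats("cumulative").print_callers("builtins.compile")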

Related

Is there a way to overwrite the main loop in py.test?

In py.test there must be a function that loops over all the collected tests and executes them sequentially.
Is there a way to overwrite this function/loop?
Because I want to run the tests sequentially, as before, but run the failed tests again a second time (or a third time) to see if they might succeed on the second (third) try.
(Some background explanation: the system under test is a GUI application which depends on many different systems, and rarely some of them fail or do not behave as expected. When you have many of these tests it becomes quite likely that at least one test will fail. Therefore I want to repeat the failed tests a couple of times. Also, that system cannot be changed.)
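For what it's worth, pytest exposes this per-test loop through the pytest_runtest_protocol hook, and the pytest-rerunfailures plugin packages exactly this kind of retry logic. A minimal, untested sketch of a single retry in conftest.py, leaning on the internal _pytest.runner.runtestprotocol helper (internal API, so treat it as illustration only):

    # conftest.py -- rough sketch only; prefer the pytest-rerunfailures plugin.
    from _pytest.runner import runtestprotocol

    def pytest_runtest_protocol(item, nextitem):
        item.ihook.pytest_runtest_logstart(nodeid=item.nodeid, location=item.location)
        reports = runtestprotocol(item, nextitem=nextitem, log=False)
        if any(r.when == "call" and r.failed for r in reports):
            # The test failed: run it once more and keep the second set of reports.
            reports = runtestprotocol(item, nextitem=nextitem, log=False)
        for report in reports:
            item.ihook.pytest_runtest_logreport(report=report)
        item.ihook.pytest_runtest_logfinish(nodeid=item.nodeid, location=item.location)
        return True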

Temporary object-pool for unit tests?

I am running a large unit test repository for a complex project.
This project has some things that don't play well with large numbers of tests:
caches (memoization) that cause objects not to be freed between tests
complex objects at module level that are singletons and might gather data when being used
I am interested in each test (or at least each test suite) having its own "python-object-pool" and being able to free it afterwards.
Sort of a python-garbage-collector-problem workaround.
I imagine a self-contained, temporary, discardable Python interpreter that can run certain code for me; afterwards I can call "interpreter.free()" and be assured it doesn't leak.
One crude solution I found is to use Nose, or to spawn a subprocess each time I need an expendable interpreter that will run a test. So each test becomes "fork_and_run(conditions)" and leaks no memory in the original process.
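A very rough sketch of that fork_and_run idea using multiprocessing; the name and the error handling are made up, and it assumes the POSIX "fork" start method so the inner function can be handed to the child:

    import multiprocessing

    def fork_and_run(test_func, *args, **kwargs):
        # Run test_func in a child process so all of its memory is released
        # when the child exits (assumes the "fork" start method).
        queue = multiprocessing.Queue()

        def _child():
            try:
                test_func(*args, **kwargs)
                queue.put(None)
            except Exception as exc:       # ship the failure back to the parent
                queue.put(exc)

        proc = multiprocessing.Process(target=_child)
        proc.start()
        proc.join()
        error = queue.get() if not queue.empty() else RuntimeError("child crashed")
        if error is not None:
            raise error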
I also saw that Nose can use a single process per test and run the tests sequentially, though people mentioned it sometimes freezes midway - less fun.
Is there a simpler solution?
P.S.
I am not interested in going through vast amounts of other people's code and trying to make all their caches/objects/projects be perfectly memory-managed objects that can be cleaned.
P.P.S.
Our PROD code also creates a new process for each job, which is very convenient since we don't have to mess around with "surviving forever" and other scary stories.
TL;DR
The module reload trick I tried worked locally, but broke when used on a machine with a different Python version... (?!)
I ended up taking any and all caches I wrote in code and adding them to a global cache list - then clearing them between tests.
Sadly this will break if anyone adds a cache/manual caching mechanism and misses this; tests will start growing in memory again...
For starters I wrote a loop that goes over the sys.modules dict and reloads all of my code's modules (looping twice). This worked amazingly - all references were freed properly - but it seems it cannot be used in production/serious code for multiple reasons (a sketch of the loop follows below):
Old Python versions break when the reload redefines classes that inherit from metaclasses (I still don't get how this breaks).
Unit tests survive the reload and sometimes hold stale references to old classes - especially if a class uses another class's instance. Think super(class_name, self) where self is an instance of the previously defined class, and class_name is now the redefined class with the same name.
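For reference, the reload loop described above looks roughly like this; "myproject" is a placeholder for your own package prefix:

    import importlib
    import sys

    def reload_project_modules(prefix="myproject"):
        # Two passes so cross-module references settle after the first reload.
        # (On Python 2 use the built-in reload instead of importlib.reload.)
        for _ in range(2):
            for name, module in list(sys.modules.items()):
                if module is not None and name.startswith(prefix):
                    importlib.reload(module)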

How do I stop testing in Python unittest tearDown()?

I've got a large number of tests written using Python unittest (Python 2.7.8) as a large TestSuite. Many of these tests invoke other programs. Sometimes these other programs dump core. When they do, I want to discover that and ensure the test fails. If some number of cores are dumped, I want to abort the entire test environment and exit rather than continuing: my total test suite has >6000 tests and if everything is dumping core it's useless (and dangerous: disk space etc.) to continue.
In order to ensure we look for coredumps after every test (so I have the best possible idea of what program/invocation dumped core) I decided to look for cores in tearDown(), which I am doing successfully. If I find a core, I can run an assert variant in tearDown() to specify that the test failed.
But I can't figure out how to just give up on all my testing completely from within tearDown() if I find too many cores. I even tried to run sys.exit("too many cores"), but unittest case.py catches every exception thrown by tearDown() except KeyboardInterrupt (if I try to raise that by hand my script hangs until I do a real ^C).
I thought about trying to call stop(), but this is a method on the result and I can't find any way to get access to the result object from within tearDown() (!).
So far my only option seems to be to invoke os._exit() which is really annoying because it keeps any results from being reported at all!
Is there really no facility in Python unittest.TestCase to tell the test environment to just stop right now, generate what results you have but don't run anything else?
Can you check how many cores have been dumped in setUp()? If so, you could just call self.skipTest('Too many cores dumped.') when things get bad.
If you don't want to look for cores in setUp(), you could probably use a class variable to hold the core dump count and check that instead.
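A minimal sketch of that idea; CORE_LIMIT and find_cores() are hypothetical placeholders for however you detect core files on your systems:

    import glob
    import unittest

    CORE_LIMIT = 5                       # hypothetical threshold

    def find_cores():
        # Placeholder: adapt the pattern to wherever your programs drop cores,
        # and ideally track only cores that are new since the last check.
        return glob.glob("core.*")

    class CoreAwareTestCase(unittest.TestCase):
        cores_dumped = 0                 # class variable shared by all tests

        def setUp(self):
            if CoreAwareTestCase.cores_dumped >= CORE_LIMIT:
                self.skipTest("Too many cores dumped.")

        def tearDown(self):
            new_cores = find_cores()
            CoreAwareTestCase.cores_dumped += len(new_cores)
            self.assertFalse(new_cores, "this test dumped core: %s" % new_cores)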

Py.test: excessive memory usage with large number of tests

I am using py.test (version 2.4, on Windows 7) with xdist to run a number of numerical regression and interface tests for a C++ library that provides a Python interface through a C module.
The number of tests has grown to ~2,000 over time, but we are running into some memory issues now. Whether using xdist or not, the memory usage of the python process running the tests seems to be ever increasing.
In single-process mode we have even seen a few bad-allocation errors, whereas with xdist total memory usage may bring down the OS (8 processes, each using >1GB towards the end).
Is this expected behaviour? Or did somebody else experience the same issue when using py.test for a large number of tests? Is there something I can do in tearDown(Class) to reduce the memory usage over time?
At the moment I cannot exclude the possibility of the problem lying somewhere inside the C/C++ code, but when running a long-running program that uses that code through the Python interface outside of py.test, I see relatively constant memory usage over time. I also do not see any excessive memory usage when using nose instead of py.test (we are using py.test because we need junit-xml reporting to work with multiple processes).
py.test's memory usage will grow with the number of tests. All tests are collected before they are executed, and for each test run a test report is stored in memory; reports are much larger for failures, so that all the information can be reported at the end. So to some extent this is expected and normal.
However I have no hard numbers and have never closely investigated this. We did run out of memory on some CI hosts ourselves before, but just gave them more memory to solve it instead of investigating. Currently our CI hosts have 2G of memory and run about 3500 tests in one test run; it would probably work on half of that but might involve more swapping. PyPy is also a project that manages to run a huge test suite with py.test, so this should certainly be possible.
If you suspect the C code of leaking memory, I recommend building a (small) test script which just exercises the extension module API (with or without py.test) and invoking that in an infinite loop while gathering memory stats after every iteration. After a few iterations the memory should never increase anymore.
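Something along these lines, where exercise_extension() is a stand-in for whatever drives your C module; the resource module is POSIX-only (on Windows, psutil is the usual alternative):

    import resource

    def exercise_extension():
        pass    # placeholder: call into the C extension API here

    while True:
        exercise_extension()
        # ru_maxrss is the peak RSS (kB on Linux, bytes on macOS); if it stops
        # growing after a few iterations, the extension is not leaking.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("peak RSS so far: %d" % peak)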
Try using --tb=no which should prevent pytest from accumulating stacks on every failure.
I have found that it's better to have your test runner run smaller instances of pytest in multiple processes, rather than one big pytest run, because of its in-memory accumulation of every error.
pytest should probably accumulate test results on disk rather than in RAM.
We also experienced similar problems. In our case we run about 4600 test cases.
We use pytest fixtures extensively, and we managed to save a few MB by scoping the fixtures slightly differently (narrowing several from "session" to "class" or "function" scope). However, test performance dropped.
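For illustration, the scoping change amounts to something like this; HeavyResource is a made-up stand-in for whatever expensive object the fixture provides:

    import pytest

    class HeavyResource:
        # Placeholder for the expensive object the fixture provides.
        def close(self):
            pass

    @pytest.fixture(scope="class")      # was scope="session"
    def heavy_resource():
        resource = HeavyResource()
        yield resource
        resource.close()                # now torn down (and freed) per class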

Proper way to automatically test performance in Python (for all developers)?

Our Python application (a cool web service) has a full suite of tests (unit tests, integration tests etc.) that all developers must run before committing code.
I want to add some performance tests to the suite to make sure no one adds code that makes us run too slow (for some rather arbitrary definition of slow).
Obviously, I can collect some functionality into a test, time it and compare to some predefined threshold.
The tricky requirements:
I want every developer to be able to test the code on their own machine (these vary in CPU power, OS (Linux and some Windows!) and external configuration - the Python version, libraries and modules are the same). A test server, while generally a good idea, does not solve this.
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
My preliminary thoughts:
Use timeit and do a benchmark of the system every time I run the tests. Compare the performance test results to the benchmark.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
Other thoughts?
Thanks!
Tal.
Check out funkload - it's a way of running your unit tests as either functional or load tests to gauge how well your site is performing.
Another interesting project which can be used in conjunction with funkload is codespeed. This is an internal dashboard that measures the "speed" of your codebase for every commit you make to your code, presenting graphs with trends over time. This assumes you have a number of automatic benchmarks you can run - but it could be a useful way to have an authoritative account of performance over time. The best use of codespeed I've seen so far is the speed.pypy.org site.
As to your requirement for determinism - perhaps the best approach to that is to use statistics to your advantage? Automatically run the test N times, produce the min, max, average and standard deviation of all your runs? Check out this article on benchmarking for some pointers on this.
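A small sketch of that statistics-based approach using timeit.repeat and the statistics module; work() is a placeholder for the code path being benchmarked:

    import statistics
    import timeit

    def work():
        sum(range(10000))    # placeholder for the code path being benchmarked

    times = timeit.repeat(work, repeat=10, number=100)
    print("min %.4fs  avg %.4fs  max %.4fs  stdev %.4fs" % (
        min(times), statistics.mean(times), max(times), statistics.stdev(times)))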
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
Fail. More or less by definition this is utterly impossible in a multi-processing system with multiple users.
Either rethink this requirement or find a new environment in which to run tests that doesn't involve any of the modern multi-processing operating systems.
Further, your running web application is not deterministic, so imposing some kind of "deterministic" performance testing doesn't help much.
When we did time-critical processing (in radar, where "real time" actually meant real time) we did not attempt deterministic testing. We did code inspections and ran simple performance tests that involved simple averages and maximums.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
The Stats object created by the profiler is what you're looking for.
http://docs.python.org/library/profile.html#the-stats-class
Focus on 'pcalls', primitive call count, in the profile statistics and you'll have something that's approximately deterministic.
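In practice that can look like profiling the code path under test and asserting on the primitive call count, which is repeatable across machines in a way wall-clock time is not; work() and the 50000 threshold below are placeholders:

    import cProfile
    import pstats

    def work():
        sum(range(10000))        # placeholder for the code path under test

    def test_call_budget():
        profiler = cProfile.Profile()
        profiler.enable()
        work()
        profiler.disable()
        stats = pstats.Stats(profiler)
        # prim_calls is the primitive (non-recursive) call count.
        assert stats.prim_calls < 50000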
