I'm using pytest to run the tests for my web application. My test file looks like this:
def test_logins():
    # do stuff

def test_signups():
    # do stuff

def testing_posting():
    # do stuff
There are about 20 of them, and many of them have steps that take a fixed amount of time or rely on external HTTP requests, so it seems like it would lead to a large increase in testing speed if I could get pytest to start up 20 different multiprocessing processes (one for each test) to run each test function. Is this possible / reasonable / recommended?
I looked into xdist, but splitting the tests based on the number of cores on my computer isn't what I want.
Also, in case it's relevant: the bulk of the tests are done using Python's requests library (although they will be moved to Selenium eventually).
I would still recommend using pytest-xdist. And, as you mentioned, because your tests mostly do network I/O, it's OK to start pytest with (many) more parallel worker processes than you have cores (like 20). It will still be beneficial: the workers are separate processes, so the GIL will not prevent the parallelization from paying off, and each worker spends most of its time waiting on the network anyway.
So you run it like:
py.test tests -n<number>
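For example, for the roughly 20 I/O-bound tests described in the question, something like this should work (the worker count is just a placeholder taken from the question; pytest-xdist also accepts -n auto, which matches the number of cores and is the behaviour you said you don't want here):

py.test tests -n 20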
The additional benefit of xdist is that you can easily scale your test run to multiple machines with no effort.
For easier scaling among multiple machines, pytest-cloud can help a lot.
I'm using the py.test module to benchmark two versions of my algorithm, and the results it shows are way lower than those I get when I run the program manually. The first variant is the reference algorithm, while the other improves on the reference algorithm's execution time by parallelization. For the parallelization I use multiprocessing.Process. The benchmark shows ~4 s execution time for the parallel version (compared to ~90 s for the sequential one), which is great. However, when I run the parallel version manually, it takes far more than 4 s: I don't even get to finish execution, my PC overloads (all cores jump to 100% usage in htop), and I am forced to interrupt it. And yes, I have added the if __name__ == '__main__' guard before creating the processes.
I have timed the first variant with time.time() and time.clock(), and they both show around 100 s (which is still higher than what py.test shows, but since execution time depends on the initial random setup, this might be understandable).
I've searched the documentation but couldn't find any explanation of why this might happen. Do you have any ideas? Is py.test even a good way to benchmark a parallel program, and do you have any other suggestions?
Has anyone managed to test their Fabric tasks? Is there a library out there that can help with this?
I'm quite familiar with patching/mocking, but it's pretty difficult with Fabric. I've also had a look through Fabric's own test suite, which was of no use unfortunately, and there don't seem to be any topics on it in the Fabric docs.
These are the tasks I'm trying to test... I'd like to avoid bringing up a VM if possible.
Any help is appreciated. Thanks in advance.
Disclaimer: Below, Functional Testing is used synonymously with System Testing. The lack of a formalized spec for most Fabric projects renders the distinction moot.
Furthermore, I may use the terms Functional Testing and Integration Testing somewhat interchangeably, since the border between them blurs with any configuration management software.
Local Functional Testing for Fabric is Hard (or Impossible)
I'm pretty sure that it is not possible to do functional testing without either bringing up a VM, which you give as one of your constraints, or doing extremely extensive mocking (which will make your testsuite inherently fragile).
Consider the following simple function:
from fabric.api import run, settings, sudo

def agnostic_install_lsb():
    def install_helper(installer_command):
        # warn_only stops Fabric from aborting when `which` exits non-zero
        with settings(warn_only=True):
            ret = run('which %s' % installer_command)
        if ret.return_code == 0:
            sudo('%s install -y lsb-release' % installer_command)
            return True
        return False

    install_commands = ['apt-get', 'yum', 'zypper']
    for cmd in install_commands:
        if install_helper(cmd):
            return True
    return False
If you have a task that invokes agnostic_install_lsb, how can you do functional testing on a local box?
You can do unit testing by mocking the calls to run, local and sudo, but not much in terms of higher level integration tests.
If you're willing to be satisfied with simple unit tests, there's not really much call for a testing framework beyond mock and nose, since all of your unit tests operate in tightly controlled conditions.
How You Would Do The Mocking
You could mock the sudo, local, and run functions to log their commands to a set of StringIOs or files, but, unless there's something clever that I'm missing, you would also have to mock their return values very carefully.
To continue stating the things that you probably already know, your mocks would either have to be aware of the Fabric context managers (hard), or you would have to mock all of the context managers that you use (still hard, but not as bad).
If you do want to go down this path, I think it is safer and easier to build a test class whose setup instantiates mocks for all of the context managers, run, sudo, and any other parts of Fabric that you are using, rather than trying to do a more minimal amount of mocking on a per-test basis.
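As a very rough sketch of what that setup might look like (the fabfile module and its contents are hypothetical stand-ins, and this assumes Fabric 1.x plus the mock library):

import unittest

import mock

import fabfile  # hypothetical module containing agnostic_install_lsb


class FabricTaskTest(unittest.TestCase):
    def setUp(self):
        # Patch the names as imported inside the fabfile, not fabric.api itself.
        for name in ('run', 'sudo'):
            patcher = mock.patch('fabfile.%s' % name)
            setattr(self, 'mock_%s' % name, patcher.start())
            self.addCleanup(patcher.stop)

    def test_uses_apt_get_when_available(self):
        # Carefully faked return value: pretend `which apt-get` succeeded.
        self.mock_run.return_value = mock.Mock(return_code=0)
        self.assertTrue(fabfile.agnostic_install_lsb())
        self.mock_sudo.assert_called_once_with('apt-get install -y lsb-release')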
At that point, you will have built a somewhat generic testing framework for Fabric, and you should probably share it on PyPI as... "mabric"?
I contend that this wouldn't be much use for most cases, since your tests end up caring about how a run is done, rather than just what is done by the end of it.
Switching a command to sudo('echo "cthulhu" > /etc/hostname') from run('echo "cthulhu" | sudo tee /etc/hostname') shouldn't break the tests, and it's hard to see how to achieve that with simple mocks.
This is because we've started to blur the line between functional and unit testing, and this kind of basic mocking is an attempt to apply unit testing methodologies to functional tests.
Testing Configuration Management Software on VMs is an Established Practice
I would urge you to reconsider how badly you want to avoid spinning up VMs for your functional tests.
This is the commonly accepted practice for Chef testing, which faces many of the same challenges.
If you are concerned about the automation for this, Vagrant does a very good job of simplifying the creation of VMs from a template.
I've even heard that there's good Vagrant/Docker integration, if you're a Docker fan.
The only downside is that if you are a VMware fan, Vagrant needs VMware Workstation ($$$).
Alternatively, just use Vagrant with VirtualBox for free.
If you're working in a cloud environment like AWS, you even get the option of spinning up new VMs with the same base images as your production servers for the sole purpose of doing your tests.
Of course, a notable downside is that this costs money.
However, it's not a significant fraction of your costs if you are already running your full software stack in a public cloud because the testing servers are only up for a few hours total out of a month.
In short, there are a bunch of ways of tackling the problem of doing full, functional testing on VMs, and this is a tried and true technique for other configuration management software.
If Not Using Vagrant (or similar), Keep a Suite of Locally Executable Unit Tests
One of the obvious problems with making your tests depend upon running a VM is that it makes testing for developers difficult.
This is especially true for iterated testing against a local code version, as some projects (e.g. web UI development) may require.
If you are using Vagrant + Virtualbox, Docker (or raw LXC), or a similar solution for your testing virtualization, then local testing is not tremendously expensive.
These solutions make spinning up fresh VMs doable on cheap laptop hardware in under ten minutes.
For particularly fast iterations, you may be able to test multiple times against the same VM (and then replace it with a fresh one for a final test run).
However, if you are doing your virtualization in a public cloud or similar environment where doing too much testing on your VMs is costly, you should separate your tests into an extensive unit testsuite which can run locally, and integration or system tests which require the VM.
This separate set of tests allows for development without the full testsuite, running against the unit tests as development proceeds.
Then, before merging/shipping/signing off on changes, they should run against the functional tests on a VM.
Ultimately, nothing should make its way into your codebase that hasn't passed the functional tests, but it would behoove you to try to achieve as near to full code coverage for such a suite of unit tests as you can.
The more that you can do to enhance the confidence that your unit tests give you, the better, since it reduces the number of spurious (and potentially costly) runs of your system tests.
I am using py.test (version 2.4, on Windows 7) with xdist to run a number of numerical regression and interface tests for a C++ library that provides a Python interface through a C module.
The number of tests has grown to ~2,000 over time, but we are running into some memory issues now. Whether using xdist or not, the memory usage of the python process running the tests seems to be ever increasing.
In single-process mode we have even seen a few bad-allocation errors, whereas with xdist total memory usage may bring down the OS (8 processes, each using >1 GB towards the end).
Is this expected behaviour? Or did somebody else experience the same issue when using py.test for a large number of tests? Is there something I can do in tearDown(Class) to reduce the memory usage over time?
At the moment I cannot exclude the possibility of the problem lying somewhere inside the C/C++ code, but when I run a long-running program that uses that code through the Python interface outside of py.test, I see relatively constant memory usage over time. I also do not see any excessive memory usage when using nose instead of py.test (we are using py.test because we need junit-xml reporting to work with multiple processes).
py.test's memory usage will grow with the number of tests. Every test is collected before execution starts, and for each test that runs a test report is stored in memory (a much larger one for failures), so that all the information can be reported at the end. So to some extent this is expected and normal.
However, I have no hard numbers and have never closely investigated this. We did run out of memory on some CI hosts ourselves before, but just gave them more memory to solve it instead of investigating. Currently our CI hosts have 2 GB of memory and run about 3500 tests in one test run; it would probably work with half of that, but might involve more swapping. PyPy is also a project that manages to run a huge test suite with py.test, so this should certainly be possible.
If you suspect the C code of leaking memory, I recommend building a (small) test script which just exercises the extension module API (with or without py.test) and invoking it in an infinite loop while gathering memory stats after every loop. After a few loops the memory should stop increasing.
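A minimal sketch of such a harness, assuming the third-party psutil package (a recent version that provides memory_info()) and a made-up extension module name:

import os

import psutil  # third-party, cross-platform process memory stats

import my_extension  # hypothetical name for the C/C++ extension under suspicion


def exercise_api():
    # Call the same entry points your tests hit; these names are made up.
    model = my_extension.Model()
    model.compute()


if __name__ == '__main__':
    proc = psutil.Process(os.getpid())
    iteration = 0
    while True:
        exercise_api()
        iteration += 1
        rss_mb = proc.memory_info().rss / (1024.0 * 1024.0)
        # The RSS should level off after a few iterations if nothing leaks.
        print('iteration %d: RSS %.1f MB' % (iteration, rss_mb))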
Try using --tb=no which should prevent pytest from accumulating stacks on every failure.
I have found that it's better to have your test runner run smaller instances of pytest in multiple processes, rather than one big pytest run, because pytest accumulates every failure in memory.
pytest should probably accumulate test results on disk rather than in RAM.
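A rough sketch of that "many small pytest runs" idea, invoking pytest once per test directory in a subprocess so each run's accumulated reports are freed when the process exits (the directory names here are hypothetical):

import subprocess
import sys

test_dirs = ['tests/regression', 'tests/interface']

exit_code = 0
for test_dir in test_dirs:
    # --tb=no keeps failure tracebacks from being accumulated and printed.
    rc = subprocess.call([sys.executable, '-m', 'pytest', '--tb=no', test_dir])
    exit_code = exit_code or rc

sys.exit(exit_code)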
We also experienced similar problems. In our case we run about 4,600 test cases.
We use pytest fixtures extensively, and we managed to save a few MB by scoping the fixtures slightly differently (changing several from "session" scope to "class" or "function" scope). However, test performance dropped.
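For illustration, the kind of scoping change described above looks like this; the fixture body is a made-up stand-in for an expensive setup step. A session-scoped fixture keeps its object alive for the whole run, while class or function scope lets it be freed sooner at the cost of repeating the setup:

import pytest


def load_large_dataset():
    # Stand-in for an expensive setup step.
    return list(range(1000000))


@pytest.fixture(scope="class")  # was scope="session"
def dataset():
    return load_large_dataset()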
Our Python application (a cool web service) has a full suite of tests (unit tests, integration tests etc.) that all developers must run before committing code.
I want to add some performance tests to the suite to make sure no one adds code that makes us run too slow (for some rather arbitrary definition of slow).
Obviously, I can collect some functionality into a test, time it and compare to some predefined threshold.
The tricky requirements:
I want every developer to be able to test the code on his own machine (these machines vary in CPU power, OS (Linux and some Windows!), and external configuration; the Python version, libraries, and modules are the same). A test server, while generally a good idea, does not solve this.
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
My preliminary thoughts:
Use timeit and do a benchmark of the system every time I run the tests. Compare the performance test results to the benchmark.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
Other thoughts?
Thanks!
Tal.
Check out funkload - it's a way of running your unit tests as either functional or load tests to gauge how well your site is performing.
Another interesting project which can be used in conjunction with funkload is codespeed. This is an internal dashboard that measures the "speed" of your codebase for every commit you make to your code, presenting graphs with trends over time. This assumes you have a number of automatic benchmarks you can run - but it could be a useful way to have an authoritative account of performance over time. The best use of codespeed I've seen so far is the speed.pypy.org site.
As to your requirement for determinism, perhaps the best approach is to use statistics to your advantage: automatically run the test N times and produce the min, max, average, and standard deviation of all your runs. Check out this article on benchmarking for some pointers on this.
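A small sketch of that statistical approach, using timeit to collect the samples; the operation and the time budget are placeholder assumptions:

import timeit


def operation_under_test():
    # Stand-in for the functionality being timed.
    sum(i * i for i in range(10000))


def test_operation_is_fast_enough():
    # Each sample is the wall-clock time of one call.
    samples = timeit.repeat(operation_under_test, number=1, repeat=20)
    mean = sum(samples) / len(samples)
    stddev = (sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)) ** 0.5
    print('min=%.4fs max=%.4fs mean=%.4fs stddev=%.4fs'
          % (min(samples), max(samples), mean, stddev))
    # Compare the minimum (the least noisy sample) against an arbitrary budget.
    assert min(samples) < 0.05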
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
Fail. More or less by definition this is utterly impossible in a multi-processing system with multiple users.
Either rethink this requirement or find a new environment in which to run tests that doesn't involve any of the modern multi-processing operating systems.
Further, your running web application is not deterministic, so imposing some kind of "deterministic" performance testing doesn't help much.
When we did time-critical processing (in radar, where "real time" actually meant real time) we did not attempt deterministic testing. We did code inspections and ran simple performance tests that involved simple averages and maximums.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
The Stats object created by the profiler is what you're looking for.
http://docs.python.org/library/profile.html#the-stats-class
Focus on 'pcalls', primitive call count, in the profile statistics and you'll have something that's approximately deterministic.
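A hedged sketch of that call-count idea, profiling a placeholder operation and asserting on the Stats object's primitive call count (the budget of 100 is arbitrary):

import cProfile
import pstats


def operation_under_test():
    # Stand-in for the functionality being profiled.
    sorted(range(1000), reverse=True)


def test_call_count_budget():
    profiler = cProfile.Profile()
    profiler.runcall(operation_under_test)
    stats = pstats.Stats(profiler)
    # prim_calls is the total number of primitive (non-recursive) calls,
    # which is far more stable across machines than wall-clock time.
    assert stats.prim_calls < 100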
I'm using Python's unittest to test some other external application, but it takes too much time to run the tests one by one.
I would like to know how I can speed up this process by using the power of multiple cores.
Can I tweak unittest to execute tests in parallel? How?
This question is not about the Python GIL limitation, because it is not the Python code that takes the time but the external application that I execute, currently via os.system().
If your tests are not too involved, you may be able to run them using py.test which has support for distributed testing. If you are not running on Windows, then nose might also work for you.
The testtools package is an extension of unittest which supports running tests concurrently. It can be used with your old test classes that inherit from unittest.TestCase.
For example:
import unittest
import testtools

class MyTester(unittest.TestCase):
    # Tests...
    def test_example(self):  # placeholder so the snippet runs
        self.assertTrue(True)

suite = unittest.TestLoader().loadTestsFromTestCase(MyTester)
concurrent_suite = testtools.ConcurrentStreamTestSuite(
    lambda: ((case, None) for case in suite))
concurrent_suite.run(testtools.StreamResult())
Maybe you can run each test in a different process using the multiprocessing library. This implies that each unit test (or group of unit tests) should be independent and not need to share state.
It will open other processes and make use of the other cores.
Check specifically the 'Using a pool of workers' section on this page: http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers
EDIT: This module has been included in the standard library since Python 2.6.
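A minimal sketch of that idea, running each TestCase in its own worker process via a Pool; the dotted test names are hypothetical, and each case must be independent as noted above:

import multiprocessing
import unittest


def run_case(case_name):
    # Load and run one TestCase by dotted name inside this worker process.
    suite = unittest.TestLoader().loadTestsFromName(case_name)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return case_name, result.wasSuccessful()


if __name__ == '__main__':
    # Hypothetical dotted names of independent test cases.
    case_names = ['tests.TestLogin', 'tests.TestSignup', 'tests.TestPosting']
    pool = multiprocessing.Pool()  # defaults to one worker per core
    for name, ok in pool.map(run_case, case_names):
        print('%s: %s' % (name, 'OK' if ok else 'FAILED'))
    pool.close()
    pool.join()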
As @vinay-sajip suggested, a few non-core Python packages like py.test and nose provide parallel execution of unit tests via the multiprocessing lib right out of the box.
However, one thing to consider is that if you are testing a web app with a database backend and the majority of your test cases rely on connecting to the same test database, then your test execution speed is bottlenecked on the DB, not on CPU or local I/O, and multiprocessing won't speed it up.
Given that each test case requires an independent setup of the database schema and data, you cannot scale execution speed with CPU alone; you are restricted by the single test database connection to a single test database server (otherwise, test cases running in parallel would interfere with each other's data).