Has anyone managed to test their Fabric tasks? Is there a library out there that can help with this?
I'm quite familiar with patching/mocking, but it's pretty difficult with Fabric. I've also had a look through Fabric's own test suite, which unfortunately was of no use, and there don't seem to be any topics on it in the Fabric docs.
These are the tasks I'm trying to test... I'd like to avoid bringing up a VM if possible.
Any help is appreciated, thanks in advance.
Disclaimer: Below, Functional Testing is used synonymously with System Testing. The lack of a formalized spec for most Fabric projects renders the distinction moot.
Furthermore, I may get casual between the terms Functional Testing and Integration Testing, since the border between them blurs with any configuration management software.
Local Functional Testing for Fabric is Hard (or Impossible)
I'm pretty sure that it is not possible to do functional testing without either bringing up a VM, which you give as one of your constraints, or doing extremely extensive mocking (which will make your testsuite inherently fragile).
Consider the following simple function:
from fabric.api import run, sudo, settings

def agnostic_install_lsb():
    def install_helper(installer_command):
        # warn_only keeps Fabric from aborting when `which` can't find the command
        with settings(warn_only=True):
            ret = run('which %s' % installer_command)
        if ret.return_code == 0:
            sudo('%s install -y lsb-release' % installer_command)
            return True
        return False

    install_commands = ['apt-get', 'yum', 'zypper']
    for cmd in install_commands:
        if install_helper(cmd):
            return True
    return False
If you have a task that invokes agnostic_install_lsb, how can you do functional testing on a local box?
You can do unit testing by mocking the calls to run, local and sudo, but not much in terms of higher level integration tests.
If you're willing to be satisfied with simple unit tests, there's not really much call for a testing framework beyond mock and nose, since all of your unit tests operate in tightly controlled conditions.
How You Would Do The Mocking
You could mock the sudo, local, and run functions to log their commands to a set of StringIOs or files, but, unless there's something clever that I'm missing, you would also have to mock their return values very carefully.
To continue stating the things that you probably already know, your mocks would either have to be aware of the Fabric context managers (hard), or you would have to mock all of the context managers that you use (still hard, but not as bad).
If you do want to go down this path, I think it is safer and easier to build a test class whose setup instantiates mocks for all of the context managers, run, sudo, and any other parts of Fabric that you are using, rather than trying to do a more minimal amount of mocking on a per-test basis.
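A rough sketch of what that might look like, assuming the task above lives in a hypothetical module called fabfile.py and that you're using the mock library (adjust the patch targets to wherever run and sudo are actually imported):

import unittest
import mock  # or unittest.mock on Python 3
import fabfile

class FabricTestCase(unittest.TestCase):
    """Patches run and sudo for every test; extend with local/context managers as needed."""
    def setUp(self):
        self.run_patcher = mock.patch('fabfile.run')
        self.sudo_patcher = mock.patch('fabfile.sudo')
        self.mock_run = self.run_patcher.start()
        self.mock_sudo = self.sudo_patcher.start()
        self.addCleanup(self.run_patcher.stop)
        self.addCleanup(self.sudo_patcher.stop)

class TestAgnosticInstallLsb(FabricTestCase):
    def test_installs_with_first_available_installer(self):
        # pretend `which apt-get` succeeds on the remote host
        self.mock_run.return_value = mock.Mock(return_code=0)
        self.assertTrue(fabfile.agnostic_install_lsb())
        self.mock_sudo.assert_called_once_with('apt-get install -y lsb-release')

    def test_returns_false_when_no_installer_found(self):
        self.mock_run.return_value = mock.Mock(return_code=1)
        self.assertFalse(fabfile.agnostic_install_lsb())
        self.assertFalse(self.mock_sudo.called)

Note how the assertions are already coupled to the exact commands being run, which is precisely the fragility discussed below.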
At that point, you will have built a somewhat generic testing framework for Fabric, and you should probably share it on PyPi as... "mabric"?
I contend that this wouldn't be much use for most cases, since your tests end up caring about how a run is done, rather than just what is done by the end of it.
Switching a command to sudo('echo "cthulhu" > /etc/hostname') from run('echo "cthulhu" | sudo tee /etc/hostname') shouldn't break the tests, and it's hard to see how to achieve that with simple mocks.
This is because we've started to blur the line between functional and unit testing, and this kind of basic mocking is an attempt to apply unit testing methodologies to functional tests.
Testing Configuration Management Software on VMs is an Established Practice
I would urge you to reconsider how badly you want to avoid spinning up VMs for your functional tests.
This is the commonly accepted practice for Chef testing, which faces many of the same challenges.
If you are concerned about the automation for this, Vagrant does a very good job of simplifying the creation of VMs from a template.
I've even heard that there's good Vagrant/Docker integration, if you're a Docker fan.
The only downside is that if you are a VMware fan, Vagrant needs VMware Workstation ($$$).
Alternatively, just use Vagrant with VirtualBox for free.
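If you do go the Vagrant route, a functional test can simply point Fabric at the box. Here's a minimal sketch, assuming Fabric 1.x, a Vagrantfile in the current directory, and that agnostic_install_lsb lives in fabfile.py; the helper names are made up for illustration:

import subprocess
from fabric.api import run, settings
from fabfile import agnostic_install_lsb  # assuming the task shown earlier lives in fabfile.py

def vagrant_connection():
    """Turn `vagrant ssh-config` output into keyword arguments for fabric's settings()."""
    output = subprocess.check_output(['vagrant', 'ssh-config']).decode()
    conf = dict(line.strip().split(None, 1)
                for line in output.splitlines() if line.strip())
    return dict(host_string='%s@%s:%s' % (conf['User'], conf['HostName'], conf['Port']),
                key_filename=conf['IdentityFile'].strip('"'))

def test_agnostic_install_lsb_on_vagrant_box():
    # assert on the end state of the box, not on which commands got us there
    with settings(**vagrant_connection()):
        assert agnostic_install_lsb()
        assert 'Description' in run('lsb_release -a')

Run it with your favourite test runner; the VM spin-up and teardown (vagrant up / vagrant destroy) can live in the suite's setup and teardown.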
If you're working in a cloud environment like AWS, you even get the option of spinning up new VMs with the same base images as your production servers for the sole purpose of doing your tests.
Of course, a notable downside is that this costs money.
However, if you are already running your full software stack in a public cloud, this is not a significant fraction of your costs, since the testing servers are only up for a few hours total in a given month.
In short, there are a bunch of ways of tackling the problem of doing full, functional testing on VMs, and this is a tried and true technique for other configuration management software.
If Not Using Vagrant (or similar), Keep a Suite of Locally Executable Unit Tests
One of the obvious problems with making your tests depend upon running a VM is that it makes testing for developers difficult.
This is especially true for iterated testing against a local code version, as some projects (e.g. Web UI dev) may require.
If you are using Vagrant + Virtualbox, Docker (or raw LXC), or a similar solution for your testing virtualization, then local testing is not tremendously expensive.
These solutions make spinning up fresh VMs doable on cheap laptop hardware in under ten minutes.
For particularly fast iterations, you may be able to test multiple times against the same VM (and then replace it with a fresh one for a final test run).
However, if you are doing your virtualization in a public cloud or similar environment where doing too much testing on your VMs is costly, you should separate your tests into an extensive unit testsuite which can run locally, and integration or system tests which require the VM.
This separate set of tests allows for development without the full testsuite, running against the unit tests as development proceeds.
Then, before merging/shipping/signing off on changes, those changes should be run against the functional tests on a VM.
Ultimately, nothing should make its way into your codebase that hasn't passed the functional tests, but it would behoove you to try to achieve as near to full code coverage for such a suite of unit tests as you can.
The more that you can do to enhance the confidence that your unit tests give you, the better, since it reduces the number of spurious (and potentially costly) runs of your system tests.
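As a concrete illustration (a sketch, assuming nose and its built-in attrib plugin, since nose was mentioned above), you can tag the VM-dependent tests and exclude them from the default local run:

from nose.plugins.attrib import attr

def test_render_nginx_config():
    # pure unit test: no VM required, runs everywhere
    pass

@attr('functional')
def test_agnostic_install_lsb_against_vm():
    # requires a running Vagrant or cloud VM
    pass

Developers then run nosetests -a '!functional' during development, and the full nosetests run (including the VM-backed tests) happens before merging.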
Related
I'm refactoring some e2e tests that are failing often, likely because I can never be sure that the resources written in step 1 are there for steps 2 and 3.
There is a sort of strict chain of logic, but each test is not atomic in any way. I'm not looking for specific Python style advice (but I'd be happy to take some advice on how best to use pytest for end-to-end testing).
Is there a best practice for the creation, verification and deletion of remote resources in an end-to-end test?
The four tests do the following:
test_write_credentials_to_cloud           # this one always works
test_get_credentials_from_cloud           # this is the one that often fails
test_delete_credentials_from_cloud        # sometimes this one fails
test_verify_credentials_deleted_in_cloud  # this one is never the problem
During e2e testing it is often the case that you have to deal with credentials.
Clearly the test plan should never have these hardcoded, and when loading them you should make sure they are not plain text but have some form of encryption.
Ideally, all cloud resources should start from a clean state when initializing a test plan or use case. After test execution you may consider disposing of these resources, but this can sometimes get tricky: for example, some cloud providers have rate limits on deletions per second for resources like buckets or API Gateway.
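One way to avoid the ordering problem entirely is to have each test own the lifecycle of its resource through a pytest fixture, rather than relying on an earlier test having run. A minimal sketch; credentials_client and its methods are placeholders for whatever SDK fixture you actually use:

import uuid
import pytest

@pytest.fixture
def cloud_credentials(credentials_client):
    name = 'e2e-test-%s' % uuid.uuid4()        # unique name avoids collisions between runs
    credentials_client.write(name, secret='dummy')
    yield name                                  # the test body runs here
    credentials_client.delete(name)             # teardown runs even if the test fails

def test_get_credentials_from_cloud(credentials_client, cloud_credentials):
    assert credentials_client.get(cloud_credentials) is not None

The write/get/delete steps then each run against a resource they created themselves, instead of four tests sharing implicit state.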
You can find more practical examples of how to build, manage and run e2e testing with AWS and Google Cloud Platform here: Cloud/serverless e2e testing.
You might also find Development automation interesting.
I read multiple times that one should use mock to mimic outside calls and there should be no calls made to any outside service because your tests need to run regardless of outside services.
This totally makes sense....BUT
What about outside services changing? What good is a test, testing that my code works like it should if I will never know when it breaks because of the outside service being modified/updated/removed/deprecated/etc...
How can I reconcile this? The pseudocode is below
import requests

def post_tweet():
    data = {"tweet": "tweetcontent"}
    # send the request to Twitter (real endpoint and auth details omitted here)
    response = requests.post("https://api.twitter.com/...", data=data)
    return response
If I mock this there is no way I will be notified that twitter changed their API and now I have to update my test...
There are different levels of testing.
Unit tests test, as you might guess from the name, a unit: for example a function or a method, maybe a class. If you interpret it more widely, it might include a view to be tested with Django's test client. Unit tests never test external stuff like libraries, dependencies or interfaces to other systems; these things will be mocked.
Integration tests check whether your interfaces to, and usage of, outside libraries, systems and APIs are implemented properly. If a dependency changes, you will notice and will have to change your code and unit tests.
There are other levels of tests as well, like behavior tests, UI tests and usability tests. You should make sure to separate these test classes in your project.
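For the example in the question, that might look like the sketch below; myapp.twitter is a placeholder module path, and the integration test is meant to be run separately (e.g. nightly or pre-release), so everyday test runs don't depend on Twitter being reachable:

import unittest
import mock  # unittest.mock on Python 3
from myapp.twitter import post_tweet  # placeholder import path

class PostTweetUnitTest(unittest.TestCase):
    @mock.patch('myapp.twitter.requests.post')
    def test_sends_expected_payload(self, mock_post):
        mock_post.return_value.status_code = 200
        response = post_tweet()
        # the unit test pins down *our* behaviour, not Twitter's
        self.assertEqual(response.status_code, 200)
        args, kwargs = mock_post.call_args
        self.assertEqual(kwargs['data'], {'tweet': 'tweetcontent'})

class PostTweetIntegrationTest(unittest.TestCase):
    def test_against_real_api(self):
        # this one will start failing if Twitter changes or removes the endpoint
        response = post_tweet()
        self.assertEqual(response.status_code, 200)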
The project I'm working on is a business logic software wrapped up as a Python package. The idea is that various script or application will import it, initialize it, then use it.
It currently has a top-level init() method that does the initialization and sets up various things; a good example is that it sets up SQLAlchemy with a DB connection and stores the SA session for later access. It is stored in a subpackage of my project (namely myproj.model.Session), so other code can get a working SA session after importing the model.
Long story short, this makes my package a stateful one. I'm writing unit tests for the project and this stateful behaviour poses some problems:
tests should be isolated, but the internal state of my package breaks this isolation
I cannot test the main init() method since its behavior depends on the state
future tests will need to be run against the (not yet written) controller part with a well-known model state (e.g. a pre-populated SQLite in-memory DB)
Should I somehow refactor my package because the current structure is not the Best (possible) Practice(tm)? :)
Should I leave it at that and setup/teardown the whole thing every time? If I'm going to achieve complete isolation that'd mean fully erasing and re-populating the db at every single test, isn't that overkill?
This question is really on the overall code & tests structure, but for what it's worth I'm using nose-1.0 for my tests. I know the Isolate plugin could probably help me but I'd like to get the code right before doing strange things in the test suite.
You have a few options:
Mock the database
There are a few trade-offs to be aware of.
Your tests will become more complex, as you will have to do the setup, teardown and mocking of the connection. You may also want to verify the SQL/commands sent. It also tends to create an odd sort of tight coupling, which may cause you to spend additional time maintaining/updating tests when the schema or SQL changes.
On the other hand, this is usually the purest form of test isolation, because it removes a potentially large dependency from testing. It also tends to make tests faster and reduces the overhead of automating the test suite in, say, a continuous integration environment.
Recreate the DB with each Test
Trade-offs to be aware of:
This can make your tests very slow, depending on how much time it actually takes to recreate your database. If the dev database server is a shared resource, there will have to be additional initial investment in making sure each dev has their own DB on the server. The server may also become impacted depending on how often tests get run. There is additional overhead to running your test suite in a continuous integration environment, because it will need at least one DB, possibly more (depending on how many branches are being built simultaneously).
The benefit has to do with actually running through the same code paths and similar resources that will be used in production. This usually helps to reveal bugs earlier which is always a very good thing.
ORM DB swap
If you're using an ORM like SQLAlchemy, there is a possibility that you can swap the underlying database for a potentially faster in-memory database. This allows you to mitigate some of the negatives of both of the previous options.
It's not quite the same database as will be used in production, but the ORM should help mitigate the risk that this difference obscures a bug. Typically the time to set up an in-memory database is much shorter than for one which is file-backed. It also has the benefit of being isolated to the current test run, so you don't have to worry about shared resource management or final teardown/cleanup.
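A rough sketch of the swap with SQLAlchemy; Base stands in for whatever declarative base (or MetaData) myproj.model actually exposes:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from myproj.model import Base  # placeholder for your declarative base

def make_test_session():
    engine = create_engine('sqlite:///:memory:')   # fresh, isolated DB for this test run
    Base.metadata.create_all(engine)               # build the schema from the model definitions
    return sessionmaker(bind=engine)()

Each test (or test class) can then build its own throwaway session in setUp, which also addresses the isolation concern without a full external database rebuild.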
Working on a project with a relatively expensive setup (IPython), I've seen an approach used where we call a get_ipython function, which sets up and returns an instance, while replacing itself with a function which returns a reference to the existing instance. Then every test can call the same function, but it only does the setup for the first one.
That saves doing a long setup procedure for every test, but occasionally it creates odd cases where a test fails or passes depending on what tests were run before. We have ways of dealing with that - a lot of the tests should do the same thing regardless of the state, and we can try to reset the object's state before certain tests. You might find a similar trade-off works for you.
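The trick is roughly this (an illustrative sketch, not IPython's actual implementation):

class ExpensiveThing(object):
    """Placeholder for whatever is costly to set up (an IPython shell, a DB, ...)."""

def get_shared_instance():
    # the first call builds the instance, then replaces this function with a
    # cheap one that just returns the cached instance
    global get_shared_instance
    instance = ExpensiveThing()
    get_shared_instance = lambda: instance
    return instance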
Mock is a simple and powerful tool for achieving some isolation. There is a nice video from PyCon 2011 which shows how to use it. I recommend using it together with py.test, which reduces the amount of code required to define tests and is still very, very powerful.
Our Python application (a cool web service) has a full suite of tests (unit tests, integration tests etc.) that all developers must run before committing code.
I want to add some performance tests to the suite to make sure no one adds code that makes us run too slow (for some rather arbitrary definition of slow).
Obviously, I can collect some functionality into a test, time it and compare to some predefined threshold.
The tricky requirements:
I want every developer to be able to test the code on his machine (machines vary in CPU power and OS (!) - Linux and some Windows - while the external configuration is the same: the Python version, libraries and modules). A test server, while generally a good idea, does not solve this.
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
My preliminary thoughts:
Use timeit and do a benchmark of the system every time I run the tests. Compare the performance test results to the benchmark.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
Other thoughts?
Thanks!
Tal.
Check out funkload - it's a way of running your unit tests as either functional or load tests to gauge how well your site is performing.
Another interesting project which can be used in conjunction with funkload is codespeed. This is an internal dashboard that measures the "speed" of your codebase for every commit you make to your code, presenting graphs with trends over time. This assumes you have a number of automatic benchmarks you can run - but it could be a useful way to have an authoritative account of performance over time. The best use of codespeed I've seen so far is the speed.pypy.org site.
As to your requirement for determinism - perhaps the best approach to that is to use statistics to your advantage? Automatically run the test N times, produce the min, max, average and standard deviation of all your runs? Check out this article on benchmarking for some pointers on this.
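Something like the following sketch with timeit.repeat gives you those numbers directly (statistics is in the standard library from Python 3.4; compute mean/stdev yourself on older versions):

import timeit
import statistics

def benchmark(stmt, setup='pass', repeats=20, number=1000):
    samples = timeit.repeat(stmt, setup=setup, repeat=repeats, number=number)
    return {
        'min': min(samples),
        'max': max(samples),
        'mean': statistics.mean(samples),
        'stdev': statistics.stdev(samples),
    }

# e.g. fail a test if the minimum (least noisy) sample exceeds a chosen threshold
assert benchmark('sorted(range(1000))')['min'] < 0.05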
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
Fail. More or less by definition this is utterly impossible in a multi-processing system with multiple users.
Either rethink this requirement or find a new environment in which to run tests that doesn't involve any of the modern multi-processing operating systems.
Further, your running web application is not deterministic, so imposing some kind of "deterministic" performance testing doesn't help much.
When we did time-critical processing (in radar, where "real time" actually meant real time) we did not attempt deterministic testing. We did code inspections and ran simple performance tests that involved simple averages and maximums.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
The Stats object created by the profiler is what you're looking for.
http://docs.python.org/library/profile.html#the-stats-class
Focus on 'pcalls', primitive call count, in the profile statistics and you'll have something that's approximately deterministic.
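A sketch of how that could look; my_view_function and the call budget are placeholders for your own code and limits:

import cProfile
import pstats

def count_primitive_calls(func, *args, **kwargs):
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    return stats.prim_calls   # primitive (non-recursive) call count, the 'pcalls' column

def test_view_stays_within_call_budget():
    # a call-count budget is stable across machines, unlike wall-clock time
    assert count_primitive_calls(my_view_function) < 5000   # my_view_function: placeholder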
I wrote an application server (using Python and Twisted) and I want to start writing some tests, but I do not want to use Twisted's Trial due to time constraints and not having time to play with it now. So here is what I have in mind: write a small test client that connects to the app server and makes the necessary requests (the communication protocol is some in-house XML), store the received XML statically, and then write some tests on that static data using unittest.
My question is: Is this a correct approach and if yes, what kind of tests are covered with this approach?
Also, using this method has several disadvantages, like not being able to access the database layer in order to build/rebuild the schema, and the question of when the test client should connect to the server: for each unit test, or once before running the test suite?
You should use Trial. It really isn't very hard. Trial's documentation could stand to be improved, but if you know how to use the standard library unit test, the only difference is that instead of writing
import unittest
you should write
from twisted.trial import unittest
... and then you can return Deferreds from your test_ methods. Pretty much everything else is the same.
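A tiny sketch of what that looks like; the Deferred here is faked with defer.succeed, but in your case it would come from your client sending an XML request:

from twisted.trial import unittest
from twisted.internet import defer

class XmlRequestTest(unittest.TestCase):
    def test_request_round_trip(self):
        # stand-in for something like client.send_request(xml_payload)
        d = defer.succeed('<response>ok</response>')
        d.addCallback(lambda xml: self.assertIn('<response>', xml))
        return d   # Trial waits for the Deferred to fire before finishing the test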
The one other difference is that instead of building a giant test object at the bottom of your module and then running
python your/test_module.py
you can simply define your test cases and then run
trial your.test_module
If you don't care about reactor integration at all, in fact, you can just run trial on a set of existing Python unit tests. Trial supports the standard library 'unittest' module.
"My question is: Is this a correct approach?"
It's what you chose. You made a lot of excuses, so I'm assuming that you're pretty well fixed on this course. It's not the best, but you've already listed all your reasons for doing it (and then asked follow-up questions on this specific course of action). "Correct" doesn't enter into it anymore, so there's no answer to this question.
"what kind of tests are covered with this approach?"
They call it "black-box" testing. The application server is a black box that has a few inputs and outputs, and you can't test any of its internals. It's considered one acceptable form of testing because it tests the bottom-line external interfaces for acceptable behavior.
If you have problems, however, it turns out to be useless for doing diagnostic work. You'll find that you also need to do white-box testing on the internal structures.
"not being able to access the database layer in order to build/rebuild the schema,"
Why not? This is Python. Write a separate tool that imports that layer and does database builds.
"when will the test client going to connect to the server: per each unit test or before running the test suite?"
Depends on the intent of the test. Depends on your use cases. What happens in the "real world" with your actual intended clients?
You'll want to test client-like behavior, making connections the way clients make connections.
Also, you'll want to test abnormal behavior, like clients dropping connections or doing things out of order, or unconnected.
I think you chose the wrong direction. It's true that the Trial docs are very light, but Trial is based on unittest and only adds some stuff to deal with the reactor loop and asynchronous calls (it's not easy to write tests that deal with deferreds). All your tests that don't involve deferreds/asynchronous calls will be exactly like normal unittest tests.
The trial command is a test runner (a bit like nose), so you don't have to write test suites for your tests. You will save time with it. On top of that, the trial command can output profiling and coverage information. Just run trial -h for more info.
But in any case, the first thing you should ask yourself is which kind of tests you need the most: unit tests, integration tests or system tests (black-box). It's possible to do all of them with Trial, but it's not necessarily always the best fit.
I haven't used Twisted before, and the Twisted/Trial documentation isn't stellar from what I just saw, but it'll likely take you 2-3 days to correctly implement the test system you describe above. Now, like I said, I have no idea about Trial, but I GUESS you could probably get it working in 1-2 days, since you already have a Twisted application. So if Trial gives you more coverage in less time, I'd go with Trial.
But remember, this is just an answer from a very cursory look at the docs.