How to run only half of python-behave tests - python

We have two machines and want to split our testing across them to make testing faster. I would like to know of a way to tell behave to run half of the tests. I am aware of the --tags argument, but that is too cumbersome: as the test suite grows, so must our --tags argument if we wish to keep it at the halfway point. I would also need to know which tests make up the other half, so I can run those on the other machine.
TL;DR Is there a simple way to get behave to run, dynamically, half of the tests? (that doesn't include specifying which tests through the use of --tags)
And is there a way of finding the other half of tests that were not run?
Thanks

No, there is not; you would have to write your own runner to do that. Even then it would be complex, because piecing together the results of two separate test runs that each cover half of the suite gets messy as soon as errors show up.
A better and faster solution is to write a simple bash/Python script that traverses a given directory for .feature files and fires an individual behave process against each one. With properly configured outputs it should be collision-free, and if you split your cases sensibly it will give you a much better boost than simply running half. You can then delegate the other share of the work to the second machine by some means, be it a bare SSH command or a queue.
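A minimal sketch of such a script, assuming the feature files live somewhere under a features/ directory and that the machine index is passed on the command line (both details are illustrative):
import subprocess
import sys
from pathlib import Path

def run_share(features_dir="features", machine_index=0, machine_count=2):
    # Sort so both machines agree on the split without coordinating.
    features = sorted(Path(features_dir).rglob("*.feature"))
    # Take every machine_count-th file, offset by this machine's index.
    share = features[machine_index::machine_count]
    failed = []
    for feature in share:
        # One behave process per feature file keeps the outputs separate.
        if subprocess.run(["behave", str(feature)]).returncode != 0:
            failed.append(feature)
    return failed

if __name__ == "__main__":
    index = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    sys.exit(1 if run_share(machine_index=index) else 0)
Running the same script with index 1 on the second machine covers exactly the files the first machine skipped, which also answers the question of knowing which half was not run.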

Related

Unit test architecture with pytest

I am trying to come up with a good architecture for unit tests with the pytest framework, one that will let me maintain and expand my code easily, but I have run into some difficulties. Let me explain the problem.
I have around 100 individual tasks. Each task has its own unit-tests (e.g. 5 tests per task).
As an argument I receive raw Python code. I run this code using the exec function and then pass it to the unit tests.
I want to organise my unit tests so that I can easily remove tasks and their unit tests, easily add new tasks and unit tests, and so on.
The problem is that I currently store the tests for each task in a separate file, named after the task id, e.g. test_1.py. If that ID changes for some reason, I will have problems.
--- Test Folder
------test_1.py
---------test_1
---------test_2
---------test_3
------test_2.py
---------test_1
---------test_2
---------test_3
Do you have any ideas on how to organise my unit tests in the most sensible way?
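No answer was recorded for this one, but one common pattern (purely a sketch, with illustrative names) is to keep the task id out of the file name and attach it with a custom pytest marker instead, so a renumbered task only means editing one constant:
# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption("--task", default=None, help="run only the tests for one task id")

def pytest_configure(config):
    config.addinivalue_line("markers", "task(id): link a test module to a task id")

def pytest_collection_modifyitems(config, items):
    wanted = config.getoption("--task")
    if wanted is None:
        return
    # Keep only tests whose task marker matches the requested id.
    items[:] = [i for i in items
                if (m := i.get_closest_marker("task")) and str(m.args[0]) == wanted]

# test_summation.py  (the file name no longer encodes the id)
import pytest
pytestmark = pytest.mark.task(7)  # the id lives in exactly one place

def test_solve_adds_numbers():
    namespace = {}
    exec("def solve(a, b):\n    return a + b", namespace)  # stands in for the submitted raw code
    assert namespace["solve"](2, 3) == 5
With that in place, pytest --task 7 runs only that task's tests, and dropping a task means deleting its file without touching the others.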

Best practice submitting SLURM jobs via Python

This is kind of a general best practice question.
I have a Python script which iterates over some arguments and calls another script with those arguments (it's basically a grid search for some simple deep learning models). This works fine on my local machine, but now I need the resources of my university's computer cluster, which uses SLURM.
I have some logic in the Python script that I think would be difficult to implement, and maybe out of place, in a shell script. I also can't just throw all the jobs at the cluster at once, because I want to skip certain parameter combinations depending on the outcome (loss) of others. I would therefore like to submit the SLURM jobs directly from my Python script and still handle the more complex logic there. My question is what the best way to implement something like this is, and whether running a Python script on the login node would be bad manners. Should I use the subprocess module? Snakemake? Joblib? Or are there other, more elegant ways?
Snakemake and Joblib are valid options; they will handle the communication with the Slurm cluster. Another possibility is FireWorks. This one is a bit more tedious to get running; it needs a MongoDB database and has a vocabulary that takes getting used to, but in the end it can do very complex things. You can, for instance, create a workflow that submits jobs to multiple clusters, run other jobs that depend on the output of previous ones, and automatically re-submit the ones that failed, with other parameters if needed.
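If the plain subprocess route turns out to be enough, a minimal sketch looks like the following; the batch script name, its argument and the polling interval are placeholders, and it assumes a standard Slurm setup where sbatch prints "Submitted batch job <id>":
import re
import subprocess
import time

def submit(script, *args):
    out = subprocess.run(["sbatch", script, *map(str, args)],
                         capture_output=True, text=True, check=True).stdout
    return re.search(r"\d+", out).group()  # the job id

def wait_for(job_id, poll_seconds=30):
    # squeue stops listing the job once it has left the queue.
    while subprocess.run(["squeue", "-h", "-j", job_id],
                         capture_output=True, text=True).stdout.strip():
        time.sleep(poll_seconds)

job = submit("train.sh", 0.001)   # train.sh is a hypothetical batch script
wait_for(job)
# ...inspect the loss that train.sh wrote to disk and decide which combinations to submit next...
A polling loop like this is very light, but whether running it on the login node is acceptable is ultimately a question for the cluster's usage policy.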

Is there a way to overwrite the main loop in py.test?

In py.test there must be a function that loops over each found test and executes them sequentially.
Is there a way to overwrite this function/loop?
Because I want to run the tests sequentially, as before, but run the failed tests again a second time (or a third time) to see if they might succeed on the second (third) try.
(Some background: the system under test is a GUI application which depends on many different systems, and occasionally some of them fail or do not behave as expected. With enough of these tests it becomes quite likely that at least one test will fail. Therefore I want to repeat the failed tests a couple of times. Also, that system cannot be changed.)
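No answer is recorded here, but for reference: rather than overriding pytest's own loop, one low-tech approach that needs nothing beyond stock pytest is to re-invoke it with its built-in --last-failed option, which reruns only the tests that failed in the previous run. A sketch (the retry budget is arbitrary):
import subprocess
import sys

MAX_TRIES = 3                                # arbitrary retry budget
code = subprocess.call(["pytest"])           # full run
tries = 1
while code != 0 and tries < MAX_TRIES:
    # Only the failures recorded in pytest's cache are rerun.
    code = subprocess.call(["pytest", "--last-failed"])
    tries += 1
sys.exit(code)
The pytest-rerunfailures plugin covers similar ground inside a single run, if installing a plugin is an option.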

Proper way to automatically test performance in Python (for all developers)?

Our Python application (a cool web service) has a full suite of tests (unit tests, integration tests etc.) that all developers must run before committing code.
I want to add some performance tests to the suite to make sure no one adds code that makes us run too slow (for some rather arbitrary definition of slow).
Obviously, I can collect some functionality into a test, time it and compare to some predefined threshold.
The tricky requirements:
I want every developer to be able to test the code on his machine (these vary in CPU power, OS (Linux and some Windows!) and external configuration; the Python version, libraries and modules are the same). A test server, while generally a good idea, does not solve this.
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
My preliminary thoughts:
Use timeit and do a benchmark of the system every time I run the tests. Compare the performance test results to the benchmark.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
Other thoughts?
Thanks!
Tal.
Check out funkload - it's a way of running your unit tests as either functional or load tests to gauge how well your site is performing.
Another interesting project which can be used in conjunction with funkload is codespeed. This is an internal dashboard that measures the "speed" of your codebase for every commit you make to your code, presenting graphs with trends over time. This assumes you have a number of automatic benchmarks you can run - but it could be a useful way to have an authoritative account of performance over time. The best use of codespeed I've seen so far is the speed.pypy.org site.
As to your requirement for determinism - perhaps the best approach to that is to use statistics to your advantage? Automatically run the test N times, produce the min, max, average and standard deviation of all your runs? Check out this article on benchmarking for some pointers on this.
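A sketch of that statistical approach using nothing beyond the standard library; the workload and the threshold are placeholders:
import statistics
import timeit

def workload():
    sum(i * i for i in range(10_000))  # stand-in for the code path being benchmarked

def test_workload_is_fast_enough():
    # Repeat the measurement and look at the distribution, not a single number.
    samples = timeit.repeat(workload, number=100, repeat=10)
    fastest, spread = min(samples), statistics.stdev(samples)
    # The minimum is the least noisy statistic; the 0.5s bound is arbitrary
    # and would need calibrating per machine.
    assert fastest < 0.5, f"min={fastest:.3f}s stdev={spread:.3f}s"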
I want the test to be DETERMINISTIC - regardless of what is happening on the machine running the tests, I want multiple runs of the test to return the same results.
Fail. More or less by definition this is utterly impossible in a multi-processing system with multiple users.
Either rethink this requirement or find a new environment in which to run tests that doesn't involve any of the modern multi-processing operating systems.
Further, your running web application is not deterministic, so imposing some kind of "deterministic" performance testing doesn't help much.
When we did time-critical processing (in radar, where "real time" actually meant real time) we did not attempt deterministic testing. We did code inspections and ran simple performance tests that involved simple averages and maximums.
Use cProfile to instrument the interpreter to ignore "outside noise". I'm not sure I know how to read the pstats structure yet, but I'm sure it is doable.
The Stats object created by the profiler is what you're looking for.
http://docs.python.org/library/profile.html#the-stats-class
Focus on 'pcalls', primitive call count, in the profile statistics and you'll have something that's approximately deterministic.
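A sketch of pulling that number out of a profile; work() is a placeholder for the code path being measured:
import cProfile
import pstats

def work():
    sorted(range(1000), key=lambda x: -x)  # placeholder workload

profiler = cProfile.Profile()
profiler.runcall(work)
stats = pstats.Stats(profiler)
# prim_calls counts primitive (non-recursive) calls; unlike wall-clock time,
# it is essentially the same on every run of the same code.
print(stats.prim_calls, stats.total_calls)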

unit testing for an application server

I wrote an application server (using Python and Twisted) and I want to start writing some tests. But I do not want to use Twisted's Trial due to time constraints; I don't have time to play with it now. So here is what I have in mind: write a small test client that connects to the app server and makes the necessary requests (the communication protocol is some in-house XML), store the received XML statically, and then write some tests on that static data using unittest.
My question is: Is this a correct approach and if yes, what kind of tests are covered with this approach?
Also, this method has several disadvantages, such as not being able to access the database layer in order to build/rebuild the schema, and the question of when the test client should connect to the server: for each unit test, or once before running the test suite?
You should use Trial. It really isn't very hard. Trial's documentation could stand to be improved, but if you know how to use the standard library unittest, the only difference is that instead of writing
import unittest
you should write
from twisted.trial import unittest
... and then you can return Deferreds from your test_ methods. Pretty much everything else is the same.
The one other difference is that instead of building a giant test object at the bottom of your module and then running
python your/test_module.py
you can simply define your test cases and then run
trial your.test_module
If you don't care about reactor integration at all, in fact, you can just run trial on a set of existing Python unit tests. Trial supports the standard library 'unittest' module.
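A minimal example of such a test, with a delayed "pong" standing in for a real asynchronous request to the app server:
from twisted.internet import reactor, task
from twisted.trial import unittest

class PingTests(unittest.TestCase):
    def test_deferred_result(self):
        # deferLater stands in for an asynchronous call against the server.
        d = task.deferLater(reactor, 0.0, lambda: "pong")
        d.addCallback(self.assertEqual, "pong")
        return d  # returning the Deferred tells Trial to wait for it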
"My question is: Is this a correct approach?"
It's what you chose. You made a lot of excuses, so I'm assuming that you're pretty well fixed on this course. It's not the best, but you've already listed all your reasons for doing it (and then asked follow-up questions about this specific course of action). "Correct" doesn't enter into it anymore, so there's no answer to this question.
"what kind of tests are covered with this approach?"
They call it "black-box" testing. The application server is a black box that has a few inputs and outputs, and you can't test any of it's internals. It's considered one acceptable form of testing because it tests the bottom-line external interfaces for acceptable behavior.
If you have problems, it turns out to be useless for doing diagnostic work. You'll find that you need to also to white-box testing on the internal structures.
"not being able to access the database layer in order to build/rebuild the schema,"
Why not? This is Python. Write a separate tool that imports that layer and does database builds.
"when will the test client going to connect to the server: per each unit test or before running the test suite?"
Depends on the intent of the test. Depends on your use cases. What happens in the "real world" with your actual intended clients?
You'll want to test client-like behavior, making connections the way clients make connections.
Also, you'll want to test abnormal behavior, like clients dropping connections or doing things out of order, or unconnected.
I think you chose the wrong direction. It's true that the Trial docs are very light, but Trial is based on unittest and only adds some things to deal with the reactor loop and asynchronous calls (it's not easy to write tests that deal with Deferreds). All your tests that do not involve Deferreds or asynchronous calls will look exactly like normal unittest tests.
The trial command is a test runner (a bit like nose), so you don't have to write test suites for your tests. You will save time with it. On top of that, it can output profiling and coverage information. Just run trial -h for more info.
But in any case, the first thing you should ask yourself is which kind of tests you need the most: unit tests, integration tests or system tests (black-box). It's possible to do all of them with Trial, but it's not necessarily always the best fit.
I haven't used Twisted before, and the Twisted/Trial documentation isn't stellar from what I just saw, but it will likely take you 2-3 days to implement the test system you describe above correctly. I have no real idea about Trial, but I would guess you could get it working in 1-2 days, since you already have a Twisted application. If Trial gives you more coverage in less time, I'd go with Trial.
But remember, this is just an answer based on a very cursory look at the docs.
