Unit test architecture with pytest - python

I am trying to design a unit-test architecture with the pytest framework that lets me maintain and extend my code easily, but I have run into some difficulties. Let me explain the problem.
I have around 100 individual tasks. Each task has its own unit tests (e.g. 5 tests per task).
I receive raw Python code as an argument. I run this code with the exec function and then pass it to the unit tests.
I want to organise my unit tests so that I can easily remove tasks together with their unit tests, add new tasks and unit tests, and so on.
The problem is that I currently store the tests for each task in a separate file, named after the task ID, e.g. test_1.py. If this ID changes for some reason, I will have problems.
--- Test Folder
------test_1.py
---------test_1
---------test_2
---------test_3
------test_2.py
---------test_1
---------test_2
---------test_3
Do you have any ideas on how to organise my unit tests in the most sensible way?
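One possible layout, sketched under assumptions rather than offered as a definitive answer: keep the checks in a registry keyed by a stable task name instead of encoding a numeric ID in the file name. TASK_CHECKS, run_checks and the two sample tasks are hypothetical names made up for this illustration.

```python
# Hypothetical registry: stable task key -> list of check functions.
# Each check receives the namespace produced by exec-ing the submitted code.

def check_add_positive(ns):
    assert ns["add"](2, 3) == 5

def check_add_negative(ns):
    assert ns["add"](-2, -3) == -5

def check_square(ns):
    assert ns["square"](4) == 16

TASK_CHECKS = {
    "add_two_numbers": [check_add_positive, check_add_negative],
    "square_number": [check_square],
}

def run_checks(task_key, source):
    """Exec the submitted source and return {check_name: passed}."""
    namespace = {}
    exec(source, namespace)  # only do this with trusted or sandboxed code
    results = {}
    for check in TASK_CHECKS[task_key]:
        try:
            check(namespace)
            results[check.__name__] = True
        except Exception:
            # A failed assertion or a missing function both count as a failure.
            results[check.__name__] = False
    return results
```

With this shape, renaming or removing a task touches a single dictionary entry. Under pytest, the (task key, check) pairs could be fed to pytest.mark.parametrize so that each check is reported as its own test.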

Related

Best practice submitting SLURM jobs via Python

This is kind of a general best practice question.
I have a Python script which iterates over some arguments and calls another script with those arguments (it's basically a grid search for some simple deep learning models). This works fine on my local machine, but now I need the resources of my university's computer cluster, which uses SLURM.
I have some logic in the Python script that I think would be difficult to implement, and maybe out of place, in a shell script. I also can't just throw all the jobs at the cluster at once, because I want to skip certain parameter combinations depending on the outcome (loss) of others. I'd therefore like to submit the SLURM jobs directly from my Python script and still handle the more complex logic there. What is the best way to implement something like this, and would running a Python script on the login node be bad manners? Should I use the subprocess module? Snakemake? Joblib? Or are there other, more elegant ways?
Snakemake and Joblib are valid options; they will handle the communication with the Slurm cluster. Another possibility is Fireworks. This one is a bit more tedious to get running: it needs a MongoDB database and has a vocabulary that takes some getting used to, but in the end it can do very complex things. You can, for instance, create a workflow that submits jobs to multiple clusters, runs other jobs depending on the output of the previous ones, and automatically resubmits the ones that failed, with other parameters if needed.
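For the plain subprocess route mentioned in the question, a minimal sketch might look like the following. It assumes sbatch is on the PATH and that the job script (here called train.sh, a hypothetical name) accepts --key=value arguments; the helper names are made up for illustration.

```python
import subprocess

def build_sbatch_command(script, params):
    """Build the argv for submitting `script` with --key=value arguments."""
    args = [f"--{key}={value}" for key, value in sorted(params.items())]
    # --parsable makes sbatch print only the job ID (optionally ';cluster').
    return ["sbatch", "--parsable", script, *args]

def parse_job_id(stdout):
    """With --parsable, sbatch prints 'jobid' or 'jobid;cluster'."""
    return int(stdout.strip().split(";")[0])

def submit(script, params):
    """Submit one job and return its job ID (needs a working sbatch)."""
    result = subprocess.run(
        build_sbatch_command(script, params),
        capture_output=True,
        text=True,
        check=True,  # raise if sbatch rejects the submission
    )
    return parse_job_id(result.stdout)
```

The returned job ID can then be used by the driving script to decide which parameter combinations to submit next.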

How to run only half of python-behave tests

We have two machines so that we can split our test suite across them and make testing faster. I would like to know of a way to tell behave to run half of the tests. I am aware of the --tags argument, but this is too cumbersome: as the test suite grows, so must our --tags argument if we wish to keep it at the halfway point. I would also need to know which tests in the other half were not run, so I can run those on the other machine.
TL;DR Is there a simple way to get behave to run, dynamically, half of the tests? (that doesn't include specifying which tests through the use of --tags)
And is there a way of finding the other half of tests that were not run?
Thanks
No, there is not; you would have to write your own runner to do that. That would be complex, because piecing together the results of two separate test runs, each covering half of the suite, gets complicated as soon as errors show up.
A better and faster solution is to write a simple bash/Python script that traverses a given directory for .feature files and then fires an individual behave process against each one. With properly configured outputs, the runs should not collide with each other, and if you separate your cases this gives you a much better speed-up than running half the suite. You can then delegate that work to the other machine by some means, be it a bare SSH command or a queue.
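The traversal described above could be sketched roughly as follows. The function names are hypothetical, and the sketch only builds the behave command lines; actually running them (locally or over SSH) is left out.

```python
from pathlib import Path

def feature_files(root):
    """All .feature files under root, in a stable sorted order."""
    return sorted(str(p) for p in Path(root).rglob("*.feature"))

def shard(files, index, total):
    """Every total-th file starting at position index."""
    return files[index::total]

def behave_commands(root, index, total):
    """One behave invocation per feature file in this machine's shard."""
    return [["behave", path] for path in shard(feature_files(root), index, total)]
```

Machine 0 would call behave_commands("features", 0, 2) and machine 1 would call it with index 1. Because the file list is sorted, the two shards are disjoint and together cover every feature file, which also answers the question of which half was not run.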

Python framework for task execution and dependencies handling

I need a framework which will allow me to do the following:
Allow to dynamically define tasks (I'll read an external configuration file and create the tasks/jobs; task=spawn an external command for instance)
Provide a way of specifying dependencies on existing tasks (e.g. task A will be run after task B is finished)
Be able to run tasks in parallel in multiple processes if the execution order allows it (i.e. no task interdependencies)
Allow a task to depend on some external event (don't know exactly how to describe this, but some tasks finish and they will produce results after a while, like a background running job; I need to specify some of the tasks to depend on this background-job-completed event)
Undo/Rollback support: if one task fails, try to undo everything that has been executed before (I don't expect this to be implemented in any framework, but I guess it's worth asking)
So, obviously, this looks more or less like a build system, but I can't seem to find something that will allow me to create tasks dynamically; most things I've seen already have them defined in the "Makefile".
Any ideas?
I've been doing a little more research and I've stumbled upon doit which provides the core functionality I need, without being overkill (not saying that Celery wouldn't have solved the job, but this does it better for my use case).
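A minimal dodo.py sketch of dynamic task creation with doit, assuming its generator-style task creators, where a function named task_* yields one dictionary per sub-task. The TASKS dictionary stands in for the external configuration file and its contents are hypothetical.

```python
# dodo.py: the file the `doit` command line tool looks for by default.
# TASKS stands in for an external configuration file (hypothetical contents).
TASKS = {
    "fetch": {"cmd": "echo fetching", "after": []},
    "build": {"cmd": "echo building", "after": ["fetch"]},
    "report": {"cmd": "echo reporting", "after": ["build"]},
}

def task_job():
    """Yield one doit sub-task per configured entry."""
    for name, spec in TASKS.items():
        yield {
            "name": name,
            # A string action is executed as a shell command.
            "actions": [spec["cmd"]],
            # Sub-tasks are addressed as "<basename>:<name>".
            "task_dep": [f"job:{dep}" for dep in spec["after"]],
        }
```

The task list is rebuilt from TASKS every time doit loads the file, so adding an entry to the configuration is enough to add a task.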
Another option is to use make.
Write a Makefile manually or let a python script write it
use meaningful intermediate output file stages
Run make, which should then call out the processes. The processes would be a python (build) script with parameters that tell it which files to work on and what task to do.
parallel execution is supported with -j
it can also delete the output file of a task that fails (GNU make does this when the .DELETE_ON_ERROR special target is present)
This circumvents some of the python parallelisation problems (GIL, serialisation).
Obviously only straightforward on *nix platforms.
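Letting a Python script write the Makefile could be sketched like this. RULES, build.py and the file names are hypothetical; the generated text uses GNU make's .DELETE_ON_ERROR special target so that the target of a failed rule is removed.

```python
# Hypothetical rule table: target -> (prerequisites, command).
RULES = {
    "clean.csv": (["raw.csv"], "python build.py clean raw.csv clean.csv"),
    "report.txt": (["clean.csv"], "python build.py report clean.csv report.txt"),
}

def render_makefile(rules):
    """Return Makefile text with one rule per target."""
    lines = ["all: " + " ".join(sorted(rules)), "", ".DELETE_ON_ERROR:", ""]
    for target, (deps, command) in rules.items():
        lines.append(f"{target}: {' '.join(deps)}".rstrip())
        lines.append(f"\t{command}")  # recipe lines must start with a tab
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_makefile(RULES))
```

Writing the output to a file named Makefile and running make -j 4 would then let make schedule independent targets in parallel.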
AFAIK, there is no framework in Python which does exactly what you describe. So your options are either building something on your own or bending some of your requirements and modelling them with an existing tool. Which smells like Celery.
You may have a Celery task which reads a configuration file containing the source code of some Python functions, then uses exec or eval to execute them (ast.literal_eval only evaluates literals, so it cannot run code).
Celery provides a way to define subtasks (dependencies between tasks), so if you are aware of your dependencies, you can model them accordingly.
Provided that you know the execution order of your tasks you can route them to as many worker machines as you want.
You can periodically poll this background job's result and then start your tasks that are dependent on it.
Undo/Rollback: this might be tricky and depends on what you want to undo; results? state?

How to correctly achieve test isolation with a stateful Python module?

The project I'm working on is a piece of business logic software wrapped up as a Python package. The idea is that various scripts or applications will import it, initialize it, then use it.
It currently has a top-level init() method that does the initialization and sets up various things; a good example is that it sets up SQLAlchemy with a db connection and stores the SA session for later access. It is stored in a subpackage of my project (namely myproj.model.Session, so other code can get a working SA session after importing the model).
Long story short, this makes my package a stateful one. I'm writing unit tests for the project and this stateful behaviour poses some problems:
tests should be isolated, but the internal state of my package breaks this isolation
I cannot test the main init() method since its behavior depends on the state
future tests will need to be run against the (not yet written) controller part with a well known model state (eg. a pre-populated sqlite in-memory db)
Should I somehow refactor my package because the current structure is not the Best (possible) Practice(tm)? :)
Should I leave it at that and setup/teardown the whole thing every time? If I'm going to achieve complete isolation that'd mean fully erasing and re-populating the db at every single test, isn't that overkill?
This question is really on the overall code & tests structure, but for what it's worth I'm using nose-1.0 for my tests. I know the Isolate plugin could probably help me but I'd like to get the code right before doing strange things in the test suite.
You have a few options:
Mock the database
There are a few trade offs to be aware of.
Your tests will become more complex, as you will have to do the setup, teardown and mocking of the connection. You may also want to verify the SQL/commands sent. It also tends to create an odd sort of tight coupling, which may cause you to spend additional time maintaining/updating tests when the schema or SQL changes.
This is usually the purest form of test isolation because it removes a potentially large dependency from testing. It also tends to make tests faster and reduces the overhead of automating the test suite in, say, a continuous integration environment.
Recreate the DB with each Test
Trade offs to be aware of.
This can make your tests very slow, depending on how much time it actually takes to recreate your database. If the dev database server is a shared resource, there will have to be an additional initial investment in making sure each dev has their own db on the server. The server may be impacted depending on how often tests are run. There is additional overhead in running your test suite in a continuous integration environment, because it will need at least one db, possibly more (depending on how many branches are being built simultaneously).
The benefit has to do with actually running through the same code paths and similar resources that will be used in production. This usually helps to reveal bugs earlier which is always a very good thing.
ORM DB swap
If you're using an ORM like SQLAlchemy, there is a possibility that you can swap the underlying database for a potentially faster in-memory database. This allows you to mitigate some of the negatives of both the previous options.
It's not quite the same database as will be used in production, but the ORM should help mitigate the risk that the difference obscures a bug. Typically the time to set up an in-memory database is much shorter than for one which is file-backed. It also has the benefit of being isolated to the current test run, so you don't have to worry about shared resource management or final teardown/cleanup.
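The in-memory idea can be illustrated with the standard library alone. The sketch below deliberately uses sqlite3 directly instead of SQLAlchemy to stay self-contained; make_session, the users table and its contents are hypothetical.

```python
import sqlite3
import unittest

def make_session():
    """Return a fresh in-memory database with a known schema and contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    conn.commit()
    return conn

class UserTests(unittest.TestCase):
    def setUp(self):
        # Every test gets its own database, so tests cannot affect each other.
        self.conn = make_session()

    def tearDown(self):
        self.conn.close()

    def _count(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_insert_adds_a_row(self):
        self.conn.execute("INSERT INTO users (name) VALUES ('bob')")
        self.assertEqual(self._count(), 2)

    def test_starts_with_one_row(self):
        self.assertEqual(self._count(), 1)
```

Both tests pass regardless of the order in which they run, because the row inserted by one test never reaches the database seen by the other.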
Working on a project with a relatively expensive setup (IPython), I've seen an approach used where we call a get_ipython function, which sets up and returns an instance, while replacing itself with a function which returns a reference to the existing instance. Then every test can call the same function, but it only does the setup for the first one.
That saves doing a long setup procedure for every test, but occasionally it creates odd cases where a test fails or passes depending on what tests were run before. We have ways of dealing with that - a lot of the tests should do the same thing regardless of the state, and we can try to reset the object's state before certain tests. You might find a similar trade-off works for you.
Mock is a simple and powerful tool to achieve some isolation. There is a nice video from PyCon 2011 which shows how to use it. I recommend using it together with py.test, which reduces the amount of code required to define tests and is still very, very powerful.
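A small sketch of mocking a database session, using unittest.mock (the standard-library version of the Mock library in current Python). The function under test and the query string are hypothetical.

```python
from unittest import mock

def count_active_users(session):
    """Code under test: asks the session for rows and counts the active ones."""
    rows = session.execute("SELECT name, active FROM users").fetchall()
    return sum(1 for _name, active in rows if active)

def test_count_active_users():
    # The mock replaces the real session, so no database is touched.
    session = mock.Mock()
    session.execute.return_value.fetchall.return_value = [
        ("alice", True),
        ("bob", False),
        ("carol", True),
    ]
    assert count_active_users(session) == 2
    # Verify the exact command that was sent to the session.
    session.execute.assert_called_once_with("SELECT name, active FROM users")
```

The last assertion shows the coupling mentioned earlier: if the SQL text changes, the test has to change with it.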

unit testing for an application server

I wrote an application server (using Python & Twisted) and I want to start writing some tests. But I do not want to use Twisted's Trial, because I don't have time to play with it right now. So here is what I have in mind: write a small test client that connects to the app server and makes the necessary requests (the communication protocol is some in-house XML), store the received XML statically, and then write some tests on that static data using unittest.
My question is: Is this a correct approach and if yes, what kind of tests are covered with this approach?
Also, using this method has several disadvantages, like: not being able to access the database layer in order to build/rebuild the schema, when will the test client going to connect to the server: per each unit test or before running the test suite?
You should use Trial. It really isn't very hard. Trial's documentation could stand to be improved, but if you know how to use the standard library unit test, the only difference is that instead of writing
import unittest
you should write
from twisted.trial import unittest
... and then you can return Deferreds from your test_ methods. Pretty much everything else is the same.
The one other difference is that instead of building a giant test object at the bottom of your module and then running
python your/test_module.py
you can simply define your test cases and then run
trial your.test_module
If you don't care about reactor integration at all, in fact, you can just run trial on a set of existing Python unit tests. Trial supports the standard library 'unittest' module.
"My question is: Is this a correct approach?"
It's what you chose. You made a lot of excuses, so I'm assuming that you're pretty well fixed on this course. It's not the best, but you've already listed all your reasons for doing it (and then asked follow-up questions on this specific course of action). "Correct" doesn't enter into it anymore, so there's no answer to this question.
"what kind of tests are covered with this approach?"
They call it "black-box" testing. The application server is a black box that has a few inputs and outputs, and you can't test any of its internals. It's considered one acceptable form of testing because it tests the bottom-line external interfaces for acceptable behavior.
If you have problems, it turns out to be useless for doing diagnostic work. You'll find that you also need to do white-box testing on the internal structures.
"not being able to access the database layer in order to build/rebuild the schema,"
Why not? This is Python. Write a separate tool that imports that layer and does database builds.
"when will the test client going to connect to the server: per each unit test or before running the test suite?"
Depends on the intent of the test. Depends on your use cases. What happens in the "real world" with your actual intended clients?
You'll want to test client-like behavior, making connections the way clients make connections.
Also, you'll want to test abnormal behavior, like clients dropping connections or doing things out of order, or unconnected.
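A black-box test client of the kind discussed here could be sketched as follows. The stand-in server uses the standard library's socketserver instead of Twisted, and the one-line XML protocol is invented for the example; the real in-house protocol is not known.

```python
import socket
import socketserver
import threading
import xml.etree.ElementTree as ET

class StatusHandler(socketserver.StreamRequestHandler):
    """Stand-in server: reads one XML line, answers with a status element."""
    def handle(self):
        request = ET.fromstring(self.rfile.readline())
        reply = ET.Element("response", {"for": request.get("id", ""), "status": "ok"})
        self.wfile.write(ET.tostring(reply) + b"\n")

def start_server():
    """Start the stand-in server on a free local port in a background thread."""
    server = socketserver.TCPServer(("127.0.0.1", 0), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_request(address, xml_bytes):
    """Connect the way a real client would, send one request, parse the reply."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(xml_bytes + b"\n")
        with sock.makefile("rb") as stream:
            return ET.fromstring(stream.readline())
```

A test would call start_server() once, make assertions on what send_request returns, and call server.shutdown() afterwards. Opening a new connection per request mirrors how real clients behave, as suggested above.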
I think you chose the wrong direction. It's true that the Trial docs are very light. But Trial is based on unittest and only adds some features to deal with the reactor loop and asynchronous calls (it's not easy to write tests that deal with Deferreds). All your tests that do not involve Deferreds or asynchronous calls will be exactly like normal unittest tests.
The trial command is a test runner (a bit like nose), so you don't have to write test suites for your tests. You will save time with it. On top of that, the trial command can output profiling and coverage information. Just run trial --help for more info.
In any case, the first thing you should ask yourself is which kind of tests you need the most: unit tests, integration tests or system tests (black-box). It's possible to do all of them with Trial, but it's not necessarily always the best fit.
I haven't used Twisted before, and the Twisted/Trial documentation isn't stellar from what I just saw, but it'll likely take you 2-3 days to correctly implement the test system you describe above. Now, like I said, I have no idea about Trial, but I GUESS you could probably get it working in 1-2 days, since you already have a Twisted application. If Trial gives you more coverage in less time, I'd go with Trial.
But remember, this is just an answer from a very cursory look at the docs.
