Test Driven Development with Django


I have a conceptual question about doing test-driven development with Django; it may apply to other frameworks as well.
TDD states that the first step in the development cycle is to write failing tests.
Suppose that, for a unit test, I want to verify that an item is actually created when a request arrives. To test this functionality, I want to issue a request with the test client and check in the database that the object was actually created. To be able to do that, I need to import the related model in the test file, but since writing this test is the first step, I don't even have a model yet. So I won't be able to run the tests to see them fail.
What is the suggested approach here? Maybe write a simpler test first, then modify the test after enough level of production code is implemented?

Important note: what you describe is not a unit test. It does not test one unit; it tests a whole bunch of things, starting from Django URL wiring and views and ending with models. This is an integration test. Secondly, don't write a test that uses the external API (or the test client, which is mostly the same thing) to create data but then checks that the entity was created by going directly to the database. If you create data via some API, you should use an API of the same level to check that the data was created. My explanation below follows this approach.
What you describe is a common problem when you start with TDD.
Important things about TDD are that you:
- take small steps
- refactor after the test is green (including refactoring the test itself)
This may sound simple, and you have most probably read it and know it, but the consequences for how you structure your work might not be that obvious.
The main consequence is that you do not write the full test from scratch before implementing the functionality. You start with the simplest test you can write, make it pass (by implementing some piece of functionality), and refactor. Then you extend the test with more things that you want to check, implement that piece to make the test green, refactor, and so on.
As a consequence, you need to split the work (or plan how to implement it in simple steps) to be able to work in this mode. This requires some practice and is, I guess, one of the main barriers to TDD adoption.
It is similar to what you wrote, with one important difference:
Maybe write a simpler test first, then modify the test after enough level of production code is implemented?
You need to have a simple test first and then modify it iteratively in small steps, but before you implement the production code, not after.
In this particular case you can implement it in the following steps:
1. Create a test that uses the test client
    def test_entity_creation(self):
        post_result = test_client.post(POST_URL, {})
        get_result = test_client.get(get_entity_url_from(post_result))
        assert_that(get_result, not_none())
You now have a failing test, and not a line of production code has been written.
Note that no data is passed yet and the check is very basic.
2. Create the URL wiring and an empty view
Do just enough to make the test pass. You need very few changes in the code, and the view will not return much, if anything. The view can return some hardcoded JSON/dict at this point.
3.1 Check that the id of the entity is generated
    def test_entity_creation(self):
        post_result = test_client.post(POST_URL, {})
        get_result = test_client.get(get_entity_url_from(post_result))
        assert_that(get_result, not_none())
        assert_that(get_result, has_field('id', not_none()))
You can make this test pass by adding an id to the hardcoded dict.
3.2 Check that a unique id of the entity is generated
Add a new test that checks that ids are unique:
    def test_create_generates_unique_id(self):
        post_result1 = test_client.post(POST_URL, {})
        post_result2 = test_client.post(POST_URL, {})
        assert_that(get_id(post_result1), not_(equal_to(get_id(post_result2))))
4. Add the model with only an id
It is not hard to add a model with only an id and to wire its creation and retrieval into the view. Don't add all the fields that you need; you will do that step by step later.
5. Add one field to your test
    def test_entity_creation(self):
        post_result = test_client.post(POST_URL, {'field': 'value'})
        get_result = test_client.get(get_entity_url_from(post_result))
        assert_that(get_result, not_none())
        assert_that(get_result, has_field('field', 'value'))
Add a field to the model and make the test pass.
6. Continue doing TDD
Add more tests and production code.
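The end state of these steps can be sketched without Django. In the sketch below, `FakeClient` is a hypothetical in-memory stand-in for the test client, and the `/entities/` URL and the `entity_url` helper are assumptions made for illustration. The point is only that each test creates and reads back the entity through the same API level and never touches the database directly.

```python
import itertools
import unittest


class FakeClient:
    """Hypothetical in-memory stand-in for a test client (not Django's API)."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count(1)

    def post(self, url, data):
        # Create an entity and return its representation, including the new id.
        entity = dict(data, id=next(self._ids))
        self._store[f"{url}{entity['id']}/"] = entity
        return entity

    def get(self, url):
        # Return the stored entity, or None if nothing lives at this URL.
        return self._store.get(url)


POST_URL = "/entities/"  # assumed URL, for illustration only


def entity_url(post_result):
    return f"{POST_URL}{post_result['id']}/"


class EntityCreationTest(unittest.TestCase):
    def setUp(self):
        self.client = FakeClient()

    def test_entity_creation(self):
        post_result = self.client.post(POST_URL, {"field": "value"})
        get_result = self.client.get(entity_url(post_result))
        self.assertIsNotNone(get_result)
        self.assertEqual(get_result["field"], "value")

    def test_create_generates_unique_id(self):
        first = self.client.post(POST_URL, {})
        second = self.client.post(POST_URL, {})
        self.assertNotEqual(first["id"], second["id"])
```

In a real project the fake would be replaced by the framework's test client, and the production code would grow one step at a time to keep these tests green.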
Some more thoughts
Step 4 might be too big for one TDD cycle. It requires changes to at least three things:
- the POST view handler
- the GET view handler
- the model
In many cases it makes sense to split it by first creating a test for the model itself. This test does not go through the test client and looks like this:
    def test_entity(self):
        entity = Entity.objects.create()
        entity = Entity.objects.get(pk=entity.id)
        assert_that(entity.id, not_none())
Then you add a model. Make sure that test_entity passes, and only after that modify the view to use your (already tested) model.
I hope this gives the idea how to approach this problem.

In Django, the approach is always to recreate a working environment for testing and staging. In testing the data is fake; in staging the data is "old" or very similar to production.

Related

How do you design Page Object Model without duplicates?

I'm working on UI test automation, using POM with Python and Selenium.
I want to know how to handle duplicate test cases.
For example, say you have two web pages: a login page and a homepage.
I want to test three test cases.
1. Homepage functions without login: test_homepage_before_login.py
2. Login with valid/invalid username and password: test_login.py
3. Homepage functions with login: test_homepage_after_login.py
(1 and 3 have a lot in common. 3 has additional functions. 1 is a subset of 3.)
There is one file for each test case, and I have already implemented 1 and 2. For the third one, I just imported the relevant functions from modules 1 and 2.
The problem is that login validation is duplicated. In this case, do you validate the login every time? Also, do you specify an order or dependency between these cases when automating them, by using pytest-ordering or pytest-dependency?
Another case I can think of is "logout". When you automate the logout function, you need to log in first. In this case, do you add login validation beforehand again and then implement logout? Do you declare a dependency in this case as well, or just make the scripts independent?
Thank you in advance.

You can use cookies to handle authentication. It will greatly speed up your tests. An example:
    public void setAuthenticationCookies() {
        Cookie at = new Cookie("Cookie_AccessToken", prop.getAccessToken(), "/", DatatypeConverter.parseDateTime("2030-01-01T12:00:00Z").getTime());
        Cookie rt = new Cookie("Cookie_RefreshToken", prop.getRefreshToken(), "/", DatatypeConverter.parseDateTime("2030-01-01T12:00:00Z").getTime());
        driver.manage().addCookie(at);
        driver.manage().addCookie(rt);
    }
For more: https://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/Cookie.html
As for the logout issue, I advise you to log in first and then log out within the same test, in order to keep your tests independent of each other.

As this is a design question, there's no one right answer, but I'll give you some tips and guidelines that I follow.
- I don't design each test to correspond to a page. Instead, I design short scenarios that verify the behaviors and outcomes that are most relevant to the customer.
- Login is a very specific and special case. Most tests need to start with a login in order to get to what they actually need to test, though they don't really care about verifying the login process per se. In addition, you usually want some other tests that specifically verify the login process. For the first category of tests, you can perform the login either before each test or even once before all the tests, using a fixture. The fixture may verify that the login succeeded, just to prevent the test from continuing in case of a failure, but I don't consider that part of what the test validates. This fixture should perform the login in the simplest and most reliable way, because its purpose is not to find more bugs but to help us get to what we really care about in the test. This means that you may use an API or take any other shortcut in order to reach a logged-in state.
- For the tests that are specific to login, I'll often combine them with the registration process. The main purpose of registration is to allow the user to log in, and a successful login is the result of the registration process, so there's not much point in testing one without the other.
- Regarding reuse of code and page objects: following my first bullet, I write each test in a way that describes the scenario in the most readable fashion, using classes and methods that don't exist yet. Then I implement these classes and methods to make the test pass. From the second test on, if I realize that I have already implemented an action, I reuse that code. If I need something similar, but not identical, to something I have already implemented, I refactor the original function to be usable in my new test, removing any duplication along the way. This helps me design my code in a very reusable and duplication-free manner. Sometimes it leads me to create page objects, and other times it leads me to other design patterns. You can find a much more detailed explanation of this method, including a step-by-step tutorial, in my book Complete Guide to Test Automation.
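The login-fixture idea can be sketched in plain Python, without Selenium. Everything here is hypothetical: `FakeApp` stands in for the application under test, `api_login` for whatever shortcut the real system offers, and `logged_in_home_page` for a fixture-style helper. The sketch only shows the shape: one page object serves both the anonymous and the logged-in tests, and the login check inside the helper is a guard rather than part of what the test validates.

```python
class FakeApp:
    """Hypothetical application under test, reduced to what the sketch needs."""

    def __init__(self):
        self.sessions = set()

    def api_login(self, username, password):
        # Shortcut used by the fixture: no UI involved.
        if password != "secret":
            raise PermissionError("invalid credentials")
        token = f"token-for-{username}"
        self.sessions.add(token)
        return token

    def logout(self, token):
        self.sessions.discard(token)


class HomePage:
    """Page object: the same class serves anonymous and logged-in tests."""

    def __init__(self, app, token=None):
        self.app = app
        self.token = token

    @property
    def is_logged_in(self):
        return self.token in self.app.sessions

    def visible_functions(self):
        common = ["search", "browse"]
        if self.is_logged_in:
            return common + ["profile", "logout"]
        return common

    def logout(self):
        self.app.logout(self.token)


def logged_in_home_page(app):
    """Fixture-style helper: reach a logged-in state as cheaply as possible."""
    token = app.api_login("alice", "secret")
    page = HomePage(app, token)
    # Guard only; this is not what the calling test is validating.
    assert page.is_logged_in, "login fixture failed"
    return page


def test_logout_hides_private_functions():
    # Independent test: it logs in by itself, then exercises logout.
    page = logged_in_home_page(FakeApp())
    page.logout()
    assert page.visible_functions() == ["search", "browse"]
```

With pytest, `logged_in_home_page` would typically become a fixture, so no test ordering or inter-test dependency is needed.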
HTH.

Posting data to database through a "workflow" (Ex: on field changed to 20, create new record)

I'm looking to post new records on a user-triggered basis (i.e. a workflow). I've spent the last couple of days researching the best way to approach this, and so far I've come up with the following ideas:
(1) Utilize Django signals to check for conditions on a field change, and then post data originating from my Django app.
(2) Utilize JS/AJAX on the front end to post data to the app when a user changes certain fields.
(3) Utilize a prebuilt workflow app like http://viewflow.io/, again based upon changes triggered by users.
Of the three above options, is there a best practice? Are there any other options I'm not considering for how to take this workflow based approach to post new records?

The second approach, monitoring the changes in the front end and then calling a backend view to update the database, would be the better one. Detecting the changes on the backend puts that processing on the server, which could slow down the site, whereas the second approach is more of a client-side solution and keeps the server relieved.
I do not think there will be any data loss: you are just monitoring a change, and as soon as it happens your view updates the database. You can also use cookies or sessions to keep appending values to a list and update the database when the site is closed. Django also raises HTTP errors, so you could wrap those calls in proper try/except blocks as well. Anyway, cookies would be a good approach, I think.

For anyone who finds this post: I ended up deciding to take the signals route. Essentially I'm using signals to track when users change a field, and based on the field that changes I perform certain actions on the database.
For testing purposes this has been working well. When I reach production with this project I'll try to update this post with any challenges I run into.
Example:
    from django.db.models.signals import pre_save
    from django.dispatch import receiver

    @receiver(pre_save, sender=subTaskChecklist)
    def do_something_if_changed(sender, instance, **kwargs):
        try:
            # obj is the stored, "old" version from before the change
            obj = sender.objects.get(pk=instance.pk)
        except sender.DoesNotExist:
            pass  # new object, nothing to compare against
        else:
            previous_Value = obj.FieldToTrack
            new_Value = instance.FieldToTrack  # instance is the "new" object after the change
            if previous_Value != new_Value:
                DoSomethingWithChangedField(new_Value)
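One way to keep such a handler easy to test is to move the decision itself into a pure function that the handler calls. The sketch below is an assumption-laden illustration, not part of the original solution: the trigger value 20 comes from the question title, and `records_to_create` and the record layout are hypothetical names.

```python
TRIGGER_VALUE = 20  # assumed threshold, taken from the question title


def records_to_create(old_value, new_value):
    """Return the follow-up records a save should produce (possibly none)."""
    changed = old_value != new_value
    if changed and new_value == TRIGGER_VALUE:
        return [{"reason": "field reached trigger value", "value": new_value}]
    return []
```

A signal handler would then only look up the old value, call this function, and save whatever it returns, so the rule can be unit tested without a database.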

How to setup a 3-tier web application project

EDIT:
I have added the [MVC] and [design-patterns] tags to expand the audience for this question, as it is more of a generic programming question than something that has directly to do with Python or SQLAlchemy. It applies to all applications with business logic and an ORM.
The basic question is if it is better to keep business logic in separate modules, or to add it to the classes that our ORM provides:
We have a flask/sqlalchemy project for which we have to setup a structure to work in. There are two valid opinions on how to set things up, and before the project really starts taking off we would like to make our minds up on one of them.
If any of you could give us some insights on which of the two would make more sense and why, and what the advantages/disadvantages would be, it would be greatly appreciated.
My example is an HTML letter that needs to be sent in bulk and/or displayed to a single user. The letter can have sections that display an invoice and/or a list of articles for the user it is addressed to.
Method 1:
Split the code into 3 tiers - 1st tier: web interface, 2nd tier: processing of the letter, 3rd tier: the models from the ORM (sqlalchemy).
The website will call a server side method in a class in the 2nd tier, the 2nd tier will loop through the users that need to get this letter and it will have internal methods that generate the HTML and replace some generic fields in the letter, with information for the current user. It also has internal methods to generate an invoice or a list of articles to be placed in the letter.
In this method, the 3rd tier is only used for fetching data from the database and perhaps some database-related logic, like generating a full name from a user's first name and last name. The 2nd tier performs most of the work.
Method 2:
Split the code into the same three tiers, but only perform the loop through the collection of users in the 2nd tier.
The methods for generating HTML, invoices and lists of articles are all added as methods to the model definitions in tier 3 that the ORM provides. The 2nd tier performs the loop, but the actual functionality is enclosed in the model classes in the 3rd tier.
We concluded that both methods could work, and both have pros and cons:
Method 1:
- separates business logic completely from database access
- avoids pulling in a lot of methods/functionality that we might not need whenever an ORM model is imported, and keeps the code for the model classes more compact
- might be easier to use when mocking out ORM models for testing
Method 2:
- seems to be in line with the way Django does things in Python
- allows simple access to methods: when a model instance is present, any function it performs can be called immediately (in my example: when I have a letter instance available, I can directly call a method on it that generates the HTML for that letter)
- lets you pass instances around with all the appropriate methods at hand
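The difference between the two methods can be sketched in a few lines. This is only an illustration under assumptions: `User` is a plain dataclass standing in for a SQLAlchemy model, and the letter template, field names and `send_bulk` are hypothetical.

```python
from dataclasses import dataclass
from string import Template

LETTER = Template("<p>Dear $name,</p><p>You owe $total.</p>")  # assumed template


@dataclass
class User:
    """Stand-in for an ORM model; a real one would come from SQLAlchemy."""

    first_name: str
    last_name: str
    invoice_total: float

    @property
    def full_name(self):
        # Data-related logic that both methods keep on the model.
        return f"{self.first_name} {self.last_name}"

    def render_letter(self):
        # Method 2: rendering lives on the model itself.
        return LETTER.substitute(name=self.full_name, total=f"{self.invoice_total:.2f}")


def render_letter(user):
    # Method 1: rendering lives in the 2nd tier and only reads from the model.
    return LETTER.substitute(name=user.full_name, total=f"{user.invoice_total:.2f}")


def send_bulk(users, render):
    """2nd-tier loop over the users, identical in both methods."""
    return [render(user) for user in users]
```

`send_bulk(users, render_letter)` follows Method 1 and `send_bulk(users, User.render_letter)` follows Method 2. The output is the same; what changes is which module has to be imported, mocked and maintained.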

Normally, you use the MVC pattern for this kind of thing, but most web frameworks in Python have dropped the "Controller" part, since they consider it an unnecessary component. In my own development I have found this to be somewhat true: I can live without it. That leaves you with two layers: the view and the model.
The question now is where to put business logic. In a practical sense, there are two ways of doing this, or at least two situations in which I am confronted with the question of where to put logic:
- Create special internal view methods that handle logic that might be needed in more than one view, e.g. _process_list_data.
- Create functions that are related to a model, but not directly tied to a single instance, inside a corresponding model module, e.g. check_login.
To elaborate: I use the first one for strictly display-related methods, i.e. methods that are concerned with processing data for display purposes. My example above, _process_list_data, lives inside a view class (which groups methods by purpose), but it could also be a normal function in a module. It receives some parameters, e.g. the data list, and formats it somehow (for example, it may add additional view parameters so that the template needs less logic). It then returns the data set to the original view function, which can either pass it along or process it further.
The second one is used for most other logic, which I like to keep out of my direct view code for easier testing. My example of check_login does this: it is a function that is not directly tied to display output, as its purpose is to check the user's login credentials and either return a user or report a login failure (by throwing an exception, returning False or returning None). However, this functionality is not directly tied to a model either, so it cannot live inside an ORM class (well, it could be a staticmethod on the User object). Instead it is just a function inside a module (remember, this is Python: you should use the simplest approach available, and functions are there for a reason).
To sum this up: display logic goes in the view and all the other stuff in the model, since most logic is somehow tied to specific models. If it is not, create a new module or package just for logic of this kind. For example, I often create a util module/package for helper functions that are not directly tied to any view, model or anything else, such as a function to format dates that is called from the template but contains so much Python code that it would be ugly to define it inside a template.
Now we bring this logic to your task: Processing/Creation of letters. Since I don't know exactly what processing needs to be done, I can only give general recommendations based on my assumptions.
Let's say you have some data and want to bring it into a letter. For example, you have a list of articles and a customer who bought these articles. In that case, you already have the data. The only thing that may need to be done before passing it to the template is reformatting it in such a way that the template can easily use it. For example, you may want to order the purchased articles by amount, price or article number. This is independent of the model; the order is only display-related (you could have specified the order in your database query already, but let's assume you didn't). So this is an operation your view would do, so that your template receives the data already formatted for display.
Now let's say you want to get the data to create a specific letter, for example a list of articles the user bought over time, together with the date when they were bought and other details. This would be the model's job: create a query, fetch the data and make sure it has all the properties required for this specific task.
Let's say that in both cases you wish to retrieve a price for the product, and that the price is determined by a base value and some percentages based on other properties. This would make sense as a model method, as it operates on a single product or order instance. You would then pass the model to the template and call the price method inside it. But you might as well restructure it so that the call is already made in the view and the template only gets tuples or dictionaries. This would make it easier to pass the same data out as an API (see below), but it might not necessarily be the easiest or best way.
A good rule for this decision is to ask yourself: "If I were to provide a JSON API in addition to my standard view, how would I need to modify my code to be as DRY as possible?" If thinking about it in theory is not enough at the start, build some APIs alongside the templates and see where you need to change things so that the API makes sense next to the views themselves. You may never use this API, so it does not need to be perfect, but it can help you figure out how to structure your code. However, as you saw above, this doesn't necessarily mean that you should preprocess the data so that you only return things that can be turned into JSON; instead you might want to do some JSON-specific formatting in the API view.
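The JSON API rule of thumb can be made concrete with a small sketch. It is framework-free and rests on assumptions: `Product` is a dataclass standing in for an ORM model, the pricing rule and field names are invented, and `html_view` and `json_view` are hypothetical view functions.

```python
import json
from dataclasses import dataclass


@dataclass
class Product:
    """Stand-in for an ORM model (names and pricing rule are assumptions)."""

    name: str
    base_price: float
    surcharge_percent: float

    def price(self):
        # Model logic: operates on a single instance.
        return round(self.base_price * (1 + self.surcharge_percent / 100), 2)


def product_rows(products):
    """View logic: order and flatten the data for display."""
    ordered = sorted(products, key=lambda p: p.price())
    return [{"name": p.name, "price": p.price()} for p in ordered]


def html_view(products):
    rows = product_rows(products)
    return "".join(f"<li>{r['name']}: {r['price']:.2f}</li>" for r in rows)


def json_view(products):
    # The hypothetical API reuses the same rows, so nothing is duplicated.
    return json.dumps(product_rows(products))
```

Here the price calculation stays on the model, the ordering stays in the view layer, and both output formats share `product_rows`, which is roughly what the DRY question is meant to reveal.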
So I went on a little longer than I intended, but I wanted to provide some examples to you because that is what I missed when I started and found out those things via trial and error.

Django - Populating a database for test purposes

I need to populate my database with a bunch of dummy entries (around 200+) so that I can test the admin interface I've made and I was wondering if there was a better way to do it. I spent the better part of my day yesterday trying to fill it in by hand (i.e by wrapping stuff like this my_model(title="asdfasdf", field2="laksdj"...) in a bunch of "for x in range(0,200):" loops) and gave up because it didn't work the way I expected it to. I think this is what I need to use, but don't you need to have (existing) data in the database for this to work?

Check this app
https://github.com/aerosol/django-dilla/
Let's say you wrote your blog application (oh yeah, your favorite!) in Django. Unit tests went fine, and everything runs extremely fast, even those ORM-generated ultra-long queries. You've added several categorized posts and it's still stable as a rock. You're quite sure the app is efficient and ready for live deployment. Right? Wrong.

You can use fixtures for this purpose, and the loaddata management command.
One approach is to do it like this:
1. Prepare your test database.
2. Use dumpdata to create a JSON export of the database.
3. Put this in the fixtures directory of your application.
4. Write your unit tests to load this "fixture": https://docs.djangoproject.com/en/2.2/topics/testing/tools/#django.test.TransactionTestCase.fixtures
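If entering the data by hand before running dumpdata is the painful part, a fixture file can also be generated with a short script. The sketch below uses only the standard library and writes the JSON fixture layout (a list of objects with `model`, `pk` and `fields`). The model label `myapp.mymodel` and the output path are assumptions; the field names `title` and `field2` are borrowed from the question.

```python
import json


def build_fixture(count, model_label="myapp.mymodel"):
    """Build a list in Django's JSON fixture layout (model / pk / fields)."""
    return [
        {
            "model": model_label,  # assumed "app_label.model_name"
            "pk": pk,
            "fields": {"title": f"Title {pk}", "field2": f"value-{pk}"},
        }
        for pk in range(1, count + 1)
    ]


def write_fixture(path, count=200):
    # Write the generated entries to a file that loaddata can read.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_fixture(count), handle, indent=2)
```

For example, `write_fixture("myapp/fixtures/dummy.json")` followed by `python manage.py loaddata dummy` should load the 200 entries, provided the model really has these fields.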

Django fixtures provide a mechanism for importing data on syncdb. However, doing this initial data population is often easier via Python code. The technique you outline should work, either via syncdb or a management command. For instance, via syncdb, in my_app/management.py:
    from django.db.models import signals

    def init_data(sender, **kwargs):
        for i in range(1000):
            MyModel(number=i).save()

    signals.post_syncdb.connect(init_data)
Or, in a management command in myapp/management/commands/my_command.py:
    from django.core.management.base import BaseCommand, CommandError
    from myapp.models import MyModel

    # Django looks for a class named Command in the command module
    class Command(BaseCommand):
        def handle(self, *args, **options):
            if len(args) > 0:
                raise CommandError('need exactly zero arguments')
            for i in range(1000):
                MyModel(number=i).save()
You can then export this data to a fixture, or continue importing using the management command. If you choose to continue to use the syncdb signal, you'll want to conditionally run the init_data function to prevent the data getting imported on subsequent syncdb calls. When a fixture isn't sufficient, I personally like to do both: create a management command to import data, but have the first syncdb invocation do the import automatically. That way, deployment is more automated but I can still easily make modifications to the initial data and re-run the import.

I'm not sure why you require any serialization. As long as you have set up your Django settings.py file to point to your test database, populating a test database should be nothing more than saving models.
    for x in range(0, 200):
        m = my_model(title=random_title(), field2=random_string(), ...)
        m.save()
There are better ways to do this, but if you want a quick test set, this is the way to go.
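The `random_title()` and `random_string()` helpers in the loop above are not defined in the answer. A minimal sketch of what they might look like, using only the standard library, follows; the lengths, the `rng` parameter and the `dummy_rows` helper are assumptions added for illustration.

```python
import random
import string


def random_string(length=12, rng=random):
    """Random lowercase string; `rng` can be a seeded random.Random."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_title(words=3, rng=random):
    """A few capitalised pseudo-words separated by spaces."""
    return " ".join(
        random_string(rng.randint(3, 8), rng).capitalize() for _ in range(words)
    )


def dummy_rows(count=200, seed=0):
    """Keyword arguments for `count` model instances, reproducible per seed."""
    rng = random.Random(seed)
    return [
        {"title": random_title(rng=rng), "field2": random_string(rng=rng)}
        for _ in range(count)
    ]
```

Each row could then be saved with something like `my_model(**row).save()`. Seeding the generator keeps the dummy data identical between runs, which makes admin screenshots and bug reports easier to compare.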

The app recommended by the accepted answer is no longer being maintained however django-seed can be used as a replacement:
https://github.com/brobin/django-seed

I would recommend django-autofixtures to you. I tried both django_seed and django-autofixtures, but django_seed has a lot of issues with unique keys.
django-autofixtures takes care of unique, primary and other db constraints while filling up the database

Code refactoring help - how to reorganize validations

We have a web application that takes user inputs or database lookups to form operations against some physical resources. The design can be presented simply as the following diagram:
user input <=> model object <=> database storage
Validations are needed when a request comes from user input, but NOT when it comes from a database lookup hit (since if a record exists, its attributes must already have been validated). I am trying to refactor the code so that the validations happen in the object constructor instead of the old way (a few separate validation routines).
How would you decide which way is better? (The fundamental difference between method 1 (the old way) and method 2 is that in 1 the validations are not mandatory and are decoupled from object instantiation, whereas 2 binds them together and makes them mandatory for all requests.)
Here are two example code snippets for designs 1 and 2:
Method 1:
    # For processing a single request.
    # Steps: 1. Validate all incoming data. 2. Instantiate the object.
    ValidateAttributes(request)  # raises exceptions on failure
    resource = Resource(**request)
Method 2:
    # CheckIfBatchRequest had to be extracted, since it has nothing to do
    # with the object. It raises exceptions if required params are missing.
    # Steps: 1. Check whether it is a batch request. 2. Instantiate the object
    #        (validations are performed inside the constructor).
    CheckIfBatchRequest(request)
    resource = Resource(**request)  # raises exceptions when validation fails
In a batch request:
Method 1:
    # Steps: 1. Validate each request and return an error to the client if any is found.
    #        2. Perform the object instantiation and creation process; exceptions
    #           are captured.
    #        3. When all are finished, email out any errors.
    for request in batch_requests:
        try:
            ValidateAttributes(request)
        except SomeException as e:
            return ErrorPage(e)
    errors = []
    for request in batch_requests:
        try:
            CreateResource(Resource(**request), request)
        except CreationError as e:
            errors.append('failed to create with error: %s' % e)
    email(errors)
Method 2:
    # Steps: 1. Validate the batch-job-related data from the request.
    #        2. On success, create objects for each request and do the validations.
    #        3. On exception, return the error found; otherwise,
    #           build a list of (object, request) pairs.
    #        4. Do the creation process and email out any errors encountered.
    CheckIfBatchRequest(request)
    request_objects = []
    for request in batch_requests:
        try:
            resource = Resource(**request)
        except SomeException as e:
            return ErrorPage(e)
        request_objects.append((resource, request))
    email(CreateResource(request_objects))  # CreateResource will also need to be refactored
The pros and cons, as far as I can see, are:
Method 1 follows the business logic more closely. There are no redundant validation checks when objects come from a database lookup. The validation routines are easier to maintain and read.
Method 2 is easy and clean for the caller. Validations are mandatory, even for objects coming from a database lookup. The validations are harder to maintain and read.

Doing validation in the constructor really isn't the "Django way". Since the data you need to validate is coming from the client side, using new forms (probably with a ModelForm) is the most idiomatic way to validate, because it wraps all of your concerns into one API: it provides sensible validation defaults (with the ability to customize them easily), and model forms integrate the data-entry side (the HTML form) with the data commit (model.save()).
However, it sounds like you have what may be a mess of a legacy project; it may be outside the scope of your time to rewrite all your form handling to use new forms, at least initially. So here are my thoughts:
First of all, it's not "non-Djangonic" to put some validation in the model itself - after all, html form submissions may not be the only source of new data. You can override the save() method or use signals to either clean the data on save or throw an exception on invalid data. Long term, Django will have model validation, but it's not there yet; in the interim, you should consider this a "safety" to ensure you don't commit invalid data to your DB. In other words, you still need to validate field by field before committing so you know what error to display to your users on invalid input.
What I would suggest is this. Create new forms classes for each item you need to validate, even if you're not using them initially. Malcolm Tredinnick outlined a technique for doing model validation using the hooks provided in the forms system. Read up on that (it's really quite simple and elegant), and hook it into your models. Once you've got the newforms classes defined and working, you'll see that it's not very difficult, and will in fact greatly simplify your code, to rip out your existing form templates and corresponding validation and handle your form POSTs using the forms framework. There is a bit of a learning curve, but the forms API is extremely well thought out, and you'll be grateful for how much cleaner it makes your code.
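The separation discussed here (validate untrusted input field by field, keep the constructor free for trusted data) can be sketched without Django. This is not the forms API, only an illustration of the same shape: `validate_request`, `from_request`, the `name` and `size` fields and the error messages are all hypothetical.

```python
class ValidationError(Exception):
    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors  # field name -> message


def validate_request(request):
    """Check user input field by field and collect every error."""
    errors = {}
    name = str(request.get("name", "")).strip()
    if not name:
        errors["name"] = "This field is required."
    try:
        size = int(request.get("size", ""))
    except (TypeError, ValueError):
        errors["size"] = "Enter a whole number."
    else:
        if size <= 0:
            errors["size"] = "Must be positive."
    if errors:
        raise ValidationError(errors)
    return {"name": name, "size": size}


class Resource:
    """Plain constructor: trusted data only, e.g. rows loaded from the database."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    @classmethod
    def from_request(cls, request):
        # Untrusted path: validation is mandatory here, and only here.
        return cls(**validate_request(request))
```

Callers handling user input would use `Resource.from_request(request)` and show `exc.errors` next to the matching fields, while database lookups keep calling `Resource(...)` directly and skip the redundant checks.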

Thanks, Daniel, for your reply, especially for pointing out the newforms API. I will definitely spend time digging into it and see if I can adopt it for the long-term benefits. But just for the sake of getting my work done for this iteration (to meet my deadline before EOY), I'll probably still have to stick with the current legacy structure. After all, either way will get me what I want; I just want to make it as sane and clean as I can without breaking too much.
So it sounds like doing validations in the model isn't too bad an idea. On the other hand, my old way of doing validations in the views against the request also seems close to the concept of encapsulating them inside the newforms API (data validation is decoupled from model creation). Do you think it is OK to just keep my old design? It makes more sense to me to tackle this with the newforms API later instead of juggling the validations right now...
(Well, I got this refactoring suggestion from my code reviewer, but I am really not sure that my old way violates any MVC patterns or is too complicated to maintain. I think my way makes more sense, but my reviewer thought that binding validation and model creation together makes more sense...)
