A better tool than Fixtures to populate a Django Database?

I set up fixtures in my Django project to populate my database. This works well but has a serious limitation: you can't create lots of data.
In theory, you can put in as many elements as you want, but since you need to write them one by one, it's impractical to have 20,000 items in your db.
I need a tool that would fill in the primary keys itself, and would be able to generate random typed data to fill the fixtures (e.g. emails, integers in a range, dates in a range, phone numbers). Another nice feature would be the ability to set functional rules for the data generation.
Does anyone know a way (a library, ...) to do this in a Django project?
I took a look at https://github.com/joke2k/faker - the tool itself seems good, but no integration with Django.
Otherwise, I guess I could write it myself using Faker (since writing a fixture file just consists of generating JSON), but I don't like to reinvent the wheel :)
Thanks.

Factory Boy: https://factoryboy.readthedocs.org
It's a fixtures replacement that works really well for unit testing or otherwise generating fixture data. You write classes that hook into your models and generate populated model instances, and you can choose whether or not to save them to the database.
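As a minimal sketch of the idea (the model and field names below are hypothetical), note that Factory Boy also wraps Faker via factory.Faker, which addresses the integration concern above:

import factory
from myapp.models import Person  # hypothetical model

class PersonFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Person

    # Faker providers generate random typed data per instance
    email = factory.Faker('email')
    birth_date = factory.Faker('date_between', start_date='-60y', end_date='-18y')
    phone = factory.Faker('phone_number')

# Build an unsaved instance, create a saved one, or bulk-create many:
person = PersonFactory.build()
person = PersonFactory.create()
people = PersonFactory.create_batch(20000)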

Related

Django model with hundreds of fields

I have a model with hundreds of properties. The properties can be of different types (integers, strings, uploaded files, ...). I would like to implement this complex model step by step, starting with the most important properties. I can think of two options:
Define the properties as regular model fields
Define a separate model to hold each property separately, and link it to the main model with a ForeignKey
I have not found any suggestions on how to handle models with lots of properties in Django. What are the advantages and drawbacks of each approach?
You definitely should not define your properties as ForeignKeys. Every time you need a full model, your database server would have to perform hundreds of JOINs, ruining your performance.
If your properties are needed almost every time you access the model, you should keep them in the same model. If not, you could make a separate Properties model and link it to your original model via OneToOneField.
I personally had such an experience. We had to build a hotel recommendation engine, and we were using Drupal back then. Since Drupal stores every custom property in a separate MySQL table, we quickly realised we should switch frameworks, because every single query crashed our production servers (20+ JOINs are a deadly thing for MySQL). BTW, we ended up using a custom solution based on Elasticsearch, which handles hundreds of fields just fine.
Update: If you're lucky enough to be using a recent version of PostgreSQL, you could leverage JSONField storage to pack all your fields into a single model field. Note, though, that you'll have to implement validation yourself.
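A minimal sketch of that idea (model and field names are hypothetical; on Django 3.1+ JSONField lives in django.db.models, while older versions have it in django.contrib.postgres.fields):

from django.db import models

class Hotel(models.Model):
    name = models.CharField(max_length=200)
    # All the rarely-queried properties packed into one JSON column;
    # any validation scheme has to be enforced in application code.
    properties = models.JSONField(default=dict, blank=True)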
First off, I feel your pain and wish you the best! If the hundreds of properties weren't a hard customer requirement, I would urge you to change the design first: there should never be a need for hundreds of properties on a single object; it usually signals a need for an array, inheritance, or separate classes, etc.
Going forward, you're going to need to make heavy use of values() and values_list() to return only the properties you actually need from the database, since performance will otherwise be severely crippled (a short sketch follows below).
Since you can't do anything about the model itself, you should try to address your performance issues from the design side of things. The single responsibility principle should feature heavily in your website, which means you'll only ever have a few values needed from the model at a time. That way it really won't make much difference which option you choose, since what is returned will be very limited.
Filter where you can, and use ordering sparingly.
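For illustration, a hypothetical query fetching only what a given view needs (the model and field names are assumptions):

# Returns dictionaries containing only the listed columns
rows = MyModel.objects.values('id', 'name', 'price')
# Returns a flat list of a single column
names = MyModel.objects.values_list('name', flat=True)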
You could group them into a few separate models, linked by OneToOneFields to the main model. That would "namespace" your data, and namespaces are "one honking great idea".
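A minimal sketch of that grouping (all names here are hypothetical):

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=200)

class ProductShipping(models.Model):
    # One "namespace" of related properties, attached 1:1 to the main model
    product = models.OneToOneField(Product, on_delete=models.CASCADE,
                                   related_name='shipping')
    weight = models.FloatField()
    width = models.FloatField()
    height = models.FloatField()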

Is there a django idiom to store app-related variables in the DB?

I'm quite new to Django, having moved to it from Drupal.
In Drupal it is possible to define module-level variables (read "application" for Django), which are stored in the DB using one of Drupal's core tables. The idiom is something like:
variable_set('mymodule_variablename', $value);
variable_get('mymodule_variablename', $default_value);
variable_del('mymodule_variablename');
The idea is that it wouldn't make sense for each module (app) to instantiate a whole "module table" just to store one value, so the core provides a common one to be shared across modules.
To the best of my newbie understanding of Django, it lacks such functionality, but since this is a common pattern, I thought I'd turn to the SO community to check if there is a typical/standard/idiomatic way that Django devs use to solve this problem.
(BTW: the value is not a constant that I could put in a settings file. It's a value that should be refreshed daily and read on each request.)
There are apps to achieve this, but I'd recommend django-modeldict from Disqus. As its brief description says:
ModelDict is a very efficient way to store things like settings in
your database. The entire model is transformed into a dictionary
(lazily) as well as stored in your cache. It's invalidated only when
it needs to be (both in process and based on CACHE_BACKEND).
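A sketch along the lines of its README (the Setting model here is an assumption; it just needs key and value fields), which maps nicely onto the Drupal idiom above:

from modeldict import ModelDict
from myapp.models import Setting  # hypothetical model with key/value columns

settings = ModelDict(Setting, key='key', value='value', instances=False)

settings['mymodule_variablename'] = 'some value'   # like variable_set()
print(settings['mymodule_variablename'])           # like variable_get()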
Data that is not static is stored in a model. If you need to share data or functions between apps, I have seen the convention of making a shared app, named something like 'common'. This would house shared models or utility functions.
In the Django projects I have seen, the data is usually specific. The data you are storing should live in a model that is representative of that data; I would rather have an explicit model/object representing my data than a generic object that houses vastly different data.
If you are only defining one or two variables which change daily, perhaps a key/value store like memcached would work for you?
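For example, via Django's cache framework, assuming a memcached backend (the key name is hypothetical; value and default_value are placeholders mirroring the Drupal idiom above):

from django.core.cache import cache

# Refresh the value daily; read it on each request with a fallback default
cache.set('mymodule_variablename', value, timeout=60 * 60 * 24)
current = cache.get('mymodule_variablename', default_value)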
Another +1 for ModelDict. A similar potential solution is django-constance:
https://github.com/jazzband/django-constance
It's meant to store app config parameters in the database, and has the advantage of exposing a nice backend for administrators (with the right permissions) to edit them; it handles default values and also has caching, etc.
EDIT:
In case it's not clear from the documentation (which it isn't), you can set settings in the usual 'Pythonic way'. I.e., to set a setting to a value, you do:
from constance import config
config.variable_name = value
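Defaults are declared in settings.py via CONSTANCE_CONFIG; a short sketch (the key name is hypothetical):

# settings.py
CONSTANCE_CONFIG = {
    # name: (default value, help text shown in the admin backend)
    'variable_name': (42, 'A value refreshed daily and read on each request'),
}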

Converting Python App into Django

I've got a Python program with about a dozen classes, several of which possess instances of other classes; e.g., ObjectA has a list of ObjectBs and a dictionary of (ObjectC, ObjectD) pairs.
My goal is to put the program's functionality on a website.
I've written and tested JSON encode and decode methods for each class. The problem, as I see it now, is that I need to choose between starting over and writing the models and logic afresh from a database perspective, or simply storing the Python objects (encoded as JSON) in the database and pulling out the saved states for changes.
Can someone confirm that these are both valid approaches, and that I'm not missing any other simple options?
Man, what I think you can do is convert the classes you have already made into Django model classes; of course, only the ones that need to be saved to the database. The other classes, and the rest of the code, I recommend you encapsulate as helper functions. That way you don't have to change your code too much, and it's going to work fine. ;D
Or, another choice that can be easier to implement: put everything in a helper module, the classes, the functions and everything else.
Then you'll just need to call the functions in your views and define models to save your data into the database.
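A hypothetical sketch of the first option; the list of ObjectBs maps naturally onto a ForeignKey from B back to A:

from django.db import models

class ObjectA(models.Model):
    name = models.CharField(max_length=100)

class ObjectB(models.Model):
    # "ObjectA has a list of ObjectBs" becomes a reverse relation: a.bs.all()
    a = models.ForeignKey(ObjectA, related_name='bs', on_delete=models.CASCADE)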
Your idea of saving the objects as JSON on the database works, but it's ugly. ;)
Anyway, if you are in a hurry to deliver the website, anything is valid. Just remember that things made in this way always give us lots of problems in the future.
I hope this is useful! :D

How to make django test framework read from live database?

I realize there's a similar question here, but this one has a different approach: I have a Django app that runs queries over data indexed with djapian. I'd like to write unit tests for this app's search component, and obviously I'd need the Django settings module and all database connections active, so the test runner that Django provides seems ideal. However, the Django testing framework creates a dummy database, and I'd hate to dump all my data to a fixture and then index it (the tests would take forever!).
My data isn't at risk because the tests would only read from the database, so how could this be achieved? I'm new to this whole unit testing thing, so the solution of writing a new test runner that I read about in that similar question doesn't enlighten me a bit, at least not without some details.
Reading the test cases for djapian, I found something really interesting: what those guys do is use the setUp method of the TestCase class: they create an object and then call the indexer's update method, so they effectively have a document to search for and a way to write controlled query tests!
For the curious, the method looks something like this:
def setUp(self):
    p = Person.objects.create(name="Alex")
    for i in range(self.num_entries):
        Entry.objects.create(author=p, title="Entry with number %s" % i, text="foobar " * i)
    Entry.indexer.update()
I think this would do, but we have to remember I'm testing a little search engine here, so this solution might be the easy way out; I can't come up with an objection, though, so if you have an answer that helps define a strategy for testing this kind of web app in Python in general, it's more than welcome!
(I think I'll settle for something like this for now; I also wanted to test query latency against a fully populated database, but I think I could do that later with bench tests in Funkload.)
EDIT: OK, to be faithful to a solution for anyone interested, I ran into another issue: the Xapian index (as stated in the comment). To solve it, I created a custom test runner that swaps the production Xapian index for a test index (a smaller one, created with a management script). The runner is fairly simple:
from django.conf import settings
from django.test.simple import run_tests

def custom_run_tests(test_labels, verbosity=1, interactive=True, extra_tests=[]):
    """Set the test indices before delegating to Django's default runner."""
    settings.CATEGORY_CLASSIFIER_DATA = settings.TEST_CLASSIFIER_DATA
    return run_tests(test_labels, verbosity, interactive, extra_tests)
And, to use it, I simply added a setting:
TEST_RUNNER = 'search.tests.custom_run_tests'
I dropped the aforementioned approach (creating the documents in setUp) for performance and readability reasons: to test the database I needed a decent number of documents containing some text (a paragraph or two), so I ended up creating a fixture for that (I used a management command that created the documents in the real database, serialized them by writing them to a file, and then deleted them).
So, in the end, I didn't read from the live db at all and instead used test fixtures created with a somewhat hacky script and a custom runner, and it wasn't that hard :)

Keeping track of changes - Django

I have various models that I would like to keep track of and collect statistical data on.
The problem is how to store the changes over time.
I thought of various alternatives:
Storing a log in a TextField, and opening and updating it every time the model is saved.
Alternatively, pickling a list and storing it in a TextField.
Saving logs to the hard drive.
What are your suggestions?
Don't reinvent the wheel. Use django-reversion for logging changes.
I'd break statistics off into a separate model though.
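A minimal sketch of django-reversion's pattern (the model name is hypothetical, and the decorator form is from more recent versions of the library):

import reversion
from django.db import models

@reversion.register()
class Article(models.Model):
    title = models.CharField(max_length=100)

# Saves made inside a revision block are versioned and can be reverted
with reversion.create_revision():
    article = Article.objects.create(title="first draft")
    reversion.set_comment("Initial version")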
Quoth my elementary chemistry teacher: "If you don't write it down, it didn't happen", therefore save logs in a file.
Since the log information is disjoint from your application data (it's metadata, actually), keep them separate. You could log to a database table, but it should be distinct from your model.
Text pickle data is difficult for humans to read, and binary pickle data even more so; log in an easily parsed format and the data can be imported into analysis software easily.
I've had a similar situation, in which we were supposed to keep a history of changes. But we also needed an audit trail to track who made the changes, and the ability to revert. In our case, storing in the database seemed more logical. However, considering you have statistical data and it's going to be large, perhaps a separate file-based approach would be better for you.
In any case, you should use a generic mechanism to log the changes on models rather than coding each model individually.
Take a look at this: http://www.djangosnippets.org/snippets/1052/
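In the spirit of that snippet, a generic mechanism might hook Django's post_save signal; this sketch is my own assumption, not the snippet's code:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save)
def log_model_change(sender, instance, created, **kwargs):
    # Fires for every model; in practice you would filter senders
    # and record the acting user as well.
    action = "created" if created else "updated"
    with open("model_changes.log", "a") as f:
        f.write("%s pk=%s %s\n" % (sender.__name__, instance.pk, action))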
