I am wondering about the behaviour of prefetch_related() and select_related().
If I do something like Model.objects.filter(...).prefetch_related(), I notice that far fewer database queries occur. So my initial guess is that, if one does not specify the needed lookups in prefetch_related(), it will automatically go through all of the model's relations and prefetch them. However, I cannot find any reference to this on the web, which seems pretty strange to me. Is my guess correct, or am I missing something?
From the FineManual(tm) (emphasis is mine):
There may be some situations where you wish to call select_related()
with a lot of related objects, or where you don’t know all of the
relations. In these cases it is possible to call select_related() with
no arguments. This will follow all non-null foreign keys it can find -
nullable foreign keys must be specified. This is not recommended in
most cases as it is likely to make the underlying query more complex,
and return more data, than is actually needed.
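For illustration, here is a minimal sketch contrasting the no-argument form with explicitly named lookups (the Book/Author/Tag models and their relations are invented for the example):

# Book is assumed to have a non-null ForeignKey to Author and a ManyToManyField to Tag
Book.objects.select_related()          # one query; JOINs every non-null foreign key it can find
Book.objects.select_related("author")  # one query; JOINs only the named foreign key
Book.objects.prefetch_related("tags")  # two queries; the named relation is fetched separately and joined up in Python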
As far as I know, you can't use __isnull lookups on Django's native JSONField. On the Internet I found this inactive issue.
As possible workarounds, we can of course use these hacks:
model.objects.filter(field__contains={'key': None}), which isn't so flexible since you might need to query multiple keys or whatever.
model.objects.exclude(field__key=True).exclude(field__key=False), which is hacky and works only for boolean data.
I hope there is a better way ((c) Raymond Hettinger) of doing this. Any advice will be appreciated. For now, I will go with the first approach.
According to this (see the last, closing comment), the following should work: model.objects.filter(field__key=None) (but obviously you should use a Django version that includes the fix).
The Django docs warn that
Since any string could be a key in a JSON object, any lookup other
than those listed below will be interpreted as a key lookup. No errors
are raised. Be extra careful for typing mistakes, and always check
your queries work as you intend.
and here they are
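Put together, the approaches discussed above look roughly like this (model and key names are placeholders, and the behaviour depends on your Django version):

Model.objects.filter(field__contains={'key': None})               # "contains" hack from the question
Model.objects.exclude(field__key=True).exclude(field__key=False)  # boolean-only hack from the question
Model.objects.filter(field__key=None)                             # reported to work on Django versions with the fix
Model.objects.filter(field__key__isnull=True)                     # newer Django also documents isnull key lookups for missing keys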
I have a model with hundreds of properties. The properties can be of different types (integers, strings, uploaded files, ...). I would like to implement this complex model step by step, starting with the most important properties. I can think of two options:
Define the properties as regular model fields
Define a separate model to hold each property separately, and link it to the main model with a ForeignKey
I have not found any suggestions on how to handle models with lots of properties in Django. What are the advantages / drawbacks of both approaches?
You definitely should not define your properties as ForeignKeys. Every time you need the full model, your database server will have to make hundreds of JOINs, thereby ruining your performance.
If your properties are needed almost every time you access the model, you should keep them in the same model. If not, you could make a separate Properties model and link it to your original model via OneToOneField.
I personally had such an experience. We had to build a hotel recommendation engine, and we were using Drupal back then. And as Drupal stores every custom property in a separate MySQL table, we quickly realised we should switch the framework, because every single query crashed our production servers (20+ JOINs are a deadly thing to MySQL). BTW, we ended up using a custom solution based on ElasticSearch, which handles hundreds of fields just fine.
Update: If you're lucky enough to be using a recent version of PostgreSQL, you could leverage JSONField storage to pack all your fields into a single model field. Note, though, that you'll have to implement a validation scheme yourself.
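A minimal sketch of that idea, assuming PostgreSQL and a recent Django where models.JSONField is available (the model and property names are made up):

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=200)
    # all the loosely-structured properties packed into one JSON column;
    # key/type validation has to be done in clean() or a form/serializer
    properties = models.JSONField(default=dict, blank=True)

# querying a packed property
Product.objects.filter(properties__color="red")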
customer requirement.
First off, I feel your pain and wish you the best! And to reiterate: if this weren't a hard customer requirement, you should first be looking to change the design, because there should never be a need for hundreds of properties on a single object; it usually points to a need for an array, inheritance, or separate classes, etc.
Going forward, you're going to need to make heavy use of values() and values_list() to return only the properties you actually need from the database (as sketched below), since performance will otherwise be severely crippled.
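For example, something along these lines keeps each result set down to just what a given view needs (the Widget model and field names are hypothetical):

# returns dicts / tuples containing only the listed columns, not full model instances
Widget.objects.filter(active=True).values("id", "name", "price")
Widget.objects.filter(active=True).values_list("id", flat=True)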
Since you can't do anything about the model itself, you should try to address your performance issues from the design side of things. The single responsibility principle should feature heavily in your site's design, which means you'll only ever need a few values returned from the model at a time. This way it really won't make much difference which option you choose, since what is returned will be very limited.
Filter where you can, and use ordering sparingly.
You could group them into a few separate models, linked by OneToOneFields to the main model. That would "namespace" your data, and namespaces are "one honking great idea".
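A rough sketch of that grouping (all names invented for illustration):

from django.db import models

class Listing(models.Model):
    name = models.CharField(max_length=200)

class ListingShipping(models.Model):
    # one "namespace" of related properties, loaded only when a view needs it
    listing = models.OneToOneField(Listing, on_delete=models.CASCADE, related_name="shipping")
    weight_kg = models.FloatField(null=True, blank=True)
    width_cm = models.FloatField(null=True, blank=True)

class ListingSeo(models.Model):
    listing = models.OneToOneField(Listing, on_delete=models.CASCADE, related_name="seo")
    meta_title = models.CharField(max_length=255, blank=True)

# pull in a group only where it is actually used
Listing.objects.select_related("shipping")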
I have an SQLAlchemy model, say Entity, and it has a column is_published. If I query it, say by id, I only want to return it if is_published is set to True. I think I can achieve that using a filter. But when there is a relationship and I access it like another_model_obj.entity, I also only want that to give me the corresponding Entity object if its is_published is set to True. How should I do this? One solution would be to wrap each use in an if block, but I use this in too many places, and I would have to remember this detail every time. Is there any way I can automate this in SQLAlchemy, or is there any other better solution to this problem as a whole? Thank you
It reads as if you would need a join:
session().query(AnotherModel).join(Entity).filter(Entity.is_published)
This question was asked many times here and still doesn't have a good answer. Here are possible solutions suggested by the author of SQLAlchemy. A more elaborate query class to exclude unpublished objects is provided in iktomi library. It works with SQLAlchemy 0.8.* branch only for now but should be ported to 0.9.* soon. See test cases for limitations (tests that fail are marked with #unittest.skip()) and usage examples.
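A different technique from the query-class approach above, offered only as a hedged sketch with invented names, is to bake the is_published check into the relationship's primaryjoin, so that another_model_obj.entity simply comes back as None when the target row is unpublished:

from sqlalchemy import Boolean, Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Entity(Base):
    __tablename__ = "entity"
    id = Column(Integer, primary_key=True)
    is_published = Column(Boolean, nullable=False, default=False)

class AnotherModel(Base):
    __tablename__ = "another_model"
    id = Column(Integer, primary_key=True)
    entity_id = Column(Integer, ForeignKey("entity.id"))
    # the extra criterion in the join means unpublished entities are never loaded
    entity = relationship(
        "Entity",
        primaryjoin="and_(AnotherModel.entity_id == Entity.id, "
                    "Entity.is_published == True)",
    )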
One of my django models has a large TextField which I often don't need to use. Is there a way to tell django to "lazy-load" this field? i.e. not to bother pulling it from the database unless I explicitly ask for it. I'm wasting a lot of memory and bandwidth pulling this TextField into python every time I refer to these objects.
The alternative would be to create a new table for the contents of this field, but I'd rather avoid that complexity if I can.
The functionality happens when you build the query, using the defer() queryset method, rather than in the model definition. Check it out here in the docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer
Now, actually, your alternative solution of refactoring and pulling the data into another table is a really good solution. Some people would say that the need to lazy load fields means there is a design flaw, and the data should have been modeled differently.
Either way works, though!
There are two options for lazy-loading in Django: https://docs.djangoproject.com/en/1.6/ref/models/querysets/#django.db.models.query.QuerySet.only
defer(*fields)
Avoid loading those fields that require expensive processing to convert them to Python objects.
Entry.objects.defer("text")
only(*fields)
Only load the fields that you actually need.
Person.objects.only("name")
Personally, I think only is better than defer since the code is not only easier to understand, but also more maintainable in the long run.
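For a concrete comparison, assuming a model with just id, title, and a heavy body field, these two querysets issue equivalent SQL; the difference is only whether you list the fields to skip or the fields to keep:

Article.objects.defer("body")          # load everything except "body"
Article.objects.only("id", "title")    # load only the listed fields (plus the primary key)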
For something like this you can just override the default manager. Usually it's not advised, but for a defer() it makes sense:
from django.db import models

class CustomManager(models.Manager):
    def get_queryset(self):
        # defer the heavy text field on every queryset this manager produces
        return super(CustomManager, self).get_queryset().defer('YOUR_TEXTFIELD_FIELDNAME')

class DjangoModel(models.Model):
    objects = CustomManager()
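With that manager in place the deferral is transparent; accessing the deferred field later still works, it just costs an extra query (YOUR_TEXTFIELD_FIELDNAME is the placeholder from the snippet above):

obj = DjangoModel.objects.first()   # SELECT omits YOUR_TEXTFIELD_FIELDNAME
obj.YOUR_TEXTFIELD_FIELDNAME        # loaded lazily here, via a second query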
I'm using SQLAlchemy's declarative extension. I'd like all changes to tables to be logged, including changes in many-to-many relationships (association tables). Each table should have a separate "log" table with a similar schema, but additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so full answers to all of them won't fit the StackOverflow answer format. I'll try to describe hints in short; ask a separate question about any of them if that's not enough.
Assigning user and description to transaction
The most popular way to do this is assigning the user (and other info) to some global object (threading.local() in a threaded application). This is a very bad approach that causes hard-to-discover bugs.
A better way is assigning the user to the session. This is OK when a session is created for each web request (in fact, that's the best design for an application with authentication anyway), since there is only one user using that session. But passing the description this way is not as good.
And my favorite solution is to extend the Session.commit() method to accept an optional user (and probably other info) parameter and assign it to the current transaction. This is the most flexible, and it suits passing the description well too. Note that the info is bound to a single transaction and is passed in an obvious way when the transaction is closed.
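A minimal sketch of that idea, assuming a recent SQLAlchemy where Session.info is available (the class and key names are invented; this is not the answer's own code):

from sqlalchemy.orm import Session, sessionmaker

class AuditedSession(Session):
    def commit(self, user=None, description=None):
        # stash the audit info where before-update/insert/delete hooks can read it
        self.info["audit_user"] = user
        self.info["audit_description"] = description
        try:
            super().commit()
        finally:
            self.info.pop("audit_user", None)
            self.info.pop("audit_description", None)

SessionFactory = sessionmaker(class_=AuditedSession)
# session = SessionFactory(bind=engine)
# session.commit(user="alice", description="published the article")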
Discovering changes
There is sqlalchemy.orm.attributes.instance_state(obj), which contains all the information you need. The most useful for you is probably the state.committed_state dictionary, which contains the original values of changed fields (including many-to-many relations!). There is also a state.get_history() method (or the sqlalchemy.orm.attributes.get_history() function) returning a history object with a has_changes() method and added and deleted properties for the new and old values respectively. In the latter case use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
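In code, that inspection looks roughly like this (obj is a placeholder for any mapped instance):

from sqlalchemy.orm.attributes import get_history, instance_state

state = instance_state(obj)
for key in state.manager.keys():           # every mapped attribute
    if key in state.committed_state:       # i.e. the attribute was modified
        hist = get_history(obj, key)
        if hist.has_changes():
            new_values, old_values = hist.added, hist.deleted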
Automatically storing changes
SQLAlchemy supports mapper extensions that can provide hooks before and after update, insert, and delete. You need to provide your own extension with all the before hooks (you can't use the after hooks, since the state of the objects has already changed by the time they fire at flush). For the declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension to all your models. Note that you'll have to flush changes twice if you use mapped objects for the log, since the unit of work doesn't account for objects created in hooks.
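The same idea in today's terms, using the event API that superseded MapperExtension (model, table, and column names are all invented for the sketch):

import datetime
from sqlalchemy import Column, DateTime, Integer, String, Table, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)
    title = Column(String)

# a plain Table for the log rows, so writing them doesn't need a second ORM flush
document_log = Table(
    "document_log", Base.metadata,
    Column("id", Integer, primary_key=True),
    Column("document_id", Integer),
    Column("title", String),
    Column("changed_at", DateTime),
)

@event.listens_for(Document, "before_update")
def log_document_update(mapper, connection, target):
    # write the log row with the low-level connection inside the flush
    connection.execute(
        document_log.insert().values(
            document_id=target.id,
            title=target.title,
            changed_at=datetime.datetime.utcnow(),
        )
    )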
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects