Does the Django ORM provide a way to conditionally create an object?
For example, let's say you want to use some sort of optimistic concurrency control for inserting new objects.
At a certain point, you know the latest object inserted into that table, and you want to create a new object only if no other objects have been inserted since then.
If it's an update, you could filter based on a revision number:
updated = Account.objects.filter(
    id=self.id,
    version=self.version,
).update(
    balance=balance + amount,
    version=self.version + 1,
)
However, I can't find any documented way to provide conditions for a create() or save() call.
I'm looking for something that will apply these conditions at the SQL query level, so as to avoid "read-modify-write" problems.
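One way to picture a conditional create at the SQL level is a single INSERT ... SELECT ... WHERE NOT EXISTS statement. The sketch below uses the standard-library sqlite3 module rather than the Django ORM, and the entry table, its columns, and the insert_if_unchanged helper are hypothetical names chosen for illustration. Whether a single statement like this fully closes the race depends on the database and its isolation level, so treat it as a sketch of the idea, not a guaranteed solution.

```python
import sqlite3

# Hypothetical schema: an append-only table with an auto-assigned id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO entry (payload) VALUES ('first')")  # gets id 1


def insert_if_unchanged(conn, last_seen_id, payload):
    """Insert a row only if no row newer than last_seen_id exists.

    The existence check and the insert are one SQL statement, so the
    application never reads first and writes afterwards.
    """
    cur = conn.execute(
        "INSERT INTO entry (payload) "
        "SELECT ? WHERE NOT EXISTS (SELECT 1 FROM entry WHERE id > ?)",
        (payload, last_seen_id),
    )
    return cur.rowcount == 1


# Nothing newer than id 1 exists yet, so this insert goes through.
created_first = insert_if_unchanged(conn, 1, "second")
# A row with id 2 now exists, so a caller that still believes id 1 is
# the latest gets rejected.
created_stale = insert_if_unchanged(conn, 1, "third")
row_count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```

A caller that receives False would typically re-read the latest row and decide whether to retry.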
EDIT: This is not an Optimistic Lock attempt. This is a direct answer to OP's provided code.
Django offers a way to implement conditional queries. It also offers the update_or_create(defaults=None, **kwargs) shortcut which:
The update_or_create method tries to fetch an object from the database based on the given kwargs. If a match is found, it updates the fields passed in the defaults dictionary.
The values in defaults can be callables.
So we can attempt to mix and match those two in order to recreate the supplied query:
obj, created = Account.objects.update_or_create(
    id=self.id,
    version=self.version,
    defaults={
        'balance': Case(
            When(version=self.version, then=F('balance')+amount),
            default=amount
        ),
        'version': Case(
            When(version=self.version, then=F('version')+1),
            default=self.version
        )
    }
)
Breakdown of the Query:
The update_or_create will try to retrieve an object with id=self.id and version=self.version in the database.
Found: The object's balance and version fields will get updated with the values inside the Case conditional expressions accordingly (see the next section of the answer).
Not Found: The object with id=self.id and version=self.version will be created and then it will get its balance and version fields updated.
Breakdown of the Conditional Queries:
balance Query:
If the object exists, the When expression's condition will be true, therefore the balance field will get updated with the value of:
# Existing balance # Added amount
F('balance') + amount
If the object gets created, it will receive as an initial balance the amount value.
version Query:
If the object exists, the When expression's condition will be true, therefore the version field will get updated with the value of:
# Existing version # Next Version
F('version') + 1
If the object gets created, it will receive as an initial version the self.version value (it can also be a default initial version like 1.0.0).
Notes:
You may need to provide an output_field argument to the Case expression, have a look here.
In case (pun definitely intended) of curiosity about what F() expression is and how it is used, I have a Q&A style example here: How to execute arithmetic operations between Model fields in django
Except for QuerySet.update returning the number of affected rows, Django doesn't provide any primitives to deal with optimistic locking.
However, there are a few third-party apps that provide such a feature:
django-concurrency, the most popular option, which provides both database-level and application-level constraints.
django-optimistic-lock, which is a bit less popular, but I've tried it in a past project and it worked just fine.
django-locking, which is unmaintained.
Edit: It looks like OP was not after optimistic locking solutions after all.
This question is about Python and the Django framework, and is probably one for experienced Django developers. I have googled it for some time and also looked through the Django QuerySet code itself, but found no answer. Is it possible to know whether a queryset has been filtered and, if so, to get the keys and values of the filter parameters?
I'm developing a web system with a huge set of filters, and I must predefine some background behavior for the user when certain filters have been applied.
Yes, but since to the best of my knowledge this is not documented, you probably should not use it. Furthermore it looks to me like bad design if you need to obtain this from a QuerySet.
For a QuerySet, for example qs, you can obtain the .query attribute, and then query for the .where attribute. The truthiness of that attribute checks if that node (this attribute is a WhereNode, which is a node in the syntax of the query) has children (these children are then individual WHERE conditions, or groups of such conditions), hence has done some filtering.
So for example:
qs = Model.objects.all()
bool(qs.query.where) # --> False
qs = Model.objects.filter(foo='bar')
bool(qs.query.where) # --> True
If you inspect the WhereNode, you can see the elements out of which it is composed, for example:
>>> qs.query.where
<WhereNode: (AND: <django.db.models.lookups.Exact object at 0x7f2c55615160>)>
and by looking at the children, we can even obtain details:
>>> c1 = qs.query.where.children[0]
>>> c1.lhs
Col(app_model, app.Model.foo)
>>> c1.lookup_name
'exact'
>>> c1.rhs
'bar'
But the notation is rather cryptic. Furthermore, the WhereNode is not necessarily a conjunctive one (the AND); it can also be a disjunctive one (the OR), and there is no guarantee that any filtering will actually be done (since the tests can be trivially true, like 1 > 0). We thus only check whether there will be a non-empty WHERE in the SQL query, not whether this query will restrict the queryset in any way (although you can of course inspect the WhereNode and see whether that holds).
Note that some constraints are not part of the WHERE, for example if you make a JOIN, you will perform an ON, but this is not a WHERE clause.
However, since the above is - to the best of my knowledge - not extensively documented, it is probably not a good idea to depend on it: it can easily change and thus stop working.
You can use the query attribute (i.e. queryset.query) to get the data used in the SQL query (the output isn't exactly valid SQL).
You can also use queryset.query.__dict__ to get that data in a dictionary format.
I agree with Willem Van Onsen, in that accessing the internals of the query object isn't guaranteed to work in the future. It's correct for now, but might change.
But going half-way down that path, you could use the following:
is_filtered_query = bool(' WHERE ' in str(queryset.query))
which will pretty much do the job!
I'm using Pony ORM version 0.7 with a Sqlite3 database on disk, and running into this issue: I am performing a select, then an update, then a select, then another update, and getting an error message of
pony.orm.core.UnrepeatableReadError: Value of Task.order_id for
Task[23654] was updated outside of current transaction (was: 1, now: 2)
I've reduced the problem to the minimum set of commands that causes the problem (i.e. removing anything causes the problem not to occur):
@db_session
def test_method():
    tasks = list(map(Task.to_dict, Task.select()))
    db.execute("UPDATE Task SET order_id=order_id*2")
    task_to_move = select(task for task in Task if task.order_id == 2).first()
    task_to_move.order_id = 1

test_method()
For completeness's sake, here is the definition of Task:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int)
Also, if I remove the constraint that task.order_id == 2 from my select, the problem no longer occurs, so I assume the problem has something to do with querying based on a field that has been changed since the transaction has started, but I don't know why the error message is telling me that it was changed by a different transaction (unless maybe db.execute is executing in a separate transaction because it is raw SQL?)
I've already looked at this similar question, but the problem was different (Pony ORM reports record "was updated outside of current transaction" while there is not other transaction) and at this documentation (https://docs.ponyorm.com/transactions.html) but neither solved my problem.
Any ideas what might be going on here?
Pony uses optimistic concurrency control by default. For each attribute, Pony remembers its current value (potentially modified by application code) as well as the original value that was read from the database. During an UPDATE, Pony checks that the value of the column in the database is still the same. If the value has changed, Pony assumes that some concurrent transaction did it, and throws an exception in order to avoid the "lost update" situation.
If you execute a raw SQL query, Pony does not know what exactly was modified in the database. So when Pony discovers that the stored value has changed, it mistakenly concludes that the value was changed by another transaction.
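The check described above can be pictured with plain SQL. The following is only a sketch using the standard-library sqlite3 module, not Pony's actual implementation: the save helper, the UnrepeatableRead exception class, and the simplified task table are hypothetical stand-ins. It shows how an UPDATE that includes the originally read value in its WHERE clause fails once raw SQL has changed the column in between.

```python
import sqlite3


class UnrepeatableRead(Exception):
    """Hypothetical stand-in for the error an ORM might raise."""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, order_id INTEGER)")
conn.execute("INSERT INTO task VALUES (1, 1)")

# 1. The "ORM" reads the row and remembers the value it saw.
original = conn.execute("SELECT order_id FROM task WHERE id = 1").fetchone()[0]

# 2. Raw SQL changes the column without the "ORM" knowing about it.
conn.execute("UPDATE task SET order_id = order_id * 2")


def save(conn, task_id, original_value, new_value):
    """Write new_value only if the column still holds original_value."""
    cur = conn.execute(
        "UPDATE task SET order_id = ? WHERE id = ? AND order_id = ?",
        (new_value, task_id, original_value),
    )
    if cur.rowcount == 0:
        raise UnrepeatableRead(
            f"task {task_id}: order_id is no longer {original_value}"
        )


# 3. The optimistic check fails, although no other transaction was involved.
try:
    save(conn, 1, original, 5)
    error = None
except UnrepeatableRead as exc:
    error = str(exc)

stored = conn.execute("SELECT order_id FROM task WHERE id = 1").fetchone()[0]
```

The remembered value (1) no longer matches the stored one (2), so the guarded UPDATE touches no rows and the write is refused.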
To avoid the problem, you can mark the order_id attribute as volatile. Pony will then assume that the value of the attribute can change at any time (by a trigger or a raw SQL update), and will exclude that attribute from optimistic checks:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, volatile=True)
Note that Pony will cache the value of a volatile attribute and will not re-read it from the database until the object is saved, so in some situations you can get an obsolete value in Python.
Update:
Starting from release 0.7.4, you can also specify the optimistic=False option to db_session to turn off optimistic checks for a specific transaction that uses raw SQL queries:
with db_session(optimistic=False):
    ...
or
@db_session(optimistic=False)
def some_function():
    ...
It is also now possible to specify the optimistic=False option for an attribute instead of specifying volatile=True. Pony will then skip optimistic checks for that attribute, but will still treat it as non-volatile.
I have a Query object which was initially configured to lazyload() all relations on a model:
query = session.query(Article).options(lazyload('author'))
Is it possible to revert the relationship loading back to default? E.g. the relationship was configured with lazy='joined', and I want the query to have joinedload() behavior without using joinedload() explicitly.
I was expecting defaultload() to have this behavior, but in fact it does not: it references the query default instead of the relationship default. So I'm looking for a kind of resetload() solution.
The reason for doing this is because I'm creating a JSON-based query syntax, and no relations should be loaded unless the user explicitly names them.
Currently, I'm using lazyload() on all relations that were not explicitly requested, but want to go the other way around: lazyload() all relations first, and then override it for some of them.
This would make the code more straightforward.
Just to be clear:
By default, all inter-object relationships are lazy loading.
http://docs.sqlalchemy.org/en/latest/orm/loading.html
So we are talking about a case in which a relation has been specifically marked as eager loading, then the queries are configured as lazy loading, then you want to "override the override" as it were.
Chaining calls to options will override earlier calls. I did test this a bit.
q = s.query(User) # lazy loads 'addresses'
q = s.query(User).options(contains_eager('addresses')) # eager loads
q = s.query(User).options(contains_eager('addresses'))\
.options(lazyload('addresses')) # lazy loads
q = s.query(User).options(contains_eager('addresses'))\
.options(lazyload('addresses'))\
.options(contains_eager('addresses')) # eager loads
However, it sounds like you're talking about just reverting the lazyload option, whereas the above case involves an explicit change to eager loading.
The defaultload docstring says its use case is to be chained to other loader options, so I don't think it's related.
Based on a glance through the source, I don't think this behavior is supported. When you update the loading strategy option, it updates a dictionary with the new loading strategy and I don't think there's still a reference to the old strategy, at least as far as I can tell.
You could keep a reference to the query object before .options(lazyload(...)), or just have an option to generate the query with or without the lazyload on everything.
To force everything to lazyload, ignoring what was specified on the relationship, you can use the '*' target. From the docs:
affecting all relationships not otherwise specified in the query. This
feature is available by passing the string '*' as the argument to any
of these options:
session.query(Article).options(lazyload('*'))
Then you can specify whatever load types you want per relationship or relationship chain.
# not sure how you are mapping json data to relationships
# once you know the relationships, you can build a list of them to load
my_loads = [joinedload(rel) for rel in json_rel_data]
query = session.query(Article).options(lazyload('*'), *my_loads)
# query lazy loads **everything** except the explicitly set joined loads
If you are joining on the relationships for query purposes, you can use contains_eager instead of joinedload in the options to use the already joined relationship.
my_eagers = [contains_eager(rel) for rel in json_rel_joins]
my_loads = [joinedload(rel) for rel in json_rel_loads]
query = session.query(Article
).join(*json_rel_joins
).options(lazyload('*'), *my_eagers, *my_loads)
I have probably not fully grasped the use of @hybrid_property. What I'm trying to do is make it easy to access a calculated value based on a column in another table, which requires a join.
So what I have is something like this (which works but is awkward and feels wrong):
class Item():
    :
    @hybrid_property
    def days_ago(self):
        # Can I even write a python version of this ?
        pass

    @days_ago.expression
    def days_ago(cls):
        return func.datediff(func.NOW(), func.MAX(Event.date_started))
This requires the caller to add the join on the Action table whenever I need to use the days_ago property. Is the hybrid_property even the correct approach to simplifying my queries where I need to get hold of the days_ago value?
One way or another you need to load or access Action rows either via join or via lazy load (note here it's not clear what Event vs. Action is, I'm assuming you have just Item.actions -> Action).
The non-"expression" version of days_ago is meant to operate on the Action objects that are relevant only to the current instance. Normally within a hybrid, this means just iterating through Item.actions and performing the operation in Python against the loaded Action objects. Since in this case you're looking for a simple aggregate, you could instead opt to run a query, but again it would be local to self, so it would look like object_session(self).query(func.datediff(...)).select_from(Action).with_parent(self).scalar().
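To illustrate the in-Python variant, here is a minimal sketch in which plain dataclasses stand in for the mapped Item and Action classes, so nothing here is SQLAlchemy API. It assumes Action has a date_started attribute (the question uses Event.date_started) and passes now in explicitly to keep the result deterministic; inside a real hybrid you would more likely call datetime.now().

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Action:
    # Assumed attribute name, mirroring Event.date_started in the question.
    date_started: datetime


@dataclass
class Item:
    # Stand-in for an already loaded Item.actions relationship.
    actions: List[Action] = field(default_factory=list)

    def days_ago(self, now: datetime) -> Optional[int]:
        """Whole days between now and the most recent action, if any."""
        if not self.actions:
            return None
        latest = max(action.date_started for action in self.actions)
        return (now - latest).days


item = Item(actions=[Action(datetime(2024, 1, 1)), Action(datetime(2024, 1, 10))])
result = item.days_ago(now=datetime(2024, 1, 15))
empty_result = Item().days_ago(now=datetime(2024, 1, 15))
```

The point is only that the instance-level side works on objects that are already in memory, while the expression side has to be phrased in SQL.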
The expression version of the hybrid when formed against another table typically requires that the query in which it is used already have the correct FROM clauses set up, so it would look like session.query(Item).join(Item.actions).filter(Item.days_ago == xyz). This is explained at Join-Dependent Relationship Hybrid.
Your expression here might be better produced as a column_property, if you can afford using a correlated subquery. See that at http://docs.sqlalchemy.org/en/latest/orm/mapping_columns.html#using-column-property-for-column-level-options.
I'm working on a Django project and trying to speed up the calls. I noticed that Django automatically does a second query to evaluate any foreign key relationships. For instance, if my models look like:
class Person(models.Model):
    name = models.CharField("blah")

class Address(models.Model):
    person = models.ForeignKey(Person)
Then I make:
p1 = Person(name="Bob")
p1.save()
address1 = Address(person=p1)
address1.save()
print(p1.id)  # let it be 1 because it is the first entry
then when I call:
Address.objects.filter(person_id=1)
I get:
Query #1: SELECT address.id, address.person_id FROM address
Query #2: SELECT person.id, person.name FROM person
I want to get rid of the 2nd call, query #2. I have tried using "defer" from the Django documentation, but that did not work (in fact it makes even more calls). "values" is a possibility, but in actual practice there are many more fields I want to pull. The only thing I want is for it not to evaluate the FOREIGN KEY. I would be happy to get the person_id back, or not. This would drastically reduce the runtime, especially when I run a command like Address.objects.all(), because Django evaluates every foreign key.
Having just seen your other question on the same issue, I'm going to guess that you have defined a __unicode__ method that references the ForeignKey field. If you query for some objects in the shell and output them, the __unicode__ method will be called, which requires a query to get the ForeignKey. The solution is to either rewrite that method so it doesn't need that reference, or - as I say in the other question - use select_related().
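The difference between following the foreign key once per row and fetching everything with a join can be sketched at the SQL level. This example uses the standard-library sqlite3 module rather than Django, with hypothetical person and address tables and a small run helper that counts statements; it approximates what happens when a string method touches the related object for each row, compared with the single joined query that select_related() is meant to produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person (id, name) VALUES (1, 'Bob'), (2, 'Alice');
    INSERT INTO address (id, person_id) VALUES (10, 1), (11, 2), (12, 1);
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a statement and count it (hypothetical helper)."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Lazy pattern: one query for the addresses, then one more per row to
# resolve the related person.
counter["queries"] = 0
lazy = []
for addr_id, person_id in run("SELECT id, person_id FROM address ORDER BY id"):
    (name,) = run("SELECT name FROM person WHERE id = ?", (person_id,))[0]
    lazy.append((addr_id, name))
lazy_queries = counter["queries"]

# Joined pattern: a single query returns the same data.
counter["queries"] = 0
joined = run(
    "SELECT address.id, person.name FROM address "
    "JOIN person ON person.id = address.person_id ORDER BY address.id"
)
joined_queries = counter["queries"]
```

Both approaches return the same rows, but the lazy pattern issues one extra statement per address.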
Next time, please provide full code, including some that actually demonstrates the problem you are having.