Django - How to avoid query the same obj

I apologize for the extremely long message, but I'm new with this and need your knowledge and advice about Python and Django.
Basically, I am developing a small "questions and answers" game. I have a model with all the questions with their codes, etc.
Participants will log in (there could be from 5 to 20 participants in each game), and on the screen they will have an option that says "ASK A QUESTION". All participants can click the button at the same time, so how can I be sure that each user gets a different question? Obviously I already thought of adding a true/false field marking whether the question was already used, but the idea is to avoid duplication.
I come from Java, where synchronized methods do not allow a method to be entered by more than one thread at the same time, which avoids duplicate results. Is there something similar in Python, or is there another way to avoid that duplication?
Thanks a lot!

You could do this using transaction.atomic and a field used as a flag to indicate that a question has been given. (is_asked?)
from django.db import transaction

def myview(request):
    with transaction.atomic():
        valid_questions = Question.objects.select_for_update()\
                                          .filter(is_asked=False)
        # some code here to get a valid question
        # and return the question, then saving it as being
        # is_asked = True before exiting the atomic block.
What transaction.atomic() guarantees is that the whole interaction is wrapped in a single transaction, and select_for_update() locks the selected rows (not the whole table) once the queryset is evaluated, holding those locks until the transaction ends. That way, a concurrent click that selects the same rows waits its turn, and the database is in a consistent state before the next lookup fires.
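The lock, pick, flag, commit sequence described above can be tried out without a Django project. This is only a minimal sketch: it uses the standard-library sqlite3 module as a stand-in for the real database, the question table and claim_question helper are hypothetical, and because SQLite has no SELECT ... FOR UPDATE, BEGIN IMMEDIATE takes the write lock up front instead.

```python
import sqlite3

def claim_question(conn):
    """Pick one unasked question and flag it in the same transaction."""
    # BEGIN IMMEDIATE takes the write lock before reading, so a caller on
    # another connection waits here instead of reading the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, text FROM question WHERE is_asked = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE question SET is_asked = 1 WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[1] if row is not None else None

# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/COMMIT above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT, is_asked INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO question (text) VALUES (?)", [("Q1",), ("Q2",)])

first = claim_question(conn)
second = claim_question(conn)
third = claim_question(conn)  # nothing left to hand out
```

In the Django version, the same shape would be the body of the atomic block: evaluate the locked queryset, set is_asked on the chosen row, and save before leaving the block.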

Related

Django - Best way to create snapshots of objects

I am currently working on a Django 2+ project involving a blockchain, and I want to make copies of some of my object's states into that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override each "delete", "create", "update" of each of the models involved in that snapshot, and pray that all of them are implemented right and that I didn't forget any. But I think that this is not scalable at all, and hard to debug and maintain.
I have thought of a solution that might be more centralized: using a periodical job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However with that solution, I can identify changes when objects are modified or created, but not when they are deleted.
So, this is my big question mark: how can you identify deletions in relationships with Django, without any prior context, just by looking at the current state of the database? Is there a Django module to record deleted objects? What are your thoughts on my issue?

All right?
As I understand your problem, you need something like Django signals, which notify your code when a model instance is saved or deleted through the ORM (for example post_save and post_delete) and, if all the desired conditions are met, let you run certain commands in your application (including further operations on the database).
This is the most recent documentation:
https://docs.djangoproject.com/en/3.1/topics/signals/

How to avoid integrity errors occurring because of concurrent creations/updates?

So let's say there's a model A which looks like this:
class A(model):
    name = char(unique=True)
When a user tries to create a new A, a view will check whether the name is already taken. Like that:
name_taken = A.objects.get(name=name_passed_by_user)
if name_taken:
    return "Name exists!"
# Creating A here
It used to work well, but as the system grew, concurrent attempts at creating A's with the same name started to appear. Sometimes multiple requests pass the "name exists" check within the same few milliseconds and then all try to create the same name, resulting in integrity errors, since the name field has to be UNIQUE.
The current solution is a lot of "try: except IntegrityError:" wraps around the creation parts, despite the prior check. Is there a way to avoid that? There are a lot of models with UNIQUE constraints like that, and thus a lot of ugly "try: except IntegrityError:" wraps. Is it possible to take a lock that does not block plain SELECTs but does block SELECT ... FOR UPDATE? Or maybe there's a more proper solution? I'm certain it's a common problem with usernames and other fields/columns like them, and there must be a proper approach rather than exception catching.
The DB is Postgres 10 and the ORM is SQLAlchemy (Python), but tweaks made directly to the DB are applicable too.

The only thing you can do is set the appropriate transaction isolation level in Postgres itself; neither Python nor the ORM can do anything about it. The SERIALIZABLE level will most likely solve your problem, but it might slow down performance, so you should try REPEATABLE READ too.

If you are using Python, you should have heard of the “ask forgiveness, not permission” design principle.
To avoid the race condition you describe, simply try to add the new row to the table.
If you get a unique_violation (SQLSTATE 23505), rollback the transaction and return that the name exists.
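A minimal sketch of that insert-first approach follows. It assumes the standard-library sqlite3 module as a stand-in for Postgres, so the duplicate surfaces as sqlite3.IntegrityError rather than SQLSTATE 23505; the table a and the helper create_name are hypothetical. Keeping the try/except in one helper also means the wrapper is written once instead of around every creation site.

```python
import sqlite3

def create_name(conn, name):
    """Attempt the INSERT directly and let the UNIQUE constraint arbitrate."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if the block raises.
        with conn:
            conn.execute("INSERT INTO a (name) VALUES (?)", (name,))
        return "created"
    except sqlite3.IntegrityError:
        # Another request already inserted this name.
        return "Name exists!"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT UNIQUE)")

results = [create_name(conn, "alice"), create_name(conn, "alice")]
```

With SQLAlchemy the equivalent exception to catch would be sqlalchemy.exc.IntegrityError, followed by a session rollback.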

Ndb strong consistency and frequent writes

I'm trying to achieve strong consistency with ndb using python.
And looks like I'm missing something as my reads behave like they're not strongly consistent.
The query is:
links = Link.query(ancestor=lead_key).filter(
    Link.last_status == None).fetch(keys_only=True)
if links:
    do_action()
The key structure is:
Lead root (generic key) -> Lead -> Website (one per lead) -> Link
I have many tasks that are executed concurrently using TaskQueue and this query is performed at the end of every task. Sometimes I'm getting "too much contention" exception when updating the last_status field but I deal with it using retries. Can it break strong consistency?
The expected behavior is having do_action() called when there are no links left with last_status equal to None. The actual behavior is inconsistent: sometimes do_action() is called twice and sometimes not called at all.

Using an ancestor key to get strong consistency has a limitation: you're limited to roughly one update per second per entity group. One way to work around this is to shard the entity groups. The "Sharding Counters" article describes the technique. It's an old article, but as far as I know, the advice is still sound.
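To illustrate the sharding idea only, here is a minimal in-memory sketch. It does not use the ndb API: each dict entry is an assumed stand-in for an entity living in its own entity group, and the class name ShardedCounter is hypothetical.

```python
import random

class ShardedCounter:
    """Spread increments over several shards so no single one is a hotspot."""

    def __init__(self, num_shards=8):
        # Each entry stands in for one entity in its own entity group.
        self.shards = {i: 0 for i in range(num_shards)}

    def increment(self):
        # A write touches exactly one randomly chosen shard.
        shard = random.randrange(len(self.shards))
        self.shards[shard] += 1

    def total(self):
        # A read has to visit every shard and sum the partial counts.
        return sum(self.shards.values())

counter = ShardedCounter(num_shards=4)
for _ in range(100):
    counter.increment()
```

The trade-off is that writes get cheaper while reads get more expensive, since the total is no longer stored in one place.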

Adding to Dave's answer which is the 1st thing to check.
One thing which isn't well documented and can be a bit surprising is that the contention can be caused by read operations as well, not only by the write ones.
Whenever a transaction starts the entity groups being accessed (by read or write ops, doesn't matter) are marked as such. The too much contention error indicates that too many parallel transactions simultaneously try to access the same entity group. It can happen even if none of the transactions actually attempts to write!
Note: this contention is NOT emulated by the development server, it can only be seen when deployed on GAE, with the real datastore!
What can add to the confusion is the automatic re-tries of the transactions, which can happen after both actual write conflicts or just plain access contention. These retries may appear to the end-user as suspicious repeated execution of some code paths - which I suspect could explain your reports of do_action() being called twice.
Usually when you run into such problems you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience it doesn't :)

There is nothing in your sample that ensures that your code is only called once.
For the moment, I am going to assume that your "do_action" function does something to the Link entities, specifically that it sets the "last_status" property.
If you do not perform the query and the write to the Link Entity inside a transaction, then it is possible for two different requests (task queue tasks) to get results back from the query, then both write their new value to the Link entity (with the last write overwriting the previous value).
Remember that even if you do use a transaction, you don't know until the transaction is successfully completed that nobody else tried to perform a write. This is important if you are trying to do something external to datastore (for example, making a http request to an external system), as you may see http requests from transactions that would eventually fail with a concurrent modification exception.

SQLAlchemy(Postgresql) - Race Conditions

We are writing an inventory system and I have some questions about SQLAlchemy (PostgreSQL) and transactions/sessions. This is a web app using TG2; not sure this matters, but too much info is never a bad thing.
How can I make sure that when changing inventory quantities I don't run into race conditions? If I understand it correctly, if user one is going to decrement the inventory of an item to, say, 0 and user two is also trying to decrement the inventory to 0, then if user one's session hasn't been committed yet, user two's starting inventory number is going to be the same as user one's. That results in a race condition when both commit, with one overwriting the other instead of the two having a compound effect.
If I wanted to use a PostgreSQL sequence for things like order/invoice numbers, how can I get/set next values from SQLAlchemy without running into race conditions?
EDIT: I think I found the solution: I need to use with_lockmode, using for update or for share. I am going to leave this open for more answers or for others to correct me if I am mistaken.
TIA

If two transactions try to set the same value at the same time, one of them will fail. The one that loses will need error handling. For your particular example you will want to query for the number of parts and update the number of parts in the same transaction.
There is no race condition on sequence numbers. Save a record that uses a sequence number and the DB will automatically assign it.
Edit:
Note as Limscoder points out you need to set the isolation level to Repeatable Read.

Set up the scenario you are talking about and see how your configuration handles it. Just open up two separate connections to test it.
Also read up on FOR UPDATE and on transaction isolation levels.
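Besides row locks and isolation levels, another way to avoid the lost update described in the question is to let the database do the arithmetic in a single UPDATE, so no stale quantity is ever read into the application. This is a different technique from the one the answers suggest, shown here as a minimal sketch with the standard-library sqlite3 module standing in for PostgreSQL; the inventory table and decrement_stock helper are hypothetical.

```python
import sqlite3

def decrement_stock(conn, item_id, amount):
    """Subtract `amount` in one UPDATE; return True if enough stock existed."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            # The WHERE clause refuses the update when stock would go negative.
            "UPDATE inventory SET qty = qty - ? WHERE id = ? AND qty >= ?",
            (amount, item_id, amount),
        )
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory (id, qty) VALUES (1, 5)")
conn.commit()

ok_first = decrement_stock(conn, 1, 3)   # 5 -> 2
ok_second = decrement_stock(conn, 1, 3)  # refused, only 2 left
remaining = conn.execute("SELECT qty FROM inventory WHERE id = 1").fetchone()[0]
```

Because the decrement is relative rather than absolute, two concurrent decrements compound instead of one overwriting the other.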

Django Models: Keep track of activity through related models?

I have something of a master table of Persons. Everything in my Django app somehow relates to one or more People, either directly or through long FK chains. Also, all my models have the standard bookkeeping fields 'created_at' and 'updated_at'. I want to add a field on my Person table called 'last_active_at', mostly for raw SQL ordering purposes.
Creating or editing certain related models produces new timestamps for those objects. I need to somehow update Person.'last_active_at' with those values. Functionally, this isn't too hard to accomplish, but I'm concerned about undue stress on the app.
My two greatest causes of concern are that I'm restricted to a real db field (I can't assign a function to the Person table as a @property) and that one of these 'activity' models receives and processes new instances from a foreign data source I have no control over, sporadically receiving a lot of data at once.
My first thought was to add a post_save hook to the 'activity' models. Still seems like my best option, but I know nothing about them, how hard they hit the db, etc.
My second thought was to write some sort of script that goes through the day's activity and updates those models overnight. My employers want a 'live'er stream, though.
My third thought was to modify the post_save algo to check if the 'updated_at' is less than half an hour from the Person's 'last_active_at', and not update the person if true.
Are my thoughts tending in a scalable direction? Are there other approaches I should pursue?

It is said that premature optimization is the mother of all problems. You should start with the dumbest implementation (update it every time), and then measure and - if needed - replace it with something more efficient.
First of all, let's put a method to update the last_active_at field on Person. That way, all the updating logic itself is concentrated here, and we can easily modify it later.
Signals are quite easy to use: you just declare a function and register it as a receiver, and it will be run each time the signal is emitted. See the documentation for the full explanation, but here is what it might look like:
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=RelatedModel)
def my_handler(sender, instance, **kwargs):
    # sender is the model class; instance is the object being saved
    person = instance.person  # assumes RelatedModel has a `person` FK (hypothetical)
    person.update_activity()
As for the updating itself, start with the dumbest way to do it.
def update_activity(self):
    self.last_active_at = now()  # django.utils.timezone.now
    self.save(update_fields=["last_active_at"])
Then measure and decide if it's a problem or not. If it's a problem, some of the things you can do are :
Check if the previous update is recent before updating again. This might be useless if a read from your database is not faster than a write, but that is not a problem if you use a cache.
Write it down somewhere for a deferred process to update later. It doesn't need to be daily: if the problem is that you have 100 updates per second, you can just have a script update the database every 10 seconds, or every minute. You can probably find a good trade-off between performance and freshness using this technique.
These are just some thoughts based on what you proposed, but the right choice depends on the kind of figures you have. Determine what kind of load you'll have and what kind of reaction time is needed for that field, then experiment.
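The "check if the previous update is recent" option can be sketched as a small pure function that the signal handler would consult before writing. This is only an illustration: the needs_update name is hypothetical, and the 30-minute threshold is an assumption taken from the question's "half an hour" idea.

```python
from datetime import datetime, timedelta, timezone

MIN_INTERVAL = timedelta(minutes=30)  # assumed threshold, tune after measuring

def needs_update(last_active_at, event_time, min_interval=MIN_INTERVAL):
    """Return True when the stored timestamp is missing or old enough to refresh."""
    if last_active_at is None:
        return True
    return event_time - last_active_at >= min_interval

stored = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
soon_after = stored + timedelta(minutes=5)
much_later = stored + timedelta(hours=2)

skip = needs_update(stored, soon_after)     # recent enough, no write needed
refresh = needs_update(stored, much_later)  # stale, write the new timestamp
```

Keeping the decision in one function makes it easy to change the threshold, or to swap the stored timestamp for a cached one, once real load figures are available.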
