Locking some special rows of a database table in Django - Python

I have a model in Django like follows:
class A(models.Model):
    STATUS_DEFAULT = "default"
    STATUS_ACCEPTED = "accepted"
    STATUS_REJECTED = "rejected"
    STATUS_CHOICES = (
        (STATUS_DEFAULT, 'Just Asked'),
        (STATUS_ACCEPTED, 'Accepted'),
        (STATUS_REJECTED, 'Rejected'),
    )
    status = models.CharField(choices=STATUS_CHOICES, max_length=20, default=STATUS_DEFAULT)
    question = models.ForeignKey(Question)
Note that Question is another model in my project. I have a constraint on the A model: among rows with the same question, only one can have status=STATUS_ACCEPTED, and initially all of them have status=STATUS_DEFAULT. I want to write a function that does the following:
def accept(self):
    self.status = self.STATUS_ACCEPTED
    self.save()
    # reject every other answer to the same question
    A.objects.filter(question=self.question).exclude(pk=self.pk).update(status=self.STATUS_REJECTED)
But if two instances of A with the same question call this function, a race condition may occur. So whichever instance calls this function first should lock the other instances with the same question to prevent the race condition.
How should I do this?

Assuming you are using a DB backend that supports row locks, you can lock the question using select_for_update.
Your code could then look like:
from django.db import transaction

@transaction.atomic
def accept(self):
    # Lock the related question so no other instance will run the following code at the same time.
    # Querysets are lazy, so the query must be evaluated (here with get()) to actually take the lock.
    Question.objects.select_for_update().get(pk=self.question_id)
    # now we have the lock, reload to make sure we have not been updated meanwhile
    self.refresh_from_db()
    if self.status != self.STATUS_REJECTED:
        A.objects.filter(question=self.question).exclude(pk=self.pk).update(status=self.STATUS_REJECTED)
        self.status = self.STATUS_ACCEPTED
        self.save()
    else:
        raise Exception('An answer has already been accepted!')
With that code, only one instance at a time will be able to run the code after select_for_update (for a given question).
Note the refresh_from_db call as while waiting to acquire the lock, another instance may have accepted another answer...
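The lock-then-recheck idea can be tried out without a database. The following is only a plain-Python analogy, not Django code: a threading.Lock stands in for the row lock taken by select_for_update, and the Question and Answer classes here are hypothetical stand-ins for the models above.

```python
import threading


class Question:
    """Stand-in for the Question row; its lock plays the role of the row lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.answers = []


class Answer:
    """Stand-in for model A."""

    def __init__(self, question):
        self.question = question
        self.status = "default"
        question.answers.append(self)

    def accept(self):
        # Take the per-question lock first (analogous to select_for_update).
        with self.question.lock:
            # Re-check our own state only after the lock is held
            # (analogous to refresh_from_db).
            if self.status == "rejected":
                return False
            for other in self.question.answers:
                if other is not self:
                    other.status = "rejected"
            self.status = "accepted"
            return True


def demo(n_answers=8):
    """Let every answer try to accept itself concurrently."""
    question = Question()
    answers = [Answer(question) for _ in range(n_answers)]
    results = []

    def worker(answer):
        results.append(answer.accept())

    threads = [threading.Thread(target=worker, args=(a,)) for a in answers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return answers, results
```

Whichever thread gets the lock first wins; every later caller sees that it has already been rejected and backs off.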

As I understand it, you want to make sure that two instances of A which share a Question cannot both simultaneously have the 'accepted' status. A objects are initiated at the default status.
Perhaps you should rethink your approach:
Let the question itself tell you which A has the accepted status.
Solution:
Add the following to your Question model:
# 'A' is given as a string because A is defined after Question;
# related_name='+' disables the reverse accessor, which would otherwise clash with A.question
accepted_a = models.OneToOneField('A', null=True, default=None, related_name='+')
Since you seem to want the accept method to be part of the A class, you can write accept the way you laid it out in your question. I disagree, though: the Question accepts the A, so I think the method should be defined on the Question class.
def accept(self, a):
    self.accepted_a = a
Now, in your views, when you want the A to get accepted, you would write:
q = Question.objects.get(pk=question_id)
a = A.objects.get(pk=a_id)
q.accept(a)
q.save()
How this works:
Django (and databases in general) provides a mechanism by which a relationship can specify One-to-One relationships. By using that in the Question model, we specify that each question can have exactly one accepted A. This does not override or alter the behaviour of the Many-to-One relationship the Question has with A.
Our accept is a bit naive though: it doesn't check that the A actually belongs to this question. We can change that (or any other logic you wish):
Edit: With the information provided in comments, we need to ensure that the first Ask (A) to accept the question locks it out. To that end, we will check whether the question already has an acceptor Ask. Since accepted_a defaults to null, we can simply test whether it is None.
def accept(self, a):
    if a.question == self and self.accepted_a is None:
        self.accepted_a = a
        return True
    else:
        return False

Related

Django Model inheritance for efficient code

I have a Django app that uses an Abstract Base Class ('Answer') and creates different Answers depending on the answer_type required by the Question objects. (This project started life as the Polls tutorial). Question is now:
class Question(models.Model):
    ANSWER_TYPE_CHOICES = (
        ('CH', 'Choice'),
        ('SA', 'Short Answer'),
        ('LA', 'Long Answer'),
        ('E3', 'Expert Judgement of Probabilities'),
        ('E4', 'Expert Judgment of Values'),
        ('BS', 'Brainstorms'),
        ('FB', 'Feedback'),
    )
    answer_type = models.CharField(max_length=2,
                                   choices=ANSWER_TYPE_CHOICES,
                                   default='SA')
    question_text = models.CharField(max_length=200, default="enter a question here")
And Answer is:
class Answer(models.Model):
    """
    Answer is an abstract base class which ensures that question and user are
    always defined for every answer
    """
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE, default=1)

    class Meta:
        abstract = True
        ordering = ['user']
At the moment, I have a single method in Answer (overwriting get_or_update_answer()) with type-specific instructions to look in the right table and collect or create the right type of object.
@classmethod
def get_or_update_answer(self, user, question, submitted_value={}, pk_ans=None):
    """
    this replaces get_or_update_answer with appropriate handling for all
    different Answer types. This allows the views answer and page_view to get
    or create answer objects for every question type calling this function.
    """
    if question.answer_type == 'CH':
        if not submitted_value:
            # by default, select the top of a set of radio buttons
            selected_choice = question.choice_set.first()
            answer, _created = Vote.objects.get_or_create(
                user=user,
                question=question,
                defaults={'choice': selected_choice})
        else:
            selected_choice = question.choice_set.get(pk=submitted_value)
            answer = Vote.objects.get(user=user, question=question)
            answer.choice = selected_choice
    elif question.answer_type == 'SA':
        if not submitted_value:
            submitted_value = ""
            answer, _created = Short_Answer.objects.get_or_create(
                user=user,
                question=question,
                defaults={'short_answer': submitted_value})
        else:
            answer = Short_Answer.objects.get(
                user=user,
                question=question)
            answer.short_answer = hashtag_cleaner(submitted_value['short_answer'])
    # etc... etc... (similar handling for five more types)
By putting all this logic in 'models.py', I can load user answers for a page_view for any number of questions with:
for question in page_question_list:
    answers[question] = Answer.get_or_update_answer(user, question, submitted_value, pk_ans)
I believe there is a more Pythonic way to design this code - something that I haven't learned to use, but I'm not sure what. Something like interfaces, so that each object type can implement its own version of Answer.get_or_update_answer(), and Python will use the version appropriate for the object. This would make extending 'models.py' a lot neater.
I've revisited this problem recently, replaced one or two hundred lines of code with five or ten, and thought it might one day be useful to someone to find what I did here.
There were several elements to the problem I had: first, many answer types to be created, saved and retrieved when required; second, the GET vs POST dichotomy (and my idiosyncratic solution of always creating an answer and sending it to a form); third, some of the types have different logic (the Brainstorm can have multiple answers per user; the FeedBack does not even need a response: if it is created for a user, it has been presented). These elements probably obscured some opportunities to remove repetition, which makes the visitor pattern quite appropriate.
Solution for elements 1 & 2
A dictionary that maps question.answer_type codes to the relevant Answer subclass is created in views.py (because it's hard to place it in models.py and resolve dependencies):
# views.py:
ANSWER_CLASS_DICT = {
    'CH': Vote,
    'SA': Short_Answer,
    'LA': Long_Answer,
    'E3': EJ_three_field,
    'E4': EJ_four_field,
    'BS': Brainstorm,
    'FB': FB,
}
Then I can get the class of Answer that I want 'get_or_created' for any question with:
ANSWER_CLASS_DICT[question.answer_type]
I pass it as a parameter to the class method:
# models.py:
def get_or_update_answer(self, user, question, Cls, submitted_value=None, pk_ans=None):
    if not submitted_value:
        answer, _created = Cls.objects.get_or_create(user=user, question=question)
    elif isinstance(submitted_value, dict):
        answer, _created = Cls.objects.get_or_create(user=user, question=question)
        for key, value in submitted_value.items():
            setattr(answer, key, value)
    else:
        answer = None  # unsupported submitted_value type
    return answer
So the same few lines of code handle get_or_creating any Answer, whether submitted_value is None (GET) or not (POST).
Solution for element 3
The solution for element three has been to extend the model to separate at least three types of handling for users revisiting the same question:
'S' - single, which allows them to record only one answer, revisit and amend the answer, but never to give two different answers.
'T' - tracked, which allows them to update their answer every time, but makes the history of what their answer was available (e.g. to researchers.)
'M' - multiple, which allows many answers to be submitted to a question.
Still bug-fixing after all these changes, so I won't post code.
Next feature: compound questions and question templates, so people can use the admin screen to make their own answer types.
Based on what you've shown, you're most of the way to reimplementing the Visitor pattern, which is a pretty standard way of handling this sort of situation (you have a bunch of related subclasses, each needing its own handling logic, and want to iterate over instances of them and do something with each).
I'd suggest taking a look at how that pattern works, and perhaps implementing it more explicitly.
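For readers who have not met the pattern, here is a minimal plain-Python sketch of an explicit Visitor. It is not Django code, and every name in it (Vote, ShortAnswer, UpdateVisitor, update_all, the "first" default) is a hypothetical stand-in chosen for illustration; the real Answer subclasses would be models.

```python
class Vote:
    def __init__(self):
        self.choice = None

    def accept(self, visitor, submitted):
        # Each class only knows which visitor method handles it.
        return visitor.visit_vote(self, submitted)


class ShortAnswer:
    def __init__(self):
        self.short_answer = ""

    def accept(self, visitor, submitted):
        return visitor.visit_short_answer(self, submitted)


class UpdateVisitor:
    """Holds the type-specific update logic in one place."""

    def visit_vote(self, answer, submitted):
        answer.choice = submitted.get("choice", "first")
        return answer

    def visit_short_answer(self, answer, submitted):
        answer.short_answer = submitted.get("short_answer", "").strip()
        return answer


def update_all(answers, submitted):
    """Apply the update to a mixed list of answers without any if/elif chain."""
    visitor = UpdateVisitor()
    return [answer.accept(visitor, submitted) for answer in answers]
```

Adding a new answer type then means adding one class with an accept method and one visit_ method, rather than another branch in a long conditional.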

Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criteria. That works well so far, and has saved me time already when the DB model changes.
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone suggests something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to call a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username, how would you guess that? User table could be linked to table2 too, but you don't need it in this case, so I think you can't avoid setting the path manually.
Also, you can unpack a dictionary into Q(), and a list or a dictionary into filter(), which is much easier to work with than writing out keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
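The prefixing step can be factored into a small helper so that each criteria function does not have to repeat it. The sketch below is plain Python working on lookup dictionaries only; with_context is a hypothetical helper name, and it handles flat keyword lookups, not already-built Q() trees.

```python
def with_context(lookups, context=""):
    """Return a copy of `lookups` with every key prefixed by `context__`."""
    if not context:
        return dict(lookups)
    return {"%s__%s" % (context, key): value for key, value in lookups.items()}


def username_starts_with_a():
    # Lookups expressed relative to the User model.
    return {"username__startswith": "a"}


# Re-express the same criteria relative to UserProfile, via its `account` relation.
profile_lookups = with_context(username_starts_with_a(), "account")
# In a Django project this dict could then be unpacked: Q(**profile_lookups)
```

The relation path ("account" here) still has to be supplied by the caller, as the answer above points out.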

How to model a 'Like' mechanism via ndb?

We are about to introduce a social aspect into our app, where users can like each other's events.
Getting this wrong would mean a lot of headache later on, hence I would love to get input from some experienced developers on GAE, how they would suggest to model it.
It seems there is a similar question here however the OP didn't provide any code to begin with.
Here are two models:
class Event(ndb.Model):
    user = ndb.KeyProperty(kind=User, required=True)
    time_of_day = ndb.DateTimeProperty(required=True)
    notes = ndb.TextProperty()
    timestamp = ndb.FloatProperty(required=True)

class User(UserMixin, ndb.Model):
    firstname = ndb.StringProperty()
    lastname = ndb.StringProperty()
We need to know who has liked an event, in case that the user may want to unlike it again. Hence we need to keep a reference. But how?
One way would be introducing a RepeatedProperty to the Event class.
class Event(ndb.Model):
    ....
    ndb.KeyProperty(kind=User, repeated=True)
That way any user that would like this Event, would be stored in here. The number of users in this list would determine the number of likes for this event.
Theoretically that should work. However this post from the creator of Python worries me:
Do not use repeated properties if you have more than 100-1000 values.
(1000 is probably already pushing it.) They weren't designed for such
use.
And back to square one. How am I supposed to design this?
A repeated property is limited in the number of values it can hold (< 1000).
One recommended way around the limit is sharding:
class Event(ndb.Model):
    # use an integer to store the total likes.
    likes = ndb.IntegerProperty()

class EventLikeShard(ndb.Model):
    # each shard stores at most 500 users.
    event = ndb.KeyProperty(kind=Event)
    users = ndb.KeyProperty(kind=User, repeated=True)
If you need more than 1000 but fewer than 100k values, a simpler way is:
class Event(ndb.Model):
    likers = ndb.PickleProperty(compressed=True)
Use another model "Like" where you keep the reference to user and event.
Old way of representing many to many in a relational manner. This way you keep all entities separated and can easily add/remove/count.
I would recommend the usual many-to-many relationship using an EventUser model, given that the design seems to require an unlimited number of users liking an event. The only tricky part is that you must ensure that each event/user combination is unique, which can be done using _pre_put_hook. Keeping a likes counter as proposed by @lucemia is indeed a good idea.
You would then capture the liked action using a boolean, or you can make it a bit more flexible by including an actions string array. This way, you could also capture actions such as signed-up or attended.
Here is some sample code:
class EventUser(ndb.Model):
    event = ndb.KeyProperty(kind=Event, required=True)
    user = ndb.KeyProperty(kind=User, required=True)
    actions = ndb.StringProperty(repeated=True)

    # make sure event/user is unique
    def _pre_put_hook(self):
        cur_key = self.key  # None for an entity that has never been stored
        for entry in self.query(EventUser.user == self.user, EventUser.event == self.event):
            # If cur_key has an id, the user is performing an update
            if cur_key is not None and cur_key.id():
                if cur_key == entry.key:
                    continue
                else:
                    raise ValueError("User '%s' is a duplicated entry." % (self.user))
            # If adding
            raise ValueError("User Add '%s' is a duplicated entry." % (self.user))

Avoiding django QuerySet caching in a @staticmethod [duplicate]

This question already has answers here:
How do I force Django to ignore any caches and reload data?
(6 answers)
Closed 2 years ago.
The following few lines of code illustrate a distributed worker model that I use to crunch data. Jobs are being created in a database, their data goes onto the big drives, and once all information is available, the job status is set to 'WAITING'. From here, multiple active workers come into play: from time to time each of them issues a query, in which it attempts to "claim" a job. In order to synchronize the claims, the queries are encapsulated into a transaction that immediately changes the job state if the query returns a candidate. So far so good.
The problem is that the call to claim only works the first time. Reading up on QuerySets and their caching behavior, it seems to me that combining static methods and QuerySet caching always falls back on the cache... see for yourselves:
I have a class derived from django.db.models.Model:
class Job(models.Model):
[...]
in which I define the following static function.
@staticmethod
@transaction.commit_on_success
def claim():
    # select the oldest, top priority job and
    # update its record
    jobs = Job.objects.filter(state__exact = 'WAITING').order_by('-priority', 'create_timestamp')
    if jobs.count() > 0:
        j = jobs[0]
        j.state = 'CLAIMED'
        j.save()
        logger.info('Job::claim: claimed %s' % j.name)
        return j
    return None
Is there any obvious thing that I am doing wrong? What would be a better way of dealing with this? How can I make sure that the QuerySet does not cache its results across different invocations of the static method? Or am I missing something and chasing a phantom? Any help would be greatly appreciated... Thanks!
Why not just have a plain module-level function claim_jobs() that would run the query?
def claim_jobs():
    jobs = Job.objects.filter(...)
    ... etc.
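Wherever the function lives, the claim itself is safest when the state change is conditional on the job still being 'WAITING', so that two workers cannot both claim the same row. The sketch below illustrates that idea with the standard-library sqlite3 module rather than the Django ORM; the table layout and the make_db and claim helpers are assumptions made for the example. In Django, the analogous step would be a filtered QuerySet.update() call, which returns the number of rows it matched.

```python
import sqlite3


def make_db():
    """Create an in-memory job table with three waiting jobs (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT, state TEXT, "
        "priority INTEGER, created INTEGER)"
    )
    conn.executemany(
        "INSERT INTO job (name, state, priority, created) VALUES (?, 'WAITING', ?, ?)",
        [("low", 1, 1), ("high-old", 5, 2), ("high-new", 5, 3)],
    )
    conn.commit()
    return conn


def claim(conn):
    """Claim the oldest, highest-priority waiting job; return its name or None."""
    while True:
        row = conn.execute(
            "SELECT id, name FROM job WHERE state = 'WAITING' "
            "ORDER BY priority DESC, created ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Only flip the state if the job is still WAITING at update time.
        cur = conn.execute(
            "UPDATE job SET state = 'CLAIMED' WHERE id = ? AND state = 'WAITING'",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[1]
        # Another worker claimed it between our SELECT and UPDATE; try the next one.
```

Each call issues a fresh SELECT, so nothing is reused between invocations, and the rowcount check tells the worker whether it actually won the job.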

Django - AutoField with regards to a foreign key

I have a model with a unique integer that needs to increment with regards to a foreign key, and the following code is how I currently handle it:
class MyModel(models.Model):
    business = models.ForeignKey(Business)
    number = models.PositiveIntegerField()
    spam = models.CharField(max_length=255)

    class Meta:
        unique_together = (('number', 'business'),)

    def save(self, *args, **kwargs):
        if self.pk is None:  # New instances only
            try:
                highest_number = MyModel.objects.filter(business=self.business).order_by('-number').all()[0].number
                self.number = highest_number + 1
            except IndexError:  # First MyModel instance ([0] on an empty queryset raises IndexError)
                self.number = 1
        super(MyModel, self).save(*args, **kwargs)
I have the following questions regarding this:
Multiple people can create MyModel instances for the same business, all over the internet. Is it possible for two people to create MyModel instances at the same time, both read the same highest number (say 500), and then both try to set self.number = 501 (raising an IntegrityError)? The answer seems like an obvious "yes, it could happen", but I had to ask.
Is there a shortcut, or "Best way" to do this, which I can use (or perhaps a SuperAutoField that handles this)?
I can't just slap in a while model_not_saved: try: ... except IntegrityError: loop, because other constraints in the model could lead to an endless loop, and a disaster worse than Chernobyl (maybe not quite that bad).
You want that constraint at the database level. Otherwise you're going to eventually run into the concurrency problem you discussed. The solution is to wrap the entire operation (read, increment, write) in a transaction.
Why can't you use an AutoField instead of a PositiveIntegerField?
number = models.AutoField()
However, in this case number is almost certainly going to equal yourmodel.id, so why not just use that?
Edit:
Oh, I see what you want. You want a numberfield that doesn't increment unless there's more than one instance of MyModel.business.
I would still recommend just using the id field if you can, since it's certain to be unique. If you absolutely don't want to do that (maybe you're showing this number to users), then you will need to wrap your save method in a transaction.
You can read more about transactions in the docs:
http://docs.djangoproject.com/en/dev/topics/db/transactions/
If you're just using this to count how many instances of MyModel have a FK to Business, you should do that as a query rather than trying to store a count.
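If the per-business number does have to be stored, one way to keep the read and the write together is to compute the next number inside the INSERT itself, within a transaction, and leave the unique constraint in place as a backstop. The sketch below uses the standard-library sqlite3 module, not the Django ORM; the table layout and the make_db and create helpers are assumptions for illustration, and how this behaves under concurrent writers depends on the database and its isolation level.

```python
import sqlite3


def make_db():
    """Create an in-memory table mirroring MyModel (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, business INTEGER NOT NULL, "
        "number INTEGER NOT NULL, spam TEXT, UNIQUE (business, number))"
    )
    return conn


def create(conn, business, spam):
    """Insert a row whose number is one more than the business's current maximum."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO mymodel (business, number, spam) "
            "SELECT ?, COALESCE(MAX(number), 0) + 1, ? "
            "FROM mymodel WHERE business = ?",
            (business, spam, business),
        )
        row = conn.execute(
            "SELECT number FROM mymodel WHERE id = last_insert_rowid()"
        ).fetchone()
    return row[0]
```

Numbers restart at 1 for each business, and a duplicate (business, number) pair would still be rejected by the UNIQUE constraint rather than silently stored.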
