Django - how to deal with concurrency? - python

I'm currently developing a game, in which user can get experience points. My (custom) user model looks something like this:
class TriariadUser(AbstractBaseUser, PermissionsMixin):
    pseudonym = models.CharField(max_length=40)
    level = models.PositiveSmallIntegerField(default=1)
    experience_points = models.PositiveIntegerField(default=0)

    def add_ep_points(self, points):
        self.experience_points += points
        if self.experience_points >= get_next_level_ep(self.level):
            self.level += 1
        self.save()
Now I have various signal listeners that can add experience points to a user. The problem is: If multiple XP gains occur during one request, the last XP gain overwrites all others.
Clearly this is a race condition, so I tried to modify my function to be static and use select_for_update:
@staticmethod
def add_ep_points(user_id, points):
    user = TriariadUser.objects.select_for_update().get(pk=user_id)
    user.experience_points += points
    ...
This works as intended; however, the user object in the template is not updated, so the user has to make a new request to see what happened. Using the Django Debug Toolbar I can see that the query which loads the user runs at the beginning of the request. After that, all the relevant updates are made, but the user object is not reloaded afterwards, so the old state is displayed.
I can think of various workarounds, like triggering a reload via JavaScript, but there must be some other solution for this (at least I hope so).
Is there a way to lock an object in Django? Is there a way to mark an object as needing to be reloaded? Is there a better way to accomplish this (maybe using some kind of middleware)?

To avoid that, here's what I would do. First, create a UserExpRecord model with a relation to the user and a +/- amount for how much XP you're adding or removing. Your signal handlers can then add a new UserExpRecord to give a user XP. Have UserExpRecord emit a save signal that notifies the user model that it needs to re-aggregate (SUM) all the XP records for that user and save the total to the user.
This gives you the immediate benefit of having a record of when and how much XP was added for a user. The secondary benefit is that you avoid any sort of race condition, because you aren't trying to lock a table row and increment the value in place.
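A minimal sketch of that idea, with illustrative model and field names (TriariadUser comes from the question; adapt the names to your app):

    from django.db import models
    from django.db.models import Sum
    from django.db.models.signals import post_save
    from django.dispatch import receiver

    class UserExpRecord(models.Model):
        user = models.ForeignKey('TriariadUser', on_delete=models.CASCADE, related_name='exp_records')
        amount = models.IntegerField()  # positive or negative XP delta

    @receiver(post_save, sender=UserExpRecord)
    def recalculate_xp(sender, instance, **kwargs):
        # Re-aggregate all records; the SUM happens in the database, so
        # concurrent inserts cannot drop each other's XP.
        total = instance.user.exp_records.aggregate(total=Sum('amount'))['total'] or 0
        TriariadUser.objects.filter(pk=instance.user_id).update(experience_points=total)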
That said, depending on your backend, there may be an atomic, thread-safe “upsert” or “increment” operation that allows you to safely increment a value within a transaction while blocking all other writes. That lets the writes stack correctly. I believe the first solution (separate XP records) will come with a smaller headache (cache invalidation) than this or your current solution (unknown race conditions with missing/dropped XP updates).
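For the increment route, Django's ORM can express this with an F() expression, which pushes the arithmetic into a single SQL UPDATE. A minimal sketch, assuming the TriariadUser model from the question (refresh_from_db() requires Django 1.8+):

    from django.db.models import F

    def add_ep_points(user, points):
        # A single "UPDATE ... SET experience_points = experience_points + %s",
        # so concurrent requests can no longer overwrite each other's gains.
        TriariadUser.objects.filter(pk=user.pk).update(
            experience_points=F('experience_points') + points
        )
        # Reload the in-memory instance so later code (and the template)
        # sees the stacked value instead of the stale one.
        user.refresh_from_db()

If the template renders a different instance than the one the signal handlers touched (e.g. request.user), that instance is the one that needs the refresh.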

Related

django-ratelimit stack keys. Not the intended behaviour

I think my understanding of django-ratelimit is incorrect. I am using v3.0.0 but v2.0 produces the same results.
Let's say I have this code:
@ratelimit(key='post:username', rate='5/h', block=True)
@ratelimit(key='post:tenant', rate='5/h', block=True)
@csrf_exempt
def index(request):
    print(request.POST["username"])
    print(request.POST["tenant"])
    print("")
    return HttpResponse('hallo', content_type='text/plain', status=200)
Let's say tenant A submits username "Antwon" 6 times; tenant A will then be blocked for 1 hour, which is good. But let's say tenant B also has a user "Antwon": that user for tenant B will then not be able to log in either.
I would assume that Antwon for tenant B should still be able to log in, otherwise tenant A can DOS other tenants?
Is this intended behavior or is my implementation incorrect?
My understanding is incorrect. Here is a response from the creator of django-ratelimit:
Hi there! I think there might be some confusion about how multiple ratelimits interact, and how the cache keys get set/rotated.
Multiple limits are ORed together (that is, if you fail any of them, you fail), but order matters. In this case, the outer limit, post:username, is tested first, then the tenant is tested. If the user succeeds, it counts as an attempt, regardless of what happens later. That's why the user keeps accruing attempts. If you want tenant to count first, you could re-order the decorators. However...
If you want a single limit on the (username, tenant) pair, you could combine the two fields with custom logic, by creating a callable key that returns some combination of tenant and username (maybe {tenant}\u04{username}, or hash them together).
In terms of locking out tenants, there are a couple of things:
First, ratelimit uses a staggered fixed-window strategy, instead of a sliding window. So based on the rate, the period, the key value, and the group, there'll be some calculated window. For example, for 1 hour, 11:03:29 will be the reset time, and the next reset time for that same combination will be 12:03:29, then 13:03:29, etc. The downside is that if you had a limit of 100/h, you could do 200 requests in a short span around the reset point. The upsides are that the calculation is share-nothing and can be done independently, and that the window resets even if you accidentally try again a little too early.
Second, yes: if you're doing hard blocks on user-supplied data like username (instead of, e.g., using the authenticated user), it creates a denial-of-service vector. Where possible, another option is to use block=False and do something like requiring a captcha, rather than fully blocking.
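For the callable-key suggestion, a rough sketch might look like this (the module path and function name are made up; django-ratelimit accepts a dotted path to a callable that takes (group, request)):

    # myapp/keys.py
    def tenant_username(group, request):
        # Rate-limit the (tenant, username) pair instead of each field separately.
        return '%s:%s' % (request.POST.get('tenant', ''), request.POST.get('username', ''))

    # myapp/views.py
    from django.http import HttpResponse
    from django.views.decorators.csrf import csrf_exempt
    from ratelimit.decorators import ratelimit

    @ratelimit(key='myapp.keys.tenant_username', rate='5/h', block=True)
    @csrf_exempt
    def index(request):
        return HttpResponse('hallo', content_type='text/plain', status=200)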

Posting data to database through a "workflow" (Ex: on field changed to 20, create new record)

I'm looking to post new records on a user-triggered basis (i.e. a workflow). I've spent the last couple of days researching the best way to approach this, and so far I've come up with the following ideas:
(1) Utilize Django signals to check for conditions on a field change, and then post data originating from my Django app.
(2) Utilize JS/AJAX on the front-end to post data to the app based upon a user changing certain fields.
(3) Utilize a prebuilt workflow app like http://viewflow.io/, again based upon changes triggered by users.
Of the three above options, is there a best practice? Are there any other options I'm not considering for how to take this workflow based approach to post new records?
The second approach, monitoring the changes in the front end and then calling a backend view to update the database, would be the better approach. Processing everything on the backend puts the load on the server, which can slow down the site, whereas the second approach is more of a client-side solution and keeps the server relieved.
I do not think there will be data loss; you are just trying to monitor a change, and as soon as it changes your view will update the database. You could also use cookies or sessions to keep appending values to a list and update the database when the site closes. Also, if Django gives HTTP errors, you could put proper try/except handling in place for that case as well. Anyway, cookies would be a good approach, I think.
For anyone that finds this post: I ended up deciding to take the Signals route. Essentially I'm utilizing Signals to track when users change a field, and based on the field that changed I'm performing certain actions on the database.
For testing purposes this has been working well. When I reach production with this project I'll try to update this post with any challenges I run into.
Example:
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=subTaskChecklist)
def do_something_if_changed(sender, instance, **kwargs):
    try:
        obj = sender.objects.get(pk=instance.pk)  # the "old" object, before the change
    except sender.DoesNotExist:
        pass  # object is new; nothing to compare against
    else:
        previous_Value = obj.FieldToTrack
        new_Value = instance.FieldToTrack  # instance is the "new", changed object
        DoSomethingWithChangedField(new_Value)

Django: How to count the number of people who viewed a post

I'm making a simple BBS application in Django and I want it so that whenever someone sees a post, the number of views on that post (post_view_no) is increased.
At the moment, I face two difficulties:
I need to limit the increase in post_view_no so that one user can only increase it once regardless of how many times the user refreshes/clicks on the post.
I also need to be able to track the users that are not logged in.
Regarding the first issue, it seems pretty easy as long as I create a model called 'View' and check the DB, but I have a feeling this may be overkill.
As for the second issue, all I can think of is using cookies / IP addresses to track users, but an IP address is hardly unique, and I cannot figure out how to use cookies.
I believe this is a common feature in forum/BBS solutions, but a Google search only turned up plugins or 'dumb' solutions that increase the view count every time the post is viewed.
What would be the best way to go about this?
I think you can do both things via cookies (sessions). For example, when a user visits a page, you can:
(1) Check whether they have a “viewed_post_%s” key (where %s is the post ID) set in their session.
(2) If they do, do nothing. If they don't, increase the view_count numeric field of the corresponding Post object by one, and set the “viewed_post_%s” key (cookie) in their session so it won't be counted again in the future.
This would work with both anonymous and registered users; however, by clearing cookies or setting the browser to reject them, a user can game the view count.
Now, using cookies (sessions) with Django is quite easy: to set a value for the current user, you just invoke something like
request.session['viewed_post_%s' % post.id] = True
in your view, and done. (Check the docs, and especially examples.)
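Putting both steps together, a rough sketch of such a view (Post and view_count are placeholder names; the F() expression just keeps the increment atomic):

    from django.db.models import F
    from django.shortcuts import get_object_or_404, render

    def post_detail(request, post_id):
        post = get_object_or_404(Post, pk=post_id)
        session_key = 'viewed_post_%s' % post.id
        if not request.session.get(session_key, False):
            # First view in this session: count it and remember it.
            Post.objects.filter(pk=post.pk).update(view_count=F('view_count') + 1)
            request.session[session_key] = True
        return render(request, 'post_detail.html', {'post': post})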
Disclaimer: this is off the top of my head; I haven't done this personally. Usually, when there's a need to do page view / activity tracking (so that you can see what drives more traffic to your website, when users are most active, etc.), there's a point in using a specialized system (e.g., Google Analytics, StatsD). But for a specific use case, or as an exercise, this should work.
Just to offer a secondary solution, which I think would work but is also prone to gaming (e.g. by coming through a proxy or from different devices). I haven't tried this either, but I think it should work, wouldn't require thinking about cookies, and aggregates some extra data, which is nice.
I would make a model called TrackedPost.
class TrackedPost(models.Model):
    post = models.ForeignKey(Post)
    ip = models.CharField(max_length=16)  # only accounting for IPv4
    user = models.ForeignKey(User, null=True, blank=True)  # null for anonymous visitors
Then when you view a post, you would take the request's IP:
def my_post_view(request, post_id):
    post = Post.objects.get(pk=post_id)
    ip = request.META.get('REMOTE_ADDR')  # the request's IP address
    user = request.user if request.user.is_authenticated() else None  # track logged-in or anonymous
    tracked_post, created = TrackedPost.objects.get_or_create(post=post, ip=ip, user=user)
    if created:
        post.count += 1
        post.save()
    return render_to_response('')

Django creating duplicate follow relationships -- concurrency issues suspected when creating new data in table

Let me explain my particular situation:
A business can pick a neighbourhood they are in; this choice is persisted to the DB, and a signal is fired when the object is saved. A method listens for this signal and should, only once, update all other users who follow this neighbourhood. A check happens in this method, trying to verify whether a user is already following this business; for every user who is not following this business but is following this neighbourhood, a follow relation is created in the DB. Everything should be fine: if the user is already following this business, no relation is set...
But what happens is that sometimes two or more of these transactions run at the same time, each of them checking whether the user is following the business. Of course, since none of them can yet see a follow relation between the user and the business, multiple follow relations end up being created.
I tried making sure the signal isn't sent multiple times, but I'm not sure why these multiple transactions are happening at the same time.
While I have found some answers about row locking to avoid concurrency problems on updates, I am at a loss as to how to make sure that only one insert happens.
Is table locking the only way to ensure that only one insert of a kind happens?
import hashlib

from django.db.models.signals import post_save

# when a business updates their neighborhood, this sets the follow relationships
def follow_biz(sender, instance, created, **kwargs):
    if instance.neighborhood:
        neighborhood_followers = FollowNeighborhood.objects.filter(neighborhood=instance.neighborhood)
        # print 'INSTANCE Neighborhood %s ' % instance.neighborhood
        for follower in neighborhood_followers:
            if not Follow.objects.filter(user=follower.user, business=instance).count():
                follow = Follow(business=instance, user=follower.user)
                follow.save()

# a unique static string to prevent the signal from being connected twice
follow_biz_signal_uid = hashlib.sha224("follow_biz").hexdigest()

# signal
post_save.connect(follow_biz, sender=Business, dispatch_uid=follow_biz_signal_uid)
By ensuring uniqueness[1] of the rows at the database level, using a constraint on the relevant columns, Django will (AFAICT) do the right thing and insert or update as necessary.
In this case the constraint should be on the user and business columns.
[1] Of course ensuring uniqueness where applicable is always a good idea.
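A minimal sketch of that constraint, assuming the Follow model has the user and business foreign keys used in follow_biz above:

    class Follow(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        business = models.ForeignKey(Business, on_delete=models.CASCADE)

        class Meta:
            # The database now rejects duplicate (user, business) rows.
            unique_together = ('user', 'business')

With the constraint in place, follow_biz can replace the filter-then-save check with Follow.objects.get_or_create(user=follower.user, business=instance); if two transactions race, the database rejects the duplicate and get_or_create falls back to fetching the existing row.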

Django Delete all but last five of queryset

I have a super simple django model here:
class Notification(models.Model):
    message = models.TextField()
    user = models.ForeignKey(User)
    timestamp = models.DateTimeField(default=datetime.datetime.now)
Using ajax, I check for new messages every minute. I only show the five most recent notifications to the user at any time. What I'm trying to avoid, is the following scenario.
User logs in and has no notifications. While the user's window is up, he receives 10 new messages. Since I'm only showing him five, no big deal. The problem happens when the user starts to delete his notifications. If he deletes the five that are displayed, the five older ones will be displayed on the next ajax call or refresh.
I'd like to have my model's save method delete everything but the 5 most recent objects whenever a new one is saved. Unfortunately, you can't use [5:] to do this. Help?
EDIT
I tried this, which didn't work as expected (in the model's save method):
notes = Notification.objects.filter(user=self.user)[:4]
Notification.objects.exclude(pk__in=notes).delete()
I couldn't find a pattern in the strange behavior, but after a while of testing, it would only delete the most recent one when a new one was created. I have NO idea why this would be; the ordering is taken care of in the model's Meta class (by timestamp, descending). Thanks for the help, but my way seems to be the only one that works consistently.
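For reference, the Meta ordering mentioned here would presumably be declared on the model above like so:

    class Notification(models.Model):
        ...
        class Meta:
            # Newest notifications first, matching the field defined above.
            ordering = ['-timestamp']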
This is a bit old, but I believe you can do the following:
notes = Notification.objects.filter(user=self.user)[:4]
Notification.objects.exclude(pk__in=list(notes)).delete() # list() forces a database hit.
It costs two hits, but avoids using the for loop with transactions middleware.
The reason for using list(notes) is that Django creates a single query without it, and in MySQL 5.1 this raises the error
(1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
By using list(notes), we force a query of notes, avoiding this.
This can be further optimized to:
notes = Notification.objects.filter(user=self.user)[:4].values_list("id", flat=True) # only retrieve ids.
Notification.objects.exclude(pk__in=list(notes)).delete()
Use an inner query to get the set of items you want to keep and then filter on them.
objects_to_keep = Notification.objects.filter(user=user).order_by('-timestamp')[:5]
Notification.objects.exclude(pk__in=objects_to_keep).delete()
Double check this before you use it. I have found that simpler inner queries do not always behave as expected. The strange behavior I have experienced has been limited to querysets that are just an order_by and a slice. Since you will have to filter on user, you should be fine.
This is how I ended up doing it:
notes = Notification.objects.filter(user=self.user)
for note in notes[4:]:
    note.delete()
Because I'm doing this in the save method, the only way the loop would ever have to run more than once is if the user got multiple notifications at once. I'm not worried about that happening (while it may happen, it's not likely to be frequent enough to cause a problem).
