Django: delete all but the last five of a queryset - python

I have a super simple django model here:
class Notification(models.Model):
    message = models.TextField()
    user = models.ForeignKey(User)
    timestamp = models.DateTimeField(default=datetime.datetime.now)
Using ajax, I check for new messages every minute. I only show the five most recent notifications to the user at any time. What I'm trying to avoid, is the following scenario.
User logs in and has no notifications. While the user's window is up, he receives 10 new messages. Since I'm only showing him five, no big deal. The problem happens when the user starts to delete his notifications. If he deletes the five that are displayed, the five older ones will be displayed on the next ajax call or refresh.
I'd like to have my model's save method delete everything but the 5 most recent objects whenever a new one is saved. Unfortunately, you can't use [5:] to do this. Help?
EDIT
I tried this which didn't work as expected (in the model's save method):
notes = Notification.objects.filter(user=self.user)[:4]
Notification.objects.exclude(pk__in=notes).delete()
I couldn't find a pattern in the strange behavior, but after a while of testing it would only delete the most recent one when a new one was created. I have NO idea why this would be; the ordering is taken care of in the model's Meta class (by timestamp, descending). Thanks for the help, but my way seems to be the only one that works consistently.

This is a bit old, but I believe you can do the following:
notes = Notification.objects.filter(user=self.user)[:4]
Notification.objects.exclude(pk__in=list(notes)).delete() # list() forces a database hit.
It costs two hits, but avoids using the for loop with transactions middleware.
The reason for using list(notes) is that without it Django creates a single query with a subquery, and in MySQL 5.1 this raises the error
(1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
By using list(notes), we force a query of notes, avoiding this.
This can be further optimized to:
notes = Notification.objects.filter(user=self.user)[:4].values_list("id", flat=True) # only retrieve ids.
Notification.objects.exclude(pk__in=list(notes)).delete()

Use an inner query to get the set of items you want to keep and then filter on them.
objects_to_keep = Notification.objects.filter(user=user).order_by('-timestamp')[:5]
Notification.objects.exclude(pk__in=objects_to_keep).delete()
Double check this before you use it. I have found that simpler inner queries do not always behave as expected. The strange behavior I have experienced has been limited to querysets that are just an order_by and a slice. Since you will have to filter on user, you should be fine.
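For completeness, a minimal sketch of wiring this into the model's save(); field names follow the question's model, and the list() call sidesteps the MySQL subquery limitation mentioned in the earlier answer:
def save(self, *args, **kwargs):
    super(Notification, self).save(*args, **kwargs)
    # keep the five most recent notifications (including this one), delete the rest
    keep = Notification.objects.filter(user=self.user).order_by('-timestamp')[:5]
    Notification.objects.filter(user=self.user).exclude(pk__in=list(keep)).delete()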

This is how I ended up doing it:
notes = Notification.objects.filter(user=self.user)
for note in notes[4:]:
    note.delete()
Because I'm doing this in the save method, the only way the loop would ever have to run more than once is if the user got multiple notifications at once. I'm not worried about that happening (while it may happen, it's not likely to be often enough to cause a problem).

Related

Posting data to database through a "workflow" (Ex: on field changed to 20, create new record)

I'm looking to post new records on a user-triggered basis (i.e. workflow). I've spent the last couple of days researching the best way to approach this, and so far I've come up with the following ideas:
(1) Utilize Django signals to check for conditions on a field change, and then post data originating from my Django app.
(2) Utilize JS/AJAX on the front-end to post data to the app based upon a user changing certain fields.
(3) Utilize a prebuilt workflow app like http://viewflow.io/, again based upon changes triggers by users.
Of the three above options, is there a best practice? Are there any other options I'm not considering for how to take this workflow based approach to post new records?
The second approach, monitoring the changes in the front end and then calling a backend view to update the database, would be the better option: doing the detection on the backend puts extra processing on the server and slows the site down, whereas the second approach is more of a client-side solution that keeps the server relieved.
I do not think there will be data loss; you are just monitoring a change, and as soon as it changes your view updates the database. You could also use cookies or sessions to keep appending values to a list and update the database when the site closes. Django can also return HTTP errors, so put proper try/except handling around those calls as well. Anyway, cookies would be a good approach, I think.
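If you go the session route, a minimal sketch of buffering values server-side could look like this; the view name, session key, and POST parameter are made up for illustration:
from django.http import JsonResponse

def record_change(request):
    # append the posted value to a per-session list of pending changes
    pending = request.session.get('pending_changes', [])
    pending.append(request.POST.get('value'))
    request.session['pending_changes'] = pending  # reassign so the session is saved
    return JsonResponse({'queued': len(pending)})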
For anyone that finds this post: I ended up deciding to take the Signals route. Essentially I'm utilizing signals to track when users change a field, and based on the field that changes I'm performing certain actions on the database.
For testing purposes this has been working well. When I reach production with this project I'll try to update this post with any challenges I run into.
Example:
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=subTaskChecklist)
def do_something_if_changed(sender, instance, **kwargs):
    try:
        obj = sender.objects.get(pk=instance.pk)  # obj is the "old" object, before the change
    except sender.DoesNotExist:
        pass  # the instance is new; there is nothing to compare against
    else:
        previous_Value = obj.FieldToTrack
        new_Value = instance.FieldToTrack  # instance represents the "new", changed object
        DoSomethingWithChangedField(new_Value)

How can I quickly set a field of all instances of a Django model at once?

To clarify, I've got several thousand Property items, each with a 'present' field (among others). To reset the system for use again, I need to set every item's 'present' field to False. Now, of course there's the easy way to do it, which is just:
for obj in Property.objects.all():
    obj.present = False
    obj.save()
But this takes nearly 30 seconds on my development server. I feel there must be a better way, so I tried limiting the loaded fields using Django's only() queryset method:
for obj in Property.objects.only('present'):
    obj.present = False
    obj.save()
For whatever reason, this actually takes longer than just getting the entire object.
Because I need to indiscriminately set all of these values to False, is there a faster way? This function takes no user input other than the 'go do it' command, so I feel a native SQL command would be a safe option, but I don't know enough SQL to draft such a command.
Thanks everyone.
Use the update query:
Property.objects.all().update(present=False)
Note that the update() query runs at the SQL level, so if your model has a custom save() method it is not going to be called here. In that case, the normal for-loop version that you're using is the way to go.
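For reference, the raw SQL route the question mentions would look roughly like this from Django; 'myapp_property' assumes Django's default <app_label>_<model> table naming and will differ in your project:
from django.db import connection

cursor = connection.cursor()
# equivalent to Property.objects.all().update(present=False)
cursor.execute("UPDATE myapp_property SET present = %s", [False])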

Django - how to deal with concurrency?

I'm currently developing a game, in which user can get experience points. My (custom) user model looks something like this:
class TriariadUser(AbstractBaseUser, PermissionsMixin):
    pseudonym = models.CharField(max_length=40)
    level = models.PositiveSmallIntegerField(default=1)
    experience_points = models.PositiveIntegerField(default=0)

    def add_ep_points(self, points):
        self.experience_points += points
        if self.experience_points >= get_next_level_ep(self.level):
            self.level += 1
        self.save()
Now I have various signal listeners that can add experience points to a user. The problem is: If multiple XP gains occur during one request, the last XP gain overwrites all others.
Clearly this is a race condition, so I tried to modify my function to be static and use select_for_update:
@staticmethod
def add_ep_points(user_id, points):
    # select_for_update() must run inside a transaction (e.g. transaction.atomic())
    user = TriariadUser.objects.select_for_update().get(pk=user_id)
    user.experience_points += points
    ...
This works as intended, but the user object in the template is not updated; a new request must be made before the user sees what happened. Using the Django Debug Toolbar I can see that the query which loads the user runs at the beginning of the request. After that, all the relevant updates are made, but the user is not reloaded afterwards, so the old state is displayed.
I can think of various workarounds, like triggering a reload with JavaScript, but there must be some other solution for this (at least I hope so).
Is there a way to lock an object in Django? Is there a way to tell that an object needs to be reloaded? Is there a better way to accomplish this (maybe using some kind of middleware?)
To avoid that, here's what I would do. First, create a UserExpRecord model with a relation to the user and a +/- amount for how much xp you're adding or removing. Your signals can then add a new UserExpRecord to give a user xp. Have UserExpRecord emit a save signal that tells the user model to re-aggregate (SUM) all the xp records related to the user and save that value on the user.
This gives you the immediate benefit of having a record of when and how much xp was added for a user. The secondary benefit is you can avoid any sort of race condition because you aren't trying to lock a table row and increment the value.
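A minimal sketch of that approach; all names besides TriariadUser are illustrative:
from django.db import models
from django.db.models import Sum
from django.db.models.signals import post_save
from django.dispatch import receiver

class UserExpRecord(models.Model):
    user = models.ForeignKey('TriariadUser', related_name='exp_records')
    amount = models.IntegerField()  # positive or negative xp delta

@receiver(post_save, sender=UserExpRecord)
def recollect_xp(sender, instance, **kwargs):
    # re-aggregate the full xp history instead of incrementing in place
    user = instance.user
    user.experience_points = user.exp_records.aggregate(total=Sum('amount'))['total'] or 0
    user.save()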
That said, depending on your backend there may be an atomic, thread-safe "upsert" or "increment" function that allows you to safely increment a value within a transaction while blocking all other writes, letting the writes stack correctly. I believe the first solution (separate xp records) comes with a smaller headache (cache invalidation) than this or your current solution (unknown race conditions with missing or dropped xp updates).
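In Django specifically, an F() expression is the usual way to get that kind of atomic, database-side increment; a minimal sketch:
from django.db.models import F

# a single UPDATE ... SET experience_points = experience_points + %s,
# so concurrent gains stack instead of overwriting each other
TriariadUser.objects.filter(pk=user_id).update(experience_points=F('experience_points') + points)
Note that the in-memory user object is stale afterwards; it has to be re-fetched (or, on Django 1.8+, refreshed with user.refresh_from_db()) before the new value can be displayed.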

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
    user_banned = models.ForeignKey(auth.models.User, related_name="user_banned")
    user_banning = models.ForeignKey(auth.models.User, related_name="user_banning")
Now, when I try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get a following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if I replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) 3 times: 2 to retrieve the two User objects and a third one to commit your desired information. This causes absolutely unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
This is the default name pattern for the FK fields generated by the Django ORM. This way you can set the information directly and avoid the extra queries.
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access related attributes in Django queryset lookups: with a double underscore. Then you can compare against the value of the attribute.
Though very similar, the two work completely differently. The first one sets an attribute directly, while the second one is parsed by Django, which splits it at the '__' and queries the database the right way, treating the second part as the name of an attribute.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write the raw sql to do the update using method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route, keep in mind you are responsible for protecting yourself from SQL injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw sql will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.

Django creating duplicate follow relationships -- concurrency issues suspected when creating new data in table

Let me explain my particular situation:
A business can pick a neighbourhood they are in; this choice is persisted to the DB, and a signal is fired when the object is saved. A method listens for this signal and should update, only once, all other users who follow this neighbourhood. The method checks whether each user is already following this business; for every user that is not following this business but is following this neighbourhood, a follow relation is created in the DB. Everything should be fine: if the user is already following this business, no relation is set...
But what happens is that sometimes two or more of these transactions run at the same time, each of them checking whether the user is following this business. Since none of them can yet see a follow relation between the user and the business, multiple follow relations end up being created.
I tried making sure the signal isn't sent multiple times, but I'm not sure why these multiple transactions are happening at the same time.
While I have found some answers about using row locking to avoid concurrency problems on updates, I am at a loss as to how to make sure that only one insert happens.
Is table locking the only way to ensure that only one insert of a kind happens?
# when a business updates their neighborhood, this sets the follow relationship
def follow_biz(sender, instance, created, **kwargs):
    if instance.neighborhood:
        neighborhood_followers = FollowNeighborhood.objects.filter(neighborhood=instance.neighborhood)
        # print 'INSTANCE Neighborhood %s ' % instance.neighborhood
        for follower in neighborhood_followers:
            if not Follow.objects.filter(user=follower.user, business=instance).count():
                follow = Follow(business=instance, user=follower.user)
                follow.save()

# a unique static string to prevent the signal from being connected twice
follow_biz_signal_uid = hashlib.sha224("follow_biz").hexdigest()
# signal
post_save.connect(follow_biz, sender=Business, dispatch_uid=follow_biz_signal_uid)
By ensuring uniqueness[1] of the rows at the database level, using a constraint on the relevant columns, Django will, AFAICT, do the right thing and insert or update as necessary.
In this case the constraint should be on the user and business ids.
[1] Of course ensuring uniqueness where applicable is always a good idea.
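A minimal sketch of what that could look like, assuming Follow's fields match how it is used in the handler above (unique_together matches the older Django style used on this page):
class Follow(models.Model):
    user = models.ForeignKey(User)
    business = models.ForeignKey(Business)

    class Meta:
        unique_together = ('user', 'business')  # enforced by the database

# in follow_biz, get_or_create then tolerates the race: a concurrent
# duplicate insert hits the constraint instead of creating a second row
follow, created = Follow.objects.get_or_create(user=follower.user, business=instance)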
