django-ratelimit stack keys. Not the intended behaviour - python

I think my understanding of django-ratelimit is incorrect. I am using v3.0.0, but v2.0 produces the same results.
Let's say I have this code:
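from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from ratelimit.decorators import ratelimit  # import path for django-ratelimit 3.x
@ratelimit(key='post:username', rate='5/h', block=True)
@ratelimit(key='post:tenant', rate='5/h', block=True)
@csrf_exempt
def index(request):
    print(request.POST["username"])
    print(request.POST["tenant"])
    print("")
    return HttpResponse('hallo', content_type='text/plain', status=200)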
Suppose tenant A submits the username "Antwon" 6 times: tenant A is then blocked for 1 hour, which is good. But suppose tenant B also has a user "Antwon"; that user for tenant B will not be able to log in either.
I would assume that Antwon for tenant B should still be able to log in; otherwise tenant A can DoS other tenants.
Is this intended behaviour, or is my implementation incorrect?

My understanding is incorrect. Here is a response from the creator of django-ratelimit:
Hi there! I think there might be some confusion about how multiple ratelimits interact, and how the cache keys get set/rotated.
Multiple limits are ORed together—that is, if you fail any of them,
you fail—but order matters. In this case, the outer limit,
post:username, is tested first, then the tenant is tested. If the user
succeeds, it counts as an attempt, regardless of what happens later.
That's why the user keeps accruing attempts. If you want tenant to
count first, you could re-order the decorators. However...
If you want a single limit to the (username, tenant) pair, you could
combine the two fields with custom logic, by creating a callable key
that returned some combination of tenant and username (maybe
{tenant}\u04{username} or hash them together).
In terms of locking out tenants, there are a couple of things:
First, ratelimit uses a staggered fixed-window strategy, instead of
sliding windows. So based on the rate, the period, the key value, and
the group, there'll be some calculated window. For example, for 1
hour, 11:03:29 will be the reset time, and the next reset time for
that same combination will be 12:03:29, then 13:03:29, etc... The
downside is that if you had a limit of 100/h, you could do 200
requests in a short span around the reset point. The upsides are that
the calculation is share-nothing and can be done independently, and the
window resets even if you accidentally try again a little too early.
Second, yes if you're doing hard blocks on user-supplied data like
username (instead of e.g. using the authenticated user) it creates a
denial-of-service vector. Where possible, another option is to use
block=False, and do something like require a captcha rather than fully
blocking.
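The callable-key suggestion in the response could be sketched roughly as follows. This is a minimal sketch, not the library author's code: the function name tenant_username_key and the choice of SHA-256 are assumptions made for illustration. It relies on django-ratelimit calling a callable key with (group, request) and using the returned string as the key value. The stand-in request objects at the bottom exist only so the function can be tried outside Django.

```python
import hashlib
from types import SimpleNamespace


def tenant_username_key(group, request):
    """Return one rate-limit key value per (tenant, username) pair.

    Hypothetical helper: django-ratelimit passes (group, request) to a
    callable key and expects a string back.
    """
    tenant = request.POST.get("tenant", "")
    username = request.POST.get("username", "")
    # Hash each field separately so ("ab", "c") and ("a", "bc") cannot
    # collide, then hash the two digests together.
    parts = [hashlib.sha256(p.encode("utf-8")).hexdigest() for p in (tenant, username)]
    return hashlib.sha256(":".join(parts).encode("utf-8")).hexdigest()


# Hypothetical usage on the view from the question:
#
#   @ratelimit(key=tenant_username_key, rate='5/h', block=True)
#   @csrf_exempt
#   def index(request): ...

# Stand-in request objects, only for trying the function outside Django.
req_a = SimpleNamespace(POST={"tenant": "A", "username": "Antwon"})
req_b = SimpleNamespace(POST={"tenant": "B", "username": "Antwon"})
print(tenant_username_key("index", req_a) != tenant_username_key("index", req_b))
```

With a single combined key, Antwon at tenant A and Antwon at tenant B are counted separately, so one tenant's failed attempts no longer consume the other's allowance.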

Posting data to database through a "workflow" (Ex: on field changed to 20, create new record)

I'm looking to post new records on a user-triggered basis (i.e. a workflow). I've spent the last couple of days researching the best way to approach this, and so far I've come up with the following ideas:
(1) Utilize Django signals to check for conditions on a field change, and then post data originating from my Django app.
(2) Utilize JS/AJAX on the front-end to post data to the app based upon a user changing certain fields.
(3) Utilize a prebuilt workflow app like http://viewflow.io/, again based upon changes triggered by users.
Of the three above options, is there a best practice? Are there any other options I'm not considering for how to take this workflow based approach to post new records?
The second approach, monitoring changes in the front end and then calling a backend view to update the database, would be better. Doing the processing on the backend puts the load on the server, which would slow down the site, whereas the second approach is more of a client-side solution and keeps the server relieved.
I do not think there will be any data loss: you are just monitoring a change, and as soon as it happens your view updates the database. You can also use cookies or sessions to keep appending values to a list and update the database when the site closes. Django also raises HTTP errors, so you could add proper try/except handling for that case as well. Anyway, cookies would be a good approach, I think.
For anyone who finds this post: I ended up taking the Signals route. Essentially I'm using signals to track when users change a field, and based on which field changes I perform certain actions on the database.
For testing purposes this has been working well. When I reach production with this project I'll try to update this post with any challenges I run into.
Example:
from django.db.models.signals import pre_save
from django.dispatch import receiver
@receiver(pre_save, sender=subTaskChecklist)
def do_something_if_changed(sender, instance, **kwargs):
    try:
        obj = sender.objects.get(pk=instance.pk)  # "old" object, before the change
    except sender.DoesNotExist:
        pass  # new object, nothing to compare against
    else:
        previous_Value = obj.FieldToTrack
        new_Value = instance.FieldToTrack  # instance is the "new" object, after the change
        if previous_Value != new_Value:  # only act when the field actually changed
            DoSomethingWithChangedField(new_Value)

Ambiguous step in Python Behave

My business user likes to use the Then sentence "It should be created", where "it" is determined by the context of the scenario. For example:
Given I have gift certificate for "<name>"
When I enter the gift certificate
Then It should be created
or
Given Customer order return for order "<order_no>"
When I create the customer order return
Then It should be created
In the "Then It should be created" step, I would like to retrieve either the created gift certificate or the customer order return for comparison. However, they have completely different APIs and objects.
Firstly, is there a way to do this in Python Behave without getting "Exception AmbiguousStep:"?
If not, what would be the best practice in the BDD world for this, without forcing the user to repeat themselves constantly by saying "Then The gift certificate should be created" or "Then The customer order return should be created"?
Thanks.
In the specific case you are giving us here, I would write the steps more verbosely to avoid the "it". So I would write "Then the gift certificate should be created", etc. I prefer to avoid having steps depend on state passed through context.
However...
There are times where it would be problematic to do this. In your case, maybe the politics of dealing with your business user make it so that asking for more verbosity would not fly well. Or there can be technical reasons that cause what I suggested above to be undesirable or flat out unworkable.
What you can do if you cannot use more verbose steps, is have the Then it should be created step be dependent on a context field being set to a value that will provide enough information to the step to perform its work. It could be something like context.created_object. The step that creates the object would set this field to an appropriate value so that Then it should be created can perform its work. What exactly you would store in there depends on the specifics of your application.
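The context-field idea above could look roughly like this. It is a minimal sketch under stated assumptions: the names record_created, check_it_was_created, FETCHERS and the two fetch functions are hypothetical, and the fetchers only return placeholder dictionaries where a real suite would call the two different APIs. The Behave wiring is shown in a comment so the example runs without Behave installed.

```python
from types import SimpleNamespace


# Hypothetical lookups; a real suite would call the two different APIs here.
def fetch_gift_certificate(ref):
    return {"type": "gift_certificate", "name": ref}


def fetch_order_return(ref):
    return {"type": "order_return", "order_no": ref}


FETCHERS = {
    "gift_certificate": fetch_gift_certificate,
    "order_return": fetch_order_return,
}


def record_created(context, kind, ref):
    """Called by each When step right after it creates something."""
    context.created_object = (kind, ref)


def check_it_was_created(context):
    """Body of the single 'It should be created' step."""
    kind, ref = context.created_object
    found = FETCHERS[kind](ref)
    assert found is not None, "%s %r was not created" % (kind, ref)
    return found


# In a Behave steps file the check would be registered exactly once,
# which is what avoids AmbiguousStep:
#
#   @then('It should be created')
#   def step_impl(context):
#       check_it_was_created(context)

context = SimpleNamespace()  # stand-in for Behave's context object
record_created(context, "gift_certificate", "Alice")
print(check_it_was_created(context)["type"])  # gift_certificate
```

Because only one step function matches the sentence, the business user can keep writing "It should be created" while each When step decides what "it" means.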
For one application of mine where I test the appearance of a contextual menu on the basis of mouse clicks in a browser window, sometimes what I record is a reference to the DOM element on which the user did the right-click that brought up the menu. Sometimes it is an object providing x, y coordinates. This is what my application needs in order to perform its checks. In this case it is preferable to have the information be passed through context because having Selenium query the DOM all over again in later steps can be very expensive over the network. Over dozens of tests, it can easily add minutes to a test suite's run, and then consider that the suite has to be run for multiple combinations of browser, OS, and browser version.

Django : How to count number of people viewed

I'm making a simple BBS application in Django and I want it so that whenever someone sees a post, the number of views on that post (post_view_no) is increased.
At the moment, I face two difficulties:
(1) I need to limit the increase in post_view_no so that one user can only increase it once, regardless of how many times the user refreshes or clicks on the post.
(2) I also need to be able to track users who are not logged in.
Regarding the first issue, it seems pretty easy as long as I create a model called 'View' and check the DB, but I have a feeling this may be overkill.
As for the second issue, all I can think of is using cookies or the IP address to track users, but an IP is hardly unique and I cannot figure out how to use cookies.
I believe this is a common feature in forum/BBS software, but a Google search only turned up plugins or 'dumb' solutions that increase the count each time the post is viewed.
What would be the best way to go about this?
I think you can do both things via cookies. For example, when a user visits a page, you can:
(1) Check if they have the “viewed_post_%s” key (where %s is the post ID) set in their session.
(2) If they have, do nothing. If they don't, increase the view_count numeric field of the corresponding Post object by one, and set the “viewed_post_%s” key (cookie) in their session so that it won't be counted in future.
This would work with both anonymous and registered users; however, by clearing cookies or setting up the browser to reject them, a user can game the view count.
Now using cookies (sessions) with Django is quite easy: to set a value for current user, you just invoke something like
request.session['viewed_post_%s' % post.id] = True
in your view, and done. (Check the docs, and especially examples.)
Disclaimer: this is off the top of my head; I haven't done this personally. Usually, when there's a need for page view or activity tracking (so that you can see what drives more traffic to your website, when users are more active, etc.), there's a point in using a specialized system (e.g., Google Analytics, StatsD). But for some specific use case, or as an exercise, this should work.
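The session check described in this answer can be sketched as one small function. This is a minimal sketch with assumed names: count_view_once is hypothetical, session stands for anything dict-like (request.session in a Django view), and post is any object with id and view_count attributes. In a real view the changed post would still have to be saved afterwards.

```python
from types import SimpleNamespace


def count_view_once(session, post):
    """Increment post.view_count the first time this session sees the post.

    Returns True when the view was counted, False when it was a repeat.
    """
    key = "viewed_post_%s" % post.id
    if session.get(key):
        return False  # this visitor has already been counted
    post.view_count += 1
    session[key] = True
    return True


# Stand-ins for request.session and a Post instance, for trying it out.
session = {}
post = SimpleNamespace(id=3, view_count=0)
count_view_once(session, post)
count_view_once(session, post)  # refresh: not counted again
print(post.view_count)  # 1
```

A second visitor has a separate session and is therefore counted once as well, whether logged in or not.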
Just to offer a secondary solution, which I think would work but is also prone to gaming (if visitors come via a proxy or from different devices). I haven't tried this either, but I think it should work and wouldn't require thinking about cookies; plus, you aggregate some extra data, which is nice.
I would make a model called TrackedPost.
class TrackedPost(models.Model):
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
    ip = models.CharField(max_length=16)  # only accounting for IPv4
    user = models.ForeignKey(User, null=True, blank=True, on_delete=models.CASCADE)  # null for anonymous visitors
Then when you view a post, you would take the request's IP.
def my_post_view(request, post_id):
    # untested sketch; only logged-in users are stored on the row
    user = request.user if request.user.is_authenticated else None
    ip = request.META.get('REMOTE_ADDR', '')
    tracked_post, created = TrackedPost.objects.get_or_create(post_id=post_id, ip=ip, user=user)
    if created:
        tracked_post.post.count += 1
        tracked_post.post.save()
    return render_to_response('')

Django - how to deal with concurrency?

I'm currently developing a game, in which user can get experience points. My (custom) user model looks something like this:
class TriariadUser(AbstractBaseUser, PermissionsMixin):
    pseudonym = models.CharField(max_length=40)
    level = models.PositiveSmallIntegerField(default=1)
    experience_points = models.PositiveIntegerField(default=0)
    def add_ep_points(self, points):
        self.experience_points += points
        if self.experience_points >= get_next_level_ep(self.level):
            self.level += 1
        self.save()
Now I have various signal listeners that can add experience points to a user. The problem is: If multiple XP gains occur during one request, the last XP gain overwrites all others.
Clearly this is a race condition, so I tried to modify my function to be static and use select_for_update:
    @staticmethod
    def add_ep_points(user_id, points):
        user = TriariadUser.objects.select_for_update().get(pk=user_id)
        user.experience_points += points
        ...
This works as intended; however, the user object in the template is not updated. That is, a new request must be made before the user sees what happened. Using the Django debug toolbar I can see that the query which loads the user is made at the beginning. After that, all relevant updates are made. But the user is not reloaded afterwards, so the old state is displayed.
I can think of various workarounds like a reload by using JavaScript, but there must be some other solutions for this (at least I hope so).
Is there a way to lock an object in Django? Is there a way to tell that an object needs to be reloaded? Is there a better way to accomplish this (maybe using some kind of middleware)?
To avoid that, here's what I would do. First, create a UserExpRecord model with a relation to the user and a +/- amount for how much XP you're adding or removing. The signals can then add a new UserExpRecord whenever they give a user XP. Have UserExpRecord emit a save signal that notifies the user model that it needs to recollect (SUM) all the XP records related to that user and save the total to the user.
This gives you the immediate benefit of having a record of when and how much xp was added for a user. The secondary benefit is you can avoid any sort of race condition because you aren't trying to lock a table row and increment the value.
That said, depending on your backend, there may be an atomic, thread-safe “upsert” or “increment” function that lets you safely increment a value within a transaction while blocking all other writes. This will allow the writes to stack correctly. I believe the first solution (separate XP records) will come with a smaller headache (cache invalidation) than this or your current solution (unknown race conditions with missing or dropped XP updates).
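The difference between writing back a stale value and letting the database do the increment can be shown with the standard-library sqlite3 module. This is an illustrative sketch only: the table name game_user and the helper functions are made up for the example. In Django, an increment of the second kind is typically written with an F() expression, for instance TriariadUser.objects.filter(pk=user_id).update(experience_points=F('experience_points') + points), which is an assumption about what would fit the model in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_user (id INTEGER PRIMARY KEY, experience_points INTEGER NOT NULL)"
)
conn.execute("INSERT INTO game_user (id, experience_points) VALUES (1, 0)")


def read_xp(user_id):
    row = conn.execute(
        "SELECT experience_points FROM game_user WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def add_xp_stale(user_id, stale_value, points):
    """Read-modify-write: stores a value computed from an earlier read."""
    conn.execute(
        "UPDATE game_user SET experience_points = ? WHERE id = ?",
        (stale_value + points, user_id),
    )


def add_xp_atomic(user_id, points):
    """Let the database add to whatever the row currently holds."""
    conn.execute(
        "UPDATE game_user SET experience_points = experience_points + ? WHERE id = ?",
        (points, user_id),
    )


# Two listeners that both loaded the user while it still had 0 XP.
snapshot = read_xp(1)
add_xp_stale(1, snapshot, 10)
add_xp_stale(1, snapshot, 5)  # overwrites the 10 points
lost_update_total = read_xp(1)

conn.execute("UPDATE game_user SET experience_points = 0 WHERE id = 1")
add_xp_atomic(1, 10)
add_xp_atomic(1, 5)
atomic_total = read_xp(1)

print(lost_update_total, atomic_total)  # 5 15
```

The first total reproduces the symptom from the question (the last gain wins), while the second keeps both gains because no Python-side copy of the value is involved.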

Django creating duplicate follow relationships -- concurrency issues suspected when creating new data in table

Let me explain my particular situation:
A business can pick a neighbourhood they are in. This option is persisted to the DB, and a signal is fired when this object is saved. A method listens for this signal and should, only once, create follow relations for all users who follow this neighbourhood. The method checks whether each user is already following this business; for every user who is following the neighbourhood but not the business, a follow relation is created in the DB. If a user is already following this business, no relation is created, so everything should be fine...
But what happens is that sometimes two or more of these transactions run at the same time, all of them checking whether the user is following this business. Since none of them can see a follow relation between the user and the business yet, multiple follow relations are established.
I tried making sure the signal isn't sent multiple times, but I'm not sure why these multiple transactions are happening at the same time.
While I have found some answers about row locking to avoid concurrency problems on updates, I am at a loss about how to make sure that only one insert happens.
Is table locking the only way to ensure that only one insert of a kind happens?
import hashlib
from django.db.models.signals import post_save
# when a business updates their neighborhood, this sets the follow relationship
def follow_biz(sender, instance, created, **kwargs):
    if instance.neighborhood:
        neighborhood_followers = FollowNeighborhood.objects.filter(neighborhood=instance.neighborhood)
        # print 'INSTANCE Neighborhood %s ' % instance.neighborhood
        for follower in neighborhood_followers:
            if not Follow.objects.filter(user=follower.user, business=instance).exists():
                follow = Follow(business=instance, user=follower.user)
                follow.save()
# a unique static string to prevent signal from being called twice
follow_biz_signal_uid = hashlib.sha224(b"follow_biz").hexdigest()
# signal
post_save.connect(follow_biz, sender=Business, dispatch_uid=follow_biz_signal_uid)
Ensure uniqueness[1] of the rows at the database level with a constraint on the relevant columns. AFAICT the database will then reject a second, concurrent insert of the same pair with an integrity error instead of storing a duplicate row, and the handler can catch that error or use get_or_create.
In this case the constraint should be on the user and business id.
[1] Of course ensuring uniqueness where applicable is always a good idea.
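How a uniqueness constraint on (user, business) stops duplicate follow rows can be tried with the standard-library sqlite3 module. This is a minimal sketch, not the asker's schema: the follow table, its columns and the follow_once helper are made up for the example. In Django, a constraint of this kind is usually declared on the model with unique_together or a UniqueConstraint in Meta, and the duplicate insert surfaces as django.db.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follow ("
    " user_id INTEGER NOT NULL,"
    " business_id INTEGER NOT NULL,"
    " UNIQUE (user_id, business_id))"
)


def follow_once(user_id, business_id):
    """Insert the relation; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO follow (user_id, business_id) VALUES (?, ?)",
                (user_id, business_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused the duplicate, however the race played out.
        return False


first = follow_once(7, 42)
second = follow_once(7, 42)  # e.g. a second handler racing the first
rows = conn.execute("SELECT COUNT(*) FROM follow").fetchone()[0]
print(first, second, rows)  # True False 1
```

The check-then-insert in the signal handler can still race, but with the constraint in place the losing insert fails instead of creating a second row, so no table lock is needed.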
