I'm building a web service with a ranking function.
I don't have powerful servers: the whole service will be hosted on a standard PC.
There may be times when many users (in this case many = ~100) refresh the ranking at once, so I want to do it in a way that doesn't let them crash the server.
The ranking doesn't need to refresh in real time: I can show users a ranking generated some time earlier.
There is no problem for me in generating ranking.
I can easily do this:
User.objects.filter(...).order_by('rank')
EDIT: More details:
I have some workers doing calculations.
When a worker finishes its work, it changes the rank field of some User instance.
You can assume every user will perform actions leading to several (5-20) calculations, each causing a rank change for that user.
If updating the ranking takes too long to do per request, here are a couple of solutions you could use:
1. After something happens that changes a ranking, create an asynchronous task that updates the rankings without blocking the request. You could use celery or gearman.
2. Update the rankings periodically, using a celery periodic task or a cron job.
Solution 1 is better performance-wise but harder to get right. Solution 2 is easier to do, but could be less optimal.
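A minimal sketch of the periodic approach in plain Python, assuming a hypothetical compute_ranking callable that stands in for the User.objects.filter(...).order_by('rank') query. The class name and its parameters are made up for illustration and are not part of Django or Celery. The snapshot is rebuilt at most once per max_age seconds, so a burst of refreshes costs a single query:

```python
import threading
import time


class RankingSnapshot:
    """Serve a cached ranking and rebuild it at most once per max_age seconds."""

    def __init__(self, compute_ranking, max_age=60.0, clock=time.monotonic):
        self._compute = compute_ranking  # hypothetical: runs the expensive query
        self._max_age = max_age
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._built_at = None

    def get(self):
        # Only one caller rebuilds; the others wait and reuse the result.
        with self._lock:
            now = self._clock()
            stale = self._built_at is None or now - self._built_at >= self._max_age
            if stale:
                self._value = self._compute()
                self._built_at = now
            return self._value


calls = []


def compute_ranking():
    calls.append(1)  # count how often the expensive query runs
    return ["alice", "bob"]


snapshot = RankingSnapshot(compute_ranking, max_age=60.0)
for _ in range(100):
    snapshot.get()
print(len(calls))  # 1
```

In a real deployment the same idea could live in a cron job or a celery periodic task that writes the snapshot to a cache; the in-process version above only shows the staleness check.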
Related
I have a Django application serving multiple users. Each user can submit resource-intensive tasks (minutes to hours) to be executed. I want to execute the tasks based on a fair distribution of the resources. The backend uses Celery and RabbitMQ for task execution.
I have looked extensively and haven't been able to find a solution for my particular case (or haven't been able to piece one together). As far as I can tell, there isn't any built-in feature in Celery or RabbitMQ that does this. Is it possible to have custom code handle the order of execution of the tasks? This would make it possible to calculate priorities based on user data and choose which task should be executed next.
Related: How can Celery distribute users' tasks in a fair way?
AMQP queues are FIFO, so it is impossible to grab items from the middle of the queue to execute. The two solutions that come to mind are:
a.) As mentioned in the other post, use a lock to limit resources by user.
b.) Have 2 queues: a submission queue and an execution queue. The submission queue keeps the execution queue full of work based on whatever algorithm you choose to implement. This will likely be more complex, but may be more along the lines of what you are looking for.
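A rough sketch of option b.) in plain Python, assuming round-robin between users as the fairness algorithm. The class and function names are hypothetical, and the execution queue is modelled as a deque rather than a real broker queue:

```python
from collections import OrderedDict, deque


class FairSubmissionQueue:
    """Hold pending tasks per user and release them round-robin."""

    def __init__(self):
        # user_id -> deque of tasks; dict order is the rotation order
        self._pending = OrderedDict()

    def submit(self, user_id, task):
        self._pending.setdefault(user_id, deque()).append(task)

    def next_task(self):
        """Return (user_id, task) for the next user in rotation, or None."""
        if not self._pending:
            return None
        user_id, tasks = next(iter(self._pending.items()))
        task = tasks.popleft()
        del self._pending[user_id]
        if tasks:
            self._pending[user_id] = tasks  # user moves to the back of the rotation
        return user_id, task


def refill(submission, execution, capacity):
    """Top up the execution queue to `capacity` items."""
    while len(execution) < capacity:
        item = submission.next_task()
        if item is None:
            break
        execution.append(item)


submission = FairSubmissionQueue()
for i in range(3):
    submission.submit("A", f"a{i}")
submission.submit("B", "b0")

execution = deque()
refill(submission, execution, capacity=3)
print(list(execution))  # [('A', 'a0'), ('B', 'b0'), ('A', 'a1')]
```

User B's single task is released second even though user A submitted three tasks first. With a real broker, refill would run whenever a worker finishes a task and would publish to the execution queue instead of appending to a deque.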
So far I have investigated two different ways of persistently tracking player attribute skills in a game. These are mainly conceptual except for the threading option I came up with / found an example for.
The case:
Solo developing a web game. Geo political simulator but with a little twist in comparison to others out there which I won't reveal.
I'm using a combination of Flask and SQLAlchemy, for which I have written routes and templates that extend a base template dynamically.
Currently running it in dev mode locally with the intention of putting it behind a WSGI server and a reverse proxy like Nginx on a cloud-based Linux VM.
About the player attribute mechanics: a player will submit a POST request which will specify a few bits of information. First we want to know which skill: intelligence, endurance etc. Next we need to know which player, but all of this will be generated automatically: we can use Flask-Login's LoginManager to get the current user with our nifty user_loader decorator and function. We can use the user ID it provides to query the rest of it, namely what level the player is. We can specify later the math used to decide the wait time increase, in seconds.
The options:
Option 1:
As suggested by a colleague of mine. Allow the database to manage the timings of the skills. When the user submits the form, we will have created a new table to hold skill upgrade information. We take a note of what time the user submitted the form and also we multiply the current skill level by a factor of X amount of time and we put both pieces of data into the database. Then we create a new process that manages the constant checking of this table. Using timedelta, we can check if the amount of time that has elapsed since the form was submitted is equal to or greater than the time the player must wait until the upgrade is complete.
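To make the timing check concrete, here is a small sketch of it. SECONDS_PER_LEVEL is an assumed value for the "factor of X" and both function names are hypothetical:

```python
from datetime import datetime, timedelta

SECONDS_PER_LEVEL = 300  # assumption: the "factor of X" per skill level


def upgrade_duration(current_level, seconds_per_level=SECONDS_PER_LEVEL):
    """Wait time for the next upgrade: current level times the factor."""
    return timedelta(seconds=current_level * seconds_per_level)


def is_upgrade_complete(submitted_at, current_level, now):
    """True once the elapsed time is equal to or greater than the wait time."""
    return now - submitted_at >= upgrade_duration(current_level)


submitted = datetime(2024, 1, 1, 12, 0, 0)
# Level 4 -> 4 * 300 s = 20 minutes of waiting.
print(is_upgrade_complete(submitted, 4, datetime(2024, 1, 1, 12, 19, 59)))  # False
print(is_upgrade_complete(submitted, 4, datetime(2024, 1, 1, 12, 20, 0)))   # True
```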
Option 2:
Import threading and create a class which expects the same information as above, supplied on init, and then simply use time.sleep for X amount of time, then fire the upgrade and kill the thread when it's finished.
I hope this all makes sense. I haven't written either yet because I am undecided about which is the most efficient way around it.
I'm looking for the most scalable solution (even if it's not an option listed here) but one that is also as practical or an improvement on my concept of the skill tracking mechanic.
I'm open to adding another lib to the package but I really would rather not.
I'll expand on my comment a little bit:
In terms of scalability:
What if the upgrade processes become very long? Hours or days?
What if you have a lot of users?
What if people disconnect and reconnect to sessions at different times?
Hopefully it is clear you cannot ensure a robust process with option 2. Threading and waiting will put a continuous and potentially limiting load on a server, and if a server fails all those threads are likely to be lost.
In terms of robustness:
On the other hand if you record all of the information to a database you have the facility to cross check the states of any items and perform upgrade/downgrade actions as deemed necessary by some form of task scheduler. This allows you to ensure that character states are always consistent with what you expect. And you only need one process to scan through the DB periodically and perform actions on all of the open rows flagged for an upgrade.
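A minimal sketch of that periodic scan, using the standard-library sqlite3 module instead of SQLAlchemy so that it runs on its own. The table and column names are assumptions made for the example:

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables: current skill levels and open upgrade requests.
    conn.execute(
        "CREATE TABLE player_skill ("
        "player_id INTEGER, skill TEXT, level INTEGER, "
        "PRIMARY KEY (player_id, skill))"
    )
    conn.execute(
        "CREATE TABLE skill_upgrade ("
        "id INTEGER PRIMARY KEY, player_id INTEGER, skill TEXT, "
        "completes_at REAL, applied INTEGER DEFAULT 0)"
    )


def apply_due_upgrades(conn, now):
    """Apply every open upgrade whose completion time has passed; return the count."""
    due = conn.execute(
        "SELECT id, player_id, skill FROM skill_upgrade "
        "WHERE applied = 0 AND completes_at <= ?",
        (now,),
    ).fetchall()
    with conn:  # commit all changes together, or roll back on error
        for upgrade_id, player_id, skill in due:
            conn.execute(
                "UPDATE player_skill SET level = level + 1 "
                "WHERE player_id = ? AND skill = ?",
                (player_id, skill),
            )
            conn.execute(
                "UPDATE skill_upgrade SET applied = 1 WHERE id = ?",
                (upgrade_id,),
            )
    return len(due)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO player_skill VALUES (1, 'intelligence', 3)")
conn.execute(
    "INSERT INTO skill_upgrade (player_id, skill, completes_at) "
    "VALUES (1, 'intelligence', 100.0)"
)
applied = apply_due_upgrades(conn, now=200.0)
level = conn.execute(
    "SELECT level FROM player_skill WHERE player_id = 1"
).fetchone()[0]
print(applied, level)  # 1 4
```

Because each row is flagged once applied, running the scan again after a crash or restart does not apply the same upgrade twice.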
You could, if you wanted, also avoid a global task scheduler altogether. When a user performs an activity on the site a little task could run in the background (as a kind of decorator) that checks the upgrade status and if the time is right performs the DB activity, otherwise just passes. But a user would need to be actively in a session to make sure this happens, as opposed to the scheduled task above.
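The decorator idea could look roughly like this. The two dictionaries are hypothetical in-memory stand-ins for the database tables, and the view takes a plain user_id instead of a Flask request so that the sketch stays self-contained:

```python
import functools
import time

# Hypothetical stand-ins for the database.
skill_levels = {}      # (user_id, skill) -> level
pending_upgrades = {}  # (user_id, skill) -> completion timestamp in seconds


def settle_upgrades(user_id, now):
    """Apply this user's upgrades whose completion time has passed."""
    for key, completes_at in list(pending_upgrades.items()):
        if key[0] == user_id and completes_at <= now:
            skill_levels[key] = skill_levels.get(key, 0) + 1
            del pending_upgrades[key]


def settles_upgrades(view):
    """Decorator: settle finished upgrades before running the view."""
    @functools.wraps(view)
    def wrapper(user_id, *args, now=None, **kwargs):
        settle_upgrades(user_id, time.time() if now is None else now)
        return view(user_id, *args, **kwargs)
    return wrapper


@settles_upgrades
def profile(user_id):
    return {k[1]: v for k, v in skill_levels.items() if k[0] == user_id}


skill_levels[(1, "endurance")] = 2
pending_upgrades[(1, "endurance")] = 100.0
before = profile(1, now=50.0)   # upgrade not finished yet
after = profile(1, now=100.0)   # upgrade settles on this request
print(before, after)  # {'endurance': 2} {'endurance': 3}
```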
The task I'm implementing scrapes some basic info about a URL, such as title, description and OGP metadata. If User A requests 200 URLs to scrape, and User B then requests 10 URLs, User B may wait much longer than s/he expects.
What I'm trying to achieve is to rate limit a specific task on a per user basis or, at least, to be fair between users.
The Celery implementation for rate limiting is too broad, since it uses the task name only.
Do you have any suggestion to achieve this kind of fairness?
Related Celery (Django) Rate limiting
Another way would be to rate limit individual users using a lock. Use the user id as the lock name. If the lock is already held retry after some task dependent delay.
Basically, do this:
Ensuring a task is only executed one at a time
Lock on the user id and retry instead of doing nothing if the lock can't be acquired. Also, it would be better to use Redis instead of the Django cache, but either way will work.
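A sketch of the lock-and-retry pattern, with a small in-memory LockStore standing in for Redis or the Django cache (with the Django cache, cache.add plays the role of add here). LockStore, RetryLater and run_exclusive are names invented for this example; in a celery task the retry would be requested through the task's own retry mechanism rather than a custom exception:

```python
import threading
import time


class LockStore:
    """In-memory stand-in for a shared store with an atomic 'add if absent'."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}
        self._mutex = threading.Lock()
        self._clock = clock

    def add(self, key, ttl):
        """Store key unless an unexpired entry exists; return True on success."""
        with self._mutex:
            now = self._clock()
            if key in self._expires and self._expires[key] > now:
                return False
            self._expires[key] = now + ttl
            return True

    def delete(self, key):
        with self._mutex:
            self._expires.pop(key, None)


class RetryLater(Exception):
    """Signal that the task should be re-queued after `delay` seconds."""

    def __init__(self, delay):
        super().__init__(f"retry in {delay}s")
        self.delay = delay


def run_exclusive(store, user_id, work, lock_ttl=300.0, retry_delay=5.0):
    """Run work() only if no other task holds this user's lock."""
    key = f"user-lock:{user_id}"
    if not store.add(key, lock_ttl):
        raise RetryLater(retry_delay)
    try:
        return work()
    finally:
        store.delete(key)  # always release, even if work() fails


store = LockStore()
store.add("user-lock:7", 300.0)  # another worker already holds user 7's lock
try:
    run_exclusive(store, 7, lambda: "scraped")
except RetryLater as exc:
    print(exc.delay)  # 5.0
```

The lock carries a timeout so that a crashed worker cannot block a user forever.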
One way to work around this could be to ensure that a user does not enqueue more than x tasks, which means counting, for each user, the number of enqueued tasks not yet processed (on the Django side, not trying to do this with celery).
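The counting could be sketched like this. PendingTaskLimiter is a hypothetical helper that keeps the counts in memory; on the Django side the count would more likely come from a model or the cache:

```python
from collections import Counter


class PendingTaskLimiter:
    """Refuse new tasks for a user who already has max_pending unprocessed ones."""

    def __init__(self, max_pending):
        self._max = max_pending
        self._pending = Counter()

    def try_enqueue(self, user_id):
        """Return True and count the task if the user is under the cap."""
        if self._pending[user_id] >= self._max:
            return False
        self._pending[user_id] += 1
        return True

    def task_finished(self, user_id):
        """Call when a task completes so the user can enqueue another one."""
        if self._pending[user_id] > 0:
            self._pending[user_id] -= 1


limiter = PendingTaskLimiter(max_pending=50)
accepted_a = sum(limiter.try_enqueue("A") for _ in range(200))
accepted_b = sum(limiter.try_enqueue("B") for _ in range(10))
print(accepted_a, accepted_b)  # 50 10
```

The rejected URLs would have to be held back and enqueued later as earlier tasks finish, which is what keeps User B from waiting behind all 200 of User A's scrapes.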
How about, instead of running all URL scrapes in a single task, make each scrape into a single task and then run them as chains or groups?
I am using Flask.
I am currently using a fabfile to check which users should get a bill and I set up a cron job to run the fabfile every morning at 5am. This automatically creates bills in Stripe and in my database and sends out emails to the users to inform them. This could be used for birthday reminders or anything else similar.
Is setting up a cronjob the standard way of doing this sort of thing? Is there a better way/standard?
I would define "this sort of thing" as anything that needs to happen automatically in the app when certain criteria are met, without a user interacting with said app.
I could not find much when I googled this.
Using cron is in effect the most straightforward way of doing it. However, there are other kinds of services that trigger tasks on a periodic basis and offer some additional control, for instance Celery's scheduler. There seems to be a tutorial about building periodic tasks with celery here.
What I think you have to ask yourself is:
Is a cron job the most reliable way of billing your customers?
I've written small/simple apps that use an internal timer, e.g. https://bitbucket.org/prologic/irclogger which rotates its irc log files once per day. Is this any better or more reliable? Not really; if the daemon/bot were to die prematurely or the system were to crash, what happens then? In this case it just gets started again and logs continue to rotate at the next "day" interval.
I think two things are important here:
Reliability
Robustness
I am trying to set up some scheduled tasks for a Django app with celery, hosted on heroku. Aside from not knowing how everything should be configured, what is the best way to approach this?
Let's say users can opt to receive a daily email at a time of their choosing.
Should I have a scheduled job that runs every, say, 5 minutes, looks up every user who wants to be emailed at that time and then fires off the emails?
OR
Schedule a task for each user, when they set their preference. (Not sure how I would actually implement this yet)
It depends on how much accuracy you need. Do you want users to select the time down to the minute? The second? Or will allowing them to select the hour they wish to be emailed be enough?
If on the hour is accurate enough, then use a task that polls for users to mail every hour.
If your users need the mail to go out accurate to the second, then set a task for each user timed to complete on that second.
Everything in between comes down to personal choice. What are you more comfortable doing, and even more importantly: what produces the simplest code with the fewest failure modes?
I would suggest the first option (scheduled job that looks up outstanding jobs) - easier to scale and manage. What if you have 1000s of users - that is a lot of tasks JUST for sending emails.
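A sketch of the lookup such a scheduled job could do on each run, assuming preferences are stored as a time of day per user. The function name and the dictionary are hypothetical; in Django this would be a query on the user preferences:

```python
from datetime import datetime, time, timedelta


def users_due(preferences, window_start, window=timedelta(minutes=5)):
    """Return user ids whose send time falls in [window_start, window_start + window)."""
    window_end = window_start + window
    due = []
    for user_id, send_at in preferences.items():
        # Check both calendar days in case the window crosses midnight.
        for day in (window_start.date(), window_end.date()):
            candidate = datetime.combine(day, send_at)
            if window_start <= candidate < window_end:
                due.append(user_id)
                break
    return due


preferences = {1: time(8, 0), 2: time(8, 4), 3: time(8, 5), 4: time(0, 1)}
print(users_due(preferences, datetime(2024, 1, 1, 8, 0)))    # [1, 2]
print(users_due(preferences, datetime(2024, 1, 1, 23, 58)))  # [4]
```

Using a half-open window means a user scheduled exactly on a boundary (08:05 here) is picked up by the next run only, so nobody is emailed twice as long as consecutive runs use adjacent windows.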
If you use your database as the celery broker, you can use django-celery's built-in cron-like scheduling, which would allow you to create and destroy tasks dynamically. I don't like using the DB for my broker, though.
Also, you might want to check out chronos.