I'm working on a Django web app. The app includes messages that will self-delete after a certain amount of time. I'm using timezone.now() as the sent time and the user inputs a timedelta to display the message until. I'm checking to see if the message should delete itself by checking if current time is after sent time plus the time delta. Will this place a heavy load on the server? How frequently will it automatically check? Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Thanks
How frequently will it automatically check?
Who is "it"? If you mean "the Django process", then it will NOT check anything by itself. You will have to use either a cron job or some async queue to take care of removing "dead" messages.
Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Well yes, see above. Cron jobs are the simplest solution; async queues (like Celery) are much more heavyweight, but if you have a lot of "off-band" processing (work you want to launch from the request/response cycle BUT execute outside of it) then they're the way to go.
Will this place a heavy load on the server?
It's totally impossible to answer this. It depends on your exact models, the way you write the "check & clean" code, and, of course, data volumes. But with either a cron job or an async queue this won't run within the Django server process(es) itself, and it can even be run on another server as long as it can access the database. IOW the load will be mostly on the database (well, on the server running the process too, of course, but given your problem description a simple SQL delete query should be enough).
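To make the "simple SQL delete query" concrete, here is a minimal sketch using only the standard-library sqlite3 module rather than Django itself. The table name message, the column expires_at and the helper names are assumptions made for illustration. The idea is to store the absolute expiry (sent time plus the user's timedelta) once, so that cleanup is a single comparison in the database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def create_schema(conn):
    # Hypothetical table: one row per message, with a precomputed expiry time.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, expires_at REAL NOT NULL)"
    )


def add_message(conn, body, sent_at, lifetime):
    # Store sent time + timedelta as a Unix timestamp.
    expires_at = (sent_at + lifetime).timestamp()
    conn.execute(
        "INSERT INTO message (body, expires_at) VALUES (?, ?)", (body, expires_at)
    )


def delete_expired(conn, now=None):
    # One DELETE removes every message whose expiry has passed.
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "DELETE FROM message WHERE expires_at <= ?", (now.timestamp(),)
    )
    conn.commit()
    return cur.rowcount  # number of rows removed
```

In a Django project the equivalent would probably be a management command that runs something like Message.objects.filter(expires_at__lte=timezone.now()).delete() (Message and expires_at again being hypothetical names), started from cron once a minute.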
Related
I am working on Real Time based app, it needs to update location of user whenever it is changed.
An Android app is used as the frontend. It gets the location using the Google/Fused API, and in onLocationChanged(loc:Location) I send the latest location over the WebSocket. The location update is then received by a Django Channels consumer, whose job is to store the location in the database asynchronously (I am using the @database_sync_to_async decorator).
But the problem is, server crashes when Android app tries to send 10-15 location updates per second. What will be the efficient way of updating real time location?
Note: Code can be supplied on demand
Ask yourself what kind of resolution you need for that data. Do you really need 10 updates a second? If not, take every nth update or see if Android will just give you the updates more slowly. Secondly, look for a native async database library. @database_sync_to_async hands every call off to another thread, which eats into the performance gains you're getting from the event loop. If you stay in one thread you'll keep the CPU caches fresh, though you won't get to use the ORM. But do you really need a database, or would Redis work? If so, call aioredis directly and it will be a lot faster, since it's in memory and you can use its fast data structures like queues and sets. If you need Redis to be even faster, look at its multithreaded fork KeyDB.
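One way to "take every nth update" on the server side is a small time-based throttle that the consumer consults before it writes anything. This is only a sketch: the class name LocationThrottle, its parameters and the one-second default are assumptions, not part of Channels or any other library.

```python
import time


class LocationThrottle:
    """Let through at most one update per `min_interval` seconds per user."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_saved = {}  # user_id -> time of the last accepted update

    def should_save(self, user_id):
        now = self.clock()
        last = self._last_saved.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous write: drop this update
        self._last_saved[user_id] = now
        return True
```

The consumer would call should_save(user_id) for each incoming message and only touch the database when it returns True, so 10-15 messages per second turn into roughly one write per second per user.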
So far I have investigated two different ways of persistently tracking player attribute skills in a game. These are mainly conceptual except for the threading option I came up with / found an example for.
The case:
Solo developing a web game. Geo political simulator but with a little twist in comparison to others out there which I won't reveal.
I'm using a combination of Flask and SQLAlchemy, for which I have written routes, and I have templates that extend a base template dynamically.
Currently running it in dev mode locally with the intention of putting it behind a WSGI and a reverse proxy like Nginx on the cloud based Linux vm.
About the player attribute mechanics: a player will submit a POST request which will specify a few bits of information. First we want to know which skill: intelligence, endurance, etc. Next we need to know which player, but all of this will be generated automatically; we can use Flask-Login's LoginManager to get the current user with our nifty user_loader decorator and function. We can use the user ID it provides to query the rest of it, namely what level the player is. We can specify the math used to decide the wait time increase (in seconds) later.
The options;
Option 1:
As suggested by a colleague of mine. Allow the database to manage the timings of the skills. When the user submits the form, we will have created a new table to hold skill upgrade information. We take a note of what time the user submitted the form and also we multiply the current skill level by a factor of X amount of time and we put both pieces of data into the database. Then we create a new process that manages the constant checking of this table. Using timedelta, we can check if the amount of time that has elapsed since the form was submitted is equal to or greater than the time the player must wait until the upgrade is complete.
Option 2:
Import threading and create a class which expects the same information as above, supplied on init, and then simply use time.sleep for X amount of time, then fire the upgrade and kill the thread when it's finished.
I hope this all makes sense. I haven't written either yet because I am undecided about which is the most efficient way around it.
I'm looking for the most scalable solution (even if it's not an option listed here) but one that is also as practical or an improvement on my concept of the skill tracking mechanic.
I'm open to adding another lib to the package but I really would rather not.
I'll expand on my comment a little bit:
In terms of scalability:
What if the upgrade processes become very long? Hours or days?
What if you have a lot of users?
What if people disconnect and reconnect to sessions at different times?
Hopefully it is clear you cannot ensure a robust process with option 2. Threading and waiting will put a continuous and potentially limiting load on a server, and if a server fails all those threads are likely to be lost.
In terms of robustness:
On the other hand if you record all of the information to a database you have the facility to cross check the states of any items and perform upgrade/downgrade actions as deemed necessary by some form of task scheduler. This allows you to ensure that character states are always consistent with what you expect. And you only need one process to scan through the DB periodically and perform actions on all of the open rows flagged for an upgrade.
You could, if you wanted, also avoid a global task scheduler altogether. When a user performs an activity on the site a little task could run in the background (as a kind of decorator) that checks the upgrade status and if the time is right performs the DB activity, otherwise just passes. But a user would need to be actively in a session to make sure this happens, as opposed to the scheduled task above.
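The "check on user activity" idea can be sketched as a plain function that compares the elapsed time with the required wait and applies the upgrade only when it is due. Everything named here (Player, SECONDS_PER_LEVEL, the helper functions) is hypothetical, and the in-memory dataclass stands in for what would be a database row in the real app.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Player:
    skills: dict  # skill name -> current level
    upgrading_skill: Optional[str] = None
    upgrade_started: Optional[datetime] = None


SECONDS_PER_LEVEL = 60  # assumed value for the factor "X" from the question


def upgrade_duration(level):
    # Wait time grows with the current skill level.
    return timedelta(seconds=level * SECONDS_PER_LEVEL)


def start_upgrade(player, skill, now):
    # Record what is being upgraded and when the form was submitted.
    player.upgrading_skill = skill
    player.upgrade_started = now


def apply_finished_upgrade(player, now):
    """Apply the pending upgrade if its wait time has elapsed; return True if applied."""
    skill = player.upgrading_skill
    if skill is None:
        return False  # nothing pending
    if now - player.upgrade_started < upgrade_duration(player.skills[skill]):
        return False  # not due yet, just pass
    player.skills[skill] += 1
    player.upgrading_skill = None
    player.upgrade_started = None
    return True
```

The same apply_finished_upgrade check could be called from a periodic job over all rows flagged for an upgrade, or from a hook that runs on each request for the logged-in player.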
I'm working with Django1.8 and Python2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 milliseconds) between each chunk of data that I send:
while True:
    send(data)
    sleep(0.01)
So my question is: is it considered bad practice to simply use sleep() to create that pause? Is there maybe a more efficient approach?
UPDATED:
The reason why I need to create that pause is that the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesn't return anything after having received, let alone processed, the data. Leaving that brief pause ensures that each chunk of data that I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.
Yes, this is bad practice and an anti-pattern. You will tie up the "worker" which is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to service a request as-fast-as-possible, as there is generally a fixed or max number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end-user, then you can generate a unique ID for the task and poll the result backend for an update by having the client refresh the URL every so often. (I think Celery will automatically generate a guid, but I usually specify one.)
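The hand-off pattern itself can be illustrated without Celery. The sketch below is a standard-library stand-in, not Celery's API: the request handler only puts chunks on a queue, and a single background thread does the sending and the sleeping. The names start_sender, send and pause are assumptions for the example.

```python
import queue
import threading
import time


def start_sender(send, pause=0.01):
    """Start one background thread that sends queued chunks, pausing between them."""
    chunks = queue.Queue()

    def worker():
        while True:
            chunk = chunks.get()
            try:
                if chunk is None:  # sentinel value: stop the worker
                    return
                send(chunk)
            except Exception as exc:
                # Keep the worker alive if one send fails.
                print(f"send failed: {exc!r}")
            finally:
                chunks.task_done()
            time.sleep(pause)  # the pause now blocks only this thread

    threading.Thread(target=worker, daemon=True).start()
    return chunks
```

A view would call chunks.put(data) and return immediately. Unlike a real task queue, this thread lives inside one web process and its queued work is lost on restart, which is a large part of why Celery is the usual recommendation.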
Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (50+ for example) connections to the webserver, all of which are triggering the sleep code, you're really not going to like the behavior. I would strongly recommend looking at using something like celery/rabbitmq so Django can dump the time delayed part onto something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.
I'm working on a project to learn Python, SQL, Javascript, running servers -- basically getting a grip of full-stack. Right now my basic goal is this:
I want to run a Python script infinitely, which is constantly making API calls to different services, which have different rate limits (e.g. 200/hr, 1000/hr, etc.) and storing the results (ints) in a database (PostgreSQL). I want to store these results over a period of time and then begin working with that data to display fun stuff on the front. I need this to run 24/7. I'm trying to understand the general architecture here, and searching around has proven surprisingly difficult. My basic idea in rough pseudocode is this:
database.connect()

def function1(serviceA):
    while(True):
        result = makeAPIcallA()
        INSERT INTO tableA result;
        if(hitRateLimitA):
            sleep(limitTimeA)

def function2(serviceB):
    # same thing, different limits, etc.
And I would ssh into my server, run python myScript.py &, shut my laptop down, and wait for the data to roll in. Here are my questions:
Does this approach make sense, or should I be doing something completely different?
Is it considered "bad" or dangerous to open a database connection indefinitely like this? If so, how else do I manage the DB?
I considered using a scheduler like cron, but the rate limits are variable. I can't run the script every hour when my limit is hit say, 5min into start time and has a wait time of 60min after that. Even running it on minute intervals seems messy: I need to sleep for persistent rate limit wait times which will keep varying. Am I correct in assuming a scheduler is not the way to go here?
How do I gracefully handle any unexpected potentially fatal errors (namely, logging and restarting)? What about manually killing the script, or editing it?
I'm interested in learning different approaches and best practices here -- any and all advice would be much appreciated!
I actually do exactly what you do for one of my personal applications and I can explain how I do it.
I use Celery instead of cron because it allows for finer adjustments in scheduling and it is Python and not bash, so it's easier to use. I have different tasks (basically a group of API calls and DB updates) to different sites running at different intervals to account for the various different rate limits.
I have the Celery app run as a service so that even if the system restarts it's trivial to restart the app.
I use the logging library in my application extensively because it is difficult to debug something when all you have is one difficult to read stack trace. I have INFO-level and DEBUG-level logs spread throughout my application, and any WARNING-level and above log gets printed to the console AND gets sent to my email.
For exception handling, the majority of what I prepare for are rate limit issues and random connectivity issues. Make sure to surround whatever HTTP request you send to your API endpoints in try-except statements and possibly just implement a retry mechanism.
As far as the DB connection, it shouldn't matter how long your connection is, but you need to make sure to surround your main application loop in a try-except statement and make sure it gracefully fails by closing the connection in the case of an exception. Otherwise you might end up with a lot of ghost connections and your application not being able to reconnect until those connections are gone.
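As a rough illustration of the last two points, here is a minimal sketch of a retry helper plus a main loop that always closes its connection. The function names, the exponential backoff and the default of three attempts are assumptions for the example; connect and poll_once are hypothetical callables you would supply.

```python
import logging
import time

logger = logging.getLogger("collector")


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n and retry, re-raising on the last attempt."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts - 1:
                logger.error("giving up after %d attempts: %r", attempts, exc)
                raise
            delay = base_delay * 2 ** attempt
            logger.warning(
                "attempt %d failed (%r); retrying in %.1fs", attempt + 1, exc, delay
            )
            sleep(delay)


def run(connect, poll_once):
    """Main loop: poll forever, but always close the connection on the way out."""
    conn = connect()
    try:
        while True:
            call_with_retry(lambda: poll_once(conn))
    finally:
        conn.close()  # runs on errors and on Ctrl-C, so no ghost connections remain
```

Here poll_once(conn) would make one API call and insert the result. Because call_with_retry only catches Exception, a KeyboardInterrupt still stops the loop and the finally block closes the connection.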
I am using Flask.
I am currently using a fabfile to check which users should get a bill and I set up a cron job to run the fabfile every morning at 5am. This automatically creates bills in Stripe and in my database and sends out emails to the users to inform them. This could be used for birthday reminders or anything else similar.
Is setting up a cronjob the standard way of doing this sort of thing? Is there a better way/standard?
I would define "this sort of thing" as anything that needs to happen automatically in the app when certain criteria are met, without a user interacting with said app.
I could not find much when I googled this.
Using cron is in effect the most straightforward way of doing it. However, there are other kind of services that trigger tasks on a periodic basis and offer some additional control. For instance, Celery's scheduler. There seems to be a tutorial about building periodic tasks with celery here.
What I think you have to ask yourself is:
Is a cron job the most reliable way of billing your customers?
I've written small/simple apps that use an internal timer, e.g. https://bitbucket.org/prologic/irclogger, which rotates its IRC log files once per day. Is this any better or more reliable? Not really; if the daemon/bot were to die prematurely or the system were to crash, what happens then? In this case it just gets started again and the logs continue to rotate at the next "day" interval.
I think two things are important here:
Reliability
Robustness