Efficient way of updating real time location using django-channels/websocket

Efficient way of updating real time location using django-channels/websocket - python

I am working on Real Time based app, it needs to update location of user whenever it is changed.
Android app is used as frontend, which get location using Google/Fused Api and in onLocationChanged(loc:Location), I am sending the latest location over the Websocket. The location update is then received by a django channel consumer, and job of this consumer is to store location in database asynchronously (I am using #database_sync_to_async decorator.
But the problem is, server crashes when Android app tries to send 10-15 location updates per second. What will be the efficient way of updating real time location?
Note: Code can be supplied on demand

Ask yourself what kind of resolution you need for that data. Do you really need 10 updates a second? If not, take every nth update or see if Android will just give you the updates slower. Secondly, look for a native async database library. #database_sync_to_async runs a different thread every time you call it which kills the performance gains you're getting from the event loop. If you say in one thread you'll keep the CPU caches fresh. You won't get to use the ORM. But do you really need a database or would Redis work? If so, call aioredis directly and it will be a lot faster since its in memory and you can use it's fast data structures like queues and sets. If you need Redis to be even faster look at it's multithreaded fork KeyDB.

Related

How to implement an auto-scaling solution requiring synchronous subscription?

I have to process requests published in a Pub/ Sub Topic by the users of my "service" in a Python application, having a main loop.
Each instance of the service will only be able to process one request a time due to the limitation of the application. Burst of requests (~10 at the beginning, growing to ~ 10^6/7) will mix into mostly total idle time. The cold start time of the application is very high compared to the processing time for a single request.
My code will be a plug-in, which polls the subscription, calls methods in the application based on it and then saves data in Cloud Storage and Big Query.
I have read through the cloud documentation and it seems that Cloud Run is the right solution, together with the synchronous subscription API in Python. Cloud Functions I excluded, because it seems more be suited for asynchronous stuff.
However, I did not fully understand how the auto-scaling works. Basically it would have to do this based on the average processing time for each request and the length of the queue, considering the average start-up time of a container.
Unfortunately I did not find a tutorial or example for such a use-case, especially not really explaining how the auto-scaling happens in detail.
Does anyone have something like that or could one explain me here?

How to create a "buffer" program to act as a holding ground for MySql data lines, that will then be passed on when the cloud DB is available

I want to create a Python3 program that takes in MySQL data and holds it temporarily, and can then pass this data onto a cloud MySQL database.
The idea would be that it acts as a buffer for entries in the event that my local network goes down, the buffer would then be able to pass those entries on at a later date, theoretically providing fault-tolerance.
I have done some research into Replication and GTIDs and I'm currently in the process of learning these concepts. However I would like to write my own solution, or at least have it be a smaller program rather than a full implementation of replication server-side.
I already have a program that generates some MySQL data to fill my DB, the key part I need help with would be the buffer aspect/implementation (The code itself I have isn't important as I can rework it later on).
I would greatly appreciate any good resources or help, thank you!

I would implement what you describe using a message queue.
Example: https://hevodata.com/learn/python-message-queue/
The idea is to run a message queue service on your local computer. Your Python application pushes items into the MQ instead of committing directly to the database.
Then you need another background task, called a worker, which you may also write in Python or another language, which consumes items from the MQ and writes them to the cloud database when it's available. If the cloud database is not available, then the background worker pauses.
The data in the MQ can grow while the background worker is paused. If this goes on too long, you may run out of space. But hopefully the rate of growth is slow enough and the cloud database is available regularly, so the risk of this happening is low.
Re your comment about performance.
This is a different application architecture, so there are pros and cons.
On the one hand, if your application is "writing" to a local MQ instead of the remote database, it's likely to appear to the app as if writes have lower latency.
On the other hand, posting to the MQ does not write to the database immediately. There still needs to be a step of the worker pulling an item and initiating its own write to the database. So from the application's point of view, there is a brief delay before the data appears in the database, even when the database seems available.
So the app can't depend on the data being ready to be queried immediately after the app pushes it to the MQ. That is, it might be pretty prompt, under 1 second, but that's not the same as writing to the database directly, which ensures that the data is ready to be queried immediately after the write.
The performance of the worker writing the item to the database should be identical to that of the app writing that same item to the same database. From the database perspective, nothing has changed.

Is it a bad practice to use sleep() in a web server in production?

I'm working with Django1.8 and Python2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 miliseconds) between each data that I send:
while True:
send(data)
sleep(0.01)
So my question is: is it considered a bad practive to simply use sleep() to create that pause? Is there maybe any other more efficient approach?
UPDATED:
The reason why I need to create that pause is because the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesnt return anything after having received or let alone processed the data. Leaving that brief pause ensures that each chunk of data that I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.

Yes, this is bad practice and an anti-pattern. You will tie up the "worker" which is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to service a request as-fast-as-possible, as there is generally a fixed or max number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end-user, then you can generate a unique ID for the task and poll the result backend for an update by having the client refresh the URL every so often. (I think Celery will automatically generate a guid, but I usually specify one.)

Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (50+ for example) connections to the webserver, all of which are triggering the sleep code, you're really not going to like the behavior. I would strongly recommend looking at using something like celery/rabbitmq so Django can dump the time delayed part onto something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.

How to ensure lower server load in Django

I'm working on a Django web app. The app includes messages that will self-delete after a certain amount of time. I'm using timezone.now() as the sent time and the user inputs a timedelta to display the message until. I'm checking to see if the message should delete itself by checking if current time is after sent time plus the time delta. Will this place a heavy load on the server? How frequently will it automatically check? Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Thanks

How frequently will it automatically check?
who is "it" ? If you mean "the django process", then it will NOT check anything by itself. You will have to use either a cronjob or some async queue to take care of removing "dead" messages.
Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Well yes, cf above. cronjobs are the simplest solution, async queues (like celery) are much more heavy-weight but if you have a lot of "off-band" processing (processes you want to launch from the request/response cycle BUT execute outside of it) then it's the way to go.
Will this place a heavy load on the server?
It's totally impossible to answer this. It depends on your exact models, the way you write the "check & clean" code, and, of course, data volumes. But using either a cronjob or an async queue this won't run within the django server process(es) itself, and can even be runned on another server as long as it can access the database. IOW the load will be on the database mostly (well, on the server running the process too of course but given your problem description a simple SQL delete query should be enough so..).

How to ensure several Python processes access the data base one by one?

I got a lot scripts running: scrappers, checkers, cleaners, etc. They have some things in common:
they are forever running;
they have no time constrain to finish their job;
they all access the same MYSQL DB, writting and reading.
Accumulating them, it's starting to slow down the website, which runs on the same system, but depends on these scripts.
I can use queues with Kombu to inline all writtings.
But do you know a way to make the same with reading ?
E.G: if one script need to read from the DB, his request is sent to a blocking queue, et it resumes when it got the answer ? This way everybody is making request to one process, and the process is the only one talking to the DB, making one request at the time.
I have no idea how to do this.
Of course, in the end I may have to add more servers to the mix, but before that, is there something I can do at the software level ?

You could use a connection pooler and make the connections from the scripts go through it. It would limit the number of real connections hitting your DB while being transparent to your scripts (their connections would be held in a "wait" state until a real connections is freed).
I don't know what DB you use, but for Postgres I'm using PGBouncer for similiar reasons, see http://pgfoundry.org/projects/pgbouncer/

You say that your dataset is <1GB, the problem is CPU bound.
Now start analyzing what is eating CPU cycles:
Which queries are really slow and executed often. MySQL can log those queries.
What about the slow queries? Can they be accelerated by using an index?
Are there unused indices? Drop them!
Nothing helps? Can you solve it by denormalizing/precomputing stuff?

You could create a function that each process must call in order to talk to the DB. You could re-write the scripts so that they must call that function rather than talk directly to the DB. Within that function, you could have a scope-based lock so that only one process would be talking to the DB at a time.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.